DTOs

matchbox.common.dtos

Data transfer objects for Matchbox API.

MatchboxName module-attribute

MatchboxName: TypeAlias = Annotated[str, StringConstraints(pattern='^[a-zA-Z0-9_.-]+$', min_length=1, strip_whitespace=True), AfterValidator(validate_matchbox_name), Field(description='Valid name for Matchbox database objects. Must contain only alphanumeric characters, underscores, dots, or hyphens.', examples=['my-dataset', 'user_data.v2', 'experiment_001'], json_schema_extra={'pattern': '^[a-zA-Z0-9_.-]+$'})]
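The StringConstraints part of MatchboxName can be approximated with the standard library. The sketch below assumes that whitespace is stripped before the length and pattern checks run, which is how pydantic's strip_whitespace option is normally understood. check_name_constraints is a hypothetical helper, and any extra checks performed by validate_matchbox_name are not reproduced.

```python
import re

# Pattern copied from the MatchboxName StringConstraints above.
NAME_PATTERN = re.compile(r"^[a-zA-Z0-9_.-]+$")


def check_name_constraints(value: str) -> str:
    """Approximate MatchboxName's string constraints (hypothetical helper).

    Assumption: whitespace is stripped first, then length and pattern are checked.
    """
    stripped = value.strip()
    if len(stripped) < 1:
        raise ValueError("name must be at least one character long")
    if not NAME_PATTERN.fullmatch(stripped):
        raise ValueError(f"invalid characters in name: {stripped!r}")
    return stripped


for candidate in ["my-dataset", "  user_data.v2 ", "experiment 001", ""]:
    try:
        print(repr(check_name_constraints(candidate)))
    except ValueError as exc:
        print("rejected:", exc)
```

Surrounding whitespace is tolerated, but internal spaces and characters such as `/` are not.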

JsonObject module-attribute

JsonObject: TypeAlias = dict[str, JsonValue]

GroupName module-attribute

GroupName: TypeAlias = MatchboxName

Type alias for group names.

CollectionName module-attribute

CollectionName: TypeAlias = MatchboxName

Type alias for collection names.

RunID module-attribute

RunID: TypeAlias = int

Type alias for run IDs.

SourceStepName module-attribute

SourceStepName: TypeAlias = MatchboxName

Type alias for source step names.

ModelStepName module-attribute

ModelStepName: TypeAlias = MatchboxName

Type alias for model step names.

ResolverStepName module-attribute

ResolverStepName: TypeAlias = MatchboxName

Type alias for resolver step names.

StepName module-attribute

Type alias for any step names.

SourceStepPath module-attribute

SourceStepPath: TypeAlias = StepPath

Type alias for source step paths.

ModelStepPath module-attribute

ModelStepPath: TypeAlias = StepPath

Type alias for model step paths.

ResolverStepPath module-attribute

ResolverStepPath: TypeAlias = StepPath

Type alias for resolver step paths.

OKMessage

Bases: BaseModel


Generic HTTP OK response.

Attributes:

status class-attribute instance-attribute

status: str = Field(default='OK')

version class-attribute instance-attribute

version: str = Field(default_factory=lambda: version('matchbox-db'))

BackendCountableType

Bases: StrEnum


Enumeration of supported backend countable types.

Attributes:

SOURCES class-attribute instance-attribute

SOURCES = 'sources'

MODELS class-attribute instance-attribute

MODELS = 'models'

SOURCE_CLUSTERS class-attribute instance-attribute

SOURCE_CLUSTERS = 'source_clusters'

MODEL_CLUSTERS class-attribute instance-attribute

MODEL_CLUSTERS = 'model_clusters'

CLUSTERS class-attribute instance-attribute

CLUSTERS = 'all_clusters'

CREATES class-attribute instance-attribute

CREATES = 'creates'

MERGES class-attribute instance-attribute

MERGES = 'merges'

PROPOSES class-attribute instance-attribute

PROPOSES = 'proposes'

BackendResourceType

Bases: StrEnum


Enumeration of resource types referenced by the client or API.

Attributes:

COLLECTION class-attribute instance-attribute

COLLECTION = 'collection'

RUN class-attribute instance-attribute

RUN = 'run'

STEP class-attribute instance-attribute

STEP = 'step'

CLUSTER class-attribute instance-attribute

CLUSTER = 'cluster'

USER class-attribute instance-attribute

USER = 'user'

GROUP class-attribute instance-attribute

GROUP = 'group'

JUDGEMENT class-attribute instance-attribute

JUDGEMENT = 'judgement'

SYSTEM class-attribute instance-attribute

SYSTEM = 'system'

BackendParameterType

Bases: StrEnum


Enumeration of parameter types passable to the API.

Attributes:

SAMPLE_SIZE class-attribute instance-attribute

SAMPLE_SIZE = 'sample_size'

NAME class-attribute instance-attribute

NAME = 'name'

CRUDOperation

Bases: StrEnum


Enumeration of CRUD operations.

Attributes:

CREATE class-attribute instance-attribute

CREATE = 'create'

UPDATE class-attribute instance-attribute

UPDATE = 'update'

DELETE class-attribute instance-attribute

DELETE = 'delete'

LocationType

Bases: StrEnum


Enumeration of location types.

Attributes:

RDBMS class-attribute instance-attribute

RDBMS = 'rdbms'

PermissionType

Bases: StrEnum


Permission levels for resource access.

Attributes:

READ class-attribute instance-attribute

READ = 'read'

WRITE class-attribute instance-attribute

WRITE = 'write'

ADMIN class-attribute instance-attribute

ADMIN = 'admin'

User

Bases: BaseModel


User identity.

Attributes:

user_name class-attribute instance-attribute

user_name: str = Field(description='Used as the subject claim in JWTs.')

email class-attribute instance-attribute

email: EmailStr | None = None

LoginResponse

Bases: BaseModel


Response from login endpoint.

Attributes:

user instance-attribute

user: User

setup_mode_admin class-attribute instance-attribute

setup_mode_admin: bool = Field(default=False, description='Whether user was added to admins during setup mode.')

AuthStatusResponse

Bases: BaseModel


Response model for authentication status.

Attributes:

authenticated instance-attribute

authenticated: bool

user class-attribute instance-attribute

user: User | None = None

Group

Bases: BaseModel


Group definition.

Attributes:

name instance-attribute

name: GroupName

description class-attribute instance-attribute

description: str | None = None

is_system class-attribute instance-attribute

is_system: bool = False

members class-attribute instance-attribute

members: list[User] = []

PermissionGrant

Bases: BaseModel


A permission on a resource.

Resource context should always be supplied.

Attributes:

group_name instance-attribute

group_name: GroupName

permission instance-attribute

permission: PermissionType

StepPath

Bases: BaseModel


Base step identifier with collection, run, and name.

Attributes:

collection instance-attribute

collection: CollectionName

run instance-attribute

run: RunID

name instance-attribute

name: StepName

StepType

Bases: StrEnum


Types of nodes in a DAG.

Attributes:

SOURCE class-attribute instance-attribute

SOURCE = 'source'

MODEL class-attribute instance-attribute

MODEL = 'model'

RESOLVER class-attribute instance-attribute

RESOLVER = 'resolver'

LocationConfig

Bases: BaseModel


Metadata for a location.

Attributes:

type instance-attribute

name instance-attribute

name: str

SourceField

Bases: BaseModel


A field in a source that can be indexed in the Matchbox database.

Attributes:

name class-attribute instance-attribute

name: str = Field(description='The name of the field in the source after the extract/transform logic has been applied.')

type class-attribute instance-attribute

type: DataTypes = Field(description='The cached field type. Used to ensure a stable hash.')

SourceConfig

Bases: BaseModel


Configuration of a source that can be, or has been, indexed in the backend.

Sources are foundational processes on top of which linking and deduplication models can build new steps.

Methods:

  • validate_key_field

    Ensure that the key field is a string and not in the index fields.

  • prefix

    Get the prefix for the source.

  • qualified_key

    Get the qualified key for the source.

  • qualified_index_fields

    Get the qualified index fields for the source.

  • qualify_field

    Qualify a single field name with the source name.

  • f

    Qualify one or more field names with the source name.

Attributes:

location_config class-attribute instance-attribute

location_config: LocationConfig = Field(description='The location of the source. Used to run the extract/transform logic.')

extract_transform class-attribute instance-attribute

extract_transform: str = Field(description='Logic to extract and transform data from the source. Language is location dependent.')

key_field class-attribute instance-attribute

key_field: SourceField = Field(description=dedent("\n            The key field. This is the source's key for unique\n            entities, such as a primary key in a relational database.\n\n            Keys must ALWAYS be a string.\n\n            For example, if the source describes companies, it may have used\n            a Companies House number as its key.\n\n            This key is ALWAYS correct. It should be something generated and\n            owned by the source being indexed.\n            \n            For example, your organisation's CRM ID is a key field within the CRM.\n            \n            A CRM ID entered by hand in another dataset shouldn't be used \n            as a key field.\n        "))

index_fields class-attribute instance-attribute

index_fields: tuple[SourceField, ...] = Field(default=None, description=dedent('\n            The fields to index in this source, after the extract/transform logic \n            has been applied. \n\n            This is usually set manually, and should map onto the columns that the\n            extract/transform logic returns.\n            '))

dependencies property

dependencies: list[StepName]

Local execution prerequisites.

While this can contain information about graph topology, it should only be used to check validity, never to reconstruct it.

parents property

parents: list[StepName]

Direct DAG edges to this node.

validate_key_field

validate_key_field() -> Self

Ensure that the key field is a string and not in the index fields.
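The two rules in validate_key_field can be restated as a plain function. This is a minimal sketch under stated assumptions: SketchField and check_key_field are hypothetical stand-ins, the field type is a plain label rather than a DataTypes value, "String" is assumed to be the string type's label, and the "not in the index fields" check is assumed to compare field names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchField:
    """Hypothetical stand-in for SourceField; `type` is a plain label here."""

    name: str
    type: str


def check_key_field(key_field: SketchField, index_fields: tuple[SketchField, ...]) -> None:
    """Hypothetical restatement of SourceConfig.validate_key_field."""
    # Assumption: the string data type is labelled "String".
    if key_field.type != "String":
        raise ValueError(f"key field {key_field.name!r} must be a string")
    # Assumption: membership in the index fields is checked by name.
    if key_field.name in {field.name for field in index_fields}:
        raise ValueError(f"key field {key_field.name!r} must not be an index field")


key = SketchField("company_number", "String")
check_key_field(key, (SketchField("company_name", "String"), SketchField("postcode", "String")))
```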

prefix

prefix(name: str) -> str

Get the prefix for the source.

Parameters:

  • name
    (str) –

    The name of the source.

Returns:

  • str

    The prefix string (name + “_”).

qualified_key

qualified_key(name: str) -> str

Get the qualified key for the source.

Parameters:

  • name
    (str) –

    The name of the source.

Returns:

  • str

    The qualified key field name.

qualified_index_fields

qualified_index_fields(name: str) -> list[str]

Get the qualified index fields for the source.

Parameters:

  • name
    (str) –

    The name of the source.

Returns:

  • list[str]

    List of qualified index field names.

qualify_field

qualify_field(name: str, field: str) -> str

Qualify a single field name with the source name.

Parameters:

  • name
    (str) –

    The name of the source.

  • field
    (str) –

    The field name to qualify.

Returns:

  • str

    A single qualified field.

f

Qualify one or more field names with the source name.

Parameters:

  • name
    (str) –

    The name of the source.

  • fields
    (str | Iterable[str]) –

    The field name to qualify, or a list of field names.

Returns:

  • str | list[str]

    A single qualified field, or a list of qualified field names.

QueryCombineType

Bases: StrEnum


Enumeration of ways to combine multiple rows having the same matchbox ID.

Attributes:

CONCAT class-attribute instance-attribute

CONCAT = 'concat'

EXPLODE class-attribute instance-attribute

EXPLODE = 'explode'

SET_AGG class-attribute instance-attribute

SET_AGG = 'set_agg'

QueryConfig

Bases: BaseModel


Configuration of a query that generates model inputs.

A QueryConfig is a view onto the step subgraph, a triangulation of a set of sources and an optional resolver. It doesn’t describe topology, which is why it has no .parents attribute.

Methods:

  • validate_steps

    Ensure that step settings are compatible.

Attributes:

sources instance-attribute

sources: tuple[SourceStepName, ...]

resolver class-attribute instance-attribute

resolver: ResolverStepName | None = None

combine_type class-attribute instance-attribute

combine_type: QueryCombineType = CONCAT

cleaning class-attribute instance-attribute

cleaning: dict[str, str] | None = None

dependencies property

dependencies: list[StepName]

Local execution prerequisites.

While this can contain information about graph topology, it should only be used to check validity, never to reconstruct it.

resolves_from property

resolves_from: StepName

Return the step name that the query resolves from.

validate_steps

validate_steps() -> Self

Ensure that step settings are compatible.

ModelType

Bases: StrEnum


Enumeration of supported model types.

Attributes:

LINKER class-attribute instance-attribute

LINKER = 'linker'

DEDUPER class-attribute instance-attribute

DEDUPER = 'deduper'

ModelConfig

Bases: BaseModel


Configuration for a model that has been, or could be, added to the server.

Methods:

  • validate_right_query

    Ensure that a right query is set if and only if the model is a linker.

Attributes:

type instance-attribute

type: ModelType

model_class instance-attribute

model_class: str

model_settings instance-attribute

model_settings: JsonObject

left_query instance-attribute

left_query: QueryConfig

right_query class-attribute instance-attribute

right_query: QueryConfig | None = None

dependencies property

dependencies: list[StepName]

Local execution prerequisites.

While this can contain information about graph topology, it should only be used to check validity, never to reconstruct it.

parents property

parents: list[StepName]

Direct DAG edges to this node.

validate_right_query

validate_right_query() -> Self

Ensure that a right query is set if and only if the model is a linker.

ResolverType

Bases: StrEnum


Enumeration of supported resolver methodology types.

Attributes:

COMPONENTS class-attribute instance-attribute

COMPONENTS = 'components'

ResolverConfig

Bases: BaseModel


Configuration for a resolver that combines model and resolver outputs.

Methods:

  • validate_inputs

    Ensure resolver config has at least one input.

Attributes:

resolver_class instance-attribute

resolver_class: str

resolver_settings instance-attribute

resolver_settings: JsonObject

inputs instance-attribute

inputs: tuple[ModelStepName, ...]

dependencies property

dependencies: list[ModelStepName]

Local execution prerequisites.

While this can contain information about graph topology, it should only be used to check validity, never to reconstruct it.

parents property

parents: list[ModelStepName]

Direct DAG edges to this node.

validate_inputs

validate_inputs() -> Self

Ensure resolver config has at least one input.

Match

Bases: BaseModel


A match between primary keys in the Matchbox database.

Methods:

  • found_or_none

    Ensure that a match has sources and a cluster if target was found.

  • serialise_ids

    Turn set to sorted list when serialising.

Attributes:

cluster instance-attribute

cluster: int | None

source instance-attribute

source_id class-attribute instance-attribute

source_id: set[str] = Field(default_factory=set)

target instance-attribute

target_id class-attribute instance-attribute

target_id: set[str] = Field(default_factory=set)

found_or_none

found_or_none() -> Match

Ensure that a match has sources and a cluster if target was found.

serialise_ids

serialise_ids(id_set: set[str]) -> list[str]

Turn set to sorted list when serialising.

Step

Bases: BaseModel


Unified step type with common fields and discriminated config.

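Methods:

  • validate_description

    Ensure the description is not empty if provided.

  • validate_step_type_matches_config

    Ensure step_type matches the config type.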

Attributes:

description class-attribute instance-attribute

description: str | None = Field(default=None, description='Description')

fingerprint instance-attribute

fingerprint: Annotated[bytes, PlainSerializer(hash_to_base64, return_type=str), PlainValidator(base64_to_hash)]
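The fingerprint is held as raw bytes but travels as a string, via hash_to_base64 on the way out and base64_to_hash on the way in. The sketch below imitates that round trip with the standard library. hash_to_b64 and b64_to_hash are hypothetical stand-ins, and both the use of standard (not URL-safe) base64 and the SHA-256 example digest are assumptions.

```python
import base64
import hashlib


def hash_to_b64(value: bytes) -> str:
    # Assumption: standard base64, decoded to ASCII text for JSON.
    return base64.b64encode(value).decode("ascii")


def b64_to_hash(value: str) -> bytes:
    return base64.b64decode(value)


fingerprint = hashlib.sha256(b"example step config").digest()
wire = hash_to_b64(fingerprint)  # the string that would appear in the payload
assert b64_to_hash(wire) == fingerprint
print(len(fingerprint), len(wire))  # 32 44
```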

step_type instance-attribute

step_type: StepType

config instance-attribute

validate_description classmethod

validate_description(value: str | None) -> str | None

Ensure the description is not empty if provided.

validate_step_type_matches_config

validate_step_type_matches_config() -> Self

Ensure step_type matches the config type.

Run

Bases: BaseModel


A run within a collection.

Attributes:

run_id class-attribute instance-attribute

run_id: RunID | None = Field(description='Unique ID of the run')

is_default class-attribute instance-attribute

is_default: bool = Field(default=False, description='Whether this run is the default in its collection')

is_mutable class-attribute instance-attribute

is_mutable: bool = Field(default=False, description='Whether this run can be modified')

steps class-attribute instance-attribute

steps: dict[StepName, Step] = Field(default_factory=dict, description='Dict of step objects by name within this run')

Collection

Bases: BaseModel


A collection of runs.

Methods:

  • validate_default_run

    Check that the default run is one of the collection's runs.

Attributes:

default_run class-attribute instance-attribute

default_run: RunID | None = Field(default=None, description='ID of default run for this collection')

runs class-attribute instance-attribute

runs: list[RunID] = Field(default_factory=list, description='List of run IDs in this collection')

validate_default_run

validate_default_run() -> Self

Check that the default run is one of the collection's runs.

ResourceOperationStatus

Bases: BaseModel


Status response for any resource operation.

Methods:

  • error_examples

    Examples for error codes.

Attributes:

success instance-attribute

success: bool

target instance-attribute

target: str

operation instance-attribute

operation: CRUDOperation

details class-attribute instance-attribute

details: str | None = None

error_examples classmethod

error_examples() -> dict

Examples for error codes.

CountResult

Bases: BaseModel


Response model for count results.

Attributes:

entities instance-attribute

UploadStage

Bases: StrEnum


Enumeration of stages of a file upload and its processing.

Attributes:

READY class-attribute instance-attribute

READY = 'ready'

PROCESSING class-attribute instance-attribute

PROCESSING = 'processing'

COMPLETE class-attribute instance-attribute

COMPLETE = 'complete'

UploadInfo

Bases: BaseModel


Response model for file upload processes.

Attributes:

stage class-attribute instance-attribute

stage: UploadStage | None = None

error class-attribute instance-attribute

error: str | None = None

NotFoundError

Bases: BaseModel


API error for a 404 status code.

Attributes:

details instance-attribute

details: str

entity instance-attribute

InvalidParameterError

Bases: BaseModel


API error for a custom 422 status code.

Attributes:

details instance-attribute

details: str

parameter instance-attribute

parameter: BackendParameterType | None

ErrorResponse

Bases: BaseModel


Unified error response for all HTTP error status codes.

This DTO enables the client to reconstruct the exact exception type that was raised on the server.

Attributes:

exception_type class-attribute instance-attribute

exception_type: MatchboxExceptionType = Field(description='The name of the exception class raised on the server')

message class-attribute instance-attribute

message: str = Field(description='Human-readable error message')

details class-attribute instance-attribute

details: dict[str, Any] | None = Field(default=None, description='Exception-specific data for reconstruction')
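Because ErrorResponse carries only the exception's class name, the client needs a way to map that name back to a class. The sketch below assumes a registry keyed by class name, with RuntimeError as the fallback for unknown names. ExampleNotFoundError, ExampleAuthError and rebuild_exception are hypothetical and are not Matchbox's real exception classes or client code.

```python
from typing import Any


class ExampleNotFoundError(Exception):
    """Hypothetical server-side exception."""


class ExampleAuthError(Exception):
    """Hypothetical server-side exception."""


# Assumption: the client keeps a registry keyed by exception class name,
# matching what exception_type carries in the payload.
REGISTRY: dict[str, type[Exception]] = {
    cls.__name__: cls for cls in (ExampleNotFoundError, ExampleAuthError)
}


def rebuild_exception(payload: dict[str, Any]) -> Exception:
    """Turn an ErrorResponse-shaped dict back into an exception instance."""
    cls = REGISTRY.get(payload["exception_type"], RuntimeError)
    exc = cls(payload["message"])
    # Keep any exception-specific data so callers can inspect it.
    exc.details = payload.get("details")
    return exc


payload = {
    "exception_type": "ExampleNotFoundError",
    "message": "Collection 'companies' not found",
    "details": {"collection": "companies"},
}
error = rebuild_exception(payload)
print(type(error).__name__, error.details)  # ExampleNotFoundError {'collection': 'companies'}
```

The caller can then `raise error`, so client code catches the same exception type that the server raised.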

DefaultUser

Bases: StrEnum


Default user identities.

Attributes:

PUBLIC class-attribute instance-attribute

PUBLIC = '_public'

DefaultGroup

Bases: StrEnum


Default group names.

Attributes:

PUBLIC class-attribute instance-attribute

PUBLIC = 'public'

ADMINS class-attribute instance-attribute

ADMINS = 'admins'

validate_matchbox_name

validate_matchbox_name(value: str) -> str

Validate matchbox name format.

Parameters:

  • value
    (str) –

    The name to validate.

Returns:

  • str

    The validated name.

Raises: