DTOs

matchbox.common.dtos

Data transfer objects for Matchbox API.

Attributes:

CollectionName module-attribute

CollectionName: TypeAlias = MatchboxName

Type alias for collection names.

RunID module-attribute

RunID: TypeAlias = int

Type alias for run IDs.

SourceResolutionName module-attribute

SourceResolutionName: TypeAlias = MatchboxName

Type alias for source resolution names.

ModelResolutionName module-attribute

ModelResolutionName: TypeAlias = MatchboxName

Type alias for model resolution names.

ResolutionName module-attribute

Type alias for any resolution names.

SourceResolutionPath module-attribute

SourceResolutionPath: TypeAlias = ResolutionPath

Type alias for source resolution paths.

ModelResolutionPath module-attribute

ModelResolutionPath: TypeAlias = ResolutionPath

Type alias for model resolution paths.

DataTypes

Bases: StrEnum

Enumeration of supported data types.

Uses polars datatypes as its backend.

Methods:

  • to_dtype

    Convert enum value to actual polars dtype.

  • to_pytype

    Convert enum value to actual Python type.

  • from_dtype

    Get enum value from a polars dtype.

  • from_pytype

    Get enum value from a Python type.

Attributes:

BOOLEAN class-attribute instance-attribute

BOOLEAN = 'Boolean'

INT8 class-attribute instance-attribute

INT8 = 'Int8'

INT16 class-attribute instance-attribute

INT16 = 'Int16'

INT32 class-attribute instance-attribute

INT32 = 'Int32'

INT64 class-attribute instance-attribute

INT64 = 'Int64'

UINT8 class-attribute instance-attribute

UINT8 = 'UInt8'

UINT16 class-attribute instance-attribute

UINT16 = 'UInt16'

UINT32 class-attribute instance-attribute

UINT32 = 'UInt32'

UINT64 class-attribute instance-attribute

UINT64 = 'UInt64'

FLOAT32 class-attribute instance-attribute

FLOAT32 = 'Float32'

FLOAT64 class-attribute instance-attribute

FLOAT64 = 'Float64'

DECIMAL class-attribute instance-attribute

DECIMAL = 'Decimal'

STRING class-attribute instance-attribute

STRING = 'String'

BINARY class-attribute instance-attribute

BINARY = 'Binary'

DATE class-attribute instance-attribute

DATE = 'Date'

TIME class-attribute instance-attribute

TIME = 'Time'

DATETIME class-attribute instance-attribute

DATETIME = 'Datetime'

DURATION class-attribute instance-attribute

DURATION = 'Duration'

ARRAY class-attribute instance-attribute

ARRAY = 'Array'

LIST class-attribute instance-attribute

LIST = 'List'

OBJECT class-attribute instance-attribute

OBJECT = 'Object'

CATEGORICAL class-attribute instance-attribute

CATEGORICAL = 'Categorical'

ENUM class-attribute instance-attribute

ENUM = 'Enum'

STRUCT class-attribute instance-attribute

STRUCT = 'Struct'

NULL class-attribute instance-attribute

NULL = 'Null'

to_dtype

to_dtype() -> DataType

Convert enum value to actual polars dtype.

to_pytype

to_pytype() -> type

Convert enum value to actual Python type.

from_dtype classmethod

from_dtype(dtype: DataType) -> DataTypes

Get enum value from a polars dtype.

from_pytype classmethod

from_pytype(pytype: type) -> DataTypes

Get enum value from a Python type.
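The conversion methods above describe a round trip between enum members and concrete types. A minimal sketch of that idea, assuming a hypothetical cut-down `MiniDataTypes` enum and an illustrative `_PY_TYPES` mapping (neither is part of matchbox, and the real class also maps to polars dtypes, which this sketch leaves out):

```python
from enum import Enum


class MiniDataTypes(str, Enum):
    """Hypothetical stand-in for DataTypes with only four members."""

    BOOLEAN = "Boolean"
    INT64 = "Int64"
    FLOAT64 = "Float64"
    STRING = "String"

    def to_pytype(self) -> type:
        """Look up the Python type paired with this member."""
        return _PY_TYPES[self]

    @classmethod
    def from_pytype(cls, pytype: type) -> "MiniDataTypes":
        """Find the member paired with an exact Python type."""
        for member in cls:
            # Identity comparison, so bool is not mistaken for int.
            if _PY_TYPES[member] is pytype:
                return member
        raise ValueError(f"Unsupported Python type: {pytype!r}")


# Assumed pairing for illustration only; the real mapping may differ.
_PY_TYPES = {
    MiniDataTypes.BOOLEAN: bool,
    MiniDataTypes.INT64: int,
    MiniDataTypes.FLOAT64: float,
    MiniDataTypes.STRING: str,
}

member = MiniDataTypes.from_pytype(int)
round_tripped = member.to_pytype()
```

Because the enum values are plain strings, a member can also be recovered from its serialised form, for example `MiniDataTypes("Float64")`.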

OKMessage

Bases: BaseModel

Generic HTTP OK response.

Attributes:

status class-attribute instance-attribute

status: str = Field(default='OK')

version class-attribute instance-attribute

version: str = Field(default_factory=lambda: version('matchbox-db'))

LoginAttempt

Bases: BaseModel

Request for the login process.

Attributes:

user_name instance-attribute

user_name: str

LoginResult

Bases: BaseModel

Response from the login process.

Attributes:

user_id instance-attribute

user_id: int

BackendCountableType

Bases: StrEnum

Enumeration of supported backend countable types.

Attributes:

SOURCES class-attribute instance-attribute

SOURCES = 'sources'

MODELS class-attribute instance-attribute

MODELS = 'models'

DATA class-attribute instance-attribute

DATA = 'data'

CLUSTERS class-attribute instance-attribute

CLUSTERS = 'clusters'

CREATES class-attribute instance-attribute

CREATES = 'creates'

MERGES class-attribute instance-attribute

MERGES = 'merges'

PROPOSES class-attribute instance-attribute

PROPOSES = 'proposes'

ModelResultsType

Bases: StrEnum

Enumeration of supported model results types.

Attributes:

PROBABILITIES class-attribute instance-attribute

PROBABILITIES = 'probabilities'

CLUSTERS class-attribute instance-attribute

CLUSTERS = 'clusters'

BackendResourceType

Bases: StrEnum

Enumeration of resource types referenced by the client or API.

Attributes:

COLLECTION class-attribute instance-attribute

COLLECTION = 'collection'

RUN class-attribute instance-attribute

RUN = 'run'

RESOLUTION class-attribute instance-attribute

RESOLUTION = 'resolution'

CLUSTER class-attribute instance-attribute

CLUSTER = 'cluster'

USER class-attribute instance-attribute

USER = 'user'

JUDGEMENT class-attribute instance-attribute

JUDGEMENT = 'judgement'

BackendParameterType

Bases: StrEnum

Enumeration of parameter types passable to the API.

Attributes:

SAMPLE_SIZE class-attribute instance-attribute

SAMPLE_SIZE = 'sample_size'

NAME class-attribute instance-attribute

NAME = 'name'

BackendUploadType

Bases: StrEnum

Enumeration of supported backend upload types.

Attributes:

INDEX class-attribute instance-attribute

INDEX = 'index'

RESULTS class-attribute instance-attribute

RESULTS = 'results'

schema property

schema

Get the schema for the upload type.

CRUDOperation

Bases: StrEnum

Enumeration of CRUD operations.

Attributes:

CREATE class-attribute instance-attribute

CREATE = 'create'

UPDATE class-attribute instance-attribute

UPDATE = 'update'

DELETE class-attribute instance-attribute

DELETE = 'delete'

LocationType

Bases: StrEnum

Enumeration of location types.

Attributes:

RDBMS class-attribute instance-attribute

RDBMS = 'rdbms'

MatchboxName

Bases: str

Sub-class of string which validates names for the Matchbox DB.

Attributes:

PATTERN class-attribute instance-attribute

PATTERN = '^[a-zA-Z0-9_.-]+$'
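The pattern above allows only ASCII letters, digits, underscores, dots and hyphens, and requires at least one character. A small sketch of checking candidate names against it using the standard `re` module; `is_valid_name` is a hypothetical helper, not part of `MatchboxName`, and the way the real class applies the pattern is not shown on this page:

```python
import re

# Pattern copied from MatchboxName.PATTERN above.
PATTERN = r"^[a-zA-Z0-9_.-]+$"


def is_valid_name(value: str) -> bool:
    """Hypothetical helper: True if the whole string matches PATTERN."""
    # fullmatch also rejects a trailing newline, which "$" alone would allow.
    return re.fullmatch(PATTERN, value) is not None


examples = {
    "companies_house.v2": is_valid_name("companies_house.v2"),
    "bad name": is_valid_name("bad name"),  # contains a space
    "a/b": is_valid_name("a/b"),  # contains a slash
    "": is_valid_name(""),  # empty string
}
```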

ResolutionPath

Bases: BaseModel

Base resolution identifier with collection, run, and name.

Attributes:

collection instance-attribute

collection: CollectionName

run instance-attribute

run: RunID

name instance-attribute

ResolutionType

Bases: StrEnum

Types of nodes in a resolution.

Attributes:

SOURCE class-attribute instance-attribute

SOURCE = 'source'

MODEL class-attribute instance-attribute

MODEL = 'model'

LocationConfig

Bases: BaseModel

Metadata for a location.

Attributes:

type instance-attribute

name instance-attribute

name: str

SourceField

Bases: BaseModel

A field in a source that can be indexed in the Matchbox database.

Attributes:

name class-attribute instance-attribute

name: str = Field(description='The name of the field in the source after the extract/transform logic has been applied.')

type class-attribute instance-attribute

type: DataTypes = Field(description='The cached field type. Used to ensure a stable hash.')

SourceConfig

Bases: BaseModel

Configuration of a source that can be, or has been, indexed in the backend.

Sources are the foundation on which linking and deduplication models build new resolutions.

Methods:

  • validate_key_field

    Ensure that the key field is a string and not in the index fields.

  • prefix

    Get the prefix for the source.

  • qualified_key

    Get the qualified key for the source.

  • qualified_index_fields

    Get the qualified index fields for the source.

  • qualify_field

    Qualify a field name with the source name.

  • f

    Qualify one or more field names with the source name.

Attributes:

location_config class-attribute instance-attribute

location_config: LocationConfig = Field(description='The location of the source. Used to run the extract/transform logic.')

extract_transform class-attribute instance-attribute

extract_transform: str = Field(description='Logic to extract and transform data from the source. Language is location dependent.')

key_field class-attribute instance-attribute

key_field: SourceField = Field(description=dedent("\n            The key field. This is the source's key for unique\n            entities, such as a primary key in a relational database.\n\n            Keys must ALWAYS be a string.\n\n            For example, if the source describes companies, it may have used\n            a Companies House number as its key.\n\n            This key is ALWAYS correct. It should be something generated and\n            owned by the source being indexed.\n            \n            For example, your organisation's CRM ID is a key field within the CRM.\n            \n            A CRM ID entered by hand in another dataset shouldn't be used \n            as a key field.\n        "))

index_fields class-attribute instance-attribute

index_fields: tuple[SourceField, ...] = Field(default=None, description=dedent('\n            The fields to index in this source, after the extract/transform logic \n            has been applied. \n\n            This is usually set manually, and should map onto the columns that the\n            extract/transform logic returns.\n            '))

dependencies property

dependencies: list[ResolutionPath]

Return all resolution names that this source needs.

Provided for symmetry with ModelConfig.

validate_key_field

validate_key_field() -> Self

Ensure that the key field is a string and not in the index fields.

prefix

prefix(name: str) -> str

Get the prefix for the source.

Parameters:

  • name
    (str) –

    The name of the source.

Returns:

  • str

    The prefix string (name + “_”).

qualified_key

qualified_key(name: str) -> str

Get the qualified key for the source.

Parameters:

  • name
    (str) –

    The name of the source.

Returns:

  • str

    The qualified key field name.

qualified_index_fields

qualified_index_fields(name: str) -> list[str]

Get the qualified index fields for the source.

Parameters:

  • name
    (str) –

    The name of the source.

Returns:

  • list[str]

    List of qualified index field names.

qualify_field

qualify_field(name: str, field: str) -> str

Qualify a field name with the source name.

Parameters:

  • name
    (str) –

    The name of the source.

  • field
    (str) –

    The field name to qualify.

Returns:

  • str

    A single qualified field.

f

Qualify one or more field names with the source name.

Parameters:

  • name
    (str) –

    The name of the source.

  • fields
    (str | Iterable[str]) –

    The field name to qualify, or a list of field names.

Returns:

  • str | list[str]

    A single qualified field, or a list of qualified field names.
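The qualification helpers above can be sketched as plain functions. The prefix rule (name + "_") comes from the `prefix` description; that a qualified field is the prefix followed by the field name is an assumption made for illustration, and these free functions are hypothetical stand-ins rather than the `SourceConfig` methods themselves:

```python
from collections.abc import Iterable
from typing import Union


def prefix(name: str) -> str:
    """Source name followed by an underscore, as documented for prefix()."""
    return f"{name}_"


def qualify_field(name: str, field: str) -> str:
    """Assumed rule: a qualified field is the prefix plus the field name."""
    return prefix(name) + field


def f(name: str, fields: Union[str, Iterable[str]]) -> Union[str, list[str]]:
    """Qualify one field (str in, str out) or many (iterable in, list out)."""
    # A str is itself iterable, so it must be checked before iterating.
    if isinstance(fields, str):
        return qualify_field(name, fields)
    return [qualify_field(name, field) for field in fields]


single = f("crm", "company_name")
several = f("crm", ["company_name", "postcode"])
```

Under these assumptions, `single` is a string and `several` is a list, mirroring the `str | list[str]` return type documented for `f`.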

QueryCombineType

Bases: StrEnum

Enumeration of ways to combine multiple rows having the same matchbox ID.

Attributes:

CONCAT class-attribute instance-attribute

CONCAT = 'concat'

EXPLODE class-attribute instance-attribute

EXPLODE = 'explode'

SET_AGG class-attribute instance-attribute

SET_AGG = 'set_agg'

QueryConfig

Bases: BaseModel

Configuration of a query that generates model inputs.

Attributes:

source_resolutions instance-attribute

source_resolutions: tuple[SourceResolutionPath, ...]

model_resolution class-attribute instance-attribute

model_resolution: ModelResolutionPath | None = None

combine_type class-attribute instance-attribute

combine_type: QueryCombineType = CONCAT

threshold class-attribute instance-attribute

threshold: int | None = None

cleaning class-attribute instance-attribute

cleaning: dict[str, str] | None = None

dependencies property

dependencies: list[ResolutionPath]

Return all resolution names that this query needs.

point_of_truth property

point_of_truth

Return path of resolution that will be used as point of truth.

validate_resolutions

validate_resolutions() -> Self

Ensure that resolution settings are compatible.

validate_cleaning_dict classmethod

validate_cleaning_dict(v: dict[str, str] | None) -> str | None

Validate cleaning as valid SQL.

ModelType

Bases: StrEnum

Enumeration of supported model types.

Attributes:

LINKER class-attribute instance-attribute

LINKER = 'linker'

DEDUPER class-attribute instance-attribute

DEDUPER = 'deduper'

ModelConfig

Bases: BaseModel

Configuration for a model that has been, or could be, added to the server.

Attributes:

type instance-attribute

type: ModelType

model_class instance-attribute

model_class: str

model_settings instance-attribute

model_settings: str

left_query instance-attribute

left_query: QueryConfig

right_query class-attribute instance-attribute

right_query: QueryConfig | None = None

dependencies property

dependencies: list[ResolutionPath]

Return all resolution names that this model needs.

validate_right_query

validate_right_query() -> Self

Ensure that a right query is set if and only if the model is a linker.

validate_settings_json classmethod

validate_settings_json(value: str) -> str

Ensure that the model settings are valid JSON.
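The two `ModelConfig` validators describe simple rules that can be sketched without pydantic. The functions below are hypothetical illustrations using only the standard library; the error messages and the use of `ValueError` are assumptions, not the library's actual behaviour:

```python
import json
from typing import Optional


def check_right_query(model_type: str, right_query: Optional[dict]) -> None:
    """Raise unless a right query is present exactly when the model is a linker."""
    is_linker = model_type == "linker"
    has_right_query = right_query is not None
    # "If and only if": the two conditions must agree.
    if is_linker != has_right_query:
        raise ValueError("right_query must be set for linkers and only for linkers")


def check_settings_json(value: str) -> str:
    """Return the settings string unchanged if it parses as JSON."""
    try:
        json.loads(value)
    except json.JSONDecodeError as exc:
        raise ValueError("model_settings is not valid JSON") from exc
    return value


# A deduper has no right query; a linker has one.
check_right_query("deduper", None)
check_right_query("linker", {"source_resolutions": []})
settings = check_settings_json('{"threshold": 0.8}')
```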

Match

Bases: BaseModel

A match between primary keys in the Matchbox database.

Methods:

  • found_or_none

    Ensure that a match has sources and a cluster if target was found.

  • serialise_ids

    Turn set to sorted list when serialising.

Attributes:

cluster instance-attribute

cluster: int | None

source instance-attribute

source_id class-attribute instance-attribute

source_id: set[str] = Field(default_factory=set)

target instance-attribute

target_id class-attribute instance-attribute

target_id: set[str] = Field(default_factory=set)

found_or_none

found_or_none() -> Match

Ensure that a match has sources and a cluster if target was found.

serialise_ids

serialise_ids(id_set: set[str])

Turn set to sorted list when serialising.
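Sets have no defined order and are not JSON-serialisable, which is presumably why the IDs are converted to a sorted list. A minimal sketch of that conversion, written as a standalone function rather than the actual pydantic serialiser on `Match`:

```python
import json


def serialise_ids(id_set: set[str]) -> list[str]:
    """Return the IDs as a sorted list so the output is stable."""
    return sorted(id_set)


# The same set always serialises to the same JSON text.
payload = json.dumps({"source_id": serialise_ids({"b", "a", "c"})})
```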

Resolution

Bases: BaseModel

Unified resolution type with common fields and discriminated config.

Attributes:

description class-attribute instance-attribute

description: str | None = Field(default=None, description='Description')

truth class-attribute instance-attribute

truth: int | None = Field(default=None, ge=0, le=100, strict=True)

resolution_type instance-attribute

resolution_type: ResolutionType

config instance-attribute

validate_description classmethod

validate_description(value: str | None) -> str | None

Ensure the description is not empty if provided.

validate_resolution_type_matches_config

validate_resolution_type_matches_config()

Ensure resolution_type matches the config type.

validate_truth_matches_type

validate_truth_matches_type()

Ensure truth field matches resolution type requirements.

Run

Bases: BaseModel

A run within a collection.

Attributes:

run_id class-attribute instance-attribute

run_id: RunID | None = Field(description='Unique ID of the run')

is_default class-attribute instance-attribute

is_default: bool = Field(default=False, description='Whether this run is the default in its collection')

is_mutable class-attribute instance-attribute

is_mutable: bool = Field(default=False, description='Whether this run can be modified')

resolutions class-attribute instance-attribute

resolutions: dict[ResolutionName, Resolution] = Field(default_factory=dict, description='Dict of resolution objects by name within this run')

Collection

Bases: BaseModel

A collection of runs.

Attributes:

default_run class-attribute instance-attribute

default_run: RunID | None = Field(default=None, description='ID of default run for this collection')

runs class-attribute instance-attribute

runs: list[RunID] = Field(default_factory=list, description='List of run IDs in this collection')

validate_default_run

validate_default_run() -> Self

Check that the default run is among the collection's runs.
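The check performed by `validate_default_run` can be sketched with a dataclass. `CollectionSketch` is a hypothetical stand-in for the pydantic model, and treating a missing default run as valid is an assumption based on the field's `None` default:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CollectionSketch:
    """Hypothetical stand-in for Collection, using plain ints for run IDs."""

    default_run: Optional[int] = None
    runs: list[int] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Assumed rule: no default is fine, but a set default must be a known run.
        if self.default_run is not None and self.default_run not in self.runs:
            raise ValueError(f"default_run {self.default_run} is not in runs")


ok = CollectionSketch(default_run=2, runs=[1, 2, 3])
empty = CollectionSketch()
```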

ResourceOperationStatus

Bases: BaseModel

Status response for any resource operation.

Attributes:

success instance-attribute

success: bool

name instance-attribute

operation instance-attribute

operation: CRUDOperation

details class-attribute instance-attribute

details: str | None = None

status_409_examples classmethod

status_409_examples() -> dict

Examples for 409 status code.

status_500_examples classmethod

status_500_examples() -> dict

Examples for 500 status code.

CountResult

Bases: BaseModel

Response model for count results.

Attributes:

entities instance-attribute

UploadStage

Bases: StrEnum

Enumeration of stages of a file upload and its processing.

Attributes:

READY class-attribute instance-attribute

READY = 'ready'

AWAITING_UPLOAD class-attribute instance-attribute

AWAITING_UPLOAD = 'awaiting_upload'

QUEUED class-attribute instance-attribute

QUEUED = 'queued'

PROCESSING class-attribute instance-attribute

PROCESSING = 'processing'

COMPLETE class-attribute instance-attribute

COMPLETE = 'complete'

FAILED class-attribute instance-attribute

FAILED = 'failed'

UNKNOWN class-attribute instance-attribute

UNKNOWN = 'unknown'

UploadStatus

Bases: BaseModel

Response model for any file upload process.

Attributes:

id instance-attribute

id: str

stage instance-attribute

stage: UploadStage

update_timestamp instance-attribute

update_timestamp: datetime

details class-attribute instance-attribute

details: str | None = None

entity class-attribute instance-attribute

entity: BackendUploadType | None = None

get_http_code

get_http_code() -> int

Get the HTTP status code for the upload stage.

status_400_examples classmethod

status_400_examples() -> dict

Examples for 400 status code.

NotFoundError

Bases: BaseModel

API error for a 404 status code.

Attributes:

details instance-attribute

details: str

entity instance-attribute

InvalidParameterError

Bases: BaseModel

API error for a custom 422 status code.

Attributes:

details instance-attribute

details: str

parameter instance-attribute

parameter: BackendParameterType | None