DTOs
matchbox.common.dtos
Data transfer objects for the Matchbox API.
Classes:
- DataTypes – Enumeration of supported data types.
- OKMessage – Generic HTTP OK response.
- LoginAttempt – Request for the login process.
- LoginResult – Response from the login process.
- BackendCountableType – Enumeration of supported backend countable types.
- ModelResultsType – Enumeration of supported model results types.
- BackendResourceType – Enumeration of resource types referenced by client or API.
- BackendParameterType – Enumeration of parameter types passable to the API.
- BackendUploadType – Enumeration of supported backend upload types.
- CRUDOperation – Enumeration of CRUD operations.
- LocationType – Enumeration of location types.
- MatchboxName – Subclass of string which validates names for the Matchbox DB.
- ResolutionPath – Base resolution identifier with collection, run, and name.
- ResolutionType – Types of nodes in a resolution.
- LocationConfig – Metadata for a location.
- SourceField – A field in a source that can be indexed in the Matchbox database.
- SourceConfig – Configuration of a source that can be, or has been, indexed in the backend.
- QueryCombineType – Enumeration of ways to combine multiple rows having the same matchbox ID.
- QueryConfig – Configuration of a query that generates model inputs.
- ModelType – Enumeration of supported model types.
- ModelConfig – Configuration for a model that has been, or could be, added to the server.
- Match – A match between primary keys in the Matchbox database.
- Resolution – Unified resolution type with common fields and discriminated config.
- Run – A run within a collection.
- Collection – A collection of runs.
- ResourceOperationStatus – Status response for any resource operation.
- CountResult – Response model for count results.
- UploadStage – Enumeration of stages of a file upload and its processing.
- UploadStatus – Response model for any file upload processes.
- NotFoundError – API error for a 404 status code.
- InvalidParameterError – API error for a custom 422 status code.
Attributes:
- CollectionName (TypeAlias) – Type alias for collection names.
- RunID (TypeAlias) – Type alias for run IDs.
- SourceResolutionName (TypeAlias) – Type alias for source resolution names.
- ModelResolutionName (TypeAlias) – Type alias for model resolution names.
- ResolutionName (TypeAlias) – Type alias for any resolution names.
- SourceResolutionPath (TypeAlias) – Type alias for source resolution paths.
- ModelResolutionPath (TypeAlias) – Type alias for model resolution paths.
CollectionName
module-attribute
CollectionName: TypeAlias = MatchboxName
Type alias for collection names.
SourceResolutionName
module-attribute
SourceResolutionName: TypeAlias = MatchboxName
Type alias for source resolution names.
ModelResolutionName
module-attribute
ModelResolutionName: TypeAlias = MatchboxName
Type alias for model resolution names.
ResolutionName
module-attribute
ResolutionName: TypeAlias = SourceResolutionName | ModelResolutionName
Type alias for any resolution names.
SourceResolutionPath
module-attribute
SourceResolutionPath: TypeAlias = ResolutionPath
Type alias for source resolution paths.
ModelResolutionPath
module-attribute
ModelResolutionPath: TypeAlias = ResolutionPath
Type alias for model resolution paths.
DataTypes
Bases: StrEnum
Enumeration of supported data types.
Uses polars datatypes as its backend.
Methods:
- to_dtype – Convert enum value to actual polars dtype.
- to_pytype – Convert enum value to actual Python type.
- from_dtype – Get enum value from a polars dtype.
- from_pytype – Get enum value from a Python type.
Attributes:
OKMessage
LoginAttempt
LoginResult
BackendResourceType
Bases: StrEnum
Enumeration of resource types referenced by client or API.
Attributes:
- COLLECTION
- RUN
- RESOLUTION
- CLUSTER
- USER
- JUDGEMENT
BackendParameterType
CRUDOperation
LocationType
MatchboxName
ResolutionPath
Bases: BaseModel
Base resolution identifier with collection, run, and name.
Attributes:
- collection (CollectionName)
- run (RunID)
- name (ResolutionName)
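The three attributes together identify one resolution. The sketch below mirrors that shape with a standard-library dataclass rather than the real Pydantic model; the class name `ResolutionPathSketch` is hypothetical, and `RunID` is assumed to be an integer because this page does not show its definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResolutionPathSketch:
    """Stdlib stand-in for the documented three-part identifier."""

    collection: str  # CollectionName in the real model
    run: int  # RunID; an integer is an assumption
    name: str  # ResolutionName in the real model


path = ResolutionPathSketch(collection="companies", run=1, name="crm")
same = ResolutionPathSketch(collection="companies", run=1, name="crm")

# Frozen dataclasses compare by value and are hashable,
# so a path can be used as a dictionary key.
lookup = {path: "source resolution"}
found = lookup[same]
```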
ResolutionType
LocationConfig
SourceField
Bases: BaseModel
A field in a source that can be indexed in the Matchbox database.
Attributes:
SourceConfig
Bases: BaseModel
Configuration of a source that can be, or has been, indexed in the backend.
Sources are foundational processes on top of which linking and deduplication models can build new resolutions.
Methods:
- validate_key_field – Ensure that the key field is a string and not in the index fields.
- prefix – Get the prefix for the source.
- qualified_key – Get the qualified key for the source.
- qualified_index_fields – Get the qualified index fields for the source.
- qualify_field – Qualify field names with the source name.
- f – Qualify one or more field names with the source name.
Attributes:
- location_config (LocationConfig)
- extract_transform (str)
- key_field (SourceField)
- index_fields (tuple[SourceField, ...])
- dependencies (list[ResolutionPath]) – Return all resolution names that this source needs.
location_config
class-attribute
instance-attribute
location_config: LocationConfig = Field(description='The location of the source. Used to run the extract/transform logic.')
extract_transform
class-attribute
instance-attribute
extract_transform: str = Field(description='Logic to extract and transform data from the source. Language is location dependent.')
key_field
class-attribute
instance-attribute
key_field: SourceField = Field(description=dedent("\n The key field. This is the source's key for unique\n entities, such as a primary key in a relational database.\n\n Keys must ALWAYS be a string.\n\n For example, if the source describes companies, it may have used\n a Companies House number as its key.\n\n This key is ALWAYS correct. It should be something generated and\n owned by the source being indexed.\n \n For example, your organisation's CRM ID is a key field within the CRM.\n \n A CRM ID entered by hand in another dataset shouldn't be used \n as a key field.\n "))
index_fields
class-attribute
instance-attribute
index_fields: tuple[SourceField, ...] = Field(default=None, description=dedent('\n The fields to index in this source, after the extract/transform logic \n has been applied. \n\n This is usually set manually, and should map onto the columns that the\n extract/transform logic returns.\n '))
dependencies
property
dependencies: list[ResolutionPath]
Return all resolution names that this source needs.
Provided for symmetry with ModelConfig.
validate_key_field
validate_key_field() -> Self
Ensure that the key field is a string and not in the index fields.
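The two rules this validator enforces can be sketched without the library. Everything named below is hypothetical: `FieldSketch` stands in for `SourceField`, whose attributes are not listed on this page, and the `"String"` type label is an assumed value. The real validator may compare fields differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSketch:
    """Hypothetical stand-in for SourceField."""

    name: str
    type: str  # e.g. "String" or "Int64"; these labels are assumed


def check_key_field(key: FieldSketch, index: tuple[FieldSketch, ...]) -> None:
    """Raise ValueError if the key breaks either documented rule."""
    if key.type != "String":
        raise ValueError("Key field must be a string.")
    if key in index:
        raise ValueError("Key field must not be in the index fields.")


key = FieldSketch(name="company_number", type="String")
index = (FieldSketch(name="company_name", type="String"),)

# A string key that is not indexed passes silently.
check_key_field(key, index)

# Listing the key among the index fields is rejected.
try:
    check_key_field(key, index + (key,))
    rejected = False
except ValueError:
    rejected = True
```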
prefix
qualified_key
qualified_index_fields
qualify_field
f
QueryCombineType
Bases: StrEnum
Enumeration of ways to combine multiple rows having the same matchbox ID.
Attributes:
QueryConfig
Bases: BaseModel
Configuration of a query that generates model inputs.
Methods:
- validate_resolutions – Ensure that resolution settings are compatible.
- validate_cleaning_dict – Validate that cleaning is valid SQL.
Attributes:
- source_resolutions (tuple[SourceResolutionPath, ...])
- model_resolution (ModelResolutionPath | None)
- combine_type (QueryCombineType)
- threshold (int | None)
- cleaning (dict[str, str] | None)
- dependencies (list[ResolutionPath]) – Return all resolution names that this query needs.
- point_of_truth – Return path of resolution that will be used as point of truth.
model_resolution
class-attribute
instance-attribute
model_resolution: ModelResolutionPath | None = None
dependencies
property
dependencies: list[ResolutionPath]
Return all resolution names that this query needs.
point_of_truth
property
Return path of resolution that will be used as point of truth.
ModelType
ModelConfig
Bases: BaseModel
Configuration for a model that has been, or could be, added to the server.
Methods:
- validate_right_query – Ensure that a right query is set if and only if model is linker.
- validate_settings_json – Ensure that the model settings are valid JSON.
Attributes:
- type (ModelType)
- model_class (str)
- model_settings (str)
- left_query (QueryConfig)
- right_query (QueryConfig | None)
- dependencies (list[ResolutionPath]) – Return all resolution names that this model needs.
dependencies
property
dependencies: list[ResolutionPath]
Return all resolution names that this model needs.
Match
Bases: BaseModel
A match between primary keys in the Matchbox database.
Methods:
- found_or_none – Ensure that a match has sources and a cluster if target was found.
- serialise_ids – Turn set to sorted list when serialising.
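The reason for a serialiser like `serialise_ids` is that Python sets are unordered and cannot be encoded as JSON directly. A minimal sketch of the same idea with the standard library, where the `"ids"` key is a hypothetical field name because the attributes of `Match` are not listed on this page:

```python
import json

ids = {42, 7, 19}

# json.dumps cannot encode a set, and set ordering is not guaranteed,
# so convert to a sorted list first for a stable payload.
payload = json.dumps({"ids": sorted(ids)})
```

Sorting makes the serialised form deterministic, which keeps responses comparable across calls.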
Attributes:
Resolution
Bases: BaseModel
Unified resolution type with common fields and discriminated config.
Methods:
- validate_description – Ensure the description is not empty if provided.
- validate_resolution_type_matches_config – Ensure resolution_type matches the config type.
- validate_truth_matches_type – Ensure truth field matches resolution type requirements.
Attributes:
- description (str | None)
- truth (int | None)
- resolution_type (ResolutionType)
- config (SourceConfig | ModelConfig)
description
class-attribute
instance-attribute
description: str | None = Field(default=None, description='Description')
truth
class-attribute
instance-attribute
truth: int | None = Field(default=None, ge=0, le=100, strict=True)
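The `Field` arguments on `truth` combine an optional default with inclusive bounds and strict typing. The sketch below approximates those constraints in plain Python so they can be read one at a time; `check_truth` and `accepts` are hypothetical helpers. Pydantic itself reports failures as validation errors, so the exception types here are illustrative only, and edge cases such as booleans are not modelled.

```python
def check_truth(value: object) -> None:
    """Approximate the documented constraints without Pydantic."""
    if value is None:  # default=None: the field is optional
        return
    if not isinstance(value, int):  # strict=True: no coercion from "50" or 50.0
        raise TypeError("truth must be an int")
    if not 0 <= value <= 100:  # ge=0, le=100: both bounds are inclusive
        raise ValueError("truth must be between 0 and 100")


def accepts(value: object) -> bool:
    try:
        check_truth(value)
    except (TypeError, ValueError):
        return False
    return True


candidates = (None, 0, 100, 101, -1, "50", 50.0)
results = [accepts(value) for value in candidates]
```

This covers only the field-level constraints; the additional rule applied by `validate_truth_matches_type` is not reproduced here.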
validate_description
classmethod
Ensure the description is not empty if provided.
validate_resolution_type_matches_config
Ensure resolution_type matches the config type.
validate_truth_matches_type
Ensure truth field matches resolution type requirements.
Run
Bases: BaseModel
A run within a collection.
Attributes:
- run_id (RunID | None)
- is_default (bool)
- is_mutable (bool)
- resolutions (dict[ResolutionName, Resolution])
run_id
class-attribute
instance-attribute
run_id: RunID | None = Field(description='Unique ID of the run')
is_default
class-attribute
instance-attribute
is_default: bool = Field(default=False, description='Whether this run is the default in its collection')
is_mutable
class-attribute
instance-attribute
is_mutable: bool = Field(default=False, description='Whether this run can be modified')
resolutions
class-attribute
instance-attribute
resolutions: dict[ResolutionName, Resolution] = Field(default_factory=dict, description='Dict of resolution objects by name within this run')
Collection
Bases: BaseModel
A collection of runs.
Methods:
- validate_default_run – Check default run is within all runs.
Attributes:
- default_run (RunID | None)
- runs (list[RunID])
ResourceOperationStatus
Bases: BaseModel
Status response for any resource operation.
Methods:
- status_409_examples – Examples for 409 status code.
- status_500_examples – Examples for 500 status code.
Attributes:
- success (bool)
- name (ResolutionPath | CollectionName | RunID)
- operation (CRUDOperation)
- details (str | None)
CountResult
Bases: BaseModel
Response model for count results.
Attributes:
- entities (dict[BackendCountableType, int])
UploadStage
Bases: StrEnum
Enumeration of stages of a file upload and its processing.
Attributes:
- READY
- AWAITING_UPLOAD
- QUEUED
- PROCESSING
- COMPLETE
- FAILED
- UNKNOWN
UploadStatus
Bases: BaseModel
Response model for any file upload processes.
Methods:
- get_http_code – Get the HTTP status code for the upload stage.
- status_400_examples – Examples for 400 status code.
Attributes:
- id (str)
- stage (UploadStage)
- update_timestamp (datetime)
- details (str | None)
- entity (BackendUploadType | None)
NotFoundError
Bases: BaseModel
API error for a 404 status code.
Attributes:
- details (str)
- entity (BackendResourceType)
InvalidParameterError
Bases: BaseModel
API error for a custom 422 status code.
Attributes:
- details (str)
- parameter (BackendParameterType | None)