Overview
matchbox.server
Matchbox server.
Includes the API and database adapters for various backends.
Modules:
- api – Matchbox API.
- base – Base classes and utilities for Matchbox database adapters.
- postgresql – PostgreSQL adapter for the Matchbox server.
- uploads – Worker logic to process user uploads.
Classes:
- MatchboxDBAdapter – An abstract base class for Matchbox database adapters.
- MatchboxServerSettings – Settings for the Matchbox application.
MatchboxDBAdapter
Bases: ABC
An abstract base class for Matchbox database adapters.
Methods:
- query – Queries the database from an optional point of truth.
- match – Matches an ID in a source resolution and returns the keys in the targets.
- create_collection – Create a new collection.
- get_collection – Get collection metadata.
- list_collections – List all collection names.
- delete_collection – Delete a collection and all its versions.
- create_run – Create a new run.
- set_run_mutable – Set the mutability of a run.
- set_run_default – Set the default status of a run.
- get_run – Get run metadata and resolutions.
- delete_run – Delete a run and all its resolutions.
- create_resolution – Writes a resolution to Matchbox.
- get_resolution – Get a resolution from its path.
- delete_resolution – Delete a resolution from the database.
- insert_source_data – Inserts hash data for a source resolution.
- insert_model_data – Inserts results data for a model resolution.
- get_model_data – Get the results for a model resolution.
- set_model_truth – Sets the truth threshold for this model, changing the default clusters.
- get_model_truth – Gets the current truth threshold for this model.
- validate_ids – Validates that a list of IDs exists in the database.
- dump – Dumps the entire database to a snapshot.
- drop – Hard clear the database by dropping all tables and re-creating them.
- clear – Soft clear the database by deleting all rows but retaining tables.
- restore – Restores the database from a snapshot.
- login – Receives a user name and returns the user ID.
- insert_judgement – Adds an evaluation judgement to the database.
- get_judgements – Retrieves all evaluation judgements.
- compare_models – Compare metrics of models based on evaluation data.
- sample_for_eval – Sample a cluster to validate.
Attributes:
- settings (MatchboxServerSettings)
- sources (ListableAndCountable)
- models (Countable)
- data (Countable)
- clusters (Countable)
- creates (Countable)
- merges (Countable)
- proposes (Countable)
- source_resolutions (Countable)
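Because MatchboxDBAdapter is an ABC, a concrete adapter must implement every abstract method before it can be instantiated. The sketch below shows that mechanism with a hypothetical two-method stand-in (ToyAdapter, IncompleteAdapter and InMemoryAdapter are illustrative names, not part of Matchbox), using only the standard abc module.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class ToyAdapter(ABC):
    """Hypothetical two-method stand-in for MatchboxDBAdapter."""

    @abstractmethod
    def list_collections(self) -> list[str]:
        """List all collection names."""

    @abstractmethod
    def create_collection(self, name: str) -> str:
        """Create a new collection."""


class IncompleteAdapter(ToyAdapter):
    # Implements only one of the two abstract methods.
    def list_collections(self) -> list[str]:
        return []


class InMemoryAdapter(ToyAdapter):
    # Implements every abstract method, so it can be instantiated.
    def __init__(self) -> None:
        self._collections: list[str] = []

    def list_collections(self) -> list[str]:
        return list(self._collections)

    def create_collection(self, name: str) -> str:
        self._collections.append(name)
        return name


try:
    IncompleteAdapter()
except TypeError as exc:
    # Python refuses to instantiate a class with unimplemented abstract methods.
    print("rejected:", exc)

adapter = InMemoryAdapter()
adapter.create_collection("companies")
print(adapter.list_collections())  # ['companies']
```

A real backend such as the PostgreSQL adapter would have to supply every method listed above in the same way.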
query (abstractmethod)
query(source: SourceResolutionPath, point_of_truth: ResolutionPath | None = None, threshold: int | None = None, return_leaf_id: bool = False, limit: int | None = None) -> Table
Queries the database from an optional point of truth.
Parameters:
- source (SourceResolutionPath) – The resolution path identifying the source to query.
- point_of_truth (optional, default: None) – The resolution path to use for filtering results. If not specified, the source resolution for the queried source is used.
- threshold (optional, default: None) – The threshold to use for creating clusters. If None, the models’ default thresholds are used. If an integer, that threshold is used for the specified model, and the model’s cached thresholds are used for its ancestors.
- return_leaf_id (optional, default: False) – Whether to return the cluster ID of leaves.
- limit (optional, default: None) – The value to use in a LIMIT clause. Useful for testing.
Returns:
- Table – The resulting Matchbox IDs in Arrow format.
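The threshold rule above (override only the point of truth, keep cached thresholds for its ancestors) can be sketched as a small lookup. This is a hypothetical helper over plain dictionaries; the resolution names and the cached mapping are illustrative, not how Matchbox stores thresholds.

```python
from __future__ import annotations


def effective_thresholds(
    point_of_truth: str,
    ancestors: list[str],
    cached: dict[str, int],
    threshold: int | None = None,
) -> dict[str, int]:
    """Work out which threshold applies to each resolution in a lineage.

    Hypothetical helper: `cached` maps resolution names to their stored
    default thresholds; Matchbox's real lookup is not shown here.
    """
    # Ancestors always keep their cached thresholds.
    thresholds = {name: cached[name] for name in ancestors}
    # The point of truth uses the override if one was given.
    thresholds[point_of_truth] = cached[point_of_truth] if threshold is None else threshold
    return thresholds


cached = {"dedupe_crm": 90, "dedupe_erp": 85, "linker": 80}
print(effective_thresholds("linker", ["dedupe_crm", "dedupe_erp"], cached))
# {'dedupe_crm': 90, 'dedupe_erp': 85, 'linker': 80}
print(effective_thresholds("linker", ["dedupe_crm", "dedupe_erp"], cached, threshold=95))
# {'dedupe_crm': 90, 'dedupe_erp': 85, 'linker': 95}
```

The check is "is None" rather than a truthiness test, so an explicit threshold of 0 is still honoured.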
match (abstractmethod)
match(key: str, source: SourceResolutionPath, targets: list[SourceResolutionPath], point_of_truth: ResolutionPath, threshold: int | None = None) -> list[Match]
Matches an ID in a source resolution and returns the keys in the targets.
Parameters:
- key (str) – The key to match from the source.
- source (SourceResolutionPath) – The path of the source resolution.
- targets (list[SourceResolutionPath]) – The paths of the target source resolutions.
- point_of_truth (ResolutionPath) – The path of the resolution to use for matching.
- threshold (optional, default: None) – The threshold to use for creating clusters. If None, the resolutions’ default thresholds are used. If an integer, that threshold is used for the specified resolution in place of its cached threshold, and the resolution’s cached thresholds are used for its ancestors.
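Conceptually, matching looks up the cluster that the point of truth assigns to the source key and collects the keys in each target that fall in the same cluster. The sketch below assumes a hypothetical flat mapping from (source name, key) to cluster ID and returns a plain dictionary rather than Matchbox's Match objects.

```python
from __future__ import annotations


def match_keys(
    key: str,
    source: str,
    targets: list[str],
    clusters: dict[tuple[str, str], int],
) -> dict[str, set[str]]:
    """Return, for each target, the keys that share `key`'s cluster.

    `clusters` is a hypothetical lookup from (source name, key) to the
    cluster ID chosen by the point-of-truth resolution at some threshold.
    """
    result: dict[str, set[str]] = {target: set() for target in targets}
    cluster = clusters.get((source, key))
    if cluster is None:
        # The key is unknown to the point of truth, so nothing matches.
        return result
    for (other_source, other_key), other_cluster in clusters.items():
        if other_source in result and other_cluster == cluster:
            result[other_source].add(other_key)
    return result


clusters = {
    ("crm", "c-1"): 101,
    ("crm", "c-2"): 102,
    ("erp", "e-9"): 101,
    ("erp", "e-7"): 101,
    ("billing", "b-3"): 102,
}
result = match_keys("c-1", "crm", ["erp", "billing"], clusters)
# result["erp"] == {"e-9", "e-7"}; result["billing"] == set()
```

Every requested target appears in the result, even when it has no matching keys, which makes "no match" explicit for the caller.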
create_collection (abstractmethod)
create_collection(name: CollectionName) -> Collection
Create a new collection.
Parameters:
- name (CollectionName) – The name of the collection to create.
Returns:
- Collection – A Collection object containing its metadata, versions, and resolutions.
get_collection (abstractmethod)
get_collection(name: CollectionName) -> Collection
Get collection metadata.
Parameters:
- name (CollectionName) – The name of the collection to get.
Returns:
- Collection – A Collection object containing its metadata, versions, and resolutions.
list_collections (abstractmethod)
list_collections() -> list[CollectionName]
List all collection names.
delete_collection (abstractmethod)
delete_collection(name: CollectionName, certain: bool) -> None
Delete a collection and all its versions.
Parameters:
- name (CollectionName) – The name of the collection to delete.
- certain (bool) – Whether to delete the collection without confirmation.
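The certain flag acts as a guard on destructive operations. The sketch below is a hypothetical in-memory store that refuses to delete unless certain=True; raising PermissionError is this sketch's own choice, and the error the real adapters raise is not documented here.

```python
from __future__ import annotations


class ToyCollections:
    """Hypothetical in-memory store showing a `certain` deletion guard."""

    def __init__(self) -> None:
        # Collection name -> list of version IDs.
        self._collections: dict[str, list[int]] = {}

    def create_collection(self, name: str) -> str:
        if name in self._collections:
            raise ValueError(f"Collection {name!r} already exists")
        self._collections[name] = []
        return name

    def list_collections(self) -> list[str]:
        return sorted(self._collections)

    def delete_collection(self, name: str, certain: bool) -> None:
        if not certain:
            # Refuse destructive work unless the caller explicitly opts in.
            raise PermissionError(
                f"Deleting {name!r} removes all its versions; pass certain=True"
            )
        del self._collections[name]


store = ToyCollections()
store.create_collection("companies")
try:
    store.delete_collection("companies", certain=False)
except PermissionError as exc:
    print(exc)
store.delete_collection("companies", certain=True)
print(store.list_collections())  # []
```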
create_run (abstractmethod)
create_run(collection: CollectionName) -> Run
Create a new run.
Parameters:
- collection (CollectionName) – The name of the collection to create the run in.
Returns:
- Run – A Run object containing its metadata and resolutions.
set_run_mutable (abstractmethod)
set_run_mutable(collection: CollectionName, run_id: RunID, mutable: bool) -> Run
Set the mutability of a run.
set_run_default (abstractmethod)
set_run_default(collection: CollectionName, run_id: RunID, default: bool) -> Run
Set the default status of a run.
get_run (abstractmethod)
get_run(collection: CollectionName, run_id: RunID) -> Run
Get run metadata and resolutions.
Parameters:
- collection (CollectionName) – The name of the collection containing the run.
- run_id (RunID) – The ID of the run to get.
Returns:
- Run – A Run object containing its metadata and resolutions.
delete_run (abstractmethod)
delete_run(collection: CollectionName, run_id: RunID, certain: bool) -> None
Delete a run and all its resolutions.
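The run methods above can be pictured as a small registry per collection. This sketch uses hypothetical ToyRun and ToyRuns classes; the rule that a collection has at most one default run is an assumption made for illustration, not something this page states.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ToyRun:
    """Hypothetical stand-in for a Run's metadata."""

    run_id: int
    is_mutable: bool = True
    is_default: bool = False


class ToyRuns:
    """Hypothetical run registry for a single collection."""

    def __init__(self) -> None:
        self._runs: dict[int, ToyRun] = {}
        self._next_id = 1

    def create_run(self) -> ToyRun:
        run = ToyRun(run_id=self._next_id)
        self._runs[run.run_id] = run
        self._next_id += 1
        return run

    def set_run_mutable(self, run_id: int, mutable: bool) -> ToyRun:
        run = self._runs[run_id]
        run.is_mutable = mutable
        return run

    def set_run_default(self, run_id: int, default: bool) -> ToyRun:
        # Look the run up first so an unknown ID fails without side effects.
        run = self._runs[run_id]
        if default:
            # Assumption: at most one default run per collection.
            for other in self._runs.values():
                other.is_default = False
        run.is_default = default
        return run

    def delete_run(self, run_id: int, certain: bool) -> None:
        if not certain:
            raise PermissionError("Deleting a run removes its resolutions; pass certain=True")
        del self._runs[run_id]


runs = ToyRuns()
first, second = runs.create_run(), runs.create_run()
runs.set_run_default(first.run_id, True)
runs.set_run_default(second.run_id, True)
print(first.is_default, second.is_default)  # False True
```

IDs come from a counter rather than the number of stored runs, so deleting a run never causes a later run to reuse its ID.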
create_resolution (abstractmethod)
create_resolution(resolution: Resolution, path: ResolutionPath) -> None
Writes a resolution to Matchbox.
Parameters:
- resolution (Resolution) – Resolution object with a source or model config.
- path (ResolutionPath) – The path at which to write the resolution.
Raises:
- MatchboxModelConfigError – If the configuration is invalid, for example if the ModelConfig’s resolutions share ancestors.
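The "resolutions sharing ancestors" condition can be checked by walking each input's lineage and intersecting the results. The sketch below is hypothetical: ToyModelConfigError stands in for MatchboxModelConfigError, the parent mapping is illustrative, and it counts each input as part of its own lineage, which is an assumption about how the real check behaves.

```python
from __future__ import annotations


class ToyModelConfigError(Exception):
    """Hypothetical stand-in for MatchboxModelConfigError."""


def ancestors(name: str, parents: dict[str, list[str]]) -> set[str]:
    """Return `name` plus every resolution upstream of it."""
    seen: set[str] = set()
    stack = [name]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def check_no_shared_ancestors(
    left: str, right: str, parents: dict[str, list[str]]
) -> None:
    # A linking model's two inputs must come from disjoint lineages.
    shared = ancestors(left, parents) & ancestors(right, parents)
    if shared:
        raise ToyModelConfigError(f"Inputs share ancestors: {sorted(shared)}")


parents = {
    "dedupe_crm": ["crm"],
    "dedupe_erp": ["erp"],
    "link_1": ["dedupe_crm", "dedupe_erp"],
}
check_no_shared_ancestors("dedupe_crm", "dedupe_erp", parents)  # passes
try:
    check_no_shared_ancestors("link_1", "dedupe_erp", parents)
except ToyModelConfigError as exc:
    print(exc)  # Inputs share ancestors: ['dedupe_erp', 'erp']
```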
get_resolution (abstractmethod)
get_resolution(path: ResolutionPath, validate: ResolutionType | None = None) -> Resolution
Get a resolution from its path.
Parameters:
- path (ResolutionPath) – The path of the resolution to get.
- validate (ResolutionType | None, default: None) – The expected type of the resolution.
Returns:
- Resolution – A Resolution object.
delete_resolution (abstractmethod)
delete_resolution(path: ResolutionPath, certain: bool) -> None
Delete a resolution from the database.
Parameters:
- path (ResolutionPath) – The path of the resolution to delete.
- certain (bool) – Whether to delete the resolution without confirmation.
insert_source_data (abstractmethod)
insert_source_data(path: SourceResolutionPath, data_hashes: Table) -> None
Inserts hash data for a source resolution.
Parameters:
- path (SourceResolutionPath) – The path of the source resolution to index.
- data_hashes (Table) – The Arrow table with the hash of each data row.
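To illustrate what "the hash of each data row" means, the sketch below hashes only the data fields of each row, so rows with identical data get identical hashes even when their keys differ. SHA-256 over sorted-key JSON is an assumption chosen for the example, not necessarily Matchbox's scheme, and it returns plain bytes rather than an Arrow table.

```python
from __future__ import annotations

import hashlib
import json


def hash_rows(rows: list[dict[str, object]], fields: list[str]) -> list[bytes]:
    """Hash the chosen data fields of each row into a 32-byte SHA-256 digest.

    Illustrative only: SHA-256 over sorted-key JSON is an assumption, not
    necessarily the scheme Matchbox itself uses.
    """
    digests = []
    for row in rows:
        # Serialise only the data fields, in a stable key order.
        payload = json.dumps({f: row[f] for f in fields}, sort_keys=True)
        digests.append(hashlib.sha256(payload.encode("utf-8")).digest())
    return digests


rows = [
    {"key": "1", "name": "Acme Ltd", "postcode": "AB1 2CD"},
    {"key": "2", "name": "Acme Ltd", "postcode": "AB1 2CD"},
    {"key": "3", "name": "Widget plc", "postcode": "EF3 4GH"},
]
digests = hash_rows(rows, ["name", "postcode"])
print(digests[0] == digests[1], digests[0] == digests[2], len(digests[0]))  # True False 32
```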
insert_model_data (abstractmethod)
insert_model_data(path: ModelResolutionPath, results: Table) -> None
Inserts results data for a model resolution.
get_model_data (abstractmethod)
get_model_data(path: ModelResolutionPath) -> Table
Get the results for a model resolution.
set_model_truth (abstractmethod)
set_model_truth(path: ModelResolutionPath, truth: int) -> None
Sets the truth threshold for this model, changing the default clusters.
get_model_truth (abstractmethod)
get_model_truth(path: ModelResolutionPath) -> int
Gets the current truth threshold for this model.
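Why does changing the truth threshold change the default clusters? One simple way to see it: keep only the model's pairwise results that score at or above the threshold, then take connected components. The sketch assumes results are (left, right, probability) triples with integer probabilities such as 0 to 100. That connected-components reading is an illustration, not Matchbox's actual clustering code.

```python
from __future__ import annotations


def clusters_at(edges: list[tuple[str, str, int]], truth: int) -> list[set[str]]:
    """Group records into clusters using only edges scoring >= truth.

    Hypothetical sketch: edges are (left, right, probability) with the
    probability stored as an integer, e.g. 0-100.
    """
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for left, right, prob in edges:
        # Register both records so ones with weak edges become singletons.
        find(left)
        find(right)
        if prob >= truth:
            parent[find(left)] = find(right)

    groups: dict[str, set[str]] = {}
    for node in parent:
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())


edges = [("a", "b", 95), ("b", "c", 70), ("d", "e", 88)]
print(sorted(sorted(g) for g in clusters_at(edges, 90)))  # [['a', 'b'], ['c'], ['d'], ['e']]
print(sorted(sorted(g) for g in clusters_at(edges, 60)))  # [['a', 'b', 'c'], ['d', 'e']]
```

Lowering the threshold admits more edges, so clusters can only merge, never split.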
validate_ids (abstractmethod)
Validates that a list of IDs exists in the database.
dump (abstractmethod)
dump() -> MatchboxSnapshot
Dumps the entire database to a snapshot.
Returns:
- MatchboxSnapshot – A MatchboxSnapshot object of type “postgres” with the database’s current state.
drop (abstractmethod)
Hard clear the database by dropping all tables and re-creating them.
clear (abstractmethod)
Soft clear the database by deleting all rows but retaining tables.
restore (abstractmethod)
restore(snapshot: MatchboxSnapshot) -> None
Restores the database from a snapshot.
Parameters:
- snapshot (MatchboxSnapshot) – A MatchboxSnapshot object of type “postgres” with the database’s state.
Raises:
- TypeError – If the snapshot is not compatible with PostgreSQL.
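The dump/restore pair can be sketched as copying state into a tagged snapshot and refusing snapshots whose backend type does not match. ToySnapshot and ToyPostgresStore are hypothetical stand-ins for MatchboxSnapshot and a PostgreSQL adapter, and the "tables" dictionary is illustrative.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass
from typing import Any


@dataclass
class ToySnapshot:
    """Hypothetical stand-in for MatchboxSnapshot."""

    backend_type: str
    data: Any


class ToyPostgresStore:
    def __init__(self) -> None:
        self.tables: dict[str, list[dict[str, Any]]] = {"clusters": []}

    def dump(self) -> ToySnapshot:
        # Deep-copy so later writes do not mutate the snapshot.
        return ToySnapshot(backend_type="postgres", data=copy.deepcopy(self.tables))

    def restore(self, snapshot: ToySnapshot) -> None:
        # Mirror the documented behaviour: incompatible snapshots raise TypeError.
        if snapshot.backend_type != "postgres":
            raise TypeError(
                f"Cannot restore a {snapshot.backend_type!r} snapshot into PostgreSQL"
            )
        self.tables = copy.deepcopy(snapshot.data)


store = ToyPostgresStore()
store.tables["clusters"].append({"id": 1})
snap = store.dump()
store.tables["clusters"].clear()
store.restore(snap)
print(store.tables)  # {'clusters': [{'id': 1}]}
```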
insert_judgement (abstractmethod)
Adds an evaluation judgement to the database.
get_judgements (abstractmethod)
get_judgements() -> tuple[Table, Table]
Retrieves all evaluation judgements.
Returns:
- tuple[Table, Table] – Two PyArrow tables with the judgements and their expansion. See matchbox.common.arrow for information on the schema.
compare_models (abstractmethod)
compare_models(paths: list[ModelResolutionPath]) -> ModelComparison
Compare metrics of models based on evaluation data.
Parameters:
- paths (list[ModelResolutionPath]) – List of paths of model resolutions to be compared.
Returns:
- ModelComparison – A model comparison object, listing metrics for each model.
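This page does not say which metrics compare_models reports. As one plausible illustration, the sketch below scores each model's clusters against judged clusters using pairwise precision and recall. The metric choice, the function names and the data shapes are all assumptions.

```python
from __future__ import annotations

from itertools import combinations


def pairs(clusters: list[set[str]]) -> set[frozenset[str]]:
    """All unordered record pairs that share a cluster."""
    return {frozenset(p) for cluster in clusters for p in combinations(sorted(cluster), 2)}


def precision_recall(
    predicted: list[set[str]], judged: list[set[str]]
) -> tuple[float, float]:
    """Pairwise precision and recall of a model's clusters against judgements."""
    pred, truth = pairs(predicted), pairs(judged)
    true_pos = len(pred & truth)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(truth) if truth else 0.0
    return precision, recall


def compare(
    models: dict[str, list[set[str]]], judged: list[set[str]]
) -> dict[str, tuple[float, float]]:
    # One (precision, recall) pair per model name.
    return {name: precision_recall(c, judged) for name, c in models.items()}


judged = [{"a", "b", "c"}]
models = {"strict": [{"a", "b"}, {"c"}], "loose": [{"a", "b", "c", "d"}]}
scores = compare(models, judged)
# strict: precision 1.0, recall 1/3; loose: precision 0.5, recall 1.0
```

The example shows the usual trade-off: the stricter model proposes fewer, safer pairs, while the looser one finds every judged pair at the cost of false positives.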
sample_for_eval (abstractmethod)
Sample a cluster to validate.
MatchboxServerSettings
Bases: BaseSettings
Settings for the Matchbox application.
Methods:
- check_settings – Check that legal combinations of settings are provided.
Attributes:
- batch_size (int)
- backend_type (MatchboxBackends)
- datastore (MatchboxDatastoreSettings)
- task_runner (Literal['api', 'celery'])
- redis_uri (str | None)
- uploads_expiry_minutes (int | None)
- authorisation (bool)
- public_key (SecretStr | None)
- log_level (LogLevelType)
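check_settings validates combinations of fields rather than single fields. The actual rules are not listed on this page, so the two rules below (celery needs redis_uri, and authorisation needs public_key) are hypothetical examples built from the documented attribute names. The sketch uses a standard-library dataclass instead of pydantic.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Literal


@dataclass
class ToyServerSettings:
    """Hypothetical subset of MatchboxServerSettings fields."""

    task_runner: Literal["api", "celery"] = "api"
    redis_uri: str | None = None
    authorisation: bool = False
    public_key: str | None = None

    def __post_init__(self) -> None:
        # Illustrative cross-field rules, not the documented behaviour.
        if self.task_runner == "celery" and self.redis_uri is None:
            raise ValueError("task_runner='celery' requires redis_uri")
        if self.authorisation and self.public_key is None:
            raise ValueError("authorisation=True requires public_key")


ToyServerSettings(task_runner="celery", redis_uri="redis://localhost:6379/0")  # ok
try:
    ToyServerSettings(authorisation=True)
except ValueError as exc:
    print(exc)  # authorisation=True requires public_key
```

In a pydantic BaseSettings class, checks like these would typically live in a validator such as model_validator(mode="after").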
matchbox.server.base
Base classes and utilities for Matchbox database adapters.
Classes:
- MatchboxBackends – The available backends for Matchbox.
- MatchboxSnapshot – A snapshot of the Matchbox database.
- MatchboxDatastoreSettings – Settings specific to the datastore configuration.
- MatchboxServerSettings – Settings for the Matchbox application.
- BackendManager – Manages the Matchbox backend instance and settings.
- Countable – A protocol for objects that can be counted.
- Listable – A protocol for objects that can be listed.
- ListableAndCountable – A protocol for objects that can be counted and listed.
- MatchboxDBAdapter – An abstract base class for Matchbox database adapters.
Functions:
- get_backend_settings – Get the appropriate settings class based on the backend type.
- get_backend_class – Get the appropriate backend class based on the backend type.
- settings_to_backend – Create a backend adapter with injected settings.
- initialise_matchbox – Initialise the Matchbox backend based on environment variables.
MatchboxBackends
The available backends for Matchbox.
MatchboxSnapshot
Bases: BaseModel
A snapshot of the Matchbox database.
Methods:
- check_serialisable – Validate that the value can be serialised to JSON.
Attributes:
- backend_type (MatchboxBackends)
- data (Any)
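The idea behind check_serialisable can be shown with the standard json module: attempt a dump and turn failures into a validation error. This is a standalone sketch, not the pydantic validator Matchbox actually attaches to MatchboxSnapshot.

```python
import json
from typing import Any


def check_serialisable(value: Any) -> Any:
    """Return `value` unchanged if it can be dumped to JSON, else raise ValueError."""
    try:
        json.dumps(value)
    except (TypeError, ValueError) as exc:
        # TypeError: unsupported type (e.g. set); ValueError: circular reference.
        raise ValueError(f"Snapshot data is not JSON-serialisable: {exc}") from exc
    return value


check_serialisable({"clusters": [1, 2, 3]})  # returned unchanged
try:
    check_serialisable({"ids": {1, 2}})  # sets are not JSON types
except ValueError:
    print("rejected")
```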
MatchboxDatastoreSettings
Bases: BaseSettings
Settings specific to the datastore configuration.
Methods:
- get_client – Returns an S3 client for the datastore.
Attributes:
- host (str | None)
- port (int | None)
- access_key_id (SecretStr | None)
- secret_access_key (SecretStr | None)
- default_region (str | None)
- cache_bucket_name (str)
get_client
Returns an S3 client for the datastore.
Creates S3 buckets if they don’t exist.
MatchboxServerSettings
Identical to MatchboxServerSettings under matchbox.server above.
BackendManager
Manages the Matchbox backend instance and settings.
Methods:
- initialise – Initialise the backend with the given settings.
- get_backend – Get the backend instance.
- get_settings – Get the backend settings.
initialise (classmethod)
initialise(settings: MatchboxServerSettings)
Initialise the backend with the given settings.
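A manager like this typically holds one backend and its settings at class level, so every caller in the process shares them. The sketch below is hypothetical: build_backend stands in for settings_to_backend and returns a label instead of an adapter, and the RuntimeError on early access is this sketch's own choice.

```python
from __future__ import annotations

from typing import Any


def build_backend(settings: dict[str, str]) -> str:
    # Hypothetical stand-in for settings_to_backend(): returns a label, not an adapter.
    return f"{settings['backend_type']}-adapter"


class ToyBackendManager:
    """Hypothetical process-wide holder for one backend and its settings."""

    _settings: Any = None
    _backend: Any = None

    @classmethod
    def initialise(cls, settings: dict[str, str]) -> None:
        # Store on the class, not an instance, so all callers share one backend.
        cls._settings = settings
        cls._backend = build_backend(settings)

    @classmethod
    def get_backend(cls) -> Any:
        if cls._backend is None:
            raise RuntimeError("Backend not initialised; call initialise() first")
        return cls._backend

    @classmethod
    def get_settings(cls) -> Any:
        if cls._settings is None:
            raise RuntimeError("Settings not initialised; call initialise() first")
        return cls._settings


ToyBackendManager.initialise({"backend_type": "postgres"})
print(ToyBackendManager.get_backend())  # postgres-adapter
```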
Countable
A protocol for objects that can be counted.
Listable
A protocol for objects that can be listed.
ListableAndCountable
A protocol for objects that can be counted and listed.
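Protocols describe capabilities structurally: any object with the right methods qualifies, with no inheritance required. The sketch uses typing.Protocol with hypothetical method names (count and list_all) and hypothetical class names, because this page does not list the real protocols' members.

```python
from __future__ import annotations

from typing import Protocol, runtime_checkable


@runtime_checkable
class CountableLike(Protocol):
    """Hypothetical: anything with a count() method."""

    def count(self) -> int: ...


@runtime_checkable
class ListableLike(Protocol):
    """Hypothetical: anything with a list_all() method."""

    def list_all(self) -> list[str]: ...


@runtime_checkable
class ListableAndCountableLike(CountableLike, ListableLike, Protocol):
    """Combines both protocols; a class satisfies it structurally."""


class SourceNames:
    # No inheritance needed: matching method names is enough.
    def __init__(self, names: list[str]) -> None:
        self._names = list(names)

    def count(self) -> int:
        return len(self._names)

    def list_all(self) -> list[str]:
        return list(self._names)


class RowCounter:
    def count(self) -> int:
        return 42


sources = SourceNames(["crm", "erp"])
print(isinstance(sources, ListableAndCountableLike), sources.count())  # True 2
print(isinstance(RowCounter(), ListableAndCountableLike))  # False
```

runtime_checkable only checks that the methods exist, not their signatures, so static type checkers remain the stricter guard.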
MatchboxDBAdapter
Identical to MatchboxDBAdapter under matchbox.server above.
get_backend_settings
get_backend_settings(backend_type: MatchboxBackends) -> type[MatchboxServerSettings]
Get the appropriate settings class based on the backend type.
get_backend_class
get_backend_class(backend_type: MatchboxBackends) -> type[MatchboxDBAdapter]
Get the appropriate backend class based on the backend type.
settings_to_backend
settings_to_backend(settings: MatchboxServerSettings) -> MatchboxDBAdapter
Create a backend adapter with injected settings.
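The three functions above amount to registry lookups keyed on the backend enum, followed by constructing the adapter with its settings injected. The sketch mirrors that flow. Every name in it (ToyBackends, lookup_settings_class, lookup_adapter_class, build_adapter and the toy classes) is hypothetical, and none is Matchbox's implementation.

```python
from __future__ import annotations

from enum import Enum


class ToyBackends(str, Enum):
    """Hypothetical stand-in for MatchboxBackends."""

    POSTGRES = "postgres"


class ToySettings:
    def __init__(self, backend_type: ToyBackends) -> None:
        self.backend_type = backend_type


class ToyPostgresSettings(ToySettings):
    pass


class ToyPostgresAdapter:
    def __init__(self, settings: ToySettings) -> None:
        # The adapter receives its settings instead of reading globals.
        self.settings = settings


# Registries keyed on the backend enum; a new backend adds one entry to each.
SETTINGS_CLASSES = {ToyBackends.POSTGRES: ToyPostgresSettings}
ADAPTER_CLASSES = {ToyBackends.POSTGRES: ToyPostgresAdapter}


def lookup_settings_class(backend_type: ToyBackends) -> type[ToySettings]:
    try:
        return SETTINGS_CLASSES[backend_type]
    except KeyError:
        raise ValueError(f"Unsupported backend: {backend_type!r}") from None


def lookup_adapter_class(backend_type: ToyBackends) -> type[ToyPostgresAdapter]:
    try:
        return ADAPTER_CLASSES[backend_type]
    except KeyError:
        raise ValueError(f"Unsupported backend: {backend_type!r}") from None


def build_adapter(settings: ToySettings) -> ToyPostgresAdapter:
    # Pick the class from the settings, then inject the settings into it.
    adapter_cls = lookup_adapter_class(settings.backend_type)
    return adapter_cls(settings)


settings_cls = lookup_settings_class(ToyBackends("postgres"))
adapter = build_adapter(settings_cls(ToyBackends.POSTGRES))
print(type(adapter).__name__)  # ToyPostgresAdapter
```

Plain strings are converted with ToyBackends("postgres") before lookup. Enum members hash by name, so a raw string key would not find the registry entry.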
initialise_matchbox
Initialise the Matchbox backend based on environment variables.
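Initialising from environment variables usually means collecting variables under a known prefix into settings before building the backend. The sketch uses a hypothetical TOY_SERVER__ prefix, because the real prefix is not given on this page. It stops at returning the settings dictionary, where a real initialiser would go on to build the adapter.

```python
from __future__ import annotations

import os
from collections.abc import Mapping


def settings_from_env(
    environ: Mapping[str, str], prefix: str = "TOY_SERVER__"
) -> dict[str, str]:
    """Collect prefixed environment variables into a lower-cased settings dict.

    The prefix is a hypothetical placeholder, not Matchbox's real one.
    """
    return {
        name[len(prefix):].lower(): value
        for name, value in environ.items()
        if name.startswith(prefix)
    }


def initialise_from_env(environ: Mapping[str, str] | None = None) -> dict[str, str]:
    # Default to the real process environment; tests can pass a plain dict.
    settings = settings_from_env(os.environ if environ is None else environ)
    if "backend_type" not in settings:
        raise RuntimeError("TOY_SERVER__BACKEND_TYPE must be set")
    return settings


env = {"TOY_SERVER__BACKEND_TYPE": "postgres", "TOY_SERVER__BATCH_SIZE": "250000", "HOME": "/root"}
print(initialise_from_env(env))  # {'backend_type': 'postgres', 'batch_size': '250000'}
```

With pydantic-settings, the same prefix handling is normally configured declaratively through env_prefix rather than written by hand.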