Overview¶

matchbox.server ¶

Matchbox server.

Includes the API, and database adapters for various backends.

Modules:

api –

Matchbox API.
base –

Base classes and utilities for Matchbox database adapters.
postgresql –

PostgreSQL adapter for Matchbox server.

Classes:

MatchboxDBAdapter –

An abstract base class for Matchbox database adapters.
MatchboxServerSettings –

Settings for the Matchbox application.

MatchboxDBAdapter ¶

Bases: ABC

An abstract base class for Matchbox database adapters.

Methods:

query –

Queries the database from an optional point of truth.
match –

Matches an ID in a source resolution and returns the keys in the targets.
index –

Indexes a source in your warehouse to Matchbox.
get_source_config –

Get a source configuration from its resolution name.
get_resolution_source_configs –

Get a list of source configurations queriable from a resolution.
validate_ids –

Validates that every ID in a list exists in the database.
validate_hashes –

Validates that every hash in a list exists in the database.
cluster_id_to_hash –

Get a lookup of Cluster hashes from a list of IDs.
get_resolution_graph –

Get the full resolution graph.
dump –

Dumps the entire database to a snapshot.
drop –

Hard clear the database by dropping all tables and re-creating.
clear –

Soft clear the database by deleting all rows but retaining tables.
restore –

Restores the database from a snapshot.
insert_model –

Writes a model to Matchbox.
get_model –

Get a model from the database.
set_model_results –

Set the results for a model.
get_model_results –

Get the results for a model.
set_model_truth –

Sets the truth threshold for this model, changing the default clusters.
get_model_truth –

Gets the current truth threshold for this model.
get_model_ancestors –

Gets the current truth values of all ancestors.
set_model_ancestors_cache –

Updates the cached ancestor thresholds.
get_model_ancestors_cache –

Gets the cached ancestor thresholds.
delete_resolution –

Delete a resolution from the database.
login –

Receives a user name and returns user ID.
insert_judgement –

Adds an evaluation judgement to the database.
get_judgements –

Retrieves all evaluation judgements.
compare_models –

Compare metrics of models based on evaluation data.
sample_for_eval –

Sample a cluster to validate.

Attributes:

settings (MatchboxServerSettings) –
sources (ListableAndCountable) –
models (Countable) –
data (Countable) –
clusters (Countable) –
creates (Countable) –
merges (Countable) –
proposes (Countable) –
source_resolutions (Countable) –

settings `instance-attribute` ¶

settings: MatchboxServerSettings

sources `instance-attribute` ¶

sources: ListableAndCountable

models `instance-attribute` ¶

models: Countable

data `instance-attribute` ¶

data: Countable

clusters `instance-attribute` ¶

clusters: Countable

creates `instance-attribute` ¶

creates: Countable

merges `instance-attribute` ¶

merges: Countable

proposes `instance-attribute` ¶

proposes: Countable

source_resolutions `instance-attribute` ¶

source_resolutions: Countable

query `abstractmethod` ¶

query(
    source: SourceResolutionName,
    resolution: ResolutionName | None = None,
    threshold: int | None = None,
    return_leaf_id: bool = False,
    limit: int = None,
) -> Table

Queries the database from an optional point of truth.

Parameters:

source ¶
(SourceResolutionName) –

The SourceResolutionName string identifying the source to query.
resolution ¶
(optional, default: None) –

The resolution to use for filtering results. If not specified, the source resolution for the queried source is used.
threshold ¶
(optional, default: None) –

The threshold to use for creating clusters. If None, uses the models’ default threshold. If an integer, uses that threshold for the specified model, and the model’s cached thresholds for its ancestors.
return_leaf_id ¶
(optional, default: False) –

Whether to return the cluster ID of leaves.
limit ¶
(optional, default: None) –

The number to use in a limit clause. Useful for testing.

Returns:

Table –

The resulting matchbox IDs in Arrow format

match `abstractmethod` ¶

match(
    key: str,
    source: SourceResolutionName,
    targets: list[SourceResolutionName],
    resolution: ResolutionName,
    threshold: int | None = None,
) -> list[Match]

Matches an ID in a source resolution and returns the keys in the targets.

Parameters:

key ¶
(str) –

The key to match from the source.
source ¶
(SourceResolutionName) –

The name of the source resolution.
targets ¶
(list[SourceResolutionName]) –

The names of the target source resolutions.
resolution ¶
(ResolutionName) –

The name of the resolution to use for matching.
threshold ¶
(optional, default: None) –

The threshold to use for creating clusters. If None, uses the resolutions’ default threshold. If an integer, uses that threshold for the specified resolution, and the resolution’s cached thresholds for its ancestors. Will use these threshold values instead of the cached thresholds.

index `abstractmethod` ¶

index(
    source_config: SourceConfig, data_hashes: Table
) -> None

Indexes a source in your warehouse to Matchbox.

Parameters:

source_config ¶
(SourceConfig) –

The source configuration to index.
data_hashes ¶
(Table) –

The Arrow table with the hash of each data row

get_source_config `abstractmethod` ¶

get_source_config(
    name: SourceResolutionName,
) -> SourceConfig

Get a source configuration from its resolution name.

Parameters:

name ¶
(SourceResolutionName) –

The resolution name for the source.

Returns:

SourceConfig –

A SourceConfig object

get_resolution_source_configs `abstractmethod` ¶

get_resolution_source_configs(
    name: ResolutionName,
) -> list[SourceConfig]

Get a list of source configurations queriable from a resolution.

Parameters:

name ¶
(ResolutionName) –

Name of the resolution to query.

Returns:

list[SourceConfig] –

List of relevant SourceConfig objects.

validate_ids `abstractmethod` ¶

validate_ids(ids: list[int]) -> bool

Validates that every ID in a list exists in the database.

Parameters:

ids ¶
(list[int]) –

A list of IDs to validate.

Raises:

MatchboxDataNotFound –

If some items don’t exist in the target table.

validate_hashes `abstractmethod` ¶

validate_hashes(hashes: list[bytes]) -> bool

Validates that every hash in a list exists in the database.

Parameters:

hashes ¶
(list[bytes]) –

A list of hashes to validate.

Raises:

MatchboxDataNotFound –

If some items don’t exist in the target table.

cluster_id_to_hash `abstractmethod` ¶

cluster_id_to_hash(
    ids: list[int],
) -> dict[int, bytes | None]

Get a lookup of Cluster hashes from a list of IDs.

Parameters:

ids ¶
(list[int]) –

A list of IDs to get hashes for.

Returns:

dict[int, bytes | None] –

A dictionary mapping IDs to hashes.
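
The return type allows an ID to map to None. A minimal sketch of such a lookup follows, assuming (the reference does not confirm this) that None marks an ID with no stored hash. The function `toy_cluster_id_to_hash` and the `known` dictionary are hypothetical stand-ins, not part of Matchbox.

```python
from typing import Optional


def toy_cluster_id_to_hash(
    ids: list[int], known: dict[int, bytes]
) -> dict[int, Optional[bytes]]:
    """Map each requested ID to its hash, or to None when no hash is known.

    Assumption: returning None for unknown IDs is a guess based on the
    documented return type, not confirmed behaviour.
    """
    return {cluster_id: known.get(cluster_id) for cluster_id in ids}


# Hypothetical stored hashes keyed by cluster ID.
known = {1: b"\xaa", 2: b"\xbb"}
lookup = toy_cluster_id_to_hash([1, 2, 99], known)
```

Every requested ID appears as a key in the result, so callers can tell a missing hash from an ID they never asked about.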

get_resolution_graph `abstractmethod` ¶

get_resolution_graph() -> ResolutionGraph

Get the full resolution graph.

dump `abstractmethod` ¶

dump() -> MatchboxSnapshot

Dumps the entire database to a snapshot.

Returns:

MatchboxSnapshot –

A MatchboxSnapshot object of type “postgres” with the database’s current state.

drop `abstractmethod` ¶

drop(certain: bool) -> None

Hard clear the database by dropping all tables and re-creating.

Parameters:

certain ¶
(bool) –

Whether to drop the database without confirmation.

clear `abstractmethod` ¶

clear(certain: bool) -> None

Soft clear the database by deleting all rows but retaining tables.

Parameters:

certain ¶
(bool) –

Whether to delete the database without confirmation.
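
Both drop and clear take a `certain` flag. The sketch below shows one way an adapter could honour it. It assumes that passing `certain=False` raises an error, which the reference does not state. `ToyAdapter`, its in-memory `rows` dictionary, and the choice of `ValueError` are all hypothetical.

```python
class ToyAdapter:
    """Hypothetical in-memory stand-in for a database adapter."""

    def __init__(self) -> None:
        # Each key plays the role of a table, each list its rows.
        self.rows = {"clusters": [1, 2, 3], "sources": ["a"]}

    def clear(self, certain: bool) -> None:
        """Soft clear: delete all rows but keep the tables."""
        if not certain:
            # Assumption: an unconfirmed call is refused with an error.
            raise ValueError("Refusing to clear: pass certain=True to confirm")
        for table in self.rows.values():
            table.clear()


adapter = ToyAdapter()

try:
    adapter.clear(certain=False)
    blocked = False
except ValueError:
    blocked = True

rows_after_refusal = {name: list(rows) for name, rows in adapter.rows.items()}
adapter.clear(certain=True)
```

After the confirmed call the table names are still present, which is the difference between a soft clear and a hard drop.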

restore `abstractmethod` ¶

restore(snapshot: MatchboxSnapshot) -> None

Restores the database from a snapshot.

Parameters:

snapshot ¶
(MatchboxSnapshot) –

A MatchboxSnapshot object of type “postgres” with the database’s state

Raises:

TypeError –

If the snapshot is not compatible with PostgreSQL

insert_model `abstractmethod` ¶

insert_model(model_config: ModelConfig) -> None

Writes a model to Matchbox.

Parameters:

model_config ¶
(ModelConfig) –

ModelConfig object with the model’s metadata

Raises:

MatchboxDataNotFound –

If, for a linker, the source models weren’t found in the database
MatchboxModelConfigError –

If the model configuration is invalid, such as the resolutions sharing ancestors

get_model `abstractmethod` ¶

get_model(name: ModelResolutionName) -> ModelConfig

Get a model from the database.

set_model_results `abstractmethod` ¶

set_model_results(
    name: ModelResolutionName, results: Table
) -> None

Set the results for a model.

get_model_results `abstractmethod` ¶

get_model_results(name: ModelResolutionName) -> Table

Get the results for a model.

set_model_truth `abstractmethod` ¶

set_model_truth(
    name: ModelResolutionName, truth: float
) -> None

Sets the truth threshold for this model, changing the default clusters.

get_model_truth `abstractmethod` ¶

get_model_truth(name: ModelResolutionName) -> float

Gets the current truth threshold for this model.

get_model_ancestors `abstractmethod` ¶

get_model_ancestors(
    name: ModelResolutionName,
) -> list[ModelAncestor]

Gets the current truth values of all ancestors.

Returns a list of ModelAncestor objects mapping model resolution names to their current truth thresholds.

Unlike get_model_ancestors_cache, which returns cached values, this method returns the current truth values of all ancestor models.

set_model_ancestors_cache `abstractmethod` ¶

set_model_ancestors_cache(
    name: ModelResolutionName,
    ancestors_cache: list[ModelAncestor],
) -> None

Updates the cached ancestor thresholds.

Parameters:

name ¶
(ModelResolutionName) –

The name of the model to update
ancestors_cache ¶
(list[ModelAncestor]) –

List of ModelAncestor objects mapping model resolution names to their truth thresholds

get_model_ancestors_cache `abstractmethod` ¶

get_model_ancestors_cache(
    name: ModelResolutionName,
) -> list[ModelAncestor]

Gets the cached ancestor thresholds.

Returns a list of ModelAncestor objects mapping model resolution names to their cached truth thresholds.

This is required because each point of truth needs to be stable: we choose when to update it, caching the ancestors’ values in the model itself.
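
The difference between current and cached ancestor thresholds can be illustrated with a toy in-memory store. This is only a sketch: `ToyAncestor` assumes a ModelAncestor carries a name and a truth value, `ToyStore` tracks direct parents only, and the model names are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyAncestor:
    """Hypothetical stand-in for ModelAncestor: a name and a truth threshold."""

    name: str
    truth: float


class ToyStore:
    """Hypothetical store mimicking the three ancestor methods."""

    def __init__(self) -> None:
        self.truths: dict[str, float] = {}
        self.parents: dict[str, list[str]] = {}
        self.caches: dict[str, list[ToyAncestor]] = {}

    def get_model_ancestors(self, name: str) -> list[ToyAncestor]:
        # Always reads the ancestors' truth values as they are right now.
        return [ToyAncestor(p, self.truths[p]) for p in self.parents[name]]

    def set_model_ancestors_cache(
        self, name: str, ancestors_cache: list[ToyAncestor]
    ) -> None:
        self.caches[name] = list(ancestors_cache)

    def get_model_ancestors_cache(self, name: str) -> list[ToyAncestor]:
        # Returns the thresholds frozen at the last explicit update.
        return self.caches[name]


store = ToyStore()
store.truths = {"dedupe_a": 0.8, "link_ab": 0.9}
store.parents = {"link_ab": ["dedupe_a"]}

# Freeze the ancestor thresholds while link_ab is considered stable.
store.set_model_ancestors_cache("link_ab", store.get_model_ancestors("link_ab"))

# The ancestor's truth changes later; the cache does not follow by itself.
store.truths["dedupe_a"] = 0.7

current = store.get_model_ancestors("link_ab")
cached = store.get_model_ancestors_cache("link_ab")
```

Here `current` reflects the new value 0.7 while `cached` still holds 0.8, so results built on the cached thresholds stay stable until the cache is updated on purpose.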

delete_resolution `abstractmethod` ¶

delete_resolution(
    name: ResolutionName, certain: bool
) -> None

Delete a resolution from the database.

Parameters:

name ¶
(ResolutionName) –

The name of the resolution to delete.
certain ¶
(bool) –

Whether to delete the resolution without confirmation.

login `abstractmethod` ¶

login(user_name: str) -> int

Receives a user name and returns user ID.

insert_judgement `abstractmethod` ¶

insert_judgement(judgement: Judgement) -> None

Adds an evaluation judgement to the database.

Parameters:

judgement ¶
(Judgement) –

Representation of the proposed clusters.

get_judgements `abstractmethod` ¶

get_judgements() -> tuple[Table, Table]

Retrieves all evaluation judgements.

Returns:

tuple[Table, Table] –

Two PyArrow tables with the judgements and their expansion. See matchbox.common.arrow for information on the schema.

compare_models `abstractmethod` ¶

compare_models(
    resolutions: list[ModelResolutionName],
) -> ModelComparison

Compare metrics of models based on evaluation data.

Parameters:

resolutions ¶
(list[ModelResolutionName]) –

List of names of model resolutions to be compared.

Returns:

ModelComparison –

A model comparison object, listing metrics for each model.

sample_for_eval `abstractmethod` ¶

sample_for_eval(
    n: int, resolution: ModelResolutionName, user_id: int
) -> Table

Sample a cluster to validate.

Parameters:

n ¶
(int) –

Number of clusters to sample
resolution ¶
(ModelResolutionName) –

Name of resolution from which to sample
user_id ¶
(int) –

ID of user requesting the sample

Returns:

Table –

An Arrow table with the same schema as returned by query()

MatchboxServerSettings ¶

Bases: BaseSettings

Settings for the Matchbox application.

Attributes:

batch_size (int) –
backend_type (MatchboxBackends) –
datastore (MatchboxDatastoreSettings) –
api_key (SecretStr | None) –
log_level (LogLevelType) –

batch_size `class-attribute` `instance-attribute` ¶

batch_size: int = Field(default=250000)

backend_type `instance-attribute` ¶

backend_type: MatchboxBackends

datastore `instance-attribute` ¶

datastore: MatchboxDatastoreSettings

api_key `class-attribute` `instance-attribute` ¶

api_key: SecretStr | None = Field(default=None)

log_level `class-attribute` `instance-attribute` ¶

log_level: LogLevelType = 'INFO'

matchbox.server.base ¶

Base classes and utilities for Matchbox database adapters.

Classes:

MatchboxBackends –

The available backends for Matchbox.
MatchboxSnapshot –

A snapshot of the Matchbox database.
MatchboxDatastoreSettings –

Settings specific to the datastore configuration.
MatchboxServerSettings –

Settings for the Matchbox application.
BackendManager –

Manages the Matchbox backend instance and settings.
Countable –

A protocol for objects that can be counted.
Listable –

A protocol for objects that can be listed.
ListableAndCountable –

A protocol for objects that can be counted and listed.
MatchboxDBAdapter –

An abstract base class for Matchbox database adapters.

Functions:

get_backend_settings –

Get the appropriate settings class based on the backend type.
get_backend_class –

Get the appropriate backend class based on the backend type.
settings_to_backend –

Create backend adapter with injected settings.
initialise_matchbox –

Initialise the Matchbox backend based on environment variables.

MatchboxBackends ¶

Bases: StrEnum

The available backends for Matchbox.

Attributes:

POSTGRES –

POSTGRES `class-attribute` `instance-attribute` ¶

POSTGRES = 'postgres'

MatchboxSnapshot ¶

Bases: BaseModel

A snapshot of the Matchbox database.

Methods:

check_serialisable –

Validate that the value can be serialised to JSON.

Attributes:

backend_type (MatchboxBackends) –
data (Any) –

backend_type `instance-attribute` ¶

backend_type: MatchboxBackends

data `instance-attribute` ¶

data: Any

check_serialisable `classmethod` ¶

check_serialisable(value: Any) -> Any

Validate that the value can be serialised to JSON.
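
A JSON-serialisability check of this kind can be sketched with the standard library alone. The function below is an illustration, not the Matchbox implementation: the real method is a classmethod on a model class, and the error type and message used here are assumptions.

```python
import json
from typing import Any


def check_serialisable(value: Any) -> Any:
    """Return the value unchanged if json.dumps accepts it, else raise."""
    try:
        json.dumps(value)
    except (TypeError, ValueError) as exc:
        # Assumption: the error type and wording are illustrative only.
        raise ValueError("Snapshot data is not JSON-serialisable") from exc
    return value


# A plain nested structure passes through untouched.
ok = check_serialisable({"tables": {"clusters": [1, 2, 3]}})

try:
    check_serialisable({"raw": b"\x00\x01"})  # bytes cannot be encoded as JSON
    rejected = False
except ValueError:
    rejected = True
```

Returning the value unchanged lets the same function be used as a validator that either passes data through or rejects it.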

MatchboxDatastoreSettings ¶

Bases: BaseSettings

Settings specific to the datastore configuration.

Methods:

get_client –

Returns an S3 client for the datastore.

Attributes:

host (str | None) –
port (int | None) –
access_key_id (SecretStr | None) –
secret_access_key (SecretStr | None) –
default_region (str | None) –
cache_bucket_name (str) –

host `class-attribute` `instance-attribute` ¶

host: str | None = None

port `class-attribute` `instance-attribute` ¶

port: int | None = None

access_key_id `class-attribute` `instance-attribute` ¶

access_key_id: SecretStr | None = None

secret_access_key `class-attribute` `instance-attribute` ¶

secret_access_key: SecretStr | None = None

default_region `class-attribute` `instance-attribute` ¶

default_region: str | None = None

cache_bucket_name `instance-attribute` ¶

cache_bucket_name: str

get_client ¶

get_client() -> S3Client

Returns an S3 client for the datastore.

Creates S3 buckets if they don’t exist.

BackendManager ¶

Manages the Matchbox backend instance and settings.

Methods:

initialise –

Initialise the backend with the given settings.
get_backend –

Get the backend instance.
get_settings –

Get the backend settings.

initialise `classmethod` ¶

initialise(settings: MatchboxServerSettings)

Initialise the backend with the given settings.

get_backend `classmethod` ¶

get_backend() -> MatchboxDBAdapter

Get the backend instance.

get_settings `classmethod` ¶

get_settings() -> MatchboxServerSettings

Get the backend settings.
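
The three classmethods suggest a class-level holder for a single backend and its settings. The sketch below shows that pattern with hypothetical `Toy*` classes. Building the backend lazily and raising `RuntimeError` before initialisation are assumptions, not documented behaviour.

```python
from typing import ClassVar, Optional


class ToySettings:
    """Hypothetical stand-in for the server settings."""

    def __init__(self, backend_type: str) -> None:
        self.backend_type = backend_type


class ToyBackend:
    """Hypothetical stand-in for a database adapter."""

    def __init__(self, settings: ToySettings) -> None:
        self.settings = settings


class ToyBackendManager:
    """Holds one backend instance and its settings at class level."""

    _backend: ClassVar[Optional[ToyBackend]] = None
    _settings: ClassVar[Optional[ToySettings]] = None

    @classmethod
    def initialise(cls, settings: ToySettings) -> None:
        cls._settings = settings
        cls._backend = None  # rebuilt on the next get_backend() call

    @classmethod
    def get_backend(cls) -> ToyBackend:
        if cls._settings is None:
            raise RuntimeError("Call initialise() first")
        if cls._backend is None:
            cls._backend = ToyBackend(cls._settings)
        return cls._backend

    @classmethod
    def get_settings(cls) -> ToySettings:
        if cls._settings is None:
            raise RuntimeError("Call initialise() first")
        return cls._settings


ToyBackendManager.initialise(ToySettings(backend_type="postgres"))
first = ToyBackendManager.get_backend()
second = ToyBackendManager.get_backend()
```

Because the state lives on the class, every caller that asks for the backend receives the same instance.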

Countable ¶

Bases: Protocol

A protocol for objects that can be counted.

Methods:

count –

Counts the number of items in the object.

count ¶

count() -> int

Counts the number of items in the object.

Listable ¶

Bases: Protocol

A protocol for objects that can be listed.

Methods:

list_all –

Lists the items in the object.

list_all ¶

list_all() -> list[str]

Lists the items in the object.

ListableAndCountable ¶

Bases: Countable, Listable

A protocol for objects that can be counted and listed.

Methods:

list_all –

Lists the items in the object.
count –

Counts the number of items in the object.

list_all ¶

list_all() -> list[str]

Lists the items in the object.

count ¶

count() -> int

Counts the number of items in the object.
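
These protocols rely on structural typing: any object with the right methods satisfies them, without inheriting from anything. The sketch below redefines the three protocols as documented and adds a hypothetical `InMemorySources` class. The `runtime_checkable` decorator is an addition made here so the example can use `isinstance`, and the source names are invented.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Countable(Protocol):
    """Anything exposing count() -> int."""

    def count(self) -> int: ...


@runtime_checkable
class Listable(Protocol):
    """Anything exposing list_all() -> list[str]."""

    def list_all(self) -> list[str]: ...


@runtime_checkable
class ListableAndCountable(Countable, Listable, Protocol):
    """Anything exposing both count() and list_all()."""


class InMemorySources:
    """Hypothetical stand-in for an adapter's `sources` attribute."""

    def __init__(self, names: list[str]) -> None:
        self._names = list(names)

    def list_all(self) -> list[str]:
        return sorted(self._names)

    def count(self) -> int:
        return len(self._names)


sources = InMemorySources(["hmrc_exporters", "companies_house"])
```

`InMemorySources` never mentions the protocols, yet it can be used wherever a ListableAndCountable is expected because it provides both methods.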

get_backend_settings ¶

get_backend_settings(
    backend_type: MatchboxBackends,
) -> type[MatchboxServerSettings]

Get the appropriate settings class based on the backend type.

get_backend_class ¶

get_backend_class(
    backend_type: MatchboxBackends,
) -> type[MatchboxDBAdapter]

Get the appropriate backend class based on the backend type.

settings_to_backend ¶

settings_to_backend(
    settings: MatchboxServerSettings,
) -> MatchboxDBAdapter

Create backend adapter with injected settings.

initialise_matchbox ¶

initialise_matchbox() -> None

Initialise the Matchbox backend based on environment variables.
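
The helper functions above describe a dispatch from backend type to a settings class and an adapter class. A minimal sketch of that idea, using dictionary lookups keyed by an enum, is shown below. All `Toy*` and `toy_*` names are hypothetical, and the real functions may resolve classes differently.

```python
from enum import Enum


class ToyBackends(str, Enum):
    # The documented MatchboxBackends is a StrEnum; mixing str and Enum
    # gives similar lookups and also runs on Python versions before 3.11.
    POSTGRES = "postgres"


class ToyPostgresSettings:
    """Hypothetical settings class for the postgres backend."""

    backend_type = ToyBackends.POSTGRES


class ToyPostgresAdapter:
    """Hypothetical adapter that receives its settings on construction."""

    def __init__(self, settings: ToyPostgresSettings) -> None:
        self.settings = settings


# One registry per kind of class, keyed by backend type.
_SETTINGS = {ToyBackends.POSTGRES: ToyPostgresSettings}
_ADAPTERS = {ToyBackends.POSTGRES: ToyPostgresAdapter}


def toy_get_backend_settings(backend_type):
    """Look up the settings class for a backend type (enum member or string)."""
    return _SETTINGS[ToyBackends(backend_type)]


def toy_get_backend_class(backend_type):
    """Look up the adapter class for a backend type (enum member or string)."""
    return _ADAPTERS[ToyBackends(backend_type)]


def toy_settings_to_backend(settings):
    """Build an adapter and inject the settings into it."""
    return toy_get_backend_class(settings.backend_type)(settings)


settings = toy_get_backend_settings("postgres")()
backend = toy_settings_to_backend(settings)
```

Adding another backend in this sketch would mean one new enum member and one new entry in each registry, leaving the three functions unchanged.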
