PostgreSQL
A backend adapter for deploying Matchbox using PostgreSQL.
There are two graph-like trees in place here.
- In the resolution subgraph, the tree is implemented as a closure table. This makes root-to-leaf paths quick to query at the cost of redundancy.
- In the data subgraph, the tree is implemented as a modified closure table. For each model, it stores only the “root” and “leaf” relationships:
  - the leaf IDs
  - the model’s proposed cluster IDs at a given threshold (the roots)
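The sketch below shows both trees in plain Python. It is a minimal, self-contained illustration, not Matchbox's implementation. The in-memory tuples stand in for the ResolutionFrom and Contains tables, and the helper names (`build_closure`, `ancestors`, `leaves_of`) are hypothetical.

```python
from collections import defaultdict


def build_closure(edges: list[tuple[int, int]]) -> set[tuple[int, int, int]]:
    """Expand direct (parent, child) edges into (ancestor, descendant, level) rows.

    Every indirect ancestor gets its own row, which is the redundancy a closure
    table trades for answering path questions with one lookup. Assumes no cycles;
    where two paths exist, the shortest distance is kept.
    """
    parents: defaultdict[int, set[int]] = defaultdict(set)
    for parent, child in edges:
        parents[child].add(parent)

    rows: set[tuple[int, int, int]] = set()
    for node in {n for edge in edges for n in edge}:
        # Walk upwards breadth-first, so each ancestor is first seen at its shortest distance.
        frontier, level, seen = {node}, 0, set()
        while frontier:
            level += 1
            next_frontier = set()
            for n in frontier:
                for p in parents[n] - seen:
                    seen.add(p)
                    rows.add((p, node, level))
                    next_frontier.add(p)
            frontier = next_frontier
    return rows


def ancestors(rows: set[tuple[int, int, int]], node: int) -> set[int]:
    """All ancestors of `node`: a single filter over the closure rows, no recursion."""
    return {ancestor for ancestor, descendant, _ in rows if descendant == node}


def leaves_of(contains: set[tuple[int, int]], root: int) -> set[int]:
    """Leaf cluster IDs under a proposed root cluster, read from (root, leaf) pairs."""
    return {leaf for r, leaf in contains if r == root}


# Sources 1 and 2 feed model 3, which feeds model 4.
rows = build_closure([(1, 3), (2, 3), (3, 4)])
print(sorted(ancestors(rows, 4)))  # [1, 2, 3]

# A model proposes root cluster 20, joining leaf clusters 10 and 11.
print(sorted(leaves_of({(20, 10), (20, 11), (21, 12)}, 20)))  # [10, 11]
```

Because the data tree keeps only root-to-leaf pairs, the leaves of a proposed cluster come from one lookup, with no walk through intermediate clusters.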
```mermaid
erDiagram
    Collections {
        bigint collection_id PK
        text name
    }
    Runs {
        bigint run_id PK
        bigint collection_id FK
        boolean is_mutable
        boolean is_default
    }
    Resolutions {
        bigint resolution_id PK
        bigint run_id FK
        text name
        text description
        text type
        bytea fingerprint
        smallint truth
        enum upload_stage
    }
    ResolutionFrom {
        bigint parent PK,FK
        bigint child PK,FK
        integer level
        smallint truth_cache
    }
    SourceConfigs {
        bigint source_config_id PK
        bigint resolution_id FK
        text location_type
        text location_name
        text extract_transform
    }
    SourceFields {
        bigint field_id PK
        bigint source_config_id FK
        integer index
        text name
        text type
        boolean is_key
    }
    ModelConfigs {
        bigint model_config_id PK
        bigint resolution_id FK
        text model_class
        jsonb model_settings
        jsonb left_query
        jsonb right_query
    }
    Clusters {
        bigint cluster_id PK
        bytea cluster_hash
    }
    ClusterSourceKey {
        bigint key_id PK
        bigint cluster_id FK
        bigint source_config_id FK
        text key
    }
    Contains {
        bigint root PK,FK
        bigint leaf PK,FK
    }
    PKSpace {
        bigint id PK
        bigint next_cluster_id
        bigint next_cluster_keys_id
    }
    Probabilities {
        bigint resolution_id PK,FK
        bigint cluster_id PK,FK
        smallint probability
    }
    Results {
        bigint result_id PK
        bigint resolution_id FK
        bigint left_id FK
        bigint right_id FK
        smallint probability
    }
    Users {
        bigint user_id PK
        text name
        text email
    }
    Groups {
        bigint group_id PK
        text name
        text description
        boolean is_system
    }
    UserGroups {
        bigint user_id PK,FK
        bigint group_id PK,FK
    }
    Permissions {
        bigint permission_id PK
        text permission
        bigint group_id FK
        bigint collection_id FK
        boolean is_system
    }
    EvalJudgements {
        bigint judgement_id PK
        bigint user_id FK
        bigint endorsed_cluster_id FK
        bigint shown_cluster_id FK
        datetime timestamp
    }
    Collections ||--o{ Runs : ""
    Collections ||--o{ Permissions : ""
    Runs ||--o{ Resolutions : ""
    Resolutions ||--o{ ResolutionFrom : "parent"
    ResolutionFrom }o--|| Resolutions : "child"
    Resolutions |o--|| SourceConfigs : ""
    Resolutions |o--|| ModelConfigs : ""
    Resolutions ||--o{ Probabilities : ""
    Resolutions ||--o{ Results : ""
    SourceConfigs ||--o{ SourceFields : ""
    SourceConfigs ||--o{ ClusterSourceKey : ""
    Clusters ||--o{ ClusterSourceKey : ""
    Clusters ||--o{ Contains : "root"
    Contains }o--|| Clusters : "leaf"
    Clusters ||--o{ Probabilities : ""
    Clusters ||--o{ Results : "left_id"
    Clusters ||--o{ Results : "right_id"
    Clusters ||--o{ EvalJudgements : "endorsed_cluster_id"
    Clusters ||--o{ EvalJudgements : "shown_cluster_id"
    Users ||--o{ UserGroups : ""
    Users ||--o{ EvalJudgements : ""
    Groups ||--o{ UserGroups : ""
    Groups ||--o{ Permissions : ""
```
matchbox.server.postgresql
PostgreSQL adapter for Matchbox server.
Modules:
- adapter – Composed PostgreSQL adapter for Matchbox server.
- db – Matchbox PostgreSQL database connection.
- mixin – A module for defining mixins for the PostgreSQL backend ORM.
- orm – ORM classes for the Matchbox PostgreSQL database.
- utils – Utilities for using the PostgreSQL backend.
Classes:
- MatchboxPostgres – A PostgreSQL adapter for Matchbox.
- MatchboxPostgresSettings – Settings for the Matchbox PostgreSQL backend.
MatchboxPostgres
MatchboxPostgres(settings: MatchboxPostgresSettings)
Bases: MatchboxPostgresQueryMixin, MatchboxPostgresEvaluationMixin, MatchboxPostgresCollectionsMixin, MatchboxPostgresAdminMixin, MatchboxDBAdapter
A PostgreSQL adapter for Matchbox.
Methods:
- query
- match
- create_collection
- get_collection
- list_collections
- delete_collection
- create_run
- set_run_mutable
- set_run_default
- get_run
- delete_run
- create_resolution
- get_resolution
- update_resolution
- delete_resolution
- lock_resolution_data
- unlock_resolution_data
- get_resolution_stage
- insert_source_data
- insert_model_data
- get_model_data
- validate_ids
- dump
- drop
- clear
- restore
- delete_orphans
- login
- get_user_groups
- list_groups
- get_group
- create_group
- delete_group
- add_user_to_group
- remove_user_from_group
- check_permission
- get_permissions
- grant_permission
- revoke_permission
- insert_judgement
- get_judgements
- sample_for_eval – Sample some clusters from a resolution.
Attributes:
- settings
- sources
- models
- source_clusters
- model_clusters
- all_clusters
- creates
- merges
- proposes
- source_resolutions
source_resolutions
source_resolutions = FilteredResolutions(sources=True, models=False)
query
query(source: SourceResolutionPath, point_of_truth: ResolutionPath | None = None, threshold: int | None = None, return_leaf_id: bool = False, limit: int | None = None) -> Table
match
match(key: str, source: SourceResolutionPath, targets: list[SourceResolutionPath], point_of_truth: ResolutionPath, threshold: int | None = None) -> list[Match]
unlock_resolution_data
unlock_resolution_data(path: ResolutionPath, complete: bool = False) -> None
check_permission
check_permission(user_name: str, permission: PermissionType, resource: Literal[SYSTEM] | CollectionName) -> bool
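The sketch below shows how a check like check_permission could be resolved against the Users, UserGroups, Groups and Permissions tables in the diagram above. It is minimal and makes several assumptions:

- A grant matches only its exact resource, with no inheritance between SYSTEM and collection grants and no implied permissions. The real method may apply further rules.
- Plain dicts and lists stand in for the database rows.
- `SYSTEM` is a string stand-in for the real marker.
- The permission names are hypothetical.

```python
from dataclasses import dataclass

SYSTEM = "SYSTEM"  # hypothetical stand-in for the real SYSTEM resource marker


@dataclass(frozen=True)
class Grant:
    """One Permissions row: a group holds a permission on a resource."""

    group_name: str
    permission: str
    resource: str  # a collection name, or SYSTEM


def check_permission(
    user_name: str,
    permission: str,
    resource: str,
    user_groups: dict[str, set[str]],
    grants: list[Grant],
) -> bool:
    """Return True if any group the user belongs to holds `permission` on `resource`."""
    groups = user_groups.get(user_name, set())
    return any(
        g.group_name in groups and g.permission == permission and g.resource == resource
        for g in grants
    )


user_groups = {"alice": {"analysts"}}
grants = [Grant("analysts", "read", "companies"), Grant("admins", "admin", SYSTEM)]
print(check_permission("alice", "read", "companies", user_groups, grants))  # True
print(check_permission("alice", "admin", SYSTEM, user_groups, grants))  # False
```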
get_permissions
get_permissions(resource: Literal[SYSTEM] | CollectionName) -> list[PermissionGrant]
grant_permission
grant_permission(group_name: GroupName, permission: PermissionType, resource: Literal[SYSTEM] | CollectionName) -> None
revoke_permission
revoke_permission(group_name: GroupName, permission: PermissionType, resource: Literal[SYSTEM] | CollectionName) -> None
sample_for_eval
sample_for_eval(n: int, path: ModelResolutionPath, user_name: str) -> Table
Sample some clusters from a resolution.
MatchboxPostgresSettings
Bases: MatchboxServerSettings
Settings for the Matchbox PostgreSQL backend.
Inherits the core settings and adds the PostgreSQL-specific settings.
Methods:
- check_settings – Check that legal combinations of settings are provided.
Attributes:
- model_config
- batch_size (int)
- datastore (MatchboxDatastoreSettings)
- task_runner (Literal['api', 'celery'])
- redis_uri (str | None)
- uploads_expiry_minutes (int | None)
- authorisation (bool)
- public_key (SecretStr | None)
- log_level (LogLevelType)
- backend_type (MatchboxBackends)
- postgres (MatchboxPostgresCoreSettings)
model_config
model_config = SettingsConfigDict(env_prefix='MB__SERVER__', env_nested_delimiter='__', use_enum_values=True, env_file='.env', env_file_encoding='utf-8', extra='ignore')
postgres
postgres: MatchboxPostgresCoreSettings = Field(default_factory=MatchboxPostgresCoreSettings)
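model_config uses `env_prefix='MB__SERVER__'` and `env_nested_delimiter='__'`. Under these settings, a nested field such as `postgres` is read from environment variables whose path segments are joined by double underscores.

The pure-Python sketch below imitates only that naming convention, to show how variables map onto fields. It is not pydantic-settings itself. The `postgres` sub-field `host` is a hypothetical example, because the fields of MatchboxPostgresCoreSettings are not listed here.

```python
def env_to_nested(
    environ: dict[str, str], prefix: str = "MB__SERVER__", delimiter: str = "__"
) -> dict:
    """Group prefixed environment variables into a nested dict of lower-cased field names."""
    settings: dict = {}
    for name, value in environ.items():
        # Match the prefix case-insensitively, as pydantic-settings does by default.
        if not name.upper().startswith(prefix.upper()):
            continue
        path = name[len(prefix):].lower().split(delimiter)
        node = settings
        for key in path[:-1]:
            node = node.setdefault(key, {})
        # Values stay strings here; pydantic would validate and coerce them to field types.
        node[path[-1]] = value
    return settings


env = {
    "MB__SERVER__BATCH_SIZE": "10000",
    "MB__SERVER__POSTGRES__HOST": "localhost",  # "host" is a hypothetical sub-field
    "UNRELATED": "ignored",
}
print(env_to_nested(env))
```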
adapter
Composed PostgreSQL adapter for Matchbox server.
Modules:
- admin – Admin PostgreSQL mixin for Matchbox server.
- collections – Collections PostgreSQL mixin for Matchbox server.
- eval – Evaluation PostgreSQL mixin for Matchbox server.
- main – Composed PostgreSQL adapter for Matchbox server.
- query – Query PostgreSQL mixin for Matchbox server.
Classes:
- MatchboxPostgres – A PostgreSQL adapter for Matchbox.
- MatchboxPostgresSettings – Settings for the Matchbox PostgreSQL backend.
admin
Admin PostgreSQL mixin for Matchbox server.
Classes:
- MatchboxPostgresAdminMixin – Admin mixin for the PostgreSQL adapter for Matchbox.
MatchboxPostgresAdminMixin
Admin mixin for the PostgreSQL adapter for Matchbox.
Methods:
- login
- get_user_groups
- list_groups
- get_group
- create_group
- delete_group
- add_user_to_group
- remove_user_from_group
- check_permission
- get_permissions
- grant_permission
- revoke_permission
- validate_ids
- dump
- drop
- clear
- restore
- delete_orphans
check_permission
check_permission(user_name: str, permission: PermissionType, resource: Literal[SYSTEM] | CollectionName) -> bool
get_permissions
get_permissions(resource: Literal[SYSTEM] | CollectionName) -> list[PermissionGrant]
grant_permission
grant_permission(group_name: GroupName, permission: PermissionType, resource: Literal[SYSTEM] | CollectionName) -> None
revoke_permission
revoke_permission(group_name: GroupName, permission: PermissionType, resource: Literal[SYSTEM] | CollectionName) -> None
collections
Collections PostgreSQL mixin for Matchbox server.
Classes:
- MatchboxPostgresCollectionsMixin – Collections mixin for the PostgreSQL adapter for Matchbox.
MatchboxPostgresCollectionsMixin
Collections mixin for the PostgreSQL adapter for Matchbox.
Methods:
- create_collection
- get_collection
- list_collections
- delete_collection
- create_run
- set_run_mutable
- set_run_default
- get_run
- delete_run
- create_resolution
- get_resolution
- update_resolution
- delete_resolution
- lock_resolution_data
- unlock_resolution_data
- get_resolution_stage
- insert_source_data
- insert_model_data
- get_model_data
unlock_resolution_data
unlock_resolution_data(path: ResolutionPath, complete: bool = False) -> None
eval
Evaluation PostgreSQL mixin for Matchbox server.
Classes:
- MatchboxPostgresEvaluationMixin – Evaluation mixin for the PostgreSQL adapter for Matchbox.
MatchboxPostgresEvaluationMixin
Evaluation mixin for the PostgreSQL adapter for Matchbox.
Methods:
- insert_judgement
- get_judgements
- sample_for_eval – Sample some clusters from a resolution.
main
Composed PostgreSQL adapter for Matchbox server.
Classes:
- FilteredClusters – Wrapper class for filtered cluster queries.
- FilteredProbabilities – Wrapper class for filtered probability queries.
- FilteredResolutions – Wrapper class for filtered resolution queries.
- MatchboxPostgres – A PostgreSQL adapter for Matchbox.
FilteredClusters
Bases: BaseModel
Wrapper class for filtered cluster queries.
Methods:
- count – Counts the number of clusters in the database.
Attributes:
- has_source (bool | None)
FilteredProbabilities
Bases: BaseModel
Wrapper class for filtered probability queries.
Methods:
- count – Counts the number of probabilities in the database.
Attributes:
- over_truth (bool)
FilteredResolutions
Bases: BaseModel
Wrapper class for filtered resolution queries.
Methods:
- count – Counts the number of resolutions in the database.
Attributes:
query
Query PostgreSQL mixin for Matchbox server.
Classes:
- MatchboxPostgresQueryMixin – Query mixin for the PostgreSQL adapter for Matchbox.
MatchboxPostgresQueryMixin
Query mixin for the PostgreSQL adapter for Matchbox.
Methods:
query
query(source: SourceResolutionPath, point_of_truth: ResolutionPath | None = None, threshold: int | None = None, return_leaf_id: bool = False, limit: int | None = None) -> Table
match
match(key: str, source: SourceResolutionPath, targets: list[SourceResolutionPath], point_of_truth: ResolutionPath, threshold: int | None = None) -> list[Match]
db
Matchbox PostgreSQL database connection.
Classes:
- MatchboxPostgresCoreSettings – PostgreSQL-specific settings for Matchbox.
- MatchboxPostgresSettings – Settings for the Matchbox PostgreSQL backend.
- MatchboxDatabase – Matchbox PostgreSQL database connection.
Attributes:
- MBDB
MatchboxPostgresCoreSettings
PostgreSQL-specific settings for Matchbox.
MatchboxDatabase
MatchboxDatabase(settings: MatchboxPostgresSettings)
Matchbox PostgreSQL database connection.
Methods:
- connection_string – Get the connection string for PostgreSQL.
- get_engine – Get the database engine.
- get_session – Get a new session.
- get_adbc_connection – Get a new ADBC connection wrapped by a SQLAlchemy pool proxy.
- run_migrations – Create the database and all tables expected in the schema.
- clear_database – Delete all rows in every table in the database schema.
- drop_database – Drop all tables in the database schema and recreate them.
- vacuum_analyze – Run VACUUM ANALYZE on specified tables.
Attributes:
- settings
- MatchboxBase
- alembic_config
- sorted_tables (list[Table]) – Return a list of SQLAlchemy tables in order of creation.
MatchboxBase
sorted_tables (property)
sorted_tables: list[Table]
Return a list of SQLAlchemy tables in order of creation.
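"Order of creation" means referenced tables come first: a table can only be created once every table it points to by foreign key exists. SQLAlchemy's `MetaData.sorted_tables` performs this kind of dependency sort for real Table objects. Whether this property delegates to it is an assumption.

The minimal sketch below applies the same ordering with the standard library's `graphlib`, using a hand-written subset of the foreign keys from the diagram. Table names that do not appear in the ForeignKey definitions shown on this page (`cluster_keys`, `contains`, `resolution_from`) are guesses.

```python
from graphlib import TopologicalSorter

# Each table maps to the tables it references by foreign key (illustrative subset).
FOREIGN_KEYS: dict[str, set[str]] = {
    "collections": set(),
    "runs": {"collections"},
    "resolutions": {"runs"},
    "resolution_from": {"resolutions"},
    "source_configs": {"resolutions"},
    "clusters": set(),
    "cluster_keys": {"clusters", "source_configs"},
    "contains": {"clusters"},
}


def creation_order(fks: dict[str, set[str]]) -> list[str]:
    """Return table names so that every table appears after the tables it references."""
    # TopologicalSorter treats the mapped sets as predecessors, so they come out first.
    return list(TopologicalSorter(fks).static_order())


print(creation_order(FOREIGN_KEYS))
```

Dropping tables safely uses the reverse of this order.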
connection_string
Get the connection string for PostgreSQL.
get_adbc_connection
Get a new ADBC connection wrapped by a SQLAlchemy pool proxy.
The connection must be used within a context manager.
run_migrations
Create the database and all tables expected in the schema.
clear_database
Delete all rows in every table in the database schema.
- TRUNCATE tables that are part of the core ORM (preserves structure)
- DROP tables that are not in the ORM (removes temporary/test tables)
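The two rules above split the tables found in the schema into those the ORM knows about and everything else. The minimal sketch below only builds the SQL strings. The schema name `mb` comes from the upload_stage enum definition. The table names and the use of `CASCADE` are assumptions, not necessarily what clear_database issues.

```python
def plan_clear(schema: str, existing: set[str], orm_tables: set[str]) -> list[str]:
    """Build statements that empty ORM tables and remove everything else in the schema.

    Table names are assumed to be plain identifiers that need no quoting.
    """
    statements = []
    keep = sorted(existing & orm_tables)
    extra = sorted(existing - orm_tables)
    if keep:
        # TRUNCATE empties the tables but keeps their definitions, indexes and constraints.
        tables = ", ".join(f"{schema}.{name}" for name in keep)
        statements.append(f"TRUNCATE {tables} CASCADE")
    for name in extra:
        # Tables outside the ORM (e.g. temporary or test tables) are removed entirely.
        statements.append(f"DROP TABLE IF EXISTS {schema}.{name} CASCADE")
    return statements


for statement in plan_clear("mb", {"clusters", "contains", "tmp_upload"}, {"clusters", "contains", "runs"}):
    print(statement)
```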
drop_database
Drop all tables in the database schema and recreate them.
vacuum_analyze
vacuum_analyze(*table_names: str) -> None
Run VACUUM ANALYZE on specified tables.
VACUUM ANALYZE reclaims storage and updates statistics for the query planner. PostgreSQL may not fully utilise indexes until VACUUM ANALYZE is run. According to https://www.postgresql.org/docs/current/sql-vacuum.html, VACUUM ANALYZE is recommended over just ANALYZE for optimal performance.
Parameters:
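Two PostgreSQL details matter when issuing this command:

- VACUUM cannot run inside a transaction block, so it must be sent on an autocommit connection.
- Since PostgreSQL 11, a single VACUUM can name several tables.

The minimal sketch below builds the statement with safely quoted identifiers. The helper names are hypothetical. The commented SQLAlchemy call shows one way to get an autocommit connection; it does not describe how vacuum_analyze is implemented.

```python
def quote_ident(name: str) -> str:
    """Quote a (possibly schema-qualified) identifier the way PostgreSQL expects.

    Assumes dots only separate schema from table, never appear inside a name.
    """
    # Double any embedded double quotes, then wrap each dotted part in double quotes.
    return ".".join('"' + part.replace('"', '""') + '"' for part in name.split("."))


def vacuum_analyze_sql(*table_names: str) -> str:
    # A bare "VACUUM ANALYZE" would process every table; require explicit names instead.
    if not table_names:
        raise ValueError("at least one table name is required")
    return "VACUUM ANALYZE " + ", ".join(quote_ident(t) for t in table_names)


print(vacuum_analyze_sql("mb.clusters", "mb.contains"))

# Running it needs autocommit, e.g. with SQLAlchemy (not executed here):
#   with engine.connect().execution_options(isolation_level="AUTOCOMMIT") as conn:
#       conn.exec_driver_sql(vacuum_analyze_sql("mb.clusters", "mb.contains"))
```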
mixin
A module for defining mixins for the PostgreSQL backend ORM.
Classes:
- CountMixin – A mixin for counting the number of rows in a table.
Attributes:
- T
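The minimal sketch below shows the mixin idea with the standard library's sqlite3 instead of SQLAlchemy: any class that sets `__tablename__` inherits a `count()` classmethod. The real CountMixin's signature is not documented here, so passing an explicit connection is an assumption. The `Collections` class below is a stand-in, not the ORM class.

```python
import sqlite3


class CountMixin:
    __tablename__: str

    @classmethod
    def count(cls, conn: sqlite3.Connection) -> int:
        """Count the rows in the table named by the subclass's __tablename__."""
        (n,) = conn.execute(f'SELECT COUNT(*) FROM "{cls.__tablename__}"').fetchone()
        return n


class Collections(CountMixin):  # stand-in for the ORM class of the same name
    __tablename__ = "collections"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE collections (collection_id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO collections (name) VALUES (?)", [("companies",), ("people",)])
print(Collections.count(conn))  # 2
```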
orm
ORM classes for the Matchbox PostgreSQL database.
Classes:
- Collections – Named collections of resolutions and runs.
- Runs – Runs of collections of resolutions.
- ResolutionFrom – Resolution lineage closure table with cached truth values.
- Resolutions – Table of resolution points corresponding to models and sources.
- PKSpace – Table used to reserve ranges of primary keys.
- SourceFields – Table for storing column details for SourceConfigs.
- ClusterSourceKey – Table for storing source primary keys for clusters.
- SourceConfigs – Table of source_configs of data for Matchbox.
- ModelConfigs – Table of model configs for Matchbox.
- Contains – Cluster lineage table.
- Clusters – Table of indexed data and clusters that match it.
- UserGroups – Association table for user-group membership.
- Users – Table of user identities.
- Groups – Groups for permission management.
- Permissions – Permissions granted to groups on resources.
- EvalJudgements – Table of evaluation judgements produced by human validators.
- Probabilities – Table of probabilities that a cluster is correct, according to a resolution.
- Results – Table of results for a resolution.
Collections
Bases: CountMixin, MatchboxBase
Named collections of resolutions and runs.
Methods:
- from_name – Resolve a collection name to a Collections object.
- to_dto – Convert ORM collection to a matchbox.common Collection object.
- count – Counts the number of rows in the table.
Attributes:
- __tablename__
- collection_id (Mapped[int])
- name (Mapped[str])
- runs (Mapped[list[Runs]])
- permissions (Mapped[list[Permissions]])
- __table_args__
collection_id
collection_id: Mapped[int] = mapped_column(BIGINT, primary_key=True, autoincrement=True)
runs
permissions
permissions: Mapped[list[Permissions]] = relationship(back_populates='collection', passive_deletes=True)
__table_args__
from_name (classmethod)
from_name(name: CollectionName, session: Session | None = None) -> Collections
Resolve a collection name to a Collections object.
Parameters:
- name (CollectionName) – The name of the collection to resolve.
- session (Session | None, default: None) – Optional session to use for the query.
Raises:
- MatchboxCollectionNotFoundError – If the collection doesn’t exist.
Runs
Bases: CountMixin, MatchboxBase
Runs of collections of resolutions.
Methods:
- from_id – Resolve a collection name and run ID to a Runs object.
- to_dto – Convert ORM run to a matchbox.common Run object.
- count – Counts the number of rows in the table.
Attributes:
- __tablename__
- run_id (Mapped[int])
- collection_id (Mapped[int])
- is_mutable (Mapped[bool])
- is_default (Mapped[bool])
- collection (Mapped[Collections])
- resolutions (Mapped[list[Resolutions]])
- __table_args__
run_id
run_id: Mapped[int] = mapped_column(BIGINT, primary_key=True, autoincrement=True)
collection_id
collection_id: Mapped[int] = mapped_column(BIGINT, ForeignKey('collections.collection_id', ondelete='CASCADE'), nullable=False)
is_mutable
is_mutable: Mapped[bool] = mapped_column(BOOLEAN, default=False, nullable=True)
is_default
is_default: Mapped[bool] = mapped_column(BOOLEAN, default=False, nullable=True)
collection
collection: Mapped[Collections] = relationship(back_populates='runs')
resolutions
resolutions: Mapped[list[Resolutions]] = relationship(back_populates='run')
__table_args__
__table_args__ = (UniqueConstraint('collection_id', 'run_id', name='unique_run_id'), Index('ix_default_run_collection', 'collection_id', unique=True, postgresql_where=text('is_default = true')))
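The partial unique index `ix_default_run_collection` covers `collection_id` only in rows where `is_default` is true. As a result, each collection can have many runs but at most one default run.

The minimal demonstration below uses sqlite3, which also supports partial indexes. The simplified table is not the real PostgreSQL DDL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE runs (run_id INTEGER PRIMARY KEY, collection_id INTEGER NOT NULL,"
    " is_default BOOLEAN DEFAULT 0)"
)
# Uniqueness applies only to rows matching the WHERE clause, mirroring postgresql_where.
conn.execute(
    "CREATE UNIQUE INDEX ix_default_run_collection ON runs (collection_id)"
    " WHERE is_default = 1"
)

conn.execute("INSERT INTO runs (collection_id, is_default) VALUES (1, 1)")
conn.execute("INSERT INTO runs (collection_id, is_default) VALUES (1, 0)")  # allowed: not default
conn.execute("INSERT INTO runs (collection_id, is_default) VALUES (2, 1)")  # allowed: other collection

try:
    conn.execute("INSERT INTO runs (collection_id, is_default) VALUES (1, 1)")
    second_default_allowed = True
except sqlite3.IntegrityError:
    second_default_allowed = False

print(second_default_allowed)  # False
```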
from_id (classmethod)
from_id(collection: CollectionName, run_id: RunID, session: Session | None = None) -> Runs
Resolve a collection name and run ID to a Runs object.
Parameters:
- collection (CollectionName) – The name of the collection containing the run.
- run_id (RunID) – The ID of the run within that collection.
- session (Session | None, default: None) – Optional session to use for the query.
Raises:
- MatchboxRunNotFoundError – If the run doesn’t exist.
ResolutionFrom
Bases: CountMixin, MatchboxBase
Resolution lineage closure table with cached truth values.
Methods:
- count – Counts the number of rows in the table.
Attributes:
- __tablename__
- parent (Mapped[int])
- child (Mapped[int])
- level (Mapped[int])
- truth_cache (Mapped[int | None])
- __table_args__
parent
parent: Mapped[int] = mapped_column(BIGINT, ForeignKey('resolutions.resolution_id', ondelete='CASCADE'), primary_key=True)
child
child: Mapped[int] = mapped_column(BIGINT, ForeignKey('resolutions.resolution_id', ondelete='CASCADE'), primary_key=True)
level
level: Mapped[int] = mapped_column(INTEGER, nullable=False)
truth_cache
truth_cache: Mapped[int | None] = mapped_column(SMALLINT, nullable=True)
__table_args__
__table_args__ = (CheckConstraint('parent != child', name='no_self_reference'), CheckConstraint('level > 0', name='positive_level'))
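The two check constraints keep the closure table well formed:

- `no_self_reference` ensures a resolution is never its own ancestor.
- `positive_level` ensures `level` counts hops from 1, so self rows cannot be stored.

The minimal sqlite3 sketch below uses a simplified ResolutionFrom table and shows both constraints rejecting bad rows. It is an illustration, not the real PostgreSQL DDL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE resolution_from (
        parent INTEGER NOT NULL,
        child INTEGER NOT NULL,
        level INTEGER NOT NULL,
        truth_cache INTEGER,
        PRIMARY KEY (parent, child),
        CONSTRAINT no_self_reference CHECK (parent != child),
        CONSTRAINT positive_level CHECK (level > 0)
    )
    """
)
# Sources 1 and 2 feed model 3, which feeds model 4.
conn.executemany(
    "INSERT INTO resolution_from (parent, child, level) VALUES (?, ?, ?)",
    [(1, 3, 1), (2, 3, 1), (3, 4, 1), (1, 4, 2), (2, 4, 2)],
)


def rejected(row: tuple[int, int, int]) -> bool:
    """Try to insert a row; return True if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO resolution_from (parent, child, level) VALUES (?, ?, ?)", row)
        return False
    except sqlite3.IntegrityError:
        return True


print(rejected((4, 4, 1)))  # True: self reference
print(rejected((1, 2, 0)))  # True: level must be positive
```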
Resolutions
Bases: CountMixin, MatchboxBase
Table of resolution points corresponding to models and sources.
Resolutions produce probabilities or own data in the clusters table.
Methods:
- get_lineage – Returns lineage ordered by priority.
- from_path – Resolves a resolution name to a Resolution object.
- from_dto – Create a Resolutions instance from a Resolution DTO object.
- to_dto – Convert ORM resolution to a matchbox.common Resolution object.
- count – Counts the number of rows in the table.
Attributes:
- __tablename__
- resolution_id (Mapped[int])
- run_id (Mapped[int])
- upload_stage (Mapped[UploadStage])
- name (Mapped[str])
- description (Mapped[str | None])
- type (Mapped[str])
- fingerprint (Mapped[bytes])
- truth (Mapped[int | None])
- source_config (Mapped[Optional[SourceConfigs]])
- model_config (Mapped[Optional[ModelConfigs]])
- probabilities (Mapped[list[Probabilities]])
- results (Mapped[list[Results]])
- children (Mapped[list[Resolutions]])
- run (Mapped[Runs])
- __table_args__
- ancestors (set[Resolutions]) – Returns all ancestors (parents, grandparents, etc.) of this resolution.
- descendants (set[Resolutions]) – Returns descendants (children, grandchildren, etc.) of this resolution.
resolution_id
resolution_id: Mapped[int] = mapped_column(BIGINT, primary_key=True, autoincrement=True)
run_id
run_id: Mapped[int] = mapped_column(BIGINT, ForeignKey('runs.run_id', ondelete='CASCADE'), nullable=False)
upload_stage
upload_stage: Mapped[UploadStage] = mapped_column(Enum(UploadStage, native_enum=True, name='upload_stages', schema='mb'), nullable=False, default=READY)
description
description: Mapped[str | None] = mapped_column(TEXT, nullable=True)
fingerprint
fingerprint: Mapped[bytes] = mapped_column(BYTEA, nullable=False)
truth
truth: Mapped[int | None] = mapped_column(SMALLINT, nullable=True)
source_config
source_config: Mapped[Optional[SourceConfigs]] = relationship(back_populates='source_resolution', uselist=False)
model_config
model_config: Mapped[Optional[ModelConfigs]] = relationship(back_populates='model_resolution', uselist=False)
probabilities
probabilities: Mapped[list[Probabilities]] = relationship(back_populates='proposed_by', passive_deletes=True)
results
children
children: Mapped[list[Resolutions]] = relationship(secondary=__table__, primaryjoin='Resolutions.resolution_id == ResolutionFrom.parent', secondaryjoin='Resolutions.resolution_id == ResolutionFrom.child', backref='parents')
run
run: Mapped[Runs] = relationship(back_populates='resolutions')
__table_args__
__table_args__ = (CheckConstraint("type IN ('model', 'source')", name='resolution_type_constraints'), UniqueConstraint('run_id', 'name', name='resolutions_name_key'))
ancestors (property)
ancestors: set[Resolutions]
Returns all ancestors (parents, grandparents, etc.) of this resolution.
descendants (property)
descendants: set[Resolutions]
Returns descendants (children, grandchildren, etc.) of this resolution.
get_lineage
get_lineage(sources: list[SourceConfigs] | None = None, threshold: int | None = None) -> list[tuple[int, int, float | None]]
Returns lineage ordered by priority.
Highest priority (lowest level) first, then by resolution_id for stability.
Parameters:
- sources (list[SourceConfigs] | None, default: None) – If provided, only return lineage paths that lead to these sources.
- threshold (int | None, default: None) – If provided, override this resolution’s threshold.
Returns:
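The ordering rule for get_lineage (lowest level first, ties broken by resolution_id) is a plain two-key sort. The minimal sketch below applies it to hypothetical `(resolution_id, level, value)` tuples. The fields of the real `list[tuple[int, int, float | None]]` return value are not documented here, so this tuple layout is an assumption.

```python
from __future__ import annotations


def order_lineage(
    entries: list[tuple[int, int, float | None]],
) -> list[tuple[int, int, float | None]]:
    """Sort hypothetical (resolution_id, level, value) entries by priority.

    A lower level means higher priority. resolution_id breaks ties, so the
    order is stable no matter how the entries arrive.
    """
    return sorted(entries, key=lambda entry: (entry[1], entry[0]))


print(order_lineage([(7, 2, None), (5, 2, 0.9), (3, 1, 0.8)]))
# [(3, 1, 0.8), (5, 2, 0.9), (7, 2, None)]
```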
from_path (classmethod)
from_path(path: ResolutionPath, res_type: ResolutionType | None = None, session: Session | None = None, for_update: bool = False) -> Resolutions
Resolves a resolution name to a Resolution object.
Parameters:
- path (ResolutionPath) – The path of the resolution to resolve.
- res_type (ResolutionType | None, default: None) – A resolution type to use as a filter.
- session (Session | None, default: None) – A session to get the resolution for updates.
- for_update (bool, default: False) – Locks the row until updated.
Raises:
- MatchboxResolutionNotFoundError – If the resolution doesn’t exist.
from_dto (classmethod)
from_dto(resolution: Resolution, path: ResolutionPath, session: Session) -> Resolutions
Create a Resolutions instance from a Resolution DTO object.
The resolution will be added to the session and flushed (but not committed).
For model resolutions, lineage entries will be created automatically.
Parameters:
- resolution (Resolution) – The Resolution DTO to convert.
- path (ResolutionPath) – The full resolution path.
- session (Session) – Database session (caller must commit).
Returns:
- Resolutions – A Resolutions ORM instance with ID and relationships established.
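For model resolutions, from_dto creates lineage entries automatically. In a closure table, adding a node needs two kinds of rows:

- one level-1 row per direct parent;
- one row per ancestor of each parent, one hop further out.

The minimal in-memory sketch below shows that bookkeeping. The function name and tuple layout are hypothetical, and keeping the shortest distance when paths converge is an assumption. The real method writes ResolutionFrom rows through the session instead.

```python
def lineage_rows_for_new(
    new_id: int, parent_ids: list[int], closure: set[tuple[int, int, int]]
) -> set[tuple[int, int, int]]:
    """Closure rows (ancestor, new_id, level) for a resolution with the given direct parents."""
    best: dict[int, int] = {}
    for parent in parent_ids:
        # A direct parent sits one hop away.
        best[parent] = 1
    for parent in parent_ids:
        for ancestor, child, level in closure:
            if child == parent:
                # Ancestors of a parent are one hop further away; keep the shortest distance.
                best[ancestor] = min(best.get(ancestor, level + 1), level + 1)
    return {(ancestor, new_id, level) for ancestor, level in best.items()}


# Existing lineage: sources 1 and 2 feed model 3. Model 4 is then built on model 3.
existing = {(1, 3, 1), (2, 3, 1)}
print(sorted(lineage_rows_for_new(4, [3], existing)))
# [(1, 4, 2), (2, 4, 2), (3, 4, 1)]
```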
PKSpace
Bases: MatchboxBase
Table used to reserve ranges of primary keys.
Methods:
- initialise – Create the PKSpace tracking row if it does not exist.
- reserve_block – Atomically get the next available ID for a table and increment it.
Attributes:
- __tablename__
- id (Mapped[int])
- next_cluster_id (Mapped[int])
- next_cluster_keys_id (Mapped[int])
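Reserving a block hands out a contiguous range of IDs in one step. Presumably this lets bulk uploads assign cluster and cluster-key IDs locally without colliding with concurrent writers.

The minimal sketch below uses an in-process lock as a stand-in for the database row lock. The class, the `reserve_block(table, block_size)` signature and the dictionary keys are hypothetical, loosely modelled on the `next_cluster_id` and `next_cluster_keys_id` columns.

```python
import threading


class PKSpaceSketch:
    """In-memory stand-in for the PKSpace tracking row (hypothetical)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        # Loosely mirrors the next_cluster_id and next_cluster_keys_id columns.
        self.next_ids = {"clusters": 1, "cluster_keys": 1}

    def reserve_block(self, table: str, block_size: int) -> int:
        """Return the first ID of a block of `block_size` IDs reserved for `table`."""
        if block_size < 1:
            raise ValueError("block_size must be positive")
        with self._lock:
            # Read and advance the counter as one atomic step.
            start = self.next_ids[table]
            self.next_ids[table] = start + block_size
        return start


space = PKSpaceSketch()
first = space.reserve_block("clusters", 100)   # IDs 1..100
second = space.reserve_block("clusters", 50)   # IDs 101..150
print(first, second)  # 1 101
```

In a database, the read-and-increment has to happen under a row lock or in a single atomic statement. Otherwise two writers could read the same starting ID.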
SourceFields
Bases: CountMixin, MatchboxBase
Table for storing column details for SourceConfigs.
Methods:
- count – Counts the number of rows in the table.
Attributes:
- __tablename__
- field_id (Mapped[int])
- source_config_id (Mapped[int])
- index (Mapped[int])
- name (Mapped[str])
- type (Mapped[str])
- is_key (Mapped[bool])
- source_config (Mapped[SourceConfigs])
- __table_args__
field_id
field_id: Mapped[int] = mapped_column(BIGINT, primary_key=True)
source_config_id
source_config_id: Mapped[int] = mapped_column(BIGINT, ForeignKey('source_configs.source_config_id', ondelete='CASCADE'), nullable=False)
index
index: Mapped[int] = mapped_column(INTEGER, nullable=False)
is_key
is_key: Mapped[bool] = mapped_column(BOOLEAN, nullable=False)
source_config
source_config: Mapped[SourceConfigs] = relationship(back_populates='fields', foreign_keys=[source_config_id])
__table_args__
__table_args__ = (UniqueConstraint('source_config_id', 'index', name='unique_index'), Index('ix_source_columns_source_config_id', 'source_config_id'), Index('ix_unique_key_field', 'source_config_id', unique=True, postgresql_where=text('is_key = true')))
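The `ix_unique_key_field` index above is a partial unique index: only rows where `is_key` is true participate, so each source config can have at most one key field while having any number of index fields. A minimal sketch of that behaviour, using SQLite (which also supports partial indexes) in place of PostgreSQL; the sample rows are made up and SQLite stores booleans as 0/1, hence `is_key = 1`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE source_fields (
        field_id INTEGER PRIMARY KEY,
        source_config_id INTEGER NOT NULL,
        "index" INTEGER NOT NULL,
        name TEXT NOT NULL,
        type TEXT NOT NULL,
        is_key BOOLEAN NOT NULL,
        CONSTRAINT unique_index UNIQUE (source_config_id, "index")
    );
    -- Only key rows are indexed, so uniqueness applies to key fields alone.
    CREATE UNIQUE INDEX ix_unique_key_field
        ON source_fields (source_config_id) WHERE is_key = 1;
    """
)


def add_field(field_id: int, config_id: int, index: int, name: str, is_key: bool) -> None:
    conn.execute(
        "INSERT INTO source_fields VALUES (?, ?, ?, ?, ?, ?)",
        (field_id, config_id, index, name, "TEXT", is_key),
    )


add_field(1, 10, 0, "id", True)  # the key field for config 10
add_field(2, 10, 1, "company", False)  # index fields are unrestricted
add_field(3, 10, 2, "postcode", False)

try:
    add_field(4, 10, 3, "other_id", True)  # a second key field for config 10
    second_key_rejected = False
except sqlite3.IntegrityError:
    second_key_rejected = True
```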
ClusterSourceKey
¶
Bases: CountMixin, MatchboxBase
Table for storing source primary keys for clusters.
Methods:
- count – Counts the number of rows in the table.
Attributes:
- __tablename__
- key_id (Mapped[int])
- cluster_id (Mapped[int])
- source_config_id (Mapped[int])
- key (Mapped[str])
- cluster (Mapped[Clusters])
- source_config (Mapped[SourceConfigs])
- __table_args__
key_id
class-attribute
instance-attribute
¶
key_id: Mapped[int] = mapped_column(BIGINT, primary_key=True)
cluster_id
class-attribute
instance-attribute
¶
cluster_id: Mapped[int] = mapped_column(BIGINT, ForeignKey('clusters.cluster_id', ondelete='CASCADE'), nullable=False)
source_config_id
class-attribute
instance-attribute
¶
source_config_id: Mapped[int] = mapped_column(BIGINT, ForeignKey('source_configs.source_config_id', ondelete='CASCADE'), nullable=False)
cluster
class-attribute
instance-attribute
¶
cluster: Mapped[Clusters] = relationship(back_populates='keys')
source_config
class-attribute
instance-attribute
¶
source_config: Mapped[SourceConfigs] = relationship(back_populates='cluster_keys')
__table_args__
class-attribute
instance-attribute
¶
__table_args__ = (Index('ix_cluster_keys_cluster_id', 'cluster_id'), Index('ix_cluster_keys_keys', 'key'), Index('ix_cluster_keys_source_config_id', 'source_config_id'), UniqueConstraint('key_id', 'source_config_id', name='unique_keys_source'))
SourceConfigs
¶
SourceConfigs(key_field: SourceFields | None = None, index_fields: list[SourceFields] | None = None, **kwargs: Any)
Bases: CountMixin, MatchboxBase
Table of data source configurations for Matchbox.
Methods:
- list_all – Returns all source_configs in the database.
- from_dto – Create a SourceConfigs instance from a SourceConfig DTO object.
- to_dto – Convert the ORM source to a matchbox.common.SourceConfig object.
- count – Counts the number of rows in the table.
Attributes:
- __tablename__
- source_config_id (Mapped[int])
- resolution_id (Mapped[int])
- location_type (Mapped[str])
- location_name (Mapped[str])
- extract_transform (Mapped[str])
- name (str) – Get the name of the related resolution.
- source_resolution (Mapped[Resolutions])
- fields (Mapped[list[SourceFields]])
- key_field (Mapped[Optional[SourceFields]])
- index_fields (Mapped[list[SourceFields]])
- cluster_keys (Mapped[list[ClusterSourceKey]])
- clusters (Mapped[list[Clusters]])
source_config_id
class-attribute
instance-attribute
¶
source_config_id: Mapped[int] = mapped_column(BIGINT, Identity(start=1), primary_key=True)
resolution_id
class-attribute
instance-attribute
¶
resolution_id: Mapped[int] = mapped_column(BIGINT, ForeignKey('resolutions.resolution_id', ondelete='CASCADE'), nullable=False)
location_type
class-attribute
instance-attribute
¶
location_type: Mapped[str] = mapped_column(TEXT, nullable=False)
location_name
class-attribute
instance-attribute
¶
location_name: Mapped[str] = mapped_column(TEXT, nullable=False)
extract_transform
class-attribute
instance-attribute
¶
extract_transform: Mapped[str] = mapped_column(TEXT, nullable=False)
source_resolution
class-attribute
instance-attribute
¶
source_resolution: Mapped[Resolutions] = relationship(back_populates='source_config')
fields
class-attribute
instance-attribute
¶
fields: Mapped[list[SourceFields]] = relationship(back_populates='source_config', passive_deletes=True, cascade='all, delete-orphan')
key_field
class-attribute
instance-attribute
¶
key_field: Mapped[Optional[SourceFields]] = relationship(primaryjoin='and_(SourceConfigs.source_config_id == SourceFields.source_config_id, SourceFields.is_key == True)', viewonly=True, uselist=False)
index_fields
class-attribute
instance-attribute
¶
index_fields: Mapped[list[SourceFields]] = relationship(primaryjoin='and_(SourceConfigs.source_config_id == SourceFields.source_config_id, SourceFields.is_key == False)', viewonly=True, order_by='SourceFields.index', collection_class=list)
cluster_keys
class-attribute
instance-attribute
¶
cluster_keys: Mapped[list[ClusterSourceKey]] = relationship(back_populates='source_config', passive_deletes=True)
clusters
class-attribute
instance-attribute
¶
clusters: Mapped[list[Clusters]] = relationship(secondary=__table__, primaryjoin='SourceConfigs.source_config_id == ClusterSourceKey.source_config_id', secondaryjoin='ClusterSourceKey.cluster_id == Clusters.cluster_id', viewonly=True)
list_all
classmethod
¶
list_all() -> list[SourceConfigs]
Returns all source_configs in the database.
from_dto
classmethod
¶
from_dto(config: SourceConfig) -> SourceConfigs
Create a SourceConfigs instance from a SourceConfig DTO object.
ModelConfigs
¶
Bases: CountMixin, MatchboxBase
Table of model configs for Matchbox.
Methods:
- list_all – Returns all model_configs in the database.
- from_dto – Create a ModelConfigs instance from a ModelConfig DTO object.
- to_dto – Convert the ORM model config to a matchbox.common.ModelConfig object.
- count – Counts the number of rows in the table.
Attributes:
- __tablename__
- model_config_id (Mapped[int])
- resolution_id (Mapped[int])
- model_class (Mapped[str])
- model_settings (Mapped[dict])
- left_query (Mapped[dict])
- right_query (Mapped[dict | None])
- name (str) – Get the name of the related resolution.
- model_resolution (Mapped[Resolutions])
model_config_id
class-attribute
instance-attribute
¶
model_config_id: Mapped[int] = mapped_column(BIGINT, Identity(start=1), primary_key=True)
resolution_id
class-attribute
instance-attribute
¶
resolution_id: Mapped[int] = mapped_column(BIGINT, ForeignKey('resolutions.resolution_id', ondelete='CASCADE'), nullable=False)
model_class
class-attribute
instance-attribute
¶
model_class: Mapped[str] = mapped_column(TEXT, nullable=False)
model_settings
class-attribute
instance-attribute
¶
model_settings: Mapped[dict] = mapped_column(JSONB, nullable=False)
left_query
class-attribute
instance-attribute
¶
left_query: Mapped[dict] = mapped_column(JSONB, nullable=False)
right_query
class-attribute
instance-attribute
¶
right_query: Mapped[dict | None] = mapped_column(JSONB, nullable=True)
model_resolution
class-attribute
instance-attribute
¶
model_resolution: Mapped[Resolutions] = relationship(back_populates='model_config')
from_dto
classmethod
¶
from_dto(config: ModelConfig) -> ModelConfigs
Create a ModelConfigs instance from a ModelConfig DTO object.
Contains
¶
Bases: CountMixin, MatchboxBase
Cluster lineage table.
Methods:
- count – Counts the number of rows in the table.
Attributes:
- __tablename__
- root (Mapped[int])
- leaf (Mapped[int])
- __table_args__
root
class-attribute
instance-attribute
¶
root: Mapped[int] = mapped_column(BIGINT, ForeignKey('clusters.cluster_id', ondelete='CASCADE'), primary_key=True)
leaf
class-attribute
instance-attribute
¶
leaf: Mapped[int] = mapped_column(BIGINT, ForeignKey('clusters.cluster_id', ondelete='CASCADE'), primary_key=True)
__table_args__
class-attribute
instance-attribute
¶
__table_args__ = (CheckConstraint('root != leaf', name='no_self_containment'), Index('ix_contains_root_leaf', 'root', 'leaf'), Index('ix_contains_leaf_root', 'leaf', 'root'))
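Contains records only root–leaf pairs (the modified closure table described for the data subgraph), so a root lists its leaves directly rather than pointing at intermediate clusters. The sketch below mirrors the table and its `no_self_containment` check in SQLite as a stand-in for PostgreSQL; the cluster IDs are made up. Root 200 lists leaves 1, 2 and 3 directly even though it also subsumes root 100.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE clusters (cluster_id INTEGER PRIMARY KEY);
    CREATE TABLE contains (
        root INTEGER NOT NULL REFERENCES clusters (cluster_id) ON DELETE CASCADE,
        leaf INTEGER NOT NULL REFERENCES clusters (cluster_id) ON DELETE CASCADE,
        PRIMARY KEY (root, leaf),
        CONSTRAINT no_self_containment CHECK (root != leaf)
    );
    -- The (leaf, root) index serves "which roots contain this leaf?" lookups.
    CREATE INDEX ix_contains_leaf_root ON contains (leaf, root);
    """
)
# Leaves 1-4 are source clusters; 100 and 200 are model-proposed roots.
conn.executemany("INSERT INTO clusters VALUES (?)", [(i,) for i in (1, 2, 3, 4, 100, 200)])
conn.executemany(
    "INSERT INTO contains VALUES (?, ?)",
    [(100, 1), (100, 2), (200, 1), (200, 2), (200, 3)],
)


def leaves_of(root: int) -> list[int]:
    rows = conn.execute("SELECT leaf FROM contains WHERE root = ? ORDER BY leaf", (root,))
    return [leaf for (leaf,) in rows]


def roots_of(leaf: int) -> list[int]:
    rows = conn.execute("SELECT root FROM contains WHERE leaf = ? ORDER BY root", (leaf,))
    return [root for (root,) in rows]


try:
    conn.execute("INSERT INTO contains VALUES (100, 100)")  # a cluster containing itself
    self_containment_rejected = False
except sqlite3.IntegrityError:
    self_containment_rejected = True
```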
Clusters
¶
Bases: CountMixin, MatchboxBase
Table of indexed data and clusters that match it.
Methods:
- count – Counts the number of rows in the table.
Attributes:
- __tablename__
- cluster_id (Mapped[int])
- cluster_hash (Mapped[bytes])
- keys (Mapped[list[ClusterSourceKey]])
- probabilities (Mapped[list[Probabilities]])
- leaves (Mapped[list[Clusters]])
- source_configs (Mapped[list[SourceConfigs]])
- __table_args__
cluster_id
class-attribute
instance-attribute
¶
cluster_id: Mapped[int] = mapped_column(BIGINT, primary_key=True)
cluster_hash
class-attribute
instance-attribute
¶
cluster_hash: Mapped[bytes] = mapped_column(BYTEA, nullable=False)
keys
class-attribute
instance-attribute
¶
keys: Mapped[list[ClusterSourceKey]] = relationship(back_populates='cluster', passive_deletes=True)
probabilities
class-attribute
instance-attribute
¶
probabilities: Mapped[list[Probabilities]] = relationship(back_populates='proposes', passive_deletes=True)
leaves
class-attribute
instance-attribute
¶
leaves: Mapped[list[Clusters]] = relationship(secondary=__table__, primaryjoin='Clusters.cluster_id == Contains.root', secondaryjoin='Clusters.cluster_id == Contains.leaf', backref='roots')
source_configs
class-attribute
instance-attribute
¶
source_configs: Mapped[list[SourceConfigs]] = relationship(secondary=__table__, primaryjoin='Clusters.cluster_id == ClusterSourceKey.cluster_id', secondaryjoin='ClusterSourceKey.source_config_id == SourceConfigs.source_config_id', viewonly=True)
__table_args__
class-attribute
instance-attribute
¶
UserGroups
¶
Users
¶
Bases: CountMixin, MatchboxBase
Table of user identities.
Methods:
- count – Counts the number of rows in the table.
Attributes:
- __tablename__
- user_id (Mapped[int])
- name (Mapped[str])
- email (Mapped[str])
- judgements (Mapped[list[EvalJudgements]])
- groups (Mapped[list[Groups]])
- __table_args__
user_id
class-attribute
instance-attribute
¶
user_id: Mapped[int] = mapped_column(BIGINT, primary_key=True)
judgements
class-attribute
instance-attribute
¶
judgements: Mapped[list[EvalJudgements]] = relationship(back_populates='user')
groups
class-attribute
instance-attribute
¶
__table_args__
class-attribute
instance-attribute
¶
Groups
¶
Bases: CountMixin, MatchboxBase
Groups for permission management.
Methods:
- count – Counts the number of rows in the table.
Attributes:
- __tablename__
- group_id (Mapped[int])
- name (Mapped[str])
- description (Mapped[str | None])
- is_system (Mapped[bool])
- members (Mapped[list[Users]])
- permissions (Mapped[list[Permissions]])
- __table_args__
group_id
class-attribute
instance-attribute
¶
group_id: Mapped[int] = mapped_column(BIGINT, primary_key=True, autoincrement=True)
description
class-attribute
instance-attribute
¶
description: Mapped[str | None] = mapped_column(TEXT, nullable=True)
is_system
class-attribute
instance-attribute
¶
is_system: Mapped[bool] = mapped_column(BOOLEAN, default=False, nullable=False)
members
class-attribute
instance-attribute
¶
permissions
class-attribute
instance-attribute
¶
permissions: Mapped[list[Permissions]] = relationship(back_populates='group', passive_deletes=True)
__table_args__
class-attribute
instance-attribute
¶
Permissions
¶
Bases: CountMixin, MatchboxBase
Permissions granted to groups on resources.
Each resource type should have its own column. This produces many nulls, but nulls are cheap in PostgreSQL and this table is small, and the design avoids a polymorphic association.
Methods:
- count – Counts the number of rows in the table.
Attributes:
- __tablename__
- permission_id (Mapped[int])
- permission (Mapped[str])
- group_id (Mapped[int])
- collection_id (Mapped[int | None])
- is_system (Mapped[bool | None])
- group (Mapped[Groups])
- collection (Mapped[Collections | None])
- __table_args__
permission_id
class-attribute
instance-attribute
¶
permission_id: Mapped[int] = mapped_column(BIGINT, primary_key=True, autoincrement=True)
permission
class-attribute
instance-attribute
¶
permission: Mapped[str] = mapped_column(TEXT, nullable=False)
group_id
class-attribute
instance-attribute
¶
group_id: Mapped[int] = mapped_column(BIGINT, ForeignKey('groups.group_id', ondelete='CASCADE'), nullable=False)
collection_id
class-attribute
instance-attribute
¶
collection_id: Mapped[int | None] = mapped_column(BIGINT, ForeignKey('collections.collection_id', ondelete='CASCADE'), nullable=True)
is_system
class-attribute
instance-attribute
¶
is_system: Mapped[bool | None] = mapped_column(BOOLEAN, nullable=True)
group
class-attribute
instance-attribute
¶
group: Mapped[Groups] = relationship(back_populates='permissions')
collection
class-attribute
instance-attribute
¶
collection: Mapped[Collections | None] = relationship(back_populates='permissions')
__table_args__
class-attribute
instance-attribute
¶
__table_args__ = (CheckConstraint("permission IN ('read', 'write', 'admin')", name='valid_permission'), CheckConstraint('(collection_id IS NOT NULL AND is_system IS NULL) OR (collection_id IS NULL AND is_system = true)', name='exactly_one_resource'), UniqueConstraint('permission', 'group_id', 'collection_id', 'is_system', name='unique_permission_grant'))
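To see what the `valid_permission` and `exactly_one_resource` checks accept, here is a sketch that reproduces them in SQLite (standing in for PostgreSQL; SQLite stores booleans as 0/1, so `is_system = true` becomes `is_system = 1`). The `grant` helper and the sample IDs are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE permissions (
        permission_id INTEGER PRIMARY KEY,
        permission TEXT NOT NULL,
        group_id INTEGER NOT NULL,
        collection_id INTEGER,
        is_system BOOLEAN,
        CONSTRAINT valid_permission CHECK (permission IN ('read', 'write', 'admin')),
        -- Intended: a grant targets a collection, or the system, but not both.
        CONSTRAINT exactly_one_resource CHECK (
            (collection_id IS NOT NULL AND is_system IS NULL)
            OR (collection_id IS NULL AND is_system = 1)
        )
    )
    """
)


def grant(permission: str, group_id: int, collection_id=None, is_system=None) -> bool:
    """Try to insert a grant; return True if the constraints accept it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO permissions (permission, group_id, collection_id, is_system)"
                " VALUES (?, ?, ?, ?)",
                (permission, group_id, collection_id, is_system),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

One subtlety the sketch surfaces: SQL CHECK constraints pass when the expression evaluates to NULL, and `is_system = 1` is NULL when `is_system` is NULL. So, as written above, a row with neither `collection_id` nor `is_system` set satisfies `exactly_one_resource` in both SQLite and PostgreSQL. Using `is_system IS TRUE` instead would reject that case.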
EvalJudgements
¶
Bases: CountMixin, MatchboxBase
Table of evaluation judgements produced by human validators.
Methods:
- count – Counts the number of rows in the table.
Attributes:
- __tablename__
- judgement_id (Mapped[int])
- user_id (Mapped[int])
- endorsed_cluster_id (Mapped[int])
- shown_cluster_id (Mapped[int])
- tag (Mapped[str])
- timestamp (Mapped[DateTime])
- user (Mapped[Users])
judgement_id
class-attribute
instance-attribute
¶
judgement_id: Mapped[int] = mapped_column(BIGINT, primary_key=True)
user_id
class-attribute
instance-attribute
¶
user_id: Mapped[int] = mapped_column(BIGINT, ForeignKey('users.user_id', ondelete='CASCADE'), nullable=False)
endorsed_cluster_id
class-attribute
instance-attribute
¶
endorsed_cluster_id: Mapped[int] = mapped_column(BIGINT, ForeignKey('clusters.cluster_id', ondelete='CASCADE'), nullable=False)
shown_cluster_id
class-attribute
instance-attribute
¶
shown_cluster_id: Mapped[int] = mapped_column(BIGINT, ForeignKey('clusters.cluster_id', ondelete='CASCADE'), nullable=False)
timestamp
class-attribute
instance-attribute
¶
Probabilities
¶
Bases: CountMixin, MatchboxBase
Table of probabilities that a cluster is correct, according to a resolution.
Methods:
- count – Counts the number of rows in the table.
Attributes:
- __tablename__
- resolution_id (Mapped[int])
- cluster_id (Mapped[int])
- probability (Mapped[int])
- proposed_by (Mapped[Resolutions])
- proposes (Mapped[Clusters])
- __table_args__
resolution_id
class-attribute
instance-attribute
¶
resolution_id: Mapped[int] = mapped_column(BIGINT, ForeignKey('resolutions.resolution_id', ondelete='CASCADE'), primary_key=True)
cluster_id
class-attribute
instance-attribute
¶
cluster_id: Mapped[int] = mapped_column(BIGINT, ForeignKey('clusters.cluster_id', ondelete='CASCADE'), primary_key=True)
probability
class-attribute
instance-attribute
¶
probability: Mapped[int] = mapped_column(SMALLINT, nullable=False)
proposed_by
class-attribute
instance-attribute
¶
proposed_by: Mapped[Resolutions] = relationship(back_populates='probabilities')
proposes
class-attribute
instance-attribute
¶
proposes: Mapped[Clusters] = relationship(back_populates='probabilities')
__table_args__
class-attribute
instance-attribute
¶
__table_args__ = (CheckConstraint('probability BETWEEN 0 AND 100', name='valid_probability'), Index('ix_probabilities_resolution', 'resolution_id'))
Results
¶
Bases: CountMixin, MatchboxBase
Table of results for a resolution.
Stores the raw left/right probabilities created by a model.
Methods:
- count – Counts the number of rows in the table.
Attributes:
- __tablename__
- result_id (Mapped[int])
- resolution_id (Mapped[int])
- left_id (Mapped[int])
- right_id (Mapped[int])
- probability (Mapped[int])
- proposed_by (Mapped[Resolutions])
- __table_args__
result_id
class-attribute
instance-attribute
¶
result_id: Mapped[int] = mapped_column(BIGINT, primary_key=True, autoincrement=True)
resolution_id
class-attribute
instance-attribute
¶
resolution_id: Mapped[int] = mapped_column(BIGINT, ForeignKey('resolutions.resolution_id', ondelete='CASCADE'), nullable=False)
left_id
class-attribute
instance-attribute
¶
left_id: Mapped[int] = mapped_column(BIGINT, ForeignKey('clusters.cluster_id', ondelete='CASCADE'), nullable=False)
right_id
class-attribute
instance-attribute
¶
right_id: Mapped[int] = mapped_column(BIGINT, ForeignKey('clusters.cluster_id', ondelete='CASCADE'), nullable=False)
probability
class-attribute
instance-attribute
¶
probability: Mapped[int] = mapped_column(SMALLINT, nullable=False)
proposed_by
class-attribute
instance-attribute
¶
proposed_by: Mapped[Resolutions] = relationship(back_populates='results')
__table_args__
class-attribute
instance-attribute
¶
__table_args__ = (Index('ix_results_resolution', 'resolution_id'), CheckConstraint('probability BETWEEN 0 AND 100', name='valid_probability'), UniqueConstraint('resolution_id', 'left_id', 'right_id'))
utils
¶
Utilities for using the PostgreSQL backend.
Modules:
- db – General utilities for the PostgreSQL backend.
- insert – Utilities for inserting data into the PostgreSQL backend.
- query – Utilities for querying and matching in the PostgreSQL backend.
- results – Utilities for querying model results from the PostgreSQL backend.
db
¶
General utilities for the PostgreSQL backend.
Functions:
- dump – Dumps the entire database to a snapshot.
- restore – Restores the database from a snapshot.
- sqa_profiled – SQLAlchemy profiler.
- compile_sql – Compiles a SQLAlchemy statement into a string.
- large_append – Append a PyArrow table to a PostgreSQL table using ADBC.
- ingest_to_temporary_table – Context manager to ingest Arrow data to a temporary table with explicit types.
dump
¶
dump() -> MatchboxSnapshot
Dumps the entire database to a snapshot.
Returns:
- MatchboxSnapshot – A MatchboxSnapshot object of type “postgres” with the database’s current state.
restore
¶
restore(snapshot: MatchboxSnapshot, batch_size: int) -> None
Restores the database from a snapshot.
Parameters:
- snapshot (MatchboxSnapshot) – A MatchboxSnapshot object of type “postgres” with the database’s state.
- batch_size (int) – The number of records to insert in each batch.
Raises:
- ValueError – If the snapshot is missing data.
sqa_profiled
¶
sqa_profiled() -> Generator[None, None, None]
SQLAlchemy profiler.
Taken directly from the SQLAlchemy documentation: https://docs.sqlalchemy.org/en/20/faq/performance.html#query-profiling
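The recipe in the linked SQLAlchemy FAQ wraps `cProfile` in a context manager and prints `pstats` output sorted by cumulative time. The following stdlib-only sketch works the same way. It is not a copy of the matchbox function: the optional `stream` argument and the ten-line limit are additions for illustration.

```python
import contextlib
import cProfile
import io
import pstats
import sys
from typing import Iterator, Optional, TextIO


@contextlib.contextmanager
def profiled(stream: Optional[TextIO] = None) -> Iterator[None]:
    """Profile the enclosed block and print the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        yield
    finally:
        # Stop profiling even if the block raised, then render the report.
        profiler.disable()
        out = io.StringIO()
        pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(10)
        (stream or sys.stdout).write(out.getvalue())


buffer = io.StringIO()
with profiled(buffer):
    sorted(range(10_000), reverse=True)
report = buffer.getvalue()
```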
compile_sql
¶
large_append
¶
large_append(data: Table, table_class: DeclarativeMeta, adbc_connection: PoolProxiedConnection, max_chunksize: int | None = None) -> None
Append a PyArrow table to a PostgreSQL table using ADBC.
This function does not support upserts and will raise an error if keys clash. It does not commit; committing is the caller's responsibility.
Parameters:
- data (Table) – A PyArrow table to write.
- table_class (DeclarativeMeta) – The SQLAlchemy ORM class for the table to write to.
- adbc_connection (PoolProxiedConnection) – An ADBC connection from the pool, as returned by MBDB.get_adbc_connection(). It must be used via a context manager.
- max_chunksize (int | None, default: None) – Size of the data chunks to read and copy.
ingest_to_temporary_table
¶
ingest_to_temporary_table(table_name: str, schema_name: str, data: Table, column_types: dict[str, type[TypeEngine]], max_chunksize: int | None = None) -> Generator[Table, None, None]
Context manager to ingest Arrow data to a temporary table with explicit types.
Parameters:
- table_name (str) – Base name for the temporary table.
- schema_name (str) – Schema in which the temporary table will be created.
- data (Table) – PyArrow table containing the data to ingest.
- column_types (dict[str, type[TypeEngine]]) – Map of column names to SQLAlchemy types.
- max_chunksize (int | None, default: None) – Optional maximum chunk size for batches.
Yields:
- Table – A SQLAlchemy Table object representing the temporary table.
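The create, load, yield and drop lifecycle of a temporary-table context manager can be sketched with SQLite. This is a sketch under stated assumptions: the real function takes Arrow data, SQLAlchemy types and a schema name and yields a SQLAlchemy Table, whereas here `temporary_table`, plain tuples, SQLite type names and the random name suffix are all hypothetical stand-ins.

```python
import contextlib
import sqlite3
import uuid
from typing import Iterator, Sequence


@contextlib.contextmanager
def temporary_table(
    conn: sqlite3.Connection,
    base_name: str,
    column_types: dict[str, str],
    rows: Sequence[tuple],
) -> Iterator[str]:
    """Create a uniquely named temp table, load rows, yield its name, then drop it."""
    # A random suffix keeps concurrent callers from colliding on one name.
    name = f"{base_name}_{uuid.uuid4().hex[:8]}"
    columns = ", ".join(f'"{col}" {sql_type}' for col, sql_type in column_types.items())
    conn.execute(f'CREATE TEMP TABLE "{name}" ({columns})')
    try:
        placeholders = ", ".join("?" for _ in column_types)
        conn.executemany(f'INSERT INTO "{name}" VALUES ({placeholders})', rows)
        yield name
    finally:
        # Clean up even if the caller's block raised.
        conn.execute(f'DROP TABLE IF EXISTS "{name}"')


conn = sqlite3.connect(":memory:")
types = {"hash": "BLOB", "key": "TEXT"}
with temporary_table(conn, "hashes", types, [(b"\x01", "a"), (b"\x02", "b")]) as tmp:
    (loaded,) = conn.execute(f'SELECT COUNT(*) FROM "{tmp}"').fetchone()
```

Dropping the table in `finally` ties its lifetime to the `with` block, which is the main reason to expose this as a context manager rather than a plain function.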
insert
¶
Utilities for inserting data into the PostgreSQL backend.
Functions:
- insert_hashes – Indexes hash data for a source within Matchbox.
- insert_results – Writes a results table to Matchbox.
insert_hashes
¶
insert_hashes(path: SourceResolutionPath, data_hashes: Table, batch_size: int) -> None
Indexes hash data for a source within Matchbox.
Parameters:
- path (SourceResolutionPath) – The path of the source resolution.
- data_hashes (Table) – Arrow table containing hash data.
- batch_size (int) – Batch size for bulk operations.
Raises:
- MatchboxResolutionNotFoundError – If the specified resolution doesn’t exist.
- MatchboxResolutionInvalidData – If the data fingerprint conflicts with the resolution.
- MatchboxResolutionExistingData – If data was already inserted for the resolution.
insert_results
¶
insert_results(path: ModelResolutionPath, results: Table, batch_size: int) -> None
Writes a results table to Matchbox.
The PostgreSQL backend stores clusters in a hierarchical structure, where each component references its parent component at a higher threshold.
This means two-item components correspond directly to their original pairwise probabilities, and clusters can be queried easily at any threshold.
Parameters:
- path (ModelResolutionPath) – The path of the model resolution to upload results for.
- results (Table) – A PyArrow results table with left_id, right_id and probability columns.
- batch_size (int) – Number of records to insert in each batch.
Raises:
- MatchboxResolutionNotFoundError – If the specified resolution doesn’t exist.
- MatchboxResolutionInvalidData – If the data fingerprint conflicts with the resolution.
- MatchboxResolutionExistingData – If data was already inserted for the resolution.
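To make the threshold hierarchy concrete, the following sketch builds connected components from pairwise results with union-find, walking thresholds from high to low. It is an illustration rather than the matchbox implementation. It assumes integer probabilities (as the SMALLINT 0 to 100 columns suggest) and ignores how cluster IDs and hashes are assigned. The `components_by_threshold` name is hypothetical.

```python
from collections import defaultdict


def components_by_threshold(
    edges: list[tuple[int, int, int]],
) -> dict[int, list[frozenset[int]]]:
    """Map each probability threshold to the multi-item components at or above it."""
    parent: dict[int, int] = {}

    def find(x: int) -> int:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    by_prob: dict[int, list[tuple[int, int]]] = defaultdict(list)
    for left, right, prob in edges:
        by_prob[prob].append((left, right))

    result: dict[int, list[frozenset[int]]] = {}
    # Lowering the threshold can only merge components, so every component at
    # threshold t is a union of components found at higher thresholds.
    for prob in sorted(by_prob, reverse=True):
        for left, right in by_prob[prob]:
            root_l, root_r = find(left), find(right)
            if root_l != root_r:
                parent[root_r] = root_l
        groups: dict[int, set[int]] = defaultdict(set)
        for node in parent:
            groups[find(node)].add(node)
        result[prob] = sorted((frozenset(g) for g in groups.values() if len(g) > 1), key=min)
    return result


levels = components_by_threshold([(1, 2, 90), (3, 4, 80), (2, 3, 70)])
```

At threshold 90 the only component is the pair {1, 2}, which carries exactly that pair's probability. At 70 the two pairs merge into one four-item component.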
query
¶
Utilities for querying and matching in the PostgreSQL backend.
Functions:
- build_unified_query – Build a query to resolve cluster assignments across resolution hierarchies.
- query – Queries Matchbox to retrieve linked data for a source.
- get_parent_clusters_and_leaves – Query clusters and their leaves for all parent resolutions.
- match – Matches an ID in a source resolution and returns the keys in the targets.
Attributes:
- T
build_unified_query
¶
build_unified_query(resolution: Resolutions, sources: list[SourceConfigs] | None = None, threshold: int | None = None, level: Literal['leaf', 'key'] = 'leaf', get_hashes: bool = False) -> Select
Build a query to resolve cluster assignments across resolution hierarchies.
This function creates SQL that determines which cluster each source record belongs to by traversing up a resolution hierarchy and applying priority-based cluster selection.
The query uses COALESCE to implement a priority system in which higher-priority resolutions can “claim” records, and lower-priority resolutions only process unclaimed records:
- Lineage discovery: Queries the resolution hierarchy to find all ancestor resolutions, ordered by priority (lowest level = highest priority)
- Source filtering: When sources is provided, constrains results to only include clusters from those specific source configurations
- Threshold application: Applies probability thresholds to determine which clusters qualify at each resolution level
- Subquery construction: For each model resolution in the lineage, builds a subquery that finds qualifying clusters via the Contains→Probabilities join. Each joined subquery adds a new cluster column which is then merged via…
- COALESCE assembly: Joins all subqueries to source data and uses COALESCE to select the highest-priority cluster assignment for each record
The level changes the data returned:
- "leaf": Returns both root and leaf cluster IDs. For unmerged source clusters, the root and leaf properties will be the same.
- "key": In addition to the above, it also returns the source key. This gives more rows than "leaf" because it needs a row for every key attached to a leaf.
Additionally, if get_hashes is set to True, the root and leaf hashes are returned.
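The COALESCE assembly step can be illustrated with a small SQLite query. Each model's qualifying clusters are left-joined to the source leaves in priority order. COALESCE then takes the first non-null cluster and falls back to the leaf itself, so unmerged leaves have root equal to leaf. The tables `linker_clusters` and `dedupe_clusters` are hypothetical stand-ins for the per-resolution subqueries that the real builder derives from the Contains→Probabilities join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE source_leaves (leaf INTEGER PRIMARY KEY);
    -- Clusters each model proposes above its threshold, one row per leaf.
    CREATE TABLE linker_clusters (leaf INTEGER PRIMARY KEY, cluster INTEGER);
    CREATE TABLE dedupe_clusters (leaf INTEGER PRIMARY KEY, cluster INTEGER);
    INSERT INTO source_leaves VALUES (1), (2), (3), (4), (5), (6);
    -- The dedupe model groups {2, 3} and {4, 5}.
    INSERT INTO dedupe_clusters VALUES (2, 300), (3, 300), (4, 301), (5, 301);
    -- The higher-priority linker merges leaf 1 with dedupe cluster 300.
    INSERT INTO linker_clusters VALUES (1, 500), (2, 500), (3, 500);
    """
)

# Joins are listed in priority order; COALESCE returns the first non-null value.
rows = conn.execute(
    """
    SELECT s.leaf,
           COALESCE(l.cluster, d.cluster, s.leaf) AS root
    FROM source_leaves AS s
    LEFT JOIN linker_clusters AS l ON l.leaf = s.leaf
    LEFT JOIN dedupe_clusters AS d ON d.leaf = s.leaf
    ORDER BY s.leaf
    """
).fetchall()
```

Leaves 2 and 3 appear in both models, but the linker's cluster 500 wins. Leaves 4 and 5 keep the dedupe cluster 301, and leaf 6 stays its own root.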
query
¶
query(source: SourceResolutionPath, point_of_truth: ResolutionPath | None = None, threshold: int | None = None, return_leaf_id: bool = False, limit: int = None) -> Table
Queries Matchbox to retrieve linked data for a source.
Retrieves all linked data for a given source, resolving through hierarchy if needed.
- Simple case: If querying the same resolution as the source, just select cluster IDs and keys directly from ClusterSourceKey
- Hierarchy case: Uses the unified query builder to traverse up the resolution hierarchy, applying COALESCE priority logic to determine which parent cluster each source record belongs to
- Priority resolution: When multiple model resolutions could assign a record to different clusters, COALESCE ensures higher-priority resolutions win
Returns all records with their final resolved cluster IDs.
get_parent_clusters_and_leaves
¶
Query clusters and their leaves for all parent resolutions.
For a given resolution, find all its parent resolutions and return complete cluster compositions.
- Parent discovery: Queries ResolutionFrom to find all direct parent resolutions (level 1)
- Cluster building: For each parent, runs the full unified query to get all cluster assignments with both root and leaf information
- Aggregation: Collects all leaf nodes belonging to each root cluster across all parent resolutions
Returns a dictionary mapping cluster IDs to their complete leaf compositions and metadata.
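The aggregation step amounts to collecting leaves per root across the (root, leaf) rows returned for each parent. The sketch below covers only that step and omits the metadata the real function also returns. The `aggregate_leaves` name, the parent names and the IDs are made up.

```python
from collections import defaultdict


def aggregate_leaves(
    parent_assignments: dict[str, list[tuple[int, int]]],
) -> dict[int, list[int]]:
    """Collect the leaves of each root cluster across several parent resolutions."""
    leaves_by_root: dict[int, set[int]] = defaultdict(set)
    for rows in parent_assignments.values():
        for root, leaf in rows:
            # Sets absorb rows repeated across parents.
            leaves_by_root[root].add(leaf)
    return {root: sorted(leaves) for root, leaves in leaves_by_root.items()}


compositions = aggregate_leaves(
    {
        "dedupe_a": [(300, 2), (300, 3), (4, 4)],
        "dedupe_b": [(301, 7), (301, 8), (300, 3)],
    }
)
```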
match
¶
match(key: str, source: SourceResolutionPath, targets: list[SourceResolutionPath], point_of_truth: ResolutionPath, threshold: int | None = None) -> list[Match]
Matches an ID in a source resolution and returns the keys in the targets.
Given a specific key in a source, find what it matches to in target sources through a resolution hierarchy.
- Target cluster identification: Uses COALESCE priority CTE to determine which cluster the input key belongs to at the resolution level
- Matching leaves discovery: Builds a UNION ALL query with branches for:
  - Direct cluster members (source-only case)
  - Members connected through each model resolution in the hierarchy
- Cross-reference: Joins the target cluster with all possible matching leaves, filtering for the requested target sources
Organises matches by source configuration and returns structured Match objects for each target.
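The final organising step, grouping matched keys by source and keeping one entry per requested target, can be sketched as follows. The real function returns Match objects, whose fields are not documented here. This sketch therefore returns a plain dictionary, and the source names and keys are hypothetical.

```python
def organise_by_source(
    matching_leaves: list[tuple[str, str]],
    targets: list[str],
) -> dict[str, set[str]]:
    """Group (source_name, key) rows by source, keeping only requested targets."""
    # Every requested target gets an entry, even if nothing matched it.
    keys_by_target: dict[str, set[str]] = {target: set() for target in targets}
    for source_name, key in matching_leaves:
        # Rows from sources that were not requested are dropped.
        if source_name in keys_by_target:
            keys_by_target[source_name].add(key)
    return keys_by_target


rows = [("crm", "c-1"), ("erp", "e-9"), ("erp", "e-10"), ("web", "w-3")]
matches = organise_by_source(rows, ["erp", "crm", "hr"])
```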
results
¶
Utilities for querying model results from the PostgreSQL backend.
Classes:
- SourceInfo – Information about a model’s sources.
Functions:
- get_model_config – Get metadata for a model resolution.
SourceInfo
¶
Bases: NamedTuple
Information about a model’s sources.
Attributes:
- left (int)
- right (int | None)
- left_ancestors (set[int])
- right_ancestors (set[int] | None)
get_model_config
¶
get_model_config(resolution: Resolutions) -> ModelConfig
Get metadata for a model resolution.