PostgreSQL
A backend adapter for deploying Matchbox with PostgreSQL.
This backend stores two connected structures:
- An execution graph in steps and step_from, covering sources, models, and resolvers.
- A data graph in clusters, contains, model_edges, resolver_clusters, and cluster_source_key.
Source steps index source clusters. Model steps store score edges between clusters. Resolver steps point to the clusters that form a published entity view.
erDiagram
Collections {
bigint collection_id PK
text name
}
Runs {
bigint run_id PK
bigint collection_id FK
boolean is_mutable
boolean is_default
}
Steps {
bigint step_id PK
bigint run_id FK
text name
text description
text type
bytea fingerprint
enum upload_stage
}
StepFrom {
bigint parent PK,FK
bigint child PK,FK
integer level
}
SourceConfigs {
bigint source_config_id PK
bigint step_id FK
text location_type
text location_name
text extract_transform
}
SourceFields {
bigint field_id PK
bigint source_config_id FK
integer index
text name
text type
boolean is_key
}
ModelConfigs {
bigint model_config_id PK
bigint step_id FK
text model_class
jsonb model_settings
jsonb left_query
jsonb right_query
}
ResolverConfigs {
bigint resolver_config_id PK
bigint step_id FK
text resolver_class
jsonb resolver_settings
}
Clusters {
bigint cluster_id PK
bytea cluster_hash
}
ClusterSourceKey {
bigint key_id PK
bigint cluster_id FK
bigint source_config_id FK
text key
}
Contains {
bigint root PK,FK
bigint leaf PK,FK
}
ModelEdges {
bigint result_id PK
bigint step_id FK
bigint left_id FK
bigint right_id FK
real score
}
ResolverClusters {
bigint step_id PK,FK
bigint cluster_id PK,FK
}
Users {
bigint user_id PK
text name
text email
}
Groups {
bigint group_id PK
text name
text description
boolean is_system
}
UserGroups {
bigint user_id PK,FK
bigint group_id PK,FK
}
Permissions {
bigint permission_id PK
text permission
bigint group_id FK
bigint collection_id FK
boolean is_system
}
EvalJudgements {
bigint judgement_id PK
bigint user_id FK
bigint endorsed_cluster_id FK
bigint shown_cluster_id FK
datetime timestamp
}
Collections ||--o{ Runs : ""
Collections ||--o{ Permissions : ""
Runs ||--o{ Steps : ""
Steps ||--o{ StepFrom : "parent"
StepFrom }o--|| Steps : "child"
Steps |o--|| SourceConfigs : ""
Steps |o--|| ModelConfigs : ""
Steps |o--|| ResolverConfigs : ""
Steps ||--o{ ModelEdges : ""
Steps ||--o{ ResolverClusters : ""
SourceConfigs ||--o{ SourceFields : ""
SourceConfigs ||--o{ ClusterSourceKey : ""
Clusters ||--o{ ClusterSourceKey : ""
Clusters ||--o{ Contains : "root"
Contains }o--|| Clusters : "leaf"
Clusters ||--o{ ModelEdges : "left_id"
Clusters ||--o{ ModelEdges : "right_id"
Clusters ||--o{ ResolverClusters : ""
Clusters ||--o{ EvalJudgements : "endorsed_cluster_id"
Clusters ||--o{ EvalJudgements : "shown_cluster_id"
Users ||--o{ UserGroups : ""
Users ||--o{ EvalJudgements : ""
Groups ||--o{ UserGroups : ""
Groups ||--o{ Permissions : ""
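As a rough illustration of how the data graph fits together, the sketch below uses an in-memory SQLite database as a stand-in for PostgreSQL. The tables are cut down to the key columns from the diagram, the step and cluster IDs are made up, and treating a published cluster with no contains rows as its own leaf is an assumption of this sketch rather than documented backend behaviour.

```python
import sqlite3

# In-memory SQLite stand-in; the real backend is PostgreSQL with more columns.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE clusters (cluster_id INTEGER PRIMARY KEY);
    CREATE TABLE contains (
        root INTEGER REFERENCES clusters (cluster_id),
        leaf INTEGER REFERENCES clusters (cluster_id),
        PRIMARY KEY (root, leaf)
    );
    CREATE TABLE resolver_clusters (
        step_id INTEGER,
        cluster_id INTEGER REFERENCES clusters (cluster_id),
        PRIMARY KEY (step_id, cluster_id)
    );
    """
)

# Clusters 1-3 are source (leaf) clusters; cluster 10 merges 1 and 2.
conn.executemany("INSERT INTO clusters VALUES (?)", [(1,), (2,), (3,), (10,)])
conn.executemany("INSERT INTO contains VALUES (?, ?)", [(10, 1), (10, 2)])
# Hypothetical resolver step 7 publishes cluster 10 and the unmerged cluster 3.
conn.executemany("INSERT INTO resolver_clusters VALUES (?, ?)", [(7, 10), (7, 3)])


def entity_view(step_id: int) -> dict[int, list[int]]:
    """Map each cluster published by a resolver step to its leaf clusters."""
    rows = conn.execute(
        """
        SELECT rc.cluster_id, COALESCE(c.leaf, rc.cluster_id) AS leaf_id
        FROM resolver_clusters AS rc
        LEFT JOIN contains AS c ON c.root = rc.cluster_id
        WHERE rc.step_id = ?
        ORDER BY rc.cluster_id, leaf_id
        """,
        (step_id,),
    ).fetchall()
    view: dict[int, list[int]] = {}
    for cluster_id, leaf_id in rows:
        view.setdefault(cluster_id, []).append(leaf_id)
    return view


print(entity_view(7))  # {3: [3], 10: [1, 2]}
```

The resolver step only stores pointers to clusters; the membership of each published entity comes from joining through contains.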
matchbox.server.postgresql
PostgreSQL adapter for Matchbox server.
Modules:
- adapter – Composed PostgreSQL adapter for Matchbox server.
- db – Matchbox PostgreSQL database connection.
- mixin – A module for defining mixins for the PostgreSQL backend ORM.
- orm – ORM classes for the Matchbox PostgreSQL database.
- utils – Utilities for using the PostgreSQL backend.
Classes:
- MatchboxPostgres – A PostgreSQL adapter for Matchbox.
- MatchboxPostgresSettings – Settings for the Matchbox PostgreSQL backend.
MatchboxPostgres
¶
MatchboxPostgres(settings: MatchboxPostgresSettings)
Bases: MatchboxPostgresQueryMixin, MatchboxPostgresEvaluationMixin, MatchboxPostgresCollectionsMixin, MatchboxPostgresAdminMixin, MatchboxPostgresGroupsMixin, MatchboxDBAdapter
flowchart TD
matchbox.server.postgresql.MatchboxPostgres[MatchboxPostgres]
matchbox.server.postgresql.adapter.query.MatchboxPostgresQueryMixin[MatchboxPostgresQueryMixin]
matchbox.server.postgresql.adapter.eval.MatchboxPostgresEvaluationMixin[MatchboxPostgresEvaluationMixin]
matchbox.server.postgresql.adapter.collections.MatchboxPostgresCollectionsMixin[MatchboxPostgresCollectionsMixin]
matchbox.server.postgresql.adapter.admin.MatchboxPostgresAdminMixin[MatchboxPostgresAdminMixin]
matchbox.server.postgresql.adapter.groups.MatchboxPostgresGroupsMixin[MatchboxPostgresGroupsMixin]
matchbox.server.base.MatchboxDBAdapter[MatchboxDBAdapter]
matchbox.server.postgresql.adapter.query.MatchboxPostgresQueryMixin --> matchbox.server.postgresql.MatchboxPostgres
matchbox.server.postgresql.adapter.eval.MatchboxPostgresEvaluationMixin --> matchbox.server.postgresql.MatchboxPostgres
matchbox.server.postgresql.adapter.eval._MixinBase --> matchbox.server.postgresql.adapter.eval.MatchboxPostgresEvaluationMixin
matchbox.server.postgresql.adapter.collections.MatchboxPostgresCollectionsMixin --> matchbox.server.postgresql.MatchboxPostgres
matchbox.server.postgresql.adapter.admin.MatchboxPostgresAdminMixin --> matchbox.server.postgresql.MatchboxPostgres
matchbox.server.postgresql.adapter.groups.MatchboxPostgresGroupsMixin --> matchbox.server.postgresql.MatchboxPostgres
matchbox.server.base.MatchboxDBAdapter --> matchbox.server.postgresql.MatchboxPostgres
click matchbox.server.postgresql.MatchboxPostgres href "" "matchbox.server.postgresql.MatchboxPostgres"
click matchbox.server.postgresql.adapter.query.MatchboxPostgresQueryMixin href "" "matchbox.server.postgresql.adapter.query.MatchboxPostgresQueryMixin"
click matchbox.server.postgresql.adapter.eval.MatchboxPostgresEvaluationMixin href "" "matchbox.server.postgresql.adapter.eval.MatchboxPostgresEvaluationMixin"
click matchbox.server.postgresql.adapter.collections.MatchboxPostgresCollectionsMixin href "" "matchbox.server.postgresql.adapter.collections.MatchboxPostgresCollectionsMixin"
click matchbox.server.postgresql.adapter.admin.MatchboxPostgresAdminMixin href "" "matchbox.server.postgresql.adapter.admin.MatchboxPostgresAdminMixin"
click matchbox.server.postgresql.adapter.groups.MatchboxPostgresGroupsMixin href "" "matchbox.server.postgresql.adapter.groups.MatchboxPostgresGroupsMixin"
click matchbox.server.base.MatchboxDBAdapter href "" "matchbox.server.base.MatchboxDBAdapter"
A PostgreSQL adapter for Matchbox.
Methods:
- query
- match
- create_collection
- get_collection
- list_collections
- delete_collection
- create_run
- set_run_mutable
- set_run_default
- get_run
- delete_run
- create_step
- get_step
- update_step
- delete_step
- lock_step_data
- unlock_step_data
- get_step_stage
- insert_source_data
- insert_model_data
- insert_resolver_data
- get_model_data
- get_resolver_data
- validate_ids
- dump
- drop
- clear
- restore
- delete_orphans
- login
- get_user_groups
- list_groups
- get_group
- create_group
- delete_group
- add_user_to_group
- remove_user_from_group
- check_permission
- get_permissions
- grant_permission
- revoke_permission
- insert_judgement
- get_judgements
- sample_for_eval – Sample some clusters from a resolver step.
Attributes:
- settings
- sources
- models
- resolvers
- source_clusters
- model_clusters
- all_clusters
- creates
- merges
- proposes
- source_steps
- users
source_steps
instance-attribute
¶
source_steps = FilteredSteps(sources=True, models=False, resolvers=False)
query
¶
query(source: SourceStepPath, resolver: ResolverStepPath | None = None, return_leaf_id: bool = False, limit: int | None = None) -> Table
match
¶
match(key: str, source: SourceStepPath, targets: list[SourceStepPath], resolver: ResolverStepPath) -> list[Match]
create_collection
¶
create_collection(name: CollectionName, permissions: list[PermissionGrant]) -> Collection
check_permission
¶
check_permission(user_name: str, permission: PermissionType, resource: Literal[SYSTEM] | CollectionName) -> bool
get_permissions
¶
get_permissions(resource: Literal[SYSTEM] | CollectionName) -> list[PermissionGrant]
grant_permission
¶
grant_permission(group_name: GroupName, permission: PermissionType, resource: Literal[SYSTEM] | CollectionName) -> None
revoke_permission
¶
revoke_permission(group_name: GroupName, permission: PermissionType, resource: Literal[SYSTEM] | CollectionName) -> None
sample_for_eval
¶
sample_for_eval(n: int, path: ResolverStepPath, user_name: str) -> Table
Sample some clusters from a resolver step.
MatchboxPostgresSettings
¶
Bases: MatchboxServerSettings
flowchart TD
matchbox.server.postgresql.MatchboxPostgresSettings[MatchboxPostgresSettings]
matchbox.server.base.MatchboxServerSettings[MatchboxServerSettings]
matchbox.server.base.MatchboxServerSettings --> matchbox.server.postgresql.MatchboxPostgresSettings
click matchbox.server.postgresql.MatchboxPostgresSettings href "" "matchbox.server.postgresql.MatchboxPostgresSettings"
click matchbox.server.base.MatchboxServerSettings href "" "matchbox.server.base.MatchboxServerSettings"
Settings for the Matchbox PostgreSQL backend.
Inherits the core settings and adds the PostgreSQL-specific settings.
Methods:
- validate_public_key – Validate and normalise PEM public key format.
- check_settings – Check that legal combinations of settings are provided.
Attributes:
- model_config
- batch_size (int)
- datastore (MatchboxDatastoreSettings)
- task_runner (Literal['api', 'celery'])
- redis_uri (str | None)
- uploads_expiry_minutes (int | None)
- authorisation (bool)
- public_key (SecretBytes | None)
- log_level (LogLevelType)
- backend_type (MatchboxBackends)
- postgres (MatchboxPostgresCoreSettings)
model_config
class-attribute
instance-attribute
¶
model_config = SettingsConfigDict(env_prefix='MB__SERVER__', env_nested_delimiter='__', use_enum_values=True, env_file='.env', env_file_encoding='utf-8', extra='ignore')
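To make the env_prefix and env_nested_delimiter values above more concrete, here is a small standard-library sketch of how variable names could map onto nested settings fields. It only imitates the naming convention and is not how pydantic-settings is implemented; nest_env and the example values are hypothetical, while the field names (batch_size, postgres, host, port) come from this page.

```python
def nest_env(
    environ: dict[str, str],
    prefix: str = "MB__SERVER__",
    delimiter: str = "__",
) -> dict:
    """Group prefixed variables into a nested dict of lower-cased field names."""
    nested: dict = {}
    for name, value in environ.items():
        if not name.upper().startswith(prefix):
            continue  # variables without the prefix are not settings
        path = name[len(prefix):].lower().split(delimiter)
        node = nested
        for part in path[:-1]:
            node = node.setdefault(part, {})
        node[path[-1]] = value  # values stay strings; no type coercion here
    return nested


# Hypothetical environment; the values are placeholders.
example = {
    "MB__SERVER__BATCH_SIZE": "250000",
    "MB__SERVER__POSTGRES__HOST": "localhost",
    "MB__SERVER__POSTGRES__PORT": "5432",
    "UNRELATED": "ignored",
}

print(nest_env(example))
# {'batch_size': '250000', 'postgres': {'host': 'localhost', 'port': '5432'}}
```

Note that the single underscore in BATCH_SIZE is not a delimiter, so it stays one field, whereas POSTGRES__HOST is split into the nested postgres settings.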
public_key
class-attribute
instance-attribute
¶
postgres
class-attribute
instance-attribute
¶
postgres: MatchboxPostgresCoreSettings = Field(default_factory=MatchboxPostgresCoreSettings)
validate_public_key
classmethod
¶
Validate and normalise PEM public key format.
adapter
Composed PostgreSQL adapter for Matchbox server.
Modules:
- admin – Admin PostgreSQL mixin for Matchbox server.
- collections – Collections PostgreSQL mixin for Matchbox server.
- eval – Evaluation PostgreSQL mixin for Matchbox server.
- groups – Groups PostgreSQL mixin for Matchbox server.
- main – Composed PostgreSQL adapter for Matchbox server.
- query – Query PostgreSQL mixin for Matchbox server.
Classes:
- MatchboxPostgres – A PostgreSQL adapter for Matchbox.
- MatchboxPostgresSettings – Settings for the Matchbox PostgreSQL backend.
MatchboxPostgres
¶
MatchboxPostgres(settings: MatchboxPostgresSettings)
Bases: MatchboxPostgresQueryMixin, MatchboxPostgresEvaluationMixin, MatchboxPostgresCollectionsMixin, MatchboxPostgresAdminMixin, MatchboxPostgresGroupsMixin, MatchboxDBAdapter
flowchart TD
matchbox.server.postgresql.adapter.MatchboxPostgres[MatchboxPostgres]
matchbox.server.postgresql.adapter.query.MatchboxPostgresQueryMixin[MatchboxPostgresQueryMixin]
matchbox.server.postgresql.adapter.eval.MatchboxPostgresEvaluationMixin[MatchboxPostgresEvaluationMixin]
matchbox.server.postgresql.adapter.collections.MatchboxPostgresCollectionsMixin[MatchboxPostgresCollectionsMixin]
matchbox.server.postgresql.adapter.admin.MatchboxPostgresAdminMixin[MatchboxPostgresAdminMixin]
matchbox.server.postgresql.adapter.groups.MatchboxPostgresGroupsMixin[MatchboxPostgresGroupsMixin]
matchbox.server.base.MatchboxDBAdapter[MatchboxDBAdapter]
matchbox.server.postgresql.adapter.query.MatchboxPostgresQueryMixin --> matchbox.server.postgresql.adapter.MatchboxPostgres
matchbox.server.postgresql.adapter.eval.MatchboxPostgresEvaluationMixin --> matchbox.server.postgresql.adapter.MatchboxPostgres
matchbox.server.postgresql.adapter.eval._MixinBase --> matchbox.server.postgresql.adapter.eval.MatchboxPostgresEvaluationMixin
matchbox.server.postgresql.adapter.collections.MatchboxPostgresCollectionsMixin --> matchbox.server.postgresql.adapter.MatchboxPostgres
matchbox.server.postgresql.adapter.admin.MatchboxPostgresAdminMixin --> matchbox.server.postgresql.adapter.MatchboxPostgres
matchbox.server.postgresql.adapter.groups.MatchboxPostgresGroupsMixin --> matchbox.server.postgresql.adapter.MatchboxPostgres
matchbox.server.base.MatchboxDBAdapter --> matchbox.server.postgresql.adapter.MatchboxPostgres
click matchbox.server.postgresql.adapter.MatchboxPostgres href "" "matchbox.server.postgresql.adapter.MatchboxPostgres"
click matchbox.server.postgresql.adapter.query.MatchboxPostgresQueryMixin href "" "matchbox.server.postgresql.adapter.query.MatchboxPostgresQueryMixin"
click matchbox.server.postgresql.adapter.eval.MatchboxPostgresEvaluationMixin href "" "matchbox.server.postgresql.adapter.eval.MatchboxPostgresEvaluationMixin"
click matchbox.server.postgresql.adapter.collections.MatchboxPostgresCollectionsMixin href "" "matchbox.server.postgresql.adapter.collections.MatchboxPostgresCollectionsMixin"
click matchbox.server.postgresql.adapter.admin.MatchboxPostgresAdminMixin href "" "matchbox.server.postgresql.adapter.admin.MatchboxPostgresAdminMixin"
click matchbox.server.postgresql.adapter.groups.MatchboxPostgresGroupsMixin href "" "matchbox.server.postgresql.adapter.groups.MatchboxPostgresGroupsMixin"
click matchbox.server.base.MatchboxDBAdapter href "" "matchbox.server.base.MatchboxDBAdapter"
A PostgreSQL adapter for Matchbox.
Methods:
- query
- match
- create_collection
- get_collection
- list_collections
- delete_collection
- create_run
- set_run_mutable
- set_run_default
- get_run
- delete_run
- create_step
- get_step
- update_step
- delete_step
- lock_step_data
- unlock_step_data
- get_step_stage
- insert_source_data
- insert_model_data
- insert_resolver_data
- get_model_data
- get_resolver_data
- validate_ids
- dump
- drop
- clear
- restore
- delete_orphans
- login
- get_user_groups
- list_groups
- get_group
- create_group
- delete_group
- add_user_to_group
- remove_user_from_group
- check_permission
- get_permissions
- grant_permission
- revoke_permission
- insert_judgement
- get_judgements
- sample_for_eval – Sample some clusters from a resolver step.
Attributes:
- settings
- sources
- models
- resolvers
- source_clusters
- model_clusters
- all_clusters
- creates
- merges
- proposes
- source_steps
- users
source_steps
instance-attribute
¶
source_steps = FilteredSteps(sources=True, models=False, resolvers=False)
query
¶
query(source: SourceStepPath, resolver: ResolverStepPath | None = None, return_leaf_id: bool = False, limit: int | None = None) -> Table
match
¶
match(key: str, source: SourceStepPath, targets: list[SourceStepPath], resolver: ResolverStepPath) -> list[Match]
create_collection
¶
create_collection(name: CollectionName, permissions: list[PermissionGrant]) -> Collection
check_permission
¶
check_permission(user_name: str, permission: PermissionType, resource: Literal[SYSTEM] | CollectionName) -> bool
get_permissions
¶
get_permissions(resource: Literal[SYSTEM] | CollectionName) -> list[PermissionGrant]
grant_permission
¶
grant_permission(group_name: GroupName, permission: PermissionType, resource: Literal[SYSTEM] | CollectionName) -> None
revoke_permission
¶
revoke_permission(group_name: GroupName, permission: PermissionType, resource: Literal[SYSTEM] | CollectionName) -> None
sample_for_eval
¶
sample_for_eval(n: int, path: ResolverStepPath, user_name: str) -> Table
Sample some clusters from a resolver step.
MatchboxPostgresSettings
¶
Bases: MatchboxServerSettings
flowchart TD
matchbox.server.postgresql.adapter.MatchboxPostgresSettings[MatchboxPostgresSettings]
matchbox.server.base.MatchboxServerSettings[MatchboxServerSettings]
matchbox.server.base.MatchboxServerSettings --> matchbox.server.postgresql.adapter.MatchboxPostgresSettings
click matchbox.server.postgresql.adapter.MatchboxPostgresSettings href "" "matchbox.server.postgresql.adapter.MatchboxPostgresSettings"
click matchbox.server.base.MatchboxServerSettings href "" "matchbox.server.base.MatchboxServerSettings"
Settings for the Matchbox PostgreSQL backend.
Inherits the core settings and adds the PostgreSQL-specific settings.
Methods:
- validate_public_key – Validate and normalise PEM public key format.
- check_settings – Check that legal combinations of settings are provided.
Attributes:
- model_config
- batch_size (int)
- datastore (MatchboxDatastoreSettings)
- task_runner (Literal['api', 'celery'])
- redis_uri (str | None)
- uploads_expiry_minutes (int | None)
- authorisation (bool)
- public_key (SecretBytes | None)
- log_level (LogLevelType)
- backend_type (MatchboxBackends)
- postgres (MatchboxPostgresCoreSettings)
model_config
class-attribute
instance-attribute
¶
model_config = SettingsConfigDict(env_prefix='MB__SERVER__', env_nested_delimiter='__', use_enum_values=True, env_file='.env', env_file_encoding='utf-8', extra='ignore')
public_key
class-attribute
instance-attribute
¶
postgres
class-attribute
instance-attribute
¶
postgres: MatchboxPostgresCoreSettings = Field(default_factory=MatchboxPostgresCoreSettings)
validate_public_key
classmethod
¶
Validate and normalise PEM public key format.
admin
Admin PostgreSQL mixin for Matchbox server.
Classes:
- MatchboxPostgresAdminMixin – Admin mixin for the PostgreSQL adapter for Matchbox.
MatchboxPostgresAdminMixin
Admin mixin for the PostgreSQL adapter for Matchbox.
Methods:
- login
- check_permission
- get_permissions
- grant_permission
- revoke_permission
- validate_ids
- dump
- drop
- clear
- restore
- delete_orphans
Attributes:
check_permission
¶
check_permission(user_name: str, permission: PermissionType, resource: Literal[SYSTEM] | CollectionName) -> bool
get_permissions
¶
get_permissions(resource: Literal[SYSTEM] | CollectionName) -> list[PermissionGrant]
grant_permission
¶
grant_permission(group_name: GroupName, permission: PermissionType, resource: Literal[SYSTEM] | CollectionName) -> None
revoke_permission
¶
revoke_permission(group_name: GroupName, permission: PermissionType, resource: Literal[SYSTEM] | CollectionName) -> None
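The signatures above, together with the Users, UserGroups, Groups, and Permissions tables, suggest that permissions are granted to groups and checked per user. The sketch below models that with in-memory sets. It assumes a user holds a permission when any of their groups has been granted it on the resource; PermissionStore, the "system" sentinel, and the permission and collection names are hypothetical, and any permission hierarchy the real backend may apply is not modelled.

```python
from dataclasses import dataclass, field

SYSTEM = "system"  # hypothetical sentinel for the system-wide resource


@dataclass
class PermissionStore:
    """In-memory stand-in for the user_groups and permissions tables."""

    user_groups: dict[str, set[str]] = field(default_factory=dict)
    grants: set[tuple[str, str, str]] = field(default_factory=set)

    def grant_permission(self, group_name: str, permission: str, resource: str) -> None:
        self.grants.add((group_name, permission, resource))

    def revoke_permission(self, group_name: str, permission: str, resource: str) -> None:
        # discard() keeps revocation idempotent in this sketch
        self.grants.discard((group_name, permission, resource))

    def check_permission(self, user_name: str, permission: str, resource: str) -> bool:
        # A user passes if any of their groups holds the exact grant.
        groups = self.user_groups.get(user_name, set())
        return any((g, permission, resource) in self.grants for g in groups)


store = PermissionStore(user_groups={"alice": {"analysts"}, "bob": set()})
store.grant_permission("analysts", "read", "companies")
store.grant_permission("analysts", "admin", SYSTEM)

print(store.check_permission("alice", "read", "companies"))  # True
print(store.check_permission("bob", "read", "companies"))    # False
```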
collections
Collections PostgreSQL mixin for Matchbox server.
Classes:
- MatchboxPostgresCollectionsMixin – Collections mixin for the PostgreSQL adapter for Matchbox.
MatchboxPostgresCollectionsMixin
Collections mixin for the PostgreSQL adapter for Matchbox.
Methods:
- create_collection
- get_collection
- list_collections
- delete_collection
- create_run
- set_run_mutable
- set_run_default
- get_run
- delete_run
- create_step
- get_step
- update_step
- delete_step
- lock_step_data
- unlock_step_data
- get_step_stage
- insert_source_data
- insert_model_data
- insert_resolver_data
- get_model_data
- get_resolver_data
create_collection
¶
create_collection(name: CollectionName, permissions: list[PermissionGrant]) -> Collection
eval
Evaluation PostgreSQL mixin for Matchbox server.
Classes:
- MatchboxPostgresEvaluationMixin – Evaluation mixin for the PostgreSQL adapter for Matchbox.
MatchboxPostgresEvaluationMixin
Bases: _MixinBase
flowchart TD
matchbox.server.postgresql.adapter.eval.MatchboxPostgresEvaluationMixin[MatchboxPostgresEvaluationMixin]
matchbox.server.postgresql.adapter.eval._MixinBase --> matchbox.server.postgresql.adapter.eval.MatchboxPostgresEvaluationMixin
click matchbox.server.postgresql.adapter.eval.MatchboxPostgresEvaluationMixin href "" "matchbox.server.postgresql.adapter.eval.MatchboxPostgresEvaluationMixin"
Evaluation mixin for the PostgreSQL adapter for Matchbox.
Methods:
- insert_judgement
- get_judgements
- sample_for_eval – Sample some clusters from a resolver step.
groups
Groups PostgreSQL mixin for Matchbox server.
Classes:
- MatchboxPostgresGroupsMixin – Groups mixin for the PostgreSQL adapter for Matchbox.
MatchboxPostgresGroupsMixin
Groups mixin for the PostgreSQL adapter for Matchbox.
Methods:
- get_user_groups
- list_groups
- get_group
- create_group
- delete_group
- add_user_to_group
- remove_user_from_group
main
Composed PostgreSQL adapter for Matchbox server.
Classes:
- FilteredClusters – Wrapper class for filtered cluster queries.
- FilteredProbabilities – Wrapper class for filtered model edge queries.
- FilteredSteps – Wrapper class for filtered step queries.
- MatchboxPostgres – A PostgreSQL adapter for Matchbox.
FilteredClusters
Bases: BaseModel
flowchart TD
matchbox.server.postgresql.adapter.main.FilteredClusters[FilteredClusters]
click matchbox.server.postgresql.adapter.main.FilteredClusters href "" "matchbox.server.postgresql.adapter.main.FilteredClusters"
Wrapper class for filtered cluster queries.
Methods:
- count – Counts the number of clusters in the database.
Attributes:
- has_source (bool | None)
FilteredProbabilities
Bases: BaseModel
flowchart TD
matchbox.server.postgresql.adapter.main.FilteredProbabilities[FilteredProbabilities]
click matchbox.server.postgresql.adapter.main.FilteredProbabilities href "" "matchbox.server.postgresql.adapter.main.FilteredProbabilities"
Wrapper class for filtered model edge queries.
Methods:
- count – Counts the number of model edges in the database.
FilteredSteps
Bases: BaseModel
flowchart TD
matchbox.server.postgresql.adapter.main.FilteredSteps[FilteredSteps]
click matchbox.server.postgresql.adapter.main.FilteredSteps href "" "matchbox.server.postgresql.adapter.main.FilteredSteps"
Wrapper class for filtered step queries.
Methods:
- count – Counts the number of steps in the database.
Attributes:
MatchboxPostgres
¶
MatchboxPostgres(settings: MatchboxPostgresSettings)
Bases: MatchboxPostgresQueryMixin, MatchboxPostgresEvaluationMixin, MatchboxPostgresCollectionsMixin, MatchboxPostgresAdminMixin, MatchboxPostgresGroupsMixin, MatchboxDBAdapter
flowchart TD
matchbox.server.postgresql.adapter.main.MatchboxPostgres[MatchboxPostgres]
matchbox.server.postgresql.adapter.query.MatchboxPostgresQueryMixin[MatchboxPostgresQueryMixin]
matchbox.server.postgresql.adapter.eval.MatchboxPostgresEvaluationMixin[MatchboxPostgresEvaluationMixin]
matchbox.server.postgresql.adapter.collections.MatchboxPostgresCollectionsMixin[MatchboxPostgresCollectionsMixin]
matchbox.server.postgresql.adapter.admin.MatchboxPostgresAdminMixin[MatchboxPostgresAdminMixin]
matchbox.server.postgresql.adapter.groups.MatchboxPostgresGroupsMixin[MatchboxPostgresGroupsMixin]
matchbox.server.base.MatchboxDBAdapter[MatchboxDBAdapter]
matchbox.server.postgresql.adapter.query.MatchboxPostgresQueryMixin --> matchbox.server.postgresql.adapter.main.MatchboxPostgres
matchbox.server.postgresql.adapter.eval.MatchboxPostgresEvaluationMixin --> matchbox.server.postgresql.adapter.main.MatchboxPostgres
matchbox.server.postgresql.adapter.eval._MixinBase --> matchbox.server.postgresql.adapter.eval.MatchboxPostgresEvaluationMixin
matchbox.server.postgresql.adapter.collections.MatchboxPostgresCollectionsMixin --> matchbox.server.postgresql.adapter.main.MatchboxPostgres
matchbox.server.postgresql.adapter.admin.MatchboxPostgresAdminMixin --> matchbox.server.postgresql.adapter.main.MatchboxPostgres
matchbox.server.postgresql.adapter.groups.MatchboxPostgresGroupsMixin --> matchbox.server.postgresql.adapter.main.MatchboxPostgres
matchbox.server.base.MatchboxDBAdapter --> matchbox.server.postgresql.adapter.main.MatchboxPostgres
click matchbox.server.postgresql.adapter.main.MatchboxPostgres href "" "matchbox.server.postgresql.adapter.main.MatchboxPostgres"
click matchbox.server.postgresql.adapter.query.MatchboxPostgresQueryMixin href "" "matchbox.server.postgresql.adapter.query.MatchboxPostgresQueryMixin"
click matchbox.server.postgresql.adapter.eval.MatchboxPostgresEvaluationMixin href "" "matchbox.server.postgresql.adapter.eval.MatchboxPostgresEvaluationMixin"
click matchbox.server.postgresql.adapter.collections.MatchboxPostgresCollectionsMixin href "" "matchbox.server.postgresql.adapter.collections.MatchboxPostgresCollectionsMixin"
click matchbox.server.postgresql.adapter.admin.MatchboxPostgresAdminMixin href "" "matchbox.server.postgresql.adapter.admin.MatchboxPostgresAdminMixin"
click matchbox.server.postgresql.adapter.groups.MatchboxPostgresGroupsMixin href "" "matchbox.server.postgresql.adapter.groups.MatchboxPostgresGroupsMixin"
click matchbox.server.base.MatchboxDBAdapter href "" "matchbox.server.base.MatchboxDBAdapter"
A PostgreSQL adapter for Matchbox.
Methods:
- query
- match
- create_collection
- get_collection
- list_collections
- delete_collection
- create_run
- set_run_mutable
- set_run_default
- get_run
- delete_run
- create_step
- get_step
- update_step
- delete_step
- lock_step_data
- unlock_step_data
- get_step_stage
- insert_source_data
- insert_model_data
- insert_resolver_data
- get_model_data
- get_resolver_data
- validate_ids
- dump
- drop
- clear
- restore
- delete_orphans
- login
- get_user_groups
- list_groups
- get_group
- create_group
- delete_group
- add_user_to_group
- remove_user_from_group
- check_permission
- get_permissions
- grant_permission
- revoke_permission
- insert_judgement
- get_judgements
- sample_for_eval – Sample some clusters from a resolver step.
Attributes:
- settings
- sources
- models
- resolvers
- source_clusters
- model_clusters
- all_clusters
- creates
- merges
- proposes
- source_steps
- users
source_steps
instance-attribute
¶
source_steps = FilteredSteps(sources=True, models=False, resolvers=False)
query
¶
query(source: SourceStepPath, resolver: ResolverStepPath | None = None, return_leaf_id: bool = False, limit: int | None = None) -> Table
match
¶
match(key: str, source: SourceStepPath, targets: list[SourceStepPath], resolver: ResolverStepPath) -> list[Match]
create_collection
¶
create_collection(name: CollectionName, permissions: list[PermissionGrant]) -> Collection
check_permission
¶
check_permission(user_name: str, permission: PermissionType, resource: Literal[SYSTEM] | CollectionName) -> bool
get_permissions
¶
get_permissions(resource: Literal[SYSTEM] | CollectionName) -> list[PermissionGrant]
grant_permission
¶
grant_permission(group_name: GroupName, permission: PermissionType, resource: Literal[SYSTEM] | CollectionName) -> None
revoke_permission
¶
revoke_permission(group_name: GroupName, permission: PermissionType, resource: Literal[SYSTEM] | CollectionName) -> None
sample_for_eval
¶
sample_for_eval(n: int, path: ResolverStepPath, user_name: str) -> Table
Sample some clusters from a resolver step.
query
Query PostgreSQL mixin for Matchbox server.
Classes:
- MatchboxPostgresQueryMixin – Query mixin for the PostgreSQL adapter for Matchbox.
MatchboxPostgresQueryMixin
Query mixin for the PostgreSQL adapter for Matchbox.
Methods:
query
¶
query(source: SourceStepPath, resolver: ResolverStepPath | None = None, return_leaf_id: bool = False, limit: int | None = None) -> Table
match
¶
match(key: str, source: SourceStepPath, targets: list[SourceStepPath], resolver: ResolverStepPath) -> list[Match]
db
Matchbox PostgreSQL database connection.
Classes:
- MatchboxPostgresCoreSettings – PostgreSQL-specific settings for Matchbox.
- MatchboxPostgresSettings – Settings for the Matchbox PostgreSQL backend.
- MatchboxDatabase – Matchbox PostgreSQL database connection.
Attributes:
- MBDB
MatchboxPostgresCoreSettings
Bases: BaseModel
flowchart TD
matchbox.server.postgresql.db.MatchboxPostgresCoreSettings[MatchboxPostgresCoreSettings]
click matchbox.server.postgresql.db.MatchboxPostgresCoreSettings href "" "matchbox.server.postgresql.db.MatchboxPostgresCoreSettings"
PostgreSQL-specific settings for Matchbox.
Methods:
- get_alembic_config – Get the Alembic config.
Attributes:
- host (str)
- port (int)
- user (str)
- password (str)
- database (str)
- db_schema (str)
- alembic_config (Path)
alembic_config
class-attribute
instance-attribute
¶
MatchboxPostgresSettings
¶
Bases: MatchboxServerSettings
flowchart TD
matchbox.server.postgresql.db.MatchboxPostgresSettings[MatchboxPostgresSettings]
matchbox.server.base.MatchboxServerSettings[MatchboxServerSettings]
matchbox.server.base.MatchboxServerSettings --> matchbox.server.postgresql.db.MatchboxPostgresSettings
click matchbox.server.postgresql.db.MatchboxPostgresSettings href "" "matchbox.server.postgresql.db.MatchboxPostgresSettings"
click matchbox.server.base.MatchboxServerSettings href "" "matchbox.server.base.MatchboxServerSettings"
Settings for the Matchbox PostgreSQL backend.
Inherits the core settings and adds the PostgreSQL-specific settings.
Methods:
- validate_public_key – Validate and normalise PEM public key format.
- check_settings – Check that legal combinations of settings are provided.
Attributes:
- backend_type (MatchboxBackends)
- postgres (MatchboxPostgresCoreSettings)
- model_config
- batch_size (int)
- datastore (MatchboxDatastoreSettings)
- task_runner (Literal['api', 'celery'])
- redis_uri (str | None)
- uploads_expiry_minutes (int | None)
- authorisation (bool)
- public_key (SecretBytes | None)
- log_level (LogLevelType)
postgres
class-attribute
instance-attribute
¶
postgres: MatchboxPostgresCoreSettings = Field(default_factory=MatchboxPostgresCoreSettings)
model_config
class-attribute
instance-attribute
¶
model_config = SettingsConfigDict(env_prefix='MB__SERVER__', env_nested_delimiter='__', use_enum_values=True, env_file='.env', env_file_encoding='utf-8', extra='ignore')
public_key
class-attribute
instance-attribute
¶
validate_public_key
classmethod
¶
Validate and normalise PEM public key format.
MatchboxDatabase
¶
MatchboxDatabase(settings: MatchboxPostgresSettings)
Matchbox PostgreSQL database connection.
Methods:
- connection_string – Get the connection string for PostgreSQL.
- get_engine – Get the database engine.
- get_session – Get a new session.
- get_adbc_connection – Get a new ADBC connection wrapped by a SQLAlchemy pool proxy.
- run_migrations – Create the database and all tables expected in the schema.
- clear_database – Delete all rows in every table in the database schema.
- drop_database – Drop all tables in the database schema and recreate them.
- vacuum_analyze – Run VACUUM ANALYZE on specified tables.
Attributes:
- settings
- MatchboxBase
- alembic_config
- sorted_tables (list[Table]) – Return a list of SQLAlchemy tables in order of creation.
MatchboxBase
instance-attribute
¶
sorted_tables
property
¶
sorted_tables: list[Table]
Return a list of SQLAlchemy tables in order of creation.
connection_string
¶
Get the connection string for PostgreSQL.
get_adbc_connection
¶
Get a new ADBC connection wrapped by a SQLAlchemy pool proxy.
The pool proxy is held and managed within the context manager, yielding the connection directly.
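The pattern described here, where the context manager owns the pool proxy and the caller only sees the underlying connection, can be sketched with contextlib. FakeConnection, FakePoolProxy, and the checked_out list are hypothetical stand-ins so the example runs without a database; they are not Matchbox or SQLAlchemy classes.

```python
from collections.abc import Iterator
from contextlib import contextmanager


class FakeConnection:
    """Stand-in for the raw driver connection handed to the caller."""


class FakePoolProxy:
    """Stand-in for a pool proxy that wraps a driver connection."""

    def __init__(self) -> None:
        self.driver_connection = FakeConnection()
        self.returned = False

    def close(self) -> None:
        # For a pooled connection, close() hands it back to the pool.
        self.returned = True


checked_out: list[FakePoolProxy] = []  # records proxies for inspection


@contextmanager
def get_adbc_connection() -> Iterator[FakeConnection]:
    proxy = FakePoolProxy()  # stands in for checking out from the pool
    checked_out.append(proxy)
    try:
        # The caller only ever sees the inner connection.
        yield proxy.driver_connection
    finally:
        # The proxy is released even if the caller's block raises.
        proxy.close()


with get_adbc_connection() as conn:
    held_during = not checked_out[-1].returned
released_after = checked_out[-1].returned
print(held_during, released_after)  # True True
```

Keeping the proxy inside the context manager means callers cannot forget to return the connection to the pool.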
run_migrations
¶
Create the database and all tables expected in the schema.
clear_database
¶
Delete all rows in every table in the database schema.
- TRUNCATE tables that are part of the core ORM (preserves structure)
- DROP tables that are not in the ORM (removes temporary/test tables)
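The two rules above amount to a simple partition of the tables found in the schema. The following sketch only builds the SQL statements and does not connect to a database; plan_clear, the schema name "mb", the table name tmp_upload, and the use of CASCADE are assumptions made for illustration.

```python
def plan_clear(existing: list[str], orm_tables: set[str], schema: str = "mb") -> list[str]:
    """Return one statement per existing table: TRUNCATE if ORM-managed, else DROP."""
    statements = []
    for table in existing:
        qualified = f'"{schema}"."{table}"'
        if table in orm_tables:
            # Core ORM table: empty it but keep its structure.
            statements.append(f"TRUNCATE TABLE {qualified} CASCADE;")
        else:
            # Not part of the ORM: treat as temporary or test residue.
            statements.append(f"DROP TABLE IF EXISTS {qualified} CASCADE;")
    return statements


for statement in plan_clear(["clusters", "tmp_upload"], {"clusters"}):
    print(statement)
# TRUNCATE TABLE "mb"."clusters" CASCADE;
# DROP TABLE IF EXISTS "mb"."tmp_upload" CASCADE;
```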
drop_database
¶
Drop all tables in the database schema and recreate them.
vacuum_analyze
¶
vacuum_analyze(*table_names: str) -> None
Run VACUUM ANALYZE on specified tables.
VACUUM ANALYZE reclaims storage and updates statistics for the query planner. PostgreSQL may not fully utilise indexes until VACUUM ANALYZE is run. According to https://www.postgresql.org/docs/current/sql-vacuum.html, VACUUM ANALYZE is recommended over just ANALYZE for optimal performance.
Parameters:
- table_names (str) – Names of the tables to run VACUUM ANALYZE on.
mixin
A module for defining mixins for the PostgreSQL backend ORM.
Classes:
- CountMixin – A mixin for counting the number of rows in a table.
Attributes:
- T
orm
ORM classes for the Matchbox PostgreSQL database.
Classes:
- Collections – Named collections of steps and runs.
- Runs – Runs of collections of steps.
- StepFrom – Step lineage closure table.
- Steps – Table of steps corresponding to models, resolvers, and sources.
- SourceFields – Table for storing column details for SourceConfigs.
- ClusterSourceKey – Table for storing source primary keys for clusters.
- SourceConfigs – Table of source_configs of data for Matchbox.
- ModelConfigs – Table of model configs for Matchbox.
- ResolverConfigs – Table of resolver configs for Matchbox.
- Contains – Cluster lineage table.
- Clusters – Table of indexed data and clusters that match it.
- UserGroups – Association table for user-group membership.
- Users – Table of user identities.
- Groups – Groups for permission management.
- Permissions – Permissions granted to groups on resources.
- EvalJudgements – Table of evaluation judgements produced by human validators.
- ModelEdges – Table of results for a model step.
- ResolverClusters – Association table linking resolver steps to cluster IDs.
Collections
¶
Bases: CountMixin, MatchboxBase
flowchart TD
matchbox.server.postgresql.orm.Collections[Collections]
matchbox.server.postgresql.mixin.CountMixin[CountMixin]
matchbox.server.postgresql.mixin.CountMixin --> matchbox.server.postgresql.orm.Collections
click matchbox.server.postgresql.orm.Collections href "" "matchbox.server.postgresql.orm.Collections"
click matchbox.server.postgresql.mixin.CountMixin href "" "matchbox.server.postgresql.mixin.CountMixin"
Named collections of steps and runs.
Methods:
-
from_name–Resolve a collection name to a Collections object.
-
to_dto–Convert ORM collection to a matchbox.common Collection object.
-
count–Counts the number of rows in the table.
Attributes:
-
__tablename__– -
collection_id(Mapped[int]) – -
name(Mapped[str]) – -
runs(Mapped[list[Runs]]) – -
permissions(Mapped[list[Permissions]]) – -
__table_args__–
collection_id
class-attribute
instance-attribute
¶
collection_id: Mapped[int] = mapped_column(BIGINT, primary_key=True, autoincrement=True)
runs
class-attribute
instance-attribute
¶
permissions
class-attribute
instance-attribute
¶
permissions: Mapped[list[Permissions]] = relationship(back_populates='collection', passive_deletes=True)
__table_args__
class-attribute
instance-attribute
¶
from_name
classmethod
¶
from_name(name: CollectionName, session: Session | None = None) -> Collections
Resolve a collection name to a Collections object.
Parameters:
-
(name¶CollectionName) –The name of the collection to resolve.
-
(session¶Session | None, default:None) –Optional session to use for the query.
Raises:
-
MatchboxCollectionNotFoundError–If the collection doesn’t exist.
Runs
¶
Bases: CountMixin, MatchboxBase
flowchart TD
matchbox.server.postgresql.orm.Runs[Runs]
matchbox.server.postgresql.mixin.CountMixin[CountMixin]
matchbox.server.postgresql.mixin.CountMixin --> matchbox.server.postgresql.orm.Runs
click matchbox.server.postgresql.orm.Runs href "" "matchbox.server.postgresql.orm.Runs"
click matchbox.server.postgresql.mixin.CountMixin href "" "matchbox.server.postgresql.mixin.CountMixin"
Runs of collections of steps.
Methods:
-
from_id–Resolve a collection name and run ID to a Runs object.
-
to_dto–Convert ORM run to a matchbox.common Run object.
-
count–Counts the number of rows in the table.
Attributes:
-
__tablename__– -
run_id(Mapped[int]) – -
collection_id(Mapped[int]) – -
is_mutable(Mapped[bool]) – -
is_default(Mapped[bool]) – -
collection(Mapped[Collections]) – -
steps(Mapped[list[Steps]]) – -
__table_args__–
run_id
class-attribute
instance-attribute
¶
run_id: Mapped[int] = mapped_column(BIGINT, primary_key=True, autoincrement=True)
collection_id
class-attribute
instance-attribute
¶
collection_id: Mapped[int] = mapped_column(BIGINT, ForeignKey('collections.collection_id', ondelete='CASCADE'), nullable=False)
is_mutable
class-attribute
instance-attribute
¶
is_mutable: Mapped[bool] = mapped_column(BOOLEAN, default=False, nullable=True)
is_default
class-attribute
instance-attribute
¶
is_default: Mapped[bool] = mapped_column(BOOLEAN, default=False, nullable=True)
collection
class-attribute
instance-attribute
¶
collection: Mapped[Collections] = relationship(back_populates='runs')
steps
class-attribute
instance-attribute
¶
__table_args__
class-attribute
instance-attribute
¶
__table_args__ = (UniqueConstraint('collection_id', 'run_id', name='unique_run_id'), Index('ix_default_run_collection', 'collection_id', unique=True, postgresql_where=text('is_default = true')))
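The partial unique index `ix_default_run_collection` allows any number of non-default runs per collection but at most one default run. The sketch below reproduces that behaviour with the standard-library `sqlite3` module, used here purely as a stand-in so the example runs without a PostgreSQL server. The simplified table definition is an assumption, not the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE runs (
        run_id INTEGER PRIMARY KEY,
        collection_id INTEGER NOT NULL,
        is_default BOOLEAN DEFAULT 0
    );
    -- Partial unique index: uniqueness is enforced only for default runs.
    CREATE UNIQUE INDEX ix_default_run_collection
        ON runs (collection_id) WHERE is_default = 1;
    """
)

conn.execute("INSERT INTO runs (collection_id, is_default) VALUES (1, 1)")
conn.execute("INSERT INTO runs (collection_id, is_default) VALUES (1, 0)")  # allowed
conn.execute("INSERT INTO runs (collection_id, is_default) VALUES (2, 1)")  # allowed


def second_default_rejected() -> bool:
    """Try to add a second default run to collection 1."""
    try:
        conn.execute("INSERT INTO runs (collection_id, is_default) VALUES (1, 1)")
    except sqlite3.IntegrityError:
        return True
    return False
```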
from_id
classmethod
¶
from_id(collection: CollectionName, run_id: RunID, session: Session | None = None) -> Runs
Resolve a collection name and run ID to a Runs object.
Parameters:
-
(collection¶CollectionName) –The name of the collection containing the run.
-
(run_id¶RunID) –The ID of the run within that collection.
-
(session¶Session | None, default:None) –Optional session to use for the query.
Raises:
-
MatchboxRunNotFoundError–If the run doesn’t exist.
StepFrom
¶
Bases: CountMixin, MatchboxBase
flowchart TD
matchbox.server.postgresql.orm.StepFrom[StepFrom]
matchbox.server.postgresql.mixin.CountMixin[CountMixin]
matchbox.server.postgresql.mixin.CountMixin --> matchbox.server.postgresql.orm.StepFrom
click matchbox.server.postgresql.orm.StepFrom href "" "matchbox.server.postgresql.orm.StepFrom"
click matchbox.server.postgresql.mixin.CountMixin href "" "matchbox.server.postgresql.mixin.CountMixin"
Step lineage closure table.
Methods:
-
count–Counts the number of rows in the table.
Attributes:
-
__tablename__– -
parent(Mapped[int]) – -
child(Mapped[int]) – -
level(Mapped[int]) – -
__table_args__–
parent
class-attribute
instance-attribute
¶
parent: Mapped[int] = mapped_column(BIGINT, ForeignKey('steps.step_id', ondelete='CASCADE'), primary_key=True)
child
class-attribute
instance-attribute
¶
child: Mapped[int] = mapped_column(BIGINT, ForeignKey('steps.step_id', ondelete='CASCADE'), primary_key=True)
level
class-attribute
instance-attribute
¶
level: Mapped[int] = mapped_column(INTEGER, nullable=False, primary_key=True)
__table_args__
class-attribute
instance-attribute
¶
__table_args__ = (CheckConstraint('parent != child', name='no_self_reference'), CheckConstraint('level > 0', name='positive_level'))
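A closure table stores one row for every ancestor-descendant pair, not only for direct links, with `level` recording the distance (level 1 being a direct parent). The following sketch shows how direct parent links could be expanded into such rows. The function name and the integer step IDs are hypothetical, and the code assumes the step graph is acyclic.

```python
def closure_rows(parents: dict[int, list[int]]) -> set[tuple[int, int, int]]:
    """Expand direct parent links into (parent, child, level) closure rows.

    `parents` maps a step ID to the IDs of its direct parents.
    Assumes the graph is a DAG; a cycle would make this loop forever.
    """
    rows: set[tuple[int, int, int]] = set()
    for child in parents:
        frontier = [(parent, 1) for parent in parents[child]]
        while frontier:
            ancestor, level = frontier.pop()
            if (ancestor, child, level) in rows:
                continue
            rows.add((ancestor, child, level))
            # Parents of this ancestor are one level further from the child.
            frontier.extend((gp, level + 1) for gp in parents.get(ancestor, []))
    return rows


# Two sources (1, 2) feed a model (3), which feeds a resolver (4).
rows = closure_rows({3: [1, 2], 4: [3]})
```

Every generated row satisfies both check constraints above: no row has `parent == child`, and every `level` is positive.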
Steps
¶
Bases: CountMixin, MatchboxBase
flowchart TD
matchbox.server.postgresql.orm.Steps[Steps]
matchbox.server.postgresql.mixin.CountMixin[CountMixin]
matchbox.server.postgresql.mixin.CountMixin --> matchbox.server.postgresql.orm.Steps
click matchbox.server.postgresql.orm.Steps href "" "matchbox.server.postgresql.orm.Steps"
click matchbox.server.postgresql.mixin.CountMixin href "" "matchbox.server.postgresql.mixin.CountMixin"
Table of steps corresponding to models, resolvers, and sources.
Models produce edges and resolvers produce cluster assignments.
Methods:
-
get_lineage–Returns lineage ordered by priority.
-
from_path–Resolve a step path to a Step ORM object.
-
from_dto–Create a Steps instance from a Step DTO object.
-
to_dto–Convert ORM step to a matchbox.common Step object.
-
count–Counts the number of rows in the table.
Attributes:
-
__tablename__– -
step_id(Mapped[int]) – -
run_id(Mapped[int]) – -
upload_stage(Mapped[UploadStage]) – -
name(Mapped[str]) – -
description(Mapped[str | None]) – -
type(Mapped[str]) – -
fingerprint(Mapped[bytes]) – -
source_config(Mapped[Optional[SourceConfigs]]) – -
model_config(Mapped[Optional[ModelConfigs]]) – -
resolver_config(Mapped[Optional[ResolverConfigs]]) – -
model_edges(Mapped[list[ModelEdges]]) – -
resolver_clusters(Mapped[list[ResolverClusters]]) – -
parents(Mapped[list[Steps]]) – -
run(Mapped[Runs]) – -
__table_args__– -
ancestors(set[Steps]) –Return all ancestors (parents, grandparents, etc.) of this step.
-
descendants(set[Steps]) –Return descendants (children, grandchildren, etc.) of this step.
step_id
class-attribute
instance-attribute
¶
step_id: Mapped[int] = mapped_column(BIGINT, primary_key=True, autoincrement=True)
run_id
class-attribute
instance-attribute
¶
run_id: Mapped[int] = mapped_column(BIGINT, ForeignKey('runs.run_id', ondelete='CASCADE'), nullable=False)
upload_stage
class-attribute
instance-attribute
¶
upload_stage: Mapped[UploadStage] = mapped_column(Enum(UploadStage, native_enum=True, name='upload_stages', schema='mb'), nullable=False, default=READY)
description
class-attribute
instance-attribute
¶
description: Mapped[str | None] = mapped_column(TEXT, nullable=True)
fingerprint
class-attribute
instance-attribute
¶
fingerprint: Mapped[bytes] = mapped_column(BYTEA, nullable=False)
source_config
class-attribute
instance-attribute
¶
source_config: Mapped[Optional[SourceConfigs]] = relationship(back_populates='source_step', uselist=False)
model_config
class-attribute
instance-attribute
¶
model_config: Mapped[Optional[ModelConfigs]] = relationship(back_populates='model_step', uselist=False)
resolver_config
class-attribute
instance-attribute
¶
resolver_config: Mapped[Optional[ResolverConfigs]] = relationship(back_populates='resolver_step', uselist=False)
model_edges
class-attribute
instance-attribute
¶
model_edges: Mapped[list[ModelEdges]] = relationship(back_populates='proposed_by', passive_deletes=True)
resolver_clusters
class-attribute
instance-attribute
¶
resolver_clusters: Mapped[list[ResolverClusters]] = relationship(back_populates='proposed_by', passive_deletes=True)
parents
class-attribute
instance-attribute
¶
parents: Mapped[list[Steps]] = relationship(secondary=__table__, primaryjoin=dedent(' and_(\n Steps.step_id == StepFrom.child,\n StepFrom.level == 1\n )\n '), secondaryjoin='Steps.step_id == StepFrom.parent', viewonly=True, order_by='StepFrom.parent')
__table_args__
class-attribute
instance-attribute
¶
__table_args__ = (CheckConstraint("type IN ('model', 'source', 'resolver')", name='step_type_constraints'), UniqueConstraint('run_id', 'name', name='steps_name_key'))
ancestors
property
¶
Return all ancestors (parents, grandparents, etc.) of this step.
descendants
property
¶
Return descendants (children, grandchildren, etc.) of this step.
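With a closure table, both properties reduce to a single filter over `(parent, child, level)` rows, with no recursion needed. A small sketch using plain integers in place of `Steps` objects; the `ROWS` data and both helper functions are illustrative assumptions rather than the ORM's code:

```python
# Hypothetical closure rows: sources 1 and 2 feed model 3, which feeds resolver 4.
ROWS: set[tuple[int, int, int]] = {
    (1, 3, 1),
    (2, 3, 1),
    (3, 4, 1),
    (1, 4, 2),
    (2, 4, 2),
}


def ancestors(step_id: int, rows: set[tuple[int, int, int]]) -> set[int]:
    """All steps that appear as a parent of `step_id` at any level."""
    return {parent for parent, child, _ in rows if child == step_id}


def descendants(step_id: int, rows: set[tuple[int, int, int]]) -> set[int]:
    """All steps that appear as a child of `step_id` at any level."""
    return {child for parent, child, _ in rows if parent == step_id}
```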
get_lineage
¶
get_lineage(sources: list[SourceConfigs] | None = None, queryable_only: bool = False) -> list[tuple[int, int | None]]
Returns lineage ordered by priority.
Highest priority (lowest level) first, then by step_id for stability.
Parameters:
-
(sources¶list[SourceConfigs] | None, default:None) –If provided, only return lineage paths that lead to these sources
-
(queryable_only¶bool, default:False) –If true, only include queryable step types
Returns:
from_path
classmethod
¶
from_path(path: StepPath, res_type: StepType | None = None, session: Session | None = None, for_update: bool = False) -> Steps
Resolve a step path to a Step ORM object.
Parameters:
-
(path¶StepPath) –The path of the step to resolve.
-
(res_type¶StepType | None, default:None) –A step type to use as filter.
-
(session¶Session | None, default:None) –A session to get the step for updates.
-
(for_update¶bool, default:False) –Locks the row until updated.
Raises:
-
MatchboxStepNotFoundError–If the step doesn’t exist.
from_dto
classmethod
¶
Create a Steps instance from a Step DTO object.
The step will be added to the session and flushed (but not committed).
For model steps, lineage entries will be created automatically.
Parameters:
-
(step¶Step) –The Step DTO to convert
-
(path¶StepPath) –The full step path
-
(session¶Session) –Database session (caller must commit)
Returns:
-
Steps–A Steps ORM instance with ID and relationships established
SourceFields
¶
Bases: CountMixin, MatchboxBase
flowchart TD
matchbox.server.postgresql.orm.SourceFields[SourceFields]
matchbox.server.postgresql.mixin.CountMixin[CountMixin]
matchbox.server.postgresql.mixin.CountMixin --> matchbox.server.postgresql.orm.SourceFields
click matchbox.server.postgresql.orm.SourceFields href "" "matchbox.server.postgresql.orm.SourceFields"
click matchbox.server.postgresql.mixin.CountMixin href "" "matchbox.server.postgresql.mixin.CountMixin"
Table for storing column details for SourceConfigs.
Methods:
-
count–Counts the number of rows in the table.
Attributes:
-
__tablename__– -
field_id(Mapped[int]) – -
source_config_id(Mapped[int]) – -
index(Mapped[int]) – -
name(Mapped[str]) – -
type(Mapped[str]) – -
is_key(Mapped[bool]) – -
source_config(Mapped[SourceConfigs]) – -
__table_args__–
field_id
class-attribute
instance-attribute
¶
field_id: Mapped[int] = mapped_column(BIGINT, primary_key=True)
source_config_id
class-attribute
instance-attribute
¶
source_config_id: Mapped[int] = mapped_column(BIGINT, ForeignKey('source_configs.source_config_id', ondelete='CASCADE'), nullable=False)
index
class-attribute
instance-attribute
¶
index: Mapped[int] = mapped_column(INTEGER, nullable=False)
is_key
class-attribute
instance-attribute
¶
is_key: Mapped[bool] = mapped_column(BOOLEAN, nullable=False)
source_config
class-attribute
instance-attribute
¶
source_config: Mapped[SourceConfigs] = relationship(back_populates='fields', foreign_keys=[source_config_id])
__table_args__
class-attribute
instance-attribute
¶
__table_args__ = (UniqueConstraint('source_config_id', 'index', name='unique_index'), Index('ix_source_columns_source_config_id', 'source_config_id'), Index('ix_unique_key_field', 'source_config_id', unique=True, postgresql_where=text('is_key = true')))
ClusterSourceKey
¶
Bases: CountMixin, MatchboxBase
flowchart TD
matchbox.server.postgresql.orm.ClusterSourceKey[ClusterSourceKey]
matchbox.server.postgresql.mixin.CountMixin[CountMixin]
matchbox.server.postgresql.mixin.CountMixin --> matchbox.server.postgresql.orm.ClusterSourceKey
click matchbox.server.postgresql.orm.ClusterSourceKey href "" "matchbox.server.postgresql.orm.ClusterSourceKey"
click matchbox.server.postgresql.mixin.CountMixin href "" "matchbox.server.postgresql.mixin.CountMixin"
Table for storing source primary keys for clusters.
Methods:
-
count–Counts the number of rows in the table.
Attributes:
-
__tablename__– -
key_id(Mapped[int]) – -
cluster_id(Mapped[int]) – -
source_config_id(Mapped[int]) – -
key(Mapped[str]) – -
cluster(Mapped[Clusters]) – -
source_config(Mapped[SourceConfigs]) – -
__table_args__–
key_id
class-attribute
instance-attribute
¶
key_id: Mapped[int] = mapped_column(BIGINT, primary_key=True)
cluster_id
class-attribute
instance-attribute
¶
cluster_id: Mapped[int] = mapped_column(BIGINT, ForeignKey('clusters.cluster_id', ondelete='CASCADE'), nullable=False)
source_config_id
class-attribute
instance-attribute
¶
source_config_id: Mapped[int] = mapped_column(BIGINT, ForeignKey('source_configs.source_config_id', ondelete='CASCADE'), nullable=False)
cluster
class-attribute
instance-attribute
¶
cluster: Mapped[Clusters] = relationship(back_populates='keys')
source_config
class-attribute
instance-attribute
¶
source_config: Mapped[SourceConfigs] = relationship(back_populates='cluster_keys')
__table_args__
class-attribute
instance-attribute
¶
__table_args__ = (Index('ix_cluster_keys_cluster_id', 'cluster_id'), Index('ix_cluster_keys_keys', 'key'), Index('ix_cluster_keys_source_config_id', 'source_config_id'), UniqueConstraint('key_id', 'source_config_id', name='unique_keys_source'))
SourceConfigs
¶
SourceConfigs(key_field: SourceFields | None = None, index_fields: list[SourceFields] | None = None, **kwargs: Any)
Bases: CountMixin, MatchboxBase
flowchart TD
matchbox.server.postgresql.orm.SourceConfigs[SourceConfigs]
matchbox.server.postgresql.mixin.CountMixin[CountMixin]
matchbox.server.postgresql.mixin.CountMixin --> matchbox.server.postgresql.orm.SourceConfigs
click matchbox.server.postgresql.orm.SourceConfigs href "" "matchbox.server.postgresql.orm.SourceConfigs"
click matchbox.server.postgresql.mixin.CountMixin href "" "matchbox.server.postgresql.mixin.CountMixin"
Table of source_configs of data for Matchbox.
Methods:
-
list_all–Returns all source_configs in the database.
-
from_dto–Create a SourceConfigs instance from a SourceConfig DTO object.
-
to_dto–Convert ORM source to a matchbox.common.SourceConfig object.
-
count–Counts the number of rows in the table.
Attributes:
-
__tablename__– -
source_config_id(Mapped[int]) – -
step_id(Mapped[int]) – -
location_type(Mapped[str]) – -
location_name(Mapped[str]) – -
extract_transform(Mapped[str]) – -
name(str) –Get the name of the related step.
-
source_step(Mapped[Steps]) – -
fields(Mapped[list[SourceFields]]) – -
key_field(Mapped[Optional[SourceFields]]) – -
index_fields(Mapped[list[SourceFields]]) – -
cluster_keys(Mapped[list[ClusterSourceKey]]) – -
clusters(Mapped[list[Clusters]]) –
source_config_id
class-attribute
instance-attribute
¶
source_config_id: Mapped[int] = mapped_column(BIGINT, Identity(start=1), primary_key=True)
step_id
class-attribute
instance-attribute
¶
step_id: Mapped[int] = mapped_column(BIGINT, ForeignKey('steps.step_id', ondelete='CASCADE'), nullable=False)
location_type
class-attribute
instance-attribute
¶
location_type: Mapped[str] = mapped_column(TEXT, nullable=False)
location_name
class-attribute
instance-attribute
¶
location_name: Mapped[str] = mapped_column(TEXT, nullable=False)
extract_transform
class-attribute
instance-attribute
¶
extract_transform: Mapped[str] = mapped_column(TEXT, nullable=False)
source_step
class-attribute
instance-attribute
¶
source_step: Mapped[Steps] = relationship(back_populates='source_config')
fields
class-attribute
instance-attribute
¶
fields: Mapped[list[SourceFields]] = relationship(back_populates='source_config', passive_deletes=True, cascade='all, delete-orphan')
key_field
class-attribute
instance-attribute
¶
key_field: Mapped[Optional[SourceFields]] = relationship(primaryjoin='and_(SourceConfigs.source_config_id == SourceFields.source_config_id, SourceFields.is_key == True)', viewonly=True, uselist=False)
index_fields
class-attribute
instance-attribute
¶
index_fields: Mapped[list[SourceFields]] = relationship(primaryjoin='and_(SourceConfigs.source_config_id == SourceFields.source_config_id, SourceFields.is_key == False)', viewonly=True, order_by='SourceFields.index', collection_class=list)
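The `key_field` and `index_fields` relationships are two filtered views over the same `fields` collection: the single field flagged `is_key`, and the remaining fields ordered by `index`. The same split can be sketched in plain Python. The `Field` dataclass and `split_fields` helper are hypothetical stand-ins for the ORM objects.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for a SourceFields row."""

    index: int
    name: str
    is_key: bool


def split_fields(fields: list[Field]) -> tuple[Optional[Field], list[Field]]:
    """Return (key_field, index_fields), with index fields ordered by `index`."""
    keys = [f for f in fields if f.is_key]
    if len(keys) > 1:
        # Mirrors the partial unique index allowing one key field per source config.
        raise ValueError("at most one key field per source config")
    index_fields = sorted((f for f in fields if not f.is_key), key=lambda f: f.index)
    return (keys[0] if keys else None), index_fields


key_field, index_fields = split_fields(
    [Field(2, "postcode", False), Field(0, "id", True), Field(1, "name", False)]
)
```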
cluster_keys
class-attribute
instance-attribute
¶
cluster_keys: Mapped[list[ClusterSourceKey]] = relationship(back_populates='source_config', passive_deletes=True)
clusters
class-attribute
instance-attribute
¶
clusters: Mapped[list[Clusters]] = relationship(secondary=__table__, primaryjoin='SourceConfigs.source_config_id == ClusterSourceKey.source_config_id', secondaryjoin='ClusterSourceKey.cluster_id == Clusters.cluster_id', viewonly=True)
list_all
classmethod
¶
list_all() -> list[SourceConfigs]
Returns all source_configs in the database.
from_dto
classmethod
¶
from_dto(config: SourceConfig) -> SourceConfigs
Create a SourceConfigs instance from a SourceConfig DTO object.
ModelConfigs
¶
Bases: CountMixin, MatchboxBase
flowchart TD
matchbox.server.postgresql.orm.ModelConfigs[ModelConfigs]
matchbox.server.postgresql.mixin.CountMixin[CountMixin]
matchbox.server.postgresql.mixin.CountMixin --> matchbox.server.postgresql.orm.ModelConfigs
click matchbox.server.postgresql.orm.ModelConfigs href "" "matchbox.server.postgresql.orm.ModelConfigs"
click matchbox.server.postgresql.mixin.CountMixin href "" "matchbox.server.postgresql.mixin.CountMixin"
Table of model configs for Matchbox.
Methods:
-
list_all–Returns all model_configs in the database.
-
from_dto–Create a ModelConfigs instance from a ModelConfig DTO object.
-
to_dto–Convert ORM model config to a matchbox.common.ModelConfig object.
-
count–Counts the number of rows in the table.
Attributes:
-
__tablename__– -
model_config_id(Mapped[int]) – -
step_id(Mapped[int]) – -
model_class(Mapped[str]) – -
model_settings(Mapped[dict[str, Any]]) – -
left_query(Mapped[dict[str, Any]]) – -
right_query(Mapped[dict[str, Any] | None]) – -
name(str) –Get the name of the related step.
-
model_step(Mapped[Steps]) –
model_config_id
class-attribute
instance-attribute
¶
model_config_id: Mapped[int] = mapped_column(BIGINT, Identity(start=1), primary_key=True)
step_id
class-attribute
instance-attribute
¶
step_id: Mapped[int] = mapped_column(BIGINT, ForeignKey('steps.step_id', ondelete='CASCADE'), nullable=False)
model_class
class-attribute
instance-attribute
¶
model_class: Mapped[str] = mapped_column(TEXT, nullable=False)
model_settings
class-attribute
instance-attribute
¶
left_query
class-attribute
instance-attribute
¶
right_query
class-attribute
instance-attribute
¶
model_step
class-attribute
instance-attribute
¶
model_step: Mapped[Steps] = relationship(back_populates='model_config')
from_dto
classmethod
¶
from_dto(config: ModelConfig) -> ModelConfigs
Create a ModelConfigs instance from a ModelConfig DTO object.
ResolverConfigs
¶
Bases: CountMixin, MatchboxBase
flowchart TD
matchbox.server.postgresql.orm.ResolverConfigs[ResolverConfigs]
matchbox.server.postgresql.mixin.CountMixin[CountMixin]
matchbox.server.postgresql.mixin.CountMixin --> matchbox.server.postgresql.orm.ResolverConfigs
click matchbox.server.postgresql.orm.ResolverConfigs href "" "matchbox.server.postgresql.orm.ResolverConfigs"
click matchbox.server.postgresql.mixin.CountMixin href "" "matchbox.server.postgresql.mixin.CountMixin"
Table of resolver configs for Matchbox.
Methods:
-
from_dto–Create a ResolverConfigs instance from a ResolverConfig DTO object.
-
to_dto–Convert ORM resolver config to a matchbox.common ResolverConfig object.
-
count–Counts the number of rows in the table.
Attributes:
-
__tablename__– -
resolver_config_id(Mapped[int]) – -
step_id(Mapped[int]) – -
resolver_class(Mapped[str]) – -
resolver_settings(Mapped[dict[str, Any]]) – -
resolver_step(Mapped[Steps]) – -
__table_args__–
resolver_config_id
class-attribute
instance-attribute
¶
resolver_config_id: Mapped[int] = mapped_column(BIGINT, Identity(start=1), primary_key=True)
step_id
class-attribute
instance-attribute
¶
step_id: Mapped[int] = mapped_column(BIGINT, ForeignKey('steps.step_id', ondelete='CASCADE'), nullable=False)
resolver_class
class-attribute
instance-attribute
¶
resolver_class: Mapped[str] = mapped_column(TEXT, nullable=False)
resolver_settings
class-attribute
instance-attribute
¶
resolver_step
class-attribute
instance-attribute
¶
resolver_step: Mapped[Steps] = relationship(back_populates='resolver_config')
__table_args__
class-attribute
instance-attribute
¶
from_dto
classmethod
¶
from_dto(config: ResolverConfig) -> ResolverConfigs
Create a ResolverConfigs instance from a ResolverConfig DTO object.
to_dto
¶
to_dto() -> ResolverConfig
Convert ORM resolver config to a matchbox.common ResolverConfig object.
Contains
¶
Bases: CountMixin, MatchboxBase
flowchart TD
matchbox.server.postgresql.orm.Contains[Contains]
matchbox.server.postgresql.mixin.CountMixin[CountMixin]
matchbox.server.postgresql.mixin.CountMixin --> matchbox.server.postgresql.orm.Contains
click matchbox.server.postgresql.orm.Contains href "" "matchbox.server.postgresql.orm.Contains"
click matchbox.server.postgresql.mixin.CountMixin href "" "matchbox.server.postgresql.mixin.CountMixin"
Cluster lineage table.
Methods:
-
count–Counts the number of rows in the table.
Attributes:
-
__tablename__– -
root(Mapped[int]) – -
leaf(Mapped[int]) – -
__table_args__–
root
class-attribute
instance-attribute
¶
root: Mapped[int] = mapped_column(BIGINT, ForeignKey('clusters.cluster_id', ondelete='CASCADE'), primary_key=True)
leaf
class-attribute
instance-attribute
¶
leaf: Mapped[int] = mapped_column(BIGINT, ForeignKey('clusters.cluster_id', ondelete='CASCADE'), primary_key=True)
__table_args__
class-attribute
instance-attribute
¶
__table_args__ = (CheckConstraint('root != leaf', name='no_self_containment'), UniqueConstraint('root', 'leaf'), Index('ix_contains_root_leaf', 'root', 'leaf'), Index('ix_contains_leaf_root', 'leaf', 'root'))
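To make the root/leaf relationship concrete, the sketch below builds a cut-down version of the two tables with the standard-library `sqlite3` module, chosen only so the example runs without a PostgreSQL server. The simplified columns and helper functions are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this for ON DELETE CASCADE
conn.executescript(
    """
    CREATE TABLE clusters (cluster_id INTEGER PRIMARY KEY);
    CREATE TABLE contains (
        root INTEGER NOT NULL REFERENCES clusters (cluster_id) ON DELETE CASCADE,
        leaf INTEGER NOT NULL REFERENCES clusters (cluster_id) ON DELETE CASCADE,
        PRIMARY KEY (root, leaf),
        CONSTRAINT no_self_containment CHECK (root != leaf)
    );
    """
)

# Cluster 3 is a merged cluster containing source clusters 1 and 2.
conn.executemany("INSERT INTO clusters (cluster_id) VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO contains (root, leaf) VALUES (?, ?)", [(3, 1), (3, 2)])


def leaves_of(root: int) -> list[int]:
    """Return the leaf cluster IDs contained in `root`."""
    cursor = conn.execute(
        "SELECT leaf FROM contains WHERE root = ? ORDER BY leaf", (root,)
    )
    return [leaf for (leaf,) in cursor]


def rejects_self_containment() -> bool:
    """A cluster may not contain itself."""
    try:
        conn.execute("INSERT INTO contains (root, leaf) VALUES (1, 1)")
    except sqlite3.IntegrityError:
        return True
    return False
```

Because both columns cascade on delete, removing a cluster also removes every containment row that mentions it.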
Clusters
¶
Bases: CountMixin, MatchboxBase
flowchart TD
matchbox.server.postgresql.orm.Clusters[Clusters]
matchbox.server.postgresql.mixin.CountMixin[CountMixin]
matchbox.server.postgresql.mixin.CountMixin --> matchbox.server.postgresql.orm.Clusters
click matchbox.server.postgresql.orm.Clusters href "" "matchbox.server.postgresql.orm.Clusters"
click matchbox.server.postgresql.mixin.CountMixin href "" "matchbox.server.postgresql.mixin.CountMixin"
Table of indexed data and clusters that match it.
Methods:
-
count–Counts the number of rows in the table.
Attributes:
-
__tablename__– -
cluster_id(Mapped[int]) – -
cluster_hash(Mapped[bytes]) – -
keys(Mapped[list[ClusterSourceKey]]) – -
leaves(Mapped[list[Clusters]]) – -
source_configs(Mapped[list[SourceConfigs]]) – -
__table_args__–
cluster_id
class-attribute
instance-attribute
¶
cluster_id: Mapped[int] = mapped_column(BIGINT, primary_key=True)
cluster_hash
class-attribute
instance-attribute
¶
cluster_hash: Mapped[bytes] = mapped_column(BYTEA, nullable=False)
keys
class-attribute
instance-attribute
¶
keys: Mapped[list[ClusterSourceKey]] = relationship(back_populates='cluster', passive_deletes=True)
leaves
class-attribute
instance-attribute
¶
leaves: Mapped[list[Clusters]] = relationship(secondary=__table__, primaryjoin='Clusters.cluster_id == Contains.root', secondaryjoin='Clusters.cluster_id == Contains.leaf', backref='roots')
source_configs
class-attribute
instance-attribute
¶
source_configs: Mapped[list[SourceConfigs]] = relationship(secondary=__table__, primaryjoin='Clusters.cluster_id == ClusterSourceKey.cluster_id', secondaryjoin='ClusterSourceKey.source_config_id == SourceConfigs.source_config_id', viewonly=True)
__table_args__
class-attribute
instance-attribute
¶
UserGroups
¶
Bases: MatchboxBase
flowchart TD
matchbox.server.postgresql.orm.UserGroups[UserGroups]
click matchbox.server.postgresql.orm.UserGroups href "" "matchbox.server.postgresql.orm.UserGroups"
Association table for user-group membership.
Attributes:
-
__tablename__– -
user_id(Mapped[int]) – -
group_id(Mapped[int]) –
Users
¶
Bases: CountMixin, MatchboxBase
flowchart TD
matchbox.server.postgresql.orm.Users[Users]
matchbox.server.postgresql.mixin.CountMixin[CountMixin]
matchbox.server.postgresql.mixin.CountMixin --> matchbox.server.postgresql.orm.Users
click matchbox.server.postgresql.orm.Users href "" "matchbox.server.postgresql.orm.Users"
click matchbox.server.postgresql.mixin.CountMixin href "" "matchbox.server.postgresql.mixin.CountMixin"
Table of user identities.
Methods:
-
count–Counts the number of rows in the table.
Attributes:
-
__tablename__– -
user_id(Mapped[int]) – -
name(Mapped[str]) – -
email(Mapped[str]) – -
judgements(Mapped[list[EvalJudgements]]) – -
groups(Mapped[list[Groups]]) – -
__table_args__–
user_id
class-attribute
instance-attribute
¶
user_id: Mapped[int] = mapped_column(BIGINT, primary_key=True)
judgements
class-attribute
instance-attribute
¶
judgements: Mapped[list[EvalJudgements]] = relationship(back_populates='user')
groups
class-attribute
instance-attribute
¶
__table_args__
class-attribute
instance-attribute
¶
Groups
¶
Bases: CountMixin, MatchboxBase
flowchart TD
matchbox.server.postgresql.orm.Groups[Groups]
matchbox.server.postgresql.mixin.CountMixin[CountMixin]
matchbox.server.postgresql.mixin.CountMixin --> matchbox.server.postgresql.orm.Groups
click matchbox.server.postgresql.orm.Groups href "" "matchbox.server.postgresql.orm.Groups"
click matchbox.server.postgresql.mixin.CountMixin href "" "matchbox.server.postgresql.mixin.CountMixin"
Groups for permission management.
Methods:
-
initialise–Create standard users, groups, and permissions.
-
count–Counts the number of rows in the table.
Attributes:
-
__tablename__– -
group_id(Mapped[int]) – -
name(Mapped[str]) – -
description(Mapped[str | None]) – -
is_system(Mapped[bool]) – -
members(Mapped[list[Users]]) – -
permissions(Mapped[list[Permissions]]) – -
__table_args__–
group_id
class-attribute
instance-attribute
¶
group_id: Mapped[int] = mapped_column(BIGINT, primary_key=True, autoincrement=True)
description
class-attribute
instance-attribute
¶
description: Mapped[str | None] = mapped_column(TEXT, nullable=True)
is_system
class-attribute
instance-attribute
¶
is_system: Mapped[bool] = mapped_column(BOOLEAN, default=False, nullable=False)
members
class-attribute
instance-attribute
¶
permissions
class-attribute
instance-attribute
¶
permissions: Mapped[list[Permissions]] = relationship(back_populates='group', passive_deletes=True)
__table_args__
class-attribute
instance-attribute
¶
Permissions
¶
Bases: CountMixin, MatchboxBase
flowchart TD
matchbox.server.postgresql.orm.Permissions[Permissions]
matchbox.server.postgresql.mixin.CountMixin[CountMixin]
matchbox.server.postgresql.mixin.CountMixin --> matchbox.server.postgresql.orm.Permissions
click matchbox.server.postgresql.orm.Permissions href "" "matchbox.server.postgresql.orm.Permissions"
click matchbox.server.postgresql.mixin.CountMixin href "" "matchbox.server.postgresql.mixin.CountMixin"
Permissions granted to groups on resources.
Each resource type has its own column. This produces many NULLs, but NULLs are cheap in PostgreSQL and the table is small, and the design avoids a polymorphic association.
Methods:
-
count–Counts the number of rows in the table.
Attributes:
-
__tablename__– -
permission_id(Mapped[int]) – -
permission(Mapped[str]) – -
group_id(Mapped[int]) – -
collection_id(Mapped[int | None]) – -
is_system(Mapped[bool | None]) – -
group(Mapped[Groups]) – -
collection(Mapped[Collections | None]) – -
__table_args__–
permission_id
class-attribute
instance-attribute
¶
permission_id: Mapped[int] = mapped_column(BIGINT, primary_key=True, autoincrement=True)
permission
class-attribute
instance-attribute
¶
permission: Mapped[str] = mapped_column(TEXT, nullable=False)
group_id
class-attribute
instance-attribute
¶
group_id: Mapped[int] = mapped_column(BIGINT, ForeignKey('groups.group_id', ondelete='CASCADE'), nullable=False)
collection_id
class-attribute
instance-attribute
¶
collection_id: Mapped[int | None] = mapped_column(BIGINT, ForeignKey('collections.collection_id', ondelete='CASCADE'), nullable=True)
is_system
class-attribute
instance-attribute
¶
is_system: Mapped[bool | None] = mapped_column(BOOLEAN, nullable=True)
group
class-attribute
instance-attribute
¶
group: Mapped[Groups] = relationship(back_populates='permissions')
collection
class-attribute
instance-attribute
¶
collection: Mapped[Collections | None] = relationship(back_populates='permissions')
__table_args__
class-attribute
instance-attribute
¶
__table_args__ = (CheckConstraint("permission IN ('read', 'write', 'admin')", name='valid_permission'), CheckConstraint('(collection_id IS NOT NULL AND is_system IS NULL) OR (collection_id IS NULL AND is_system = true)', name='exactly_one_resource'), UniqueConstraint('permission', 'group_id', 'collection_id', 'is_system', name='unique_permission_grant', postgresql_nulls_not_distinct=True))
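The two check constraints express "a known permission level on exactly one resource". The sketch below mirrors that intent in Python; `is_valid_grant` is a hypothetical helper, not part of the ORM. One caveat worth noting: a SQL CHECK constraint passes when its expression evaluates to NULL, so a row with both `collection_id` and `is_system` set to NULL would not be rejected by `exactly_one_resource` itself, whereas this sketch is deliberately stricter and rejects it.

```python
from typing import Optional

VALID_PERMISSIONS = {"read", "write", "admin"}


def is_valid_grant(
    permission: str, collection_id: Optional[int], is_system: Optional[bool]
) -> bool:
    """Return True if the grant targets exactly one resource with a known level."""
    if permission not in VALID_PERMISSIONS:
        return False
    on_collection = collection_id is not None and is_system is None
    on_system = collection_id is None and is_system is True
    return on_collection or on_system
```

For example, `is_valid_grant("read", 1, None)` describes a collection-level grant, while `is_valid_grant("admin", None, True)` describes a system-level one.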
EvalJudgements
¶
Bases: CountMixin, MatchboxBase
flowchart TD
matchbox.server.postgresql.orm.EvalJudgements[EvalJudgements]
matchbox.server.postgresql.mixin.CountMixin[CountMixin]
matchbox.server.postgresql.mixin.CountMixin --> matchbox.server.postgresql.orm.EvalJudgements
click matchbox.server.postgresql.orm.EvalJudgements href "" "matchbox.server.postgresql.orm.EvalJudgements"
click matchbox.server.postgresql.mixin.CountMixin href "" "matchbox.server.postgresql.mixin.CountMixin"
Table of evaluation judgements produced by human validators.
Methods:
-
count–Counts the number of rows in the table.
Attributes:
-
__tablename__– -
judgement_id(Mapped[int]) – -
user_id(Mapped[int]) – -
endorsed_cluster_id(Mapped[int]) – -
shown_cluster_id(Mapped[int]) – -
tag(Mapped[str]) – -
timestamp(Mapped[DateTime]) – -
user(Mapped[Users]) –
judgement_id
class-attribute
instance-attribute
¶
judgement_id: Mapped[int] = mapped_column(BIGINT, primary_key=True)
user_id
class-attribute
instance-attribute
¶
user_id: Mapped[int] = mapped_column(BIGINT, ForeignKey('users.user_id', ondelete='CASCADE'), nullable=False)
endorsed_cluster_id
class-attribute
instance-attribute
¶
endorsed_cluster_id: Mapped[int] = mapped_column(BIGINT, ForeignKey('clusters.cluster_id', ondelete='CASCADE'), nullable=False)
shown_cluster_id
class-attribute
instance-attribute
¶
shown_cluster_id: Mapped[int] = mapped_column(BIGINT, ForeignKey('clusters.cluster_id', ondelete='CASCADE'), nullable=False)
timestamp
class-attribute
instance-attribute
¶
ModelEdges
¶
Bases: CountMixin, MatchboxBase
flowchart TD
matchbox.server.postgresql.orm.ModelEdges[ModelEdges]
matchbox.server.postgresql.mixin.CountMixin[CountMixin]
matchbox.server.postgresql.mixin.CountMixin --> matchbox.server.postgresql.orm.ModelEdges
click matchbox.server.postgresql.orm.ModelEdges href "" "matchbox.server.postgresql.orm.ModelEdges"
click matchbox.server.postgresql.mixin.CountMixin href "" "matchbox.server.postgresql.mixin.CountMixin"
Table of results for a model step.
Stores the raw left/right scores created by a model.
Methods:
- count – Counts the number of rows in the table.
Attributes:
- __tablename__
- result_id (Mapped[int])
- step_id (Mapped[int])
- left_id (Mapped[int])
- right_id (Mapped[int])
- score (Mapped[float])
- proposed_by (Mapped[Steps])
- __table_args__
result_id (class-attribute, instance-attribute)
result_id: Mapped[int] = mapped_column(BIGINT, primary_key=True, autoincrement=True)
step_id (class-attribute, instance-attribute)
step_id: Mapped[int] = mapped_column(BIGINT, ForeignKey('steps.step_id', ondelete='CASCADE'), nullable=False)
left_id (class-attribute, instance-attribute)
left_id: Mapped[int] = mapped_column(BIGINT, ForeignKey('clusters.cluster_id', ondelete='CASCADE'), nullable=False)
right_id (class-attribute, instance-attribute)
right_id: Mapped[int] = mapped_column(BIGINT, ForeignKey('clusters.cluster_id', ondelete='CASCADE'), nullable=False)
score (class-attribute, instance-attribute)
score: Mapped[float] = mapped_column(REAL, nullable=False)
proposed_by (class-attribute, instance-attribute)
proposed_by: Mapped[Steps] = relationship(back_populates='model_edges')
__table_args__ (class-attribute, instance-attribute)
__table_args__ = (Index('ix_model_edges_step', 'step_id'), CheckConstraint('score >= 0.0 AND score <= 1.0', name='valid_score'), UniqueConstraint('step_id', 'left_id', 'right_id'))
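The effect of the constraints in __table_args__ can be tried in isolation. The following is only a sketch under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL and SQLAlchemy, omits the foreign keys, and the helper try_insert is hypothetical rather than part of Matchbox.

```python
import sqlite3

# Simplified stand-in for model_edges: SQLite instead of PostgreSQL,
# no foreign keys, only the constraints listed in __table_args__.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE model_edges (
        result_id INTEGER PRIMARY KEY AUTOINCREMENT,
        step_id   INTEGER NOT NULL,
        left_id   INTEGER NOT NULL,
        right_id  INTEGER NOT NULL,
        score     REAL NOT NULL,
        CONSTRAINT valid_score CHECK (score >= 0.0 AND score <= 1.0),
        UNIQUE (step_id, left_id, right_id)
    )
    """
)
conn.execute("CREATE INDEX ix_model_edges_step ON model_edges (step_id)")


def try_insert(step_id: int, left_id: int, right_id: int, score: float) -> bool:
    """Hypothetical helper: return True if the row is accepted."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO model_edges (step_id, left_id, right_id, score) "
                "VALUES (?, ?, ?, ?)",
                (step_id, left_id, right_id, score),
            )
        return True
    except sqlite3.IntegrityError:
        return False


accepted = try_insert(1, 10, 20, 0.85)     # valid edge
out_of_range = try_insert(1, 10, 30, 1.5)  # violates valid_score
duplicate = try_insert(1, 10, 20, 0.40)    # violates the unique constraint
stored = conn.execute("SELECT COUNT(*) FROM model_edges").fetchone()[0]
```

In this sketch only the first row is stored: a score outside [0, 1] and a repeated (step_id, left_id, right_id) triple are both rejected by the database rather than by application code.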
ResolverClusters
Bases: CountMixin, MatchboxBase
flowchart TD
matchbox.server.postgresql.orm.ResolverClusters[ResolverClusters]
matchbox.server.postgresql.mixin.CountMixin[CountMixin]
matchbox.server.postgresql.mixin.CountMixin --> matchbox.server.postgresql.orm.ResolverClusters
Association table linking resolver steps to cluster IDs.
Methods:
- count – Counts the number of rows in the table.
Attributes:
- __tablename__
- step_id (Mapped[int])
- cluster_id (Mapped[int])
- proposed_by (Mapped[Steps])
- __table_args__
step_id (class-attribute, instance-attribute)
step_id: Mapped[int] = mapped_column(BIGINT, ForeignKey('steps.step_id', ondelete='CASCADE'), primary_key=True)
cluster_id (class-attribute, instance-attribute)
cluster_id: Mapped[int] = mapped_column(BIGINT, ForeignKey('clusters.cluster_id', ondelete='CASCADE'), primary_key=True)
proposed_by (class-attribute, instance-attribute)
proposed_by: Mapped[Steps] = relationship(back_populates='resolver_clusters')
__table_args__ (class-attribute, instance-attribute)
utils
Utilities for using the PostgreSQL backend.
Modules:
- db – General utilities for the PostgreSQL backend.
- insert – Utilities for inserting data into the PostgreSQL backend.
- query – Utilities for querying and matching in the PostgreSQL backend.
db
General utilities for the PostgreSQL backend.
Functions:
- dump – Dumps the entire database to a snapshot.
- restore – Restores the database from a snapshot.
- grant_permission – Grant permission within a session.
- sqa_profiled – SQLAlchemy profiler.
- compile_sql – Compiles a SQLAlchemy statement into a string.
- ingest_to_temporary_table – Context manager to ingest Arrow data to a temporary table with explicit types.
dump
dump() -> MatchboxSnapshot
Dumps the entire database to a snapshot.
Returns:
- MatchboxSnapshot – A MatchboxSnapshot object of type “postgres” with the database’s current state.
restore
restore(snapshot: MatchboxSnapshot, batch_size: int) -> None
Restores the database from a snapshot.
Parameters:
- snapshot (MatchboxSnapshot) – A MatchboxSnapshot object of type “postgres” with the database’s state
- batch_size (int) – The number of records to insert in each batch
Raises:
- ValueError – If the snapshot is missing data
grant_permission
grant_permission(session: Session, group_name: GroupName, permission: PermissionType, resource: Literal[SYSTEM] | CollectionName) -> None
Grant permission within a session.
Committing (or rolling back) is left to the caller.
sqa_profiled
sqa_profiled() -> Generator[None, None, None]
SQLAlchemy profiler.
Taken directly from the SQLAlchemy docs: https://docs.sqlalchemy.org/en/20/faq/performance.html#query-profiling
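The pattern described in that FAQ wraps cProfile in a context manager. The sketch below illustrates the general pattern only; the name profiled, the ten-entry report limit, and the stand-in workload are assumptions for illustration, and the actual body of sqa_profiled is not shown on this page.

```python
import contextlib
import cProfile
import io
import pstats
from collections.abc import Generator


@contextlib.contextmanager
def profiled() -> Generator[None, None, None]:
    """Profile the enclosed block and print cumulative timings."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        yield
    finally:
        profiler.disable()
        buffer = io.StringIO()
        stats = pstats.Stats(profiler, stream=buffer).sort_stats("cumulative")
        stats.print_stats(10)  # limit the report to the top 10 entries
        print(buffer.getvalue())


with profiled():
    # Stand-in workload; in practice this would be a database call.
    total = sum(i * i for i in range(10_000))
```

Because the context manager yields nothing, it matches the Generator[None, None, None] return annotation: the caller only wraps the code to be measured.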
compile_sql
Compiles a SQLAlchemy statement into a string.
ingest_to_temporary_table
ingest_to_temporary_table(table_name: str, schema_name: str, data: Table, column_types: dict[str, TypeEngine], max_chunksize: int | None = None) -> Generator[Table, None, None]
Context manager to ingest Arrow data to a temporary table with explicit types.
Parameters:
- table_name (str) – Base name for the temporary table
- schema_name (str) – Schema where the temporary table will be created
- data (Table) – PyArrow table containing the data to ingest
- column_types (dict[str, TypeEngine]) – Map of column names to SQLAlchemy type instances
- max_chunksize (int | None, default: None) – Optional maximum chunk size for batches
Returns:
- Table – A SQLAlchemy Table object representing the temporary table
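The create, ingest in chunks, yield, then clean up lifecycle of such a context manager can be sketched with the standard library. This is a hypothetical analogue, not the Matchbox implementation: it uses sqlite3 and plain tuples instead of PostgreSQL and PyArrow, the function temporary_table is invented for illustration, and the unique name suffix is an assumption about how a "base name" might be used.

```python
import contextlib
import sqlite3
import uuid
from collections.abc import Generator, Iterable


@contextlib.contextmanager
def temporary_table(
    conn: sqlite3.Connection,
    base_name: str,
    column_types: dict[str, str],
    rows: Iterable[tuple],
    max_chunksize: int = 1000,
) -> Generator[str, None, None]:
    """Hypothetical analogue: create, fill in chunks, yield, then drop."""
    name = f"{base_name}_{uuid.uuid4().hex[:8]}"  # assumed unique suffix
    columns = ", ".join(f"{col} {typ}" for col, typ in column_types.items())
    placeholders = ", ".join("?" for _ in column_types)
    insert_sql = f"INSERT INTO {name} VALUES ({placeholders})"
    conn.execute(f"CREATE TEMP TABLE {name} ({columns})")
    try:
        chunk: list[tuple] = []
        for row in rows:
            chunk.append(row)
            if len(chunk) >= max_chunksize:
                conn.executemany(insert_sql, chunk)
                chunk = []
        if chunk:  # flush the final partial chunk
            conn.executemany(insert_sql, chunk)
        yield name
    finally:
        conn.execute(f"DROP TABLE IF EXISTS {name}")


conn = sqlite3.connect(":memory:")
data = [(1, "a"), (2, "b"), (3, "c")]
types = {"id": "INTEGER", "source_key": "TEXT"}
with temporary_table(conn, "tmp_keys", types, data, max_chunksize=2) as tmp:
    row_count = conn.execute(f"SELECT COUNT(*) FROM {tmp}").fetchone()[0]

# After the block exits, the temporary table no longer exists.
remaining = conn.execute(
    "SELECT COUNT(*) FROM sqlite_temp_master WHERE name = ?", (tmp,)
).fetchone()[0]
```

Passing explicit column types, as the real function does with SQLAlchemy type instances, avoids relying on type inference from the ingested data.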
insert
Utilities for inserting data into the PostgreSQL backend.
Functions:
- insert_hashes – Indexes hash data for a source.
- insert_model_edges – Writes model edges to Matchbox.
- insert_clusters – Write resolver cluster assignments.
insert_hashes
insert_hashes(path: SourceStepPath, data_hashes: Table, batch_size: int) -> None
Indexes hash data for a source.
Parameters:
- path (SourceStepPath) – The path of the source step
- data_hashes (Table) – Arrow table containing hash data
- batch_size (int) – Batch size for bulk operations
Raises:
- MatchboxStepNotFoundError – If the specified step doesn’t exist.
- MatchboxStepInvalidData – If data fingerprint conflicts with the step.
- MatchboxStepExistingData – If data was already inserted for the step.
insert_model_edges
insert_model_edges(path: ModelStepPath, results: Table, batch_size: int) -> None
Writes model edges to Matchbox.
insert_clusters
insert_clusters(path: ResolverStepPath, cluster_assignments: Table, batch_size: int) -> None
Write resolver cluster assignments.
The function proceeds in three phases:
- Validate: check fingerprint, ensure no prior data, short-circuit if the upload is empty
- Compute hashes: ingest assignments to a temp table, expand each child cluster to its leaves, and derive a cluster hash per parent cluster (Python round-trip via _compute_resolver_hashes)
- Insert everything: with both temp tables live in one session, materialise new Clusters rows, then insert Contains and ResolverClusters membership rows
Parameters:
- path (ResolverStepPath) – The resolver step path to upload cluster assignments for
- cluster_assignments (Table) – Arrow table conforming to SCHEMA_CLUSTERS, having (parent_id, child_id) columns
- batch_size (int) – Batch size for temporary table ingestion
Raises:
- MatchboxStepNotFoundError – If the step doesn’t exist
- MatchboxStepInvalidData – If the fingerprint doesn’t match
- MatchboxStepExistingData – If clusters already exist for this resolver
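The hash-computation phase of insert_clusters (expand each child cluster to its leaves, then derive one hash per parent) can be illustrated as follows. This is a sketch under loud assumptions: the hashing scheme (SHA-256 over sorted, de-duplicated leaf hashes), the input dictionaries, and the function hash_leaves are all hypothetical, and the real logic in _compute_resolver_hashes may differ.

```python
import hashlib
from collections import defaultdict

# Hypothetical inputs: (parent_id, child_id) assignments, and the leaves of
# each child cluster, with every leaf represented by its hash.
assignments = [(1, 101), (1, 102), (2, 103)]
leaves_of_child = {
    101: [b"leaf-a"],
    102: [b"leaf-b", b"leaf-c"],
    103: [b"leaf-d"],
}


def hash_leaves(leaf_hashes: list[bytes]) -> bytes:
    """Assumed scheme: SHA-256 over the sorted, de-duplicated leaf hashes."""
    digest = hashlib.sha256()
    for leaf in sorted(set(leaf_hashes)):
        # Hash each leaf first so concatenation boundaries are unambiguous.
        digest.update(hashlib.sha256(leaf).digest())
    return digest.digest()


# Expand each child to its leaves and group the leaves under the parent.
leaves_by_parent: dict[int, list[bytes]] = defaultdict(list)
for parent_id, child_id in assignments:
    leaves_by_parent[parent_id].extend(leaves_of_child[child_id])

# One hash per parent cluster, independent of assignment order.
cluster_hashes = {
    parent: hash_leaves(leaves) for parent, leaves in leaves_by_parent.items()
}
```

Deriving the hash from the leaf set rather than from the parent ID means two uploads describing the same membership would map to the same Clusters row, which is one plausible reason for hashing before insertion.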
query
Utilities for querying and matching in the PostgreSQL backend.
Functions:
- require_complete_resolver – Resolve and validate a resolver path for query-time operations.
- resolver_membership_subquery – Build root_id/leaf_id membership rows for a resolver.
- query – Query Matchbox to retrieve linked data for a source.
- match – Match a source key against targets via a resolver.
require_complete_resolver
require_complete_resolver(session: Session, path: ResolverStepPath) -> Steps
Resolve and validate a resolver path for query-time operations.
resolver_membership_subquery
Build root_id/leaf_id membership rows for a resolver.
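One way to picture root_id/leaf_id membership rows is as a join between the resolver_clusters and contains tables from the schema above. The sketch below is an assumption about the shape of the result, not the actual subquery: it runs on sqlite3 with invented sample rows, and it ignores any handling the real function may apply to clusters without Contains rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE resolver_clusters (
        step_id INTEGER, cluster_id INTEGER,
        PRIMARY KEY (step_id, cluster_id)
    );
    CREATE TABLE contains (
        root INTEGER, leaf INTEGER,
        PRIMARY KEY (root, leaf)
    );
    -- Invented sample data: resolver step 7 publishes clusters 100 and 200.
    INSERT INTO resolver_clusters VALUES (7, 100), (7, 200), (8, 300);
    INSERT INTO contains VALUES (100, 1), (100, 2), (200, 3), (300, 4);
    """
)

# Each published cluster of the resolver is paired with the leaves it contains.
membership = conn.execute(
    """
    SELECT rc.cluster_id AS root_id, c.leaf AS leaf_id
    FROM resolver_clusters AS rc
    JOIN contains AS c ON c.root = rc.cluster_id
    WHERE rc.step_id = ?
    ORDER BY root_id, leaf_id
    """,
    (7,),
).fetchall()
```

Filtering on step_id keeps clusters published by other resolver steps (cluster 300 here) out of the result.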
query
query(source: SourceStepPath, resolver: ResolverStepPath | None = None, return_leaf_id: bool = False, limit: int | None = None) -> Table
Query Matchbox to retrieve linked data for a source.
match
match(key: str, source: SourceStepPath, targets: list[SourceStepPath], resolver: ResolverStepPath) -> list[Match]
Match a source key against targets via a resolver.