Arrow

matchbox.common.arrow

Common Arrow utilities.

SCHEMA_QUERY module-attribute

SCHEMA_QUERY: Final[Schema] = schema([('id', int64()), ('key', large_string())])

Data transfer schema for root cluster IDs keyed to primary keys.

SCHEMA_QUERY_WITH_LEAVES module-attribute

SCHEMA_QUERY_WITH_LEAVES = SCHEMA_QUERY.append(field('leaf_id', int64()))

Data transfer schema for root cluster IDs keyed to primary keys and leaf IDs.

SCHEMA_INDEX module-attribute

SCHEMA_INDEX: Final[Schema] = schema([('hash', large_binary()), ('keys', large_list(large_string()))])

Data transfer schema for data to be indexed in Matchbox.

SCHEMA_RESULTS module-attribute

SCHEMA_RESULTS: Final[Schema] = schema([('left_id', uint64()), ('right_id', uint64()), ('probability', uint8())])

Data transfer schema for the results of a deduplication or linking process.

SCHEMA_JUDGEMENTS module-attribute

SCHEMA_JUDGEMENTS: Final[Schema] = schema([('user_id', uint64()), ('endorsed', uint64()), ('shown', uint64())])

Data transfer schema for retrieved evaluation judgements from users.

SCHEMA_CLUSTER_EXPANSION module-attribute

SCHEMA_CLUSTER_EXPANSION: Final[Schema] = schema([('root', uint64()), ('leaves', list_(uint64()))])

Data transfer schema for mapping from a cluster ID to all its source cluster IDs.

SCHEMA_EVAL_SAMPLES module-attribute

SCHEMA_EVAL_SAMPLES: Final[Schema] = schema([('root', uint64()), ('leaf', uint64()), ('key', large_string()), ('source', large_string())])

Data transfer schema for evaluation samples.

JudgementsZipFilenames

Bases: StrEnum

Enumeration of the file names inside the ZIP archive of downloaded judgements.

JUDGEMENTS class-attribute instance-attribute

JUDGEMENTS = 'judgements.parquet'

EXPANSION class-attribute instance-attribute

EXPANSION = 'expansion.parquet'

table_to_buffer

table_to_buffer(table: Table) -> BytesIO

Converts an Arrow table to a BytesIO buffer.

check_schema

check_schema(expected: Schema, actual: Schema) -> None

Validates that two Arrow schemas are equal.