Arrow
matchbox.common.arrow
Common Arrow utilities.
Classes:
- JudgementsZipFilenames: Enumeration of file names in the ZIP file of downloaded judgements.
Functions:
- table_to_buffer: Converts an Arrow table to a BytesIO buffer.
- check_schema: Validate equality of Arrow schemas.
Attributes:
- SCHEMA_QUERY (Final[Schema]): Data transfer schema for root cluster IDs keyed to primary keys.
- SCHEMA_QUERY_WITH_LEAVES: Data transfer schema for root cluster IDs keyed to primary keys and leaf IDs.
- SCHEMA_INDEX (Final[Schema]): Data transfer schema for data to be indexed in Matchbox.
- SCHEMA_RESULTS (Final[Schema]): Data transfer schema for the results of a deduplication or linking process.
- SCHEMA_JUDGEMENTS (Final[Schema]): Data transfer schema for retrieved evaluation judgements from users.
- SCHEMA_CLUSTER_EXPANSION (Final[Schema]): Data transfer schema for mapping from a cluster ID to all its source cluster IDs.
- SCHEMA_EVAL_SAMPLES (Final[Schema]): Data transfer schema for evaluation samples.
SCHEMA_QUERY (module attribute)
SCHEMA_QUERY: Final[Schema] = schema([('id', int64()), ('key', large_string())])
Data transfer schema for root cluster IDs keyed to primary keys.
SCHEMA_QUERY_WITH_LEAVES (module attribute)
Data transfer schema for root cluster IDs keyed to primary keys and leaf IDs.
SCHEMA_INDEX (module attribute)
SCHEMA_INDEX: Final[Schema] = schema([('hash', large_binary()), ('keys', large_list(large_string()))])
Data transfer schema for data to be indexed in Matchbox.
SCHEMA_RESULTS (module attribute)
SCHEMA_RESULTS: Final[Schema] = schema([('left_id', uint64()), ('right_id', uint64()), ('probability', uint8())])
Data transfer schema for the results of a deduplication or linking process.
SCHEMA_JUDGEMENTS (module attribute)
SCHEMA_JUDGEMENTS: Final[Schema] = schema([('user_id', uint64()), ('endorsed', uint64()), ('shown', uint64())])
Data transfer schema for retrieved evaluation judgements from users.
SCHEMA_CLUSTER_EXPANSION (module attribute)
SCHEMA_CLUSTER_EXPANSION: Final[Schema] = schema([('root', uint64()), ('leaves', list_(uint64()))])
Data transfer schema for mapping from a cluster ID to all its source cluster IDs.
SCHEMA_EVAL_SAMPLES (module attribute)
SCHEMA_EVAL_SAMPLES: Final[Schema] = schema([('root', uint64()), ('leaf', uint64()), ('key', large_string()), ('source', large_string())])
Data transfer schema for evaluation samples.
JudgementsZipFilenames
Bases: StrEnum
Enumeration of file names in the ZIP file of downloaded judgements.
Attributes:
- JUDGEMENTS
- EXPANSION
table_to_buffer
table_to_buffer(table: Table) -> BytesIO
Converts an Arrow table to a BytesIO buffer.
check_schema
Validate equality of Arrow schemas.