Arrow

matchbox.common.arrow

Common Arrow utilities.

Functions:

  • table_to_buffer –

    Converts an Arrow table to a BytesIO buffer.

Attributes:

  • SCHEMA_MB_IDS (Final[Schema]) –

    Data transfer schema for Matchbox IDs keyed to primary keys.

  • SCHEMA_INDEX (Final[Schema]) –

    Data transfer schema for data to be indexed in Matchbox.

  • SCHEMA_RESULTS (Final[Schema]) –

    Data transfer schema for the results of a deduplication or linking process.

SCHEMA_MB_IDS module-attribute

SCHEMA_MB_IDS: Final[Schema] = schema(
    [("id", int64()), ("source_pk", large_string())]
)

Data transfer schema for Matchbox IDs keyed to primary keys.

SCHEMA_INDEX module-attribute

SCHEMA_INDEX: Final[Schema] = schema(
    [
        ("hash", large_binary()),
        ("source_pk", large_list(large_string())),
    ]
)

Data transfer schema for data to be indexed in Matchbox.

SCHEMA_RESULTS module-attribute

SCHEMA_RESULTS: Final[Schema] = schema(
    [
        ("left_id", uint64()),
        ("right_id", uint64()),
        ("probability", uint8()),
    ]
)

Data transfer schema for the results of a deduplication or linking process.

table_to_buffer

table_to_buffer(table: Table) -> BytesIO

Converts an Arrow table to a BytesIO buffer.