Locations

matchbox.client.locations

Interface to locations where source data is stored.

CLIENT_CLASSES module-attribute

CLIENT_CLASSES = {SQLALCHEMY: Engine}

ClientType

Bases: StrEnum

Enumeration of valid location clients.

Attributes:

SQLALCHEMY class-attribute instance-attribute

SQLALCHEMY = 'sqlalchemy'

Location

Location(name: str)

Bases: ABC

A location for a data source.

Methods:

  • set_client

    Set client for location and return the location.

  • connect

    Establish connection to the data location.

  • validate_extract_transform

    Validate ET logic against this location’s query language.

  • infer_types

    Extract all data types from the ET logic.

  • execute

    Execute ET logic against location and return batches.

  • from_config

    Initialise location from a location config.

Attributes:

config instance-attribute

config = LocationConfig(type=location_type, name=name)

client property

client: Engine | None

Retrieve client.

location_type abstractmethod property

location_type: LocationType

Output location type string.

client_type abstractmethod property

client_type: ClientType

Client type string.

set_client

set_client(client: Any) -> Self

Set client for location and return the location.

connect abstractmethod

connect() -> bool

Establish connection to the data location.

validate_extract_transform abstractmethod

validate_extract_transform(extract_transform: str) -> bool

Validate ET logic against this location’s query language.

infer_types abstractmethod

infer_types(extract_transform: str) -> dict[str, DataTypes]

Extract all data types from the ET logic.

execute abstractmethod

execute(extract_transform: str, batch_size: int | None = None, rename: dict[str, str] | Callable | None = None, return_type: QueryReturnType = POLARS, keys: tuple[str, list[str]] | None = None) -> Iterator[QueryReturnClass]

Execute ET logic against location and return batches.

Parameters:

  • extract_transform
    (str) –

    The ET logic to execute.

  • batch_size
    (int | None, default: None ) –

    The size of the batches to return.

  • rename
    (dict[str, str] | Callable | None, default: None ) –

    Renaming to apply after the ET logic is executed.

    • If a dictionary is provided, it will be used to rename the columns.
    • If a callable is provided, it will take the old name as input and return the new name.
  • return_type
    (QueryReturnType, default: POLARS ) –

    The type of data to return. Defaults to “polars”.

  • keys
    (tuple[str, list[str]] | None, default: None ) –

    Rule to retrieve only rows with specific keys. The first element of the tuple is the field name on which to filter; the second is the list of values to keep. Only source entries whose key field is in that list are returned.

from_config

from_config(config: LocationConfig) -> Self

Initialise location from a location config.

RelationalDBLocation

RelationalDBLocation(name: str)

Bases: Location

A location for a relational database.

Methods:

  • connect

    Establish connection to the data location.

  • validate_extract_transform

    Check that the SQL statement only contains a single data-extracting command.

  • infer_types

    Extract all data types from the ET logic.

  • execute

    Execute ET logic against location and return batches.

  • set_client

    Set client for location and return the location.

  • from_config

    Initialise location from a location config.

Attributes:

client instance-attribute

client: Engine

Retrieve client.

location_type class-attribute instance-attribute

location_type: LocationType = RDBMS

Output location type string.

client_type class-attribute instance-attribute

client_type: ClientType = SQLALCHEMY

Client type string.

config instance-attribute

config = LocationConfig(type=location_type, name=name)

connect

connect() -> bool

Establish connection to the data location.

validate_extract_transform

validate_extract_transform(extract_transform: str) -> None

Check that the SQL statement only contains a single data-extracting command.

We are NOT attempting a full sanitisation of the SQL statement.

Validation is done purely to stop accidental mistakes, not malicious actors.
Users should only run indexing using SourceConfigs they trust and have read,
using least-privilege credentials.

Parameters:

  • extract_transform
    (str) –

    The SQL statement to validate.
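
As a rough illustration of this kind of guard, the sketch below checks that a string holds exactly one statement and that it starts with SELECT or WITH. It is deliberately naive, uses only the standard library, and is not the check matchbox performs; the function name check_single_select and the use of ValueError are assumptions.

```python
def check_single_select(sql: str) -> None:
    """Raise ValueError unless `sql` looks like one data-extracting statement."""
    # Naive split on semicolons; semicolons inside string literals are not handled.
    statements = [part.strip() for part in sql.split(";") if part.strip()]
    if len(statements) != 1:
        raise ValueError(f"Expected exactly one statement, found {len(statements)}")

    # Only the leading keyword is inspected, so this catches slips, not attacks.
    first_word = statements[0].split(maxsplit=1)[0].upper()
    if first_word not in {"SELECT", "WITH"}:
        raise ValueError(f"Expected a data-extracting statement, got {first_word}")


check_single_select("SELECT id, name FROM companies;")  # passes silently
```

A check like this only guards against accidents such as a stray second statement, which is why trusted configurations and least-privilege credentials remain necessary.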

infer_types

infer_types(extract_transform: str) -> dict[str, DataTypes]

Extract all data types from the ET logic.

execute

execute(extract_transform: str, batch_size: int | None = None, rename: dict[str, str] | Callable | None = None, return_type: QueryReturnType = POLARS, keys: tuple[str, list[str]] | None = None, schema_overrides: dict[str, DataType] | None = None) -> Generator[QueryReturnClass, None, None]

Execute ET logic against location and return batches.

Parameters:

  • extract_transform
    (str) –

    The ET logic to execute.

  • batch_size
    (int | None, default: None ) –

    The size of the batches to return.

  • rename
    (dict[str, str] | Callable | None, default: None ) –

    Renaming to apply after the ET logic is executed.

    • If a dictionary is provided, it will be used to rename the columns.
    • If a callable is provided, it will take the old name as input and return the new name.
  • return_type
    (QueryReturnType, default: POLARS ) –

    The type of data to return. Defaults to “polars”.

  • keys
    (tuple[str, list[str]] | None, default: None ) –

    Rule to retrieve only rows with specific keys. The first element of the tuple is the field name on which to filter; the second is the list of values to keep. Only source entries whose key field is in that list are returned.

set_client

set_client(client: Any) -> Self

Set client for location and return the location.

from_config

from_config(config: LocationConfig) -> Self

Initialise location from a location config.

requires_client

requires_client(method: Callable[..., T]) -> Callable[..., T]

Decorator that checks if client is set before executing a method.

A helper method for Location subclasses.
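
A decorator of this shape could look like the minimal sketch below. It is not the matchbox implementation: the RuntimeError and the DemoLocation class are assumptions introduced for illustration. The sketch also shows why set_client returning the location is convenient, since it allows chaining.

```python
from collections.abc import Callable
from functools import wraps
from typing import Any, TypeVar

T = TypeVar("T")


def requires_client(method: Callable[..., T]) -> Callable[..., T]:
    """Refuse to run `method` until the instance has a client."""

    @wraps(method)
    def wrapper(self: Any, *args: Any, **kwargs: Any) -> T:
        if self.client is None:
            # Assumption: the real exception type is not shown in this reference.
            raise RuntimeError("Client not set; call set_client() first.")
        return method(self, *args, **kwargs)

    return wrapper


class DemoLocation:
    """Hypothetical stand-in for a Location subclass."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.client: Any = None

    def set_client(self, client: Any) -> "DemoLocation":
        self.client = client
        return self  # returning self allows method chaining

    @requires_client
    def connect(self) -> bool:
        return True


connected = DemoLocation("warehouse").set_client(object()).connect()
```

Calling connect() on a DemoLocation without a client raises the error instead of running the method body.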

location_type_to_class

location_type_to_class(location_type: LocationType) -> type[Location]

Map location type string to the corresponding class.
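
One plausible way to implement such a mapping is a dictionary lookup, sketched below with stand-in classes. The enum value "rdbms", the _LOCATION_CLASSES registry, and the ValueError raised for unknown types are all assumptions; only the function signature comes from this reference.

```python
from enum import Enum


class LocationType(str, Enum):
    """Stand-in string-valued enum; the member value is an assumption."""

    RDBMS = "rdbms"


class Location:
    """Stand-in for the abstract base class."""


class RelationalDBLocation(Location):
    """Stand-in for the relational database location."""


# Hypothetical registry from location type to concrete class.
_LOCATION_CLASSES: dict[LocationType, type[Location]] = {
    LocationType.RDBMS: RelationalDBLocation,
}


def location_type_to_class(location_type: LocationType) -> type[Location]:
    """Look up the Location subclass registered for a location type."""
    try:
        # Coercing through the enum also accepts the plain string value.
        return _LOCATION_CLASSES[LocationType(location_type)]
    except (KeyError, ValueError) as exc:
        raise ValueError(f"Unknown location type: {location_type!r}") from exc


location_class = location_type_to_class(LocationType.RDBMS)
```

Keeping the mapping in one dictionary means a new location type needs only a new enum member and one registry entry.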