Locations
matchbox.client.locations
Interface to locations where source data is stored.
Classes:
-
ClientType–Enumeration of valid location clients.
-
Location–A location for a data source.
-
RelationalDBLocation–A location for a relational database.
Functions:
-
requires_client–Decorator that checks if client is set before executing a method.
-
location_type_to_class–Map location type string to the corresponding class.
Attributes:
Location
Location(name: str)
Bases: ABC
A location for a data source.
Methods:
-
set_client–Set client for location and return the location.
-
connect–Establish connection to the data location.
-
validate_extract_transform–Validate ET logic against this location’s query language.
-
infer_types–Extract all data types from the ET logic.
-
execute–Execute ET logic against location and return batches.
-
from_config–Initialise location from a location config.
Attributes:
-
config–
-
client(Engine | None) –Retrieve client.
-
location_type(LocationType) –Output location type string.
-
client_type(ClientType) –Client type string.
validate_extract_transform
abstractmethod
Validate ET logic against this location’s query language.
Raises:
-
MatchboxSourceExtractTransformError–If the ET logic is invalid.
infer_types
abstractmethod
Extract all data types from the ET logic.
execute
abstractmethod
execute(extract_transform: str, batch_size: int | None = None, rename: dict[str, str] | Callable | None = None, return_type: QueryReturnType = POLARS, keys: tuple[str, list[str]] | None = None) -> Iterator[QueryReturnClass]
Execute ET logic against location and return batches.
Parameters:
-
extract_transform(str) –The ET logic to execute.
-
batch_size(int | None, default:None) –The size of the batches to return.
-
rename(dict[str, str] | Callable | None, default:None) –Renaming to apply after the ET logic is executed.
- If a dictionary is provided, it will be used to rename the columns.
- If a callable is provided, it will take the old name as input and return the new name.
-
return_type(QueryReturnType, default:POLARS) –The type of data to return. Defaults to “polars”.
-
keys(tuple[str, list[str]] | None, default:None) –Rule to only retrieve rows by specific keys. The first element of the tuple is the field name on which to filter; the second is the list of values. Only source entries whose key field is in that list are returned.
Raises:
-
AttributeError–If the client is not set.
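The two forms of rename described above can be illustrated with a small stand-alone helper. This is a sketch of the documented behaviour only, not the library's code: apply_rename is a hypothetical name, and leaving unmapped columns unchanged in the dictionary case is an assumption.

```python
from __future__ import annotations

from collections.abc import Callable


def apply_rename(
    columns: list[str],
    rename: dict[str, str] | Callable[[str], str] | None = None,
) -> list[str]:
    """Return the column names after renaming (hypothetical helper, illustrative only)."""
    if rename is None:
        return list(columns)
    if callable(rename):
        # A callable receives each old name and returns the new name
        return [rename(col) for col in columns]
    # A dictionary maps old names to new names.
    # Assumption: columns missing from the dictionary keep their name.
    return [rename.get(col, col) for col in columns]


columns = ["id", "company_name"]
by_dict = apply_rename(columns, {"company_name": "name"})
by_callable = apply_rename(columns, str.upper)
```

Here by_dict renames one column and leaves the other alone, while by_callable transforms every name through the same function.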
from_config
from_config(config: LocationConfig) -> Self
Initialise location from a location config.
RelationalDBLocation
RelationalDBLocation(name: str)
Bases: Location
A location for a relational database.
Methods:
-
connect–Establish connection to the data location.
-
validate_extract_transform–Check that the SQL statement only contains a single data-extracting command.
-
infer_types–Extract all data types from the ET logic.
-
execute–Execute ET logic against location and return batches.
-
set_client–Set client for location and return the location.
-
from_config–Initialise location from a location config.
Attributes:
-
client(Engine) –Retrieve client.
-
location_type(LocationType) –Output location type string.
-
client_type(ClientType) –Client type string.
-
config–
location_type
class-attribute
instance-attribute
location_type: LocationType = RDBMS
Output location type string.
client_type
class-attribute
instance-attribute
client_type: ClientType = SQLALCHEMY
Client type string.
validate_extract_transform
validate_extract_transform(extract_transform: str) -> None
Check that the SQL statement only contains a single data-extracting command.
We are NOT attempting a full sanitisation of the SQL statement.
Validation is done purely to stop accidental mistakes, not malicious actors.
Users should only run indexing using SourceConfigs they trust and have read, using least-privilege credentials.
Parameters:
-
extract_transform(str) –The SQL statement to validate.
Raises:
-
ParseError–If the SQL statement cannot be parsed
-
MatchboxSourceExtractTransformError–If validation requirements are not met
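To make the "accidental mistakes, not malicious actors" scope concrete, here is a deliberately naive, standard-library-only sketch of a single-statement check. It is not the library's implementation: the documented ParseError suggests the real method relies on an SQL parser, which this sketch does not. check_single_select and ExtractTransformError are hypothetical names.

```python
class ExtractTransformError(ValueError):
    """Local stand-in for MatchboxSourceExtractTransformError (hypothetical)."""


def check_single_select(sql: str) -> None:
    """Raise unless `sql` looks like exactly one data-extracting statement."""
    # Naive split: a semicolon inside a string literal or comment
    # would be miscounted as a statement separator.
    statements = [part.strip() for part in sql.split(";") if part.strip()]
    if len(statements) != 1:
        raise ExtractTransformError("Expected exactly one SQL statement.")
    first_word = statements[0].split(None, 1)[0].upper()
    # Assumption: SELECT and WITH (common table expressions) count as extracting
    if first_word not in {"SELECT", "WITH"}:
        raise ExtractTransformError(
            f"Expected a data-extracting statement, got {first_word}."
        )


check_single_select("SELECT id, name FROM companies;")  # passes silently
```

A check like this catches a pasted second statement or a stray DROP, but it is easy to bypass on purpose, which is why trusted configs and least-privilege credentials remain the real safeguard.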
infer_types
Extract all data types from the ET logic.
execute
execute(extract_transform: str, batch_size: int | None = None, rename: dict[str, str] | Callable | None = None, return_type: QueryReturnType = POLARS, keys: tuple[str, list[str]] | None = None, schema_overrides: dict[str, DataType] | None = None) -> Generator[QueryReturnClass, None, None]
Execute ET logic against location and return batches.
Parameters:
-
extract_transform(str) –The ET logic to execute.
-
batch_size(int | None, default:None) –The size of the batches to return.
-
rename(dict[str, str] | Callable | None, default:None) –Renaming to apply after the ET logic is executed.
- If a dictionary is provided, it will be used to rename the columns.
- If a callable is provided, it will take the old name as input and return the new name.
-
return_type(QueryReturnType, default:POLARS) –The type of data to return. Defaults to “polars”.
-
keys(tuple[str, list[str]] | None, default:None) –Rule to only retrieve rows by specific keys. The first element of the tuple is the field name on which to filter; the second is the list of values. Only source entries whose key field is in that list are returned.
Raises:
-
AttributeError–If the client is not set.
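The interplay of batch_size and keys can be sketched against an in-memory SQLite database using only the standard library. This is an analogy, not how RelationalDBLocation works internally: fetch_batches is a hypothetical function, and both the subquery-wrapping filter and the single batch returned when batch_size is None are assumptions made for illustration.

```python
from __future__ import annotations

import sqlite3
from collections.abc import Iterator


def fetch_batches(
    conn: sqlite3.Connection,
    query: str,
    batch_size: int | None = None,
    keys: tuple[str, list[str]] | None = None,
) -> Iterator[list[tuple]]:
    """Yield rows from `query` in batches, optionally filtered by `keys`."""
    params: list[str] = []
    if keys is not None:
        field, values = keys
        placeholders = ", ".join("?" for _ in values)
        # Wrap the original query so the filter applies to its output.
        # `field` is interpolated as an identifier, so it must be trusted.
        query = f'SELECT * FROM ({query}) WHERE "{field}" IN ({placeholders})'
        params = list(values)
    cursor = conn.execute(query, params)
    if batch_size is None:
        # Assumption: no batch size means one batch holding every row
        yield cursor.fetchall()
        return
    while rows := cursor.fetchmany(batch_size):
        yield rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE companies (id TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO companies VALUES (?, ?)",
    [("1", "Acme"), ("2", "Bolt"), ("3", "Cogs")],
)

query = "SELECT id, name FROM companies ORDER BY id"
batches = list(fetch_batches(conn, query, batch_size=2))
filtered = list(fetch_batches(conn, query, keys=("id", ["1", "3"])))
```

With three rows and a batch size of two, batches holds a full batch followed by a partial one; filtered holds a single batch containing only the rows whose id is in the supplied list.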
from_config
from_config(config: LocationConfig) -> Self
Initialise location from a location config.
requires_client
Decorator that checks if client is set before executing a method.
A helper method for Location subclasses.
Raises:
-
MatchboxSourceClientError–If the client is not set.
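A decorator of this kind is typically a thin wrapper that inspects the instance before delegating to the method. The sketch below shows the general pattern under stated assumptions; it is not the library's source. requires_client_sketch, ClientNotSetError, and DemoLocation are hypothetical stand-ins, and it assumes an unset client is stored as None.

```python
import functools
from collections.abc import Callable
from typing import Any


class ClientNotSetError(Exception):
    """Local stand-in for MatchboxSourceClientError (hypothetical)."""


def requires_client_sketch(method: Callable[..., Any]) -> Callable[..., Any]:
    """Refuse to run `method` until the instance has a client."""

    @functools.wraps(method)
    def wrapper(self: Any, *args: Any, **kwargs: Any) -> Any:
        # Assumption: an unset client is represented as None
        if self.client is None:
            raise ClientNotSetError("Client not set; call set_client() first.")
        return method(self, *args, **kwargs)

    return wrapper


class DemoLocation:
    """Minimal stand-in for a Location subclass."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.client = None

    def set_client(self, client: Any) -> "DemoLocation":
        self.client = client
        return self  # returning self allows call chaining

    @requires_client_sketch
    def connect(self) -> bool:
        return True


connected = DemoLocation("demo").set_client(object()).connect()
```

Calling connect() on a DemoLocation without a client raises ClientNotSetError instead of failing later with a less helpful error.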
location_type_to_class
location_type_to_class(location_type: LocationType) -> type[Location]
Map location type string to the corresponding class.
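A mapping like this is commonly implemented as a dictionary lookup keyed on the type. The following is a minimal sketch with stand-in classes, not the library's code: DemoLocationType, its "rdbms" value, the Demo classes, lookup_location_class, and the ValueError for unknown types are all assumptions.

```python
from enum import Enum


class DemoLocationType(str, Enum):
    """Stand-in for LocationType; the string value is assumed."""

    RDBMS = "rdbms"


class DemoLocation:
    """Stand-in for the abstract Location base class."""


class DemoRelationalDBLocation(DemoLocation):
    """Stand-in for RelationalDBLocation."""

    location_type = DemoLocationType.RDBMS


# Registry from location type to the class that handles it
_LOCATION_CLASSES: dict[DemoLocationType, type[DemoLocation]] = {
    DemoLocationType.RDBMS: DemoRelationalDBLocation,
}


def lookup_location_class(location_type: DemoLocationType) -> type[DemoLocation]:
    """Return the class registered for `location_type`."""
    try:
        return _LOCATION_CLASSES[location_type]
    except KeyError:
        # Assumption: unknown types are reported as a ValueError
        raise ValueError(f"Unknown location type: {location_type!r}") from None


location_class = lookup_location_class(DemoLocationType.RDBMS)
```

Keeping the registry in one place means a new location type only needs one extra dictionary entry.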