Database

matchbox.common.db

Common database utilities for Matchbox.

Functions:

  • sql_to_df

    Executes the given SQLAlchemy statement using Polars.

  • get_schema_table_names

    Takes a string table name and returns the unquoted schema and table as a tuple.

  • fullname_to_prefix

    Converts a full name to a prefix for column names.

Attributes:

ReturnTypeStr module-attribute

ReturnTypeStr = Literal['arrow', 'pandas', 'polars']

QueryReturnType module-attribute

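QueryReturnType = Table | DataFrame | DataFrame

Here Table is pyarrow.Table, and the two DataFrame members are pandas.DataFrame and polars.DataFrame, matching the three ReturnTypeStr options in order.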

sql_to_df

sql_to_df(
    stmt: Select,
    engine: Engine,
    return_type: Literal["arrow", "pandas", "polars"],
    *,
    iter_batches: Literal[False] = False,
    batch_size: int | None = None,
    schema_overrides: dict[str, Any] | None = None,
    execute_options: dict[str, Any] | None = None,
) -> QueryReturnType
sql_to_df(
    stmt: Select,
    engine: Engine,
    return_type: Literal["arrow", "pandas", "polars"],
    *,
    iter_batches: Literal[True],
    batch_size: int | None = None,
    schema_overrides: dict[str, Any] | None = None,
    execute_options: dict[str, Any] | None = None,
) -> Iterator[QueryReturnType]
sql_to_df(
    stmt: Select,
    engine: Engine,
    return_type: ReturnTypeStr = "pandas",
    *,
    iter_batches: bool = False,
    batch_size: int | None = None,
    schema_overrides: dict[str, Any] | None = None,
    execute_options: dict[str, Any] | None = None,
) -> QueryReturnType | Iterator[QueryReturnType]

Executes the given SQLAlchemy statement using Polars.

Parameters:

  • stmt

    (Select) –

    A SQLAlchemy Select statement to be executed.

  • engine

    (Engine) –

    A SQLAlchemy Engine object for the database connection.

  • return_type

    (str, default: 'pandas') –

    The type of the return value. One of “arrow”, “pandas”, or “polars”.

  • iter_batches

    (bool, default: False) –

    If True, return an iterator that yields each batch separately. If False, return a single table or DataFrame containing all results.

  • batch_size

    (int | None, default: None) –

    The number of rows in each batch when results are fetched in batches.

  • schema_overrides

    (dict[str, Any] | None, default: None) –

    A dictionary mapping column names to the dtypes that should override the inferred schema.

  • execute_options

    (dict[str, Any] | None, default: None) –

    Options passed through to the underlying query execution method as keyword arguments.

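Returns:

  • QueryReturnType | Iterator[QueryReturnType]

    The query results in the format selected by return_type. If iter_batches is True, an iterator that yields the results one batch at a time.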

Raises:

  • ValueError

    If the engine URL is not properly configured or if an unsupported return type is specified.
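
The three signatures listed for sql_to_df follow the typing.overload pattern: the type of iter_batches decides whether callers get a single result or an iterator of batches. The sketch below illustrates that pattern with the standard library only. It is not the Matchbox implementation: fetch_rows, _batches, the default batch size of 1000, and the use of sqlite3 with lists of tuples in place of Arrow, pandas, or Polars objects are all assumptions made for illustration.

```python
from __future__ import annotations

import sqlite3
from collections.abc import Iterator
from typing import Literal, overload

Row = tuple  # hypothetical stand-in for a table or DataFrame row


@overload
def fetch_rows(
    conn: sqlite3.Connection,
    query: str,
    *,
    iter_batches: Literal[False] = False,
    batch_size: int | None = None,
) -> list[Row]: ...
@overload
def fetch_rows(
    conn: sqlite3.Connection,
    query: str,
    *,
    iter_batches: Literal[True],
    batch_size: int | None = None,
) -> Iterator[list[Row]]: ...
def fetch_rows(conn, query, *, iter_batches=False, batch_size=None):
    """Hypothetical helper: return all rows, or an iterator of row batches."""
    cursor = conn.execute(query)
    if not iter_batches:
        # Single result holding every row.
        return cursor.fetchall()
    # Assumed fallback of 1000 rows per batch when no size is given.
    return _batches(cursor, batch_size or 1000)


def _batches(cursor: sqlite3.Cursor, size: int) -> Iterator[list[Row]]:
    """Yield lists of at most `size` rows until the cursor is exhausted."""
    while True:
        batch = cursor.fetchmany(size)
        if not batch:
            return
        yield batch


# Usage: five rows, fetched at once and then in batches of two.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER)")
conn.executemany("INSERT INTO items VALUES (?)", [(i,) for i in range(5)])

all_rows = fetch_rows(conn, "SELECT id FROM items ORDER BY id")
batches = list(
    fetch_rows(
        conn,
        "SELECT id FROM items ORDER BY id",
        iter_batches=True,
        batch_size=2,
    )
)
```

With the overloads in place, a type checker can infer a plain result for the default call and an iterator when iter_batches=True is passed literally, which is the distinction the first two sql_to_df signatures express.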

get_schema_table_names

get_schema_table_names(full_name: str) -> tuple[str, str]

Takes a string table name and returns the unquoted schema and table as a tuple.

Parameters:

  • full_name

    (str) –

    A string indicating a table’s full name.

Returns:

  • (schema, table)

    A tuple of schema and table name. If the schema cannot be inferred, the schema element of the tuple is None.

Raises:

  • ValueError

    When the function can’t detect either a schema.table or table format in the input.
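
The behaviour described above can be sketched in a few lines. This is a minimal illustration rather than the library's code: split_schema_table is a hypothetical name, and it assumes identifiers are quoted with double quotes and never contain a dot inside the quotes.

```python
from __future__ import annotations


def split_schema_table(full_name: str) -> tuple[str | None, str]:
    """Hypothetical sketch: split 'schema.table' or 'table' into its parts."""
    # Assumption: double quotes are the only quoting style to strip.
    parts = [part.strip('"') for part in full_name.split(".")]

    if len(parts) == 1 and parts[0]:
        # Bare table name: no schema can be inferred.
        return None, parts[0]
    if len(parts) == 2 and all(parts):
        return parts[0], parts[1]
    raise ValueError(f"Could not identify schema and table in {full_name!r}")


qualified = split_schema_table('"public"."users"')
bare = split_schema_table("users")
```

Under these assumptions, qualified is ("public", "users") and bare is (None, "users"), while an input with more than one dot, such as "a.b.c", raises ValueError.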

fullname_to_prefix

fullname_to_prefix(fullname: str) -> str

Converts a full name to a prefix for column names.