Database

matchbox.common.db

Common database utilities for Matchbox.

Classes:

  • QueryReturnType

    Enumeration of dataframe types to return from a query.

Functions:

  • sql_to_df

    Executes the given SQLAlchemy statement or SQL string using Polars.

Attributes:

  • QueryReturnClass

    Type alias for the objects a query can return.

QueryReturnClass module-attribute

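QueryReturnClass: TypeAlias = Table | DataFrame | DataFrame

The union covers a PyArrow Table and the pandas and Polars DataFrame classes, one for each QueryReturnType value.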

QueryReturnType

Bases: StrEnum

Enumeration of dataframe types to return from a query.

Attributes:

PANDAS class-attribute instance-attribute

PANDAS = 'pandas'

POLARS class-attribute instance-attribute

POLARS = 'polars'

ARROW class-attribute instance-attribute

ARROW = 'arrow'

sql_to_df

sql_to_df(stmt: str, connection: Engine | Connection, return_type: QueryReturnType, *, return_batches: Literal[False] = False, batch_size: int | None = None, rename: dict[str, str] | Callable | None = None, schema_overrides: dict[str, DataType] | None = None, execute_options: dict[str, Any] | None = None) -> QueryReturnClass
sql_to_df(stmt: str, connection: Engine | Connection, return_type: QueryReturnType, *, return_batches: Literal[True], batch_size: int | None = None, rename: dict[str, str] | Callable | None = None, schema_overrides: dict[str, DataType] | None = None, execute_options: dict[str, Any] | None = None) -> Iterator[QueryReturnClass]
sql_to_df(stmt: str, connection: Engine | Connection, return_type: QueryReturnType = PANDAS, *, return_batches: bool = False, batch_size: int | None = None, rename: dict[str, str] | Callable | None = None, schema_overrides: dict[str, DataType] | None = None, execute_options: dict[str, Any] | None = None) -> QueryReturnClass | Iterator[QueryReturnClass]

Executes the given SQLAlchemy statement or SQL string using Polars.

Parameters:

  • stmt

    (str) –

    A SQL string to be executed.

  • connection

    (Engine | Connection) –

    A SQLAlchemy Engine object or ADBC connection.

  • return_type

    (QueryReturnType, default: PANDAS ) –

    The type of the return value. One of “arrow”, “pandas”, or “polars”.

  • return_batches

    (bool, default: False ) –

    If True, return an iterator that yields each batch separately. If False, return a single object of the requested return_type containing all results. Default is False.

  • batch_size

    (int | None, default: None ) –

    The size of each batch when results are returned in batches. Default is None.

  • rename

    (dict[str, str] | Callable | None, default: None ) –

    A dictionary mapping old column names to new column names, or a callable that takes a DataFrame and returns a DataFrame with renamed columns. Default is None.

  • schema_overrides

    (dict[str, DataType] | None, default: None ) –

    A dictionary mapping column names to dtypes. Default is None.

  • execute_options

    (dict[str, Any] | None, default: None ) –

    These options will be passed through into the underlying query execution method as kwargs. Default is None.

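Returns:

  • QueryReturnClass | Iterator[QueryReturnClass] –

    The query results in the format selected by return_type: a single object if return_batches is False, or an iterator yielding one such object per batch if return_batches is True.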

Raises:

  • ValueError
    • If the connection is not properly configured or if an unsupported return type is specified.
    • If batch_size and return_batches are either both set or both unset.