Database
matchbox.common.db
Common database utilities for Matchbox.
Classes:
- QueryReturnType: Enumeration of dataframe types to return from a query.
Functions:
- sql_to_df: Executes the given SQLAlchemy statement or SQL string using Polars.
Attributes:
QueryReturnType
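This page documents `QueryReturnType` only through the three `return_type` values ("arrow", "pandas", "polars") and the `PANDAS` default of `sql_to_df`. As a rough sketch under those facts, a string-valued enumeration might look like the following. The `ARROW` and `POLARS` member names and the `str` mixin are assumptions, not taken from Matchbox.

```python
from enum import Enum


class QueryReturnType(str, Enum):
    """Hypothetical sketch of the dataframe types sql_to_df can return.

    Only PANDAS is named on this page; ARROW and POLARS are assumed names,
    and the str mixin is an assumption too.
    """

    ARROW = "arrow"
    PANDAS = "pandas"
    POLARS = "polars"


# Members can be looked up from their string value; unknown values raise ValueError.
assert QueryReturnType("polars") is QueryReturnType.POLARS
```

Mixing in `str` makes each member compare equal to its plain string value, so callers could pass either form. Whether Matchbox's enumeration does this is not stated here.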
sql_to_df
sql_to_df(stmt: str, connection: Engine | Connection, return_type: QueryReturnType, *, return_batches: Literal[False] = False, batch_size: int | None = None, rename: dict[str, str] | Callable | None = None, schema_overrides: dict[str, DataType] | None = None, execute_options: dict[str, Any] | None = None) -> QueryReturnClass
sql_to_df(stmt: str, connection: Engine | Connection, return_type: QueryReturnType, *, return_batches: Literal[True], batch_size: int | None = None, rename: dict[str, str] | Callable | None = None, schema_overrides: dict[str, DataType] | None = None, execute_options: dict[str, Any] | None = None) -> Iterator[QueryReturnClass]
sql_to_df(stmt: str, connection: Engine | Connection, return_type: QueryReturnType = PANDAS, *, return_batches: bool = False, batch_size: int | None = None, rename: dict[str, str] | Callable | None = None, schema_overrides: dict[str, DataType] | None = None, execute_options: dict[str, Any] | None = None) -> QueryReturnClass | Iterator[QueryReturnClass]
Executes the given SQLAlchemy statement or SQL string using Polars.
Parameters:
- stmt (str): A SQL string to be executed.
- connection (Engine | Connection): A SQLAlchemy Engine object or ADBC connection.
- return_type (QueryReturnType, default: PANDAS): The type of the return value. One of “arrow”, “pandas”, or “polars”.
- return_batches (bool, default: False): If True, return an iterator that yields each batch separately. If False, return a single dataframe with all results.
- batch_size (int | None, default: None): The size of each batch when processing data in batches. Must be set if and only if return_batches is True.
- rename (dict[str, str] | Callable | None, default: None): A dictionary mapping old column names to new column names, or a callable that takes a DataFrame and returns a DataFrame with renamed columns.
- schema_overrides (dict[str, DataType] | None, default: None): A dictionary mapping column names to Polars dtypes.
- execute_options (dict[str, Any] | None, default: None): Options passed through as keyword arguments to the underlying query execution method.
Returns:
- QueryReturnClass: If return_batches is False, a single dataframe of the query results in the specified format.
- Iterator[QueryReturnClass]: If return_batches is True, an iterator of dataframes in the specified format.
Raises:
- ValueError: If the connection is not properly configured, if an unsupported return type is specified, or if only one of batch_size and return_batches is set (they must be either both set or both unset).