Database
matchbox.common.db
Common database utilities for Matchbox.
Functions:
- sql_to_df – Executes the given SQLAlchemy statement using Polars.
- get_schema_table_names – Takes a string table name and returns the unquoted schema and table as a tuple.
- fullname_to_prefix – Converts a full name to a prefix for column names.
sql_to_df
sql_to_df(
    stmt: Select,
    engine: Engine,
    return_type: Literal["arrow", "pandas", "polars"],
    *,
    iter_batches: Literal[False] = False,
    batch_size: int | None = None,
    schema_overrides: dict[str, Any] | None = None,
    execute_options: dict[str, Any] | None = None,
) -> QueryReturnType

sql_to_df(
    stmt: Select,
    engine: Engine,
    return_type: Literal["arrow", "pandas", "polars"],
    *,
    iter_batches: Literal[True],
    batch_size: int | None = None,
    schema_overrides: dict[str, Any] | None = None,
    execute_options: dict[str, Any] | None = None,
) -> Iterator[QueryReturnType]

sql_to_df(
    stmt: Select,
    engine: Engine,
    return_type: ReturnTypeStr = "pandas",
    *,
    iter_batches: bool = False,
    batch_size: int | None = None,
    schema_overrides: dict[str, Any] | None = None,
    execute_options: dict[str, Any] | None = None,
) -> QueryReturnType | Iterator[QueryReturnType]
Executes the given SQLAlchemy statement using Polars.
Parameters:
- stmt (Select) – A SQLAlchemy Select statement to be executed.
- engine (Engine) – A SQLAlchemy Engine object for the database connection.
- return_type (str, default: 'pandas') – The type of the return value. One of “arrow”, “pandas”, or “polars”.
- iter_batches (bool, default: False) – If True, return an iterator that yields each batch separately. If False, return a single DataFrame with all results. Default is False.
- batch_size (int | None, default: None) – The size of each batch when processing data in batches. Default is None.
- schema_overrides (dict[str, Any] | None, default: None) – A dictionary mapping column names to dtypes. Default is None.
- execute_options (dict[str, Any] | None, default: None) – Options passed through to the underlying query execution method as keyword arguments. Default is None.
Returns:
- QueryReturnType | Iterator[QueryReturnType] – If iter_batches is False: a dataframe of the query results in the specified format.
- QueryReturnType | Iterator[QueryReturnType] – If iter_batches is True: an iterator of dataframes in the specified format.
Raises:
- ValueError – If the engine URL is not properly configured or if an unsupported return type is specified.
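The difference between the two iter_batches modes, and the way the overloads tie the return type to that flag, can be sketched with the standard library alone. The example below is an illustration only and not Matchbox's implementation: `run_query` and `_batches` are hypothetical names, `sqlite3` stands in for a SQLAlchemy engine, and lists of tuples stand in for dataframes. Treating `batch_size=None` as "one batch with every row" is an assumption of this sketch.

```python
from __future__ import annotations

import sqlite3
from collections.abc import Iterator
from typing import Literal, overload


@overload
def run_query(
    conn: sqlite3.Connection,
    sql: str,
    *,
    iter_batches: Literal[False] = False,
    batch_size: int | None = None,
) -> list[tuple]: ...
@overload
def run_query(
    conn: sqlite3.Connection,
    sql: str,
    *,
    iter_batches: Literal[True],
    batch_size: int | None = None,
) -> Iterator[list[tuple]]: ...
def run_query(conn, sql, *, iter_batches=False, batch_size=None):
    """Hypothetical stand-in that mirrors the iter_batches switch."""
    cursor = conn.execute(sql)
    if not iter_batches:
        # Single result holding every row.
        return cursor.fetchall()
    # Lazy iterator that yields one batch at a time.
    return _batches(cursor, batch_size)


def _batches(cursor: sqlite3.Cursor, batch_size: int | None) -> Iterator[list[tuple]]:
    while True:
        # Assumption of this sketch: no batch_size means one batch with all rows.
        rows = cursor.fetchall() if batch_size is None else cursor.fetchmany(batch_size)
        if not rows:
            return
        yield rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(5)])

query = "SELECT id FROM t ORDER BY id"
all_rows = run_query(conn, query)
batches = list(run_query(conn, query, iter_batches=True, batch_size=2))
```

With the documented function, the corresponding calls would pass a SQLAlchemy Select and Engine, for example `sql_to_df(stmt, engine, return_type="polars", iter_batches=True, batch_size=1000)`, and each yielded item would be a dataframe in the requested format rather than a list of tuples.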
get_schema_table_names
Takes a string table name and returns the unquoted schema and table as a tuple.
Parameters:
Returns:
- (schema, table) – A tuple of schema and table name. If the schema cannot be inferred, it is returned as None.
Raises:
- ValueError – When the function can’t detect either a schema.table or table format in the input.
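The behaviour described above can be approximated in a few lines. This is a minimal sketch under stated assumptions, not the library's code: `split_schema_table` is a hypothetical name, and the sketch assumes that double quotes are the only quoting style and that identifiers never contain dots.

```python
from __future__ import annotations


def split_schema_table(full_name: str) -> tuple[str | None, str]:
    """Hypothetical helper: split 'schema.table' or 'table' into a tuple."""
    # Drop double quotes, then split on the schema/table separator.
    parts = full_name.replace('"', "").split(".")
    if not all(parts):
        # Empty input or a stray dot, such as "schema." or ".table".
        raise ValueError(f"Could not identify schema and table in {full_name!r}.")
    if len(parts) == 1:
        # No schema present, so it is reported as None.
        return None, parts[0]
    if len(parts) == 2:
        return parts[0], parts[1]
    raise ValueError(f"Could not identify schema and table in {full_name!r}.")


qualified = split_schema_table('"mb"."sources"')
bare = split_schema_table("sources")
```

Here `qualified` holds both parts with the quotes removed, while `bare` carries None in the schema position. An input with more than one dot, such as "a.b.c", raises ValueError in this sketch.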