Database
matchbox.common.db
¶
Common database utilities for Matchbox.
Functions:
-
sql_to_df
–Executes the given SQLAlchemy statement or SQL string using Polars.
-
validate_sql_for_data_extraction
–Validates that the SQL statement only contains a single data-extracting command.
-
clean_uri
–Clean a database URI.
Attributes:
sql_to_df
¶
sql_to_df(
stmt: str,
connection: Engine | Connection,
return_type: Literal["arrow", "pandas", "polars"],
*,
return_batches: Literal[False] = False,
batch_size: int | None = None,
rename: dict[str, str] | Callable | None = None,
schema_overrides: dict[str, Any] | None = None,
execute_options: dict[str, Any] | None = None,
) -> QueryReturnType
sql_to_df(
stmt: str,
connection: Engine | Connection,
return_type: Literal["arrow", "pandas", "polars"],
*,
return_batches: Literal[True],
batch_size: int | None = None,
rename: dict[str, str] | Callable | None = None,
schema_overrides: dict[str, Any] | None = None,
execute_options: dict[str, Any] | None = None,
) -> Iterator[QueryReturnType]
sql_to_df(
stmt: str,
connection: Engine | Connection,
return_type: ReturnTypeStr = "pandas",
*,
return_batches: bool = False,
batch_size: int | None = None,
rename: dict[str, str] | Callable | None = None,
schema_overrides: dict[str, Any] | None = None,
execute_options: dict[str, Any] | None = None,
) -> QueryReturnType | Iterator[QueryReturnType]
Executes the given SQLAlchemy statement or SQL string using Polars.
Parameters:
-
stmt
¶str
) –A SQL string to be executed.
-
connection
¶Engine | Connection
) –A SQLAlchemy Engine object or ADBC connection.
-
return_type
¶str
, default:'pandas'
) –The type of the return value. One of “arrow”, “pandas”, or “polars”.
-
return_batches
¶bool
, default:False
) –If True, return an iterator that yields each batch separately. If False, return a single DataFrame with all results. Default is False.
-
batch_size
¶int | None
, default:None
) –Indicate the size of each batch when processing data in batches. Default is None.
-
rename
¶dict[str, str] | Callable | None
, default:None
) –A dictionary mapping old column names to new column names, or a callable that takes a DataFrame and returns a DataFrame with renamed columns. Default is None.
-
schema_overrides
¶dict[str, Any] | None
, default:None
) –A dictionary mapping column names to dtypes. Default is None.
-
execute_options
¶dict[str, Any] | None
, default:None
) –These options will be passed through into the underlying query execution method as kwargs. Default is None.
Returns:
-
QueryReturnType | Iterator[QueryReturnType]
–If return_batches is False: A dataframe of the query results in the specified format.
-
QueryReturnType | Iterator[QueryReturnType]
–If return_batches is True: An iterator of dataframes in the specified format.
Raises:
-
ValueError
–- If the connection is not properly configured or if an unsupported return type is specified.
- If batch_size and return_batches are either both set or both unset.
validate_sql_for_data_extraction
¶
Validates that the SQL statement only contains a single data-extracting command.
Parameters:
Returns:
-
bool
(bool
) –True if the SQL statement is valid
Raises:
-
ParseError
–If the SQL statement cannot be parsed
-
MatchboxSourceExtractTransformError
–If validation requirements are not met
clean_uri
¶
clean_uri(
uri: str | AnyUrl,
strip_driver: bool = True,
strip_credentials: bool = True,
strip_query_fragment: bool = True,
) -> AnyUrl
Clean a database URI.
Optionally removes driver, credentials, and/or query/fragment components.
Parameters:
-
uri
¶str | AnyUrl
) –A database URI as a string or AnyUrl object
-
strip_driver
¶bool
, default:True
) –Whether to strip the driver component (e.g., ‘postgresql+psycopg2’ -> ‘postgresql’)
-
strip_credentials
¶bool
, default:True
) –Whether to strip username and password components
-
strip_query_fragment
¶bool
, default:True
) –Whether to strip query and fragment components
Returns:
-
AnyUrl
–An AnyUrl object with the specified components removed