Scenarios
matchbox.common.factories.scenarios
Scenario factories for creating TestkitDAG scenarios.
Functions:
- register_scenario – Decorator to register a new scenario builder function.
- create_bare_scenario – Create a bare TestkitDAG scenario.
- create_index_scenario – Create an index TestkitDAG scenario.
- create_dedupe_scenario – Create a dedupe TestkitDAG scenario.
- create_probabilistic_dedupe_scenario – Create a probabilistic dedupe TestkitDAG scenario.
- create_link_scenario – Create a link TestkitDAG scenario.
- create_alt_dedupe_scenario – Create a TestkitDAG scenario with two alternative dedupers.
- create_convergent_partial_scenario – Create a TestkitDAG scenario with convergent sources.
- create_convergent_scenario – Create a TestkitDAG scenario with convergent sources, deduped.
- create_mega_scenario – Create a TestkitDAG scenario that produces large clusters.
- setup_scenario – Context manager for creating TestkitDAG scenarios.
register_scenario
register_scenario(name: str) -> Callable[[ScenarioBuilder], ScenarioBuilder]
Decorator to register a new scenario builder function.
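The signature above follows the common decorator-factory pattern: calling it with a name returns a decorator that records the builder and hands it back unchanged. The following is a minimal sketch of that pattern only, not matchbox's implementation. `SCENARIO_REGISTRY`, the simplified `ScenarioBuilder` alias, and `create_tiny_scenario` are hypothetical stand-ins introduced for illustration.

```python
from typing import Any, Callable

# Hypothetical stand-ins: the real ScenarioBuilder type and any registry live in
# matchbox.common.factories.scenarios and may be defined differently.
ScenarioBuilder = Callable[..., Any]
SCENARIO_REGISTRY: dict[str, ScenarioBuilder] = {}


def register_scenario(name: str) -> Callable[[ScenarioBuilder], ScenarioBuilder]:
    """Return a decorator that stores the decorated builder under `name`."""

    def decorator(func: ScenarioBuilder) -> ScenarioBuilder:
        SCENARIO_REGISTRY[name] = func
        # Return the function unchanged so it remains directly callable.
        return func

    return decorator


@register_scenario("tiny")
def create_tiny_scenario(
    backend: Any,
    warehouse_engine: Any,
    n_entities: int = 10,
    seed: int = 42,
    **kwargs: Any,
) -> dict[str, Any]:
    # A real builder would return a TestkitDAG; a plain dict stands in here.
    return {"n_entities": n_entities, "seed": seed}


# The builder can be looked up by name or called directly.
result = SCENARIO_REGISTRY["tiny"](None, None, n_entities=3)
```

The sketch mirrors the parameter list shared by the `create_*_scenario` functions on this page, so a registered builder can be invoked uniformly by name.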
create_bare_scenario
create_bare_scenario(backend: MatchboxDBAdapter, warehouse_engine: Engine, n_entities: int = 10, seed: int = 42, **kwargs: Any) -> TestkitDAG
Create a bare TestkitDAG scenario.
create_index_scenario
create_index_scenario(backend: MatchboxDBAdapter, warehouse_engine: Engine, n_entities: int = 10, seed: int = 42, **kwargs: Any) -> TestkitDAG
Create an index TestkitDAG scenario.
create_dedupe_scenario
create_dedupe_scenario(backend: MatchboxDBAdapter, warehouse_engine: Engine, n_entities: int = 10, seed: int = 42, **kwargs: Any) -> TestkitDAG
Create a dedupe TestkitDAG scenario.
create_probabilistic_dedupe_scenario
create_probabilistic_dedupe_scenario(backend: MatchboxDBAdapter, warehouse_engine: Engine, n_entities: int = 10, seed: int = 42, **kwargs: Any) -> TestkitDAG
Create a probabilistic dedupe TestkitDAG scenario.
create_link_scenario
create_link_scenario(backend: MatchboxDBAdapter, warehouse_engine: Engine, n_entities: int = 10, seed: int = 42, **kwargs: Any) -> TestkitDAG
Create a link TestkitDAG scenario.
create_alt_dedupe_scenario
create_alt_dedupe_scenario(backend: MatchboxDBAdapter, warehouse_engine: Engine, n_entities: int = 10, seed: int = 42, **kwargs: Any) -> TestkitDAG
Create a TestkitDAG scenario with two alternative dedupers.
create_convergent_partial_scenario
create_convergent_partial_scenario(backend: MatchboxDBAdapter, warehouse_engine: Engine, n_entities: int = 10, seed: int = 42, **kwargs: Any) -> TestkitDAG
Create a TestkitDAG scenario with convergent sources.
In this scenario, two sources index almost identically. The TestkitDAG contains two indexed sources with repetition and two naive dedupe models whose results have not yet been inserted.
create_convergent_scenario
create_convergent_scenario(backend: MatchboxDBAdapter, warehouse_engine: Engine, n_entities: int = 10, seed: int = 42, **kwargs: Any) -> TestkitDAG
Create a TestkitDAG scenario with convergent sources, deduped.
In this scenario, two sources index almost identically. The TestkitDAG contains two indexed sources with repetition and two naive dedupe models, all of which have been inserted.
create_mega_scenario
create_mega_scenario(backend: MatchboxDBAdapter, warehouse_engine: Engine, n_entities: int = 10, seed: int = 42, **kwargs: Any) -> TestkitDAG
Create a TestkitDAG scenario that produces large clusters.
Aims to produce “mega” clusters with more features than the CLI has screen rows, and more variations than the CLI has screen columns.
Parameters:
- backend (MatchboxDBAdapter) – MatchboxDBAdapter instance
- warehouse_engine (Engine) – SQLAlchemy engine for data warehouse
- n_entities (int, default: 10) – Number of true product entities to generate
- seed (int, default: 42) – Random seed for reproducibility
- **kwargs (Any, default: {}) – Additional arguments
Returns:
- TestkitDAG – TestkitDAG with extensive product data and linking model
setup_scenario
setup_scenario(backend: MatchboxDBAdapter, scenario_type: Literal['bare', 'index', 'dedupe', 'link', 'probabilistic_dedupe', 'alt_dedupe', 'convergent'], warehouse: Engine, n_entities: int = 10, seed: int = 42, **kwargs: dict[str, Any]) -> Generator[TestkitDAG, None, None]
Context manager for creating TestkitDAG scenarios.
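The `Generator[TestkitDAG, None, None]` return type suggests a generator-based context manager that builds the requested scenario and yields it to the `with` block. The sketch below illustrates that shape under stated assumptions: `setup_scenario_sketch`, `BUILDERS`, and `CLEANUP_LOG` are hypothetical names, a dict stands in for the TestkitDAG, and the teardown step is an assumption, since this page does not say what the real function does on exit.

```python
from contextlib import contextmanager
from typing import Any, Callable, Generator

# Hypothetical lookup table of builders keyed by scenario_type.
BUILDERS: dict[str, Callable[..., Any]] = {
    "bare": lambda backend, warehouse, n_entities=10, seed=42, **kw: {
        "type": "bare",
        "n_entities": n_entities,
        "seed": seed,
    },
}

# Records which scenarios were torn down (stand-in for real cleanup).
CLEANUP_LOG: list[str] = []


@contextmanager
def setup_scenario_sketch(
    backend: Any,
    scenario_type: str,
    warehouse: Any,
    n_entities: int = 10,
    seed: int = 42,
    **kwargs: Any,
) -> Generator[Any, None, None]:
    if scenario_type not in BUILDERS:
        raise ValueError(f"Unknown scenario type: {scenario_type}")
    dag = BUILDERS[scenario_type](
        backend, warehouse, n_entities=n_entities, seed=seed, **kwargs
    )
    try:
        yield dag
    finally:
        # Assumed teardown hook; runs even if the with-block raises.
        CLEANUP_LOG.append(scenario_type)


with setup_scenario_sketch(None, "bare", None, n_entities=5) as dag:
    # Inside the block the scenario is available and teardown has not run yet.
    built = dict(dag)
    cleaned_during_block = list(CLEANUP_LOG)
```

Placing the teardown in `finally` means a failing test body would still trigger it, which is the usual reason for wrapping fixtures in a context manager.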