Resolvers

matchbox.common.factories.resolvers

Factory helpers for resolver testkits.

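Classes:

  • MockResolverSettings

    Settings type for MockResolver.

  • MockResolver

    Mock resolver methodology used by resolver testkits.

  • ResolverTestkit

    Resolver plus local expected data for tests.

Functions:

  • resolver_factory

    Generate a complete resolver testkit.
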
MockResolverSettings

Bases: ResolverSettings


Settings type for MockResolver.

Methods:

  • validate_inputs

    Validate that all model names are present in thresholds.

Attributes:

thresholds class-attribute instance-attribute

thresholds: dict[ModelStepName, Annotated[float, Field(ge=0.0, le=1.0)]] = Field(default_factory=dict)

validate_inputs

validate_inputs(model_names: Iterable[ModelStepName]) -> None

Validate that all model names are present in thresholds.
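
The check described above can be sketched in plain Python. This is an illustration only, not the matchbox implementation: the function name `check_thresholds`, the use of `ValueError`, and the error message are assumptions, and plain strings stand in for `ModelStepName`.

```python
from collections.abc import Iterable, Mapping


def check_thresholds(
    thresholds: Mapping[str, float], model_names: Iterable[str]
) -> None:
    """Hypothetical stand-in for MockResolverSettings.validate_inputs."""
    # Collect every model name that has no entry in the thresholds mapping.
    missing = sorted(set(model_names) - set(thresholds))
    if missing:
        # The exception type is an assumption; the source does not name one.
        raise ValueError(f"Missing thresholds for models: {missing}")
```

Under these assumptions, calling `check_thresholds({"dedupe_a": 0.5}, ["dedupe_a", "link_ab"])` would raise, because `link_ab` has no threshold.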

MockResolver

Bases: ResolverMethod


Mock resolver methodology used by resolver testkits.

Methods:

  • compute_clusters

    Compute mock clusters with connected components.

Attributes:

resolver_type class-attribute

resolver_type: ResolverType = COMPONENTS

settings instance-attribute

compute_clusters

compute_clusters(model_edges: Mapping[ModelStepName, DataFrame]) -> DataFrame

Compute mock clusters with connected components.
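
To make the connected-components idea concrete, here is a minimal union-find sketch. It does not reproduce `MockResolver.compute_clusters`: the real method takes and returns DataFrames, whereas this version assumes each model's edges are `(left_id, right_id, score)` tuples. Keeping an edge when `score >= threshold`, and treating unlinked records as singleton clusters, are also assumptions.

```python
from collections.abc import Iterable, Mapping

Edge = tuple[int, int, float]  # (left_id, right_id, score), an assumed layout


def connected_components(
    model_edges: Mapping[str, Iterable[Edge]],
    thresholds: Mapping[str, float],
) -> list[list[int]]:
    """Cluster record ids by linking edges that meet each model's threshold."""
    parent: dict[int, int] = {}

    def find(node: int) -> int:
        # Register unseen nodes as their own root, then walk up to the root,
        # shortening the path along the way (path halving).
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for model, edges in model_edges.items():
        cutoff = thresholds.get(model, 0.0)
        for left, right, score in edges:
            # Both endpoints are registered even if the edge is rejected.
            root_left, root_right = find(left), find(right)
            if score >= cutoff and root_left != root_right:
                parent[root_right] = root_left

    # Group every node under its final root.
    clusters: dict[int, list[int]] = {}
    for node in list(parent):
        clusters.setdefault(find(node), []).append(node)
    return sorted(sorted(members) for members in clusters.values())
```

Edges from all models are pooled into one graph, so a link found by any input model can merge two clusters.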

ResolverTestkit

Bases: BaseModel


Resolver plus local expected data for tests.

Methods:

  • query

    Thin wrapper that queries this testkit’s sources via its resolver.

  • fake_run

    Set resolver results without running the resolver.

  • into_dag

    Return kwargs for explicit DAG insertion.

Attributes:

resolver instance-attribute

resolver: Resolver

assignments instance-attribute

assignments: DataFrame

entities instance-attribute

entities: tuple[ClusterEntity, ...]

name property

name: str

Return resolver name.

query

query() -> Query

Thin wrapper that queries this testkit’s sources via its resolver.

fake_run

fake_run() -> Self

Set resolver results without running the resolver.

into_dag

into_dag() -> dict[str, Any]

Return kwargs for explicit DAG insertion.

resolver_factory

resolver_factory(dag: DAG | None = None, inputs: Iterable[ModelTestkit] | None = None, true_entities: Iterable[SourceEntity] | None = None, name: ResolverStepName | None = None, description: str | None = None, thresholds: Mapping[ModelStepName, float] | None = None, seed: int = 42) -> ResolverTestkit

Generate a complete resolver testkit.

Allows autoconfiguration with minimal settings, or more nuanced control.

Can be used either to generate a resolver in a pipeline, interconnected with existing testkit objects, or to generate a standalone resolver with random data.

Parameters:

  • dag

    (DAG | None, default: None ) –

    DAG containing this resolver. Inferred from the first input testkit if not provided. A default DAG is created when inputs are also absent.

  • inputs

    (Iterable[ModelTestkit] | None, default: None ) –

    An iterable of ModelTestkit objects to use as resolver inputs. If None, a single default deduper model testkit is created automatically. All inputs must belong to the same DAG.

  • true_entities

    (Iterable[SourceEntity] | None, default: None ) –

    Ground truth SourceEntity objects used to generate the expected cluster assignments. If None, the resolver testkit will have no expected entities.

  • name

    (ResolverStepName | None, default: None ) –

    Name of the resolver. Defaults to a randomly generated word suffixed with ‘_resolver’.

  • description

    (str | None, default: None ) –

    Description of the resolver.

  • thresholds

    (Mapping[ModelStepName, float] | None, default: None ) –

    Per-model score thresholds in [0.0, 1.0]. If omitted, defaults to 0.0 for all resolver inputs.

  • seed

    (int, default: 42 ) –

    Random seed for reproducibility.

Returns:

  • ResolverTestkit ( ResolverTestkit ) –

    A resolver testkit with generated assignments and expected entities.

Raises:

  • TypeError

    If any element of inputs is not a ModelTestkit.

  • ValueError

    If inputs belong to different DAGs.