Evaluation

matchbox.common.eval

Common operations to produce model evaluation scores.

Classes:

  • Judgement

    User determination on how to group source clusters from a model cluster.

Functions:

  • precision_recall

    From models and eval data, compute scores inspired by precision-recall.

  • process_judgements

    Convert judgements to pairs, net counts per pair, and set of source cluster IDs.

Attributes:

Pair module-attribute

Pair: TypeAlias = tuple[int, int]

Pairs module-attribute

Pairs: TypeAlias = set[Pair]

PrecisionRecall module-attribute

PrecisionRecall: TypeAlias = tuple[float, float]

ModelComparison module-attribute

Judgement

Bases: BaseModel

User determination on how to group source clusters from a model cluster.

Methods:

  • check_endorsed

    Ensure no cluster IDs are repeated in the endorsement.

Attributes:

user_id instance-attribute

user_id: int

shown class-attribute instance-attribute

shown: int = Field(
    description="ID of the model cluster shown to the user"
)

endorsed class-attribute instance-attribute

endorsed: list[list[int]] = Field(
    description="Groups of source cluster IDs that user thinks belong together"
)

check_endorsed classmethod

check_endorsed(value: list[list[int]])

Ensure no cluster IDs are repeated in the endorsement.
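
The check described above can be illustrated with a minimal sketch in plain Python. It is not the library's validator: the helper names `find_repeated_ids` and `check_endorsed_sketch` are hypothetical, and the sketch assumes the rule is simply that a source cluster ID may appear at most once across all endorsed groups.

```python
from collections import Counter
from itertools import chain


def find_repeated_ids(endorsed: list[list[int]]) -> set[int]:
    """Return the cluster IDs that occur more than once across all groups."""
    counts = Counter(chain.from_iterable(endorsed))
    return {cluster_id for cluster_id, n in counts.items() if n > 1}


def check_endorsed_sketch(endorsed: list[list[int]]) -> list[list[int]]:
    """Hypothetical stand-in for the validator: reject repeated cluster IDs."""
    repeated = find_repeated_ids(endorsed)
    if repeated:
        raise ValueError(f"Cluster IDs repeated in endorsement: {sorted(repeated)}")
    return endorsed
```

Under these assumptions, `[[1, 2], [3]]` passes unchanged, while `[[1, 2], [2, 3]]` raises a `ValueError` because ID 2 appears in two groups.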

precision_recall

precision_recall(
    models_root_leaf: list[Table],
    judgements: Table,
    expansion: Table,
) -> list[PrecisionRecall]

From models and eval data, compute scores inspired by precision-recall.

This function does the following:

  • Convert model and judgement clusters to implied pairwise connections. For judgements, this includes pairs that were shown to users but rejected. Sum how many times each pair was endorsed (+1) or rejected (-1).
  • Keep only the pairs where leaves are present in all models and in the judgements, so the comparison is fair.
  • If a validation pair was rejected as many times as it was endorsed, discard it from both model and validation pairs.
  • If a validation pair was rejected more times than it was endorsed, remove it from validation pairs, but keep it in model pairs.
  • Compute precision and recall for each model against the validation pairs.

At the moment, this function ignores user IDs.
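
The steps above can be sketched in plain Python on sets of pairs. This is an illustration under stated assumptions, not the library's implementation: the real function takes Arrow tables, whereas the hypothetical `precision_recall_sketch` below takes each model as a list of clusters (each a list of leaf IDs), a precomputed `net_counts` mapping, and a set of `judged_leaves`. Two further assumptions are made: model pairs that were never judged stay in the model pairs, and a score with an empty denominator is reported as 0.0.

```python
from itertools import combinations

Pair = tuple[int, int]
PrecisionRecall = tuple[float, float]


def cluster_pairs(clusters: list[list[int]]) -> set[Pair]:
    """All sorted pairwise combinations of leaves within each cluster."""
    pairs: set[Pair] = set()
    for leaves in clusters:
        pairs |= set(combinations(sorted(set(leaves)), 2))
    return pairs


def precision_recall_sketch(
    models: list[list[list[int]]],
    net_counts: dict[Pair, float],
    judged_leaves: set[int],
) -> list[PrecisionRecall]:
    # Keep only leaves present in every model and in the judgements.
    shared = set(judged_leaves)
    for clusters in models:
        shared &= {leaf for cluster in clusters for leaf in cluster}

    def on_shared(pair: Pair) -> bool:
        return pair[0] in shared and pair[1] in shared

    # Tied pairs (net 0) are dropped from both sides.
    tied = {pair for pair, net in net_counts.items() if net == 0}
    # Net-negative pairs are excluded from validation but may remain in model pairs.
    validation = {p for p, net in net_counts.items() if net > 0 and on_shared(p)}

    scores: list[PrecisionRecall] = []
    for clusters in models:
        model_pairs = {p for p in cluster_pairs(clusters) if on_shared(p)} - tied
        hits = model_pairs & validation
        precision = len(hits) / len(model_pairs) if model_pairs else 0.0
        recall = len(hits) / len(validation) if validation else 0.0
        scores.append((precision, recall))
    return scores


net_counts = {(1, 2): 2, (1, 3): -1, (2, 3): 0, (3, 4): 1}
model_a = [[1, 2, 3], [4]]
model_b = [[1, 2], [3, 4]]
scores = precision_recall_sketch([model_a, model_b], net_counts, {1, 2, 3, 4})
print(scores)  # [(0.5, 0.5), (1.0, 1.0)]
```

In this example `model_a` keeps the rejected pair (1, 3) among its model pairs, which lowers its precision, while the tied pair (2, 3) is ignored entirely.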

Parameters:

  • models_root_leaf

    (list[Table]) –

    List of tables with root and leaf columns, one per model. Each table must include all the clusters that resolve from a model, all the way to the original source clusters if no model in the lineage merged them.

  • judgements

    (Table) –

    Table following matchbox.common.arrow.SCHEMA_JUDGEMENTS.

  • expansion

    (Table) –

    Table following matchbox.common.arrow.SCHEMA_CLUSTER_EXPANSION.

Returns:

process_judgements

process_judgements(
    judgements: DataFrame, expansion: DataFrame
) -> tuple[Pairs, dict[Pair, float], set[int]]

Convert judgements to pairs, net counts per pair, and set of source cluster IDs.

In general, pairs include all (sorted) pairwise combinations of elements in a list of cluster IDs. For example, (123) gives (1,2), (1,3), (2,3). However, we need to distinguish between a user who is shown (12) and endorses (12), and a user who is shown (123) and endorses (12). In the second case, the user implies a negative judgement on pairs (1,3) and (2,3). We return the net value of each pair by adding 1 for every endorsement and subtracting 1 for every rejection.
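
The pair logic in this paragraph can be sketched as follows. This is a simplified illustration, not the library's code: it works directly on leaf IDs and skips the expansion step, and the helper names `all_pairs`, `net_pair_counts` and `sum_net_counts` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

Pair = tuple[int, int]


def all_pairs(ids: list[int]) -> set[Pair]:
    """All sorted pairwise combinations, e.g. [1, 2, 3] -> (1,2), (1,3), (2,3)."""
    return set(combinations(sorted(set(ids)), 2))


def net_pair_counts(shown: list[int], endorsed: list[list[int]]) -> dict[Pair, int]:
    """Score one judgement: +1 for endorsed pairs, -1 for shown but rejected pairs."""
    endorsed_pairs: set[Pair] = set()
    for group in endorsed:
        endorsed_pairs |= all_pairs(group)
    return {pair: 1 if pair in endorsed_pairs else -1 for pair in all_pairs(shown)}


def sum_net_counts(
    judgements: list[tuple[list[int], list[list[int]]]],
) -> dict[Pair, int]:
    """Sum the per-judgement scores into a net count per pair."""
    totals: dict[Pair, int] = defaultdict(int)
    for shown, endorsed in judgements:
        for pair, value in net_pair_counts(shown, endorsed).items():
            totals[pair] += value
    return dict(totals)


# One user is shown (123) and endorses (12) and (3); another is shown (12) and endorses (12).
net = sum_net_counts([([1, 2, 3], [[1, 2], [3]]), ([1, 2], [[1, 2]])])
print(net)
```

Here the pair (1,2) ends with a net count of 2, while (1,3) and (2,3) each end at -1 because they were shown once and not endorsed.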

This function relies on the input data being well-formed. For example:

  • All shown cluster IDs need to have an expansion.
  • All endorsed cluster IDs need to have an expansion, unless they’re leaves.
  • No partially splintered clusters, i.e. if (123)->(12) is in the data, (123)->(3) must be as well.

Parameters:

  • judgements

    (DataFrame) –

    Dataframe following matchbox.common.arrow.SCHEMA_JUDGEMENTS.

  • expansion

    (DataFrame) –

    Dataframe following matchbox.common.arrow.SCHEMA_CLUSTER_EXPANSION.

Returns: