Evaluation

matchbox.common.eval

Common operations to produce model evaluation scores.
Classes:

- Judgement – User determination on how to group source clusters from a model cluster.

Functions:

- precision_recall – From models and eval data, compute scores inspired by precision-recall.
- process_judgements – Convert judgements to pairs, net counts per pair, and set of source cluster IDs.
Attributes:

- Pair (TypeAlias)
- Pairs (TypeAlias)
- PrecisionRecall (TypeAlias)
- ModelComparison (TypeAlias)
ModelComparison (module attribute)

ModelComparison: TypeAlias = dict[ModelResolutionName, PrecisionRecall]
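As a rough illustration of the shape of this alias, a ModelComparison maps each model's resolution name to its precision-recall tuple. The model names and scores below are made up for the example, and the F1 helper is not part of matchbox:

```python
# Hypothetical ModelComparison: resolution name -> (precision, recall).
comparison = {
    "deterministic_v1": (0.92, 0.75),
    "splink_v2": (0.88, 0.83),
}


def f1(pr: tuple[float, float]) -> float:
    """Harmonic mean of a (precision, recall) tuple."""
    p, r = pr
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Pick the model with the best F1 score across the comparison.
best = max(comparison, key=lambda name: f1(comparison[name]))
```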
Judgement

Bases: BaseModel

User determination on how to group source clusters from a model cluster.
Methods:

- check_endorsed – Ensure no cluster IDs are repeated in the endorsement.
precision_recall

precision_recall(
    models_root_leaf: list[Table],
    judgements: Table,
    expansion: Table,
) -> list[PrecisionRecall]
From models and eval data, compute scores inspired by precision-recall.
This function does the following:
- Convert model and judgement clusters to implied pair-wise connections. For judgements, this includes the pairs shown to users but rejected. Sum how many times pairs were endorsed (+1) or rejected (-1).
- Keep only the pairs where leaves are present in all models and in the judgements, so the comparison is fair.
- If a validation pair was rejected as many times as it was endorsed, discard it from both model and validation pairs.
- If a validation pair was rejected more times than it was endorsed, remove it from validation pairs, but keep it in model pairs.
- Precision and recall are computed for each model against validation pairs.
At the moment, this function ignores user IDs.
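The core of the comparison can be sketched in a few lines. This is not matchbox's implementation (which operates on Arrow tables and net judgement counts); it is a minimal illustration, with made-up cluster and leaf IDs, of converting clusters to pairs, restricting both sides to shared leaves so the comparison is fair, and scoring:

```python
from itertools import combinations


def cluster_pairs(roots_to_leaves: dict[int, list[int]]) -> set[tuple[int, int]]:
    """All sorted leaf pairs implied by grouping leaves under a root."""
    pairs: set[tuple[int, int]] = set()
    for leaves in roots_to_leaves.values():
        pairs |= {tuple(sorted(p)) for p in combinations(leaves, 2)}
    return pairs


# Hypothetical model clusters and validation (judgement) pairs.
model_pairs = cluster_pairs({10: [1, 2, 3], 11: [4, 5]})
validation_pairs = {(1, 2), (2, 3), (4, 6)}

# Keep only pairs whose leaves appear on both sides.
shared = {l for p in model_pairs for l in p} & {l for p in validation_pairs for l in p}
model_kept = {p for p in model_pairs if set(p) <= shared}
valid_kept = {p for p in validation_pairs if set(p) <= shared}

# Precision: endorsed fraction of model pairs; recall: recovered fraction
# of validation pairs.
tp = len(model_kept & valid_kept)
precision = tp / len(model_kept)
recall = tp / len(valid_kept)
```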
Parameters:

- models_root_leaf (list[Table]) – List of tables with root and leaf columns, one per model. They must include all the clusters that resolve from a model, all the way to the original source clusters if no model in the lineage merged them.
- judgements (Table) – Table following matchbox.common.arrow.SCHEMA_JUDGEMENTS.
- expansion (Table) – Table following matchbox.common.arrow.SCHEMA_CLUSTER_EXPANSION.
Returns:

- list[PrecisionRecall] – List of tuples of precision and recall scores, one per model.
process_judgements

process_judgements(
    judgements: DataFrame, expansion: DataFrame
) -> tuple[Pairs, dict[Pair, float], set[int]]
Convert judgements to pairs, net counts per pair, and set of source cluster IDs.
In general, pairs include all (sorted) pair-wise combinations of elements in a list of cluster IDs. For example, the cluster (1, 2, 3) implies the pairs (1, 2), (1, 3) and (2, 3). We, however, need to capture the difference between a user being shown (1, 2) and endorsing (1, 2), and a user being shown (1, 2, 3) and endorsing only (1, 2). In the second case, the user implies a negative judgement over the pairs (1, 3) and (2, 3). We return the net value of each pair by adding 1 for an endorsement and subtracting 1 for a rejection.
This function relies on the input data being well-formed. For example:

- All shown cluster IDs need to have an expansion.
- All endorsed cluster IDs need to have an expansion, unless they're leaves.
- No partially splintered clusters, i.e. if (1, 2, 3) -> (1, 2) is in the data, (1, 2, 3) -> (3) must be as well.
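The netting logic described above can be sketched as follows. This is a simplified illustration, not matchbox's implementation: the helper name and plain-list inputs are invented for the example, whereas the real function consumes DataFrames following the schemas below:

```python
from itertools import combinations


def judgement_pairs(
    shown: list[int], endorsed_groups: list[list[int]]
) -> dict[tuple[int, int], int]:
    """Net count per pair: +1 if endorsed, -1 if shown but split apart."""
    shown_pairs = {tuple(sorted(p)) for p in combinations(shown, 2)}
    endorsed: set[tuple[int, int]] = set()
    for group in endorsed_groups:
        endorsed |= {tuple(sorted(p)) for p in combinations(group, 2)}
    counts: dict[tuple[int, int], int] = {}
    for pair in shown_pairs:
        counts[pair] = counts.get(pair, 0) + (1 if pair in endorsed else -1)
    return counts


# The user is shown cluster (1, 2, 3) and endorses (1, 2), splintering 3 off:
# (1, 2) is a positive judgement; (1, 3) and (2, 3) are implied rejections.
counts = judgement_pairs([1, 2, 3], [[1, 2], [3]])
```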
Parameters:

- judgements (DataFrame) – Dataframe following matchbox.common.arrow.SCHEMA_JUDGEMENTS.
- expansion (DataFrame) – Dataframe following matchbox.common.arrow.SCHEMA_CLUSTER_EXPANSION.
Returns: