Evaluation
matchbox.common.eval
Common operations to produce model evaluation scores.
Classes:
- Judgement – User determination on how to group source clusters from a model cluster.
Functions:
- precision_recall – From models and eval data, compute scores inspired by precision-recall.
- process_judgements – Convert judgements to pairs, net counts per pair, and set of source cluster IDs.
Judgement
Bases: BaseModel
User determination on how to group source clusters from a model cluster.
Methods:
- check_no_duplicates – Ensure no cluster IDs are repeated in the endorsement.
- check_consistency – Ensure union of endorsed clusters matches shown cluster.
Attributes:
shown (class-attribute, instance-attribute)
endorsed (class-attribute, instance-attribute)
endorsed: list[list[int]] = Field(description='Groups of source cluster IDs that user thinks belong together')
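As an illustration of the kind of check that check_no_duplicates describes, here is a minimal sketch in plain Python. The helper name duplicated_cluster_ids is hypothetical and not part of matchbox; the real validator is a method on the Judgement model and may behave differently.

```python
from collections import Counter


def duplicated_cluster_ids(endorsed):
    """Return the sorted cluster IDs that appear more than once across groups.

    Hypothetical helper for illustration only; not the matchbox validator.
    """
    counts = Counter(cid for group in endorsed for cid in group)
    return sorted(cid for cid, n in counts.items() if n > 1)


valid = [[1, 2], [3]]       # every ID appears in exactly one group
invalid = [[1, 2], [2, 3]]  # ID 2 appears in two groups
```

An endorsement such as `invalid` would be expected to fail validation, since cluster 2 cannot belong to two groups at once.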
precision_recall
precision_recall(models_root_leaf: list[DataFrame], judgements: DataFrame, expansion: DataFrame) -> list[PrecisionRecall]
From models and eval data, compute scores inspired by precision-recall.
This function does the following:
- Convert model and judgement clusters to implied pair-wise connections. For judgements, this includes pairs that were shown to users but rejected. Sum how many times each pair was endorsed (+1) or rejected (-1).
- Keep only the pairs where leaves are present in all models and in the judgements, so the comparison is fair.
- If a validation pair was rejected as many times as it was endorsed, discard it from both model and validation pairs.
- If a validation pair was rejected more times than it was endorsed, remove it from validation pairs, but keep it in model pairs.
- Compute precision and recall for each model against the validation pairs.
At the moment, this function ignores user IDs.
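The steps above can be sketched with plain Python sets. This is not the matchbox implementation: the function names, the list-of-clusters input format, and the choice to return 0.0 when a denominator is empty are all assumptions made for illustration. The sketch also assumes that a model pair between in-scope leaves that was never judged counts against precision.

```python
from itertools import combinations


def cluster_pairs(clusters):
    """All sorted pair-wise combinations of leaves within each cluster."""
    pairs = set()
    for leaves in clusters:
        pairs.update(combinations(sorted(leaves), 2))
    return pairs


def precision_recall_sketch(models, net_counts, judged_leaves):
    """Score each model against validation pairs (hypothetical helper).

    models: list of models, each a list of clusters (lists of leaf IDs).
    net_counts: dict mapping a sorted pair to endorsements minus rejections.
    judged_leaves: set of leaf IDs that appear in the judgements.
    """
    # Keep only leaves present in every model and in the judgements.
    shared = set(judged_leaves)
    for clusters in models:
        shared &= {leaf for leaves in clusters for leaf in leaves}

    def in_scope(pair):
        return pair[0] in shared and pair[1] in shared

    # Tied pairs are discarded from both sides; net-negative pairs are
    # excluded from validation pairs but stay in model pairs.
    tied = {p for p, n in net_counts.items() if n == 0}
    validation = {p for p, n in net_counts.items() if n > 0 and in_scope(p)}

    scores = []
    for clusters in models:
        predicted = {p for p in cluster_pairs(clusters) if in_scope(p)} - tied
        true_positives = len(predicted & validation)
        # Assumption: an empty denominator yields 0.0.
        precision = true_positives / len(predicted) if predicted else 0.0
        recall = true_positives / len(validation) if validation else 0.0
        scores.append((precision, recall))
    return scores


model_a = [[1, 2, 3], [4]]
model_b = [[1, 2], [3, 4]]
net_counts = {(1, 2): 2, (1, 3): -1, (2, 3): 0, (3, 4): 1}
scores = precision_recall_sketch([model_a, model_b], net_counts, {1, 2, 3, 4})
```

Here model_a keeps the rejected pair (1,3) as a false positive and loses the tied pair (2,3), so it scores (0.5, 0.5), while model_b matches both validation pairs and scores (1.0, 1.0).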
Parameters:
- models_root_leaf (list[DataFrame]) – List of tables with root and leaf columns, one per model. They must include all the clusters that resolve from a model, all the way to the original source clusters if no model in the lineage merged them.
- judgements (DataFrame) – Dataframe following matchbox.common.arrow.SCHEMA_JUDGEMENTS.
- expansion (DataFrame) – Dataframe following matchbox.common.arrow.SCHEMA_CLUSTER_EXPANSION.
Returns:
- list[PrecisionRecall] – List of tuples of precision and recall scores, one per model.
process_judgements
process_judgements(judgements: DataFrame, expansion: DataFrame) -> tuple[Pairs, dict[Pair, float], set[int]]
Convert judgements to pairs, net counts per pair, and set of source cluster IDs.
In general, pairs include all (sorted) pair-wise combinations of elements in a list of cluster IDs. For example, (123) will give us (1,2), (1,3), (2,3). We, however, need to capture the difference between when a user is shown (12) and endorses (12), vs. when the user is shown (123) and endorses (12). In the second case, the user implies a negative judgement over pairs (1,3) and (2,3). We return the net value of pairs by summing 1 for an endorsement and subtracting 1 for a rejection.
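The net-value rule can be sketched as follows. The input format here, a list of (shown leaves, endorsed groups) tuples, is a hypothetical stand-in for the judgement and expansion tables, and net_pair_counts is not a matchbox function.

```python
from collections import Counter
from itertools import combinations


def net_pair_counts(judgements):
    """Sum +1 per endorsed pair and -1 per shown-but-rejected pair.

    judgements: hypothetical list of (shown_leaves, endorsed_groups) tuples.
    """
    net = Counter()
    for shown_leaves, endorsed_groups in judgements:
        # Every pair implied by the cluster the user was shown.
        shown_pairs = set(combinations(sorted(shown_leaves), 2))
        # Every pair implied by the groups the user endorsed.
        endorsed_pairs = set()
        for group in endorsed_groups:
            endorsed_pairs.update(combinations(sorted(group), 2))
        # A shown pair that was not endorsed counts as a rejection.
        for pair in shown_pairs:
            net[pair] += 1 if pair in endorsed_pairs else -1
    return dict(net)


# One user is shown (12) and endorses it; another is shown (123) and splits off 3.
example = [
    ([1, 2], [[1, 2]]),
    ([1, 2, 3], [[1, 2], [3]]),
]
counts = net_pair_counts(example)
```

With this input, pair (1,2) is endorsed twice and nets 2, while (1,3) and (2,3) are each rejected once and net -1.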
This function relies on the input data being well-formed. For example:
- All shown cluster IDs need to have an expansion.
- All endorsed cluster IDs need to have an expansion, unless they’re leaves.
- No partial splintered clusters, i.e. if (123)->(12) is in the data, (123)->(3) must be as well.
Parameters:
- judgements (DataFrame) – Dataframe following matchbox.common.arrow.SCHEMA_JUDGEMENTS.
- expansion (DataFrame) – Dataframe following matchbox.common.arrow.SCHEMA_CLUSTER_EXPANSION.
Returns: