Evaluation

matchbox.common.eval ¶

Common operations to produce model evaluation scores.

Classes:

Judgement –

User determination on how to group source clusters from a model cluster.

Functions:

precision_recall –

From models and eval data, compute scores inspired by precision-recall.
process_judgements –

Convert judgements to pairs, net counts per pair, and set of source cluster IDs.

Attributes:

Pair (TypeAlias) –
Pairs (TypeAlias) –
PrecisionRecall (TypeAlias) –

Pair `module-attribute` ¶

Pair: TypeAlias = tuple[int, int]

Pairs `module-attribute` ¶

Pairs: TypeAlias = set[Pair]

PrecisionRecall `module-attribute` ¶

PrecisionRecall: TypeAlias = tuple[float, float]

Judgement ¶

Bases: BaseModel

User determination on how to group source clusters from a model cluster.

Methods:

check_no_duplicates –

Ensure no cluster IDs are repeated in the endorsement.
check_consistency –

Ensure union of endorsed clusters matches shown cluster.

Attributes:

user_name (str) –
tag (str | None) –
shown (list[int]) –
endorsed (list[list[int]]) –

user_name `instance-attribute` ¶

user_name: str

tag `class-attribute` `instance-attribute` ¶

tag: str | None = None

shown `class-attribute` `instance-attribute` ¶

shown: list[int] = Field(description='IDs of the source clusters shown to user at once')

endorsed `class-attribute` `instance-attribute` ¶

endorsed: list[list[int]] = Field(description='Groups of source cluster IDs that user thinks belong together')

check_no_duplicates `classmethod` ¶

check_no_duplicates(value: list[list[int]]) -> list[list[int]]

Ensure no cluster IDs are repeated in the endorsement.

check_consistency ¶

check_consistency() -> Self

Ensure union of endorsed clusters matches shown cluster.
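
The two checks above can be pictured with a small stand-alone sketch. It is not the library's code: the real methods are validators on a pydantic model, whereas the functions below are plain-Python stand-ins with hypothetical bodies, written only to show what "no duplicates" and "consistency" are assumed to mean here.

```python
from collections import Counter


def check_no_duplicates(endorsed: list[list[int]]) -> list[list[int]]:
    """Raise if any cluster ID appears more than once across endorsed groups."""
    counts = Counter(cid for group in endorsed for cid in group)
    repeated = sorted(cid for cid, n in counts.items() if n > 1)
    if repeated:
        raise ValueError(f"Cluster IDs repeated in endorsement: {repeated}")
    return endorsed


def check_consistency(shown: list[int], endorsed: list[list[int]]) -> None:
    """Raise if the union of endorsed groups differs from the shown IDs.

    Assumption: "matches" means set equality between the two.
    """
    union = {cid for group in endorsed for cid in group}
    if union != set(shown):
        raise ValueError("Endorsed groups must cover exactly the shown IDs")
```

Under these assumptions, shown=[1, 2, 3] with endorsed=[[1, 2], [3]] passes both checks, while endorsed=[[1, 2], [2, 3]] fails the first and endorsed=[[1, 2]] fails the second.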

precision_recall ¶

precision_recall(models_root_leaf: list[DataFrame], judgements: DataFrame, expansion: DataFrame) -> list[PrecisionRecall]

From models and eval data, compute scores inspired by precision-recall.

This function does the following:

1. Convert model and judgement clusters to implied pair-wise connections. For judgements, this includes pairs that were shown to users but rejected. Sum how many times each pair was endorsed (+1) or rejected (-1).
2. Keep only the pairs whose leaves are present in all models and in the judgements, so the comparison is fair.
3. If a validation pair was rejected as many times as it was endorsed, discard it from both model and validation pairs.
4. If a validation pair was rejected more times than it was endorsed, remove it from validation pairs, but keep it in model pairs.
5. Compute precision and recall for each model against the validation pairs.

At the moment, this function ignores user IDs.
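
The filtering and scoring rules above can be sketched on plain Python sets. This is an illustration under stated assumptions, not the function's implementation: the real function takes DataFrames, while the hypothetical helper `score` below takes a set of model pairs, a dict of net counts per pair, and the set of leaves shared by all models and the judgements. Returning 0.0 when a denominator is empty is also an assumption.

```python
from itertools import combinations

Pair = tuple[int, int]


def cluster_pairs(clusters: list[list[int]]) -> set[Pair]:
    """All sorted pair-wise combinations implied by each cluster."""
    return {p for c in clusters for p in combinations(sorted(set(c)), 2)}


def score(
    model_pairs: set[Pair], net: dict[Pair, float], shared_leaves: set[int]
) -> tuple[float, float]:
    """Hypothetical helper: precision and recall for one model."""

    def fair(pair: Pair) -> bool:
        # Keep only pairs whose leaves are present everywhere.
        return pair[0] in shared_leaves and pair[1] in shared_leaves

    model = {p for p in model_pairs if fair(p)}
    judged = {p: n for p, n in net.items() if fair(p)}
    ties = {p for p, n in judged.items() if n == 0}
    # Net-negative pairs drop out of validation but stay in the model pairs.
    validation = {p for p, n in judged.items() if n > 0}
    model -= ties  # tied pairs are discarded from both sides
    hits = model & validation
    precision = len(hits) / len(model) if model else 0.0
    recall = len(hits) / len(validation) if validation else 0.0
    return precision, recall


model_pairs = cluster_pairs([[1, 2, 3], [4, 5]])
net = {(1, 2): 2, (1, 3): -1, (2, 3): 0, (4, 5): 1, (1, 4): 1}
precision, recall = score(model_pairs, net, shared_leaves={1, 2, 3, 4, 5})
# Both scores are 2/3 here: (2, 3) is a tie and is dropped, (1, 3) counts
# against precision, and (1, 4) is a validation pair the model missed.
```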

Parameters:

models_root_leaf ¶
(list[DataFrame]) –

List of tables with root and leaf columns, one per model. Each table must include every cluster that resolves from the model, all the way down to the original source clusters if no model in the lineage merged them.
judgements ¶
(DataFrame) –

Dataframe following matchbox.common.arrow.SCHEMA_JUDGEMENTS.
expansion ¶
(DataFrame) –

Dataframe following matchbox.common.arrow.SCHEMA_CLUSTER_EXPANSION.

Returns:

list[PrecisionRecall] –

List of tuples of precision and recall scores, one per model.

process_judgements ¶

process_judgements(judgements: DataFrame, expansion: DataFrame) -> tuple[Pairs, dict[Pair, float], set[int]]

Convert judgements to pairs, net counts per pair, and set of source cluster IDs.

In general, pairs include all (sorted) pair-wise combinations of elements in a list of cluster IDs. For example, (123) will give us (1,2), (1,3), (2,3). We, however, need to capture the difference between when a user is shown (12) and endorses (12), vs. when the user is shown (123) and endorses (12). In the second case, the user implies a negative judgement over pairs (1,3) and (2,3). We return the net value of pairs by summing 1 for an endorsement and subtracting 1 for a rejection.

This function relies on the input data being well-formed. For example:

- All shown cluster IDs need to have an expansion.
- All endorsed cluster IDs need to have an expansion, unless they’re leaves.
- No partial splintered clusters, i.e. if (123)->(12) is in the data, (123)->(3) must be as well.
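
The net-count logic described above can be sketched as follows. This is a simplified illustration, not the library's code: it assumes judgements are already expressed on leaf IDs as (shown, endorsed) tuples, so it skips the expansion table entirely, and the name `net_counts` is hypothetical.

```python
from collections import defaultdict
from itertools import combinations

Pair = tuple[int, int]


def pairs_of(ids: list[int]) -> set[Pair]:
    """All sorted pair-wise combinations of a list of IDs."""
    return set(combinations(sorted(set(ids)), 2))


def net_counts(
    judgements: list[tuple[list[int], list[list[int]]]],
) -> dict[Pair, int]:
    """Sum +1 per endorsement and -1 per rejection for every shown pair."""
    net: dict[Pair, int] = defaultdict(int)
    for shown, endorsed in judgements:
        positive = set().union(*(pairs_of(group) for group in endorsed))
        for pair in pairs_of(shown):
            # A pair that was shown but not endorsed is an implied rejection.
            net[pair] += 1 if pair in positive else -1
    return dict(net)


example = net_counts(
    [
        ([1, 2], [[1, 2]]),          # shown (12), endorsed (12)
        ([1, 2, 3], [[1, 2], [3]]),  # shown (123), endorsed (12) and (3)
    ]
)
# example == {(1, 2): 2, (1, 3): -1, (2, 3): -1}
```

The second judgement is what produces the negative values: (1,3) and (2,3) were visible to the user but left out of every endorsed group.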

Parameters:

judgements ¶
(DataFrame) –

Dataframe following matchbox.common.arrow.SCHEMA_JUDGEMENTS.
expansion ¶
(DataFrame) –

Dataframe following matchbox.common.arrow.SCHEMA_CLUSTER_EXPANSION.

Returns:

tuple[Pairs, dict[Pair, float], set[int]] –

Tuple of:
- Set of pairs, for all endorsements and rejections.
- Dict mapping pairs to net (positive or negative) number of judgements.
- Set of all cluster IDs shown to users.
