Evaluation

matchbox.client.eval

Module implementing client-side evaluation features.

Modules:

  • utils

    Collection of client-side functions in aid of model evaluation.

Classes:

  • EvalData

    Object that caches evaluation data to measure model performance.

Functions:

  • compare_models

    Compare metrics of models based on evaluation data.

  • get_samples

    Retrieve samples enriched with source data, grouped by resolution cluster.

EvalData

EvalData()

Object that caches evaluation data to measure model performance.

Methods:

precision_recall

precision_recall(results: Results, threshold: float) -> PrecisionRecall

Compute precision and recall at a single threshold.
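
As a rough illustration of how EvalData might be used, the minimal sketch below sweeps a few thresholds. How the Results object is produced (typically by running a model) is not covered on this page and is assumed to happen elsewhere.

from matchbox.client.eval import EvalData

def sweep_thresholds(results, thresholds=(0.5, 0.7, 0.9)):
    # `results` is assumed to be a Results object produced by a model run elsewhere.
    eval_data = EvalData()  # caches the evaluation data used to score the model
    # Map each threshold to its PrecisionRecall value.
    return {t: eval_data.precision_recall(results, t) for t in thresholds}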

compare_models

Compare metrics of models based on evaluation data.

Parameters:

Returns:

  • ModelComparison

    A model comparison object, listing metrics for each model.

get_samples

get_samples(n: int, dag: DAG, user_id: int, clients: dict[str, Any] | None = None, default_client: Any | None = None) -> dict[int, DataFrame]

Retrieve samples enriched with source data, grouped by resolution cluster.

Parameters:

  • n

    (int) –

    Number of clusters to sample

  • dag

    (DAG) –

    DAG for which to retrieve samples

  • user_id

    (int) –

    ID of the user requesting the samples

  • clients

    (dict[str, Any] | None, default: None ) –

    Dictionary mapping each location name to a valid client. Locations whose name is missing from the dictionary are skipped unless a default client is provided.

  • default_client

    (Any | None, default: None ) –

    Fallback client to use for all sources.

Returns:

  • dict[int, DataFrame]

    Dictionary mapping each cluster ID to a dataframe describing that cluster

Raises:
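
As a hedged sketch of how get_samples might be called: the dag and warehouse_client objects below are placeholders for a DAG and a client built elsewhere, and the location name "warehouse" is purely illustrative.

from matchbox.client.eval import get_samples

# `dag` and `warehouse_client` are assumed to exist already.
samples = get_samples(
    n=10,                                      # sample 10 resolution clusters
    dag=dag,
    user_id=42,
    clients={"warehouse": warehouse_client},   # per-location clients; unmatched locations are skipped
    default_client=None,                       # or supply a fallback client for all sources
)

for cluster_id, cluster_df in samples.items():  # dict[int, DataFrame]
    print(cluster_id, len(cluster_df))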

utils

Collection of client-side functions in aid of model evaluation.

Classes:

  • EvalData

    Object that caches evaluation data to measure model performance.

Functions:

  • get_samples

    Retrieve samples enriched with source data, grouped by resolution cluster.

  • compare_models

    Compare metrics of models based on evaluation data.

EvalData

EvalData()

Object that caches evaluation data to measure model performance.

Methods:

precision_recall

precision_recall(results: Results, threshold: float) -> PrecisionRecall

Compute precision and recall at a single threshold.

get_samples

get_samples(n: int, dag: DAG, user_id: int, clients: dict[str, Any] | None = None, default_client: Any | None = None) -> dict[int, DataFrame]

Retrieve samples enriched with source data, grouped by resolution cluster.

Parameters:

  • n

    (int) –

    Number of clusters to sample

  • dag

    (DAG) –

    DAG for which to retrieve samples

  • user_id

    (int) –

    ID of the user requesting the samples

  • clients

    (dict[str, Any] | None, default: None ) –

    Dictionary mapping each location name to a valid client. Locations whose name is missing from the dictionary are skipped unless a default client is provided.

  • default_client

    (Any | None, default: None ) –

    Fallback client to use for all sources.

Returns:

  • dict[int, DataFrame]

    Dictionary mapping each cluster ID to a dataframe describing that cluster

Raises:

compare_models

Compare metrics of models based on evaluation data.

Parameters:

Returns:

  • ModelComparison

    A model comparison object, listing metrics for each model.