
Evaluation

Python helpers

matchbox.client.eval

Public evaluation helpers for Matchbox clients.

Modules:

  • samples

    Client-side helpers for retrieving and preparing evaluation samples.

Classes:

  • EvalData

    Object which caches evaluation data to measure model performance.

  • EvaluationItem

    A cluster awaiting evaluation, with pre-computed display data.

Functions:

  • compare_models

    Compare metrics of models based on cached evaluation data.

  • create_evaluation_item

    Create EvaluationItem with pre-computed display data.

  • create_judgement

    Convert an item's assignments to a Judgement; no default group assignment is applied.

  • get_samples

    Retrieve samples enriched with source data as EvaluationItems.

  • precision_recall

    From models and eval data, compute scores inspired by precision-recall.

Attributes:

ModelComparison module-attribute

EvalData

EvalData()

Object which caches evaluation data to measure model performance.

Methods:

precision_recall

precision_recall(results: Results, threshold: float) -> tuple[float, float]

Compute precision and recall for a given Results object.
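
For example, a minimal usage sketch of scoring a model's results at a given threshold; here `results` is assumed to be a Results object produced elsewhere (e.g. by running a model step):

from matchbox.client.eval import EvalData

eval_data = EvalData()  # caches evaluation data used to measure model performance
precision, recall = eval_data.precision_recall(results=results, threshold=0.8)
print(f"precision={precision:.2f}, recall={recall:.2f}")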

EvaluationItem

Bases: BaseModel

A cluster awaiting evaluation, with pre-computed display data.

Attributes:

model_config class-attribute instance-attribute

model_config = {'arbitrary_types_allowed': True}

cluster_id instance-attribute

cluster_id: int

dataframe instance-attribute

dataframe: DataFrame

display_data instance-attribute

display_data: dict[str, list[str]]

duplicate_groups instance-attribute

duplicate_groups: list[list[int]]

display_columns instance-attribute

display_columns: list[int]

assignments class-attribute instance-attribute

assignments: dict[int, str] = {}

compare_models

compare_models(resolutions: list[ModelResolutionPath]) -> ModelComparison

Compare metrics of models based on cached evaluation data.
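
A hedged usage sketch, assuming `deduper_path` and `linker_path` are existing ModelResolutionPath values for models that have cached evaluation data:

from matchbox.client.eval import compare_models

# Hypothetical resolution paths; real values depend on your collection and runs.
comparison = compare_models(resolutions=[deduper_path, linker_path])
print(comparison)  # a ModelComparison summarising metrics for each resolution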

create_evaluation_item

create_evaluation_item(df: DataFrame, source_configs: list[tuple[str, SourceConfig]], cluster_id: int) -> EvaluationItem

Create EvaluationItem with pre-computed display data.
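
This is typically called for you by get_samples, but a direct sketch might look like the following; the dataframe, source configs, and cluster ID are all placeholders:

from matchbox.client.eval import create_evaluation_item

item = create_evaluation_item(
    df=cluster_df,  # records belonging to one cluster, assumed to be prepared upstream
    source_configs=[("companies_house", companies_house_config)],  # (name, SourceConfig) pairs
    cluster_id=12345,
)
print(item.display_data)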

create_judgement

create_judgement(item: EvaluationItem, user_id: int) -> Judgement

Convert an item's assignments to a Judgement; no default group assignment is applied.

Parameters:

  • item

    (EvaluationItem) –

    Evaluation item with assignments

  • user_id

    (int) –

    User ID for the judgement

Returns:

  • Judgement

    Judgement with endorsed groups based on assignments
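
A sketch of turning a labelled item into a judgement, assuming `item` came from get_samples and its leaves have been assigned to single-letter groups:

from matchbox.client.eval import create_judgement

item.assignments = {101: "a", 102: "a", 103: "b"}  # hypothetical leaf IDs and groups
judgement = create_judgement(item=item, user_id=42)  # user_id is a placeholder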

get_samples

get_samples(n: int, dag: DAG, user_id: int, resolution: ModelResolutionName | None = None) -> dict[int, EvaluationItem]

Retrieve samples enriched with source data as EvaluationItems.

Parameters:

  • n

    (int) –

    Number of clusters to sample

  • dag

    (DAG) –

    DAG for which to retrieve samples

  • user_id

    (int) –

    ID of the user requesting the samples

  • resolution

    (ModelResolutionName | None, default: None ) –

    The optional resolution from which to sample. If not provided, the final step in the DAG is used

Returns:

Raises:
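
A usage sketch, assuming `dag` is a DAG with a warehouse location attached and `user_id` was obtained by authenticating with the server:

from matchbox.client.eval import get_samples

samples = get_samples(n=10, dag=dag, user_id=user_id)  # resolution defaults to the final step
for cluster_id, item in samples.items():
    print(cluster_id, item.display_data)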

precision_recall

precision_recall(models_root_leaf: list[DataFrame], judgements: DataFrame, expansion: DataFrame) -> list[PrecisionRecall]

From models and eval data, compute scores inspired by precision-recall.

This function does the following:

  • Convert model and judgement clusters to implied pair-wise connections. For judgements, this includes pairs shown to users but rejected. Sum how many times each pair was endorsed (+1) or rejected (-1).
  • Keep only the pairs whose leaves are present in all models and in the judgements, so the comparison is fair.
  • If a validation pair was rejected as many times as it was endorsed, discard it from both model and validation pairs.
  • If a validation pair was rejected more times than it was endorsed, remove it from validation pairs but keep it in model pairs.
  • Compute precision and recall for each model against the validation pairs.

At the moment, this function ignores user IDs.
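
The pair filtering above can be illustrated with a toy tally of endorse/reject votes; this sketches the described rules rather than the library's actual implementation:

from collections import Counter

votes = Counter()      # net endorsements per leaf pair
votes[(1, 2)] += 1     # endorsed once
votes[(1, 2)] -= 1     # then rejected once: net 0, discarded from both sides
votes[(1, 3)] -= 1     # net negative: dropped from validation pairs only
votes[(2, 3)] += 1     # net positive: kept as a validation pair

validation_pairs = {pair for pair, net in votes.items() if net > 0}
discarded_everywhere = {pair for pair, net in votes.items() if net == 0}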

Parameters:

  • models_root_leaf

    (list[DataFrame]) –

    List of tables with root and leaf columns, one per model. Each table must include all clusters that resolve from the model, down to the original source clusters where no model in the lineage merged them.

  • judgements

    (DataFrame) –

    Dataframe following matchbox.common.arrow.SCHEMA_JUDGEMENTS.

  • expansion

    (DataFrame) –

    Dataframe following matchbox.common.arrow.SCHEMA_CLUSTER_EXPANSION.

Returns:
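
A hedged call sketch, assuming the three dataframes have already been fetched and follow the schemas named above:

from matchbox.client.eval import precision_recall

scores = precision_recall(
    models_root_leaf=[model_a_root_leaf, model_b_root_leaf],  # one root/leaf table per model
    judgements=judgements_df,
    expansion=expansion_df,
)
for model_scores in scores:  # one PrecisionRecall entry per model
    print(model_scores)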

CLI module

matchbox.client.cli.eval

Module implementing CLI evaluation app.

Modules:

  • app

    Main application for entity resolution evaluation.

  • modals

    Modal screens for entity resolution evaluation.

  • run

    CLI commands for entity evaluation.

  • widgets

    UI widgets for entity resolution evaluation tool.

Classes:

EntityResolutionApp

EntityResolutionApp(resolution: ModelResolutionName, user: str, num_samples: int = 5, dag: DAG | None = None, show_help: bool = False)

Bases: App

Main Textual application for entity resolution evaluation.

Parameters:

  • resolution

    (ModelResolutionName) –

    The model resolution to evaluate

  • num_samples

    (int, default: 5 ) –

    Number of clusters to sample for evaluation

  • user

    (str) –

    Username for authentication (overrides settings)

  • dag

    (DAG | None, default: None ) –

    Pre-loaded DAG with warehouse location attached

  • show_help

    (bool, default: False ) –

    Whether to show help on start

Methods:

Attributes:

CSS_PATH class-attribute instance-attribute

CSS_PATH = parent / 'styles.tcss'

TITLE class-attribute instance-attribute

TITLE = 'Matchbox evaluate'

SUB_TITLE class-attribute instance-attribute

SUB_TITLE = 'match labelling tool'

BINDINGS class-attribute instance-attribute

BINDINGS = [('right', 'skip', 'Skip'), ('space', 'submit', 'Submit'), ('escape', 'clear', 'Clear'), ('question_mark,f1', 'show_help', 'Help'), ('ctrl+q,ctrl+c', 'quit', 'Quit')]

current_group class-attribute instance-attribute

current_group: reactive[str] = reactive('')

status class-attribute instance-attribute

status: reactive[tuple[str, str]] = reactive(('○ Ready', 'dim'))

sample_limit instance-attribute

sample_limit: int = num_samples

resolution instance-attribute

resolution: ModelResolutionPath = resolution_path

user_id instance-attribute

user_id: int

user_name instance-attribute

user_name: str = user

dag instance-attribute

dag: DAG = dag

show_help instance-attribute

show_help: bool = show_help

queue instance-attribute

timer class-attribute instance-attribute

timer: Timer | None = None

compose

compose() -> ComposeResult

Compose the main application UI.

on_mount async

on_mount() -> None

Initialise the application.

on_key async

on_key(event: Key) -> None

Handle keyboard shortcuts for group assignment.

Textual’s basic key event handler. Handles keys beyond BINDINGS.

watch_status

watch_status(new_value: tuple[str, str]) -> None

React to status changes.

watch_current_group

watch_current_group(new_value: str) -> None

React to current group changes.

authenticate async

authenticate() -> None

Authenticate with the server.

load_samples async

load_samples() -> None

Load evaluation samples from the server.

action_skip async

action_skip() -> None

Skip current entity (moves to back of queue).

action_submit async

action_submit() -> None

Submit current entity if fully painted.

action_clear async

action_clear() -> None

Clear current entity’s group assignments.

action_show_help async

action_show_help() -> None

Show the help modal.

action_show_no_samples async

action_show_no_samples() -> None

Show the no samples modal.

action_quit async

action_quit() -> None

Quit the application.
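
The app is normally started via the CLI (see the run module below), but a direct launch sketch, with placeholder values, would be roughly:

from matchbox.client.cli.eval import EntityResolutionApp

app = EntityResolutionApp(
    resolution="companies_linker",  # hypothetical ModelResolutionName
    user="alice",
    num_samples=5,
    dag=dag,  # pre-loaded DAG with a warehouse location attached
)
app.run()  # standard Textual entry point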

app

Main application for entity resolution evaluation.

Classes:

  • EvaluationQueue

    Deque-based queue with current item always at front.

  • EntityResolutionApp

    Main Textual application for entity resolution evaluation (documented above under matchbox.client.cli.eval).

Attributes:

logger module-attribute

logger = getLogger(__name__)

EvaluationQueue

EvaluationQueue()

Deque-based queue with current item always at front.

Methods:

Attributes:

items instance-attribute
current property
current: EvaluationItem | None

Get the current item (always at index 0).

total_count property
total_count: int

Total number of items in queue.

skip_current
skip_current() -> None

Move current to back of queue.

remove_current
remove_current() -> EvaluationItem | None

Remove and return current item.

add_items
add_items(items: list[EvaluationItem]) -> int

Add new items to queue, preventing duplicates.

Returns:

  • int

    Number of unique items added.
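
A brief behavioural sketch; the items are assumed to be EvaluationItem instances and the import path follows the app module documented here:

from matchbox.client.cli.eval.app import EvaluationQueue

queue = EvaluationQueue()
added = queue.add_items([item_a, item_b, item_c])  # returns the number of unique items added
first = queue.current              # the current item is always at index 0
queue.skip_current()               # moves the current item to the back of the queue
finished = queue.remove_current()  # removes and returns the (new) current item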


modals

Modal screens for entity resolution evaluation.

Classes:

  • HelpModal

    Help screen showing commands and shortcuts.

  • NoSamplesModal

    Modal screen showing no samples are available.

Attributes:

HELP_TEXT module-attribute

HELP_TEXT = strip()

NO_SAMPLES_TEXT module-attribute

NO_SAMPLES_TEXT = strip()

HelpModal

Bases: ModalScreen

Help screen showing commands and shortcuts.

Methods:

  • compose

    Compose the help modal UI.

  • close_help

    Close the help modal.

  • on_key

    Handle key events for closing the help modal.

compose
compose() -> ComposeResult

Compose the help modal UI.

close_help
close_help() -> None

Close the help modal.

on_key
on_key(event: Key) -> None

Handle key events for closing the help modal.

NoSamplesModal

Bases: ModalScreen

Modal screen showing no samples are available.

Methods:

  • compose

    Compose the no samples modal UI.

  • quit_app

    Quit the application.

  • on_key

    Handle key events for the no samples modal.

compose
compose() -> ComposeResult

Compose the no samples modal UI.

quit_app
quit_app() -> None

Quit the application.

on_key
on_key(event: Key) -> None

Handle key events for the no samples modal.

run

CLI commands for entity evaluation.

Functions:

  • evaluate

    Start the interactive entity resolution evaluation tool.

evaluate

evaluate(
    collection: Annotated[str, Option('--collection', '-c', help='Collection name (required)')],
    resolution: Annotated[str | None, Option('--resolution', '-r', help="Resolution name (defaults to collection's final_step)")] = None,
    pending: Annotated[bool, Option('--pending', '-p', help='Whether to evaluate the pending DAG, instead of the default')] = False,
    user: Annotated[str | None, Option('--user', '-u', help='Username for authentication (overrides settings)')] = None,
    warehouse: Annotated[str | None, Option('--warehouse', '-w', help='Warehouse database connection string (e.g. postgresql://user:pass@host/db)')] = None,
    log_file: Annotated[str | None, Option('--log', help='Log file path to redirect all logging output (keeps UI clean)')] = None,
) -> None

Start the interactive entity resolution evaluation tool.

Requires a warehouse connection to fetch source data for evaluation clusters.

Example

matchbox eval --collection companies --warehouse postgresql://user:pass@localhost/warehouse
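
A fuller invocation using the optional flags documented above (all values are placeholders):

matchbox eval --collection companies --resolution linker --user alice --warehouse postgresql://user:pass@localhost/warehouse --log eval.log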

widgets

UI widgets for entity resolution evaluation tool.

Modules:

  • styling

    Styling utilities for entity resolution evaluation UI.

  • table

    Comparison display table for entity resolution evaluation.

styling

Styling utilities for entity resolution evaluation UI.

Functions:

Attributes:

GROUP_STYLES module-attribute
GROUP_STYLES = {'q': ('■', '#e53935'), 'w': ('●', '#43a047'), 'e': ('▲', '#1e88e5'), 'r': ('◆', '#fdd835'), 't': ('★', '#d81b60'), 'y': ('⬢', '#00acc1'), 'u': ('♦', '#fb8c00'), 'i': ('▼', '#00897b'), 'o': ('○', '#8e24aa'), 'p': ('△', '#6d4c41'), 'a': ('◇', '#d81b60'), 's': ('☆', '#00acc1'), 'd': ('⬡', '#e53935'), 'f': ('✦', '#43a047'), 'g': ('✧', '#1e88e5'), 'h': ('⟐', '#fdd835'), 'j': ('✚', '#d81b60'), 'k': ('✖', '#3949ab'), 'l': ('□', '#e53935'), 'z': ('▽', '#fb8c00'), 'x': ('◯', '#8e24aa'), 'c': ('◻', '#fdd835'), 'v': ('◼', '#6d4c41'), 'b': ('⬤', '#e53935'), 'n': ('☑', '#43a047'), 'm': ('✤', '#1e88e5')}
get_group_style
get_group_style(group: str) -> tuple[str, str]

Get symbol and colour for a group letter.

Parameters:

  • group
    (str) –

    Single letter group identifier (a-z)

Returns:

get_display_text
get_display_text(group: str, count: int) -> tuple[str, str]

Get formatted display text with colour and symbol.

Parameters:

  • group
    (str) –

    Single letter group identifier (a-z) or “unassigned”

  • count
    (int) –

    Number of items in this group

Returns:

  • tuple[str, str]

    Tuple of (formatted_text, colour)
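
For example, assuming the module path shown above:

from matchbox.client.cli.eval.widgets.styling import get_display_text, get_group_style

symbol, colour = get_group_style("q")    # ('■', '#e53935'), per GROUP_STYLES above
text, colour = get_display_text("w", 3)  # formatted label and colour for group "w" with 3 items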

generate_css_classes
generate_css_classes() -> str

Generate CSS classes for all groups.

Returns:

  • str

    CSS string with all group styling classes

table

Comparison display table for entity resolution evaluation.

Classes:

ComparisonDisplayTable
ComparisonDisplayTable(**kwargs: dict[str, Any])

Bases: Widget

Table for side-by-side record comparison with colour highlighting.

Methods:

Attributes:

current_item instance-attribute
current_item: EvaluationItem | None = None
load_comparison
load_comparison(item: EvaluationItem) -> None

Load new comparison data.

Parameters:

render
render() -> Table

Render the table with current item.