Evaluation

Python helpers

matchbox.client.eval

Public evaluation helpers for Matchbox clients.

Modules:

  • samples

    Client-side helpers for retrieving and preparing evaluation samples.

Classes:

  • EvalData

    Object which caches evaluation data to measure model performance.

  • EvaluationFieldMetadata

    Metadata for a field in evaluation.

  • EvaluationItem

    A cluster ready for evaluation.

Functions:

  • create_evaluation_item

    Create EvaluationItem with structured metadata.

  • create_judgement

    Convert item assignments to a Judgement, with no default group assignment.

  • get_samples

    Retrieve samples enriched with source data as EvaluationItems.

  • precision_recall

    From models and eval data, compute scores inspired by precision-recall.

EvalData

EvalData(tag: str | None = None)

Object which caches evaluation data to measure model performance.

Methods:

Attributes:

tag instance-attribute

tag = tag

precision_recall

precision_recall(results: ModelResults, threshold: float) -> tuple[float, float]

Compute precision and recall for a given Results object.
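The metric compares the pairwise links a model proposes (at a given threshold) against validated pairs. A minimal pure-Python sketch of that shape, with illustrative names and data rather than the matchbox API:

```python
# Sketch of the precision-recall idea: precision is the share of proposed
# pairs that were validated; recall is the share of validated pairs found.
def pairwise_scores(model_pairs, truth_pairs):
    model_pairs, truth_pairs = set(model_pairs), set(truth_pairs)
    hits = model_pairs & truth_pairs
    precision = len(hits) / len(model_pairs) if model_pairs else 0.0
    recall = len(hits) / len(truth_pairs) if truth_pairs else 0.0
    return precision, recall

# A model proposing three links, two of which were validated:
p, r = pairwise_scores({(1, 2), (1, 3), (4, 5)}, {(1, 2), (1, 3)})
# p ≈ 0.67, r = 1.0
```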

EvaluationFieldMetadata

Bases: BaseModel

Metadata for a field in evaluation.

Attributes:

display_name instance-attribute

display_name: str

source_columns instance-attribute

source_columns: list[str]

EvaluationItem

Bases: BaseModel

A cluster ready for evaluation.

The records dataframe contains the leaf IDs and the qualified index fields associated with it. For example:

leaf_id  src_a_first  src_a_last  src_b_first  src_b_last
1        Thomas       Bayes
2        Tommy        B
12                                Tom          Bayes

The fields attribute allows any evaluation system to map between a display version of the source columns, and the actual columns contained in the records dataframe. For example:

{
    "display_name": "first",
    "source_columns": ["src_a_first", "src_b_first"]
}
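A hypothetical sketch of how a UI could use this metadata to coalesce the qualified columns into one display column (plain dicts stand in for the records dataframe; the row placement mirrors the example above):

```python
# Illustrative only: map a display field onto whichever of its source
# columns is populated for a given record.
records = [
    {"leaf_id": 1, "src_a_first": "Thomas", "src_a_last": "Bayes"},
    {"leaf_id": 2, "src_a_first": "Tommy", "src_a_last": "B"},
    {"leaf_id": 12, "src_b_first": "Tom", "src_b_last": "Bayes"},
]
fields = [
    {"display_name": "first", "source_columns": ["src_a_first", "src_b_first"]},
    {"display_name": "last", "source_columns": ["src_a_last", "src_b_last"]},
]

def display_value(record, field):
    """Return the first populated source column for a display field."""
    return next(
        (record[col] for col in field["source_columns"] if record.get(col)), None
    )

firsts = [display_value(r, fields[0]) for r in records]
# ['Thomas', 'Tommy', 'Tom']
```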

Methods:

Attributes:

model_config class-attribute instance-attribute

model_config = {'arbitrary_types_allowed': True}

leaves instance-attribute

leaves: list[int]

records instance-attribute

records: DataFrame

fields instance-attribute

get_unique_record_groups

get_unique_record_groups() -> list[list[int]]

Group identical records by leaf ID.

Returns:

  • list[list[int]] –

    List of groups, where each group is a list of leaf IDs that have identical values across all data fields. For example, [[1, 3], [2], [4, 5, 6]] means records 1 & 3 are identical.
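One way to implement this grouping, sketched with plain dicts as a hypothetical stand-in for the dataframe-backed method:

```python
# Group leaf IDs whose field values are identical. Illustrative only,
# not the EvaluationItem.get_unique_record_groups implementation.
from collections import defaultdict

def group_identical(records: dict[int, dict]) -> list[list[int]]:
    groups = defaultdict(list)
    for leaf_id, values in records.items():
        # A hashable key built from the record's field values
        groups[tuple(sorted(values.items()))].append(leaf_id)
    return list(groups.values())

records = {
    1: {"first": "Tom", "last": "Bayes"},
    3: {"first": "Tom", "last": "Bayes"},
    2: {"first": "Tommy", "last": "B"},
}
group_identical(records)  # [[1, 3], [2]]
```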

create_evaluation_item

create_evaluation_item(df: DataFrame, source_configs: list[tuple[str, SourceConfig]], leaves: list[int]) -> EvaluationItem

Create EvaluationItem with structured metadata.

create_judgement

create_judgement(item: EvaluationItem, assignments: dict[int, str], tag: str | None = None) -> Judgement

Convert item assignments to a Judgement, with no default group assignment.

Parameters:

  • item

    (EvaluationItem) –

    evaluation item

  • assignments

    (dict[int, str]) –

    column assignments (group_idx -> group_letter)

  • tag

    (str | None, default: None ) –

    string by which to tag the judgement

Returns:

  • Judgement

    Judgement with endorsed groups based on assignments
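The core of the conversion is that positions sharing a letter form one endorsed group. A hedged sketch of that step alone, with hypothetical names rather than the matchbox implementation:

```python
# Illustrative: collapse {group_idx -> group_letter} assignments into
# endorsed groups, one group per distinct letter.
from collections import defaultdict

def endorsed_groups(assignments: dict[int, str]) -> list[list[int]]:
    by_letter = defaultdict(list)
    for idx, letter in sorted(assignments.items()):
        by_letter[letter].append(idx)
    return list(by_letter.values())

endorsed_groups({0: "a", 1: "a", 2: "b"})  # [[0, 1], [2]]
```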

get_samples

get_samples(n: int, dag: DAG, resolution: ModelResolutionName | None = None, sample_file: str | None = None) -> dict[int, EvaluationItem]

Retrieve samples enriched with source data as EvaluationItems.

Parameters:

  • n

    (int) –

    Number of clusters to sample

  • dag

    (DAG) –

    DAG for which to retrieve samples

  • resolution

    (ModelResolutionName | None, default: None ) –

    The optional resolution from which to sample. If not set, the final step in the DAG is used. If sample_file is set, resolution is ignored.

  • sample_file

    (str | None, default: None ) –

    Path to a parquet file output by ResolvedMatches. If specified, samples are read from the file rather than the server, and the resolution argument is ignored.

Returns:

Raises:

precision_recall

precision_recall(models_root_leaf: list[DataFrame], judgements: DataFrame, expansion: DataFrame) -> list[PrecisionRecall]

From models and eval data, compute scores inspired by precision-recall.

This function does the following:

  • Convert model and judgement clusters to implied pair-wise connections. For judgments, this includes the pairs shown to users, but rejected. Sum how many times pairs were endorsed (+1) or rejected (-1).
  • Keep only the pairs where leaves are present in all models and in the judgements, so the comparison is fair.
  • If a validation pair was rejected as many times as it was endorsed, discard it from both model and validation pairs.
  • If a validation pair was rejected more times than it was endorsed, remove it from validation pairs, but keep it in model pairs.
  • Precision and recall are computed for each model against validation pairs.

At the moment, this function ignores user IDs.
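The endorse/reject vote counting described above can be sketched in a few lines (illustrative data and names; not the matchbox implementation):

```python
# Pairs endorsed count +1, rejected -1. Ties are discarded from both the
# model and validation sides; net-negative pairs leave validation only.
from collections import Counter

judgements = [((1, 2), +1), ((1, 2), +1), ((1, 3), +1), ((1, 3), -1), ((2, 3), -1)]
votes = Counter()
for pair, vote in judgements:
    votes[pair] += vote

validation_pairs = {p for p, v in votes.items() if v > 0}     # {(1, 2)}
discard_everywhere = {p for p, v in votes.items() if v == 0}  # tie: {(1, 3)}
```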

Parameters:

  • models_root_leaf

    (list[DataFrame]) –

    list of tables with root and leaf columns, one per model. They must include all the clusters that resolve from a model, all the way to the original source clusters if no model in the lineage merged them.

  • judgements

    (DataFrame) –

    Dataframe following matchbox.common.arrow.SCHEMA_JUDGEMENTS.

  • expansion

    (DataFrame) –

    Dataframe following matchbox.common.arrow.SCHEMA_CLUSTER_EXPANSION.

Returns:

CLI module

matchbox.client.cli.eval

Module implementing CLI evaluation app.

Modules:

  • app

    Main application for entity resolution evaluation.

  • modals

    Modal screens for entity resolution evaluation.

  • run

    CLI commands for entity evaluation.

  • widgets

    UI widgets for entity resolution evaluation tool.

Classes:

EntityResolutionApp

EntityResolutionApp(num_samples: int = 5, resolution: ModelResolutionPath | None = None, dag: DAG | None = None, session_tag: str | None = None, sample_file: str | None = None, show_help: bool = False, scroll_debounce_delay: float | None = 0.3)

Bases: App

Main Textual application for entity resolution evaluation.

Parameters:

  • resolution

    (ModelResolutionPath | None, default: None ) –

    The model resolution to evaluate

  • num_samples

    (int, default: 5 ) –

    Number of clusters to sample for evaluation

  • dag

    (DAG | None, default: None ) –

    Pre-loaded DAG with warehouse location attached

  • session_tag

    (str | None, default: None ) –

    String to use for tagging judgements

  • sample_file

    (str | None, default: None ) –

    Path to a pre-compiled sample file. If set, the resolution argument is ignored.

  • show_help

    (bool, default: False ) –

    Whether to show help on start

  • scroll_debounce_delay

    (float | None, default: 0.3 ) –

    Delay before updating column headers after scroll. Set to None to disable debouncing (useful for tests).

Methods:

Attributes:

CSS_PATH class-attribute instance-attribute

CSS_PATH = parent / 'styles.tcss'

TITLE class-attribute instance-attribute

TITLE = 'Matchbox evaluate'

SUB_TITLE class-attribute instance-attribute

SUB_TITLE = 'match labelling tool'

BINDINGS class-attribute instance-attribute

BINDINGS = [('shift+right', 'skip', 'Skip'), ('space', 'submit', 'Submit'), ('escape', 'clear', 'Clear'), ('question_mark,f1', 'show_help', 'Help'), ('ctrl+q,ctrl+c', 'quit', 'Quit')]

status class-attribute instance-attribute

status: reactive[tuple[str, str]] = reactive(('○ Ready', 'dim'))

current_item class-attribute instance-attribute

current_item: reactive[EvaluationItem | None] = reactive(None)

current_assignments class-attribute instance-attribute

current_assignments: reactive[dict[int, str]] = reactive({}, init=False)

sample_limit instance-attribute

sample_limit: int = num_samples

resolution instance-attribute

dag instance-attribute

dag: DAG = dag

session_tag instance-attribute

session_tag: str | None = session_tag

sample_file instance-attribute

sample_file: str | None = sample_file

show_help instance-attribute

show_help: bool = show_help

queue instance-attribute

timer class-attribute instance-attribute

timer: Timer | None = None

compose

compose() -> ComposeResult

Compose the main application UI.

on_mount async

on_mount() -> None

Initialise the application.

on_comparison_display_table_assignment_made

on_comparison_display_table_assignment_made(message: AssignmentMade) -> None

Update reactive assignments when table reports an assignment.

on_comparison_display_table_current_group_changed

on_comparison_display_table_current_group_changed(message: CurrentGroupChanged) -> None

Update current group label when table’s current group changes.

watch_status

watch_status() -> None

React to status changes.

watch_current_item

watch_current_item(item: EvaluationItem | None) -> None

React to item changes - propagate to table and reset assignment bar.

watch_current_assignments

watch_current_assignments(assignments: dict[int, str]) -> None

React to assignment changes - propagate to table.

load_samples async

load_samples() -> None

Load evaluation samples from the server.

action_skip async

action_skip() -> None

Skip current entity (moves to back of queue).

action_submit async

action_submit() -> None

Submit current entity if fully painted.

action_clear async

action_clear() -> None

Clear current entity’s group assignments.

action_show_help async

action_show_help() -> None

Show the help modal.

action_show_no_samples async

action_show_no_samples() -> None

Show the no samples modal.

action_quit async

action_quit() -> None

Quit the application.

app

Main application for entity resolution evaluation.

Classes:

Attributes:

logger module-attribute

logger = getLogger(__name__)

CLIEvaluationSession

Bases: BaseModel

CLI evaluation session state.

Used by queue to store items with their assignments.

Attributes:

model_config class-attribute instance-attribute
model_config = ConfigDict(arbitrary_types_allowed=True)
item instance-attribute
assignments class-attribute instance-attribute
assignments: dict[int, str] = Field(default_factory=dict)

EvaluationQueue

EvaluationQueue()

Deque-based queue with current session always at front.

Methods:

Attributes:

sessions instance-attribute
current property
current: CLIEvaluationSession | None

Get the current session (always at index 0).

total_count property
total_count: int

Total number of sessions in queue.

skip_current
skip_current() -> None

Move current to back of queue.

remove_current
remove_current() -> CLIEvaluationSession | None

Remove and return current session.

add_sessions
add_sessions(sessions: list[CLIEvaluationSession]) -> int

Add new sessions to queue, preventing duplicates.

Returns:

  • int

    Number of unique sessions added.
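The deque-backed queue behaviour (current always at index 0, skip rotates to the back) can be sketched with the standard library directly; this is an illustration of the pattern, not the EvaluationQueue code:

```python
# A deque keeps the current session at the front; skipping rotates it
# to the back so the next session surfaces.
from collections import deque

q = deque(["item-a", "item-b", "item-c"])
current = q[0] if q else None  # current is always index 0
q.rotate(-1)                   # skip_current: front moves to the back
q[0]                           # 'item-b'
```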

EntityResolutionApp

Main Textual application for entity resolution evaluation. See the full reference for this class under matchbox.client.cli.eval above.

modals

Modal screens for entity resolution evaluation.

Classes:

  • HelpModal

    Help screen showing commands and shortcuts.

  • NoSamplesModal

    Modal screen showing no samples are available.

Attributes:

HELP_TEXT module-attribute

HELP_TEXT = strip()

NO_SAMPLES_TEXT module-attribute

NO_SAMPLES_TEXT = strip()

HelpModal

Bases: ModalScreen

Help screen showing commands and shortcuts.

Methods:

  • compose

    Compose the help modal UI.

  • close_help

    Close the help modal.

  • on_key

    Handle key events for closing the help modal.

compose
compose() -> ComposeResult

Compose the help modal UI.

close_help
close_help() -> None

Close the help modal.

on_key
on_key(event: Key) -> None

Handle key events for closing the help modal.

NoSamplesModal

Bases: ModalScreen

Modal screen showing no samples are available.

Methods:

  • compose

    Compose the no samples modal UI.

  • quit_app

    Quit the application.

  • on_key

    Handle key events for the no samples modal.

compose
compose() -> ComposeResult

Compose the no samples modal UI.

quit_app
quit_app() -> None

Quit the application.

on_key
on_key(event: Key) -> None

Handle key events for the no samples modal.

run

CLI commands for entity evaluation.

Functions:

  • evaluate

    Start the interactive entity resolution evaluation tool.

evaluate

evaluate(collection: CollectionOpt, resolution: Annotated[str | None, Option('--resolution', '-r', help="Resolution name (defaults to collection's final_step)")] = None, pending: Annotated[bool, Option('--pending', '-p', help='Whether to evaluate the pending DAG, instead of the default')] = False, warehouse: Annotated[str | None, Option('--warehouse', '-w', help='Warehouse database connection string (e.g. postgresql://user:pass@host/db)')] = None, log_file: Annotated[str | None, Option('--log', help='Log file path to redirect all logging output (keeps UI clean)')] = None, sample_file: Annotated[str | None, Option('--file', '-f', help='Pre-compiled sample file. If set, ignores resolutions parameters.')] = None, session_tag: Annotated[str | None, Option('--tag', '-t', help='String to use to tag judgements sent to the server.')] = None) -> None

Start the interactive entity resolution evaluation tool.

Requires a warehouse connection to fetch source data for evaluation clusters.

Example

matchbox eval --collection companies --warehouse postgresql://user:pass@localhost/warehouse

widgets

UI widgets for entity resolution evaluation tool.

Modules:

  • assignment

    AssignmentBar widget for displaying column assignments in the evaluation UI.

  • styling

    Styling utilities for entity resolution evaluation UI.

  • table

    Comparison display table for entity resolution evaluation.

assignment

AssignmentBar widget for displaying column assignments in the evaluation UI.

Classes:

  • GroupSelect

    A letter and colour combination representing a group assignment.

  • AssignmentBar

    A status bar widget showing column assignments as a visual bar.

GroupSelect

Bases: NamedTuple

A letter and colour combination representing a group assignment.

Attributes:

letter instance-attribute
letter: str
colour instance-attribute
colour: str
AssignmentBar
AssignmentBar(**kwargs: Any)

Bases: Static

A status bar widget showing column assignments as a visual bar.

Each position represents a column in the comparison table. Positions can be:

- None (unassigned): displayed as dim •
- GroupSelect: displayed as letter (first occurrence) or • (subsequent blocks)

Consecutive positions with the same group assignment are rendered in that group’s colour, showing the letter only at the first position.

Methods:

  • reset

    Reset the bar with a new number of positions.

  • set_position

    Set a position to a letter and colour.

Attributes:

positions instance-attribute
positions: list[GroupSelect | None] = []
reset
reset(num_positions: int) -> None

Reset the bar with a new number of positions.

Parameters:

  • num_positions (int) –

    The number of columns to represent

set_position
set_position(index: int, letter: str, colour: str) -> None

Set a position to a letter and colour.

Parameters:

  • index (int) –

    The position index to set

  • letter (str) –

    The group letter (a-z)

  • colour (str) –

    The colour name for the group

styling

Styling utilities for entity resolution evaluation UI.

Functions:

Attributes:

GROUP_STYLES module-attribute
GROUP_STYLES = {'q': ('■', '#e53935'), 'w': ('●', '#43a047'), 'e': ('▲', '#1e88e5'), 'r': ('◆', '#fdd835'), 't': ('★', '#d81b60'), 'y': ('⬢', '#00acc1'), 'u': ('♦', '#fb8c00'), 'i': ('▼', '#00897b'), 'o': ('○', '#8e24aa'), 'p': ('△', '#6d4c41'), 'a': ('◇', '#d81b60'), 's': ('☆', '#00acc1'), 'd': ('⬡', '#e53935'), 'f': ('✦', '#43a047'), 'g': ('✧', '#1e88e5'), 'h': ('⟐', '#fdd835'), 'j': ('✚', '#d81b60'), 'k': ('✖', '#3949ab'), 'l': ('□', '#e53935'), 'z': ('▽', '#fb8c00'), 'x': ('◯', '#8e24aa'), 'c': ('◻', '#fdd835'), 'v': ('◼', '#6d4c41'), 'b': ('⬤', '#e53935'), 'n': ('☑', '#43a047'), 'm': ('✤', '#1e88e5')}
get_group_style
get_group_style(group: str) -> tuple[str, str]

Get symbol and colour for a group letter.

Parameters:

  • group
    (str) –

    Single letter group identifier (a-z)

Returns:

get_display_text
get_display_text(group: str, count: int) -> tuple[str, str]

Get formatted display text with colour and symbol.

Parameters:

  • group
    (str) –

    Single letter group identifier (a-z) or “unassigned”

  • count
    (int) –

    Number of items in this group

Returns:

  • tuple[str, str]

    Tuple of (formatted_text, colour)

generate_css_classes
generate_css_classes() -> str

Generate CSS classes for all groups.

Returns:

  • str

    CSS string with all group styling classes

table

Comparison display table for entity resolution evaluation.

Classes:

ComparisonDisplayTable
ComparisonDisplayTable(scroll_debounce_delay: float | None = 0.3, **kwargs: Any)

Bases: DataTable

DataTable for comparing records with keyboard-driven assignment.

We use the DataTable’s internal cursor (even though hidden) as the “Anchor” for the 1-9 column labels.

  • Cursor Column “N” is labelled “1”.
  • Cursor Column “N+1” is labelled “2”, etc.
  • Paging Right/Left moves the cursor by +/- 9 columns and forces a scroll alignment to the left edge, ensuring the labels remain visible.
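The anchoring rule above amounts to labelling the cursor column "1" and the following columns "2" through "9". A hedged sketch with illustrative names:

```python
# Map visible columns to their 1-9 labels relative to the cursor column.
def column_labels(num_columns: int, cursor: int) -> dict[int, str]:
    return {
        cursor + offset: str(offset + 1)
        for offset in range(9)
        if cursor + offset < num_columns
    }

column_labels(num_columns=12, cursor=10)  # {10: '1', 11: '2'}
```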

Classes:

Methods:

Attributes:

current_item class-attribute instance-attribute
current_item: reactive[EvaluationItem | None] = reactive(None)
current_assignments class-attribute instance-attribute
current_assignments: reactive[dict[int, str]] = reactive({}, init=False)
current_group class-attribute instance-attribute
current_group: reactive[str] = reactive('')
table_ready class-attribute instance-attribute
table_ready: reactive[bool] = reactive(False)
BINDINGS class-attribute instance-attribute
BINDINGS = [Binding('up', 'page_up', 'Page up', show=False), Binding('down', 'page_down', 'Page down', show=False), Binding('right', 'page_right', 'Page right', show=False), Binding('left', 'page_left', 'Page left', show=False)]
zebra_stripes instance-attribute
zebra_stripes = True
cursor_type instance-attribute
cursor_type = 'column'
show_cursor instance-attribute
show_cursor = False
fixed_columns instance-attribute
fixed_columns = 1
AssignmentMade
AssignmentMade(column_idx: int, group: str)

Bases: Message

Posted when a single column assignment is made.

Attributes:

column_idx instance-attribute
column_idx = column_idx
group instance-attribute
group = group
CurrentGroupChanged
CurrentGroupChanged(group: str)

Bases: Message

Posted when current group selection changes.

Attributes:

group instance-attribute
group = group
watch_current_item
watch_current_item(item: EvaluationItem | None) -> None

Rebuild table when item changes (Textual reactive pattern).

watch_cursor_coordinate
watch_cursor_coordinate(old_value: float, new_value: float) -> None

Update headers whenever the cursor moves (Manual scroll or Paging).

watch_current_assignments
watch_current_assignments(old_assignments: dict[int, str], new_assignments: dict[int, str]) -> None

Update column headers and cells when assignments change.

on_key
on_key(event: Key) -> None

Handle keyboard shortcuts for assignment.

action_page_right
action_page_right() -> None

Move the anchor (cursor) right by 9 columns and force scroll.

action_page_left
action_page_left() -> None

Move the anchor (cursor) left by 9 columns and force scroll.