Evaluation
Python helpers
matchbox.client.eval
Public evaluation helpers for Matchbox clients.
Modules:
- samples – Client-side helpers for retrieving and preparing evaluation samples.
Classes:
- EvalData – Object which caches evaluation data to measure model performance.
- EvaluationFieldMetadata – Metadata for a field in evaluation.
- EvaluationItem – A cluster ready for evaluation.
Functions:
- create_evaluation_item – Create EvaluationItem with structured metadata.
- create_judgement – Convert item assignments to Judgement - no default group assignment.
- get_samples – Retrieve samples enriched with source data as EvaluationItems.
- precision_recall – From models and eval data, compute scores inspired by precision-recall.
EvalData
EvalData(tag: str | None = None)
Object which caches evaluation data to measure model performance.
Methods:
- precision_recall – Compute precision and recall for a given Results object.
Attributes:
- tag
EvaluationFieldMetadata
Bases: BaseModel
Metadata for a field in evaluation.
Attributes:
- display_name (str)
- source_columns (list[str])
EvaluationItem
Bases: BaseModel
A cluster ready for evaluation.
The records dataframe contains the leaf IDs and the qualified index fields associated with it. For example:
| leaf_id | src_a_first | src_a_last | src_b_first | src_b_last |
|---|---|---|---|---|
| 1 | Thomas | Bayes | | |
| 2 | Tommy | B | | |
| 12 | | | Tom | Bayes |
The fields attribute allows any evaluation system to map between a display version of the source columns, and the actual columns contained in the records dataframe. For example:
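A minimal sketch of that mapping, using the column names from the records table above. The dataclass is only a stand-in for EvaluationFieldMetadata (the real class is a Pydantic BaseModel), and the display names `first` and `last` and the helper `columns_for` are hypothetical, chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class FieldMetadata:
    """Stand-in for EvaluationFieldMetadata, for illustration only."""

    display_name: str
    source_columns: list[str]


# Assumed display names; the source columns come from the example table.
fields = [
    FieldMetadata("first", ["src_a_first", "src_b_first"]),
    FieldMetadata("last", ["src_a_last", "src_b_last"]),
]


def columns_for(display_name: str, fields: list[FieldMetadata]) -> list[str]:
    """Return the records columns behind a display name (hypothetical helper)."""
    for field in fields:
        if field.display_name == display_name:
            return field.source_columns
    raise KeyError(display_name)


print(columns_for("last", fields))
```

An evaluation UI could show a single "last" row while reading values from both `src_a_last` and `src_b_last` in the records dataframe.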
Methods:
- get_unique_record_groups – Group identical records by leaf ID.
Attributes:
- model_config
- leaves (list[int])
- records (DataFrame)
- fields (list[EvaluationFieldMetadata])
get_unique_record_groups
create_evaluation_item
create_evaluation_item(df: DataFrame, source_configs: list[tuple[str, SourceConfig]], leaves: list[int]) -> EvaluationItem
Create EvaluationItem with structured metadata.
create_judgement
create_judgement(item: EvaluationItem, assignments: dict[int, str], tag: str | None = None) -> Judgement
Convert item assignments to Judgement - no default group assignment.
Parameters:
- item (EvaluationItem) – Evaluation item
- assignments (dict[int, str]) – Column assignments (group_idx -> group_letter)
- tag (str | None, default: None) – String by which to tag the judgement
Returns:
- Judgement – Judgement with endorsed groups based on assignments
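The shape of the `assignments` argument can be illustrated with a small sketch that regroups a `group_idx -> group_letter` mapping by letter. This is not matchbox's implementation: `group_by_letter` is a hypothetical helper, and the real function must also translate group indices back to leaf IDs, which is not shown here.

```python
from collections import defaultdict


def group_by_letter(assignments: dict[int, str]) -> dict[str, list[int]]:
    """Invert {group_idx: letter} into {letter: [group_idx, ...]}."""
    groups: dict[str, list[int]] = defaultdict(list)
    for group_idx, letter in sorted(assignments.items()):
        groups[letter].append(group_idx)
    return dict(groups)


# Columns 0 and 2 were assigned "a"; column 1 was assigned "b".
print(group_by_letter({0: "a", 1: "b", 2: "a"}))
```

In this sketch an index that is missing from `assignments` ends up in no group at all, which is one possible reading of "no default group assignment".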
get_samples
get_samples(n: int, dag: DAG, resolution: ModelResolutionName | None = None, sample_file: str | None = None) -> dict[int, EvaluationItem]
Retrieve samples enriched with source data as EvaluationItems.
Parameters:
- n (int) – Number of clusters to sample
- dag (DAG) – DAG for which to retrieve samples
- resolution (ModelResolutionName | None, default: None) – The optional resolution from which to sample. If not set, the final step in the DAG is used. If sample_file is set, resolution is ignored.
- sample_file (str | None, default: None) – Path to a Parquet file output by ResolvedMatches. If specified, samples are read from this file instead of the server, and the resolution argument is ignored.
Returns:
- dict[int, EvaluationItem] – Dictionary of cluster ID to EvaluationItems describing the cluster
Raises:
- MatchboxSourceTableError – If a source cannot be queried from a location using provided or default clients.
precision_recall
precision_recall(models_root_leaf: list[DataFrame], judgements: DataFrame, expansion: DataFrame) -> list[PrecisionRecall]
From models and eval data, compute scores inspired by precision-recall.
This function does the following:
- Convert model and judgement clusters to implied pair-wise connections. For judgements, this includes the pairs that were shown to users but rejected. Sum how many times each pair was endorsed (+1) or rejected (-1).
- Keep only the pairs whose leaves are present in all models and in the judgements, so the comparison is fair.
- If a validation pair was rejected as many times as it was endorsed, discard it from both model and validation pairs.
- If a validation pair was rejected more times than it was endorsed, remove it from validation pairs, but keep it in model pairs.
- Compute precision and recall for each model against the validation pairs.
At the moment, this function ignores user IDs.
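The steps above can be sketched in plain Python. This is an illustration under simplifying assumptions, not the library's code: clusters are sets of leaf IDs rather than DataFrames, each judgement is a hypothetical `(shown_leaves, endorsed_groups)` tuple rather than a row following SCHEMA_JUDGEMENTS, and returning 0.0 when a denominator is empty is an assumed convention.

```python
from collections import Counter
from itertools import combinations


def cluster_pairs(clusters):
    """Return every unordered leaf pair implied by a list of clusters."""
    return {
        pair
        for cluster in clusters
        for pair in combinations(sorted(cluster), 2)
    }


def precision_recall_sketch(models, judgements):
    """Score each model's clusters against simplified judgements.

    models: one list of leaf-ID sets per model.
    judgements: (shown_leaves, endorsed_groups) tuples, a stand-in format.
    """
    # Net votes per pair: +1 when endorsed, -1 when shown but split apart.
    votes = Counter()
    for shown, endorsed in judgements:
        endorsed_pairs = cluster_pairs(endorsed)
        for pair in cluster_pairs([shown]):
            votes[pair] += 1 if pair in endorsed_pairs else -1

    # Leaves seen by the judges and by every model.
    common = {leaf for shown, _ in judgements for leaf in shown}
    for clusters in models:
        common &= set().union(*clusters)

    def in_scope(pair):
        return pair[0] in common and pair[1] in common

    ties = {pair for pair, net in votes.items() if net == 0}
    validation = {
        pair for pair, net in votes.items() if net > 0 and in_scope(pair)
    }

    scores = []
    for clusters in models:
        # Tied pairs are dropped; net-rejected pairs stay in the model pairs.
        model_pairs = {p for p in cluster_pairs(clusters) if in_scope(p)} - ties
        hits = len(model_pairs & validation)
        precision = hits / len(model_pairs) if model_pairs else 0.0  # assumed default
        recall = hits / len(validation) if validation else 0.0  # assumed default
        scores.append((precision, recall))
    return scores


model_a = [{1, 2, 3}, {4}]
model_b = [{1, 2}, {3, 4}]
judgements = [
    ({1, 2, 3}, [{1, 2}, {3}]),  # user split leaf 3 away from 1 and 2
    ({3, 4}, [{3, 4}]),          # user endorsed 3 and 4 together
]
scores = precision_recall_sketch([model_a, model_b], judgements)
print(scores)
```

Here `model_b` reproduces both endorsed pairs and scores (1.0, 1.0), while `model_a` keeps the rejected pairs (1, 3) and (2, 3) and misses (3, 4), giving a precision of 1/3 and a recall of 1/2.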
Parameters:
- models_root_leaf (list[DataFrame]) – List of tables with root and leaf columns, one per model. They must include all the clusters that resolve from a model, all the way to the original source clusters if no model in the lineage merged them.
- judgements (DataFrame) – Dataframe following matchbox.common.arrow.SCHEMA_JUDGEMENTS.
- expansion (DataFrame) – Dataframe following matchbox.common.arrow.SCHEMA_CLUSTER_EXPANSION.
Returns:
- list[PrecisionRecall] – List of tuples of precision and recall scores, one per model.
CLI module
matchbox.client.cli.eval
Module implementing CLI evaluation app.
Modules:
- app – Main application for entity resolution evaluation.
- modals – Modal screens for entity resolution evaluation.
- run – CLI commands for entity evaluation.
- widgets – UI widgets for entity resolution evaluation tool.
Classes:
- EntityResolutionApp – Main Textual application for entity resolution evaluation.
EntityResolutionApp
EntityResolutionApp(num_samples: int = 5, resolution: ModelResolutionPath | None = None, dag: DAG | None = None, session_tag: str | None = None, sample_file: str | None = None, show_help: bool = False, scroll_debounce_delay: float | None = 0.3)
Bases: App
Main Textual application for entity resolution evaluation.
Parameters:
- resolution (ModelResolutionPath | None, default: None) – The model resolution to evaluate
- num_samples (int, default: 5) – Number of clusters to sample for evaluation
- dag (DAG | None, default: None) – Pre-loaded DAG with warehouse location attached
- session_tag (str | None, default: None) – String to use for tagging judgements
- sample_file (str | None, default: None) – Path to pre-compiled sample file. If set, ignores resolutions.
- show_help (bool, default: False) – Whether to show help on start
- scroll_debounce_delay (float | None, default: 0.3) – Delay before updating column headers after scroll. Set to None to disable debouncing (useful for tests).
Methods:
- compose – Compose the main application UI.
- on_mount – Initialise the application.
- on_comparison_display_table_assignment_made – Update reactive assignments when table reports an assignment.
- on_comparison_display_table_current_group_changed – Update current group label when table’s current group changes.
- watch_status – React to status changes.
- watch_current_item – React to item changes - propagate to table and reset assignment bar.
- watch_current_assignments – React to assignment changes - propagate to table.
- load_samples – Load evaluation samples from the server.
- action_skip – Skip current entity (moves to back of queue).
- action_submit – Submit current entity if fully painted.
- action_clear – Clear current entity’s group assignments.
- action_show_help – Show the help modal.
- action_show_no_samples – Show the no samples modal.
- action_quit – Quit the application.
Attributes:
- CSS_PATH
- TITLE
- SUB_TITLE
- BINDINGS
- status (reactive[tuple[str, str]])
- current_item (reactive[EvaluationItem | None])
- current_assignments (reactive[dict[int, str]])
- sample_limit (int)
- resolution (ModelResolutionPath)
- dag (DAG)
- session_tag (str | None)
- sample_file (str | None)
- show_help (bool)
- queue (EvaluationQueue)
- timer (Timer | None)
BINDINGS
class-attribute instance-attribute
BINDINGS = [('shift+right', 'skip', 'Skip'), ('space', 'submit', 'Submit'), ('escape', 'clear', 'Clear'), ('question_mark,f1', 'show_help', 'Help'), ('ctrl+q,ctrl+c', 'quit', 'Quit')]
status
class-attribute instance-attribute
current_item
class-attribute instance-attribute
current_item: reactive[EvaluationItem | None] = reactive(None)
current_assignments
class-attribute instance-attribute
on_comparison_display_table_assignment_made
on_comparison_display_table_assignment_made(message: AssignmentMade) -> None
Update reactive assignments when table reports an assignment.
on_comparison_display_table_current_group_changed
on_comparison_display_table_current_group_changed(message: CurrentGroupChanged) -> None
Update current group label when table’s current group changes.
watch_current_item
watch_current_item(item: EvaluationItem | None) -> None
React to item changes - propagate to table and reset assignment bar.
watch_current_assignments
React to assignment changes - propagate to table.
app
Main application for entity resolution evaluation.
Classes:
- CLIEvaluationSession – CLI evaluation session state.
- EvaluationQueue – Deque-based queue with current session always at front.
- EntityResolutionApp – Main Textual application for entity resolution evaluation.
Attributes:
- logger
CLIEvaluationSession
Bases: BaseModel
CLI evaluation session state.
Used by queue to store items with their assignments.
Attributes:
- model_config
- item (EvaluationItem)
- assignments (dict[int, str])
EvaluationQueue
Deque-based queue with current session always at front.
Methods:
- skip_current – Move current to back of queue.
- remove_current – Remove and return current session.
- add_sessions – Add new sessions to queue, preventing duplicates.
Attributes:
- sessions (deque[CLIEvaluationSession])
- current (CLIEvaluationSession | None) – Get the current session (always at index 0).
- total_count (int) – Total number of sessions in queue.
current
property
current: CLIEvaluationSession | None
Get the current session (always at index 0).
add_sessions
add_sessions(sessions: list[CLIEvaluationSession]) -> int
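A minimal sketch of a deque-based queue with this interface, built on `collections.deque`. The `Session` dataclass and its `cluster_id` key are hypothetical stand-ins for CLIEvaluationSession. How the real queue detects duplicates, and what the `int` returned by `add_sessions` means, are assumptions here: the sketch deduplicates on `cluster_id` and returns the number of sessions actually added.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Session:
    """Stand-in for CLIEvaluationSession; cluster_id is a hypothetical key."""

    cluster_id: int
    assignments: dict[int, str] = field(default_factory=dict)


class QueueSketch:
    """Deque-based queue whose current session is always at index 0."""

    def __init__(self) -> None:
        self.sessions: deque[Session] = deque()

    @property
    def current(self) -> Optional[Session]:
        return self.sessions[0] if self.sessions else None

    @property
    def total_count(self) -> int:
        return len(self.sessions)

    def skip_current(self) -> None:
        # rotate(-1) moves the front session to the back of the deque.
        self.sessions.rotate(-1)

    def remove_current(self) -> Optional[Session]:
        return self.sessions.popleft() if self.sessions else None

    def add_sessions(self, sessions: list[Session]) -> int:
        # Assumed behaviour: skip sessions whose key is already queued.
        seen = {session.cluster_id for session in self.sessions}
        added = 0
        for session in sessions:
            if session.cluster_id not in seen:
                self.sessions.append(session)
                seen.add(session.cluster_id)
                added += 1
        return added


queue = QueueSketch()
added = queue.add_sessions([Session(1), Session(2), Session(1)])
queue.skip_current()  # session 1 moves to the back, session 2 becomes current
```

Keeping the current session at index 0 means skipping is a single rotation and submitting is a single `popleft`, with no separate cursor to keep in sync.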
modals
Modal screens for entity resolution evaluation.
Classes:
- HelpModal – Help screen showing commands and shortcuts.
- NoSamplesModal – Modal screen showing no samples are available.
Attributes:
HelpModal
Bases: ModalScreen
Help screen showing commands and shortcuts.
Methods:
- compose – Compose the help modal UI.
- close_help – Close the help modal.
- on_key – Handle key events for closing the help modal.
NoSamplesModal
Bases: ModalScreen
Modal screen showing no samples are available.
Methods:
run
CLI commands for entity evaluation.
Functions:
- evaluate – Start the interactive entity resolution evaluation tool.
evaluate
evaluate(collection: CollectionOpt, resolution: Annotated[str | None, Option(--resolution, -r, help="Resolution name (defaults to collection's final_step)")] = None, pending: Annotated[bool, Option(--pending, -p, help='Whether to evaluate the pending DAG, instead of the default')] = False, warehouse: Annotated[str | None, Option(--warehouse, -w, help='Warehouse database connection string (e.g. postgresql://user:pass@host/db)')] = None, log_file: Annotated[str | None, Option(--log, help='Log file path to redirect all logging output (keeps UI clean)')] = None, sample_file: Annotated[str | None, Option(--file, -f, help='Pre-compiled sample file. If set, ignores resolutions parameters.')] = None, session_tag: Annotated[str | None, Option(--tag, -t, help='String to use to tag judgements sent to the server.')] = None) -> None
Start the interactive entity resolution evaluation tool.
Requires a warehouse connection to fetch source data for evaluation clusters.
Example
matchbox eval --collection companies --warehouse postgresql://user:pass@localhost/warehouse
widgets
UI widgets for entity resolution evaluation tool.
Modules:
- assignment – AssignmentBar widget for displaying column assignments in the evaluation UI.
- styling – Styling utilities for entity resolution evaluation UI.
- table – Comparison display table for entity resolution evaluation.
assignment
AssignmentBar widget for displaying column assignments in the evaluation UI.
Classes:
- GroupSelect – A letter and colour combination representing a group assignment.
- AssignmentBar – A status bar widget showing column assignments as a visual bar.
GroupSelect
AssignmentBar
AssignmentBar(**kwargs: Any)
Bases: Static
A status bar widget showing column assignments as a visual bar.
Each position represents a column in the comparison table. Positions can be:
- None (unassigned): displayed as dim •
- GroupSelect: displayed as letter (first occurrence) or • (subsequent blocks)
Consecutive positions with the same group assignment are rendered in that group’s colour, showing the letter only at the first position.
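The rendering rule described above can be sketched as a pure function. `Group` is a hypothetical stand-in for GroupSelect, and the sketch assumes that the letter is shown at the start of each consecutive run of the same group, which is one reading of the description. It returns symbol and colour pairs rather than drawing anything in Textual.

```python
from typing import NamedTuple, Optional


class Group(NamedTuple):
    """Stand-in for GroupSelect: a letter and a colour."""

    letter: str
    colour: str


def render_bar(
    positions: list[Optional[Group]],
) -> list[tuple[str, Optional[str]]]:
    """Return (symbol, colour) per position; colour None means dim."""
    rendered = []
    previous = None
    for position in positions:
        if position is None:
            # Unassigned column: dim bullet.
            rendered.append(("•", None))
        elif position == previous:
            # Same group as the column before: coloured bullet, no letter.
            rendered.append(("•", position.colour))
        else:
            # First column of a new run: show the group letter.
            rendered.append((position.letter, position.colour))
        previous = position
    return rendered


q = Group("q", "#e53935")
w = Group("w", "#43a047")
bar = render_bar([q, q, None, w, q])
```

With these inputs the bar reads `q`, `•`, dim `•`, `w`, `q`: the second run of `q` gets its own letter because an unassigned column and a `w` column separate it from the first.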
Methods:
- reset – Reset the bar with a new number of positions.
- set_position – Set a position to a letter and colour.
Attributes:
- positions (list[GroupSelect | None])
reset
reset(num_positions: int) -> None
styling
Styling utilities for entity resolution evaluation UI.
Functions:
- get_group_style – Get symbol and colour for a group letter.
- get_display_text – Get formatted display text with colour and symbol.
- generate_css_classes – Generate CSS classes for all groups.
Attributes:
GROUP_STYLES
module-attribute
GROUP_STYLES = {'q': ('■', '#e53935'), 'w': ('●', '#43a047'), 'e': ('▲', '#1e88e5'), 'r': ('◆', '#fdd835'), 't': ('★', '#d81b60'), 'y': ('⬢', '#00acc1'), 'u': ('♦', '#fb8c00'), 'i': ('▼', '#00897b'), 'o': ('○', '#8e24aa'), 'p': ('△', '#6d4c41'), 'a': ('◇', '#d81b60'), 's': ('☆', '#00acc1'), 'd': ('⬡', '#e53935'), 'f': ('✦', '#43a047'), 'g': ('✧', '#1e88e5'), 'h': ('⟐', '#fdd835'), 'j': ('✚', '#d81b60'), 'k': ('✖', '#3949ab'), 'l': ('□', '#e53935'), 'z': ('▽', '#fb8c00'), 'x': ('◯', '#8e24aa'), 'c': ('◻', '#fdd835'), 'v': ('◼', '#6d4c41'), 'b': ('⬤', '#e53935'), 'n': ('☑', '#43a047'), 'm': ('✤', '#1e88e5')}
get_group_style
get_display_text
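As a rough illustration of how a GROUP_STYLES lookup could work, the sketch below copies three entries from the mapping above. The function names, the fallback style for unknown letters, and the Rich-style markup string are all assumptions made for this example; the signatures and output format of the real get_group_style and get_display_text are not shown on this page.

```python
# Three entries copied from GROUP_STYLES above.
GROUP_STYLES = {
    "q": ("■", "#e53935"),
    "w": ("●", "#43a047"),
    "e": ("▲", "#1e88e5"),
}

# Assumed fallback for letters with no entry; not taken from matchbox.
FALLBACK_STYLE = ("•", "#888888")


def group_style_sketch(letter: str) -> tuple[str, str]:
    """Return (symbol, colour) for a group letter, or the fallback."""
    return GROUP_STYLES.get(letter.lower(), FALLBACK_STYLE)


def display_text_sketch(letter: str) -> str:
    """Format a letter as markup text (assumed Rich-style format)."""
    symbol, colour = group_style_sketch(letter)
    return f"[{colour}]{symbol} {letter.upper()}[/]"


print(display_text_sketch("w"))
```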
table
Comparison display table for entity resolution evaluation.
Classes:
- ComparisonDisplayTable – DataTable for comparing records with keyboard-driven assignment.
ComparisonDisplayTable
Bases: DataTable
DataTable for comparing records with keyboard-driven assignment.
We use the DataTable’s internal cursor (even though hidden) as the “Anchor” for the 1-9 column labels.
- Cursor Column “N” is labelled “1”.
- Cursor Column “N+1” is labelled “2”, etc.
- Paging Right/Left moves the cursor by +/- 9 columns and forces a scroll alignment to the left edge, ensuring the labels remain visible.
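The anchor arithmetic above can be sketched without Textual. Both helpers are hypothetical: they treat columns as zero-based indices, ignore any fixed columns, and clamp the anchor to the valid range when paging, which is an assumption about edge behaviour rather than something stated here.

```python
PAGE_SIZE = 9  # the table labels nine columns at a time


def column_labels(anchor: int, total_columns: int) -> dict[int, int]:
    """Map column index to its label (1 to 9), starting at the anchor."""
    last = min(anchor + PAGE_SIZE, total_columns)
    return {column: column - anchor + 1 for column in range(anchor, last)}


def page(anchor: int, total_columns: int, direction: int) -> int:
    """Move the anchor one page left (-1) or right (+1), clamped to range."""
    moved = anchor + direction * PAGE_SIZE
    return max(0, min(moved, max(total_columns - 1, 0)))


labels = column_labels(3, 20)   # columns 3..11 are labelled 1..9
next_anchor = page(3, 20, +1)   # 12
prev_anchor = page(3, 20, -1)   # 0, clamped at the first column
```

Because the labels are derived from the anchor alone, pressing a digit key can be resolved to a column as `anchor + digit - 1` without tracking scroll position separately.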
Classes:
- AssignmentMade – Posted when a single column assignment is made.
- CurrentGroupChanged – Posted when current group selection changes.
Methods:
- watch_current_item – Rebuild table when item changes (Textual reactive pattern).
- watch_cursor_coordinate – Update headers whenever the cursor moves (Manual scroll or Paging).
- watch_current_assignments – Update column headers and cells when assignments change.
- on_key – Handle keyboard shortcuts for assignment.
- action_page_right – Move the anchor (cursor) right by 9 columns and force scroll.
- action_page_left – Move the anchor (cursor) left by 9 columns and force scroll.
Attributes:
- current_item (reactive[EvaluationItem | None])
- current_assignments (reactive[dict[int, str]])
- current_group (reactive[str])
- table_ready (reactive[bool])
- BINDINGS
- zebra_stripes
- cursor_type
- show_cursor
- fixed_columns
current_item
class-attribute instance-attribute
current_item: reactive[EvaluationItem | None] = reactive(None)
current_assignments
class-attribute instance-attribute
BINDINGS
class-attribute instance-attribute
BINDINGS = [Binding('up', 'page_up', 'Page up', show=False), Binding('down', 'page_down', 'Page down', show=False), Binding('right', 'page_right', 'Page right', show=False), Binding('left', 'page_left', 'Page left', show=False)]
AssignmentMade
watch_current_item
watch_current_item(item: EvaluationItem | None) -> None
Rebuild table when item changes (Textual reactive pattern).
watch_cursor_coordinate
Update headers whenever the cursor moves (Manual scroll or Paging).
watch_current_assignments
Update column headers and cells when assignments change.
action_page_right
Move the anchor (cursor) right by 9 columns and force scroll.
action_page_left
Move the anchor (cursor) left by 9 columns and force scroll.