Evaluation
Python helpers
matchbox.client.eval
Public evaluation helpers for Matchbox clients.
Modules:
- samples – Client-side helpers for retrieving and preparing evaluation samples.
Classes:
- EvalData – Object which caches evaluation data to measure model performance.
- EvaluationItem – A cluster awaiting evaluation, with pre-computed display data.
Functions:
- compare_models – Compare metrics of models based on cached evaluation data.
- create_evaluation_item – Create EvaluationItem with pre-computed display data.
- create_judgement – Convert item assignments to Judgement - no default group assignment.
- get_samples – Retrieve samples enriched with source data as EvaluationItems.
- precision_recall – From models and eval data, compute scores inspired by precision-recall.
Attributes:
- ModelComparison
ModelComparison (module attribute)
ModelComparison: TypeAlias = dict[ModelResolutionPath, PrecisionRecall]
EvalData
Object which caches evaluation data to measure model performance.
Methods:
- precision_recall – Compute precision and recall for a given Results object.
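A minimal usage sketch follows. It assumes EvalData can be constructed with no arguments and caches judgement data on construction, and that PrecisionRecall unpacks as a (precision, recall) pair; neither detail is confirmed by the signatures above.

from matchbox.client.eval import EvalData

def score_results(results) -> None:
    """Score a model's Results object against cached judgement data."""
    # Assumption: EvalData() fetches and caches evaluation data when
    # constructed; its real constructor arguments are not shown above.
    eval_data = EvalData()
    # Assumption: PrecisionRecall behaves like a (precision, recall) tuple.
    precision, recall = eval_data.precision_recall(results)
    print(f"precision={precision:.2f}, recall={recall:.2f}")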
EvaluationItem
Bases: BaseModel
A cluster awaiting evaluation, with pre-computed display data.
Attributes:
- model_config
- cluster_id (int)
- dataframe (DataFrame)
- display_data (dict[str, list[str]])
- duplicate_groups (list[list[int]])
- display_columns (list[int])
- assignments (dict[int, str])
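A hedged construction from the attribute types above. The pandas frame is an assumption (the DataFrame type may come from another library), as are the field semantics in the comments; the example also presumes model_config permits arbitrary types such as DataFrame.

import pandas as pd

from matchbox.client.eval import EvaluationItem

item = EvaluationItem(
    cluster_id=7,
    dataframe=pd.DataFrame({"name": ["ACME Ltd", "Acme Limited"]}),
    display_data={"name": ["ACME Ltd", "Acme Limited"]},  # column -> rendered values (assumed)
    duplicate_groups=[[0, 1]],  # assumed: groups of rows that display identically
    display_columns=[0],        # assumed: indices of the columns worth showing
    assignments={},             # record index -> group letter, filled in by the user
)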
compare_models
compare_models(resolutions: list[ModelResolutionPath]) -> ModelComparison
Compare metrics of models based on cached evaluation data.
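A brief sketch. The resolution paths are hypothetical, and writing them as plain strings assumes ModelResolutionPath accepts string-like values.

from matchbox.client.eval import compare_models

# Hypothetical resolution paths; substitute real ones from your DAG.
comparison = compare_models(["companies/deduper", "companies/linker"])
for path, (precision, recall) in comparison.items():  # assumes PrecisionRecall unpacks as a pair
    print(f"{path}: precision={precision:.2f}, recall={recall:.2f}")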
create_evaluation_item
create_evaluation_item(df: DataFrame, source_configs: list[tuple[str, SourceConfig]], cluster_id: int) -> EvaluationItem
Create EvaluationItem with pre-computed display data.
create_judgement
create_judgement(item: EvaluationItem, user_id: int) -> Judgement
Convert item assignments to Judgement - no default group assignment.
Parameters:
- item (EvaluationItem) – Evaluation item with assignments
- user_id (int) – User ID for the judgement
Returns:
- Judgement – Judgement with endorsed groups based on assignments
get_samples
get_samples(n: int, dag: DAG, user_id: int, resolution: ModelResolutionName | None = None) -> dict[int, EvaluationItem]
Retrieve samples enriched with source data as EvaluationItems.
Parameters:
- n (int) – Number of clusters to sample
- dag (DAG) – DAG for which to retrieve samples
- user_id (int) – ID of the user requesting the samples
- resolution (ModelResolutionName | None, default: None) – The optional resolution from which to sample. If not provided, the final step in the DAG is used.
Returns:
- dict[int, EvaluationItem] – Dictionary of cluster ID to EvaluationItems describing the cluster
Raises:
- MatchboxSourceTableError – If a source cannot be queried from a location using provided or default clients.
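An end-to-end sketch of the sampling flow, from drawing samples to building judgements. The dag argument is assumed to be a pre-built DAG with a warehouse location attached, and the convention that assignment keys are row positions in the item's dataframe is also an assumption.

from matchbox.client.eval import create_judgement, get_samples

def judge_whole_clusters(dag, user_id: int) -> list:
    """Sample clusters and endorse each one as a single entity."""
    judgements = []
    samples = get_samples(n=5, dag=dag, user_id=user_id)
    for item in samples.values():
        # Assumption: keys are record positions in item.dataframe and values
        # are group letters, matching assignments: dict[int, str] above.
        item.assignments = {i: "q" for i in range(len(item.dataframe))}
        judgements.append(create_judgement(item=item, user_id=user_id))
    return judgements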
precision_recall
precision_recall(models_root_leaf: list[DataFrame], judgements: DataFrame, expansion: DataFrame) -> list[PrecisionRecall]
From models and eval data, compute scores inspired by precision-recall.
This function does the following:
- Converts model and judgement clusters to their implied pair-wise connections. For judgements, this includes pairs that were shown to users but rejected. Sums how many times each pair was endorsed (+1) or rejected (-1).
- Keeps only the pairs whose leaves are present in all models and in the judgements, so the comparison is fair.
- If a validation pair was rejected exactly as many times as it was endorsed, discards it from both the model and validation pairs.
- If a validation pair was rejected more times than it was endorsed, removes it from the validation pairs but keeps it in the model pairs.
- Computes precision and recall for each model against the validation pairs.
At the moment, this function ignores user IDs.
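To make those filtering rules concrete, here is a toy re-implementation of the pair-vote bookkeeping on plain Python pairs. It illustrates the rules above; it is not the library's actual code.

from itertools import combinations

def cluster_pairs(clusters: list[list[int]]) -> set[tuple[int, int]]:
    """All leaf pairs implied by a set of clusters."""
    return {tuple(sorted(p)) for c in clusters for p in combinations(c, 2)}

# Net votes per judged pair: +1 per endorsement, -1 per rejection.
votes = {(1, 2): 2, (1, 3): 0, (2, 3): -1}

model_pairs = cluster_pairs([[1, 2, 3]])  # {(1, 2), (1, 3), (2, 3)}
# Tied pairs (net 0) are dropped from both sides; net-negative pairs stay
# in the model pairs but are excluded from the validation pairs.
validation_pairs = {p for p, v in votes.items() if v > 0}
model_pairs -= {p for p, v in votes.items() if v == 0}

true_positives = len(model_pairs & validation_pairs)
precision = true_positives / len(model_pairs)    # 1 / 2
recall = true_positives / len(validation_pairs)  # 1 / 1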
Parameters:
- models_root_leaf (list[DataFrame]) – List of tables with root and leaf columns, one per model. They must include all the clusters that resolve from a model, all the way to the original source clusters if no model in the lineage merged them.
- judgements (DataFrame) – Dataframe following matchbox.common.arrow.SCHEMA_JUDGEMENTS.
- expansion (DataFrame) – Dataframe following matchbox.common.arrow.SCHEMA_CLUSTER_EXPANSION.
Returns:
- list[PrecisionRecall] – List of tuples of precision and recall scores, one per model.
CLI module
matchbox.client.cli.eval
Module implementing CLI evaluation app.
Modules:
- app – Main application for entity resolution evaluation.
- modals – Modal screens for entity resolution evaluation.
- run – CLI commands for entity evaluation.
- widgets – UI widgets for entity resolution evaluation tool.
Classes:
- EntityResolutionApp – Main Textual application for entity resolution evaluation.
EntityResolutionApp
EntityResolutionApp(resolution: ModelResolutionName, user: str, num_samples: int = 5, dag: DAG | None = None, show_help: bool = False)
Bases: App
Main Textual application for entity resolution evaluation.
Parameters:
- resolution (ModelResolutionName) – The model resolution to evaluate
- user (str) – Username for authentication (overrides settings)
- num_samples (int, default: 5) – Number of clusters to sample for evaluation
- dag (DAG | None, default: None) – Pre-loaded DAG with warehouse location attached
- show_help (bool, default: False) – Whether to show help on start
Methods:
- compose – Compose the main application UI.
- on_mount – Initialise the application.
- on_key – Handle keyboard shortcuts for group assignment.
- watch_status – React to status changes.
- watch_current_group – React to current group changes.
- authenticate – Authenticate with the server.
- load_samples – Load evaluation samples from the server.
- action_skip – Skip current entity (moves to back of queue).
- action_submit – Submit current entity if fully painted.
- action_clear – Clear current entity’s group assignments.
- action_show_help – Show the help modal.
- action_show_no_samples – Show the no samples modal.
- action_quit – Quit the application.
Attributes:
- CSS_PATH
- TITLE
- SUB_TITLE
- BINDINGS
- current_group (reactive[str])
- status (reactive[tuple[str, str]])
- sample_limit (int)
- resolution (ModelResolutionPath)
- user_id (int)
- user_name (str)
- dag (DAG)
- show_help (bool)
- queue (EvaluationQueue)
- timer (Timer | None)
BINDINGS (class attribute)
BINDINGS = [('right', 'skip', 'Skip'), ('space', 'submit', 'Submit'), ('escape', 'clear', 'Clear'), ('question_mark,f1', 'show_help', 'Help'), ('ctrl+q,ctrl+c', 'quit', 'Quit')]
status (class attribute)
on_key (async)
Handle keyboard shortcuts for group assignment.
Textual’s basic key event handler. Handles keys beyond BINDINGS.
app
Main application for entity resolution evaluation.
Classes:
- EvaluationQueue – Deque-based queue with current item always at front.
- EntityResolutionApp – Main Textual application for entity resolution evaluation.
Attributes:
- logger
EvaluationQueue
Deque-based queue with current item always at front.
Methods:
- skip_current – Move current to back of queue.
- remove_current – Remove and return current item.
- add_items – Add new items to queue, preventing duplicates.
Attributes:
- items (deque[EvaluationItem])
- current (EvaluationItem | None) – Get the current item (always at index 0).
- total_count (int) – Total number of items in queue.
add_items
add_items(items: list[EvaluationItem]) -> int
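A short sketch of the queue discipline. The import path and the no-argument constructor are assumptions, as is the meaning of add_items' integer return.

from matchbox.client.cli.eval.app import EvaluationQueue  # assumed import path
from matchbox.client.eval import EvaluationItem

def enqueue(items: list[EvaluationItem]) -> EvaluationQueue:
    """Load items, then demonstrate the front-of-deque discipline."""
    queue = EvaluationQueue()       # assumed no-argument constructor
    added = queue.add_items(items)  # duplicates skipped; assumed to return the count added
    if queue.total_count:
        queue.skip_current()        # rotate the current item to the back
        queue.remove_current()      # pop and return the (new) current item
    return queue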
EntityResolutionApp
EntityResolutionApp(resolution: ModelResolutionName, user: str, num_samples: int = 5, dag: DAG | None = None, show_help: bool = False)
Bases: App
Main Textual application for entity resolution evaluation. Its parameters, methods, and attributes are documented in full under matchbox.client.cli.eval above.
modals
Modal screens for entity resolution evaluation.
Classes:
- HelpModal – Help screen showing commands and shortcuts.
- NoSamplesModal – Modal screen showing no samples are available.
HelpModal
Bases: ModalScreen
Help screen showing commands and shortcuts.
Methods:
- compose – Compose the help modal UI.
- close_help – Close the help modal.
- on_key – Handle key events for closing the help modal.
NoSamplesModal
Bases: ModalScreen
Modal screen showing no samples are available.
run
CLI commands for entity evaluation.
Functions:
- evaluate – Start the interactive entity resolution evaluation tool.
evaluate
evaluate(collection: Annotated[str, Option('--collection', '-c', help='Collection name (required)')], resolution: Annotated[str | None, Option('--resolution', '-r', help="Resolution name (defaults to collection's final_step)")] = None, pending: Annotated[bool, Option('--pending', '-p', help='Whether to evaluate the pending DAG, instead of the default')] = False, user: Annotated[str | None, Option('--user', '-u', help='Username for authentication (overrides settings)')] = None, warehouse: Annotated[str | None, Option('--warehouse', '-w', help='Warehouse database connection string (e.g. postgresql://user:pass@host/db)')] = None, log_file: Annotated[str | None, Option('--log', help='Log file path to redirect all logging output (keeps UI clean)')] = None) -> None
Start the interactive entity resolution evaluation tool.
Requires a warehouse connection to fetch source data for evaluation clusters.
Example:
matchbox eval --collection companies --warehouse postgresql://user:pass@localhost/warehouse
widgets
UI widgets for entity resolution evaluation tool.
Modules:
- styling – Styling utilities for entity resolution evaluation UI.
- table – Comparison display table for entity resolution evaluation.
styling
Styling utilities for entity resolution evaluation UI.
Functions:
- get_group_style – Get symbol and colour for a group letter.
- get_display_text – Get formatted display text with colour and symbol.
- generate_css_classes – Generate CSS classes for all groups.
Attributes:
- GROUP_STYLES
GROUP_STYLES (module attribute)
GROUP_STYLES = {'q': ('■', '#e53935'), 'w': ('●', '#43a047'), 'e': ('▲', '#1e88e5'), 'r': ('◆', '#fdd835'), 't': ('★', '#d81b60'), 'y': ('⬢', '#00acc1'), 'u': ('♦', '#fb8c00'), 'i': ('▼', '#00897b'), 'o': ('○', '#8e24aa'), 'p': ('△', '#6d4c41'), 'a': ('◇', '#d81b60'), 's': ('☆', '#00acc1'), 'd': ('⬡', '#e53935'), 'f': ('✦', '#43a047'), 'g': ('✧', '#1e88e5'), 'h': ('⟐', '#fdd835'), 'j': ('✚', '#d81b60'), 'k': ('✖', '#3949ab'), 'l': ('□', '#e53935'), 'z': ('▽', '#fb8c00'), 'x': ('◯', '#8e24aa'), 'c': ('◻', '#fdd835'), 'v': ('◼', '#6d4c41'), 'b': ('⬤', '#e53935'), 'n': ('☑', '#43a047'), 'm': ('✤', '#1e88e5')}
get_group_style
Get symbol and colour for a group letter.
get_display_text
Get formatted display text with colour and symbol.
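A hedged lookup sketch, assuming get_group_style takes a group letter and returns the (symbol, colour) pair recorded in GROUP_STYLES; the import path and the signature are both assumptions.

from matchbox.client.cli.eval.widgets.styling import (  # assumed import path
    GROUP_STYLES,
    get_group_style,
)

symbol, colour = get_group_style("q")         # assumed: letter -> (symbol, colour)
assert (symbol, colour) == GROUP_STYLES["q"]  # ('■', '#e53935')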
table
Comparison display table for entity resolution evaluation.
Classes:
- ComparisonDisplayTable – Table for side-by-side record comparison with colour highlighting.
ComparisonDisplayTable
Bases: Widget
Table for side-by-side record comparison with colour highlighting.
Methods:
- load_comparison – Load new comparison data.
- render – Render the table with current item.
Attributes:
- current_item (EvaluationItem | None)
load_comparison
load_comparison(item: EvaluationItem) -> None