Link and deduplicate
You have a dataset you want to link to your organisation’s broader network of data.
Your high-level process will be:
- Use `matchbox.query()` to retrieve source data from the perspective of a particular resolution point
- Use `matchbox.process()` to clean the data with standardised processes
- Use `matchbox.make_model()` with `matchbox.dedupers` and `matchbox.linkers` to create a new model
- Generate probabilistic model outputs using `model.run()`
- Upload the probabilities to matchbox with `results.to_matchbox()`
- Label data, or use existing data, to decide the probability threshold that you’re willing to consider “truth” for your new model
    - Use `model.roc_curve()` and other tools to make your decision
    - Update `model.truth` to codify it
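
The threshold decision in the last step can be illustrated without matchbox at all. The sketch below is plain Python and does not use or reproduce matchbox's API: `ScoredPair`, `precision_recall()`, `lowest_threshold_meeting_precision()` and the labelled sample are all hypothetical names and made-up data introduced for illustration. It assumes you have a set of candidate matches, each with a model probability and a human label, and that you want the lowest threshold that still meets a precision target.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredPair:
    """A candidate match with a model probability and a human label.

    Hypothetical structure for illustration, not a matchbox type.
    """

    probability: float
    is_match: bool


def precision_recall(pairs, threshold):
    """Treat every pair scoring at or above the threshold as a predicted match."""
    tp = sum(1 for p in pairs if p.probability >= threshold and p.is_match)
    fp = sum(1 for p in pairs if p.probability >= threshold and not p.is_match)
    fn = sum(1 for p in pairs if p.probability < threshold and p.is_match)
    # With no predictions (or no true matches) the ratio is undefined; use 1.0.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall


def lowest_threshold_meeting_precision(pairs, min_precision):
    """Return the lowest observed probability whose precision meets the target.

    Returns None if no candidate threshold reaches the target.
    """
    for threshold in sorted({p.probability for p in pairs}):
        precision, _ = precision_recall(pairs, threshold)
        if precision >= min_precision:
            return threshold
    return None


# Made-up labelled sample, purely illustrative.
labelled = [
    ScoredPair(0.95, True),
    ScoredPair(0.90, True),
    ScoredPair(0.80, False),
    ScoredPair(0.70, True),
    ScoredPair(0.40, False),
    ScoredPair(0.20, False),
]

threshold = lowest_threshold_meeting_precision(labelled, min_precision=0.9)
precision, recall = precision_recall(labelled, threshold)
print(f"threshold={threshold}, precision={precision:.2f}, recall={recall:.2f}")
```

Recall can only fall as the threshold rises, so picking the lowest threshold that meets the precision target keeps as many true matches as possible. Whatever value you settle on is the kind of number you would then record in `model.truth`.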
Full documentation to follow.