Data retrieval
Match
Given a key and the source it belongs to, retrieves all keys that share its cluster, in both the source and the target. Useful for ad-hoc queries about specific items of data.
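To make the idea concrete, here is a minimal conceptual sketch in plain Python. It does not use the Matchbox API: the `CLUSTERS` mapping, the `match_keys` helper, and the source names and keys are all hypothetical, invented only to illustrate "keys that share a cluster".

```python
# Hypothetical cluster assignments: (source name, key) -> cluster id.
# All names and values here are invented for illustration only.
CLUSTERS = {
    ("companieshouse", "CH001"): 1,
    ("companieshouse", "CH002"): 2,
    ("exporters", "EX9"): 1,
    ("exporters", "EX10"): 1,
    ("exporters", "EX11"): 2,
}


def match_keys(key, source, target, clusters=CLUSTERS):
    """Return the keys that share a cluster with (source, key).

    The result maps the source name and the target name to the set of
    keys in that dataset that belong to the same cluster as the input key.
    """
    result = {source: set(), target: set()}
    cluster_id = clusters.get((source, key))
    if cluster_id is None:
        # Unknown key: nothing to match against.
        return result
    for (name, other_key), other_cluster in clusters.items():
        if other_cluster == cluster_id and name in result:
            result[name].add(other_key)
    return result


matches = match_keys("CH001", source="companieshouse", target="exporters")
```

In this toy data, `CH001` sits in cluster 1, so the lookup returns that key on the source side and the two exporter keys in the same cluster on the target side.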
Query
Retrieves entire data sources along with a unique entity identifier according to a point of resolution.
Use Cases
- Large-scale statistical analysis
- Building linking or deduplication pipelines
import matchbox as mb
from matchbox import select
import sqlalchemy

# Database engine used to read the source tables
engine = sqlalchemy.create_engine('postgresql://')

# Select fields from each source, resolved at the "companies" resolution
mb.query(
    select(
        {
            "dbt.companieshouse": ["company_name"],
            "hmrc.exporters": ["year", "commodity_codes"],
        },
        credentials=engine,
    ),
    combine_type="set_agg",
    resolution="companies",
)
For more information on how to use the functions on this page, please check out the relevant examples in the client API docs.