API reference
matchbox.server.uploads
Worker logic to process user uploads.
Classes:
- UploadEntry – Entry in upload tracker, combining private metadata and public upload status.
- UploadTracker – Abstract class for upload tracker.
- InMemoryUploadTracker – In-memory upload tracker, only usable with single server instance.
- RedisUploadTracker – Upload tracker backed by Redis.
Functions:
- settings_to_upload_tracker – Initialise an upload tracker from server settings.
- table_to_s3 – Upload a PyArrow Table to S3 and return the key.
- s3_to_recordbatch – Download a PyArrow Table from S3 and stream it as RecordBatches.
- initialise_celery_worker – Initialise backend and tracker for celery worker.
- process_upload – Generic task to process uploaded file, usable by FastAPI BackgroundTasks.
- process_upload_celery – Celery task to process uploaded file, with only serialisable arguments.
Attributes:
- celery_logger
- CELERY_SETTINGS
- CELERY_BACKEND (MatchboxDBAdapter | None)
- CELERY_TRACKER (UploadTracker | None)
- celery
UploadEntry
Bases: BaseModel
Entry in upload tracker, combining private metadata and public upload status.
Attributes:
- status (UploadStatus)
- path (ResolutionPath)
UploadTracker
Bases: ABC
Abstract class for upload tracker.
Methods:
- add_source – Register source resolution and return ID.
- add_model – Register model resolution and return ID.
- get – Retrieve entry by ID if not expired.
- update – Update the stage and details for an upload.
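The four methods above suggest a small register/lookup/update contract with expiry. The following is a minimal sketch of that contract, not the matchbox implementation: `Entry`, `Tracker` and `DictTracker` are hypothetical stand-ins, the stage names are invented, plain strings replace `ResolutionPath` and `UploadStatus`, and a dataclass replaces the Pydantic `BaseModel` so the example needs only the standard library.

```python
from __future__ import annotations

import uuid
from abc import ABC, abstractmethod
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


def _now() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class Entry:
    """Hypothetical stand-in for UploadEntry."""

    kind: str  # "source" or "model"
    path: str  # stand-in for ResolutionPath
    stage: str  # stand-in for the stage carried by UploadStatus
    details: str | None
    updated_at: datetime


class Tracker(ABC):
    """Hypothetical mirror of the abstract tracker's four methods."""

    @abstractmethod
    def add_source(self, path: str) -> str: ...

    @abstractmethod
    def add_model(self, path: str) -> str: ...

    @abstractmethod
    def get(self, upload_id: str) -> Entry | None: ...

    @abstractmethod
    def update(self, upload_id: str, stage: str, details: str | None = None) -> bool: ...


class DictTracker(Tracker):
    """Dictionary-backed tracker; state lives in one process only."""

    def __init__(self, expiry_minutes: int = 30) -> None:
        self._expiry = timedelta(minutes=expiry_minutes)
        self._entries: dict[str, Entry] = {}

    def _register(self, kind: str, path: str) -> str:
        upload_id = str(uuid.uuid4())
        # "registered" is an invented initial stage name.
        self._entries[upload_id] = Entry(kind, path, "registered", None, _now())
        return upload_id

    def add_source(self, path: str) -> str:
        return self._register("source", path)

    def add_model(self, path: str) -> str:
        return self._register("model", path)

    def get(self, upload_id: str) -> Entry | None:
        entry = self._entries.get(upload_id)
        if entry is None:
            return None
        if _now() - entry.updated_at > self._expiry:
            # Expired entries are dropped and reported as missing.
            del self._entries[upload_id]
            return None
        return entry

    def update(self, upload_id: str, stage: str, details: str | None = None) -> bool:
        entry = self.get(upload_id)
        if entry is None:
            return False
        entry.stage, entry.details, entry.updated_at = stage, details, _now()
        return True
```

Because the dictionary is private to one process, a second server instance would not see entries registered by the first, which is the limitation noted for the in-memory tracker; a shared store such as Redis avoids it.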
InMemoryUploadTracker
Bases: UploadTracker
In-memory upload tracker, only usable with single server instance.
Methods:
- get – Retrieve entry by ID if not expired.
- update – Update the stage and details for an upload.
- add_source – Register source resolution and return ID.
- add_model – Register model resolution and return ID.
update
RedisUploadTracker
Bases: UploadTracker
Upload tracker backed by Redis.
Methods:
- get – Retrieve entry by ID if not expired.
- update – Update the stage and details for an upload.
- add_source – Register source resolution and return ID.
- add_model – Register model resolution and return ID.
Attributes:
- expiry_minutes
- redis
- key_prefix
update
settings_to_upload_tracker
settings_to_upload_tracker(settings: MatchboxServerSettings) -> UploadTracker
Initialise an upload tracker from server settings.
table_to_s3
table_to_s3(client: S3Client, bucket: str, key: str, file: UploadFile, expected_schema: Schema) -> str
Upload a PyArrow Table to S3 and return the key.
Parameters:
- client (S3Client) – The S3 client to use.
- bucket (str) – The S3 bucket to upload to.
- key (str) – The key to upload to.
- file (UploadFile) – The file to upload.
- expected_schema (Schema) – The schema that the file should match.
Raises:
- MatchboxServerFileError – If the file is not a valid Parquet file or the schema does not match the expected schema.
Returns:
- str – The key of the uploaded file.
s3_to_recordbatch
s3_to_recordbatch(client: S3Client, bucket: str, key: str, batch_size: int = 1000) -> Generator[RecordBatch, None, None]
Download a PyArrow Table from S3 and stream it as RecordBatches.
initialise_celery_worker
Initialise backend and tracker for celery worker.
process_upload
process_upload(backend: MatchboxDBAdapter, tracker: UploadTracker, s3_client: S3Client, upload_type: str, resolution_name: str, upload_id: str, bucket: str, filename: str) -> None
Generic task to process uploaded file, usable by FastAPI BackgroundTasks.