# API reference

## `matchbox.server.uploads`

Worker logic to process user uploads.
Classes:

- `UploadTracker` – Upload error tracker.
- `InMemoryUploadTracker` – In-memory error tracker.
- `RedisUploadTracker` – Error tracker backed by Redis.
Functions:

- `settings_to_upload_tracker` – Initialise an upload tracker from server settings.
- `file_to_s3` – Upload a PyArrow Table to S3 and return the key.
- `s3_to_recordbatch` – Download a PyArrow Table from S3 and stream it as RecordBatches.
- `initialise_celery_worker` – Initialise the backend and tracker for a Celery worker.
- `process_upload` – Generic task to process an uploaded file, usable by FastAPI BackgroundTasks.
- `process_upload_celery` – Celery task to process an uploaded file, with only serialisable arguments.
Attributes:

- `celery_logger`
- `CELERY_SETTINGS`
- `CELERY_BACKEND` (`MatchboxDBAdapter | None`)
- `CELERY_TRACKER` (`UploadTracker | None`)
- `celery`
### `InMemoryUploadTracker`

### `RedisUploadTracker`
### `settings_to_upload_tracker`

`settings_to_upload_tracker(settings: MatchboxServerSettings) -> UploadTracker`

Initialise an upload tracker from server settings.
### `file_to_s3`

`file_to_s3(client: S3Client, bucket: str, key: str, file: UploadFile) -> str`

Upload a PyArrow Table to S3 and return the key.

Parameters:

- `client` (`S3Client`) – The S3 client to use.
- `bucket` (`str`) – The S3 bucket to upload to.
- `key` (`str`) – The key to upload to.
- `file` (`UploadFile`) – The file to upload.

Raises:

- `MatchboxServerFileError` – If the file is not a valid Parquet file or its schema does not match the expected schema.

Returns:

- `str` – The key of the uploaded file.
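To show the validate-then-upload shape this function describes, here is a sketch against a stub S3 client. It checks only the Parquet magic bytes (`PAR1` at both ends of the file); the real function also validates the schema, and `upload_parquet`, `FileValidationError`, and `StubS3Client` are illustrative names, not the library's:

```python
import io


class FileValidationError(Exception):
    """Hypothetical stand-in for MatchboxServerFileError."""


def upload_parquet(client, bucket: str, key: str, file: io.BytesIO) -> str:
    """Sketch of the file_to_s3 flow: validate the bytes, upload, return the key."""
    data = file.read()
    # Parquet files begin and end with the 4-byte magic marker b"PAR1".
    if not (data[:4] == b"PAR1" and data[-4:] == b"PAR1"):
        raise FileValidationError(f"{key} is not a valid Parquet file")
    client.put_object(Bucket=bucket, Key=key, Body=data)
    return key


class StubS3Client:
    """Records put_object calls instead of talking to S3."""

    def __init__(self) -> None:
        self.store: dict[tuple[str, str], bytes] = {}

    def put_object(self, Bucket: str, Key: str, Body: bytes) -> None:
        self.store[(Bucket, Key)] = Body
```

Validating before the `put_object` call means a malformed upload never lands in the bucket, so downstream workers can assume every stored object parses.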
### `s3_to_recordbatch`

`s3_to_recordbatch(client: S3Client, bucket: str, key: str, batch_size: int = 1000) -> Generator[RecordBatch, None, None]`

Download a PyArrow Table from S3 and stream it as RecordBatches.
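The point of returning a generator of batches rather than one table is bounded memory use. The slicing pattern, shown here on plain Python sequences rather than PyArrow objects (`iter_batches` is an illustrative name, not part of the library):

```python
from collections.abc import Generator, Sequence


def iter_batches(rows: Sequence, batch_size: int = 1000) -> Generator[list, None, None]:
    # Yield fixed-size slices so the caller never materialises the whole
    # dataset at once; mirrors how s3_to_recordbatch streams RecordBatches
    # instead of returning a single Table.
    for start in range(0, len(rows), batch_size):
        yield list(rows[start : start + batch_size])
```

The last batch may be shorter than `batch_size`; consumers should not assume uniform batch lengths.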
### `initialise_celery_worker`

Initialise the backend and tracker for a Celery worker.
### `process_upload`

`process_upload(backend: MatchboxDBAdapter, tracker: UploadTracker, s3_client: S3Client, resolution_path: ResolutionPath, upload_id: str, bucket: str, filename: str) -> None`

Generic task to process an uploaded file, usable by FastAPI BackgroundTasks.
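A background task cannot raise errors back to the HTTP client, which is why the tracker is passed in: it is the only error channel. A sketch of that download-insert-report flow with stub collaborators; all names here (`process_upload_sketch`, the stubs, their methods) are hypothetical, and the real task also takes a `resolution_path`:

```python
class StubBackend:
    """Records inserts instead of writing to a real database."""

    def __init__(self) -> None:
        self.inserted: dict[str, bytes] = {}

    def insert(self, upload_id: str, data: bytes) -> None:
        self.inserted[upload_id] = data


class StubTracker:
    """Records errors instead of a real UploadTracker."""

    def __init__(self) -> None:
        self.errors: dict[str, str] = {}

    def record_error(self, upload_id: str, message: str) -> None:
        self.errors[upload_id] = message


class StubS3:
    """Serves objects from a dict instead of S3."""

    def __init__(self, objects: dict[str, bytes]) -> None:
        self.objects = objects

    def get_object(self, Bucket: str, Key: str) -> bytes:
        return self.objects[Key]


def process_upload_sketch(backend, tracker, s3_client, upload_id: str, bucket: str, filename: str) -> None:
    # Download the uploaded file and hand it to the backend; any failure is
    # reported to the tracker, since a background task has no caller to
    # raise to.
    try:
        data = s3_client.get_object(Bucket=bucket, Key=filename)
        backend.insert(upload_id, data)
    except Exception as exc:
        tracker.record_error(upload_id, str(exc))
```

Because every collaborator arrives as an argument, the same function body can run in a FastAPI `BackgroundTasks` callback or be wrapped by a Celery task that rebuilds the non-serialisable arguments (as `process_upload_celery` suggests).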