API reference
matchbox.server.uploads
Worker logic to process user uploads.
Classes:
- UploadEntry – Entry in upload tracker, combining private metadata and public upload status.
- UploadTracker – Abstract class for upload tracker.
- InMemoryUploadTracker – In-memory upload tracker, only usable with single server instance.
- RedisUploadTracker – Upload tracker backed by Redis.
Functions:
- settings_to_upload_tracker – Initialise an upload tracker from server settings.
- table_to_s3 – Upload a PyArrow Table to S3 and return the key.
- s3_to_recordbatch – Download a PyArrow Table from S3 and stream it as RecordBatches.
- initialise_celery_worker – Initialise backend and tracker for celery worker.
- process_upload – Generic task to process uploaded file, usable by FastAPI BackgroundTasks.
- process_upload_celery – Celery task to process uploaded file, with only serialisable arguments.
Attributes:
- celery_logger
- CELERY_SETTINGS
- CELERY_BACKEND (MatchboxDBAdapter | None)
- CELERY_TRACKER (UploadTracker | None)
- celery
UploadEntry
Bases: BaseModel
Entry in upload tracker, combining private metadata and public upload status.
Attributes:
- status (UploadStatus)
- path (ResolutionPath)
UploadTracker
Bases: ABC
Abstract class for upload tracker.
Methods:
- add_source – Register source resolution and return ID.
- add_model – Register model resolution and return ID.
- get – Retrieve entry by ID if not expired.
- update – Update the stage and details for an upload.
InMemoryUploadTracker
Bases: UploadTracker
In-memory upload tracker, only usable with single server instance.
Methods:
- get – Retrieve entry by ID if not expired.
- update – Update the stage and details for an upload.
- add_source – Register source resolution and return ID.
- add_model – Register model resolution and return ID.
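To illustrate why an in-memory tracker is limited to a single server instance, here is a minimal sketch. It is not matchbox's implementation: the names SketchEntry and SketchInMemoryTracker, the fields stage, details and updated_at, the method signatures, the "awaiting_upload" stage label and the 30-minute default are all assumptions made for illustration. Only the method names add_source, add_model, get and update come from the listing above.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class SketchEntry:
    """Hypothetical stand-in for UploadEntry."""

    path: str
    stage: str
    details: str | None
    updated_at: datetime


class SketchInMemoryTracker:
    """Hypothetical tracker that keeps entries in a process-local dict."""

    def __init__(self, expiry_minutes: int = 30) -> None:
        self._expiry = timedelta(minutes=expiry_minutes)
        self._entries: dict[str, SketchEntry] = {}

    def _add(self, path: str) -> str:
        # Register a new entry under a random ID and return that ID.
        upload_id = str(uuid.uuid4())
        now = datetime.now(timezone.utc)
        self._entries[upload_id] = SketchEntry(path, "awaiting_upload", None, now)
        return upload_id

    def add_source(self, path: str) -> str:
        return self._add(path)

    def add_model(self, path: str) -> str:
        return self._add(path)

    def get(self, upload_id: str) -> SketchEntry | None:
        # Return the entry, or None if it is unknown or has expired.
        entry = self._entries.get(upload_id)
        if entry is None:
            return None
        if datetime.now(timezone.utc) - entry.updated_at > self._expiry:
            del self._entries[upload_id]
            return None
        return entry

    def update(self, upload_id: str, stage: str, details: str | None = None) -> bool:
        # Update stage and details; report whether the entry still existed.
        entry = self.get(upload_id)
        if entry is None:
            return False
        entry.stage = stage
        entry.details = details
        entry.updated_at = datetime.now(timezone.utc)
        return True


tracker = SketchInMemoryTracker(expiry_minutes=30)
upload_id = tracker.add_source("hypothetical/source/path")
tracker.update(upload_id, stage="processing")
```

Because the entries live in a dict owned by one Python process, a second server process or a separate worker would not see them. A shared store such as Redis avoids that limitation.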
update
RedisUploadTracker
Bases: UploadTracker
Upload tracker backed by Redis.
Methods:
- get – Retrieve entry by ID if not expired.
- update – Update the stage and details for an upload.
- add_source – Register source resolution and return ID.
- add_model – Register model resolution and return ID.
Attributes:
- expiry_minutes
- redis
- key_prefix
update
settings_to_upload_tracker
settings_to_upload_tracker(settings: MatchboxServerSettings) -> UploadTracker
Initialise an upload tracker from server settings.
table_to_s3
table_to_s3(client: S3Client, bucket: str, key: str, file: UploadFile, expected_schema: Schema) -> str
Upload a PyArrow Table to S3 and return the key.
Parameters:
- client (S3Client) – The S3 client to use.
- bucket (str) – The S3 bucket to upload to.
- key (str) – The key to upload to.
- file (UploadFile) – The file to upload.
- expected_schema (Schema) – The schema that the file should match.
Raises:
- MatchboxServerFileError – If the file is not a valid Parquet file or the schema does not match the expected schema.
Returns:
- str – The key of the uploaded file.
s3_to_recordbatch
s3_to_recordbatch(client: S3Client, bucket: str, key: str, batch_size: int = 1000) -> Generator[RecordBatch, None, None]
Download a PyArrow Table from S3 and stream it as RecordBatches.
initialise_celery_worker
Initialise backend and tracker for celery worker.
process_upload
process_upload(backend: MatchboxDBAdapter, tracker: UploadTracker, s3_client: S3Client, upload_type: str, resolution_name: str, upload_id: str, bucket: str, filename: str) -> None
Generic task to process uploaded file, usable by FastAPI BackgroundTasks.