API reference
matchbox.server.uploads
Worker logic to process user uploads.
Classes:
- UploadEntry – Entry in upload tracker, combining private metadata and public upload status.
- UploadTracker – Abstract class for upload tracker.
- InMemoryUploadTracker – In-memory upload tracker, only usable with a single server instance.
- RedisUploadTracker – Upload tracker backed by Redis.
Functions:
- settings_to_upload_tracker – Initialise an upload tracker from server settings.
- table_to_s3 – Upload a PyArrow Table to S3 and return the key.
- s3_to_recordbatch – Download a PyArrow Table from S3 and stream it as RecordBatches.
- initialise_celery_worker – Initialise backend and tracker for celery worker.
- process_upload – Generic task to process uploaded file, usable by FastAPI BackgroundTasks.
- process_upload_celery – Celery task to process uploaded file, with only serialisable arguments.
Attributes:
- CELERY_SETTINGS
- CELERY_BACKEND (MatchboxDBAdapter | None)
- CELERY_TRACKER (UploadTracker | None)
- celery
UploadEntry
Bases: BaseModel
Entry in upload tracker, combining private metadata and public upload status.
Attributes:
- status (UploadStatus)
- metadata (SourceConfig | ModelConfig)
UploadTracker
Bases: ABC
Abstract class for upload tracker.
Methods:
- add_source – Register source metadata and return ID.
- add_model – Register model results metadata and return ID.
- get – Retrieve metadata by ID if not expired.
- update – Update the stage and details for an upload.
get (abstractmethod)
get(upload_id: str) -> UploadEntry | None
Retrieve metadata by ID if not expired.
InMemoryUploadTracker
Bases: UploadTracker
In-memory upload tracker, only usable with a single server instance.
Methods:
- get – Retrieve metadata by ID if not expired.
- update – Update the stage and details for an upload.
- add_source – Register source metadata and return ID.
- add_model – Register model results metadata and return ID.
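The tracker contract listed above (register metadata and get an ID back, retrieve an entry unless it has expired, update its stage and details) can be illustrated with a small dictionary-backed sketch. This is not the matchbox implementation: `SketchEntry`, `SketchInMemoryTracker`, the single `add` method, the stage names, and the 30-minute default are all assumptions made for illustration.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


def _now() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class SketchEntry:
    """Hypothetical stand-in for UploadEntry: metadata plus a status."""

    metadata: dict
    stage: str = "awaiting_upload"  # assumed stage name, not from matchbox
    details: str | None = None
    updated_at: datetime = field(default_factory=_now)


class SketchInMemoryTracker:
    """Dict-backed tracker. State lives in one process, which is why a
    tracker like this only works with a single server instance."""

    def __init__(self, expiry_minutes: int = 30) -> None:
        self._expiry = timedelta(minutes=expiry_minutes)
        self._entries: dict[str, SketchEntry] = {}

    def add(self, metadata: dict) -> str:
        """Register metadata and return a new upload ID."""
        upload_id = str(uuid.uuid4())
        self._entries[upload_id] = SketchEntry(metadata=metadata)
        return upload_id

    def get(self, upload_id: str) -> SketchEntry | None:
        """Return the entry, or None if it is unknown or has expired."""
        entry = self._entries.get(upload_id)
        if entry is None:
            return None
        if _now() - entry.updated_at > self._expiry:
            del self._entries[upload_id]  # drop the stale entry
            return None
        return entry

    def update(self, upload_id: str, stage: str, details: str | None = None) -> None:
        """Update the stage and details, and refresh the expiry clock."""
        entry = self._entries[upload_id]
        entry.stage = stage
        entry.details = details
        entry.updated_at = _now()
```

A second server process would hold its own `_entries` dictionary and would not see uploads registered by the first, which is the limitation the class description points at.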
update
RedisUploadTracker
Bases: UploadTracker
Upload tracker backed by Redis.
Methods:
- get – Retrieve metadata by ID if not expired.
- update – Update the stage and details for an upload.
- add_source – Register source metadata and return ID.
- add_model – Register model results metadata and return ID.
Attributes:
- expiry_minutes
- redis
- key_prefix
update
settings_to_upload_tracker
settings_to_upload_tracker(
    settings: MatchboxServerSettings,
) -> UploadTracker
Initialise an upload tracker from server settings.
table_to_s3
table_to_s3(
    client: S3Client,
    bucket: str,
    key: str,
    file: UploadFile,
    expected_schema: Schema,
) -> str
Upload a PyArrow Table to S3 and return the key.
Parameters:
- client (S3Client) – The S3 client to use.
- bucket (str) – The S3 bucket to upload to.
- key (str) – The key to upload to.
- file (UploadFile) – The file to upload.
- expected_schema (Schema) – The schema that the file should match.
Raises:
- MatchboxServerFileError – If the file is not a valid Parquet file or the schema does not match the expected schema.
Returns:
- str – The key of the uploaded file.
s3_to_recordbatch
s3_to_recordbatch(
    client: S3Client,
    bucket: str,
    key: str,
    batch_size: int = 1000,
) -> Generator[RecordBatch, None, None]
Download a PyArrow Table from S3 and stream it as RecordBatches.
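The return type `Generator[RecordBatch, None, None]` together with `batch_size` suggests a streaming pattern: the caller receives fixed-size chunks one at a time rather than the whole table at once. The sketch below shows that pattern with plain Python rows instead of Arrow data. The `iter_batches` helper is hypothetical and is not part of matchbox.

```python
from collections.abc import Generator, Iterable
from itertools import islice


def iter_batches(
    rows: Iterable[dict],
    batch_size: int = 1000,
) -> Generator[list[dict], None, None]:
    """Yield successive lists of at most batch_size rows.

    Only one batch is held in memory at a time, so the input can be
    larger than memory as long as it is itself a lazy iterable.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(rows)
    # islice pulls up to batch_size items; an empty list ends the loop.
    while batch := list(islice(iterator, batch_size)):
        yield batch


# Usage: 2,500 rows arrive as two full batches and one partial batch.
sizes = [len(b) for b in iter_batches(({"id": i} for i in range(2500)))]
```

With Arrow data, `pyarrow.parquet.ParquetFile.iter_batches(batch_size=...)` provides comparable streaming over a Parquet file; whether `s3_to_recordbatch` relies on it is not stated here.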
initialise_celery_worker
Initialise backend and tracker for celery worker.
process_upload
process_upload(
    backend: MatchboxDBAdapter,
    tracker: UploadTracker,
    s3_client: S3Client,
    upload_type: str,
    resolution_name: str,
    upload_id: str,
    bucket: str,
    filename: str,
) -> None
Generic task to process uploaded file, usable by FastAPI BackgroundTasks.
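The split between `process_upload` (which takes live `backend`, `tracker`, and `s3_client` objects) and `process_upload_celery` (described above as taking only serialisable arguments) follows a common worker pattern: the queue carries plain strings, and each worker process builds its own heavy objects once in module-level slots such as `CELERY_BACKEND` and `CELERY_TRACKER`. The sketch below imitates that pattern without Celery. Every name in it, the dictionary stand-ins, and the stage strings are assumptions, not the matchbox code.

```python
# Module-level slots, filled once per worker process (hypothetical names).
_BACKEND: dict | None = None
_TRACKER: dict | None = None


def initialise_worker() -> None:
    """Create the backend and tracker if this process has none yet."""
    global _BACKEND, _TRACKER
    if _BACKEND is None:
        _BACKEND = {}  # stands in for a database adapter
    if _TRACKER is None:
        _TRACKER = {}  # stands in for an upload tracker


def process(backend: dict, tracker: dict, upload_id: str, filename: str) -> None:
    """Generic task: needs live objects, so it suits in-process callers."""
    tracker[upload_id] = "processing"
    try:
        backend[upload_id] = filename  # stands in for the real ingest step
    except Exception as exc:
        tracker[upload_id] = f"failed: {exc}"
        raise
    tracker[upload_id] = "complete"


def process_serialisable(upload_id: str, filename: str) -> None:
    """Queue-friendly wrapper: only strings cross the process boundary."""
    initialise_worker()
    assert _BACKEND is not None and _TRACKER is not None
    process(_BACKEND, _TRACKER, upload_id, filename)


# Usage: the wrapper is what a task queue would call with plain arguments.
process_serialisable("u1", "u1.parquet")
```

Keeping the logic in the generic function and the object lookup in the wrapper lets both entry points share one code path.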