Architecture
System Overview
Xylolabs Face API is a modular face detection and privacy masking system with four primary components:
- API Server — FastAPI application handling detection/masking requests
- Admin Console — Prebuilt Astro + Svelte admin UI served through a small Node session-gating runtime
- PostgreSQL — Relational store for image metadata and job history
- S3 Storage — Object store for original and processed images
Data Flow
Detection Request (POST /api/v1/detect)
Client → FastAPI → Image Decode (OpenCV) → SCRFD Inference (ONNX Runtime)
↓
Postprocessing (NMS, clip, sort)
↓
[Optional] Persist to S3 + DB
↓
JSON Response ← Client
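The postprocessing stage (NMS, clip, sort) can be illustrated with a minimal NumPy sketch. The function name `postprocess`, the `[x1, y1, x2, y2]` box layout, the IoU threshold, and sorting by descending score are all assumptions made for illustration; the actual logic in app/detector.py may differ.

```python
import numpy as np

def postprocess(boxes, scores, image_w, image_h, iou_threshold=0.4):
    """Greedy NMS, then clip to the image; output is ordered by descending score.

    Hypothetical helper: boxes are [x1, y1, x2, y2] in pixels.
    """
    boxes = np.array(boxes, dtype=np.float32).reshape(-1, 4)  # copy, do not mutate input
    scores = np.array(scores, dtype=np.float32).reshape(-1)
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])

    order = scores.argsort()[::-1]  # candidate indices, best score first
    keep = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        keep.append(int(best))
        # Intersection of the best box with every remaining candidate.
        x1 = np.maximum(boxes[best, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[best, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[best, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[best, 3], boxes[rest, 3])
        inter = np.maximum(0.0, x2 - x1) * np.maximum(0.0, y2 - y1)
        union = areas[best] + areas[rest] - inter
        iou = inter / np.maximum(union, 1e-9)
        # Drop candidates that overlap the kept box too strongly.
        order = rest[iou <= iou_threshold]

    kept_boxes = boxes[keep]
    # Clip coordinates so every box lies inside the image.
    kept_boxes[:, [0, 2]] = np.clip(kept_boxes[:, [0, 2]], 0, image_w)
    kept_boxes[:, [1, 3]] = np.clip(kept_boxes[:, [1, 3]], 0, image_h)
    return kept_boxes, scores[keep]
```

Because `keep` is filled from the highest score downward, the returned boxes are already sorted by confidence without a separate sort step.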
Masking Request (POST /api/v1/mask)
Client → FastAPI → Image Decode → SCRFD Inference → Face Masking (OpenCV)
↓
Image Encode (JPEG/PNG/WebP)
↓
[Optional] Persist to S3 + DB
↓
Image Response ← Client
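As a rough illustration of the masking step, the sketch below pixelates each detected face region. It deliberately uses plain NumPy slicing rather than OpenCV so that it runs without extra dependencies; the service itself performs masking with OpenCV, and the function name `pixelate_faces`, the block size, and the choice of pixelation as the masking style are assumptions.

```python
import numpy as np

def pixelate_faces(image, boxes, block=8):
    """Return a copy of `image` where each [x1, y1, x2, y2] box is pixelated.

    Hypothetical stand-in for the OpenCV masking step.
    """
    out = image.copy()
    height, width = out.shape[:2]
    for x1, y1, x2, y2 in boxes:
        # Clip the box to the image and skip boxes that end up empty.
        x1, y1 = max(0, int(x1)), max(0, int(y1))
        x2, y2 = min(width, int(x2)), min(height, int(y2))
        if x2 <= x1 or y2 <= y1:
            continue
        region = out[y1:y2, x1:x2]  # a view, so writes land in `out`
        for by in range(0, region.shape[0], block):
            for bx in range(0, region.shape[1], block):
                tile = region[by:by + block, bx:bx + block]
                # Flatten the tile to its mean colour to remove facial detail.
                tile[...] = tile.mean(axis=(0, 1)).astype(out.dtype)
    return out
```

The input array is left untouched, which matters when the original image is also persisted alongside the processed one.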
Admin Flow
Admin UI (runtime server + hydrated client) → REST API → /api/v1/admin/* → PostgreSQL (query)
→ S3 / asset URL generation
← JSON Response → Render in Svelte
Services
| Service | Image | Port | Purpose |
|---|---|---|---|
| api | python:3.14-slim | 8000 | FastAPI + SCRFD detection |
| admin | node:24-alpine | 4321 | Prebuilt admin app served by a small Node session-gating server |
| db | postgres:18-alpine | 5432 | Image metadata, job history |
| minio | minio/minio:RELEASE.2025-04-22T22-12-26Z | 9000/9001 | S3-compatible object storage |
Key Design Decisions
Direct ONNX Inference (no insightface dependency)
The SCRFD detector (app/detector.py) loads ONNX models directly through onnxruntime and implements preprocessing and postprocessing in numpy/OpenCV. This avoids the insightface package and its heavy transitive dependencies (scikit-learn, scipy, Cython), so the runtime stays smaller and simpler than one that ships the full training/inference toolkit. Model archives are downloaded by scripts/download_model.py, which pins the expected SHA-256 digest for each supported InsightFace release asset and verifies it before extraction.
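The digest check performed before extraction could look roughly like the following. This is a sketch using only the standard library; the helper names `sha256_of` and `verify_archive` are hypothetical and are not taken from scripts/download_model.py.

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Stream the file in chunks so large archives are never fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_archive(path, expected_sha256):
    """Raise ValueError unless the file matches the pinned hex digest."""
    actual = sha256_of(path)
    if actual != expected_sha256.strip().lower():
        raise ValueError(
            f"SHA-256 mismatch for {path}: expected {expected_sha256}, got {actual}"
        )
```

Calling `verify_archive` before unpacking means a corrupted or tampered download fails loudly instead of being extracted.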
Optional Storage
Storage (DB + S3) is disabled by default (FACE_API_STORAGE_ENABLED=false). When it is disabled, the API is stateless: it processes the request and returns the result. When it is enabled, every request persists the original image to S3 and the job metadata to PostgreSQL. This design supports both lightweight deployments and full audit-trail operation.
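A minimal sketch of how such a flag might gate persistence is shown below. Only the variable name FACE_API_STORAGE_ENABLED and its default come from this document; the accepted truthy spellings and the `handle`, `process`, and `persist` names are assumptions for illustration.

```python
import os

# Assumed spellings for "enabled"; the real settings parser may accept others.
TRUTHY = {"1", "true", "yes", "on"}

def storage_enabled(env=None):
    """Read FACE_API_STORAGE_ENABLED, treating a missing value as disabled."""
    env = os.environ if env is None else env
    return env.get("FACE_API_STORAGE_ENABLED", "false").strip().lower() in TRUTHY

def handle(image, process, persist, env=None):
    """Process the image and persist it only when storage is switched on."""
    result = process(image)
    if storage_enabled(env):
        persist(image, result)  # hypothetical hook for the S3 + DB write
    return result
```

With the flag unset, `handle` never touches `persist`, which mirrors the stateless mode described above.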
Non-Blocking Persistence
Storage operations are wrapped in _try_persist(), which catches and logs exceptions instead of failing the public API response. This keeps the detect/mask path responsive while storage is degraded. Persistence failures are not treated as silent successes: they are surfaced in the logs and reconciled through cleanup tasks.
Concurrency Controls
All CPU-bound work (image decode, ONNX inference, masking, and encoding) runs in a dedicated ThreadPoolExecutor (app.state.cpu_executor) so that it does not block the async event loop.
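Offloading CPU-bound work to a dedicated executor can be sketched with the standard library alone. Here the executor is a module-level variable with an arbitrary worker count, whereas the service keeps it on app.state.cpu_executor; `cpu_bound_stub` and `process` are hypothetical stand-ins for the real decode and inference functions.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

# Dedicated pool for CPU-bound work; the worker count here is illustrative.
cpu_executor = ThreadPoolExecutor(max_workers=4, thread_name_prefix="cpu")

def cpu_bound_stub(payload):
    """Stand-in for decode + inference; runs in a worker thread."""
    return len(payload)

async def process(payload):
    """Run the blocking function in the pool so the event loop stays free."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(cpu_executor, cpu_bound_stub, payload)
```

While `process` is awaiting the worker thread, the event loop can keep accepting and answering other requests.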
Key mechanisms:
- Inference Semaphore (FACE_API_MAX_CONCURRENT_INFERENCE, default: 2) — limits parallel ONNX inference calls to prevent thread oversubscription and OOM
- Dedicated CPU Executor — sized to max_inference + 2 workers, separate from the default thread pool used by S3/DB I/O
- Concurrency Limit Middleware (FACE_API_MAX_CONCURRENT_REQUESTS, default: 10) — pure ASGI middleware that fast-rejects with 503 Retry-After: 1 when all request slots are taken, providing backpressure to upstream load balancers
- Pure ASGI Security Headers — avoids BaseHTTPMiddleware response buffering overhead
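The first and third mechanisms above can be sketched as follows. This is an illustrative outline, not the service's code: the class name `ConcurrencyLimitMiddleware`, the helper `run_limited`, and the empty 503 body are assumptions. The middleware uses only the raw ASGI interface, so it does not depend on any framework.

```python
import asyncio

class ConcurrencyLimitMiddleware:
    """Pure ASGI middleware that answers 503 when all request slots are busy."""

    def __init__(self, app, max_concurrent=10):
        self.app = app
        self.max_concurrent = max_concurrent
        self.active = 0

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            # Lifespan and websocket traffic passes through untouched.
            await self.app(scope, receive, send)
            return
        if self.active >= self.max_concurrent:
            # Reject immediately so upstream proxies see backpressure.
            await send({
                "type": "http.response.start",
                "status": 503,
                "headers": [(b"retry-after", b"1"), (b"content-length", b"0")],
            })
            await send({"type": "http.response.body", "body": b""})
            return
        # No await sits between the check and the increment, so the count
        # cannot be raced within a single event loop.
        self.active += 1
        try:
            await self.app(scope, receive, send)
        finally:
            self.active -= 1

async def run_limited(slots, executor, fn, *args):
    """Hold one semaphore slot while `fn` runs in the executor."""
    async with slots:
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(executor, fn, *args)
```

In this sketch, `slots` would be an `asyncio.Semaphore` created with the inference limit, and `executor` would be the dedicated CPU pool. Requests beyond the semaphore limit wait their turn, while requests beyond the middleware limit are turned away at once.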
Production Deployment
Currently deployed on Oracle Cloud Infrastructure (OCI):
- VM: Ampere Altra ARM (2 cores, 4GB RAM) at 130.162.132.159
- Domains: face-api.xylolabs.com (API), face-api-admin.xylolabs.com (Admin)
- SSL: Let's Encrypt via certbot with auto-renewal
- Reverse Proxy: nginx with HTTP→HTTPS redirect
- Tuning: 1 uvicorn worker, max_concurrent_inference=1, max_concurrent_requests=8, ort_threads=2