Architecture

System Overview

Xylolabs Face API is a modular face detection and privacy masking system with four primary components:

- API Server: FastAPI application handling detection and masking requests
- Admin Console: Astro + Svelte SPA for monitoring and management
- PostgreSQL: relational store for image metadata and job history
- S3 Storage: object store for original and processed images

Data Flow

Detection Request (`POST /api/v1/detect`)

Client → FastAPI → Image Decode (OpenCV) → SCRFD Inference (ONNX Runtime)
                                                    ↓
                                              Postprocessing (NMS, clip, sort)
                                                    ↓
                                              [Optional] Persist to S3 + DB
                                                    ↓
                                              JSON Response → Client
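The postprocessing stage in the flow above (NMS, clip, sort) can be sketched as follows. This is a minimal illustration in plain Python, not the project's code: the `Face` type, the `postprocess` name, and the 0.4 IoU threshold are assumptions made for the example, and the real detector is described as doing this work in numpy/OpenCV. The order of the three steps here (clip, sort, then greedy NMS) is also an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Face:
    # Hypothetical detection record: corner coordinates plus a confidence score.
    x1: float
    y1: float
    x2: float
    y2: float
    score: float


def iou(a: Face, b: Face) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    inter_h = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = inter_w * inter_h
    union = (a.x2 - a.x1) * (a.y2 - a.y1) + (b.x2 - b.x1) * (b.y2 - b.y1) - inter
    return inter / union if union > 0 else 0.0


def postprocess(faces, width, height, iou_threshold=0.4):
    """Clip boxes to the image, then keep the best-scoring non-overlapping ones."""
    # Clip every box to the image bounds.
    clipped = [
        Face(max(0.0, f.x1), max(0.0, f.y1),
             min(float(width), f.x2), min(float(height), f.y2), f.score)
        for f in faces
    ]
    # Drop boxes that collapsed to zero area after clipping.
    clipped = [f for f in clipped if f.x2 > f.x1 and f.y2 > f.y1]
    # Sort by descending score so greedy NMS visits the strongest boxes first.
    clipped.sort(key=lambda f: f.score, reverse=True)
    kept = []
    for candidate in clipped:
        # Keep a box only if it does not overlap too much with a stronger one.
        if all(iou(candidate, k) <= iou_threshold for k in kept):
            kept.append(candidate)
    return kept
```

Because the candidates are visited in score order, the returned list is already sorted from most to least confident.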

Masking Request (`POST /api/v1/mask`)

Client → FastAPI → Image Decode → SCRFD Inference → Face Masking (OpenCV)
                                                           ↓
                                                    Image Encode (JPEG/PNG/WebP)
                                                           ↓
                                                    [Optional] Persist to S3 + DB
                                                           ↓
                                          Image Response or JSON → Client
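To make the masking step concrete, here is a small sketch that pixelates each detected face region. It is an assumption-laden illustration: the source says masking is done with OpenCV but does not say which masking style is used, so this example swaps in a numpy-only pixelation, and `pixelate_faces` and its `block` parameter are hypothetical names.

```python
import numpy as np


def pixelate_faces(image: np.ndarray, boxes, block: int = 8) -> np.ndarray:
    """Return a copy of `image` (H x W or H x W x C) with each box pixelated.

    `boxes` is an iterable of (x1, y1, x2, y2) pixel coordinates.
    """
    out = image.copy()
    h, w = out.shape[:2]
    for x1, y1, x2, y2 in boxes:
        # Clamp the box to the image so slicing never goes out of range.
        x1, y1 = max(0, int(x1)), max(0, int(y1))
        x2, y2 = min(w, int(x2)), min(h, int(y2))
        if x2 <= x1 or y2 <= y1:
            continue
        # Replace every block x block tile inside the box with its mean colour.
        for by in range(y1, y2, block):
            for bx in range(x1, x2, block):
                tile = out[by:min(by + block, y2), bx:min(bx + block, x2)]
                tile[...] = tile.mean(axis=(0, 1)).astype(out.dtype)
    return out
```

The function works on a copy, so the original image stays available for the optional persistence step.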

Admin Flow

Admin SPA → REST API → /api/v1/admin/* → PostgreSQL (query)
                                        → S3 (presigned URLs)
                                        ← JSON Response → Render in Svelte

Services

| Service | Image | Port | Purpose |
|---|---|---|---|
| `api` | `python:3.14-slim` | 8000 | FastAPI + SCRFD detection |
| `admin` | `node:24-alpine` | 4321 | Static SPA served by `serve` |
| `db` | `postgres:18-alpine` | 5432 | Image metadata, job history |
| `minio` | `minio/minio:latest` | 9000/9001 | S3-compatible object storage |

Key Design Decisions

Direct ONNX Inference (no insightface dependency)

The SCRFD detector (`app/detector.py`) loads ONNX models directly via `onnxruntime` and implements preprocessing and postprocessing in numpy/OpenCV. This avoids the `insightface` package and its heavy transitive dependencies (scikit-learn, scipy, Cython), reducing the Docker image by roughly 300 MB.
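As a rough sketch of what direct inference involves, the example below prepares an input tensor with numpy and hands it to an ONNX Runtime session. Everything project-specific is an assumption: the function names are hypothetical, the image is assumed to be already resized to the model's input size, and the normalization constants (mean 127.5, scale 128.0, BGR to RGB) are the values commonly used with SCRFD exports rather than values confirmed by `app/detector.py`.

```python
import numpy as np

# Assumed normalization constants, not confirmed by the project source.
INPUT_MEAN = 127.5
INPUT_STD = 128.0


def to_blob(bgr: np.ndarray) -> np.ndarray:
    """Convert an H x W x 3 uint8 BGR image to a 1 x 3 x H x W float32 RGB tensor."""
    rgb = bgr[:, :, ::-1].astype(np.float32)       # swap BGR -> RGB
    normalized = (rgb - INPUT_MEAN) / INPUT_STD    # scale to roughly [-1, 1]
    # HWC -> CHW, then add the batch dimension expected by the model.
    return np.ascontiguousarray(normalized.transpose(2, 0, 1)[np.newaxis])


def run_model(model_path: str, blob: np.ndarray):
    """Run one forward pass; returns the model's raw output arrays."""
    # Imported lazily so the preprocessing above is usable without onnxruntime.
    import onnxruntime as ort

    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    input_name = session.get_inputs()[0].name
    return session.run(None, {input_name: blob})
```

A long-running service would normally create the session once at startup and reuse it, rather than per call as in this sketch.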

Optional Storage

Storage (DB + S3) is disabled by default (`FACE_API_STORAGE_ENABLED=false`). When disabled, the API is stateless: each request is processed and the result returned, with nothing stored. When enabled, every request persists the original image to S3 and job metadata to PostgreSQL. This design supports both lightweight deployment and full audit-trail operation.
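The toggle described above could look something like the following. This is a sketch under stated assumptions: only the `FACE_API_STORAGE_ENABLED` variable name and its `false` default come from the text, while `StorageSettings`, `handle_request`, and the set of strings treated as "true" are hypothetical.

```python
import os
from dataclasses import dataclass

# Assumed spellings accepted as "enabled"; the real parser may differ.
_TRUTHY = {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class StorageSettings:
    enabled: bool = False  # storage is off unless explicitly switched on

    @classmethod
    def from_env(cls, environ=None):
        env = os.environ if environ is None else environ
        raw = env.get("FACE_API_STORAGE_ENABLED", "false")
        return cls(enabled=raw.strip().lower() in _TRUTHY)


def handle_request(image_bytes, settings, process, persist):
    """Process an image and persist it only when storage is enabled."""
    result = process(image_bytes)
    if settings.enabled:
        persist(image_bytes, result)
    return result
```

Passing `process` and `persist` in as callables keeps the stateless path free of any database or S3 imports.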

Non-Blocking Persistence

Storage operations are wrapped in `_try_persist()`, which catches and logs exceptions instead of failing the API response. As a result, an S3 or DB outage does not cause a detection or masking request to fail.

Concurrency Controls

All CPU-bound work (image decode, ONNX inference, masking, encoding) runs in a dedicated ThreadPoolExecutor (app.state.cpu_executor) to avoid blocking the async event loop.
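The offloading pattern can be sketched as below, combined with the inference semaphore that this section lists among its key mechanisms. The names `run_inference` and `demo` are hypothetical, the executor is a module-level stand-in for `app.state.cpu_executor`, and the fake inference function only sleeps; the worker count follows the documented "max inference + 2" sizing.

```python
import asyncio
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT_INFERENCE = 2  # documented default of FACE_API_MAX_CONCURRENT_INFERENCE

# Stand-in for app.state.cpu_executor, sized to max inference + 2 workers.
cpu_executor = ThreadPoolExecutor(
    max_workers=MAX_CONCURRENT_INFERENCE + 2, thread_name_prefix="cpu"
)


async def run_inference(semaphore, infer, image):
    """Run a blocking `infer(image)` off the event loop, bounded by the semaphore."""
    loop = asyncio.get_running_loop()
    async with semaphore:
        return await loop.run_in_executor(cpu_executor, infer, image)


async def demo():
    semaphore = asyncio.Semaphore(MAX_CONCURRENT_INFERENCE)
    lock = threading.Lock()
    active = 0
    peak = 0

    def fake_infer(image):
        # Track how many calls overlap, to show the semaphore's effect.
        nonlocal active, peak
        with lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.02)  # pretend to be CPU-bound inference
        with lock:
            active -= 1
        return image * 2

    results = await asyncio.gather(
        *(run_inference(semaphore, fake_infer, i) for i in range(6))
    )
    return results, peak
```

Running `asyncio.run(demo())` issues six requests at once, yet the recorded peak never exceeds the semaphore limit, while the event loop stays free to accept new connections.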

Key mechanisms:

- Inference Semaphore (`FACE_API_MAX_CONCURRENT_INFERENCE`, default: 2): limits parallel ONNX inference calls to prevent thread oversubscription and OOM
- Dedicated CPU Executor: sized to `max_inference + 2` workers, separate from the default thread pool used by S3/DB I/O
- Concurrency Limit Middleware (`FACE_API_MAX_CONCURRENT_REQUESTS`, default: 10): pure ASGI middleware that fast-rejects with `503` and `Retry-After: 1` when all request slots are taken, providing backpressure to upstream load balancers
- Pure ASGI Security Headers: avoids `BaseHTTPMiddleware` response buffering overhead
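A pure ASGI concurrency limiter of the kind listed above could be sketched like this. The class name, the response body, and the plain counter are assumptions; only the 503 status, the `Retry-After: 1` header, and the default of 10 slots come from the text.

```python
class ConcurrencyLimitMiddleware:
    """Pure ASGI middleware that rejects HTTP requests once all slots are taken."""

    def __init__(self, app, max_concurrent: int = 10):
        self.app = app
        self.max_concurrent = max_concurrent
        self._active = 0  # safe without a lock: only touched on the event loop

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            # Pass lifespan and websocket traffic through untouched.
            await self.app(scope, receive, send)
            return

        if self._active >= self.max_concurrent:
            # Fast-reject without invoking the application at all.
            await send({
                "type": "http.response.start",
                "status": 503,
                "headers": [
                    (b"retry-after", b"1"),
                    (b"content-type", b"text/plain"),
                ],
            })
            await send({"type": "http.response.body", "body": b"server busy"})
            return

        self._active += 1
        try:
            await self.app(scope, receive, send)
        finally:
            # Always free the slot, even if the app raised.
            self._active -= 1
```

With FastAPI or Starlette, a class with this shape can be registered through `app.add_middleware(ConcurrencyLimitMiddleware, max_concurrent=10)`. Because it never wraps the response, it avoids the buffering that `BaseHTTPMiddleware` introduces.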

Production Deployment

Currently deployed on Oracle Cloud Infrastructure (OCI):

- VM: Ampere Altra ARM (2 cores, 4 GB RAM) at 130.162.132.159
- Domains: face-api.xylolabs.com (API), admin.face-api.xylolabs.com (Admin)
- SSL: Let's Encrypt via certbot with auto-renewal
- Reverse Proxy: nginx with HTTP→HTTPS redirect
- Tuning: 1 uvicorn worker, max_concurrent_inference=1, max_concurrent_requests=8, ort_threads=2