Face Detection — SCRFD¶
Overview¶
SCRFD (from the paper "Sample and Computation Redistribution for Efficient Face Detection") is the face detector used for the detection stage. Published by InsightFace (DeepInsight), it reaches the WIDER FACE accuracy listed below at low compute cost by redistributing training samples toward the scales that need them most and reallocating computation between the backbone, neck, and head.
Variants¶
| Model | WIDER FACE AP (Easy/Med/Hard) | Params | FLOPs | Inference (VGA) |
|---|---|---|---|---|
| SCRFD-10G | 95.16 / 93.87 / 83.05 | 3.86M | 10G | ~5ms |
| SCRFD-2.5G | 93.78 / 92.16 / 77.87 | 0.67M | 2.5G | ~4ms |
| SCRFD-500M | 90.57 / 88.12 / 68.51 | 0.57M | 500M | ~3ms |
All models predict on three feature maps (strides 8, 16, and 32) with 2 anchors per location. The KPS (keypoint) variants add 5-point facial landmark regression.
Preprocessing¶
- Letterbox resize: Scale image to fit within input size (default 640x640) maintaining aspect ratio
- Zero-pad: Place resized image in top-left corner of a zero-filled canvas
- Normalize: `cv2.dnn.blobFromImage` with `mean=(127.5, 127.5, 127.5)`, `scalefactor=1/128`, `swapRB=True`
- Output: `[1, 3, 640, 640]` float32 tensor (NCHW, RGB, range ~[-1, 1])
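The preprocessing steps above can be sketched in plain NumPy. This is a minimal illustration, not the project's implementation: `letterbox_geometry` and `to_blob` are hypothetical helper names, the rounding rule for the resized dimensions is an assumption, and the actual resize (for example with `cv2.resize`) is left to the caller. The arithmetic in `to_blob` mirrors what `cv2.dnn.blobFromImage` does with the parameters listed above (subtract the mean, multiply by the scale factor, swap B and R).

```python
import numpy as np

def letterbox_geometry(img_h, img_w, input_h=640, input_w=640):
    """Return (new_h, new_w, det_scale) for an aspect-preserving resize."""
    det_scale = min(input_h / img_h, input_w / img_w)
    # Rounding rule is an assumption; clamp so the result always fits the canvas.
    new_h = min(input_h, max(1, round(img_h * det_scale)))
    new_w = min(input_w, max(1, round(img_w * det_scale)))
    return new_h, new_w, det_scale

def to_blob(resized_bgr, input_h=640, input_w=640):
    """Zero-pad a resized BGR uint8 image and normalize it to an NCHW RGB blob."""
    new_h, new_w = resized_bgr.shape[:2]
    canvas = np.zeros((input_h, input_w, 3), dtype=np.uint8)
    canvas[:new_h, :new_w] = resized_bgr            # top-left placement
    rgb = canvas[..., ::-1].astype(np.float32)      # swapRB: BGR -> RGB
    blob = (rgb - 127.5) / 128.0                    # mean subtraction, then scale
    # HWC -> NCHW with a batch dimension of 1
    return np.ascontiguousarray(blob.transpose(2, 0, 1)[None])
```

For a 1920x1080 frame, `letterbox_geometry(1080, 1920)` gives a 640x360 resized image and a `det_scale` of about 0.333; that scale is what the post-NMS step later divides by.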
Model Outputs¶
For SCRFD with keypoints (9 outputs total):
| Index | Name | Shape | Description |
|---|---|---|---|
| 0-2 | scores | (1, N_i, 1) | Detection confidence per anchor at stride 8/16/32 |
| 3-5 | bboxes | (1, N_i, 4) | Distance predictions (left, top, right, bottom) |
| 6-8 | keypoints | (1, N_i, 10) | 5-point landmark offsets (x, y pairs) |
Where N_i = 2 * H_i * W_i (2 anchors per location).
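As a quick check of the formula, the following sketch computes N_i for each stride. `output_lengths` is a hypothetical helper, and it assumes the input size is divisible by every stride (true for the default 640x640), so that H_i and W_i are simply the input dimensions divided by the stride.

```python
def output_lengths(input_h=640, input_w=640, strides=(8, 16, 32), num_anchors=2):
    """Map each stride to N_i = num_anchors * H_i * W_i."""
    lengths = {}
    for stride in strides:
        feat_h, feat_w = input_h // stride, input_w // stride
        lengths[stride] = num_anchors * feat_h * feat_w
    return lengths

print(output_lengths())  # {8: 12800, 16: 3200, 32: 800}
```

With the default input, the three score outputs therefore hold 12800, 3200, and 800 anchors, 16800 candidates in total before score thresholding.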
Postprocessing¶
Anchor Generation¶
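import numpy as np
centers = np.stack(np.mgrid[:height, :width][::-1], axis=-1).astype(np.float32)  # (H, W, 2) grid of (x, y)
centers = (centers * stride).reshape(-1, 2)  # Scale to pixel coords, flatten to (H*W, 2)
centers = np.stack([centers] * 2, axis=1).reshape(-1, 2)  # Duplicate for 2 anchors -> (2*H*W, 2)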
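To make the resulting anchor order concrete, here is a small worked example on a 2x2 feature map. The wrapper name `anchor_centers` and its `num_anchors` parameter are illustrative assumptions; the body follows the steps above with the reshapes written out.

```python
import numpy as np

def anchor_centers(height, width, stride, num_anchors=2):
    """Return (num_anchors * height * width, 2) anchor centers as (x, y) pixels."""
    # mgrid yields (row, col) index grids; reversing gives (x, y) order.
    grid = np.stack(np.mgrid[:height, :width][::-1], axis=-1).astype(np.float32)
    centers = (grid * stride).reshape(-1, 2)
    # Repeat each location once per anchor, keeping duplicates adjacent.
    return np.stack([centers] * num_anchors, axis=1).reshape(-1, 2)

centers = anchor_centers(height=2, width=2, stride=8)
print(centers.tolist())
# [[0.0, 0.0], [0.0, 0.0], [8.0, 0.0], [8.0, 0.0],
#  [0.0, 8.0], [0.0, 8.0], [8.0, 8.0], [8.0, 8.0]]
```

The centers run row by row, and the two anchors of a location sit next to each other, which is the order the N_i predictions per stride are expected to follow under this sketch.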
Distance-to-BBox Conversion¶
x1 = anchor_x - pred_left * stride
y1 = anchor_y - pred_top * stride
x2 = anchor_x + pred_right * stride
y2 = anchor_y + pred_bottom * stride
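A vectorized version of the four equations above might look like the following sketch. `distance_to_bbox` is a hypothetical function name; `centers` is assumed to be an (N, 2) array of (x, y) anchor centers and `distances` an (N, 4) array of raw (left, top, right, bottom) predictions for one stride.

```python
import numpy as np

def distance_to_bbox(centers, distances, stride):
    """Convert per-anchor distance predictions to (x1, y1, x2, y2) boxes."""
    centers = np.asarray(centers, dtype=np.float32)
    d = np.asarray(distances, dtype=np.float32) * stride  # distances in pixels
    x1 = centers[:, 0] - d[:, 0]
    y1 = centers[:, 1] - d[:, 1]
    x2 = centers[:, 0] + d[:, 2]
    y2 = centers[:, 1] + d[:, 3]
    return np.stack([x1, y1, x2, y2], axis=-1)

print(distance_to_bbox([[16, 16]], [[1, 2, 3, 4]], stride=8).tolist())
# [[8.0, 0.0, 40.0, 48.0]]
```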
Non-Maximum Suppression (NMS)¶
Standard IoU-based NMS with a configurable threshold (default 0.4). It is implemented in pure NumPy, which is sufficient for typical face counts (fewer than 100 pre-NMS candidates).
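For reference, a greedy IoU-based NMS in NumPy can be written as below. This is a generic sketch of the standard algorithm rather than the project's exact code: the function name `nms`, the area convention (no "+1" pixel offset), and the small epsilon in the denominator are assumptions.

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.4):
    """Return indices of kept boxes, highest score first."""
    boxes = np.asarray(boxes, dtype=np.float32).reshape(-1, 4)
    scores = np.asarray(scores, dtype=np.float32).reshape(-1)
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]          # candidates, best first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # Intersection of the best box with every remaining candidate
        iw = np.maximum(0.0, np.minimum(x2[i], x2[rest]) - np.maximum(x1[i], x1[rest]))
        ih = np.maximum(0.0, np.minimum(y2[i], y2[rest]) - np.maximum(y1[i], y1[rest]))
        inter = iw * ih
        iou = inter / (areas[i] + areas[rest] - inter + 1e-9)
        order = rest[iou <= iou_threshold]  # drop candidates that overlap too much
    return keep

boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]]
print(nms(boxes, [0.9, 0.8, 0.7]))  # [0, 2]
```

The loop is quadratic in the worst case, which is acceptable at the candidate counts mentioned above.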
Post-NMS¶
- Scale bounding boxes back by `1 / det_scale`
- Clip to original image dimensions
- Sort by confidence descending
- Apply `max_faces` limit
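The four post-NMS steps can be sketched as a single function. `finalize_detections` is a hypothetical name, and two details are assumptions: coordinates are clipped to `[0, img_w]` and `[0, img_h]`, and `max_faces=0` is treated as "no limit".

```python
import numpy as np

def finalize_detections(boxes, scores, det_scale, img_h, img_w, max_faces=0):
    """Map boxes back to the original image, clip, sort by score, and truncate."""
    boxes = np.asarray(boxes, dtype=np.float32).reshape(-1, 4) / det_scale
    scores = np.asarray(scores, dtype=np.float32).reshape(-1)
    boxes[:, [0, 2]] = np.clip(boxes[:, [0, 2]], 0, img_w)  # x coordinates
    boxes[:, [1, 3]] = np.clip(boxes[:, [1, 3]], 0, img_h)  # y coordinates
    order = np.argsort(-scores, kind="stable")              # confidence descending
    if max_faces > 0:
        order = order[:max_faces]
    return boxes[order], scores[order]

boxes, scores = finalize_detections(
    [[10, 10, 700, 500], [-4, 6, 100, 120]], [0.6, 0.9],
    det_scale=0.5, img_h=480, img_w=640, max_faces=1,
)
print(boxes.tolist())  # [[0.0, 12.0, 200.0, 240.0]]
```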
Landmarks¶
Five facial landmarks in order:
1. Left eye
2. Right eye
3. Nose tip
4. Left mouth corner
5. Right mouth corner
Encoded as distance offsets from anchor centers, decoded the same way as bounding boxes but for (x, y) pairs.
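A minimal decoding sketch follows, assuming each of the 10 keypoint values is a signed offset that is multiplied by the stride and added to the anchor center (unlike the box case, where left and top distances are subtracted). `distance_to_kps` and `LANDMARK_NAMES` are illustrative names, not identifiers from the source.

```python
import numpy as np

LANDMARK_NAMES = ("left_eye", "right_eye", "nose_tip", "left_mouth", "right_mouth")

def distance_to_kps(centers, offsets, stride):
    """Decode (N, 10) landmark offsets into (N, 5, 2) pixel coordinates."""
    centers = np.asarray(centers, dtype=np.float32).reshape(-1, 1, 2)
    offsets = np.asarray(offsets, dtype=np.float32).reshape(-1, 5, 2) * stride
    return centers + offsets  # broadcast each anchor center over its 5 points

kps = distance_to_kps([[16, 16]], [[-1, -1, 1, -1, 0, 0, -1, 1, 1, 1]], stride=8)
print(dict(zip(LANDMARK_NAMES, kps[0].tolist())))
# {'left_eye': [8.0, 8.0], 'right_eye': [24.0, 8.0], 'nose_tip': [16.0, 16.0],
#  'left_mouth': [8.0, 24.0], 'right_mouth': [24.0, 24.0]}
```

Like the bounding boxes, the decoded landmarks are in letterboxed input coordinates and would need the same division by `det_scale` to map back to the original image.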