Tessact · 2022–25

Face Detection & Tracking Service

Replaced AWS Rekognition with a custom face indexing service. 97.5% cost reduction. Live at JioHotstar and SunTV.

$0.15/hr (vs $6/hr before)

97.5% cost reduction

$2,500 saved per month

InsightFace · ArcFace · Python · Docker · PostgreSQL · GCP

Problem

Tessact’s face detection pipeline ran on AWS Rekognition at $6 per hour of processed video. For a media intelligence platform processing hundreds of hours of broadcast content for clients like JioHotstar and SunTV, the bill was unsustainable.

The goal: replace Rekognition with a custom service that matched or exceeded accuracy at a fraction of the cost.

Approach

Model selection: Evaluated several open-source face recognition models and selected InsightFace’s ArcFace model for its accuracy on broadcast video (partial occlusion, varied lighting, non-frontal faces).
Custom vector indexing: Built a face index using vector similarity search. ArcFace embeddings are stored per identity; faces from incoming frames are embedded and matched against the index using cosine similarity with a configurable threshold.
Dockerised service: Packaged as a Docker container with a FastAPI interface matching Rekognition’s API surface, making it a drop-in swap for the existing pipeline.
In-place deployment: Deployed to GCP Cloud Run and swapped in without changes to upstream or downstream services.
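
The matching step in the vector index can be sketched roughly as follows. This is a minimal illustration, not the production code: the `FaceIndex` class, its method names, the 512-dimensional embedding size, and the 0.5 default threshold are all assumptions made for the example, and the embeddings here are random vectors standing in for real ArcFace output.

```python
from typing import Optional, Tuple

import numpy as np


def normalize(vectors: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so a dot product equals cosine similarity."""
    norms = np.linalg.norm(vectors, axis=-1, keepdims=True)
    return vectors / np.clip(norms, 1e-12, None)


class FaceIndex:
    """Hypothetical in-memory index holding one or more embeddings per identity."""

    def __init__(self, dim: int = 512, threshold: float = 0.5):
        self.dim = dim
        self.threshold = threshold  # illustrative default, not a tuned value
        self.embeddings = np.empty((0, dim), dtype=np.float32)
        self.labels = []  # labels[i] names the identity of embeddings[i]

    def add(self, identity: str, embedding: np.ndarray) -> None:
        """Store a unit-normalised embedding under the given identity."""
        row = normalize(np.asarray(embedding, dtype=np.float32).reshape(1, self.dim))
        self.embeddings = np.vstack([self.embeddings, row])
        self.labels.append(identity)

    def match(self, embedding: np.ndarray) -> Tuple[Optional[str], float]:
        """Return (identity, score), or (None, score) when below the threshold."""
        if not self.labels:
            return None, 0.0
        query = normalize(np.asarray(embedding, dtype=np.float32).reshape(1, self.dim))[0]
        scores = self.embeddings @ query  # cosine similarity against every stored row
        best = int(np.argmax(scores))
        score = float(scores[best])
        if score >= self.threshold:
            return self.labels[best], score
        return None, score


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    index = FaceIndex()
    known = rng.normal(size=512)  # stand-in for a real ArcFace embedding
    index.add("person_a", known)
    print(index.match(known + 0.1 * rng.normal(size=512)))  # slightly perturbed copy
    print(index.match(rng.normal(size=512)))                # unrelated vector
```

A brute-force matrix product like this is usually adequate for a few thousand identities; whether the production index used the same strategy is not stated here.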

Tech Stack

ML: InsightFace (ArcFace model), custom vector index
Backend: Python, FastAPI, NumPy
Infrastructure: Docker, GCP Cloud Run
Storage: PostgreSQL (face index metadata)

Results

$6/hr → $0.15/hr — 97.5% cost reduction
Saves $2,500/month in AWS spend
Runs in production for JioHotstar and SunTV
Accuracy on par with or better than Rekognition for broadcast video
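
The headline figures can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted above; the implied monthly volume is an inference that assumes both rates are per hour of processed video and that the monthly saving comes entirely from the rate difference.

```python
REKOGNITION_COST_PER_HOUR = 6.00   # USD per hour of video, as quoted above
SELF_HOSTED_COST_PER_HOUR = 0.15   # USD per hour of video, as quoted above
MONTHLY_SAVINGS = 2500.00          # USD per month, as quoted above

saving_per_hour = REKOGNITION_COST_PER_HOUR - SELF_HOSTED_COST_PER_HOUR
reduction = saving_per_hour / REKOGNITION_COST_PER_HOUR

# Assumption: the monthly saving is explained purely by the per-hour difference.
implied_hours = MONTHLY_SAVINGS / saving_per_hour

print(f"Saving per hour of video: ${saving_per_hour:.2f}")
print(f"Cost reduction: {reduction:.1%}")
print(f"Implied volume: about {implied_hours:.0f} hours of video per month")
```

Under those assumptions the saving works out to $5.85 per hour of video, which matches the 97.5% figure and implies roughly 427 hours of video per month, consistent with "hundreds of hours" of broadcast content.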

What I learned

Rekognition charges per hour of video processed, so the bill grows with every hour of content. A self-hosted model costs what the instance costs, regardless of throughput. We hit break-even in the first week of real traffic.

The vector index held up well as the face database grew; ArcFace embeddings are stable. The tricky part was threshold tuning. Broadcast video is harder than still photos: partial occlusion, motion blur, non-frontal faces. Getting the tradeoff between false positives and missed matches right took a few calibration passes on actual client content.
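
One way a calibration pass like this could look is a simple threshold sweep over labelled face pairs. This is a sketch under assumptions, not the procedure actually used: the function names, the candidate thresholds, the false positive budget, and the similarity scores below are all made up for illustration.

```python
import numpy as np


def sweep_thresholds(scores, is_same, thresholds):
    """For each threshold, return (threshold, false_positive_rate, miss_rate).

    scores:  cosine similarities for labelled face pairs.
    is_same: True where the pair shows the same person.
    Both classes must be present, otherwise the rates are undefined.
    """
    scores = np.asarray(scores, dtype=float)
    is_same = np.asarray(is_same, dtype=bool)
    rows = []
    for t in thresholds:
        accepted = scores >= t
        false_positive_rate = float(np.mean(accepted[~is_same]))  # different people accepted
        miss_rate = float(np.mean(~accepted[is_same]))            # same person rejected
        rows.append((float(t), false_positive_rate, miss_rate))
    return rows


def pick_threshold(rows, max_false_positive_rate):
    """Pick the lowest-miss threshold whose false positive rate fits the budget."""
    allowed = [r for r in rows if r[1] <= max_false_positive_rate]
    if not allowed:
        return None
    return min(allowed, key=lambda r: r[2])[0]


if __name__ == "__main__":
    # Made-up scores: first four pairs are the same person, last four are not.
    scores = [0.82, 0.64, 0.47, 0.31, 0.55, 0.38, 0.22, 0.10]
    is_same = [True, True, True, True, False, False, False, False]
    rows = sweep_thresholds(scores, is_same, [0.3, 0.4, 0.5, 0.6])
    for t, fpr, miss in rows:
        print(f"threshold={t:.1f}  false_positive_rate={fpr:.2f}  miss_rate={miss:.2f}")
    print("chosen:", pick_threshold(rows, max_false_positive_rate=0.25))
```

Raising the threshold trades false positives for misses, so the useful output is the whole table rather than a single number; the budget passed to `pick_threshold` would depend on how costly a wrong identification is for the client.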