From Stale Batch Jobs to Real-time Pipelines
Why Netflix had to rebuild recommendations from the ground up — and how the architecture evolved.
The Stale Recommendation Problem
Netflix's original recommendation system ran as a nightly batch job. A user who watched three horror movies at 10 PM would still see comedy suggestions the next morning, because the batch job had already run before those views were recorded. At 2.3 million events per second, waiting up to 24 hours for the next run means serving millions of stale, irrelevant recommendations.
Apache Kafka: The Event Backbone
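Kafka is not a traditional message queue; it is a distributed commit log. Every click, play, pause, seek, and trailer view becomes a durably stored event in a partitioned topic. Producers write with idempotence enabled, so a retried send does not create a duplicate record in a partition (end-to-end exactly-once processing additionally requires Kafka transactions). Each consumer group tracks its own offsets, so multiple independent services can process the same stream at their own pace without coordinating with one another.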
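To make the commit-log idea concrete, here is a minimal in-memory sketch in plain Python. It is not the Kafka client API: the `ToyLog` class, its method names, the producer and group names, and the event strings are all hypothetical, and real Kafka tracks producer sequence numbers per partition rather than per producer as this toy does. It only illustrates three properties described above: keyed events land in a partition, a retried write is deduplicated, and each consumer group reads with its own offsets.

```python
from collections import defaultdict
from zlib import crc32


class ToyLog:
    """In-memory stand-in for one partitioned topic (not the Kafka API)."""

    def __init__(self, num_partitions=4):
        self.partitions = [[] for _ in range(num_partitions)]
        # Highest sequence number accepted so far from each producer.
        self.last_seq = {}
        # Read position per (group, partition); groups never share offsets.
        self.offsets = defaultdict(int)

    def append(self, producer_id, seq, key, value):
        """Append an event; a retried (producer_id, seq) pair is ignored."""
        partition = crc32(key.encode()) % len(self.partitions)
        if seq <= self.last_seq.get(producer_id, -1):
            return partition, False  # duplicate retry, already in the log
        self.last_seq[producer_id] = seq
        self.partitions[partition].append((key, value))
        return partition, True

    def poll(self, group, partition, max_records=10):
        """Return unread records for this group and advance its offset."""
        start = self.offsets[(group, partition)]
        records = self.partitions[partition][start:start + max_records]
        self.offsets[(group, partition)] = start + len(records)
        return records


log = ToyLog()
p, _ = log.append("device-1", 0, "user-42", "play:horror-001")
# Simulated network retry of the same write: same producer, same sequence.
_, accepted = log.append("device-1", 0, "user-42", "play:horror-001")
log.append("device-1", 1, "user-42", "pause:horror-001")

# Two independent consumer groups read the same partition from the start.
profile_view = log.poll("profile-updater", p)
analytics_view = log.poll("analytics", p)
```

Because the records stay in the log after being read, the second group sees exactly what the first one saw. That is the practical difference from a queue, where a consumed message is gone.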
Apache Flink: Stateful Stream Processing
Flink consumes from Kafka and maintains stateful user preference profiles using session windows: groups of activity separated by a 30-minute idle gap. If you watch three movies in one evening with no pause in activity longer than 30 minutes, Flink treats that as one session and aggregates all of its signals together. Out-of-order events (for example, from mobile network delays) are handled with event-time watermarks that tolerate up to 5 seconds of lateness.
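The session-window behavior can be sketched in plain Python. This is not Flink's API, only an illustration of the semantics under simplifying assumptions: timestamps are integer seconds, the watermark is the largest event time seen minus 5 seconds, a session closes once the watermark passes its last event plus the 30-minute gap, and an event is dropped only if its own would-be window has already expired. The function name `sessionize` and the sample events are hypothetical.

```python
GAP = 30 * 60   # session idle gap in seconds (from the text)
MAX_DELAY = 5   # how far the watermark trails the newest event (from the text)


def sessionize(events, gap=GAP, max_delay=MAX_DELAY):
    """Group (user, event_time, item) tuples, given in arrival order.

    Returns (closed_sessions, dropped_late_events).
    """
    sessions = {}  # user -> list of open sessions, each a sorted list of (ts, item)
    closed, dropped = [], []
    watermark = float("-inf")

    for user, ts, item in events:
        watermark = max(watermark, ts - max_delay)

        if ts + gap <= watermark:
            dropped.append((user, ts, item))  # its window has already expired
        else:
            merged, untouched = [(ts, item)], []
            for s in sessions.get(user, []):
                first, last = s[0][0], s[-1][0]
                # Merge when [ts, ts+gap) overlaps the session's [first, last+gap).
                if ts < last + gap and first < ts + gap:
                    merged.extend(s)
                else:
                    untouched.append(s)
            sessions[user] = untouched + [sorted(merged)]

        # Emit every session whose idle gap has fully elapsed.
        for u, user_sessions in sessions.items():
            still_open = []
            for s in user_sessions:
                if s[-1][0] + gap <= watermark:
                    closed.append((u, s))
                else:
                    still_open.append(s)
            sessions[u] = still_open

    # End of input: flush whatever is still open.
    for u, user_sessions in sessions.items():
        closed.extend((u, s) for s in user_sessions)
    return closed, dropped


arrivals = [
    ("u1", 0, "play:horror-1"),
    ("u1", 1200, "play:horror-2"),
    ("u1", 1197, "seek:horror-1"),   # arrives 3 s out of order, still accepted
    ("u1", 2400, "play:horror-3"),
    ("u1", 10000, "play:comedy-1"),  # more than 30 min later: new session
    ("u1", 100, "stale-click"),      # far behind the watermark: dropped
]
closed, dropped = sessionize(arrivals)
```

In this run the three horror plays and the out-of-order seek end up in one session, the comedy play starts a second one, and the very late click is discarded.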
Hybrid Collaborative Filtering
Collaborative filtering factorizes the user-item interaction matrix as R ≈ U × V^T, with the factors fitted by alternating least squares (ALS). Each user and item is represented by k=128 latent factors that capture hidden taste dimensions such as "likes complex narratives" or "prefers fast-paced action." The cold-start problem (new content with no views) is handled by blending: content-based filtering dominates for cold items (fewer than about 100 views) and ALS dominates for warm items (more than about 100 views), with a weight that shifts continuously as the view count grows.
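The text does not give the exact blending formula, so the following is only one plausible sketch. It assumes a weight of views / (views + 100), which is 0 for a brand-new item, 0.5 at 100 views, and approaches 1 as views grow. The function names and the sample scores are hypothetical.

```python
def als_weight(views, midpoint=100):
    """Weight on the ALS score: 0 at zero views, 0.5 at `midpoint` views."""
    return views / (views + midpoint)


def blended_score(als_score, content_score, views):
    """Mix the two scores; cold items lean on content, warm items on ALS."""
    w = als_weight(views)
    return w * als_score + (1.0 - w) * content_score


# A new title is scored purely by content similarity...
cold = blended_score(als_score=0.9, content_score=0.3, views=0)
# ...while a title with many views is scored almost entirely by ALS.
warm = blended_score(als_score=0.9, content_score=0.3, views=10_000)
```

A smooth weight like this avoids a visible jump in an item's ranking at the moment it crosses the 100-view mark, which a hard switch between the two models would cause.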
Serving at Scale with Redis + FastAPI
Updated user feature vectors flow from Flink into Redis as 512-byte binary blobs (128 dimensions × 4-byte float32). At serve time, the FastAPI recommendation service fetches the vector in under 1 ms and computes dot products against the item factors held in memory. The entire recommendation cycle, from Kafka event to Redis update to API response, completes in under 50 ms end to end.
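The 512-byte figure follows directly from the encoding, as this small sketch shows using only the standard library. A plain dict stands in for Redis, and the key format, the little-endian byte order, the item names, and the helper functions are assumptions made for illustration; a production service would more likely use a vectorized matrix-vector product than a Python loop.

```python
import struct

DIMS = 128
FMT = f"<{DIMS}f"  # 128 little-endian float32 values (byte order is assumed)


def pack_vector(vec):
    """Serialize a 128-dim vector into a 512-byte blob."""
    return struct.pack(FMT, *vec)


def unpack_vector(blob):
    """Deserialize a 512-byte blob back into a tuple of floats."""
    return struct.unpack(FMT, blob)


def top_n(user_vec, item_factors, n=3):
    """Rank items by dot product between the user vector and each item vector."""
    scores = {
        item: sum(u * v for u, v in zip(user_vec, vec))
        for item, vec in item_factors.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:n]


def padded(*head):
    """Build a 128-dim vector from a few leading values, zero-filled."""
    return list(head) + [0.0] * (DIMS - len(head))


fake_redis = {}  # stand-in for a Redis client
fake_redis["user:42"] = pack_vector(padded(0.75, 0.25))
blob = fake_redis["user:42"]

item_factors = {
    "horror-001": padded(1.0, 0.0),
    "comedy-001": padded(0.0, 1.0),
    "drama-001": padded(0.5, 0.5),
}
ranked = top_n(unpack_vector(blob), item_factors, n=2)
```

Storing the vector as a fixed-width blob keeps the Redis read to a single small value, and the service needs no parsing beyond one `unpack` call before scoring.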
The Outcome: Real Numbers
Switching from nightly batch to real-time streaming didn't just reduce latency — it demonstrably changed user behavior. The system now adapts to each viewing session as it happens, serving recommendations that reflect what users are in the mood for right now, not yesterday.