Recommendations are the highest-leverage ML application in consumer tech. Amazon attributes 35% of its revenue to recommendation algorithms. Netflix estimates its system saves $1 billion per year in subscriber retention by reducing churn through personalized content. YouTube reports that 70% of watch time comes from recommended content. Spotify's Discover Weekly drives 30% of all streams for featured artists.
Some of these figures are measured lifts from A/B tests at scale; others, like the Amazon number, are analyst estimates rather than audited results. Either way, the magnitudes are large. The question is not whether recommendations matter. It is whether your recommendation system is capturing the signals available in your data.
The technology has evolved through four generations. Each one extracts patterns that the previous one could not see.
Generation 1: Content-based filtering
The simplest approach: recommend items similar to what the user has liked before. If you watched three action movies, recommend more action movies. If you bought running shoes, recommend running apparel.
Content-based filtering works on item attributes. Each item is represented by a feature vector (genre, price range, brand, color, description keywords). The system finds items with similar feature vectors to the user's historical preferences.
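As a sketch, content-based scoring reduces to vector similarity. The feature vectors below are illustrative assumptions (a genre one-hot plus a normalized price), not real catalog data; the content IDs follow the tables later in this piece:

```python
import math

# Hypothetical item feature vectors: [scifi, action, drama, normalized price].
items = {
    "C-101": [1, 0, 0, 0.6],  # Sci-Fi Thriller
    "C-102": [0, 1, 0, 0.4],  # Action
    "C-104": [1, 0, 0, 0.7],  # Sci-Fi Thriller, new release
}

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# User profile: mean of the vectors of items the user liked.
liked = ["C-101"]
profile = [sum(v) / len(liked) for v in zip(*(items[i] for i in liked))]

# Rank unseen items by similarity to the profile.
ranked = sorted((i for i in items if i not in liked),
                key=lambda i: cosine(profile, items[i]), reverse=True)
# A Sci-Fi fan gets the Sci-Fi release ranked above the Action title.
```

Note that the new release C-104 is recommendable immediately, because its attributes exist from day one; that is the content-based strength the limitations below trade away.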
Strengths
- No cold-start for new items, since attributes are available from day one; new users can be served as soon as they state a few preferences
- Transparent reasoning ("recommended because you liked X")
- Works with no collaborative data (single-user systems)
Limitations
- Filter bubble. Users only see items similar to what they have already seen. A runner who might love cycling content never gets exposed to it.
- Feature dependency. Quality depends on the richness of item metadata. If product descriptions are sparse or categories are coarse, recommendations are generic.
- No serendipity. Cannot discover that users who like item A also tend to like item B, even if A and B have different attributes.
Here is what the underlying data looks like for a streaming platform. The recommendation signal spans users, content, and interactions.
users
| user_id | name | plan | signup | region |
|---|---|---|---|---|
| U-501 | Natalie Reeves | Premium | 2023-05-18 | US-West |
| U-502 | Carlos Mendez | Standard | 2024-02-10 | US-South |
| U-503 | Yuki Tanaka | Premium | 2022-11-03 | US-East |
content
| content_id | title | genre | release | avg_rating |
|---|---|---|---|---|
| C-101 | The Signal | Sci-Fi Thriller | 2025-09-01 | 4.2 |
| C-102 | Iron Ridge | Action | 2025-10-15 | 3.8 |
| C-103 | Midnight Bloom | Drama | 2025-11-01 | 4.6 |
| C-104 | Zero Hour | Sci-Fi Thriller | 2025-11-20 | (none) |
C-104 (Zero Hour) is a new release with no ratings yet. This is the cold-start problem: collaborative filtering cannot recommend it.
interactions
| interaction_id | user_id | content_id | type | date | rating |
|---|---|---|---|---|---|
| INT-01 | U-501 | C-101 | Watched | 2025-09-05 | 5 |
| INT-02 | U-501 | C-102 | Watched | 2025-10-18 | 3 |
| INT-03 | U-502 | C-101 | Watched | 2025-09-12 | 4 |
| INT-04 | U-502 | C-103 | Watched | 2025-11-05 | 5 |
| INT-05 | U-503 | C-101 | Watched | 2025-09-08 | 5 |
| INT-06 | U-503 | C-103 | Watched | 2025-11-03 | 4 |
| INT-07 | U-503 | C-102 | Browsed | 2025-11-10 | (none) |
Users U-501, U-502, and U-503 all rated 'The Signal' (Sci-Fi Thriller) 4 or higher. Zero Hour is the same genre but has no interactions yet. Graph-based models connect it through genre edges.
Generation 2: Collaborative filtering
The breakthrough insight: you do not need item attributes to make good recommendations. You just need to know what similar users liked. If users A, B, and C all liked items 1, 2, and 3, and user A also liked item 4, then recommend item 4 to users B and C.
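That co-occurrence logic fits in a few lines. The user and item IDs below are the hypothetical A/B/C example from the paragraph above, not real data:

```python
# Interaction history as user -> set of liked items.
likes = {
    "A": {1, 2, 3, 4},
    "B": {1, 2, 3},
    "C": {1, 2, 3},
}

def recommend(target, likes):
    """Score unseen items by how much their likers overlap with the target."""
    scores = {}
    for user, items in likes.items():
        if user == target:
            continue
        overlap = len(likes[target] & items)  # shared likes = user similarity
        for item in items - likes[target]:    # items the target hasn't seen
            scores[item] = scores.get(item, 0) + overlap
    return sorted(scores, key=scores.get, reverse=True)

recommend("B", likes)  # -> [4]: user A overlaps heavily and also liked item 4
```

No item attributes appear anywhere; similarity is defined entirely by shared behavior.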
This is the approach that powered Amazon's early recommendation engine and the Netflix Prize, the famous $1 million competition that drove a decade of recommendation research.
Matrix factorization
The dominant technique from 2006 to 2016 was matrix factorization. Represent the user-item interaction matrix (users as rows, items as columns, interactions as values) and decompose it into two lower-dimensional matrices. Each user gets a latent vector. Each item gets a latent vector. The dot product of a user vector and an item vector predicts the interaction strength.
Matrix factorization is elegant and efficient. It handles sparse data well (most users interact with a tiny fraction of items) and scales to millions of users and items with techniques like alternating least squares (ALS) and stochastic gradient descent.
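A minimal factorization of the interactions table above, trained with plain SGD (the latent dimension, learning rate, regularization, and epoch count are arbitrary illustrative choices, not tuned values):

```python
import random

random.seed(0)
K = 2  # latent dimension (illustrative)

# Sparse ratings from the interactions table: (user, item) -> rating.
ratings = {("U-501", "C-101"): 5, ("U-501", "C-102"): 3,
           ("U-502", "C-101"): 4, ("U-502", "C-103"): 5,
           ("U-503", "C-101"): 5, ("U-503", "C-103"): 4}

users = {u for u, _ in ratings}
items = {i for _, i in ratings}
P = {u: [random.gauss(0, 0.1) for _ in range(K)] for u in users}
Q = {i: [random.gauss(0, 0.1) for _ in range(K)] for i in items}

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# SGD on squared error with L2 regularization.
lr, reg = 0.05, 0.01
for _ in range(500):
    for (u, i), r in ratings.items():
        err = r - dot(P[u], Q[i])
        for k in range(K):
            pu, qi = P[u][k], Q[i][k]
            P[u][k] += lr * (err * qi - reg * pu)
            Q[i][k] += lr * (err * pu - reg * qi)

# Predicted rating for a pair never observed in training:
pred = dot(P["U-501"], Q["C-103"])
```

Note what is missing: C-104 never appears in `ratings`, so it has no latent vector and can never be scored. That is the cold-start limitation listed below.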
Limitations
- Cold-start for new items and users. With no interaction history, there is no latent vector. New items never get recommended until they accumulate enough interactions.
- Ignores side information. Matrix factorization uses only the interaction matrix. Item attributes, user demographics, temporal patterns, and contextual signals are discarded.
- Static representation. A user's latent vector summarizes their entire history equally. Recent interests are weighted the same as interests from years ago.
Generation 3: Deep learning
Starting around 2016, deep learning entered recommendations. Neural collaborative filtering replaced dot products with neural networks. Sequence models (RNNs, then transformers) captured temporal dynamics in user behavior. Two-tower architectures enabled efficient retrieval at scale.
Key advances
- Neural collaborative filtering. Replace the linear dot product with a multi-layer network that can learn non-linear user-item interactions. This captures complex preference patterns that matrix factorization misses.
- Sequential models. Treat a user's interaction history as a sequence and use transformers (like SASRec) to model temporal dynamics. Recent interactions are weighted more heavily. Interest drift is captured.
- Side information integration. Deep models can incorporate item features, user features, and contextual signals alongside interaction data. This helps with cold-start.
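As a toy illustration of the recency-weighting idea (a fixed exponential decay, not the learned attention a model like SASRec uses; the watch history is made up):

```python
# A user's watch history as (days_ago, genre) pairs -- illustrative data.
history = [(90, "Sci-Fi Thriller"), (30, "Action")]

def interest_profile(history, half_life=45.0):
    """Decay each interaction by age, then normalize to a genre distribution.
    A crude stand-in for the recency weighting a sequence model learns."""
    weights = {}
    for days_ago, genre in history:
        w = 0.5 ** (days_ago / half_life)  # halves every `half_life` days
        weights[genre] = weights.get(genre, 0.0) + w
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}

profile = interest_profile(history)
# The recent Action watch now outweighs the older Sci-Fi one, which a
# static matrix-factorization vector would weight equally.
```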
What deep learning still misses
Deep learning recommendation models process the user-item interaction graph implicitly through embeddings, but they do not model the full relational structure of the data. Here is what each generation sees for the same user.
what each generation sees for Natalie (U-501)
| generation | data_used | recommendation_for_natalie | signal_source |
|---|---|---|---|
| Content-based | Genre: Sci-Fi Thriller (from The Signal) | More Sci-Fi Thrillers | Item attributes only |
| Collaborative | Users who watched The Signal also watched... | Iron Ridge (popular overlap) | User-item matrix only |
| Deep learning | Natalie's watch sequence: Signal then Iron Ridge | Midnight Bloom (sequence pattern) | Interaction sequence |
| Graph-based | Signal (5-star) + genre edge + U-502, U-503 also loved Signal + they loved Midnight Bloom | Midnight Bloom (0.84), Zero Hour (0.91) | Full relational graph |
Each generation adds a layer of signal. Only the graph-based approach discovers that Zero Hour (a new release with zero interactions) should rank highest for Natalie, because it shares a genre edge with The Signal and all users who rated The Signal highly are in Natalie's graph neighborhood.
These multi-hop, relational signals require a model that represents the full data topology, not just sequences.
Generation 4: Graph-based approaches
The latest generation represents the recommendation problem as a graph. Users, items, categories, brands, and all other entities become nodes. Interactions (purchases, clicks, views, ratings) become edges. The graph captures the full relational structure of the data.
How graph recommendations work
A graph neural network processes this structure by passing messages along edges. In the first layer, each node aggregates information from its direct neighbors. In the second layer, it aggregates from 2-hop neighbors. After several layers, each node's representation encodes information from its entire local neighborhood.
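A minimal sketch of this propagation over the streaming example (node names come from the tables above; the scalar features and the genre node `G-SCIFI` are illustrative assumptions, and real GNN layers use learned transformations rather than a plain average). Because the path from viewers to the zero-interaction release C-104 runs user -> C-101 -> genre -> C-104, it takes three layers for user signal to arrive:

```python
# Toy heterogeneous graph: user, content, and genre nodes; edges from the
# interactions table plus genre edges linking C-101 and C-104.
edges = [("U-501", "C-101"), ("U-501", "C-102"),
         ("U-502", "C-101"), ("U-502", "C-103"),
         ("U-503", "C-101"), ("U-503", "C-103"),
         ("C-101", "G-SCIFI"), ("C-104", "G-SCIFI")]

neighbors = {}
for a, b in edges:
    neighbors.setdefault(a, set()).add(b)
    neighbors.setdefault(b, set()).add(a)

# Scalar node feature: 1.0 on user nodes, 0.0 elsewhere, so we can watch
# user signal spread through the graph.
h = {n: (1.0 if n.startswith("U") else 0.0) for n in neighbors}

def propagate(h):
    """One message-passing layer: average self state with the neighbor mean."""
    return {n: 0.5 * h[n] + 0.5 * sum(h[m] for m in nbrs) / len(nbrs)
            for n, nbrs in neighbors.items()}

h1 = propagate(h)    # C-101 now carries its viewers' signal
h2 = propagate(h1)   # the signal reaches the genre node
h3 = propagate(h2)   # ...and finally the zero-interaction release C-104
```

After two layers C-104 is still at zero; after three, it is nonzero purely through its genre edge. That is cold-start resolved by connectivity rather than by interaction history.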
For recommendations, this means a user's representation captures:
- Their direct interactions (items they bought)
- The attributes of those items (categories, brands, prices)
- Other users who bought the same items (collaborative signal via the graph)
- Items those similar users bought (2-hop recommendations)
- Temporal patterns (recent vs. historical interactions, trending items)
This captures both content-based and collaborative signals in a single unified model, plus multi-hop relational patterns that neither approach captures alone.
Matrix factorization / Deep learning
- User-item interaction matrix only
- Cold-start problem for new items/users
- Ignores multi-hop relational patterns
- Side information requires manual integration
- Static or sequence-based user representation
Graph-based (KumoRFM)
- Full relational structure as a graph
- Side information connected through graph edges
- Multi-hop patterns captured naturally
- Temporal dynamics preserved on edges
- Unified content-based + collaborative signal
Production evidence
Graph-based recommendation systems have shown significant lifts in production:
- Pinterest's PinSage (one of the first large-scale graph recommendation models) processes 3 billion nodes and 18 billion edges to recommend pins, delivering measurable engagement improvements over previous approaches
- DoorDash used graph-based recommendations and saw a 1.8% engagement lift across 30 million users
- Alibaba's graph recommendation system handles 1 billion items and showed a 10% conversion rate improvement over deep learning baselines
The feature engineering trap in recommendations
Building a recommendation system with traditional ML requires extensive feature engineering. For each user-item pair, you need features like:
- User's interaction history with this item's category
- User's average rating for this brand
- Time since user's last purchase in this category
- Popularity of this item in user's geographic region
- Price relative to user's typical purchase range
- Similarity score to user's top 10 most-interacted items
Each feature requires SQL joins across users, interactions, items, categories, and potentially more tables. For a catalog of 1 million items and 10 million users, there are 10^13 candidate user-item pairs; computing pairwise features at that scale is computationally expensive, and the feature engineering alone can take weeks.
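To make the cost concrete, here is one such feature ("user's average rating in this item's genre") hand-built from the example tables, with the joins done over plain dicts. This is one feature for one pair; a production pipeline would need hundreds of these, in SQL, over far larger tables:

```python
# Raw tables from the streaming example, flattened to Python structures.
content_genre = {"C-101": "Sci-Fi Thriller", "C-102": "Action",
                 "C-103": "Drama", "C-104": "Sci-Fi Thriller"}
interactions = [("U-501", "C-101", 5), ("U-501", "C-102", 3),
                ("U-502", "C-101", 4), ("U-502", "C-103", 5),
                ("U-503", "C-101", 5), ("U-503", "C-103", 4)]

def avg_rating_in_genre(user, item):
    """Join interactions -> content to average this user's ratings in the
    candidate item's genre. Returns None when the user has no history there."""
    genre = content_genre[item]
    ratings = [r for u, c, r in interactions
               if u == user and content_genre[c] == genre]
    return sum(ratings) / len(ratings) if ratings else None

avg_rating_in_genre("U-501", "C-104")  # -> 5.0, via The Signal
avg_rating_in_genre("U-502", "C-102")  # -> None: no Action history at all
```

The `None` case is the trap: every missing-history feature needs its own imputation rule, multiplying the engineering work.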
A foundation model eliminates this entirely. It reads the raw relational tables (users, items, interactions, categories, brands) and generates recommendations without any manual feature engineering. DoorDash's 1.8% engagement lift came without building a single feature.
PQL Query

```
PREDICT interactions.rating > 3 FOR EACH users.user_id, content.content_id
```
The model reads users, content, and interactions as a graph. For the new release 'Zero Hour' (C-104), it connects through the Sci-Fi Thriller genre edge to 'The Signal' (C-101), then to users who rated it highly. Cold-start solved through graph connectivity.
Output
| user_id | content_id | relevance_score | top_signal |
|---|---|---|---|
| U-501 | C-104 | 0.91 | Rated The Signal 5/5, same genre |
| U-503 | C-104 | 0.88 | Rated The Signal 5/5, browsed Action |
| U-502 | C-104 | 0.72 | Rated The Signal 4/5, prefers Drama |
| U-501 | C-103 | 0.84 | Similar users U-502, U-503 loved it |
The cold-start advantage
Cold-start is the oldest unsolved problem in recommendations. A new user with no history gets generic, low-value recommendations. A new item with no interactions never gets surfaced.
Graph-based approaches mitigate cold-start through graph connectivity. A new user who signs up with demographic information and browses three products is immediately connected in the graph to those products, their categories, their brands, and (through the graph) to other users who interacted with similar items. Even sparse initial interactions provide rich graph context.
A new item is connected to its category, brand, price range, and attributes from the moment it enters the catalog. Graph propagation ensures it appears in recommendations to relevant users even without interaction history, based on its relational position.
Where each generation fits
Content-based filtering
Best for simple catalogs with rich metadata and limited interaction data. Works well for editorial or curated recommendations.
Collaborative filtering / matrix factorization
Good baseline for established catalogs with abundant interaction data. Fast, well-understood, and easy to implement.
Deep learning (sequential, two-tower)
Right for high-scale systems where temporal dynamics matter and you have the engineering capacity for neural model training and serving.
Graph-based / foundation models
Best for complex catalogs with multi-table data (products, categories, brands, user attributes, contextual signals) where the relational structure carries predictive value. Strongest advantage on cold-start, multi-hop discovery, and rapid deployment.
The bottom line
Recommendation systems have evolved from matching item attributes to learning latent factors to modeling full relational graphs. Each generation captures signals the previous one missed. The difference is not marginal: DoorDash measured a 1.8% lift across 30 million users by moving to graph-based recommendations. At DoorDash's scale, that is hundreds of millions in incremental revenue.
The relational structure of your data (users connected to items connected to categories connected to brands connected to other users) is not noise. It is signal. Models that ignore it leave money on the table. Models that exploit it find recommendations that collaborative filtering and deep learning cannot.
If your recommendation system is based on matrix factorization or a two-tower neural model, the graph structure of your data is sitting unused. The question is not whether to upgrade. It is how much lift you are leaving on the table.