Every ecommerce team eventually asks the same question: how do we recommend products that people actually want? The internet is full of tutorials that walk you through building a basic collaborative filter in Python. Those tutorials are not wrong. They are just incomplete. They work on clean demo datasets where every user has rated 50+ movies. They do not work when 60% of your traffic is new or anonymous, your catalog changes weekly, and your CEO wants to know why revenue per session has not moved.
This is a practitioner's guide. Five approaches, ranked honestly, with the trade-offs that the tutorials skip.
Five approaches to recommendations, ranked
Not all recommendation approaches are equal. Here they are, from simplest to most capable, with a direct comparison across the dimensions that matter in production.
| dimension | Rule-Based / Popularity | Collaborative Filtering | Content-Based | Hybrid Deep Learning | Graph-Based (KumoRFM) |
|---|---|---|---|---|---|
| How it works | Show bestsellers or hand-picked items | Find users with similar purchase/rating history | Match item attributes to user preference profiles | Neural networks combining multiple signal types | Reads the full relational graph: users, items, categories, sessions, reviews, merchants |
| Cold start (new users) | Decent - everyone sees the same popular items | Fails completely - no history means no similar users | Poor - needs user preference data to match against | Partial - can use contextual signals but limited | Strong - uses relational context (signup channel, browsing, demographics) from day one |
| Cold start (new items) | Fails - new items have no popularity data | Fails - new items have no interaction data | Good - can use item attributes immediately | Good - combines content and contextual signals | Strong - reads category, brand, merchant, and catalog graph connections |
| Sparse data handling | Not affected (no user data used) | Poor - needs dense interaction overlap | Moderate - depends on attribute quality | Moderate - helps but still needs training data | Strong - propagates signal through relational connections to fill gaps |
| Personalization depth | None - same recs for everyone | Moderate - based on purchase overlap | Moderate - based on attribute matching | High - multi-signal personalization | Highest - captures user context, session behavior, social graph, and full item relationships |
| Engineering effort | Low - rules and SQL queries | Moderate - matrix factorization, nearest neighbors | Moderate - feature extraction, similarity scoring | Very high - custom neural architectures, large ML teams | Low with foundation model - connect tables, write PQL query |
| Who uses this | Small retailers, early-stage startups | Mid-size ecommerce, media platforms | Content platforms, news sites | Netflix, Amazon, Spotify (200+ person ML teams) | Enterprise teams using KumoRFM |
| Accuracy ceiling | Low - no personalization | Moderate - limited by interaction density | Moderate - limited by attribute quality | High - but requires massive engineering investment | Highest - reads patterns across all connected data |
Five recommendation approaches compared across 8 dimensions. Each level adds capability but also adds complexity, except graph-based foundation models, which add capability while reducing engineering effort.
Approach 1: Rule-based and popularity models
This is where most teams start, and it is not a bad starting point. Show the top-selling items. Show items frequently bought together. Show "customers who viewed X also viewed Y" based on co-occurrence counts.
Popularity models are easy to build, easy to explain, and they establish a baseline. The problem is the ceiling. Everyone sees the same recommendations, so there is no personalization, and the revenue lift is modest: real money, but a fraction of what personalized recs can deliver.
- Best for: Small retailers and early-stage startups that need a baseline with zero ML investment.
- Watch out for: No personalization means every user sees the same recs. Revenue lift is modest compared to personalized approaches.
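The co-occurrence logic behind "customers who bought X also bought Y" fits in a few lines. A minimal sketch, assuming order history is available as lists of product IDs per order (the data here is illustrative):

```python
from collections import Counter
from itertools import combinations

def co_occurrence_recs(orders, k=3):
    """For each product, return the k products most often bought alongside it."""
    pair_counts = Counter()
    for basket in orders:
        # Count each unordered product pair once per order.
        for a, b in combinations(sorted(set(basket)), 2):
            pair_counts[(a, b)] += 1
    recs = {}
    for (a, b), n in pair_counts.items():
        recs.setdefault(a, Counter())[b] = n
        recs.setdefault(b, Counter())[a] = n
    return {p: [q for q, _ in c.most_common(k)] for p, c in recs.items()}

orders = [
    ["shoes", "socks"],
    ["shoes", "socks", "bottle"],
    ["shoes", "bottle"],
]
print(co_occurrence_recs(orders)["shoes"])  # ['socks', 'bottle']
```

Note what this does not do: the output for "shoes" is identical for every user who views shoes. That is the ceiling popularity models hit.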
Approach 2: Collaborative filtering
Collaborative filtering is the textbook answer. Users who bought similar items in the past will buy similar items in the future. Find the nearest neighbors in the user-item interaction matrix, borrow their preferences, and recommend accordingly.
It works well in one specific scenario: when you have dense interaction data. Netflix circa 2006, when every user had rated dozens of movies, was the ideal case. The Netflix Prize competition (2006-2009) made collaborative filtering famous precisely because the dataset was unusually dense.
In practice, most ecommerce interaction matrices are less than 1% dense. A user has bought 3 items out of a 100,000-item catalog. There is not enough overlap with other users to find meaningful neighbors. And for new users with zero purchases, collaborative filtering returns nothing.
- Best for: Mid-size ecommerce and media platforms with dense interaction history (users have rated or purchased dozens of items).
- Watch out for: Fails completely on cold start. If 40-60% of sessions are new or low-activity users, your largest audience segment gets your worst recommendations.
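User-based collaborative filtering reduces to nearest-neighbor search over the interaction matrix. A minimal sketch with toy interaction dicts (illustrative data; production systems use matrix factorization or approximate nearest neighbors, but the failure mode is the same):

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two {item: rating} dicts."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0  # no overlap: with sparse data, this is the common case
    dot = sum(a[i] * b[i] for i in shared)
    return dot / (sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values())))

def recommend(user, interactions, k=2):
    """Borrow items from the most similar users that `user` has not bought."""
    me = interactions[user]
    sims = sorted(((cosine(me, v), u) for u, v in interactions.items() if u != user),
                  reverse=True)
    scores = {}
    for w, u in sims[:k]:
        if w <= 0:
            continue  # zero similarity carries no signal
        for item, r in interactions[u].items():
            if item not in me:
                scores[item] = scores.get(item, 0.0) + w * r
    return sorted(scores, key=scores.get, reverse=True)

interactions = {
    "alice": {"shoes": 5, "socks": 4},
    "bob":   {"shoes": 5, "socks": 5, "bottle": 4},
    "carol": {"watch": 5},
    "dave":  {},  # cold-start user: zero similarity with everyone
}
print(recommend("alice", interactions))  # ['bottle'] — borrowed from bob
print(recommend("dave", interactions))   # [] — CF has nothing to say
```

The empty list for `dave` is the cold-start failure in miniature: no interactions, no neighbors, no recommendations.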
Approach 3: Content-based filtering
Content-based filtering matches item attributes to user preferences. If a user bought running shoes, recommend other running shoes based on attributes like brand, price range, cushioning type, and color. It does not need other users' data, so it avoids the worst of the cold-start problem for new items.
The limitation is that it only recommends more of the same. A user who bought running shoes gets more running shoes, never running socks, water bottles, or GPS watches. Content-based filtering cannot discover cross-category patterns because it only reads item attributes, not the broader context of how products relate to each other through user behavior.
- Best for: Content platforms and news sites where item attributes are rich and well-structured. Handles new-item cold start well.
- Watch out for: Only recommends more of the same. No cross-category discovery. Still fails on new-user cold start.
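Content-based matching is attribute-set overlap. A minimal sketch using Jaccard similarity over illustrative attribute tags (real systems typically use TF-IDF or embedding vectors, but the in-category bias shows up either way):

```python
def jaccard(a, b):
    """Attribute-set overlap between a user profile and an item."""
    return len(a & b) / len(a | b) if a | b else 0.0

# Illustrative catalog: each item described only by its attribute tags.
catalog = {
    "road_runner_x1":   {"running", "shoe", "nike", "budget"},
    "trail_blazer_2":   {"running", "shoe", "asics", "premium"},
    "compression_sock": {"running", "sock", "nike", "budget"},
    "gps_watch_v3":     {"watch", "gps", "garmin", "premium"},
}

def content_recs(purchased, k=2):
    """Rank unpurchased items by attribute similarity to past purchases."""
    profile = set().union(*(catalog[p] for p in purchased))  # naive preference profile
    scored = {i: jaccard(profile, attrs)
              for i, attrs in catalog.items() if i not in purchased}
    return sorted(scored, key=scored.get, reverse=True)[:k]

print(content_recs({"road_runner_x1"}))  # ['compression_sock', 'trail_blazer_2']
```

The GPS watch scores zero because it shares no attributes with the purchase history, even though runners buy GPS watches constantly. The cross-category pattern lives in user behavior, not in item attributes, so content-based filtering cannot see it.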
Approach 4: Hybrid deep learning (what Netflix and Amazon actually use)
Netflix does not run one recommendation algorithm. It runs over 200, each specialized for a different signal type, and a meta-algorithm selects which recommendations to show each user. Each row on your Netflix home screen is generated by a different model. The system blends collaborative filtering, content embeddings, sequence models (what you watched recently and in what order), contextual bandits (time of day, device type), and more.
Amazon Personalize is the managed-service version of this approach. It offers real-time personalization through APIs, handling some of the infrastructure complexity, but you still need to structure your data correctly, manage campaigns, and tune recipes.
Hybrid systems deliver the best results of the traditional approaches. The catch: Netflix has a 200+ person ML team. Building and maintaining a hybrid recommendation system at that level requires dedicated infrastructure engineers, ML researchers, and years of iteration. Most ecommerce companies do not have those resources.
- Best for: Companies with 20-200+ person ML teams and multi-year timelines. Netflix, Amazon, and Spotify operate at this level.
- Watch out for: Massive engineering investment. Still operates on flat interaction tables, limiting relational signal capture. Cold start is only partially addressed.
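The blending layer at the heart of a hybrid system can be sketched simply, even though the real thing is a learned meta-model over hundreds of signals. A hypothetical sketch: combine CF and content scores per item, shifting weight toward CF as a user accumulates history (the weighting schedule here is an assumption for illustration, not Netflix's actual meta-layer):

```python
def blend(cf_scores, content_scores, n_interactions, alpha_max=0.7):
    """Weighted blend of two recommenders; trust CF more as history grows."""
    # Hypothetical schedule: alpha ramps from 0 (cold) to alpha_max (warm).
    alpha = alpha_max * min(n_interactions, 10) / 10
    items = set(cf_scores) | set(content_scores)
    return {i: alpha * cf_scores.get(i, 0.0) + (1 - alpha) * content_scores.get(i, 0.0)
            for i in items}

cf      = {"bottle": 0.9, "socks": 0.4}        # collaborative-filtering scores
content = {"socks": 0.8, "trail_shoe": 0.6}    # content-based scores

warm = blend(cf, content, n_interactions=10)
cold = blend(cf, content, n_interactions=0)
print(max(warm, key=warm.get))  # bottle — CF dominates for warm users
print(max(cold, key=cold.get))  # socks — content signal carries cold users
```

Every weight, schedule, and fallback rule in a layer like this must be designed, tuned, and maintained by hand, which is where the headcount goes.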
Approach 5: Graph-based recommendations (KumoRFM)
Here is where the step change happens. Every approach above operates on some subset of your data: the interaction matrix, the item attribute table, the session log. A graph-based model reads all of it at once, as a connected graph.
Think about what your data actually looks like in your warehouse. You have a users table, an orders table, a products table, a categories table, a sessions table, a reviews table, and a merchants table. These tables are connected by foreign keys. User 1234 placed Order 5678, which contained Product 9012, which belongs to Category "Electronics," which was sold by Merchant "TechStore," and User 1234 wrote Review 3456 for Product 9012 during Session 7890 on a mobile device at 9 PM.
That web of connections is a graph. And it contains far more predictive signal than any single flat table.
- Best for: Any team that wants near Netflix-level accuracy without building a 200-person ML team. Handles cold start, sparse data, and long-tail items in one pass.
- Watch out for: Requires relational data in a data warehouse (tables connected by foreign keys). The more tables you connect, the better the results.
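The "warehouse as a graph" idea is mechanical: every foreign key is an edge. A minimal sketch building an adjacency list from toy warehouse rows (the schema is illustrative, and this says nothing about KumoRFM's internals — it only shows what structure the foreign keys already encode):

```python
from collections import defaultdict

# Toy warehouse rows; the foreign keys are what turn flat tables into a graph.
orders   = [{"order_id": 5678, "user_id": 1234, "product_id": 9012}]
products = [{"product_id": 9012, "category": "Electronics", "merchant": "TechStore"}]
sessions = [{"session_id": 7890, "user_id": 1234, "device": "mobile"}]

graph = defaultdict(set)

def link(a, b):
    graph[a].add(b)
    graph[b].add(a)  # undirected: signal can flow both ways

for o in orders:
    link(("user", o["user_id"]), ("order", o["order_id"]))
    link(("order", o["order_id"]), ("product", o["product_id"]))
for p in products:
    link(("product", p["product_id"]), ("category", p["category"]))
    link(("product", p["product_id"]), ("merchant", p["merchant"]))
for s in sessions:
    link(("user", s["user_id"]), ("session", s["session_id"]))
    link(("session", s["session_id"]), ("device", s["device"]))

# Two hops from the user reach a product; three reach its category and merchant.
print(sorted(graph[("user", 1234)]))
```

Flattening this into a user-item matrix keeps exactly one of those edge types and discards the rest, which is the loss the rest of this section is about.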
Why cold start is the real test
Every recommendation approach looks decent when your user has a long purchase history. The real test is what happens when they do not. Cold start is not an edge case. For most growing ecommerce companies, new and low-activity users are the majority of traffic.
Here is how each approach handles a new user who just signed up and has not bought anything:
- Rule-based/popularity: Shows bestsellers. No personalization. Works as a fallback but leaves money on the table.
- Collaborative filtering: Returns nothing useful. The user has no interactions, so there are no similar users to borrow from. Most systems fall back to popularity, which defeats the purpose.
- Content-based: Cannot work. No user preference profile exists yet. Needs at least one interaction to build a profile.
- Hybrid deep learning: Can use contextual signals (device type, time of day, referral source) for a partial cold-start solution. Better than collaborative filtering alone, but still limited without interaction data.
- Graph-based (KumoRFM): Reads the relational context that exists even for new users. The user signed up from Austin, Texas through a Google Shopping ad for winter jackets, on an iPhone, at 8 PM. The graph connects this user to geographic, channel, device, and temporal patterns from millions of other users. The model recommends relevant products from the first session, no purchase history required.
This is not a theoretical advantage. In published cold-start benchmarks, graph models consistently outperform collaborative filtering on new-user recommendation accuracy. For a growing ecommerce business, that gap translates directly into conversion rate on your largest audience segment.
The benchmark evidence
Talk is cheap. Here are the numbers from third-party benchmarks on real relational data.
| approach | accuracy | what_it_means |
|---|---|---|
| LLM + AutoML | 63% | Language model generates features, AutoML selects model |
| PhD Data Scientist + XGBoost | 75% | Expert spends weeks hand-crafting features, tunes XGBoost |
| KumoRFM (zero-shot) | 91% | No feature engineering, no training, reads relational tables directly |
SAP SALT benchmark on enterprise relational data. KumoRFM outperforms expert-tuned models by 16 percentage points. The gap comes from relational patterns that flat feature tables structurally cannot contain.
| approach | AUROC | feature_engineering_time |
|---|---|---|
| LightGBM + manual features | 62.44 | 12.3 hours per task |
| KumoRFM zero-shot | 76.71 | ~1 second |
| KumoRFM fine-tuned | 81.14 | Minutes |
RelBench benchmark across 7 databases, 30 prediction tasks. KumoRFM zero-shot outperforms manually engineered LightGBM by 14+ AUROC points. Fine-tuning pushes the gap to nearly 19 points.
What this looks like in practice
Traditional recommendation pipelines require months of work: build an ETL pipeline, engineer features from multiple tables, train and tune models, deploy serving infrastructure, build A/B testing, and maintain everything as your catalog and user base change.
With KumoRFM, you connect your relational tables and write a PQL (Predictive Query Language) query. The model reads the full relational graph and predicts which items each user will interact with next.
Traditional recommendation pipeline
- Build ETL pipeline to join user, product, order, session, and review tables (2-4 weeks)
- Engineer features: user purchase history, item popularity, co-occurrence counts, session recency (4-8 weeks)
- Train collaborative filtering + content-based models separately
- Build hybrid blending layer to combine model outputs
- Deploy real-time serving infrastructure with low-latency requirements
- Rebuild pipeline every time catalog or schema changes
- Cold start users get popularity fallback (no personalization)
KumoRFM recommendation pipeline
- Connect to data warehouse: users, orders, products, categories, sessions, reviews
- Write PQL: PREDICT product_id FOR EACH users.user_id
- Model reads all tables and discovers predictive patterns automatically
- Handles warm users, cold-start users, and new items in one pass
- No feature engineering, no model blending, no graph construction
- Schema changes handled automatically by the foundation model
- One platform, one query, all user segments
PQL Query

```sql
PREDICT product_id FOR EACH users.user_id WHERE orders.order_date > '2026-01-01'
```
One PQL query replaces the full recommendation pipeline: feature engineering across 6+ tables, model training, cold-start handling, and scoring. KumoRFM reads raw relational tables and discovers which products each user is most likely to buy next.
Output
| user_id | top_recommendation | confidence | signal_source |
|---|---|---|---|
| USR-1001 (active buyer) | Wireless Earbuds Pro | 0.89 | Purchase sequence + category affinity + session recency |
| USR-1002 (new user, 0 purchases) | Running Shoes X1 | 0.74 | Signup channel + geo + browse category + similar user graph |
| USR-1003 (dormant 6 months) | Smart Watch V3 | 0.68 | Historical preferences + new product-category graph connections |
| USR-1004 (1 purchase only) | Phone Case Ultra | 0.81 | Product-to-product graph + category co-purchase patterns |
Why graph models win on sparse and cold-start data
The core insight is simple. A flat interaction matrix is a lossy compression of your actual data. Your database has users connected to orders connected to products connected to categories connected to brands connected to reviews connected to sessions. When you flatten that into a user-item matrix, you throw away most of the signal.
A graph model reads the original structure. Even a user with zero purchases is connected to the graph through their signup attributes, browsing behavior, geographic location, device type, referral channel, and any other data you capture. Every one of those connections provides signal that the model can propagate through the graph to generate recommendations.
This is why graph models show their largest accuracy gains precisely where traditional models struggle most: sparse data and cold start. The denser your interaction data, the smaller the gap between collaborative filtering and graph models. The sparser your data, the larger the gap becomes. Since most real-world ecommerce data is sparse, the gap matters.
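The propagation idea can be made concrete with a tiny sketch: a brand-new user shares attribute nodes (geo, acquisition channel) with warm users, and signal flows through those shared nodes to items. This is illustrative only — a simple path count on a toy graph, whereas real graph models learn edge weights — but it shows why zero purchases does not mean zero signal:

```python
from collections import Counter, defaultdict

# Toy graph: users connect to items directly (purchases) and to each other
# indirectly through shared attribute nodes.
edges = [
    ("new_user", "geo:austin"), ("new_user", "channel:google_shopping"),
    ("user_a", "geo:austin"), ("user_a", "item:winter_jacket"),
    ("user_b", "channel:google_shopping"), ("user_b", "item:winter_jacket"),
    ("user_c", "geo:seattle"), ("user_c", "item:umbrella"),
]
adj = defaultdict(set)
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)

def propagate(start, hops=3):
    """Score item nodes by the number of length-`hops` paths from `start`."""
    frontier = Counter({start: 1})
    for _ in range(hops):
        nxt = Counter()
        for node, w in frontier.items():
            for nb in adj[node]:
                nxt[nb] += w
        frontier = nxt
    return Counter({n: w for n, w in frontier.items() if n.startswith("item:")})

# Zero purchases, yet the jacket is reachable through shared geo/channel nodes.
print(propagate("new_user").most_common(1))  # [('item:winter_jacket', 2)]
```

A collaborative filter sees `new_user` as an empty row. The graph sees two attribute edges, and those edges are enough to separate the winter jacket from the umbrella.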
The Netflix/Amazon comparison, honestly
Netflix and Amazon built recommendation systems that work very well. But they did it with hundreds of ML engineers over many years. The Netflix recommendation system is not one algorithm. It is a complex ensemble of 200+ specialized models with a meta-ranking layer on top, backed by custom infrastructure for feature computation, model serving, and online experimentation.
If you have that team and that timeline, build a hybrid system. You will get excellent results.
If you do not have that team (and most companies do not), the question is: what gets you closest to Netflix-quality recommendations with the resources you actually have? A collaborative filtering model with manual feature engineering gets you part of the way. A graph-based foundation model that reads your relational data directly gets you further, faster, with a smaller team.
| approach | team_size_needed | time_to_production | cold_start_quality | accuracy_ceiling |
|---|---|---|---|---|
| Rule-based | 1 engineer | 1-2 weeks | Generic (popularity only) | Low |
| Collaborative filtering | 2-3 ML engineers | 2-3 months | Fails (no data = no recs) | Moderate |
| Hybrid deep learning (Netflix-style) | 20-200+ ML engineers | 1-3 years | Partial (contextual signals) | Very high |
| KumoRFM | 1 ML engineer or analyst | Days to weeks | Strong (relational context) | Very high |
Practical comparison of team investment vs. recommendation quality. KumoRFM reaches near Netflix-level accuracy with a fraction of the team size and timeline.
Getting started: a practical path
You do not need to rip out your current system. Here is the practical path from wherever you are to graph-based recommendations:
- If you have nothing today: Skip collaborative filtering entirely. Start with KumoRFM. Connect your product catalog, user table, and order history. Write a PQL query. You will have production-quality recommendations, including cold-start coverage, in days instead of months.
- If you have a collaborative filtering model: Keep it running. Add KumoRFM as a parallel system. Compare results on cold-start users first, where the gap is largest. Then expand to all users and measure the revenue lift.
- If you have a hybrid system: Test KumoRFM against your current system on the segments where your current system is weakest (typically cold start, sparse categories, and long-tail items). The foundation model approach often matches or beats custom hybrid systems on these segments while requiring a fraction of the maintenance.
- Regardless of starting point: Connect more tables over time. The more relational context KumoRFM can read (sessions, reviews, categories, merchants, inventory), the more patterns it discovers. Each additional table is a marginal improvement with zero additional feature engineering.
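Running a candidate system in parallel only pays off if you measure it honestly. A minimal evaluation sketch: score each system's top-k list against what users actually bought next, then read the metric separately for cold-start users, where the gap is claimed to be largest. The metric choice (hit rate @ k) and the toy holdout data are assumptions for illustration:

```python
def hit_rate_at_k(recs, actual_next, k=5):
    """Fraction of users whose actual next purchase appears in their top-k recs."""
    hits = sum(1 for u, bought in actual_next.items() if bought in recs.get(u, [])[:k])
    return hits / len(actual_next)

# Illustrative holdout: each user's actual next purchase.
actual = {"u1": "socks", "u2": "jacket", "u3": "bottle"}

incumbent = {"u1": ["socks", "shoes"], "u2": ["shoes"], "u3": ["socks"]}
candidate = {"u1": ["socks"], "u2": ["jacket"], "u3": ["bottle", "socks"]}

print(f"incumbent hit@5: {hit_rate_at_k(incumbent, actual):.2f}")
print(f"candidate hit@5: {hit_rate_at_k(candidate, actual):.2f}")
```

Segment the same comparison by user history length (zero, one, many purchases) before making a call: an aggregate number can hide a large cold-start gap behind a warm-user majority, or vice versa.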