The headline result: SAP SALT benchmark
Before comparing individual tools, here is the result that matters most. The SAP SALT benchmark is an enterprise-grade evaluation where real business analysts and data scientists attempt prediction tasks on SAP enterprise data. It measures how accurately different approaches predict real business outcomes on production-quality enterprise databases with multiple related tables.
sap_salt_enterprise_benchmark
| approach | accuracy | what_it_means |
|---|---|---|
| LLM + AutoML | 63% | Language model generates features, AutoML selects model |
| PhD Data Scientist + XGBoost | 75% | Expert spends weeks hand-crafting features, tunes XGBoost |
| KumoRFM (zero-shot) | 91% | No feature engineering, no training, reads relational tables directly |
SAP SALT benchmark: KumoRFM outperforms expert data scientists by 16 percentage points and LLM+AutoML by 28 percentage points on real enterprise prediction tasks.
KumoRFM scores 91% where PhD-level data scientists with weeks of feature engineering and hand-tuned XGBoost score 75%. That 16-point gap is the value of reading relational data natively instead of flattening it into a single table.
Why recommendation engines hit a ceiling
Every e-commerce and content platform has a recommendation engine. Most of them work the same way: build a user-item interaction matrix (who bought/clicked/viewed what), apply collaborative filtering or matrix factorization, and serve the results. Users who bought X also bought Y.
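The core mechanic can be sketched in a few lines. This is a toy item-to-item co-occurrence recommender with made-up purchase data - a minimal stand-in for "users who bought X also bought Y", not any vendor's implementation:

```python
from collections import defaultdict
from itertools import permutations

# Purchases as (user, item) pairs -- the only signal classic
# collaborative filtering sees. Users and items are illustrative.
purchases = [
    ("u1", "X"), ("u1", "Y"),
    ("u2", "X"), ("u2", "Y"),
    ("u3", "X"), ("u3", "Z"),
]

# Build per-user baskets, then count how often item pairs co-occur.
baskets = defaultdict(set)
for user, item in purchases:
    baskets[user].add(item)

co_counts = defaultdict(int)
for items in baskets.values():
    for a, b in permutations(items, 2):
        co_counts[(a, b)] += 1

def recommend(item, k=2):
    """'Users who bought `item` also bought...' ranked by co-count."""
    scores = {b: c for (a, b), c in co_counts.items() if a == item}
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(recommend("X"))  # -> ['Y', 'Z']: Y co-occurs twice, Z once
```

Note that the ranking is driven purely by co-occurrence frequency, which is why this family of methods gravitates toward popular items.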
These engines work. They drive meaningful revenue. But they hit a ceiling because collaborative filtering sees only one signal: the interaction itself. It does not know why a user bought a product, whether they returned it, what they browsed before buying, which products share suppliers or attributes, or how product reviews connect buyers to each other.
The result is recommendations that are obvious (bestsellers, frequently co-purchased items) but rarely surprising. The long tail of your product catalog - where margins are often highest - gets almost no recommendation coverage. And new products with zero interaction history get recommended to nobody.
What makes enterprise recommendations different
Enterprise recommendation systems face three challenges that most off-the-shelf engines handle poorly:
- Massive, sparse product catalogs. An enterprise retailer may have 500,000+ SKUs. Most products have very few interactions. Collaborative filtering concentrates recommendations on the head of the distribution - popular products that need the least recommendation help. The long tail, where discovery actually matters, gets almost zero coverage.
- The cold-start problem. New products are added daily. A product with zero purchase history cannot be recommended by collaborative filtering. Traditional workarounds (content-based fallbacks, manual merchandising rules) are crude and do not scale. The first 30 days of a product's life - when recommendation coverage matters most for sell-through - are exactly when collaborative filtering fails.
- Multi-signal complexity. Enterprise data is relational: purchases, views, returns, reviews, wishlists, category hierarchies, supplier relationships, seasonal patterns. Flattening this into a user-item matrix throws away most of the signal. A return is not just a negative purchase - it tells you about product quality, size fit, expectation mismatch. A view without purchase tells you about interest without conversion. These signals matter.
The 7 best recommendation engines, compared
recommendation_engine_comparison
| Tool | Approach | Cold-Start Handling | Multi-Signal (purchases+views+returns) | Real-Time | Explainability | Best For |
|---|---|---|---|---|---|---|
| Kumo.ai | Multi-table relational GNN | Yes - via relational graph structure | Yes - learns from full relational graph | Batch + near real-time | PQL queries + feature importance | Enterprise with complex relational product data |
| Amazon Personalize | Collaborative filtering + deep learning | Limited - popularity fallback | No - flat interaction data only | Yes | Limited | AWS-native teams wanting managed recs |
| Dynamic Yield | Rules + collaborative filtering + A/B testing | Limited - rule-based fallback | Partial - web + email + app signals | Yes | A/B test attribution | Marketing teams wanting personalization + testing |
| Bloomreach | Commerce-focused search + merch + recs | Limited - content-based fallback | Partial - search + browsing + purchase | Yes | Merchandising dashboards | Commerce teams wanting search + recs unified |
| Algolia Recommend | Search-integrated collaborative filtering | Limited - trending fallback | No - interaction events only | Yes | Limited | Dev teams wanting fast API-first deployment |
| Google Recommendations AI | Cloud-native deep learning | Limited - catalog attribute fallback | Partial - catalog + interactions | Yes | Limited | GCP-native retail teams |
| Recombee | API-first collaborative + content-based | Partial - content-based hybrid | No - interaction events only | Yes | Limited | Multi-domain teams wanting API flexibility |
Highlighted: Kumo.ai is the only engine that ingests multi-table relational data and handles cold-start via graph structure. All other engines rely primarily on interaction data, which structurally limits their coverage of new products and multi-signal patterns.
1. Kumo.ai - GNN-based relational recommendations
Kumo.ai takes a fundamentally different approach to recommendations. Instead of building a user-item interaction matrix, it connects directly to your relational data warehouse and reads the raw tables: purchases, product views, returns, reviews, category hierarchies, supplier relationships, and any other relational data you have.
The system represents your data as a temporal heterogeneous graph. Each customer, each product, each purchase, each view, each return, each review becomes a node. Foreign key relationships become edges. The graph neural network then traverses this structure, learning which cross-table patterns predict what a customer will buy next.
Why the relational approach transforms recommendations
Consider a concrete example. A collaborative filtering engine sees: "User A bought Product X." That is one signal. Kumo's GNN sees:
- User A bought Product X, viewed Products Y and Z but did not buy them (interest without conversion)
- User A returned Product W in the same category (fit/quality signal)
- Product X shares a supplier and price range with Product Q, which was highly rated by users similar to A
- Users who reviewed Product X positively also bought Product R, which is new and has only 3 purchases so far
Each of these signals requires traversing multiple tables and multiple hops in the relational graph. Collaborative filtering cannot represent them because it operates on a single interaction matrix. The GNN discovers these multi-hop patterns automatically, without manual feature engineering.
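To make the multi-hop idea concrete, here is a toy sketch that turns two small relational tables into an adjacency map and walks two hops from a user. The tables, names, and hand-written traversal are illustrative assumptions, not Kumo internals - a GNN learns which of these paths matter rather than enumerating them:

```python
from collections import defaultdict

# Toy relational tables (illustrative rows only).
purchases = [("A", "X"), ("B", "X"), ("B", "R")]
reviews   = [("C", "X", 5), ("C", "R", 5)]  # reviewer C rated X and R highly

# Foreign keys become edges: one user<->product edge per table row.
edges = defaultdict(set)
for user, product in purchases:
    edges[("user", user)].add(("product", product))
    edges[("product", product)].add(("user", user))
for user, product, rating in reviews:
    if rating >= 4:  # keep only positive reviews as edges
        edges[("user", user)].add(("product", product))
        edges[("product", product)].add(("user", user))

def two_hop(user):
    """Products two hops away: user -> product -> neighbor -> product.
    This reach is exactly what a single user-item matrix cannot express."""
    start = ("user", user)
    seen = set(edges[start])              # products one hop away
    out = set()
    for product in list(seen):
        for neighbor in edges[product]:   # users connected to that product
            out |= edges[neighbor] - seen # their products, minus known ones
    return {name for _, name in out}

print(two_hop("A"))  # -> {'R'}: reached via X's co-buyer B and reviewer C
```

User A never touched product R, yet the graph surfaces it through both a purchase path and a review path - the "users who reviewed X positively also bought R" signal from the list above.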
RelBench benchmark results
On the RelBench recommendation benchmark, KumoRFM achieved a MAP@K of 7.29, compared to 1.85 for GraphSAGE and 1.79 for LightGBM. That is a roughly 4x improvement - and it comes from the same underlying data. The difference is structural: KumoRFM learns from the full relational graph while other approaches operate on flattened representations.
relbench_recommendation_benchmark
| Model | MAP@K | Approach | Uses Relational Structure |
|---|---|---|---|
| KumoRFM | 7.29 | Multi-table relational GNN | Yes - full graph |
| GraphSAGE | 1.85 | Single-graph GNN | Partial - single graph only |
| LightGBM | 1.79 | Gradient boosting on flat features | No - flat table |
RelBench recommendation benchmark (zero-shot). KumoRFM's 4x improvement comes from learning across the full relational graph rather than a flattened interaction matrix or single-graph structure.
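For readers unfamiliar with the metric, MAP@K averages per-user ranking precision over all evaluated users. Exact formulations differ slightly in the normalizer; a common one is sketched below with made-up recommendations:

```python
def average_precision_at_k(recommended, relevant, k):
    """AP@K for one user: mean of precision@i at each hit position,
    normalized by min(k, number of relevant items)."""
    hits, score = 0, 0.0
    for i, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / i
    return score / min(len(relevant), k) if relevant else 0.0

def map_at_k(all_recs, all_relevant, k):
    """MAP@K: average AP@K across users."""
    aps = [average_precision_at_k(all_recs[u], all_relevant[u], k)
           for u in all_relevant]
    return sum(aps) / len(aps)

# Illustrative data: two users, top-3 recommendations each.
recs = {"u1": ["a", "b", "c"], "u2": ["x", "y", "z"]}
truth = {"u1": {"a", "c"}, "u2": {"y"}}
print(map_at_k(recs, truth, k=3))  # -> 0.666... (u1: 5/6, u2: 1/2)
```

Because AP@K rewards relevant items ranked early, even small absolute score differences reflect large gaps in how often the right products appear near the top of the list.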
PQL for recommendations
Kumo.ai uses Predictive Query Language (PQL) to define recommendation tasks directly on relational data. Instead of configuring model architectures, you express what you want to predict in a query:
PQL Query
PREDICT LIST_DISTINCT(ORDERS.PRODUCT_ID, 0, 30, days) RANK TOP 5 FOR EACH CUSTOMERS.CUSTOMER_ID
This query predicts the top 5 distinct products each customer will order in the next 30 days. The system automatically discovers which relational signals - past purchases, viewed products, return history, product category relationships, review patterns - are most predictive for each customer. No feature engineering required.
Output
| customer_id | rank_1 | rank_2 | rank_3 | rank_4 | rank_5 | confidence |
|---|---|---|---|---|---|---|
| C-1001 | SKU-4821 | SKU-7733 | SKU-1209 | SKU-5540 | SKU-8812 | 0.84 |
| C-1002 | SKU-3310 | SKU-9921 | SKU-0045 | SKU-6617 | SKU-2201 | 0.71 |
| C-1003 | SKU-7733 | SKU-4821 | SKU-3310 | SKU-1150 | SKU-9004 | 0.79 |
| C-1004 | SKU-0045 | SKU-1209 | SKU-5540 | SKU-8812 | SKU-3377 | 0.66 |
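To make the prediction target concrete, this sketch computes the ground-truth label the query describes - the distinct products each customer orders in the 30 days after an anchor date - from hypothetical order rows. This is plain Python over toy data, not the Kumo SDK:

```python
from datetime import date, timedelta

# Hypothetical order rows: (customer_id, product_id, order_date).
orders = [
    ("C-1001", "SKU-4821", date(2024, 1, 5)),
    ("C-1001", "SKU-4821", date(2024, 1, 20)),  # repeat buy, same SKU
    ("C-1001", "SKU-7733", date(2024, 2, 15)),  # outside the window
    ("C-1002", "SKU-3310", date(2024, 1, 12)),
]

def target_window_labels(orders, anchor, days=30):
    """Distinct products each customer orders within `days` after `anchor`
    -- the ground truth the PQL query asks the model to rank."""
    end = anchor + timedelta(days=days)
    labels = {}
    for customer, product, when in orders:
        if anchor <= when < end:
            labels.setdefault(customer, set()).add(product)
    return labels

print(target_window_labels(orders, anchor=date(2024, 1, 1)))
# -> {'C-1001': {'SKU-4821'}, 'C-1002': {'SKU-3310'}}
```

Note the LIST_DISTINCT semantics: C-1001's repeat purchase of SKU-4821 counts once, and the February order falls outside the 30-day window.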
2. Amazon Personalize - AWS managed recommendations
Amazon Personalize is a fully managed recommendation service from AWS. You upload interaction data (clicks, purchases, views), optionally add item and user metadata, and the service trains and hosts recommendation models. It uses a combination of collaborative filtering and deep learning approaches originally developed for Amazon.com's own recommendation systems.
Strengths: Fully managed infrastructure - no ML ops overhead. Deep AWS integration (S3, Lambda, API Gateway). Real-time recommendations with low latency. Supports multiple recommendation types (similar items, personalized ranking, related items). Battle-tested at Amazon scale.
Limitations: Requires flat interaction data - cannot ingest relational tables directly. Cold-start handling is limited to popularity-based fallbacks and optional item metadata. Cannot model multi-hop relational patterns (product returns, review networks, supplier relationships). Explainability is minimal. AWS lock-in.
3. Dynamic Yield - personalization with A/B testing
Dynamic Yield (acquired by Mastercard) is a personalization platform that combines product recommendations with A/B testing, content personalization, and triggered messaging across web, email, and app channels. Its recommendation engine uses a mix of collaborative filtering, rule-based strategies, and merchandising controls.
Strengths: Best-in-class A/B testing framework for recommendation strategies. Cross-channel personalization (web, email, app, push). Strong merchandising controls for marketing teams. Easy to deploy recommendation widgets without engineering support.
Limitations: Recommendations are one feature within a broader personalization platform - not the deepest ML approach. Cold-start relies on rule-based fallbacks. Does not ingest relational data from a warehouse. Better for marketing teams wanting quick personalization than for ML teams wanting recommendation accuracy.
4. Bloomreach - commerce search + recommendations
Bloomreach is a commerce experience platform that unifies product search, merchandising, and recommendations in a single headless solution. Its recommendation engine leverages search and browsing behavior alongside purchase data to generate product suggestions.
Strengths: Unified search + recommendations means search behavior directly informs rec quality. Strong merchandising controls for commerce teams. Headless architecture integrates with any frontend. Good at connecting browsing intent to purchase recommendations.
Limitations: Primarily commerce-focused - not suited for non-retail recommendation use cases. Cold-start handling relies on content-based attributes and merchandising rules. Does not model multi-table relational patterns beyond search and purchase. Recommendation depth is bounded by the signals available within the Bloomreach ecosystem.
5. Algolia Recommend - search-integrated, API-first
Algolia Recommend extends the Algolia search platform with recommendation capabilities. If you already use Algolia for search, adding recommendations is a straightforward API extension. It supports frequently bought together, related products, and trending items based on interaction events.
Strengths: Fastest deployment if you already use Algolia search. Clean API-first design - developers can integrate in hours, not weeks. Search and recommendation signals reinforce each other. Low latency, globally distributed infrastructure.
Limitations: Recommendation models are relatively simple compared to dedicated ML engines. Limited to interaction events (clicks, conversions) - cannot ingest relational data like returns, reviews, or supplier relationships. Cold-start handling is basic (trending/popular fallback). Best for teams that want good-enough recommendations deployed fast, not maximum recommendation accuracy.
6. Google Recommendations AI - cloud-native retail recs
Google Recommendations AI is a managed service within Google Cloud that provides product recommendations for retail. It integrates with Google's product catalog format and uses deep learning models trained on interaction and catalog data. The service is designed specifically for retail use cases.
Strengths: Deep integration with Google Cloud retail APIs and product catalogs. Benefits from Google's ML infrastructure and research. Handles large catalogs well. Real-time serving with auto-scaling. Product catalog attributes contribute to recommendations beyond pure interactions.
Limitations: Retail-focused - not suited for non-retail recommendation use cases. GCP lock-in. Cannot ingest arbitrary relational data from a data warehouse. Cold-start is improved by catalog attributes but still limited compared to full relational graph approaches. Explainability is minimal.
7. Recombee - API-first, multi-domain recommendations
Recombee is an API-first recommendation engine that supports multiple domains (e-commerce, media, jobs, real estate) with a single platform. It uses a hybrid approach combining collaborative filtering and content-based methods, with real-time model updates as new interactions arrive.
Strengths: Versatile - works across e-commerce, media, jobs, and other domains. Real-time model updates without batch retraining. Clean REST API with good documentation. Hybrid collaborative + content-based approach provides partial cold-start handling. Flexible enough for non-standard recommendation use cases.
Limitations: Operates on interaction events and item properties - cannot ingest multi-table relational data. Hybrid content-based approach helps cold-start but does not match the coverage of full relational graph methods. Less enterprise-focused than some alternatives. Limited explainability.
The collaborative filtering ceiling: why multi-hop signals matter
The fundamental limitation of collaborative filtering is that it operates on a single edge type: user interacted with item. Every other signal in your data is invisible. Here is what that means in practice:
recommendation_signal_comparison
| Signal Type | Example | Visible to Collaborative Filtering | Visible to GNN on Relational Graph |
|---|---|---|---|
| Direct purchase | User A bought Product X | Yes | Yes |
| View without purchase | User A viewed Product Y 5 times but did not buy | Only if tracked as an interaction | Yes - interest without conversion signal |
| Return pattern | User A returned Product W (size mismatch) | No - returns are a separate table | Yes - negative signal with reason |
| Review network | Users who positively reviewed X also bought Z | No - reviews are a separate table | Yes - multi-hop traversal |
| Product graph | Product X shares supplier and category with Product Q | No - product metadata is not interaction data | Yes - enables cold-start recommendations |
| Cross-category discovery | Users who buy in category A then explore category B | Weak - limited to co-purchase | Yes - sequential browsing patterns |
Highlighted: return patterns, review networks, and product graph signals are invisible to collaborative filtering because they live in separate relational tables. These signals are precisely what separate obvious recommendations (bestsellers) from genuinely useful product discovery.
The implication is clear. If your product catalog is large, your data spans multiple tables, and you care about long-tail coverage and new product discovery, a collaborative filtering engine is structurally limited. It will keep recommending popular products to everyone while the long tail collects dust.
Cold-start: the billion-dollar blind spot
The cold-start problem is not just a technical inconvenience - it is a direct revenue problem. New products need recommendation coverage most in their first 30 days, when organic discovery is lowest. A collaborative filtering engine provides zero coverage during exactly this window.
Traditional workarounds are crude: show new products to everyone (popularity-based), match on content attributes (basic content-based filtering), or manually merchandise new products into recommendation slots. None of these approach the quality of personalized recommendations.
Kumo.ai's GNN solves cold-start structurally. A new product with zero purchases still exists in the relational graph: it has a category, a supplier, attributes (size, color, price range), and those attributes connect it to products with rich purchase histories. The GNN traverses these connections to generate personalized recommendations for the new product immediately. On the RelBench benchmark, this zero-shot capability is what drives much of the 4x MAP@K improvement.
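The structural point can be illustrated with a toy sketch: even with zero purchases, a new product has attribute edges. The catalog, SKUs, and the shared-attribute scoring rule below are made-up illustrations - a GNN learns these connections rather than counting matches - but they show why a purchase-free product is not recommendation-free:

```python
# Hypothetical catalog: product -> attributes that form graph edges.
# SKU-NEW has zero purchase history; the others have plenty.
catalog = {
    "SKU-NEW":  {"category": "boots", "supplier": "acme", "price_band": "mid"},
    "SKU-4821": {"category": "boots", "supplier": "acme", "price_band": "mid"},
    "SKU-9921": {"category": "boots", "supplier": "other", "price_band": "low"},
}

def attribute_neighbors(sku):
    """Score other products by shared attribute values -- the edges a new
    product has even before its first sale. A crude stand-in for learned
    graph structure, shown only to make the structural point."""
    target = catalog[sku]
    scores = {}
    for other, attrs in catalog.items():
        if other == sku:
            continue
        scores[other] = sum(attrs[k] == v for k, v in target.items())
    return scores

# SKU-NEW's attribute edges tie it to SKU-4821 (3 shared values) far more
# strongly than to SKU-9921 (1), so buyers of SKU-4821 become immediate
# candidates for SKU-NEW -- no interaction history required.
print(attribute_neighbors("SKU-NEW"))  # -> {'SKU-4821': 3, 'SKU-9921': 1}
```

A collaborative filtering engine has an all-zero row for SKU-NEW in its interaction matrix and therefore nothing to factorize; the graph view starts with usable structure on day one.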
How to choose the right engine
The right recommendation engine depends on your data complexity, your catalog size, and what you are optimizing for.
recommendation_engine_selection_guide
| If you... | Consider | Why |
|---|---|---|
| Are on AWS and want managed recs fast | Amazon Personalize | Deepest AWS integration, fully managed infrastructure |
| Want personalization + A/B testing for marketing | Dynamic Yield | Best testing framework, cross-channel personalization |
| Need unified search + recommendations for commerce | Bloomreach | Search and rec signals reinforce each other |
| Already use Algolia and want recs added quickly | Algolia Recommend | Fastest deployment, same API ecosystem |
| Are on GCP with a retail product catalog | Google Recommendations AI | Deep GCP and product catalog integration |
| Need multi-domain API-first flexibility | Recombee | Most versatile across domains, real-time updates |
| Have complex relational data and need maximum accuracy | Kumo.ai | Only engine that learns from full relational graph, solves cold-start, 4x MAP@K improvement |
Highlighted: if your data spans multiple tables (purchases, views, returns, reviews, product relationships) and you need coverage for new products and the long tail, the relational graph approach captures signals that interaction-based engines structurally cannot.
The recommendation ceiling is a data ceiling
The most important insight in enterprise recommendations is that the quality ceiling of most engines is not a model limitation - it is a data limitation. Better matrix factorization or deeper neural collaborative filtering on the same interaction matrix yields diminishing returns. You are optimizing within a constrained information space.
The jump from interaction-only data to full relational data unlocks an entirely new class of signals: return patterns that indicate product-market fit, review networks that connect buyers with similar taste, product graph structure that enables cold-start coverage, cross-category browsing sequences that reveal emerging interests. This is why KumoRFM achieves 7.29 MAP@K where GraphSAGE achieves 1.85: not a better algorithm on the same data, but the same class of algorithm on fundamentally richer data.
For enterprises with complex product catalogs and multi-table transactional data, the question is not "which recommendation algorithm should we use?" It is "which engine can read our full relational data without flattening it into an interaction matrix?"