Reddit is one of the most complex recommendation environments on the internet. Billions of posts across millions of subreddits, hundreds of millions of users with constantly shifting interests, and a community-driven structure where context matters as much as content. A post that thrives in r/MachineLearning might be irrelevant in r/datascience, despite overlapping audiences.
For years, content platforms like Reddit improved their recommendations incrementally: adding new engagement signals, tuning collaborative filtering models, engineering features one at a time. Each iteration cycle took months and yielded small accuracy gains. Then a graph-based approach compressed 4-5 years of that iterative improvement into 2 months.
This article explains why. Not by speculating about Reddit's internal systems, but by analyzing what makes content recommendation fundamentally a graph problem - and why relational deep learning discovers patterns that flat-table approaches structurally cannot.
The headline result: SAP SALT benchmark
The SAP SALT benchmark is an enterprise-grade evaluation in which real business analysts and data scientists attempt prediction tasks on SAP enterprise data: production-quality databases with multiple related tables. It measures how accurately different approaches predict real business outcomes on that data.
| Approach | Accuracy | What it means |
|---|---|---|
| LLM + AutoML | 63% | Language model generates features, AutoML selects model |
| PhD Data Scientist + XGBoost | 75% | Expert spends weeks hand-crafting features, tunes XGBoost |
| KumoRFM (zero-shot) | 91% | No feature engineering, no training, reads relational tables directly |
SAP SALT benchmark: KumoRFM outperforms expert data scientists by 16 percentage points and LLM+AutoML by 28 percentage points on real enterprise prediction tasks.
KumoRFM scores 91% where PhD-level data scientists with weeks of feature engineering and hand-tuned XGBoost score 75%. The 16 percentage point gap is the value of reading relational data natively instead of flattening it into a single table.
The content recommendation challenge
To recommend content effectively on a platform like Reddit, a system needs to understand multiple interacting signals simultaneously:
- User interests - inferred from upvotes, comments, subscriptions, time spent reading, and what users choose to skip.
- Subreddit relationships - similar communities share overlapping user bases (r/MachineLearning and r/datascience), topical connections (r/cooking and r/MealPrepSunday), or cultural similarities.
- Post quality signals - karma, comment depth, upvote-to-view ratio, whether comments are substantive or shallow.
- Temporal freshness - a breaking news post matters now; a tutorial is relevant for months. Different content types have different decay curves.
- Cross-community interests - a user active in r/Python and r/datascience might enjoy a post in r/MLOps that they have never visited.
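The temporal-freshness point above can be made concrete with per-content-type decay curves. A minimal sketch, assuming exponential decay; the 6-hour and 30-day half-lives are illustrative values, not Reddit's actual parameters:

```python
import math

def freshness(age_hours: float, half_life_hours: float) -> float:
    """Exponential decay: the score halves every `half_life_hours`."""
    return 0.5 ** (age_hours / half_life_hours)

# Hypothetical half-lives for two content types.
NEWS_HALF_LIFE = 6.0        # breaking news fades within hours
TUTORIAL_HALF_LIFE = 720.0  # a tutorial stays relevant for weeks

# After 24 hours, a news post has lost most of its freshness
# (0.5**4 = 0.0625), while a tutorial is nearly as fresh as when posted.
news_24h = freshness(24, NEWS_HALF_LIFE)
tutorial_24h = freshness(24, TUTORIAL_HALF_LIFE)
```

A single global decay rate would either bury tutorials too early or keep stale news alive; content-type-specific curves are one of the many features a flat-table pipeline must hand-engineer.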
Each of these signals lives in a different table. Users in one. Posts in another. Subreddits in a third. Comments, votes, and subscriptions each in their own tables. The relationships between these entities - who posted where, who commented on what, which subreddits share members - are where the predictive signal lives.
Why collaborative filtering plateaus for content platforms
Traditional recommendation systems treat the problem as a user-item interaction matrix. User A upvoted posts 1, 3, and 7. User B upvoted posts 1, 3, and 9. Therefore User A might like post 9. This is collaborative filtering, and it works reasonably well for simple product recommendations.
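The toy example in the paragraph above fits in a few lines. This is a minimal user-based collaborative filtering sketch using Jaccard similarity over upvote sets (one common similarity choice among several; the users and post IDs are from the example, not real data):

```python
# User A upvoted posts 1, 3, 7; User B upvoted posts 1, 3, 9.
upvotes = {
    "A": {1, 3, 7},
    "B": {1, 3, 9},
}

def jaccard(a: set, b: set) -> float:
    """Overlap between two users' upvote sets."""
    return len(a & b) / len(a | b)

def recommend(user: str) -> list:
    """Rank posts the user hasn't seen by the similarity
    of the other users who upvoted them."""
    scores = {}
    for other, items in upvotes.items():
        if other == user:
            continue
        sim = jaccard(upvotes[user], items)
        for post in items - upvotes[user]:
            scores[post] = scores.get(post, 0.0) + sim
    return sorted(scores, key=scores.get, reverse=True)

# A and B share 2 of their 4 distinct posts (similarity 0.5),
# so post 9 is recommended to A.
```

Note what the function never sees: which subreddits the posts live in, who commented on them, or how the users' interests have shifted. Everything outside the interaction matrix is invisible.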
But for content platforms with rich relational structure, collaborative filtering hits fundamental limits:
| Signal | What CF sees | What is lost |
|---|---|---|
| Community structure | User upvoted Post X | Post X is in a subreddit cluster (r/ML, r/datascience, r/AI) with 60% user overlap |
| Comment depth as engagement | User interacted with Post Y | Post Y generated 200+ deep comment threads - a quality signal visible only in the comment graph |
| User interest evolution | User's recent upvotes | User shifted from r/learnpython to r/MachineLearning to r/MLOps over 6 months - a trajectory, not a snapshot |
| Cross-modality patterns | User reads text posts | User who reads ML text posts in r/MachineLearning may engage with ML video tutorials in r/learnmachinelearning |
| Cold-start subreddits | No data (new community) | New subreddit r/LLMOps is topically similar to r/MLOps and r/LangChain - inferrable from description, creator history, and early subscribers |
Collaborative filtering sees the user-item interaction matrix. Everything in the third column - community structure, engagement quality, interest trajectories, cross-modality patterns, and cold-start inference - requires reading the full relational graph.
Each of these blind spots can be addressed individually by engineering features: compute subreddit similarity scores, build comment depth metrics, create user interest trajectory features. But each feature takes weeks to design, implement, validate, and deploy. After 4-5 years of this iterative process, a mature recommendation system has hundreds of hand-crafted features - and still misses interaction patterns that were never explicitly engineered.
The graph approach to content recommendations
Users, posts, subreddits, and comments naturally form a massive heterogeneous graph. Each entity type is a different kind of node. Each relationship - upvotes, subscriptions, authorship, comment replies - is a different kind of edge.
| Node type | Examples | Key attributes |
|---|---|---|
| User | Hundreds of millions | Account age, karma, activity pattern, subscriptions |
| Post | Billions | Title, content type (text/image/video/link), karma, timestamp |
| Subreddit | Millions | Topic, subscriber count, activity level, rules, related communities |
| Comment | Tens of billions | Text, depth in thread, karma, timestamp, parent comment |
Each entity type becomes a node in the graph. The relationships between them - upvotes, posts, subscriptions, replies - become edges. This is the natural structure of the data.
A graph neural network processes this structure by passing messages along edges. Information about a subreddit's user base flows to the posts within it. Information about comment quality flows up to the post. Information about a user's subscription patterns flows to their activity predictions. Each message-passing layer lets information travel one hop further through the graph.
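A minimal sketch of that message-passing step, on a hypothetical three-edge graph with made-up feature vectors. Real GNNs apply learned transformations to the messages; plain averaging is used here only to show how information moves one hop per layer:

```python
from collections import defaultdict

# Tiny typed graph: two hypothetical users, a post, and a subreddit,
# each carrying a 2-d feature vector. Edges are (source, destination).
features = {
    "user:alice":  [1.0, 0.0],
    "user:bob":    [0.0, 1.0],
    "post:p1":     [0.5, 0.5],
    "sub:r/mlops": [0.0, 0.0],
}
edges = [
    ("user:alice", "sub:r/mlops"),  # subscription
    ("user:bob", "sub:r/mlops"),    # subscription
    ("post:p1", "sub:r/mlops"),     # post belongs to subreddit
]

def message_pass(feats, edges):
    """One layer: each node averages its in-neighbors' features
    together with its own. Stacking layers lets information
    travel one hop further per layer."""
    incoming = defaultdict(list)
    for src, dst in edges:
        incoming[dst].append(feats[src])
    out = {}
    for node, vec in feats.items():
        msgs = incoming[node] + [vec]
        out[node] = [sum(dim) / len(msgs) for dim in zip(*msgs)]
    return out

layer1 = message_pass(features, edges)
# The subreddit node now blends its subscribers' and posts' features:
# mean of [1,0], [0,1], [0.5,0.5], [0,0] = [0.375, 0.375].
```

After one layer the subreddit summarizes its members; after a second, a post inside it would inherit that summary, which is exactly the "subreddit's user base flows to its posts" behavior described above.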
After multiple layers, the GNN has learned representations that encode multi-hop patterns:
- Community overlap patterns. Users who subscribe to r/MachineLearning and r/statistics have different content preferences than users who subscribe to r/MachineLearning and r/startups - even though both groups subscribe to the same subreddit. The GNN captures this distinction through the subscription edges.
- Content quality propagation. A post's quality is not just its karma score. It is the depth and substance of its comment threads, the reputation of its commenters, and the engagement patterns of similar posts in related subreddits. These signals propagate through upvote and comment edges.
- User interest evolution. By preserving temporal information on edges, the GNN learns trajectories: a user moving from beginner to advanced topics, shifting from one domain to another, increasing or decreasing engagement over time.
- Cold-start inference. A new subreddit with 50 subscribers is connected to the graph through those subscribers' other activity. If its early members are all active in data engineering communities, the GNN infers what content belongs there - without waiting for thousands of interactions.
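The cold-start inference in the last point can be sketched as simple co-membership counting. The subreddit names and memberships below are invented for illustration, and a real GNN would learn this from embeddings rather than raw counts, but the underlying signal is the same:

```python
from collections import Counter

# Hypothetical other memberships of a new subreddit's first subscribers.
member_subs = {
    "u1": ["r/dataengineering", "r/MLOps"],
    "u2": ["r/dataengineering", "r/Python"],
    "u3": ["r/dataengineering", "r/kubernetes"],
}

def infer_topic_neighbors(member_subs, top_n=2):
    """Rank existing communities by how many early members
    of the new subreddit they share."""
    counts = Counter(s for subs in member_subs.values() for s in subs)
    return [sub for sub, _ in counts.most_common(top_n)]

neighbors = infer_topic_neighbors(member_subs)
# r/dataengineering dominates the early members' other subscriptions,
# so the new community is likely data-engineering adjacent.
```

With only three subscribers and zero posts, the new community already has a usable position in the graph - no thousands of interactions required.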
4-5 years of improvement in 2 months
The key result: what took 4-5 years of iterative collaborative filtering improvement was achieved in 2 months with relational deep learning. This is not about a better algorithm marginally outperforming the old one. It is about a fundamentally different representation of the data.
Years of manual feature engineering - computing subreddit similarity matrices, building user interest decay functions, engineering comment quality scores, creating cross-community affinity features - were replaced by a model that reads the raw relational structure and discovers these patterns automatically.
The GNN did not just replicate the hand-crafted features. It discovered patterns that years of feature engineering had missed: interaction effects between community structure and content type, temporal patterns in cross-community migration, and engagement signals that only become visible when you model the full graph.
Collaborative Filtering (4-5 years iterative)
- Flattens data to user-item interaction matrix
- Each new signal requires manual feature engineering
- Misses community structure and cross-entity patterns
- Cold-start requires separate heuristic systems
- Improvement cycle: months per incremental gain
GNN-Based Recommendations (2 months)
- Reads the full heterogeneous graph directly
- Discovers signals automatically from relational structure
- Captures community overlap, comment quality, interest evolution
- Cold-start handled through graph connectivity
- Discovers patterns that years of feature engineering missed
RelBench recommendation benchmarks
The RelBench benchmark provides an independent measure of how different approaches perform on recommendation tasks across real-world relational databases. The results quantify the gap between flat-table approaches and graph-based methods:
| Approach | MAP@K | Approach type | What it reads |
|---|---|---|---|
| LightGBM | 1.79 | Tabular ML + manual features | Flat feature table |
| GraphSAGE | 1.85 | Basic GNN | Graph structure (limited message passing) |
| KumoRFM | 7.29 | Foundation model for relational data | Full heterogeneous temporal graph |
KumoRFM achieves roughly 4x the MAP@K of both the tabular and basic GNN approaches on recommendation tasks. The gap comes from reading the full relational structure with a pre-trained foundation model, not just from applying a GNN architecture.
The 4x improvement is not from a better algorithm on the same data. It is from a better representation of the data. LightGBM sees a flat feature table. GraphSAGE sees graph structure but with limited expressiveness. KumoRFM reads the full heterogeneous temporal graph with a model pre-trained on thousands of diverse relational databases.
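For readers unfamiliar with the metric in the table: MAP@K rewards ranking relevant items near the top of each user's list. A self-contained sketch of how it is computed, with made-up users, posts, and relevance labels:

```python
def average_precision_at_k(ranked, relevant, k):
    """AP@K: precision at each rank where a relevant item appears,
    averaged over min(number of relevant items, k)."""
    if not relevant:
        return 0.0
    hits, score = 0, 0.0
    for i, item in enumerate(ranked[:k]):
        if item in relevant:
            hits += 1
            score += hits / (i + 1)  # precision at this rank
    return score / min(len(relevant), k)

def map_at_k(rankings, relevants, k):
    """Mean of AP@K across users."""
    aps = [average_precision_at_k(r, rel, k)
           for r, rel in zip(rankings, relevants)]
    return sum(aps) / len(aps)

# Hypothetical top-3 rankings for two users.
rankings = [["p1", "p2", "p3"], ["p4", "p5", "p6"]]
relevants = [{"p1", "p3"}, {"p6"}]
score = map_at_k(rankings, relevants, 3)  # (5/6 + 1/3) / 2 = 7/12
```

Because every hit pushed lower in the list drags down precision at that rank, a 4x MAP@K gap means the model is consistently surfacing relevant content far earlier, not just finding slightly more of it.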
What each approach captures
| Signal type | Collaborative filtering | GNN-based recommendations |
|---|---|---|
| Direct user-item interactions | Yes (upvotes, clicks) | Yes (plus context from the full graph) |
| Community structure | No (requires manual clustering) | Yes (learned from subscription and activity edges) |
| Content quality (beyond karma) | No (requires engineered features) | Yes (propagated from comment depth and engagement patterns) |
| User interest evolution | Limited (recent window only) | Yes (temporal edges preserve full trajectory) |
| Cross-community discovery | No (limited to co-occurrence) | Yes (multi-hop paths through shared users and topics) |
| Cold-start entities | No (no interaction history) | Yes (inferred from graph connectivity) |
| Cross-modality preferences | No (separate models per content type) | Yes (content type is a node attribute, not a silo) |
| Multi-hop patterns | No (pairwise only) | Yes (user → subreddit → similar subreddit → trending post) |
Collaborative filtering captures direct user-item interactions. GNN-based recommendations capture everything else: the relational structure that determines why a user will engage with content they have never seen.
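The multi-hop row in the table can be made concrete with a breadth-first search on a toy graph. Every node and edge here is invented for illustration; the point is that a path exists from a user to content they have never touched, via a subreddit and a shared user:

```python
from collections import deque

# Toy undirected-ish adjacency: a user, two subreddits linked by a
# shared subscriber, and a trending post in the second subreddit.
graph = {
    "user:alice": ["sub:r/MLOps"],
    "sub:r/MLOps": ["user:alice", "user:bob"],
    "user:bob": ["sub:r/MLOps", "sub:r/LLMOps"],
    "sub:r/LLMOps": ["user:bob", "post:trending"],
    "post:trending": ["sub:r/LLMOps"],
}

def shortest_path(start, goal):
    """BFS: return the chain of hops connecting two nodes, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

path = shortest_path("user:alice", "post:trending")
# alice -> r/MLOps -> bob -> r/LLMOps -> trending post: four hops.
```

Pairwise co-occurrence methods never traverse this chain; a GNN with enough message-passing layers propagates signal along it automatically.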
Building recommendations with PQL
With a relational foundation model, building a content recommendation system does not require months of feature engineering and model iteration. It requires describing what you want to predict.
PQL Query
PREDICT engagement FOR EACH users.user_id, posts.post_id WHERE posts.created_at > CURRENT_DATE - INTERVAL '7 days'
One query replaces the entire recommendation pipeline: user profiling, content scoring, community analysis, and ranking. The foundation model reads raw relational tables - users, posts, subreddits, comments, votes - and discovers which content each user will engage with.
Output
| user_id | post_id | engagement_score | primary_signal |
|---|---|---|---|
| U-44201 | P-891034 | 0.92 | Community overlap + interest trajectory |
| U-44201 | P-891107 | 0.87 | Cross-community topic match |
| U-44201 | P-892441 | 0.71 | High comment quality in related subreddit |
| U-44201 | P-890022 | 0.13 | Low community relevance |
Why this matters beyond Reddit
Reddit's experience is a case study in a general pattern. Any platform with rich relational structure - users, items, categories, interactions, temporal dynamics - faces the same fundamental choice: flatten the data into feature tables and iterate for years, or model the relational graph directly and discover patterns in weeks.
E-commerce platforms have customers, products, categories, reviews, and browsing sessions. Streaming services have viewers, content, genres, ratings, and watch patterns. Social networks have users, posts, connections, groups, and engagement events. In every case, the predictive signal lives in the relationships between entities, not in any single flat table.
The 4-5 years vs 2 months result is not specific to Reddit. It is specific to the gap between manually engineering relational patterns into flat features and automatically learning them from the graph structure. That gap exists anywhere relational data powers recommendations.