
Relational Foundation Models vs Tabular Foundation Models: Why Flat Tables Lose Signal

Tabular foundation models read one table. Enterprise data lives in 5-50 tables. When you flatten relational data for a tabular model, you destroy multi-hop relationships and lose the signal that drives accuracy. Relational foundation models read the full structure natively.

TL;DR

  • On the SAP SALT enterprise benchmark, KumoRFM scores 91% accuracy vs 75% for PhD data scientists with XGBoost and 63% for LLM+AutoML - with zero feature engineering and zero training time.
  • Tabular foundation models (TabPFN, Nexus) operate on a single flat table. Relational foundation models (KumoRFM) operate on multiple connected tables natively. Enterprise data lives in 5-50 related tables, not one CSV.
  • Flattening relational data destroys three categories of signal: multi-hop relationships, temporal sequences, and graph topology. Humans explore only 4-17% of the first-order feature space. For a 5-table database with 50 columns, there are 1,200 first-order features, 719,400 pairwise interactions, and ~8,000 multi-hop features.
  • On RelBench (7 databases, 30 tasks, 103M+ rows), KumoRFM zero-shot scores 76.71 AUROC vs. 62.44 for LightGBM with manual features.
  • KumoRFM zero-shot takes ~1 second. Manual feature engineering takes 12.3 hours and 878 lines of code per task. Fine-tuning improves accuracy by 10-30% over zero-shot.

A new category of foundation models is emerging for structured data. TabPFN (published in Nature by PriorLabs) and Nexus (by Fundamental, valued at $1.2B with $255M in funding) call themselves tabular foundation models. KumoRFM (by Kumo.ai) calls itself a relational foundation model.

The naming difference is not marketing. It describes a deep architectural divide: tabular models operate on a single flat table. Relational models operate on multiple connected tables. This distinction determines what these models can and cannot learn from enterprise data.

Enterprise databases are not flat. A typical CRM has leads, contacts, activities, opportunities, and accounts linked by foreign keys. A banking system has customers, accounts, transactions, merchants, and products across dozens of tables. When you flatten this structure into a single table for a tabular model, you lose the multi-hop relationships that drive prediction accuracy.

Foundation model comparison

| Capability | Relational FM (KumoRFM) | Tabular FM (TabPFN / Nexus) | AutoML (DataRobot / H2O) | Manual ML (XGBoost / LightGBM) |
|---|---|---|---|---|
| Input format | Multiple relational tables | Single flat table | Single flat table | Single flat table |
| Feature engineering required | None | Flattening required | Full manual pipeline | Full manual pipeline |
| Multi-hop pattern discovery | Native (graph transformer) | Not possible | Not possible | Manual joins only |
| Temporal sequence handling | Native (temporal graph) | Static snapshot only | Static snapshot only | Manual windowing |
| Max data scale | 103M+ rows across 51 tables | ~50K samples (TabPFN) | Varies by platform | Unlimited (manual) |
| Zero-shot prediction | Yes (~1 second) | Yes (~2.8 seconds for TabPFN) | No | No |
| Pre-training data | 10,000s of heterogeneous databases | Synthetic + single-table datasets | N/A | N/A |
| AUROC on RelBench (avg) | 76.71 (zero-shot), 81.14 (fine-tuned) | N/A (single-table only) | ~64-66 (estimated) | 62.44 (LightGBM) |

The key differentiators are input format, multi-hop pattern discovery, and temporal handling. These are architectural differences, not tuning differences.

What flattening actually destroys

The core argument for relational foundation models is that flattening relational data into a single table loses predictive signal. This is not theoretical. Here is a concrete example.

Consider lead scoring in a CRM database with five tables: leads, contacts, activities, opportunities, and accounts. Lead L-302 is a real candidate for conversion.

What tabular models receive (flattened row for L-302)

| lead_id | emails_opened | pages_viewed | days_since_signup | company_size |
|---|---|---|---|---|
| L-302 | 4 | 22 | 30 | 200 |

After flattening to a single row, L-302 looks like a mediocre lead: few emails opened, moderate page views, small company. A tabular foundation model or XGBoost sees only this.
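
To see the mechanics, here is a minimal flattening step in Python with pandas. The table and column names are illustrative, not from any specific product:

```python
import pandas as pd

# Raw activities table: one row per event, ordered in time.
activities = pd.DataFrame({
    "lead_id": ["L-302"] * 4,
    "event": ["blog_view", "case_study", "api_docs", "demo_request"],
    "ts": pd.to_datetime(
        ["2024-05-01", "2024-05-08", "2024-05-15", "2024-05-20"]
    ),
})

# A typical flattening step: collapse the event sequence into aggregates.
flat = activities.groupby("lead_id").agg(
    pages_viewed=("event", "count"),
    days_active=("ts", lambda s: (s.max() - s.min()).days),
)
print(flat)
# The aggregate keeps "4 events over 19 days" but discards the ordering
# blog -> case study -> API docs -> demo, i.e. the buying-stage progression.
```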

What relational models read (raw multi-table signals for L-302)

| Table | Data for L-302 | Signal invisible after flattening |
|---|---|---|
| contacts | 4 contacts from 3 departments active | Multi-threaded buying committee (3 departments engaged) |
| activities | Blog → Case study → API docs → Demo (in sequence) | Buying-stage content progression (awareness → evaluation → technical → purchase) |
| opportunities | Similar account closed $210K last quarter | Account similarity to past closed-won deals |
| accounts | Company raised Series B 30 days ago | Firmographic momentum (fresh funding = budget available) |

The relational model reads all four tables directly. L-302 has a multi-threaded buying committee, a textbook content progression, account similarity to a $210K closed deal, and fresh Series B funding. None of these signals survive flattening into emails_opened=4, pages_viewed=22.

The feature space explosion

The information loss from flattening is not a failure of diligence; it is a matter of combinatorial scale. For a database with 5 tables and 50 columns (the arithmetic is sketched in code after this list):

  • 1,200 first-order features - column-aggregation combinations (sum, mean, count, max, min across time windows for each numeric column, per join path)
  • 719,400 pairwise interactions - combinations of first-order features (ratios, products, differences)
  • ~8,000 multi-hop features - patterns that span 3+ table joins (customer → orders → products → return rates)
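
Where do those numbers come from? A back-of-the-envelope sketch, assuming 6 aggregation functions and 4 time windows per column - the exact factors vary by setup, but the shape of the explosion does not:

```python
from math import comb

columns = 50       # numeric columns across the 5-table database
aggregations = 6   # assumed: sum, mean, count, max, min, std
windows = 4        # assumed: 7d, 30d, 90d, all-time

# First-order features: one aggregate per (column, aggregation, window).
first_order = columns * aggregations * windows
print(first_order)           # 1200

# Pairwise interactions: every unordered pair of first-order features,
# each a candidate ratio, product, or difference.
print(comb(first_order, 2))  # 719400

# A human exploring 4-17% of the first-order space builds ~48-204
# features and never reaches the pairwise or multi-hop candidates.
```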

A Stanford study found that human data scientists explore only 4-17% of the first-order feature space. They never touch the pairwise or multi-hop spaces because the combinatorics are intractable manually. Feature engineering takes 12.3 hours and 878 lines of code per task.

Tabular foundation models skip this problem entirely - not by solving it, but by requiring someone else to solve it first. They receive a pre-flattened table and operate within whatever feature space a human decided to create. Relational foundation models search the full feature space automatically because they read the raw relational structure.

Tabular Foundation Models (TabPFN, Nexus)

  • Operate on a single flat table
  • Require pre-flattened input (someone must join and aggregate)
  • Cannot discover multi-hop relationships across tables
  • Cannot preserve temporal event sequences
  • Limited to ~50K samples (TabPFN) or single-table scale
  • Search only the feature space that humans pre-built

Relational Foundation Model (KumoRFM)

  • Operate on multiple connected tables natively
  • Read raw relational databases with foreign keys directly
  • Discover multi-hop patterns (customer → orders → products → returns)
  • Preserve temporal dynamics as first-class graph structure
  • Tested on 103M+ rows across 51 tables (RelBench)
  • Search the full combinatorial feature space automatically

Benchmark results: RelBench

RelBench is the standard benchmark for relational prediction tasks: 7 databases, 30 tasks, 103 million+ rows across 51 tables. It tests whether models can extract predictive signal from multi-table relational data.

AUROC (Area Under the Receiver Operating Characteristic curve) measures how well a model distinguishes positive from negative outcomes. An AUROC of 50 means random guessing; 100 means perfect prediction. Moving from 65 to 77 AUROC means the model correctly ranks a randomly chosen true positive above a randomly chosen true negative 77% of the time instead of 65%.
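
That pairwise-ranking reading of AUROC is exact, and a few lines of Python make it concrete (the scores below are invented for illustration):

```python
from itertools import product

# Toy model scores for 3 positives and 3 negatives (illustrative values).
pos_scores = [0.9, 0.7, 0.4]
neg_scores = [0.8, 0.3, 0.2]

# AUROC = fraction of (positive, negative) pairs where the positive
# is scored higher; ties count as half.
pairs = list(product(pos_scores, neg_scores))
auroc = sum(
    1.0 if p > n else 0.5 if p == n else 0.0 for p, n in pairs
) / len(pairs)
print(auroc)  # 0.777... -> the model ranks a true positive above
              # a true negative in ~78% of pairs
```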

RelBench classification benchmarks (average across 12 tasks)

| Approach | AUROC | Human hours per task | Notes |
|---|---|---|---|
| LightGBM + manual features | 62.44 | 12.3 | 878 lines of feature code per task |
| LLM baseline (Llama 3.2 3B) | 68.06 | ~0.5 | Prompt-based, no relational reasoning |
| KumoRFM zero-shot | 76.71 | ~0.001 | ~1 second, no feature engineering |
| KumoRFM fine-tuned | 81.14 | ~0.1 | 10-30% improvement over zero-shot |

KumoRFM zero-shot outperforms manually engineered features by more than 14 AUROC points while requiring ~1 second instead of 12.3 hours. Fine-tuning adds another 4.4 points.

SAP SALT benchmark

| Approach | Accuracy | Context |
|---|---|---|
| LLM + AutoML | 63% | Language model with automated model selection |
| PhD + XGBoost | 75% | Domain expert with manual feature engineering |
| KumoRFM | 91% | Zero-shot relational foundation model |

On the SAP SALT enterprise benchmark, KumoRFM outperforms both automated and expert-manual approaches by 16-28 percentage points.

The gap between LightGBM with manual features (62.44) and KumoRFM zero-shot (76.71) is not about model architecture. LightGBM is a strong algorithm. The gap exists because KumoRFM sees the full relational structure - the multi-hop patterns, the temporal sequences, the graph topology - while LightGBM sees only the features that a human decided to build from 4-17% of the feature space.

How tabular foundation models work

To understand the architectural divide, it helps to know what tabular foundation models actually do.

TabPFN (PriorLabs)

TabPFN, published in Nature, is a prior-data fitted network. It is pre-trained on millions of synthetic datasets that mimic the statistical properties of real-world tabular data. At inference time, it takes a single flat table (up to ~50K samples), treats the training data as context, and predicts the target column in approximately 2.8 seconds. It performs well on small, single-table classification tasks.
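
For reference, the entire single-table workflow with the open-source tabpfn package is a few sklearn-style calls. This sketch follows the package's documented interface; versions differ, so treat the exact constructor arguments as assumptions:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from tabpfn import TabPFNClassifier  # pip install tabpfn

# One flat table in, one target column out: that is the entire contract.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = TabPFNClassifier()    # pre-trained; no task-specific training
clf.fit(X_train, y_train)   # "fit" stores the context set for in-context learning
print(clf.predict_proba(X_test)[:3])
```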

Nexus (Fundamental)

Nexus, developed by Fundamental ($255M in funding, $1.2B valuation), calls itself a “Large Tabular Model.” Like TabPFN, it operates on a single flat table. It is pre-trained on large collections of real-world tabular datasets and uses in-context learning to generate predictions without task-specific training.

The shared limitation

Both TabPFN and Nexus assume their input is a single table with one row per entity and one column per feature. They cannot read a relational database with foreign key relationships. They cannot discover that a customer’s churn risk depends on the return rates of products they bought from merchants in a specific category. That pattern spans four tables and three join hops - it does not exist in any single table.
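
Written out by hand, that pattern is an explicit join chain. A pandas sketch with made-up tables, to show what "three join hops" means mechanically:

```python
import pandas as pd

# Made-up tables mirroring the four-table pattern above.
orders = pd.DataFrame({"order_id": [1, 2, 3],
                       "customer_id": ["C1", "C1", "C2"],
                       "product_id": ["P1", "P2", "P1"]})
products = pd.DataFrame({"product_id": ["P1", "P2"],
                         "merchant_category": ["electronics", "apparel"]})
returns_ = pd.DataFrame({"order_id": [2]})   # order 2 came back

# Hop 1: orders -> products; hop 2: orders -> returns;
# hop 3: aggregate back up to the customer.
joined = (orders.merge(products, on="product_id")
                .assign(returned=lambda d:
                        d["order_id"].isin(returns_["order_id"])))
feature = (joined.groupby(["customer_id", "merchant_category"])["returned"]
                 .mean()
                 .rename("category_return_rate"))
print(feature)
# This one feature requires three joins; no single flat table contains it.
```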

How the relational foundation model works

KumoRFM takes a structurally different approach. It represents your database as a temporal heterogeneous graph: each row in each table becomes a node, each foreign key relationship becomes an edge, and timestamps are preserved as temporal attributes.

A graph transformer processes this structure by passing messages along edges (foreign key relationships), learning which cross-table patterns are predictive. Multi-hop patterns propagate naturally through the graph layer by layer. KumoRFM is pre-trained on tens of thousands of heterogeneous databases, so it has already learned the universal relational patterns: recency effects, frequency dynamics, temporal decay, graph topology signals.
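
A toy sketch of the rows-to-nodes, foreign-keys-to-edges construction, using networkx purely for illustration (KumoRFM's internal graph representation is its own; the node naming scheme here is invented):

```python
import networkx as nx

G = nx.MultiDiGraph()

# Each row in each table becomes a typed node.
G.add_node("lead:L-302", table="leads")
G.add_node("contact:C-17", table="contacts", department="engineering")
G.add_node("activity:A-901", table="activities", event="api_docs")

# Each foreign key becomes an edge; timestamps ride along as attributes.
G.add_edge("contact:C-17", "lead:L-302", fk="contacts.lead_id")
G.add_edge("activity:A-901", "lead:L-302", fk="activities.lead_id",
           ts="2024-05-15")

# A k-hop neighborhood around the lead is what k graph transformer
# layers aggregate via message passing along these edges.
print(list(nx.ego_graph(G.to_undirected(), "lead:L-302", radius=2).nodes))
```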

PQL Query

```
PREDICT conversion
FOR EACH leads.lead_id
WHERE leads.status = 'open'
```

One Predictive Query Language (PQL) statement replaces the entire pipeline: data extraction, table joins, feature engineering, model selection, and training. The relational foundation model reads raw CRM tables - leads, contacts, activities, opportunities - and generates predictions in seconds.
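
In code, the flow is a handful of calls. The sketch below is loosely modeled on Kumo's published SDK examples; the module path, class names, and method signatures are unverified assumptions, so defer to the official documentation:

```python
# Assumed module path and API, modeled on Kumo's public SDK examples;
# not verified signatures - check the official docs before use.
import pandas as pd
import kumoai.experimental.rfm as rfm

rfm.init(api_key="...")  # assumed auth call

# Load the raw CRM tables; no joins, no aggregation, no flattening.
tables = {name: pd.read_parquet(f"{name}.parquet")
          for name in ("leads", "contacts", "activities", "opportunities")}

graph = rfm.LocalGraph.from_data(tables)  # rows -> nodes, FKs -> edges
model = rfm.KumoRFM(graph)

result = model.predict(
    "PREDICT conversion FOR EACH leads.lead_id WHERE leads.status = 'open'"
)
```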

Output

| lead_id | conversion_prob | vs. flat-table model | Signal source |
|---|---|---|---|
| L-301 | 0.42 | 0.39 (similar) | Single-table signals sufficient |
| L-302 | 0.89 | 0.34 (flat model misses) | Multi-threaded buying committee + content progression |
| L-303 | 0.12 | 0.18 (flat model overestimates) | CTO title inflates flat score, but no activity signal |
| L-304 | 0.76 | 0.41 (flat model misses) | Account similarity to $210K closed deal |

KumoRFM 2.0: tabular and relational

An important development: KumoRFM 2.0 supports both tabular (single-table) and relational (multi-table) data. This means it is not limited to relational problems. For single-table classification tasks, it performs competitively with tabular foundation models. For multi-table problems, it dramatically outperforms them.

This matters because real enterprise ML involves both types of problems. Some tasks genuinely are single-table (a clean CSV export, a feature store snapshot). Most tasks involve relational data. With KumoRFM 2.0, you do not need separate tools for separate problem types. One model handles both, and it automatically takes advantage of relational structure when it exists.

Cost and time comparison

| Dimension | Manual ML (LightGBM) | Tabular FM (TabPFN / Nexus) | Relational FM (KumoRFM) |
|---|---|---|---|
| Feature engineering time | 12.3 hours (878 lines of code) | 12.3 hours (still need to flatten) | 0 hours (reads tables directly) |
| Model training time | 1-4 hours | ~2.8 seconds (TabPFN) | ~1 second (zero-shot) |
| Multi-table signal captured | 4-17% of feature space | Only what humans pre-build | Full relational feature space |
| Cost at 20 tasks (annual) | $650K-$900K | $500K-$750K (saves model tuning, not features) | $80K-$120K |
| Time to new prediction task | 2-4 weeks | 1-3 weeks (still need feature pipeline) | Minutes |
| Data scientist headcount | 3-4 FTEs | 2-3 FTEs (still need feature engineers) | 0.5 FTE |

The decisive rows are feature engineering time and annual cost. Tabular foundation models save time on model training but not on the 12.3-hour feature engineering bottleneck. The cost savings of relational FMs come from eliminating that bottleneck entirely.

When tabular foundation models make sense

Tabular foundation models are not useless. They solve a real problem for a specific subset of ML tasks:

  • Genuinely single-table problems. If your data is already in one clean table with no relational context needed (a Kaggle dataset, a pre-built feature store export), tabular FMs provide fast, competitive predictions without model training.
  • Small datasets. TabPFN excels on small datasets (under 50K samples) where traditional models struggle. Its prior-data fitting approach is particularly effective when data is scarce.
  • Rapid prototyping on flat data. For quick directional answers on pre-aggregated data, tabular FMs give you a prediction in seconds.

The limitation is that most enterprise prediction tasks are not single-table problems. Enterprise data lives in relational databases with 5-50 connected tables. The moment your task requires signals from more than one table, a tabular foundation model requires someone to build the flattening pipeline - and you are back to the 12.3-hour, 878-line feature engineering bottleneck.

The architectural divide

The difference between tabular and relational foundation models is not incremental. It is architectural. Tabular models ask: “Given this flat table, what is the best prediction?” Relational models ask: “Given this database, what are the best predictions?”

The first question assumes someone has already flattened the relational data and selected the features. The second question starts from raw tables with foreign keys. The first question searches within a human-defined feature space. The second searches the full combinatorial space of 1,200+ first-order features, 719,400 pairwise interactions, and 8,000+ multi-hop patterns.

Tabular foundation models are a meaningful advance over manual model tuning for single-table problems. But they do not address the core bottleneck of enterprise ML: converting relational data into features. Relational foundation models eliminate that bottleneck by reading the relational structure directly.

KumoRFM was built by the team behind the ML systems at Pinterest, Airbnb, and LinkedIn: Vanja Josifovski (CEO, former CTO at Airbnb and Pinterest), Jure Leskovec (Chief Scientist, Stanford professor, co-creator of GraphSAGE), and Hema Raghavan (Head of Engineering, former Sr. Director at LinkedIn). Backed by Sequoia Capital.

Frequently asked questions

What is the difference between a relational foundation model and a tabular foundation model?

A tabular foundation model (like TabPFN or Nexus) operates on a single flat table - one row per entity, one column per feature. A relational foundation model (like KumoRFM) operates on multiple connected tables natively, reading foreign key relationships, temporal sequences, and multi-hop patterns directly from the relational database without flattening.

Why can't I just flatten my relational data into a single table for a tabular model?

Flattening destroys three categories of signal: multi-hop relationships (customer → orders → products → returns), temporal sequences (the order in which events happened), and graph topology (how entities connect to each other). A Stanford study found that manual feature engineering explores only 4-17% of the first-order feature space. For a 5-table database with 50 columns, there are 1,200 first-order features, 719,400 pairwise interactions, and ~8,000 multi-hop features. Flattening collapses this into a handful of aggregates.

How does KumoRFM compare to TabPFN on benchmarks?

On RelBench (7 databases, 30 tasks, 103M+ rows across 51 tables), KumoRFM zero-shot achieves 76.71 AUROC averaged across 12 classification tasks. TabPFN is designed for single flat tables up to 50K samples and cannot natively process multi-table relational data. On the SAP SALT benchmark, KumoRFM scored 91% compared to LLM+AutoML at 63% and PhD+XGBoost at 75%.

Does KumoRFM work on single-table data too?

Yes. KumoRFM 2.0 supports both tabular (single-table) and relational (multi-table) data. For single-table problems, it performs competitively with tabular foundation models. For multi-table problems, it significantly outperforms them because it reads relational structure that tabular models cannot access.

What is the feature space explosion problem?

For a database with 5 tables and 50 columns, there are 1,200 possible first-order features (column-aggregation combinations), 719,400 pairwise interactions, and approximately 8,000 multi-hop features spanning 3+ table joins. Human data scientists explore only 4-17% of this space. Tabular foundation models never see it at all because they receive a pre-flattened table. Relational foundation models search this space automatically.

How long does it take to get predictions from a relational foundation model?

KumoRFM zero-shot predictions take approximately 1 second. By comparison, manual feature engineering for the same task takes an average of 12.3 hours and 878 lines of code (Stanford study). Fine-tuning KumoRFM for higher accuracy takes minutes to hours depending on dataset size, but still eliminates the feature engineering step entirely.

See it in action

KumoRFM delivers predictions on relational data in seconds. No feature engineering, no ML pipelines. Try it free.