A new category of foundation models is emerging for structured data. TabPFN (from PriorLabs, published in Nature) and Nexus (from Fundamental, a company valued at $1.2B with $255M in funding) call themselves tabular foundation models. KumoRFM (from Kumo.ai) calls itself a relational foundation model.
The naming difference is not marketing. It describes a deep architectural divide: tabular models operate on a single flat table, while relational models operate on multiple connected tables. This distinction determines what these models can and cannot learn from enterprise data.
Enterprise databases are not flat. A typical CRM has leads, contacts, activities, opportunities, and accounts linked by foreign keys. A banking system has customers, accounts, transactions, merchants, and products across dozens of tables. When you flatten this structure into a single table for a tabular model, you lose the multi-hop relationships that drive prediction accuracy.
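A minimal sketch of what flattening does, using a hypothetical two-table CRM slice in stdlib sqlite3 (table and column names are illustrative, not from any real schema). The activities table records an ordered sequence of touchpoints; the flattened row keeps only a count.

```python
import sqlite3

# Hypothetical CRM slice: leads linked to activities by a foreign key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE leads (lead_id TEXT PRIMARY KEY, company_size INTEGER);
CREATE TABLE activities (
    activity_id INTEGER PRIMARY KEY,
    lead_id TEXT REFERENCES leads(lead_id),
    page TEXT,
    ts TEXT
);
INSERT INTO leads VALUES ('L-302', 200);
INSERT INTO activities (lead_id, page, ts) VALUES
    ('L-302', 'blog',       '2024-01-01'),
    ('L-302', 'case-study', '2024-01-08'),
    ('L-302', 'api-docs',   '2024-01-15'),
    ('L-302', 'demo',       '2024-01-22');
""")

# Flattening: one row per lead, the activity history collapsed to a count.
row = conn.execute("""
    SELECT l.lead_id, l.company_size, COUNT(a.activity_id) AS pages_viewed
    FROM leads l LEFT JOIN activities a ON a.lead_id = l.lead_id
    GROUP BY l.lead_id
""").fetchone()
print(row)  # ('L-302', 200, 4) -- the blog -> demo sequence is gone
```

The aggregate preserves "4 activities" but destroys the awareness-to-purchase ordering of those activities, which is exactly the kind of signal the rest of this article argues matters.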
foundation_model_comparison
| capability | Relational FM (KumoRFM) | Tabular FM (TabPFN / Nexus) | AutoML (DataRobot / H2O) | Manual ML (XGBoost / LightGBM) |
|---|---|---|---|---|
| Input format | Multiple relational tables | Single flat table | Single flat table | Single flat table |
| Feature engineering required | None | Flattening required | Full manual pipeline | Full manual pipeline |
| Multi-hop pattern discovery | Native (graph transformer) | Not possible | Not possible | Manual joins only |
| Temporal sequence handling | Native (temporal graph) | Static snapshot only | Static snapshot only | Manual windowing |
| Max data scale | 103M+ rows across 51 tables | ~50K samples (TabPFN) | Varies by platform | Unlimited (manual) |
| Zero-shot prediction | Yes (~1 second) | Yes (~2.8 seconds for TabPFN) | No | No |
| Pre-training data | 10,000s of heterogeneous databases | Synthetic + single-table datasets | N/A | N/A |
| AUROC on RelBench (avg) | 76.71 (zero-shot), 81.14 (fine-tuned) | N/A (single-table only) | ~64-66 (estimated) | 62.44 (LightGBM) |
The key differentiators are input format, multi-hop pattern discovery, and temporal handling. These are architectural differences, not tuning differences.
What flattening actually destroys
The core argument for relational foundation models is that flattening relational data into a single table loses predictive signal. This is not theoretical. Here is a concrete example.
Consider lead scoring in a CRM database with four tables: leads, contacts, activities, and opportunities. Lead L-302 is a real candidate for conversion.
what_tabular_models_receive (flattened row for L-302)
| lead_id | emails_opened | pages_viewed | days_since_signup | company_size |
|---|---|---|---|---|
| L-302 | 4 | 22 | 30 | 200 |
After flattening to a single row, L-302 looks like a mediocre lead: few emails opened, moderate page views, small company. A tabular foundation model or XGBoost sees only this.
what_relational_models_read (raw multi-table signals for L-302)
| table | data_for_L-302 | signal_invisible_after_flattening |
|---|---|---|
| contacts | 4 contacts from 3 departments active | Multi-threaded buying committee (3 departments engaged) |
| activities | Blog → Case study → API docs → Demo (in sequence) | Buying-stage content progression (awareness → evaluation → technical → purchase) |
| opportunities | Similar account closed $210K last quarter | Account similarity to past closed-won deals |
| accounts | Company raised Series B 30 days ago | Firmographic momentum (fresh funding = budget available) |
The relational model reads all four tables directly. L-302 has a multi-threaded buying committee, a textbook content progression, account similarity to a $210K closed deal, and fresh Series B funding. None of these signals survive flattening into emails_opened=4, pages_viewed=22.
The feature space explosion
The information loss from flattening is not a matter of laziness. It is a matter of combinatorial scale. For a database with 5 tables and 50 columns:
- 1,200 first-order features: column-aggregation combinations (sum, mean, count, max, min across time windows for each numeric column, per join path)
- 719,400 pairwise interactions: combinations of first-order features (ratios, products, differences)
- ~8,000 multi-hop features: patterns that span 3+ table joins (customer → orders → products → return rates)
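The first two counts are consistent with a simple combinatorial decomposition. The breakdown below is an assumption (the article states only the totals), but it shows where numbers of this magnitude come from:

```python
from math import comb

# Assumed breakdown; the source gives only the totals (1,200 and 719,400).
numeric_columns = 48   # of the 50 columns across 5 tables, say 48 are numeric
aggregations = 5       # sum, mean, count, max, min
time_windows = 5       # e.g. 7d, 30d, 90d, 180d, all-time

first_order = numeric_columns * aggregations * time_windows
pairwise = comb(first_order, 2)  # unordered pairs of first-order features

print(first_order)  # 1200
print(pairwise)     # 719400
```

The pairwise count grows quadratically in the first-order count, which is why no manual process ever explores it.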
A Stanford study found that human data scientists explore only 4-17% of the first-order feature space. They never touch the pairwise or multi-hop spaces because the combinatorics are manually intractable. On average, feature engineering takes 12.3 hours and 878 lines of code per task.
Tabular foundation models skip this problem entirely - not by solving it, but by requiring someone else to solve it first. They receive a pre-flattened table and operate within whatever feature space a human decided to create. Relational foundation models search the full feature space automatically because they read the raw relational structure.
Tabular Foundation Models (TabPFN, Nexus)
- Operate on a single flat table
- Require pre-flattened input (someone must join and aggregate)
- Cannot discover multi-hop relationships across tables
- Cannot preserve temporal event sequences
- Limited to ~50K samples (TabPFN) or single-table scale
- Search only the feature space that humans pre-built
Relational Foundation Model (KumoRFM)
- Operate on multiple connected tables natively
- Read raw relational databases with foreign keys directly
- Discover multi-hop patterns (customer → orders → products → returns)
- Preserve temporal dynamics as first-class graph structure
- Tested on 103M+ rows across 51 tables (RelBench)
- Search the full combinatorial feature space automatically
Benchmark results: RelBench
RelBench is the standard benchmark for relational prediction tasks: 7 databases, 30 tasks, 103 million+ rows across 51 tables. It tests whether models can extract predictive signal from multi-table relational data.
AUROC (Area Under the Receiver Operating Characteristic curve) measures how well a model distinguishes between positive and negative outcomes. An AUROC of 50 means random guessing, 100 means perfect prediction. Moving from 65 to 77 AUROC means the model correctly ranks a true positive above a true negative 77% of the time instead of 65%.
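The ranking interpretation above can be computed directly: AUROC is the fraction of (positive, negative) score pairs where the positive is ranked higher, with ties counted as half. A self-contained sketch with toy scores (illustrative, not from the benchmark):

```python
from itertools import product

def auroc(pos_scores, neg_scores):
    """AUROC on a 0-100 scale: probability that a random positive is
    scored above a random negative (ties count as half a win)."""
    pairs = list(product(pos_scores, neg_scores))
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in pairs)
    return 100 * wins / len(pairs)

# Toy model scores for 4 true positives and 5 true negatives:
positives = [0.9, 0.8, 0.6, 0.4]
negatives = [0.7, 0.5, 0.3, 0.2, 0.1]

print(auroc(positives, negatives))  # 85.0
```

A model that assigns every example the same score lands at exactly 50, the random-guessing baseline the paragraph describes.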
relbench_classification_benchmarks (avg across 12 tasks)
| approach | AUROC | human_hours_per_task | notes |
|---|---|---|---|
| LightGBM + manual features | 62.44 | 12.3 | 878 lines of feature code per task |
| LLM baseline (Llama 3.2 3B) | 68.06 | ~0.5 | Prompt-based, no relational reasoning |
| KumoRFM zero-shot | 76.71 | ~0.001 | ~1 second, no feature engineering |
| KumoRFM fine-tuned | 81.14 | ~0.1 | 10-30% improvement over zero-shot |
The headline result: KumoRFM zero-shot outperforms manually engineered features by more than 14 AUROC points while requiring roughly 1 second instead of 12.3 hours. Fine-tuning adds another 4.4 points.
sap_salt_benchmark
| approach | accuracy | context |
|---|---|---|
| LLM + AutoML | 63% | Language model with automated model selection |
| PhD + XGBoost | 75% | Domain expert with manual feature engineering |
| KumoRFM | 91% | Zero-shot relational foundation model |
On the SAP SALT enterprise benchmark, KumoRFM outperforms both automated and expert-manual approaches by 16-28 percentage points.
The gap between LightGBM with manual features (62.44) and KumoRFM zero-shot (76.71) is not about model architecture. LightGBM is a strong algorithm. The gap exists because KumoRFM sees the full relational structure - the multi-hop patterns, the temporal sequences, the graph topology - while LightGBM sees only the features that a human decided to build from 4-17% of the feature space.
How tabular foundation models work
To understand the architectural divide, it helps to know what tabular foundation models actually do.
TabPFN (PriorLabs)
TabPFN, published in Nature, is a Prior-Data Fitted Network: it is pre-trained on millions of synthetic datasets that mimic the statistical properties of real-world tabular data. At inference time, it takes a single flat table (up to ~50K samples), treats the training rows as in-context examples, and predicts the target column in approximately 2.8 seconds. It performs well on small, single-table classification tasks.
Nexus (Fundamental)
Nexus, developed by Fundamental ($255M in funding, $1.2B valuation), calls itself a “Large Tabular Model.” Like TabPFN, it operates on a single flat table. It is pre-trained on large collections of real-world tabular datasets and uses in-context learning to generate predictions without task-specific training.
The shared limitation
Both TabPFN and Nexus assume their input is a single table with one row per entity and one column per feature. They cannot read a relational database with foreign key relationships. They cannot discover that a customer’s churn risk depends on the return rates of products they bought from merchants in a specific category. That pattern spans four tables and three join hops - it does not exist in any single table.
How the relational foundation model works
KumoRFM takes a structurally different approach. It represents your database as a temporal heterogeneous graph: each row in each table becomes a node, each foreign key relationship becomes an edge, and timestamps are preserved as temporal attributes.
A graph transformer processes this structure by passing messages along edges (foreign key relationships), learning which cross-table patterns are predictive. Multi-hop patterns propagate naturally through the graph layer by layer. KumoRFM is pre-trained on tens of thousands of heterogeneous databases, so it has already learned the universal relational patterns: recency effects, frequency dynamics, temporal decay, graph topology signals.
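A minimal sketch of the first step, building the graph from tables, using hypothetical rows (the actual KumoRFM encoder is a pre-trained graph transformer and is not shown; this only illustrates how foreign keys become edges and how multi-hop neighborhoods arise):

```python
from collections import defaultdict

# Hypothetical rows; each (table, primary_key) pair becomes a node.
orders = [
    {"order_id": "O1", "customer_id": "C1", "product_id": "P1", "ts": "2024-03-01"},
    {"order_id": "O2", "customer_id": "C1", "product_id": "P2", "ts": "2024-03-09"},
]

edges = defaultdict(set)  # undirected adjacency over (table, key) nodes

def link(a, b):
    edges[a].add(b)
    edges[b].add(a)

# Foreign keys become edges; timestamps stay on order nodes as attributes.
for o in orders:
    link(("orders", o["order_id"]), ("customers", o["customer_id"]))
    link(("orders", o["order_id"]), ("products", o["product_id"]))

def k_hop(node, k):
    """All nodes reachable within k hops. Message passing propagates
    information over these same neighborhoods, one layer per hop."""
    frontier, seen = {node}, {node}
    for _ in range(k):
        frontier = {n for f in frontier for n in edges[f]} - seen
        seen |= frontier
    return seen - {node}

# Two hops from customer C1 reach the products it bought -- a pattern
# that exists in no single flat table.
print(sorted(k_hop(("customers", "C1"), 2)))
```

Each additional transformer layer extends the reachable neighborhood by one hop, which is how customer → orders → products → returns patterns propagate without any manual joins.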
PQL Query
PREDICT conversion FOR EACH leads.lead_id WHERE leads.status = 'open'
One Predictive Query Language (PQL) statement replaces the entire pipeline: data extraction, table joins, feature engineering, model selection, and training. The relational foundation model reads raw CRM tables - leads, contacts, activities, opportunities - and generates predictions in seconds.
Output
| lead_id | conversion_prob | vs_flat_table_model | signal_source |
|---|---|---|---|
| L-301 | 0.42 | 0.39 (similar) | Single-table signals sufficient |
| L-302 | 0.89 | 0.34 (flat model misses) | Multi-threaded buying committee + content progression |
| L-303 | 0.12 | 0.18 (flat model overestimates) | CTO title inflates flat score, but no activity signal |
| L-304 | 0.76 | 0.41 (flat model misses) | Account similarity to $210K closed deal |
KumoRFM 2.0: tabular and relational
An important development: KumoRFM 2.0 supports both tabular (single-table) and relational (multi-table) data. This means it is not limited to relational problems. For single-table classification tasks, it performs competitively with tabular foundation models. For multi-table problems, it dramatically outperforms them.
This matters because real enterprise ML involves both types of problems. Some tasks genuinely are single-table (a clean CSV export, a feature store snapshot). Most tasks involve relational data. With KumoRFM 2.0, you do not need separate tools for separate problem types. One model handles both, and it automatically takes advantage of relational structure when it exists.
cost_and_time_comparison
| dimension | Manual ML (LightGBM) | Tabular FM (TabPFN / Nexus) | Relational FM (KumoRFM) |
|---|---|---|---|
| Feature engineering time | 12.3 hours (878 lines of code) | 12.3 hours (still need to flatten) | 0 hours (reads tables directly) |
| Model training time | 1-4 hours | ~2.8 seconds (TabPFN) | ~1 second (zero-shot) |
| Multi-table signal captured | 4-17% of feature space | Only what humans pre-built | Full relational feature space |
| Cost at 20 tasks (annual) | $650K-$900K | $500K-$750K (saves model tuning, not features) | $80K-$120K |
| Time to new prediction task | 2-4 weeks | 1-3 weeks (still need feature pipeline) | Minutes |
| Data scientist headcount | 3-4 FTEs | 2-3 FTEs (still need feature engineers) | 0.5 FTE |
The decisive rows are feature engineering time and annual cost. Tabular foundation models save time on model training but not on the 12.3-hour feature engineering bottleneck; the cost savings of relational FMs come from eliminating that bottleneck entirely.
When tabular foundation models make sense
Tabular foundation models are not useless. They solve a real problem for a specific subset of ML tasks:
- Genuinely single-table problems. If your data is already in one clean table with no relational context needed (a Kaggle dataset, a pre-built feature store export), tabular FMs provide fast, competitive predictions without model training.
- Small datasets. TabPFN excels on small datasets (under 50K samples) where traditional models struggle. Its prior-data fitting approach is particularly effective when data is scarce.
- Rapid prototyping on flat data. For quick directional answers on pre-aggregated data, tabular FMs give you a prediction in seconds.
The limitation is that most enterprise prediction tasks are not single-table problems. Enterprise data lives in relational databases with 5-50 connected tables. The moment your task requires signals from more than one table, a tabular foundation model requires someone to build the flattening pipeline - and you are back to the 12.3-hour, 878-line feature engineering bottleneck.
The architectural divide
The difference between tabular and relational foundation models is not incremental. It is architectural. Tabular models ask: “Given this flat table, what is the best prediction?” Relational models ask: “Given this database, what are the best predictions?”
The first question assumes someone has already flattened the relational data and selected the features. The second question starts from raw tables with foreign keys. The first question searches within a human-defined feature space. The second searches the full combinatorial space of 1,200+ first-order features, 719,400 pairwise interactions, and 8,000+ multi-hop patterns.
Tabular foundation models are a meaningful advance over manual model tuning for single-table problems. But they do not address the core bottleneck of enterprise ML: converting relational data into features. Relational foundation models eliminate that bottleneck by reading the relational structure directly.
KumoRFM was built by the team behind the ML systems at Pinterest, Airbnb, and LinkedIn: Vanja Josifovski (CEO, former CTO at Airbnb and Pinterest), Jure Leskovec (Chief Scientist, Stanford professor, co-creator of GraphSAGE), and Hema Raghavan (Head of Engineering, former Sr. Director at LinkedIn). Backed by Sequoia Capital.