TabPFN is one of the most impressive recent developments in tabular machine learning. Built by PriorLabs (EUR 9M pre-seed led by Balderton Capital), published in Nature, and open-sourced on Hugging Face, it represents a genuine breakthrough: a foundation model that can make accurate predictions on a new dataset in seconds, without any training, by using in-context learning. On single-table benchmarks, it matches or beats carefully tuned XGBoost models while producing predictions in roughly 2.8 seconds.
KumoRFM is also a foundation model for tabular data. But it solves a different structural problem. Where TabPFN reads one flat table, KumoRFM reads multiple relational tables connected by foreign keys and discovers predictive patterns across the full relational graph. This is not a marginal difference in architecture - it is a fundamental difference in what data the model can see.
The question is not which model is better. The question is: does your data fit in one table? If it does, both models are strong options. If it does not - and enterprise data almost never does - then forcing it into a single table for TabPFN means paying a steep accuracy tax.
The headline result: SAP SALT benchmark
Before diving into detailed comparisons, here is the result that matters most. The SAP SALT benchmark is an enterprise-grade evaluation where real business analysts and data scientists attempt prediction tasks on SAP enterprise data. It measures how accurately different approaches predict real business outcomes (customer behavior, demand patterns, operational metrics) on production-quality enterprise databases with multiple related tables.
sap_salt_enterprise_benchmark
| approach | accuracy | what_it_means |
|---|---|---|
| LLM + AutoML | 63% | Language model generates features, AutoML selects model |
| PhD Data Scientist + XGBoost | 75% | Expert spends weeks hand-crafting features, tunes XGBoost |
| KumoRFM (zero-shot) | 91% | No feature engineering, no training, reads relational tables directly |
SAP SALT benchmark: KumoRFM outperforms expert data scientists by 16 percentage points and LLM+AutoML by 28 percentage points. Zero feature engineering. Zero training. The model reads raw enterprise tables and predicts.
This is not a marginal improvement. KumoRFM scores 91% where PhD-level data scientists with weeks of feature engineering and hand-tuned XGBoost score 75%. The 16 percentage point gap is the value of reading relational data natively instead of flattening it into a single table.
kumo_vs_tabpfn_comparison
| dimension | TabPFN (PriorLabs) | Kumo (KumoRFM) |
|---|---|---|
| Data input | Single flat table | Multiple relational tables connected by foreign keys |
| Architecture | Transformer with in-context learning | Graph transformer over relational structure |
| Training data | Synthetic single-table datasets | 10,000s of diverse relational datasets |
| Multi-table support | None - requires pre-flattened single table | Native - reads 5-50 connected tables directly |
| Multi-hop pattern discovery | Not possible - single table only | Native - captures 2-hop, 3-hop, 4+ hop signals across tables |
| Training required | None (in-context learning) | None for zero-shot; optional fine-tuning for maximum accuracy |
| Inference speed | ~2.8 seconds | ~1 second (zero-shot) |
| Scale (rows) | ~50K (open-source), 10M (enterprise) | Hundreds of millions of rows across dozens of tables |
| Open-source | Yes (Hugging Face) | No (enterprise SaaS) |
| Data warehouse integration | None - export data to use | Native Snowflake/Databricks - no data movement |
| Single-table performance | Strong - matches tuned XGBoost | Strong - competitive on single-table tasks (KumoRFM 2.0) |
| Relational data performance | Limited by flattening - loses multi-hop signals | State-of-the-art - 76.71 AUROC zero-shot on RelBench |
Both are foundation models for tabular data. The structural difference is what they can read: one table vs. many tables. For enterprise data that spans 5-50 tables, this distinction determines what signals the model can discover.
What TabPFN does well
TabPFN is a genuine advance in tabular ML, and a fair comparison requires acknowledging its real strengths.
- No training required. TabPFN uses in-context learning: you pass your data as context, and the model makes predictions immediately. No hyperparameter tuning, no cross-validation, no training loop. This is a real simplification of the ML workflow for single-table problems.
- Fast inference. Approximately 2.8 seconds to produce predictions on a new dataset. This makes it practical for rapid prototyping, exploratory analysis, and situations where you need a quick baseline before investing in a full pipeline.
- Published in Nature. TabPFN's approach is rigorously validated. The Nature publication demonstrates that a pre-trained transformer can match or beat tuned tree-based models on a wide range of single-table benchmarks. This is a credible, peer-reviewed result.
- Open-source. Available on Hugging Face, TabPFN can be used freely for experimentation and production on single-table tasks. The open-source model supports datasets up to approximately 50,000 samples (version 2.5), with PriorLabs' enterprise offering scaling to 10 million rows.
- No feature engineering on single tables. For problems where all predictive signals exist in a single table, TabPFN eliminates the need for manual feature engineering. You provide the raw table, and the model handles the rest.
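The in-context workflow above can be pictured with a deliberately tiny stand-in: a 1-nearest-neighbor rule plays the role of the pre-trained transformer. This is a conceptual analogy only - TabPFN's actual mechanism is a transformer attending over the context rows - and every value below is made up:

```python
import math

def predict_in_context(X_context, y_context, x_query):
    """Toy illustration of prediction-from-context: labeled rows are passed
    alongside the query in a single call, and a label comes back immediately.
    No separate training step, nothing persisted. (A 1-nearest-neighbor rule
    stands in for the transformer here.)"""
    nearest = min(
        range(len(X_context)),
        key=lambda i: math.dist(X_context[i], x_query),
    )
    return y_context[nearest]

# Labeled context rows: [monthly_spend, support_tickets] -> outcome
X = [[120.0, 0], [95.0, 1], [20.0, 6], [15.0, 8]]
y = ["retained", "retained", "churned", "churned"]

print(predict_in_context(X, y, [18.0, 7]))   # -> churned
print(predict_in_context(X, y, [110.0, 1]))  # -> retained
```

The point of the analogy is the shape of the API, not the model: context in, prediction out, no fit-then-save loop.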
The flattening ceiling: what you lose when you force relational data into a single table
Enterprise data does not live in a single table. A typical prediction task - churn prediction, fraud detection, lead scoring, demand forecasting - requires data from 5 to 50 connected tables. To use TabPFN on this data, you must flatten it: join the tables, compute aggregations, and collapse everything into one row per entity. This is not just tedious. It permanently destroys information that no model can recover.
Think about what this actually means. Flattening a relational database into one table is like reducing a large company's org chart into a single flat list of employee names. You keep the names. But you lose who reports to whom, which departments exist, who has dotted-line relationships, and how deep the organization goes. A small startup with 10 people? The flat list is fine. A Fortune 500 with 50,000 employees across 200 departments? The flat list is useless for any question that depends on organizational structure. Enterprise databases are the same: with billions of rows across dozens of connected tables, flattening into a single table throws away exactly the relationships that predict business outcomes.
Consider a concrete example. You want to predict customer churn. The predictive signal you need follows a 4-hop path through your relational database:
- Customer → Orders. Which products has this customer bought, when, and how frequently?
- Orders → Products. What categories and price points characterize their purchase history?
- Products → Reviews. How are other customers rating the same products? Are satisfaction scores declining for the products this customer relies on?
- Reviews → Other customers who bought the same products → Their churn patterns. Did customers with similar product portfolios and review sentiment churn recently? At what rate?
This 4-hop signal is one of the strongest churn predictors in relational data. It captures a structural pattern: when customers with similar purchasing behavior start churning, it is a leading indicator for the remaining customers in that cohort. KumoRFM's graph transformer discovers this pattern automatically by traversing the relational graph. TabPFN never sees it because the signal does not exist in any single flat table - it exists in the connections between tables.
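A stripped-down version of the similar-customer signal can be sketched over toy in-memory tables. This simplifies the path described above - the review hop is omitted, and purchase overlap stands in for portfolio similarity - and all IDs and outcomes are invented:

```python
# Toy order table: (customer_id, product_id) pairs.
orders = [
    ("C1", "P1"), ("C1", "P2"),
    ("C2", "P1"), ("C2", "P3"),
    ("C3", "P1"), ("C3", "P2"),
]
churned = {"C2": True, "C3": True}  # known recent churn outcomes

def cohort_churn_rate(customer_id):
    """Hops 1-2: the customer's purchased products. Hops 3-4: other customers
    who bought the same products, and their churn rate. A flat one-row-per-
    customer table has no column that can hold this traversal."""
    my_products = {p for c, p in orders if c == customer_id}
    peers = {c for c, p in orders if p in my_products and c != customer_id}
    if not peers:
        return 0.0
    return sum(churned.get(c, False) for c in peers) / len(peers)

print(cohort_churn_rate("C1"))  # -> 1.0: every similar customer churned
```

Even in this reduced form, the feature is a function of other rows in other tables - exactly the kind of value that is fixed at flattening time and stale the moment the graph changes.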
what_flattening_destroys (churn prediction example)
| signal_type | available in flat table (TabPFN) | available in relational graph (Kumo) |
|---|---|---|
| Customer purchase count | Yes - orders_count = 23 | Yes - plus full temporal sequence and recency patterns |
| Average order value | Yes - avg_order_value = $142 | Yes - plus trend (declining from $180 to $95 over 6 months) |
| Product category distribution | Partially - top_category = 'electronics' | Yes - full distribution across 8 categories with temporal shifts |
| Product review sentiment (for purchased items) | No - requires Product-to-Review join | Yes - reviews for purchased products averaging 2.1 stars (declining) |
| Similar-customer churn signal | No - requires 4-hop traversal | Yes - 67% of customers with similar product portfolio churned in last 90 days |
| Cross-table temporal patterns | No - flattening collapses time | Yes - support ticket spike followed by order frequency drop detected |
| Graph-structural position | No - flat table has no graph structure | Yes - customer is in a weakly-connected component with high churn density |
The first two rows are the only signals fully available to TabPFN, and the third is available only partially. The remaining four rows represent the multi-hop, temporal, and structural patterns that only exist in the relational graph. On relational datasets, these hidden signals account for the 15-20+ AUROC point gap.
Both are foundation models - but pre-trained on different structures
TabPFN and KumoRFM are both pre-trained foundation models that generalize to new datasets without task-specific training. The critical difference is what they were pre-trained on.
- TabPFN was pre-trained on synthetic single-table datasets. It learned the statistical patterns common to flat tabular data: feature correlations, nonlinear decision boundaries, class distributions, missing value patterns. This makes it excellent at single-table prediction - it has seen millions of synthetic tables and learned general patterns that transfer to real single-table data.
- KumoRFM was pre-trained on tens of thousands of diverse relational datasets. It learned patterns that exist specifically in relational structures: how entities relate across tables, how multi-hop connections carry predictive signal, how temporal patterns propagate across table boundaries, and how graph-structural properties predict entity behavior. These patterns do not exist in single-table data.
This pre-training difference has a direct consequence. TabPFN generalizes well to new single tables. KumoRFM generalizes well to new relational databases. For enterprise data, which is inherently relational, KumoRFM's pre-training is more aligned with the actual structure of the data.
TabPFN workflow (relational data)
- Export data from your database into a flat format
- Write SQL joins to combine 5-50 tables (2-8 hours)
- Compute aggregations, handle temporal features manually
- Lose multi-hop relationships, temporal sequences, and graph structure
- Feed the flattened table to TabPFN (~2.8 seconds inference)
- Get predictions limited by the signals that survived flattening
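Steps 2-3 of this pipeline can be sketched with a toy schema (hypothetical tables and values, using sqlite3). Note what the aggregation keeps and what it collapses:

```python
import sqlite3

# Build two toy relational tables, then flatten orders into one row per
# customer -- the kind of join + aggregation the TabPFN workflow requires.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customers(id TEXT PRIMARY KEY);
    CREATE TABLE orders(customer_id TEXT, amount REAL, placed_on TEXT);
    INSERT INTO customers VALUES ('C1'), ('C2');
    INSERT INTO orders VALUES
        ('C1', 180, '2024-01-05'), ('C1', 120, '2024-03-02'),
        ('C1',  95, '2024-06-20'), ('C2',  40, '2024-05-11');
""")

flat = con.execute("""
    SELECT c.id,
           COUNT(o.amount)         AS orders_count,
           ROUND(AVG(o.amount), 2) AS avg_order_value
    FROM customers c LEFT JOIN orders o ON o.customer_id = c.id
    GROUP BY c.id
    ORDER BY c.id
""").fetchall()

print(flat)
# C1's declining trend (180 -> 120 -> 95) is invisible in avg_order_value;
# only the collapsed scalar aggregates survive into the single-table input.
```

Multiply this by every related table and every candidate aggregate, and the 2-8 hours per task is easy to see - as is the information that never makes it into the flat row.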
Kumo workflow
- Connect Kumo to your data warehouse (one-time setup)
- Write a PQL query defining what you want to predict
- KumoRFM reads all relational tables, discovers multi-hop patterns
- Zero flattening, zero feature engineering, zero information loss
- Time to first prediction: ~1 second (zero-shot)
- Get predictions powered by the full relational structure
Benchmark results: single-table vs relational
On single-table benchmarks, TabPFN performs well - matching or beating tuned XGBoost. This is its designed operating range. The divergence appears when the data is relational.
AUROC (Area Under the Receiver Operating Characteristic curve) measures how well a model distinguishes between positive and negative outcomes. An AUROC of 50 means random guessing. An AUROC of 100 means perfect prediction. In practice, moving from 65 to 77 AUROC is a significant improvement - it means the model correctly ranks a true positive above a true negative 77% of the time instead of 65%. For fraud detection, that difference can mean catching 40% more fraud with the same false positive rate. For churn prediction, it means identifying at-risk customers weeks earlier.
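The ranking interpretation of AUROC can be checked directly from its definition - the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counting half. Scores below are illustrative:

```python
def auroc(scores, labels):
    """AUROC on a 0-100 scale, computed by exhaustive pairwise comparison.
    50 = random ranking, 100 = every positive scored above every negative."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return 100.0 * wins / (len(pos) * len(neg))

labels = [1, 1, 1, 0, 0, 0]
print(auroc([0.9, 0.8, 0.2, 0.7, 0.3, 0.1], labels))  # ~77.8: one positive misranked
print(auroc([0.9, 0.8, 0.7, 0.3, 0.2, 0.1], labels))  # 100.0: perfect ranking
```

In the first call, a single misranked positive drops the score to roughly 77.8 - each AUROC point is a concrete fraction of positive/negative pairs ranked correctly, which is why a 15-20 point gap is so consequential.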
performance_by_data_structure
| scenario | TabPFN | KumoRFM | gap |
|---|---|---|---|
| Single-table classification (standard benchmarks) | Strong - matches tuned XGBoost | Strong - competitive (KumoRFM 2.0) | Comparable |
| Relational data, 2-3 tables | Moderate - loses some cross-table signal | 76+ AUROC zero-shot | 5-10 AUROC points |
| Relational data, 5+ tables | Weak - severe flattening tax | 76.71 AUROC zero-shot (RelBench avg) | 15-20+ AUROC points |
| Relational data, 5+ tables (fine-tuned) | Cannot improve - limited by flat input | 81.14 AUROC (RelBench avg) | 20-25+ AUROC points |
The accuracy gap scales with relational complexity. On single tables, both models are strong. As the number of tables increases, the flattening tax compounds - each additional table represents more multi-hop signals that TabPFN cannot access.
The widening gap is not about model sophistication. TabPFN's transformer architecture is powerful. But it operates on a flat table - and a flat table derived from 5+ joined tables has lost the very patterns that differentiate accurate predictions from mediocre ones. No model, no matter how advanced, can recover signals that were destroyed in the flattening step.
PQL Query

```sql
PREDICT churn_90d
FOR EACH customers.customer_id
WHERE customers.segment = 'enterprise'
```

One PQL query replaces the entire flattening pipeline. KumoRFM reads the raw customers, orders, products, reviews, and support_tickets tables directly. The 4-hop signal (Customer → Orders → Products → Reviews → similar customers' churn) is discovered automatically - no joins, no aggregations, no information loss.
Output
| customer_id | churn_prob_kumo | churn_prob_flat_table | delta |
|---|---|---|---|
| C-7201 | 0.91 | 0.64 | +27 points (Kumo detects similar-customer churn wave) |
| C-7202 | 0.15 | 0.42 | -27 points (Kumo correctly lower: strong cross-table engagement signals) |
| C-7203 | 0.88 | 0.55 | +33 points (Kumo sees product review decline + support escalation) |
| C-7204 | 0.07 | 0.09 | -2 points (both correctly low: healthy account, no relational risk signals) |
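Other tasks discussed in this piece follow the same query pattern. The targets and columns below are hypothetical, extrapolated from the single query shown above rather than taken from Kumo's documentation:

```sql
-- Hypothetical examples following the PREDICT ... FOR EACH ... WHERE ... shape:
PREDICT fraud_flag FOR EACH transactions.transaction_id
PREDICT demand_next_30d FOR EACH products.product_id WHERE products.active = true
```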
Scale: open-source research vs enterprise production
TabPFN's open-source version (v2.5) scales to approximately 50,000 samples - suitable for research, prototyping, and smaller datasets. PriorLabs' enterprise offering extends this to 10 million rows, which covers many production use cases on single tables.
KumoRFM is designed for enterprise relational data at scale: hundreds of millions of rows across dozens of connected tables, with billions of relationship edges in the relational graph. It runs natively inside Snowflake and Databricks - no data export, no data movement, no external processing. For organizations with large relational databases, this is a material infrastructure difference.
scale_comparison
| dimension | TabPFN open-source | TabPFN enterprise | Kumo (KumoRFM) |
|---|---|---|---|
| Max rows | ~50,000 | ~10 million | Hundreds of millions |
| Max tables | 1 | 1 | 50+ |
| Data warehouse integration | None | Limited | Native Snowflake/Databricks |
| Data movement required | Yes - export to Python | Yes - connect via API | No - runs inside your warehouse |
| Enterprise security | Self-hosted | PriorLabs managed | Data never leaves your warehouse |
For single-table research and prototyping, TabPFN's open-source model is a strong choice. For enterprise relational data at production scale, Kumo's warehouse-native architecture avoids the data movement and scale limitations.
When to choose TabPFN
TabPFN is an excellent tool in specific scenarios. Choose TabPFN when:
- Your data fits in a single table. If all predictive signals exist in one table with no multi-table joins required, TabPFN delivers strong accuracy with zero training time. This is its core strength and designed operating range.
- You need a fast baseline. TabPFN's 2.8-second inference makes it ideal for rapid prototyping, exploratory analysis, and quick comparisons before investing in a full production pipeline.
- You want open-source. TabPFN is freely available on Hugging Face. For teams that prefer open-source tools, want to inspect the model, or need to self-host, TabPFN provides that flexibility.
- Your dataset is small to medium. For datasets under 50,000 rows (open-source) or 10 million rows (enterprise), TabPFN handles the scale comfortably on single-table tasks.
- You are in a research or academic setting. TabPFN's Nature publication, open-source availability, and strong single-table benchmarks make it a natural choice for research comparisons and academic work.
When to choose Kumo
Kumo solves a different structural problem. Choose Kumo when:
- Your data lives in multiple relational tables. Customers, orders, products, reviews, interactions, support tickets - if your predictive signals span table boundaries, KumoRFM discovers them automatically. TabPFN requires you to flatten them first, losing multi-hop patterns in the process.
- Multi-hop patterns matter for your prediction. If churn depends on what similar customers experienced, if fraud depends on transaction network structure, if recommendations depend on purchase-graph similarity - these are patterns that only exist in relational structure and cannot survive flattening.
- You need enterprise scale. Hundreds of millions of rows across dozens of tables, running natively in your data warehouse without data export. KumoRFM's architecture is designed for this operating range.
- You want to avoid the flattening pipeline. The SQL joins, aggregation computation, and temporal feature engineering required to flatten relational data for TabPFN take hours of data science time per task and create brittle pipelines. Kumo eliminates this entirely.
- You need maximum accuracy on relational data. The 15-20+ AUROC point gap on relational benchmarks translates directly to business outcomes. In fraud detection, this means millions more in caught fraud. In churn prediction, it means significantly more customers retained. In lead scoring, it means higher conversion rates.
They are not competitors - they solve different data structures
The most accurate way to frame this comparison is not TabPFN vs. KumoRFM, but single-table vs. relational. TabPFN is the best foundation model for single flat tables. KumoRFM is the best foundation model for relational tables. The question is which description matches your data.
For most enterprise prediction tasks, the answer is relational. Customer churn, fraud detection, recommendation, lead scoring, demand forecasting, supply chain optimization - these tasks inherently involve multiple connected entities across multiple tables. Forcing this data into a single table is possible, but the flattening tax is real: 15-20+ AUROC points on complex relational tasks, plus the engineering cost of building and maintaining the flattening pipeline.
KumoRFM 2.0 makes this choice simpler by supporting both single-table and multi-table tasks. On single-table problems, it is competitive with TabPFN. On multi-table problems, it captures patterns that no single-table model can access. It is a superset, not a replacement.