
Single-Table ML vs Relational ML: What Gets Lost When You Flatten

Your database has 12 tables connected by foreign keys. Your ML model sees one table. Between those two realities, three categories of information are being destroyed. Here is exactly what they are and how much they cost you.

TL;DR

  • Flattening relational data destroys three categories of information: multi-hop relationships (3-4 table patterns), temporal sequences (order and timing of events), and graph topology (connection structure).
  • On RelBench (7 databases, 30 tasks, 103M+ rows), flat-table ML scores 62.44 AUROC vs 75.83 for relational ML and 81.14 for KumoRFM fine-tuned. The 13-19 point gap is the cost of information destruction.
  • The most valuable signals are exactly what flattening erases: support ticket categories become counts, order sequences become averages, cohort behavior patterns vanish entirely.
  • Single-table ML is sufficient when data genuinely lives in one table or the relational structure is shallow. In enterprise settings with 5-50 interconnected tables, relational ML almost always outperforms.
  • KumoRFM recovers destroyed information by learning directly from the raw relational graph. No flattening, no feature engineering, no information loss. One line of PQL, predictions in seconds.

Open any ML textbook, any online course, any Kaggle competition. The data arrives as a single CSV file. One row per sample, one column per feature, one target variable. The model trains on this table and produces predictions. Clean, simple, well-understood.

Now look at where enterprise data actually lives. A PostgreSQL database with 15 tables. A Snowflake warehouse with 40 tables across 6 schemas. A data lake with hundreds of Parquet files organized by domain. The data that predicts customer churn, credit default, or next purchase is spread across all of these tables, connected by foreign keys and temporal relationships.

The standard approach is to flatten this relational structure into a single table through feature engineering: write SQL joins, compute aggregations, create derived features. This produces the familiar CSV that models expect. And it destroys information in the process.
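As a sketch of what this flattening step does, here is a minimal pandas version with hypothetical customers and orders tables (the table and column names are illustrative, not from any real schema):

```python
import pandas as pd

# Hypothetical source tables joined by a customer_id foreign key.
orders = pd.DataFrame({
    "customer_id": ["C-1", "C-1", "C-2"],
    "order_value": [40.0, 60.0, 25.0],
})
customers = pd.DataFrame({"customer_id": ["C-1", "C-2", "C-3"]})

# Flatten: collapse the one-to-many orders table into per-customer columns.
agg = orders.groupby("customer_id").agg(
    total_orders=("order_value", "count"),
    avg_order_value=("order_value", "mean"),
)
flat = customers.merge(agg, on="customer_id", how="left").fillna(0)
print(flat)  # one row per customer, order history reduced to two numbers
```

Each customer's full order history is now two scalar columns; which order came first, last, or in what rhythm is gone.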

The question is: how much information is destroyed, and does it matter?

The three types of information destruction

1. Multi-hop relationships

A customer's churn risk depends not just on their own behavior but on the behavior of the products they bought, the other customers who bought those products, and the churn rates of those customers. That is a 4-hop path through the relational graph: customer → orders → products → orders (other customers) → customer outcomes.

No data scientist writes this feature. It is not that the SQL is hard. It is that no one thinks to look for it. The feature space of possible multi-hop aggregations is combinatorially large, and humans explore a tiny fraction. A typical feature engineering effort covers 1-hop relationships (direct aggregates from immediately joined tables) and occasionally 2-hop relationships. Patterns at 3-4 hops are systematically invisible.
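The combinatorial growth is easy to see by enumerating join paths through a schema. The sketch below uses a hypothetical foreign-key graph loosely modeled on the e-commerce example later in this article; the path count grows with hop depth, and every path then multiplies by the number of candidate aggregations and numeric columns:

```python
# Hypothetical schema: which tables each table joins to via foreign keys.
schema = {
    "customers": ["orders", "tickets"],
    "orders": ["customers", "products"],
    "products": ["orders", "reviews"],
    "reviews": ["products"],
    "tickets": ["customers"],
}

def join_paths(start, depth):
    """Enumerate all join paths of exactly `depth` hops from `start`."""
    paths = [[start]]
    for _ in range(depth):
        paths = [p + [nxt] for p in paths for nxt in schema[p[-1]]]
    return paths

AGGS = 6  # e.g. count, sum, mean, min, max, std
for d in (1, 2, 3, 4):
    n = len(join_paths("customers", d))
    print(f"{d}-hop paths: {n}, candidate features: {n * AGGS} per numeric column")
```

Even on this tiny 5-table schema the candidate space multiplies at every hop; on a 40-table warehouse, exhaustive manual coverage of 3-4 hop features is not realistic.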

On the RelBench benchmark, the tasks where relational models most dramatically outperform flat-table models are those where multi-hop patterns carry significant signal. The Amazon product recommendation task, which involves a customer-review-product graph with 3-hop patterns, shows a 15+ point AUROC gap between flat and relational approaches.

2. Temporal sequences

When you aggregate a customer's transaction history into "total orders in last 30 days," you destroy the sequence. Consider two SaaS customers who both logged 20 sessions in the past month:

sessions: User A (disengaging)

| session_id | date   | duration_min | features_used |
|------------|--------|--------------|---------------|
| S-101      | Mar 1  | 42           | 5             |
| S-102      | Mar 2  | 38           | 4             |
| S-103      | Mar 3  | 35           | 4             |
| S-104      | Mar 4  | 28           | 3             |
| S-105      | Mar 5  | 22           | 2             |

20 sessions crammed into week 1 (the table shows the first five), then nothing for 3 weeks. Duration and feature usage decline with each session. This user is abandoning the product.

sessions: User B (deepening)

| session_id | date   | duration_min | features_used |
|------------|--------|--------------|---------------|
| S-201      | Mar 1  | 15           | 2             |
| S-202      | Mar 8  | 22           | 3             |
| S-203      | Mar 15 | 31           | 4             |
| S-204      | Mar 22 | 40           | 5             |
| S-205      | Mar 29 | 48           | 7             |

A steady cadence spread across all four weeks, with increasing duration and feature adoption. This user is deepening engagement.

flat_feature_table (what the model sees)

| user   | sessions_30d | avg_duration | avg_features_used | reality                        |
|--------|--------------|--------------|-------------------|--------------------------------|
| User A | 20           | 33 min       | 3.6               | Disengaging (churn in 2 weeks) |
| User B | 20           | 31 min       | 4.2               | Deepening (expansion candidate)|

Both users show 20 sessions and similar averages. User A crammed 20 declining sessions into week 1, then disappeared. User B steadily increased engagement over 4 weeks. The flat table erased the trajectory.

These temporal patterns carry strong predictive signal. Accelerating purchase frequency predicts expansion. Decelerating frequency predicts churn. Category migration predicts lifetime value growth. Payment timing drift predicts credit default. All of these patterns exist in the raw transaction data. All of them are erased by aggregation.

The flat-table workaround is to create more granular time windows: instead of "orders last 30 days," compute "orders in days 1-7," "orders in days 8-14," "orders in days 15-21," "orders in days 22-30." This captures some temporal structure but at the cost of feature explosion. Four time windows for each of 10 aggregations across 5 tables produce 200 features. And the sequence within each window is still lost.
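A pandas sketch of this workaround, with a hypothetical events table keyed by days before the prediction date: four window features for a single count aggregation, before multiplying by the other aggregations and tables:

```python
import pandas as pd

# Hypothetical events table: each row is one order, timestamped as
# "days before the prediction date".
events = pd.DataFrame({
    "customer_id": ["C-1"] * 5,
    "days_ago": [2, 5, 9, 16, 25],
})

# One count feature per fixed window.
windows = [(1, 7), (8, 14), (15, 21), (22, 30)]
feats = {}
for lo, hi in windows:
    in_window = events[events["days_ago"].between(lo, hi)]
    feats[f"orders_d{lo}_{hi}"] = in_window.groupby("customer_id").size()

flat = pd.DataFrame(feats).fillna(0).astype(int)
print(flat)  # 4 columns for ONE aggregation on ONE table;
             # 10 aggregations x 5 tables -> 200 columns
```

Even with the windows in place, the model still cannot see whether the events inside any one window were accelerating, decelerating, or clustered on a single day.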

3. Graph topology

The structure of connections around an entity carries information that cannot be captured in a flat row. Consider two sellers on a marketplace platform, both with identical flat-table metrics:

transactions: Seller X (embedded in community)

| txn_id | seller   | buyer   | amount | repeat_buyer       |
|--------|----------|---------|--------|--------------------|
| T-401  | Seller X | Buyer A | $85    | Yes (3rd purchase) |
| T-402  | Seller X | Buyer B | $120   | Yes (2nd purchase) |
| T-403  | Seller X | Buyer C | $65    | Yes (5th purchase) |
| T-404  | Seller X | Buyer D | $90    | No                 |

Seller X has repeat buyers. Buyers A, B, and C also buy from Seller Y and Seller Z, forming a tightly connected community of trusted sellers.

transactions: Seller W (isolated)

| txn_id | seller   | buyer   | amount | repeat_buyer |
|--------|----------|---------|--------|--------------|
| T-501  | Seller W | Buyer E | $95    | No           |
| T-502  | Seller W | Buyer F | $110   | No           |
| T-503  | Seller W | Buyer G | $70    | No           |
| T-504  | Seller W | Buyer H | $85    | No           |

Seller W has no repeat buyers. None of Seller W's buyers buy from any other seller on the platform. No community embedding.

flat_feature_table (what the model sees)

| seller   | txn_count | total_revenue | avg_order | unique_buyers | reality                                     |
|----------|-----------|---------------|-----------|---------------|---------------------------------------------|
| Seller X | 4         | $360          | $90       | 4             | Trusted, community-embedded, high LTV       |
| Seller W | 4         | $360          | $90       | 4             | Isolated, no repeat buyers, high churn risk |

Identical flat features: same count, same revenue, same average, same buyer count. The graph reveals Seller X is embedded in a community of repeat buyers while Seller W's buyers are one-time, isolated transactions.

Graph topology goes deeper than degree counts. The clustering coefficient (do a customer's merchants also share other customers?) indicates whether the customer is embedded in a community or operating in isolation. The path length to high-value nodes (how many hops to reach a VIP customer through shared product connections?) indicates growth potential. These are structural properties of the graph that flat features cannot represent.
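These structural properties are computable once transactions are treated as a graph. A sketch using networkx, with hypothetical edges mirroring the Seller X / Seller W example: project the bipartite buyer-seller graph onto sellers (two sellers are linked if they share a buyer) and compare clustering coefficients:

```python
import networkx as nx
from networkx.algorithms import bipartite

# Hypothetical buyer-seller transaction edges.
edges = [
    ("Seller X", "Buyer A"), ("Seller X", "Buyer B"),
    ("Seller Y", "Buyer A"), ("Seller Y", "Buyer B"),
    ("Seller Z", "Buyer A"), ("Seller Z", "Buyer B"),
    ("Seller W", "Buyer E"), ("Seller W", "Buyer F"),
]
G = nx.Graph(edges)
sellers = {s for s, _ in edges}

# Project onto sellers: an edge means "these sellers share a buyer".
P = bipartite.projected_graph(G, sellers)

# Seller X's neighbors (Y and Z) also share buyers with each other -> 1.0.
print(nx.clustering(P, "Seller X"))
# Seller W shares no buyers with anyone -> isolated -> 0.0.
print(nx.clustering(P, "Seller W"))
```

A flat feature table built from per-seller aggregates gives both sellers identical rows; the projection makes the community structure a computable, and learnable, quantity.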

Single-table ML

  • One row per entity, manually engineered features
  • 1-2 hop relationships captured at best
  • Temporal sequences aggregated into counts and averages
  • Graph topology reduced to simple degree counts
  • 62.44 AUROC on RelBench benchmark

Relational ML

  • Full multi-table structure preserved as a graph
  • 3-4 hop patterns discovered automatically via message passing
  • Temporal sequences processed in raw form
  • Full graph topology captured: clustering, path lengths, community
  • 75.83-81.14 AUROC on RelBench benchmark

raw relational data — customers table

| customer_id | name        | signup_date | segment | region  |
|-------------|-------------|-------------|---------|---------|
| C-201       | Aisha Patel | 2023-08-12  | Premium | West    |
| C-202       | Tom Nguyen  | 2024-01-05  | Basic   | East    |
| C-203       | Sarah Klein | 2023-03-22  | Premium | Central |
| C-204       | Marcus Lee  | 2024-06-18  | Basic   | West    |

flattened_feature_table — what XGBoost sees

| customer_id | orders_90d | avg_value | support_tickets | days_inactive | churned |
|-------------|------------|-----------|-----------------|---------------|---------|
| C-201       | 8          | $72.40    | 1               | 5             | No      |
| C-202       | 1          | $45.00    | 4               | 62            | Yes     |
| C-203       | 12         | $110.50   | 0               | 2             | No      |
| C-204       | 2          | $38.20    | 2               | 41            | ?       |

After flattening: C-202 has 4 support tickets, but the model cannot see that 3 were 'cancellation' type filed in the last 2 weeks. C-204's 2 tickets were routine billing inquiries. Same count, completely different signal.

A concrete example: predicting customer churn

Consider an e-commerce database with five tables: customers, orders, products, reviews, and support tickets. You want to predict which customers will churn in the next 90 days.

The flat-table approach

A data scientist writes SQL to produce features like: total orders (last 30/60/90 days), average order value, number of distinct product categories, total returns, number of support tickets, average review score, days since last order, days since last support ticket. After 10-15 hours, the result is a table with 50-100 features per customer. LightGBM trains in minutes and produces a decent model.

What the flat model misses

The support-then-purchase sequence. Customers who file a support ticket and then purchase within 7 days are satisfied with the resolution and unlikely to churn. Customers who file a support ticket and do not purchase within 21 days are dissatisfied and highly likely to churn. The flat model sees "1 support ticket" and "2 orders in 30 days" as separate features. The sequence and timing between them is lost.
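Capturing this sequence by hand requires an ordered join, not an aggregate. A pandas sketch with hypothetical dates, using merge_asof to find the first order after each ticket:

```python
import pandas as pd

# Hypothetical event logs: support tickets and subsequent orders.
tickets = pd.DataFrame({
    "customer_id": ["C-1", "C-2"],
    "ticket_date": pd.to_datetime(["2024-03-01", "2024-03-01"]),
})
orders = pd.DataFrame({
    "customer_id": ["C-1", "C-2"],
    "order_date": pd.to_datetime(["2024-03-04", "2024-03-28"]),
})

# Sequence feature the flat table lacks: days from a ticket to the NEXT order.
nxt = pd.merge_asof(
    tickets.sort_values("ticket_date"),
    orders.sort_values("order_date"),
    left_on="ticket_date", right_on="order_date",
    by="customer_id", direction="forward",
)
nxt["days_to_next_order"] = (nxt["order_date"] - nxt["ticket_date"]).dt.days
print(nxt[["customer_id", "days_to_next_order"]])
```

Here C-1 purchased 3 days after the ticket (resolved, low risk) while C-2 waited 27 days (past the 21-day threshold, high risk); the separate "1 support ticket" and "orders in 30 days" columns contain neither number.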

Product quality propagation. A customer who purchased a product with a 2.1-star average review is at higher churn risk than a customer who purchased a product with a 4.5-star average, even if neither customer has left a review themselves. This is a 2-hop pattern (customer → orders → products → reviews) that the flat model would only capture if a data scientist explicitly computed "average review score of purchased products." Most do not.
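For reference, that explicit 2-hop feature is two joins in pandas (hypothetical data; the point is that most feature pipelines never write even this pair, let alone its 3- and 4-hop cousins):

```python
import pandas as pd

# Hypothetical tables: orders link customers to products, reviews score products.
orders = pd.DataFrame({"customer_id": ["C-1", "C-1", "C-2"],
                       "product_id": ["P-1", "P-2", "P-1"]})
reviews = pd.DataFrame({"product_id": ["P-1", "P-1", "P-2"],
                        "stars": [2.0, 3.0, 5.0]})

# 2-hop feature: customer -> orders -> products -> reviews.
product_score = (reviews.groupby("product_id")["stars"].mean()
                 .rename("avg_stars").reset_index())
two_hop = (orders.merge(product_score, on="product_id")
                 .groupby("customer_id")["avg_stars"].mean())
print(two_hop)  # avg review score of each customer's purchased products
```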

Cohort behavior. If 30% of customers who bought the same product in the same week churned within 60 days, the remaining customers from that cohort face elevated risk. This is a graph-level pattern: customer → order → product → order (same product, same time) → customer outcome. The flat model cannot see it.

The relational approach

A relational model represents all five tables as a graph. Customer nodes connect to order nodes, which connect to product nodes, which connect to review nodes. Support ticket nodes connect to customer nodes. Timestamps create temporal ordering.

The graph neural network propagates information along all these connections. After 3-4 rounds of message passing, each customer node's representation contains: their own purchase history and support interactions (1-hop), the quality and review scores of their purchased products (2-hop), the behavior of other customers who bought the same products (3-hop), and the aggregate outcomes of those customers (4-hop).

The model discovers which of these patterns are predictive. No human specifies features. The accuracy improvement comes from the multi-hop and temporal patterns that flattening destroys.
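A toy illustration of the propagation mechanism (a sketch of the idea, not the actual architecture): mean-aggregation message passing over a 5-node customer-order-product path, showing that a signal placed 4 hops away reaches a customer node only after 4 rounds:

```python
# Hypothetical adjacency: cust1 - order1 - prod1 - order2 - cust2.
graph = {
    "cust1": ["order1"],
    "order1": ["cust1", "prod1"],
    "prod1": ["order1", "order2"],
    "order2": ["prod1", "cust2"],
    "cust2": ["order2"],
}
# 1-d "feature": only cust2 carries a signal (say, churned = 1.0).
h = {n: (1.0 if n == "cust2" else 0.0) for n in graph}

# Each round, a node averages its own value with its neighbors' values,
# so information travels exactly one hop per round.
for _ in range(4):
    h = {n: (h[n] + sum(h[m] for m in graph[n])) / (1 + len(graph[n]))
         for n in h}

print(h["cust1"])  # nonzero: cust2's signal arrived after 4 rounds
```

Real GNNs use learned transformations instead of plain averaging, but the hop-per-round propagation is the same, which is why 3-4 rounds of message passing expose exactly the 3-4 hop patterns that manual feature engineering misses.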

When single-table ML is enough

Relational ML is not always necessary. Single-table ML is sufficient when:

  • The data is genuinely single-table (sensor readings from one device, survey responses, image metadata)
  • The features have been pre-engineered by domain experts who captured the key cross-table patterns
  • The prediction task depends primarily on entity-level attributes rather than relational context (e.g., predicting a product's weight from its description)
  • The relational structure is shallow (2 tables with a simple one-to-many relationship) and 1-hop aggregates capture most of the signal

In enterprise settings, these conditions rarely hold. The databases have 5-50 tables. The predictive patterns span multiple hops. The temporal dynamics matter. And the data science team has already tried flat-table ML and hit an accuracy ceiling.

The foundation model bridge

KumoRFM is a foundation model pre-trained on relational patterns across thousands of databases. It represents the relational database as a temporal heterogeneous graph and generates predictions directly from the raw structure.

For the churn prediction example above, the workflow is:

PQL Query

PREDICT churn_90d
FOR EACH customers.customer_id

One line of PQL replaces the entire flattening process. The model reads the raw relational tables and preserves multi-hop patterns, temporal sequences, and graph topology.

Output

| customer_id | churn_90d | confidence | top_signal                                           |
|-------------|-----------|------------|------------------------------------------------------|
| C-201       | 0.09      | 0.95       | Consistent engagement, single resolved ticket        |
| C-202       | 0.93      | 0.92       | 3 cancellation tickets in 14 days + declining orders |
| C-203       | 0.04      | 0.97       | High frequency, premium segment, zero tickets        |
| C-204       | 0.41      | 0.86       | Moderate inactivity, routine ticket history          |

The model returns a churn probability for every customer, incorporating the full relational context: multi-hop patterns, temporal sequences, and graph topology. No flattening, no feature engineering, no information destruction.

The 13-19 point accuracy gap between flat and relational ML is not a theoretical curiosity. It is revenue left on the table. Every churn prediction that the flat model gets wrong is a customer who could have been retained with the right intervention. Every credit risk prediction that the flat model gets wrong is a default that could have been avoided or a creditworthy borrower who was denied. The information destroyed by flattening has a dollar value. Relational ML recovers it.

Frequently asked questions

What is single-table ML?

Single-table ML is the standard approach where all data is flattened into one table with one row per entity and one column per feature. Models like XGBoost, LightGBM, random forests, and logistic regression operate on this format. The flattening process requires joining multiple source tables, computing aggregations, and engineering features manually, which destroys multi-hop relationships, temporal sequences, and graph topology in the process.

What is relational ML?

Relational ML operates directly on multi-table relational data without flattening. The database is represented as a temporal heterogeneous graph where rows are nodes and foreign keys are edges. Graph neural networks learn patterns across this structure, discovering cross-table relationships, multi-hop dependencies, and temporal sequences that flat-table models cannot capture. This approach was formalized in the Relational Deep Learning paper published at ICML 2024.

What information is lost when you flatten relational data?

Three types of information are destroyed: (1) Multi-hop relationships, where patterns span 3-4 tables (e.g., a customer's churn risk depends on the return rates of products bought by similar customers); (2) Temporal sequences, where the order and timing of events carries signal that aggregation erases; (3) Graph topology, where the structure of connections (how many entities connect to a node, and how those entities are themselves connected) is predictive but cannot be captured in a flat row.

How much accuracy is lost by flattening?

On the RelBench benchmark (7 databases, 30 tasks, 103M+ rows), LightGBM with manual feature engineering on flat tables scored 62.44 AUROC. Graph neural networks on the raw relational data scored 75.83 AUROC. That is a 13.4-point gap caused by information destruction during flattening. On specific tasks involving multi-hop patterns, the gap is even larger.

When is single-table ML sufficient?

Single-table ML is sufficient when most of the predictive signal lives in a single table and the relationships between tables add minimal information. Examples include tabular Kaggle competitions where the data is already flat, sensor data from a single device, or simple classification tasks where demographic features dominate. In enterprise settings with 5-50 interconnected tables, relational ML almost always outperforms because the cross-table signal is substantial.

See it in action

KumoRFM delivers predictions on relational data in seconds. No feature engineering, no ML pipelines. Try it free.