
Single-Table ML vs Relational ML: What Gets Lost When You Flatten

Your database has 12 tables connected by foreign keys. Your ML model sees one table. Between those two realities, three categories of information are being destroyed. Here is exactly what they are and how much they cost you.

TL;DR

  • Flattening relational data destroys three categories of information: multi-hop relationships (3-4 table patterns), temporal sequences (order and timing of events), and graph topology (connection structure).
  • On RelBench (7 databases, 30 tasks, 103M+ rows), flat-table ML scores 62.44 AUROC vs 75.83 for relational ML and 81.14 for KumoRFM fine-tuned. The 13-19 point gap is the cost of information destruction.
  • The most valuable signals are exactly what flattening erases: support ticket categories become counts, order sequences become averages, cohort behavior patterns vanish entirely.
  • Single-table ML is sufficient when data genuinely lives in one table or the relational structure is shallow. In enterprise settings with 5-50 interconnected tables, relational ML almost always outperforms.
  • KumoRFM recovers destroyed information by learning directly from the raw relational graph. No flattening, no feature engineering, no information loss. One line of PQL, predictions in seconds.

Open any ML textbook, any online course, any Kaggle competition. The data arrives as a single CSV file. One row per sample, one column per feature, one target variable. The model trains on this table and produces predictions. Clean, simple, well-understood.

Now look at where enterprise data actually lives. A PostgreSQL database with 15 tables. A Snowflake warehouse with 40 tables across 6 schemas. A data lake with hundreds of Parquet files organized by domain. The data that predicts customer churn, credit default, or next purchase is spread across all of these tables, connected by foreign keys and temporal relationships.

The standard approach is to flatten this relational structure into a single table through feature engineering: write SQL joins, compute aggregations, create derived features. This produces the familiar CSV that models expect. And it destroys information in the process.
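As a sketch of what this flattening step does, here is a minimal pandas version with hypothetical customers and orders tables (the table and column names are illustrative, not from any real schema):

```python
import pandas as pd

# Hypothetical source tables joined by a customer_id foreign key.
orders = pd.DataFrame({
    "customer_id": ["C-1", "C-1", "C-2"],
    "order_value": [40.0, 60.0, 25.0],
})
customers = pd.DataFrame({"customer_id": ["C-1", "C-2", "C-3"]})

# Flatten: collapse the one-to-many orders table into per-customer columns.
agg = orders.groupby("customer_id").agg(
    total_orders=("order_value", "count"),
    avg_order_value=("order_value", "mean"),
)
flat = customers.merge(agg, on="customer_id", how="left").fillna(0)
print(flat)  # one row per customer, order history reduced to two numbers
```

Each customer's full order history is now two scalar columns; which order came first, last, or in what rhythm is gone.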

The question is: how much information is destroyed, and does it matter?

The three types of information destruction

1. Multi-hop relationships

A customer's churn risk depends not just on their own behavior but on the behavior of the products they bought, the other customers who bought those products, and the churn rates of those customers. That is a 4-hop path through the relational graph: customer → orders → products → orders (other customers) → customer outcomes.

No data scientist writes this feature. It is not that the SQL is hard. It is that no one thinks to look for it. The feature space of possible multi-hop aggregations is combinatorially large, and humans explore a tiny fraction. A typical feature engineering effort covers 1-hop relationships (direct aggregates from immediately joined tables) and occasionally 2-hop relationships. Patterns at 3-4 hops are systematically invisible.
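The combinatorial growth is easy to see by enumerating join paths through a schema. The sketch below uses a hypothetical foreign-key graph loosely modeled on the e-commerce example later in this article; the path count grows with hop depth, and every path then multiplies by the number of candidate aggregations and numeric columns:

```python
# Hypothetical schema: which tables each table joins to via foreign keys.
schema = {
    "customers": ["orders", "tickets"],
    "orders": ["customers", "products"],
    "products": ["orders", "reviews"],
    "reviews": ["products"],
    "tickets": ["customers"],
}

def join_paths(start, depth):
    """Enumerate all join paths of exactly `depth` hops from `start`."""
    paths = [[start]]
    for _ in range(depth):
        paths = [p + [nxt] for p in paths for nxt in schema[p[-1]]]
    return paths

AGGS = 6  # e.g. count, sum, mean, min, max, std
for d in (1, 2, 3, 4):
    n = len(join_paths("customers", d))
    print(f"{d}-hop paths: {n}, candidate features: {n * AGGS} per numeric column")
```

Even on this tiny 5-table schema the candidate space multiplies at every hop; on a 40-table warehouse, exhaustive manual coverage of 3-4 hop features is not realistic.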

On the RelBench benchmark, the tasks where relational models most dramatically outperform flat-table models are those where multi-hop patterns carry significant signal. The Amazon product recommendation task, which involves a customer-review-product graph with 3-hop patterns, shows a 15+ point AUROC gap between flat and relational approaches.

2. Temporal sequences

When you aggregate a customer's transaction history into "total orders in last 30 days," you destroy the sequence. Consider two SaaS customers who both logged 20 sessions in the past month:

sessions: User A (disengaging)

| session_id | date   | duration_min | features_used |
|------------|--------|--------------|---------------|
| S-101      | Mar 1  | 42           | 5             |
| S-102      | Mar 2  | 38           | 4             |
| S-103      | Mar 3  | 35           | 4             |
| S-104      | Mar 4  | 28           | 3             |
| S-105      | Mar 5  | 22           | 2             |

20 sessions crammed into week 1 (the table shows the first five), then nothing for 3 weeks. Duration and feature usage decline with each session. This user is abandoning the product.

sessions: User B (deepening)

| session_id | date   | duration_min | features_used |
|------------|--------|--------------|---------------|
| S-201      | Mar 1  | 15           | 2             |
| S-202      | Mar 8  | 22           | 3             |
| S-203      | Mar 15 | 31           | 4             |
| S-204      | Mar 22 | 40           | 5             |
| S-205      | Mar 29 | 48           | 7             |

A steady cadence spread across all four weeks, with increasing duration and feature adoption. This user is deepening engagement.

flat_feature_table (what the model sees)

| user   | sessions_30d | avg_duration | avg_features_used | reality                        |
|--------|--------------|--------------|-------------------|--------------------------------|
| User A | 20           | 33 min       | 3.6               | Disengaging (churn in 2 weeks) |
| User B | 20           | 31 min       | 4.2               | Deepening (expansion candidate)|

Both users show 20 sessions and similar averages. User A crammed 20 declining sessions into week 1, then disappeared. User B steadily increased engagement over 4 weeks. The flat table erased the trajectory.

These temporal patterns carry strong predictive signal. Accelerating purchase frequency predicts expansion. Decelerating frequency predicts churn. Category migration predicts lifetime value growth. Payment timing drift predicts credit default. All of these patterns exist in the raw transaction data. All of them are erased by aggregation.

The flat-table workaround is to create more granular time windows: instead of "orders last 30 days," compute "orders in days 1-7," "orders in days 8-14," "orders in days 15-21," "orders in days 22-30." This captures some temporal structure but at the cost of feature explosion. Four time windows for each of 10 aggregations across 5 tables produce 200 features. And the sequence within each window is still lost.
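A pandas sketch of this workaround, with a hypothetical events table keyed by days before the prediction date: four window features for a single count aggregation, before multiplying by the other aggregations and tables:

```python
import pandas as pd

# Hypothetical events table: each row is one order, timestamped as
# "days before the prediction date".
events = pd.DataFrame({
    "customer_id": ["C-1"] * 5,
    "days_ago": [2, 5, 9, 16, 25],
})

# One count feature per fixed window.
windows = [(1, 7), (8, 14), (15, 21), (22, 30)]
feats = {}
for lo, hi in windows:
    in_window = events[events["days_ago"].between(lo, hi)]
    feats[f"orders_d{lo}_{hi}"] = in_window.groupby("customer_id").size()

flat = pd.DataFrame(feats).fillna(0).astype(int)
print(flat)  # 4 columns for ONE aggregation on ONE table;
             # 10 aggregations x 5 tables -> 200 columns
```

Even with the windows in place, the model still cannot see whether the events inside any one window were accelerating, decelerating, or clustered on a single day.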

3. Graph topology

The structure of connections around an entity carries information that cannot be captured in a flat row. Consider two sellers on a marketplace platform, both with identical flat-table metrics:

transactions: Seller X (embedded in community)

| txn_id | seller   | buyer   | amount | repeat_buyer       |
|--------|----------|---------|--------|--------------------|
| T-401  | Seller X | Buyer A | $85    | Yes (3rd purchase) |
| T-402  | Seller X | Buyer B | $120   | Yes (2nd purchase) |
| T-403  | Seller X | Buyer C | $65    | Yes (5th purchase) |
| T-404  | Seller X | Buyer D | $90    | No                 |

Seller X has repeat buyers. Buyers A, B, and C also buy from Seller Y and Seller Z, forming a tightly connected community of trusted sellers.

transactions: Seller W (isolated)

| txn_id | seller   | buyer   | amount | repeat_buyer |
|--------|----------|---------|--------|--------------|
| T-501  | Seller W | Buyer E | $95    | No           |
| T-502  | Seller W | Buyer F | $110   | No           |
| T-503  | Seller W | Buyer G | $70    | No           |
| T-504  | Seller W | Buyer H | $85    | No           |

Seller W has no repeat buyers. None of Seller W's buyers buy from any other seller on the platform. No community embedding.

flat_feature_table (what the model sees)

| seller   | txn_count | total_revenue | avg_order | unique_buyers | reality                                     |
|----------|-----------|---------------|-----------|---------------|---------------------------------------------|
| Seller X | 4         | $360          | $90       | 4             | Trusted, community-embedded, high LTV       |
| Seller W | 4         | $360          | $90       | 4             | Isolated, no repeat buyers, high churn risk |

Identical flat features: same count, same revenue, same average, same buyer count. The graph reveals Seller X is embedded in a community of repeat buyers while Seller W's buyers are one-time, isolated transactions.

Graph topology goes deeper than degree counts. The clustering coefficient (do a customer's merchants also share other customers?) indicates whether the customer is embedded in a community or operating in isolation. The path length to high-value nodes (how many hops to reach a VIP customer through shared product connections?) indicates growth potential. These are structural properties of the graph that flat features cannot represent.
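These structural properties are computable once transactions are treated as a graph. A sketch using networkx, with hypothetical edges mirroring the Seller X / Seller W example: project the bipartite buyer-seller graph onto sellers (two sellers are linked if they share a buyer) and compare clustering coefficients:

```python
import networkx as nx
from networkx.algorithms import bipartite

# Hypothetical buyer-seller transaction edges.
edges = [
    ("Seller X", "Buyer A"), ("Seller X", "Buyer B"),
    ("Seller Y", "Buyer A"), ("Seller Y", "Buyer B"),
    ("Seller Z", "Buyer A"), ("Seller Z", "Buyer B"),
    ("Seller W", "Buyer E"), ("Seller W", "Buyer F"),
]
G = nx.Graph(edges)
sellers = {s for s, _ in edges}

# Project onto sellers: an edge means "these sellers share a buyer".
P = bipartite.projected_graph(G, sellers)

# Seller X's neighbors (Y and Z) also share buyers with each other -> 1.0.
print(nx.clustering(P, "Seller X"))
# Seller W shares no buyers with anyone -> isolated -> 0.0.
print(nx.clustering(P, "Seller W"))
```

A flat feature table built from per-seller aggregates gives both sellers identical rows; the projection makes the community structure a computable, and learnable, quantity.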

Single-table ML

  • One row per entity, manually engineered features
  • 1-2 hop relationships captured at best
  • Temporal sequences aggregated into counts and averages
  • Graph topology reduced to simple degree counts
  • 62.44 AUROC on RelBench benchmark

Relational ML

  • Full multi-table structure preserved as a graph
  • 3-4 hop patterns discovered automatically via message passing
  • Temporal sequences processed in raw form
  • Full graph topology captured: clustering, path lengths, community
  • 75.83-81.14 AUROC on RelBench benchmark

raw relational data — customers table

| customer_id | name        | signup_date | segment | region  |
|-------------|-------------|-------------|---------|---------|
| C-201       | Aisha Patel | 2023-08-12  | Premium | West    |
| C-202       | Tom Nguyen  | 2024-01-05  | Basic   | East    |
| C-203       | Sarah Klein | 2023-03-22  | Premium | Central |
| C-204       | Marcus Lee  | 2024-06-18  | Basic   | West    |

flattened_feature_table — what XGBoost sees

| customer_id | orders_90d | avg_value | support_tickets | days_inactive | churned |
|-------------|------------|-----------|-----------------|---------------|---------|
| C-201       | 8          | $72.40    | 1               | 5             | No      |
| C-202       | 1          | $45.00    | 4               | 62            | Yes     |
| C-203       | 12         | $110.50   | 0               | 2             | No      |
| C-204       | 2          | $38.20    | 2               | 41            | ?       |

After flattening: C-202 has 4 support tickets, but the model cannot see that 3 were 'cancellation' type filed in the last 2 weeks. C-204's 2 tickets were routine billing inquiries. Same count, completely different signal.

A concrete example: predicting customer churn

Consider an e-commerce database with five tables: customers, orders, products, reviews, and support tickets. You want to predict which customers will churn in the next 90 days.

The flat-table approach

A data scientist writes SQL to produce features like: total orders (last 30/60/90 days), average order value, number of distinct product categories, total returns, number of support tickets, average review score, days since last order, days since last support ticket. After 10-15 hours, the result is a table with 50-100 features per customer. LightGBM trains in minutes and produces a decent model.

What the flat model misses

The support-then-purchase sequence. Customers who file a support ticket and then purchase within 7 days are satisfied with the resolution and unlikely to churn. Customers who file a support ticket and do not purchase within 21 days are dissatisfied and highly likely to churn. The flat model sees "1 support ticket" and "2 orders in 30 days" as separate features. The sequence and timing between them is lost.
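Capturing this sequence by hand requires an ordered join, not an aggregate. A pandas sketch with hypothetical dates, using merge_asof to find the first order after each ticket:

```python
import pandas as pd

# Hypothetical event logs: support tickets and subsequent orders.
tickets = pd.DataFrame({
    "customer_id": ["C-1", "C-2"],
    "ticket_date": pd.to_datetime(["2024-03-01", "2024-03-01"]),
})
orders = pd.DataFrame({
    "customer_id": ["C-1", "C-2"],
    "order_date": pd.to_datetime(["2024-03-04", "2024-03-28"]),
})

# Sequence feature the flat table lacks: days from a ticket to the NEXT order.
nxt = pd.merge_asof(
    tickets.sort_values("ticket_date"),
    orders.sort_values("order_date"),
    left_on="ticket_date", right_on="order_date",
    by="customer_id", direction="forward",
)
nxt["days_to_next_order"] = (nxt["order_date"] - nxt["ticket_date"]).dt.days
print(nxt[["customer_id", "days_to_next_order"]])
```

Here C-1 purchased 3 days after the ticket (resolved, low risk) while C-2 waited 27 days (past the 21-day threshold, high risk); the separate "1 support ticket" and "orders in 30 days" columns contain neither number.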

Product quality propagation. A customer who purchased a product with a 2.1-star average review is at higher churn risk than a customer who purchased a product with a 4.5-star average, even if neither customer has left a review themselves. This is a 2-hop pattern (customer → orders → products → reviews) that the flat model would only capture if a data scientist explicitly computed "average review score of purchased products." Most do not.
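For reference, that explicit 2-hop feature is two joins in pandas (hypothetical data; the point is that most feature pipelines never write even this pair, let alone its 3- and 4-hop cousins):

```python
import pandas as pd

# Hypothetical tables: orders link customers to products, reviews score products.
orders = pd.DataFrame({"customer_id": ["C-1", "C-1", "C-2"],
                       "product_id": ["P-1", "P-2", "P-1"]})
reviews = pd.DataFrame({"product_id": ["P-1", "P-1", "P-2"],
                        "stars": [2.0, 3.0, 5.0]})

# 2-hop feature: customer -> orders -> products -> reviews.
product_score = (reviews.groupby("product_id")["stars"].mean()
                 .rename("avg_stars").reset_index())
two_hop = (orders.merge(product_score, on="product_id")
                 .groupby("customer_id")["avg_stars"].mean())
print(two_hop)  # avg review score of each customer's purchased products
```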

Cohort behavior. If 30% of customers who bought the same product in the same week churned within 60 days, the remaining customers from that cohort face elevated risk. This is a graph-level pattern: customer → order → product → order (same product, same time) → customer outcome. The flat model cannot see it.

The relational approach

A relational model represents all five tables as a graph. Customer nodes connect to order nodes, which connect to product nodes, which connect to review nodes. Support ticket nodes connect to customer nodes. Timestamps create temporal ordering.

The graph neural network propagates information along all these connections. After 3-4 rounds of message passing, each customer node's representation contains: their own purchase history and support interactions (1-hop), the quality and review scores of their purchased products (2-hop), the behavior of other customers who bought the same products (3-hop), and the aggregate outcomes of those customers (4-hop).

The model discovers which of these patterns are predictive. No human specifies features. The accuracy improvement comes from the multi-hop and temporal patterns that flattening destroys.
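A toy illustration of the propagation mechanism (a sketch of the idea, not the actual architecture): mean-aggregation message passing over a 5-node customer-order-product path, showing that a signal placed 4 hops away reaches a customer node only after 4 rounds:

```python
# Hypothetical adjacency: cust1 - order1 - prod1 - order2 - cust2.
graph = {
    "cust1": ["order1"],
    "order1": ["cust1", "prod1"],
    "prod1": ["order1", "order2"],
    "order2": ["prod1", "cust2"],
    "cust2": ["order2"],
}
# 1-d "feature": only cust2 carries a signal (say, churned = 1.0).
h = {n: (1.0 if n == "cust2" else 0.0) for n in graph}

# Each round, a node averages its own value with its neighbors' values,
# so information travels exactly one hop per round.
for _ in range(4):
    h = {n: (h[n] + sum(h[m] for m in graph[n])) / (1 + len(graph[n]))
         for n in h}

print(h["cust1"])  # nonzero: cust2's signal arrived after 4 rounds
```

Real GNNs use learned transformations instead of plain averaging, but the hop-per-round propagation is the same, which is why 3-4 rounds of message passing expose exactly the 3-4 hop patterns that manual feature engineering misses.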

When single-table ML is enough

Relational ML is not always necessary. Single-table ML is sufficient when:

  • The data is genuinely single-table (sensor readings from one device, survey responses, image metadata)
  • The features have been pre-engineered by domain experts who captured the key cross-table patterns
  • The prediction task depends primarily on entity-level attributes rather than relational context (e.g., predicting a product's weight from its description)
  • The relational structure is shallow (2 tables with a simple one-to-many relationship) and 1-hop aggregates capture most of the signal

In enterprise settings, these conditions rarely hold. The databases have 5-50 tables. The predictive patterns span multiple hops. The temporal dynamics matter. And the data science team has already tried flat-table ML and hit an accuracy ceiling.

The foundation model bridge

KumoRFM is a foundation model pre-trained on relational patterns across thousands of databases. It represents the relational database as a temporal heterogeneous graph and generates predictions directly from the raw structure.

For the churn prediction example above, the workflow is:

PQL Query

PREDICT churn_90d
FOR EACH customers.customer_id

One line of PQL replaces the entire flattening process. The model reads the raw relational tables and preserves multi-hop patterns, temporal sequences, and graph topology.

Output

| customer_id | churn_90d | confidence | top_signal                                           |
|-------------|-----------|------------|------------------------------------------------------|
| C-201       | 0.09      | 0.95       | Consistent engagement, single resolved ticket        |
| C-202       | 0.93      | 0.92       | 3 cancellation tickets in 14 days + declining orders |
| C-203       | 0.04      | 0.97       | High frequency, premium segment, zero tickets        |
| C-204       | 0.41      | 0.86       | Moderate inactivity, routine ticket history          |

The model returns a churn probability for every customer, incorporating the full relational context: multi-hop patterns, temporal sequences, and graph topology. No flattening, no feature engineering, no information destruction.

The 13-19 point accuracy gap between flat and relational ML is not a theoretical curiosity. It is revenue left on the table. Every churn prediction that the flat model gets wrong is a customer who could have been retained with the right intervention. Every credit risk prediction that the flat model gets wrong is a default that could have been avoided or a creditworthy borrower who was denied. The information destroyed by flattening has a dollar value. Relational ML recovers it.

Frequently asked questions

What is single-table ML?

Single-table ML is the standard approach where all data is flattened into one table with one row per entity and one column per feature. Models like XGBoost, LightGBM, random forests, and logistic regression operate on this format. The flattening process requires joining multiple source tables, computing aggregations, and engineering features manually, which destroys multi-hop relationships, temporal sequences, and graph topology in the process.

What is relational ML?

Relational ML operates directly on multi-table relational data without flattening. The database is represented as a temporal heterogeneous graph where rows are nodes and foreign keys are edges. Graph neural networks learn patterns across this structure, discovering cross-table relationships, multi-hop dependencies, and temporal sequences that flat-table models cannot capture. This approach was formalized in the Relational Deep Learning paper published at ICML 2024.

What information is lost when you flatten relational data?

Three types of information are destroyed: (1) Multi-hop relationships, where patterns span 3-4 tables (e.g., a customer's churn risk depends on the return rates of products bought by similar customers); (2) Temporal sequences, where the order and timing of events carries signal that aggregation erases; (3) Graph topology, where the structure of connections (how many entities connect to a node, and how those entities are themselves connected) is predictive but cannot be captured in a flat row.

How much accuracy is lost by flattening?

On the RelBench benchmark (7 databases, 30 tasks, 103M+ rows), LightGBM with manual feature engineering on flat tables scored 62.44 AUROC. Graph neural networks on the raw relational data scored 75.83 AUROC. That is a 13.4-point gap caused by information destruction during flattening. On specific tasks involving multi-hop patterns, the gap is even larger.

When is single-table ML sufficient?

Single-table ML is sufficient when most of the predictive signal lives in a single table and the relationships between tables add minimal information. Examples include tabular Kaggle competitions where the data is already flat, sensor data from a single device, or simple classification tasks where demographic features dominate. In enterprise settings with 5-50 interconnected tables, relational ML almost always outperforms because the cross-table signal is substantial.

See it in action

KumoRFM delivers predictions on relational data in seconds. No feature engineering, no ML pipelines. Try it free.