Berlin Tech Meetup: The Future of Relational Foundation Models, Systems, and Real-World Applications

What is Kumo AI? The Relational Foundation Model Platform

Kumo.ai is an AI platform that makes predictions directly on relational databases using graph neural networks and the world's first relational foundation model. No feature engineering, no pipeline complexity, no months of data science work.

TL;DR

  • On the SAP SALT enterprise benchmark, KumoRFM scores 91% accuracy vs 75% for PhD data scientists with XGBoost and 63% for LLM+AutoML - with zero feature engineering and zero training time.
  • Kumo.ai is an AI platform that makes predictions directly on enterprise relational databases. Instead of the traditional ML pipeline (join tables, engineer features, select model, train, deploy), you connect your database, write a predictive query in PQL, and get results in minutes.
  • KumoRFM is the world's first relational foundation model, pre-trained on tens of thousands of heterogeneous relational datasets. It delivers zero-shot predictions that score 76.71 AUROC on RelBench - beating PhD data scientists using LightGBM with manual features (62.44 AUROC). Fine-tuning adds 10-30% more accuracy (81.14 AUROC).
  • Founded by Jure Leskovec (Stanford, graph ML pioneer) and Matthias Fey (creator of PyTorch Geometric, 21K+ GitHub stars). Backed by 40+ peer-reviewed papers at NeurIPS, ICML, and KDD.
  • Enterprise customers include DoorDash (30% accuracy improvement) and Reddit (4-5 years of projected work completed in 2 months). Integrates natively with Snowflake, Databricks, BigQuery, and AWS Athena.

What Kumo does

Kumo.ai is an AI platform that makes predictions directly on relational databases. Most enterprise data lives in relational databases - customers, transactions, products, interactions, support tickets - spread across dozens of interconnected tables. Traditional ML requires data scientists to manually join, flatten, and engineer features from these tables before any model can be trained. Kumo eliminates that entire process.

The platform uses graph neural networks and a relational foundation model called KumoRFM to read raw relational tables directly. It understands the relationships between tables (foreign keys), the temporal dynamics within them (timestamps), and the multi-hop patterns that connect entities across the database. The result: enterprise-grade predictions without feature engineering.

Instead of writing hundreds of lines of SQL and Python to build a feature table, a data scientist using Kumo writes a single predictive query; the PQL Query section below shows an example.

The headline result: SAP SALT benchmark

The SAP SALT benchmark is an enterprise-grade evaluation where real business analysts and data scientists attempt prediction tasks on SAP enterprise data. It measures how accurately different approaches predict real business outcomes on production-quality enterprise databases with multiple related tables.

| Approach | Accuracy | What it means |
|---|---|---|
| LLM + AutoML | 63% | Language model generates features, AutoML selects model |
| PhD data scientist + XGBoost | 75% | Expert spends weeks hand-crafting features, tunes XGBoost |
| KumoRFM (zero-shot) | 91% | No feature engineering, no training, reads relational tables directly |

SAP SALT benchmark: KumoRFM outperforms expert data scientists by 16 percentage points and LLM+AutoML by 28 percentage points on real enterprise prediction tasks.

KumoRFM scores 91% where PhD-level data scientists with weeks of feature engineering and hand-tuned XGBoost score 75%. The 16 percentage point gap is the value of reading relational data natively instead of flattening it into a single table.

PQL Query

```
PREDICT churn
FOR EACH customers.customer_id
IN 30 days
```

Predictive Query Language (PQL) replaces the entire traditional ML pipeline. One query connects to your database, discovers predictive patterns across all related tables, and returns predictions. No feature engineering, no model selection, no training.

Output

| customer_id | churn_probability | confidence |
|---|---|---|
| C-1001 | 0.87 | High |
| C-1002 | 0.12 | High |
| C-1003 | 0.64 | Medium |
| C-1004 | 0.03 | High |

How Kumo is different

The difference between Kumo and traditional ML is not incremental. It is a fundamentally different approach to making predictions from relational data. The traditional ML pipeline has six stages that take weeks. Kumo collapses them into two steps that take minutes.

Traditional ML pipeline

  • Join multiple tables into a flat table (days of SQL)
  • Engineer features: aggregations, encodings, time windows (weeks of iteration)
  • Select features: test which ones improve accuracy (days of experimentation)
  • Choose a model: XGBoost, LightGBM, neural net, ensemble (days)
  • Train and tune hyperparameters (hours to days)
  • Deploy pipeline and maintain feature code (ongoing)
  • Timeline: 4-12 weeks per prediction task
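To make the first two stages concrete, here is a toy sketch in plain Python of the join-and-aggregate work a data scientist would otherwise hand-code for a churn feature table. The table contents and feature choices are hypothetical, chosen only to illustrate the pattern:

```python
from collections import defaultdict

# Toy relational data: a customers table and an orders table
# linked by the foreign key customer_id (illustrative only).
customers = [
    {"customer_id": "C-1001", "signup_year": 2021},
    {"customer_id": "C-1002", "signup_year": 2023},
]
orders = [
    {"customer_id": "C-1001", "amount": 40.0},
    {"customer_id": "C-1001", "amount": 15.0},
    {"customer_id": "C-1002", "amount": 99.0},
]

# Stages 1-2 of the traditional pipeline: join the tables and
# hand-engineer aggregate features per customer.
by_customer = defaultdict(list)
for order in orders:
    by_customer[order["customer_id"]].append(order["amount"])

feature_table = []
for cust in customers:
    amounts = by_customer[cust["customer_id"]]
    feature_table.append({
        "customer_id": cust["customer_id"],
        "order_count": len(amounts),               # frequency feature
        "total_spend": sum(amounts),               # monetary feature
        "avg_order": sum(amounts) / len(amounts),  # hand-picked ratio
    })

print(feature_table[0])
# {'customer_id': 'C-1001', 'order_count': 2, 'total_spend': 55.0, 'avg_order': 27.5}
```

Real pipelines repeat this across dozens of tables, time windows, and candidate features, which is where the weeks go; Kumo's claim is that KumoRFM learns such aggregations implicitly from the raw tables.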

Kumo

  • Connect your relational database (minutes)
  • Write a predictive query in PQL (seconds)
  • Get predictions - no feature engineering, no model selection (minutes)
  • Timeline: minutes per prediction task

| Pipeline stage | Traditional ML | Kumo |
|---|---|---|
| Data extraction & joining | 2-5 days of SQL | Automatic - reads tables directly |
| Feature engineering | 1-4 weeks of iteration | Automatic - discovered by KumoRFM |
| Feature selection | 2-3 days | Automatic - model learns relevance |
| Model selection | 1-2 days | Automatic - KumoRFM or fine-tuned model |
| Training & tuning | Hours to days | Minutes (fine-tuning) or zero (zero-shot) |
| Deployment & maintenance | Ongoing pipeline code | No pipeline to maintain |
| Total time to first prediction | 4-12 weeks | Minutes |
| Human expertise required | Senior data scientist | Any data analyst |

Highlighted: the total time drops from weeks to minutes, and the skill requirement drops from senior data scientist to any analyst who can write a predictive query.

KumoRFM: the world's first relational foundation model

At the core of Kumo is KumoRFM, the world's first foundation model purpose-built for relational data. Just as GPT was pre-trained on billions of words to understand language, KumoRFM was pre-trained on tens of thousands of heterogeneous relational datasets to understand the patterns that exist in structured, multi-table data.

How KumoRFM works

KumoRFM represents your database as a temporal heterogeneous graph. Each row in each table becomes a node. Each foreign key relationship becomes an edge. Timestamps are preserved as temporal attributes. A graph transformer then processes this structure, passing messages along edges to learn which cross-table, multi-hop patterns are predictive for any given task.
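A minimal illustration of that encoding in plain Python (the tables, IDs, and edge names are hypothetical, and this is not Kumo's internal representation): each row becomes a typed node, and each foreign key value becomes an edge to the referenced row:

```python
# Two toy tables linked by the foreign key customer_id;
# timestamps are kept as node attributes for temporal modeling.
customers = [{"customer_id": "C-1001"}, {"customer_id": "C-1002"}]
orders = [
    {"order_id": "O-1", "customer_id": "C-1001", "ts": "2024-01-05"},
    {"order_id": "O-2", "customer_id": "C-1001", "ts": "2024-02-10"},
]

nodes = {}   # node_id -> (node_type, row attributes)
edges = []   # (source node, target node, edge type)

for row in customers:
    nodes[("customer", row["customer_id"])] = ("customer", row)

for row in orders:
    node_id = ("order", row["order_id"])
    nodes[node_id] = ("order", row)
    # The foreign key becomes an edge from the order to its customer.
    edges.append((node_id, ("customer", row["customer_id"]), "placed_by"))

print(len(nodes), len(edges))  # 4 2
```

A graph transformer passing messages over this structure can reach, for example, all orders of a customer in one hop and all customers who bought the same product in two hops - the multi-hop patterns the text describes.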

Because it was pre-trained on thousands of diverse relational databases, KumoRFM has already learned the universal patterns that recur across structured data: recency effects, frequency dynamics, temporal decay, graph centrality signals, and multi-hop relationship patterns. At inference time, it applies these learned patterns to your database without any task-specific training.
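One of those universal patterns, temporal decay, can be sketched as an exponentially decaying weight on past events. The half-life formulation below is a common illustrative choice, not KumoRFM's actual parameterization:

```python
def decay_weight(days_ago: float, half_life_days: float = 30.0) -> float:
    """Exponential temporal decay: an event loses half its weight
    every half-life, so recent events dominate."""
    return 0.5 ** (days_ago / half_life_days)

# A purchase 30 days ago counts half as much as one today,
# and one 90 days ago counts one eighth as much.
print(decay_weight(0), decay_weight(30), decay_weight(90))
# 1.0 0.5 0.125
```

A pre-trained model that has seen this pattern across thousands of databases can apply it to a new one without being told that recency matters.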

Zero-shot performance

On the RelBench benchmark (7 databases, 30 tasks, 103 million rows), KumoRFM achieves remarkable zero-shot performance - meaning it generates predictions on databases it has never seen, with no task-specific training:

AUROC (Area Under the Receiver Operating Characteristic curve) measures how well a model distinguishes between positive and negative outcomes. An AUROC of 50 means random guessing, 100 means perfect prediction. Moving from 65 to 77 AUROC means the model correctly ranks a true positive above a true negative 77% of the time instead of 65%.
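That pairwise-ranking interpretation can be computed directly: AUROC equals the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as half. A small self-contained check (toy scores, not benchmark data):

```python
def auroc(scores, labels):
    """AUROC via its pairwise-ranking definition:
    P(score of a positive > score of a negative), ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))

# Perfect ranking of positives above negatives -> 1.0 (i.e. 100).
print(auroc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))  # 1.0
# One mis-ranked pair out of four -> 0.75 (i.e. 75).
print(auroc([0.9, 0.2, 0.3, 0.1], [1, 1, 0, 0]))  # 0.75
```

On this scale (reported as 0-100 in the table below), random guessing scores 50 and perfect ranking scores 100.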

| Approach | AUROC | Human effort | Time to prediction |
|---|---|---|---|
| PhD data scientists + LightGBM | 62.44 | 12.3 hours + 878 lines of code | Days to weeks |
| KumoRFM zero-shot | 76.71 | Zero | Seconds |
| KumoRFM fine-tuned | 81.14 | Minutes of compute | Minutes |

Highlighted: KumoRFM zero-shot beats PhD-level data scientists by 14+ AUROC points with zero human effort. Fine-tuning adds another 4+ points (10-30% relative improvement).

KumoRFM 2.0: relational and tabular data

KumoRFM 2.0 extends the foundation model to support both relational and tabular data. While the original KumoRFM was designed for multi-table relational databases, KumoRFM 2.0 also handles single-table tabular datasets - making it a universal foundation model for all structured data.

This means enterprises can use a single platform for every structured data prediction task, whether the data lives in a complex relational schema with dozens of interconnected tables or in a single flat table. KumoRFM 2.0 automatically adapts its approach based on the structure of the input data.

Who built Kumo

Kumo was founded by two researchers who helped create the field of graph machine learning:

  • Jure Leskovec - Stanford professor and one of the most cited researchers in graph machine learning. His research group at Stanford has produced foundational work on graph neural networks, network analysis, and relational learning. He has published 40+ peer-reviewed papers at top venues including NeurIPS, ICML, and KDD.
  • Matthias Fey - Creator of PyTorch Geometric, the most widely used library for graph neural networks with 21,000+ GitHub stars. PyG is the standard toolkit used by researchers and practitioners worldwide for building GNN-based systems.

The Kumo team includes researchers and engineers from Stanford, Google, and Facebook AI. The company's work is grounded in a deep research foundation: 40+ peer-reviewed papers at NeurIPS, ICML, and KDD, and the creation of both PyTorch Geometric and RelBench (the benchmark for relational deep learning).

Use cases

Kumo supports any prediction task that can be expressed on a relational database. Enterprise customers use it across industries for tasks that traditionally required months of data science work:

| Category | Example prediction | Typical accuracy gain |
|---|---|---|
| Churn prediction | Which customers will cancel in the next 30 days? | 20-40% improvement over manual ML |
| Fraud detection | Which transactions are fraudulent in real time? | 30-50% improvement in detection rate |
| Lead scoring | Which leads will convert to paying customers? | 25-35% improvement in conversion prediction |
| Recommendation systems | Which products should we recommend to each user? | 15-30% improvement in relevance |
| Demand forecasting | How much inventory do we need at each location? | 20-35% reduction in forecast error |
| Credit risk modeling | What is the default probability for each applicant? | 15-25% improvement in risk ranking |
| Anti-money laundering | Which transaction patterns indicate money laundering? | 30-50% reduction in false positives |
| Customer lifetime value | What is the expected revenue from each customer over 12 months? | 20-30% improvement in CLV accuracy |
Accuracy gains are based on customer results and RelBench benchmarks comparing Kumo predictions to traditional ML pipelines with manual feature engineering.

Integrations

Kumo connects directly to the data platforms enterprises already use. Data never leaves your environment - Kumo reads from your existing data warehouse and writes predictions back to it.

  • Snowflake - native Snowflake app. Install directly from the Snowflake Marketplace. Kumo runs inside your Snowflake environment, reading and writing data without any external data movement.
  • Databricks - lakehouse app. Integrates with Databricks Unity Catalog and Delta Lake. Predictions are written back as Delta tables.
  • Google BigQuery - direct connector. Reads from BigQuery datasets and writes predictions back as BigQuery tables.
  • AWS Athena - connector for querying data in S3 via Athena. Supports Parquet, ORC, and CSV formats.
  • Private cloud - deploy Kumo in your own VPC for full data isolation. Supports AWS, GCP, and Azure.

Customer results

Kumo is used by enterprise customers including Fortune 500 companies. Two publicly shared results illustrate the impact:

| Customer | Use case | Result | Comparison |
|---|---|---|---|
| DoorDash | Prediction accuracy across ML tasks | 30% accuracy improvement | Compared to their existing in-house ML pipeline |
| Reddit | Multiple prediction tasks | 4-5 years of projected work completed in 2 months | Compressed years of data science pipeline development into weeks |

Customer results are based on publicly shared case studies. Individual results vary based on data quality, database complexity, and prediction task.

The DoorDash result demonstrates the accuracy advantage of reading raw relational data versus manual feature engineering. The Reddit result demonstrates the time advantage: tasks that would have required years of data science effort - building pipelines, engineering features, training models for each use case - were completed in two months using Kumo.

Research foundation

Kumo is not a startup that applied existing technology to a new market. The company's founders created the underlying technology. The research foundation includes:

  • 40+ peer-reviewed papers at NeurIPS, ICML, KDD, and other top machine learning venues. These papers cover graph neural networks, temporal graph learning, relational deep learning, and foundation models for structured data.
  • PyTorch Geometric (PyG) - created by co-founder Matthias Fey. The most widely used graph neural network library with 21,000+ GitHub stars. PyG is the standard toolkit for GNN research and production systems worldwide.
  • RelBench - created by the Kumo research team. The first comprehensive benchmark for machine learning on relational databases, with 7 databases, 30 tasks, and 103 million rows. RelBench provides a standardized way to compare approaches to relational prediction tasks.

Frequently asked questions

What is Kumo AI used for?

Kumo AI is used for making predictions directly on enterprise relational databases. Common use cases include churn prediction, fraud detection, lead scoring, recommendation systems, demand forecasting, credit risk modeling, anti-money laundering, and customer lifetime value prediction. Instead of building manual ML pipelines, teams write a simple predictive query and get results in minutes.

What is a relational foundation model?

A relational foundation model is a large pre-trained model that understands the structure and patterns in relational databases. KumoRFM was pre-trained on tens of thousands of heterogeneous relational datasets, learning universal patterns like recency effects, frequency dynamics, and graph topology signals. It can make zero-shot predictions on new databases it has never seen before, without any task-specific training.

How is Kumo AI different from traditional ML platforms?

Traditional ML platforms require data scientists to manually join tables, engineer features, select models, and tune hyperparameters - a process that takes weeks per prediction task. Kumo eliminates this entire pipeline. You connect your database, write a predictive query in PQL, and get predictions. KumoRFM reads raw relational tables directly, discovering cross-table patterns that manual feature engineering misses.

Does Kumo AI require feature engineering?

No. Kumo AI eliminates feature engineering entirely. Traditional ML requires joining tables into flat feature tables, computing aggregations, and encoding variables - a process that takes 80% of data science time. KumoRFM reads raw relational tables directly and discovers predictive patterns across multiple tables, time windows, and relationship hops automatically.

What databases and platforms does Kumo integrate with?

Kumo integrates with Snowflake (as a native Snowflake app), Databricks (as a lakehouse app), Google BigQuery, and AWS Athena. Kumo also supports private cloud deployment. Data never leaves your environment - Kumo reads directly from your existing data warehouse.

Who founded Kumo AI?

Kumo AI was founded by Jure Leskovec, a Stanford professor and pioneer in graph machine learning, and Matthias Fey, the creator of PyTorch Geometric (21,000+ GitHub stars). The team includes researchers and engineers from Stanford, Google, and Facebook AI, with over 40 peer-reviewed papers at top venues like NeurIPS, ICML, and KDD.

See it in action

KumoRFM delivers predictions on relational data in seconds. No feature engineering, no ML pipelines. Try it free.