
Best Algorithm for Churn Prediction: A Ranked Guide

The honest answer: XGBoost wins on flat tables. But churn signals live across 4+ tables, and flattening them destroys the patterns that matter most. Here is how each algorithm performs, when to use it, and what it actually takes to break past the 70% accuracy ceiling.

TL;DR

  • Algorithms ranked by churn prediction capability: (1) Logistic Regression - baseline, (2) Random Forest - solid, (3) XGBoost/LightGBM - best on flat tables, (4) Deep Learning - marginal gains on flat data, (5) Graph Neural Networks / KumoRFM - best when data spans multiple tables.
  • XGBoost is the right choice IF your data fits in one table. Most real churn data does not. Customer churn signals come from 4+ hops across customer, order, product, support, and usage tables.
  • On the SAP SALT enterprise benchmark, KumoRFM scores 91% accuracy vs 75% for PhD data scientists with XGBoost and 63% for LLM+AutoML. On RelBench, KumoRFM zero-shot achieves 76.71 AUROC vs 62.44 for LightGBM with manual features.
  • Most churn models plateau at 65-70% because they flatten relational data into aggregate columns. Reducing a customer's behavior to order_count=23 and avg_value=$142 is like reducing an org chart to a headcount - you lose the structure that predicts what happens next.
  • KumoRFM replaces the entire churn modeling pipeline (SQL joins, feature engineering, model selection) with a one-line PQL query against your raw relational tables.

7 churn prediction algorithms compared

Here is every algorithm worth considering for churn prediction, from simplest to most capable. Each one has a sweet spot.

1. Logistic Regression

The simplest supervised classifier and a solid starting point. Logistic regression fits a linear decision boundary between churners and non-churners. It is fast to train, easy to explain to business teams, and works well as a baseline to measure other models against.

  • Best for: establishing a baseline, regulated environments where every coefficient must be explainable, very small datasets where complex models overfit.
  • Watch out for: cannot capture non-linear relationships or feature interactions. Accuracy typically caps at 55-65% on real churn data.
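
As a concrete starting point, here is a minimal baseline sketch with scikit-learn on synthetic data. The feature meanings in the comments are illustrative assumptions, not a real schema:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic flat table; pretend the columns are tenure, ticket count, order rate
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 3))
# Toy churn label loosely driven by low tenure and high ticket count
y = (X[:, 1] - X[:, 0] + rng.normal(scale=0.5, size=1000) > 0.5).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression().fit(X_train, y_train)

auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"baseline AUROC: {auc:.2f}")
```

The value of this model is less its accuracy than its coefficients: each one is a direct, explainable effect size you can show a business team, and a yardstick for every model that follows.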

2. Decision Trees

A single decision tree splits customers into groups based on feature thresholds (e.g., "if tenure < 6 months AND support_tickets > 2, predict churn"). Easy to visualize and explain. Marketing teams often use it to derive segment-based churn rules.

  • Best for: interpretability (you can draw the tree), small datasets, business rule extraction, teams without ML expertise.
  • Watch out for: single trees overfit easily and have lower accuracy (58-64%) than ensemble methods. Rarely used in production without bagging or boosting.
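
Rule extraction is the main reason to reach for a single tree. Here is a sketch using scikit-learn's `export_text` on a toy dataset whose churn rule is known by construction (feature names are illustrative):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
# Toy features: tenure_months (1-47) and support_tickets (0-5)
X = np.column_stack([rng.integers(1, 48, 500), rng.integers(0, 6, 500)])
# Churn rule baked into the labels: short tenure AND many tickets
y = ((X[:, 0] < 6) & (X[:, 1] > 2)).astype(int)

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
# Prints human-readable if/else thresholds you can hand to a marketing team
print(export_text(tree, feature_names=["tenure_months", "support_tickets"]))
```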

3. Random Forest

An ensemble of hundreds of decision trees, each trained on a random subset of data and features. The ensemble averages out the overfitting problem of single trees and handles non-linear relationships automatically.

  • Best for: quick iteration with minimal tuning, solid accuracy (60-68%) without much feature engineering, robust to noisy data and outliers.
  • Watch out for: slower to train than XGBoost on large datasets. Does not capture sequential or temporal patterns. Accuracy ceiling is lower than gradient boosting.
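
A quick sketch of the low-tuning workflow, including the impurity-based feature importances teams often use to sanity-check a forest. The data is synthetic and the feature names are illustrative assumptions:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
# Toy flat table: [recency_days, order_count, ticket_count]
X = rng.normal(size=(1000, 3))
# Labels built so recency carries most of the signal
y = (X[:, 0] + 0.1 * X[:, 2] + rng.normal(scale=0.3, size=1000) > 0).astype(int)

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Importances sum to 1; recency should dominate in this toy setup
for name, imp in zip(["recency_days", "order_count", "ticket_count"],
                     forest.feature_importances_):
    print(f"{name}: {imp:.2f}")
```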

4. XGBoost / LightGBM / CatBoost

Gradient boosted trees are the gold standard for flat tabular data. They sequentially build trees that correct the errors of previous trees, producing highly accurate models with good feature engineering. XGBoost has won more Kaggle competitions than any other algorithm.

  • Best for: maximum accuracy on a single flat table (65-75% on churn data), production deployment (fast inference, small model files), teams with feature engineering expertise.
  • Watch out for: requires significant feature engineering to reach peak accuracy. Performance depends heavily on the quality of hand-crafted features. Needs hyperparameter tuning (learning rate, depth, regularization) to avoid overfitting.

5. Support Vector Machines (SVMs)

SVMs find the optimal separating boundary between churners and non-churners. They work well in high-dimensional spaces and can capture non-linear patterns with kernel functions.

  • Best for: small-to-medium datasets with many features, problems where the decision boundary is clear, academic benchmarks.
  • Watch out for: scales poorly to large datasets (training time grows quadratically). Difficult to tune kernel parameters. Accuracy (60-68%) is comparable to random forest but with more effort. Largely superseded by XGBoost in practice.

6. Deep Learning (MLPs, LSTMs, Transformers)

Neural networks can capture complex non-linear patterns. LSTMs are useful for sequential event data (login sequences, purchase timelines). On flat churn tables, they rarely beat XGBoost by more than 1-2 points.

  • Best for: large datasets (100K+ rows), sequential event data where order matters, teams with deep learning infrastructure already in place.
  • Watch out for: requires more data, more tuning, and more compute than XGBoost. Accuracy on flat tables: 66-76%. Not worth the added complexity unless you have sequential event streams or very large data.

7. Graph Neural Networks / KumoRFM

GNNs operate on the relational structure of your data, not a flattened version of it. They read patterns across customers, orders, products, support tickets, and usage logs without requiring manual joins or feature engineering. KumoRFM is a foundation model built on GNN architecture, pre-trained on thousands of relational databases.

  • Best for: data spanning multiple tables (which is most real-world churn data), maximum accuracy (76-91% on enterprise benchmarks), teams that want to skip feature engineering entirely.
  • Watch out for: requires relational data with foreign keys between tables. If your data truly fits in one flat table with no cross-table signals, XGBoost is simpler and competitive. KumoRFM is enterprise SaaS (not open-source), though the free tier at kumorfm.ai covers basic use cases.

Churn algorithm comparison

| Algorithm | Typical accuracy | Data requirement | Tuning effort | Best for |
|---|---|---|---|---|
| Logistic Regression | 55-65% | Single flat table | Minimal | Baseline, compliance, interpretability |
| Decision Trees | 58-64% | Single flat table | Minimal | Visualization, business rules, small data |
| Random Forest | 60-68% | Single flat table | Low | Quick iteration, robust to noise |
| SVM | 60-68% | Single flat table | High (kernels) | Small data with many features |
| XGBoost / LightGBM | 65-75% | Flat table + engineered features | Medium-High | Best flat-table accuracy |
| Deep Learning | 66-76% | Large flat table or event sequences | High | Sequential data, very large datasets |
| GNN / KumoRFM | 76-91% | Multiple relational tables | None (zero-shot) | Multi-table data, maximum accuracy |

7 algorithms compared. Accuracy ranges based on published benchmarks and practitioner experience. Ranges for individual algorithms are typical estimates; GNN/KumoRFM range is from published RelBench (76.71) and SAP SALT (91%) results. All other models operate on a single flat table.

The honest take: when XGBoost is enough

XGBoost and LightGBM are excellent algorithms. If these two conditions are true, use them and stop reading:

  1. Your churn data fits in a single, well-engineered table with all the features you need already computed.
  2. You have a data scientist who can build strong features - recency, frequency, monetary value, engagement trends, support interactions - and maintain the pipeline over time.

On clean, flat, well-featured data, XGBoost is hard to beat. The problem is that most real-world churn data does not meet those conditions.

Why churn signals live across multiple tables

A customer's churn risk is not a property of the customer row alone. It depends on what they bought, how they used it, what support interactions they had, whether similar customers churned, and how those patterns changed over time. That information lives across 4-6 tables in a typical enterprise database:

  • Customers - demographics, segment, tenure, plan tier
  • Orders / Transactions - purchase history, frequency, recency, value
  • Products / SKUs - what they bought, category, margin, return rate
  • Support tickets - issue types, resolution time, escalations, sentiment
  • Usage / Engagement logs - login frequency, feature adoption, session depth
  • Payments - failed charges, downgrades, billing disputes

To train an XGBoost model, someone has to join all of these tables, compute aggregate features, and flatten everything into one row per customer. That flattening step destroys information.
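
The join-and-aggregate step looks like this in pandas. The tables and column names are toy examples; the point is that the groupby collapses a time series of orders into two numbers per customer:

```python
import pandas as pd

# Toy relational tables (illustrative schema, two customers)
customers = pd.DataFrame({"customer_id": [1, 2], "segment": ["smb", "enterprise"]})
orders = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2],
    "order_value": [120, 80, 95, 300, 280],
    "order_date": pd.to_datetime(
        ["2024-01-05", "2024-02-10", "2024-03-01", "2024-01-20", "2024-03-15"]
    ),
})

# The flattening step: this is where the trajectory disappears
features = orders.groupby("customer_id").agg(
    order_count=("order_value", "size"),
    avg_value=("order_value", "mean"),
).reset_index()

flat = customers.merge(features, on="customer_id", how="left")
print(flat)
```

After the merge, customer 1 is just `order_count=3, avg_value=98.33` - the dates, the ordering, and the gap since the last purchase are gone unless someone hand-engineers a column for each.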

What flattening destroys (churn example)

| Signal | Flat table sees | Relational model sees |
|---|---|---|
| Purchase pattern | order_count=23, avg_value=$142 | 5 orders last month, 0 this month - declining trajectory |
| Product mix shift | num_categories=3 | Shifted from premium to budget products over 8 weeks |
| Support escalation | tickets=3, avg_resolve=4.2d | 3 tickets in 2 weeks, last one escalated, still open |
| Peer churn pattern | Not visible | 4 of 7 customers on same plan with same rep churned last quarter |
| Cross-product engagement | active=true | Stopped using 2 of 3 product modules, only logs in for billing |
| Account health trajectory | Static snapshot only | NPS dropped 40 points over 3 surveys while usage fell 60% |

Each row shows a real churn signal. The flat table captures a number. The relational model captures the pattern, sequence, and cross-entity context that actually predicts whether this customer will leave.

How to build a churn prediction model in 5 steps

If you are building a traditional churn model, here is the standard approach. This is the right way to do it - and understanding why it plateaus will clarify when you need something different.

  1. Collect and join your data. Write SQL to join your customer table with orders, support, usage, and payments. Apply temporal filters so you do not leak future data into training features. For 5 tables with point-in-time correctness, expect 100-200 lines of SQL.
  2. Engineer features. Compute RFM (recency, frequency, monetary) features, engagement metrics, support interaction counts, trend features (change in order frequency over 30/60/90 days), and any domain-specific signals. This typically produces 50-200 features and takes 40-60% of total project time.
  3. Split with temporal awareness. Use a time-based split, not a random split. Train on data before a cutoff date, validate on the period after. Random splits cause data leakage that inflates accuracy by 5-15 points and produces models that fail in production.
  4. Train XGBoost or LightGBM. Use cross-validation to tune hyperparameters (learning rate, max depth, min child weight, subsample ratio). Evaluate with precision-recall curves, not just accuracy - because churn datasets are imbalanced and accuracy is misleading when 95% of customers do not churn.
  5. Deploy and maintain. Set up a feature pipeline that recomputes features on a schedule, retrain the model monthly or quarterly, monitor for data drift, and maintain the SQL joins as source schemas change. Budget 20-30% of initial build time for ongoing maintenance per quarter.
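
Step 3 is where pipelines most often go silently wrong, so here is a minimal sketch of a time-based split in pandas (toy snapshot table; the cutoff date is arbitrary):

```python
import pandas as pd

# Toy labeled table: one snapshot per customer with a churn label
df = pd.DataFrame({
    "customer_id": range(8),
    "snapshot_date": pd.to_datetime([
        "2024-01-31", "2024-02-29", "2024-03-31", "2024-04-30",
        "2024-05-31", "2024-06-30", "2024-07-31", "2024-08-31",
    ]),
    "churned": [0, 1, 0, 0, 1, 0, 1, 0],
})

cutoff = pd.Timestamp("2024-06-30")
train = df[df["snapshot_date"] <= cutoff]   # fit on the past
test = df[df["snapshot_date"] > cutoff]     # evaluate on the future

# Invariant a random split would violate: no training row postdates any test row
assert train["snapshot_date"].max() <= test["snapshot_date"].min()
print(len(train), len(test))
```

A random split would scatter July and August rows into training, letting the model peek at the future and inflating offline accuracy in exactly the way the text describes.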

Traditional churn model pipeline

  • Write SQL to join 4-6 tables (100-200 lines)
  • Engineer 50-200 features manually (40-60% of project time)
  • Handle temporal leakage and point-in-time correctness
  • Train XGBoost, tune hyperparameters via cross-validation
  • Deploy feature pipeline + model retraining on schedule
  • Maintain SQL joins when source schemas change
  • Total timeline: 3-8 weeks to first production model

KumoRFM approach

  • Connect Kumo to your data warehouse (one-time setup)
  • Write one PQL query defining the prediction target
  • KumoRFM reads raw relational tables directly
  • No feature engineering, no joins, no model selection
  • Predictions in ~1 second (zero-shot) or minutes (fine-tuned)
  • No feature pipeline to maintain
  • Total timeline: minutes to first prediction

Why most churn models plateau at 65-70% accuracy

If you follow the 5 steps above with XGBoost, you will likely land between 65% and 70% accuracy. That is not a failure of execution - it is a ceiling imposed by the data representation.

The flat table cannot contain:

  • Multi-hop relationships. A customer's churn risk depends on what similar customers did. "Similar" means customers who bought the same products, used the same features, or share an account manager. That is a 3-hop pattern: customer -> orders -> products -> other customers' outcomes. No amount of feature engineering on a flat table captures this.
  • Temporal sequences across tables. A customer whose support tickets are increasing while their order frequency is decreasing and their usage of premium features has stopped - that three-table temporal pattern is a strong churn signal. A flat table collapses each into a single number.
  • Network effects. When a key user at a B2B account leaves, the other users on that account often follow. When a product line gets negative reviews from multiple customers, churn accelerates across that cohort. These are graph patterns that exist in the relationships between entities, not in any single row.

How KumoRFM reaches 91% accuracy on enterprise churn data

KumoRFM is a relational foundation model - pre-trained on thousands of relational databases to understand the patterns that exist across connected tables. When it predicts churn, it does three things that flat-table algorithms cannot:

  1. Reads raw relational tables directly. No joins, no flattening, no feature engineering. It takes your customers, orders, products, support, and usage tables as-is and constructs a graph that preserves every relationship and temporal sequence.
  2. Discovers multi-hop patterns automatically. It finds signals like "customers who bought products in this category and then contacted support about shipping issues churned at 3x the base rate." These patterns span 3-4 tables and would require a data scientist to hypothesize and manually encode them as features.
  3. Transfers knowledge from pre-training. Because KumoRFM was pre-trained on thousands of relational databases, it already understands common patterns (declining engagement predicts churn, support escalations predict churn, peer behavior predicts churn). It applies that knowledge zero-shot to your data, which is why it scores 91% on the SAP SALT benchmark without any task-specific training.

SAP SALT churn benchmark

| Approach | Accuracy | Feature engineering time | Lines of code |
|---|---|---|---|
| LLM + AutoML | 63% | Hours (LLM-generated) | LLM-generated |
| PhD Data Scientist + XGBoost | 75% | Weeks | 878+ lines |
| KumoRFM (zero-shot) | 91% | 0 | 0 |

SAP SALT benchmark on enterprise data. KumoRFM outperforms expert data scientists with hand-tuned XGBoost by 16 percentage points - with no feature engineering and no training time.

RelBench benchmark (churn-relevant tasks)

| Approach | AUROC | Feature engineering | Code |
|---|---|---|---|
| LightGBM + manual features | 62.44 | 12.3 hours/task | 878 lines |
| AutoML + manual features | ~64-66 | Reduced hours/task | 878 lines |
| KumoRFM zero-shot | 76.71 | ~1 second | 0 lines |
| KumoRFM fine-tuned | 81.14 | Minutes | 0 lines |

RelBench benchmark (7 databases, 30 tasks, 103M rows). The 14+ AUROC point gap between LightGBM and KumoRFM zero-shot comes entirely from cross-table patterns the flat table never contains.

Predicting churn with PQL: one query, no pipeline

PQL (Predictive Query Language) is how you tell KumoRFM what to predict. Instead of building a feature pipeline, writing joins, and training a model, you write a query that looks like SQL but defines a prediction target:

PQL Query

```
PREDICT churn_90d
FOR EACH customers.customer_id
WHERE customers.segment = 'enterprise'
AND customers.tenure_months > 3
```

This single query replaces the entire traditional pipeline: the SQL joins across 4-6 tables, the feature engineering code, the model selection, and the training loop. KumoRFM reads raw customers, orders, products, support_tickets, and usage_logs tables to generate predictions.

Output

| customer_id | churn_probability | confidence | top_signal |
|---|---|---|---|
| C-9201 | 0.89 | high | Support escalation + declining order frequency |
| C-9202 | 0.14 | high | Stable multi-product engagement |
| C-9203 | 0.72 | medium | Peer accounts on same plan churning |
| C-9204 | 0.05 | high | Expanding usage across 3 product lines |
| C-9205 | 0.91 | high | Payment failure + zero logins in 30 days |

When to use each algorithm: a decision framework

Do not pick an algorithm based on what performed best in someone else's blog post. Pick it based on your data and your team:

  • Use Logistic Regression when you need a fast baseline, full interpretability, or regulatory compliance that requires transparent coefficients. It is also the right sanity check before trying anything more complex.
  • Use Random Forest when you want a quick improvement over logistic regression without extensive hyperparameter tuning. Good for prototyping and for teams without deep ML experience.
  • Use XGBoost / LightGBM when your data is in a single flat table with well-engineered features and you want maximum accuracy on that table. This is the workhorse of production churn models on flat data.
  • Use Deep Learning (MLP/LSTM) only if you have sequential event data (clickstreams, session logs) and enough volume to justify the complexity. On flat churn tables, it rarely beats XGBoost.
  • Use KumoRFM when your data spans multiple relational tables, when you do not have weeks for feature engineering, or when you need to break past the 70% accuracy ceiling. It is the only approach that reads raw relational structure without manual joins.

Frequently asked questions

Which algorithm should I use to predict customer churn?

It depends on your data structure. If your customer data fits in a single flat table, XGBoost or LightGBM will give you the best accuracy-to-effort ratio - they consistently outperform logistic regression and random forest on tabular churn data. But if your data spans multiple tables (customers, orders, products, support tickets, usage logs), a graph neural network approach like KumoRFM will significantly outperform XGBoost because it reads cross-table patterns that flat-table models never see. On the SAP SALT benchmark, KumoRFM scored 91% vs 75% for expert-tuned XGBoost.

How do I build a churn prediction model on our customer data?

The traditional approach has 5 steps: (1) collect and join your customer data into a single table, (2) engineer features like recency, frequency, and monetary value, (3) split into train/test sets with temporal awareness, (4) train an XGBoost or LightGBM model, (5) evaluate with precision-recall and deploy. This works but plateaus around 65-70% accuracy because flattening your data loses cross-table signals. With KumoRFM, you skip steps 1-4 entirely: connect your data warehouse, write a one-line PQL query, and get predictions in seconds against your raw relational tables.

Why does my churn model plateau at 65-70% accuracy?

Most churn models plateau because they are trained on a single flat table that compresses rich customer behavior into aggregate numbers. A column like order_count=23 or avg_support_time=4.2min throws away the sequence, timing, and cross-entity relationships that actually predict churn. The customer who placed 5 orders last month but zero this month looks the same as the customer who places 2 orders every month - both have order_count=23 over a year. Breaking past 70% requires features from multiple hops across your relational data, which flat-table algorithms cannot access.
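
To make the ambiguity concrete, here is a toy example of two customers whose annual totals are identical while their trajectories are opposites:

```python
import pandas as pd

# Monthly order counts for two toy customers over one year
monthly_orders = pd.DataFrame({
    "steady":    [2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1],   # 23 orders, flat
    "declining": [5, 5, 4, 3, 2, 2, 1, 1, 0, 0, 0, 0],   # 23 orders, collapsing
})

totals = monthly_orders.sum()          # the flat-table feature: order_count=23
recent = monthly_orders.tail(3).sum()  # the last-quarter signal the count hides

print(totals.to_dict())   # both customers show 23
print(recent.to_dict())   # steady: 5, declining: 0
```

A model that only sees `order_count=23` scores these two customers identically; the churn signal lives entirely in the shape of the series.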

Is XGBoost good enough for churn prediction?

XGBoost is the right choice if two conditions are true: your data fits in one well-engineered table, and you have a data scientist who can build strong features. On flat tabular data, XGBoost and LightGBM are hard to beat. But for most real-world churn problems, the data that predicts churn lives across 4-6 tables, and the feature engineering to flatten it takes weeks. If you are spending more time on feature engineering than on modeling, the bottleneck is not your algorithm - it is your data representation.

How much data do I need to build a churn prediction model?

For a traditional XGBoost model, you typically need at least 5,000-10,000 customer records with labeled churn outcomes and 6-12 months of behavioral history to capture seasonal patterns. For KumoRFM, the data requirements are lower because the foundation model transfers knowledge from pre-training on thousands of relational datasets. It can produce useful zero-shot predictions even on smaller datasets, though accuracy improves with more data and more connected tables.

What features matter most for churn prediction?

The most predictive churn features fall into four categories: (1) engagement decline - dropping login frequency, fewer orders, shorter sessions, (2) support friction - increasing ticket volume, escalations, unresolved issues, (3) payment signals - failed charges, downgrade requests, billing disputes, and (4) peer behavior - when customers who bought similar products or share an account manager also churn. Categories 1-3 can be captured in a flat table with effort. Category 4 requires multi-table graph patterns that only relational models can discover automatically.

How is KumoRFM different from building a churn model in Python?

A Python churn model (scikit-learn, XGBoost, etc.) requires you to write SQL joins, compute features, handle temporal leakage, train the model, and maintain the pipeline. This takes 2-6 weeks per model and 878 lines of feature engineering code on average, according to the Stanford RelBench study. KumoRFM replaces that entire pipeline with a one-line PQL query. It reads your raw relational tables, discovers cross-table features automatically, and returns predictions in seconds - with 91% accuracy on enterprise benchmarks vs 75% for expert-built XGBoost models.

Can I use deep learning for churn prediction?

Standard deep learning (feed-forward networks, LSTMs) on flat churn tables rarely outperforms XGBoost. Neural networks need large datasets to shine, and most churn datasets are modest in size. The exception is graph neural networks, which operate on the relational structure of your data rather than a flat table. GNNs can capture multi-hop patterns (customer -> orders -> products -> other customers' behavior) that no flat-table algorithm can see. KumoRFM is a foundation model built on graph neural network architecture, which is why it reaches 91% on enterprise churn benchmarks.

See it in action

KumoRFM delivers predictions on relational data in seconds. No feature engineering, no ML pipelines. Try it free.