7 churn prediction algorithms compared
Here is every algorithm worth considering for churn prediction, from simplest to most capable. Each one has a sweet spot.
1. Logistic Regression
The simplest supervised classifier and a solid starting point. Logistic regression fits a linear decision boundary between churners and non-churners. It is fast to train, easy to explain to business teams, and works well as a baseline to measure other models against.
- Best for: establishing a baseline, regulated environments where every coefficient must be explainable, very small datasets where complex models overfit.
- Watch out for: cannot capture non-linear relationships or feature interactions. Accuracy typically caps at 55-65% on real churn data.
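A minimal baseline sketch with scikit-learn, on synthetic data standing in for a flat churn table (the dataset and the class imbalance are illustrative, not from the benchmarks above):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a flat churn table; ~10% positives mimics churn imbalance
X, y = make_classification(n_samples=5000, n_features=20, weights=[0.9],
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

# class_weight='balanced' compensates for the churn/no-churn imbalance
model = LogisticRegression(class_weight="balanced", max_iter=1000)
model.fit(X_train, y_train)

auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
coefs = model.coef_[0]  # one explainable coefficient per feature
```

The per-feature coefficients are what make this the right choice for regulated environments: each one maps directly to a feature's contribution to the churn score.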
2. Decision Trees
A single decision tree splits customers into groups based on feature thresholds (e.g., "if tenure < 6 months AND support_tickets > 2, predict churn"). Easy to visualize and explain. Often used by marketing teams for segment-based churn rules.
- Best for: interpretability (you can draw the tree), small datasets, business rule extraction, teams without ML expertise.
- Watch out for: single trees overfit easily and have lower accuracy (58-64%) than ensemble methods. Rarely used in production without bagging or boosting.
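The rule-extraction use case can be sketched with scikit-learn's `export_text`, which dumps a fitted tree as if/else rules (the feature names below are invented for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=2000, n_features=5, n_informative=3,
                           random_state=0)
# Hypothetical column names for a churn table
feature_names = ["tenure_months", "support_tickets", "monthly_spend",
                 "logins_30d", "plan_tier"]

# max_depth caps tree size: deeper trees overfit, shallow ones read as rules
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Human-readable threshold rules a marketing team can act on
rules = export_text(tree, feature_names=feature_names)
print(rules)
```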
3. Random Forest
An ensemble of hundreds of decision trees, each trained on a random subset of data and features. The ensemble averages out the overfitting problem of single trees and handles non-linear relationships automatically.
- Best for: quick iteration with minimal tuning, solid accuracy (60-68%) without much feature engineering, robust to noisy data and outliers.
- Watch out for: slower to train than XGBoost on large datasets. Does not capture sequential or temporal patterns. Accuracy ceiling is lower than gradient boosting.
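The "minimal tuning" point in practice: defaults plus a tree count are usually enough for a first pass. A sketch on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=3000, n_features=15, n_informative=6,
                           weights=[0.85], random_state=1)

# Defaults are a reasonable starting point; n_jobs=-1 parallelizes across cores
rf = RandomForestClassifier(n_estimators=300, n_jobs=-1, random_state=1)
scores = cross_val_score(rf, X, y, cv=5, scoring="roc_auc")

rf.fit(X, y)
importances = rf.feature_importances_  # rough signal of which features matter
```

Impurity-based importances are a quick triage tool, though they can be biased toward high-cardinality features; treat them as a starting point, not ground truth.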
4. XGBoost / LightGBM / CatBoost
Gradient boosted trees are the gold standard for flat tabular data. They sequentially build trees that correct the errors of previous trees, producing highly accurate models when paired with good feature engineering. XGBoost has a long track record of winning Kaggle competitions on tabular data.
- Best for: maximum accuracy on a single flat table (65-75% on churn data), production deployment (fast inference, small model files), teams with feature engineering expertise.
- Watch out for: requires significant feature engineering to reach peak accuracy. Performance depends heavily on the quality of hand-crafted features. Needs hyperparameter tuning (learning rate, depth, regularization) to avoid overfitting.
5. Support Vector Machines (SVMs)
SVMs find the optimal separating boundary between churners and non-churners. They work well in high-dimensional spaces and can capture non-linear patterns with kernel functions.
- Best for: small-to-medium datasets with many features, problems where the decision boundary is clear, academic benchmarks.
- Watch out for: scales poorly to large datasets (kernel SVM training time grows roughly quadratically to cubically with sample count). Difficult to tune kernel parameters. Accuracy (60-68%) is comparable to random forest but with more effort. Largely superseded by XGBoost in practice.
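If you do try an SVM, two details matter: scale the features (kernel SVMs are scale-sensitive) and start with the RBF kernel. A minimal sketch:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=1500, n_features=25, n_informative=8,
                           random_state=3)

# Standardize inside a pipeline so scaling is fit only on training folds
svm = make_pipeline(StandardScaler(),
                    SVC(kernel="rbf", C=1.0, gamma="scale"))

# roc_auc scoring uses SVC's decision_function, so probability=True is not needed
scores = cross_val_score(svm, X, y, cv=3, scoring="roc_auc")
```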
6. Deep Learning (MLPs, LSTMs, Transformers)
Neural networks can capture complex non-linear patterns. LSTMs are useful for sequential event data (login sequences, purchase timelines). On flat churn tables, they rarely beat XGBoost by more than 1-2 points.
- Best for: large datasets (100K+ rows), sequential event data where order matters, teams with deep learning infrastructure already in place.
- Watch out for: requires more data, more tuning, and more compute than XGBoost. Accuracy on flat tables: 66-76%. Not worth the added complexity unless you have sequential event streams or very large data.
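For flat tables specifically, the MLP case can be sketched without a deep learning framework, using scikit-learn's `MLPClassifier` as a small stand-in (an LSTM or Transformer on event sequences would need PyTorch or similar):

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=8000, n_features=40, n_informative=12,
                           random_state=5)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=5)

# Scaling matters for neural nets; early_stopping carves out a validation split
mlp = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(64, 32), early_stopping=True,
                  max_iter=200, random_state=5))
mlp.fit(X_train, y_train)

auc = roc_auc_score(y_test, mlp.predict_proba(X_test)[:, 1])
```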
7. Graph Neural Networks / KumoRFM
GNNs operate on the relational structure of your data, not a flattened version of it. They read patterns across customers, orders, products, support tickets, and usage logs without requiring manual joins or feature engineering. KumoRFM is a foundation model built on GNN architecture, pre-trained on thousands of relational databases.
- Best for: data spanning multiple tables (which is most real-world churn data), maximum accuracy (76-91% on enterprise benchmarks), teams that want to skip feature engineering entirely.
- Watch out for: requires relational data with foreign keys between tables. If your data truly fits in one flat table with no cross-table signals, XGBoost is simpler and competitive. KumoRFM is enterprise SaaS (not open-source), though the free tier at kumorfm.ai covers basic use cases.
churn_algorithm_comparison
| algorithm | typical_accuracy | data_requirement | tuning_effort | best_for |
|---|---|---|---|---|
| Logistic Regression | 55-65% | Single flat table | Minimal | Baseline, compliance, interpretability |
| Decision Trees | 58-64% | Single flat table | Minimal | Visualization, business rules, small data |
| Random Forest | 60-68% | Single flat table | Low | Quick iteration, robust to noise |
| SVM | 60-68% | Single flat table | High (kernels) | Small data with many features |
| XGBoost / LightGBM | 65-75% | Flat table + engineered features | Medium-High | Best flat-table accuracy |
| Deep Learning | 66-76% | Large flat table or event sequences | High | Sequential data, very large datasets |
| GNN / KumoRFM | 76-91% | Multiple relational tables | None (zero-shot) | Multi-table data, maximum accuracy |
7 algorithms compared. Accuracy ranges based on published benchmarks and practitioner experience. Ranges for individual algorithms are typical estimates; the GNN/KumoRFM range spans published RelBench (76.71 AUROC) and SAP SALT (91% accuracy) results. All other models operate on a single flat table.
The honest take: when XGBoost is enough
XGBoost and LightGBM are excellent algorithms. If these two conditions are true, use them and stop reading:
- Your churn data fits in a single, well-engineered table with all the features you need already computed.
- You have a data scientist who can build strong features - recency, frequency, monetary value, engagement trends, support interactions - and maintain the pipeline over time.
On clean, flat, well-featured data, XGBoost is hard to beat. The problem is that most real-world churn data does not meet those conditions.
Why churn signals live across multiple tables
A customer's churn risk is not a property of the customer row alone. It depends on what they bought, how they used it, what support interactions they had, whether similar customers churned, and how those patterns changed over time. That information lives across 4-6 tables in a typical enterprise database:
- Customers - demographics, segment, tenure, plan tier
- Orders / Transactions - purchase history, frequency, recency, value
- Products / SKUs - what they bought, category, margin, return rate
- Support tickets - issue types, resolution time, escalations, sentiment
- Usage / Engagement logs - login frequency, feature adoption, session depth
- Payments - failed charges, downgrades, billing disputes
To train an XGBoost model, someone has to join all of these tables, compute aggregate features, and flatten everything into one row per customer. That flattening step destroys information.
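A toy illustration of that information loss, in pure Python with made-up numbers: two customers with opposite trajectories become indistinguishable after aggregation.

```python
from statistics import mean

# Monthly order counts, oldest to newest (invented data for illustration)
orders = {
    "customer_A": [0, 1, 2, 4, 6, 10],   # accelerating - healthy
    "customer_B": [10, 6, 4, 2, 1, 0],   # decelerating - churn risk
}

# The aggregate features a typical flattening step computes
flat = {c: {"order_count": sum(m), "avg_monthly": mean(m)}
        for c, m in orders.items()}

# Both customers look identical in the flat table...
assert flat["customer_A"] == flat["customer_B"]

# ...but the trajectory (newest month minus oldest) tells them apart
trend = {c: m[-1] - m[0] for c, m in orders.items()}
```

You can claw some of this back by hand-engineering trend features, which is exactly the 50-200 features of work described in the next section; a model that reads the raw order rows gets the sequence for free.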
what_flattening_destroys (churn example)
| signal | flat_table_sees | relational_model_sees |
|---|---|---|
| Purchase pattern | order_count=23, avg_value=$142 | 5 orders last month, 0 this month - declining trajectory |
| Product mix shift | num_categories=3 | Shifted from premium to budget products over 8 weeks |
| Support escalation | tickets=3, avg_resolve=4.2d | 3 tickets in 2 weeks, last one escalated, still open |
| Peer churn pattern | Not visible | 4 of 7 customers on same plan with same rep churned last quarter |
| Cross-product engagement | active=true | Stopped using 2 of 3 product modules, only logs in for billing |
| Account health trajectory | Static snapshot only | NPS dropped 40 points over 3 surveys while usage fell 60% |
Each row shows a real churn signal. The flat table captures a number. The relational model captures the pattern, sequence, and cross-entity context that actually predicts whether this customer will leave.
How to build a churn prediction model in 5 steps
If you are building a traditional churn model, here is the standard approach. This is the right way to do it - and understanding why it plateaus will clarify when you need something different.
1. Collect and join your data. Write SQL to join your customer table with orders, support, usage, and payments. Apply temporal filters so you do not leak future data into training features. For 5 tables with point-in-time correctness, expect 100-200 lines of SQL.
2. Engineer features. Compute RFM (recency, frequency, monetary) features, engagement metrics, support interaction counts, trend features (change in order frequency over 30/60/90 days), and any domain-specific signals. This typically produces 50-200 features and takes 40-60% of total project time.
3. Split with temporal awareness. Use a time-based split, not a random split. Train on data before a cutoff date, validate on the period after. Random splits cause data leakage that inflates accuracy by 5-15 points and produces models that fail in production.
4. Train XGBoost or LightGBM. Use cross-validation to tune hyperparameters (learning rate, max depth, min child weight, subsample ratio). Evaluate with precision-recall curves, not just accuracy - because churn datasets are imbalanced and accuracy is misleading when 95% of customers do not churn.
5. Deploy and maintain. Set up a feature pipeline that recomputes features on a schedule, retrain the model monthly or quarterly, monitor for data drift, and maintain the SQL joins as source schemas change. Budget 20-30% of initial build time for ongoing maintenance per quarter.
Traditional churn model pipeline
- Write SQL to join 4-6 tables (100-200 lines)
- Engineer 50-200 features manually (40-60% of project time)
- Handle temporal leakage and point-in-time correctness
- Train XGBoost, tune hyperparameters via cross-validation
- Deploy feature pipeline + model retraining on schedule
- Maintain SQL joins when source schemas change
- Total timeline: 3-8 weeks to first production model
KumoRFM approach
- Connect Kumo to your data warehouse (one-time setup)
- Write one PQL query defining the prediction target
- KumoRFM reads raw relational tables directly
- No feature engineering, no joins, no model selection
- Predictions in ~1 second (zero-shot) or minutes (fine-tuned)
- No feature pipeline to maintain
- Total timeline: minutes to first prediction
Why most churn models plateau at 65-70% accuracy
If you follow the 5 steps above with XGBoost, you will likely land between 65% and 70% accuracy. That is not a failure of execution - it is a ceiling imposed by the data representation.
The flat table cannot contain:
- Multi-hop relationships. A customer's churn risk depends on what similar customers did. "Similar" means customers who bought the same products, used the same features, or share an account manager. That is a 3-hop pattern: customer -> orders -> products -> other customers' outcomes. No amount of feature engineering on a flat table captures this.
- Temporal sequences across tables. A customer whose support tickets are increasing while their order frequency is decreasing and their usage of premium features has stopped - that three-table temporal pattern is a strong churn signal. A flat table collapses each into a single number.
- Network effects. When a key user at a B2B account leaves, the other users on that account often follow. When a product line gets negative reviews from multiple customers, churn accelerates across that cohort. These are graph patterns that exist in the relationships between entities, not in any single row.
How KumoRFM reaches 91% accuracy on enterprise churn data
KumoRFM is a relational foundation model - pre-trained on thousands of relational databases to understand the patterns that exist across connected tables. When it predicts churn, it does three things that flat-table algorithms cannot:
- Reads raw relational tables directly. No joins, no flattening, no feature engineering. It takes your customers, orders, products, support, and usage tables as-is and constructs a graph that preserves every relationship and temporal sequence.
- Discovers multi-hop patterns automatically. It finds signals like "customers who bought products in this category and then contacted support about shipping issues churned at 3x the base rate." These patterns span 3-4 tables and would require a data scientist to hypothesize and manually encode them as features.
- Transfers knowledge from pre-training. Because KumoRFM was pre-trained on thousands of relational databases, it already understands common patterns (declining engagement predicts churn, support escalations predict churn, peer behavior predicts churn). It applies that knowledge zero-shot to your data, which is why it scores 91% on the SAP SALT benchmark without any task-specific training.
sap_salt_churn_benchmark
| approach | accuracy | feature_engineering_time | lines_of_code |
|---|---|---|---|
| LLM + AutoML | 63% | Hours (LLM-generated) | LLM-generated |
| PhD Data Scientist + XGBoost | 75% | Weeks | 878+ lines |
| KumoRFM (zero-shot) | 91% | 0 | 0 |
SAP SALT benchmark on enterprise data. KumoRFM outperforms expert data scientists with hand-tuned XGBoost by 16 percentage points - with no feature engineering and no training time.
relbench_benchmark_churn_relevant
| approach | AUROC | feature_engineering | code |
|---|---|---|---|
| LightGBM + manual features | 62.44 | 12.3 hours/task | 878 lines |
| AutoML + manual features | ~64-66 | Reduced hours/task | 878 lines |
| KumoRFM zero-shot | 76.71 | ~1 second | 0 lines |
| KumoRFM fine-tuned | 81.14 | Minutes | 0 lines |
RelBench benchmark (7 databases, 30 tasks, 103M rows). The 14+ AUROC point gap between LightGBM and KumoRFM zero-shot comes entirely from cross-table patterns the flat table never contains.
Predicting churn with PQL: one query, no pipeline
PQL (Predictive Query Language) is how you tell KumoRFM what to predict. Instead of building a feature pipeline, writing joins, and training a model, you write a query that looks like SQL but defines a prediction target:
PQL Query
PREDICT churn_90d
FOR EACH customers.customer_id
WHERE customers.segment = 'enterprise'
  AND customers.tenure_months > 3
This single query replaces the entire traditional pipeline: the SQL joins across 4-6 tables, the feature engineering code, the model selection, and the training loop. KumoRFM reads raw customers, orders, products, support_tickets, and usage_logs tables to generate predictions.
Output
| customer_id | churn_probability | confidence | top_signal |
|---|---|---|---|
| C-9201 | 0.89 | high | Support escalation + declining order frequency |
| C-9202 | 0.14 | high | Stable multi-product engagement |
| C-9203 | 0.72 | medium | Peer accounts on same plan churning |
| C-9204 | 0.05 | high | Expanding usage across 3 product lines |
| C-9205 | 0.91 | high | Payment failure + zero logins in 30 days |
When to use each algorithm: a decision framework
Do not pick an algorithm based on what performed best in someone else's blog post. Pick it based on your data and your team:
- Use Logistic Regression when you need a fast baseline, full interpretability, or regulatory compliance that requires transparent coefficients. It is also the right sanity check before trying anything more complex.
- Use Random Forest when you want a quick improvement over logistic regression without extensive hyperparameter tuning. Good for prototyping and for teams without deep ML experience.
- Use XGBoost / LightGBM when your data is in a single flat table with well-engineered features and you want maximum accuracy on that table. This is the workhorse of production churn models on flat data.
- Use Deep Learning (MLP/LSTM) only if you have sequential event data (clickstreams, session logs) and enough volume to justify the complexity. On flat churn tables, it rarely beats XGBoost.
- Use KumoRFM when your data spans multiple relational tables, when you do not have weeks for feature engineering, or when you need to break past the 70% accuracy ceiling. It is the only approach that reads raw relational structure without manual joins.