If you are on Databricks, you already have the hardest part of data infrastructure figured out. Your data lands in Delta Lake. Unity Catalog governs access. Spark handles compute. Notebooks let your team explore and transform data.
But when it comes time to add predictive ML, the options multiply and the complexity returns. Do you use Databricks AutoML? Write custom models with MLflow? Try the new Genie Code agent? Bring in DataRobot or H2O? Build a Feature Store pipeline?
Each approach makes different trade-offs on the same fundamental question: who builds the features? The answer to that question determines whether your first prediction takes minutes or months.
The headline result: SAP SALT benchmark
The SAP SALT benchmark is an enterprise-grade evaluation in which business analysts and data scientists attempt prediction tasks on SAP enterprise data. It measures how accurately different approaches predict real business outcomes on production-quality databases with multiple related tables.
| approach | accuracy | what_it_means |
|---|---|---|
| LLM + AutoML | 63% | Language model generates features, AutoML selects model |
| PhD Data Scientist + XGBoost | 75% | Expert spends weeks hand-crafting features, tunes XGBoost |
| KumoRFM (zero-shot) | 91% | No feature engineering, no training, reads relational tables directly |
SAP SALT benchmark: KumoRFM outperforms expert data scientists by 16 percentage points and LLM+AutoML by 28 percentage points on real enterprise prediction tasks.
KumoRFM scores 91% where PhD-level data scientists with weeks of feature engineering and hand-tuned XGBoost score 75%. The 16 percentage point gap is the value of reading relational data natively instead of flattening it into a single table.
| Option | Reads Delta Tables | Feature Engineering Required | Multi-Table Native | Autonomous | Time to First Prediction | Best For |
|---|---|---|---|---|---|---|
| Kumo.ai (Lakehouse App) | Yes, natively via Unity Catalog | None | Yes | Yes | Minutes | Multi-table predictions at scale |
| Databricks AutoML | Single table only | Full (joins + aggregations) | No | Partial (model selection only) | Days to weeks | Single-table problems with existing features |
| Databricks Genie Code | Yes (generates code to read them) | AI-generated (still flat-table) | No | Workflow only | Hours to days | Accelerating notebook-based workflows |
| MLflow + custom code | Yes (manual Spark reads) | Full (manual pipelines) | No | No | Weeks to months | Full control with experienced ML team |
| DataRobot on Databricks | Via connector | Full (requires flat features) | No | Partial (model selection only) | Days to weeks | Enterprise AutoML with governance |
| H2O Sparkling Water | Via Spark integration | Full (manual pipelines) | No | Partial | Weeks | Spark-native distributed training |
| Feature Store + AutoML | Feature Store reads Delta | Full (most complex setup) | No | Partial | Weeks to months | Mature orgs with dedicated ML platform team |
Kumo.ai is the only option that reads multiple Delta tables natively and generates predictions without feature engineering. Every other approach requires building a flat feature table first.
The feature engineering divide
Every option in the table above falls into one of two categories: approaches that require you to build a flat feature table from your Delta tables, and approaches that read your relational Delta tables directly.
Six of the seven options require a flat feature table. That means someone on your team has to write the Spark SQL or PySpark to join your customers table with your orders table with your products table, compute aggregations like avg_order_value_last_90d and count_support_tickets_last_30d, handle temporal leakage, and produce one row per entity. This is the step that consumes 80% of the effort in every ML project.
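To make that manual step concrete, here is a minimal sketch using SQLite from the standard library in place of Spark SQL, with an invented two-table schema. In a real project this would be a PySpark job over Delta tables, but the shape of the work is the same: join, window to a cutoff date to avoid temporal leakage, aggregate to one row per entity.

```python
import sqlite3

# Toy stand-in for the Spark SQL feature job: invented schema, SQLite in
# place of Delta tables. One row per customer; aggregates are windowed to
# the 90 days before a cutoff date so no future data leaks into features.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id TEXT, status TEXT);
CREATE TABLE orders (customer_id TEXT, order_value REAL, order_date TEXT);
INSERT INTO customers VALUES ('C-1', 'active'), ('C-2', 'active');
INSERT INTO orders VALUES
  ('C-1', 120.0, '2026-01-10'),
  ('C-1',  80.0, '2026-02-20'),
  ('C-2',  40.0, '2025-11-02');  -- outside the 90-day window
""")

CUTOFF = "2026-03-01"  # prediction date: only history before this is usable
features = conn.execute("""
SELECT c.customer_id,
       AVG(o.order_value)  AS avg_order_value_last_90d,
       COUNT(o.order_date) AS order_count_last_90d
FROM customers c
LEFT JOIN orders o
  ON o.customer_id = c.customer_id
 AND o.order_date >= DATE(?, '-90 days')
 AND o.order_date <  ?
GROUP BY c.customer_id
""", (CUTOFF, CUTOFF)).fetchall()

for row in features:
    print(row)  # ('C-1', 100.0, 2) then ('C-2', None, 0)
```

Multiply this pattern across a dozen tables and a few hundred candidate aggregations and you have the pipeline that consumes most of a project's schedule.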
Option 1: Kumo.ai as a Lakehouse App
Kumo.ai is available in the Databricks Marketplace as a Lakehouse App. The integration path is: install from marketplace, connect to Unity Catalog, write a PQL query, get predictions back as a Delta table.
What makes Kumo different from every other option is what happens under the hood. Kumo's relational foundation model reads your Delta tables as a temporal heterogeneous graph. Each row in each table becomes a node. Each foreign key becomes an edge. Timestamps are preserved. The model discovers predictive patterns across tables, time windows, and relationship hops without any feature engineering.
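Kumo has not published its internals, so the following is only an illustrative sketch of the general mapping with toy rows and invented column names: each row becomes a node keyed by its primary key, each foreign key value becomes an edge to the parent table's row, and timestamps stay attached as node attributes.

```python
from collections import defaultdict

# Illustrative only: how relational tables map onto a temporal
# heterogeneous graph. Nodes are keyed by (table, primary key); every
# foreign key value yields an edge; attributes (timestamps included)
# remain on the node rather than being flattened away.
tables = {
    "customers": [{"customer_id": "C-1", "signup_date": "2025-06-01"}],
    "orders": [
        {"order_id": "O-1", "customer_id": "C-1", "ts": "2026-01-10"},
        {"order_id": "O-2", "customer_id": "C-1", "ts": "2026-02-20"},
    ],
}
foreign_keys = {("orders", "customer_id"): "customers"}  # child column -> parent table
primary_keys = {"customers": "customer_id", "orders": "order_id"}

nodes, edges = {}, defaultdict(list)
for table, rows in tables.items():
    for row in rows:
        node_id = (table, row[primary_keys[table]])
        nodes[node_id] = row
        for (child, col), parent in foreign_keys.items():
            if table == child:
                edges[node_id].append((parent, row[col]))

print(len(nodes), "nodes")       # 3 nodes
print(edges[("orders", "O-1")])  # [('customers', 'C-1')]
```

Nothing in this construction requires a human to decide which aggregations matter; the relationships the flat-table approaches discard are exactly what the graph preserves.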
PQL Query
```
PREDICT churn FOR EACH unity_catalog.sales.customers.customer_id WHERE customers.status = 'active'
```
This PQL query reads directly from Delta tables registered in Unity Catalog. Kumo's foundation model traverses the relational structure (customers, orders, products, support_tickets) and generates churn predictions without any feature engineering, joins, or aggregations.
Output
| customer_id | churn_probability | key_signals | confidence |
|---|---|---|---|
| C-4401 | 0.89 | Order frequency declining + support tickets rising | High |
| C-4402 | 0.12 | Stable purchase pattern + no support issues | High |
| C-4403 | 0.67 | Category shift + payment method changed | Medium |
| C-4404 | 0.03 | Increasing order value + new product adoption | High |
No notebooks. No feature tables. No Spark jobs to maintain. The predictions land in a Delta table that any downstream process (dashboards, reverse ETL, operational systems) can consume directly.
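Because the output is an ordinary Delta table, downstream consumption is trivial. A toy sketch of one such consumer, hard-coding the sample rows from the table above (in practice you would read the predictions table with Spark or SQL):

```python
# Toy downstream consumer of the predictions table: route high-risk,
# high-confidence customers into a retention campaign. Sample rows are
# hard-coded; a real job would read the Delta table directly.
predictions = [
    {"customer_id": "C-4401", "churn_probability": 0.89, "confidence": "High"},
    {"customer_id": "C-4402", "churn_probability": 0.12, "confidence": "High"},
    {"customer_id": "C-4403", "churn_probability": 0.67, "confidence": "Medium"},
    {"customer_id": "C-4404", "churn_probability": 0.03, "confidence": "High"},
]

retention_list = [
    p["customer_id"]
    for p in predictions
    if p["churn_probability"] >= 0.60 and p["confidence"] == "High"
]
print(retention_list)  # ['C-4401']
```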
Option 2: Databricks AutoML
Databricks AutoML is built into the workspace. You point it at a single table, it tries multiple algorithms (LightGBM, XGBoost, sklearn, Prophet), tunes hyperparameters, and produces a notebook with the winning model. It is genuinely good at model selection.
The limitation is the input requirement: a single flat table. If your prediction depends on patterns across customers, orders, and products, you must join and aggregate those tables yourself before AutoML sees the data. AutoML automates the last 20% of the pipeline. The first 80% (feature engineering) remains manual.
Option 3: Databricks Genie Code
Genie Code is Databricks' new AI agent that generates notebook code. You describe what you want in natural language, and Genie writes PySpark, SQL, and ML code to accomplish it. It can generate feature engineering code, training scripts, and evaluation logic.
This is a genuine productivity improvement. Instead of writing feature pipelines by hand, you describe them and Genie writes the code. But the underlying approach is unchanged: Genie still produces a flat feature table and trains a single-table model. It automates the workflow (writing code, running notebooks). It does not automate the prediction (understanding relational structure).
Genie Code (automates the workflow)
- Generates PySpark code to join tables
- Writes feature engineering logic
- Produces a flat feature table
- Trains a single-table model
- Still requires human review of generated features
Kumo.ai (automates the prediction)
- Reads Delta tables as relational graph
- Discovers cross-table patterns automatically
- No flat feature table needed
- Foundation model understands relational structure
- Predictions in minutes with zero code review
Option 4: MLflow + custom models
MLflow is the backbone of ML operations on Databricks. It tracks experiments, versions models, manages artifacts, and handles deployment. If you have a strong ML team that wants full control, MLflow + custom PySpark/sklearn/PyTorch code gives you maximum flexibility.
The trade-off is effort. Your team writes the feature pipelines, selects the algorithms, tunes hyperparameters, and maintains everything. MLflow tracks all of this beautifully. But the 80% of time spent on feature engineering happens before MLflow enters the picture. MLflow tracks what you built. It does not build it for you.
| Stage | Hours per task | % of total | MLflow helps? |
|---|---|---|---|
| Delta table joins & prep | 2.5 hours | 17% | No |
| Feature computation (Spark) | 5.0 hours | 34% | No |
| Feature iteration & selection | 4.2 hours | 29% | Tracks experiments only |
| Model training & tuning | 1.8 hours | 12% | Yes (full tracking) |
| Evaluation & deployment | 1.2 hours | 8% | Yes (model registry) |
80% of the work happens before MLflow's tracking capabilities become relevant. MLflow is excellent infrastructure for the last 20%. It does not address the first 80%.
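The 80% figure follows directly from the hours in the table:

```python
# Hours from the table above, in pipeline order.
stages = {
    "joins_and_prep": 2.5,
    "feature_computation": 5.0,
    "feature_iteration": 4.2,
    "training_and_tuning": 1.8,
    "eval_and_deploy": 1.2,
}
total = sum(stages.values())  # 14.7 hours per task
pre_mlflow = 2.5 + 5.0 + 4.2  # everything before model training
print(f"{pre_mlflow / total:.0%} of time precedes model training")  # 80%
```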
Option 5: DataRobot on Databricks
DataRobot integrates with Databricks via Spark connectors and can read from Unity Catalog. It brings enterprise AutoML with strong governance, explainability, and deployment features. Like Databricks AutoML, it requires a flat feature table as input.
DataRobot adds value over native Databricks AutoML in model governance, monitoring, and compliance documentation. But the core limitation is the same: it optimizes over a pre-engineered feature table. Cross-table patterns that were not manually encoded as features are invisible to DataRobot.
Option 6: H2O Sparkling Water
H2O Sparkling Water runs H2O's algorithms directly on Spark clusters. This gives you distributed training at scale without moving data out of Databricks. The integration is mature and well-tested.
Like every other option except Kumo, H2O requires a flat feature table. You write PySpark to join and aggregate your Delta tables, then H2O trains models on the result. The feature engineering bottleneck remains fully manual.
Option 7: Feature Store + AutoML
Databricks Feature Store (now part of Unity Catalog) lets you define, compute, and serve features as managed tables. Combined with AutoML, this is the most "Databricks-native" approach to production ML.
It is also the most complex. You define feature tables, write compute functions, schedule refresh jobs, manage point-in-time correctness, handle feature serving, and then feed the feature table to AutoML. This is the right approach for organizations with dedicated ML platform teams and dozens of models in production. For teams trying to get their first prediction live, it is months of infrastructure work before the first model trains.
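Point-in-time correctness is the subtlest item on that list. The rule is simple to state and easy to get wrong: for every training label, use the newest feature snapshot at or before the label's timestamp, never a later one. A minimal sketch with invented data:

```python
from bisect import bisect_right

# Minimal point-in-time lookup sketch (invented data). Using any snapshot
# newer than the label timestamp would leak the future into training.
feature_snapshots = [  # (as_of_ts, value), sorted by timestamp
    ("2026-01-01", 3),
    ("2026-02-01", 5),
    ("2026-03-01", 9),
]

def point_in_time_lookup(label_ts: str):
    """Return the feature value as of label_ts, or None if no snapshot exists yet."""
    timestamps = [ts for ts, _ in feature_snapshots]
    i = bisect_right(timestamps, label_ts)
    return feature_snapshots[i - 1][1] if i else None

print(point_in_time_lookup("2026-02-15"))  # 5    (the 2026-02-01 snapshot)
print(point_in_time_lookup("2025-12-31"))  # None (no snapshot yet)
```

Feature Store manages this bookkeeping for you, but someone still has to define every feature, write its compute function, and keep the refresh jobs healthy.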
The real question: who builds the features?
Every approach in this guide answers a slightly different question. But they all come back to the same bottleneck: converting your multi-table Delta Lake data into a flat feature table that a model can consume.
| Approach | Who writes the feature code? | Feature engineering effort |
|---|---|---|
| Kumo.ai | Nobody (foundation model reads raw tables) | Zero |
| Databricks AutoML | Your data scientists | Full manual effort |
| Genie Code | AI generates code, humans review | Reduced but not eliminated |
| MLflow + custom | Your ML engineers | Full manual effort |
| DataRobot | Your data scientists | Full manual effort |
| H2O Sparkling Water | Your ML engineers | Full manual effort |
| Feature Store + AutoML | Your ML platform team | Full manual effort (most structured) |
Kumo is the only option where nobody writes feature engineering code. The foundation model discovers predictive patterns directly from your relational Delta tables.
How Kumo reads your lakehouse differently
To understand why Kumo eliminates the feature engineering step, consider what the other tools see versus what Kumo sees when pointed at the same Unity Catalog tables.
| Delta table | What AutoML/MLflow/DataRobot see | What Kumo's foundation model sees |
|---|---|---|
| customers | Source table for manual joins | Entity nodes with temporal attributes |
| orders | Source table for aggregation SQL | Event nodes linked to customers and products |
| products | Source for one-hot encoding | Attribute nodes with category relationships |
| support_tickets | Source for count/recency features | Signal nodes with temporal patterns |
| Relationships between tables | Invisible (lost in flattening) | Graph edges preserving full relational structure |
Every other tool requires you to flatten the relational structure into a single table, losing cross-table patterns in the process. Kumo preserves the full relational structure as a temporal graph.
When to use each option
The right choice depends on your team, your data, and your timeline:
- Kumo.ai Lakehouse App: You have multi-table Delta data and want predictions without building feature pipelines. You want your first prediction in minutes, not months. Your team's time is better spent on business problems than feature engineering.
- Databricks AutoML: You already have a flat feature table or single-table data. You want a quick baseline model with minimal setup. Your data does not require multi-table joins.
- Genie Code: You want AI assistance writing notebook code. Your team is comfortable reviewing generated code. You want to accelerate existing notebook-based workflows.
- MLflow + custom: You have a strong ML team that wants full control. You need custom model architectures or domain-specific feature engineering. You already have feature pipelines in production.
- DataRobot: You need enterprise governance and compliance documentation on top of AutoML. Your organization has regulatory requirements for model explainability.
- H2O Sparkling Water: You need distributed training at scale on Spark. Your team has H2O expertise.
- Feature Store + AutoML: You have a dedicated ML platform team, dozens of models in production, and the resources to build and maintain feature infrastructure.
PQL Query
```
PREDICT fraud_flag FOR EACH unity_catalog.payments.transactions.txn_id WHERE transactions.timestamp > '2026-03-01'
```
Fraud detection on Delta tables with a single PQL query. Kumo's foundation model reads transactions, accounts, merchants, and device tables from Unity Catalog, discovers cross-table anomaly patterns, and returns fraud probabilities. No Spark feature pipeline required.
Output
| txn_id | fraud_probability | risk_tier | tables_used |
|---|---|---|---|
| T-88201 | 0.94 | Critical | transactions, accounts, merchants, devices |
| T-88202 | 0.07 | Low | transactions, accounts |
| T-88203 | 0.71 | High | transactions, accounts, merchants |
| T-88204 | 0.02 | Low | transactions, accounts |
The bottom line
Databricks has built the best data lakehouse platform in the industry. But a data platform is not a prediction platform. Adding ML predictions still requires choosing who builds the features and how models get trained.
Six of the seven options on this page require you to solve the feature engineering problem yourself (manually, with AI code generation, or through Feature Store infrastructure). One option eliminates it entirely by reading your relational Delta tables as they are.
If your team has been spending weeks or months building feature pipelines before any model trains, the issue is not which AutoML tool you use on the flat table at the end. The issue is that you are building the flat table at all.