AutoML was supposed to democratize machine learning. The pitch was compelling: upload your data, click a button, get a model. DataRobot, H2O, Google AutoML, and Amazon SageMaker Autopilot all promised to replace the ML expert with software.
The tools work. They do a genuinely good job of selecting the right model architecture, tuning hyperparameters, and building ensembles. On Kaggle-style benchmarks with clean, pre-engineered feature tables, AutoML platforms often match or beat what a mid-level data scientist produces.
But enterprise adoption has not matched the hype. Gartner reported in 2024 that while 75% of enterprises have evaluated AutoML, fewer than 20% use it as their primary ML workflow. The reason is simple: AutoML solves the wrong bottleneck.
ml_pipeline_time_breakdown
| pipeline_stage | time_spent | % of total | automated_by_AutoML | automated_by_FM |
|---|---|---|---|---|
| Data extraction & joining | 2.8 hours | 18% | No | Yes |
| Feature computation | 5.1 hours | 33% | No | Yes |
| Feature selection & iteration | 4.4 hours | 29% | No | Yes |
| Model selection & tuning | 1.8 hours | 12% | Yes | Yes |
| Evaluation & validation | 1.2 hours | 8% | Partial | Partial |
| Total | 15.3 hours | 100% | 12-20% | 80-92% |
Highlighted: the first three stages (feature engineering) consume 80% of time. AutoML automates none of them. Foundation models automate all of them.
automl_vs_foundation_model_accuracy
| approach | AUROC | what_it_automates | human_hours_per_task |
|---|---|---|---|
| LightGBM + manual features | 62.44 | Nothing | 12.3 |
| AutoML + manual features | ~64-66 | Model selection only | 10.5 |
| AutoML + Featuretools | ~66-68 | Model selection + basic features | 4.2 |
| KumoRFM zero-shot | 76.71 | Everything | 0.001 |
| KumoRFM fine-tuned | 81.14 | Features + model + adaptation | 0.1 |
Highlighted: the 10+ AUROC point gap between AutoML approaches and KumoRFM is the difference between automating model selection and automating feature discovery. The harder problem yields the bigger improvement.
The ML pipeline has two bottlenecks
A standard enterprise ML pipeline has two labor-intensive stages:
- Feature engineering (joining tables, computing aggregations, encoding variables, building a flat feature table)
- Model selection and tuning (choosing an algorithm, tuning hyperparameters, building ensembles, evaluating results)
The Stanford RelBench study measured how data scientists spend their time: 80% on feature engineering (12.3 hours, 878 lines of code) and 20% on modeling. AutoML automates the 20%. Foundation models automate the 80%.
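To make the first bottleneck concrete, here is a minimal sketch of the manual feature engineering stage, using pandas on two hypothetical tables (the table names, column names, and values are illustrative, not from RelBench):

```python
import pandas as pd

# Hypothetical raw relational tables, for illustration only.
customers = pd.DataFrame({
    "customer_id": [1, 2],
    "signup_date": pd.to_datetime(["2024-01-01", "2024-03-15"]),
})
orders = pd.DataFrame({
    "order_id": [10, 11, 12],
    "customer_id": [1, 1, 2],
    "amount": [50.0, 70.0, 20.0],
    "order_date": pd.to_datetime(["2024-05-01", "2024-06-20", "2024-06-01"]),
})

# Step 1: the time-window filter someone has to write by hand.
asof = pd.Timestamp("2024-07-01")
recent = orders[orders["order_date"] >= asof - pd.Timedelta(days=90)]

# Step 2: the aggregations AutoML expects to already exist as columns.
feats = (recent.groupby("customer_id")["amount"]
               .agg(order_count_last_90d="count",
                    avg_order_value_last_90d="mean")
               .reset_index())

# Step 3: the flat, one-row-per-entity table that AutoML consumes.
flat = customers.merge(feats, on="customer_id", how="left")
print(flat)
```

Multiply this pattern across dozens of tables, windows, and aggregation functions and you get the 878 lines of code the RelBench study measured.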
What AutoML actually does
To understand the gap, you need to be precise about what AutoML automates and what it leaves manual.
What AutoML automates
- Algorithm selection. AutoML tries multiple model types (XGBoost, LightGBM, random forest, logistic regression, neural networks) and picks the best performer. A human would typically try 2-3 algorithms. AutoML tries 10-20.
- Hyperparameter tuning. AutoML uses Bayesian optimization or grid search to find optimal hyperparameters (learning rate, tree depth, regularization). This saves a few hours of manual work.
- Ensemble building. AutoML builds stacked ensembles that combine multiple models. This often yields a 1-3% accuracy improvement over any single model.
- Basic preprocessing. Some AutoML tools handle missing values, one-hot encoding, and normalization automatically.
What AutoML does not automate
- Table joins. AutoML cannot read a relational database with multiple tables. It needs a single flat table as input. Someone has to write the SQL to join customers, orders, products, and support tickets into one row per entity.
- Feature computation. AutoML does not compute avg_order_value_last_90d or days_since_last_login. Those aggregations must already exist as columns in the input table.
- Multi-hop pattern discovery. AutoML cannot discover that a customer's churn risk depends on the return rates of products they bought, because it never sees the products table.
- Temporal sequence preservation. AutoML consumes a static feature table. The temporal dynamics (accelerating purchase frequency, declining engagement over weeks) are only present if someone pre-computed them as features.
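The multi-hop case is the most expensive to hand-build. A sketch of the join chain (customer → orders → products → returns) that a human must write before AutoML can see the signal, on hypothetical tables:

```python
import pandas as pd

# Illustrative tables for the customer -> orders -> products -> returns chain.
orders = pd.DataFrame({
    "order_id": [1, 2], "customer_id": [7, 7], "product_id": ["A", "B"],
})
products = pd.DataFrame({
    "product_id": ["A", "B"],
    "units_sold": [100, 100],
    "units_returned": [40, 2],
})

# The hand-written hop that AutoML never performs: join through products.
joined = orders.merge(products, on="product_id")
joined["return_rate"] = joined["units_returned"] / joined["units_sold"]

# The resulting feature: average return rate of products this customer bought.
avg_return_rate = joined.groupby("customer_id")["return_rate"].mean()
print(avg_return_rate.loc[7])  # ~0.21: mean of 0.40 and 0.02
```

Unless someone writes this join and decides this exact aggregation is worth computing, the signal simply does not exist in the flat table.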
What foundation models actually do
A relational foundation model like KumoRFM solves the problem that AutoML skips. It reads raw relational tables directly, without any feature engineering.
How it works
KumoRFM represents your database as a temporal heterogeneous graph. Each row in each table becomes a node. Each foreign key relationship becomes an edge. Timestamps are preserved as temporal attributes on nodes and edges.
what_automl_receives (flat feature table)
| lead_id | emails_opened | pages_viewed | days_since_signup | company_size | title_rank |
|---|---|---|---|---|---|
| L-301 | 12 | 8 | 45 | 500 | 3 (VP) |
| L-302 | 4 | 22 | 30 | 200 | 1 (Engineer) |
| L-303 | 0 | 1 | 90 | 5000 | 5 (CTO) |
AutoML receives this pre-built flat table and searches for the best model to fit it. It tries XGBoost, LightGBM, neural nets, ensembles. It never sees the raw CRM tables underneath.
what_the_foundation_model_reads (raw relational tables)
| table | example_data_for_L-302 | signal_invisible_to_AutoML |
|---|---|---|
| contacts | 4 contacts from 3 departments active | Multi-threaded account engagement |
| activities | Blog > Case study > API docs > Demo (in sequence) | Buying-stage content progression |
| opportunities | Similar account closed $210K last quarter | Account similarity to past wins |
| accounts | Company raised Series B 30 days ago | Firmographic momentum |
The foundation model reads all four tables directly. It discovers that L-302 has a multi-threaded buying committee, a textbook content progression, and account similarity to past closed-won deals. None of these signals exist in the flat table AutoML receives.
A graph transformer processes this structure by passing messages along edges (foreign key relationships), learning which cross-table patterns are predictive. Multi-hop patterns (customer → orders → products → returns) are captured naturally because information propagates through the graph layer by layer.
Because KumoRFM is pre-trained on thousands of diverse databases, it has already learned the universal patterns that recur across relational data: recency effects, frequency dynamics, temporal decay, graph topology signals. At inference time, it applies these learned patterns to your database without any task-specific training.
AutoML
- Requires flat feature table as input
- Automates model selection and tuning
- Cannot discover cross-table patterns
- Cannot handle temporal sequences
- Solves 20% of the pipeline
Foundation model (KumoRFM)
- Reads raw relational tables directly
- Automates feature discovery and modeling
- Discovers multi-hop cross-table patterns
- Preserves temporal dynamics natively
- Solves 100% of the pipeline
The accuracy gap
The difference between these approaches shows up directly in accuracy. On the RelBench benchmark (7 databases, 30 tasks, 103 million rows):
| Approach | AUROC (classification) | What it automates |
|---|---|---|
| LightGBM + manual features | 62.44 | Nothing (fully manual) |
| AutoML + manual features | ~64-66 (estimated) | Model selection only |
| KumoRFM zero-shot | 76.71 | Features + model + training |
| KumoRFM fine-tuned | 81.14 | Features + model (fine-tuning adds task adaptation) |
AutoML can squeeze 2-4 AUROC points out of the same feature table that LightGBM uses, by trying more algorithms and better hyperparameters. But the gap between a well-tuned model on manual features (~64-66) and a foundation model on raw relational data (76.71) is over 10 points.
That 10-point gap is not about model architecture. It is about data. The foundation model sees the full relational structure. The AutoML model sees whatever features someone decided to build.
PQL Query
PREDICT conversion FOR EACH leads.lead_id WHERE leads.status = 'open'
One query to the foundation model replaces the entire AutoML pipeline: data extraction, feature engineering, model selection, hyperparameter tuning, and ensemble building. The model reads raw CRM tables directly.
Output
| lead_id | conversion_prob | approach_comparison | accuracy_delta |
|---|---|---|---|
| L-2201 | 0.84 (FM) | 0.71 (AutoML) | +13 points |
| L-2202 | 0.23 (FM) | 0.38 (AutoML) | FM correctly lower |
| L-2203 | 0.91 (FM) | 0.62 (AutoML) | +29 points |
| L-2204 | 0.11 (FM) | 0.14 (AutoML) | Both correctly low |
cost_at_scale (20 prediction tasks)
| cost_dimension | AutoML approach | foundation_model | savings |
|---|---|---|---|
| Feature engineering hours | 210 hours | 0 hours | 210 hours |
| Model selection hours | 0 hours (automated) | 0 hours | — |
| Pipeline maintenance (annual) | 520 hours | 20 hours | 500 hours |
| Data scientist headcount needed | 3-4 FTEs | 0.5 FTE | 2.5-3.5 FTEs |
| Time to new prediction task | 2-4 weeks | Minutes | 99%+ reduction |
| Total annual cost | $650K-$900K | $80K-$120K | $570K-$780K |
Highlighted: at 20 prediction tasks, the foundation model approach costs 85% less than AutoML + manual features. The savings come entirely from eliminating the feature engineering that AutoML leaves manual.
Why the difference matters at scale
For a single, well-defined prediction task with a dedicated data science team and months of time, AutoML provides modest value. The team builds the features, AutoML picks the model, and you save a few days of tuning.
But enterprises do not have one prediction task. They have dozens. Churn, upsell, cross-sell, fraud, credit risk, demand forecasting, personalization, campaign targeting. Each task needs its own feature engineering pipeline.
The cost arithmetic
With AutoML, each task still costs 12.3 hours of feature engineering. For 20 prediction tasks, that is 246 hours of senior data scientist time, roughly six person-weeks, spent on feature engineering alone. AutoML automates only the modeling stage that follows, roughly 20% of the pipeline, bringing the total to perhaps 260 hours instead of roughly 310.
With a foundation model, each task costs seconds. For 20 prediction tasks, you spend less than a minute on predictions and the rest of your time on problem framing, evaluation, and deployment. The total time drops from 260 hours to maybe 20 hours of human work.
Where AutoML still has a role
AutoML is not useless. There are specific situations where it delivers real value:
- Single-table problems. If your data is already in a flat table (no multi-table joins needed), AutoML skips the feature engineering bottleneck because there is no feature engineering to do. Kaggle-style classification on a single CSV is AutoML's sweet spot.
- Mature feature stores. If your organization has already invested in a comprehensive feature store with hundreds of curated features, AutoML can efficiently select and tune models on those features. You have already paid the feature engineering cost.
- Rapid prototyping on flat data. For quick experiments where the data is already flat and the goal is directional (not production accuracy), AutoML gives you an answer in minutes.
The fundamental difference
AutoML and foundation models solve different problems. AutoML asks: "Given this feature table, what is the best model?" Foundation models ask: "Given this database, what are the best predictions?"
The first question assumes that someone has already converted the raw relational data into features. The second question starts from raw data. The first question is a search over model configurations. The second is a search over the full relational pattern space.
If your bottleneck is model selection, AutoML is the right tool. But for most enterprises, the bottleneck has never been model selection. It is the 12.3 hours of feature engineering that come before the model ever sees the data.
Foundation models do not make AutoML better. They make it unnecessary. When the model reads raw relational data directly, there is no feature table to optimize over and no model selection to automate. The entire pipeline collapses into a single step: ask a question, get a prediction.