The real reason feature engineering takes so long
If you have spent time in enterprise ML, you already know the statistic: feature engineering consumes roughly 80% of data science project time. But the usual explanation ("it is tedious") misses the structural reason it is so expensive.
The problem is relational data. A typical enterprise does not store customer behavior in one table. It stores it across 5-50 interconnected tables: customers, orders, products, interactions, support tickets, payments, subscriptions, events. Each table connects to others through foreign keys. The relationships between tables contain the most predictive signals.
But traditional ML models (XGBoost, LightGBM, random forests, neural networks) cannot read relational databases. They require a single flat table with one row per entity. So before you can train any model, you must collapse your entire relational database into that flat structure.
This is where the time goes. Not in model training. Not in hyperparameter tuning. In the flattening.
What flattening actually requires
Here is what a data scientist does for every prediction task on relational data:
- Write SQL joins across 5-15 tables with correct temporal constraints (no data leakage). For a churn prediction task, this means joining customers to orders to products to support tickets to payments, all filtered to the correct time windows. Easily 100-300 lines of SQL.
- Compute cross-table aggregations like avg_order_value_last_90d, support_tickets_last_30d, and product_return_rate_by_category. Each one is a hypothesis about what might matter. Each one requires careful implementation.
- Engineer temporal features across table boundaries: purchase frequency trends, support escalation patterns, engagement velocity changes. These require window functions spanning multiple joined tables.
- Iterate 3-4 times when the first model underperforms. Go back, hypothesize new features, implement them, retrain. Each cycle takes hours.
- Maintain the pipeline in production. When schemas change, when new data sources appear, when business logic shifts, the feature pipeline breaks and must be updated.
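The first two steps above can be sketched in miniature. The following is an illustrative, stdlib-only Python sketch (toy data, hypothetical column names, not a production pipeline) of a single hand-built cross-table feature, avg_order_value_last_90d, including the temporal cutoff that prevents leakage:

```python
from datetime import date, timedelta

# Toy "orders" rows; in practice this is the result of SQL joins
# across customers, orders, payments, and so on.
orders = [
    {"customer_id": "C-1", "order_date": date(2024, 11, 2), "value": 40.0},
    {"customer_id": "C-1", "order_date": date(2024, 12, 15), "value": 60.0},
    {"customer_id": "C-1", "order_date": date(2023, 1, 5), "value": 500.0},  # outside the window
    {"customer_id": "C-2", "order_date": date(2024, 12, 20), "value": 25.0},
]

def avg_order_value_last_90d(orders, cutoff):
    """One hand-built feature. The cutoff filter is the temporal
    constraint: only orders strictly before the prediction date count,
    otherwise the feature leaks future information."""
    window_start = cutoff - timedelta(days=90)
    totals = {}
    for o in orders:
        if window_start <= o["order_date"] < cutoff:
            cid = o["customer_id"]
            s, n = totals.get(cid, (0.0, 0))
            totals[cid] = (s + o["value"], n + 1)
    return {cid: s / n for cid, (s, n) in totals.items()}

features = avg_order_value_last_90d(orders, cutoff=date(2025, 1, 1))
print(features)  # {'C-1': 50.0, 'C-2': 25.0}
```

This is one feature, for one window, over one table pair. A real churn pipeline repeats this pattern dozens of times, each with its own cutoff logic, which is where the hundreds of lines of SQL come from.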
The deeper problem: you are exploring 4-17% of the feature space
Time is not the only cost. The bigger issue is coverage.
When a data scientist builds features, they start with hypotheses: "recency of last purchase probably matters," "support ticket count probably correlates with churn," "high-value customers probably behave differently." These are educated guesses. Good ones. But guesses.
The number of possible features from a relational database grows combinatorially. Consider just the aggregation options: for each pair of tables, you can compute count, sum, average, min, max, standard deviation, and trend across dozens of columns, over multiple time windows (7 days, 30 days, 90 days, 365 days), with various filters and groupings. Add multi-hop relationships (customer → orders → products → other customers who bought the same products → their churn rates), and the space becomes enormous.
A data scientist working 12.3 hours per task explores a tiny fraction of this space. Research on automated feature generation suggests that manual approaches typically cover only 4-17% of the feasible feature space. That means 83-96% of potentially predictive patterns are never tested.
Three approaches to the feature engineering problem
The industry has developed three distinct approaches, and the differences between them matter more than most comparisons acknowledge.
1. Manual feature engineering (XGBoost + hand-crafted features)
This is the traditional approach. A data scientist writes SQL, computes aggregations, builds a flat table, and trains a model (typically XGBoost or LightGBM). It works. It has worked for years. But it costs 12.3 hours and 878 lines of code per task, explores only a fraction of the feature space, and creates brittle pipelines that require ongoing maintenance.
- Best for: Teams with strong data science talent who need full control over every feature, or regulatory environments that require every feature to be explicitly defined and auditable.
- Watch out for: Only explores 4-17% of the possible feature space. Costs 12.3 hours per task. Creates brittle pipelines that break when schemas change. Does not scale beyond a handful of prediction tasks without a large team.
2. Automated feature engineering (Featuretools, DataRobot, H2O)
Tools like Featuretools use deep feature synthesis to automatically generate features from relational data. DataRobot and H2O Driverless AI automate single-table feature generation as part of their AutoML pipelines. These tools genuinely reduce the manual effort. Featuretools can generate hundreds of features from multiple tables in minutes instead of hours.
But here is the critical point: they still produce a flat table. They automate the flattening process. The output is still one row per entity with columns representing aggregated features. The model still trains on a single table. The relational structure is still lost.
- Best for: Teams that want to speed up existing workflows without changing their approach, or organizations already invested in an AutoML platform that need broader feature coverage than manual engineering provides.
- Watch out for: Still produces a flat table as output, so the relational structure is lost. Limited to predefined aggregation primitives. Cannot discover multi-hop relational patterns. Platform licensing adds $150K-$250K per year.
3. Eliminate feature engineering (KumoRFM)
KumoRFM is a relational foundation model. It does not generate features. It does not flatten tables. It reads raw relational tables connected by foreign keys and learns predictive patterns directly from the relational structure. The model ingests the tables as they exist in your data warehouse, preserves every relationship, and discovers patterns that span multiple tables and multiple hops.
This is not a faster version of feature engineering. It is a different approach entirely. No flat table is ever created. No features are ever enumerated. The model learns what matters from the raw data.
- Best for: Organizations with relational data (5-50 tables) where feature engineering is the bottleneck, teams that need to scale from 1 to 20+ prediction tasks without scaling headcount, and any situation where speed to production is a competitive advantage.
- Watch out for: Newer paradigm with less industry history than XGBoost-based workflows. If your data is genuinely single-table and already flat, the relational advantage is smaller.
The three approaches compared
| Dimension | Manual (XGBoost) | Automated (Featuretools/DataRobot) | Eliminated (KumoRFM) |
|---|---|---|---|
| Feature engineering effort | 12.3 hours + 878 lines of code per task | Minutes of configuration, automated generation | Zero. No features are created. |
| Data input | Hand-built flat table (SQL joins) | Relational tables (Featuretools) or flat table (DataRobot/H2O) | Raw relational tables connected by foreign keys |
| Feature space explored | 4-17% (manual hypothesis-driven) | Broader than manual, but limited to predefined primitives | Full relational structure. No enumeration needed. |
| Multi-hop patterns | Rarely. Too expensive to implement manually. | Limited. Depth restricted by computational cost. | Native. Model traverses full relational graph. |
| Output format | Flat table with one row per entity | Flat table with one row per entity | Predictions directly. No intermediate table. |
| Pipeline maintenance | High. Feature code breaks when schemas change. | Medium. Automated pipelines still need updates. | None. Model reads raw tables as they are. |
| Time to first prediction | Weeks (feature engineering + model training) | Days (setup + automated generation + training) | ~1 second (zero-shot) to minutes (fine-tuned) |
| RelBench AUROC | 62.44 | ~64-66 (AutoML + manual features) | 76.71 zero-shot, 81.14 fine-tuned |
The accuracy gap between the automated and eliminated approaches is more than 10 AUROC points. This gap comes from relational patterns that flat-table approaches cannot represent, regardless of how the features are generated.
Why automation is not enough
The distinction between automating and eliminating feature engineering is the most important point in this article, so let me be direct about it.
Featuretools, DataRobot, and H2O Driverless AI are real improvements over manual feature engineering. They reduce the time from hours to minutes. They generate more features than a human would think to test. They are legitimate tools that solve a real problem.
But they still flatten. And flattening is lossy. When you collapse a customer's order history into avg_order_value = $47.30 and order_count = 12, you lose the sequence. You lose the fact that order values have been declining for three months. You lose the fact that the last two orders were returns. You lose the fact that this customer's purchase pattern matches other customers who churned.
Automated tools generate more aggregations, but they are still aggregations. They describe the relational structure using summary statistics instead of preserving it.
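The lossiness is easy to demonstrate. A sketch with invented numbers: two customers with the identical average order value, one stable and one in steep decline. The summary statistic cannot tell them apart; the sequence can.

```python
from statistics import mean

# Invented order-value sequences, oldest to newest.
stable    = [47, 48, 46, 48, 47, 48]   # healthy customer
declining = [80, 70, 55, 40, 25, 14]   # likely churner

for name, seq in [("stable", stable), ("declining", declining)]:
    # The flat-table feature: both customers look identical.
    avg = mean(seq)
    # The sequence signal a temporal model can use.
    recent_trend = mean(seq[-3:]) - mean(seq[:3])
    print(name, round(avg, 1), round(recent_trend, 1))
# stable 47.3 0.7
# declining 47.3 -42.0
```

Both rows carry avg_order_value = 47.3 into the flat table. The trend only survives if someone thinks to engineer it, which is exactly the point.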
What flattening loses (churn prediction example)
| Signal | Available in flat table | Available in relational model |
|---|---|---|
| Average order value | Yes (single number: $47.30) | Yes, plus the full trajectory over time |
| Order value trending down | Only if someone engineers a trend feature | Yes, learned automatically from the sequence |
| Support tickets increasing while purchases decrease | Only if cross-table trend is manually computed | Yes, cross-table temporal pattern detected natively |
| Similar customers churned after same pattern | No. Requires cross-entity joins rarely attempted. | Yes. Multi-hop pattern: customer → products → other customers → outcomes |
| Product category engagement shifting | Only if category-level aggregations are built | Yes. Full product interaction history preserved. |
| Account-level multi-user behavior | Aggregated to single row. Individual patterns lost. | Each user's behavior preserved with account relationships. |
Automated feature engineering tools would generate the first two or three signals. The bottom three require multi-hop relational reasoning that flat-table approaches do not attempt.
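The multi-hop row can be made concrete. A stdlib-only sketch (toy data, hypothetical IDs) of the pattern customer → products → other customers who bought the same products → their churn outcomes, the kind of traversal a flat table cannot express without a self-join that is rarely attempted:

```python
# Toy relational data: (customer, product) purchases and known churn labels.
purchases = [
    ("C-1", "P-a"), ("C-1", "P-b"),
    ("C-2", "P-a"), ("C-3", "P-b"),
    ("C-4", "P-c"),
]
churned = {"C-2": True, "C-3": True, "C-4": False}

def cobuyer_churn_rate(customer):
    """Two-hop signal: churn rate among other customers who bought
    any of the same products. In a flat table this needs a join of
    the purchases table to itself, then to the labels."""
    my_products = {p for c, p in purchases if c == customer}
    cobuyers = {c for c, p in purchases if p in my_products and c != customer}
    if not cobuyers:
        return None
    return sum(churned.get(c, False) for c in cobuyers) / len(cobuyers)

print(cobuyer_churn_rate("C-1"))  # 1.0: both co-buyers of C-1's products churned
```

A relational model traverses this graph natively; a feature pipeline has to decide in advance that this particular path is worth materializing.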
The benchmark evidence
Two independent benchmarks quantify the difference between these approaches on real enterprise data.
SAP SALT enterprise benchmark
The SAP SALT benchmark tests prediction accuracy on production-quality enterprise databases with multiple related tables. Real business analysts and data scientists attempt the same prediction tasks.
SAP SALT benchmark results
| Approach | Accuracy | Feature engineering required |
|---|---|---|
| LLM + AutoML | 63% | Automated (LLM generates features, AutoML selects model) |
| PhD Data Scientist + XGBoost | 75% | Weeks of manual feature engineering by experts |
| KumoRFM (zero-shot) | 91% | None. Zero feature engineering. Zero training. |
The headline result: KumoRFM outperforms expert data scientists by 16 percentage points with zero feature engineering and zero training time. The LLM + AutoML approach, which represents automated feature engineering, scores lowest.
The 63% score for LLM + AutoML is particularly telling. This is the automated approach: a language model generates feature engineering code, an AutoML system selects and tunes the model. It should be faster and more consistent than manual work. But it scores 12 points lower than a PhD data scientist doing it by hand, because automation without understanding produces worse features, not better ones.
KumoRFM sidesteps the problem entirely. It does not try to generate better features. It reads the relational data directly. The 91% score represents what happens when you stop summarizing relational structure and start learning from it.
Stanford RelBench benchmark
RelBench provides a standardized evaluation across 7 databases, 30 prediction tasks, and 103 million rows. It was designed specifically to test ML approaches on relational data.
RelBench benchmark results
| Approach | AUROC | Feature engineering time | Lines of code |
|---|---|---|---|
| LightGBM + manual features | 62.44 | 12.3 hours per task | 878 |
| AutoML + manual features | ~64-66 | reduced time per task | 878 |
| KumoRFM zero-shot | 76.71 | ~1 second | 0 |
| KumoRFM fine-tuned | 81.14 | Minutes | 0 |
KumoRFM zero-shot outperforms the manual and AutoML approaches by more than 10 AUROC points, and fine-tuned KumoRFM reaches 81.14, with zero lines of feature engineering code in both cases.
The jump from 62.44 to ~64-66 is what AutoML buys you: better model selection on the same features. The jump from ~64-66 to 76.71 is what elimination buys you: patterns that exist in the relational structure but never made it into any flat table. That second gap is 5x larger than the first.
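The "5x" claim is simple arithmetic, taking the midpoint of the ~64-66 AutoML range as an assumption:

```python
manual = 62.44     # LightGBM + manual features (RelBench AUROC)
automl = 65.0      # assumed midpoint of the ~64-66 AutoML range
zero_shot = 76.71  # KumoRFM zero-shot

automation_gain = automl - manual       # what better model selection buys
elimination_gain = zero_shot - automl   # what removing the flat table buys
print(round(automation_gain, 2), round(elimination_gain, 2),
      round(elimination_gain / automation_gain, 1))
# 2.56 11.71 4.6
```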
What this looks like in practice
Traditional workflow (manual or automated)
- Identify prediction task (e.g., 90-day churn for enterprise accounts)
- Data scientist writes SQL joins across 5-15 tables (2-4 hours)
- Compute cross-table aggregations and temporal features (4-6 hours)
- Build flat feature table with one row per customer
- Train model (XGBoost/LightGBM or AutoML platform)
- Evaluate. Underperforming? Go back to step 2. Repeat 3-4 times.
- Deploy model + maintain feature pipeline ongoing
- Total: 2-6 weeks to first production prediction
KumoRFM workflow
- Connect Kumo to your data warehouse (one-time, 30 minutes)
- Write a PQL query: PREDICT churn_90d FOR EACH customer_id
- KumoRFM reads raw tables, discovers patterns, returns predictions
- No SQL joins. No aggregations. No flat table. No feature iteration.
- Time to first prediction: ~1 second (zero-shot)
- Fine-tune for task-specific accuracy: minutes, not weeks
- No feature pipeline to maintain. Ever.
- Total: minutes to first production prediction
PQL Query
PREDICT churn_90d FOR EACH customers.customer_id WHERE customers.segment = 'enterprise' AND customers.contract_value > 50000
This single PQL query replaces the entire feature engineering pipeline. No SQL joins across tables. No aggregation logic. No feature iteration cycles. KumoRFM reads the raw customers, orders, products, support_tickets, and payments tables directly and discovers the predictive patterns itself.
Output
| customer_id | churn_probability | top_signal |
|---|---|---|
| C-4401 | 0.87 | Declining order frequency + rising support escalations |
| C-4402 | 0.12 | Stable multi-department usage, recent contract expansion |
| C-4403 | 0.93 | Similar accounts churned after same engagement drop pattern |
| C-4404 | 0.08 | Increasing product adoption, 3 new integrations this month |
The cost of continuing to do feature engineering
The time cost is obvious. But the compounding costs are what make feature engineering truly expensive at scale.
Annual cost of feature engineering (20 prediction tasks)
| Cost dimension | Manual approach | Automated approach | Eliminated (KumoRFM) |
|---|---|---|---|
| Feature engineering labor | 246 hours ($61,500) | ~80 hours ($20,000) | 0 hours ($0) |
| Data science team for pipelines | 3-4 FTEs ($450K-$600K) | 2-3 FTEs ($300K-$450K) | 0.5 FTE ($75K) |
| Pipeline maintenance (annual) | 520 hours ($130K) | 260 hours ($65K) | 20 hours ($5K) |
| Platform/tool licensing | $0 (open-source models) | $150K-$250K (DataRobot/H2O) | $80K-$120K (Kumo) |
| Time to new prediction task | 2-6 weeks | 3-7 days | Minutes |
| Total annual cost | $650K-$800K | $535K-$785K | $160K-$200K |
Automation trims total cost only modestly; elimination cuts it by roughly 75%. The difference is that automation still requires a data science team for pipeline maintenance and feature iteration.
Notice that the automated approach is not dramatically cheaper than the manual approach. The tools cost $150K-$250K per year, and you still need 2-3 data scientists for the multi-table work that automation cannot handle. The savings are real but incremental.
Elimination is a step change. When there is no feature pipeline to build, maintain, or debug, the cost structure collapses. One ML engineer can operate 20 prediction tasks because the work is writing PQL queries, not maintaining SQL pipelines.
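As a sanity check on the table's line items, summing the manual and automated columns component by component (figures in $K per year, ranges taken at face value from the table):

```python
# Line items from the cost table above, in $K per year (20 tasks).
manual = {"labor": 61.5, "team": (450, 600), "maintenance": 130, "licensing": 0}
automated = {"labor": 20, "team": (300, 450), "maintenance": 65, "licensing": (150, 250)}

def total(column):
    """Sum single values and (low, high) ranges into a (low, high) total."""
    lo = hi = 0.0
    for v in column.values():
        a, b = v if isinstance(v, tuple) else (v, v)
        lo += a
        hi += b
    return lo, hi

print(total(manual))     # (641.5, 791.5): the table's $650K-$800K, rounded
print(total(automated))  # (535.0, 785.0): the table's $535K-$785K
```

The overlap between those two ranges is the quantitative version of the point above: automation changes who builds the flat table, not the cost structure around it.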
When each approach makes sense
To be direct about this: not every organization should switch to KumoRFM tomorrow.
- Manual feature engineering makes sense when your data is already in a single table, when you have a strong data science team that values full control, or when regulatory requirements demand that every feature be explicitly defined and auditable.
- Automated feature engineering (Featuretools, DataRobot) makes sense when you want to speed up existing workflows without changing your approach, when your team is already invested in an AutoML platform, or when you need the breadth of features that tools like Featuretools generate from relational data.
- Elimination (KumoRFM) makes sense when your data is relational (5-50 tables), when feature engineering is your bottleneck, when you need maximum accuracy on relational data, when you want to scale from 1 to 20+ prediction tasks without scaling your data science team, or when speed to production is a competitive advantage.