How to Reduce ML Pipeline Complexity by 90%

Your ML team spends more time maintaining pipelines than building models. The pipeline exists to bridge the gap between relational data and flat-table models. Eliminate the gap, and the pipeline collapses.

TL;DR

  • Only 5% of production ML code is the model. The other 95% is data extraction, feature engineering, feature storage, serving, monitoring, and retraining infrastructure.
  • Feature engineering and feature pipelines account for 60-80% of total pipeline effort. A Stanford study measured 12.3 hours and 878 lines of code per prediction task.
  • Maintenance costs 30-50% of initial build cost annually. A Fortune 500 company with 20 production models needs 3-5 full-time ML engineers just for maintenance.
  • Foundation models eliminate 4-5 of the 8 pipeline stages by learning directly from raw relational data. No feature engineering, no feature store, no extraction pipeline, no retraining cycle.
  • The simplified pipeline (connect, query, deploy) reduces time-to-prediction from months to minutes and total 3-year cost by 75-90% across multiple use cases.

Google published a paper in 2015 titled "Hidden Technical Debt in Machine Learning Systems." The central finding: only 5% of the code in a production ML system is the model itself. The other 95% is data extraction, feature engineering, feature validation, training infrastructure, model serving, monitoring, and the glue that holds it all together.

A decade later, this ratio has not changed. Industry surveys from Anaconda, MLOps Community, and Gartner consistently show that data science teams spend 40-60% of their time on pipeline maintenance. Not building new models. Not improving existing ones. Maintaining the infrastructure that exists solely to transform relational data into flat tables and keep those transformations running.

This is the structural problem. And it is solvable.

production_ml_pipeline — stages and effort

| Stage | Tools Required | Team | Typical Duration | Annual Maintenance |
| --- | --- | --- | --- | --- |
| 1. Data Extraction | Airflow, Prefect, dbt | Data Engineer | 4-8 weeks | $50K-100K |
| 2. Feature Engineering | SQL, Python, Spark | Data Scientist | 6-12 weeks | $150K-300K |
| 3. Feature Storage | Tecton, Feast, Redis | ML Engineer | 2-4 weeks | $100K-300K |
| 4. Model Training | XGBoost, PyTorch, MLflow | Data Scientist | 2-4 weeks | $50K-100K |
| 5. Model Validation | Custom scripts, fairness tools | Data Scientist | 1-2 weeks | $20K-50K |
| 6. Deployment | Docker, K8s, SageMaker | ML Engineer | 2-4 weeks | $50K-150K |
| 7. Monitoring | Evidently, Arize, WhyLabs | ML Engineer | 1-2 weeks | $50K-150K |
| 8. Retraining | Airflow + stages 1-6 | Full team | Ongoing | $200K-500K |

Note: feature engineering and feature storage account for 60-80% of total pipeline effort and maintenance cost.

Anatomy of a production ML pipeline

A typical ML pipeline for a prediction task on enterprise data has 6 to 8 stages. Each stage has its own tools, failure modes, and maintenance requirements.

Stage 1: Data extraction

Pull data from source systems: the data warehouse, transactional databases, event streams, CRM, and third-party APIs. This requires scheduling (Airflow, Prefect, Dagster), connection management, schema monitoring, and incremental load logic. When a source schema changes (and it always changes), the extraction pipeline breaks.
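
A minimal sketch of this stage with Airflow (the orders table, schedule, and warehouse client are hypothetical; real DAGs add schema checks and backfill handling):

```python
# Sketch of incremental extraction with Airflow. Table, connection, and
# client names are hypothetical stand-ins.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract_orders(**context):
    # Incremental load: pull only rows updated inside the scheduled window,
    # so reruns and backfills do not double-load data.
    start = context["data_interval_start"]
    end = context["data_interval_end"]
    query = f"""
        SELECT * FROM orders
        WHERE updated_at >= '{start}' AND updated_at < '{end}'
    """
    # warehouse_client.run(query)  # stand-in for your actual warehouse call

with DAG(
    dag_id="extract_orders_incremental",
    start_date=datetime(2025, 1, 1),
    schedule="@daily",
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
):
    PythonOperator(task_id="extract_orders", python_callable=extract_orders)
```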

Stage 2: Feature engineering

Transform raw data into features. This is the longest stage: writing SQL joins across 5-15 tables, computing aggregations (sum, count, average, max, min) over multiple time windows (7, 30, 60, 90 days), creating derived features (ratios, rates, deltas), and encoding categorical variables. A Stanford study measured this at 12.3 hours and 878 lines of code per prediction task for experienced data scientists.

Here is what that looks like in practice. A telecom company wants to predict churn. The raw data lives in three tables:

subscribers

| subscriber_id | plan | tenure_months | monthly_charge |
| --- | --- | --- | --- |
| S-4001 | Unlimited | 34 | $89 |
| S-4002 | Basic | 8 | $45 |
| S-4003 | Family | 22 | $120 |

usage_events

| event_id | subscriber_id | event_type | timestamp | data_mb |
| --- | --- | --- | --- | --- |
| E-001 | S-4001 | Data | 2025-03-01 08:14 | 142 |
| E-002 | S-4001 | Voice | 2025-03-01 09:30 | |
| E-003 | S-4002 | Data | 2025-03-01 11:45 | 8 |
| E-004 | S-4002 | Data | 2025-02-15 14:20 | 12 |
| E-005 | S-4003 | Data | 2025-03-02 10:00 | 340 |

S-4002 has only 2 usage events in 2 weeks, with minimal data consumption. S-4001 uses data and voice on the same day. These temporal patterns matter.

support_interactions

| ticket_id | subscriber_id | category | date | resolution |
| --- | --- | --- | --- | --- |
| T-101 | S-4002 | Coverage complaint | 2025-02-20 | Unresolved |
| T-102 | S-4002 | Cancel request | 2025-03-01 | Pending |
| T-103 | S-4003 | Billing question | 2025-02-15 | Resolved |

S-4002 escalated from a coverage complaint to a cancellation request in 9 days. This sequence is the strongest churn signal in the data.

Stage 2 flattens these three tables into a single row per subscriber:

flat_feature_table (what the model sees after 12.3 hours of engineering)

| subscriber_id | usage_events_30d | avg_data_mb | ticket_count | tenure | churned |
| --- | --- | --- | --- | --- | --- |
| S-4001 | 2 | 142 | 0 | 34 | ? |
| S-4002 | 2 | 10 | 2 | 8 | ? |
| S-4003 | 1 | 340 | 1 | 22 | ? |

The flattened table compresses ticket categories into a count. S-4002's escalation from complaint to cancellation is invisible. S-4003's resolved billing question looks identical to an unresolved cancellation.
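
Here is a condensed sketch of that flattening for the example tables above, in pandas. A production version of this logic, spread across many more tables and time windows, is where the 878 lines go:

```python
import pandas as pd

# Toy copies of the three source tables from the example above.
subscribers = pd.DataFrame({
    "subscriber_id": ["S-4001", "S-4002", "S-4003"],
    "tenure_months": [34, 8, 22],
})
usage_events = pd.DataFrame({
    "subscriber_id": ["S-4001", "S-4001", "S-4002", "S-4002", "S-4003"],
    "timestamp": pd.to_datetime([
        "2025-03-01 08:14", "2025-03-01 09:30", "2025-03-01 11:45",
        "2025-02-15 14:20", "2025-03-02 10:00",
    ]),
    "data_mb": [142, None, 8, 12, 340],
})
support = pd.DataFrame({
    "subscriber_id": ["S-4002", "S-4002", "S-4003"],
    "category": ["Coverage complaint", "Cancel request", "Billing question"],
})

# 30-day window relative to a (hypothetical) prediction cutoff date.
cutoff = pd.Timestamp("2025-03-03")
recent = usage_events[usage_events["timestamp"] >= cutoff - pd.Timedelta(days=30)]

usage_agg = (recent.groupby("subscriber_id")
                   .agg(usage_events_30d=("timestamp", "size"),
                        avg_data_mb=("data_mb", "mean"))
                   .reset_index())
ticket_agg = support.groupby("subscriber_id").size().reset_index(name="ticket_count")

flat = (subscribers.merge(usage_agg, on="subscriber_id", how="left")
                   .merge(ticket_agg, on="subscriber_id", how="left")
                   .fillna({"usage_events_30d": 0, "ticket_count": 0}))
# The complaint -> cancel-request sequence is now just ticket_count == 2.
print(flat)
```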

Stage 3: Feature storage and serving

Store computed features for training and serve them at prediction time. Feature stores (Tecton, Feast, Hopsworks) manage this but add infrastructure complexity: online stores for low-latency serving, offline stores for training, materialization jobs to keep them in sync, and TTL policies to manage stale data.
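
For a sense of the moving parts, a sketch using Feast (assumes an already-configured feature repository; the feature view and feature names are hypothetical, and Tecton and Hopsworks have analogous APIs):

```python
from datetime import datetime

from feast import FeatureStore

store = FeatureStore(repo_path=".")  # assumes an initialized Feast repo here

# Materialization: copy freshly computed features from the offline store
# into the online store. This job must run on a schedule, indefinitely.
store.materialize_incremental(end_date=datetime.utcnow())

# Low-latency lookup at prediction time (feature view name is hypothetical).
features = store.get_online_features(
    features=[
        "subscriber_features:usage_events_30d",
        "subscriber_features:avg_data_mb",
    ],
    entity_rows=[{"subscriber_id": "S-4002"}],
).to_dict()
```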

Stage 4: Model training

Train the model on the feature table. This includes hyperparameter tuning, cross-validation, experiment tracking (MLflow, Weights & Biases), and compute management (GPU allocation, distributed training). This stage is what most people think of as "ML," but it represents only 10-15% of the total pipeline effort.
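
Because this stage operates on a flat table, the tooling is mature. A representative sketch with XGBoost and MLflow, using synthetic stand-in data:

```python
import mlflow
import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the flat feature table produced by stages 1-3.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0
)

with mlflow.start_run():
    params = {"n_estimators": 300, "max_depth": 6, "learning_rate": 0.05}
    mlflow.log_params(params)

    model = xgb.XGBClassifier(**params)
    model.fit(X_train, y_train)

    # Track validation AUC alongside the run's hyperparameters.
    auc = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])
    mlflow.log_metric("val_auc", auc)
```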

Stage 5: Model validation

Validate the model against held-out data, check for data leakage, test fairness across protected groups, compare against the current production model, and generate documentation for model governance. At regulated companies (finance, healthcare), this stage alone can take weeks.

Stage 6: Deployment

Serve the model in production: containerize (Docker), deploy (Kubernetes, SageMaker, Vertex AI), set up auto-scaling, configure A/B testing, and integrate with the application layer. The model is now live, but the work is not done.
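
In miniature, the service that gets containerized is often just a small scoring endpoint; a sketch with FastAPI (the model artifact path and feature schema are placeholders):

```python
import pickle

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

with open("model.pkl", "rb") as f:  # placeholder path to a trained model
    model = pickle.load(f)

class Features(BaseModel):
    usage_events_30d: int
    avg_data_mb: float
    ticket_count: int
    tenure_months: int

@app.post("/predict")
def predict(features: Features) -> dict:
    row = [[features.usage_events_30d, features.avg_data_mb,
            features.ticket_count, features.tenure_months]]
    # Assumes a scikit-learn-style classifier with predict_proba.
    return {"churn_probability": float(model.predict_proba(row)[0][1])}
```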

Stage 7: Monitoring

Monitor for data drift (input distributions changing), model decay (accuracy degrading over time), feature freshness (stale data in the feature store), and infrastructure health (latency, errors). Set up alerting for each.
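
A single drift check in miniature, using a two-sample Kolmogorov-Smirnov test on one feature (tools like Evidently run checks of this kind across every feature, plus freshness and latency monitors):

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
# Stand-ins: the feature's distribution at training time vs. in live traffic.
training_values = rng.normal(loc=50.0, scale=10.0, size=5_000)
live_values = rng.normal(loc=58.0, scale=10.0, size=5_000)  # drifted upward

stat, p_value = ks_2samp(training_values, live_values)
if p_value < 0.01:
    print(f"Drift detected (KS statistic {stat:.3f}); flag for retraining")
```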

Stage 8: Retraining

When monitoring detects decay, retrain the model. This triggers stages 1-6 again: re-extract data, re-compute features, retrain, revalidate, redeploy. Most organizations retrain monthly or quarterly. Some retrain weekly. Each retraining cycle requires human oversight.

Where the complexity actually lives

Not all pipeline stages are equally complex. The distribution of effort is highly skewed.

High-complexity stages (85% of effort)

  • Feature engineering: 878 lines of SQL/Python per task
  • Feature storage: Online/offline sync, materialization jobs
  • Data extraction: Schema monitoring, incremental loads
  • Monitoring: Drift detection, freshness checks, alerting
  • Retraining: Full pipeline re-execution monthly/quarterly

Low-complexity stages (15% of effort)

  • Model training: 50-100 lines, well-tooled with AutoML
  • Validation: Standardized metrics, automated testing
  • Deployment: Containerized, one-click with modern platforms
  • Hyperparameter tuning: Automated with Optuna/Ray Tune
  • Experiment tracking: Mature tools (MLflow, W&B)

The pattern is clear: the stages that are hard are the ones that deal with the gap between relational data and flat features (data extraction, feature engineering, feature storage, and the monitoring and retraining that keep them running). The stages that are easy are the ones that operate on the flat feature table after it exists: training, tuning, validation.

The ML industry has spent a decade building better tools for the easy stages (AutoML, experiment tracking, model serving platforms) while leaving the hard stages manual. This is like optimizing the last mile of a marathon while ignoring the first 25 miles.

pipeline_cost_comparison — traditional vs foundation model

| Cost Category | Traditional (10 use cases) | Foundation Model (10 use cases) |
| --- | --- | --- |
| Data Science Team | $2M-4M/year | $200K-400K/year |
| Infrastructure & Tooling | $1M-3M/year | $100K-300K/year |
| Feature Store Licensing | $100K-300K/year | $0 |
| Monitoring & MLOps | $200K-500K/year | Included |
| Annual Maintenance | $3M-10M/year | $50K-100K/year |
| Total (Year 1) | $5M-20M | $500K-1.5M |
| Total (3-Year) | $16M-50M | $1.5M-4.5M |

Foundation model approach reduces total 3-year cost by 75-90% by eliminating feature pipelines and per-use-case engineering.

What a simplified pipeline looks like

If you eliminate the gap between relational data and flat features, four of the eight stages disappear entirely.

Data extraction: eliminated. The model connects directly to the relational database. No extraction pipeline, no schema monitoring, no incremental load logic.

Feature engineering: eliminated. The model learns directly from raw relational data. No SQL joins, no aggregation queries, no time-window features.

Feature storage: eliminated. There are no computed features to store, serve, or keep fresh. The model reads the data at prediction time.

Retraining: eliminated (for zero-shot) or simplified (for fine-tuning). A foundation model that is pre-trained on relational patterns does not need task-specific training for many use cases. For cases where fine-tuning improves accuracy, the retraining cycle is minutes, not weeks.

What remains: connect the database, write a predictive query, validate the output, deploy the predictions. Four stages that take hours instead of months.

The foundation model approach

KumoRFM implements this simplified pipeline. The model is pre-trained on billions of relational patterns across thousands of databases. It has already learned the universal patterns that predict outcomes in relational data: recency, frequency, temporal dynamics, graph topology, cross-table signal propagation.

The production workflow is:

1. Connect. Point KumoRFM at your Snowflake, Databricks, BigQuery, or PostgreSQL database. The model reads the schema and maps tables and relationships automatically.

2. Query. Write a one-line predictive query:

PQL Query

```
PREDICT churn_90d
FOR EACH customers.customer_id
```

This single query replaces stages 2-4 of the traditional pipeline: feature engineering, feature storage, and model training. The foundation model handles all three internally.

Output

| customer_id | churn_90d | confidence |
| --- | --- | --- |
| C-10042 | 0.82 | 0.93 |
| C-10043 | 0.14 | 0.91 |
| C-10044 | 0.67 | 0.88 |
| C-10045 | 0.03 | 0.96 |

3. Deploy. Predictions are available via API or written back to your data warehouse. No model serving infrastructure. No containerization. No Kubernetes.

The entire pipeline, from connected database to production predictions, is measured in minutes. There is no feature pipeline to maintain. No retraining schedule to manage. No drift to monitor in a feature store that does not exist. When the underlying data changes, the model adapts automatically because it reads the data at prediction time.
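
Putting the three steps together, here is a sketch following Kumo's published Python SDK examples (the kumoai.experimental.rfm module path and the LocalGraph and KumoRFM names are taken from those examples and may differ in your SDK version; treat this as illustrative, not canonical):

```python
import pandas as pd
import kumoai.experimental.rfm as rfm  # module path per Kumo's examples

rfm.init(api_key="...")  # 1. Connect

# Raw tables go in as-is: no joins, aggregations, or engineered features.
customers = pd.read_parquet("customers.parquet")  # placeholder paths
orders = pd.read_parquet("orders.parquet")
graph = rfm.LocalGraph.from_data({"customers": customers, "orders": orders})

model = rfm.KumoRFM(graph)

# 2. Query: the one-line predictive query from above.
predictions = model.predict("PREDICT churn_90d FOR EACH customers.customer_id")

# 3. Deploy: write predictions back to the warehouse or serve them via API.
```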

What this means for ML teams

The implication is not that ML teams become unnecessary. It is that they spend their time differently. Instead of 60% on pipeline maintenance and 40% on new work, the ratio flips.

More predictions, faster. A team that previously shipped 3-4 new models per year (each taking 3-6 months to build and deploy) can now ship 50+ predictions per quarter. Each new prediction question is a query, not a project.

Higher-value work. Data scientists spend time on business problem framing, result interpretation, and stakeholder communication instead of writing SQL joins and debugging feature pipelines.

Lower infrastructure cost. Eliminating feature stores, training clusters, orchestration systems, and monitoring infrastructure reduces cloud spend by 40-70% for prediction workloads.

The complexity of ML pipelines is not inherent to machine learning. It is an artifact of the gap between relational data and flat-table models. Close the gap with a model that learns directly from relational data, and 90% of the pipeline evaporates. What remains is the part that creates value: asking the right questions and acting on the answers.

Frequently asked questions

Why are ML pipelines so complex?

ML pipelines are complex because they bridge a structural gap between how data is stored (relational tables) and how models consume data (flat feature vectors). This gap requires data extraction, joining, aggregation, feature engineering, feature storage, model training, validation, deployment, monitoring, and retraining. Each stage introduces its own infrastructure, failure modes, and maintenance burden. Google's research found that only 5% of ML system code is the model itself; the other 95% is infrastructure.

What are the main stages of a typical ML pipeline?

A production ML pipeline typically has 6-8 stages: (1) data extraction from source databases, (2) feature engineering (SQL joins, aggregations, time windows), (3) feature storage and serving, (4) model training and hyperparameter tuning, (5) model validation and testing, (6) deployment to production, (7) monitoring for data drift and model decay, (8) scheduled retraining. Each stage requires different tools, different teams, and ongoing maintenance.

How much does ML pipeline maintenance cost?

Industry surveys consistently show that ML teams spend 40-60% of their time on pipeline maintenance rather than building new models. At a Fortune 500 company with a 20-person data science team at an average cost of $200K per person, that is $1.6M-$2.4M per year spent on maintaining existing pipelines rather than creating new value. The infrastructure costs (compute, storage, orchestration tools) add another $500K-$2M per year.

What is a feature pipeline and why is it the main bottleneck?

A feature pipeline is the code that transforms raw data from multiple source tables into the flat feature table that a model needs. It includes SQL queries for joining and aggregating data, time-window logic to prevent data leakage, data quality checks, and serving infrastructure to deliver features at prediction time. Feature pipelines account for 60-80% of ML pipeline code and are the primary source of bugs, data leakage, and maintenance burden.

How do foundation models reduce pipeline complexity?

Foundation models like KumoRFM eliminate the feature pipeline entirely by learning directly from raw relational data. Instead of extracting, joining, aggregating, and engineering features, you connect the database and write a predictive query. The model handles feature discovery, pattern learning, and prediction internally. This removes 4-5 pipeline stages (extraction, feature engineering, feature storage, training, retraining) and reduces the infrastructure to a database connection and an API.

See it in action

KumoRFM delivers predictions on relational data in seconds. No feature engineering, no ML pipelines. Try it free.