In 2019, VentureBeat reported that 87% of data science projects never make it to production. In the years since, the industry has invested billions in MLOps platforms and feature stores, hired ML engineers, and deployed Kubernetes clusters. The needle has barely moved.
The problem was never the model. A competent data scientist can train a model that works in a Jupyter notebook in a day or two. The problem is everything that happens between "the model works in my notebook" and "the model runs in production serving 10 million predictions per day." That gap is 6 to 12 months of engineering. That gap is MLOps.
What MLOps actually involves
MLOps is the discipline of running machine learning in production. It sounds like DevOps for ML, and the analogy is correct in one important way: just as DevOps added enormous infrastructure complexity to software development, MLOps adds enormous infrastructure complexity to data science.
A typical production ML pipeline has 6 to 8 stages. Each one requires different tools, different expertise, and often a different team.
Stage 1: Data ingestion
Raw data lives in data warehouses (Snowflake, BigQuery, Redshift), operational databases (Postgres, MySQL), event streams (Kafka), and third-party APIs. An ingestion pipeline extracts data from these sources, handles schema changes, and lands it in a staging area. Tools: Airflow, dbt, Fivetran, custom Spark jobs.
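The main operational hazard at this stage is the silent upstream schema change. A minimal sketch of a schema-aware ingestion step, stdlib only; the column names are illustrative and the staging "area" is just a list, standing in for a real warehouse table:

```python
# Schema-aware ingestion sketch. Column names are illustrative,
# not taken from any real warehouse.

EXPECTED_COLUMNS = {"user_id", "amount", "created_at"}

def validate_schema(rows):
    """Fail fast when an upstream schema change would poison downstream stages."""
    for row in rows:
        extra = set(row) - EXPECTED_COLUMNS
        missing = EXPECTED_COLUMNS - set(row)
        if extra or missing:
            raise ValueError(f"schema drift: extra={extra}, missing={missing}")

def land(rows, staging):
    """Validate a batch, then append it to the staging area."""
    validate_schema(rows)
    staging.extend(rows)
    return len(rows)
```

Real pipelines do this with Airflow sensors or dbt tests, but the contract is the same: reject the batch before it lands, not after a model trains on it.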
Stage 2: Feature engineering
A data scientist writes SQL and Python to transform raw tables into a flat feature matrix. This stage alone consumes 80% of the total project time. A Stanford study measured it at 12.3 hours and 878 lines of code per prediction task. For a production model, multiply that by the number of iterations needed to reach acceptable accuracy.
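To make those line counts concrete, here is what a single engineered feature looks like in plain Python. The feature name `avg_payment_amount_30d` and the table shape are illustrative; real pipelines compute hundreds of these, usually in SQL:

```python
# One hand-rolled feature: average payment per user over a trailing
# 30-day window. Stage 2 produces code like this by the hundreds of lines.
from collections import defaultdict
from datetime import date, timedelta

def avg_payment_amount_30d(payments, as_of):
    """Average payment amount per user over the 30 days ending at as_of."""
    cutoff = as_of - timedelta(days=30)
    totals, counts = defaultdict(float), defaultdict(int)
    for p in payments:
        if cutoff < p["paid_at"] <= as_of:
            totals[p["user_id"]] += p["amount"]
            counts[p["user_id"]] += 1
    return {u: totals[u] / counts[u] for u in totals}
```

Every iteration on the model means revisiting code like this: new windows, new aggregations, new joins.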
Stage 3: Feature store
Features need to be computed consistently for training and serving. The feature store (Tecton, Feast, Hopsworks) manages this: storing historical feature values for training, serving fresh features at low latency for inference, and ensuring no data leakage between training and serving. Setting up a feature store is a platform engineering project that takes 2 to 6 months.
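The leakage guarantee is the subtle part: a training read must only see feature values that were known at the label's timestamp. A toy in-memory sketch with a hypothetical API (production stores like Feast and Tecton do far more, but this is the core invariant):

```python
# Toy feature store illustrating point-in-time correctness: reads at
# timestamp ts never return values written after ts, which is what
# prevents future information leaking into training data.
import bisect

class FeatureStore:
    def __init__(self):
        # (entity, feature) -> time-sorted list of (ts, value)
        self._log = {}

    def write(self, entity, feature, ts, value):
        log = self._log.setdefault((entity, feature), [])
        log.append((ts, value))
        log.sort()

    def read_as_of(self, entity, feature, ts):
        """Latest value written at or before ts; None if nothing existed yet."""
        log = self._log.get((entity, feature), [])
        i = bisect.bisect_right(log, (ts, float("inf")))
        return log[i - 1][1] if i else None
```

Serving uses the same `read_as_of` with `ts = now`, which is how training/serving consistency is achieved.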
Stage 4: Model training
Train the model on historical data. Tune hyperparameters. Run cross-validation. Compare model architectures. This is what most people think of as "doing ML." It is typically 10-15% of the total project time. Tools: PyTorch, TensorFlow, XGBoost, SageMaker Training, Vertex AI Training.
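The shape of this stage is a time-based split plus a model-selection loop. A dependency-free sketch where the "model" is a trivial score threshold standing in for an XGBoost or PyTorch candidate:

```python
# Stage 4 in miniature: split by time (never randomly, to avoid leakage),
# pick the best candidate on training data, report holdout performance.
# The threshold "model" is a stand-in for real architectures.

def time_split(rows, cutoff):
    train = [r for r in rows if r["ts"] < cutoff]
    holdout = [r for r in rows if r["ts"] >= cutoff]
    return train, holdout

def accuracy(threshold, rows):
    hits = sum((r["score"] >= threshold) == r["label"] for r in rows)
    return hits / len(rows)

def select_model(rows, cutoff, candidates):
    train, holdout = time_split(rows, cutoff)
    best = max(candidates, key=lambda t: accuracy(t, train))  # "training"
    return best, accuracy(best, holdout)                      # holdout check
```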
Stage 5: Model validation and registry
Before deployment, the model goes through validation: checking for bias, verifying performance on holdout data, running A/B tests against the current production model, getting sign-off from the model risk team. Approved models are versioned in a model registry (MLflow, Weights & Biases). In regulated industries, this stage can take weeks.
Stage 6: Serving infrastructure
The model needs to run somewhere that can handle production traffic. Batch predictions might run on Spark. Real-time predictions need a low-latency serving endpoint (SageMaker Endpoints, Vertex AI Prediction, Seldon, BentoML). The serving layer needs to fetch fresh features from the feature store, run inference, and return predictions within an SLA (often under 100ms).
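Stripped of the infrastructure, the serving path is three steps with a deadline. A sketch with stand-in feature-fetch and predict callables; the 100 ms SLA matches the figure above:

```python
# Serving-layer sketch: fetch fresh features, run inference, enforce the
# latency SLA. The feature source and model are stand-in callables.
import time

SLA_SECONDS = 0.100  # 100 ms, per the SLA mentioned in the text

def serve(user_id, fetch_features, predict):
    start = time.perf_counter()
    features = fetch_features(user_id)   # feature store lookup
    score = predict(features)            # model inference
    elapsed = time.perf_counter() - start
    if elapsed > SLA_SECONDS:
        raise TimeoutError(f"SLA breach: {elapsed * 1000:.1f} ms")
    return {"user_id": user_id, "score": score, "latency_s": elapsed}
```

In practice the feature fetch dominates the budget, which is why the serving layer and feature store are so tightly coupled.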
Stage 7: Monitoring
Models degrade over time as the data distribution shifts. A monitoring system tracks prediction quality, detects data drift and concept drift, and alerts the team when the model needs retraining. Tools: Evidently, WhyLabs, Arize, custom dashboards.
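A common drift metric is the population stability index (PSI), comparing a feature's training distribution to its live distribution. A sketch, with the usual rule-of-thumb thresholds (~0.1 warn, ~0.25 alert) noted as conventions rather than standards:

```python
# Data-drift check via population stability index (PSI) over fixed,
# half-open bins [lo, hi). Higher PSI means the live distribution has
# moved further from the training distribution.
import math

def psi(expected, actual, bins):
    def fractions(values):
        counts = [0] * (len(bins) - 1)
        for v in values:
            for i in range(len(bins) - 1):
                if bins[i] <= v < bins[i + 1]:
                    counts[i] += 1
                    break
        total = max(sum(counts), 1)
        return [max(c / total, 1e-6) for c in counts]  # avoid log(0)

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

Tools like Evidently and WhyLabs compute this (and richer statistics) per feature, per time window.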
Stage 8: Retraining
When model performance degrades, the entire pipeline runs again: ingest fresh data, recompute features, retrain the model, validate, deploy. Automated retraining pipelines are the holy grail of MLOps and the hardest thing to get right, because any change in the data schema or feature logic can break the pipeline silently.
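The trigger for that full re-run is usually a small policy function buried in an orchestrator. A sketch with illustrative thresholds:

```python
# Retraining-trigger sketch: the policy most teams end up encoding in an
# Airflow DAG or cron job. Threshold values are illustrative.

def should_retrain(holdout_accuracy, drift_score,
                   min_accuracy=0.80, max_drift=0.25):
    """Trigger the full pipeline when quality drops or the data shifts."""
    if holdout_accuracy < min_accuracy:
        return True, "accuracy below floor"
    if drift_score > max_drift:
        return True, "data drift above threshold"
    return False, "model healthy"
```

The function is trivial; what it triggers is not. Each `True` kicks off the entire 6-to-8-stage pipeline again.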
Here is what the pipeline looks like in practice. A fintech company tracks every stage of their churn prediction model.
pipeline_runs
| run_id | model | stage | started | duration | status |
|---|---|---|---|---|---|
| RUN-101 | churn_v3 | Data Ingestion | 2025-09-01 | 2h 14m | Success |
| RUN-101 | churn_v3 | Feature Engineering | 2025-09-01 | 18h 42m | Success |
| RUN-101 | churn_v3 | Feature Store Sync | 2025-09-02 | 3h 08m | Success |
| RUN-101 | churn_v3 | Model Training | 2025-09-02 | 1h 23m | Success |
| RUN-101 | churn_v3 | Validation | 2025-09-03 | 4d (waiting) | Pending |
| RUN-101 | churn_v3 | Deployment | --- | --- | Blocked |
Two things stand out: feature engineering took 18+ hours, the longest compute stage, and validation has been pending for 4 days waiting on model risk review, which blocks deployment.
pipeline_costs
| stage | compute_cost | labor_hours | labor_cost | total |
|---|---|---|---|---|
| Data Ingestion | $42 | 4h | $600 | $642 |
| Feature Engineering | $187 | 40h | $6,000 | $6,187 |
| Feature Store | $95 | 16h | $2,400 | $2,495 |
| Training | $124 | 8h | $1,200 | $1,324 |
| Validation | $0 | 24h | $3,600 | $3,600 |
| Deployment | $68 | 12h | $1,800 | $1,868 |
| **Total** | **$516** | **104h** | **$15,600** | **$16,116** |
Feature engineering dominates both compute and labor cost. The total pipeline cost for one model version: $16,116. For a team running 4-5 models, this repeats quarterly.
pipeline_failures (last 6 months)
| date | model | stage | failure_reason | time_lost |
|---|---|---|---|---|
| 2025-07-14 | churn_v2 | Feature Eng. | Schema change in payments table | 3 days |
| 2025-08-02 | fraud_v4 | Feature Store | Feast version incompatibility | 2 days |
| 2025-08-19 | ltv_v1 | Training | GPU OOM on new feature set | 1 day |
| 2025-09-05 | churn_v3 | Deployment | Serving latency exceeded SLA | 4 days |
An upstream schema change broke the feature pipeline for 3 days, and a serving latency issue blocked deployment for 4 days. Cascading failures like these are the norm.
Why the pipeline kills projects
The pipeline does not fail in one place. It fails everywhere, slowly.
The dependency chain
Each stage depends on the output of the previous stage. Here is a concrete trace of one schema change cascading through the pipeline.
cascade_impact: payments table renamed 'amount' to 'payment_amount'
| stage | impact | fix_required | time_to_fix |
|---|---|---|---|
| Feature Engineering | SQL query referencing payments.amount fails | Update 14 SQL queries | 4 hours |
| Feature Store | Feature 'avg_payment_amount_30d' stops computing | Update Feast definition + backfill | 6 hours |
| Model Training | Training job fails: expected column missing | Retrigger after feature store fix | 2 hours |
| Serving | Live predictions return stale features from cache | Flush cache + redeploy | 3 hours |
| Monitoring | Data drift alert fires on missing feature | Update monitoring config | 1 hour |
One column rename in one upstream table cascaded through 5 pipeline stages, requiring 16 hours of combined engineering time across 3 different teams. This is a routine schema change.
If the feature logic changes, the feature store needs updating. If the feature store schema changes, the serving layer breaks. As the trace above shows, a single upstream change can cascade through five stages of pipeline code. Teams that have been through this call it "pipeline debt."
The iteration penalty
When the model does not perform well enough, the data scientist goes back to Stage 2 and engineers more features. Each iteration takes days. After 3 to 5 iterations, the project is months old and has consumed significant engineering time. Many projects are cancelled at this point, not because the approach was wrong, but because the timeline exceeded the business window.
The last mile problem
A model that works in a notebook is not a model that works in production. The gap between "it works on my laptop" and "it serves predictions reliably at scale" is where most projects die. The model needs containerization, API wrapping, load balancing, failover handling, latency optimization, and integration with the feature store. This is pure infrastructure engineering with zero data science value-add.
Traditional MLOps
- 6-8 pipeline stages, each requiring different tools
- 5 teams with 5 different backlogs and priorities
- Feature engineering: 12.3 hours per task, repeated per iteration
- Feature store setup: 2-6 months of platform engineering
- Time to production: 6-12 months per model
- Retraining requires re-running the entire pipeline
Foundation model approach
- One inference call replaces 6 pipeline stages
- One interface (PQL) for any prediction task
- No feature engineering: model reads raw tables directly
- No feature store: no features to store
- Time to production: minutes (write a PQL query)
- No retraining: foundation model is pre-trained and updated centrally
The tools people buy (and the problem they do not solve)
The MLOps tool landscape is enormous. MLflow, Kubeflow, SageMaker, Vertex AI, Databricks ML, Tecton, Feast, Weights & Biases, Seldon, BentoML, Evidently, Arize. Each tool solves one stage of the pipeline. None of them eliminate the pipeline.
This is the core issue. The MLOps ecosystem optimizes a fundamentally over-engineered process instead of questioning whether the process should exist. Every dollar spent on a feature store is a dollar spent managing the output of feature engineering. If you eliminate feature engineering, the feature store is unnecessary. Every hour spent building retraining pipelines is an hour spent re-running a process that a foundation model makes irrelevant.
The MLOps industry is worth an estimated $2.4 billion in 2024, projected to reach $13.3 billion by 2030. Most of that spend is managing complexity that foundation models remove.
How foundation models collapse the stack
A relational foundation model like KumoRFM eliminates most of the MLOps pipeline by removing the stages that create the complexity.
Feature engineering: eliminated. The model reads raw relational tables directly. It represents the database as a temporal graph and learns predictive patterns from the graph structure. No SQL joins. No aggregations. No feature iteration cycles.
Feature store: eliminated. With no engineered features, there is nothing to store, version, or serve separately. The model computes everything it needs from raw data at inference time.
Training pipeline: eliminated (for most use cases). The model is pre-trained on billions of relational patterns across thousands of databases. For a new prediction task, you do not train a new model. You write a PQL query and run zero-shot inference. If you need higher accuracy on a specific task, fine-tuning takes minutes, not months.
Model registry: simplified. Instead of managing dozens of task-specific models, each with its own version history and dependencies, you have one foundation model. Version management happens at the platform level, not the project level.
Serving infrastructure: abstracted. The foundation model platform handles serving, scaling, and latency optimization. Your team writes PQL queries. The platform runs them.
PQL Query
PREDICT COUNT(sessions.*, 0, 30) = 0 FOR EACH users.user_id
This single line replaces the entire 6-stage pipeline shown above. No data ingestion pipeline, no feature engineering, no feature store, no training job, no model registry. The foundation model reads the raw tables and returns predictions.
Output
| user_id | churn_probability | time_to_predict |
|---|---|---|
| U-1001 | 0.73 | 0.8s |
| U-1002 | 0.12 | 0.8s |
| U-1003 | 0.91 | 0.8s |
| U-1004 | 0.44 | 0.8s |
PQL: the interface that replaces the pipeline
Predictive Query Language (PQL) is to ML what SQL was to data retrieval. SQL meant you no longer had to write custom code to read data from disk. PQL means you no longer have to build custom pipelines to generate predictions from data.
A PQL query looks like this: "For each customer, what is the probability of churn in the next 30 days?" The foundation model translates this into a graph traversal over the relational data, runs inference, and returns predictions with explanations. One line. One second. No pipeline.
The shift is from imperative (build a pipeline that produces predictions) to declarative (ask a question and get predictions). The same shift that SQL brought to data access in the 1970s.
What stays
Foundation models do not eliminate all operational concerns. Data quality still matters. Garbage in, garbage out, regardless of how sophisticated the model is. Access controls, data governance, and compliance requirements remain. Monitoring prediction quality over time is still necessary, though the mechanism changes: instead of monitoring model drift, you monitor data quality and prediction calibration.
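Calibration monitoring is straightforward to state: bucket predicted probabilities and compare each bucket's mean prediction to the observed outcome rate. A sketch (bucket count and the interpretation of "small gap" are judgment calls, not fixed standards):

```python
# Calibration check: a well-calibrated model's predicted probabilities
# should match observed outcome rates within each probability bucket.

def calibration_gaps(preds, outcomes, n_buckets=5):
    """Per-bucket |mean predicted probability - observed positive rate|."""
    buckets = [[] for _ in range(n_buckets)]
    for p, y in zip(preds, outcomes):
        i = min(int(p * n_buckets), n_buckets - 1)
        buckets[i].append((p, y))
    gaps = {}
    for i, b in enumerate(buckets):
        if b:
            mean_p = sum(p for p, _ in b) / len(b)
            rate = sum(y for _, y in b) / len(b)
            gaps[i] = abs(mean_p - rate)
    return gaps
```

Growing gaps over time are the foundation-model analogue of the drift alert in the traditional pipeline.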
The difference is the scale of the operational burden. Instead of managing a 6-to-8-stage pipeline with 5 teams, you manage a data connection and a query interface. The operational complexity drops by an order of magnitude.
If your organization has spent the last three years building an MLOps platform, the investment was not wasted. The data engineering foundations (clean pipelines, governed data, reliable infrastructure) transfer directly. What changes is everything downstream of the raw data: the feature engineering, the feature store, the training pipeline, the model registry, the serving layer. All of that collapses into a single foundation model query.
The 87% of models that never reach production are not failing because data scientists cannot build good models. They are failing because the pipeline between "good model" and "production model" is too long, too fragile, and too expensive. Remove the pipeline, and the 87% starts to look very different.