Build vs Buy for Enterprise ML: The Real TCO Comparison

Building custom ML pipelines costs $500K-2M per use case. Most enterprises underestimate that number by 3-5x. Here is the full cost breakdown and the math that changes the decision.

TL;DR

  • A single production ML use case costs $500K-2M in year one and $1.6M-4.9M over three years when you include team, infrastructure, feature stores, and maintenance.
  • Maintenance costs 30-50% of initial build cost annually. Most enterprises cap at 3-5 production models not because they lack ideas, but because they cannot afford more.
  • 85% of ML projects fail to reach production (Gartner). The failure is economic: total cost exceeds value delivered, especially when maintenance compounds year over year.
  • Foundation models change the economics by serving multiple use cases from a single platform. The 10th use case costs approximately zero incremental infrastructure.
  • Build custom for your 1-2 most differentiated ML capabilities. Use a foundation model for the remaining 80% of enterprise ML on structured relational data.

Every enterprise ML leader eventually faces the same question: should we build our own ML pipelines or buy a platform that handles predictions for us? The standard answer is "it depends." The honest answer is that most companies dramatically undercount what "build" actually costs.

Gartner estimates that 85% of ML projects fail to reach production. That number has not changed meaningfully since 2019. The failure is rarely technical. The models work in notebooks. The failure is economic: the total cost of getting a model from prototype to production, and keeping it there, exceeds the value it delivers.

This article walks through the real numbers. Not vendor marketing math. The actual line items that show up when you build a custom ML pipeline from raw data to production predictions.

The anatomy of a custom ML pipeline

A production ML system is not a model. It is a pipeline with at least seven distinct components, each requiring different skills and infrastructure.

Data ingestion and transformation. Connecting to source systems (data warehouses, transactional databases, streaming sources), handling schema changes, ensuring data freshness. This requires data engineers and typically 4-8 weeks of work.

Feature engineering. Transforming raw relational data into flat feature tables. The Stanford RelBench study measured this at 12.3 hours and 878 lines of code per prediction task for experienced data scientists. For a production system, this phase takes 6-12 weeks and produces 100-500 features.

Feature storage and serving. Feature stores like Tecton or Feast cost $100K-300K/year in licensing and infrastructure. You need them for consistency between training and serving, and for low-latency feature retrieval in production.

Model training and validation. The part most people think of as "ML." Experiment tracking, hyperparameter tuning, cross-validation, fairness auditing. This is typically 2-4 weeks and requires senior data scientists.

Model serving infrastructure. Deploying models behind APIs, managing compute scaling, ensuring sub-100ms latency for real-time use cases. Requires ML engineers and DevOps, plus GPU or high-CPU infrastructure.

Monitoring and observability. Tracking prediction quality, detecting data drift, alerting on feature pipeline failures. Tools like Evidently, Arize, or WhyLabs cost $50K-150K/year.

Retraining and maintenance. Models degrade. Features break when upstream schemas change. New data sources need integration. This is the cost that never stops.

custom_ml_pipeline — TCO breakdown per use case

| Cost Category | Year 1 | Year 2 | Year 3 | 3-Year Total |
| --- | --- | --- | --- | --- |
| Data Science Team (2 DS + 1 MLE + 1 DE) | $300K-600K | $150K-300K | $150K-300K | $600K-1.2M |
| Cloud Compute (training + serving) | $20K-100K | $20K-100K | $20K-100K | $60K-300K |
| Feature Store (Tecton/Feast) | $100K-300K | $100K-300K | $100K-300K | $300K-900K |
| Monitoring & MLOps Tools | $50K-150K | $50K-150K | $50K-150K | $150K-450K |
| Ongoing Maintenance (30-50%) | — | $250K-600K | $250K-600K | $500K-1.2M |
| Total per Use Case | $500K-2M | $570K-1.45M | $570K-1.45M | $1.6M-4.9M |

Most enterprises underestimate by 3-5x because they only count the modeling phase. Maintenance alone exceeds the initial build cost by year 3.
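The table's arithmetic can be checked with a small Python helper (the function and its argument names are illustrative; the dollar figures are the line items above):

```python
def three_year_tco(year1_build, team_ongoing, infra_annual, maintenance):
    """3-year total for one custom use case, per the table above:
    year-1 build, then two years of reduced team, infrastructure,
    and maintenance. All figures in USD."""
    return year1_build + 2 * (team_ongoing + infra_annual + maintenance)

# Low end of each line item: $150K team, $170K infrastructure
# (compute + feature store + monitoring), $250K maintenance
low = three_year_tco(500_000, 150_000, 170_000, 250_000)     # 1,640,000
# High end: $300K team, $550K infrastructure, $600K maintenance
high = three_year_tco(2_000_000, 300_000, 550_000, 600_000)  # 4,900,000
```

The low end lands at $1.64M and the high end at exactly $4.9M, matching the table's 3-year totals.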

scaling_costs — total cost by number of use cases

| Use Cases | Custom Build (3-Year) | Foundation Model (3-Year) | Savings |
| --- | --- | --- | --- |
| 1 | $1.6M-4.9M | $200K-500K | 69-90% |
| 5 | $8M-24.5M | $400K-1M | 88-96% |
| 10 | $16M-49M | $600K-1.5M | 94-97% |
| 20 | $32M-98M | $800K-2M | 97-98% |

Custom pipelines scale linearly. Foundation models scale sub-linearly because the same model serves all use cases.
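That scaling difference is the whole argument in a few lines of Python (costs are copied from the table above; the variable names are my own):

```python
# 3-year cost per custom use case and shared-platform cost at each
# scale, both from the table above (USD, low/high end of each range)
CUSTOM_LOW, CUSTOM_HIGH = 1_600_000, 4_900_000
PLATFORM = {1: (200_000, 500_000), 5: (400_000, 1_000_000),
            10: (600_000, 1_500_000), 20: (800_000, 2_000_000)}

for n, (plat_low, plat_high) in PLATFORM.items():
    # Bespoke pipelines share nothing, so custom cost is strictly linear in n
    print(f"{n:>2} use cases: custom ${n * CUSTOM_LOW / 1e6:.1f}M-"
          f"${n * CUSTOM_HIGH / 1e6:.1f}M vs platform "
          f"${plat_low / 1e6:.1f}M-${plat_high / 1e6:.1f}M")
```

The custom column is just n times the single-use-case cost; the platform column grows far slower than n.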

The real numbers

Here is what a single production ML use case actually costs at a mid-to-large enterprise, broken down by category.

People costs

A typical ML team for one use case includes 2 data scientists ($180K each fully loaded), 1-2 ML engineers ($200K each), 1 data engineer ($170K), and partial allocation of a project manager and ML platform engineer. Total team cost for a 6-month build: $300K-600K.

That assumes the team already exists and is not being recruited. If you are hiring, add 3-6 months of recruiting time and $30K-50K per hire in recruiting costs. Senior ML engineers in the Bay Area command $350K-500K in total compensation.

Infrastructure and tooling

Cloud compute for training: $20K-100K per use case depending on data volume and model complexity. Feature store: $100K-300K/year. Monitoring tools: $50K-150K/year. Experiment tracking and MLOps platform: $50K-200K/year. Total infrastructure: $220K-750K/year.

Maintenance

Industry benchmarks consistently show that maintenance costs 30-50% of the initial build cost annually. For a $1M pipeline, that is $300K-500K per year in perpetuity. This covers model retraining, feature pipeline updates, infrastructure upgrades, and on-call engineering time.

Where the money actually goes

When you break down the time allocation, the distribution is striking.

time_allocation — typical 6-month ML project

| Activity | Weeks | % of Total | Typical Output |
| --- | --- | --- | --- |
| Data extraction & cleaning | 4-6 | 15-20% | SQL pipelines, schema mapping |
| Feature engineering | 8-14 | 40-55% | 100-500 aggregate features |
| Model training & tuning | 2-3 | 8-12% | Trained XGBoost/LightGBM model |
| Validation & testing | 1-2 | 5-8% | Fairness audit, holdout evaluation |
| Deployment & serving | 2-3 | 8-12% | Docker container, API endpoint |
| Documentation & handoff | 1-2 | 5-8% | Model cards, runbooks |

Feature engineering consumes 40-55% of project time. Combined with data extraction, the data plumbing stages take 60-75% of the total project.

This means that the "ML" part of an ML project is roughly 15% of the work. The rest is data plumbing. You are paying $400K/year ML engineers to write SQL joins and debug Airflow DAGs.

Here is what a typical feature engineering session produces for a lead scoring model. The raw data lives across three CRM tables:

leads

| lead_id | source | company_size | industry | created_date |
| --- | --- | --- | --- | --- |
| L-201 | Webinar | 500-1K | SaaS | 2025-01-15 |
| L-202 | Organic | 50-100 | Retail | 2025-02-03 |
| L-203 | Paid Ad | 1K-5K | Finance | 2025-02-10 |

activities

| activity_id | lead_id | type | timestamp | duration_sec |
| --- | --- | --- | --- | --- |
| A-301 | L-201 | Page view (pricing) | 2025-01-16 09:12 | 180 |
| A-302 | L-201 | Demo request | 2025-01-16 09:15 | — |
| A-303 | L-202 | Page view (blog) | 2025-02-04 14:30 | 22 |
| A-304 | L-203 | Page view (pricing) | 2025-02-11 10:00 | 45 |
| A-305 | L-203 | Page view (pricing) | 2025-02-12 10:05 | 120 |
| A-306 | L-203 | Page view (case study) | 2025-02-12 10:20 | 210 |

L-201 went from pricing page to demo request in 3 minutes (high intent). L-203 returned to pricing twice, then read a case study (researching). L-202 bounced from a blog post in 22 seconds.

flat_feature_table (after 12+ hours of engineering)

| lead_id | activity_count_7d | source | company_size | converted |
| --- | --- | --- | --- | --- |
| L-201 | 2 | Webinar | 500-1K | ? |
| L-202 | 1 | Organic | 50-100 | ? |
| L-203 | 3 | Paid Ad | 1K-5K | ? |

The flat table compresses all activity into a count. L-201's pricing-to-demo sequence (3 minutes, extremely high intent) is indistinguishable from L-203's research-phase browsing. The behavioral sequence that best predicts conversion is gone.
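The information loss happens in a single aggregation step. Here is a stdlib Python sketch using the toy rows above (timestamps and durations omitted for brevity):

```python
from collections import Counter

# Toy version of the activities table above: (lead_id, activity type),
# listed in time order
activities = [
    ("L-201", "Page view (pricing)"),
    ("L-201", "Demo request"),
    ("L-202", "Page view (blog)"),
    ("L-203", "Page view (pricing)"),
    ("L-203", "Page view (pricing)"),
    ("L-203", "Page view (case study)"),
]

# The flattening step: collapse each lead's history into one count.
# Ordering, timing, and page identity all disappear at this line.
activity_count_7d = Counter(lead_id for lead_id, _ in activities)
print(dict(activity_count_7d))  # {'L-201': 2, 'L-202': 1, 'L-203': 3}
```

After the `Counter` call, a demo request and a blog bounce are just increments of the same integer.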

It also means that the single highest-leverage improvement you can make is not a better model architecture or a faster GPU. It is eliminating the feature engineering and pipeline orchestration that dominate the cost structure.

The hidden multiplier: use case count

One use case is expensive. But the economics get worse as you scale. Traditional ML pipelines are bespoke. Each use case requires its own feature engineering, its own model, its own serving infrastructure, and its own maintenance budget.

A churn prediction model and a fraud detection model on the same database share almost no code. Different features, different targets, different time windows, different serving requirements. You are building from scratch each time.

At 10 use cases, you are looking at $5M-20M in year-one costs and $3M-10M in annual maintenance. At 20 use cases, the numbers double. Most enterprises cap out at 3-5 production models not because they lack ideas, but because they cannot afford to build more.

Build: custom ML pipeline

  • $500K-2M per use case (year one)
  • 6-18 months to first production prediction
  • 4-8 person team per use case
  • 30-50% annual maintenance overhead
  • Each new use case starts from zero

Buy: foundation model approach

  • Single platform cost covers all use cases
  • Minutes to first prediction, not months
  • No dedicated ML team required per use case
  • Maintenance handled by the platform
  • Same model serves any prediction task

PQL Query

```sql
-- Same model, five use cases, five queries
PREDICT churn_90d FOR EACH customers.customer_id
PREDICT fraud_probability FOR EACH transactions.txn_id
PREDICT ltv_next_12m FOR EACH customers.customer_id
PREDICT conversion FOR EACH leads.lead_id
PREDICT demand_next_7d FOR EACH products.product_id
```

A foundation model serves all five prediction tasks from one platform. Each query is a new use case that would cost $500K-2M to build custom.

Output

| Use Case | Time to Deploy (Custom) | Time to Deploy (KumoRFM) | AUROC |
| --- | --- | --- | --- |
| Churn | 6-12 months | < 1 day | 81.1 |
| Fraud | 8-14 months | < 1 day | 79.3 |
| LTV | 4-8 months | < 1 day | 77.8 |
| Lead Scoring | 3-6 months | < 1 day | 82.4 |
| Demand Forecast | 6-10 months | < 1 day | 78.6 |

What foundation models change

The economics shift fundamentally when a single model can serve multiple prediction tasks on the same data. A relational foundation model like KumoRFM is pre-trained on relational patterns across thousands of databases. It connects directly to your data warehouse, understands the table structure through foreign keys, and generates predictions without feature engineering, model training, or pipeline orchestration.

The same model that predicts churn also predicts fraud, forecasts demand, scores leads, and estimates lifetime value. You are not building 10 separate pipelines. You are querying one model 10 different ways.

On the RelBench benchmark (7 databases, 30 tasks, 103 million rows), KumoRFM zero-shot outperforms custom pipelines built by Stanford data scientists on 11 of 12 classification tasks. The accuracy is 76.71 AUROC vs 62.44 for LightGBM with manually engineered features. With fine-tuning, it reaches 81.14 AUROC.

DoorDash deployed this approach and saw a 1.8% engagement lift across 30 million users. That single lift, across that user base, represents millions in annual revenue from a system that took days, not months, to deploy.

When build still wins

Foundation models are not the right answer for every ML problem. Build still makes sense in specific scenarios.

Unusual data modalities. If your core data is satellite imagery, genomic sequences, or proprietary sensor streams, relational foundation models will not help. These require domain-specific architectures trained on domain-specific data.

Custom optimization objectives. If your business requires a custom loss function that encodes specific operational constraints (multi-objective optimization for logistics routing, for example), you need a custom training setup.

Regulatory requirements for full model transparency. Some regulated industries require line-by-line model explainability. Foundation models offer feature attribution but may not satisfy the most stringent regulatory audits.

For the estimated 80% of enterprise ML that involves predicting outcomes from structured relational data (customer behavior, financial risk, operational forecasting, product recommendations), the TCO math favors foundation models by a wide margin.

The decision framework

When evaluating build vs buy, ask three questions:

1. How many use cases do you need? If you need one highly specialized model and have the team, build may be viable. If you need 5-20 prediction tasks across your relational data, the per-use-case cost of building makes the math prohibitive.

2. What is your time-to-value requirement? If you can wait 12 months for a first prediction, build gives you maximum control. If the business needs answers in weeks, you cannot afford a custom pipeline.

3. Do you have the team? A production ML pipeline requires data engineers, data scientists, ML engineers, and MLOps specialists. If that team exists and has capacity, build is feasible. If you are hiring, add 6-12 months and $200K-500K in recruiting costs before the first line of code.
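As a toy sketch only, the three questions can be encoded as a rough triage function. The thresholds below are illustrative readings of the guidance above, not hard rules:

```python
def build_vs_buy(use_cases, months_to_value, has_ml_team):
    """Toy triage of the three questions above (illustrative thresholds)."""
    # Q1 + Q2 + Q3: few, highly specialized models, patient timeline,
    # existing team -> build is at least viable
    if use_cases <= 2 and months_to_value >= 12 and has_ml_team:
        return "build may be viable"
    # Many prediction tasks, or answers needed in weeks -> platform
    if use_cases >= 5 or months_to_value < 3:
        return "foundation model"
    return "evaluate case by case"

print(build_vs_buy(10, 2, False))  # foundation model
print(build_vs_buy(1, 18, True))   # build may be viable
```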

For most enterprises, the honest answer is: build for your 1-2 most differentiated ML capabilities, and use a foundation model for everything else. The 80/20 rule applies. Let your expensive ML talent work on problems that genuinely require custom solutions, and let the platform handle the rest.

Frequently asked questions

How much does it cost to build a custom ML pipeline?

A single production ML use case typically costs $500K-2M when you account for all expenses: 4-8 months of a data science team ($150K-400K in salaries), infrastructure and tooling ($50K-200K/year), feature store and pipeline orchestration ($100K-300K), and ongoing maintenance at 30-50% of the initial build cost annually. Most enterprises underestimate costs by 3-5x because they only count the modeling phase.

What is the biggest hidden cost in enterprise ML?

Maintenance. After a model reaches production, it requires continuous monitoring, retraining on data drift, feature pipeline updates as schemas change, and on-call engineering support. Industry data shows maintenance consumes 30-50% of the original build cost every year. A model that cost $1M to build costs $300K-500K annually just to keep running.

Can foundation models replace custom ML pipelines?

For prediction tasks on relational data (churn, fraud, conversion, demand forecasting), yes. Relational foundation models like KumoRFM deliver predictions directly from raw database tables without feature engineering, model training, or pipeline orchestration. They match or exceed custom pipeline accuracy on standard benchmarks while reducing time-to-prediction from months to minutes.

When should you still build custom ML?

Custom pipelines still make sense for highly specialized domains with unusual data modalities (satellite imagery, genomics, proprietary sensor data), tasks requiring custom loss functions tied to specific business constraints, and scenarios where the prediction target has no analog in relational data. For the 80% of enterprise ML that involves predicting outcomes from structured relational data, foundation models are more cost-effective.

How do you calculate ML ROI accurately?

Total the full cost: team salaries for data scientists, ML engineers, and data engineers (typically 4-8 people per use case); infrastructure (compute, storage, MLOps tooling); opportunity cost of the 6-18 month build cycle; and annual maintenance at 30-50% of build cost. Compare this against the revenue impact or cost savings the model delivers. Most enterprises find that only 2-3 of their top 10 ML use cases have positive ROI when fully costed.

See it in action

KumoRFM delivers predictions on relational data in seconds. No feature engineering, no ML pipelines. Try it free.