
ML Predictions Without a Data Science Team: 5 Options Ranked by Cost and Accuracy

Hiring 3-5 data scientists costs $600K-$1M/year and takes 6 months to produce a first model. Gartner and IDC estimate that 53-88% of those models never reach production. Here are 5 realistic alternatives, ranked by the team size they actually require.

TL;DR

  • A full in-house data science team costs $600K-$1M/year in salary alone, takes 6 months to ramp, and the majority of models never reach production. For most companies, this is not the right starting point.
  • Five options exist, ranked by team size: Cloud AutoML (0 people), no-code ML (0 people), AutoML platforms (1-2 analysts), relational foundation models (1 analyst), and in-house teams (3-5 data scientists). They differ dramatically in what they automate.
  • Options 1-3 all automate parts of the ML pipeline but still require feature engineering: converting raw relational tables into a single flat table. This is the hardest, most expensive step, taking 80% of data science time.
  • Option 4 (KumoRFM) eliminates feature engineering entirely by reading raw relational tables directly. On the SAP SALT benchmark, it scores 91% accuracy vs 75% for PhD data scientists with XGBoost and 63% for LLM+AutoML.
  • For most enterprises with relational data, a foundation model approach costs $80K-$120K/year vs $600K-$1M/year for an in-house team, while delivering higher accuracy and predictions in seconds instead of months.

The hiring math that kills most ML initiatives

Here is the reality most ML vendor websites skip over. You need predictions (churn, fraud, demand, lead scoring), so you start looking into ML. The first thing you learn: you need data scientists. The second thing you learn: they are expensive and hard to find.

A mid-level data scientist costs $150K-$200K/year in total compensation. You need 3-5 of them to cover feature engineering, model development, deployment, and maintenance. That is $600K-$1M/year before you have produced a single prediction. It takes 6 months to hire and ramp the team. Then Gartner and IDC estimate that 53-88% of the models they build never make it to production.

The bottleneck is the pipeline, not the talent. It is not building models; it is the feature engineering step that comes before modeling: converting your raw relational database (customers, orders, products, interactions) into a single flat table that ML algorithms can read. That step takes 80% of data science time and is the reason most models stall before reaching production.
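To make the flattening step concrete, here is a minimal sketch in pandas. The table and column names (customers, orders, order_count, total_spend) are invented for illustration; the point is that every cross-table signal must be hand-aggregated into a column before a flat-table tool can see it.

```python
# Hypothetical illustration of the flattening step: raw relational
# tables collapsed into one row per customer. Names are invented.
import pandas as pd

customers = pd.DataFrame({
    "customer_id": [1, 2],
    "signup_date": pd.to_datetime(["2025-01-10", "2025-02-01"]),
})
orders = pd.DataFrame({
    "order_id": [10, 11, 12],
    "customer_id": [1, 1, 2],
    "amount": [50.0, 30.0, 200.0],
})

# Each cross-table signal becomes a hand-written aggregation.
order_feats = orders.groupby("customer_id").agg(
    order_count=("order_id", "count"),
    total_spend=("amount", "sum"),
).reset_index()

# The "flat table" that AutoML tools consume: one row per customer.
flat = customers.merge(order_feats, on="customer_id", how="left")
print(flat)
```

Real pipelines repeat this for dozens of features across many tables, which is where the hours and lines of code accumulate.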

So what are the actual alternatives? Here are five options, ranked from smallest to largest team requirement.

Option 1: Cloud AutoML (Vertex AI, SageMaker Autopilot)

Team required: 0 data scientists. Cost: Pay-per-use compute.

Google Vertex AI and Amazon SageMaker Autopilot let you upload a CSV and get a trained model back. No coding required. The platform handles algorithm selection, hyperparameter tuning, and basic feature transformations. These services take a single flat table as input. If your prediction depends on patterns across multiple tables (and for most enterprise use cases, it does), someone still has to write the SQL joins and aggregations to build that flat table.

  • Best for: Simple, single-table classification and regression. Good for prototyping a churn model on a pre-built customer features table.
  • Watch out for: Automates the modeling step but not data preparation. For fraud detection across transactions, accounts, devices, and merchant history, it does not get you far.

Option 2: No-code ML platforms (Obviously AI, Akkio)

Team required: 0 data scientists. Cost: $500-$5,000/month.

No-code platforms go further than Cloud AutoML by wrapping the entire workflow in a point-and-click interface. Upload a spreadsheet, select what you want to predict, and get results. Some even generate dashboards and natural language explanations. The input is still a flat table. If your data lives in a relational database with foreign keys connecting customers to orders to products to support tickets, no-code platforms cannot discover patterns across those relationships.

  • Best for: Quick predictions on clean, single-table data. Solid for a marketing team predicting email open rates from a campaign spreadsheet.
  • Watch out for: They predict well on the data you give them, but they cannot find the signals you did not know to include. For enterprise-scale relational predictions, accuracy drops significantly.

Option 3: AutoML platforms (DataRobot, H2O.ai)

Team required: 1-2 analysts. Cost: $150K-$300K/year platform + analyst salary.

DataRobot and H2O.ai are the heavyweights of AutoML. They automate model selection, hyperparameter tuning, and (in H2O Driverless AI) single-table feature engineering. They produce strong models with solid explainability. But the input is still a flat table. The analyst you hire is not tuning models (the platform does that). The analyst is writing and maintaining the feature engineering pipeline: the SQL joins, the cross-table aggregations, the temporal features that flatten your relational database into a single CSV.

  • Best for: Flat-table predictions with model sophistication, when you have analysts who can prepare data and need more power than Cloud AutoML offers.
  • Watch out for: The Stanford RelBench study measured feature engineering at 12.3 hours and 878 lines of code per prediction task. For 10 prediction tasks, that is 123 hours of feature engineering before any AutoML platform touches the data.

Option 4: Relational foundation models (KumoRFM)

Team required: 1 analyst. Cost: $80K-$120K/year.

This is where the architecture shifts. KumoRFM is a foundation model pre-trained on relational data. Instead of requiring a flat table as input, it reads your raw relational tables directly: customers, orders, products, interactions, support tickets, connected by foreign keys. It discovers predictive patterns across tables, including multi-hop relationships and temporal sequences, without anyone writing a single line of feature engineering code.

  • Best for: Any prediction on relational data without feature engineering. Covers churn, fraud, demand forecasting, lead scoring, and recommendations with a single analyst writing PQL queries.
  • Watch out for: Commercial platform, not open-source. If you need full algorithmic source code access or have a genuinely novel use case outside relational prediction, consider other options.

The analyst writes a PQL (Predictive Query Language) query that looks like this:

PQL Query

PREDICT churn_90d
FOR EACH customers.customer_id
WHERE customers.signup_date > '2025-01-01'

One PQL query replaces the entire feature engineering pipeline. KumoRFM reads raw customers, orders, products, and support_tickets tables, discovers cross-table patterns, and returns predictions. No SQL joins. No aggregation code. No flat table.

Output

| customer_id | churn_probability | key_signal |
| --- | --- | --- |
| C-2201 | 0.89 | Support escalations + declining order frequency across 3 product lines |
| C-2202 | 0.14 | Stable cross-department usage, recent expansion to new product |
| C-2203 | 0.91 | Similar accounts churned after same pattern of reduced API calls |
| C-2204 | 0.06 | High engagement, increasing order volume, active in 4 integrations |

The key difference from Options 1-3: there is no flat table. No one joins tables. No one computes aggregations. The model reads the raw relational structure and finds predictive signals that a flat table physically cannot contain, like multi-hop patterns (customer bought products that other churned customers also bought) and temporal sequences across entities.
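A multi-hop pattern like "customer bought products that other churned customers also bought" can be sketched as a two-hop join. All names and data below are invented for illustration; the point is that this signal is a traversal of the relational structure, not a column you would know to pre-compute in a flat table.

```python
# Hypothetical two-hop signal: which active customers bought products
# that churned customers also bought? Names and data are invented.
import pandas as pd

purchases = pd.DataFrame({
    "customer_id": ["A", "A", "B", "C"],
    "product_id":  ["p1", "p2", "p1", "p3"],
})
churned = {"A"}  # customers known to have churned

# Hop 1: products that churned customers bought.
churned_products = set(
    purchases[purchases.customer_id.isin(churned)].product_id
)

# Hop 2: active customers who bought any of those products.
active = purchases[~purchases.customer_id.isin(churned)]
at_risk = sorted(set(
    active[active.product_id.isin(churned_products)].customer_id
))
print(at_risk)  # customer B shares product p1 with churned customer A
```

A flat table with one row per customer has nowhere to store this pattern unless someone anticipated it and wrote the aggregation by hand.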

Option 5: In-house team + XGBoost

Team required: 3-5 data scientists. Cost: $600K-$1M/year.

This is the traditional approach. You hire data scientists, they build feature pipelines, train XGBoost or LightGBM models, deploy them, and maintain them. You get full control over every step. You can customize everything. You own the IP. A team of 3-5 produces maybe 2-4 production models per year, with each model requiring ongoing maintenance at 30-50% of the original build cost annually.

  • Best for: Maximum control, highly custom use cases, organizations with existing ML infrastructure, or regulatory requirements for full algorithmic transparency.
  • Watch out for: Despite having full control, the team spends 80% of their time on feature engineering, not on the modeling work they were hired to do. Expect 6 months to ramp and $600K-$1M/year before a single prediction ships.

The comparison: all 5 options side by side

ML options comparison by team size

| option | team size | annual cost | time to first model | accuracy on relational data |
| --- | --- | --- | --- | --- |
| 1. Cloud AutoML | 0 | $5K-$50K | Hours | Low (limited to flat-table features) |
| 2. No-code ML | 0 | $6K-$60K | Minutes | Low-Medium (single-table only) |
| 3. AutoML (DataRobot, H2O) | 1-2 analysts | $200K-$400K | 2-6 weeks | Medium (~64-66 AUROC on RelBench) |
| 4. KumoRFM | 1 analyst | $80K-$120K | ~1 second (zero-shot) | High (76.71 AUROC zero-shot on RelBench) |
| 5. In-house team | 3-5 data scientists | $600K-$1M | 3-6 months | Medium-High (75% on SAP SALT with PhD + XGBoost) |

Highlighted: KumoRFM delivers the highest accuracy on relational data at the lowest cost, with the smallest team and fastest time to first prediction. The accuracy gap comes from eliminating feature engineering, not from better model tuning.

The key insight: feature engineering is the real cost driver

Look at the table above and notice something. Options 1, 2, and 3 all automate different parts of the ML pipeline: model selection, hyperparameter tuning, single-table feature generation. But they all share the same input requirement: a single flat table.

In a real enterprise, your data does not live in a single flat table. It lives in a relational database: customers linked to orders linked to products linked to support tickets linked to payments. Someone has to flatten that structure into one row per entity before any of these tools can run. That flattening process is feature engineering, and it is the step that consumes the most time, costs the most money, and loses the most predictive signal.

Benchmark proof: SAP SALT and RelBench

Two independent benchmarks show what happens when you eliminate feature engineering instead of automating it.

SAP SALT enterprise benchmark

| approach | accuracy | feature engineering required |
| --- | --- | --- |
| LLM + AutoML | 63% | Automated but limited to single-table patterns |
| PhD Data Scientist + XGBoost | 75% | Weeks of manual feature engineering by experts |
| KumoRFM (zero-shot) | 91% | None. Reads raw relational tables directly. |

SAP SALT benchmark on enterprise data. KumoRFM outperforms PhD data scientists by 16 percentage points and LLM+AutoML by 28 points. No feature engineering. No training. The model reads raw tables and predicts.

On the RelBench benchmark (7 databases, 30 prediction tasks, 103 million rows), the pattern holds:

RelBench benchmark results

| approach | AUROC | feature engineering time |
| --- | --- | --- |
| LightGBM + manual features | 62.44 | 12.3 hours per task |
| AutoML + manual features | ~64-66 | 10.5 hours per task |
| KumoRFM zero-shot | 76.71 | ~1 second |
| KumoRFM fine-tuned | 81.14 | Minutes |

RelBench results. The 10+ AUROC point gap between AutoML and KumoRFM zero-shot is the value of reading relational data natively vs flattening it into a table first.

The 2-4 point jump from LightGBM to AutoML is the value of better model selection. The 10+ point jump from AutoML to KumoRFM is the value of better features: features that exist in the relational structure but get destroyed when you flatten the data into a single table. No amount of model tuning on a flat table can recover what was lost in the join.

The cost comparison at 10 prediction tasks

Total cost comparison, 10 prediction tasks

| cost dimension | AutoML platform approach | KumoRFM approach | In-house team approach |
| --- | --- | --- | --- |
| Platform/tooling | $150K-$300K | $80K-$120K | $50K-$100K (infra + tools) |
| Feature engineering labor | 123 hours ($30K) | 0 hours ($0) | 123 hours ($30K) |
| Data science team | 1-2 analysts ($100K-$200K) | 1 analyst ($75K-$100K) | 3-5 FTEs ($600K-$1M) |
| Pipeline maintenance (annual) | 260 hours ($65K) | 10 hours ($2.5K) | 520 hours ($130K) |
| Time to first prediction | 2-6 weeks | ~1 second | 3-6 months |
| Total annual cost | $345K-$595K | $157K-$222K | $810K-$1.26M |

10 prediction tasks, annual costs. KumoRFM is 2-3x cheaper than AutoML platforms and 5-6x cheaper than an in-house team, while delivering higher accuracy on relational data.

Options 1-3 workflow (flat table required)

  • Someone writes SQL to join 5+ relational tables (2-4 hours)
  • Someone computes cross-table aggregations and temporal features (4-6 hours)
  • Someone iterates on features 3-4 times when the first model underperforms (4-6 hours)
  • Upload the flat table to the AutoML platform
  • Platform selects model, tunes parameters, returns predictions
  • Maintain the feature pipeline every time schemas change

Option 4 workflow (KumoRFM)

  • Connect KumoRFM to your data warehouse (one-time setup)
  • Write a PQL query defining what you want to predict
  • KumoRFM reads raw tables, discovers features, returns predictions in ~1 second
  • No feature engineering, no model selection, no pipeline code
  • Add a new prediction task by writing another PQL query
  • No feature pipeline to maintain

When each option makes sense

Not every company needs a relational foundation model. Here is when each option is the right fit:

  • Cloud AutoML (Option 1) makes sense when you have a single, simple prediction task on data that already exists in one table. Good for prototyping and proof-of-concept work.
  • No-code ML (Option 2) makes sense when your team has zero technical expertise and needs quick predictions from spreadsheet-level data. Marketing teams predicting campaign performance, for example.
  • AutoML platforms (Option 3) make sense when you have 1-2 analysts who can prepare data and you need more model sophistication than Cloud AutoML offers. Good for organizations with an existing data warehouse and some SQL expertise.
  • KumoRFM (Option 4) makes sense when your predictions depend on relational data (most enterprise use cases), you want maximum accuracy without a large team, and you need predictions fast. This covers churn, fraud, demand forecasting, lead scoring, recommendations, and most enterprise prediction tasks.
  • In-house team (Option 5) makes sense when you need highly custom model architectures, have regulatory requirements for full algorithmic transparency, or your use case is genuinely novel and no existing platform handles it. Think proprietary trading signals or custom NLP on domain-specific data.

Frequently asked questions

We want ML predictions but can't afford a full data science team. What are our options?

You have five realistic options, ranked by team size required. (1) Cloud AutoML services like Vertex AI and SageMaker Autopilot need no team but only handle simple, single-table tasks. (2) No-code ML platforms like Obviously AI and Akkio need no team but lose accuracy on complex relational data. (3) AutoML platforms like DataRobot and H2O.ai need 1-2 analysts but still require someone to flatten your relational database into a single table. (4) Relational foundation models like KumoRFM need just 1 analyst and work directly on raw relational tables with zero feature engineering. (5) A full in-house team with XGBoost gives maximum control but costs $600K-$1M/year and takes 6 months to ramp. For most companies with relational data, option 4 delivers the best accuracy-to-cost ratio because it eliminates feature engineering, which is the hardest and most expensive part of the ML pipeline.

Is it worth building an in-house ML team or should we buy a platform?

It depends on how many prediction tasks you need and how complex your data is. If you have a single, well-defined use case on clean tabular data and your team already has ML expertise, building in-house can make sense. But if you need predictions across multiple relational tables (customers, orders, products, interactions), the math shifts dramatically. A 3-5 person data science team costs $600K-$1M/year in salary alone, takes 6 months to ramp, and Gartner and IDC estimate that 53-88% of their models never reach production. A platform like KumoRFM costs $80K-$120K/year, delivers predictions in seconds, and a single analyst can operate it. For most enterprises with relational data, buying a platform that eliminates feature engineering is 5-8x cheaper than building a team to do it manually.

What is feature engineering and why does it matter for platform selection?

Feature engineering is the process of converting raw database tables into a single flat table that ML models can read. If you have customers, orders, and products in separate tables, someone has to write SQL joins, compute aggregations like average_order_value_last_90_days, and flatten everything into one row per customer. This takes 80% of data science time, averaging 12.3 hours and 878 lines of code per prediction task according to the Stanford RelBench study. Options 1-3 (Cloud AutoML, no-code, and AutoML platforms) all require this step. Option 4 (relational foundation models like KumoRFM) skips it entirely by reading raw relational tables directly.
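The feature named above, average_order_value_last_90_days, can be sketched in a few lines of pandas. The data and the cutoff date are invented for illustration; a production pipeline would compute dozens of such windowed aggregations per entity, each written and maintained by hand.

```python
# Hedged sketch of one hand-built feature: average order value over
# the 90 days before a cutoff date. Data and names are illustrative.
import pandas as pd

cutoff = pd.Timestamp("2025-06-01")
orders = pd.DataFrame({
    "customer_id": [1, 1, 2],
    "order_date": pd.to_datetime(["2025-05-20", "2025-01-15", "2025-05-25"]),
    "amount": [40.0, 100.0, 80.0],
})

# Keep only orders inside the 90-day window, then average per customer.
recent = orders[orders.order_date >= cutoff - pd.Timedelta(days=90)]
feature = recent.groupby("customer_id").amount.mean().rename(
    "average_order_value_last_90_days"
)
print(feature)
```

Note that the January order for customer 1 falls outside the window and is correctly excluded, which is exactly the kind of temporal bookkeeping that makes these pipelines error-prone.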

How accurate are no-code ML platforms compared to a data science team?

On simple, single-table data (a clean CSV with all features pre-computed), no-code platforms can get within 5-10% of what a skilled data scientist produces. On relational data that spans multiple tables, the gap widens significantly because no-code platforms cannot discover cross-table patterns. On the RelBench benchmark, AutoML approaches with manually engineered features score approximately 62-66 AUROC, while KumoRFM zero-shot achieves 76.71 AUROC. The difference is not model quality. It is about features that exist in the relational structure but never make it into a flat table.

What is PQL and how does it replace feature engineering?

PQL (Predictive Query Language) is Kumo's query language that lets you define prediction tasks in plain English-like syntax. Instead of writing hundreds of lines of SQL joins and feature code, you write a query like: PREDICT churn_90d FOR EACH customers.customer_id. KumoRFM then reads all connected tables (orders, products, support tickets, payments) and discovers predictive patterns automatically. One PQL query replaces the entire feature engineering pipeline. A business analyst who knows SQL can write PQL queries without any ML expertise.

How long does it take to get the first prediction from each option?

Cloud AutoML: hours to days for simple tasks. No-code platforms: minutes to hours for single-table data. AutoML platforms (DataRobot, H2O): 2-6 weeks including feature engineering time. KumoRFM: roughly 1 second for zero-shot predictions after connecting your data warehouse. In-house team: 3-6 months for the first production model. The speed difference is driven almost entirely by whether the approach requires feature engineering. If it does, someone has to build and iterate on features before any model runs.

Can a single analyst really operate KumoRFM without ML expertise?

Yes. KumoRFM is a pre-trained foundation model, so there is no model architecture selection, hyperparameter tuning, or training pipeline to manage. The analyst connects the data warehouse (Snowflake, BigQuery, Databricks), writes PQL queries to define prediction tasks, and reviews the output. The skills required are SQL-level data literacy and business domain knowledge, not deep ML expertise. This is why the team requirement drops from 3-5 data scientists to 1 analyst.

Is there a free way to try a relational foundation model?

Yes. KumoRFM offers a free tier at kumorfm.ai where you can connect your data and run predictions without a paid license. This lets you benchmark accuracy on your actual data before committing to any platform. You can also compare results against your existing approach to see the accuracy difference on your specific use case.

See it in action

KumoRFM delivers predictions on relational data in seconds. No feature engineering, no ML pipelines. Try it free.