
Build ML Models on Snowflake Without Moving Data: All 7 Options Compared

You have more options than you think for running ML predictions directly on Snowflake. But most of them still require you to flatten your relational tables into a single table before they work. Here is what each option actually requires, what it delivers, and where it falls short.

TL;DR

  • Snowflake users have 7 distinct options for adding ML predictions without moving data out: Cortex ML Functions, Snowpark ML, Notebooks + Container Runtime, DataRobot Native App, H2O.ai on Snowflake, KumoRFM Native App, and custom Snowpark Container Services deployments.
  • Six of the seven options require you to prepare a single flat feature table before they can generate predictions. Only KumoRFM reads multiple related Snowflake tables directly, discovering features across table relationships automatically.
  • On the SAP SALT enterprise benchmark, KumoRFM scores 91% accuracy vs 75% for PhD data scientists with XGBoost and 63% for LLM+AutoML. On RelBench, KumoRFM zero-shot achieves 76.71 AUROC vs 62.44 for manual flat-table approaches.
  • KumoRFM runs as a Snowflake Native App inside Snowpark Container Services. Data never leaves your Snowflake account. Sridhar Ramaswamy, CEO of Snowflake, serves as a Kumo advisor.
  • For most Snowflake teams, the real question is not 'which ML tool' but 'who builds and maintains the flat feature table.' KumoRFM eliminates that question entirely.

If your data lives in Snowflake and you want to add ML predictions, the good news is you have options. The bad news is that most guides only mention one or two of them, and none lay out the full picture with honest tradeoffs.

This guide covers every viable option as of early 2026. For each one, we explain what it does, who it is for, what it requires from your team, and where it falls short. We are opinionated about which option works best for multi-table relational data, and we will tell you why.

All 7 options for ML on Snowflake without moving data

Here is the complete list, ordered from simplest to most flexible:

1. Snowflake Cortex ML Functions

SQL-based, built into Snowflake. Covers forecasting, anomaly detection, and classification. No setup, no Python, no external tools. Operates on single columns or simple tables only and cannot handle multi-table prediction tasks.

  • Best for: Analysts who need quick forecasts without leaving SQL.
  • Watch out for: Cannot join or reason across multiple tables. If your prediction depends on patterns across customers, orders, and products, Cortex ML Functions will miss them entirely.

2. Snowpark ML

Python-based model training and inference inside Snowflake's compute. Full access to scikit-learn, XGBoost, LightGBM, and PyTorch through Snowpark Python. Requires a data science team that can write feature engineering code, train models, and manage the pipeline.

  • Best for: Data science teams that want full control over the modeling process while keeping data inside Snowflake.
  • Watch out for: You still need to join and flatten your tables into a single training dataframe yourself. The modeling is easy; the feature engineering is the bottleneck.
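To make that bottleneck concrete, here is a minimal pandas sketch (with made-up table and column names) of the manual flattening step Snowpark ML leaves to you: aggregate each child table, then join the results onto the parent.

```python
import pandas as pd

# Hypothetical miniature versions of two related Snowflake tables.
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "plan": ["pro", "free", "pro"],
})
orders = pd.DataFrame({
    "customer_id": [1, 1, 2, 1],
    "amount": [100.0, 50.0, 20.0, 30.0],
})

# Manual flatten: aggregate the child table per customer,
# then left-join the features onto the parent table.
order_feats = (
    orders.groupby("customer_id")["amount"]
    .agg(order_count="count", avg_order_value="mean")
    .reset_index()
)
flat = customers.merge(order_feats, on="customer_id", how="left").fillna(
    {"order_count": 0, "avg_order_value": 0.0}
)
print(flat)
```

Every prediction task repeats some variant of this step for each related table, which is why the feature engineering, not the modeling, dominates the effort.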

3. Snowflake Notebooks + Container Runtime

Jupyter-style notebooks running inside Snowflake with GPU support. The most flexible option for experimentation. You can run any Python library, build custom deep learning models, and iterate quickly.

  • Best for: ML research teams prototyping new approaches that need maximum flexibility.
  • Watch out for: Highest engineering requirement. You build everything from scratch, including data prep, feature engineering, model training, and deployment. This is a blank canvas, not a solution.

4. DataRobot Snowflake Native App

AutoML platform that runs inside your Snowflake account. Automates model selection, hyperparameter tuning, and provides a visual interface for non-coders.

  • Best for: Teams that want AutoML without exporting data and prefer a visual, low-code interface.
  • Watch out for: Still requires a pre-joined flat feature table as input. DataRobot automates the modeling step but not the multi-table feature engineering that typically takes 80% of the effort.

5. H2O.ai on Snowflake

Open-source AutoML (H2O-3) and commercial Driverless AI available through Snowflake. Strong algorithmic transparency and community support. Driverless AI adds automatic single-table feature engineering.

  • Best for: Teams that value open-source transparency and already have data science skills.
  • Watch out for: Same flat-table requirement as DataRobot. Multi-table joins and cross-entity aggregations are your responsibility. Needs data science expertise to operate effectively.

6. KumoRFM Snowflake Native App

Relational foundation model that runs as a Snowflake Native App inside Snowpark Container Services. The only option on this list that reads multiple related Snowflake tables directly using foreign key relationships. Zero feature engineering, zero data flattening. You write a PQL (Predictive Query Language) query describing what you want to predict, and KumoRFM discovers features across your full relational structure automatically.

  • Best for: Teams with multi-table relational data who want predictions without building feature pipelines.
  • Watch out for: Commercial platform, not open-source. If you need full access to the algorithmic source code, this is not the option for you.

7. Custom models via Snowpark Container Services

The infrastructure layer that powers several options above. You can deploy any Docker container inside Snowflake, which means any ML framework, any custom model, any inference pipeline.

  • Best for: Teams with specific requirements that no existing platform meets and the engineering capacity to build from scratch.
  • Watch out for: You build and maintain everything. This is infrastructure, not a solution. Expect weeks to months before your first prediction.

Side-by-side comparison: all 7 options

| Option | Data input | Feature engineering | Team required | Time to first prediction | Multi-table support |
| --- | --- | --- | --- | --- | --- |
| Cortex ML Functions | Single column / table | None (SQL functions only) | SQL analyst | Minutes | No |
| Snowpark ML | Single flat dataframe | Manual (Python) | Data science team | Days to weeks | Manual joins only |
| Notebooks + Container Runtime | Any (you build it) | Manual (Python) | ML engineers | Days to weeks | Manual joins only |
| DataRobot Native App | Single flat table | Automatic (single-table only) | ML-literate analyst | Hours (after table prep) | No |
| H2O.ai on Snowflake | Single flat table | Automatic single-table (Driverless AI) | Data science team | Hours (after table prep) | No |
| KumoRFM Native App | Multiple related tables | Automatic (cross-table) | ML engineer or analyst | Minutes | Yes (native) |
| Custom (Snowpark Container Services) | Any (you build it) | Manual (any framework) | ML + infrastructure engineers | Weeks to months | Whatever you build |

Highlighted: KumoRFM is the only option that accepts multiple related tables as input and automates feature discovery across them. All other options require you to prepare a single flat table first.

The real bottleneck: who builds the flat table?

Look at the comparison table again. Six of seven options require a flat feature table as input. The question most Snowflake ML guides skip is: who builds that table?

For a typical enterprise prediction task like churn prediction, fraud detection, or lead scoring, the flat table requires joining 5-10 related tables, computing temporal aggregations (average order value over 90 days, support tickets in the last 30 days, login frequency trends), and encoding cross-entity patterns. The Stanford RelBench study measured this effort: 12.3 hours and 878 lines of code per prediction task, on average.
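As an illustration, one of those temporal aggregations, average order value over the trailing 90 days relative to a prediction cutoff, might look like this in pandas (table names, dates, and the cutoff are invented):

```python
import pandas as pd

orders = pd.DataFrame({
    "customer_id": [1, 1, 1, 2],
    "order_ts": pd.to_datetime(
        ["2025-01-05", "2025-03-20", "2025-06-01", "2025-05-15"]
    ),
    "amount": [40.0, 60.0, 100.0, 25.0],
})

# Prediction-time cutoff: only look backward to avoid label leakage.
cutoff = pd.Timestamp("2025-06-30")
window = orders[
    (orders["order_ts"] > cutoff - pd.Timedelta(days=90))
    & (orders["order_ts"] <= cutoff)
]
avg_90d = (
    window.groupby("customer_id")["amount"].mean().rename("avg_order_value_90d")
)
print(avg_90d)
```

Multiply this by every window length, every aggregation, and every related table, and the 12.3-hour-per-task figure becomes plausible.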

This is not a one-time cost. Feature pipelines break when schemas change. They need updating when business logic evolves. They require monitoring for data drift. At scale, maintaining 10-20 feature pipelines demands 3-4 full-time data scientists.

What flat-table ML misses on Snowflake data

When you flatten relational tables into a single row per entity, you lose structural information that drives prediction accuracy. Here is a concrete example for a churn prediction task on a typical Snowflake data warehouse with customers, orders, products, support tickets, and payments tables:

Signals lost when flattening Snowflake tables

| Signal | Visible in flat table | Visible to KumoRFM |
| --- | --- | --- |
| Total order count | Yes (orders_count = 47) | Yes, plus order sequence, frequency changes, and category shifts |
| Support ticket escalation pattern | No (only ticket_count = 3) | Yes: tickets escalating from billing to technical to cancellation |
| Product return rate correlation | No (requires cross-table join) | Yes: customer buys products with 23% return rate across all buyers |
| Payment method risk signal | No (only last_payment_method) | Yes: switched from annual to monthly billing 60 days ago |
| Similar customer outcomes | No (no cross-entity patterns) | Yes: customers with matching product mix churned at 68% rate |
| Multi-department engagement decline | No (aggregated to single score) | Yes: usage dropping across 3 product lines simultaneously |

A churn prediction example on Snowflake. The flat table captures simple counts and latest values. The relational model captures behavioral sequences, cross-entity patterns, and multi-hop signals that actually predict churn.
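To make one row of that table concrete, here is how the product return rate signal could be derived by hand in pandas. This is a sketch with invented data; the point of a relational model like KumoRFM is that you never write this code, because such cross-table features are discovered automatically.

```python
import pandas as pd

orders = pd.DataFrame({
    "customer_id": [1, 1, 2, 2],
    "product_id": ["A", "B", "A", "C"],
    "returned": [0, 1, 1, 0],
})

# Step 1: per-product return rate, computed across ALL buyers,
# not just the customer being scored.
product_return_rate = orders.groupby("product_id")["returned"].mean()

# Step 2: average that rate over each customer's purchased products.
customer_risk = (
    orders.join(product_return_rate.rename("product_return_rate"), on="product_id")
    .groupby("customer_id")["product_return_rate"]
    .mean()
)
print(customer_risk)
```

Note the two-hop structure: customer to products, products to all other buyers. A flat table keyed on customer_id cannot express it without a dedicated pipeline.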

Benchmark results on relational data

The accuracy gap between flat-table approaches and relational ML is not theoretical. Two independent benchmarks quantify it.

SAP SALT enterprise benchmark

| Approach | Accuracy | Feature engineering required |
| --- | --- | --- |
| LLM + AutoML | 63% | LLM generates features, AutoML selects model |
| PhD Data Scientist + XGBoost | 75% | Weeks of expert hand-crafted features |
| KumoRFM (zero-shot) | 91% | Zero; reads relational tables directly |

SAP SALT benchmark on enterprise data. KumoRFM outperforms expert data scientists by 16 percentage points and LLM+AutoML by 28 percentage points, with no feature engineering and no training.

RelBench benchmark results

| Approach | AUROC | Feature engineering time | Lines of code |
| --- | --- | --- | --- |
| LightGBM + manual features | 62.44 | 12.3 hours per task | 878 |
| AutoML + manual features | ~64-66 | 10.5 hours per task | 878 |
| KumoRFM zero-shot | 76.71 | ~1 second | 0 |
| KumoRFM fine-tuned | 81.14 | Minutes | 0 |

RelBench benchmark (7 databases, 30 tasks, 103M rows). KumoRFM zero-shot scores 76.71 vs 62.44 for manual flat-table approaches. The gap comes from features discovered in relational structure that flat tables never contain.

The 10+ AUROC point gap on RelBench and the 16 percentage point gap on SAP SALT both measure the same thing: the predictive information that lives in relationships between tables and gets destroyed when you flatten data into a single table. Better model selection cannot recover this information. Better single-table feature engineering cannot recover it either. Only reading the relational structure directly can capture it.

How KumoRFM works on Snowflake: a PQL example

KumoRFM uses Predictive Query Language (PQL) to define prediction tasks. Instead of writing SQL joins, feature engineering code, and model training scripts, you describe what you want to predict. Here is a churn prediction query running on Snowflake tables:

PQL Query

PREDICT churn_90d
FOR EACH customers.customer_id
USING
  snowflake.customers,
  snowflake.orders,
  snowflake.products,
  snowflake.support_tickets,
  snowflake.payments

This single PQL query replaces the entire ML pipeline: the SQL joins across 5 Snowflake tables, the feature engineering code, the model training, and the deployment. KumoRFM reads all 5 tables, discovers predictive features across their relationships, and writes churn probabilities back to a Snowflake table. Data never leaves your Snowflake account.

Output

| customer_id | churn_probability | top_risk_factors |
| --- | --- | --- |
| C-8801 | 0.89 | Support escalation + declining order frequency + similar account churn |
| C-8802 | 0.14 | Stable multi-product usage + expanding seat count |
| C-8803 | 0.72 | Payment method downgrade + product return rate increasing |
| C-8804 | 0.06 | Growing order volume + positive support interactions |
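Because predictions land in an ordinary Snowflake table, using them downstream is plain dataframe work. A small sketch (column names mirror the sample output above; the 0.5 threshold is an arbitrary choice) that routes high-risk accounts to a retention list:

```python
import pandas as pd

# Predictions as they would be read back from the output table.
preds = pd.DataFrame({
    "customer_id": ["C-8801", "C-8802", "C-8803", "C-8804"],
    "churn_probability": [0.89, 0.14, 0.72, 0.06],
})

# Route accounts above the risk threshold to the retention team,
# highest risk first.
high_risk = preds[preds["churn_probability"] >= 0.5].sort_values(
    "churn_probability", ascending=False
)
print(high_risk["customer_id"].tolist())
```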

Workflow comparison: flat-table ML vs KumoRFM on Snowflake

Flat-table ML on Snowflake (Cortex/Snowpark/DataRobot/H2O)

  • Write SQL to join 5-10 Snowflake tables into a single flat table (4-8 hours)
  • Compute temporal aggregations and cross-entity features (4-6 hours)
  • Iterate on features 3-4 times as initial model underperforms (4-8 hours)
  • Feed flat table to chosen ML tool (Snowpark ML, DataRobot, or H2O)
  • Train model, tune hyperparameters, evaluate (1-4 hours)
  • Deploy model, schedule retraining, maintain feature pipeline
  • Repeat entire process when schema changes or new tables are added

KumoRFM on Snowflake

  • Install KumoRFM Snowflake Native App (one-time setup)
  • Point KumoRFM at your Snowflake tables and define foreign key relationships
  • Write a PQL query describing what you want to predict
  • KumoRFM reads raw tables, discovers features across relationships, returns predictions
  • Predictions written back to a Snowflake table automatically
  • No feature engineering code, no flat table, no pipeline to maintain
  • New tables? Add them to the schema. KumoRFM discovers new features automatically.

Security and compliance: data never leaves Snowflake

For regulated industries like financial services, healthcare, and insurance, data movement is not just an engineering inconvenience. It creates compliance exposure. Every time data leaves your Snowflake account, you add a new surface area for data governance, access control, and audit trail requirements.

All seven options on this list can run inside Snowflake to varying degrees. But the security posture differs:

  • Cortex ML Functions and Snowpark ML are fully native to Snowflake. Data stays within your account by design.
  • DataRobot, H2O.ai, and KumoRFM Native Apps run inside Snowpark Container Services. Data stays within your Snowflake account, and by default the app's compute containers have no outbound network access.
  • Custom Snowpark Container Services deployments depend on how you configure them. You control the security posture.

KumoRFM takes this a step further: because it reads relational tables directly without requiring data exports to a staging layer or a separate feature store, there is no intermediate data copy outside Snowflake's governance model. Your existing Snowflake RBAC policies, masking rules, and audit logs apply to the ML workflow exactly as they do to your analytics queries.

Who built KumoRFM

KumoRFM was built by the team behind the ML systems at Pinterest, Airbnb, and LinkedIn: Vanja Josifovski (CEO, former CTO at Airbnb and Pinterest), Jure Leskovec (Chief Scientist, Stanford professor, co-creator of GraphSAGE), and Hema Raghavan (Head of Engineering, former Sr. Director at LinkedIn). Backed by Sequoia Capital.

Notably, Sridhar Ramaswamy, CEO of Snowflake, serves as a Kumo advisor. This relationship reflects the strategic alignment between Kumo's relational foundation model approach and Snowflake's vision for in-warehouse AI. KumoRFM was designed from the ground up to run natively on Snowflake, not bolted on as an afterthought.

Choosing the right option for your team

The right choice depends on three factors: your data structure, your team's skills, and how many prediction tasks you need to run.

  • Single-column forecasting or anomaly detection? Start with Snowflake Cortex ML Functions. They are built in, SQL-only, and take minutes to set up. No reason to use anything heavier.
  • Single flat table with a strong data science team? Snowpark ML gives you full control. DataRobot or H2O.ai add AutoML if you want to automate model selection.
  • Multi-table relational data? KumoRFM is the only option that handles this without requiring you to flatten tables first. If your predictions depend on patterns across customers, orders, products, support tickets, and other related tables, KumoRFM eliminates the feature engineering bottleneck that every other option leaves to you.
  • Custom deep learning or LLM workloads? Snowflake Notebooks with Container Runtime or bare Snowpark Container Services give you the flexibility to run anything. Be prepared to build and maintain the full pipeline yourself.
  • Scaling from 1 to 20+ prediction tasks? This is where the approach differences compound. With flat-table ML, each new task means a new feature pipeline. With KumoRFM, each new task means a new PQL query against the same connected data.

Cost at scale: flat-table ML vs KumoRFM (20 prediction tasks, annual)

| Cost dimension | Flat-table ML on Snowflake | KumoRFM on Snowflake | Savings |
| --- | --- | --- | --- |
| Feature engineering labor | 246 hours ($61,500) | 0 hours ($0) | $61,500 |
| ML platform licensing | $150K-$250K (DataRobot/H2O) or $0 (Snowpark ML) | $80K-$120K | Varies |
| Data science team (feature pipelines) | 3-4 FTEs ($450K-$600K) | 0.5 FTE ($75K) | $375K-$525K |
| Pipeline maintenance | 520 hours/year ($130K) | 20 hours/year ($5K) | $125K |
| Total annual cost | $650K-$900K | $80K-$120K | ~85% savings |

At 20 prediction tasks, the flat-table approach costs 6-8x more than KumoRFM, driven almost entirely by the feature engineering and pipeline maintenance that KumoRFM eliminates.
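The feature engineering labor line follows directly from the RelBench measurement cited earlier (12.3 hours per task), priced here at an assumed $250/hour blended data science rate:

```python
hours_per_task = 12.3   # RelBench average feature engineering time per task
tasks = 20
hourly_rate = 250       # assumed blended rate, USD (not from the benchmark)

total_hours = hours_per_task * tasks
labor_cost = total_hours * hourly_rate
print(total_hours, labor_cost)
```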

Frequently asked questions

How do I build ML models on Snowflake without moving data out?

You have seven options, ranging from SQL-only to full custom deployments. Snowflake Cortex ML Functions let you run forecasting, anomaly detection, and classification directly in SQL with no setup. Snowpark ML gives your Python team full control inside Snowflake's compute. Snowflake Notebooks with Container Runtime offer the most flexibility for custom workflows. DataRobot and H2O.ai both offer Snowflake Native Apps for AutoML. KumoRFM runs as a Snowflake Native App that reads multiple relational tables directly without flattening. And Snowpark Container Services lets you deploy any custom model inside Snowflake. In every case, your data stays inside your Snowflake account perimeter. The right choice depends on your team's skills, your data structure, and how much feature engineering you want to do yourself.

Which ML platforms have native Snowflake integration?

As of early 2026, the ML platforms with Snowflake Native App integrations include DataRobot, H2O.ai, and KumoRFM. These run inside your Snowflake account using Snowpark Container Services, so data never leaves your environment. Additionally, Snowflake's own Cortex ML Functions and Snowpark ML are built into the platform. The key difference between these options is what they require from you: Cortex ML Functions are limited to simple forecasting and anomaly detection on single columns. DataRobot and H2O.ai require you to prepare a single flat feature table before they can work. KumoRFM is the only option that reads multiple related Snowflake tables directly and discovers features across them automatically.

What are my options for adding ML predictions to Snowflake?

Seven main options: (1) Snowflake Cortex ML Functions for SQL-based forecasting and anomaly detection, (2) Snowpark ML for Python-based model training inside Snowflake, (3) Snowflake Notebooks with Container Runtime for flexible experimentation, (4) DataRobot Snowflake Native App for AutoML on flat tables, (5) H2O.ai on Snowflake for open-source AutoML, (6) KumoRFM Snowflake Native App for graph-based ML on raw relational tables, and (7) custom models via Snowpark Container Services for full build-your-own flexibility. Options 1-3 are Snowflake-native. Options 4-6 are third-party native apps. Option 7 is infrastructure for deploying anything.

Do I need to flatten my Snowflake tables into a single table before running ML?

For six of the seven options, yes. Snowflake Cortex ML Functions operate on single columns. Snowpark ML, DataRobot, and H2O.ai all require a pre-joined flat feature table as input. Snowflake Notebooks give you flexibility, but you still write the join logic yourself. Custom Snowpark Container Services deployments require whatever your model expects. The sole exception is KumoRFM, which reads multiple related Snowflake tables directly using foreign key relationships. It builds a graph representation of your relational data and discovers predictive features across tables automatically. No joins, no flattening, no feature engineering code.

Is Snowflake Cortex ML enough for enterprise prediction tasks?

Snowflake Cortex ML Functions are useful for straightforward forecasting, anomaly detection, and basic classification on single-column time series data. They work well for simple questions like 'forecast next month's revenue' or 'flag anomalous transactions.' But they cannot handle multi-table prediction tasks like churn prediction (which requires patterns across customer, order, support, and product tables), fraud detection (which requires network analysis across accounts and transactions), or recommendation (which requires user-item interaction graphs). For these tasks, you need one of the other six options.

How does KumoRFM work inside Snowflake specifically?

KumoRFM runs as a Snowflake Native App inside Snowpark Container Services. When you install it, the compute runs within your Snowflake account. You point KumoRFM at your tables and define their relationships using foreign keys. Then you write a PQL (Predictive Query Language) query that describes what you want to predict. KumoRFM reads the raw tables, builds a graph representation of the relational structure, and generates predictions that get written back to a Snowflake table. Your data never leaves your Snowflake account. There is no data export, no external API calls, and no staging to cloud storage.

What accuracy difference should I expect between flat-table ML and relational ML on Snowflake?

On the RelBench benchmark (7 databases, 30 tasks, 103 million rows), flat-table approaches with manual feature engineering score approximately 62.44 AUROC. AutoML with manual features reaches roughly 64-66 AUROC. KumoRFM zero-shot achieves 76.71 AUROC. The 10+ point gap comes from predictive patterns that exist in relationships between tables but get lost when you flatten data into a single table. On the SAP SALT enterprise benchmark, KumoRFM scores 91% accuracy versus 75% for PhD data scientists with XGBoost and 63% for LLM+AutoML approaches. The more relational your data, the larger the accuracy gap.

Can I use multiple Snowflake ML options together?

Yes, and many teams do. A common pattern is using Snowflake Cortex ML Functions for simple time series forecasting (demand, revenue), KumoRFM for multi-table relational predictions (churn, fraud, recommendations), and Snowpark ML for custom models on specialized datasets. Since all options write predictions back to Snowflake tables, you can combine outputs in downstream analytics or applications. The key is matching each tool to the type of problem it handles best rather than forcing one tool to cover everything.

See it in action

KumoRFM delivers predictions on relational data in seconds. No feature engineering, no ML pipelines. Try it free.