
Build ML Models on Snowflake Without Moving Data: All 7 Options Compared

You have more options than you think for running ML predictions directly on Snowflake. But most of them still require you to flatten your relational tables into a single table before they work. Here is what each option actually requires, what it delivers, and where it falls short.

TL;DR

  • Snowflake users have 7 distinct options for adding ML predictions without moving data out: Cortex ML Functions, Snowpark ML, Notebooks + Container Runtime, DataRobot Native App, H2O.ai on Snowflake, KumoRFM Native App, and custom Snowpark Container Services deployments.
  • Six of the seven options require you to prepare a single flat feature table before they can generate predictions. Only KumoRFM reads multiple related Snowflake tables directly, discovering features across table relationships automatically.
  • On the SAP SALT enterprise benchmark, KumoRFM scores 91% accuracy vs 75% for PhD data scientists with XGBoost and 63% for LLM+AutoML. On RelBench, KumoRFM zero-shot achieves 76.71 AUROC vs 62.44 for manual flat-table approaches.
  • KumoRFM runs as a Snowflake Native App inside Snowpark Container Services. Data never leaves your Snowflake account. Sridhar Ramaswamy, CEO of Snowflake, serves as a Kumo advisor.
  • For most Snowflake teams, the real question is not 'which ML tool' but 'who builds and maintains the flat feature table.' KumoRFM eliminates that question entirely.

If your data lives in Snowflake and you want to add ML predictions, the good news is you have options. The bad news is that most guides only mention one or two of them, and none lay out the full picture with honest tradeoffs.

This guide covers every viable option as of early 2026. For each one, we explain what it does, who it is for, what it requires from your team, and where it falls short. We are opinionated about which option works best for multi-table relational data, and we will tell you why.

All 7 options for ML on Snowflake without moving data

Here is the complete list, ordered from simplest to most flexible:

1. Snowflake Cortex ML Functions

SQL-based, built into Snowflake. Covers forecasting, anomaly detection, and classification. No setup, no Python, no external tools. Operates on single columns or simple tables only and cannot handle multi-table prediction tasks.

  • Best for: Analysts who need quick forecasts without leaving SQL.
  • Watch out for: Cannot join or reason across multiple tables. If your prediction depends on patterns across customers, orders, and products, Cortex ML Functions will miss them entirely.

2. Snowpark ML

Python-based model training and inference inside Snowflake's compute. Full access to scikit-learn, XGBoost, LightGBM, and PyTorch through Snowpark Python. Requires a data science team that can write feature engineering code, train models, and manage the pipeline.

  • Best for: Data science teams that want full control over the modeling process while keeping data inside Snowflake.
  • Watch out for: You still need to join and flatten your tables into a single training dataframe yourself. The modeling is easy; the feature engineering is the bottleneck.
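To make that bottleneck concrete, here is a minimal pandas sketch (with made-up table and column names) of the manual flattening step Snowpark ML leaves to you: aggregate each child table, then join the results onto the parent.

```python
import pandas as pd

# Hypothetical miniature versions of two related Snowflake tables.
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "plan": ["pro", "free", "pro"],
})
orders = pd.DataFrame({
    "customer_id": [1, 1, 2, 1],
    "amount": [100.0, 50.0, 20.0, 30.0],
})

# Manual flatten: aggregate the child table per customer,
# then left-join the features onto the parent table.
order_feats = (
    orders.groupby("customer_id")["amount"]
    .agg(order_count="count", avg_order_value="mean")
    .reset_index()
)
flat = customers.merge(order_feats, on="customer_id", how="left").fillna(
    {"order_count": 0, "avg_order_value": 0.0}
)
print(flat)
```

Every prediction task repeats some variant of this step for each related table, which is why the feature engineering, not the modeling, dominates the effort.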

3. Snowflake Notebooks + Container Runtime

Jupyter-style notebooks running inside Snowflake with GPU support. The most flexible option for experimentation. You can run any Python library, build custom deep learning models, and iterate quickly.

  • Best for: ML research teams prototyping new approaches that need maximum flexibility.
  • Watch out for: Highest engineering requirement. You build everything from scratch, including data prep, feature engineering, model training, and deployment. This is a blank canvas, not a solution.

4. DataRobot Snowflake Native App

AutoML platform that runs inside your Snowflake account. Automates model selection, hyperparameter tuning, and provides a visual interface for non-coders.

  • Best for: Teams that want AutoML without exporting data and prefer a visual, low-code interface.
  • Watch out for: Still requires a pre-joined flat feature table as input. DataRobot automates the modeling step but not the multi-table feature engineering that typically takes 80% of the effort.

5. H2O.ai on Snowflake

Open-source AutoML (H2O-3) and commercial Driverless AI available through Snowflake. Strong algorithmic transparency and community support. Driverless AI adds automatic single-table feature engineering.

  • Best for: Teams that value open-source transparency and already have data science skills.
  • Watch out for: Same flat-table requirement as DataRobot. Multi-table joins and cross-entity aggregations are your responsibility. Needs data science expertise to operate effectively.

6. KumoRFM Snowflake Native App

Relational foundation model that runs as a Snowflake Native App inside Snowpark Container Services. The only option on this list that reads multiple related Snowflake tables directly using foreign key relationships. Zero feature engineering, zero data flattening. You write a PQL (Predictive Query Language) query describing what you want to predict, and KumoRFM discovers features across your full relational structure automatically.

  • Best for: Teams with multi-table relational data who want predictions without building feature pipelines.
  • Watch out for: Commercial platform, not open-source. If you need full access to the algorithmic source code, this is not the option for you.

7. Custom models via Snowpark Container Services

The infrastructure layer that powers several options above. You can deploy any Docker container inside Snowflake, which means any ML framework, any custom model, any inference pipeline.

  • Best for: Teams with specific requirements that no existing platform meets and the engineering capacity to build from scratch.
  • Watch out for: You build and maintain everything. This is infrastructure, not a solution. Expect weeks to months before your first prediction.

Side-by-side comparison: all 7 options

| Option | Data input | Feature engineering | Team required | Time to first prediction | Multi-table support |
| --- | --- | --- | --- | --- | --- |
| Cortex ML Functions | Single column / table | None (SQL functions only) | SQL analyst | Minutes | No |
| Snowpark ML | Single flat dataframe | Manual (Python) | Data science team | Days to weeks | Manual joins only |
| Notebooks + Container Runtime | Any (you build it) | Manual (Python) | ML engineers | Days to weeks | Manual joins only |
| DataRobot Native App | Single flat table | Automatic (single-table only) | ML-literate analyst | Hours (after table prep) | No |
| H2O.ai on Snowflake | Single flat table | Automatic single-table (Driverless AI) | Data science team | Hours (after table prep) | No |
| KumoRFM Native App | Multiple related tables | Automatic (cross-table) | ML engineer or analyst | Minutes | Yes (native) |
| Custom (Snowpark Container Services) | Any (you build it) | Manual (any framework) | ML + infrastructure engineers | Weeks to months | Whatever you build |

Highlighted: KumoRFM is the only option that accepts multiple related tables as input and automates feature discovery across them. All other options require you to prepare a single flat table first.

The real bottleneck: who builds the flat table?

Look at the comparison table again. Six of seven options require a flat feature table as input. The question most Snowflake ML guides skip is: who builds that table?

For a typical enterprise prediction task like churn prediction, fraud detection, or lead scoring, the flat table requires joining 5-10 related tables, computing temporal aggregations (average order value over 90 days, support tickets in the last 30 days, login frequency trends), and encoding cross-entity patterns. The Stanford RelBench study measured this effort: 12.3 hours and 878 lines of code per prediction task, on average.
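As an illustration, one of those temporal aggregations, average order value over the trailing 90 days relative to a prediction cutoff, might look like this in pandas (table names, dates, and the cutoff are invented):

```python
import pandas as pd

orders = pd.DataFrame({
    "customer_id": [1, 1, 1, 2],
    "order_ts": pd.to_datetime(
        ["2025-01-05", "2025-03-20", "2025-06-01", "2025-05-15"]
    ),
    "amount": [40.0, 60.0, 100.0, 25.0],
})

# Prediction-time cutoff: only look backward to avoid label leakage.
cutoff = pd.Timestamp("2025-06-30")
window = orders[
    (orders["order_ts"] > cutoff - pd.Timedelta(days=90))
    & (orders["order_ts"] <= cutoff)
]
avg_90d = (
    window.groupby("customer_id")["amount"].mean().rename("avg_order_value_90d")
)
print(avg_90d)
```

Multiply this by every window length, every aggregation, and every related table, and the 12.3-hour-per-task figure becomes plausible.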

This is not a one-time cost. Feature pipelines break when schemas change. They need updating when business logic evolves. They require monitoring for data drift. At scale, maintaining 10-20 feature pipelines demands 3-4 full-time data scientists.

What flat-table ML misses on Snowflake data

When you flatten relational tables into a single row per entity, you lose structural information that drives prediction accuracy. Here is a concrete example for a churn prediction task on a typical Snowflake data warehouse with customers, orders, products, support tickets, and payments tables:

Signals lost when flattening Snowflake tables

| Signal | Visible in flat table | Visible to KumoRFM |
| --- | --- | --- |
| Total order count | Yes (orders_count = 47) | Yes, plus order sequence, frequency changes, and category shifts |
| Support ticket escalation pattern | No (only ticket_count = 3) | Yes: tickets escalating from billing to technical to cancellation |
| Product return rate correlation | No (requires cross-table join) | Yes: customer buys products with 23% return rate across all buyers |
| Payment method risk signal | No (only last_payment_method) | Yes: switched from annual to monthly billing 60 days ago |
| Similar customer outcomes | No (no cross-entity patterns) | Yes: customers with matching product mix churned at 68% rate |
| Multi-department engagement decline | No (aggregated to single score) | Yes: usage dropping across 3 product lines simultaneously |

A churn prediction example on Snowflake. The flat table captures simple counts and latest values. The relational model captures behavioral sequences, cross-entity patterns, and multi-hop signals that actually predict churn.
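To make one row of that table concrete, here is how the product return rate signal could be derived by hand in pandas. This is a sketch with invented data; the point of a relational model like KumoRFM is that you never write this code, because such cross-table features are discovered automatically.

```python
import pandas as pd

orders = pd.DataFrame({
    "customer_id": [1, 1, 2, 2],
    "product_id": ["A", "B", "A", "C"],
    "returned": [0, 1, 1, 0],
})

# Step 1: per-product return rate, computed across ALL buyers,
# not just the customer being scored.
product_return_rate = orders.groupby("product_id")["returned"].mean()

# Step 2: average that rate over each customer's purchased products.
customer_risk = (
    orders.join(product_return_rate.rename("product_return_rate"), on="product_id")
    .groupby("customer_id")["product_return_rate"]
    .mean()
)
print(customer_risk)
```

Note the two-hop structure: customer to products, products to all other buyers. A flat table keyed on customer_id cannot express it without a dedicated pipeline.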

Benchmark results on relational data

The accuracy gap between flat-table approaches and relational ML is not theoretical. Two independent benchmarks quantify it.

SAP SALT enterprise benchmark

| Approach | Accuracy | Feature engineering required |
| --- | --- | --- |
| LLM + AutoML | 63% | LLM generates features, AutoML selects model |
| PhD Data Scientist + XGBoost | 75% | Weeks of expert hand-crafted features |
| KumoRFM (zero-shot) | 91% | Zero; reads relational tables directly |

SAP SALT benchmark on enterprise data. KumoRFM outperforms expert data scientists by 16 percentage points and LLM+AutoML by 28 percentage points, with no feature engineering and no training.

RelBench benchmark results

| Approach | AUROC | Feature engineering time | Lines of code |
| --- | --- | --- | --- |
| LightGBM + manual features | 62.44 | 12.3 hours per task | 878 |
| AutoML + manual features | ~64-66 | 10.5 hours per task | 878 |
| KumoRFM zero-shot | 76.71 | ~1 second | 0 |
| KumoRFM fine-tuned | 81.14 | Minutes | 0 |

RelBench benchmark (7 databases, 30 tasks, 103M rows). KumoRFM zero-shot scores 76.71 vs 62.44 for manual flat-table approaches. The gap comes from features discovered in relational structure that flat tables never contain.

The 10+ AUROC point gap on RelBench and the 16 percentage point gap on SAP SALT both measure the same thing: the predictive information that lives in relationships between tables and gets destroyed when you flatten data into a single table. Better model selection cannot recover this information. Better single-table feature engineering cannot recover it either. Only reading the relational structure directly can capture it.

How KumoRFM works on Snowflake: a PQL example

KumoRFM uses Predictive Query Language (PQL) to define prediction tasks. Instead of writing SQL joins, feature engineering code, and model training scripts, you describe what you want to predict. Here is a churn prediction query running on Snowflake tables:

PQL Query

PREDICT churn_90d
FOR EACH customers.customer_id
USING
  snowflake.customers,
  snowflake.orders,
  snowflake.products,
  snowflake.support_tickets,
  snowflake.payments

This single PQL query replaces the entire ML pipeline: the SQL joins across 5 Snowflake tables, the feature engineering code, the model training, and the deployment. KumoRFM reads all 5 tables, discovers predictive features across their relationships, and writes churn probabilities back to a Snowflake table. Data never leaves your Snowflake account.

Output

| customer_id | churn_probability | top_risk_factors |
| --- | --- | --- |
| C-8801 | 0.89 | Support escalation + declining order frequency + similar account churn |
| C-8802 | 0.14 | Stable multi-product usage + expanding seat count |
| C-8803 | 0.72 | Payment method downgrade + product return rate increasing |
| C-8804 | 0.06 | Growing order volume + positive support interactions |
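Because predictions land in an ordinary Snowflake table, using them downstream is plain dataframe work. A small sketch (column names mirror the sample output above; the 0.5 threshold is an arbitrary choice) that routes high-risk accounts to a retention list:

```python
import pandas as pd

# Predictions as they would be read back from the output table.
preds = pd.DataFrame({
    "customer_id": ["C-8801", "C-8802", "C-8803", "C-8804"],
    "churn_probability": [0.89, 0.14, 0.72, 0.06],
})

# Route accounts above the risk threshold to the retention team,
# highest risk first.
high_risk = preds[preds["churn_probability"] >= 0.5].sort_values(
    "churn_probability", ascending=False
)
print(high_risk["customer_id"].tolist())
```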

Workflow comparison: flat-table ML vs KumoRFM on Snowflake

Flat-table ML on Snowflake (Cortex/Snowpark/DataRobot/H2O)

  • Write SQL to join 5-10 Snowflake tables into a single flat table (4-8 hours)
  • Compute temporal aggregations and cross-entity features (4-6 hours)
  • Iterate on features 3-4 times as initial model underperforms (4-8 hours)
  • Feed flat table to chosen ML tool (Snowpark ML, DataRobot, or H2O)
  • Train model, tune hyperparameters, evaluate (1-4 hours)
  • Deploy model, schedule retraining, maintain feature pipeline
  • Repeat entire process when schema changes or new tables are added

KumoRFM on Snowflake

  • Install KumoRFM Snowflake Native App (one-time setup)
  • Point KumoRFM at your Snowflake tables and define foreign key relationships
  • Write a PQL query describing what you want to predict
  • KumoRFM reads raw tables, discovers features across relationships, returns predictions
  • Predictions written back to a Snowflake table automatically
  • No feature engineering code, no flat table, no pipeline to maintain
  • New tables? Add them to the schema. KumoRFM discovers new features automatically.

Security and compliance: data never leaves Snowflake

For regulated industries like financial services, healthcare, and insurance, data movement is not just an engineering inconvenience. It creates compliance exposure. Every time data leaves your Snowflake account, you add a new surface area for data governance, access control, and audit trail requirements.

All seven options on this list can run inside Snowflake to varying degrees. But the security posture differs:

  • Cortex ML Functions and Snowpark ML are fully native to Snowflake. Data stays within your account by design.
  • DataRobot, H2O.ai, and KumoRFM Native Apps run inside Snowpark Container Services. Data stays within your Snowflake account, and by default the app's compute containers have no outbound network access.
  • Custom Snowpark Container Services deployments depend on how you configure them. You control the security posture.

KumoRFM takes this a step further: because it reads relational tables directly without requiring data exports to a staging layer or a separate feature store, there is no intermediate data copy outside Snowflake's governance model. Your existing Snowflake RBAC policies, masking rules, and audit logs apply to the ML workflow exactly as they do to your analytics queries.

Who built KumoRFM

KumoRFM was built by the team behind the ML systems at Pinterest, Airbnb, and LinkedIn: Vanja Josifovski (CEO, former CTO at Airbnb and Pinterest), Jure Leskovec (Chief Scientist, Stanford professor, co-creator of GraphSAGE), and Hema Raghavan (Head of Engineering, former Sr. Director at LinkedIn). Backed by Sequoia Capital.

Notably, Sridhar Ramaswamy, CEO of Snowflake, serves as a Kumo advisor. This relationship reflects the strategic alignment between Kumo's relational foundation model approach and Snowflake's vision for in-warehouse AI. KumoRFM was designed from the ground up to run natively on Snowflake, not bolted on as an afterthought.

Choosing the right option for your team

The right choice depends on three factors: your data structure, your team's skills, and how many prediction tasks you need to run.

  • Single-column forecasting or anomaly detection? Start with Snowflake Cortex ML Functions. They are built in, SQL-only, and take minutes to set up. No reason to use anything heavier.
  • Single flat table with a strong data science team? Snowpark ML gives you full control. DataRobot or H2O.ai add AutoML if you want to automate model selection.
  • Multi-table relational data? KumoRFM is the only option that handles this without requiring you to flatten tables first. If your predictions depend on patterns across customers, orders, products, support tickets, and other related tables, KumoRFM eliminates the feature engineering bottleneck that every other option leaves to you.
  • Custom deep learning or LLM workloads? Snowflake Notebooks with Container Runtime or bare Snowpark Container Services give you the flexibility to run anything. Be prepared to build and maintain the full pipeline yourself.
  • Scaling from 1 to 20+ prediction tasks? This is where the approach differences compound. With flat-table ML, each new task means a new feature pipeline. With KumoRFM, each new task means a new PQL query against the same connected data.

Cost at scale: flat-table ML vs KumoRFM (20 prediction tasks, annual)

| Cost dimension | Flat-table ML on Snowflake | KumoRFM on Snowflake | Savings |
| --- | --- | --- | --- |
| Feature engineering labor | 246 hours ($61,500) | 0 hours ($0) | $61,500 |
| ML platform licensing | $150K-$250K (DataRobot/H2O) or $0 (Snowpark ML) | $80K-$120K | Varies |
| Data science team (feature pipelines) | 3-4 FTEs ($450K-$600K) | 0.5 FTE ($75K) | $375K-$525K |
| Pipeline maintenance | 520 hours/year ($130K) | 20 hours/year ($5K) | $125K |
| Total annual cost | $650K-$900K | $80K-$120K | ~85% savings |

At 20 prediction tasks, the flat-table approach costs 6-8x more than KumoRFM, driven almost entirely by the feature engineering and pipeline maintenance that KumoRFM eliminates.
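The feature engineering labor line follows directly from the RelBench measurement cited earlier (12.3 hours per task), priced here at an assumed $250/hour blended data science rate:

```python
hours_per_task = 12.3   # RelBench average feature engineering time per task
tasks = 20
hourly_rate = 250       # assumed blended rate, USD (not from the benchmark)

total_hours = hours_per_task * tasks
labor_cost = total_hours * hourly_rate
print(total_hours, labor_cost)
```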

Frequently asked questions

How do I build ML models on Snowflake without moving data out?

You have seven options, ranging from SQL-only to full custom deployments. Snowflake Cortex ML Functions let you run forecasting, anomaly detection, and classification directly in SQL with no setup. Snowpark ML gives your Python team full control inside Snowflake's compute. Snowflake Notebooks with Container Runtime offer the most flexibility for custom workflows. DataRobot and H2O.ai both offer Snowflake Native Apps for AutoML. KumoRFM runs as a Snowflake Native App that reads multiple relational tables directly without flattening. And Snowpark Container Services lets you deploy any custom model inside Snowflake. In every case, your data stays inside your Snowflake account perimeter. The right choice depends on your team's skills, your data structure, and how much feature engineering you want to do yourself.

Which ML platforms have native Snowflake integration?

As of early 2026, the ML platforms with Snowflake Native App integrations include DataRobot, H2O.ai, and KumoRFM. These run inside your Snowflake account using Snowpark Container Services, so data never leaves your environment. Additionally, Snowflake's own Cortex ML Functions and Snowpark ML are built into the platform. The key difference between these options is what they require from you: Cortex ML Functions are limited to simple forecasting and anomaly detection on single columns. DataRobot and H2O.ai require you to prepare a single flat feature table before they can work. KumoRFM is the only option that reads multiple related Snowflake tables directly and discovers features across them automatically.

What are my options for adding ML predictions to Snowflake?

Seven main options: (1) Snowflake Cortex ML Functions for SQL-based forecasting and anomaly detection, (2) Snowpark ML for Python-based model training inside Snowflake, (3) Snowflake Notebooks with Container Runtime for flexible experimentation, (4) DataRobot Snowflake Native App for AutoML on flat tables, (5) H2O.ai on Snowflake for open-source AutoML, (6) KumoRFM Snowflake Native App for graph-based ML on raw relational tables, and (7) custom models via Snowpark Container Services for full build-your-own flexibility. Options 1-3 are Snowflake-native. Options 4-6 are third-party native apps. Option 7 is infrastructure for deploying anything.

Do I need to flatten my Snowflake tables into a single table before running ML?

For six of the seven options, yes. Snowflake Cortex ML Functions operate on single columns. Snowpark ML, DataRobot, and H2O.ai all require a pre-joined flat feature table as input. Snowflake Notebooks give you flexibility, but you still write the join logic yourself. Custom Snowpark Container Services deployments require whatever your model expects. The sole exception is KumoRFM, which reads multiple related Snowflake tables directly using foreign key relationships. It builds a graph representation of your relational data and discovers predictive features across tables automatically. No joins, no flattening, no feature engineering code.

Is Snowflake Cortex ML enough for enterprise prediction tasks?

Snowflake Cortex ML Functions are useful for straightforward forecasting, anomaly detection, and basic classification on single-column time series data. They work well for simple questions like 'forecast next month's revenue' or 'flag anomalous transactions.' But they cannot handle multi-table prediction tasks like churn prediction (which requires patterns across customer, order, support, and product tables), fraud detection (which requires network analysis across accounts and transactions), or recommendation (which requires user-item interaction graphs). For these tasks, you need one of the other six options.

How does KumoRFM work inside Snowflake specifically?

KumoRFM runs as a Snowflake Native App inside Snowpark Container Services. When you install it, the compute runs within your Snowflake account. You point KumoRFM at your tables and define their relationships using foreign keys. Then you write a PQL (Predictive Query Language) query that describes what you want to predict. KumoRFM reads the raw tables, builds a graph representation of the relational structure, and generates predictions that get written back to a Snowflake table. Your data never leaves your Snowflake account. There is no data export, no external API calls, and no staging to cloud storage.

What accuracy difference should I expect between flat-table ML and relational ML on Snowflake?

On the RelBench benchmark (7 databases, 30 tasks, 103 million rows), flat-table approaches with manual feature engineering score approximately 62.44 AUROC. AutoML with manual features reaches roughly 64-66 AUROC. KumoRFM zero-shot achieves 76.71 AUROC. The 10+ point gap comes from predictive patterns that exist in relationships between tables but get lost when you flatten data into a single table. On the SAP SALT enterprise benchmark, KumoRFM scores 91% accuracy versus 75% for PhD data scientists with XGBoost and 63% for LLM+AutoML approaches. The more relational your data, the larger the accuracy gap.

Can I use multiple Snowflake ML options together?

Yes, and many teams do. A common pattern is using Snowflake Cortex ML Functions for simple time series forecasting (demand, revenue), KumoRFM for multi-table relational predictions (churn, fraud, recommendations), and Snowpark ML for custom models on specialized datasets. Since all options write predictions back to Snowflake tables, you can combine outputs in downstream analytics or applications. The key is matching each tool to the type of problem it handles best rather than forcing one tool to cover everything.

See it in action

KumoRFM delivers predictions on relational data in seconds. No feature engineering, no ML pipelines. Try it free.