If your data lives in Snowflake and you want to add ML predictions, the good news is you have options. The bad news is that most guides only mention one or two of them, and none lay out the full picture with honest tradeoffs.
This guide covers every viable option as of early 2026. For each one, we explain what it does, who it is for, what it requires from your team, and where it falls short. We are opinionated about which option works best for multi-table relational data, and we will tell you why.
All 7 options for ML on Snowflake without moving data
Here is the complete list, ordered from simplest to most flexible:
1. Snowflake Cortex ML Functions
SQL-based, built into Snowflake. Covers forecasting, anomaly detection, and classification. No setup, no Python, no external tools. Operates on single columns or simple tables only and cannot handle multi-table prediction tasks.
- Best for: Analysts who need quick forecasts without leaving SQL.
- Watch out for: Cannot join or reason across multiple tables. If your prediction depends on patterns across customers, orders, and products, Cortex ML Functions will miss them entirely.
2. Snowpark ML
Python-based model training and inference inside Snowflake's compute. Full access to scikit-learn, XGBoost, LightGBM, and PyTorch through Snowpark Python. Requires a data science team that can write feature engineering code, train models, and manage the pipeline.
- Best for: Data science teams that want full control over the modeling process while keeping data inside Snowflake.
- Watch out for: You still need to join and flatten your tables into a single training dataframe yourself. The modeling is easy; the feature engineering is the bottleneck.
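To make that bottleneck concrete, here is a minimal sketch of the flattening step using pandas on hypothetical miniature tables. Real Snowpark ML code would express the same joins and aggregations through the Snowpark DataFrame API against live Snowflake tables; the table and column names here are illustrative only.

```python
import pandas as pd

# Hypothetical miniature versions of two warehouse tables.
customers = pd.DataFrame({"customer_id": [1, 2], "plan": ["pro", "basic"]})
orders = pd.DataFrame({
    "customer_id": [1, 1, 2],
    "amount": [120.0, 80.0, 40.0],
})

# The manual flattening step every flat-table tool requires: aggregate
# the child table down to one row per customer, then join.
agg = (
    orders.groupby("customer_id")["amount"]
    .agg(order_count="count", avg_order_value="mean")
    .reset_index()
)
features = customers.merge(agg, on="customer_id", how="left")

print(features)
```

This is two tables and two features; a production churn model repeats this across 5-10 tables and dozens of features, and every one of those lines is code you own.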
3. Snowflake Notebooks + Container Runtime
Jupyter-style notebooks running inside Snowflake with GPU support. The most flexible option for experimentation. You can run any Python library, build custom deep learning models, and iterate quickly.
- Best for: ML research teams prototyping new approaches that need maximum flexibility.
- Watch out for: Highest engineering requirement. You build everything from scratch, including data prep, feature engineering, model training, and deployment. This is a blank canvas, not a solution.
4. DataRobot Snowflake Native App
AutoML platform that runs inside your Snowflake account. Automates model selection, hyperparameter tuning, and provides a visual interface for non-coders.
- Best for: Teams that want AutoML without exporting data and prefer a visual, low-code interface.
- Watch out for: Still requires a pre-joined flat feature table as input. DataRobot automates the modeling step but not the multi-table feature engineering that typically takes 80% of the effort.
5. H2O.ai on Snowflake
Open-source AutoML (H2O-3) and commercial Driverless AI available through Snowflake. Strong algorithmic transparency and community support. Driverless AI adds automatic single-table feature engineering.
- Best for: Teams that value open-source transparency and already have data science skills.
- Watch out for: Same flat-table requirement as DataRobot. Multi-table joins and cross-entity aggregations are your responsibility. Needs data science expertise to operate effectively.
6. KumoRFM Snowflake Native App
Relational foundation model that runs as a Snowflake Native App inside Snowpark Container Services. The only option on this list that reads multiple related Snowflake tables directly using foreign key relationships. Zero feature engineering, zero data flattening. You write a PQL (Predictive Query Language) query describing what you want to predict, and KumoRFM discovers features across your full relational structure automatically.
- Best for: Teams with multi-table relational data who want predictions without building feature pipelines.
- Watch out for: Commercial platform, not open-source. If you need full algorithmic source code access, this is not the option.
7. Custom models via Snowpark Container Services
The infrastructure layer that powers several options above. You can deploy any Docker container inside Snowflake, which means any ML framework, any custom model, any inference pipeline.
- Best for: Teams with specific requirements that no existing platform meets and the engineering capacity to build from scratch.
- Watch out for: You build and maintain everything. This is infrastructure, not a solution. Expect weeks to months before your first prediction.
Side-by-side comparison: all 7 options
Snowflake ML options comparison
| Option | Data input | Feature engineering | Team required | Time to first prediction | Multi-table support |
|---|---|---|---|---|---|
| Cortex ML Functions | Single column / table | None (SQL functions only) | SQL analyst | Minutes | No |
| Snowpark ML | Single flat dataframe | Manual (Python) | Data science team | Days to weeks | Manual joins only |
| Notebooks + Container Runtime | Any (you build it) | Manual (Python) | ML engineers | Days to weeks | Manual joins only |
| DataRobot Native App | Single flat table | Automatic (single-table only) | ML-literate analyst | Hours (after table prep) | No |
| H2O.ai on Snowflake | Single flat table | Automatic single-table (Driverless AI) | Data science team | Hours (after table prep) | No |
| KumoRFM Native App | Multiple related tables | Automatic (cross-table) | ML engineer or analyst | Minutes | Yes - native |
| Custom (Snowpark Container Services) | Any (you build it) | Manual (any framework) | ML + infrastructure engineers | Weeks to months | Whatever you build |
KumoRFM is the only option that accepts multiple related tables as input and automates feature discovery across them. All other options require you to prepare a single flat table first.
The real bottleneck: who builds the flat table?
Look at the comparison table again. Six of seven options require a flat feature table as input. The question most Snowflake ML guides skip is: who builds that table?
For a typical enterprise prediction task like churn prediction, fraud detection, or lead scoring, the flat table requires joining 5-10 related tables, computing temporal aggregations (average order value over 90 days, support tickets in the last 30 days, login frequency trends), and encoding cross-entity patterns. The Stanford RelBench study measured this effort: 12.3 hours and 878 lines of code per prediction task, on average.
This is not a one-time cost. Feature pipelines break when schemas change. They need updating when business logic evolves. They require monitoring for data drift. At scale, maintaining 10-20 feature pipelines demands 3-4 full-time data scientists.
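A sketch of one such temporal aggregation in pandas, using a hypothetical orders table and a fixed "as of" date. A real pipeline hand-writes and maintains dozens of these windowed features across many tables:

```python
import pandas as pd

as_of = pd.Timestamp("2026-01-01")  # illustrative cutoff date
orders = pd.DataFrame({
    "customer_id": [1, 1, 1],
    "order_ts": pd.to_datetime(["2025-10-15", "2025-12-01", "2024-06-01"]),
    "amount": [50.0, 150.0, 999.0],
})

# Average order value over the trailing 90 days: keep only orders inside
# the window, then aggregate per customer.
window = orders[
    (orders["order_ts"] > as_of - pd.Timedelta(days=90))
    & (orders["order_ts"] <= as_of)
]
aov_90d = window.groupby("customer_id")["amount"].mean().rename("aov_90d")

print(aov_90d)
```

Note the stale 2024 order is correctly excluded; getting the window boundaries, time zones, and leakage rules right for every feature is exactly where those 12.3 hours per task go.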
What flat-table ML misses on Snowflake data
When you flatten relational tables into a single row per entity, you lose structural information that drives prediction accuracy. Here is a concrete example for a churn prediction task on a typical Snowflake data warehouse with customers, orders, products, support tickets, and payments tables:
Signals lost when flattening Snowflake tables
| Signal | Visible in flat table | Visible to KumoRFM |
|---|---|---|
| Total order count | Yes - orders_count = 47 | Yes - plus order sequence, frequency changes, and category shifts |
| Support ticket escalation pattern | No - only ticket_count = 3 | Yes - tickets escalating from billing to technical to cancellation |
| Product return rate correlation | No - requires cross-table join | Yes - customer buys products with 23% return rate across all buyers |
| Payment method risk signal | No - only last_payment_method | Yes - switched from annual to monthly billing 60 days ago |
| Similar customer outcomes | No - no cross-entity patterns | Yes - customers with matching product mix churned at 68% rate |
| Multi-department engagement decline | No - aggregated to single score | Yes - usage dropping across 3 product lines simultaneously |
A churn prediction example on Snowflake. The flat table captures simple counts and latest values. The relational model captures behavioral sequences, cross-entity patterns, and multi-hop signals that actually predict churn.
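A tiny pure-Python illustration of the escalation-pattern row in the table: two hypothetical customers with opposite ticket sequences collapse to identical flat rows once only the count survives.

```python
# Two hypothetical customers whose ticket histories differ in exactly the
# way that predicts churn: the order of escalation.
ticket_history = {
    "C-8801": ["billing", "technical", "cancellation"],  # escalating
    "C-8802": ["cancellation", "technical", "billing"],  # de-escalating
}

# Flattening keeps only the count, so both customers collapse to the same
# feature row and the sequence signal is destroyed.
flat = {cid: {"ticket_count": len(t)} for cid, t in ticket_history.items()}

assert flat["C-8801"] == flat["C-8802"]  # indistinguishable to a flat model
print(flat)
```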
Benchmark results on relational data
The accuracy gap between flat-table approaches and relational ML is not theoretical. Two independent benchmarks quantify it.
SAP SALT enterprise benchmark
| Approach | Accuracy | Feature engineering required |
|---|---|---|
| LLM + AutoML | 63% | LLM generates features, AutoML selects model |
| PhD Data Scientist + XGBoost | 75% | Weeks of expert hand-crafted features |
| KumoRFM (zero-shot) | 91% | Zero - reads relational tables directly |
SAP SALT benchmark on enterprise data. KumoRFM outperforms expert data scientists by 16 percentage points and LLM+AutoML by 28 percentage points, with no feature engineering and no training.
RelBench benchmark results
| Approach | AUROC | Feature engineering time | Lines of code |
|---|---|---|---|
| LightGBM + manual features | 62.44 | 12.3 hours per task | 878 |
| AutoML + manual features | ~64-66 | 10.5 hours per task | 878 |
| KumoRFM zero-shot | 76.71 | ~1 second | 0 |
| KumoRFM fine-tuned | 81.14 | Minutes | 0 |
RelBench benchmark (7 databases, 30 tasks, 103M rows). KumoRFM zero-shot scores 76.71 vs 62.44 for manual flat-table approaches. The gap comes from features discovered in relational structure that flat tables never contain.
The 14-point AUROC gap on RelBench and the 16-percentage-point gap on SAP SALT both measure the same thing: the predictive information that lives in relationships between tables and gets destroyed when you flatten data into a single table. Better model selection cannot recover this information. Better single-table feature engineering cannot recover it either. Only reading the relational structure directly can capture it.
How KumoRFM works on Snowflake: a PQL example
KumoRFM uses Predictive Query Language (PQL) to define prediction tasks. Instead of writing SQL joins, feature engineering code, and model training scripts, you describe what you want to predict. Here is a churn prediction query running on Snowflake tables:
PQL Query

```
PREDICT churn_90d
FOR EACH customers.customer_id
USING snowflake.customers, snowflake.orders, snowflake.products,
      snowflake.support_tickets, snowflake.payments
```
This single PQL query replaces the entire ML pipeline: the SQL joins across 5 Snowflake tables, the feature engineering code, the model training, and the deployment. KumoRFM reads all 5 tables, discovers predictive features across their relationships, and writes churn probabilities back to a Snowflake table. Data never leaves your Snowflake account.
Output
| customer_id | churn_probability | top_risk_factors |
|---|---|---|
| C-8801 | 0.89 | Support escalation + declining order frequency + similar account churn |
| C-8802 | 0.14 | Stable multi-product usage + expanding seat count |
| C-8803 | 0.72 | Payment method downgrade + product return rate increasing |
| C-8804 | 0.06 | Growing order volume + positive support interactions |
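Because the predictions land in an ordinary Snowflake table, downstream use is plain querying. As a plain-Python sketch using the example rows above, an outreach job might simply threshold the scores (in practice you would run the equivalent filter as SQL against the output table):

```python
# Example predictions copied from the output table above.
predictions = [
    {"customer_id": "C-8801", "churn_probability": 0.89},
    {"customer_id": "C-8802", "churn_probability": 0.14},
    {"customer_id": "C-8803", "churn_probability": 0.72},
    {"customer_id": "C-8804", "churn_probability": 0.06},
]

THRESHOLD = 0.5  # illustrative cutoff for the outreach campaign
at_risk = [p["customer_id"] for p in predictions
           if p["churn_probability"] >= THRESHOLD]
print(at_risk)  # ['C-8801', 'C-8803']
```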
Workflow comparison: flat-table ML vs KumoRFM on Snowflake
Flat-table ML on Snowflake (Cortex/Snowpark/DataRobot/H2O)
- Write SQL to join 5-10 Snowflake tables into a single flat table (4-8 hours)
- Compute temporal aggregations and cross-entity features (4-6 hours)
- Iterate on features 3-4 times as initial model underperforms (4-8 hours)
- Feed flat table to chosen ML tool (Snowpark ML, DataRobot, or H2O)
- Train model, tune hyperparameters, evaluate (1-4 hours)
- Deploy model, schedule retraining, maintain feature pipeline
- Repeat entire process when schema changes or new tables are added
KumoRFM on Snowflake
- Install KumoRFM Snowflake Native App (one-time setup)
- Point KumoRFM at your Snowflake tables and define foreign key relationships
- Write a PQL query describing what you want to predict
- KumoRFM reads raw tables, discovers features across relationships, returns predictions
- Predictions written back to a Snowflake table automatically
- No feature engineering code, no flat table, no pipeline to maintain
- New tables? Add them to the schema. KumoRFM discovers new features automatically.
Security and compliance: data never leaves Snowflake
For regulated industries like financial services, healthcare, and insurance, data movement is not just an engineering inconvenience. It creates compliance exposure. Every time data leaves your Snowflake account, you add a new surface area for data governance, access control, and audit trail requirements.
All seven options on this list can run inside Snowflake to varying degrees. But the security posture differs:
- Cortex ML Functions and Snowpark ML are fully native to Snowflake. Data stays within your account by design.
- DataRobot, H2O.ai, and KumoRFM Native Apps run inside Snowpark Container Services. Data stays within your Snowflake account, and the compute container runs without outbound network access unless you explicitly grant it.
- Custom Snowpark Container Services deployments depend on how you configure them. You control the security posture.
KumoRFM takes this a step further: because it reads relational tables directly without requiring data exports to a staging layer or a separate feature store, there is no intermediate data copy outside Snowflake's governance model. Your existing Snowflake RBAC policies, masking rules, and audit logs apply to the ML workflow exactly as they do to your analytics queries.
Who built KumoRFM
KumoRFM was built by the team behind the ML systems at Pinterest, Airbnb, and LinkedIn: Vanja Josifovski (CEO, former CTO at Airbnb and Pinterest), Jure Leskovec (Chief Scientist, Stanford professor, co-creator of GraphSAGE), and Hema Raghavan (Head of Engineering, former Sr. Director at LinkedIn). Backed by Sequoia Capital.
Notably, Sridhar Ramaswamy, CEO of Snowflake, serves as a Kumo advisor. This relationship reflects the strategic alignment between Kumo's relational foundation model approach and Snowflake's vision for in-warehouse AI. KumoRFM was designed from the ground up to run natively on Snowflake, not bolted on as an afterthought.
Choosing the right option for your team
The right choice depends on three factors: your data structure, your team's skills, and how many prediction tasks you need to run.
- Single-column forecasting or anomaly detection? Start with Snowflake Cortex ML Functions. They are built into Snowflake, run on your existing compute, and take minutes to set up. No reason to use anything heavier.
- Single flat table with a strong data science team? Snowpark ML gives you full control. DataRobot or H2O.ai add AutoML if you want to automate model selection.
- Multi-table relational data? KumoRFM is the only option that handles this without requiring you to flatten tables first. If your predictions depend on patterns across customers, orders, products, support tickets, and other related tables, KumoRFM eliminates the feature engineering bottleneck that every other option leaves to you.
- Custom deep learning or LLM workloads? Snowflake Notebooks with Container Runtime or bare Snowpark Container Services give you the flexibility to run anything. Be prepared to build and maintain the full pipeline yourself.
- Scaling from 1 to 20+ prediction tasks? This is where the approach differences compound. With flat-table ML, each new task means a new feature pipeline. With KumoRFM, each new task means a new PQL query against the same connected data.
Cost at scale: flat-table ML vs KumoRFM (20 prediction tasks, annual)
| Cost dimension | Flat-table ML on Snowflake | KumoRFM on Snowflake | Savings |
|---|---|---|---|
| Feature engineering labor | 246 hours ($61,500) | 0 hours ($0) | $61,500 |
| ML platform licensing | $150K-$250K (DataRobot/H2O) or $0 (Snowpark ML) | $80K-$120K | Varies |
| Data science team (feature pipelines) | 3-4 FTEs ($450K-$600K) | 0.5 FTE ($75K) | $375K-$525K |
| Pipeline maintenance | 520 hours/year ($130K) | 20 hours/year ($5K) | $125K |
| Total annual cost | $650K-$900K | $80K-$120K | ~85% savings |
At 20 prediction tasks, the flat-table approach costs 6-8x more than KumoRFM, driven almost entirely by the feature engineering and pipeline maintenance that KumoRFM eliminates.
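The labor line items in the cost table are easy to reproduce. This sketch assumes a $250/hour blended rate, which is our assumption (chosen so that 246 hours matches the $61,500 figure), combined with the RelBench measurement of 12.3 hours per task:

```python
RATE_USD_PER_HOUR = 250  # assumed blended rate, not stated in the benchmarks
TASKS = 20

# Feature engineering: 12.3 hours/task (RelBench) across 20 tasks.
feature_hours = 12.3 * TASKS                       # ~246 hours
feature_cost = feature_hours * RATE_USD_PER_HOUR   # ~$61,500

# Pipeline maintenance: 520 hours/year, from the cost table.
maintenance_cost = 520 * RATE_USD_PER_HOUR         # $130,000

print(round(feature_hours), round(feature_cost), maintenance_cost)
```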