
Kumo vs H2O.ai: Relational Foundation Model vs Open-Source AutoML

H2O.ai automates model selection and single-table feature engineering. Kumo automates multi-table feature discovery. H2O needs a pre-joined flat table as input. Kumo reads raw relational tables directly. This is not a marginal improvement - it eliminates the 80% bottleneck that H2O leaves entirely manual.

TL;DR

  • On the SAP SALT enterprise benchmark, KumoRFM scores 91% accuracy vs 75% for PhD data scientists with XGBoost and 63% for LLM+AutoML - with zero feature engineering and zero training time.
  • H2O.ai offers open-source AutoML (H2O-3) and commercial Driverless AI that automate model selection, hyperparameter tuning, and single-table feature engineering - the last 20% of the ML pipeline. Both still require a manually built flat feature table as input, leaving the 80% multi-table feature engineering bottleneck (12.3 hours and 878 lines of code per task) untouched.
  • KumoRFM is a relational foundation model that reads raw relational tables directly, discovering multi-hop predictive patterns across tables without any feature engineering. It automates the full pipeline, not just the modeling step.
  • On RelBench benchmarks, KumoRFM zero-shot achieves 76.71 AUROC vs AutoML + manual features at ~64-66 AUROC. The 10+ point gap comes from features the model discovers in relational structure that a flat table never contains.
  • At scale (20 prediction tasks), the H2O approach costs $650K-$900K/year including the data science team needed for feature engineering. The Kumo approach costs $80K-$120K/year - an 85% cost reduction.

H2O.ai is one of the most respected names in open-source machine learning. Since 2012, it has built a portfolio spanning the free, open-source H2O-3 library, the commercial Driverless AI platform, and more recently H2O LLM Studio for large language model fine-tuning. It has real strengths: algorithmic transparency, a strong academic community, proven performance in Kaggle competitions, and an open-source ethos that gives data scientists full control over the modeling process.

But H2O.ai is an AutoML platform. And AutoML, by design, solves one specific problem: given a flat feature table, find the best model. Driverless AI adds automatic single-table feature engineering - generating interactions, lag features, and transformations within a single table - but it does not solve the problem that consumes 80% of data science time: converting raw multi-table relational data into that flat feature table in the first place.

This is not a criticism of H2O's engineering. It is a description of AutoML's architecture. Every AutoML tool - H2O, DataRobot, Google AutoML, SageMaker Autopilot - takes a pre-built feature table as input. None of them can read a relational database directly. None of them discover features from table joins and multi-hop relationships. None of them eliminate the multi-table feature engineering bottleneck.

Kumo takes a different approach entirely. Instead of automating model selection on a feature table someone else built, KumoRFM reads raw relational tables directly and discovers predictive patterns across the full relational structure. This is the difference between optimizing a step in the pipeline and eliminating the pipeline.

The headline result: SAP SALT benchmark

Before diving into detailed comparisons, here is the result that matters most. The SAP SALT benchmark is an enterprise-grade evaluation where real business analysts and data scientists attempt prediction tasks on SAP enterprise data. It measures how accurately different approaches predict real business outcomes (customer behavior, demand patterns, operational metrics) on production-quality enterprise databases with multiple related tables.

SAP SALT enterprise benchmark

| Approach | Accuracy | What it means |
|---|---|---|
| LLM + AutoML | 63% | Language model generates features, AutoML selects model |
| PhD Data Scientist + XGBoost | 75% | Expert spends weeks hand-crafting features, tunes XGBoost |
| KumoRFM (zero-shot) | 91% | No feature engineering, no training; reads relational tables directly |

SAP SALT benchmark: KumoRFM outperforms expert data scientists by 16 percentage points and LLM+AutoML by 28 percentage points. Zero feature engineering. Zero training. The model reads raw enterprise tables and predicts.

This is not a marginal improvement. KumoRFM scores 91% where PhD-level data scientists with weeks of feature engineering and hand-tuned XGBoost score 75%. The 16 percentage point gap is the value of reading relational data natively instead of flattening it into a single table.

Kumo vs H2O comparison

| Dimension | H2O.ai | Kumo (KumoRFM) |
|---|---|---|
| Data input | Single flat feature table (CSV, dataframe) | Raw relational tables connected by foreign keys |
| Feature engineering | Automatic within a single table (Driverless AI); manual across tables | Automatic - model discovers features from full relational structure |
| Multi-table support | None - requires pre-joined flat table | Native - reads multiple tables and discovers cross-table patterns |
| Time to first prediction | Weeks (feature engineering) + hours (AutoML training) | ~1 second (zero-shot) to minutes (fine-tuned) |
| Accuracy on relational data | ~64-66 AUROC (limited by manual features) | 76.71 AUROC zero-shot, 81.14 fine-tuned |
| Explainability | Feature importance, SHAP values, transparent algorithms | Feature importance across discovered relational patterns |
| Open-source option | Yes - H2O-3 is fully open-source (Apache 2.0) | No - commercial platform with managed deployment |
| Snowflake integration | Import/export via connectors | Native Snowflake-based processing, no data movement |
| Pricing model | H2O-3 free; Driverless AI per-seat + compute licensing | Per-prediction-task, no per-seat fees |
| Pipeline maintenance | Feature pipelines + model retraining + monitoring | No feature pipelines to maintain |
| Best for | Single-table problems, open-source transparency, data scientists who want control | Multi-table relational data, fast iteration, teams without dedicated DS |

Head-to-head comparison across 11 dimensions. The key difference is not model quality - H2O builds strong models on the data it receives. The difference is what data it receives.

What H2O.ai does well

H2O.ai has earned its reputation in the ML community. A fair comparison requires acknowledging where the platform genuinely excels.

  • Open-source transparency. H2O-3 is fully open-source under an Apache 2.0 license. Data scientists can inspect every algorithm, trace every decision, and modify the code. For teams that value reproducibility and algorithmic control, this matters. There is no black box.
  • Model selection and tuning. H2O's AutoML tests a wide range of algorithms (GBMs, random forests, deep learning, GLMs, stacked ensembles) and automatically selects the best performer with optimized hyperparameters. On a clean, well-engineered feature table, it consistently outperforms manual model selection.
  • Single-table feature engineering (Driverless AI). Driverless AI goes beyond basic AutoML by automatically generating interactions, lag features, target encoding, and transformations within a single table. This is a meaningful advantage over AutoML tools that only do model selection - it partially automates feature engineering for single-table data.
  • Academic and Kaggle community. H2O has deep roots in the data science competition community. Many Kaggle grandmasters use H2O, and the platform is well-documented in academic research. This creates a rich ecosystem of tutorials, benchmarks, and community support.
  • Algorithmic control. H2O gives data scientists fine-grained control over algorithms, constraints, monotonicity, and model complexity. For teams that need to explain every model decision to regulators or auditors, this control is essential.

What H2O.ai requires you to do manually

H2O's input is a flat feature table. Driverless AI can engineer features within that single table, but everything that happens before the table exists - the multi-table joins, aggregations, and cross-entity pattern extraction - is your responsibility. For enterprise data that lives in relational databases, this is the majority of the work.

  • Table joins. Your customer data spans customers, orders, products, interactions, support tickets, and payment tables. Someone writes the SQL to join them. For 5 tables with temporal constraints, this is easily 100+ lines of SQL.
  • Cross-table aggregations. H2O cannot compute avg_order_value_last_90d, support_tickets_last_30d, or product_return_rate_by_category from raw relational tables. Each cross-table aggregation must be pre-computed and added as a column to the flat table before H2O sees it.
  • Temporal feature engineering across tables. Driverless AI can generate lag features within a single table, but cross-table temporal patterns (purchase frequency accelerating while support tickets increase, engagement declining across multiple product lines over 6 weeks) must be manually encoded. H2O sees a static snapshot, not cross-table temporal sequences.
  • Multi-hop pattern encoding. If a customer's churn risk depends on the satisfaction scores of other customers who bought the same products, that three-hop relationship (customer → orders → products → other customers' reviews) must be manually computed and flattened into a single column.
  • Feature iteration. When the first model underperforms, the data scientist goes back and engineers more features. This iteration loop - build features, train model, evaluate, build more features - averages 3-4 cycles per task.
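The manual work in this list can be made concrete with a small pandas sketch. The tables and column names below are illustrative stand-ins for the schema described above, not taken from any specific product; the sketch computes just one of the many hand-built columns (avg_order_value_last_90d) that must be merged into the flat table before H2O ever sees the data.

```python
import pandas as pd

# Toy orders table standing in for one of the 5+ relational tables.
orders = pd.DataFrame({
    "customer_id": [1, 1, 2],
    "order_ts": pd.to_datetime(["2024-05-01", "2024-06-15", "2024-03-01"]),
    "order_value": [100.0, 200.0, 50.0],
})
cutoff = pd.Timestamp("2024-07-01")  # prediction time; rows after it would leak

# One hand-built cross-table aggregation: avg order value, last 90 days.
recent = orders[orders["order_ts"] >= cutoff - pd.Timedelta(days=90)]
avg_90d = (recent.groupby("customer_id")["order_value"]
                 .mean()
                 .rename("avg_order_value_last_90d"))

# This column is then joined onto the flat one-row-per-customer table.
flat = avg_90d.reset_index()
print(flat)
```

Multiply this by dozens of aggregations, several source tables, and 3-4 iteration cycles, and the 12.3 hours per task cited above is unsurprising.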

What the flat table misses vs the relational model (lead scoring example)

| Signal | Visible in flat table (H2O) | Visible in relational model (Kumo) |
|---|---|---|
| Total emails opened | Yes - single column: emails_opened = 7 | Yes - plus sequence, recency, and response time patterns |
| Content progression | No - only total page views | Yes - Blog > Case study > API docs > Pricing (buying signal) |
| Multi-threaded engagement | No - aggregated to one row | Yes - 4 contacts from 3 departments active on this account |
| Similar account outcomes | No - no cross-entity joins | Yes - accounts with similar profile closed at 73% win rate |
| Firmographic momentum | No - static company size only | Yes - company raised Series B 30 days ago, hiring 12 engineers |
| Product engagement depth | No - boolean feature_used = true | Yes - tried 3 integrations, API call volume increased 4x this week |

A concrete lead scoring example. The flat table H2O receives captures simple counts. The relational model captures the behavioral patterns, sequences, and cross-entity signals that actually predict conversion.

H2O.ai workflow

  • Data scientist writes SQL to join 5+ tables (2-4 hours)
  • Data scientist computes cross-table aggregations and temporal features (4-6 hours)
  • Data scientist iterates on features 3-4 times (4-6 hours)
  • Upload flat table to H2O / Driverless AI
  • H2O runs AutoML: tests algorithms, tunes hyperparameters, generates single-table features (1-2 hours)
  • Deploy best model, maintain feature pipeline ongoing

Kumo workflow

  • Connect Kumo to your data warehouse (one-time setup)
  • Write a PQL query defining what you want to predict
  • KumoRFM reads raw tables, discovers features, returns predictions
  • Zero feature engineering, zero model selection, zero pipeline code
  • Time to first prediction: ~1 second (zero-shot)
  • No feature pipeline to maintain

Benchmark results: RelBench

The RelBench benchmark provides an apples-to-apples comparison across 7 databases, 30 prediction tasks, and 103 million rows. These are real relational datasets - not pre-flattened Kaggle tables - which is why the gap between approaches is so stark.

AUROC (Area Under the Receiver Operating Characteristic curve) measures how well a model distinguishes between positive and negative outcomes. An AUROC of 50 means random guessing. An AUROC of 100 means perfect prediction. In practice, moving from 65 to 77 AUROC is a significant improvement - it means the model correctly ranks a true positive above a true negative 77% of the time instead of 65%. For fraud detection, that difference can mean catching 40% more fraud with the same false positive rate. For churn prediction, it means identifying at-risk customers weeks earlier.
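The pairwise-ranking interpretation above can be stated directly in code. This is a minimal sketch of AUROC as the probability that a randomly chosen positive is scored above a randomly chosen negative (ties credited half); it is a standard definition, not specific to either platform.

```python
import numpy as np

def auroc(y_true, scores):
    """AUROC = P(score of a random positive > score of a random negative),
    with ties counted as half a win."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    # Compare every positive against every negative.
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

# 5 of 6 positive/negative pairs ranked correctly.
print(auroc([1, 1, 0, 0, 0], [0.9, 0.4, 0.5, 0.3, 0.1]))
```

A model at 0.77 AUROC ranks a true positive above a true negative in 77% of such pairs; random scores give 0.5.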

RelBench benchmark results

| Approach | AUROC | Feature engineering time | Lines of code | What is automated |
|---|---|---|---|---|
| LightGBM + manual features | 62.44 | 12.3 hours per task | 878 | Nothing - fully manual pipeline |
| AutoML (H2O-class) + manual features | ~64-66 | 10.5 hours per task | 878 | Model selection and tuning only |
| KumoRFM zero-shot | 76.71 | ~1 second | 0 | Feature discovery + model + inference |
| KumoRFM fine-tuned | 81.14 | Minutes | 0 | Full pipeline + task-specific adaptation |

Highlighted: KumoRFM zero-shot outperforms the AutoML approach by 10+ AUROC points with zero feature engineering. The gap is not about model quality - it is about the features the model discovers in the raw relational structure.

The 2-4 point improvement from LightGBM to AutoML reflects the value of better model selection. The 10+ point improvement from AutoML to KumoRFM reflects the value of better features - features that exist in the relational structure but never make it into the flat table. H2O cannot close this gap by building a better model or engineering better single-table features, because the cross-table signals are not in the data it receives.

PQL Query

PREDICT churn_90d
FOR EACH customers.customer_id
WHERE customers.segment = 'enterprise'

One PQL query replaces the entire H2O pipeline: the SQL joins, the feature engineering code, the feature iteration cycles, and the AutoML model selection. KumoRFM reads the raw customers, orders, products, support_tickets, and payments tables directly.

Output

| customer_id | churn_prob (Kumo) | churn_prob (AutoML) | Delta |
|---|---|---|---|
| C-4401 | 0.87 | 0.72 | +15 points (Kumo detects declining multi-product engagement) |
| C-4402 | 0.12 | 0.31 | Kumo correctly lower (stable cross-department usage) |
| C-4403 | 0.93 | 0.58 | +35 points (Kumo sees support escalation + similar-account churn pattern) |
| C-4404 | 0.08 | 0.11 | Both correctly low (healthy account) |

The cost comparison at scale

The accuracy gap matters. But for most enterprises, the cost gap is what changes the decision. H2O-3 is free, but free software is not free to operate. AutoML tools are widely evaluated yet rarely adopted as a primary ML workflow, and Gartner and IDC estimate that 53-88% of ML models never reach production. The reason is not model quality - it is the cost and complexity of the feature engineering pipeline that AutoML still demands.

Total cost of ownership (20 prediction tasks, annual)

| Cost dimension | H2O approach | Kumo approach | Savings |
|---|---|---|---|
| Feature engineering labor | 246 hours ($61,500) | 0 hours | $61,500 |
| Platform license | $150K-$250K (Driverless AI) or $0 (H2O-3) | $80K-$120K | $70K-$130K |
| Data science team (feature pipelines) | 3-4 FTEs ($450K-$600K) | 0.5 FTE ($75K) | $375K-$525K |
| Pipeline maintenance (annual) | 520 hours ($130K) | 20 hours ($5K) | $125K |
| Time to new prediction task | 2-4 weeks | Minutes | 99%+ reduction |
| Total annual cost | $650K-$900K | $80K-$120K | ~85% savings |

Highlighted: the 85% cost savings come almost entirely from eliminating feature engineering labor and pipeline maintenance - work that H2O's AutoML does not automate. Even when H2O-3 is free, the data science team required to prepare multi-table data dominates total cost.
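The labor rows in the table are simple arithmetic over the per-task figures cited earlier. A quick sketch, assuming a loaded rate of $250/hour (a rate implied by the table's own numbers, not stated elsewhere in the article):

```python
TASKS = 20
HOURS_PER_TASK = 12.3   # feature engineering hours per task (RelBench figure)
RATE = 250              # assumed loaded hourly rate for a data scientist

fe_hours = TASKS * HOURS_PER_TASK        # 246 hours of feature engineering
fe_cost = fe_hours * RATE                # $61,500 labor
maintenance_cost = 520 * RATE            # $130,000 annual pipeline maintenance
print(fe_hours, fe_cost, maintenance_cost)
```

The FTE and license lines dominate the remainder; the point is that even at $0 in license fees, H2O-3's operating cost is driven by the people preparing its input.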

When to choose H2O.ai

H2O.ai is a strong platform in specific scenarios. Choose H2O when:

  • You need open-source transparency. If your organization requires full visibility into model algorithms, reproducibility without vendor lock-in, or the ability to modify the ML framework itself, H2O-3's open-source Apache 2.0 license is a genuine differentiator. You can audit every line of code.
  • Your data is already in a single flat table. If you have a well-curated CSV or dataframe with all the features you need, H2O's AutoML will find the best model efficiently. Driverless AI will additionally generate single-table features that can further improve accuracy.
  • Your team wants algorithmic control. If your data scientists want to select specific algorithms, set monotonicity constraints, customize ensembles, or understand every modeling decision, H2O provides that control. For regulated industries where model explainability is a compliance requirement, this is valuable.
  • You have a strong data science team. H2O is built by data scientists, for data scientists. If you have a skilled team that enjoys the feature engineering process and wants hands-on control, H2O is an excellent tool for the modeling step of their pipeline.
  • Kaggle-style benchmarking. For single-table competitions or internal model bake-offs where the feature table is provided, H2O is one of the best tools available. Its stacked ensemble approach consistently places well in competitions.

When to choose Kumo

Kumo solves a different problem than H2O.ai. Choose Kumo when:

  • Your data lives in multiple relational tables. Customers, orders, products, interactions, support tickets - if your predictive signals span table boundaries, Kumo discovers them automatically. H2O requires you to flatten them first.
  • You do not have a large data science team. If you cannot dedicate 3-4 FTEs to feature engineering and pipeline maintenance, Kumo eliminates that requirement entirely. A single ML engineer or analyst can operate the platform.
  • Speed to production matters. KumoRFM delivers predictions in approximately 1 second (zero-shot) versus weeks for the H2O pipeline. When business conditions change quickly, the ability to stand up a new prediction task in minutes is a competitive advantage.
  • You need maximum accuracy on relational data. The 10+ AUROC point gap between AutoML and KumoRFM on relational benchmarks translates directly to business outcomes: more fraud caught, fewer false positives, better-targeted campaigns, lower churn.
  • You want to scale prediction tasks. Going from 1 to 20 prediction tasks with H2O means 20 separate feature engineering pipelines. With Kumo, it means 20 PQL queries against the same connected data - marginal cost near zero.

Frequently asked questions

What is the main difference between Kumo and H2O.ai?

H2O.ai is an AutoML platform - available as the open-source H2O-3 library and the commercial Driverless AI product - that automates model selection, hyperparameter tuning, and single-table feature engineering on a pre-built flat feature table. Kumo uses a relational foundation model (KumoRFM) that reads raw relational tables directly, discovering predictive patterns across multiple tables without any manual feature engineering. H2O automates modeling on a single table. Kumo automates the full pipeline including the multi-table feature discovery that H2O leaves manual.

Does H2O.ai handle multi-table relational data?

No. Both H2O-3 and Driverless AI require a single flat feature table as input. If your data lives in multiple relational tables (customers, orders, products, interactions), someone must write the SQL joins, compute aggregations, and flatten everything into one row per entity before H2O can use it. H2O cannot discover multi-hop patterns across tables or preserve temporal sequences from raw relational data.

Is H2O.ai accurate on relational data?

H2O is effective at selecting and tuning models on the features it receives, and Driverless AI adds automatic single-table feature engineering that can improve results further. On single-table or well-engineered flat datasets, it performs well - it has a strong track record in Kaggle competitions. However, on relational data benchmarks like RelBench, AutoML approaches with manually engineered features score approximately 64-66 AUROC, while KumoRFM zero-shot achieves 76.71 AUROC. The gap is not about model selection quality but about the cross-table features the model never sees.

When should I choose H2O.ai over Kumo?

H2O.ai is a strong choice when your data is already in a single flat table, when you want an open-source option with full algorithmic transparency, when your team values control over the modeling process, or when you have strong data science talent that wants to understand and customize every step. It excels in Kaggle-style competitions, academic research, and environments where open-source licensing is a requirement.

How much does H2O.ai cost compared to Kumo for relational prediction tasks?

H2O-3 is free and open-source, but the total cost includes the data science team required for feature engineering. For an organization running 20 prediction tasks on relational data, the AutoML approach (including Driverless AI licensing or H2O-3 compute plus the data science team for feature engineering) costs approximately $650K-$900K per year. A foundation model approach with Kumo costs $80K-$120K per year, representing roughly 85% cost savings. The difference comes almost entirely from eliminating the manual feature engineering that H2O still requires.

Can I migrate from H2O.ai to Kumo?

Yes. Because Kumo reads raw relational tables directly, migration does not require rebuilding feature pipelines. You connect Kumo to your data warehouse (Snowflake, BigQuery, Databricks), define your prediction tasks in PQL (Predictive Query Language), and get predictions immediately. The feature engineering code you maintained for H2O becomes unnecessary. Many organizations see their first predictions within hours of connecting their data.

See it in action

KumoRFM delivers predictions on relational data in seconds. No feature engineering, no ML pipelines. Try it free.