
In-Context Learning for Structured Data: How Foundation Models Predict Without Training

In-context learning changed NLP: you give GPT a passage it has never seen, and it answers questions about it instantly. The same idea now works for structured data. A pre-trained model takes your tables as context and returns predictions in seconds, no training loop, no feature engineering, no waiting weeks for a pipeline. Here is how it works, which models do it, and where the limits are.

TL;DR

  • In-context learning (ICL) for structured data means a pre-trained model makes predictions on a new dataset without any training or fine-tuning. You pass the data as context, and the model returns predictions in a single forward pass. No gradient updates. No training loop.
  • Three models do ICL for structured data today: TabPFN (single flat tables, up to ~10K rows), KumoRFM (relational databases with multiple connected tables), and NICL/Neuralk (single tables for commerce use cases). KumoRFM is the only one that handles relational data.
  • Why it matters: traditional ML takes 6-12 weeks per prediction task (feature engineering, training, tuning). ICL gives predictions in seconds. On RelBench, KumoRFM zero-shot achieves 76.71 AUROC vs 62.44 for LightGBM with manual features, meaning ICL is not just faster but more accurate on relational data.
  • The hard part is relational data. A flat table is a matrix. A relational database is a graph of connected tables with different schemas, variable-length relationships, and temporal ordering. TabPFN cannot handle this. KumoRFM can, because it was pre-trained on tens of thousands of relational datasets.
  • ICL does not eliminate fine-tuning. KumoRFM fine-tuned reaches 81.14 AUROC on RelBench. But for speed to first prediction, rapid prototyping, and use cases where you cannot wait weeks, zero-shot ICL is already better than the traditional alternative.

If you have used ChatGPT or Claude, you have already experienced in-context learning. You paste a passage of text the model has never seen, ask a question about it, and get a correct answer instantly. The model did not train on that passage. It did not fine-tune. It recognized patterns from its pre-training and applied them to new content in a single forward pass.

Now apply that same idea to a database table. You give a pre-trained model a dataset it has never seen (say, your customer churn data), and it returns predictions without any training. No feature engineering. No hyperparameter tuning. No waiting for a training loop to converge. Just predictions, in seconds.

That is in-context learning for structured data. It is real, it works today, and it changes the economics of enterprise ML.

How in-context learning works for structured data

The mechanics are straightforward, even if the engineering behind them is not. An ICL model for structured data is pre-trained on a large corpus of datasets (thousands to millions of them). During pre-training, the model learns general patterns about how structured data behaves: what features predict outcomes, how tables relate to each other, what temporal patterns look like across different domains.

When you give it a new dataset at inference time, the model does not update its weights. It processes your data in a single forward pass, matches patterns from pre-training, and outputs predictions. This differs from traditional ML in a critical way: there, every new dataset requires a fresh training loop with gradient updates.

  1. Pre-training phase (done once): The model trains on thousands or millions of diverse datasets. For TabPFN, these are synthetic tabular datasets. For KumoRFM, these are real-world relational databases with multiple connected tables. The model learns general prediction patterns that transfer across domains.
  2. Inference phase (per new dataset): You pass your new dataset as input. The model recognizes which pre-trained patterns apply and generates predictions. No gradient updates. No training loop. One forward pass.
  3. Optional fine-tuning: For maximum accuracy on a specific dataset, you can fine-tune the model with a small number of gradient updates. This is faster than training from scratch and typically improves accuracy by 3-5 AUROC points over zero-shot.
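To make the inference-phase interface concrete, here is a toy Python sketch of what "prediction in one forward pass" means: the labeled context rows are inputs at inference time, not a training set. The distance-weighted vote below is a deliberately crude stand-in for the patterns a real foundation model learns during pre-training; nothing here is TabPFN's or KumoRFM's actual architecture.

```python
import math

def icl_predict(context_rows, context_labels, query_row):
    """Toy stand-in for an ICL forward pass: no gradient updates,
    the 'model' simply conditions on the context it is handed.
    A real foundation model replaces this distance weighting with
    patterns learned during pre-training."""
    weights = [math.exp(-math.dist(row, query_row)) for row in context_rows]
    # weighted vote over the in-context labels
    score = sum(w * y for w, y in zip(weights, context_labels)) / sum(weights)
    return 1 if score >= 0.5 else 0

# context = labeled rows passed at inference time, not a training set
context_X = [[0.1, 0.2], [0.9, 0.8], [0.2, 0.1], [0.8, 0.9]]
context_y = [0, 1, 0, 1]
print(icl_predict(context_X, context_y, [0.85, 0.9]))  # -> 1
```

The point of the sketch is the calling convention: labeled examples and the query arrive together, and the answer comes back without any weight update in between.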

The three models that do ICL for structured data

As of 2026, three models can perform genuine in-context learning on structured data. They differ significantly in what kind of structured data they handle and how they were built.

| Model | Developer | Data type | Pre-training data | Max input size | Handles relational data |
|---|---|---|---|---|---|
| TabPFN | University of Freiburg | Single flat tables | Millions of synthetic tabular datasets | ~10,000 rows, ~100 features | No (single table only) |
| KumoRFM | Kumo.ai | Relational databases (multiple connected tables) | Tens of thousands of real-world relational databases | Enterprise-scale relational graphs | Yes (multiple tables with foreign keys) |
| NICL (Neuralk) | Neuralk | Single flat tables | Commerce and marketing datasets | Moderate (single table) | No (single table only) |

Three ICL models for structured data. TabPFN and NICL handle single flat tables. KumoRFM is the only model that handles relational databases with multiple connected tables.

TabPFN: in-context learning on flat tables

TabPFN was the first model to demonstrate that in-context learning works for tabular data. Developed at the University of Freiburg, it was trained on millions of synthetic classification datasets generated by sampling from a prior over data-generating processes.

The result: you feed TabPFN a new flat table (features and labels for training rows, features only for test rows), and it returns class predictions in a single forward pass. No hyperparameter tuning. No model selection. On small to medium datasets (under 10,000 rows), TabPFN matches or beats tuned XGBoost and random forest. That is a genuine achievement.

The limitations are clear: TabPFN works on single flat tables only. Enterprise data does not live in single flat tables. If your churn prediction requires joining customer, order, support ticket, and usage tables, you must flatten them yourself before TabPFN can use them. That flattening step is exactly the feature engineering bottleneck that ICL was supposed to eliminate.
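To see what that flattening step costs, here is a minimal sketch (illustrative table and column names) of collapsing a two-table schema into the single flat table a flat-table model expects. Every aggregate is a hand-made choice, and any cross-table signal not encoded this way is lost.

```python
from collections import defaultdict
from statistics import mean

def flatten(customers, orders):
    """Manually collapse a customers/orders schema into one flat table.
    The aggregates (count, mean) are chosen by hand -- this is exactly
    the feature engineering step that relational ICL removes."""
    by_customer = defaultdict(list)
    for o in orders:
        by_customer[o["customer_id"]].append(o["amount"])
    flat = []
    for c in customers:
        amounts = by_customer.get(c["id"], [])
        flat.append({
            "customer_id": c["id"],
            "plan": c["plan"],
            "order_count": len(amounts),
            "avg_order_amount": mean(amounts) if amounts else 0.0,
        })
    return flat

customers = [{"id": 1, "plan": "pro"}, {"id": 2, "plan": "free"}]
orders = [
    {"customer_id": 1, "amount": 30.0},
    {"customer_id": 1, "amount": 50.0},
]
print(flatten(customers, orders))
```

Multiply this by every source table, every candidate aggregate, and every temporal window, and the 878-lines-per-task figure from the benchmarks below becomes plausible.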

KumoRFM: in-context learning on relational data

KumoRFM extends in-context learning to relational databases. Instead of requiring a pre-flattened table, it takes multiple connected tables as input, understands their schema and foreign key relationships, and makes predictions that incorporate cross-table patterns.

This is a harder problem than flat-table ICL by a significant margin. A flat table is a fixed-size matrix. A relational database is a variable-structure graph: different numbers of tables, different schemas per table, different cardinalities in relationships (one customer has 3 orders, another has 300), temporal ordering across tables. KumoRFM handles this by representing relational data as a heterogeneous graph and using graph neural network architectures designed for variable-structure inputs.
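A minimal sketch of that graph construction, with illustrative table names and a simplified foreign-key format; KumoRFM's actual internal representation is more elaborate, but the idea is the same: each row becomes a typed node, each foreign-key match a typed edge.

```python
def build_hetero_graph(tables, foreign_keys):
    """Turn relational tables into a heterogeneous graph.
    tables: {table_name: [row dicts with an "id" column]}
    foreign_keys: [(child_table, fk_column, parent_table)]
    Simplified, illustrative format -- not KumoRFM's internal one."""
    nodes = {(t, row["id"]) for t, rows in tables.items() for row in rows}
    edges = []
    for child, fk_col, parent in foreign_keys:
        for row in tables[child]:
            edge = ((child, row["id"]), (parent, row[fk_col]))
            if edge[1] in nodes:  # only link to rows that exist
                edges.append(edge)
    return nodes, edges

tables = {
    "customers": [{"id": 1}, {"id": 2}],
    "orders": [{"id": 10, "customer_id": 1}, {"id": 11, "customer_id": 1}],
}
nodes, edges = build_hetero_graph(tables, [("orders", "customer_id", "customers")])
print(len(nodes), len(edges))  # 4 nodes, 2 edges
```

Note how the structure is variable by design: customer 1 ends up with two order edges and customer 2 with none, which is exactly the variable-cardinality input a graph neural network can consume and a fixed-size matrix cannot.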

The practical impact: you point KumoRFM at your Snowflake or data warehouse, write a PQL query like PREDICT churned_30d FOR EACH customers.customer_id, and get predictions in seconds. No joins. No feature engineering. No training. The model reads your relational structure directly.
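Other prediction targets follow the same declarative pattern. The queries below are illustrative only, with hypothetical column names; consult Kumo's PQL reference for the exact target and aggregation syntax.

```
PREDICT churned_30d FOR EACH customers.customer_id
PREDICT SUM(orders.amount, 0, 30, days) FOR EACH customers.customer_id
PREDICT COUNT(support_tickets.*, 0, 7, days) > 2 FOR EACH customers.customer_id
```

In each case the query names the target and the entity; the model works out which connected tables and time windows are relevant.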

NICL (Neuralk): in-context learning for commerce

NICL takes a more specialized approach. Rather than targeting general tabular or relational data, it focuses on commerce and marketing prediction tasks: purchase propensity, customer segmentation, campaign response prediction. It operates on single flat tables, similar to TabPFN, but is optimized for the feature distributions and patterns common in commerce data.

The tradeoff is scope. NICL may outperform TabPFN on commerce tasks specifically, but it does not generalize to arbitrary tabular problems or handle relational data.

Why relational data is the hard frontier

The jump from flat-table ICL to relational ICL is not incremental. It is a structurally different problem. Here is why:

  1. Variable structure. A flat table has a fixed schema: N rows by M columns. Every dataset has the same shape (a matrix). A relational database has a variable number of tables, each with different schemas, connected by different foreign key patterns. The model must handle any relational structure, not just matrices.
  2. Variable cardinality. In a relational database, one customer might have 3 orders and another might have 3,000. One product might have 10 reviews and another might have 10,000. The model must aggregate variable-length relationships without losing signal.
  3. Multi-hop patterns. The most predictive patterns in relational data often span multiple tables. A customer who churns might show declining order frequency (orders table), increasing support tickets (tickets table), and decreasing product diversity (order items table). These cross-table signals require the model to reason across 3-4+ table hops.
  4. Temporal ordering. Relational data has timestamps scattered across multiple tables. The model must respect temporal causality: a churn prediction at time T can only use data from before time T, even when that data is spread across 5 different tables with different temporal granularities.
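Points 2 and 4 can be illustrated in a few lines of Python (field names are illustrative): aggregating a variable-length event history while excluding anything at or after the prediction time, so the features for a prediction at time T never leak future information.

```python
def features_before(events, customer_id, cutoff):
    """Aggregate a variable-length event history while respecting
    temporal causality: only events strictly before `cutoff` may
    inform a prediction at `cutoff`."""
    history = [e for e in events
               if e["customer_id"] == customer_id and e["ts"] < cutoff]
    return {
        "n_events": len(history),  # cardinality varies per customer
        "last_gap": cutoff - max((e["ts"] for e in history), default=cutoff),
    }

events = [
    {"customer_id": 1, "ts": 5},
    {"customer_id": 1, "ts": 12},
    {"customer_id": 1, "ts": 20},  # after the cutoff: must be excluded
]
print(features_before(events, 1, 15))  # {'n_events': 2, 'last_gap': 3}
```

A hand-built pipeline has to get this cutoff logic right for every table and every temporal granularity; an ICL model pre-trained on temporal relational data applies it uniformly.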

This is why TabPFN and NICL stop at flat tables. Handling relational structure requires a different architecture (graph neural networks vs transformers on matrices) and a different pre-training strategy (real relational databases vs synthetic tables).

| Dimension | Flat-table ICL (TabPFN, NICL) | Relational ICL (KumoRFM) |
|---|---|---|
| Input format | Single table (N rows x M columns) | Multiple connected tables with foreign keys |
| Pre-training data | Synthetic tables (TabPFN) or domain-specific tables (NICL) | Tens of thousands of real-world relational databases |
| Feature engineering required | Must flatten relational data into one table first | None. Reads relational structure directly. |
| Cross-table patterns | Cannot discover (only sees one table) | Discovers automatically across all connected tables |
| Max input size | ~10K rows, ~100 features (TabPFN) | Enterprise-scale relational databases |
| Enterprise readiness | Research stage. Works on small datasets. | Production. Deployed at Fortune 500 companies. |
| Task types | Classification (TabPFN). Commerce classification (NICL). | Classification, regression, ranking, recommendation across any relational domain. |

Flat-table ICL handles the simple case well. Relational ICL handles the case that actually matters for enterprises, where data lives across multiple connected tables.

The benchmark evidence

In-context learning is not just faster. On relational data, it is actually more accurate than traditional approaches. This is counterintuitive until you understand why: manual feature engineering on relational data typically captures only a fraction of the available cross-table signal. A foundation model pre-trained on thousands of relational databases discovers patterns that human engineers miss.

| Approach | AUROC | Time to prediction | Feature engineering |
|---|---|---|---|
| LightGBM + manual features | 62.44 | 12.3 hours per task | 878 lines of code per task |
| KumoRFM zero-shot (ICL) | 76.71 | ~1 second | None |
| KumoRFM fine-tuned | 81.14 | Minutes | None |

RelBench benchmark across 7 databases and 30 prediction tasks. KumoRFM zero-shot (pure ICL, no fine-tuning) outperforms manually engineered LightGBM by 14+ AUROC points. Fine-tuning adds another 4.4 points.

On the SAP SALT enterprise benchmark:

| Approach | Accuracy | Uses ICL |
|---|---|---|
| LLM + AutoML | 63% | No (trains from scratch) |
| PhD Data Scientist + XGBoost | 75% | No (trains from scratch with manual features) |
| KumoRFM (zero-shot) | 91% | Yes (in-context learning on relational data) |

SAP SALT benchmark. KumoRFM's in-context learning on relational data outperforms both traditional ML and LLM-assisted approaches by wide margins.

Why this matters for enterprise ML teams

The traditional enterprise ML pipeline looks like this: a business team identifies a prediction need (churn, fraud, demand forecast). A data science team spends 2-4 weeks on feature engineering. They spend another 2-4 weeks on model training, tuning, and validation. Deployment takes another 2-4 weeks. Total: 6-12 weeks from request to production prediction.

With in-context learning on relational data, that timeline collapses. You write a PQL query. The model returns predictions in seconds. If you want to fine-tune for maximum accuracy, that takes minutes to hours, not weeks. A prediction that used to require a quarter of data science time now takes an afternoon.

This is not just a speed improvement. It changes what is economically viable. Use cases that were too small to justify a 6-week pipeline (predicting churn for a specific product line, scoring fraud risk for a new market, forecasting demand for a seasonal category) become feasible when the cost drops from weeks to seconds.

Traditional ML pipeline (per prediction task)

  • Identify prediction target and relevant tables (1-2 weeks)
  • Join tables, engineer features, handle temporal windowing (2-4 weeks, 878 lines of code)
  • Select model, tune hyperparameters, train (1-2 weeks)
  • Validate, test, deploy to production (2-4 weeks)
  • Total: 6-12 weeks, 3-4 data scientists
  • Repeat from scratch for every new prediction task

In-context learning with KumoRFM

  • Connect to data warehouse (one-time setup)
  • Write PQL: PREDICT churned_30d FOR EACH customers.customer_id
  • Model reads relational tables, returns predictions in seconds
  • Optional: fine-tune for maximum accuracy (minutes to hours)
  • Total: minutes to hours, 1 ML engineer or analyst
  • New prediction tasks take the same amount of time

PQL Query

```
PREDICT churned_30d
FOR EACH customers.customer_id
```

One PQL query triggers in-context learning on your full relational database. KumoRFM reads customers, orders, support tickets, usage logs, and any other connected tables. It discovers cross-table churn patterns and returns predictions without any training, feature engineering, or pipeline code.

Output

| customer_id | churn_probability | key_signals |
|---|---|---|
| CUST-2201 | 0.89 | Order frequency down 70% (orders table), 4 support tickets in 14 days (tickets table), usage down 55% (usage table) |
| CUST-2202 | 0.14 | Stable order cadence, recent product expansion, no support escalations |
| CUST-2203 | 0.76 | Payment failures (billing table), reduced login frequency (sessions table), competitor product in tech stack (integrations table) |
| CUST-2204 | 0.08 | Increasing usage, active API calls, recent feature adoption |

When to use ICL vs fine-tuning vs traditional ML

In-context learning is not always the right answer. Here is a practical decision framework:

| Scenario | Best approach | Why |
|---|---|---|
| Need predictions fast (hours, not weeks) | ICL (KumoRFM zero-shot) | Predictions in seconds. No pipeline to build. |
| Exploring whether a prediction task is feasible | ICL (KumoRFM zero-shot or TabPFN) | Get a baseline prediction in minutes to validate the use case before investing in a full pipeline. |
| Production use case, maximum accuracy needed | Fine-tuned foundation model (KumoRFM fine-tuned) | Fine-tuning adds 3-5 AUROC points over zero-shot. Still faster than traditional ML. |
| Small, flat dataset with clean labels | TabPFN or tuned XGBoost | Both work well on small flat tables. TabPFN is faster. XGBoost may be slightly more accurate with tuning. |
| Relational data (multiple connected tables) | KumoRFM (zero-shot or fine-tuned) | Only option that reads relational structure natively. Flat-table models require manual flattening that loses signal. |
| Highly regulated domain requiring full explainability | Traditional ML (XGBoost + SHAP) | ICL models have improving but still limited explainability. If regulators require feature-level explanations, traditional models have an edge. |

Decision framework for ICL vs alternatives. ICL wins on speed and relational data. Traditional ML wins when you have a simple flat dataset and weeks to tune.

Frequently asked questions

What is in-context learning for structured data?

In-context learning (ICL) for structured data means a pre-trained model can make predictions on a new dataset without any training or fine-tuning. You pass the dataset as context, and the model recognizes patterns from its pre-training to generate predictions in a single forward pass. No gradient updates, no training loop, no hyperparameter tuning. It is the same concept as how GPT can answer questions about a text passage it has never seen, but applied to tables and relational databases instead of text.

How does in-context learning differ from traditional machine learning on tabular data?

Traditional ML on tabular data requires a full pipeline for each new dataset: feature engineering, model selection, hyperparameter tuning, training, and evaluation. This takes 6-12 weeks per prediction task and requires data science expertise. In-context learning skips the entire pipeline. A pre-trained model takes your dataset as input and returns predictions in seconds. The tradeoff: ICL models may sacrifice a few accuracy points compared to a fully tuned traditional model, but the time savings are orders of magnitude. And for relational data, KumoRFM with ICL actually outperforms traditional approaches because it captures cross-table patterns that manual feature engineering typically misses.

What is TabPFN and how does it work?

TabPFN (Tabular Prior-Data Fitted Network) is a model from the University of Freiburg that performs in-context learning on single flat tables. It was trained on millions of synthetic classification datasets and can make predictions on new tabular datasets without training. You feed it the table (features and labels), and it returns predictions in a single forward pass. TabPFN works well on small to medium single-table datasets (up to around 10,000 rows and 100 features). It does not handle relational data (multiple connected tables), large datasets, or regression tasks natively.

What is KumoRFM and how does it do in-context learning?

KumoRFM is a relational foundation model built by Kumo.ai. It is pre-trained on tens of thousands of real-world relational databases (multiple connected tables, not just single flat tables). For in-context learning, you point KumoRFM at a new relational database, write a PQL query describing what to predict, and the model returns scored predictions without any training or fine-tuning. KumoRFM is the only ICL model that works on relational data, meaning it reads the connections between tables (customers linked to orders linked to products linked to reviews) and discovers cross-table patterns automatically.

What is NICL (Neuralk) and how does it compare to TabPFN?

NICL (from Neuralk) is an in-context learning model focused on single-table classification tasks for commerce and marketing use cases. Like TabPFN, it operates on flat tables and does not handle relational data. NICL differentiates by being optimized for commerce-specific prediction patterns (purchase propensity, customer segmentation) rather than general tabular classification. It is a more specialized tool compared to TabPFN's general-purpose approach.

Why is in-context learning on relational data harder than on flat tables?

A single flat table has a fixed schema: N rows and M columns. You can represent the entire dataset as a matrix and feed it to a transformer. Relational data is structurally different. You have multiple tables with different schemas, connected by foreign keys, with variable-length relationships (one customer has 3 orders, another has 300). The model must understand table schemas, join relationships, cardinality, temporal ordering across tables, and multi-hop patterns. This is why TabPFN works on flat tables but not relational databases. KumoRFM solves this by representing relational data as a heterogeneous graph and using graph neural network architectures that can handle variable-structure relational inputs.

Can in-context learning replace traditional ML for enterprise predictions?

For many enterprise prediction tasks, yes. KumoRFM zero-shot (pure in-context learning, no fine-tuning) achieves 76.71 AUROC on the RelBench benchmark across 30 prediction tasks, outperforming LightGBM with manual feature engineering at 62.44 AUROC. Fine-tuning KumoRFM pushes this to 81.14. For tasks where you need maximum accuracy and have the time to invest, fine-tuning a foundation model still wins. But for speed to first prediction, rapid prototyping, or use cases where you cannot wait 6-12 weeks for a traditional pipeline, ICL is already better than the traditional alternative.

What are the limitations of in-context learning for structured data?

Current limitations include: (1) Context window size: TabPFN is limited to roughly 10,000 rows and 100 features per prediction. KumoRFM handles much larger relational databases but still has practical limits on graph size. (2) Accuracy ceiling: for any single dataset, a carefully tuned traditional model may achieve 2-5% higher accuracy than zero-shot ICL. Fine-tuning closes most of this gap. (3) Domain specificity: ICL models trained primarily on structured data may not capture highly domain-specific patterns (like medical time series or sensor data) as well as specialized models. (4) Explainability: understanding why an ICL model made a specific prediction is harder than interpreting a decision tree or SHAP values on XGBoost.

See it in action

KumoRFM delivers predictions on relational data in seconds. No feature engineering, no ML pipelines. Try it free.