When DoorDash wants to decide which restaurants to show you at 6 PM on a Tuesday, it is making a prediction. When a bank decides whether to approve a wire transfer in real time, it is making a prediction. When Snowflake identifies which free-tier users are likely to convert to paid plans, it is making a prediction.
AI prediction is the engine behind these decisions. It takes historical data, finds patterns, and extrapolates them forward. The concept is simple. The execution is where things get complicated, because the quality of the prediction depends entirely on how much context the model can consume. And most models are starving.
The four types of prediction
Nearly every business prediction falls into one of four categories. The boundaries are fuzzy, but the business applications are distinct.
Classification
Will this event happen or not? Is this transaction fraud? Will this customer churn in the next 30 days? Will this patient be readmitted within 90 days? The output is a probability between 0 and 1, and the business sets a threshold for action. Classification is the most common prediction type in enterprise ML.
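As a minimal sketch of that threshold step (the function name and the 0.7 cutoff are illustrative, not from any specific system):

```python
# Minimal sketch: turning a classification model's probability output
# into a business action. The 0.7 threshold is illustrative.
def route_transaction(fraud_probability: float, threshold: float = 0.7) -> str:
    """Map a predicted fraud probability to an action."""
    if fraud_probability >= threshold:
        return "hold_for_review"
    return "approve"

print(route_transaction(0.84))  # hold_for_review
print(route_transaction(0.06))  # approve
```

In practice the threshold is tuned against the relative cost of false positives and false negatives, not hard-coded.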
Regression
How much? What will this customer's lifetime value be? What is the expected loss on this loan? What will revenue be next quarter? The output is a continuous number. Regression models are the backbone of financial planning, pricing, and risk management.
Recommendation
What should we show this user? Product recommendations, content recommendations, ad targeting, next-best-action in sales. The output is a ranked list of entities, ordered by predicted relevance or engagement probability. Recommendation drives the majority of revenue at companies like Amazon, Netflix, and Spotify.
Time-series forecasting
What will the future look like over time? Demand forecasting for inventory planning, capacity planning for cloud infrastructure, workload prediction for staffing. The output is a sequence of predicted values across future time steps, often with confidence intervals.
Here is a single database that demonstrates all four prediction types. The same relational structure supports different questions.
All four prediction types from one database
| prediction_type | question | PQL_query | output |
|---|---|---|---|
| Classification | Will PH-304 file a fraudulent claim? | PREDICT claims.status = 'Fraudulent' FOR EACH policyholders | 0.84 (fraud probability) |
| Regression | What will PH-301's total claims cost next year? | PREDICT SUM(claims.amount, 0, 365) FOR EACH policyholders | $22,700 (dollar amount) |
| Recommendation | Which risk mitigation products should we offer PH-303? | PREDICT products.relevance FOR EACH policyholders, products | Ranked list of products |
| Forecasting | How many auto claims will we see in Q1? | PREDICT COUNT(claims.*, 0, 90) FOR EACH claim_types | 847 claims (with confidence interval) |
Same database (policyholders, claims, adjusters from the tables below), four different prediction types. A foundation model handles all four without separate pipelines.
To make this concrete, here is what AI prediction looks like on insurance claims data. The signal that determines claim outcomes spans multiple tables.
policyholders
| policyholder_id | name | policy_type | premium | tenure |
|---|---|---|---|---|
| PH-301 | Andrea Collins | Auto + Home | $2,840/yr | 7 years |
| PH-302 | James Okafor | Auto | $1,420/yr | 2 years |
| PH-303 | Mei-Lin Chang | Home | $1,950/yr | 11 years |
| PH-304 | Derek Simmons | Auto | $2,100/yr | 4 years |
claims
| claim_id | policyholder_id | type | amount | date | status |
|---|---|---|---|---|---|
| CLM-501 | PH-301 | Auto collision | $8,200 | 2025-03-14 | Paid |
| CLM-502 | PH-301 | Home water damage | $14,500 | 2025-09-02 | Under review |
| CLM-503 | PH-302 | Auto theft | $22,000 | 2025-10-18 | Under review |
| CLM-504 | PH-304 | Auto collision | $4,100 | 2025-06-22 | Paid |
| CLM-505 | PH-304 | Auto collision | $6,800 | 2025-08-30 | Paid |
| CLM-506 | PH-304 | Auto collision | $9,200 | 2025-11-05 | Under review |
Highlighted: Derek has 3 collision claims in 5 months with escalating amounts ($4.1K, $6.8K, $9.2K). This temporal pattern is a strong fraud signal, but a flat feature table only shows 'claim_count = 3'.
adjusters
| adjuster_id | claim_id | assessment | payout_ratio | days_to_close |
|---|---|---|---|---|
| ADJ-01 | CLM-501 | Legitimate | 100% | 12 |
| ADJ-02 | CLM-504 | Legitimate | 100% | 8 |
| ADJ-03 | CLM-505 | Legitimate | 95% | 15 |
| ADJ-04 | CLM-506 | Pending investigation | --- | --- |
The payout ratio on Derek's second claim was already reduced to 95%. His third is pending investigation. The model needs to see the claims-adjusters relationship to predict fraud probability.
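To make the cross-table signal concrete, here is a small pandas sketch over the toy claims table above. The fraud heuristic is illustrative, not how a production model scores risk:

```python
import pandas as pd

# The toy claims table from the example above.
claims = pd.DataFrame({
    "claim_id": ["CLM-501", "CLM-502", "CLM-503",
                 "CLM-504", "CLM-505", "CLM-506"],
    "policyholder_id": ["PH-301", "PH-301", "PH-302",
                        "PH-304", "PH-304", "PH-304"],
    "amount": [8200, 14500, 22000, 4100, 6800, 9200],
    "date": pd.to_datetime(["2025-03-14", "2025-09-02", "2025-10-18",
                            "2025-06-22", "2025-08-30", "2025-11-05"]),
})

# Per-policyholder temporal signals that a flat claim_count hides.
ordered = claims.sort_values("date")
grouped = ordered.groupby("policyholder_id")
signals = pd.DataFrame({
    "claim_count": grouped.size(),
    "span_days": (grouped["date"].max() - grouped["date"].min()).dt.days,
    "escalating": grouped["amount"]
        .apply(lambda s: len(s) > 1 and s.is_monotonic_increasing)
        .astype(bool),
})

# Illustrative red flag: three or more strictly rising claims in a short span.
flag = (signals["claim_count"] >= 3) \
       & (signals["span_days"] < 180) \
       & signals["escalating"]
print(flag[flag].index.tolist())  # ['PH-304']
```

The point is not the heuristic itself but that the flat feature claim_count = 3 cannot distinguish Derek from a policyholder with three unrelated claims spread over a decade.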
The context problem
The accuracy of any prediction is bounded by the information available to the model. This sounds obvious. In practice, it is the single biggest constraint on prediction quality in enterprise ML.
Consider a churn prediction for a SaaS product. The data lives across multiple tables: user accounts, login events, feature usage, support tickets, billing history, team memberships, contract terms. A thorough churn model needs signals from all of these tables: declining login frequency, reduced feature adoption, increasing support volume, approaching contract renewal.
But traditional ML models cannot read multiple tables. They need a single flat feature matrix: one row per user, one column per feature. To get there, a data scientist writes SQL joins and aggregations, compressing rich relational data into a handful of numbers: `avg_logins_30d`, `support_tickets_90d`, `days_until_renewal`.
This flattening destroys three categories of signal.
- Cross-table relationships. The fact that 3 of 5 users on the same team have already churned is a strong signal. But it requires traversing customer → team → other customers, a multi-hop path that no standard aggregation captures.
- Temporal sequences. A count of "5 support tickets in 30 days" does not distinguish between steady complaints and a sudden spike after a product update. The sequence matters.
- Combinatorial interactions. The interaction between declining usage and an approaching renewal and unresolved tickets is more predictive than any single feature. But engineering interaction features manually is combinatorially explosive.
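The temporal-sequence point fits in a few lines of pandas (the dates are invented for illustration): two users share the identical flat feature, five tickets in 30 days, but have very different histories.

```python
import pandas as pd

# Two users, both with 5 support tickets in the last 30 days.
steady = pd.to_datetime(["2025-11-02", "2025-11-08", "2025-11-14",
                         "2025-11-20", "2025-11-26"])
spike = pd.to_datetime(["2025-11-24", "2025-11-25", "2025-11-25",
                        "2025-11-26", "2025-11-26"])

# A flat count feature sees no difference:
assert len(steady) == len(spike) == 5

# The sequence does: mean gap between consecutive tickets.
def mean_gap_days(dates):
    return pd.Series(dates).diff().dt.days.dropna().mean()

print(mean_gap_days(steady))  # 6.0  -> steady drumbeat
print(mean_gap_days(spike))   # 0.5  -> sudden burst, e.g. after a release
```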
Why the bottleneck exists
The gap between "data exists in the database" and "model can use the data" is feature engineering. This step is commonly estimated to consume 80% of the time in a typical ML project. A Stanford study quantified it: 12.3 hours and 878 lines of code per prediction task, even for experienced data scientists with full access to the data.
The time cost is not the worst part. The worst part is the signal loss. A human data scientist exploring a 10-table database will test perhaps 100 to 200 feature combinations. The total feature space (tables × columns × aggregation functions × time windows) can easily exceed 10,000 possibilities. The model is trained on roughly 2% of the available signal.
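The 2% figure follows from simple arithmetic; here is the back-of-envelope calculation with the round numbers from the text (all counts are illustrative):

```python
# Back-of-envelope size of the candidate feature space.
tables = 10
columns_per_table = 10   # illustrative assumption
agg_functions = 10       # sum, mean, count, max, ...
time_windows = 10        # 7d, 30d, 90d, ...

candidates = tables * columns_per_table * agg_functions * time_windows
explored = 200           # upper end of manual exploration

print(candidates)                      # 10000
print(f"{explored / candidates:.0%}")  # 2%
```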
This is why adding more data to a traditional ML pipeline often does not help. The data is there, in the database. But the pipeline cannot consume it. The model is limited not by data availability, but by the human capacity to transform that data into features.
Traditional prediction pipeline
- Flatten 10+ tables into one row per entity
- Human selects 100-200 features from 10,000+ candidates
- 12.3 hours and 878 lines of code per task
- Model sees 2% of available signal
- New prediction task = new pipeline from scratch
Foundation model prediction
- Model reads all tables directly as a graph
- Model explores full feature space automatically
- 1 second, 1 line of PQL per task
- Model sees 100% of relational structure
- New prediction task = new query, same model
PQL Query
PREDICT claims.status = 'Fraudulent' FOR EACH policyholders.policyholder_id
The model reads policyholders, claims, and adjusters as a graph. It discovers that Derek's escalating claim amounts, decreasing adjuster payout ratios, and 3-claim-in-5-months cadence produce a high fraud signal.
Output
| policyholder_id | fraud_probability | top_signal |
|---|---|---|
| PH-301 | 0.06 | Long tenure, first multi-policy claims |
| PH-302 | 0.38 | Auto theft on short-tenure policy |
| PH-303 | 0.02 | 11-year tenure, no claims history |
| PH-304 | 0.84 | Escalating collision amounts, 3 in 5 months |
The foundation model approach
The insight behind relational foundation models is that the prediction bottleneck is not the model. It is the data transformation. If you eliminate the transformation, the prediction becomes trivial.
KumoRFM represents your database as a temporal heterogeneous graph. Rows become nodes. Foreign keys become edges. Timestamps establish ordering. The model traverses this graph to find predictive patterns, including multi-hop relationships, temporal sequences, and structural signatures that no human would enumerate.
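A minimal sketch of that representation, using plain dictionaries over the toy insurance tables (a production system would use a typed, timestamped graph structure; the edge names here are invented for illustration):

```python
# Rows become nodes, foreign keys become edges.
claims = [
    {"claim_id": "CLM-504", "policyholder_id": "PH-304", "date": "2025-06-22"},
    {"claim_id": "CLM-505", "policyholder_id": "PH-304", "date": "2025-08-30"},
    {"claim_id": "CLM-506", "policyholder_id": "PH-304", "date": "2025-11-05"},
]
adjusters = [
    {"adjuster_id": "ADJ-02", "claim_id": "CLM-504"},
    {"adjuster_id": "ADJ-03", "claim_id": "CLM-505"},
]

edges = []  # (source_node, edge_type, target_node)
for row in claims:
    edges.append((row["claim_id"], "filed_by", row["policyholder_id"]))
for row in adjusters:
    edges.append((row["adjuster_id"], "assessed", row["claim_id"]))

# A two-hop traversal: policyholder -> claims -> adjusters.
claims_of = [s for s, e, t in edges if e == "filed_by" and t == "PH-304"]
adjusters_of = [s for s, e, t in edges if e == "assessed" and t in claims_of]
print(claims_of)     # ['CLM-504', 'CLM-505', 'CLM-506']
print(adjusters_of)  # ['ADJ-02', 'ADJ-03']
```

The foreign key policyholder_id becomes the filed_by edge; traversing it and then the assessed edge reaches information (adjuster assessments) that no single flattened feature row contains.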
Because the model is pre-trained on billions of relational patterns across thousands of databases, it already understands the universal dynamics that recur in business data: recency effects, frequency patterns, seasonal cycles, network propagation. It does not need to learn these from scratch on your data. It recognizes them.
The interface is a query, not a pipeline. "For each customer, what is the probability of churn in the next 30 days?" The model reads your schema, builds the graph, traverses it, and returns predictions with cell-level explanations. One query. One second. No feature engineering. No training pipeline.
Real-world results
These claims are backed by production deployments at scale.
DoorDash deployed relational predictions across 30 million users for restaurant and content recommendations. The result: a 1.8% engagement lift over their existing recommendation system, which was already highly optimized. At DoorDash's scale, 1.8% translates to millions of additional orders per quarter.
Snowflake used the same approach to predict which free-tier users would convert to paid plans and which existing customers would expand their usage. The result: a 3.2x expansion revenue lift by targeting the right accounts with the right timing.
Reddit applied relational predictions to content recommendations, leveraging the full graph of users, communities, posts, comments, and interactions. The model found engagement patterns in multi-hop paths (user → community → post → commenters → other communities) that their previous system could not express.
Databricks measured a 5.4x conversion lift using relational predictions for their sales pipeline, identifying which trial users were most likely to convert based on usage patterns, team dynamics, and integration activity.
On the RelBench benchmark
Production case studies are compelling but hard to reproduce. That is why the RelBench benchmark exists: 7 databases, 30 prediction tasks, 103 million+ rows, temporal train/test splits. On this benchmark:
- LightGBM with manual features: 62.44 average AUROC on classification
- LLM on serialized tables (Llama 3.2 3B): 68.06 AUROC
- Task-specific GNN: 75.83 AUROC
- KumoRFM zero-shot: 76.71 AUROC
- KumoRFM fine-tuned: 81.14 AUROC
The pattern is clear. The more relational context a model can consume, the better its predictions. LightGBM sees a flat table. The LLM sees serialized text. The GNN sees the graph structure. KumoRFM sees the graph structure plus universal relational patterns learned from pre-training. Each step up in context produces a measurable jump in accuracy.
What this means for prediction strategy
If your organization treats AI prediction as a pipeline-building exercise, you are leaving accuracy and speed on the table. The pipeline approach caps your model's performance at whatever signal a human can manually extract from the database. The foundation model approach removes that cap.
The practical implication is speed. When a business stakeholder asks "can we predict X?", the answer should not be "let me scope a 3-month project." It should be "let me run that query." Every prediction task that takes months to deliver is a decision that was made without data for months. That cost is invisible but real.
The data for better predictions already exists in your database. The question is whether your prediction infrastructure can actually use it.