The delivery prediction challenge
Platforms like DoorDash sit on top of one of the most complex prediction problems in consumer technology. Every order triggers a cascade of interdependent predictions: How long will the restaurant take to prepare this order? Which driver is best positioned to pick it up? How long will the delivery take given current traffic, weather, and zone congestion? Will this customer order again this week?
These predictions are not independent. A driver's delivery time depends on the restaurant's prep speed. Restaurant prep speed depends on current order volume. Order volume depends on zone-level demand. Zone-level demand depends on time of day, weather, and local events. Everything connects to everything.
The data that drives these predictions lives across 10 or more connected tables: drivers, orders, restaurants, customers, zones, menus, ratings, promotions, weather feeds, and event calendars. Getting accurate predictions requires understanding the relationships between these tables, not just the data within them.
The headline result: SAP SALT benchmark
The SAP SALT benchmark is an enterprise-grade evaluation where real business analysts and data scientists attempt prediction tasks on SAP enterprise data. It measures how accurately different approaches predict real business outcomes on production-quality enterprise databases with multiple related tables.
SAP SALT enterprise benchmark
| approach | accuracy | what_it_means |
|---|---|---|
| LLM + AutoML | 63% | Language model generates features, AutoML selects model |
| PhD Data Scientist + XGBoost | 75% | Expert spends weeks hand-crafting features, tunes XGBoost |
| KumoRFM (zero-shot) | 91% | No feature engineering, no training, reads relational tables directly |
SAP SALT benchmark: KumoRFM outperforms expert data scientists by 16 percentage points and LLM+AutoML by 28 percentage points on real enterprise prediction tasks.
KumoRFM scores 91% where PhD-level data scientists with weeks of feature engineering and hand-tuned XGBoost score 75%. The 16-percentage-point gap is the value of reading relational data natively instead of flattening it into a single table.
Why traditional ML plateaus
The standard approach to delivery prediction follows a familiar pattern: extract data from multiple tables, join and aggregate it into a flat feature table with one row per prediction, and train a gradient-boosted model (XGBoost, LightGBM) on that table.
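A minimal sketch of that flat-table pipeline in pandas. The table names, columns, and values below are illustrative placeholders, not a real delivery schema:

```python
import pandas as pd

# Synthetic rows standing in for the real tables (names are illustrative).
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "restaurant_id": [10, 10, 11, 11],
    "zone_id": [100, 100, 101, 101],
    "delivery_minutes": [25, 35, 40, 50],
})
restaurants = pd.DataFrame({
    "restaurant_id": [10, 11],
    "avg_prep_minutes": [12.0, 18.0],
})

# Join and aggregate into one flat row per prediction target.
features = orders.merge(restaurants, on="restaurant_id")
features["zone_avg_delivery"] = (
    features.groupby("zone_id")["delivery_minutes"].transform("mean")
)

# This flat table is what feeds a gradient-boosted model (XGBoost/LightGBM);
# every cross-table signal had to be joined and aggregated in by hand.
print(features[["order_id", "avg_prep_minutes", "zone_avg_delivery"]])
```

Each new signal means another manual join and aggregation like the `transform("mean")` above, which is exactly the bottleneck the rest of this section describes.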
This works, up to a point. A well-engineered flat-table model can capture obvious signals: average delivery time by zone, restaurant average prep time, driver average speed. But it misses the relational patterns that drive the most important variations:
- Driver-restaurant affinity. A specific driver may be 15% faster at restaurants in a specific zone because they know the parking, the pickup flow, and the fastest route out. This signal lives in the relationship between the driver table and the restaurant table, filtered by the zone table.
- Prep time by order complexity. A restaurant's average prep time is misleading. Prep time varies dramatically by order composition: number of items, menu category mix, and whether the order includes items that share cooking infrastructure. This requires joining orders, order items, and menu tables.
- Customer temporal patterns. A customer who always orders 20 minutes before their usual dinner time is highly predictable, but only if the model can see the customer's full order history with timestamps correlated against their profile.
- Zone-event correlations. Demand at zone X spikes when there is a sporting event at the nearby stadium. This pattern requires connecting zone data to external event data and learning the radius of impact.
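A toy calculation (synthetic numbers, not DoorDash data) makes the prep-time point concrete: a single restaurant-level average erases the order-complexity signal that conditioning on item count recovers:

```python
import pandas as pd

# Synthetic prep times for one restaurant, varying by item count.
orders = pd.DataFrame({
    "n_items":      [1, 1, 2, 2, 6, 6],
    "prep_minutes": [8, 10, 12, 14, 28, 30],
})

# The flat-table feature: one average for the whole restaurant.
overall_avg = orders["prep_minutes"].mean()

# The conditional pattern: prep time grouped by order complexity.
by_complexity = orders.groupby("n_items")["prep_minutes"].mean()

print(f"flat-table feature: {overall_avg:.1f} min")  # 17.0 min
print(by_complexity)  # 1 item: 9 min, 2 items: 13 min, 6 items: 29 min
```

The flat average (17 minutes) is wrong for every order in the sample; the grouped view shows a six-item order takes more than three times as long as a one-item order.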
The relational approach
Relational deep learning takes a fundamentally different approach. Instead of flattening the database into a single table, it connects all tables into a graph that preserves the full relational structure.
Every row in every table becomes a node: each driver, each order, each restaurant, each customer, each zone, each menu item. Every foreign key relationship becomes an edge: orders connect to restaurants, drivers connect to orders, customers connect to orders, restaurants connect to zones. Timestamps are preserved as temporal attributes, so the model knows when each relationship was active.
A graph neural network then processes this structure by passing messages along edges. Information flows from restaurants to their orders, from orders to their drivers, from drivers to their zones. After multiple layers of message passing, each node accumulates information from its full relational neighborhood.
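A stripped-down sketch of that message passing in pure Python. Mean aggregation stands in for a learned GNN layer, and the node names and single scalar feature are illustrative; a real relational model uses typed edges, learned weights, and vector embeddings:

```python
from collections import defaultdict

# Nodes carry a feature; edges follow foreign keys (order -> restaurant, etc.).
features = {"rest_A": 1.0, "order_1": 0.0, "order_2": 0.0, "driver_X": 0.0}
edges = [  # undirected for this sketch; a real model types each relation
    ("rest_A", "order_1"), ("rest_A", "order_2"),
    ("order_1", "driver_X"), ("order_2", "driver_X"),
]

def message_pass(features, edges):
    """One layer: each node averages its neighbors' features with its own."""
    neighbors = defaultdict(list)
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    return {
        node: (feat + sum(features[n] for n in neighbors[node]))
              / (1 + len(neighbors[node]))
        for node, feat in features.items()
    }

h1 = message_pass(features, edges)  # orders now carry restaurant signal
h2 = message_pass(h1, edges)        # the driver now sees the restaurant, two hops away
```

After one layer the driver node still knows nothing about the restaurant; after two layers the restaurant's signal has reached it through the shared orders, which is how multi-hop patterns like driver-restaurant affinity become learnable.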
The result is that the model automatically learns the cross-entity patterns that flat-table models miss:
- This driver is 15% faster at restaurants in this zone (driver → orders → restaurant → zone).
- This customer always orders 20 minutes before their usual dinner time (customer → orders → timestamps).
- Demand at zone X spikes when there is an event at the nearby stadium (zone → events → temporal patterns).
- This restaurant's prep time increases 40% when more than 3 items share the same cooking station (restaurant → orders → order items → menu items → cooking categories).
Results: 30% accuracy improvement
By applying relational deep learning to delivery predictions, DoorDash achieved a 30% accuracy improvement over their existing internal model. To put this in context: their internal model had been refined over years of iterative feature engineering by a world-class data science team. The 30% gain did not come from better tuning of the same features. It came from seeing data that the previous approach structurally could not access.
Equally significant was the time to value. What had previously taken 4-5 years of iterative feature engineering (hypothesize a new feature, compute it, test it, repeat) was achieved in months. The relational model automatically discovered cross-entity patterns that would have taken additional years to find manually.
Traditional ML vs. relational ML for delivery
| prediction_task | traditional_ML_signals | relational_ML_signals |
|---|---|---|
| Delivery time | Avg zone delivery time, distance, time of day | + Driver-restaurant affinity, route history, current zone congestion patterns |
| Driver availability | Driver shift schedule, current location | + Driver acceptance patterns by restaurant type, fatigue modeling from recent order sequence |
| Demand by zone | Historical hourly demand, day of week | + Event proximity, weather-demand correlation, promotional cascade effects across zones |
| Restaurant prep time | Restaurant avg prep time, order count | + Order complexity by menu mix, cooking station contention, prep time by order sequence position |
| Customer reorder | Days since last order, order frequency | + Menu preference shifts, response to promotions, restaurant closure impact on reorder |
Traditional ML captures single-table averages. Relational ML captures the cross-entity patterns that explain the variance traditional models miss.
Why years of feature engineering compressed into months
The traditional path to improving delivery predictions follows a slow, manual cycle. A data scientist hypothesizes that driver familiarity with a restaurant affects delivery time. They write SQL to join driver and order history, compute a familiarity score, add it to the feature table, retrain, and evaluate. If it helps, they move on to the next hypothesis. If not, they try a different formulation.
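One iteration of that cycle, sketched in pandas rather than SQL. The familiarity feature and table layout are illustrative, not DoorDash's actual schema:

```python
import pandas as pd

# Historical orders: each row links a driver to a restaurant pickup.
history = pd.DataFrame({
    "driver_id":     ["d1", "d1", "d1", "d2"],
    "restaurant_id": ["r9", "r9", "r9", "r9"],
})

# Hypothesized feature: how many times has this driver picked up from
# this restaurant before? Compute it, merge it into the feature table,
# retrain, evaluate -- then start over with the next hypothesis.
familiarity = (
    history.groupby(["driver_id", "restaurant_id"])
           .size()
           .rename("prior_pickups")
           .reset_index()
)
print(familiarity)
```

Every hypothesis costs a join, an aggregation, a retrain, and an evaluation; the relational approach described below explores such cross-table aggregations during training instead of one human-coded feature at a time.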
Each iteration takes days to weeks. Over 4-5 years, a team might explore a few hundred features out of the hundreds of thousands of possible cross-table combinations. The model improves incrementally, a few percentage points per year.
Relational deep learning bypasses this entire cycle. The graph neural network explores the full combinatorial space of cross-table patterns simultaneously. It does not need a human to hypothesize that driver-restaurant familiarity matters. It discovers it, along with thousands of other relational patterns, during training.
PQL Query
```
PREDICT delivery_time_minutes FOR EACH orders.order_id WHERE orders.status = 'pending'
```
One predictive query replaces the entire delivery time prediction pipeline. The model reads raw tables (drivers, orders, restaurants, customers, zones, menus) directly and returns predictions that incorporate cross-entity patterns a flat-table model would need years of feature engineering to approximate.
Output
| order_id | predicted_minutes | confidence | key_factors |
|---|---|---|---|
| ORD-44201 | 28 | 0.92 | Driver-restaurant familiarity, low zone congestion |
| ORD-44202 | 47 | 0.85 | Complex order (6 items, 3 cooking stations), new driver to zone |
| ORD-44203 | 22 | 0.94 | Regular customer route, restaurant in fast-prep phase |
| ORD-44204 | 55 | 0.78 | Event-driven zone surge, restaurant prep backlog detected |
What this means for delivery and logistics platforms
DoorDash's 30% improvement is not an outlier. It reflects a structural advantage of relational deep learning over flat-table approaches for any prediction that depends on interconnected entities. Delivery and logistics platforms are particularly well-suited because their data is inherently relational: every order connects a customer, a restaurant, a driver, a zone, and a time window.
The implications extend beyond delivery time prediction. The same relational approach applies to demand forecasting, driver dispatch optimization, dynamic pricing, customer retention, and restaurant quality scoring. Each of these problems depends on the same underlying relational structure, and each benefits from the same ability to see cross-entity patterns.
For platforms still relying on flat-table models with manually engineered features, the gap will only widen. Each additional year of manual feature engineering yields diminishing returns, because the easy features have already been built. Relational deep learning does not have this ceiling: it reads the full relational structure directly, so the more complex and interconnected the data, the larger the advantage.