Hospital readmissions cost the US healthcare system $26 billion annually. Of that, $17 billion is considered avoidable. Patients are discharged too early, without adequate follow-up plans, or without the interventions that would have prevented the complications that bring them back.
Every hospital knows this. Most run readmission prediction models. The standard approach: pull features from the patient's current admission (diagnosis, length of stay, procedures performed, age, comorbidities), train a logistic regression or gradient-boosted tree, and flag high-risk patients for care coordination.
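A minimal sketch of that flat-feature baseline, using synthetic data and illustrative feature names (not drawn from any real EHR):

```python
# Hedged sketch of the standard flat-feature readmission baseline.
# Features and labels are synthetic; column meanings are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# One row per admission: [n_comorbidities, length_of_stay, age, ed_visits_past_year]
X = rng.normal(size=(500, 4))
# Synthetic label loosely correlated with the features
y = (X @ np.array([0.8, 0.5, 0.3, 0.9]) + rng.normal(scale=0.5, size=500) > 0).astype(int)

model = LogisticRegression().fit(X, y)
risk = model.predict_proba(X)[:, 1]  # per-admission readmission probability
```

Flagging the top decile of `risk` for care coordination is the typical downstream step.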
These models achieve 0.65-0.70 AUROC. Better than random. Not good enough to meaningfully change outcomes. The problem is not the algorithm. The problem is the data representation. A patient's readmission risk depends on patterns that span 8-15 tables: the sequence of their lab values over the previous 72 hours, the interaction between their medications and their diagnoses, the historical outcomes of patients with similar clinical trajectories, and the capacity of the post-discharge care facilities available in their area.
No flat feature table captures this. The models that will transform healthcare prediction are the ones that learn from the full relational structure.
patient_encounter — sample clinical data
| patient_id | encounter_date | diagnosis (ICD-10) | procedure | provider | facility |
|---|---|---|---|---|---|
| PT-201 | 2025-01-10 | I50.9 Heart failure | Echocardiogram | Dr. R. Chen | Memorial Hospital |
| PT-201 | 2025-01-18 | I50.9 Heart failure | Diuretic dose increase | Dr. R. Chen | Memorial Hospital |
| PT-201 | 2025-01-25 | I50.9 + E87.6 Hypokalemia | K+ supplement added | Dr. R. Chen | Memorial Hospital |
| PT-201 | 2025-02-02 | I50.9 Heart failure | Discharge to SNF | Dr. R. Chen | Sunrise SNF |
| PT-202 | 2025-01-15 | J18.9 Pneumonia | Antibiotics IV | Dr. L. Park | Memorial Hospital |
Note the third row: adding a K+ supplement after a diuretic dose increase tells a clinical story of worsening heart failure management. A flat model sees 'heart failure' and 'hypokalemia' only as separate diagnoses.
readmission_risk_factors — flat vs graph model
| Risk Factor | Flat Model Captures? | Graph Model Captures? | Signal Strength |
|---|---|---|---|
| Primary diagnosis severity | Yes | Yes | Moderate |
| Number of comorbidities | Yes | Yes | Moderate |
| Lab value trajectory (rising creatinine) | No (aggregated) | Yes (temporal) | High |
| Medication sequence patterns | No (count only) | Yes (ordered) | High |
| Discharge facility readmission rate | No (separate table) | Yes (2-hop) | Very high |
| Similar patient outcomes | No (requires graph) | Yes (3-hop) | Very high |
| Provider-specific outcome patterns | No (separate table) | Yes (2-hop) | High |
The highest-signal risk factors require multi-table reasoning that flat models cannot perform. These factors explain the gap between 0.65-0.70 and 0.72-0.78 AUROC.
The complexity of clinical data
The RelBench benchmark includes a clinical trial dataset that illustrates the challenge. It contains 15 tables and 140 columns: studies, patients, conditions, interventions, outcomes, adverse events, facilities, sponsors, eligibility criteria, and more. The prediction tasks include patient dropout risk, adverse event prediction, and outcome classification.
A production electronic health record (EHR) system is even more complex. Epic, the dominant EHR vendor in the US (used by hospitals covering 54% of the US population), stores data across hundreds of tables. A simplified clinical data model includes:
- Patients: demographics, insurance, primary care provider
- Encounters: admissions, outpatient visits, ED visits, telehealth
- Diagnoses: ICD-10 codes linked to encounters
- Procedures: CPT codes linked to encounters and providers
- Medications: prescriptions, administrations, dosing history
- Lab results: test orders, results, reference ranges, trends
- Vital signs: time series of temperature, BP, heart rate, O2 sat
- Providers: physicians, specialists, their patient panels and outcomes
- Facilities: hospitals, clinics, SNFs, their capacities and readmission rates
Each patient visit generates dozens of rows across these tables. A patient with chronic conditions may have thousands of connected records spanning years of clinical history. The predictive signal is not in any single table. It is in the relationships between them.
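The foreign-key structure can be sketched as linked tables. Table and column names here are illustrative, not Epic's actual schema:

```python
# Sketch: a simplified clinical data model as linked tables.
# All IDs, names, and rates are made up for illustration.
import pandas as pd

patients = pd.DataFrame({"patient_id": ["PT-201", "PT-202"],
                         "age": [71, 58]})
encounters = pd.DataFrame({"encounter_id": [1, 2],
                           "patient_id": ["PT-201", "PT-202"],
                           "facility_id": ["SNF-1", "HOSP-1"]})
facilities = pd.DataFrame({"facility_id": ["SNF-1", "HOSP-1"],
                           "readmit_rate_30d": [0.25, 0.08]})

# A 2-hop traversal: patient -> encounter -> discharge facility outcome
joined = (patients
          .merge(encounters, on="patient_id")
          .merge(facilities, on="facility_id"))
```

The `readmit_rate_30d` column only becomes a patient-level signal after the two joins; a model trained on the patients table alone never sees it.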
Readmission prediction: beyond flat features
The standard readmission model uses 20-50 features derived from the current admission: primary diagnosis, number of comorbidities, length of stay, number of ED visits in the past year, discharge disposition. These features yield roughly 0.65-0.70 AUROC.
Graph-based models add three categories of signal that flat models miss entirely.
Temporal clinical trajectories
A patient whose creatinine levels have been rising over 3 consecutive lab draws has a very different readmission risk than a patient whose creatinine spiked once and returned to baseline. A flat model sees "creatinine: abnormal" in both cases. A model that reads the temporal sequence of lab results distinguishes the progressive deterioration from the transient spike.
Similarly, the sequence of medications matters. A patient who was started on a diuretic, had the dose increased twice, and then had a potassium supplement added presents a clinical story of worsening heart failure management. The individual medication facts, without the temporal ordering, miss this trajectory.
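The creatinine example can be made concrete with a simple trend feature. The values (in mg/dL) and threshold are illustrative:

```python
# Sketch: distinguishing a progressive trend from a transient spike.
# Lab values and the 1.3 mg/dL threshold are illustrative.
import numpy as np

def lab_slope(values):
    """Least-squares slope over the last three draws."""
    y = np.asarray(values[-3:], dtype=float)
    x = np.arange(len(y))
    return np.polyfit(x, y, 1)[0]

progressive = [1.1, 1.4, 1.8]   # steadily rising creatinine
transient   = [1.1, 1.9, 1.0]   # one-off spike, back to baseline

# A flat "abnormal" flag treats both patients identically...
flat_flags = (max(progressive) > 1.3, max(transient) > 1.3)
# ...while the temporal slope separates deterioration from a blip.
slopes = (lab_slope(progressive), lab_slope(transient))
```

A graph model learns this kind of trend automatically from the ordered lab rows; the hand-built slope here just illustrates what the flat aggregation throws away.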
Provider and facility outcomes
Readmission risk is not purely a patient characteristic. It is also a function of who provided care and where the patient goes after discharge. A skilled nursing facility with a 25% 30-day hospital return rate is a different discharge destination than one with an 8% return rate. The provider who managed the patient's heart failure has a historical readmission rate for similar patients. These facility and provider signals are in separate tables, connected through foreign keys.
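These facility and provider signals are simple aggregations over historical outcome rows, reached through foreign keys. Names and outcomes below are invented for illustration:

```python
# Sketch: provider- and facility-level outcome signals from history.
# Providers, facilities, and outcomes are made up for illustration.
import pandas as pd

history = pd.DataFrame({
    "provider":       ["Chen", "Chen", "Chen", "Park", "Park"],
    "facility":       ["Sunrise SNF", "Sunrise SNF", "Home", "Home", "Home"],
    "readmitted_30d": [1, 1, 0, 0, 1],
})

# Historical 30-day return rate per discharge facility (a 2-hop signal)
facility_rate = history.groupby("facility")["readmitted_30d"].mean()
# The same aggregation per managing provider
provider_rate = history.groupby("provider")["readmitted_30d"].mean()
```

A graph model reaches these aggregates through message passing over the foreign-key edges rather than through hand-written groupbys.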
Similar patient outcomes
The most powerful signal may be the outcomes of clinically similar patients. A patient with diabetes, heart failure, and chronic kidney disease, discharged on 8 medications, has a readmission risk that is best estimated by looking at what happened to other patients with the same clinical profile. Graph-based models capture this through patient similarity in the diagnosis-procedure-medication space, without requiring manual cohort definition.
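A toy version of that similarity, using multi-hot vectors over a tiny code vocabulary (a learned model would use embeddings instead; the codes here are illustrative):

```python
# Sketch: patient similarity in a shared diagnosis/medication space.
# The vocabulary and patient code sets are made up for illustration.
import numpy as np

codes = ["E11", "I50", "N18", "J18", "metformin", "furosemide"]

def multi_hot(patient_codes):
    return np.array([1.0 if c in patient_codes else 0.0 for c in codes])

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

target    = multi_hot({"E11", "I50", "N18", "furosemide"})  # diabetes + HF + CKD
similar   = multi_hot({"E11", "I50", "N18", "metformin"})
unrelated = multi_hot({"J18"})                              # pneumonia only

sim_close = cosine(target, similar)    # high: shared chronic profile
sim_far   = cosine(target, unrelated)  # zero: no shared codes
```

Nearest neighbors under such a similarity define the "patients like this one" whose outcomes inform the risk estimate.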
Clinical trial optimization
Clinical trials are expensive. The average Phase III trial costs $19 million, and 40% of that cost is related to patient recruitment and retention. Patient dropout rates average 30% across all therapeutic areas, with some trials losing over 50% of enrolled participants.
Predicting which patients will drop out, experience adverse events, or respond to treatment is a relational prediction problem. The patient's medical history, the trial's protocol complexity, the site's historical retention rates, and the interaction between the patient's comorbidities and the investigational drug all contribute.
The RelBench clinical trial dataset tests exactly these prediction tasks. Graph-based models outperform flat baselines significantly, because dropout risk depends on the full relational context: a patient at a site with high historical dropout, enrolled in a protocol with 12 monthly visits, with 3 comorbidities that each require separate management, has a very different retention profile than the same patient at a high-performing site with a simpler protocol.
Traditional healthcare AI
- Flat features from current encounter only
- 20-50 manually engineered clinical features
- 0.65-0.70 AUROC for readmission prediction
- Ignores provider and facility outcome patterns
- Clinical trajectories lost in aggregation
Graph-based healthcare AI
- Full relational structure across 8-15 clinical tables
- Patterns learned automatically from data
- 0.72-0.78 AUROC for readmission prediction
- Provider and facility signals included
- Temporal sequences preserved and learned from
PQL Query
PREDICT readmission_30d FOR EACH encounters.encounter_id WHERE encounters.discharge_date > '2025-01-01'
One query scores every discharged patient against the full clinical graph: diagnoses, procedures, medications, lab trajectories, provider outcomes, and facility readmission rates.
Output
| patient_id | readmission_risk | confidence | top_clinical_signals |
|---|---|---|---|
| PT-201 | 0.78 | 0.89 | Worsening HF trajectory + SNF 25% return rate |
| PT-202 | 0.22 | 0.93 | Standard pneumonia resolution, good facility |
| PT-203 | 0.61 | 0.85 | 3 comorbidities + medication interaction risk |
| PT-204 | 0.09 | 0.95 | Surgical recovery on track, strong home support |
Resource planning and operational efficiency
Hospital operations generate relational prediction problems at every level. Bed capacity planning requires predicting admissions, discharges, and transfers across units. Here is what the underlying data looks like:
current_census — ICU snapshot
| patient_id | unit | days_in_unit | acuity_score | discharge_likelihood_24h |
|---|---|---|---|---|
| PT-301 | ICU | 3 | High (8/10) | Low (0.12) |
| PT-302 | ICU | 7 | Moderate (5/10) | High (0.78) |
| PT-303 | ICU | 1 | Critical (9/10) | Very low (0.04) |
| PT-304 | ICU | 5 | Moderate (6/10) | Moderate (0.45) |
The ICU currently has 4 of 6 beds occupied. PT-302 is likely to discharge within 24 hours. But that alone does not predict tomorrow's census.
upstream_signals — what drives tomorrow's ICU census
| source | signal | ICU_admits_predicted | confidence |
|---|---|---|---|
| ED (current) | 3 patients pending admission, 1 likely ICU | +1 | 0.82 |
| OR schedule (tomorrow) | 2 cardiac surgeries, 40% ICU rate | +0.8 | 0.75 |
| Step-down unit | 1 patient deteriorating (rising lactate) | +1 | 0.68 |
| ICU discharges | PT-302 discharge likely | -1 | 0.78 |
A flat model predicts tomorrow's ICU census as today's count plus a seasonal average. The relational model reads ED admissions, OR schedules, step-down patient vitals, and discharge readiness. Predicted net change: +1.8 beds needed. The flat model predicts +0.3.
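The relational forecast is just the sum of those upstream signals added to today's census. The numbers mirror the table above:

```python
# Sketch: tomorrow's ICU census as today's count plus relational signals.
# Signal magnitudes mirror the upstream_signals table in the text.
current_census = 4.0  # occupied ICU beds today

signals = {
    "ed_pending_admission":    +1.0,  # ED patient likely needing ICU
    "or_schedule":             +0.8,  # 2 cardiac surgeries x 40% ICU rate
    "stepdown_deterioration":  +1.0,  # rising lactate on step-down unit
    "expected_discharge":      -1.0,  # PT-302 likely to leave
}

net_change = sum(signals.values())               # +1.8 beds
predicted_census = current_census + net_change   # 5.8 beds
```

Each term comes from a different table (ED queue, OR schedule, step-down vitals, ICU discharge readiness), which is exactly the multi-table reach the flat model lacks.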
staffing_impact — flat vs relational forecast
| metric | Flat Model | Relational Model | Actual | Impact |
|---|---|---|---|---|
| ICU beds needed (tomorrow) | 4.3 | 5.8 | 6 | Flat model under-staffs |
| Nursing hours needed | 48 | 72 | 74 | Flat model: 26 hours short |
| Overtime triggered | No | Yes (pre-scheduled) | Yes (emergency) | $2,400 saved per event |
The flat model under-predicts by 1.7 beds, resulting in emergency overtime and potential patient safety issues. The relational model pre-schedules additional staff.
Health systems that implement multi-table operational forecasting report 15-20% improvements in bed utilization and 10-15% reductions in overtime staffing costs. For a 500-bed hospital, improving bed utilization by 15% is equivalent to adding 75 beds without construction, representing $30M-50M in avoided capital expenditure.
The path forward
Healthcare has been slow to adopt graph-based AI for legitimate reasons: regulatory requirements, data privacy, the stakes of clinical predictions, and the complexity of health system IT infrastructure. But the gap between what flat models achieve and what relational models achieve is too large to ignore.
A relational foundation model like KumoRFM addresses several barriers simultaneously. It connects to existing data warehouses without requiring data to leave the institution. It provides attention-based interpretability that shows which clinical events and relationships drove a prediction. And it serves multiple prediction tasks from a single model, meaning the hospital does not need separate ML teams for readmission prediction, length-of-stay forecasting, and resource planning.
The institutions that move first will not just predict better. They will build an institutional advantage in understanding their own data that compounds over time. In an industry where readmissions, length of stay, and operational efficiency directly determine financial viability, that advantage is existential.