A bank deploys a fraud detection model. It flags 2,000 transactions per day. An investigator picks up case #1,247: a $3,400 wire transfer from a small-business account in Ohio to a supplier in Portugal. The model says the probability of fraud is 94%. The investigator needs to decide whether to block it.
The first question is not "is the model right?" It is "why does the model think this is fraud?" Without an answer to that question, the investigator has nothing actionable. They cannot write a Suspicious Activity Report. They cannot justify the hold to the customer. They cannot explain the decision to their compliance team, their regulator, or a judge.
This is the explainability problem. And in regulated industries, it is not an academic concern. It is a legal requirement.
What explainable AI actually means
Explainable AI (XAI) refers to methods that make model predictions understandable to humans. Not "the model is 94% confident." That is a score, not an explanation. An explanation answers the question: which specific inputs caused this specific output, and by how much?
For that fraud case, a real explanation might look like: "This transaction was flagged because (1) the recipient account was created 14 days ago and has received transfers from 6 other accounts that were later flagged for fraud, (2) the sender's transaction velocity increased 4x in the last 72 hours, and (3) the transfer amount is 8x the sender's median transaction size."
That is actionable. The investigator can verify each claim. The compliance team can document it. The regulator can audit it.
The XAI market reflects the urgency. Industry analysts project the global explainable AI market will grow from $6.2 billion in 2024 to over $24 billion by 2030, driven primarily by regulatory pressure in financial services, healthcare, and insurance.
The regulatory landscape
Explainability is not a feature request from product teams. It is mandated by law in most regulated industries.
EU AI Act (2024)
The EU AI Act classifies AI systems into risk tiers. Credit scoring, fraud detection, insurance pricing, and hiring decisions are all classified as high-risk. High-risk systems must provide transparency to users, enable human oversight, and maintain documentation that explains how the system reaches its decisions. Penalties for non-compliance reach up to 35 million euros or 7% of global annual turnover, whichever is higher, for the most serious violations.
SR 11-7 (Federal Reserve)
The Fed's model risk management guidance, SR 11-7, requires banks to validate every model used in material business decisions. Validation includes testing model outputs against expectations, verifying the conceptual soundness of the approach, and being able to explain model behavior to examiners. A model that produces accurate predictions but cannot be explained fails validation.
ECOA and Regulation B
The Equal Credit Opportunity Act requires lenders to provide specific, principal reasons for adverse credit decisions. If a model denies a loan application, the lender must tell the applicant which factors drove the denial. "The model said no" is not a legally sufficient reason. The lender needs to say "your debt-to-income ratio exceeds our threshold" or "your payment history shows 3 late payments in the last 12 months." This requires the model to produce factor-level explanations.
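As a rough illustration of how factor-level model output could be turned into ECOA-style adverse action reasons, here is a minimal sketch. The attribution records, reason templates, and record IDs are hypothetical stand-ins, not output from any real model or compliance system.

```python
# Sketch: rendering cell-level attributions as adverse action reasons.
# All scores, IDs, and reason strings below are illustrative.

def adverse_action_reasons(attributions, top_n=2):
    """Pick the top-N attributions that pushed the score toward denial
    and render each as a reason citing its source row."""
    negative = [a for a in attributions if a["score"] < 0]
    negative.sort(key=lambda a: a["score"])  # most negative first
    return [
        f"{a['reason']} (record {a['row_id']} in {a['table']})"
        for a in negative[:top_n]
    ]

attributions = [
    {"table": "credit_history", "row_id": "CR-03", "score": -0.41,
     "reason": "3 late payments on auto loan"},
    {"table": "credit_history", "row_id": "CR-04", "score": -0.22,
     "reason": "High revolving balance of $12,800"},
    {"table": "applicants", "row_id": "APP-602", "score": 0.08,
     "reason": "Income within lending range"},
]

for reason in adverse_action_reasons(attributions):
    print(reason)
```

The key property is that each reason carries a pointer to a verifiable record, which is what separates a legally sufficient reason from "the model said no."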
HIPAA and FDA (Healthcare)
Clinical decision support systems that influence treatment or diagnosis are subject to FDA oversight. The FDA's guidance on AI/ML in medical devices emphasizes the need for transparency in how algorithms reach their recommendations. Physicians must understand why a model is recommending a particular treatment path.
Here is what the explainability challenge looks like in practice. A bank uses a model for auto-loan decisions. The data spans three tables.
applicants
| applicant_id | name | income | dti_ratio | credit_score |
|---|---|---|---|---|
| APP-601 | Diana Marsh | $92,000 | 28% | 738 |
| APP-602 | Robert Kang | $67,000 | 42% | 691 |
| APP-603 | Lucia Ferreira | $54,000 | 19% | 714 |
credit_history
| record_id | applicant_id | account_type | balance | late_payments |
|---|---|---|---|---|
| CR-01 | APP-601 | Mortgage | $284,000 | 0 |
| CR-02 | APP-601 | Credit Card | $4,200 | 0 |
| CR-03 | APP-602 | Auto Loan | $18,500 | 3 |
| CR-04 | APP-602 | Credit Card | $12,800 | 2 |
| CR-05 | APP-603 | Student Loan | $31,000 | 0 |
Highlighted: Robert has 5 late payments across two accounts. The model needs to cite these specific records as adverse action reasons, not just 'late_payment_count = 5'.
decisions
| decision_id | applicant_id | model_score | decision | rate_offered |
|---|---|---|---|---|
| DEC-01 | APP-601 | 0.92 | Approved | 5.4% |
| DEC-02 | APP-602 | 0.34 | Denied | --- |
| DEC-03 | APP-603 | 0.78 | Approved | 6.8% |
Highlighted: Robert's denial requires specific ECOA-compliant adverse action reasons. A flat feature model says 'dti_ratio was important.' A relational model cites the 3 late payments on auto loan CR-03 and the $12,800 credit card balance on CR-04.
Two kinds of explanations
Not all explanations serve the same purpose. Regulators, business users, and data scientists each need different views into model behavior.
Global explanations
Global explanations describe how the model behaves in aggregate. Which features matter most across all predictions? Are there systematic biases? How does the model treat different customer segments? These are the explanations that model risk teams and regulators examine during validation. They answer the question: "Is this model conceptually sound?"
Common approaches include SHAP summary plots, permutation importance, and partial dependence plots. For tree-based models like XGBoost or LightGBM, these are straightforward to compute. For deep learning models, they require more sophisticated methods.
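To make permutation importance concrete, here is a minimal sketch on a toy model: shuffle one feature column, re-score, and measure the accuracy drop. The "model" and data are illustrative stand-ins, not a real credit or fraud model.

```python
import random

def model(row):
    # Toy model: relies heavily on feature 0, ignores feature 2.
    return 1 if (2.0 * row[0] + 0.5 * row[1]) > 1.0 else 0

def accuracy(rows, labels):
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)

def permutation_importance(rows, labels, feature_idx, seed=0):
    """Importance = accuracy drop after shuffling one feature column."""
    rng = random.Random(seed)
    baseline = accuracy(rows, labels)
    column = [r[feature_idx] for r in rows]
    rng.shuffle(column)  # break the feature's link to the labels
    shuffled = [r[:feature_idx] + [v] + r[feature_idx + 1:]
                for r, v in zip(rows, column)]
    return baseline - accuracy(shuffled, labels)

rows = [[1.0, 0.2, 5.0], [0.1, 0.9, 3.0], [0.8, 0.1, 7.0], [0.0, 0.3, 2.0]]
labels = [model(r) for r in rows]  # labels match the model exactly

for i in range(3):
    print(f"feature {i}: importance {permutation_importance(rows, labels, i):.2f}")
```

Because the toy model never reads feature 2, shuffling it leaves accuracy unchanged and its importance is exactly zero; the heavily weighted feature 0 shows the largest drop.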
Local explanations
Local explanations describe why a specific prediction was made. Why was this transaction flagged? Why was this customer predicted to churn? These are the explanations that investigators, loan officers, and customer-facing teams need. They answer the question: "Why this particular outcome for this particular entity?"
LIME and SHAP values are the most common approaches for local explanation. But they have a fundamental limitation when applied to relational data: they explain the prediction in terms of the engineered features, not the raw data. If your model says "feature X27 (avg_transaction_value_30d) drove this prediction," that is one level removed from the actual data. The investigator needs to know which specific transactions in which specific time window caused the flag.
Typical XAI on flat features
- Explains engineered features, not raw data
- Feature X27 was important (what is X27?)
- No visibility into cross-table relationships
- Separate post-hoc explanation pipeline
- Explanations approximate the model rather than reflecting it
XAI on relational graphs
- Explains raw data cells across all tables
- This specific transaction on this date drove the flag
- Multi-hop relational paths visible in explanations
- Explanations computed from model gradients directly
- Explanations reflect exactly what the model computed
How gradient-based attribution works on graphs
Deep learning models have a structural advantage for explainability that is often overlooked: they are differentiable end-to-end. This means you can compute the gradient of any output with respect to any input, giving you a precise measure of how much each input contributed to the prediction.
For a graph neural network operating on relational data, this produces something uniquely powerful: cell-level attribution. The gradient tells you not just which table or which column mattered, but which specific value in which specific row of which specific table drove the prediction.
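The mechanics can be sketched on a toy differentiable model. In a real GNN the gradients come from backpropagation; here finite differences stand in for autodiff, and the model weights and cell names are hypothetical.

```python
import math

def model(cells):
    # Toy differentiable "score": weighted sum squashed by a sigmoid.
    # Weights are made up for illustration.
    weights = {"rating_RV-801": -0.9, "balance_CR-04": 0.4, "tenure": 0.05}
    z = sum(weights[k] * v for k, v in cells.items())
    return 1.0 / (1.0 + math.exp(-z))

def saliency(cells, eps=1e-6):
    """|d output / d cell| for every input cell, via finite differences."""
    base = model(cells)
    grads = {}
    for key in cells:
        bumped = dict(cells)
        bumped[key] += eps
        grads[key] = abs((model(bumped) - base) / eps)
    # Rank cells by how strongly they move the prediction.
    return sorted(grads.items(), key=lambda kv: -kv[1])

cells = {"rating_RV-801": 3.0, "balance_CR-04": 1.28, "tenure": 24.0}
for cell, grad in saliency(cells):
    print(f"{cell}: {grad:.4f}")
```

The ranked output is the attribution list: the cell whose perturbation moves the prediction most sits at the top, which for this toy model is the review rating.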
Consider a churn prediction. Here is the raw data and what each type of model explains.
customer_reviews (raw data)
| review_id | customer_id | product_id | rating | date |
|---|---|---|---|---|
| RV-801 | C-4521 | PRD-887 | 3 | 2025-03-02 |
| RV-802 | C-4522 | PRD-887 | 2 | 2025-03-04 |
| RV-803 | C-4523 | PRD-887 | 1 | 2025-03-05 |
| RV-804 | C-4524 | PRD-887 | 2 | 2025-03-06 |
| RV-805 | C-4525 | PRD-887 | 1 | 2025-03-07 |
Highlighted: Customer 4521 left a 3-star review on PRD-887 on March 2. Four other customers who bought PRD-887 in the same week gave it 1-2 stars.
customer_status (what happened next)
| customer_id | status_before | status_after | changed_date |
|---|---|---|---|
| C-4521 | Active | ??? | --- |
| C-4522 | Active | Churned | 2025-04-01 |
| C-4523 | Active | Churned | 2025-03-28 |
| C-4524 | Active | Churned | 2025-04-05 |
| C-4525 | Active | Churned | 2025-03-30 |
Highlighted: 4 of 5 customers who bought PRD-887 in that week have already churned. Customer 4521 is the last one standing.
XGBoost explanation (from flat features)
| feature | shap_value | human_interpretation |
|---|---|---|
| avg_review_score | -0.42 | Important, but which reviews? Which products? |
| order_count_30d | +0.15 | Recent activity looks healthy |
| days_since_last_login | -0.08 | Slightly concerning |
The tree model explains in terms of engineered aggregates. 'avg_review_score was important' tells the retention team nothing actionable. Which review? Which product? Why does it matter?
KumoRFM explanation (cell-level attribution from graph)
| source_table | row | cell | attribution_score |
|---|---|---|---|
| customer_reviews | RV-801 | rating = 3 on PRD-887 (Mar 2) | 0.38 |
| customer_status | C-4522 | Churned (same product, same week) | 0.27 |
| customer_status | C-4523 | Churned (same product, same week) | 0.19 |
| customer_reviews | RV-803 | rating = 1 on PRD-887 (Mar 5) | 0.11 |
The graph model traces the prediction to specific database records: Customer 4521's own review of PRD-887, plus the churn status of other customers who bought the same product in the same window. Every attribution is a verifiable row in the database.
That is a qualitatively different explanation from "avg_review_score was the most important feature." It traces the prediction back to specific, verifiable data points in the original database. The retention team now has an actionable playbook: investigate PRD-887 quality issues and proactively reach out to Customer 4521 before they follow the other four.
PQL Query
PREDICT decisions.decision = 'Approved' FOR EACH applicants.applicant_id
The model produces predictions with cell-level attribution. For Robert's denial, it traces the decision back to specific rows in the credit_history table, not abstract feature names.
Output
| applicant_id | approval_prob | reason_1 | reason_2 |
|---|---|---|---|
| APP-601 | 0.92 | Clean payment history (CR-01, CR-02) | DTI 28% below threshold |
| APP-602 | 0.34 | 3 late payments on auto loan (CR-03) | High revolving balance $12,800 (CR-04) |
| APP-603 | 0.78 | Low DTI 19%, no delinquencies | Student loan balance manageable at income level |
The dual-explanation approach
KumoRFM generates both global and local explanations automatically, without additional engineering or a separate explanation pipeline.
Global: behavioral cohorts
For a given prediction task, KumoRFM clusters entities into behavioral cohorts based on the relational patterns driving their predictions. For a churn model, this might produce cohorts like "high-frequency buyers whose recent orders were all discounted" and "seasonal shoppers who missed their typical purchase window." Each cohort comes with the relational patterns that define it and the prediction distribution within it.
This gives model risk teams a structured view of model behavior. They can verify that cohorts align with business logic, check for unintended biases, and validate that the model is making predictions for defensible reasons.
Local: gradient saliency
For every individual prediction, KumoRFM computes cell-level attribution using gradient-based saliency. The output is a ranked list of data points across all connected tables, ordered by their contribution to the prediction. An investigator reviewing a flagged transaction sees exactly which data points, in which tables, drove the flag.
Because the model operates on the raw relational graph (not a derived feature table), these attributions point to actual database records. They are verifiable. An auditor can query the original data to confirm that the cited records exist and contain the values the model claims drove the decision.
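The audit step can be sketched with a plain database query: given a cited table, row, and value, check that the record actually exists and holds what the attribution claims. The table below mirrors the loan example; the audit helper itself is a hypothetical illustration using Python's built-in sqlite3.

```python
import sqlite3

# Build an in-memory copy of the credit_history table from the example.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE credit_history (
    record_id TEXT PRIMARY KEY, applicant_id TEXT,
    account_type TEXT, balance INTEGER, late_payments INTEGER)""")
conn.executemany(
    "INSERT INTO credit_history VALUES (?, ?, ?, ?, ?)",
    [("CR-03", "APP-602", "Auto Loan", 18500, 3),
     ("CR-04", "APP-602", "Credit Card", 12800, 2)])

def verify_attribution(conn, table, row_id, column, claimed_value):
    """Return True if the cited cell exists and holds the claimed value."""
    row = conn.execute(
        f"SELECT {column} FROM {table} WHERE record_id = ?", (row_id,)
    ).fetchone()
    return row is not None and row[0] == claimed_value

# The model claims CR-03's 3 late payments drove the denial; check it.
print(verify_attribution(conn, "credit_history", "CR-03", "late_payments", 3))
```

An attribution that cites a nonexistent row or a stale value fails this check immediately, which is exactly the kind of spot-audit a regulator or model risk reviewer can run.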
Practical applications
Fraud detection
When a transaction is flagged, the explanation traces through the graph: the recipient account's creation date, its connections to other flagged accounts, the sender's transaction velocity change, the amount relative to historical patterns. Each element is a specific, verifiable fact. Investigators can confirm or override the flag in minutes rather than hours. Suspicious Activity Reports write themselves from the attribution list.
Credit decisions
ECOA requires specific adverse action reasons. Cell-level attribution maps directly to this requirement. The model denied the application because of three specific factors: payment history on account #X, utilization ratio on card #Y, and the short tenure of the applicant's newest account. The reasons are drawn from the data, not from engineered feature names that mean nothing to the applicant.
Churn intervention
Knowing that a customer will churn is only useful if you know why. Cell-level attribution tells the retention team: this customer is likely to churn because their last three support tickets were unresolved, they switched from monthly to annual billing and immediately filed a downgrade request, and two other customers from the same company have already cancelled. That is an intervention playbook, not just a probability score.
The explainability advantage of relational models
There is an irony in the XAI landscape. Tree-based models (XGBoost, LightGBM) are often called "interpretable" because you can extract feature importances. But those feature importances describe engineered aggregates, not raw data. "avg_order_value_90d was the most important feature" is not a useful explanation for a business user or a regulator.
Graph neural networks operating on relational data are technically more complex. But their explanations are more useful because they reference actual database records. The "black box" concern about deep learning is valid for models trained on opaque feature vectors. It is less valid for models trained on structured, queryable relational data where every attribution can be traced back to a specific row in a specific table.
As regulatory requirements tighten, the ability to produce cell-level, verifiable explanations from raw data will shift from a differentiator to a prerequisite. Models that cannot explain themselves will not be deployable. Models that explain themselves in terms of engineered features will not satisfy regulators who want to see the underlying data. Models that trace their predictions back to specific database records will be the only ones that pass muster.