A bank deploys a fraud detection model. It flags 2,000 transactions per day. An investigator picks up case #1,247: a $3,400 wire transfer from a small-business account in Ohio to a supplier in Portugal. The model says the probability of fraud is 94%. The investigator needs to decide whether to block it.
The first question is not "is the model right?" It is "why does the model think this is fraud?" Without an answer to that question, the investigator has nothing actionable. They cannot write a Suspicious Activity Report. They cannot justify the hold to the customer. They cannot explain the decision to their compliance team, their regulator, or a judge.
This is the explainability problem. And in regulated industries, it is not an academic concern. It is a legal requirement.
What explainable AI actually means
Explainable AI (XAI) refers to methods that make model predictions understandable to humans. Not "the model is 94% confident." That is a score, not an explanation. An explanation answers the question: which specific inputs caused this specific output, and by how much?
For that fraud case, a real explanation might look like: "This transaction was flagged because (1) the recipient account was created 14 days ago and has received transfers from 6 other accounts that were later flagged for fraud, (2) the sender's transaction velocity increased 4x in the last 72 hours, and (3) the transfer amount is 8x the sender's median transaction size."
That is actionable. The investigator can verify each claim. The compliance team can document it. The regulator can audit it.
The XAI market reflects the urgency. Industry analysts project the global explainable AI market will grow from $6.2 billion in 2024 to over $24 billion by 2030, driven primarily by regulatory pressure in financial services, healthcare, and insurance.
The regulatory landscape
Explainability is not a feature request from product teams. It is mandated by law in most regulated industries.
EU AI Act (2024)
The EU AI Act classifies AI systems into risk tiers. Credit scoring, fraud detection, insurance pricing, and hiring decisions are all classified as high-risk. High-risk systems must provide transparency to users, enable human oversight, and maintain documentation that explains how the system reaches its decisions. Penalties for non-compliance reach up to 35 million euros or 7% of global annual turnover, whichever is higher, for the most serious violations.
SR 11-7 (Federal Reserve)
The Fed's model risk management guidance, SR 11-7, requires banks to validate every model used in material business decisions. Validation includes testing model outputs against expectations, verifying the conceptual soundness of the approach, and being able to explain model behavior to examiners. A model that produces accurate predictions but cannot be explained fails validation.
ECOA and Regulation B
The Equal Credit Opportunity Act requires lenders to provide specific, principal reasons for adverse credit decisions. If a model denies a loan application, the lender must tell the applicant which factors drove the denial. "The model said no" is not a legally sufficient reason. The lender needs to say "your debt-to-income ratio exceeds our threshold" or "your payment history shows 3 late payments in the last 12 months." This requires the model to produce factor-level explanations.
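As a rough illustration of how factor-level model output could be turned into ECOA-style adverse action reasons, here is a minimal sketch. The attribution records, reason templates, and record IDs are hypothetical stand-ins, not output from any real model or compliance system.

```python
# Sketch: rendering cell-level attributions as adverse action reasons.
# All scores, IDs, and reason strings below are illustrative.

def adverse_action_reasons(attributions, top_n=2):
    """Pick the top-N attributions that pushed the score toward denial
    and render each as a reason citing its source row."""
    negative = [a for a in attributions if a["score"] < 0]
    negative.sort(key=lambda a: a["score"])  # most negative first
    return [
        f"{a['reason']} (record {a['row_id']} in {a['table']})"
        for a in negative[:top_n]
    ]

attributions = [
    {"table": "credit_history", "row_id": "CR-03", "score": -0.41,
     "reason": "3 late payments on auto loan"},
    {"table": "credit_history", "row_id": "CR-04", "score": -0.22,
     "reason": "High revolving balance of $12,800"},
    {"table": "applicants", "row_id": "APP-602", "score": 0.08,
     "reason": "Income within lending range"},
]

for reason in adverse_action_reasons(attributions):
    print(reason)
```

The key property is that each reason carries a pointer to a verifiable record, which is what separates a legally sufficient reason from "the model said no."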
HIPAA and FDA (Healthcare)
Clinical decision support systems that influence treatment or diagnosis are subject to FDA oversight. The FDA's guidance on AI/ML in medical devices emphasizes the need for transparency in how algorithms reach their recommendations. Physicians must understand why a model is recommending a particular treatment path.
Here is what the explainability challenge looks like in practice. A bank uses a model for auto-loan decisions. The data spans three tables.
applicants
| applicant_id | name | income | dti_ratio | credit_score |
|---|---|---|---|---|
| APP-601 | Diana Marsh | $92,000 | 28% | 738 |
| APP-602 | Robert Kang | $67,000 | 42% | 691 |
| APP-603 | Lucia Ferreira | $54,000 | 19% | 714 |
credit_history
| record_id | applicant_id | account_type | balance | late_payments |
|---|---|---|---|---|
| CR-01 | APP-601 | Mortgage | $284,000 | 0 |
| CR-02 | APP-601 | Credit Card | $4,200 | 0 |
| CR-03 | APP-602 | Auto Loan | $18,500 | 3 |
| CR-04 | APP-602 | Credit Card | $12,800 | 2 |
| CR-05 | APP-603 | Student Loan | $31,000 | 0 |
Highlighted: Robert has 5 late payments across two accounts. The model needs to cite these specific records as adverse action reasons, not just 'late_payment_count = 5'.
decisions
| decision_id | applicant_id | model_score | decision | rate_offered |
|---|---|---|---|---|
| DEC-01 | APP-601 | 0.92 | Approved | 5.4% |
| DEC-02 | APP-602 | 0.34 | Denied | --- |
| DEC-03 | APP-603 | 0.78 | Approved | 6.8% |
Highlighted: Robert's denial requires specific ECOA-compliant adverse action reasons. A flat feature model says 'dti_ratio was important.' A relational model cites the 3 late payments on auto loan CR-03 and the $12,800 credit card balance on CR-04.
Two kinds of explanations
Not all explanations serve the same purpose. Regulators, business users, and data scientists each need different views into model behavior.
Global explanations
Global explanations describe how the model behaves in aggregate. Which features matter most across all predictions? Are there systematic biases? How does the model treat different customer segments? These are the explanations that model risk teams and regulators examine during validation. They answer the question: "Is this model conceptually sound?"
Common approaches include SHAP summary plots, permutation importance, and partial dependence plots. For tree-based models like XGBoost or LightGBM, these are straightforward to compute. For deep learning models, they require more sophisticated methods.
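To make permutation importance concrete, here is a minimal sketch on a toy model: shuffle one feature column, re-score, and measure the accuracy drop. The "model" and data are illustrative stand-ins, not a real credit or fraud model.

```python
import random

def model(row):
    # Toy model: relies heavily on feature 0, ignores feature 2.
    return 1 if (2.0 * row[0] + 0.5 * row[1]) > 1.0 else 0

def accuracy(rows, labels):
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)

def permutation_importance(rows, labels, feature_idx, seed=0):
    """Importance = accuracy drop after shuffling one feature column."""
    rng = random.Random(seed)
    baseline = accuracy(rows, labels)
    column = [r[feature_idx] for r in rows]
    rng.shuffle(column)  # break the feature's link to the labels
    shuffled = [r[:feature_idx] + [v] + r[feature_idx + 1:]
                for r, v in zip(rows, column)]
    return baseline - accuracy(shuffled, labels)

rows = [[1.0, 0.2, 5.0], [0.1, 0.9, 3.0], [0.8, 0.1, 7.0], [0.0, 0.3, 2.0]]
labels = [model(r) for r in rows]  # labels match the model exactly

for i in range(3):
    print(f"feature {i}: importance {permutation_importance(rows, labels, i):.2f}")
```

Because the toy model never reads feature 2, shuffling it leaves accuracy unchanged and its importance is exactly zero; the heavily weighted feature 0 shows the largest drop.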
Local explanations
Local explanations describe why a specific prediction was made. Why was this transaction flagged? Why was this customer predicted to churn? These are the explanations that investigators, loan officers, and customer-facing teams need. They answer the question: "Why this particular outcome for this particular entity?"
LIME and SHAP values are the most common approaches for local explanation. But they have a fundamental limitation when applied to relational data: they explain the prediction in terms of the engineered features, not the raw data. If your model says "feature X27 (avg_transaction_value_30d) drove this prediction," that is one level removed from the actual data. The investigator needs to know which specific transactions in which specific time window caused the flag.
Typical XAI on flat features
- Explains engineered features, not raw data
- Feature X27 was important (what is X27?)
- No visibility into cross-table relationships
- Separate post-hoc explanation pipeline
- Explanations approximate the model rather than reflecting it
XAI on relational graphs
- Explains raw data cells across all tables
- This specific transaction on this date drove the flag
- Multi-hop relational paths visible in explanations
- Explanations computed from model gradients directly
- Explanations reflect exactly what the model computed
How gradient-based attribution works on graphs
Deep learning models have a structural advantage for explainability that is often overlooked: they are differentiable end-to-end. This means you can compute the gradient of any output with respect to any input, giving you a precise measure of how much each input contributed to the prediction.
For a graph neural network operating on relational data, this produces something uniquely powerful: cell-level attribution. The gradient tells you not just which table or which column mattered, but which specific value in which specific row of which specific table drove the prediction.
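The mechanics can be sketched on a toy differentiable model. In a real GNN the gradients come from backpropagation; here finite differences stand in for autodiff, and the model weights and cell names are hypothetical.

```python
import math

def model(cells):
    # Toy differentiable "score": weighted sum squashed by a sigmoid.
    # Weights are made up for illustration.
    weights = {"rating_RV-801": -0.9, "balance_CR-04": 0.4, "tenure": 0.05}
    z = sum(weights[k] * v for k, v in cells.items())
    return 1.0 / (1.0 + math.exp(-z))

def saliency(cells, eps=1e-6):
    """|d output / d cell| for every input cell, via finite differences."""
    base = model(cells)
    grads = {}
    for key in cells:
        bumped = dict(cells)
        bumped[key] += eps
        grads[key] = abs((model(bumped) - base) / eps)
    # Rank cells by how strongly they move the prediction.
    return sorted(grads.items(), key=lambda kv: -kv[1])

cells = {"rating_RV-801": 3.0, "balance_CR-04": 1.28, "tenure": 24.0}
for cell, grad in saliency(cells):
    print(f"{cell}: {grad:.4f}")
```

The ranked output is the attribution list: the cell whose perturbation moves the prediction most sits at the top, which for this toy model is the review rating.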
Consider a churn prediction. Here is the raw data and what each type of model explains.
customer_reviews (raw data)
| review_id | customer_id | product_id | rating | date |
|---|---|---|---|---|
| RV-801 | C-4521 | PRD-887 | 3 | 2025-03-02 |
| RV-802 | C-4522 | PRD-887 | 2 | 2025-03-04 |
| RV-803 | C-4523 | PRD-887 | 1 | 2025-03-05 |
| RV-804 | C-4524 | PRD-887 | 2 | 2025-03-06 |
| RV-805 | C-4525 | PRD-887 | 1 | 2025-03-07 |
Highlighted: Customer 4521 left a 3-star review on PRD-887 on March 2. Four other customers who bought PRD-887 in the same week gave it 1-2 stars.
customer_status (what happened next)
| customer_id | status_before | status_after | changed_date |
|---|---|---|---|
| C-4521 | Active | ??? | --- |
| C-4522 | Active | Churned | 2025-04-01 |
| C-4523 | Active | Churned | 2025-03-28 |
| C-4524 | Active | Churned | 2025-04-05 |
| C-4525 | Active | Churned | 2025-03-30 |
Highlighted: 4 of 5 customers who bought PRD-887 in that week have already churned. Customer 4521 is the last one standing.
XGBoost explanation (from flat features)
| feature | shap_value | human_interpretation |
|---|---|---|
| avg_review_score | -0.42 | Important, but which reviews? Which products? |
| order_count_30d | +0.15 | Recent activity looks healthy |
| days_since_last_login | -0.08 | Slightly concerning |
The tree model explains in terms of engineered aggregates. 'avg_review_score was important' tells the retention team nothing actionable. Which review? Which product? Why does it matter?
KumoRFM explanation (cell-level attribution from graph)
| source_table | row | cell | attribution_score |
|---|---|---|---|
| customer_reviews | RV-801 | rating = 3 on PRD-887 (Mar 2) | 0.38 |
| customer_status | C-4522 | Churned (same product, same week) | 0.27 |
| customer_status | C-4523 | Churned (same product, same week) | 0.19 |
| customer_reviews | RV-803 | rating = 1 on PRD-887 (Mar 5) | 0.11 |
The graph model traces the prediction to specific database records: Customer 4521's own review of PRD-887, plus the churn status of other customers who bought the same product in the same window. Every attribution is a verifiable row in the database.
That is a qualitatively different explanation from "avg_review_score was the most important feature." It traces the prediction back to specific, verifiable data points in the original database. The retention team now has an actionable playbook: investigate PRD-887 quality issues and proactively reach out to Customer 4521 before they follow the other four.
PQL Query
PREDICT decisions.decision = 'Approved' FOR EACH applicants.applicant_id
The model produces predictions with cell-level attribution. For Robert's denial, it traces the decision back to specific rows in the credit_history table, not abstract feature names.
Output
| applicant_id | approval_prob | reason_1 | reason_2 |
|---|---|---|---|
| APP-601 | 0.92 | Clean payment history (CR-01, CR-02) | DTI 28% below threshold |
| APP-602 | 0.34 | 3 late payments on auto loan (CR-03) | High revolving balance $12,800 (CR-04) |
| APP-603 | 0.78 | Low DTI 19%, no delinquencies | Student loan balance manageable at income level |
The dual-explanation approach
KumoRFM generates both global and local explanations automatically, without additional engineering or a separate explanation pipeline.
Global: behavioral cohorts
For a given prediction task, KumoRFM clusters entities into behavioral cohorts based on the relational patterns driving their predictions. For a churn model, this might produce cohorts like "high-frequency buyers whose recent orders were all discounted" and "seasonal shoppers who missed their typical purchase window." Each cohort comes with the relational patterns that define it and the prediction distribution within it.
This gives model risk teams a structured view of model behavior. They can verify that cohorts align with business logic, check for unintended biases, and validate that the model is making predictions for defensible reasons.
Local: gradient saliency
For every individual prediction, KumoRFM computes cell-level attribution using gradient-based saliency. The output is a ranked list of data points across all connected tables, ordered by their contribution to the prediction. An investigator reviewing a flagged transaction sees exactly which data points, in which tables, drove the flag.
Because the model operates on the raw relational graph (not a derived feature table), these attributions point to actual database records. They are verifiable. An auditor can query the original data to confirm that the cited records exist and contain the values the model claims drove the decision.
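The audit step can be sketched with a plain database query: given a cited table, row, and value, check that the record actually exists and holds what the attribution claims. The table below mirrors the loan example; the audit helper itself is a hypothetical illustration using Python's built-in sqlite3.

```python
import sqlite3

# Build an in-memory copy of the credit_history table from the example.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE credit_history (
    record_id TEXT PRIMARY KEY, applicant_id TEXT,
    account_type TEXT, balance INTEGER, late_payments INTEGER)""")
conn.executemany(
    "INSERT INTO credit_history VALUES (?, ?, ?, ?, ?)",
    [("CR-03", "APP-602", "Auto Loan", 18500, 3),
     ("CR-04", "APP-602", "Credit Card", 12800, 2)])

def verify_attribution(conn, table, row_id, column, claimed_value):
    """Return True if the cited cell exists and holds the claimed value."""
    row = conn.execute(
        f"SELECT {column} FROM {table} WHERE record_id = ?", (row_id,)
    ).fetchone()
    return row is not None and row[0] == claimed_value

# The model claims CR-03's 3 late payments drove the denial; check it.
print(verify_attribution(conn, "credit_history", "CR-03", "late_payments", 3))
```

An attribution that cites a nonexistent row or a stale value fails this check immediately, which is exactly the kind of spot-audit a regulator or model risk reviewer can run.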
Practical applications
Fraud detection
When a transaction is flagged, the explanation traces through the graph: the recipient account's creation date, its connections to other flagged accounts, the sender's transaction velocity change, the amount relative to historical patterns. Each element is a specific, verifiable fact. Investigators can confirm or override the flag in minutes rather than hours. Suspicious Activity Reports write themselves from the attribution list.
Credit decisions
ECOA requires specific adverse action reasons. Cell-level attribution maps directly to this requirement. The model denied the application because of three specific factors: payment history on account #X, utilization ratio on card #Y, and the short tenure of the applicant's newest account. The reasons are drawn from the data, not from engineered feature names that mean nothing to the applicant.
Churn intervention
Knowing that a customer will churn is only useful if you know why. Cell-level attribution tells the retention team: this customer is likely to churn because their last three support tickets were unresolved, they switched from monthly to annual billing and immediately filed a downgrade request, and two other customers from the same company have already cancelled. That is an intervention playbook, not just a probability score.
The explainability advantage of relational models
There is an irony in the XAI landscape. Tree-based models (XGBoost, LightGBM) are often called "interpretable" because you can extract feature importances. But those feature importances describe engineered aggregates, not raw data. "avg_order_value_90d was the most important feature" is not a useful explanation for a business user or a regulator.
Graph neural networks operating on relational data are technically more complex. But their explanations are more useful because they reference actual database records. The "black box" concern about deep learning is valid for models trained on opaque feature vectors. It is less valid for models trained on structured, queryable relational data where every attribution can be traced back to a specific row in a specific table.
As regulatory requirements tighten, the ability to produce cell-level, verifiable explanations from raw data will shift from a differentiator to a prerequisite. Models that cannot explain themselves will not be deployable. Models that explain themselves in terms of engineered features will not satisfy regulators who want to see the underlying data. Models that trace their predictions back to specific database records will be the only ones that pass muster.