You built a fraud detection model. It works. The AUC looks good, the false positive rate is acceptable, and engineering deployed it to production last quarter. Then compliance walks in and asks you to explain why Customer #48291 got flagged.
You open your SHAP dashboard and show them: transaction_amount contributed +0.3, merchant_risk_score contributed +0.2, velocity_7d contributed +0.15. The compliance officer stares at it and says: "But what is the merchant_risk_score? Where does velocity_7d come from? Which specific transactions drove it?"
This is the explainability gap. Your model explains itself in terms of engineered features. Your compliance team needs explanations in terms of actual data records. And the distance between those two things is where most ML teams get stuck.
The four levels of ML explainability
Not all explanations are the same. Different audiences need different depths. Here are the four levels, from broadest to most specific:
Level 1: Global feature importance
Which features matter most across all predictions? This answers the executive question: "What drives fraud in our portfolio?" Every serious ML tool provides this. It is table stakes.
- Best for: Executive briefings, model governance reviews, and initial model validation.
- Watch out for: Global importance masks per-prediction variation. A feature that matters on average may not matter for the specific case your compliance officer is asking about.
Level 2: Per-prediction feature attribution (SHAP)
For this specific prediction, which features pushed the score up or down? This answers the operations question: "Why was this particular transaction flagged?" Most modern ML tools provide SHAP values or similar per-prediction explanations.
- Best for: Operations teams triaging alerts, data scientists debugging model behavior, and baseline FCRA/GDPR compliance.
- Watch out for: SHAP explains engineered features, not source data. If your feature is "device_risk_score," SHAP cannot tell you which device or which connected accounts made it risky.
Level 3: Path-based explanations
Which relationships across connected tables drove this prediction? This answers the compliance question: "Show me the chain of data that led to this alert." Very few tools provide this because most models operate on flat tables and have no concept of relational paths.
- Best for: BSA/AML SAR filings, ECOA adverse action notices, EU AI Act compliance, and any regulation requiring data traceability.
- Watch out for: Requires a model that operates on relational data natively. You cannot bolt path-based explanations onto a flat-table model after the fact.
Level 4: Cell-level explanations
Which specific data points, in which specific rows and columns of which specific tables, had the most influence on this prediction? This answers the audit question: "Give me an exact trail from prediction back to source data." This is the hardest level and the one regulators increasingly demand.
- Best for: External audits, regulatory examinations, and building evidentiary packages for legal proceedings.
- Watch out for: Very few tools provide this today. KumoRFM provides cell-level traceability because it operates on raw relational tables rather than pre-aggregated features.
Why most tools stop at Level 2
The reason is architectural. Most ML tools work on flat tables. You give XGBoost, LightGBM, or a DataRobot AutoML pipeline a CSV with one row per entity and columns for each feature. The model learns which columns predict the target. SHAP values tell you how much each column contributed to each row's prediction.
This is Level 2. It is useful. But it has a hard ceiling: the explanation can only reference columns in the flat table. If you manually engineered a feature called num_shared_devices, SHAP can tell you that feature mattered. It cannot tell you which devices were shared, with which accounts, or what those accounts did.
The information that would make the explanation actually useful to a compliance officer was destroyed during feature engineering. You collapsed a rich relational structure into a single number, and SHAP can only explain the number, not the structure behind it.
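The information loss is easy to see in code. Here is a minimal sketch, using hypothetical account and device IDs, of how aggregating a device-sharing relationship into a single num_shared_devices feature discards the identities behind it:

```python
# Hypothetical source records: which accounts logged in from which devices.
device_logins = [
    {"account": "A-1001", "device": "D-4412"},
    {"account": "A-7719", "device": "D-4412"},
    {"account": "A-7719", "device": "D-9001"},
    {"account": "A-1001", "device": "D-3300"},
]

def engineer_num_shared_devices(account: str) -> int:
    """Collapse the relational structure into a single feature value."""
    my_devices = {r["device"] for r in device_logins if r["account"] == account}
    shared = {
        r["device"] for r in device_logins
        if r["account"] != account and r["device"] in my_devices
    }
    return len(shared)

# The flat-table model only ever sees this number...
print(engineer_num_shared_devices("A-1001"))  # 1
# ...so SHAP can weight the feature, but it cannot recover that the
# shared device was D-4412 or that the other account was A-7719.
```

Once the feature table is built, nothing downstream of it can answer the compliance officer's question, because the answer was never stored.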
What compliance actually needs
Here is what different regulations require and what level of explainability satisfies them:
| regulation | requirement | minimum_explainability_level | what_auditors_ask_for |
|---|---|---|---|
| ECOA (Equal Credit Opportunity Act) | Specific adverse action reasons for denied loans | Level 3 - Path-based | Which specific data points caused denial? Can you trace back to source records? |
| FCRA (Fair Credit Reporting Act) | Disclosure of factors negatively affecting credit decisions | Level 2 - SHAP minimum, Level 3 preferred | Which factors lowered the score? Are they traceable to credit report data? |
| BSA/AML (Bank Secrecy Act) | Documented reasoning for suspicious activity reports | Level 3 - Path-based | Why is this activity suspicious? What connected transactions and entities form the pattern? |
| GDPR Article 22 | Right to explanation of automated decisions | Level 2 - SHAP minimum | What logic was involved? What data was used? How can the individual contest it? |
| EU AI Act (2026) | Transparency and human oversight for high-risk AI | Level 3 - Path-based | How does the system reach its decisions? Can a human reviewer understand and verify the reasoning? |
Regulatory requirements mapped to explainability levels. BSA/AML and ECOA demand path-level traceability. The EU AI Act will raise the bar further starting in 2026.
Tool comparison: who explains what
Not all ML tools provide the same depth of explainability. Here is an honest comparison:
| tool | global_feature_importance | per_prediction_SHAP | path_based_explanations | cell_level_traceability |
|---|---|---|---|---|
| XGBoost (manual) | Yes | Yes (with SHAP library) | No - flat table input only | No |
| LightGBM (manual) | Yes | Yes (with SHAP library) | No - flat table input only | No |
| DataRobot | Yes - dashboard | Yes - built-in | No - operates on flat feature sets | No |
| H2O.ai | Yes | Yes - SHAP and LIME | No - flat table input only | No |
| KumoRFM | Yes | Yes | Yes - traces through relational graph | Yes - identifies specific source records |
Most ML tools provide Level 1 and Level 2 explainability. KumoRFM is the only platform that provides Level 3 path-based and Level 4 cell-level explanations because it operates on relational tables directly rather than pre-flattened features.
The SHAP gap: what Level 2 explanations miss
To make this concrete, consider a fraud alert on Transaction TXN-9934. Here is what a Level 2 SHAP explanation looks like from XGBoost:
| feature | SHAP_value | direction |
|---|---|---|
| device_risk_score | +0.42 | Increases fraud probability |
| transaction_amount | +0.18 | Increases fraud probability |
| velocity_24h | +0.12 | Increases fraud probability |
| merchant_category | +0.08 | Increases fraud probability |
| account_age_days | -0.05 | Decreases fraud probability |
XGBoost SHAP explanation for TXN-9934. Tells you which features mattered, but not which specific devices, transactions, or accounts are connected.
A compliance officer reviewing this alert will ask: "What makes the device risky? Which transactions are in the 24-hour velocity window? What is the connection to other flagged accounts?" The SHAP explanation cannot answer these questions because the underlying data was collapsed into aggregate features before the model ever saw it.
Now here is what a Level 3 path-based explanation from KumoRFM looks like for the same transaction:
| path_step | entity | detail | contribution |
|---|---|---|---|
| 1 | Transaction TXN-9934 | Amount $2,847, merchant category 5411, 11:42 PM | Base signal: amount and time anomaly |
| 2 | Device D-4412 | Used by 4 accounts in last 7 days | Device shared with flagged accounts |
| 3 | Account A-7719 | Also uses Device D-4412, 3 chargebacks in 30 days | Connected account has confirmed fraud |
| 4 | Address AD-1182 | Shipping address shared by A-7719 and 2 other flagged accounts | Address cluster linked to synthetic identity ring |
KumoRFM path-based explanation for TXN-9934. Traces the prediction through specific devices, accounts, and addresses. Each step is auditable back to source records.
This is the difference compliance teams care about. The SHAP explanation says "device_risk_score was important." The path explanation says "Device D-4412 connects this account to Account A-7719, which has 3 chargebacks, through a shared shipping address linked to a known synthetic identity cluster." One is a feature weight. The other is an evidence trail.
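For intuition, the evidence trail above can be thought of as a walk over linked records. The following is a simplified sketch using toy tables and a naive two-hop traversal, not KumoRFM's actual algorithm:

```python
# Toy relational snapshot (hypothetical IDs mirroring the example above).
transactions = {"TXN-9934": {"account": "A-1001", "device": "D-4412"}}
device_users = {"D-4412": ["A-1001", "A-7719"]}   # device -> accounts
chargebacks_30d = {"A-7719": 3, "A-1001": 0}      # account -> chargeback count

def trace_device_paths(txn_id: str) -> list[str]:
    """Walk transaction -> device -> connected accounts and report
    any connected account with confirmed fraud signals on the path."""
    txn = transactions[txn_id]
    device = txn["device"]
    paths = []
    for acct in device_users.get(device, []):
        if acct == txn["account"]:
            continue  # skip the transacting account itself
        if chargebacks_30d.get(acct, 0) > 0:
            paths.append(
                f"{txn_id} > Device {device} > Account {acct} "
                f"({chargebacks_30d[acct]} chargebacks)"
            )
    return paths

print(trace_device_paths("TXN-9934"))
# ['TXN-9934 > Device D-4412 > Account A-7719 (3 chargebacks)']
```

Each hop in the returned string corresponds to a concrete row in a source table, which is what makes the explanation auditable.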
Why path-based explainability requires relational models
You cannot bolt path-based explanations onto a flat-table model after the fact. The reason is structural: once you flatten relational data into a feature table, the paths are gone. You cannot reconstruct which specific device connected to which specific account from a column that says num_shared_devices = 3.
Path-based explanations require a model that operates on the relational structure directly. The model needs to see the graph of connections between entities, make predictions using those connections, and then trace back through the graph to explain which connections mattered.
This is why KumoRFM provides path-based explainability and flat-table tools do not. KumoRFM reads raw relational tables (accounts, transactions, devices, addresses) and constructs the heterogeneous graph internally. When it makes a prediction, it knows which paths through the graph contributed, because the graph is the model's native input, not a flattened derivative of it.
Level 2 explainability (SHAP on flat table)
- Feature weights: device_risk_score +0.42, velocity_24h +0.12
- Compliance asks: which device? which transactions? SHAP cannot answer
- Adverse action notice: 'transaction velocity and device risk exceeded thresholds'
- Auditor asks for source data trail. You rebuild it manually from logs
- SAR filing requires narrative. You reverse-engineer it from feature weights
- Hours of manual work per alert to satisfy compliance
Level 3 explainability (path-based on relational graph)
- Evidence path: TXN-9934 > Device D-4412 > Account A-7719 (3 chargebacks) > Address AD-1182 (synthetic identity cluster)
- Compliance sees exactly which records drove the alert
- Adverse action notice: specific reasons traced to source data
- Auditor gets a complete trail from prediction to source records
- SAR narrative writes itself from the relational path
- Minutes per alert instead of hours
PQL Query
PREDICT is_fraud FOR EACH transactions.transaction_id EXPLAIN paths
One PQL query produces both the fraud prediction and the path-based explanation. KumoRFM reads accounts, transactions, devices, and address tables directly and returns the relational paths that drove each prediction. No feature engineering, no separate explainability tooling.
Output
| transaction_id | fraud_probability | top_explanation_path | path_contribution |
|---|---|---|---|
| TXN-9934 | 0.94 | TXN-9934 > Device D-4412 > Account A-7719 (3 chargebacks) | 0.52 |
| TXN-9935 | 0.87 | TXN-9935 > Address AD-1182 > 5 accounts with synthetic ID flags | 0.44 |
| TXN-9936 | 0.12 | No high-risk relational paths detected | 0.00 |
| TXN-9937 | 0.91 | TXN-9937 > Account A-3301 > 4-hop money mule chain to known fraud | 0.61 |
Making the case to your compliance team
If you are an ML engineer trying to get compliance sign-off on a model, here is what matters to them:
- Traceability. Can every prediction be traced back to specific data records? Not feature weights, not aggregate scores, but actual rows in actual tables. Path-based explanations provide this. SHAP on flat tables does not.
- Auditability. Can an external auditor reproduce the explanation? If your explanation requires understanding how 87 engineered features were derived from raw data, that is a documentation burden that grows with every feature. If your explanation is a path through source tables, the auditor can verify it directly.
- Adverse action compliance. For lending decisions under ECOA and FCRA, you must provide specific reasons for denial. "Model feature #47 exceeded threshold" is not a valid reason. "Applicant's debt-to-income ratio of 0.62 exceeded the 0.45 threshold, based on reported debts in credit bureau records" is a valid reason. The second requires traceability to source data.
- SAR narrative generation. For BSA/AML suspicious activity reports, you need a narrative explaining why the activity is suspicious. Path-based explanations map directly to SAR narratives because they describe the chain of connected entities and behaviors. SHAP values require manual translation.
- EU AI Act readiness. Starting in 2026, high-risk AI systems (including fraud detection and credit scoring) must provide transparency documentation and support human oversight. Path-based explanations are the most natural way to satisfy these requirements because they show the reasoning in terms a human reviewer can follow.
The accuracy advantage of explainable relational models
There is a common assumption that explainability comes at the cost of accuracy. That is true when you are choosing between a complex black-box model and a simple interpretable one on the same flat table. But when you move from flat-table models to relational models, you get both higher accuracy and deeper explainability.
| approach | SAP_SALT_accuracy | explainability_level |
|---|---|---|
| LLM + AutoML | 63% | Level 1 - Global feature importance |
| PhD Data Scientist + XGBoost | 75% | Level 2 - SHAP values |
| KumoRFM (zero-shot) | 91% | Level 3/4 - Path-based + cell-level |
SAP SALT benchmark: KumoRFM achieves both higher accuracy (91% vs 75%) and deeper explainability (path-based vs SHAP-only). Relational models do not trade accuracy for explainability.
The 16-percentage-point accuracy gap between KumoRFM and XGBoost comes from the same relational patterns that enable path-based explanations. The model is more accurate because it reads the connections between entities. It is more explainable because it can trace predictions back through those same connections. The relational structure serves both purposes.