How to Explain ML Predictions to Business Stakeholders and Compliance Teams

Your model says this transaction is 94% likely fraud. Your compliance officer asks: why? Your VP asks: what drives fraud overall? Your regulator asks: show me the specific data that triggered this alert. These are three different questions that require three different levels of explainability. Most ML tools only answer one of them.

TL;DR

  • There are four levels of ML explainability: (1) global feature importance, (2) per-prediction SHAP values, (3) path-based explanations across related tables, and (4) cell-level explanations tracing to specific data points. Most tools stop at level 2.
  • Compliance regulations (ECOA, FCRA, GDPR, BSA/AML, EU AI Act) require you to explain WHY a fraud alert fired or WHY a loan was denied. SHAP values on a flat table are not enough when the real signal comes from relationships between accounts, devices, and transactions.
  • XGBoost provides SHAP-based explanations for single-table features. DataRobot adds feature importance dashboards. KumoRFM provides path-based explanations that trace each prediction through the full relational graph back to specific records.
  • On the SAP SALT enterprise benchmark, KumoRFM achieves 91% accuracy vs 75% for PhD data scientists with XGBoost and 63% for LLM+AutoML. Higher accuracy with built-in explainability means fewer false alerts to investigate and better audit trails.
  • KumoRFM reads raw relational tables and provides path-based explanations showing which connected records (accounts, devices, addresses) drove each prediction. No feature engineering, no graph construction, no separate explainability tooling.

You built a fraud detection model. It works. The AUC looks good, the false positive rate is acceptable, and engineering deployed it to production last quarter. Then compliance walks in and asks you to explain why Customer #48291 got flagged.

You open your SHAP dashboard and show them: transaction_amount contributed +0.3, merchant_risk_score contributed +0.2, velocity_7d contributed +0.15. The compliance officer stares at it and says: "But what is the merchant_risk_score? Where does velocity_7d come from? Which specific transactions drove it?"

This is the explainability gap. Your model explains itself in terms of engineered features. Your compliance team needs explanations in terms of actual data records. And the distance between those two things is where most ML teams get stuck.

The four levels of ML explainability

Not all explanations are the same. Different audiences need different depths. Here are the four levels, from broadest to most specific:

Level 1: Global feature importance

Which features matter most across all predictions? This answers the executive question: "What drives fraud in our portfolio?" Every serious ML tool provides this. It is table stakes.

  • Best for: Executive briefings, model governance reviews, and initial model validation.
  • Watch out for: Global importance masks per-prediction variation. A feature that matters on average may not matter for the specific case your compliance officer is asking about.

Level 2: Per-prediction feature attribution (SHAP)

For this specific prediction, which features pushed the score up or down? This answers the operations question: "Why was this particular transaction flagged?" Most modern ML tools provide SHAP values or similar per-prediction explanations.

  • Best for: Operations teams triaging alerts, data scientists debugging model behavior, and FCRA/GDPR basic compliance.
  • Watch out for: SHAP explains engineered features, not source data. If your feature is "device_risk_score," SHAP cannot tell you which device or which connected accounts made it risky.

Level 3: Path-based explanations

Which relationships across connected tables drove this prediction? This answers the compliance question: "Show me the chain of data that led to this alert." Very few tools provide this because most models operate on flat tables and have no concept of relational paths.

  • Best for: BSA/AML SAR filings, ECOA adverse action notices, EU AI Act compliance, and any regulation requiring data traceability.
  • Watch out for: Requires a model that operates on relational data natively. You cannot bolt path-based explanations onto a flat-table model after the fact.

Level 4: Cell-level explanations

Which specific data points, in which specific rows and columns of which specific tables, had the most influence on this prediction? This answers the audit question: "Give me an exact trail from prediction back to source data." This is the hardest level and the one regulators increasingly demand.

  • Best for: External audits, regulatory examinations, and building evidentiary packages for legal proceedings.
  • Watch out for: Very few tools provide this today. KumoRFM provides cell-level traceability because it operates on raw relational tables rather than pre-aggregated features.

Why most tools stop at Level 2

The reason is architectural. Most ML tools work on flat tables. You give XGBoost, LightGBM, or a DataRobot AutoML pipeline a CSV with one row per entity and columns for each feature. The model learns which columns predict the target. SHAP values tell you how much each column contributed to each row's prediction.

This is Level 2. It is useful. But it has a hard ceiling: the explanation can only reference columns in the flat table. If you manually engineered a feature called num_shared_devices, SHAP can tell you that feature mattered. It cannot tell you which devices were shared, with which accounts, or what those accounts did.

The information that would make the explanation actually useful to a compliance officer was destroyed during feature engineering. You collapsed a rich relational structure into a single number, and SHAP can only explain the number, not the structure behind it.
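To make that information loss concrete, here is a minimal sketch in plain Python with invented account and device IDs. The relational view preserves which device connects which accounts; the engineered feature reduces it to a count that no explainer can reverse:

```python
# Hypothetical raw relational data: (account, device) usage records.
# All IDs are invented for illustration.
device_usage = [
    ("A-1001", "D-4412"),
    ("A-7719", "D-4412"),   # same device as A-1001
    ("A-1001", "D-9001"),
    ("A-3301", "D-9001"),
]

def shared_devices(account, usage):
    """Return {device: [other accounts that used it]} for one account."""
    my_devices = {d for a, d in usage if a == account}
    return {
        d: sorted({a for a, d2 in usage if d2 == d and a != account})
        for d in my_devices
    }

# The relational view keeps the paths: which device, shared with which accounts.
paths = shared_devices("A-1001", device_usage)
# {"D-4412": ["A-7719"], "D-9001": ["A-3301"]}

# Feature engineering collapses this into one number. A model trained on
# num_shared_devices can never recover the device or account identities,
# so neither can any explainer sitting on top of that model.
num_shared_devices = sum(1 for others in paths.values() if others)  # 2
```

The flattening step is a one-way function: many different device-sharing graphs map to the same `num_shared_devices = 2`, which is exactly why a SHAP value on that column cannot answer "which device?"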

What compliance actually needs

Here is what different regulations require and what level of explainability satisfies them:

| Regulation | Requirement | Minimum explainability level | What auditors ask for |
| --- | --- | --- | --- |
| ECOA (Equal Credit Opportunity Act) | Specific adverse action reasons for denied loans | Level 3 - Path-based | Which specific data points caused denial? Can you trace back to source records? |
| FCRA (Fair Credit Reporting Act) | Disclosure of factors negatively affecting credit decisions | Level 2 - SHAP minimum, Level 3 preferred | Which factors lowered the score? Are they traceable to credit report data? |
| BSA/AML (Bank Secrecy Act) | Documented reasoning for suspicious activity reports | Level 3 - Path-based | Why is this activity suspicious? What connected transactions and entities form the pattern? |
| GDPR Article 22 | Right to explanation of automated decisions | Level 2 - SHAP minimum | What logic was involved? What data was used? How can the individual contest it? |
| EU AI Act (2026) | Transparency and human oversight for high-risk AI | Level 3 - Path-based | How does the system reach its decisions? Can a human reviewer understand and verify the reasoning? |

Regulatory requirements mapped to explainability levels. BSA/AML and ECOA demand path-level traceability. The EU AI Act will raise the bar further starting in 2026.

Tool comparison: who explains what

Not all ML tools provide the same depth of explainability. Here is an honest comparison:

| Tool | Global feature importance | Per-prediction SHAP | Path-based explanations | Cell-level traceability |
| --- | --- | --- | --- | --- |
| XGBoost (manual) | Yes | Yes (with SHAP library) | No - flat table input only | No |
| LightGBM (manual) | Yes | Yes (with SHAP library) | No - flat table input only | No |
| DataRobot | Yes - dashboard | Yes - built-in | No - operates on flat feature sets | No |
| H2O.ai | Yes | Yes - SHAP and LIME | No - flat table input only | No |
| KumoRFM | Yes | Yes | Yes - traces through relational graph | Yes - identifies specific source records |

Most ML tools provide Level 1 and Level 2 explainability. KumoRFM is the only platform that provides Level 3 path-based and Level 4 cell-level explanations because it operates on relational tables directly rather than pre-flattened features.

The SHAP gap: what Level 2 explanations miss

To make this concrete, consider a fraud alert on Transaction TXN-9934. Here is what a Level 2 SHAP explanation looks like from XGBoost:

| Feature | SHAP value | Direction |
| --- | --- | --- |
| device_risk_score | +0.42 | Increases fraud probability |
| transaction_amount | +0.18 | Increases fraud probability |
| velocity_24h | +0.12 | Increases fraud probability |
| merchant_category | +0.08 | Increases fraud probability |
| account_age_days | -0.05 | Decreases fraud probability |

XGBoost SHAP explanation for TXN-9934. Tells you which features mattered, but not which specific devices, transactions, or accounts are connected.

A compliance officer reviewing this alert will ask: "What makes the device risky? Which transactions are in the 24-hour velocity window? What is the connection to other flagged accounts?" The SHAP explanation cannot answer these questions because the underlying data was collapsed into aggregate features before the model ever saw it.
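For intuition on where numbers like those in the table come from, here is a toy sketch assuming a linear scoring model with invented weights and background means (not the real TXN-9934 model). For a linear model, each feature's exact Shapley value reduces to its weight times the feature's deviation from its background mean; real fraud models are nonlinear and use the SHAP library, but the linear case shows precisely what a Level 2 attribution is, and is not:

```python
# Toy Level 2 attribution. For a linear model f(x) = b + sum(w_i * x_i),
# the exact Shapley value of feature i is w_i * (x_i - mean_i), where
# mean_i is the feature's average over the background data.
# Weights and values below are invented for illustration.
weights = {
    "device_risk_score": 0.7,
    "transaction_amount": 0.0002,
    "account_age_days": -0.001,
}
background_mean = {"device_risk_score": 0.2, "transaction_amount": 350.0, "account_age_days": 400.0}
x = {"device_risk_score": 0.8, "transaction_amount": 1250.0, "account_age_days": 450.0}

shap_values = {f: weights[f] * (x[f] - background_mean[f]) for f in weights}
# device_risk_score:   0.7    * (0.8 - 0.2)   = +0.42 (increases fraud probability)
# transaction_amount:  0.0002 * (1250 - 350)  = +0.18 (increases)
# account_age_days:   -0.001  * (450 - 400)   = -0.05 (decreases)

# The limitation: every value references a pre-aggregated column. Nothing
# in this computation can say WHICH device or WHICH transactions produced
# the 0.8 risk score that the model saw.
```

The attribution is mathematically exact, yet still stops at the column boundary: the explanation's resolution is capped by the resolution of the inputs.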

Now here is what a Level 3 path-based explanation from KumoRFM looks like for the same transaction:

| Path step | Entity | Detail | Contribution |
| --- | --- | --- | --- |
| 1 | Transaction TXN-9934 | Amount $2,847, merchant category 5411, 11:42 PM | Base signal: amount and time anomaly |
| 2 | Device D-4412 | Used by 4 accounts in last 7 days | Device shared with flagged accounts |
| 3 | Account A-7719 | Also uses Device D-4412, 3 chargebacks in 30 days | Connected account has confirmed fraud |
| 4 | Address AD-1182 | Shipping address shared by A-7719 and 2 other flagged accounts | Address cluster linked to synthetic identity ring |

KumoRFM path-based explanation for TXN-9934. Traces the prediction through specific devices, accounts, and addresses. Each step is auditable back to source records.

This is the difference compliance teams care about. The SHAP explanation says "device_risk_score was important." The path explanation says "Device D-4412 connects this account to Account A-7719, which has 3 chargebacks, through a shared shipping address linked to a known synthetic identity cluster." One is a feature weight. The other is an evidence trail.

Why path-based explainability requires relational models

You cannot bolt path-based explanations onto a flat-table model after the fact. The reason is structural: once you flatten relational data into a feature table, the paths are gone. You cannot reconstruct which specific device connected to which specific account from a column called num_shared_devices = 3.

Path-based explanations require a model that operates on the relational structure directly. The model needs to see the graph of connections between entities, make predictions using those connections, and then trace back through the graph to explain which connections mattered.

This is why KumoRFM provides path-based explainability and flat-table tools do not. KumoRFM reads raw relational tables (accounts, transactions, devices, addresses) and constructs the heterogeneous graph internally. When it makes a prediction, it knows which paths through the graph contributed, because the graph is the model's native input, not a flattened derivative of it.
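As an illustration of the idea only (not KumoRFM's actual internals), here is a minimal sketch of walking explanation paths through a relational graph stored as an adjacency map, reusing the invented TXN-9934 entity IDs. A real relational model learns which edges matter; this sketch simply enumerates paths and annotates flagged entities:

```python
# Illustrative only: a tiny heterogeneous graph as an adjacency map.
# Edges encode relations such as transaction->device and device->account.
graph = {
    "TXN-9934": ["D-4412"],   # transaction -> device used
    "D-4412":   ["A-7719"],   # device -> other account using it
    "A-7719":   ["AD-1182"],  # account -> shipping address
    "AD-1182":  [],
}
risk_flags = {"A-7719": "3 chargebacks", "AD-1182": "synthetic identity cluster"}

def explanation_paths(start, graph, max_hops=4):
    """Enumerate paths from a prediction entity, keeping those that
    touch at least one flagged entity, annotated for audit readability."""
    results, stack = [], [[start]]
    while stack:
        path = stack.pop()
        node = path[-1]
        neighbors = graph.get(node, [])
        if not neighbors or len(path) > max_hops:
            if any(n in risk_flags for n in path):
                results.append(" > ".join(
                    f"{n} ({risk_flags[n]})" if n in risk_flags else n
                    for n in path))
        else:
            stack.extend(path + [n] for n in neighbors)
    return results

# explanation_paths("TXN-9934", graph) yields one evidence trail:
# "TXN-9934 > D-4412 > A-7719 (3 chargebacks) > AD-1182 (synthetic identity cluster)"
```

The point of the sketch is structural: because the graph is the input, the path back to source records exists by construction, instead of having to be rebuilt from logs after the fact.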

Level 2 explainability (SHAP on flat table)

  • Feature weights: device_risk_score +0.42, velocity_24h +0.12
  • Compliance asks: which device? which transactions? SHAP cannot answer
  • Adverse action notice: 'transaction velocity and device risk exceeded thresholds'
  • Auditor asks for source data trail. You rebuild it manually from logs
  • SAR filing requires narrative. You reverse-engineer it from feature weights
  • Hours of manual work per alert to satisfy compliance

Level 3 explainability (path-based on relational graph)

  • Evidence path: TXN-9934 > Device D-4412 > Account A-7719 (3 chargebacks) > Address AD-1182 (synthetic identity cluster)
  • Compliance sees exactly which records drove the alert
  • Adverse action notice: specific reasons traced to source data
  • Auditor gets a complete trail from prediction to source records
  • SAR narrative writes itself from the relational path
  • Minutes per alert instead of hours

PQL Query

PREDICT is_fraud
FOR EACH transactions.transaction_id
EXPLAIN paths

One PQL query produces both the fraud prediction and the path-based explanation. KumoRFM reads accounts, transactions, devices, and address tables directly and returns the relational paths that drove each prediction. No feature engineering, no separate explainability tooling.

Output

| Transaction ID | Fraud probability | Top explanation path | Path contribution |
| --- | --- | --- | --- |
| TXN-9934 | 0.94 | TXN-9934 > Device D-4412 > Account A-7719 (3 chargebacks) | 0.52 |
| TXN-9935 | 0.87 | TXN-9935 > Address AD-1182 > 5 accounts with synthetic ID flags | 0.44 |
| TXN-9936 | 0.12 | No high-risk relational paths detected | 0.00 |
| TXN-9937 | 0.91 | TXN-9937 > Account A-3301 > 4-hop money mule chain to known fraud | 0.61 |

Making the case to your compliance team

If you are an ML engineer trying to get compliance sign-off on a model, here is what matters to them:

  1. Traceability. Can every prediction be traced back to specific data records? Not feature weights, not aggregate scores, but actual rows in actual tables. Path-based explanations provide this. SHAP on flat tables does not.
  2. Auditability. Can an external auditor reproduce the explanation? If your explanation requires understanding how 87 engineered features were derived from raw data, that is a documentation burden that grows with every feature. If your explanation is a path through source tables, the auditor can verify it directly.
  3. Adverse action compliance. For lending decisions under ECOA and FCRA, you must provide specific reasons for denial. "Model feature #47 exceeded threshold" is not a valid reason. "Applicant's debt-to-income ratio of 0.62 exceeded the 0.45 threshold, based on reported debts in credit bureau records" is a valid reason. The second requires traceability to source data.
  4. SAR narrative generation. For BSA/AML suspicious activity reports, you need a narrative explaining why the activity is suspicious. Path-based explanations map directly to SAR narratives because they describe the chain of connected entities and behaviors. SHAP values require manual translation.
  5. EU AI Act readiness. Starting in 2026, high-risk AI systems (including fraud detection and credit scoring) must provide transparency documentation and support human oversight. Path-based explanations are the most natural way to satisfy these requirements because they show the reasoning in terms a human reviewer can follow.

The accuracy advantage of explainable relational models

There is a common assumption that explainability comes at the cost of accuracy. That is true when you are choosing between a complex black-box model and a simple interpretable one on the same flat table. But when you move from flat-table models to relational models, you get both higher accuracy and deeper explainability.

| Approach | SAP SALT accuracy | Explainability level |
| --- | --- | --- |
| LLM + AutoML | 63% | Level 1 - Global feature importance |
| PhD Data Scientist + XGBoost | 75% | Level 2 - SHAP values |
| KumoRFM (zero-shot) | 91% | Level 3/4 - Path-based + cell-level |

SAP SALT benchmark: KumoRFM achieves both higher accuracy (91% vs 75%) and deeper explainability (path-based vs SHAP-only). Relational models do not trade accuracy for explainability.

The 16-percentage-point accuracy gap between KumoRFM and XGBoost comes from the same relational patterns that enable path-based explanations. The model is more accurate because it reads the connections between entities. It is more explainable because it can trace predictions back through those same connections. The relational structure serves both purposes.

Frequently asked questions

How do I explain ML model predictions to business stakeholders?

Start by identifying what level of explanation the stakeholder needs. Executives typically want global feature importance: which factors drive the model overall. Operations teams need per-prediction explanations: why did this specific customer get flagged? Compliance teams need the deepest level: which specific data points and relationships produced this score, traceable back to source records. Most tools provide SHAP values, which show per-prediction feature weights. For relational data (multiple connected tables), you need path-based explanations that show which cross-table relationships drove the prediction. KumoRFM provides path-based explainability that traces predictions back through the relational graph to specific records.

Our compliance team requires we explain every fraud alert. Which tools support that?

Most ML tools provide SHAP-based explanations, which show how much each input feature contributed to a prediction. This satisfies basic requirements but breaks down when the real signal comes from relationships across tables, like a shared device connecting two accounts. For compliance under ECOA, FCRA, BSA/AML, GDPR, and the EU AI Act, you need to explain not just what features mattered, but which specific data records and relationships produced the alert. XGBoost with SHAP explains single-table feature contributions. DataRobot adds feature importance dashboards. KumoRFM provides path-based explanations that trace each prediction through the relational graph, showing which connected records (accounts, devices, transactions) contributed to the fraud score. This level of traceability is what auditors and regulators actually ask for.

What is the difference between SHAP values and path-based explainability?

SHAP values tell you how much each feature in a flat table contributed to a prediction. For example, SHAP might say transaction_amount contributed +0.3, merchant_category contributed +0.15, and time_of_day contributed +0.08 to a fraud score. This is useful but limited to features in a single row. Path-based explainability traces the prediction through connected data. Instead of just saying 'device_risk_score contributed +0.4,' a path-based explanation says 'this transaction was made from Device X, which was also used by Account Y, which had 3 confirmed fraud cases last month.' The path shows the reasoning chain across tables, not just the weight of a pre-computed feature.

Which regulations require explainable ML for fraud detection and lending?

Several regulations require that automated decisions be explainable. ECOA (Equal Credit Opportunity Act) requires specific adverse action reasons when a loan is denied. FCRA (Fair Credit Reporting Act) requires disclosure of factors that negatively affected a credit decision. BSA/AML (Bank Secrecy Act) requires that suspicious activity reports be supported by documented reasoning. GDPR Article 22 gives individuals the right to an explanation of automated decisions that significantly affect them. The EU AI Act classifies credit scoring and fraud detection as high-risk AI systems, requiring detailed documentation of how decisions are made. In all cases, 'the model said so' is not sufficient. You need to trace each decision back to specific data and reasoning.

Can XGBoost explain its fraud detection predictions?

Yes, but only at the single-table level. XGBoost supports SHAP values and feature importance, which explain how much each column in your flat feature table contributed to each prediction. If your feature table contains manually engineered columns like 'num_shared_devices' or 'avg_transaction_velocity_7d,' SHAP can tell you these features mattered. But it cannot tell you which specific devices were shared or which specific transactions drove the velocity calculation. For single-transaction fraud on tabular features, XGBoost's explanations are solid. For relational fraud patterns (fraud rings, shared-device networks), the explanations are only as good as the features you manually engineered, and they stop at the aggregate level.

How does KumoRFM explain predictions on relational data?

KumoRFM reads raw relational tables (accounts, transactions, devices, addresses) and automatically discovers predictive patterns across connected records. For each prediction, it provides path-based explanations that trace back through the relational graph. For a fraud alert, this might look like: 'This transaction scored 0.92 fraud probability because the account shares Device D-4412 with Account A-7719, which had 3 chargebacks in the last 30 days, and the shipping address matches Address AD-1182, linked to 5 other accounts flagged for synthetic identity fraud.' This level of detail is what compliance officers need to file SARs and what lending teams need for adverse action notices.

What are the four levels of ML explainability?

Level 1 is global feature importance: which features matter most across all predictions (useful for executives and model governance). Level 2 is per-prediction feature attribution, typically using SHAP values: which features mattered for this specific prediction (useful for operations teams). Level 3 is path-based explanations: which relationships across connected data tables drove this prediction (required for compliance on relational data). Level 4 is cell-level explanations: which specific data points (individual records, field values) had the most influence (required for audit trails and regulatory filings). Most tools stop at Level 2. KumoRFM provides Level 3 and Level 4 explanations natively because it operates on the full relational graph rather than a pre-flattened feature table.

Is explainability required for the EU AI Act?

Yes. The EU AI Act classifies credit scoring and fraud detection as high-risk AI systems under Annex III. Article 13 requires that high-risk AI systems be designed to be sufficiently transparent for users to interpret outputs and use them appropriately. Article 14 requires human oversight with the ability to understand the system's capacities and limitations. In practice, this means you need to document how your model makes decisions, provide per-decision explanations that a human can review, and maintain audit trails. Black-box models without explainability will face compliance challenges in the EU market starting in 2026.

See it in action

KumoRFM delivers predictions on relational data in seconds. No feature engineering, no ML pipelines. Try it free.