
XGBoost vs Random Forest vs GNN for Fraud Detection: Which Algorithm Wins?

XGBoost is the default choice for tabular fraud detection, and for good reason. But fraud rings are 6-7 hops deep in transaction graphs. No amount of feature engineering on a flat table will recover those signals. Here is when each algorithm works, when it breaks down, and how to get the best of both.

TL;DR

  • XGBoost is the strongest single-transaction fraud detector for tabular data. It handles missing values natively, trains fast, and dominates Kaggle fraud benchmarks. Random forest is more robust to label noise but slightly less accurate on clean datasets.
  • Graph neural networks (GNNs) catch fraud that XGBoost structurally cannot see: shared-device rings, synthetic identity networks, coordinated account takeovers. Fraud rings hide in the connections between entities, not in any single transaction row.
  • The org chart analogy: flattening fraud network data into a flat table is like flattening an org chart into a list of names. You lose who reports to whom, who is connected to whom - which is the whole point of fraud ring detection.
  • On the SAP SALT enterprise benchmark, KumoRFM scores 91% accuracy vs 75% for PhD data scientists with XGBoost and 63% for LLM+AutoML. On RelBench, KumoRFM zero-shot achieves 76.71 AUROC vs 62.44 for LightGBM with manual features.
  • KumoRFM 2.0 handles both single-table and multi-table predictions, so you can start with XGBoost-style tabular fraud scoring and upgrade to graph-based ring detection without rebuilding your pipeline.

If you search "best algorithm for fraud detection," you will get a dozen blog posts that all say the same thing: XGBoost. And they are not wrong - for a specific type of fraud. XGBoost is excellent at scoring individual transactions against tabular features like amount, time of day, merchant category, and velocity counts. It has earned its reputation.

But here is what those posts leave out: the fraud that is actually growing fastest - organized rings, synthetic identity networks, coordinated account takeovers - produces signals that do not live in any single transaction row. The signal is in the connections between accounts. And XGBoost cannot read connections. Neither can random forest, LightGBM, or any model that takes a flat table as input.

This is not a theoretical gap. It is a structural one. And understanding it changes how you architect fraud detection.

The three algorithms, compared directly

Before getting into the details, here is a head-to-head comparison across the dimensions that matter most for fraud detection teams.

| Dimension | XGBoost | Random Forest | Graph Neural Network (GNN) |
| --- | --- | --- | --- |
| Input format | Flat table (one row per transaction) | Flat table (one row per transaction) | Graph of connected entities (accounts, devices, addresses, transactions) |
| Best fraud type | Individual anomalous transactions | Individual anomalous transactions | Organized rings, synthetic identity networks, coordinated attacks |
| Handles missing values | Yes - natively | Requires imputation | Depends on implementation |
| Training speed | Fast (minutes to hours) | Fast (minutes to hours) | Slower (hours to days for custom GNNs); seconds for KumoRFM zero-shot |
| Interpretability | High - SHAP values, feature importance | High - feature importance, decision paths | Moderate - attention weights, subgraph explanations |
| Can detect shared-device fraud rings | No - cannot see cross-entity connections | No - cannot see cross-entity connections | Yes - reads device-account-transaction graph directly |
| Multi-hop pattern detection | No - limited to single-row features | No - limited to single-row features | Yes - propagates signals 6-7+ hops across entity graph |
| Feature engineering required | Heavy - velocity features, aggregations, time windows | Heavy - same as XGBoost | Minimal with foundation model approach; heavy with custom GNN |
| False positive rate | Moderate - good on known patterns, blind to ring context | Moderate to high - ensemble averaging can over-trigger | Lower for organized fraud - graph context reduces false alarms |
| Production maturity | Very high - industry standard since 2016 | Very high - established since 2001 | Growing - major banks deploying since 2022 |
| Scalability | Excellent - handles billions of rows | Good - memory-heavy on large datasets | Depends on graph size; managed platforms handle enterprise scale |

Head-to-head comparison across 11 dimensions. XGBoost and random forest dominate single-transaction detection. GNNs dominate organized fraud and ring detection. The two approaches are complementary, not competing.

How each algorithm works for fraud detection

XGBoost

XGBoost has been the go-to fraud detection algorithm at banks, fintechs, and payment processors for nearly a decade. It builds an ensemble of decision trees sequentially, where each new tree corrects errors from the previous ones. For single-transaction fraud - a stolen credit card used at an unusual merchant, a suspiciously large wire transfer, an account login from a new country - XGBoost is hard to beat. It handles missing values natively (common in fraud data), trains quickly on large datasets, and produces well-calibrated probability scores that fraud operations teams can threshold and act on.

  • Best for: Individual transaction scoring on tabular features (amount, velocity, merchant category, time of day). Production-proven at scale with fast training and inference.
  • Watch out for: Cannot see connections between entities. Blind to fraud rings, shared-device networks, and coordinated attacks because it treats each transaction as an independent row.
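
To make the mechanism concrete, here is a minimal pure-Python sketch of the gradient-boosting idea XGBoost is built on: each new tree (reduced here to a one-split "stump" on a single feature) is fit to the residual errors of the ensemble so far. This is illustrative only - real XGBoost adds regularization, second-order gradients, feature subsampling, and native missing-value handling.

```python
# Gradient boosting in miniature: stumps fit sequentially to residuals.
def fit_stump(xs, residuals):
    """Find the single split that best reduces squared error on residuals."""
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if not left or not right:
            continue
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lmean) ** 2 for r in left) + sum((r - rmean) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, threshold, lmean, rmean)
    _, t, lv, rv = best
    return lambda x: lv if x <= t else rv

def boost(xs, ys, rounds=20, lr=0.3):
    """Each round, fit a stump to the current residuals and add it to the ensemble."""
    stumps, preds = [], [0.0] * len(xs)
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + lr * stump(x) for p, x in zip(preds, xs)]
    return lambda x: sum(lr * s(x) for s in stumps)

# Toy data: transaction amounts, with fraud correlating with high amounts.
amounts = [12, 30, 45, 900, 1200, 2500]
labels  = [0,  0,  0,  1,   1,    1]
score = boost(amounts, labels)
print(score(2000), score(20))  # high fraud score vs near-zero score
```

Note what this model consumes: one row per transaction, nothing else. That per-row framing is exactly the limitation the rest of this post is about.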

Random Forest

Random forest builds hundreds of independent decision trees on random subsets of the data and averages their predictions. For fraud detection, it is a solid alternative to XGBoost. It is more robust to noisy fraud labels (a real problem since fraud labels are often delayed or incomplete) and less prone to overfitting on small datasets. But on clean, well-labeled fraud datasets, XGBoost consistently edges it out by 1-3% accuracy. That is why XGBoost became the industry default.

  • Best for: Fraud detection on noisy or incomplete labels, smaller datasets, and situations where model stability matters more than squeezing out maximum accuracy.
  • Watch out for: Same structural limitation as XGBoost - cannot read connections between entities. Also more memory-heavy on large datasets and slightly less accurate than XGBoost on clean data.
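
The robustness claim above comes from averaging. A minimal sketch, using trivial one-threshold "trees" on a single amount feature: each tree is fit on a bootstrap resample, and the ensemble averages their votes, so one mislabeled transaction cannot dominate. Real random forests also subsample features at each split; this toy omits that.

```python
# Bagging in miniature: bootstrap resamples + vote averaging.
import random

random.seed(0)

def fit_threshold(sample):
    """Pick the amount threshold that best separates fraud labels in this resample."""
    best = None
    for t, _ in sample:
        correct = sum((x > t) == bool(y) for x, y in sample)
        if best is None or correct > best[0]:
            best = (correct, t)
    return best[1]

def random_forest(data, n_trees=101):
    thresholds = [
        fit_threshold([random.choice(data) for _ in data])  # bootstrap resample
        for _ in range(n_trees)
    ]
    # Average the votes of all trees into a score in [0, 1].
    return lambda x: sum(x > t for t in thresholds) / n_trees

# Toy data with one noisy label: the $45 transaction is wrongly marked fraud.
data = [(12, 0), (30, 0), (45, 1), (60, 0), (900, 1), (1200, 1), (2500, 1)]
forest = random_forest(data)
print(forest(2000), forest(20))  # the ensemble stays sensible despite the bad label
```

The individual trees disagree near the noisy label, but the averaged score still separates high-risk from low-risk amounts - the stability property the section describes.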

Graph Neural Networks (GNNs)

A graph neural network models entities (accounts, devices, addresses, transactions) as nodes and their relationships as edges, then propagates fraud signals across those connections. GNNs catch organized fraud that flat-table models structurally cannot see: shared-device rings, synthetic identity networks, money mule chains, and coordinated account takeovers. Real-world fraud rings are typically 6-7 hops deep in entity graphs, and GNNs traverse these paths automatically rather than requiring manual feature engineering at each hop.

  • Best for: Organized fraud and ring detection where the signal is in the connections between entities - synthetic identities, device-sharing clusters, money mule chains, coordinated account takeovers.
  • Watch out for: Custom GNNs require graph infrastructure and longer training times (hours to days). Interpretability is moderate compared to tree-based models. KumoRFM eliminates the infrastructure burden by handling graph construction automatically from raw relational tables.
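
The core GNN mechanic - message passing - can be sketched in a few lines: each node's fraud score is updated from its neighbors' scores, and repeating the update K times propagates signal K hops across the entity graph. Real GNNs learn the aggregation weights from labeled data; this fixed-weight version (and all node names in it) is purely illustrative.

```python
# Toy message passing over a heterogeneous entity graph.
graph = {
    "acct_A": ["device_1"],
    "acct_B": ["device_1", "addr_9"],
    "acct_C": ["addr_9"],
    "device_1": ["acct_A", "acct_B"],
    "addr_9": ["acct_B", "acct_C"],
}
scores = {n: 0.0 for n in graph}
scores["acct_A"] = 1.0  # known fraudulent account

def propagate(scores, hops, keep=0.6):
    """Blend each node's score with the mean of its neighbors', `hops` times."""
    for _ in range(hops):
        scores = {
            node: keep * scores[node]
            + (1 - keep) * sum(scores[nb] for nb in nbrs) / len(nbrs)
            for node, nbrs in graph.items()
        }
    return scores

out = propagate(scores, hops=4)
# acct_C never touched acct_A directly; risk reaches it only via the
# 4-hop path acct_A -> device_1 -> acct_B -> addr_9 -> acct_C.
print(out["acct_C"])
```

With only 3 rounds of propagation acct_C's score stays exactly zero, because the path from the known fraud account is 4 hops long - which is why hop depth matters so much for ring detection.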

The org chart problem: why flat tables fail on fraud rings

Here is the simplest way to understand the limitation. Imagine your company's org chart - a tree of who reports to whom, which teams collaborate, which departments share resources. Now flatten that into a spreadsheet: one row per employee, columns for name, title, salary, and department.

You have lost the structure. You cannot tell who reports to whom. You cannot see which teams are connected. You cannot identify the VP whose entire department is underperforming. The list of names contains the same people, but the relationships that give the org chart its meaning are gone.

This is exactly what happens when you flatten fraud network data into a feature table for XGBoost. You start with a rich graph of connections - Account A shares a device with Account B, which shares a shipping address with Account C, which used the same phone number as Account D. You flatten it into per-transaction rows with aggregate columns like num_shared_devices = 2 and address_reuse_count = 3.

Those aggregate counts capture some signal. But they destroy the topology. XGBoost sees that an account has 2 shared devices. It cannot see that those 2 shared devices connect to 47 other accounts in a pattern that matches known fraud rings. The number tells you something. The graph tells you everything.
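
A tiny sketch makes the point concrete: two accounts with the identical flat-table feature num_shared_devices = 2, but radically different graph neighborhoods. The aggregate column cannot tell them apart; walking the graph can. All account and device names here are hypothetical.

```python
# The same flat feature, two very different realities.
shared_devices = {
    "acct_X": ["dev_1", "dev_2"],
    "acct_Y": ["dev_3", "dev_4"],
}
device_accounts = {  # which accounts each device connects to
    "dev_1": ["acct_X", "spouse_of_X"],  # benign household sharing
    "dev_2": ["acct_X", "spouse_of_X"],
    "dev_3": ["acct_Y"] + [f"mule_{i}" for i in range(24)],       # ring hub
    "dev_4": ["acct_Y"] + [f"mule_{i}" for i in range(24, 47)],   # ring hub
}

def flat_feature(acct):
    """What the flat table records: just a count."""
    return len(shared_devices[acct])

def two_hop_neighborhood(acct):
    """What the graph exposes: every account reachable through shared devices."""
    reachable = set()
    for dev in shared_devices[acct]:
        reachable.update(a for a in device_accounts[dev] if a != acct)
    return reachable

print(flat_feature("acct_X"), flat_feature("acct_Y"))  # identical: 2 and 2
print(len(two_hop_neighborhood("acct_X")))  # 1 linked account (a spouse)
print(len(two_hop_neighborhood("acct_Y")))  # 47 linked accounts (a ring)
```

XGBoost receives the first line; the fraud signal lives in the last two.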

What fraud rings look like in practice

A fraud ring is not one bad actor with a stolen credit card. It is an organized operation where multiple accounts - sometimes hundreds - coordinate to exploit a system. Here are the patterns that GNNs catch and flat-table models miss:

  1. Synthetic identity rings. Fraudsters create fake identities by combining real Social Security numbers (often from children or deceased individuals) with fabricated names and addresses. They build credit over months, then "bust out" - maxing out all credit lines simultaneously. Each individual account looks legitimate. The ring pattern (shared address fragments, similar application timing, connected credit inquiries) is only visible in the graph.
  2. Device-sharing networks. Twenty accounts that all logged in from the same three devices within a 48-hour window. XGBoost sees each account individually and might flag the device fingerprint if it was manually engineered as a feature. A GNN sees the full cluster of 20 accounts connected through 3 device nodes and recognizes the coordinated pattern.
  3. Money mule chains. Stolen funds move through a chain of accounts: Account A sends to B, B splits to C and D, C and D send to E, E withdraws. Each individual transfer might be below reporting thresholds. The chain structure - visible only in the transaction graph - is the signal.
  4. Account takeover clusters. Compromised credentials are sold in batches. The accounts taken over in a single batch share behavioral fingerprints: similar login timing, same credential-testing patterns, similar first actions post-takeover. A GNN detects these temporal and behavioral clusters across the account graph.
  5. Return fraud rings. Groups that coordinate return abuse across multiple accounts and store locations, staying below individual detection thresholds. The graph reveals the shared addresses, payment methods, and timing patterns that connect them.
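
The mule-chain pattern above (A sends to B, B splits to C and D, C and D send to E) is easy to express as a graph question: how many transfer hops separate an account from a known cash-out point? A plain breadth-first search over hypothetical transfer data illustrates the structure a GNN learns to exploit automatically.

```python
# BFS hop distance over a money-mule transfer graph.
from collections import deque

transfers = {  # sender -> receivers
    "A": ["B"],
    "B": ["C", "D"],
    "C": ["E"],
    "D": ["E"],
}

def hops_to(target, start, graph):
    """Shortest hop count from start to target along transfer edges, or None."""
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

print(hops_to("E", "A", transfers))  # 3 hops from origin to cash-out account
```

Each edge here could be a transfer below reporting thresholds; only the hop structure is suspicious, and that structure never appears in a per-transaction row.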

Why you cannot just engineer more features

The most common response from XGBoost-trained teams is: "We will just add graph-derived features to our flat table." Compute some graph metrics - degree centrality, PageRank, community detection scores - flatten them into columns, and feed them to XGBoost.

This helps. It typically adds 3-5% accuracy. But it has hard limits:

  • Static snapshots. Graph metrics computed offline are stale by the time XGBoost uses them. Fraud rings evolve hourly. A pre-computed PageRank score from yesterday's batch run misses the ring that formed this morning.
  • Fixed hop distance. Manual features capture 1-2 hops at most (direct neighbors, maybe neighbors of neighbors). Real fraud rings are 6-7 hops deep. Engineering features at that depth creates a combinatorial explosion that is not practical to maintain.
  • Lost subgraph structure. Flattening a ring's topology into a single number (like a centrality score) loses the shape of the ring. Two accounts can have identical centrality scores but completely different fraud risk because of how their neighborhoods are structured. The GNN sees the structure. The aggregate column does not.
  • Engineering cost. Each graph feature requires custom pipeline code: extract the graph, compute the metric, join it back to the transaction table, handle temporal windowing, maintain the pipeline as the graph schema changes. For teams already spending 12+ hours on feature engineering per task, adding graph features doubles the complexity.

The benchmark evidence

The SAP SALT benchmark tests prediction accuracy on real enterprise relational data - the kind of multi-table structure where fraud data actually lives. Here is how the approaches compare:

| Approach | Accuracy | What it means |
| --- | --- | --- |
| LLM + AutoML | 63% | Language model generates features, AutoML selects model |
| PhD Data Scientist + XGBoost | 75% | Expert spends weeks hand-crafting features, tunes XGBoost |
| KumoRFM (zero-shot) | 91% | No feature engineering, no training, reads relational tables directly |

SAP SALT benchmark: KumoRFM outperforms expert-tuned XGBoost by 16 percentage points. The gap comes from relational patterns that a flat feature table structurally cannot contain.

On the RelBench benchmark across 7 databases and 30 prediction tasks:

| Approach | AUROC | Feature engineering time |
| --- | --- | --- |
| LightGBM + manual features | 62.44 | 12.3 hours per task |
| KumoRFM zero-shot | 76.71 | ~1 second |
| KumoRFM fine-tuned | 81.14 | Minutes |

KumoRFM zero-shot outperforms manually engineered LightGBM by 14+ AUROC points. Fine-tuning pushes the gap to nearly 19 points.

A practical migration path: tabular first, graph when ready

This is not an all-or-nothing decision. Most fraud teams already have XGBoost models in production, and ripping them out is not realistic. The smart path is layered:

  1. Keep your XGBoost models running. They catch single-transaction fraud well. Do not break what works.
  2. Add graph-based scoring as a second layer. Use a GNN or relational foundation model to score entities based on their graph neighborhood. This catches the ring patterns your XGBoost models miss.
  3. Combine scores in your decisioning layer. Weight tabular and graph scores based on fraud type. Individual card fraud leans on XGBoost. Organized ring patterns lean on the graph score.
  4. Gradually shift to a unified model. KumoRFM 2.0 supports both single-table predictions (similar to what XGBoost does on tabular features) and multi-table relational predictions (graph-based). Over time, you can consolidate into a single platform that handles both fraud types without maintaining two separate pipelines.
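
Step 3, the decisioning layer, can be as simple as a weighted blend that leans on the graph score when ring evidence is present and on the tabular score otherwise. The weights, threshold shape, and ring_evidence input below are illustrative assumptions, not tuned production values.

```python
# Sketch of a decisioning-layer blend of tabular and graph scores.
def combined_score(xgb_score, graph_score, ring_evidence):
    """Blend the two scores; more ring evidence shifts weight to the graph.

    ring_evidence: a 0.0-1.0 summary of graph context (e.g. how many
    accounts share this entity's devices or addresses) - a hypothetical
    input your graph layer would supply.
    """
    graph_weight = 0.2 + 0.6 * ring_evidence  # 0.2 baseline, up to 0.8
    return graph_weight * graph_score + (1 - graph_weight) * xgb_score

# Lone stolen-card case: strong tabular signal, no ring context.
print(combined_score(xgb_score=0.9, graph_score=0.1, ring_evidence=0.0))
# Ring member: each transaction looks clean, but graph context is damning.
print(combined_score(xgb_score=0.2, graph_score=0.95, ring_evidence=0.9))
```

Both cases end up above a reasonable alerting threshold, for opposite reasons - which is the point of running the two scorers side by side during migration.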

XGBoost-only fraud detection

  • Engineer tabular features: velocity, amount, time, merchant category (8-12 hours)
  • Train XGBoost on single-transaction features
  • Catches individually anomalous transactions
  • Blind to fraud rings, shared-device networks, and coordinated attacks
  • Manually add graph-derived features for partial ring detection (+3-5% accuracy)
  • Maintain two pipelines: tabular features + graph features

KumoRFM fraud detection

  • Connect to data warehouse - accounts, transactions, devices, addresses
  • Write PQL: PREDICT is_fraud FOR EACH transactions.transaction_id
  • Model reads all tables and discovers both tabular and relational fraud signals
  • Catches single-transaction fraud AND fraud rings in one pass
  • Zero feature engineering, zero graph construction, zero pipeline code
  • One platform, one query, both fraud types

PQL Query

PREDICT is_fraud
FOR EACH transactions.transaction_id
WHERE transactions.amount > 50

One PQL query replaces the full fraud detection pipeline: tabular feature engineering, graph construction, model training, and scoring. KumoRFM reads raw accounts, transactions, devices, and address tables directly and discovers both single-transaction and ring-based fraud patterns.

Output

| transaction_id | fraud_prob_kumo | fraud_prob_xgboost | why_kumo_differs |
| --- | --- | --- | --- |
| TXN-8821 | 0.94 | 0.91 | Both flag - high amount, new merchant (tabular signal) |
| TXN-8822 | 0.88 | 0.23 | Kumo detects shared-device ring (7 accounts, 2 devices) |
| TXN-8823 | 0.91 | 0.18 | Kumo sees money mule chain (4 hops to known fraud account) |
| TXN-8824 | 0.05 | 0.41 | Kumo correctly lower - graph context shows legitimate business pattern |

Why KumoRFM handles both worlds

Most fraud teams face a bad choice: stick with XGBoost and miss rings, or build a custom GNN pipeline and deal with months of graph infrastructure work. KumoRFM removes this tradeoff.

KumoRFM is a relational foundation model. It reads raw relational tables - accounts, transactions, devices, addresses, merchants - and automatically constructs the heterogeneous graph that connects them. It then discovers predictive patterns across both individual entity features (what XGBoost sees) and multi-hop relational structure (what only a GNN can see).

The key advantage: you do not need to choose between tabular and graph. You do not need to build a graph database, write graph queries, or maintain graph infrastructure. You point KumoRFM at your existing data warehouse tables and write a PQL query. The model figures out which patterns - tabular, relational, or both - predict fraud for your specific data.

When to use each algorithm

Here is the direct guidance, based on fraud type:

| Fraud type | Best algorithm | Why |
| --- | --- | --- |
| Card-not-present fraud | XGBoost or KumoRFM | Strong tabular signals (amount, merchant, velocity). XGBoost handles well. KumoRFM adds device-graph context. |
| Account takeover (individual) | XGBoost or KumoRFM | Login anomalies, behavioral shifts. Tabular features capture most signal. |
| Account takeover (coordinated batch) | GNN or KumoRFM | Credential batches produce temporal clusters across the account graph. Tabular models miss the coordination. |
| Synthetic identity fraud | GNN or KumoRFM | Shared SSN fragments, address similarities, application timing. Ring structure is the primary signal. |
| Money mule networks | GNN or KumoRFM | Chain-of-transfer patterns across 4-7 hops. Invisible to flat-table models. |
| Return fraud rings | GNN or KumoRFM | Coordinated returns across accounts, stores, and payment methods. Connection patterns are the signal. |
| First-party fraud | XGBoost or KumoRFM | Behavioral patterns of the account holder. Tabular features capture most signal. |
| Bust-out fraud | GNN or KumoRFM | Accounts that build credit and default together share hidden connections. Graph reveals the coordination. |

Recommendation by fraud type. XGBoost works for individual-level fraud. GNNs are required for organized and coordinated fraud. KumoRFM handles both in a single platform.

Frequently asked questions

Which algorithm is best for fraud detection - XGBoost, random forest, or graph neural networks?

It depends on the fraud type. XGBoost is the strongest single-transaction fraud detector for tabular data - it handles missing values well, trains fast, and produces high-accuracy scores on per-transaction features like amount, time, and merchant category. Random forest is more robust to noisy labels but slightly less accurate than XGBoost on clean datasets. Graph neural networks (GNNs) are the best choice for organized fraud and fraud ring detection because they read the connections between accounts, devices, and transactions - patterns that XGBoost and random forest structurally cannot see. The ideal approach uses both: XGBoost-style scoring for single-transaction signals and GNN-based scoring for relational and ring-based fraud. KumoRFM 2.0 supports both single-table and multi-table predictions, so you can start with tabular fraud scoring and upgrade to graph-based detection without rebuilding your pipeline.

How do GNNs detect fraud rings that XGBoost misses?

A fraud ring is a group of accounts that coordinate to commit fraud - sharing devices, shipping addresses, payment methods, or behavioral patterns. XGBoost sees each transaction as an independent row in a table. It cannot see that Account A shares a device with Account B, which shares a shipping address with Account C, which shares a phone number with Account D. A GNN reads the graph of connections directly and propagates fraud signals across these relationships, detecting coordinated patterns even when each individual transaction looks normal. Fraud rings are typically 6-7 hops deep in real-world transaction graphs, well beyond what manual feature engineering can capture.

Can graph neural networks detect fraud rings?

Yes. Fraud ring detection is one of the strongest use cases for GNNs. A GNN maps accounts, devices, transactions, and addresses as nodes in a graph and learns which connection patterns indicate coordinated fraud. Unlike rule-based link analysis, GNNs learn these patterns from labeled data and generalize to new ring structures they have not seen before. KumoRFM reads raw relational tables (accounts, transactions, devices, addresses) and automatically discovers ring-like patterns without any manual graph construction or feature engineering.

What is a fraud ring?

A fraud ring is an organized group of individuals or synthetic identities that coordinate to commit fraud at scale. Members share resources like devices, IP addresses, shipping addresses, phone numbers, or payment credentials. Common examples include synthetic identity fraud rings (fabricated identities used to build credit and then default), account takeover rings (compromised credentials tested and sold in bulk), and return fraud rings (coordinated return abuse across multiple accounts). Individual transactions from a fraud ring often look legitimate in isolation. The fraud signal is in the connections between accounts, which is why graph-based detection outperforms transaction-level models.

How does KumoRFM detect fraud?

KumoRFM is a relational foundation model that reads raw relational tables directly - accounts, transactions, devices, addresses, merchants - without requiring any feature engineering or manual graph construction. For fraud detection, you write a PQL (Predictive Query Language) query like PREDICT is_fraud FOR EACH transactions.transaction_id, and KumoRFM automatically discovers predictive patterns across all connected tables. This includes both single-transaction signals (amount, time, merchant) and multi-hop relational signals (shared devices, address clusters, behavioral similarity to known fraud). On the SAP SALT enterprise benchmark, KumoRFM achieves 91% accuracy vs 75% for PhD data scientists with XGBoost.

Do I need a data science team to build fraud detection with ML?

With traditional approaches (XGBoost, random forest, or custom GNNs), yes. You need data scientists to engineer features, build and tune models, validate results, and maintain pipelines. Feature engineering alone averages 12.3 hours and 878 lines of code per prediction task. With KumoRFM, you do not need a dedicated data science team for fraud detection. A single ML engineer or analyst can write a PQL query, and the foundation model handles feature discovery, model training, and inference automatically. This reduces the team requirement from 3-4 FTEs to 0.5 FTE for most fraud detection use cases.

Is XGBoost still good for fraud detection in 2026?

XGBoost remains an excellent algorithm for single-transaction fraud detection on tabular data. It is fast, well-understood, and handles the kind of structured features (transaction amount, time of day, merchant category, velocity counts) that characterize individual fraudulent transactions. Where XGBoost falls short is organized fraud. Fraud rings, synthetic identity networks, and coordinated account takeover attacks produce signals that live in the connections between entities, not in any single transaction row. For these patterns, you need a graph-based approach. The practical answer: keep XGBoost for what it does well, and add graph-based detection for the fraud types it cannot see.

What is the difference between tabular fraud detection and graph-based fraud detection?

Tabular fraud detection (XGBoost, random forest, logistic regression) treats each transaction as an independent row with features like amount, time, location, and velocity. It excels at catching individually anomalous transactions. Graph-based fraud detection (GNNs, graph analytics) models the relationships between entities - accounts sharing devices, addresses linked to multiple identities, transaction chains between suspicious accounts. It excels at catching coordinated fraud where individual transactions look normal but the pattern of connections reveals organized criminal activity. The two approaches are complementary: tabular models catch the lone bad actor, graph models catch the organized ring.

See it in action

KumoRFM delivers predictions on relational data in seconds. No feature engineering, no ML pipelines. Try it free.