Binary Classification · Deduplication

Duplicate Detection

For each record in the CRM, is there a duplicate entry in the system?

Book a demo and get a free trial of the full platform: data science agent, fine-tune capabilities, and forward-deployed engineer support.


A real-world example


Duplicate records inflate customer counts by 15-25%, skew analytics, and cause embarrassing double outreach. Traditional deduplication rules such as exact email matching miss variations like "john@acme.com" vs "j.smith@acme-corp.com". Kumo instead detects duplicates through behavioral overlap: the same purchasing patterns, shared addresses, and overlapping device fingerprints. Each unmerged duplicate costs $10-30 a year in wasted marketing, so enterprises with millions of records face seven-figure losses.
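The cost figures above combine by simple multiplication. A minimal sketch, assuming a hypothetical CRM of 1,000,000 records and taking the quoted 15-25% duplicate rate and $10-30 per-duplicate annual cost at face value:

```python
# Hypothetical input: the record count is an assumption for illustration.
# The rate and per-duplicate cost ranges are the figures quoted in the text.
RECORD_COUNT = 1_000_000
DUP_RATE_PCT = (15, 25)      # percent of records that are duplicates
COST_PER_DUP_USD = (10, 30)  # annual wasted spend per unmerged duplicate


def annual_duplicate_cost(records: int, rate_pct: int, cost_usd: int) -> int:
    """Annual cost = number of duplicate records x cost per duplicate."""
    duplicates = records * rate_pct // 100  # integer math avoids float rounding
    return duplicates * cost_usd


low = annual_duplicate_cost(RECORD_COUNT, DUP_RATE_PCT[0], COST_PER_DUP_USD[0])
high = annual_duplicate_cost(RECORD_COUNT, DUP_RATE_PCT[1], COST_PER_DUP_USD[1])
print(f"${low:,} to ${high:,} per year")  # $1,500,000 to $7,500,000 per year
```

Both ends of the range land in seven figures; substitute your own record count and cost assumptions to size your case.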

How KumoRFM solves this

Relational intelligence for identity resolution

Kumo connects CRM records to their transactions, support interactions, and behavioral signals in a unified relational graph. Instead of comparing email strings, Kumo learns that Record R-101 and Record R-204 share the same purchasing cadence, contact the same support agents, and transact with the same merchants. The binary classifier predicts whether each record has a duplicate anywhere in the system, flagging matches that deterministic string rules miss.
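To make the graph intuition concrete, here is a deliberately simplified stand-in, not KumoRFM's model: treat the entities each record links to (merchants via transactions, agents via support tickets) as its neighbor set in the relational graph, and score record pairs by Jaccard overlap of those sets. The merchant and agent IDs below are hypothetical; KumoRFM learns such signals from the graph rather than from a hand-written similarity.

```python
from itertools import combinations

# Hypothetical edges: record -> entities it links to through
# transactions (merchants) and support tickets (agents).
EDGES = {
    "R-101": {"merchant:M-17", "merchant:M-42", "agent:A-3"},
    "R-204": {"merchant:M-17", "merchant:M-42", "agent:A-3", "merchant:M-90"},
    "R-350": {"merchant:M-08", "agent:A-11"},
}


def jaccard(a: set, b: set) -> float:
    """Share of linked entities two records have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_pairs(edges: dict) -> list:
    """Score every record pair by neighbor overlap, highest first."""
    scored = [
        (r1, r2, jaccard(edges[r1], edges[r2]))
        for r1, r2 in combinations(sorted(edges), 2)
    ]
    return sorted(scored, key=lambda t: t[2], reverse=True)


for r1, r2, score in rank_pairs(EDGES):
    print(r1, r2, round(score, 2))
```

With these made-up edges, R-101 and R-204 share three of four linked entities (0.75), while neither overlaps R-350 at all.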

From data to predictions

See the full pipeline in action

Connect your tables, write a PQL query, and get predictions with built-in explainability — all in minutes, not months.

1

Your data

The relational tables Kumo learns from

RECORDS

record_id  name         email                  company    source
R-101      John Smith   john@acme.com          Acme Corp  website
R-204      J. Smith     j.smith@acme-corp.com  ACME       trade show
R-350      Maria Lopez  mlopez@bigco.io        BigCo Inc  referral

MATCH_CANDIDATES

match_id  record_id  candidate_id  similarity_score  timestamp
MC-001    R-101      R-204         0.82              2025-09-14
MC-002    R-350      R-612         0.74              2025-09-14
MC-003    R-101      R-550         0.45              2025-09-15

TRANSACTIONS

txn_id    record_id  amount     timestamp
TXN-8001  R-101      $1,249.00  2025-09-10
TXN-8002  R-204      $1,249.00  2025-09-10
TXN-8003  R-350      $487.50    2025-09-12
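The sample tables already show why string rules fall short. A minimal sketch using the rows from RECORDS and TRANSACTIONS: an exact email comparison misses R-101 and R-204, while grouping transactions on identical amount and date pairs them. The (amount, date) grouping is an illustrative heuristic of ours, not how Kumo computes its features.

```python
from collections import defaultdict
from itertools import combinations

# Rows copied from the sample RECORDS and TRANSACTIONS tables.
EMAILS = {
    "R-101": "john@acme.com",
    "R-204": "j.smith@acme-corp.com",
    "R-350": "mlopez@bigco.io",
}
TRANSACTIONS = [
    ("TXN-8001", "R-101", "1249.00", "2025-09-10"),
    ("TXN-8002", "R-204", "1249.00", "2025-09-10"),
    ("TXN-8003", "R-350", "487.50", "2025-09-12"),
]


def exact_email_pairs(emails: dict) -> set:
    """Traditional rule: flag pairs whose normalized emails are identical."""
    return {
        (a, b)
        for a, b in combinations(sorted(emails), 2)
        if emails[a].strip().lower() == emails[b].strip().lower()
    }


def shared_transaction_pairs(txns: list) -> set:
    """Flag record pairs with a transaction of the same amount on the same day."""
    by_key = defaultdict(set)
    for _txn_id, record_id, amount, day in txns:
        by_key[(amount, day)].add(record_id)
    pairs = set()
    for records in by_key.values():
        pairs.update(combinations(sorted(records), 2))
    return pairs


print(exact_email_pairs(EMAILS))               # set()
print(shared_transaction_pairs(TRANSACTIONS))  # {('R-101', 'R-204')}
```

On its own, a same-day, same-amount match is a strong but noisy signal, since unrelated customers can buy the same item on the same day; combining many relational signals is what makes the prediction robust.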
2

Write your PQL query

Describe what to predict in 2–3 lines — Kumo handles the rest

PQL
PREDICT COUNT(MATCH_CANDIDATES.*
    WHERE MATCH_CANDIDATES.SIMILARITY_SCORE > 0.8,
    0, 30, days) > 0
FOR EACH RECORDS.RECORD_ID
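Read literally, the query labels a record True when at least one MATCH_CANDIDATES row for it has a similarity score above 0.8 within the 30 days after the prediction time. The sketch below computes that label on the sample rows. It is our reading of the query, not Kumo's implementation, and it assumes a window of (anchor, anchor + 30 days], a hypothetical anchor date of 2025-09-01, and rows that join to RECORDS through record_id only.

```python
from datetime import date, timedelta

# Sample MATCH_CANDIDATES rows: (match_id, record_id, candidate_id, score, day)
MATCH_CANDIDATES = [
    ("MC-001", "R-101", "R-204", 0.82, date(2025, 9, 14)),
    ("MC-002", "R-350", "R-612", 0.74, date(2025, 9, 14)),
    ("MC-003", "R-101", "R-550", 0.45, date(2025, 9, 15)),
]
RECORD_IDS = ["R-101", "R-204", "R-350"]


def duplicate_label(record_id, rows, anchor, days=30, min_score=0.8):
    """COUNT(rows WHERE score > min_score, 0, days, days) > 0 for one record."""
    end = anchor + timedelta(days=days)
    count = sum(
        1
        for _mid, rid, _cand, score, day in rows
        if rid == record_id and score > min_score and anchor < day <= end
    )
    return count > 0


anchor = date(2025, 9, 1)  # hypothetical prediction time
labels = {r: duplicate_label(r, MATCH_CANDIDATES, anchor) for r in RECORD_IDS}
print(labels)  # {'R-101': True, 'R-204': False, 'R-350': False}
```

Under this record_id-only join, R-204 appears only as a candidate_id and so gets a False label; how the platform links candidate_id back to RECORDS is not specified on this page.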
3

Prediction output

Every entity gets a score, updated continuously

RECORD_ID  TIMESTAMP   TARGET_PRED  True_PROB
R-101      2025-10-01  True         0.96
R-204      2025-10-01  True         0.96
R-350      2025-10-01  False        0.18
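The output pairs a boolean TARGET_PRED with a probability, True_PROB. The page does not say how the two relate, so a minimal sketch, assuming a hypothetical 0.5 cutoff (consistent with the three rows shown), turns the scores into a review queue for data stewards, most confident first:

```python
# Rows from the prediction output: (record_id, timestamp, true_prob)
PREDICTIONS = [
    ("R-101", "2025-10-01", 0.96),
    ("R-204", "2025-10-01", 0.96),
    ("R-350", "2025-10-01", 0.18),
]
THRESHOLD = 0.5  # assumption: the cutoff behind TARGET_PRED is not stated


def review_queue(preds, threshold=THRESHOLD):
    """Return records predicted to have a duplicate, most confident first."""
    flagged = [(rid, p) for rid, _ts, p in preds if p >= threshold]
    return sorted(flagged, key=lambda t: t[1], reverse=True)


print(review_queue(PREDICTIONS))  # [('R-101', 0.96), ('R-204', 0.96)]
```

Raising the threshold trades recall for fewer false merges; the right cutoff depends on how costly a wrong merge is compared with a missed duplicate.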
4

Understand why

Every prediction includes feature attributions — no black boxes

Record R-101 (John Smith, Acme Corp)

Predicted: 96% probability of having a duplicate

Top contributing features

feature                                value        attribution
Transaction amount overlap with R-204  Exact match  32%
Company name similarity                0.88         24%
Phone number overlap                   Same         20%
Behavioral cadence similarity          0.91         14%
Source channel difference              Different    10%

Feature attributions are computed automatically for every prediction, with no separate tooling required.
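The attribution percentages for R-101 sum to 100%. One common way to express attributions like this, assumed here because the page does not describe Kumo's method, is to divide each feature's absolute contribution by the total. The raw contribution values below are hypothetical, chosen so the shares reproduce the listed percentages.

```python
# Hypothetical raw contribution magnitudes for record R-101.
CONTRIBUTIONS = {
    "Transaction amount overlap with R-204": 0.64,
    "Company name similarity": 0.48,
    "Phone number overlap": 0.40,
    "Behavioral cadence similarity": 0.28,
    "Source channel difference": 0.20,
}


def attribution_shares(contribs):
    """Convert raw contributions to integer percentages of the absolute total."""
    total = sum(abs(v) for v in contribs.values())
    return {name: round(100 * abs(v) / total) for name, v in contribs.items()}


for name, pct in attribution_shares(CONTRIBUTIONS).items():
    print(f"{name}: {pct}% attribution")
```

Rounding each share independently can leave the total a point away from 100 in general; with these values the shares divide evenly.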

Bottom line: Eliminate the 15-25% of CRM records that are duplicates, correcting inflated customer counts, fixing attribution, and saving $1-5M annually in wasted outreach.


One Platform. One Model. Predict Instantly.

KumoRFM

Relational Foundation Model

Turn structured relational data into predictions in seconds. KumoRFM delivers zero-shot predictions that rival months of traditional data science. No training, feature engineering, or infrastructure required. Just connect your data and start predicting.

For critical use cases, fine-tune KumoRFM on your data using the Kumo platform and Data Science Agent for 30%+ higher accuracy than traditional models.
