Duplicate Detection
“For each record in the CRM, is there a duplicate entry in the system?”
Book a demo and get a free trial of the full platform: data science agent, fine-tune capabilities, and forward-deployed engineer support.

A real-world example
For each record in the CRM, is there a duplicate entry in the system?
Duplicate records inflate customer counts by 15-25%, skew analytics, and cause embarrassing double-outreach. Traditional dedup rules (exact email match) miss variations like "john@acme.com" vs "j.smith@acme-corp.com". Kumo detects duplicates through behavioral overlap — same purchasing patterns, shared addresses, overlapping device fingerprints. Each unmerged duplicate costs $10-30 annually in wasted marketing, and enterprises with millions of records face seven-figure losses.
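To make the limitation of exact email matching concrete, here is a minimal sketch of a hand-written dedup rule applied to the sample records shown in the RECORDS table on this page. The function name `exact_email_duplicates` and the normalization (trim plus lowercase) are assumptions for illustration, not part of any Kumo API.

```python
# Sample rows copied from the RECORDS table on this page.
records = [
    {"record_id": "R-101", "name": "John Smith", "email": "john@acme.com"},
    {"record_id": "R-204", "name": "J. Smith", "email": "j.smith@acme-corp.com"},
    {"record_id": "R-350", "name": "Maria Lopez", "email": "mlopez@bigco.io"},
]


def exact_email_duplicates(rows):
    """Group record IDs by normalized email; return groups with more than one ID.

    Hypothetical rule for illustration: trim whitespace and lowercase the email,
    then require an exact string match.
    """
    groups = {}
    for row in rows:
        key = row["email"].strip().lower()
        groups.setdefault(key, []).append(row["record_id"])
    return [ids for ids in groups.values() if len(ids) > 1]


duplicates = exact_email_duplicates(records)
print(duplicates)  # [] : R-101 and R-204 are not flagged by this rule
```

The rule finds nothing because the two Smith records use different addresses and domains, which is the gap the rest of this page is about.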
How KumoRFM solves this
Relational intelligence for identity resolution
Kumo connects CRM records to their transactions, support interactions, and behavioral signals in a unified relational graph. Instead of comparing email strings, Kumo learns that Record R-101 and Record R-204 share the same purchasing cadence, contact the same support agents, and transact with the same merchants. The binary classifier predicts whether each record has a duplicate anywhere in the system — flagging matches that deterministic rules would never catch.
From data to predictions
See the full pipeline in action
Connect your tables, write a PQL query, and get predictions with built-in explainability — all in minutes, not months.
Your data
The relational tables Kumo learns from
RECORDS
| record_id | name | email | company | source |
|---|---|---|---|---|
| R-101 | John Smith | john@acme.com | Acme Corp | website |
| R-204 | J. Smith | j.smith@acme-corp.com | ACME | trade show |
| R-350 | Maria Lopez | mlopez@bigco.io | BigCo Inc | referral |
MATCH_CANDIDATES
| match_id | record_id | candidate_id | similarity_score | timestamp |
|---|---|---|---|---|
| MC-001 | R-101 | R-204 | 0.82 | 2025-09-14 |
| MC-002 | R-350 | R-612 | 0.74 | 2025-09-14 |
| MC-003 | R-101 | R-550 | 0.45 | 2025-09-15 |
TRANSACTIONS
| txn_id | record_id | amount | timestamp |
|---|---|---|---|
| TXN-8001 | R-101 | $1,249.00 | 2025-09-10 |
| TXN-8002 | R-204 | $1,249.00 | 2025-09-10 |
| TXN-8003 | R-350 | $487.50 | 2025-09-12 |
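As a rough illustration of the kind of behavioral overlap these tables can expose, the sketch below pairs up records that share a transaction with the same amount on the same day. This is a hand-written heuristic over the sample rows above, not a description of how Kumo computes its signals; the function name `records_sharing_transactions` is hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Sample rows copied from the TRANSACTIONS table (amounts kept as strings).
transactions = [
    ("TXN-8001", "R-101", "1249.00", "2025-09-10"),
    ("TXN-8002", "R-204", "1249.00", "2025-09-10"),
    ("TXN-8003", "R-350", "487.50", "2025-09-12"),
]


def records_sharing_transactions(rows):
    """Return sorted pairs of record IDs with a transaction of equal amount and date."""
    by_signature = defaultdict(set)
    for _txn_id, record_id, amount, day in rows:
        by_signature[(amount, day)].add(record_id)

    pairs = set()
    for record_ids in by_signature.values():
        # Every pair of distinct records behind one (amount, date) signature.
        pairs.update(combinations(sorted(record_ids), 2))
    return sorted(pairs)


shared_pairs = records_sharing_transactions(transactions)
print(shared_pairs)  # [('R-101', 'R-204')]
```

On the sample data, only R-101 and R-204 overlap, matching the pair that the email rule cannot connect.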
Write your PQL query
Describe what to predict in 2–3 lines — Kumo handles the rest
PREDICT COUNT(MATCH_CANDIDATES.* WHERE MATCH_CANDIDATES.SIMILARITY_SCORE > 0.8, 0, 30, days) > 0 FOR EACH RECORDS.RECORD_ID
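To show what target the query above describes, here is a minimal sketch that evaluates the same condition by hand on the sample MATCH_CANDIDATES rows: does a record have at least one candidate with similarity above 0.8 in the 30 days after an anchor date? The anchor date, the window boundaries (start exclusive, end inclusive), and the function name `has_strong_match` are assumptions for illustration; this is not Kumo's implementation.

```python
from datetime import date, timedelta

# Sample rows copied from the MATCH_CANDIDATES table on this page.
match_candidates = [
    {"match_id": "MC-001", "record_id": "R-101", "similarity_score": 0.82,
     "timestamp": date(2025, 9, 14)},
    {"match_id": "MC-002", "record_id": "R-350", "similarity_score": 0.74,
     "timestamp": date(2025, 9, 14)},
    {"match_id": "MC-003", "record_id": "R-101", "similarity_score": 0.45,
     "timestamp": date(2025, 9, 15)},
]


def has_strong_match(record_id, anchor, rows, threshold=0.8, horizon_days=30):
    """True if the record has a candidate scoring above `threshold` within the window.

    Assumed window: anchor < timestamp <= anchor + horizon_days.
    """
    window_end = anchor + timedelta(days=horizon_days)
    count = sum(
        1
        for row in rows
        if row["record_id"] == record_id
        and row["similarity_score"] > threshold
        and anchor < row["timestamp"] <= window_end
    )
    return count > 0


anchor = date(2025, 9, 1)  # hypothetical anchor date
labels = {
    rid: has_strong_match(rid, anchor, match_candidates)
    for rid in ("R-101", "R-204", "R-350")
}
print(labels)  # {'R-101': True, 'R-204': False, 'R-350': False}
```

Note that R-204 appears only as a `candidate_id` in the sample rows, so this literal lookup returns False for it. A learned model can still score it highly, because it is not restricted to this single lookup.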
Prediction output
Every entity gets a score, updated continuously
| RECORD_ID | TIMESTAMP | TARGET_PRED | True_PROB |
|---|---|---|---|
| R-101 | 2025-10-01 | True | 0.96 |
| R-204 | 2025-10-01 | True | 0.96 |
| R-350 | 2025-10-01 | False | 0.18 |
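One way such output might be consumed downstream is sketched below: select the records whose duplicate probability clears a review threshold and queue them for a merge check. The threshold of 0.90 and the function name `merge_review_queue` are hypothetical choices, and the rows are copied from the table above rather than fetched through any Kumo API.

```python
# Rows copied from the prediction output table on this page.
predictions = [
    {"record_id": "R-101", "target_pred": True, "true_prob": 0.96},
    {"record_id": "R-204", "target_pred": True, "true_prob": 0.96},
    {"record_id": "R-350", "target_pred": False, "true_prob": 0.18},
]

REVIEW_THRESHOLD = 0.90  # hypothetical cutoff; tune to your tolerance for false merges


def merge_review_queue(rows, threshold=REVIEW_THRESHOLD):
    """Return record IDs at or above the threshold, highest probability first."""
    flagged = [row for row in rows if row["true_prob"] >= threshold]
    flagged.sort(key=lambda row: row["true_prob"], reverse=True)
    return [row["record_id"] for row in flagged]


queue = merge_review_queue(predictions)
print(queue)  # ['R-101', 'R-204']
```

A stricter threshold trades recall for fewer manual reviews; where to set it depends on how costly a wrong merge is in your CRM.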
Understand why
Every prediction includes feature attributions — no black boxes
Record R-101 (John Smith, Acme Corp)
Predicted: 96% probability of having a duplicate
Top contributing features
| Feature | Value | Attribution |
|---|---|---|
| Transaction amount overlap with R-204 | Exact match | 32% |
| Company name similarity | 0.88 | 24% |
| Phone number overlap | Same | 20% |
| Behavioral cadence similarity | 0.91 | 14% |
| Source channel difference | Different | 10% |
Feature attributions are computed automatically for every prediction. No separate tooling required. Learn more about Kumo explainability
PQL Documentation
Learn the Predictive Query Language — SQL-like syntax for defining any prediction task in 2–3 lines.
Python SDK
Integrate Kumo predictions into your pipelines. Train, evaluate, and deploy models programmatically.
Explainability Docs
Understand feature attributions, model evaluation metrics, and how to build trust with stakeholders.
Bottom line: Eliminate 15-25% duplicate records from your CRM — correcting inflated customer counts, fixing attribution, and saving $1-5M annually in wasted outreach.
One Platform. One Model. Predict Instantly.
KumoRFM
Relational Foundation Model
Turn structured relational data into predictions in seconds. KumoRFM delivers zero-shot predictions that rival months of traditional data science. No training, feature engineering, or infrastructure required. Just connect your data and start predicting.
For critical use cases, fine-tune KumoRFM on your data using the Kumo platform and Data Science Agent for 30%+ higher accuracy than traditional models.




