Identity Matching
“For each customer record, which other records in the database represent the same person?”
Book a demo and get a free trial of the full platform: data science agent, fine-tune capabilities, and forward-deployed engineer support.

A real-world example
For each customer record, which other records in the database represent the same person?
Customer databases contain 15-25% duplicates on average. Fuzzy string matching catches only obvious cases (typos). Kumo uses behavioral and relational signals — shared devices, overlapping transactions, same IP addresses — to find matches that string-based methods miss entirely. A single unresolved identity costs $10-50 in wasted marketing spend per year, and at enterprise scale that adds up to millions in misattributed revenue and duplicated outreach.
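To make the limits of string matching concrete, here is a minimal sketch using Python's standard-library `difflib`. It is an illustration only, not how Kumo scores records. The typo pair is a made-up example, the two email addresses are those of C001 and C847 in the sample data on this page, and `similarity` is a hypothetical helper name.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (hypothetical helper)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# An obvious typo: the kind of duplicate fuzzy matching handles well.
typo_score = similarity("John Smith", "Jhon Smith")

# Two emails belonging to the same person (C001 and C847 in the sample data).
email_score = similarity("john@acme.com", "jsmith@gmail.com")

print(f"typo pair:  {typo_score:.2f}")
print(f"email pair: {email_score:.2f}")
```

The typo pair scores well above the email pair, so any threshold tuned to catch typos would likely pass over this match unless another signal, such as the shared phone number or device, is taken into account.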
How KumoRFM solves this
Relational intelligence for identity resolution
Kumo builds a relational graph connecting customers to their interactions, devices, transactions, and channels. Instead of comparing name strings, Kumo learns that Customer C001 and Customer C847 share the same device fingerprint, transact at the same merchants, and browse from overlapping IP ranges. These behavioral signals create a rich identity graph where matches emerge from structural similarity — not surface-level text overlap. The graph captures identity signals that rules-based systems cannot encode.
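The idea of matching on shared structure rather than text can be sketched with a simple co-occurrence pass over the sample tables shown on this page. This is a hand-written heuristic for illustration, assuming only device IDs and phone numbers as signals; it is not Kumo's learned model, and the variable names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Rows taken from the sample INTERACTIONS table: (interaction_id, customer_id, device_id).
interactions = [
    ("INT-5001", "C001", "DEV-A1"),
    ("INT-5002", "C847", "DEV-A1"),
    ("INT-5003", "C302", "DEV-B7"),
]
# Phone numbers taken from the sample CUSTOMERS table.
phones = {"C001": "555-0142", "C847": "555-0142", "C302": "555-0891"}

# Build a bipartite graph: each signal node points to the customers attached to it.
signal_to_customers = defaultdict(set)
for _, customer_id, device_id in interactions:
    signal_to_customers[("device", device_id)].add(customer_id)
for customer_id, phone in phones.items():
    signal_to_customers[("phone", phone)].add(customer_id)

# Any two customers attached to the same signal node become a candidate pair.
shared_signals = defaultdict(list)
for signal, customers in signal_to_customers.items():
    for a, b in combinations(sorted(customers), 2):
        shared_signals[(a, b)].append(signal)

candidate_pairs = dict(shared_signals)
print(candidate_pairs)
```

On the sample data the only candidate pair is C001 and C847, linked by both the device and the phone number, even though their names and emails differ.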
From data to predictions
See the full pipeline in action
Connect your tables, write a PQL query, and get predictions with built-in explainability — all in minutes, not months.
Your data
The relational tables Kumo learns from
CUSTOMERS
| customer_id | name | email | phone | address |
|---|---|---|---|---|
| C001 | John Smith | john@acme.com | 555-0142 | 123 Oak St |
| C847 | J. Smith | jsmith@gmail.com | 555-0142 | 123 Oak Street |
| C302 | Maria Lopez | mlopez@corp.io | 555-0891 | 456 Elm Ave |
INTERACTIONS
| interaction_id | customer_id | channel | device_id | timestamp |
|---|---|---|---|---|
| INT-5001 | C001 | web | DEV-A1 | 2025-09-14 10:23 |
| INT-5002 | C847 | mobile | DEV-A1 | 2025-09-14 14:05 |
| INT-5003 | C302 | web | DEV-B7 | 2025-09-15 09:11 |
DEVICES
| device_id | device_type | browser | os |
|---|---|---|---|
| DEV-A1 | laptop | Chrome | macOS |
| DEV-B7 | desktop | Firefox | Windows |
| DEV-C3 | mobile | Safari | iOS |
Write your PQL query
Describe what to predict in 2–3 lines — Kumo handles the rest
PREDICT LIST_DISTINCT(INTERACTIONS.CUSTOMER_ID, 0, 30, days) FOR EACH CUSTOMERS.CUSTOMER_ID
Prediction output
Every entity gets a score, updated continuously
| CUSTOMER_ID | MATCHED_CUSTOMER_ID | SCORE | TIMESTAMP |
|---|---|---|---|
| C001 | C847 | 0.94 | 2025-10-01 |
| C302 | C918 | 0.87 | 2025-10-01 |
| C455 | C712 | 0.72 | 2025-10-01 |
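
One way to consume a table like this downstream is to keep pairs above a score cutoff and merge them into identity clusters. The sketch below uses a small union-find over the three rows shown above. The 0.80 threshold and the function names are assumptions chosen for illustration, not Kumo defaults.

```python
# Rows from the prediction output above: (customer_id, matched_customer_id, score).
predictions = [
    ("C001", "C847", 0.94),
    ("C302", "C918", 0.87),
    ("C455", "C712", 0.72),
]
THRESHOLD = 0.80  # hypothetical cutoff; tune it against labeled matches

parent = {}


def find(x):
    """Return the cluster representative of x, compressing paths as we go."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def union(a, b):
    """Merge the clusters of a and b, keeping the smaller ID as representative."""
    root_a, root_b = find(a), find(b)
    if root_a != root_b:
        parent[max(root_a, root_b)] = min(root_a, root_b)


for customer_id, matched_id, score in predictions:
    if score >= THRESHOLD:
        union(customer_id, matched_id)

# Group every seen customer under its representative.
clusters = {}
for customer_id in list(parent):
    clusters.setdefault(find(customer_id), set()).add(customer_id)

print(clusters)
```

With this cutoff, C001/C847 and C302/C918 are merged, while the 0.72 pair is left out and could be routed to manual review instead.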
Understand why
Every prediction includes feature attributions — no black boxes
Customer C001 (John Smith)
Predicted: 94% match with C847 (J. Smith)
Top contributing features
| Feature | Value | Attribution |
|---|---|---|
| Shared device fingerprint (DEV-A1) | Same device | 35% |
| Phone number overlap | Exact match | 25% |
| Transaction merchant overlap | 87% | 20% |
| IP address proximity | Same subnet | 12% |
| Address string similarity | 0.91 | 8% |
Feature attributions are computed automatically for every prediction. No separate tooling required. Learn more about Kumo explainability
PQL Documentation
Learn the Predictive Query Language — SQL-like syntax for defining any prediction task in 2–3 lines.
Python SDK
Integrate Kumo predictions into your pipelines. Train, evaluate, and deploy models programmatically.
Explainability Docs
Understand feature attributions, model evaluation metrics, and how to build trust with stakeholders.
Bottom line: Resolve 15-25% hidden duplicates in your customer database — recovering millions in misattributed revenue and eliminating embarrassing double-outreach.
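As a rough back-of-the-envelope check on the "millions" figure, the ranges quoted on this page can be multiplied out. The database size of one million records is a hypothetical input, and the sketch assumes each duplicate record counts as one unresolved identity.

```python
records = 1_000_000  # hypothetical database size, not a figure from this page

# Ranges quoted on this page.
dup_pct_low, dup_pct_high = 15, 25   # percent of records that are duplicates
cost_low, cost_high = 10, 50         # wasted spend per unresolved identity, USD/year

duplicates_low = records * dup_pct_low // 100
duplicates_high = records * dup_pct_high // 100

annual_low = duplicates_low * cost_low
annual_high = duplicates_high * cost_high

print(f"${annual_low:,} to ${annual_high:,} per year")
```

Under these assumptions the estimate spans $1.5 million to $12.5 million per year; swap in your own record count to get a comparable range.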
One Platform. One Model. Predict Instantly.
KumoRFM
Relational Foundation Model
Turn structured relational data into predictions in seconds. KumoRFM delivers zero-shot predictions that rival months of traditional data science. No training, feature engineering, or infrastructure required. Just connect your data and start predicting.
For critical use cases, fine-tune KumoRFM on your data using the Kumo platform and Data Science Agent for 30%+ higher accuracy than traditional models.