Identity Matching

“For each customer record, which other records in the database represent the same person?”

Book a demo and get a free trial of the full platform: data science agent, fine-tune capabilities, and forward-deployed engineer support.

A real-world example

Customer databases contain 15-25% duplicates on average. Fuzzy string matching catches only obvious cases (typos). Kumo uses behavioral and relational signals — shared devices, overlapping transactions, same IP addresses — to find matches that string-based methods miss entirely. A single unresolved identity costs $10-50 in wasted marketing spend per year, and at enterprise scale that adds up to millions in misattributed revenue and duplicated outreach.
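To make the limits of string matching concrete, here is a minimal sketch that scores the two sample CUSTOMERS rows shown later on this page (C001 and C847) field by field. It uses Python's standard `difflib.SequenceMatcher` as a stand-in for a fuzzy matcher; that choice and the `similarity` helper are assumptions for illustration, not part of Kumo.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two strings, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Field values copied from the sample CUSTOMERS rows C001 and C847.
c001 = {"name": "John Smith", "email": "john@acme.com", "address": "123 Oak St"}
c847 = {"name": "J. Smith", "email": "jsmith@gmail.com", "address": "123 Oak Street"}

# Compare the two records one field at a time.
scores = {field: similarity(c001[field], c847[field]) for field in c001}
for field, score in scores.items():
    print(f"{field}: {score:.2f}")
```

The address fields score high because they differ only by an abbreviation, while the two email addresses score much lower even though the page treats both records as the same person. A matcher that relies on text alone has to guess where to draw the line.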

How KumoRFM solves this

Relational intelligence for identity resolution

Kumo builds a relational graph connecting customers to their interactions, devices, transactions, and channels. Instead of comparing name strings, Kumo learns that Customer C001 and Customer C847 share the same device fingerprint, transact at the same merchants, and browse from overlapping IP ranges. These behavioral signals create a rich identity graph where matches emerge from structural similarity — not surface-level text overlap. The graph captures identity signals that rules-based systems cannot encode.
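The idea of structural similarity can be sketched by hand, assuming only the sample INTERACTIONS rows from this page. The snippet below groups devices by customer and scores each customer pair by Jaccard overlap of their device sets. This is a hand-written heuristic for illustration only: the `jaccard` helper and the "any overlap" rule are assumptions, and Kumo learns such signals from the graph rather than using a fixed formula like this one.

```python
from collections import defaultdict
from itertools import combinations

# (customer_id, device_id) pairs taken from the sample INTERACTIONS table.
interactions = [
    ("C001", "DEV-A1"),
    ("C847", "DEV-A1"),
    ("C302", "DEV-B7"),
]

# Build each customer's neighborhood in the graph: the set of devices used.
devices_by_customer = defaultdict(set)
for customer_id, device_id in interactions:
    devices_by_customer[customer_id].add(device_id)


def jaccard(a: set, b: set) -> float:
    """Overlap of two neighbor sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Score every customer pair by how much of their device neighborhood they share.
pair_scores = {
    (left, right): jaccard(devices_by_customer[left], devices_by_customer[right])
    for left, right in combinations(sorted(devices_by_customer), 2)
}

# Keep pairs with any shared device as match candidates (illustrative rule).
candidates = [pair for pair, score in pair_scores.items() if score > 0]
print(candidates)
```

On the sample data, only C001 and C847 share a device, so they are the only candidate pair, even though their names and emails differ.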

From data to predictions

See the full pipeline in action

Connect your tables, write a PQL query, and get predictions with built-in explainability — all in minutes, not months.

Your data

The relational tables Kumo learns from

CUSTOMERS

customer_id	name	email	phone	address
C001	John Smith	john@acme.com	555-0142	123 Oak St
C847	J. Smith	jsmith@gmail.com	555-0142	123 Oak Street
C302	Maria Lopez	mlopez@corp.io	555-0891	456 Elm Ave

INTERACTIONS

interaction_id	customer_id	channel	device_id	timestamp
INT-5001	C001	web	DEV-A1	2025-09-14 10:23
INT-5002	C847	mobile	DEV-A1	2025-09-14 14:05
INT-5003	C302	web	DEV-B7	2025-09-15 09:11

DEVICES

device_id	device_type	browser	os
DEV-A1	laptop	Chrome	macOS
DEV-B7	desktop	Firefox	Windows
DEV-C3	mobile	Safari	iOS
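The three tables appear to link through `customer_id` and `device_id`. That key relationship is inferred from the column names, not stated on the page. Under that assumption, a short sketch can load the sample rows and check that every interaction points at a known customer and a known device.

```python
import csv
import io

# Tab-separated copies of the sample INTERACTIONS and DEVICES tables.
INTERACTIONS_TSV = (
    "interaction_id\tcustomer_id\tchannel\tdevice_id\ttimestamp\n"
    "INT-5001\tC001\tweb\tDEV-A1\t2025-09-14 10:23\n"
    "INT-5002\tC847\tmobile\tDEV-A1\t2025-09-14 14:05\n"
    "INT-5003\tC302\tweb\tDEV-B7\t2025-09-15 09:11\n"
)
DEVICES_TSV = (
    "device_id\tdevice_type\tbrowser\tos\n"
    "DEV-A1\tlaptop\tChrome\tmacOS\n"
    "DEV-B7\tdesktop\tFirefox\tWindows\n"
    "DEV-C3\tmobile\tSafari\tiOS\n"
)


def read_tsv(text: str) -> list:
    """Parse tab-separated text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


customer_ids = {"C001", "C847", "C302"}  # from the sample CUSTOMERS table
interactions = read_tsv(INTERACTIONS_TSV)
device_ids = {row["device_id"] for row in read_tsv(DEVICES_TSV)}

# Interactions whose assumed foreign keys do not resolve to a known row.
dangling = [
    row["interaction_id"]
    for row in interactions
    if row["customer_id"] not in customer_ids or row["device_id"] not in device_ids
]

# Devices that no sample interaction refers to.
unused_devices = sorted(device_ids - {row["device_id"] for row in interactions})
print(dangling, unused_devices)
```

In the sample rows every interaction resolves cleanly, and DEV-C3 is listed in DEVICES but has no interactions yet.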

Write your PQL query

Describe what to predict in 2–3 lines — Kumo handles the rest

PQL

PREDICT LIST_DISTINCT(INTERACTIONS.CUSTOMER_ID, 0, 30, days)
FOR EACH CUSTOMERS.CUSTOMER_ID

Prediction output

Every entity gets a score, updated continuously

CUSTOMER_ID	MATCHED_CUSTOMER_ID	SCORE	TIMESTAMP
C001	C847	0.94	2025-10-01
C302	C918	0.87	2025-10-01
C455	C712	0.72	2025-10-01
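One way to act on pairwise scores like these is to merge every pair above a cutoff into a single identity cluster. The sketch below is a downstream step assumed for illustration, not something the page describes: the 0.80 threshold is hypothetical, and the grouping uses a small union-find structure.

```python
# Rows copied from the sample prediction output: (customer, matched customer, score).
predictions = [
    ("C001", "C847", 0.94),
    ("C302", "C918", 0.87),
    ("C455", "C712", 0.72),
]

MATCH_THRESHOLD = 0.80  # hypothetical cutoff; tune it against labeled pairs

parent = {}


def find(x: str) -> str:
    """Return the cluster representative for x, registering x if unseen."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving keeps lookups short
        x = parent[x]
    return x


def union(a: str, b: str) -> None:
    """Merge the clusters containing a and b."""
    root_a, root_b = find(a), find(b)
    if root_a != root_b:
        parent[max(root_a, root_b)] = min(root_a, root_b)


for customer_id, matched_id, score in predictions:
    find(customer_id)
    find(matched_id)
    if score >= MATCH_THRESHOLD:
        union(customer_id, matched_id)

# Collect cluster members under each representative.
clusters = {}
for customer_id in list(parent):
    clusters.setdefault(find(customer_id), []).append(customer_id)

merged = sorted(sorted(members) for members in clusters.values() if len(members) > 1)
print(merged)
```

With this cutoff, the 0.94 and 0.87 pairs are merged, while the 0.72 pair stays separate and could be routed to manual review instead.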

Understand why

Every prediction includes feature attributions — no black boxes

Customer C001 (John Smith)

Predicted: 94% match with C847 (J. Smith)

Top contributing features

Feature	Value	Attribution
Shared device fingerprint (DEV-A1)	Same device	35%
Phone number overlap	Exact match	25%
Transaction merchant overlap	87%	20%
IP address proximity	Same subnet	12%
Address string similarity	0.91	8%

Feature attributions are computed automatically for every prediction; no separate tooling is required.

PQL Documentation

Learn the Predictive Query Language — SQL-like syntax for defining any prediction task in 2–3 lines.

Python SDK

Integrate Kumo predictions into your pipelines. Train, evaluate, and deploy models programmatically.

Explainability Docs

Understand feature attributions, model evaluation metrics, and how to build trust with stakeholders.

Bottom line: Resolve 15-25% hidden duplicates in your customer database — recovering millions in misattributed revenue and eliminating embarrassing double-outreach.
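The "millions" figure can be checked with back-of-the-envelope arithmetic using the ranges quoted on this page (15-25% duplicates, $10-50 per unresolved identity per year). The database size of 1,000,000 records is a hypothetical input, and treating each duplicate record as one unresolved identity is a simplifying assumption.

```python
def annual_cost_range(total_records, dup_rate_pct=(15, 25), cost_per_identity=(10, 50)):
    """Return (low, high) yearly cost in dollars from the quoted ranges."""
    low = total_records * dup_rate_pct[0] // 100 * cost_per_identity[0]
    high = total_records * dup_rate_pct[1] // 100 * cost_per_identity[1]
    return low, high


# Hypothetical database of one million customer records.
low, high = annual_cost_range(1_000_000)
print(f"${low:,} to ${high:,} per year")
```

Under these assumptions the estimate spans roughly $1.5 million to $12.5 million per year, so the range is wide and depends heavily on which end of each quoted figure applies.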

Related use cases

Explore more entity resolution use cases

Use Case #2: Duplicate Detection

Use Case #4: Household Mapping

Use Case #6: Cross-Device Matching

Topics covered

identity matching AI, customer identity resolution, graph neural network identity, entity resolution machine learning, record matching AI, KumoRFM, relational deep learning, predictive query language, customer deduplication, identity graph, cross-system identity matching, behavioral identity resolution

From a leadership team with proven experience

Vanja Josifovski

CEO and Co-Founder, ex-CTO Airbnb, ex-CTO Pinterest

Jure Leskovec

Co-Founder & Chief Scientist, Stanford Professor

Hema Raghavan

Co-Founder & Head of Engineering, ex-AI Lead, LinkedIn

One Platform. One Model. Predict Instantly.

KumoRFM

Relational Foundation Model

Turn structured relational data into predictions in seconds. KumoRFM delivers zero-shot predictions that rival months of traditional data science. No training, feature engineering, or infrastructure required. Just connect your data and start predicting.

For critical use cases, fine-tune KumoRFM on your data using the Kumo platform and Data Science Agent for 30%+ higher accuracy than traditional models.
