The headline result: SAP SALT benchmark
Before comparing individual tools, here is the result that matters most. The SAP SALT benchmark is an enterprise-grade evaluation where real business analysts and data scientists attempt prediction tasks on SAP enterprise data. It measures how accurately different approaches predict real business outcomes on production-quality enterprise databases with multiple related tables.
| approach | accuracy | what_it_means |
|---|---|---|
| LLM + AutoML | 63% | Language model generates features, AutoML selects model |
| PhD Data Scientist + XGBoost | 75% | Expert spends weeks hand-crafting features, tunes XGBoost |
| KumoRFM (zero-shot) | 91% | No feature engineering, no training, reads relational tables directly |
SAP SALT benchmark: KumoRFM outperforms expert data scientists by 16 percentage points and LLM+AutoML by 28 percentage points on real enterprise prediction tasks.
KumoRFM scores 91% where PhD-level data scientists with weeks of feature engineering and hand-tuned XGBoost score 75%. The 16 percentage point gap is the value of reading relational data natively instead of flattening it into a single table.
Why lead scoring is still broken in most B2B enterprises
Every B2B sales team has a lead scoring model. Most of them work the same way: assign points for firmographic attributes (company size = 500+, job title contains "VP", industry = financial services) and activity signals (downloaded whitepaper = +10, visited pricing page = +20, opened email = +5). Leads above a threshold get routed to sales.
These models catch the obvious leads: the VP at a Fortune 500 company who visited the pricing page and requested a demo. But they miss the signals that actually differentiate buyers from browsers. And in B2B enterprise, where deal cycles are long and conversion rates are low, the missed signals are where the pipeline lives.
The problem is not that rules are wrong - company size and job title do correlate with conversion. The problem is that these signals are necessary but not sufficient. They describe who the lead is, not what the lead is doing, and certainly not what the people around the lead are doing.
Two signals rule-based scoring cannot see
The colleague signal
In B2B enterprise, purchases are rarely isolated decisions. When one person at a company buys your product, the probability that their colleagues will also buy jumps 3-5x. This is not a guess - it is a measurable pattern across B2B datasets. The reason is straightforward: an existing purchase means proven budget, established vendor relationship, internal advocacy, and reduced procurement friction.
But this signal only exists in the relational graph. To see it, a model needs to traverse: lead → company → other leads at that company → their purchase/order history. That is a multi-hop path across at least three tables (leads, companies, orders). No flat feature table contains this signal, because flattening destroys the relationship structure.
You could manually engineer a feature like colleagues_who_purchased - but then you need to do the same for every possible relational pattern (colleagues who attended a webinar, colleagues who opened a support ticket, colleagues in the same department who engaged with a case study). The combinatorial explosion makes manual feature engineering impractical for relational signals.
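To make the multi-hop traversal concrete, here is a minimal pandas sketch of the `colleagues_who_purchased` feature described above. The table and column names (`leads`, `orders`, `lead_id`, `company_id`) are hypothetical stand-ins for whatever your warehouse schema actually uses - this illustrates the join path, not any particular tool's implementation:

```python
import pandas as pd

# Hypothetical minimal schema: a leads table and an orders table linked by lead_id.
leads = pd.DataFrame({
    "lead_id": ["L1", "L2", "L3", "L4"],
    "company_id": ["C1", "C1", "C1", "C2"],
})
orders = pd.DataFrame({
    "order_id": ["O1", "O2"],
    "lead_id": ["L2", "L3"],  # L2 and L3 have already purchased
})

# Hops 1-2: lead -> orders, then count distinct buyers per company.
buyers = leads.merge(orders, on="lead_id")
per_company = (buyers.groupby("company_id")["lead_id"]
               .nunique().rename("company_buyers").reset_index())

# Hop 3: attach the per-company buyer count back onto every lead.
scored = leads.merge(per_company, on="company_id", how="left")
scored["company_buyers"] = scored["company_buyers"].fillna(0)

# Exclude the lead's own purchases so the feature counts *colleagues* only.
own = scored["lead_id"].isin(orders["lead_id"]).astype(int)
scored["colleagues_who_purchased"] = (
    (scored["company_buyers"] - own).clip(lower=0).astype(int)
)
```

Even this single feature takes three joins and an own-purchase correction - and it is one of hundreds of possible relational patterns, which is exactly why hand-building them does not scale.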
The content progression signal
A lead who visited 22 pages is not necessarily more likely to convert than a lead who visited 8. What matters is which pages, in what order. The B2B buying journey has a recognizable progression:
- Awareness: Blog posts, thought leadership content
- Evaluation: Case studies, comparison pages, ROI calculators
- Technical validation: API documentation, integration guides, security whitepapers
- Purchase intent: Pricing page, demo request, contact sales
A lead who follows this progression - blog post, then case study, then API docs, then demo request - is exhibiting a buying pattern. A lead who visited 22 blog posts but never looked at a case study or pricing page is researching, not buying.
This sequence information lives in the activities table. But when you flatten it to pages_viewed = 22 or content_downloads = 3, the progression signal is destroyed. You cannot reconstruct it from aggregate counts.
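A short pandas sketch makes the information loss visible. The `activities` table and its `stage` labels below are hypothetical; the point is that an ordered sequence distinguishes two leads that an aggregate count treats as interchangeable:

```python
import pandas as pd

# Hypothetical activities table: one row per content interaction, labeled by stage.
activities = pd.DataFrame({
    "lead_id": ["S", "S", "S", "S", "J", "J", "J"],
    "ts": pd.to_datetime(["2024-01-01", "2024-01-08", "2024-01-15", "2024-01-20",
                          "2024-01-02", "2024-01-03", "2024-01-04"]),
    "stage": ["blog", "case_study", "api_docs", "pricing",
              "blog", "blog", "blog"],
})

# Flattening keeps only volume: both leads just look "engaged".
counts = activities.groupby("lead_id").size()

# Preserving order keeps the progression: S advances through the journey,
# J loops on awareness content.
sequences = (activities.sort_values("ts")
             .groupby("lead_id")["stage"]
             .agg(list))
```

Once the rows are collapsed into `counts`, there is no operation that recovers `sequences` - the flattening is lossy, which is the structural reason aggregate features cannot represent this signal.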
The 7 best AI lead scoring tools, compared
| Tool | Approach | Data Sources | Multi-Table Native | Handles Content Progression | Explainability | Best For |
|---|---|---|---|---|---|---|
| Kumo.ai | Multi-table relational GNN | CRM + product usage + support + billing + content | Yes - reads relational tables directly | Yes - learns sequences from activity table | PQL queries + feature importance | Enterprise teams with complex relational data |
| Salesforce Einstein | ML on CRM data | Salesforce CRM objects | No - CRM objects only | Limited - counts, not sequences | Score factors breakdown | Teams fully embedded in Salesforce |
| 6sense | Intent data + account ID | Third-party intent + web + CRM | No - aggregated intent signals | No - intent topics, not sequences | Intent topic + buying stage | ABM teams targeting anonymous buyers |
| ZoomInfo | Firmographic + intent | Firmographic database + intent signals | No - enrichment layer | No | Data attribute breakdown | Prospecting and list building |
| HubSpot | Rules-based + predictive scoring | HubSpot CRM + marketing data | No - single CRM view | Limited - rule-assigned points | Score property breakdown | SMB to mid-market HubSpot users |
| MadKudu | Product-led growth scoring | Product usage + CRM + billing | Partial - integrates multiple sources but flattens | Limited - tracks events, not learned sequences | Score breakdown + segment rules | PLG companies with freemium/trial motions |
| DataRobot | AutoML on flat feature table | Pre-engineered features from any source | No - requires flat feature table | No - features are pre-aggregated | SHAP, partial dependence | Data science teams wanting model automation |
Kumo.ai is the only tool that reads multi-table relational data natively and learns content progression patterns from raw activity sequences. All other tools require flattening data into a single table or operate on limited CRM data, which structurally cannot represent colleague signals or content progressions.
1. Kumo.ai - relational lead scoring from multi-table data
Kumo.ai takes a fundamentally different approach to lead scoring. Instead of requiring a pre-built feature table or operating within a single CRM, it connects directly to your relational data warehouse and reads the raw tables: CRM records, product usage logs, support tickets, billing history, and content interaction sequences.
The system represents your data as a temporal heterogeneous graph. Each lead, each company, each content interaction, each support ticket, each order becomes a node. Foreign key relationships become edges. The graph neural network traverses this structure, automatically discovering which cross-table patterns are predictive of conversion.
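The rows-to-nodes, foreign-keys-to-edges construction can be sketched in a few lines of plain Python. This is an illustration of the general idea, not Kumo.ai's internal representation, and the schema below (`leads`, `companies`, `orders` with their key columns) is hypothetical:

```python
# Minimal sketch: each table row becomes a typed node, each foreign-key
# column becomes a typed edge to the referenced table's row.
tables = {
    "leads":     [{"lead_id": "L1", "company_id": "C1"}],
    "companies": [{"company_id": "C1"}],
    "orders":    [{"order_id": "O1", "lead_id": "L2", "company_id": "C1"}],
}
foreign_keys = {  # (table, column) -> referenced table
    ("leads", "company_id"):  "companies",
    ("orders", "lead_id"):    "leads",
    ("orders", "company_id"): "companies",
}

nodes, edges = [], []
for table, rows in tables.items():
    for row in rows:
        pk = next(iter(row.values()))   # treat the first column as the row id
        nodes.append((table, pk))
        for col, value in row.items():
            ref = foreign_keys.get((table, col))
            if ref:                     # FK column -> edge to the referenced node
                edges.append(((table, pk), (ref, value)))
```

A GNN operating on this structure can reach any node a few edges away - which is precisely the lead → company → colleagues → orders path that flat feature tables cannot express.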
Why the relational approach matters for lead scoring
Consider a concrete example. Lead Sarah at a mid-market SaaS company shows these signals:
- She visited 8 pages over 3 weeks, following a clear progression: blog post → case study → API documentation → pricing page (activities table)
- Two of her colleagues at the same company already purchased last quarter (leads → companies → other leads → orders)
- She opened a support inquiry about enterprise integration before even becoming a customer (support table)
Each signal individually is moderate. Content engagement with 8 pages is not particularly high. A support inquiry might mean curiosity, not intent. Colleague purchases at the same company could be coincidence.
But together, in the relational graph, these signals form a clear buying pattern. The content progression shows she moved from awareness to technical evaluation. The colleague signal confirms budget and vendor relationship exist. The support inquiry about integration shows she is already planning implementation. Kumo.ai's GNN sees this full picture and assigns Sarah a 91% conversion probability. A rule-based system scoring her 8 page views and VP title might give her a score of 45 out of 100 - below the threshold for sales routing.
PQL for lead scoring with backward window
One of the most powerful techniques in Kumo.ai's PQL (Predictive Query Language) is the backward window, which ensures scoring focuses on leads with recent activity rather than stale contacts who will never respond.
PQL Query
PREDICT COUNT(ORDERS.*, 0, 30, days) > 0 FOR EACH LEADS.LEAD_ID WHERE COUNT(ACTIVITIES.*, -60, 0, days) > 0
This query predicts which leads will place an order in the next 30 days, but only for leads who had at least one activity (page view, email open, content download) in the previous 60 days. The WHERE clause is the backward window - it filters out stale leads who have not engaged recently, focusing the model on leads who are actively in a buying cycle. This eliminates the noise from dead contacts that inflates accuracy in naive scoring models.
Output
| lead_id | conversion_prob | content_progression | colleague_purchases | activity_last_60d |
|---|---|---|---|---|
| L-7201 (Sarah) | 0.91 | Blog → Case study → API docs → Pricing | 2 colleagues purchased | 8 activities |
| L-7202 (James) | 0.34 | Blog → Blog → Blog | 0 | 22 activities |
| L-7203 (Priya) | 0.78 | Case study → API docs → Demo request | 1 colleague purchased | 5 activities |
| L-7204 (Marcus) | 0.15 | Blog only | 0 | 2 activities |
Notice that James has 22 activities but a low conversion probability. A rule-based system would score him highly for engagement volume. But Kumo.ai sees that his content pattern is Blog → Blog → Blog - he is reading, not buying. No colleague purchases, no progression toward technical content or pricing. Meanwhile, Priya has only 5 activities but a high conversion probability because her progression pattern and colleague signal both indicate buying intent.
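The backward-window filter from the WHERE clause can be emulated in plain pandas to show what it does. The timestamps and lead IDs below are hypothetical, and this is a sketch of the filtering logic only, not of Kumo.ai's scoring pipeline:

```python
import pandas as pd

# "Now" is the scoring time; the backward window reaches 60 days behind it.
now = pd.Timestamp("2024-03-01")

# Hypothetical activities table: last activity timestamp per event.
activities = pd.DataFrame({
    "lead_id": ["L-7201", "L-7204", "L-9999"],
    "ts": pd.to_datetime(["2024-02-20", "2024-02-25", "2023-11-01"]),
})

# Keep only leads with at least one activity in the trailing 60-day window;
# everyone else is a stale contact and is excluded from scoring entirely.
recent = activities[activities["ts"] >= now - pd.Timedelta(days=60)]
eligible = set(recent["lead_id"])
```

Here `L-9999`, last active four months ago, never reaches the model - which is exactly how the backward window keeps dead contacts from padding the scored population.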
2. Salesforce Einstein - CRM-native scoring
Salesforce Einstein Lead Scoring is built directly into the Salesforce platform. It analyzes historical CRM data - lead fields, activity history, opportunity outcomes - and builds a predictive model that scores new leads based on patterns in your existing conversion data.
Strengths: Zero setup friction for Salesforce customers. Scores appear directly on lead records, integrating seamlessly into existing sales workflows. The model automatically retrains as new conversion data accumulates. Sales reps see scoring factors explaining why each lead received its score.
Limitations: Operates only on Salesforce CRM objects. Cannot ingest product usage data, support tickets, or raw content interaction sequences from a data warehouse. The scoring model sees CRM fields and activity counts but not the relational structure between leads, companies, and their broader engagement patterns. Requires Salesforce ecosystem - not an option if your CRM is elsewhere.
3. 6sense - intent data and account identification
6sense focuses on identifying anonymous buying intent at the account level. It combines third-party intent data (what topics accounts are researching across the web), web visitor identification (de-anonymizing website traffic to accounts), and CRM data to score accounts and contacts for ABM (Account-Based Marketing) campaigns.
Strengths: The strongest tool for identifying accounts that are actively researching your category but have not yet engaged with your brand. Buying stage predictions help marketing and sales time their outreach. De-anonymization of website traffic reveals accounts you did not know were interested.
Limitations: Intent data is aggregated at the topic level (e.g., "this account is researching CRM software"), not at the content progression level. Cannot track the specific sequence of content interactions or identify colleague purchase patterns. Scoring is account-level, not individual-lead-level, which can be too coarse for enterprise sales motions with multiple stakeholders.
4. ZoomInfo - firmographic and intent enrichment
ZoomInfo provides the largest B2B contact and company database, with firmographic data (company size, revenue, industry, tech stack) and intent signals. Lead scoring in ZoomInfo is primarily about enrichment - adding data attributes that improve your existing scoring model rather than building the scoring model itself.
Strengths: The most comprehensive firmographic database in B2B. Tech stack data identifies companies already using complementary or competing products. Intent signals add behavioral context to static firmographic data. Strong prospecting workflows for building targeted lead lists.
Limitations: Enrichment, not prediction. ZoomInfo tells you what a company looks like, not whether a specific lead will convert. Intent signals are topic-level, not progression-level. No ability to model relationships between leads at the same company or track content engagement sequences. Best used as a data input to another scoring tool, not as a standalone scoring solution.
5. HubSpot - built-in CRM scoring
HubSpot offers both manual lead scoring (rules-based point assignment) and predictive lead scoring (ML-based, available in Enterprise tier). The manual scoring lets marketing teams assign points for contact properties and behaviors. The predictive scoring uses HubSpot's ML to analyze historical conversion patterns.
Strengths: Easiest setup on this list for teams already using HubSpot. Manual scoring gives marketing direct control over lead qualification criteria. Predictive scoring requires zero configuration - it learns from your historical data automatically. Tight integration with HubSpot's marketing automation for score-based workflows.
Limitations: Operates on HubSpot CRM data only. Predictive scoring is a black box - less transparency into scoring factors than Salesforce Einstein. Cannot ingest external data sources (product usage, data warehouse tables, support systems). Best suited for SMB to mid-market companies fully committed to HubSpot, not enterprises with complex multi-system data landscapes.
6. MadKudu - product-led growth scoring
MadKudu specializes in scoring for product-led growth (PLG) companies. It tracks product usage signals - feature adoption, activation milestones, usage frequency - and combines them with firmographic data to identify free users or trial accounts most likely to convert to paid.
Strengths: The best tool for PLG motions where product usage is the primary conversion signal. Tracks specific product events and milestones, not just aggregate usage counts. Integrates with Segment, Amplitude, and other product analytics tools. Helps sales teams identify which free/trial accounts to engage and when.
Limitations: Integrates multiple data sources but ultimately flattens them into a scoring model. Does not discover multi-hop relational patterns or learn content progression sequences from raw data. The scoring rules are partially manual (segment definitions) which limits the model's ability to find unexpected patterns. Best for PLG-specific scoring, less suited for complex enterprise sales cycles with multiple data sources.
7. DataRobot - AutoML lead scoring
DataRobot applies AutoML to lead scoring: you upload a feature table with one row per lead and columns representing lead attributes and engineered features, and it tries dozens of model architectures, tunes hyperparameters, and returns the best-performing model. It is one of the most sophisticated AutoML platforms for tabular ML.
Strengths: Best-in-class model selection and tuning. Excellent explainability (SHAP values, partial dependence plots). Handles large feature tables well. Strong MLOps features for model monitoring, drift detection, and retraining. Enterprise-grade security and governance.
Limitations: Requires a pre-built flat feature table. All feature engineering - joining tables, computing aggregates, encoding content progressions, creating colleague features - is your team's responsibility. The 12+ hours of data preparation remain a manual bottleneck. Cannot model relational structure natively. Accuracy is bounded by the quality of the features you build, and the most predictive features (colleague signals, content sequences) are the hardest to engineer manually.
The signals that separate buyers from browsers
The core difference between lead scoring tools is which signals they can access. Here is a breakdown of signal types and which tools can capture them:
| Signal Type | Example | Visible in Rule-Based Scoring | Relative Predictive Power |
|---|---|---|---|
| Firmographic match | Company size = 500+, industry = SaaS | Yes | Low-Moderate (necessary but not sufficient) |
| Activity count | Viewed 15 pages, opened 8 emails | Yes | Low (volume != intent) |
| Demographic match | Job title = VP, department = Engineering | Yes | Moderate |
| Content progression | Blog → Case study → API docs → Demo request | No - flattened to page counts | High (shows buying journey stage) |
| Colleague purchase | 2 colleagues at same company already purchased | No - requires graph traversal | Very High (3-5x conversion lift) |
| Multi-signal convergence | Content progression + colleague signal + support inquiry | No - requires multi-table join with graph | Highest (signals reinforce across tables) |
The three strongest conversion signals - content progression, colleague purchases, and multi-signal convergence - are invisible to rule-based scoring and flat-table models. These signals explain why rule-based scoring misses approximately 60% of actual conversions.
The pattern is clear. The signals that every tool can capture (firmographics, activity counts, demographics) are the weakest predictors. They describe the lead's profile, not the lead's intent. The strongest predictors are relational and sequential - and they require a tool that can read multiple tables and their relationships natively.
How to choose the right tool
The right lead scoring tool depends on your data landscape, your go-to-market motion, and what you are optimizing for.
| If you... | Consider | Why |
|---|---|---|
| Run entirely on Salesforce and want zero-setup scoring | Salesforce Einstein | Native CRM scoring with no data engineering required |
| Run ABM and need to identify anonymous buying accounts | 6sense | Best intent data and account de-anonymization |
| Need firmographic enrichment for prospecting | ZoomInfo | Most comprehensive B2B contact and company database |
| Run on HubSpot and want simple scoring | HubSpot | Easiest setup for HubSpot-native teams |
| Have a PLG motion and need to score free/trial users | MadKudu | Best product-led scoring for freemium conversion |
| Have a data science team and want model control | DataRobot | Best AutoML on pre-engineered feature tables |
| Have complex multi-table data and need maximum accuracy | Kumo.ai | Only tool that captures colleague signals and content progressions natively |
If your data spans multiple systems (CRM, product usage, support, billing, content) and you need the highest conversion prediction accuracy, the relational approach captures signals that no other tool on this list can represent.
The accuracy ceiling is a data ceiling
The most important insight in lead scoring is that the accuracy ceiling of most tools is not a model limitation - it is a data limitation. Better algorithms on the same CRM fields or the same flat feature table yield diminishing returns. The jump from rule-based scoring to ML-based scoring on CRM data might add 10-15 percentage points in accuracy. But you are still operating on the same incomplete picture of each lead.
The jump from CRM-only data to multi-table relational data adds another 15-20 points, because you are adding entirely new categories of signals: colleague purchase patterns, content progression sequences, cross-table behavioral convergence. This is why the tool comparison is not primarily about which algorithm is best. It is about which tool can ingest the data that contains the signals that matter.
For B2B enterprises with data spread across CRM, product analytics, support systems, billing platforms, and content management, the question is not "which scoring algorithm should we use?" It is "which tool can read our full relational data without requiring six months of feature engineering first?"