If you are reading this, your lead scoring probably works like this: company has 500+ employees, industry is SaaS, title contains "VP" = hot lead. Maybe you added some behavioral points for visiting the pricing page or opening an email.
That model is doing something. But it is not doing what you think. Firmographic scoring tells you which leads fit your ideal customer profile. It does not tell you which leads are actually going to buy. Those are different questions, and confusing them is why most B2B sales teams spend the majority of their time chasing leads that will never close.
The gap between "fits the profile" and "ready to buy" is where the real scoring signal lives. And most of that signal comes from data you already have but are not using correctly.
The five levels of lead scoring maturity
Not every company needs the most advanced lead scoring. But you should know where you are, what the next level looks like, and what it unlocks. Here are the five levels, with honest tradeoffs.
| Level | Approach | Data sources | Typical accuracy | Team required |
|---|---|---|---|---|
| 1. Manual rules | HubSpot/Salesforce built-in scoring. Marketing sets point values: +10 for VP title, +5 for pricing page visit. | CRM fields only | Slightly better than random | Marketing ops (1 person) |
| 2. Firmographic + behavioral | Add web visits, email opens, content downloads to rule-based scoring. | CRM + marketing automation | 20-30% better than random | Marketing ops (1 person) |
| 3. Predictive ML on flat data | Logistic regression or XGBoost on exported CRM data. Model learns which features predict conversion. | CRM export (flat table) | 40-60% better than random | Data scientist (1-2 FTEs) |
| 4. Multi-table relational ML | Model reads CRM + product usage + marketing tables together. Discovers cross-table patterns like the colleague effect. | CRM + product + marketing (connected) | 2-3x improvement over Level 2 | ML engineer or analyst (0.5 FTE with KumoRFM) |
| 5. Real-time scoring | Continuous model updates. Scores refresh within hours of new signals. Triggered alerts for high-signal events. | All sources, streaming | Highest accuracy + timeliness | ML engineer (1 FTE) or managed platform |
Most B2B companies are stuck at Level 1 or 2. The jump from Level 2 to Level 4 is where the biggest accuracy gain happens, and where most teams stall because they lack the data science resources for Level 3.
Level 1: Manual rules
HubSpot or Salesforce built-in scoring. Marketing sets point values manually: +10 for VP title, +5 for pricing page visit, -10 for a personal email address. These rules reflect human intuition about what matters but do not adapt to your actual conversion data.
- Best for: Early-stage teams with simple sales motions and no data science resources. Gets you started with zero technical lift.
- Watch out for: Rules reflect what you think matters, not what actually predicts conversion. Accuracy is barely better than random ranking.
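The whole Level 1 model fits in a few lines, which is both its appeal and its limit. A minimal sketch, using the point values from the examples above (the rule set and field names are illustrative):

```python
# Minimal Level 1 rule-based scorer. Point values mirror the examples
# above; the fields a real CRM exposes will differ.
RULES = [
    ("vp_title",       10),   # +10 for VP title
    ("pricing_visit",   5),   # +5 for pricing page visit
    ("personal_email", -10),  # -10 for a personal email address
]

def score_lead(lead: dict) -> int:
    """Sum the points for every rule the lead matches."""
    return sum(points for field, points in RULES if lead.get(field))

lead = {"vp_title": True, "pricing_visit": True, "personal_email": False}
print(score_lead(lead))  # 15
```

Note that nothing in this loop ever touches conversion data: the weights are opinions, frozen until a human edits them.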
Level 2: Firmographic + behavioral scoring
Add web visits, email opens, content downloads, and demo requests to your rule-based scoring. This is where most B2B companies land. It is a real improvement over firmographics alone because you are now capturing intent signals, not just fit signals.
- Best for: Marketing ops teams that want better signal without ML investment. One person can maintain it.
- Watch out for: Still rule-based, so the weights are guesses. Cannot see cross-contact patterns at the account level. Typically 20-30% better than random.
Level 3: Predictive ML on flat CRM data
Export CRM data to a flat table and train logistic regression or XGBoost. The model learns which features predict conversion from your historical data instead of relying on human-set rules. This is genuinely better than rules, but the best signals live in the relationships between tables, not in any single flat export.
- Best for: Teams with 1-2 data scientists who want a meaningful accuracy jump over rules. Works when your conversion patterns are mostly captured in single-table features.
- Watch out for: High cost for low marginal gain over Level 2. The colleague effect and cross-table patterns are invisible. Most teams spend 4-8 weeks building this and get incremental improvement.
Level 4: Multi-table relational ML
The model reads CRM, product usage, and marketing tables together and discovers cross-table patterns automatically. This is where the colleague effect becomes visible: when 3+ contacts at the same account are engaging, the model sees it and scores the account accordingly. The accuracy jump from Level 2 to Level 4 is typically 2-3x.
- Best for: Teams that want the highest accuracy without building custom ML pipelines. KumoRFM makes this accessible with a single PQL query.
- Watch out for: Requires connected data in a warehouse (CRM + product usage + marketing tables with foreign keys). The more tables you connect, the better the results.
Level 5: Real-time scoring
Continuous model updates with scores refreshing within hours of new signals. Triggered alerts for high-signal events like a new executive contact engaging or a sudden spike in product usage across an account. This is the frontier for teams with fast sales cycles.
- Best for: High-velocity PLG motions with sales cycles under 14 days, where a user can go from casual to high-intent in a single session.
- Watch out for: Requires streaming data infrastructure. For enterprise sales cycles (6+ months), weekly batch updates are usually sufficient.
Here is the thing most vendors will not tell you: Level 3 is often not worth the investment. Building a flat-table ML model on CRM data requires a data scientist, takes 4-8 weeks, and produces incremental improvement because the best signals (cross-table relationships) are not in the flat export. If you are going to invest in ML-based scoring, skip to Level 4 where the real accuracy gains are.
Why firmographic scoring hits a ceiling
Company size, industry, and job title are not bad signals. They are necessary signals. The problem is they are static. A 500-person SaaS company with a VP of Engineering had the same firmographic score six months ago as they do today. But six months ago they had no budget, no pain, and no urgency. Today three of their engineers are using your free tier, their VP attended your webinar, and they just posted a job listing for the problem your product solves.
Firmographic scoring cannot see any of that. It gives the same score to a perfect-profile company that will never buy and a perfect-profile company that is about to sign. The signal that separates them is behavioral, relational, and temporal.
The three signal types that actually predict conversion
Once you move beyond firmographics, there are three categories of signal that drive real predictive accuracy. Each one compounds on the others.
- Behavioral signals: what they do. Product usage is the strongest single predictor of conversion for product-led companies. Logins, features used, time-in-app, usage limits hit, integrations set up. For non-PLG companies, content engagement fills a similar role: pages visited, resources downloaded, webinars attended, demo requests. The key is frequency and recency, not just occurrence. A pricing page visit last week is a different signal than a pricing page visit six months ago.
- Relational signals: who else is engaging. This is where most scoring models fall down completely. When a single developer signs up for your free tier, that is a weak signal. When three developers and a VP at the same company all sign up in the same week, that is a buying signal. The individual contacts might each look unremarkable. The pattern across the account is what matters. This is the colleague effect, and it produces a substantial lift in conversion likelihood.
- Timing signals: when activity changes. A lead that has been steadily active for six months is less urgent than a lead whose activity just spiked in the last two weeks. Timing signals capture acceleration: sudden increases in product usage, a burst of content downloads, a cluster of new contacts from the same account. These spikes often correspond to internal events (new budget approval, a failed competitor evaluation, a mandate from leadership) that you cannot see directly but can infer from the behavioral shift.
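One simple way to operationalize the timing signal is a spike ratio: the activity rate over a recent window divided by the rate over a trailing baseline. A sketch, where the window lengths and the idea of flooring the baseline at one event are assumptions, not a prescribed method:

```python
from datetime import date, timedelta

def spike_ratio(event_dates, today, recent_days=14, baseline_days=90):
    """Activity rate in the last `recent_days` divided by the average
    rate over the prior `baseline_days`. Values well above 1 indicate
    a recent acceleration in engagement."""
    recent_start = today - timedelta(days=recent_days)
    baseline_start = recent_start - timedelta(days=baseline_days)
    recent = sum(1 for d in event_dates if recent_start < d <= today)
    baseline = sum(1 for d in event_dates if baseline_start < d <= recent_start)
    recent_rate = recent / recent_days
    # Floor the baseline at one event so a quiet history doesn't blow up the ratio.
    baseline_rate = max(baseline, 1) / baseline_days
    return recent_rate / baseline_rate

today = date(2025, 6, 30)
steady = [today - timedelta(days=7 * i) for i in range(15)]  # ~1 event/week for months
burst = [today - timedelta(days=i) for i in range(10)]       # quiet, then a 10-day burst
print(round(spike_ratio(steady, today), 2))  # steady lead: ratio near 1
print(round(spike_ratio(burst, today), 2))   # spiking lead: ratio far above 1
```

The steady lead scores near 1 (no acceleration); the burst lead scores far higher, which is exactly the shift that often precedes the internal events described above.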
Why flat-table ML misses the best signals
Most predictive lead scoring tools export CRM data to a flat table: one row per contact, columns for firmographic fields, behavioral counts, and a conversion label. Then they train logistic regression or XGBoost on that table.
This works better than rules. But it has a structural limitation: the best signals live in the relationships between tables, not in any single table.
Consider what a flat export loses:
- Account-level patterns. Three contacts at the same account all engaging in the same week. The flat table has three independent rows. The cross-contact pattern is invisible.
- Product usage trajectories. A user who went from 2 logins/week to 15 logins/week in the last month. The flat table might have a "total_logins" column, but the acceleration curve is gone.
- Marketing-to-product connections. A contact downloaded a whitepaper, then signed up for the free tier, then invited two colleagues. The flat table has each action as a column count. The sequence and the invitation chain are lost.
- Opportunity history context. This account had a closed-lost opportunity 8 months ago, and now a different contact is engaging. The flat table either ignores the history or flattens it into a binary "had_previous_opp" flag that loses all nuance.
You can try to engineer these signals manually: compute "num_contacts_same_account_active_last_7_days" and add it as a column. But each cross-table feature requires custom SQL, careful temporal windowing, and ongoing maintenance. Most teams add a few and stop because the engineering cost compounds faster than the accuracy gains.
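To make that cost concrete, here is roughly what one such hand-engineered feature looks like in pandas. Table and column names are illustrative; the point is that this join, window, and self-exclusion logic must be written and maintained for every cross-table feature you want:

```python
import pandas as pd

# Illustrative CRM and product-analytics exports that must be joined by hand.
contacts = pd.DataFrame({
    "contact_id": [1, 2, 3, 4],
    "account_id": ["A", "A", "A", "B"],
})
events = pd.DataFrame({
    "contact_id": [1, 2, 3, 4],
    "event_date": pd.to_datetime(
        ["2025-06-28", "2025-06-27", "2025-06-01", "2025-06-29"]),
})

def colleague_feature(contacts, events, as_of):
    """num_contacts_same_account_active_last_7_days: for each contact,
    how many *other* contacts at the same account had an event in the
    7 days before `as_of`."""
    as_of = pd.Timestamp(as_of)
    recent = events[(events.event_date > as_of - pd.Timedelta(days=7))
                    & (events.event_date <= as_of)]
    active = contacts.merge(recent, on="contact_id")
    per_account = active.groupby("account_id")["contact_id"].nunique()
    out = contacts.copy()
    out["n_active"] = out["account_id"].map(per_account).fillna(0).astype(int)
    # Subtract the contact themselves if they were among the active ones.
    is_active = out["contact_id"].isin(active["contact_id"])
    out["num_colleagues_active_7d"] = out["n_active"] - is_active.astype(int)
    return out[["contact_id", "num_colleagues_active_7d"]]

print(colleague_feature(contacts, events, "2025-06-30"))
```

That is one feature, at one time window. Add a second window, a second event type, or a second table, and the code multiplies accordingly.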
Lead scoring tool comparison
The tool landscape ranges from simple rule-based scoring to full relational ML. Here is an honest comparison of what each platform actually does.
| Tool | Approach | Data sources | Handles colleague effect | Needs data science team | Best for |
|---|---|---|---|---|---|
| HubSpot (built-in) | Manual rule-based scoring | CRM fields, email engagement | No | No | Early-stage teams, simple sales motions |
| Salesforce Einstein | Flat ML (logistic regression on CRM export) | CRM fields, some activity data | No | No (built-in), but limited tuning | Salesforce-native teams wanting basic prediction |
| 6sense | Intent data + firmographic matching | Third-party intent signals, firmographics | No | No | ABM-heavy teams focused on account identification |
| MadKudu | Product usage scoring + firmographics | Product analytics, CRM, billing data | Partial (account-level aggregates) | No | PLG companies with strong product usage data |
| KumoRFM | Relational foundation model across all tables | CRM + product + marketing + billing (connected) | Yes (reads multi-hop relational patterns natively) | No (single analyst writes PQL query) | Teams that want the highest accuracy across all data sources |
HubSpot and Salesforce Einstein cover Levels 1-2. 6sense adds intent data. MadKudu adds product usage. KumoRFM connects all tables and discovers cross-table patterns including the colleague effect.
The colleague signal: a worked example
Here is a concrete example of why relational scoring matters. Suppose you are looking at two leads:
- Lead A: VP of Engineering at a 1,000-person fintech. Visited your pricing page twice. Opened 3 emails. Firmographic score: 85/100.
- Lead B: Senior developer at a 200-person healthcare company. Signed up for your free tier 3 weeks ago. Firmographic score: 45/100.
Traditional scoring ranks Lead A much higher. But here is what traditional scoring cannot see: Lead B's company has 4 other developers already using the free tier. Their engineering manager attended your webinar last Tuesday. And their CTO just connected with your sales VP on LinkedIn. Seven contacts, three different engagement types, all in the last three weeks.
Lead A's company has zero other contacts engaging. The VP opened some emails. That is it.
Which company is closer to buying? The 200-person healthcare company with 7 engaged contacts, and it is not close. The colleague effect makes this obvious, but only if your scoring model can see across contacts within an account and connect their engagement patterns. Flat scoring models cannot.
Building relational lead scoring with PQL
With KumoRFM, you do not manually engineer cross-table features. You connect your data tables and write a predictive query. The relational foundation model discovers which patterns across all connected tables predict conversion.
PQL Query
PREDICT will_convert FOR EACH contacts.contact_id WHERE contacts.created_at > '2025-01-01'
One PQL query replaces the entire lead scoring pipeline: CRM data export, feature engineering, cross-table joins, model training, and scoring. KumoRFM reads contacts, accounts, opportunities, product_usage, and marketing_events tables directly and discovers behavioral, relational, and timing signals automatically.
Output
| contact_id | conversion_prob | firmographic_only_score | key_signal |
|---|---|---|---|
| C-4401 | 0.92 | 0.45 | 4 colleagues active in product, usage spike this week |
| C-4402 | 0.87 | 0.82 | VP title + pricing page + 2 colleagues in free tier |
| C-4403 | 0.31 | 0.88 | Perfect firmographic fit, but only contact at account, no product usage |
| C-4404 | 0.78 | 0.29 | Small company but 3 team members hit usage limits this month |
Look at the divergence. Contact C-4403 is the lead that traditional scoring loves: perfect title, perfect company size, perfect industry. But they are the only person at their account engaging, and there is no product usage. KumoRFM scores them at 0.31. Contact C-4404 would be ignored by firmographic scoring: small company, junior title. But three team members hitting usage limits is a strong buying signal. KumoRFM scores them at 0.78.
This is not a hypothetical. This is the kind of re-ranking that happens when you move from flat scoring to relational scoring. The leads your sales team should be calling change.
What connecting your data actually looks like
The biggest barrier to better lead scoring is not algorithms. It is connecting data sources. Most companies have the data they need spread across 3-5 systems: CRM (Salesforce, HubSpot), product analytics (Amplitude, Mixpanel, Segment), marketing automation (Marketo, HubSpot), billing (Stripe, Zuora), and sometimes a data warehouse (Snowflake, BigQuery).
Traditional ML approaches require you to ETL all of this into a single flat table. That project alone takes 2-4 weeks and breaks every time a schema changes. KumoRFM reads relational tables directly from your data warehouse. You define the relationships between tables (contacts belong to accounts, product_usage events link to contacts, opportunities link to accounts) and the model handles the rest. No flattening, no joins, no feature engineering.
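The relationship definitions themselves are small. A tool-agnostic sketch of what "contacts belong to accounts, product_usage links to contacts, opportunities link to accounts" amounts to, plus the kind of foreign-key sanity check worth running before handing tables to any relational model (the declaration format here is hypothetical, not any platform's actual API):

```python
# Hypothetical, tool-agnostic foreign-key declarations:
# (child table, child column, parent table, parent column)
RELATIONSHIPS = [
    ("contacts",      "account_id", "accounts", "account_id"),
    ("product_usage", "contact_id", "contacts", "contact_id"),
    ("opportunities", "account_id", "accounts", "account_id"),
]

def check_foreign_keys(tables, relationships):
    """Return every child-row key with no matching parent row — dangling
    references are the most common reason connected-table setups fail."""
    problems = []
    for child, child_col, parent, parent_col in relationships:
        parent_keys = {row[parent_col] for row in tables[parent]}
        for row in tables[child]:
            if row[child_col] not in parent_keys:
                problems.append((child, child_col, row[child_col]))
    return problems

tables = {
    "accounts":      [{"account_id": "A"}, {"account_id": "B"}],
    "contacts":      [{"contact_id": 1, "account_id": "A"},
                      {"contact_id": 2, "account_id": "C"}],  # dangling reference
    "product_usage": [{"usage_id": 10, "contact_id": 1}],
    "opportunities": [{"opp_id": 100, "account_id": "B"}],
}
print(check_foreign_keys(tables, RELATIONSHIPS))  # flags contact 2's account "C"
```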
Flat-table lead scoring
- Export CRM data to flat CSV. Lose all multi-table relationships.
- Manually engineer features: num_emails_opened, days_since_last_login, total_page_views.
- Cannot see cross-contact patterns. Each lead scored independently.
- Train logistic regression or XGBoost. Accuracy plateaus quickly.
- Rebuild pipeline every time a data source changes. 4-8 week cycles.
- Requires 1-2 data scientists for ongoing maintenance.
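The structural limit of the flat-table pipeline above can be shown without any ML at all: once each lead is a standalone row, two leads with identical per-contact features get identical scores no matter what is happening around them at the account. A toy sketch, with hand-set weights standing in for a trained model (feature names are illustrative):

```python
# Toy flat-table scorer: a linear model over per-contact features.
# Weights are illustrative stand-ins for trained coefficients; the
# point is structural, not numeric.
WEIGHTS = {"emails_opened": 0.3, "logins": 0.5, "pricing_visits": 0.8}

def flat_score(row):
    """The score is a function of this row alone — nothing else."""
    return sum(WEIGHTS[f] * row[f] for f in WEIGHTS)

# Same per-contact behavior, very different account context:
solo_dev = {"emails_opened": 2, "logins": 5, "pricing_visits": 1}  # only engaged contact
team_dev = {"emails_opened": 2, "logins": 5, "pricing_visits": 1}  # 4 colleagues also active

print(flat_score(solo_dev) == flat_score(team_dev))  # True
```

Because the account context never enters the row, no model trained on these rows, however sophisticated, can separate the two leads.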
KumoRFM relational lead scoring
- Connect CRM, product, and marketing tables in your data warehouse.
- Write PQL: PREDICT will_convert FOR EACH contacts.contact_id.
- Model reads all tables and discovers cross-table signals automatically.
- Colleague effect, usage trajectories, and timing signals all captured.
- Schema changes handled automatically. No pipeline rebuilds.
- Single analyst writes the query. No data science team required.
The SAP SALT and RelBench evidence
Lead scoring is a relational prediction task. Your CRM is not one table. It is a set of related tables: contacts, accounts, opportunities, activities, product usage events, marketing touches. The accuracy of your scoring depends on how well your model reads across these relationships.
The SAP SALT benchmark tests prediction accuracy on real enterprise relational data. Here is how different approaches compare:
| Approach | Accuracy | What it means |
|---|---|---|
| LLM + AutoML | 63% | Language model generates features, AutoML selects model. Misses relational structure. |
| PhD Data Scientist + XGBoost | 75% | Expert spends weeks hand-crafting features from flat tables. Good, but structurally limited. |
| KumoRFM (zero-shot) | 91% | No feature engineering, no training. Reads relational tables directly and discovers cross-table patterns. |
SAP SALT benchmark: KumoRFM outperforms expert-tuned XGBoost by 16 percentage points on enterprise relational prediction tasks. The gap comes from cross-table patterns that flat feature tables cannot contain.
On the RelBench benchmark across 7 databases and 30 prediction tasks:
| Approach | AUROC | Feature engineering time |
|---|---|---|
| LightGBM + manual features | 62.44 | 12.3 hours per task |
| KumoRFM zero-shot | 76.71 | ~1 second |
| KumoRFM fine-tuned | 81.14 | Minutes |
KumoRFM zero-shot outperforms manually engineered LightGBM by 14+ AUROC points. Fine-tuning pushes the gap to nearly 19 points.
Getting started: a practical path from Level 1 to Level 4
You do not need to jump from rule-based scoring to relational ML overnight. Here is the practical sequence:
- Audit your current scoring model. Pull the conversion rates for your top-scored leads vs. bottom-scored leads. If the ratio is less than 3:1, your scoring is not adding much signal beyond random ranking. Most rule-based models land at 1.5-2:1.
- Connect product usage data to your CRM. If you are product-led, this is the single highest-ROI step. Even simple rules like "contacted AND used product in last 7 days" will outperform firmographic-only scoring.
- Identify your colleague effect. Look at your last 20 closed-won deals. How many had multiple contacts engaging before the deal closed? If the answer is most of them, you have a strong colleague signal waiting to be captured.
- Run a KumoRFM pilot. Connect your CRM and product usage tables. Write a PQL query. Compare the output against your current scoring on a holdout set of recent opportunities. The accuracy gap will tell you whether the relational signals in your data are worth capturing.
- Deploy and iterate. Start with KumoRFM scores as a parallel signal alongside your existing scoring. Let sales use both for 30 days. Measure which score better predicts actual outcomes. Then shift routing decisions to the more accurate score.
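The audit in step 1 takes a few lines once you can export each lead's score and eventual outcome. A sketch, assuming hypothetical `score` and `converted` fields and a quartile split (both assumptions; use whatever slice matches how you route leads):

```python
def score_lift(leads, top_frac=0.25):
    """Conversion rate of the top-scored slice divided by that of the
    bottom-scored slice. A ratio under ~3:1 means the score adds
    little beyond random ranking."""
    ranked = sorted(leads, key=lambda l: l["score"], reverse=True)
    k = max(1, int(len(ranked) * top_frac))
    top, bottom = ranked[:k], ranked[-k:]
    rate = lambda group: sum(l["converted"] for l in group) / len(group)
    bottom_rate = max(rate(bottom), 1e-9)  # avoid division by zero
    return rate(top) / bottom_rate

# Illustrative export: 8 leads with scores and eventual outcomes.
leads = [
    {"score": 90, "converted": True},
    {"score": 85, "converted": True},
    {"score": 80, "converted": False},
    {"score": 75, "converted": False},
    {"score": 40, "converted": True},
    {"score": 30, "converted": False},
    {"score": 20, "converted": False},
    {"score": 10, "converted": False},
]
print(score_lift(leads, top_frac=0.5))  # 2.0
```

Here the top half converts at 50% and the bottom half at 25%, a 2:1 lift — the range where, per the audit above, rule-based scoring typically lands.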