
Lead Scoring Beyond Firmographics: From Company Size to Buying Signals

Most B2B lead scoring models rely on company size, industry, and job title. That gets you a ranked list that is barely better than random. The signals that actually predict conversion are behavioral (what they do), relational (who else at their company is engaging), and temporal (when activity spikes). Here is how to get there.

TL;DR

  • Firmographic lead scoring (company size + industry + job title) tells you who could buy. Behavioral + relational scoring tells you who is about to buy. The difference in conversion rate is 2-3x.
  • The colleague effect: when 3+ contacts at the same account are engaging, conversion likelihood increases significantly. This is a 2-hop relational pattern that flat lead scoring models structurally cannot see.
  • Five levels of lead scoring maturity: (1) manual rules, (2) firmographic + behavioral, (3) predictive ML on flat CRM data, (4) multi-table ML connecting CRM + product usage + marketing, (5) real-time scoring with continuous model updates.
  • On the SAP SALT enterprise benchmark, KumoRFM achieves 91% accuracy vs 75% for PhD data scientists with XGBoost and 63% for LLM+AutoML. On RelBench, KumoRFM zero-shot scores 76.71 AUROC vs 62.44 for LightGBM with manual features.
  • KumoRFM reads raw CRM, product usage, and marketing tables directly. One PQL query replaces weeks of feature engineering, model training, and pipeline maintenance. No data science team required.

If you are reading this, your lead scoring probably works like this: company has 500+ employees, industry is SaaS, title contains "VP" = hot lead. Maybe you added some behavioral points for visiting the pricing page or opening an email.

That model is doing something. But it is not doing what you think. Firmographic scoring tells you which leads fit your ideal customer profile. It does not tell you which leads are actually going to buy. Those are different questions, and confusing them is why most B2B sales teams spend the majority of their time chasing leads that will never close.

The gap between "fits the profile" and "ready to buy" is where the real scoring signal lives. And most of that signal comes from data you already have but are not using correctly.

The five levels of lead scoring maturity

Not every company needs the most advanced lead scoring. But you should know where you are, what the next level looks like, and what it unlocks. Here are the five levels, with honest tradeoffs.

| Level | Approach | Data sources | Typical accuracy | Team required |
| --- | --- | --- | --- | --- |
| 1. Manual rules | HubSpot/Salesforce built-in scoring. Marketing sets point values: +10 for VP title, +5 for pricing page visit. | CRM fields only | Slightly better than random | Marketing ops (1 person) |
| 2. Firmographic + behavioral | Add web visits, email opens, content downloads to rule-based scoring. | CRM + marketing automation | 20-30% better than random | Marketing ops (1 person) |
| 3. Predictive ML on flat data | Logistic regression or XGBoost on exported CRM data. Model learns which features predict conversion. | CRM export (flat table) | 40-60% better than random | Data scientist (1-2 FTEs) |
| 4. Multi-table relational ML | Model reads CRM + product usage + marketing tables together. Discovers cross-table patterns like the colleague effect. | CRM + product + marketing (connected) | 2-3x improvement over Level 3 | ML engineer or analyst (0.5 FTE with KumoRFM) |
| 5. Real-time scoring | Continuous model updates. Scores refresh within hours of new signals. Triggered alerts for high-signal events. | All sources, streaming | Highest accuracy + timeliness | ML engineer (1 FTE) or managed platform |

Most B2B companies are stuck at Level 1 or 2. The jump from Level 2 to Level 4 is where the biggest accuracy gain happens, and where most teams stall because they lack the data science resources for Level 3.

Level 1: Manual rules

HubSpot or Salesforce built-in scoring. Marketing sets point values manually: +10 for VP title, +5 for pricing page visit, -10 for a personal email address. These rules reflect human intuition about what matters but do not adapt to your actual conversion data.

  • Best for: Early-stage teams with simple sales motions and no data science resources. Gets you started with zero technical lift.
  • Watch out for: Rules reflect what you think matters, not what actually predicts conversion. Accuracy is barely better than random ranking.
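For a concrete reference point, Level 1 scoring is nothing more than a handful of hard-coded rules. A minimal Python sketch (the point values and field names are hypothetical, mirroring the +10 VP / +5 pricing-page / -10 personal-email examples above):

```python
# Level 1, rule-based scoring: hand-set points, no learning from outcomes.
# Rules and weights are hypothetical illustrations, not recommendations.
RULES = [
    (lambda lead: "VP" in lead.get("title", ""), 10),           # seniority rule
    (lambda lead: lead.get("visited_pricing_page", False), 5),  # intent rule
    (lambda lead: lead.get("email", "").endswith(("@gmail.com", "@yahoo.com")), -10),
]

def rule_score(lead):
    """Sum the points for every rule the lead matches."""
    return sum(points for matches, points in RULES if matches(lead))

lead = {"title": "VP Engineering", "visited_pricing_page": True, "email": "ana@acme.com"}
print(rule_score(lead))  # 15
```

The weights never change unless a human edits them, which is exactly the limitation described above: the model encodes intuition, not observed conversion behavior.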

Level 2: Firmographic + behavioral scoring

Add web visits, email opens, content downloads, and demo requests to your rule-based scoring. This is where most B2B companies land. It is a real improvement over firmographics alone because you are now capturing intent signals, not just fit signals.

  • Best for: Marketing ops teams that want better signal without ML investment. One person can maintain it.
  • Watch out for: Still rule-based, so the weights are guesses. Cannot see cross-contact patterns at the account level. Typically 20-30% better than random.

Level 3: Predictive ML on flat CRM data

Export CRM data to a flat table and train logistic regression or XGBoost. The model learns which features predict conversion from your historical data instead of relying on human-set rules. This is genuinely better than rules, but the best signals live in the relationships between tables, not in any single flat export.

  • Best for: Teams with 1-2 data scientists who want a meaningful accuracy jump over rules. Works when your conversion patterns are mostly captured in single-table features.
  • Watch out for: High cost for low marginal gain over Level 2. The colleague effect and cross-table patterns are invisible. Most teams spend 4-8 weeks building this and get incremental improvement.

Level 4: Multi-table relational ML

The model reads CRM, product usage, and marketing tables together and discovers cross-table patterns automatically. This is where the colleague effect becomes visible: when 3+ contacts at the same account are engaging, the model sees it and scores the account accordingly. The accuracy jump from Level 2 to Level 4 is typically 2-3x.

  • Best for: Teams that want the highest accuracy without building custom ML pipelines. KumoRFM makes this accessible with a single PQL query.
  • Watch out for: Requires connected data in a warehouse (CRM + product usage + marketing tables with foreign keys). The more tables you connect, the better the results.

Level 5: Real-time scoring

Continuous model updates with scores refreshing within hours of new signals. Triggered alerts for high-signal events like a new executive contact engaging or a sudden spike in product usage across an account. This is the frontier for teams with fast sales cycles.

  • Best for: High-velocity PLG motions with sales cycles under 14 days, where a user can go from casual to high-intent in a single session.
  • Watch out for: Requires streaming data infrastructure. For enterprise sales cycles (6+ months), weekly batch updates are usually sufficient.

Here is the thing most vendors will not tell you: Level 3 is often not worth the investment. Building a flat-table ML model on CRM data requires a data scientist, takes 4-8 weeks, and produces incremental improvement because the best signals (cross-table relationships) are not in the flat export. If you are going to invest in ML-based scoring, skip to Level 4 where the real accuracy gains are.

Why firmographic scoring hits a ceiling

Company size, industry, and job title are not bad signals. They are necessary signals. The problem is they are static. A 500-person SaaS company with a VP of Engineering had the same firmographic score six months ago as they do today. But six months ago they had no budget, no pain, and no urgency. Today three of their engineers are using your free tier, their VP attended your webinar, and they just posted a job listing for the problem your product solves.

Firmographic scoring cannot see any of that. It gives the same score to a perfect-profile company that will never buy and a perfect-profile company that is about to sign. The signal that separates them is behavioral, relational, and temporal.

The three signal types that actually predict conversion

Once you move beyond firmographics, there are three categories of signal that drive real predictive accuracy. Each one compounds on the others.

  1. Behavioral signals: what they do. Product usage is the strongest single predictor of conversion for product-led companies. Logins, features used, time-in-app, usage limits hit, integrations set up. For non-PLG companies, content engagement fills a similar role: pages visited, resources downloaded, webinars attended, demo requests. The key is frequency and recency, not just occurrence. A pricing page visit last week is a different signal than a pricing page visit six months ago.
  2. Relational signals: who else is engaging. This is where most scoring models fall down completely. When a single developer signs up for your free tier, that is a weak signal. When three developers and a VP at the same company all sign up in the same week, that is a buying signal. The individual contacts might each look unremarkable. The pattern across the account is what matters. This is the colleague effect, and it produces a substantial lift in conversion likelihood.
  3. Timing signals: when activity changes. A lead that has been steadily active for six months is less urgent than a lead whose activity just spiked in the last two weeks. Timing signals capture acceleration: sudden increases in product usage, a burst of content downloads, a cluster of new contacts from the same account. These spikes often correspond to internal events (new budget approval, a failed competitor evaluation, a mandate from leadership) that you cannot see directly but can infer from the behavioral shift.
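The recency point above can be made concrete with a decay-weighted count: instead of counting pricing-page visits, weight each visit by its age. A minimal Python sketch (the 30-day half-life is an assumption for illustration, not a recommendation):

```python
from datetime import date

def recency_weighted_count(event_dates, today, half_life_days=30):
    """Each event contributes 2 ** (-age / half_life): a visit last week
    counts close to 1.0, a visit six months ago counts close to 0."""
    return sum(2 ** (-(today - d).days / half_life_days) for d in event_dates)

today = date(2025, 6, 1)
last_week = [date(2025, 5, 25)]        # one pricing-page visit, 7 days old
six_months_ago = [date(2024, 12, 1)]   # one visit, ~182 days old
print(round(recency_weighted_count(last_week, today), 2))       # 0.85
print(round(recency_weighted_count(six_months_ago, today), 2))  # 0.01
```

The same raw count ("1 pricing page visit") yields scores two orders of magnitude apart once timing is encoded, which is why frequency-and-recency beats occurrence alone.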

Why flat-table ML misses the best signals

Most predictive lead scoring tools export CRM data to a flat table: one row per contact, columns for firmographic fields, behavioral counts, and a conversion label. Then they train logistic regression or XGBoost on that table.

This works better than rules. But it has a structural limitation: the best signals live in the relationships between tables, not in any single table.

Consider what a flat export loses:

  • Account-level patterns. Three contacts at the same account all engaging in the same week. The flat table has three independent rows. The cross-contact pattern is invisible.
  • Product usage trajectories. A user who went from 2 logins/week to 15 logins/week in the last month. The flat table might have a "total_logins" column, but the acceleration curve is gone.
  • Marketing-to-product connections. A contact downloaded a whitepaper, then signed up for the free tier, then invited two colleagues. The flat table has each action as a column count. The sequence and the invitation chain are lost.
  • Opportunity history context. This account had a closed-lost opportunity 8 months ago, and now a different contact is engaging. The flat table either ignores the history or flattens it into a binary "had_previous_opp" flag that loses all nuance.

You can try to engineer these signals manually: compute "num_contacts_same_account_active_last_7_days" and add it as a column. But each cross-table feature requires custom SQL, careful temporal windowing, and ongoing maintenance. Most teams add a few and stop because the engineering cost compounds faster than the accuracy gains.
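To make that engineering cost concrete, here is roughly what the single feature mentioned above looks like, sketched in plain Python over toy contact and activity tables (in production this would be SQL against your warehouse, with the same temporal-windowing care, repeated for every cross-table feature):

```python
from datetime import date, timedelta

# Toy versions of two CRM tables; rows are hypothetical illustrations.
contacts = [
    {"contact_id": "C1", "account_id": "A1"},
    {"contact_id": "C2", "account_id": "A1"},
    {"contact_id": "C3", "account_id": "A2"},
]
activities = [
    {"contact_id": "C1", "day": date(2025, 6, 3)},
    {"contact_id": "C2", "day": date(2025, 6, 5)},
    {"contact_id": "C3", "day": date(2025, 3, 1)},
]

def colleagues_active_last_7_days(contact_id, as_of):
    """One hand-built cross-table feature: other contacts at the same
    account with any activity in the 7 days before `as_of`."""
    account = next(c["account_id"] for c in contacts if c["contact_id"] == contact_id)
    window_start = as_of - timedelta(days=7)
    peers = {c["contact_id"] for c in contacts
             if c["account_id"] == account and c["contact_id"] != contact_id}
    return len({a["contact_id"] for a in activities
                if a["contact_id"] in peers and window_start <= a["day"] <= as_of})

print(colleagues_active_last_7_days("C1", date(2025, 6, 6)))  # 1 (C2 was active)
```

Note that even this toy version needs a join, a deduplication, and an as-of date to avoid leaking future activity into training data. Multiply by dozens of candidate features and the maintenance burden becomes clear.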

Lead scoring tool comparison

The tool landscape ranges from simple rule-based scoring to full relational ML. Here is an honest comparison of what each platform actually does.

| Tool | Approach | Data sources | Handles colleague effect | Needs data science team | Best for |
| --- | --- | --- | --- | --- | --- |
| HubSpot (built-in) | Manual rule-based scoring | CRM fields, email engagement | No | No | Early-stage teams, simple sales motions |
| Salesforce Einstein | Flat ML (logistic regression on CRM export) | CRM fields, some activity data | No | No (built-in), but limited tuning | Salesforce-native teams wanting basic prediction |
| 6sense | Intent data + firmographic matching | Third-party intent signals, firmographics | No | No | ABM-heavy teams focused on account identification |
| MadKudu | Product usage scoring + firmographics | Product analytics, CRM, billing data | Partial (account-level aggregates) | No | PLG companies with strong product usage data |
| KumoRFM | Relational foundation model across all tables | CRM + product + marketing + billing (connected) | Yes (reads multi-hop relational patterns natively) | No (single analyst writes PQL query) | Teams that want the highest accuracy across all data sources |

HubSpot and Salesforce Einstein cover Levels 1-2. 6sense adds intent data. MadKudu adds product usage. KumoRFM connects all tables and discovers cross-table patterns including the colleague effect.

The colleague signal: a worked example

Here is a concrete example of why relational scoring matters. Suppose you are looking at two leads:

  • Lead A: VP of Engineering at a 1,000-person fintech. Visited your pricing page twice. Opened 3 emails. Firmographic score: 85/100.
  • Lead B: Senior developer at a 200-person healthcare company. Signed up for your free tier 3 weeks ago. Firmographic score: 45/100.

Traditional scoring ranks Lead A much higher. But here is what traditional scoring cannot see: Lead B's company has 4 other developers already using the free tier. Their engineering manager attended your webinar last Tuesday. And their CTO just connected with your sales VP on LinkedIn. Seven contacts, three different engagement types, all in the last three weeks.

Lead A's company has zero other contacts engaging. The VP opened some emails. That is it.

Which company is closer to buying? The 200-person healthcare company with seven engaged contacts, and it is not close. The colleague effect makes this obvious, but only if your scoring model can see across contacts within an account and connect their engagement patterns. Flat scoring models cannot.
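A minimal sketch of what "seeing across contacts" means: aggregate engagement at the account level instead of the contact level. The events below are a hypothetical encoding of the two leads in this example:

```python
# Hypothetical per-contact engagement events, tagged with account and type.
events = [
    ("A-vp",   "acct_A", "pricing_page"), ("A-vp",   "acct_A", "email_open"),
    ("B-dev0", "acct_B", "free_tier"),    ("B-dev1", "acct_B", "free_tier"),
    ("B-dev2", "acct_B", "free_tier"),    ("B-dev3", "acct_B", "free_tier"),
    ("B-dev4", "acct_B", "free_tier"),    ("B-mgr",  "acct_B", "webinar"),
    ("B-cto",  "acct_B", "linkedin"),
]

def account_signal(account):
    """Distinct engaged contacts and distinct engagement types at one
    account: two numbers flat per-contact scoring never computes."""
    contacts = {c for c, a, _ in events if a == account}
    types = {t for _, a, t in events if a == account}
    return len(contacts), len(types)

print(account_signal("acct_A"))  # (1, 2)
print(account_signal("acct_B"))  # (7, 3)
```

Scored row by row, every contact at acct_B looks unremarkable. Grouped by account, acct_B dominates, which is the colleague effect in miniature.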

Building relational lead scoring with PQL

With KumoRFM, you do not manually engineer cross-table features. You connect your data tables and write a predictive query. The relational foundation model discovers which patterns across all connected tables predict conversion.

PQL Query

PREDICT will_convert
FOR EACH contacts.contact_id
WHERE contacts.created_at > '2025-01-01'

One PQL query replaces the entire lead scoring pipeline: CRM data export, feature engineering, cross-table joins, model training, and scoring. KumoRFM reads contacts, accounts, opportunities, product_usage, and marketing_events tables directly and discovers behavioral, relational, and timing signals automatically.

Output

| contact_id | conversion_prob | firmographic_only_score | key_signal |
| --- | --- | --- | --- |
| C-4401 | 0.92 | 0.45 | 4 colleagues active in product, usage spike this week |
| C-4402 | 0.87 | 0.82 | VP title + pricing page + 2 colleagues in free tier |
| C-4403 | 0.31 | 0.88 | Perfect firmographic fit, but only contact at account, no product usage |
| C-4404 | 0.78 | 0.29 | Small company but 3 team members hit usage limits this month |

Look at the divergence. Contact C-4403 is the lead that traditional scoring loves: perfect title, perfect company size, perfect industry. But they are the only person at their account engaging, and there is no product usage. KumoRFM scores them at 0.31. Contact C-4404 would be ignored by firmographic scoring: small company, junior title. But three team members hitting usage limits is a strong buying signal. KumoRFM scores them at 0.78.

This is not a hypothetical. This is the kind of re-ranking that happens when you move from flat scoring to relational scoring. The leads your sales team should be calling change.

What connecting your data actually looks like

The biggest barrier to better lead scoring is not algorithms. It is connecting data sources. Most companies have the data they need spread across 3-5 systems: CRM (Salesforce, HubSpot), product analytics (Amplitude, Mixpanel, Segment), marketing automation (Marketo, HubSpot), billing (Stripe, Zuora), and sometimes a data warehouse (Snowflake, BigQuery).

Traditional ML approaches require you to ETL all of this into a single flat table. That project alone takes 2-4 weeks and breaks every time a schema changes. KumoRFM reads relational tables directly from your data warehouse. You define the relationships between tables (contacts belong to accounts, product_usage events link to contacts, opportunities link to accounts) and the model handles the rest. No flattening, no joins, no feature engineering.

Flat-table lead scoring

  • Export CRM data to flat CSV. Lose all multi-table relationships.
  • Manually engineer features: num_emails_opened, days_since_last_login, total_page_views.
  • Cannot see cross-contact patterns. Each lead scored independently.
  • Train logistic regression or XGBoost. Accuracy plateaus quickly.
  • Rebuild pipeline every time a data source changes. 4-8 week cycles.
  • Requires 1-2 data scientists for ongoing maintenance.

KumoRFM relational lead scoring

  • Connect CRM, product, and marketing tables in your data warehouse.
  • Write PQL: PREDICT will_convert FOR EACH contacts.contact_id.
  • Model reads all tables and discovers cross-table signals automatically.
  • Colleague effect, usage trajectories, and timing signals all captured.
  • Schema changes handled automatically. No pipeline rebuilds.
  • Single analyst writes the query. No data science team required.

The SAP SALT and RelBench evidence

Lead scoring is a relational prediction task. Your CRM is not one table. It is a set of related tables: contacts, accounts, opportunities, activities, product usage events, marketing touches. The accuracy of your scoring depends on how well your model reads across these relationships.

The SAP SALT benchmark tests prediction accuracy on real enterprise relational data. Here is how different approaches compare:

| Approach | Accuracy | What it means |
| --- | --- | --- |
| LLM + AutoML | 63% | Language model generates features, AutoML selects model. Misses relational structure. |
| PhD Data Scientist + XGBoost | 75% | Expert spends weeks hand-crafting features from flat tables. Good, but structurally limited. |
| KumoRFM (zero-shot) | 91% | No feature engineering, no training. Reads relational tables directly and discovers cross-table patterns. |

SAP SALT benchmark: KumoRFM outperforms expert-tuned XGBoost by 16 percentage points on enterprise relational prediction tasks. The gap comes from cross-table patterns that flat feature tables cannot contain.

On the RelBench benchmark across 7 databases and 30 prediction tasks:

| Approach | AUROC | Feature engineering time |
| --- | --- | --- |
| LightGBM + manual features | 62.44 | 12.3 hours per task |
| KumoRFM zero-shot | 76.71 | ~1 second |
| KumoRFM fine-tuned | 81.14 | Minutes |

KumoRFM zero-shot outperforms manually engineered LightGBM by 14+ AUROC points. Fine-tuning pushes the gap to nearly 19 points.

Getting started: a practical path from Level 1 to Level 4

You do not need to jump from rule-based scoring to relational ML overnight. Here is the practical sequence:

  1. Audit your current scoring model. Pull the conversion rates for your top-scored leads vs. bottom-scored leads. If the ratio is less than 3:1, your scoring is not adding much signal beyond random ranking. Most rule-based models land at 1.5-2:1.
  2. Connect product usage data to your CRM. If you are product-led, this is the single highest-ROI step. Even simple rules like "contacted AND used product in last 7 days" will outperform firmographic-only scoring.
  3. Identify your colleague effect. Look at your last 20 closed-won deals. How many had multiple contacts engaging before the deal closed? If the answer is most of them, you have a strong colleague signal waiting to be captured.
  4. Run a KumoRFM pilot. Connect your CRM and product usage tables. Write a PQL query. Compare the output against your current scoring on a holdout set of recent opportunities. The accuracy gap will tell you whether the relational signals in your data are worth capturing.
  5. Deploy and iterate. Start with KumoRFM scores as a parallel signal alongside your existing scoring. Let sales use both for 30 days. Measure which score better predicts actual outcomes. Then shift routing decisions to the more accurate score.
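The step 1 audit can be sketched in a few lines: rank leads by your current score and compare conversion rates at the extremes. A minimal Python sketch with hypothetical scores and outcomes:

```python
def top_vs_bottom_ratio(leads):
    """`leads` is a list of (score, converted) pairs. Returns the
    conversion-rate ratio of the top 20% vs the bottom 20% by score."""
    ranked = sorted(leads, key=lambda x: x[0], reverse=True)
    k = max(1, len(ranked) // 5)
    rate = lambda chunk: sum(conv for _, conv in chunk) / len(chunk)
    top, bottom = rate(ranked[:k]), rate(ranked[-k:])
    return top / bottom if bottom else float("inf")

# Hypothetical audit data: a ratio below ~3:1 means the scoring
# model is adding little signal beyond random ranking.
leads = [(90, 1), (85, 1), (80, 0), (70, 1), (60, 0),
         (55, 0), (50, 1), (40, 0), (30, 0), (20, 1)]
print(top_vs_bottom_ratio(leads))  # 2.0
```

In a real audit you would pull (score, converted) pairs from closed opportunities in your CRM and use deciles or quintiles depending on volume.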

Frequently asked questions

Our lead scoring model only uses company size and industry. How do we make it smarter?

Start by adding behavioral signals from data you already have: product usage (logins, feature adoption, time-in-app), content engagement (pages visited, resources downloaded, webinars attended), and email interaction (opens, clicks, replies). That alone will outperform firmographic-only scoring. The next step is relational signals. When multiple contacts at the same account are engaging, in our experience, conversion rates improve by 3-5x. This colleague effect is a 2-hop relational pattern (contact to account to other contacts to their engagement) that flat scoring models cannot see. A relational foundation model like KumoRFM reads these multi-table patterns automatically without requiring you to manually engineer features or build a graph database.

What is the best tool for lead scoring if we don't have a data science team?

It depends on your maturity level. If you are just starting, HubSpot or Salesforce built-in scoring gives you manual rule-based scoring with zero technical lift. If you want predictive scoring without a data science team, MadKudu is strong for product-led growth companies because it connects product usage data. For the most accurate predictions across CRM, product usage, and marketing data together, KumoRFM lets a single analyst write a PQL query and get predictions without building ML models, engineering features, or managing pipelines. On the SAP SALT enterprise benchmark, KumoRFM achieves 91% accuracy vs 75% for PhD data scientists with XGBoost, so you get better results with significantly less technical effort.

How do I build a lead scoring model using our CRM and product usage data?

The traditional approach requires a data scientist to export CRM data (contacts, accounts, opportunities) and product usage data (logins, feature events, session data), join the tables manually, engineer features like days-since-last-login or feature-adoption-rate, train a logistic regression or XGBoost model, validate it, and deploy it to production. This takes 4-8 weeks and ongoing maintenance. With KumoRFM, you connect your CRM and product usage tables, write a PQL query like PREDICT will_convert FOR EACH contacts.contact_id, and the relational foundation model automatically discovers predictive patterns across all connected tables, including cross-table signals like the colleague effect that manual feature engineering typically misses. No joins, no feature engineering, no model training.

What is the colleague effect in lead scoring?

The colleague effect is the observation that when 3 or more contacts at the same company are actively engaging with your product or content, the likelihood of that account converting increases dramatically compared to single-contact engagement. This is a relational signal: it requires connecting contacts to their account, then to other contacts at that account, then to those contacts' engagement data. That is a 2-hop pattern. Traditional lead scoring models score each contact independently and cannot see that three colleagues are all evaluating the product simultaneously. A relational model like KumoRFM reads this pattern directly from your CRM and product usage tables.

What is the difference between rule-based and predictive lead scoring?

Rule-based lead scoring assigns points based on fixed rules set by marketing or sales: +10 for VP title, +5 for visiting the pricing page, -10 for a personal email address. These rules reflect human intuition but do not adapt to your actual conversion data. Predictive lead scoring uses machine learning to find patterns in your historical conversion data and score leads based on which patterns actually predicted past deals. The difference in practice is significant: rule-based scoring reflects what you think matters, predictive scoring reflects what actually matters. Teams that switch from rule-based to predictive lead scoring typically see significant improvement in conversion rates from marketing qualified leads.

Can lead scoring work with product-led growth?

Product-led growth makes lead scoring more powerful, not less. In PLG, you have a signal that most B2B companies lack: actual product usage data. You can see which free-tier users are hitting usage limits, which teams are adding collaborators, which accounts are adopting advanced features. The challenge is connecting product usage data to CRM data to marketing data. Most lead scoring tools only read one source. KumoRFM connects all three and finds cross-table patterns: a free-tier user who hit the usage limit, works at a company with 3 other active users, and attended a webinar last week is a structurally different lead than a free-tier user with the same usage who is the only person at their company.

How often should lead scores be updated?

Most lead scoring tools update scores in daily or weekly batches. That is too slow for product-led motions where a user can go from casual to high-intent in a single session. The ideal cadence depends on your sales cycle: for high-velocity PLG (sales cycle under 14 days), scores should update within hours. For mid-market (30-90 day cycles), daily updates are sufficient. For enterprise (6+ month cycles), weekly updates work but you should still trigger real-time alerts for specific high-signal events like a new executive contact engaging or a sudden spike in product usage across an account.

What data do I need for effective lead scoring?

At minimum, you need CRM data (contacts, accounts, opportunity outcomes) and enough closed-won and closed-lost history to train on, typically 200+ outcomes. Beyond that, each additional data source compounds the accuracy. Product usage data (logins, features used, session duration) adds behavioral intent signals. Marketing data (email engagement, content downloads, event attendance) adds interest signals. The biggest unlock is connecting these sources so the model can find cross-table patterns. A contact who downloaded a whitepaper, works at a company with 4 active product users, and reports to someone who attended your webinar is a qualitatively different lead than any of those signals alone suggest.

See it in action

KumoRFM delivers predictions on relational data in seconds. No feature engineering, no ML pipelines. Try it free.