Context
In the D&B CDP, you can use custom event models to create fit ratings for your accounts. This
will enable your sales and marketing teams to focus their spend and effort on the right accounts.
Once a fit model is built, it’s important to review the model before activation to ensure it will
perform for the use case for which it was designed. This guide describes best practices for
evaluating model quality.
Step 1: Check Model Warnings
The D&B CDP surfaces warnings on attributes that are used by the model but may be
problematic.
To check model warnings, head to the Attribute List screen and click the Warnings link
under the green summary banner. View the attributes with warnings.
There are four types of warnings:
- Prediction from later data (future information): This attribute looks like it was populated later in the business cycle, often called future information. This warning appears when available values show good lift (greater than 1.5) but 80% or more of records are unpopulated and have low lift (below 0.6).
- Prediction from missing data: This attribute brings prediction from missing data into the model. When unpopulated numbers or categories show significant prediction (lift below 0.7 or above 1.2), later scores are often inaccurate.
- Too many category values: This attribute has more than 200 category values. Attributes with more than 200 different category values cannot be used in modeling. Where possible, replace the free-text field with a picklist attribute.
- Too many identical values: This attribute has the same value for 98% or more of records. This can lead to poor segments or inaccurate scores.
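If you export attribute statistics for offline review, you can apply the same thresholds described above in a short script. The sketch below is only an illustration, not a D&B CDP export format: the attribute names, column names and example values are assumptions.

```python
import pandas as pd

# Hypothetical per-attribute summary; not an actual D&B CDP export format.
attrs = pd.DataFrame([
    {"attribute": "Last Campaign Touch", "populated_lift": 1.8, "pct_not_populated": 0.85,
     "not_populated_lift": 0.4, "category_count": 12, "pct_most_common": 0.30},
    {"attribute": "Free Text Notes", "populated_lift": 1.1, "pct_not_populated": 0.20,
     "not_populated_lift": 1.0, "category_count": 450, "pct_most_common": 0.05},
])

def warnings_for(row):
    w = []
    # Prediction from later data: populated values show good lift, but most
    # records are unpopulated and have low lift.
    if row.populated_lift > 1.5 and row.pct_not_populated >= 0.80 and row.not_populated_lift < 0.6:
        w.append("prediction from later data (future information)")
    # Prediction from missing data: the 'Not Populated' value itself is predictive.
    if row.not_populated_lift < 0.7 or row.not_populated_lift > 1.2:
        w.append("prediction from missing data")
    # Too many category values.
    if row.category_count > 200:
        w.append("too many category values")
    # Too many identical values.
    if row.pct_most_common >= 0.98:
        w.append("too many identical values")
    return w

for row in attrs.itertuples():
    print(row.attribute, warnings_for(row))
```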
For optimal model performance, attributes with warnings generally should not be present in a
model.
If you see attributes with warnings, try the following:
- Review the attributes from a business perspective. If there is business justification to keep an
attribute with a warning in the model, do so. Otherwise, remove it from the model.
Step 2: Review Model Health Score

If your model health score is below 0.65, try the following:
- Check the Attribute List for first-party attributes (those in the My Attributes category) used by the model that may contain future information. An attribute whose values are populated only after an account reaches a particular marketing or sales stage contains future information. Attributes for which the ‘Not Populated’ value has a lift below 0.3x likely contain future information. Remove these attributes from the model.
- Ensure the data foundation of the model is good. A model built with inaccurate data will produce inaccurate results.
Step 3: Review Conversions And Conversion Rate
Providing an appropriate number of conversions and an appropriate conversion rate is essential
to ensuring data scarcity will not negatively impact the performance of the model.
To review the number of conversions and conversion rate, view the ‘Total Conversions’ box in
the green summary banner at the top of the Attribute List screen. The successes (or 1s) in
the event column of your training set become conversions. Divide the number of successes by
the total number of records in your training set to determine the conversion rate.
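You can sanity-check these numbers before uploading your training set. The sketch below assumes a CSV training set with an event column of 1s and 0s; the file name and column name are placeholders, not fixed D&B CDP names.

```python
import pandas as pd

# Placeholder file and column names; substitute your own training set.
training = pd.read_csv("training_set.csv")

conversions = int((training["event"] == 1).sum())   # successes (1s) become conversions
conversion_rate = conversions / len(training)

print(f"Conversions: {conversions}")
print(f"Conversion rate: {conversion_rate:.2%}")

# Guidelines from this guide: at least 500 conversions, rate between 1% and 5%.
if conversions < 500 or not (0.01 <= conversion_rate <= 0.05):
    print("Conversions or conversion rate outside the recommended range.")
```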
For optimal model performance, aim for at least 500 conversions and an overall conversion rate
between 1 and 5 percent. If too few conversions or too low a conversion rate is provided, the
model may experience some instability. If too high a conversion rate is provided, overall lift will
be driven down.
If your conversion count is below 500 and/or your conversion rate is below 1 percent, try one
of the following:
- Provide more successes in your training set if available.
- Provide fewer non-successes in your training set. Only use this option if you can retain at least 10,000 records in your training set.
- Adjust your definition of success so it’s less granular. For instance, if you are creating a model to predict which accounts are likely to convert but have a conversion rate lower than 1 percent, you could instead try using accounts with opportunities as your success event. The model will now predict which accounts are likely to have an opportunity with your business.
If your conversion rate is above 5 percent, try one of the following:
- Ensure the training set has the full population of non-successes.
- Provide fewer successes in your training set. Only use this option if you can retain at least 500 successes in your training set.
- Adjust your definition of success so it’s more granular. For instance, if you are creating a model to predict which accounts are likely to convert but have a conversion rate above 5 percent, you could instead try using converted accounts that have high spend as your success event. The model will now predict which accounts are likely to have high spend with your business.
Step 4: Review Performance
The Performance screen shows how the model is performing on a holdout set of accounts.
A random 80% of the accounts from the training set are used to build and train the model.
The remaining 20% of accounts are held out from the model build to do back testing. This is a
machine learning best practice that helps ensure the model is performing correctly.
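If you want to mimic this split offline, the sketch below shows one way to do it with scikit-learn (which is not necessarily what the D&B CDP uses internally); the file and column names are placeholders.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Placeholder training set with an 'event' column of 1s and 0s.
training = pd.read_csv("training_set.csv")

# 80% of accounts build the model; 20% are held out for back testing.
build_set, holdout_set = train_test_split(
    training,
    test_size=0.20,
    random_state=42,
    stratify=training["event"],  # keep the conversion rate similar in both sets
)

print(len(build_set), "accounts used to build the model")
print(len(holdout_set), "accounts held out for back testing")
```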
From the Attribute List, click the Performance menu item on the left side of the screen and
scroll to the bottom of the page to see how the random 20% of holdout accounts were scored
against the model.
The bars represent score deciles. The first bar represents accounts from the random 20% that
were scored from 91 to 100; the second bar represents accounts scored from 81 to 90; and so
on.
The y-axis represents the actual lift relative to the average. The horizontal dotted line denotes
1.0x lift, which corresponds to the average conversion rate. The average conversion rate is the
conversion rate provided in your training set.
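To make the lift calculation concrete, the sketch below computes per-decile lift from hypothetical holdout scores and outcomes. The column names and randomly generated data are assumptions used purely for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical holdout results: a model score (1-100) and actual outcome per account.
rng = np.random.default_rng(0)
holdout = pd.DataFrame({
    "score": rng.integers(1, 101, size=2000),
    "event": rng.binomial(1, 0.03, size=2000),
})

average_rate = holdout["event"].mean()  # corresponds to the 1.0x dotted line

# Bucket scores into deciles: 91-100, 81-90, ..., 1-10.
holdout["decile"] = pd.cut(
    holdout["score"],
    bins=range(0, 101, 10),
    labels=["1-10", "11-20", "21-30", "31-40", "41-50",
            "51-60", "61-70", "71-80", "81-90", "91-100"],
)

# Lift per decile = conversion rate within the decile / overall conversion rate.
lift_by_decile = holdout.groupby("decile", observed=True)["event"].mean() / average_rate
print(lift_by_decile.sort_index(ascending=False))
```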
A decent model is one in which the first decile shows high lift, the chart shows clear segmentation between deciles, and lift generally decreases toward the right side of the chart.
If your lift chart does not meet these guidelines, try one of the following:
- Check the Attribute List for attributes where the ‘Not Populated’ value has a lift below 0.3x, as these likely contain future information. More detail is provided in the Step 6: Check ‘Not Populated’ Lift section of this guide. Strongly consider removing these attributes from the model.
- Check the Attribute List for Firmographics attributes that are more than 40% ‘Not Populated’ or non-Firmographics attributes that are more than 70% ‘Not Populated.’ More detail is provided in the Step 5: Check Attribute Population section of this guide. Strongly consider removing these attributes from the model.
- Review the conversions and conversion rate. For optimal model performance, aim for at least 500 conversions and an overall conversion rate between 1 and 5 percent. If your conversions and/or conversion rate are outside the recommendation, refer to the Step 3: Review Conversions And Conversion Rate section of this guide for next steps.
Step 5: Check Attribute Population
A common problem is training data consisting of many accounts that do not match to the D&B
Data Cloud, which yields a model that rates unmatched accounts either very high or very low.
This situation should be avoided.
To check match rates and ensure a healthy population of attributes selected for the model, head
to the Attribute List screen and click the Used by Model link under the green summary
banner. View the % Accts column for the ‘Not Populated’ value of each attribute. This value
shows the percentage of accounts from the training set that did not match to the D&B Data
Cloud.
Healthy population depends on the attribute category. Data in the Firmographics category
is usually highly available. For optimal model performance, aim for no more than 40% ‘Not
Populated’ on Firmographics attributes.
In other categories (like Technology Profile, Online Presence and Website Keywords), data is
often less available. For optimal model performance, aim for no more than 70% ‘Not Populated’
on attributes in non-Firmographics categories.
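If you keep an offline copy of the attributes used by the model and their ‘Not Populated’ percentages, a check against these thresholds might look like the sketch below. The attribute names, categories and percentages are made up for illustration.

```python
import pandas as pd

# Hypothetical 'Not Populated' percentages per attribute used by the model.
used_by_model = pd.DataFrame([
    {"attribute": "Employees (D&B)", "category": "Firmographics", "pct_not_populated": 0.12},
    {"attribute": "CRM Industry", "category": "My Attributes", "pct_not_populated": 0.55},
    {"attribute": "Marketing Automation Tool", "category": "Technology Profile", "pct_not_populated": 0.82},
])

# Guideline thresholds from this guide: 40% for Firmographics, 70% otherwise.
def over_threshold(row):
    limit = 0.40 if row["category"] == "Firmographics" else 0.70
    return row["pct_not_populated"] > limit

flagged = used_by_model[used_by_model.apply(over_threshold, axis=1)]
print(flagged[["attribute", "category", "pct_not_populated"]])
```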
If a Firmographics attribute used by the model is more than 40% ‘Not Populated’ or a non-Firmographics attribute used by the model is more than 70% ‘Not Populated,’ try the following:
- Review the attributes from a business perspective. If there is business justification to keep
an attribute with a high percentage of records not populated in the model, do so. Otherwise,
remove it from the model.
Step 6: Check ‘Not Populated’ Lift
It’s not desirable when the absence of data drives predictions. The best models are those in
which predictions come from specific internal and D&B Data Cloud attributes.
To check the lift on attributes selected for the model, head to the Attribute List screen and
click the Used by Model link under the green summary banner. View the lift column for the ‘Not Populated’ value of each attribute.
The lift represents how likely records with each value are to convert. 1.0x lift is average,
meaning records with this lift are neither more nor less likely to convert than average.
Across the board, an attribute may be problematic if the ‘Not Populated’ value has a lift smaller
than 0.6x or larger than 1.2x. Generally, the lack of data should not be predictive.
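Lift for a value is that value’s conversion rate divided by the overall conversion rate. The sketch below computes the ‘Not Populated’ lift for one hypothetical attribute in a training set and flags it against the 0.6x to 1.2x range; the file, column and attribute names are placeholders.

```python
import pandas as pd

# Placeholder training set with an 'event' column and one attribute to inspect.
training = pd.read_csv("training_set.csv")
attribute = "industry"  # hypothetical attribute column

overall_rate = training["event"].mean()

# Records where the attribute is not populated, and their conversion rate.
not_populated = training[training[attribute].isna()]
not_populated_rate = not_populated["event"].mean()

lift = not_populated_rate / overall_rate
print(f"'Not Populated' lift for {attribute}: {lift:.2f}x")

if lift < 0.6 or lift > 1.2:
    print("Lack of data is predictive for this attribute; consider removing it.")
```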
If an attribute used by the model has a lift smaller than 0.6x or larger than 1.2x, try the
following:
- Review the attributes from a business perspective. If there is business justification to keep
an attribute with high or low lift on not populated records in the model, do so. Otherwise,
remove it from the model.
Step 7: Review RF Model CSV
As a final step, advanced users may want to review the random forest (RF) Model CSV, which is
an output of the predictive model.
Each attribute in the random forest model is represented by a row in the RF Model CSV file.
The Feature Importance column represents an attribute’s relative importance in predicting
the outcome. Any attribute with a feature importance above 0.05 is a potential issue, often
representing overfitting.
To download the RF Model CSV, head to the Attribute List screen and click the Model
Summary menu item on the left side of the screen. From the Model Diagnostics section,
download the RF Model CSV.
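Once the file is downloaded, a short script can surface high-importance attributes. The sketch below assumes columns named Attribute and Feature Importance; check the actual header names in your downloaded RF Model CSV before running it.

```python
import pandas as pd

# Assumed column names; verify them against the downloaded RF Model CSV.
rf_model = pd.read_csv("rf_model.csv")

# Attributes carrying an outsized share of the prediction may indicate overfitting.
high_importance = rf_model[rf_model["Feature Importance"] > 0.05]
print(high_importance.sort_values("Feature Importance", ascending=False))
```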
If an attribute has a feature importance above 0.05, try the following:
- Review the attributes from a business perspective. If there is business justification to keep an
attribute with a feature importance above 0.05, do so. Otherwise, remove it from the model.