Context
Lattice Predictive Insights (LPI) models score your records in real time, so your sellers can prioritize the right leads or accounts at the right time. This article covers how to review an LPI scoring model.
Summary
In Lattice Predictive Insights (LPI), you can create fit models to help your sales and marketing teams focus their spend and effort on the right accounts. Once a fit model is built, it’s important to review the model before activation to ensure it will perform for the use case for which it was designed. This guide describes best practices for evaluating model quality.
Step 1: Review Model Health Score
The model health score is a single number that represents how effective a model is at making predictions. For those with a statistics background, the model health score is analogous to the area under the receiver operating characteristic curve (ROC AUC).
For optimal model performance, aim for a model health score between 0.65 and 0.85. A model health score above 0.85 is often indicative of overfitting. A model health score below 0.65 is indicative of a model with poor predictability.
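For intuition, the ROC AUC analogy can be made concrete. The sketch below (illustrative only; LPI computes the model health score internally, and the scores and labels here are hypothetical) estimates an AUC-style score as the probability that a randomly chosen converted record is ranked above a randomly chosen non-converted one:

```python
# Illustrative sketch: an AUC-style score is the probability that a
# randomly chosen converter outranks a randomly chosen non-converter.
def auc_score(scores, labels):
    """Estimate AUC by comparing every (converter, non-converter) pair."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

scores = [0.9, 0.8, 0.7, 0.5, 0.4, 0.3]  # hypothetical model scores
labels = [1,   1,   0,   0,   1,   0]    # 1 = converted
print(round(auc_score(scores, labels), 2))  # 0.78
```

Here the score of roughly 0.78 falls inside the recommended 0.65 to 0.85 band; a perfectly separating model would score 1.0, which in practice would suggest overfitting.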
To check the model health score, head to the Attributes screen and click the Model Summary menu item on the left side of the screen. View the ‘Model Health Score’ value.
If your model health score is above 0.85, try the following:
- Check the Attributes screen for first-party attributes (those in the My Company Attributes category) that may contain future information. An attribute in which values are populated only after an account reaches a particular marketing or sales stage is one with future information. Attributes for which the ‘Not Populated’ value has a lift below 0.3x are ones that likely contain future information. More detail is provided in the Step 5: Check ‘Not Populated’ Lift section of this guide. Remove these attributes from the model.
If your model health score is below 0.65, try the following:
- Ensure the data foundation of the model is sound; a model built with inaccurate data will produce inaccurate results.
Step 2: Review Conversions And Conversion Rate
Providing an appropriate number of conversions and an appropriate conversion rate is essential to ensuring data scarcity will not negatively impact the performance of the model.
To review the number of conversions and conversion rate, view the ‘Conversions’ box in the summary banner at the top of the Attributes screen. The successes (or 1s) in the event column of your training set become conversions. Divide the number of successes by the total number of records in your training set to determine the conversion rate.
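The arithmetic above can be sketched as follows (the counts are hypothetical, chosen to land within the recommendations in the next paragraph):

```python
# Conversion rate = successes / total records in the training set.
# Both numbers below are hypothetical.
successes = 600         # records with a 1 in the event column
total_records = 30_000  # all records in the training set
conversion_rate = successes / total_records
print(f"{conversion_rate:.1%}")  # 2.0%
```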
For optimal model performance, aim for at least 500 conversions and an overall conversion rate between 1 and 5 percent. If too few conversions or too low a conversion rate is provided, the model may experience some instability. If too high a conversion rate is provided, overall lift will be driven down.
If your conversion count is below 500 and/or your conversion rate is below 1 percent, try one of the following:
- Provide more successes in your training set if available.
- Provide fewer non-successes in your training set. Only use this option if you can retain at least 10,000 records in your training set.
- Adjust your definition of success so it’s less granular. For instance, if you are creating a model to predict which accounts are likely to convert but have a conversion rate lower than 1 percent, you could instead try using accounts with opportunities as your success event. The model will now predict which accounts are likely to have an opportunity with your business.
If your conversion rate is above 5 percent, try one of the following:
- Ensure the training set has the full population of non-successes.
- Provide fewer successes in your training set. Only use this option if you can retain at least 500 successes in your training set.
- Adjust your definition of success so it’s more granular. For instance, if you are creating a model to predict which accounts are likely to convert but have a conversion rate above 5 percent, you could instead try using converted accounts that have high spend as your success event. The model will now predict which accounts are likely to have high spend with your business.
Step 3: Review Performance
The Performance screen shows how the model is performing on a holdout set of accounts. A random 80% of the accounts from the training set are used to build and train the model. The remaining 20% of accounts are held out from the model build to do back testing. This is a machine learning best practice that helps ensure the model is performing correctly.
From the Attributes screen, click the Performance menu item on the left side of the screen and scroll to the bottom of the page to see how the random 20% of holdout accounts were scored against the model.
The bars represent score deciles. The first bar represents accounts from the random 20% that were scored from 91 to 100; the second bar represents accounts scored from 81 to 90; and so on.
The y-axis represents the actual lift relative to the average. The horizontal dotted line denotes 1.0x lift, which corresponds to the average conversion rate. The average conversion rate is the conversion rate provided in your training set.
A decent model is one in which the first decile shows high lift, the chart shows clear segmentation between deciles, and lift generally decreases toward the right side of the chart.
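The decile lift shown on this chart can be sketched as follows. This is a minimal illustration, not LPI's implementation; the holdout records below are hypothetical (score, converted) pairs with scores from 1 to 100:

```python
# Illustrative sketch: compute lift per score decile for a scored
# holdout set. Lift = decile conversion rate / overall conversion rate.
def decile_lift(records):
    """Return {decile_range: lift} for (score, converted) pairs."""
    overall = sum(c for _, c in records) / len(records)
    buckets = {}
    for score, converted in records:
        lo = (score - 1) // 10 * 10 + 1  # 91-100 is the first decile
        buckets.setdefault(lo, []).append(converted)
    return {
        f"{lo}-{lo + 9}": (sum(v) / len(v)) / overall
        for lo, v in sorted(buckets.items(), reverse=True)
    }

holdout = [(95, 1), (93, 1), (92, 0), (85, 1), (83, 0), (82, 0),
           (45, 0), (42, 0), (15, 0), (12, 0)]
lifts = decile_lift(holdout)
```

In this toy data the 91-100 decile converts at about 2.2x the average and lift falls toward the lower deciles, the shape a healthy chart should show.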
If your lift chart does not meet these guidelines, try one of the following:
- Check the Attributes screen for attributes where the ‘Not Populated’ value has a lift below 0.3x, as these likely contain future information. More detail is provided in the Step 5: Check ‘Not Populated’ Lift section of this guide. Strongly consider removing these attributes from the model.
- Check the Attributes screen for Firmographics attributes that are more than 40% ‘Not Populated’ or non-Firmographics attributes that are more than 70% ‘Not Populated.’ More detail is provided in the Step 4: Check Attribute Population section of this guide. Strongly consider removing these attributes from the model.
- Review the conversions and conversion rate. For optimal model performance, aim for at least 500 conversions and an overall conversion rate between 1 and 5 percent. If your conversions and/or conversion rate are outside the recommendation, refer to the Step 2: Review Conversions And Conversion Rate section of this guide for next steps.
Step 4: Check Attribute Population
A common problem is training data consisting of many accounts that do not match to the Lattice Data Cloud, which yields a model that rates unmatched accounts either very high or very low. This situation should be avoided.
To check match rates and ensure a healthy population of attributes selected for the model, head to the Attributes screen and hover over the various slices of the donut. View the % Accts column on the ‘Not Populated’ value on each attribute. This value shows the percentage of accounts from the training set that did not match to the Lattice Data Cloud.
Healthy population depends on the attribute category. Data in the Firmographics category is usually highly available. For optimal model performance, aim for no more than 40% ‘Not Populated’ on Firmographics attributes.
In other categories (like Technology Profile, Online Presence and Website Keywords), data is often less available. For optimal model performance, aim for no more than 70% ‘Not Populated’ on attributes in non-Firmographics categories.
If a Firmographics attribute used by the model is more than 40% ‘Not Populated’ or a non-Firmographics attribute used by the model is more than 70% ‘Not Populated,’ try the following:
- Review the attributes from a business perspective. If there is business justification to keep an attribute with a high percentage of records not populated in the model, do so. Otherwise, remove it from the model.
Step 5: Check ‘Not Populated’ Lift
It is undesirable for the absence of data to drive predictions. The best models are those in which predictions come from specific internal and Lattice Data Cloud attributes.
To check the lift on attributes selected for the model, head to the Attributes screen and hover over the various slices of the donut. View the Lift column on the ‘Not Populated’ value on each attribute.
The lift represents how likely records with each value are to convert. 1.0x lift is average, meaning records with this lift are neither more nor less likely to convert than average.
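The lift for any attribute value can be sketched as the value's conversion rate divided by the overall conversion rate. The counts below are hypothetical:

```python
# Lift = conversion rate among records with a given attribute value,
# divided by the overall conversion rate. All counts are hypothetical.
def lift(value_conversions, value_records, total_conversions, total_records):
    value_rate = value_conversions / value_records
    overall_rate = total_conversions / total_records
    return value_rate / overall_rate

# 12 of 4,000 'Not Populated' records converted vs. 600 of 30,000 overall:
not_populated_lift = lift(12, 4_000, 600, 30_000)
print(round(not_populated_lift, 2))  # 0.15 -> well below average
```

A 'Not Populated' lift this far below 1.0x suggests the absence of data is itself predictive, which is exactly the pattern the thresholds below are meant to catch.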
Regardless of attribute category, an attribute may be problematic if the ‘Not Populated’ value has a lift smaller than 0.6x or larger than 1.2x. Generally, the lack of data should not be predictive.
If an attribute used by the model has a lift smaller than 0.6x or larger than 1.2x, try the following:
- Review the attributes from a business perspective. If there is business justification to keep an attribute with high or low lift on not populated records in the model, do so. Otherwise, remove it from the model.
Step 6: Review RF Model CSV
As a final step, review the random forest (RF) Model CSV, which is an output of the predictive model.
Each attribute in the random forest model is represented by a row in the RF Model CSV file. The Feature Importance column represents an attribute’s relative importance in predicting the outcome. Any attribute with a feature importance above 0.05 is a potential issue, often representing overfitting.
To download the RF Model CSV, head to the Attributes screen and click the Model Summary menu item on the left side of the screen. From the Model Diagnostics section, download the RF Model CSV.
If an attribute has a feature importance above 0.05, try the following:
- Review the attributes from a business perspective. If there is business justification to keep an attribute with a feature importance above 0.05, do so. Otherwise, remove it from the model.
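A quick way to flag such attributes is to scan the downloaded file for rows above the 0.05 threshold. This sketch assumes the column header matches the ‘Feature Importance’ column named above; the file contents and attribute names are hypothetical:

```python
import csv
import io

# Hypothetical RF Model CSV contents; in practice, read the downloaded
# file with open(...) instead of io.StringIO.
sample_csv = """Attribute,Feature Importance
Industry,0.012
Employee Count,0.031
Marketing Stage,0.094
Website Keywords: pricing,0.008
"""

# Flag any attribute whose feature importance exceeds 0.05.
flagged = [
    row["Attribute"]
    for row in csv.DictReader(io.StringIO(sample_csv))
    if float(row["Feature Importance"]) > 0.05
]
print(flagged)  # ['Marketing Stage']
```

In this toy file, only the (hypothetical) Marketing Stage attribute crosses the threshold and would warrant a business review.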