The Information Value (IV) statistic is a popular screener for selecting predictor variables for binary logistic regression. Familiar, but perhaps mysterious, guidelines for deciding whether the IV of a predictor X is high enough for modeling are given in many textbooks on credit scoring. For example, these texts say that IV > 0.3 shows X to be a strong predictor. These guidelines must be considered in the context of binning. A common practice in preparing a predictor X is to bin the levels of X to remove outliers and reveal a trend. But IV decreases as the levels of X are collapsed. This paper has two goals: (1) provide a method for collapsing the levels of X that maximizes IV at each iteration, and (2) show how the guidelines (e.g., IV > 0.3) relate to other measures of predictive power. All data processing was performed using Base SAS®.
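The abstract above relies on two facts about IV: it is computed from the binned distribution of events ("bads") and non-events ("goods"), and it can only decrease when bins are merged. The paper itself uses Base SAS; the sketch below is a hypothetical Python illustration (not the author's code) of the standard IV formula, IV = Σᵢ (gᵢ − bᵢ) · ln(gᵢ / bᵢ), where gᵢ and bᵢ are the shares of goods and bads falling in bin i. The example bin counts are made up for illustration.

```python
import math

def information_value(goods, bads):
    """IV of a binned predictor, given per-bin counts of goods and bads.

    goods[i], bads[i] are the counts in bin i; all counts must be positive,
    since the weight of evidence ln(g_i / b_i) is undefined for empty cells.
    """
    total_g, total_b = sum(goods), sum(bads)
    iv = 0.0
    for g, b in zip(goods, bads):
        pg, pb = g / total_g, b / total_b          # distributional shares
        iv += (pg - pb) * math.log(pg / pb)        # per-bin IV contribution
    return iv

# Illustrative counts for a 3-level predictor (hypothetical data).
goods = [100, 200, 300]
bads = [50, 30, 20]
iv_3bins = information_value(goods, bads)

# Collapsing the last two levels into one bin reduces IV,
# consistent with the abstract's claim that IV falls as levels are merged.
iv_2bins = information_value([100, 500], [50, 50])
print(iv_3bins, iv_2bins)
```

Running this shows the 3-bin IV (about 0.64, a "strong" predictor by the IV > 0.3 rule of thumb) dropping after the merge, which is why the paper's proposed method chooses, at each step, the merge that keeps IV as high as possible.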