The opportunity to write a second edition of my book allowed me to reassess an old familiar topic. Gordon Linoff and I recently completed the new edition of Data Mining Techniques for Marketing, Sales, and Customer Relationship Management. The difference between the first and second editions says a lot about how the field has evolved. It says even more about how our own perspective has been changed by years of running a data mining consulting practice. (Even the title of the book has changed. When we wrote the first edition, we didn’t even know the term “customer relationship management”, so we didn’t use it in the title, even though it is an apt description of the applications of data mining we describe today.)
The first edition of Data Mining Techniques appeared in 1997. If you think back to that time, the dot-com bubble had barely started to inflate. U.S. cell phone calls cost 56 cents per minute, on average, and fewer than 25% of Americans owned a mobile phone. Data mining was a buzzword for many business people, but very little actual business data mining was occurring.
A lot has changed in seven years. Now, data mining and analytic CRM are considered mainstream. Data mining software has also matured. Instead of downloading source code you need to compile, you can buy data mining suites that come with full documentation and reasonable user interfaces.
But even if the technological and business worlds had remained the same, we would have wanted to update the book, because we learned so much in those intervening years. One of the joys of consulting is the constant exposure to new ideas, new problems, and new solutions. We may not be any smarter than when we wrote the first edition, but we do have more experience, which has changed the way we approach the material.
One thing that has been driven home to us over and over again during the past seven years is that data mining is almost all about process and only a little about clever algorithms. When the data mining process is not well understood, all the clever techniques and algorithms get applied to the wrong data, in the wrong ways, and yield wrong results. A corollary is that the skills of the human data miner, and that individual’s knowledge and intuition about how to coax meaning from recalcitrant data, are more important than tools and techniques.
The new book does cover a few more data mining techniques than the original. In addition to the seven techniques covered in the first edition (decision trees, neural networks, memory-based reasoning, association rules, cluster detection, link analysis, and genetic algorithms), there is now a chapter on data mining using standard statistical techniques. These familiar tools include cross tabs and histograms. There is also a new chapter on survival analysis. Survival analysis is a technique that has been adapted from the small samples and continuous time measurements of the medical world to the large samples and discrete time measurements found in marketing data. It is used to study time-to-event problems, such as estimating the remaining lifetime of a customer relationship or the time to the next purchase. More importantly, the new edition is careful to show these techniques in their proper business context and to point out the ways they can be misused.
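To make the discrete-time flavor of survival analysis concrete, here is a minimal sketch in Python. It is an illustration only, not the method from the book: the customer records are invented, and the names `customers`, `hazard`, and `survival` are hypothetical. For each month of tenure, it estimates the hazard (the share of customers still at risk who stop in that month) and multiplies the complements together to get the share expected to survive that long.

```python
# Hypothetical customers: (tenure in months, True if the customer stopped,
# False if still active when the data was extracted, i.e. censored).
customers = [
    (1, True), (1, True), (2, True), (2, False), (3, True),
    (3, False), (3, False), (4, False), (4, False), (4, False),
]

max_tenure = max(tenure for tenure, _ in customers)
hazard = {}      # hazard[t]: share of at-risk customers who stop in month t
survival = {}    # survival[t]: share expected to survive through month t
still_active = 1.0

for t in range(1, max_tenure + 1):
    # Everyone observed for at least t months was at risk in month t.
    at_risk = sum(1 for tenure, _ in customers if tenure >= t)
    stops = sum(1 for tenure, stopped in customers if tenure == t and stopped)
    hazard[t] = stops / at_risk
    still_active *= 1.0 - hazard[t]
    survival[t] = still_active

for t in sorted(hazard):
    print(f"month {t}: hazard={hazard[t]:.3f} survival={survival[t]:.3f}")
```

Customers who are still active are not thrown away: they count as at risk for every month they were observed, which is what lets the approach work on data where most customers have not yet left.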
In our consulting practice, we have seen how often data mining is misused:
a) to learn things that aren’t true; or
b) to learn things that are true, but not useful.
For that reason, the new edition features a much-expanded discussion of the ways that data mining can produce unintended results, and it describes the data mining methodology and best practices that help avoid these perils.
Learning things that aren’t true is more dangerous than learning things that are true but not useful, because important business decisions may be based on incorrect information. Data mining results often seem reliable because they are derived from actual data in a seemingly scientific manner. This appearance of reliability can be deceiving. The data itself may be incorrect or not relevant to the question at hand. The patterns discovered may reflect past business decisions or nothing at all. Data transformations such as summarization may have destroyed or hidden important information. The rest of this article illustrates how these problems can arise.
It is often said that figures don’t lie, but liars can figure. When it comes to finding patterns in data, figures don’t have to actually lie in order to suggest results that aren’t true. There are so many ways to construct patterns that any random set of data points will reveal a pattern if examined long enough.
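The claim that random data will yield a pattern if you look long enough can be checked with a small simulation. The sketch below is illustrative only; the sample size, the number of candidate features, and all names are assumptions chosen for the demonstration. Every column is a fair coin flip, so there is nothing to find, yet searching 200 candidates on 50 rows turns up one that agrees with the target far more often than chance. On fresh data the agreement falls back to about one half.

```python
import random

rng = random.Random(0)   # fixed seed so the sketch is repeatable

N_ROWS = 50         # hypothetical small sample
N_FEATURES = 200    # hypothetical number of candidate "patterns"


def coin_column(n):
    """Return n fair coin flips as 0/1 values."""
    return [rng.randint(0, 1) for _ in range(n)]


def agreement(a, b):
    """Fraction of positions where two 0/1 columns match."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Pure noise: neither the target nor any feature carries information.
target = coin_column(N_ROWS)
features = [coin_column(N_ROWS) for _ in range(N_FEATURES)]

# Search every candidate and keep the one that best "explains" the target.
best_train = max(agreement(f, target) for f in features)

# The winning feature is still just a coin, so fresh data looks like chance.
fresh_target = coin_column(10_000)
fresh_feature = coin_column(10_000)
best_fresh = agreement(fresh_feature, fresh_target)

print(f"best agreement in sample: {best_train:.2f}")
print(f"same kind of feature, fresh data: {best_fresh:.2f}")
```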
Human beings depend so heavily on patterns in their day-to-day lives that they tend to see patterns even when they don’t exist. If you look at the nighttime sky, you probably do not see a random arrangement of stars, but rather the Big Dipper, or the Southern Cross, or Orion’s Belt. Some people even see astrological patterns and portents that can be used to predict the future. This was an early form of data mining! The widespread acceptance of outlandish conspiracy theories is further evidence of the human need to find patterns in data.
Presumably, the reason that humans have developed such an affinity for patterns is that patterns often do reflect some underlying truth about the way the world works. The phases of the moon, the progression of the seasons, the constant alternation of night and day, and even the regular appearance of a favorite TV show at the same time on the same day of the week are useful because they are stable and therefore predictive. One can use these patterns to decide when it is safe to plant tomatoes or how to program the VCR. Other patterns clearly do not have any predictive power. If a fair coin comes up heads five times in a row, there is still a 50-50 chance that it will come up tails on the sixth toss. The challenge for data miners is to figure out which patterns are predictive and which are not, that is, to separate signal from noise.
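The coin example can be verified by brute force rather than taken on faith. This short Python sketch (an illustration, with hypothetical variable names) lists every equally likely outcome of six fair tosses, keeps the ones that start with five heads, and counts how often the sixth toss is tails.

```python
from itertools import product

# Enumerate all 2**6 equally likely outcomes of six fair coin tosses.
sequences = list(product("HT", repeat=6))

# Keep only the sequences that begin with five heads in a row.
after_streak = [s for s in sequences if s[:5] == ("H",) * 5]

# Among those, count how often the sixth toss is tails.
tails_next = sum(1 for s in after_streak if s[5] == "T")
p_tails = tails_next / len(after_streak)

print(p_tails)  # 0.5
```

Only two sequences survive the filter, one ending in heads and one in tails, so the streak tells us nothing about the next toss.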
In more than one industry, we have been told that usage often goes down in the month before a customer leaves. Upon closer examination, this turns out to be an example of learning something that is not true. The graph below appears to illustrate this putative discovery. It shows the monthly minutes of use for a cellular telephone subscriber. For seven months, the subscriber uses about 100 minutes per month. Then, in the eighth month, usage goes down to about half that. In the ninth month, there is no usage at all.
Does declining usage in month 8 predict cessation in month 9?
This subscriber appears to fit the pattern of a month with decreased usage preceding abandonment of the service. But appearances are deceiving. Looking at minutes of use by day instead of by month would show that the customer continued to use the service at a constant rate until the middle of the eighth month and then stopped completely. Presumably, that was the day the customer began using a competing service. The putative period of declining usage does not actually exist and certainly does not provide a window of opportunity during which the customer can be retained. What appears to be a leading indicator is actually a trailing one.
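The way monthly summarization manufactures this false decline is easy to reproduce. The following Python sketch uses invented numbers: it assumes 30-day months and a steady 3 minutes per day (90 minutes per month, standing in for the "about 100" in the example), with the subscriber stopping halfway through the eighth month. All names and constants are hypothetical.

```python
DAYS_PER_MONTH = 30    # simplifying assumption: every month has 30 days
MINUTES_PER_DAY = 3    # hypothetical steady usage, 90 minutes per month
N_MONTHS = 9
STOP_DAY = 7 * DAYS_PER_MONTH + 15   # subscriber stops halfway through month 8

# Daily usage: constant until the stop day, then zero.
daily = [MINUTES_PER_DAY if day < STOP_DAY else 0
         for day in range(N_MONTHS * DAYS_PER_MONTH)]

# Monthly summary, as a billing system might store it.
monthly = [sum(daily[m * DAYS_PER_MONTH:(m + 1) * DAYS_PER_MONTH])
           for m in range(N_MONTHS)]

# Distinct daily usage levels seen before the stop: there is only one.
levels_before_stop = set(daily[:STOP_DAY])

print(monthly)             # [90, 90, 90, 90, 90, 90, 90, 45, 0]
print(levels_before_stop)  # {3}
```

The monthly series shows a month of "declining" usage, while the daily series shows that usage never declined at all. The half-month figure is purely a product of where the month boundary happens to fall.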
Another common problem is finding patterns in one dataset that don’t generalize to others. The technical term for this is “overfitting.” It happens when the data miner spends too much effort trying to get the best possible results from the data that happens to be at hand and not enough effort making sure that the resulting model is stable. Model stability is the focus of our data mining methodology. A model that does a great job of explaining who placed an order from last month’s catalog, but fails to predict who will place an order from this month’s catalog, is not as useful as one that yields a predictable response rate month after month.
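The catalog example can be sketched as a small simulation. Everything here is hypothetical: the response process, the 0.5 cut point, and both "models" are invented for illustration and are not taken from our methodology. One model simply memorizes who ordered last month; the other learns a single coarse rule per customer segment. Both are fit on last month's results and then scored against this month's.

```python
import random

rng = random.Random(42)   # fixed seed so the sketch is repeatable
N = 2000                  # hypothetical number of catalog recipients


def simulate_month(scores):
    """Hypothetical process: high scorers order 80% of the time, others 20%."""
    return [rng.random() < (0.8 if s > 0.5 else 0.2) for s in scores]


def accuracy(predicted, actual):
    """Fraction of customers whose response was predicted correctly."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


scores = [rng.random() for _ in range(N)]   # one made-up score per customer
last_month = simulate_month(scores)
this_month = simulate_month(scores)

# Overfit model: memorize exactly who ordered last month.
memo_pred = list(last_month)

# Stable model: learn the majority response in each of two segments.
rule = {}
for segment in (False, True):
    resp = [r for s, r in zip(scores, last_month) if (s > 0.5) == segment]
    rule[segment] = 2 * sum(resp) > len(resp)
rule_pred = [rule[s > 0.5] for s in scores]

memo_train = accuracy(memo_pred, last_month)
memo_test = accuracy(memo_pred, this_month)
rule_train = accuracy(rule_pred, last_month)
rule_test = accuracy(rule_pred, this_month)

print(f"memorizer: last month {memo_train:.2f}, this month {memo_test:.2f}")
print(f"rule:      last month {rule_train:.2f}, this month {rule_test:.2f}")
```

The memorizer is perfect on the data it was built from and noticeably worse on the next month, while the simple rule scores about the same on both. Comparing performance on data the model has not seen is the basic check for stability.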
A third problem is discovering valid patterns and rules that can’t be applied in the intended way. For example, one way that data mining is used to find new prospects for a product is to profile the current customers and then look for people who match that profile. This is a powerful technique, but it runs into trouble when using the specific product changes the very variables used to build the profile. I once built a profile of certificate of deposit holders for a retail bank. One of the striking things the CD holders had in common was low balances in their savings accounts. Clearly, however, identifying all the people with nothing in their savings accounts and then trying to sell them CDs is highly unlikely to be a winning strategy! The point of this story is that, although a data mining tool can find the patterns, it still takes a human being to interpret them.
The good news is that once you understand these problems, they are relatively easy to avoid. Gordon and I had to learn this the hard way. Our hope is that others can now learn from our mistakes.