Monday, August 3, 2009

Pretty picture = bad methodology?

"Information is not knowledge" -- Albert Einstein

A couple of jobs ago I worked for a large, publicly traded retail company that spent a lot of money on tracking transactions back to individual customers, and it gained a great deal of insight from analyzing their purchasing behavior. Most of that analysis was done very well. One particularly important piece of analytics, however, swept through the executive levels like a firestorm as soon as it appeared in a presentation prepared by a highly respected management consulting firm. The finding was so popular that I have heard it repeated numerous times on investor calls, so I have no pangs of conscience about dragging it out onto the blog: it is no longer proprietary information.

Here is how it went: "Our customers who purchase from just one channel spend $X with us, those who purchase from two channels spend $2X, and those who purchase from all three channels spend $3X." By "channels" the company meant physical stores, the catalogue, and the website. The conclusion was always that you want your customer to purchase from as many channels as possible, thus increasing her involvement with the brand. What people loved most about this piece of data was the pretty chart: a nice bar graph growing from left to right, showing how the dollars spent stack up neatly as "involvement" increases.

I have said it before and I am going to say it again: beware of pretty numbers, the ones that nicely confirm your hypothesis and almost magically plant the conclusion in your mind. The numbers are not necessarily misleading, but as analysts we need to understand the nature of the beast first.

Back to the beast, namely the database. I had been working with that database for a while, dutifully slicing and dicing the data, and at some point I knew the beast well enough to predict the outcome of nearly every roll of the dice. For example, if you picked customers who had a transaction in Large Category B, their average dollars spent, and even their average number of transactions, were likely to be above the database average. The same went for Large Category V, and even for the not-so-large Category N. It always looked pretty: we get customers to buy something, and boom, they spend more money with us. But do they? Or is it a self-fulfilling prophecy again?

It turned out it was. Around 60% of the customers in the database were single-purchase customers, and they probably bought 2 items on average, which limited their possible exposure to the product categories. Naturally, the 40% of customers with multiple transactions were far more likely to have bought items from any given category. So the comparison was always between the average customer and a more-than-average customer.
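The exposure effect can be put into numbers with a toy model. This is a hypothetical sketch, not the analysis I actually ran: it assumes each item a customer buys falls into a given category independently with a fixed probability (the category's share of all items), so the chance of touching that category at least once is 1 - (1 - share)^n for n items. The 10% category share and the 6 items for a repeat customer are made-up illustration values; only the 2 items for a single-purchase customer come from the text above.

```python
def prob_buys_from_category(n_items: int, category_share: float) -> float:
    """Probability that at least one of n_items independent item picks lands in a
    category that accounts for `category_share` of all items bought.
    Hypothetical model: items are assumed to be picked independently."""
    return 1.0 - (1.0 - category_share) ** n_items

# Hypothetical inputs: a category holding 10% of items, a single-purchase
# customer buying 2 items, and a repeat customer buying 6 items in total.
single = prob_buys_from_category(2, 0.10)
repeat = prob_buys_from_category(6, 0.10)
print(f"single-purchase: {single:.2f}, repeat: {repeat:.2f}")
```

Under these assumptions the repeat customer is roughly two and a half times as likely to show up in the category (about 47% versus 19%), before the category has had any chance to influence anybody's behavior.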

Now let's go back to our beautiful channel chart. A careful investigation of the analysis methodology revealed that there was no adjustment for the number of transactions the customers had made. Those who had purchased from three channels must have made at least three transactions (the database average at the time was 1.5), and those who had purchased from two channels must have made at least two. The single-channel purchasers, meanwhile, were mostly single-purchase customers who by definition could not have made it into the other groups: they just did not spend enough with the company! Now, to the best of my knowledge, comparing those who have spent more with the company to those who have spent less always confirms that the first group spent more and the second spent less. I believe that's an important finding to keep in mind whenever the numbers are a little too pretty.
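To see how much of that chart the transaction count alone can produce, here is a small simulation. It is a hypothetical sketch, not the consultants' or the company's methodology: every transaction is assumed to be worth the same made-up amount (50), each transaction's channel is picked uniformly at random from the three channels, the 60% single-purchase share is borrowed from the text, and the 2 to 6 transactions for repeat customers is invented. By construction, using more channels has no effect whatsoever on spending per transaction.

```python
import random
from collections import defaultdict

CHANNELS = ("store", "catalogue", "web")
SPEND_PER_TRANSACTION = 50.0  # hypothetical: every transaction is worth the same

def simulate_customers(n_customers: int, seed: int = 0):
    """Return (n_transactions, n_distinct_channels, total_spend) per customer."""
    rng = random.Random(seed)
    customers = []
    for _ in range(n_customers):
        # Hypothetical mix: 60% single-purchase, the rest make 2 to 6 transactions.
        n_tx = 1 if rng.random() < 0.6 else rng.randint(2, 6)
        # Each transaction's channel is chosen uniformly and independently,
        # so channel use carries no information about spend per transaction.
        channels = {rng.choice(CHANNELS) for _ in range(n_tx)}
        customers.append((n_tx, len(channels), n_tx * SPEND_PER_TRANSACTION))
    return customers

def mean_spend_by_channels(customers):
    """Naive comparison: average spend grouped only by number of channels used."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for _, n_channels, spend in customers:
        totals[n_channels] += spend
        counts[n_channels] += 1
    return {k: totals[k] / counts[k] for k in sorted(counts)}

def mean_spend_by_channels_within(customers, n_tx: int):
    """Same comparison, restricted to customers with exactly n_tx transactions."""
    return mean_spend_by_channels([c for c in customers if c[0] == n_tx])

customers = simulate_customers(100_000)
print("naive:", mean_spend_by_channels(customers))
print("3 transactions only:", mean_spend_by_channels_within(customers, 3))
```

The naive grouping still draws the rising bars, with three-channel customers averaging several times what one-channel customers spend, while customers with exactly three transactions spend the same 150 whether they used one, two, or three channels. Holding the number of transactions fixed (or otherwise adjusting for it) is the minimum check before reading "involvement" into such a chart.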
