Monday, December 21, 2009

Want instant results? Concentrate on improving the worst performers.

"He who joyfully marches to music in rank and file has already earned my contempt. He has been given a large brain by mistake, since for him the spinal cord would suffice." -- Albert Einstein

Concentrating on the worst performers can be particularly fruitful in a well-managed organization, where performance is distributed in a close to random manner.

Mind you, we are not talking real improvements here. We are talking about putting yourself in a situation where there is nothing but upside.

Here is how you can spot the right type of "worst" performer. They are usually smaller in size than the average performer. Whether it is smaller regions or smaller segments, groups with a smaller sample size tend to have a higher variance, and thus a higher probability of being an outlier, including the "worst" kind. You also need to make sure there are no systemic factors driving the poor performance, because otherwise the simple random nature of the world may not be able to compensate for them, and you will actually have to do something to improve the performance of your "worst" group. So, check your averages over several time periods to confirm that your worst performers merely had a particularly bad period when you picked them, rather than underperforming consistently.

After you have validated the random (or even semi-random) nature of your "worst" performance group, you can create a project to improve how it is doing. You would not expect your "worst" performer group to stay exactly the same every period, would you? If it is indeed random, there is a pretty good chance that next period you will "pull" a good number of the worst performers closer to the average, or out of the "worst" group altogether. Congratulations, you just got yourself a "real" quantifiable result!
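For the skeptics, here is a minimal simulation sketch (not from the original post; all numbers are made up) showing the effect: when groups differ only in size and random noise, the bottom 5% in one period drift back toward the average in the next, and the "worst" groups tend to be the small ones.

```python
import numpy as np

rng = np.random.default_rng(42)

n_groups = 200
sizes = rng.integers(20, 500, size=n_groups)       # small groups have noisier averages
true_mean = 100.0                                   # every group has the same underlying performance

def observe():
    # Observed performance = true mean + noise that shrinks with group size
    return true_mean + rng.normal(0, 30 / np.sqrt(sizes))

period1, period2 = observe(), observe()

worst = np.argsort(period1)[: int(0.05 * n_groups)]    # bottom 5% in period 1
print("Bottom-5% groups, period 1 average:", round(period1[worst].mean(), 1))
print("Same groups,      period 2 average:", round(period2[worst].mean(), 1))
print("Median size of 'worst' groups:", int(np.median(sizes[worst])),
      "vs. overall median size:", int(np.median(sizes)))
```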

Remember, there is always a bottom 5% to work on!

Simplicity is king

"If you can't explain it simply, you don't understand it well enough." -- Albert Einstein


Today I had a conversation about a very interesting churn model we may try to build. The model will let us assess the impact of different factors on churn; one of those factors is price, or, to be precise, pricing changes. When the conversation ventured to the problem at hand, which is to quantify the impact of the most recent price change, I had to explain that I do not want to put this price change into the model. This is anathema to someone with an interest in econometrics; however, the choice is driven not so much by scientific truth as by communication, i.e. being able to explain your results. Though adding the most recent data will improve the model, it is unlikely to help those with little knowledge of regression understand the issue at hand. Having a known coefficient is good, but it is hard to explain what this coefficient means to a layperson. Even if you express it in the form of elasticity - say, your churn goes up by 1.5% for every percent of price increase - it does not quite mean anything to most executives.

The alternative approach we agreed upon was to build the model on the data before the price increase, and then determine the churn baseline for every segment we are tracking. Then we can compare post-price-change churn to that baseline to show the difference. For example: you had a 2% rate increase for this group of customers, and their churn was 6% compared to the 3% we would have expected with no price increase. That is something people can understand.
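A minimal sketch of that baseline approach, assuming hypothetical customer-level files and columns (churned, segment, tenure) and a simple logistic regression; the real model and its drivers would of course differ:

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical files; 'churned' is 0/1, 'segment' and 'tenure' stand in for real drivers
pre = pd.read_csv("churn_before_price_change.csv")
post = pd.read_csv("churn_after_price_change.csv")

# Fit the baseline churn model on pre-change data only (no price-change variable in it)
model = smf.glm("churned ~ C(segment) + tenure", data=pre,
                family=sm.families.Binomial()).fit()

# Expected (baseline) churn by segment had nothing changed, vs. what we actually observed
post["baseline_churn"] = model.predict(post)
summary = post.groupby("segment").agg(expected=("baseline_churn", "mean"),
                                      actual=("churned", "mean"))
summary["excess_churn"] = summary["actual"] - summary["expected"]
print(summary)
```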

Another example of simplification to aid communication is a correlation analysis I did a few years ago. For every variable X correlated with my output Y (sales), I would create a bar chart of Y by grouping subjects into "low X", "medium X" and "high X". This spoke better than any scatterplot or correlation coefficient. The one exception is correlation in time between two variables - when shown on a nice chart, visibly moving together, they make the best case for making executives feel smart.
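A quick sketch of that binning trick, assuming a hypothetical file with columns X and sales; the tercile split and the labels are just one reasonable choice:

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("sales_data.csv")   # hypothetical columns: X and sales

# Split X into three equal-sized buckets and show average sales for each
df["X_level"] = pd.qcut(df["X"], q=3, labels=["low X", "medium X", "high X"])
df.groupby("X_level", observed=True)["sales"].mean().plot(kind="bar")
plt.ylabel("Average sales (Y)")
plt.title("Y by level of X")
plt.tight_layout()
plt.show()
```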

Friday, December 11, 2009

When comparing, make sure the groups are representative

Sometimes people call them "matched", which is a layman's term for representative. So, why do they have to be matched? Because non-representative groups can be so misleading that your analysis results may be the opposite of what they should be.

Here is a quick example. Let's say you have two groups of customers, each of which consists of customers of two types/segments. Sometimes you may not even be aware that there are two types of customers in your groups. Let's assume those segments exhibit different behavior. The sample behavior I chose is churn, but it could be anything. Let's say we applied some sort of treatment to Group #2, and their churn went down by 1% in both segments. We are trying to use Group #1 to establish a baseline (i.e. what would have happened, to the best of our knowledge, to Group #2 had we not applied the treatment). However, because the composition of the groups is not representative of each other, we get exactly the opposite result for the total - Group #2 appears to have a higher, not lower, churn. See the table below.

                  Group 1    Group 2    Difference
    Segment #1      1,000      5,000
    Seg #1 Churn     5.0%       4.0%        -1.0%
    Segment #2      5,000      1,000
    Seg #2 Churn     2.0%       1.0%        -1.0%
    Total           6,000      6,000
    Total Churn      2.5%       3.5%        +1.0%

Wednesday, December 9, 2009

It's easy to calculate a number, it is much harder to tell what it means

We all know how it usually starts - we need to know X. So, let's say X = 25%. Is it bad? Is it good? Too high? Too low? In my practice, when I put the number in context with all the other numbers that shed light on what it means... people think it is too much, that they do not need "all of that". They just need one number. So, when they get the number, they start asking questions about "all of that". The circle is now complete. If anyone knows a way out of this conundrum, please let me know.

Tuesday, December 8, 2009

My take on the classic

"Every truth passes through three stages before it is recognized. In the first it is ridiculed, in the second it is opposed, in the third it is regarded as self-evident" - Arthur Schopenhauer

Here is my version: "Every truth passes through three stages before it is recognized. In the first it is ignored, in the second it becomes a fad, in the third it is forgotten".

Every business is seasonal

At least, I have yet to see one that is not. Out of all the basic analytical concepts, seasonality is the one I find underestimated the most. On more than a few occasions I have heard that "our business is not that seasonal" while in reality, seasonal swings may explain up to 80% of the variation in sales. Always check for seasonality.

Monday, October 26, 2009

Scientific Principles of Direct Marketing Optimization

Here is the list of my principles so far:
  1. Always use a control group. A preferably randomized control group, representative of your treatment group, takes care of everything else going on in the marketplace, including your own campaigns, and also acts as the "great equalizer", compensating for even the worst metrics and response models.
  2. Maximize lift, not response. Lift is the difference between treated and control groups. That’s what you are trying to impact.
  3. Optimal frequency is often more powerful/important than optimal segmentation. Ideally, you want to optimize frequency (i.e. maximize lift) by segment, but if you are unsure where to start testing, start with frequency. None of the segmentation work will be insightful unless your frequency is within shooting range of the optimal.
  4. Test, test, test. It’s one of the easiest and simplest ways to learn.
  5. When testing, have a hypothesis, then design the test around it. Sending a customized piece to a segment is a great idea, until you realize that you did not send your regular piece to the same audience at the same time, and thus can't tell whether the customized piece would have done better than the regular one.
  6. Track your treated group against the control group for a while to understand how long the impact of your mailing lasts. Some people want to use LTV. That's because they want a higher ROI. A true, measurable difference traceable to the impact of a direct mail piece rarely lasts more than a few months, even though the average lifetime may be measured in years.
  7. When choosing the size of the control group, you first need to understand what kind of difference will justify the effort (i.e. the break-even lift), and then determine a sample size that will make this difference statistically significant. If you're measuring with a yardstick, it's hard to determine half an inch of difference.
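A rough sketch of that sizing exercise, using statsmodels' power calculation; the 3% baseline response and 0.5-point break-even lift are made-up numbers for illustration:

```python
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline = 0.030         # assumed control-group response rate
break_even_lift = 0.005  # the smallest lift (half a point) that pays for the campaign

effect = proportion_effectsize(baseline + break_even_lift, baseline)
n_per_group = NormalIndPower().solve_power(effect_size=effect, alpha=0.05,
                                           power=0.8, alternative="larger")
print(f"Roughly {n_per_group:,.0f} customers per group are needed "
      f"to reliably detect a {break_even_lift:.1%} lift.")
```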

Thursday, October 22, 2009

Marketing analytics case study - Direct Mail list cleanup

"I think and think for months and years. Ninety-nine times, the conclusion is false. The hundredth time I am right." -- Albert Einstein

Just in time for my posts on measurement against a control group, I got a perfect real-life case at work. The situation is pretty typical for anyone who runs large direct mail lists out of a corporate system. The system has the addresses of your current customers as well as prospects, and after you apply your targeting criteria, you can use a random selection procedure to identify your control, and make a record of both mail and control addresses. In the last step, the system produces your mail list to be sent to the mail house. For the measurement, customer purchases are tracked back to the addresses recorded in the mail and control groups, and the purchase counts and revenue of the mail and control groups are compared to determine incremental purchases and revenue.

The mail house does all sorts of address hygiene and cleaning, like removing duplicate addresses, taking out vacancies, and running the addresses against the USPS database of known addresses, which weeds out both non-compliant and nonexistent addresses. While current-customer lists usually yield a very high percentage of mailable addresses, prospect lists lose around 20%-25% of their addresses in the hygiene process. This presents an issue for tracking, because we are tracking the purchases back to lists that do not accurately reflect the addresses that were actually mailed. To improve measurement of the direct mail performance, IT proposes a solution that takes the post-cleanup mail list (to be received from the mail house) and uses it to clean up the original mail group list.

Will this solution improve quality of measurement? What are the advantages and shortcomings of this solution?

(I will publish my opinion as a comment to the post)

Friday, October 16, 2009

Maximizing response often leads to poor campaign performance

I did some research for a presentation at work today, and found this very nice white paper on the use of lift (or, as they call it, "uplift") modeling in driving true incremental sales. It correctly highlights the difference between correlation and causation: response models simply correlate with propensity to buy, while incremental lift models track the impact of the marketing communication. Then it goes on to explain the impact of segmentation on response and incremental sales, and I just loved those two charts showing how response is correlated to lift. Negatively, in their particular case. The higher the response, the lower the incremental sales. Actually, that's the conclusion I have often reached in my own job. I am not saying it always happens, but it does happen often when we maximize response and focus solely on those who are likely to buy from us anyway. You should also watch the opposite end of the curve, where you have prospects so unlikely to purchase that even though you do get high incrementality, it may still not be enough to pay for the program. Thus, your most profitable targets are usually in the sweet spot somewhere in the middle of the response curve. This is the kind of article every direct marketer needs to read.
Generating Incremental Sales

Thursday, October 15, 2009

Who wants to be positive all the time? Being skeptical is a lot more fun.

"I never think delusion is OK" -- Barbara Ehrenreich

Jon Stewart had Barbara Ehrenreich on his program yesterday; she wrote a book about the annoying and unhelpful side of unsubstantiated optimism. Now, I am not your bouncy cheerleader type, but I do find a certain comfort in realism, so I am with Barbara here. I think everyone shares those feelings, at least partially. Don't tell me you have never explored the depths of pessimism about the current state of your project/department/employer/economy with your co-workers, and did not feel the brotherhood of a good bitching session! Being critical/skeptical/realistic may not be as pumped-up optimistic as some want all things workforce to be, but it is still fun! It may even help you feel the elation when real achievements happen.

(Video: Barbara Ehrenreich interview on The Daily Show - http://www.thedailyshow.com/)

Tuesday, October 13, 2009

Correlation vs causation

I found this great video, where a doctor explains the difference between correlation and causation; it is probably the best explanation I have seen.

If you want to understand how to determine causality in marketing, one of the simplest and best ways is to use a control group whenever possible.



P.S. Get a flu shot this year.

Actually, there is no exception.

After I published my previous post on the use of control groups, I did a google search on the use of control groups, and came up with the following quote:
The cardinal rule of direct marketing is to include a control group. Without it, you will never know whether customers purchased your product because of this marketing effort, or because of the billboard ad, the radio spot, a friend’s suggestion, an in-store brochure, or because Elvis told them to. There is one exception. If you have an air-tight fulfillment set up, whereby the customer can only purchase the product through your channel, e.g. a special 800#, then you don’t need to hold out a sample; you can be certain that every sale came from your effort (except the referrals from Elvis).
The conclusion is incorrect. I have seen more campaigns than I care to count on all of my fingers that got a ton of calls to the 1-800 number, but no incremental sales whatsoever. Zero. Zilch. Ноль.

Sales that come to the 1-800 number are something I like to call "associated sales", i.e. sales that came from the target audience and are somehow "associated" with the communication, but they are not the same as incremental sales. Your mail piece may be very good at convincing potential customers who would have made a purchase anyway to call a particular phone number, but that's no proof they would not have called if not for the DM piece. They would have called all right (that's what the control group is for!), just called a different number.

I don't like the 1-800 phone number salesmen much. Oh, well, comes with the territory (and I do believe in Elvis referrals).

Monday, October 12, 2009

That's what control groups are for

Gary Loveman famously said that there are three ways to get fired from Harrah's: steal from the company, harass women, and not use control groups in testing. He is correct, particularly in cases where we are trying to properly measure the effectiveness of marketing communications for a fairly popular consumer product.

What is a control group? In most cases, a control group (sometimes called a holdout group) is a group that does not receive the communication we measure. It is used to assess the effectiveness of this particular piece of communication. By effectiveness I mean the impact of the communication on the test group that is not observed in the control group. Therefore, it is important to have a control group that behaves in exactly the same manner as the test group, a condition that is generally called being representative.

It is not always possible to pick out a perfectly representative control group, but we should always do our best to try. Often, a control group is one of the few reliable and easy ways to truly assess the incremental impact of your marketing communications.

Here are a few practical implications of the use of the control group that are worth mentioning.
  • Often, companies use multiple marketing communications to reach out to the customer, and then try to untangle how many sales were driven by each type of communication. When we send out a direct mail piece and put advertising on TV at the same time, it is hard to determine how many of the people on the direct mail list who purchased your product were truly driven by the mail.
    That's what the control group is for. Compared to other ways of measuring the impact (separate 1-800 numbers, the sales funnel, and so on), measurement against a control group looks at the true incremental sales from a marketing vehicle, as the control group shows how many sales we get from everything else (TV, radio, web, spontaneous) except for the DM piece.
  • The concept of "what would have happened" is the cornerstone of any effectiveness measurement. It is relatively easy to determine "what happened" - how many additional products we sold, how much revenue we got - but it is not always easy to determine what would have happened had we decided to save some money and not run the communication. The "would have happened" estimate is usually... an estimate, which is a number with a standard error (or degree of uncertainty) attached to it. This is why all of the estimates of the impact have to be statistically tested.

  • Sometimes the marketer is able to estimate the "associated" sales from a marketing vehicle pretty well - those who call the 1-800 number, those who click on the online ad, and so on - and the use of control groups is deemed excessive.
    In this case, we assume that 100% of associated sales can be attributed to the marketing vehicle, and that no other sales are being influenced by it. It may be a good way to assess effectiveness in cases where the likelihood that a potential customer will call you spontaneously is low. For example, if you are a small consulting company sending out a brochure, chances are that a call to your number from a recipient of the brochure was driven by the mail.
    However, if you are a consumer company with a high rate of unsolicited walk-ins/call-ins, the situation may be very different. If your DM piece yielded 2% call rate, and you expect a 1.5% spontaneous call rate for the same measurement period of time from the same target population, all of a sudden your ROI on marketing communication does not look as attractive.

  • Many marketers are confused by the use of control groups when they have multiple overlapping marketing campaigns. Some suggest that no clean measurement can be achieved unless we exclude the same group of targets/controls from all other campaigns.
    This is not true. Again, this is what control groups are for - to control for overlapping campaigns. As long as we exclude the control from the particular communication we are measuring, the results are valid.
    For example, if we have a marketing campaign consisting of 3 consecutive letters, we can employ a different random control group to measure the effectiveness of each part of the campaign (by "different" I mean "separately selected", which probably means that some of the control targets will overlap - again, no big deal). Suppressing the same group of customers from all 3 pieces will give you an estimate of the effectiveness of the 3 pieces together. Suppressing a control group only from the last mailing will give you an estimate of the effectiveness of that last piece (i.e. is a two-letter campaign less effective, and if it is, by how much).

  • Building on the previous argument, it is not necessary to create a unified control group if you have several DM campaigns in the market. For example, you have two campaigns to a similar/overlapping target with response windows that also overlap. Inevitably, you will have targets that received two pieces of communication and purchased your product - but which campaign drove the results? In this case, the best way to measure is to have two random control groups, one for each campaign, so we can measure the effectiveness of each campaign against its own control group. The point of contention is usually that the targets in the control group for the second campaign received the mail piece of the first campaign. However, this does not muddy up the measurement of the second campaign, because the groups are still representative of each other, as the same percentage of the test and control groups received the first communication.
    If there is no difference between the response rates of the test and control groups for the second mail piece, it is not because the control group received the first piece; it is simply because sending the second piece did not make any difference - exactly what we were trying to determine by having a control group. Having the same control group for both campaigns will not help you determine the effect of each campaign separately, but rather the impact of the two campaigns together.

  • What if selecting a representative control group is next to impossible? In this case, the marketer should try to employ all available methods to understand what would have happened if the marketing communication did not happen. One of the ways is to use an imperfect control group that is reasonably representative of the test group, and adjust for the differences based on customer attributes and behavior in the period before the test.

  • Sometimes the language of control groups appears in what I would call "A vs B" test. This type of test is used when two different marketing communications (A and B) are tested on representative [of each other] audiences. In some cases, one of the groups is called a control group. Personally, I don't have an issue with naming as long as the marketers understand that the test results are limited to the comparison only, i.e. they only give information about relative effectiveness of the methods A and B, and not absolute effectiveness. Absolute effectiveness needs to be measured against a real control, which does not receive the marketing communication we are measuring.

  • Precision of the estimate is another consideration in testing, and it is usually a function of the size of the groups and how representative of each other they are. There are a lot of calculators out there that help one estimate the confidence interval of a measurement depending on the sample size. Two very large, near-perfectly representative groups (think of a mass mailing randomly split into test and control groups) may give extraordinary precision of measurement - in my practice, down to 0.1%. Further precision is usually limited by either the size of the sample or the sampling methodology, which is not always the purest of random. Though we often assume random splits, machine-generated pseudo-random distributions do have a threshold of "randomness", which can become noticeable in some high-sample-size measurements, usually over 100K trackable targets in the smallest group. Another consideration for precision is related to the break-even point for marketing communications. For example, if your break-even lift is 1%, it would not be very practical to measure with 5% precision.
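To make the statistical-testing point concrete, here is a small sketch with made-up counts (a 100K mail group and a 50K holdout) that tests whether the observed lift is significant and puts a confidence interval around it:

```python
from statsmodels.stats.proportion import proportions_ztest, confint_proportions_2indep

buyers = [2400, 1050]      # purchasers in the mail (test) and holdout (control) groups
sizes = [100_000, 50_000]  # group sizes

lift = buyers[0] / sizes[0] - buyers[1] / sizes[1]
stat, p_value = proportions_ztest(buyers, sizes, alternative="larger")
low, high = confint_proportions_2indep(buyers[0], sizes[0], buyers[1], sizes[1])

print(f"Observed lift: {lift:.2%}, one-sided p-value: {p_value:.4f}")
print(f"95% confidence interval for the lift: {low:.2%} to {high:.2%}")
```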
I have written more posts on the use of control groups on my personal web page: http://zyabkina.com/thoughts.html

Sunday, August 23, 2009

Knowledge is only power when you know what to do with it

"Strive not to be a success, but rather to be of value" -- Albert Einstein

Not that long ago I was pitched some predictive analytics project that was promising (according to the vendor) to solve many of my problems. Or, in their words, "just imagine all of the things you could do if you knew that this person is more likely to buy your services than that person". That's usually when the thinking stops and the dreaming begins. However, back on the ground, the question remains - given that knowledge, what exactly do you think you should do, and how do you know that this will be more efficient than what you are already doing?

The whole concept of "doing something" in marketing assumes that 1) your action will change behavior (analysts usually compare to a control group to assess if the behavior changed) and 2) the benefit from change is large enough to pay for the action you have taken. Often, both of those things are assumed to be true, and the action is taken. However, our curious analytical minds cannot take anything for granted, so let's just list some thoughts on why this type of knowledge may turn out a lot less useful when we start executing on it.
  • It is not clear that the behavior can be changed. For example, if you know that someone is likely to disconnect your services (stop purchasing your product, stop paying on the mortgage you are holding) because they lost their job and can no longer afford it, then there is really little you can do to make them keep the services - or, at least, do it in a way that is profitable to you.
  • It is not clear that the behavior needs to be changed. If someone is likely to buy your product, maybe they will - in quantities large enough that you can't improve on them with your communication. If someone shops at your store every two weeks, providing a discount often only leads to giving away a discount on a purchase that would have happened anyway.
  • It is not clear that you can change the behavior with your action. This is related to the previous point: there may not be enough change in the behavior for the action you have chosen. This means that you may have to test a variety of actions, which often makes the knowledge you have obtained from the predictive research less and less relevant.
  • The research does not give you much of a clue about whom you want to target. That's when your vendor is going to throw a fit - of course it does, that's the whole purpose of the exercise! Hold your horses, though. If one person is more likely to buy your product, why does it follow that this person is more likely than another to change their behavior and buy your product after receiving a direct mail piece? From my experience in measurement based on recency, those who have bought the product most recently are much more likely to buy again - in fact, so likely that it makes little sense to send them a coupon. It is the other group - people who have bought before, but have not bought in a while - that shows the most change in behavior when sent a coupon. The change is measured as a lift over the control group, not as an overall response rate. What that says is that even if you know person A is more likely to buy than person B, person B may turn out to be a more profitable target. Combined with the previous point - after all that money spent on predictive research, you still don't know what action to take, and whom to target with it!
  • The research does not give you any information on whether your action is a cost-effective way to act on the knowledge. We are again measuring our action against the status quo, and looking for a lift in revenue that makes our efforts worthwhile.
  • It is not clear that you should change your approach to the market. Assuming you have done some prior in-market testing and measurement, and figured out how to segment your market in a way that appears to make sense based on responses to your communication, it is not clear at all that the additional predictive knowledge is going to help you optimize your marketing. It is nice to know that someone is likely to purchase or cancel your services; however, it does not necessarily warrant a change in strategy. If you know that a certain group of targets needs to be mailed every X months for optimal response, you don't really care whether that is because they are more likely to purchase the product - or because they are less likely to. All that matters is that your mailing has been tested and optimized for efficiency.
  • It is possible that your optimal go-to-market strategy is independent of the segment. That actually happens with good strategies - they let the customers reasonably self-select, and they offer solutions optimized to serve certain needs. Customizing them on an individual basis is not going to move the needle much, but efficiencies will be lost.
The conclusion is that although knowledge is a good thing, it is not always possible to use it to improve business results. Correct testing, measurement and execution are generally more important for marketing optimization than having predictive knowledge.

Saturday, August 15, 2009

How to create a great analytical report

"Joy in looking and comprehending is nature's most beautiful gift" -- Albert Einstein

How many times have you thought, or been told, that "we need to understand A", and automatically assumed that this means "we need a report on A"? After this conclusion, things usually get into gear - people get together to come up with metrics, they task analytical people (code word for "data pullers") or IT with coming up with the numbers, and eventually put them in a nice regular email report. Sometimes things work out, but sometimes the report becomes just one of those efforts that are never used afterwards. In my particular place of work, not only does the process spit out something ugly and often completely useless, it also wastes a Herculean effort along the way. The end result is usually hailed a great success, and one of those days you get an email stating that the great new report you have been asking for (no, that was not me) has been created and published, but we can't send it over email because it is more than 10Mb zipped, so go pull it yourself. Oh, by the way, this report does not include some of the data because we could not trace it properly in the setup we had created, so it is pretty much not useful anyway.

How do you keep this situation from happening? Here are some of my thoughts.
  1. Find out what the true question is. The assumptions about what should be in the report most often come from two camps - the executives and the data pullers. Whenever an executive asks for a report that is too specific, like "show the percentage of customers whose discount expires next month", beware. Chances are that what the executive thinks will answer the question is not what will answer it in the best way, and often, it will not answer the question at all. My personal take on that is to reframe the inquiry and simply go back to the root - ask the executives what question they are trying to answer. Most times, you will find out that the answer lies in a totally different cut of the data than was initially assumed. The second camp of people I have encountered are the data people, particularly the IT people in the organization I am currently working for (this was not the case in my other jobs, so I have to make a caveat for them). For some reason, they tend to skip all of the initial exploration steps and jump to a conclusion about what kind of data will answer the question without much regard to the question itself. Sometimes it is not their fault, "they are just doing their jobs", but I have yet to see a great report built upon their assumptions.
  2. Research. I can't stress this enough, since skipping it is the most common error made in the process, especially if you are creating something new and not particularly well understood. Before you create a report, you must know what measures to put in it. Sometimes the measures are pretty simple, and sometimes they are not - there are a million ways to slice and dice the data, so if you are determined to spend your company's time and money on creating the report, it will pay a hundred-fold to come up with the best slicing/dicing technique. A good deep-dive research project should have the following properties: 1) it should explain how things work - i.e. what impacts what in that particular area, and by how much; 2) it should compare several metrics for the same process and look at them from different angles - by store/franchise/DMA/division/region, by day/week/month/year, by product type, month over month/year over year, and so on; 3) ideally, it should have that best cut that you can just lift from the presentation into the report, confident that tracking this measure (or measures) will tell you all you need to know about the issue going forward in the most efficient manner, meaning with the least number of measures possible.
  3. Revisit your decision about a regular report. Review your research project with peers and executives and decide if creating a structured, regularly updated report is truly needed. Even though it may seem implausible, understanding most business processes does not require a regular report. Remember, the conclusion that "we need a report" was just a conclusion, and it may have been a wrong one. A deep-dive research project may be able to answer most if not all questions, and thus the regular report is not needed. The decision may be to refresh certain parts of the deep dive at some later time, but not to turn it into a regular thing. There may be other reasons not to create a report, including: 1) the data quality is not good; 2) you have not been able to answer the question any better than existing reports; 3) the data is too stable or too volatile, and tracking it will not be insightful (will it do you any good to know that 20% of your orders come from type A customers every month, and the number never budges even a little?); 4) there is insight in the research, but not enough to warrant regular reporting. Remember, a weekly report is 52 times the work of an annual report, and a daily report is 5 to 7 times the work of a weekly report. Too many regular reports are not going to promote learning but rather clutter the mailboxes, so be selective about reports that are produced on a regular basis.
  4. Listen to the data - it has a mind of its own. When you finally lift that chart that you want to recreate in a regular report, you have to evaluate it for appropriateness for regular reporting. The first question is how hard it is to pull. Sometimes, the very best and most meaningful cut of the data is 10 times harder to pull than a cut that is just a hair lower in quality. Multiply the difference in difficulty by the frequency of the report, and see if it makes sense to go for the harder-to-get number. The second consideration is the amount of maintenance the report will require. What can be pulled for a deep dive cannot always be turned into a daily email without daily maintenance of the code. Any regular data pulls must be very stable, i.e. not impacted by customary changes in the information or operational systems. The other side of the maintenance consideration is how much you can automate the report, which should be tied to the frequency. Daily reports must be "set it and forget it" automated. Monthly reports may have some copying and pasting in them.
  5. Put metrics in context. One of the biggest mistakes I see made over and over is forgetting about the proper setting of the metric. Let's say your metric is total sales by region. Now, how do you know if those sales are good or bad? The regions may be of different sizes and have different sales trends or inherent idiosyncrasies, so comparing them is not always meaningful. You may compare to the previous time period, but what if your business is seasonal? That's where we usually get to the year-over-year comparison. Now let's say your sales this month are 20% down compared to the sales a year ago (automakers in 2008-2009 are a good example); that must be bad - until you see that in the previous months they were 30% down. Proper trending is one of the most important settings in which you should show your metrics. The regular report should be able to tell you if the results are good or bad, otherwise it is not very meaningful, and nothing can put the metrics in perspective like trending. This also impacts the execution of the report, because it is one thing to pull the data for one month, and quite another to pull it for 15 or 24 months. In an ideal situation, your timeframe and the properties of the metric were well defined during the research stage.
  6. Now, finally, it is time to build and execute the report. First comes the format. I have always maintained that a pretty chart works magic - so do use pretty charts. Second, think about the report delivery, the distribution list, putting it out on the web-site and all that pleasant stuff. Your analytical part is now over, but polishing your splendid work and delivering it correctly puts the cherry on the sundae. You are working in marketing, after all. Good luck!

Monday, August 3, 2009

Pretty picture = bad methodology?

"Information is not knowledge" -- Albert Einstein

A couple of jobs ago I was working for a large, publicly traded retail company that spent a lot of money on being able to track transactions to particular customers, and got lots of insight from the analysis of their purchasing behavior. Most of the analysis was done very well; however, one particularly important piece of analytics went through the executive levels like a firestorm as soon as it appeared in a presentation prepared by one very respected management consulting company. That piece of information was so popular, I have heard it reiterated numerous times on investor calls, so I have no pangs of conscience dragging it out onto the blog, as this is no longer proprietary information.

Here is how it went: "Our customers who purchase from just one channel spend $X with us, while those who purchase from two channels spend $2X, and those who purchase from all three channels spend $3X with us". By "channels" the company meant physical stores, the catalogue, and the web-site. The conclusion always was that you want to get your customer to purchase from as many channels as possible, thus increasing her involvement with the brand. What got people to love this particular piece of data the most was the pretty chart - a nice bar graph, growing from left to right, showing how the amounts spent nicely stack up as "involvement" increases.

I have said it before and I am going to say it again - beware of the pretty numbers. The ones nicely confirming your hypothesis, almost magically putting the conclusion into your mind. The numbers may not necessarily be misleading, but as analysts we need to understand the nature of the beast first.

Going back to the beast... namely, the database. I had been working with that database for a while, dutifully slicing and dicing the data, and at some point I got acquainted with the beast enough to predict the kind of outcome we would get from every cut. For example, if you pick customers who have had a transaction in Large Category B, their average dollars spent and even their number of transactions are likely to be higher than the database average. The same went for Large Category V, and even for the not-so-large Category N. It always looked pretty - we get customers to buy something, and boom, they spend more money with us. But do they? Is it a self-fulfilling prophecy again?

Turns out, it was. Around 60% of the customers in the database were single-purchase customers, and they bought about 2 items on average, which limited their possible exposure to the product categories. Naturally, the 40% of customers who had multiple transactions were vastly more likely to buy items from any given category. Thus, the comparison was always between the average customer and a better-than-average customer.

Now let's go back to our beautiful channel chart. My careful investigation of the analysis methodology revealed that there was no adjustment for the number of transactions the customers had made. Basically, those who had purchased from three channels must have had at least three transactions (the database average at the time was 1.5), those who had purchased from two channels must have made at least two purchases, and the single-channel purchasers were mostly comprised of the single-purchase customers who by definition could not have made it into the other groupings - they just did not spend enough with the company! Now, to the best of my knowledge, comparing those who have spent more with the company to those who have spent less always confirms that the first group spent more, and the second spent less. I believe that's an important finding we need to keep in mind whenever the numbers are a little too pretty.
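A sketch of the adjustment that was missing, assuming a hypothetical transaction file with customer_id, channel and amount columns: compare spend across channel counts only within customers who made the same number of transactions.

```python
import pandas as pd

tx = pd.read_csv("transactions.csv")   # hypothetical columns: customer_id, channel, amount

per_cust = tx.groupby("customer_id").agg(n_channels=("channel", "nunique"),
                                         n_transactions=("channel", "size"),
                                         spend=("amount", "sum"))

# The naive view: spend by number of channels (mixes channel count with how much people buy overall)
print(per_cust.groupby("n_channels")["spend"].mean())

# The fairer view: hold the number of transactions fixed, then compare across channel counts
controlled = per_cust.groupby(["n_transactions", "n_channels"])["spend"].mean().unstack()
print(controlled)
```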

Wednesday, July 29, 2009

There is nothing more misleading than facts with no context

I read an excellent article the other day about the value of good analysis, and the pitfalls of bad analysis. It examines the WSJ's claim that increasing taxes on people in the highest income bracket in Maryland led to their flight to other places within a year. The WSJ did not take into account any national trends in wealth and income, or any other measures "the rich" may take to lower their tax base (munis, anyone?), and simply attributed the 50% decline in returns filed in the corresponding bracket to the fictional flight. However, a simple national trend analysis showed that Maryland was sitting close to the middle in terms of the dynamics of wealthy households.

P.S. I put quotes around "the rich" because I believe the whole rich-vs-poor battle is pretty much made up, especially on taxation. We all heard Warren Buffett advocating higher taxes for "the rich", and $40K-a-year Joe the Plumber being outraged by the out-of-control taxes Democrats were allegedly ready to put into law. My take: saying things like that makes the Joes feel like they are one of the rich. Much like buying a Louis Vuitton bag on credit - not a particularly rational economic behavior, but what the heck; there was a time I was working for a company that successfully translated this behavior into a nice chunk of change, and studying it made a fascinating subject of research. But I digress.

So, do you think our CSRs are hot?

"If we knew what it was we were doing, it would not be called research, would it?" -- Albert Einstein

Let's talk about customer satisfaction research, and in particular, drivers of customer satisfaction. This is probably one of the most sacred grounds of satisfaction analysis, and every company that offers customer sat research is usually pitching some sort of proprietary procedure or knowledge for getting those key drivers out of the survey data. The same story applies to NPS, loyalty, or pretty much any number of more expensive measures of the same thing.

Usually, these procedures are based on some sort of correlation between overall satisfaction (NPS, loyalty) and the drivers. Some use fancier math, some use simple math, but the idea is pretty much the same. To prove that the idea is working well, your MR vendor will create a bunch of pretty charts, show you statistically significant p-values, and whatnot. Now, every time I see a chart that is a bit too perfect, I get a nagging feeling of suspicion - is it really happening, or are we dealing with a self-fulfilling prophecy again?

Fortunately, one day I got my answer. As I mentioned before, at some point in my career I was in charge of a customer sat survey, and it had one of those drivers that make you sigh - satisfaction with store hours. Whenever I ran the correlation between overall sat and the drivers, I would always see that nice positive correlation with store hours. Must be one of those important factors, right? Well, it turns out the correlation held even for the flagship stores, which were open 24 hours. I don't know how one can be dissatisfied with 24-hour store hours, but apparently, if you piss the customers off enough, they will be. They will also think your signage colors are hideous, your parking spaces are too small, and your CSRs are ugly. Obviously, it is not the store hours that drive overall satisfaction, but the other way around. If anything, those bogus questions are going to correlate with overall sat very well, as they really don't reflect anything but the overall satisfaction.

Now, every time I answer a customer sat survey (yes, I take other companies' surveys - guilty as charged), I always laugh when faced with a million dimensions, half of which I have absolutely no opinion on, except... well, they are pretty good, so I guess I am "satisfied" with the advice and information they give me.

What's the conclusion? I guess, the conclusion is that in the context of customer sat, those drivers are not of much help. There are other ways to understand what's important to your customers, and by all means you should employ them in an intelligent manner. Should your satisfaction really grow if you change those signage colors? There is a sure way to find out - change them and see if satisfaction budges. If not, move on to another variable.

Thursday, July 16, 2009

Noteworthy recent HBR articles

"Any man who reads too much and uses his own brain too little falls into lazy habits of thinking." -- Albert Einstein

This is not new news, but there was a new article by Thomas Davenport (the author of "Competing on Analytics") in the February 2009 issue of HBR called "How to Design Smart Business Experiments". I actually read it, and I liked it better than his iconic article that was eventually turned into the book. The more recent article shows a very practical approach, and it is executive-proof. I made copies and distributed them at work with the obvious goal of educating my co-workers about smart testing.

The July-August 2009 issue of HBR features an interesting article by Dan Ariely, "The End of Rational Economics", where he gives interesting examples of irrational economic behavior (this one I also read). Not surprisingly, Ariely has a book on the topic, "Predictably Irrational", and a web site with a good amount of information and interviews. Obviously, I am going to recommend checking out the web-site first.

P.S. I am not a great reader (of books), so unless it is explicitly stated that I read something, you can assume that I skimmed the ideas in reviews and thought they were worthy of notice.

In defense of the big picture

"Confusion of goals and perfection of means seems, in my opinion, to characterize our age" -- Albert Einstein

Someone needs to defend the wisdom of looking at the big picture, so I am going to do just that. How many times do people set out to look at the forest, but start looking at the trees, then at the leaves of the trees, then at the veins on the leaves? The problem is that while leaves and veins may be fascinating, the forest may be shrinking while you are looking at them, maybe even due to logging. Well, let's hope it's not that severe.

My analysis du jour was looking at a very clever and nicely sampled test that I had devised several months ago. The test did not survive the latest iteration of never-ending organizational change, and had to be ended prematurely after a few months in the market. I decided to take a closer look at the results anyway - testing but never analyzing or drawing conclusions is one of my pet peeves.

In the test, the target customer universe is randomly split into several groups, and each one of them is delivered a certain dose of our marketing poison (kidding, it's of course marketing manna) - basically, full dose, half dose, and quarter dose. A few months later, I am looking at the results to understand what happened. What we really want to look at first is whether consumer sales grew during that period of time - and by how much and for how long, not the details of how they grew. That's because at the end of the day, if you are not growing your subscriber/product/sales base and getting more money out of it than you are putting in, nothing else matters. Obviously, the first question I get is how the subscriber base grew - was it an increase in connects, was it a drop in disconnects, or a combination of the two - because anyone in marketing automatically thinks that they only need to care about connects. Well, plainly speaking, that's wrong. Higher connects usually lead to higher disconnects, as a certain (and actually surprisingly high) percentage of customers will disconnect within the first month or two after connecting. Those disconnects are a direct result of the connects you are driving, and it would be incorrect to count all the connects in. On the other side, if a higher marketing dose results in lower churn, I will still take it - I really don't care why applying marketing reduces churn; what I care about is being able to experimentally confirm that it does, and by how much.

Now, I should admit that knowing a certain amount of detail may help you chisel out some helpful insight; however, many times it is hard to nip the tendency to evaluate the end result of a program based on that detail. If the bottom-line question about a program is whether it worked (aka paid for itself), then the conclusion should be drawn from the bottom-line, most "big picture" number. In our particular case: after all the connects, disconnects, upgrades, downgrades, and all sorts of other moves, what difference are we left with, and for how long? The "what" comes first. The "for how long" comes second, and let's not kid ourselves, that "how long" is usually not the lifetime value. LTV and its [mis]use for campaign evaluation is a totally different topic, which I hope to write about pretty soon.
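A minimal sketch of that big-picture-first readout, assuming a hypothetical summary file with one row per dose group and columns for starting subscribers, connects and disconnects:

```python
import pandas as pd

# Hypothetical per-group summary: group, starting_subs, connects, disconnects
df = pd.read_csv("dose_test_results.csv")

# Big-picture number first: net growth per dose group
df["net_adds"] = df["connects"] - df["disconnects"]
df["net_growth"] = df["net_adds"] / df["starting_subs"]
print(df[["group", "net_growth"]].sort_values("net_growth", ascending=False))

# The decomposition (connect rate vs. disconnect rate) comes second, as supporting detail
df["connect_rate"] = df["connects"] / df["starting_subs"]
df["disconnect_rate"] = df["disconnects"] / df["starting_subs"]
print(df[["group", "connect_rate", "disconnect_rate"]])
```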

Tuesday, July 14, 2009

Score!

"Any intelligent fool can make things bigger, more complex, and more violent. It takes a touch of genius -- and a lot of courage -- to move in the opposite direction." -- Albert Einstein

We have all heard of NPS, the Net Promoter Score. It is supposed to be the holy grail of loyalty, and a great alternative to your regular old tired Satisfaction Score. Maybe.

I guess it is time to share my experience and call it as I have seen it.

Numero uno - the absolute NPS score. I have worked in marketing analytics in a couple of consumer industries, retail and telecommunications. If you pull any research done... ever, you will see that retail generally has high satisfaction scores, and telecom - not so much. In fact, retail often gets over 60% in the top two boxes (on a 10-point scale) or the top box (on a 5-point scale). Now, if you ever look at cross-shopping patterns in retail, you will see how fickle the customers are. I worked for a retail company where around 90% of its best customers cross-shopped with competitors. Yet it had an NPS of well over 50%. I have also worked for a telecom company that had an NPS way down at the bottom of the scale. I should admit that it has not always treated customers perfectly; however, customers were surprisingly loyal to its services. So, yes, it's all relative.

Numero dos - NPS and other scores. As part of my former job I was in charge of the company's satisfaction survey. I hated the things I had to do to maintain it, but I loved the results, especially given the sample size, which was well into the hundreds of thousands. Obviously, at some point it came to the NPS measurement, and as a sucker for a general understanding of the nature of things, I did test that NPS against the overall satisfaction (top box on a 5-point scale). It does not matter how you cut it - by weeks, by months, by stores, by regions... the NPS was 97% correlated to the Top Box. I do not know how much groundbreaking insight was packed in the remaining 3% of the information (OK, it's more like 6% in terms of variable variation), but I highly doubt it is going to change my view of what's going on with the customers.
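A sketch of that comparison, assuming a hypothetical response file with a 0-10 likelihood-to-recommend column and a 1-5 overall satisfaction column; the cut here is by month and store, but any cut works the same way:

```python
import pandas as pd

sv = pd.read_csv("survey_responses.csv")   # hypothetical columns: month, store, ltr (0-10), overall_sat (1-5)

sv["promoter"] = (sv["ltr"] >= 9).astype(int)
sv["detractor"] = (sv["ltr"] <= 6).astype(int)
sv["top_box"] = (sv["overall_sat"] == 5).astype(int)

by_cut = sv.groupby(["month", "store"]).agg(promoters=("promoter", "mean"),
                                            detractors=("detractor", "mean"),
                                            top_box=("top_box", "mean"))
by_cut["nps"] = by_cut["promoters"] - by_cut["detractors"]

# How much extra information does NPS carry beyond plain top-box satisfaction?
print(by_cut[["nps", "top_box"]].corr())
```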

So, my conclusion, basically, is that NPS is the same as good ole Satisfaction Score, freshly repackaged, and obviously, more expensive.

P.S. Next time let's talk about "drivers" of satisfaction.

Customers who switched from [...], saved...

"We can't solve problems by using the same kind of thinking we used when we created them" -- Albert Einstein

We have all seen them, the ads that promise to save you money on your car insurance. Geico does it, Allstate does it, 21st Century does it. We look at them and think that maybe a better deal is around the corner. OK, maybe we are not that gullible, so let's dig into the numbers.

The claim is that customers who switched from [another insurance company] saved, on average, $X, and those who switched from [the other insurance company] saved even more, $Y. Sounds like a good deal; sounds like everybody is saving. But is it really everybody? Those who switched, by definition, must have gotten a lower rate with the new company; otherwise, they... would not have switched. This is a typical case of not looking at the total picture, but using a qualifying condition to isolate the part of the picture we will be looking at. In this particular case, it is caused by self-selection, since the customers self-select to switch.

So, basically, if insurance company A charges more than insurance company B for 90% of the insurance-seeking population, but charges less than B for the other 10%, then on average A will be the higher-priced alternative. However, A will still be able to make a low-price claim against company B because, yes, it is true: when people in the 10% group switch from B to A, they do indeed save money.
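Here is a toy simulation of that arithmetic (all distributions and numbers made up): company A is the pricier option for roughly 90% of shoppers, yet "customers who switched to A" still save a healthy amount on average, because only the people who get a better quote switch.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100_000

quote_b = rng.normal(1000, 150, n)                      # current premium with company B
# Company A is more expensive for ~90% of shoppers, cheaper for ~10%
cheaper_with_a = rng.random(n) < 0.10
quote_a = np.where(cheaper_with_a,
                   quote_b - rng.uniform(50, 300, n),   # the lucky 10% get a better quote
                   quote_b + rng.uniform(50, 300, n))   # everyone else would pay more

print("Average quote: A =", round(quote_a.mean()), " B =", round(quote_b.mean()))

switchers = quote_a < quote_b                            # people only switch if they save
print("Share who switch to A:", f"{switchers.mean():.0%}")
print("Average savings among switchers: $", round((quote_b - quote_a)[switchers].mean()))
```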

Now, this was kind of a silly case. We all know advertisers will say anything and everything to get prospects interested. However, this type of self-fulfilling prophecy is used every day in the workplace to justify programs - and justify them with what appears to the untrained eye to be solid quantitative analysis. The most upsetting case of selection bias I have seen was a program where customers "competed" for a prize from a company. Those who increased their purchases the most during a qualifying period of time were declared winners, and then their purchasing lift during that period was used to justify the program. Basically, it's as if a race were judged based on speed, and then the winners were compared to everyone else and found to... have the highest speed. Obviously, the program has always "delivered".

Let's get it started!

"It's not that I'm so smart, it's just that I stay with problems longer." -- Albert Einstein.

I decided to start this blog so that I can write down and organize my thoughts on analytics in general and marketing analytics in particular. To take interesting problems and observations, and not barge right past them, but stay with them longer, try to understand what they mean, what they are trying to tell us about the nature of things. I have worked in the area of marketing analytics for about eight years, and to be honest, I really like it. Maybe I will even have a few visitors to kick the thoughts around with and have some fun.

Use of quotes. It was my decision to draw on the bits of wisdom from other analytical people, but since I don't read much, I decided to pretty much stick with Einstein. Not in hopes that the glory of his great mind rubs off on my blog and people would think I must be smart, but basically because he was so prolific that one quick search turned up everything I needed. Plus, it looks like I pretty much agree with him on... everything. Kind of scary, actually.

Well, wish me good luck!