Monday, October 26, 2009

Scientific Principles of Direct Marketing Optimization

Here is the list of my principles so far:
  1. Always use a control group, preferably a randomized one that is representative of your treatment group. A control group takes care of everything else going on in the marketplace, including your own other campaigns, and also acts as the “great equalizer”, compensating for even the worst metrics and response models.
  2. Maximize lift, not response. Lift is the difference in results between the treated and control groups; that is what you are actually trying to impact.
  3. Optimal frequency is often more powerful and important than optimal segmentation. Ideally, you want to optimize frequency by segment (i.e. maximize lift), but if you are unsure where to start testing, start with frequency. None of the segmentation work will be insightful unless your frequency is within striking range of the optimum.
  4. Test, test, test. It’s one of the easiest and simplest ways to learn.
  5. When testing, have a hypothesis, then design the test around it. Sending a customized piece to a segment is a great idea, until you realize that you did not send your regular piece to the same audience at the same time, and thus cannot tell whether the customized piece would have done better than the regular one.
  6. Track your treated group against the control group for a while to understand how long the impact of your mailing lasts. Some people want to use LTV, because they want a higher ROI. But a true, measurable difference traceable to the impact of a direct mail piece rarely lasts more than a few months, even though the average customer lifetime may be measured in years.
  7. When choosing the size of the control group, first understand what kind of difference will justify the effort (i.e. the break-even lift), and then determine a sample size that will make this difference statistically significant. If you’re measuring with a yardstick, it’s hard to detect a half-inch of difference.
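
To make principle 7 concrete, here is a minimal Python sketch of the sample-size arithmetic, assuming a two-sided test on response rates at 95% confidence and 80% power; the 2% baseline and 0.5-point break-even lift are invented for illustration.

```python
from statistics import NormalDist

def sample_size_per_group(p_control, lift, alpha=0.05, power=0.80):
    """Per-group n to detect `lift` over `p_control` (normal approximation)."""
    p_treat = p_control + lift
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # 0.84 for power = 0.80
    variance = p_control * (1 - p_control) + p_treat * (1 - p_treat)
    return (z_alpha + z_beta) ** 2 * variance / lift ** 2

# Example: 2% control response rate, 0.5-point break-even lift
print(round(sample_size_per_group(0.02, 0.005)))   # roughly 13,800 per group
```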

Thursday, October 22, 2009

Marketing analytics case study - Direct Mail list cleanup

"I think and think for months and years. Ninety-nine times, the conclusion is false. The hundredth time I am right." -- Albert Einstein

Just in time for my posts on measurement against a control group, I got a perfect real-life case at work. The situation is pretty typical for anyone who runs large direct mail lists out of a corporate system. The system has addresses of your current customers as well as prospects; after you apply your targeting criteria, you use a random selection procedure to identify your control, and make a record of both mail and control addresses. In the last step, the system produces your mail list to be sent to the mail house. For the measurement, customer purchases are tracked back to the addresses recorded in the mail and control groups, and the counts and revenue of the two groups are compared to determine incremental purchases and revenue.
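
As an aside, here is a minimal sketch of that last comparison step; all counts and revenue figures below are hypothetical.

```python
# Hypothetical matched purchases from the recorded mail and control addresses
mail_n, ctrl_n = 500_000, 50_000
mail_buyers, ctrl_buyers = 6_000, 500
mail_revenue, ctrl_revenue = 480_000.0, 38_000.0

# Incremental result: mail-group outcome minus what the control group
# says would have happened anyway, scaled to the mailed population
incr_purchases = (mail_buyers / mail_n - ctrl_buyers / ctrl_n) * mail_n
incr_revenue = (mail_revenue / mail_n - ctrl_revenue / ctrl_n) * mail_n
print(f"incremental purchases: {incr_purchases:,.0f}")  # 1,000
print(f"incremental revenue:   ${incr_revenue:,.0f}")   # $100,000
```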

The mail house does all sorts of address hygiene and cleaning: removing duplicate addresses, taking out vacancies, and running the addresses against the USPS database of known addresses, which weeds out both non-compliant and nonexistent addresses. While current customer lists usually yield a very high percentage of mailable addresses, prospect lists lose around 20%-25% of their addresses in the hygiene process. This presents an issue for tracking, because we are tracking purchases back to lists that do not accurately reflect the addresses that were actually mailed. To improve measurement of direct mail performance, IT proposes a system solution that takes the post-cleanup mail list (received back from the mail house) and uses it to clean up the original mail group list.

Will this solution improve quality of measurement? What are the advantages and shortcomings of this solution?

(I will publish my opinion as a comment to the post)

Friday, October 16, 2009

Maximizing response often leads to poor campaign performance

I did some research for a presentation at work today, and found this very nice white paper on the use of lift (or, as they call it, "uplift") modeling to drive true incremental sales. It correctly highlights the difference between correlation and causation: response models simply correlate with propensity to buy, while incremental lift models track the impact of the marketing communication. It then goes on to explain the impact of segmentation on response and incremental sales, and I just loved the two charts showing how response correlates with lift. Negatively, in their particular case: the higher the response, the lower the incremental sales.

That matches the conclusion I have often reached in my own job. I am not saying it always happens, but it does happen often when we maximize response and focus solely on those who are likely to buy from us anyway. You should also watch the opposite end of the curve, where prospects are so unlikely to purchase that even though you do get high incrementality, it may still not be enough to pay for the program. Thus, your most profitable targets usually sit in the sweet spot somewhere in the middle of the response curve. This is the kind of article every direct marketer needs to read.
Generating Incremental Sales
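
To illustrate the pattern (these decile rates are invented, not the white paper's data), here is a quick sketch of response versus lift by response-score decile:

```python
# Decile 1 = highest predicted response; all rates are hypothetical
treated = [0.20, 0.17, 0.14, 0.11, 0.09, 0.07, 0.05, 0.035, 0.02, 0.01]
control = [0.19, 0.155, 0.12, 0.08, 0.055, 0.04, 0.03, 0.025, 0.017, 0.009]

for d, (t, c) in enumerate(zip(treated, control), start=1):
    print(f"decile {d:2d}: response {t:5.1%}, lift {t - c:+.1%}")
# Top deciles buy anyway (high response, small lift); bottom deciles barely
# move; the profitable sweet spot sits in the middle of the curve.
```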

Thursday, October 15, 2009

Who wants to be positive all the time? Being skeptical is a lot more fun.

"I never think delusion is OK" -- Barbara Ehrenreich

Jon Stewart had Barbara Ehrenreich on his program yesterday; she wrote a book about the annoying and unhelpful side of unsubstantiated optimism. Now, I am not your bouncy cheerleader type, but I do find a certain comfort in realism, so I am with Barbara here. I think everyone shares those feelings, at least partially. Don't tell me you have never explored the depths of pessimism about the current state of your project/department/employer/economy with your co-workers, and did not feel the power of a bitching-session brotherhood! Being critical/skeptical/realistic may not be as pumped-up optimistic as some want all things workplace to be, but it is still fun! It may even heighten the feeling of elation when real achievements happen.

[Embedded video: The Daily Show with Jon Stewart, interview with Barbara Ehrenreich, http://www.thedailyshow.com/]

Tuesday, October 13, 2009

Correlation vs causation

I found this great video in which a doctor explains the difference between correlation and causation; it is probably the best explanation I have seen.

If you want to understand how to determine causality in marketing, one of the simplest and best ways is to use a control group whenever possible.



P.S. Get a flu shot this year.

Actually, there is no exception.

After I published my previous post on the use of control groups, I did a Google search on the topic and came up with the following quote:
The cardinal rule of direct marketing is to include a control group. Without it, you will never know whether customers purchased your product because of this marketing effort, or because of the billboard ad, the radio spot, a friend’s suggestion, an in-store brochure, or because Elvis told them to. There is one exception. If you have an air-tight fulfillment set up, whereby the customer can only purchase the product through your channel, e.g. a special 800#, then you don’t need to hold out a sample; you can be certain that every sale came from your effort (except the referrals from Elvis).
The conclusion is incorrect. I have seen more campaigns than I can count on all of my fingers that got a ton of calls to the 1-800 number but no incremental sales whatsoever. Zero. Zilch. Nothing.

Sales that come through the 1-800 number are something I like to call "associated sales", i.e. sales that came from the target audience and are somehow "associated" with the communication, but they are not the same as incremental sales. Your mail piece may be very good at convincing potential customers who would have made a purchase anyway to call a particular phone number, but that's no proof they would not have called if not for the DM piece. They would have called all right (that's what the control group is for!), just a different number.

I don't like the 1-800 phone number salesmen much. Oh, well, comes with the territory (and I do believe in Elvis referrals).

Monday, October 12, 2009

That's what control groups are for

Gary Loveman famously said that there are three ways to get fired from Harrah's: steal from the company, harass women, or fail to use control groups in testing. He is correct, particularly in cases where we are trying to properly measure the effectiveness of marketing communications for a fairly popular consumer product.

What is a control group? In most cases, a control group (sometimes called a holdout group) is a group that does not receive the communication we are measuring. It is used to assess the effectiveness of that particular piece of communication. By effectiveness I mean the impact of the communication on the test group that is not observed in the control group. Therefore, it is important to have a control group that behaves in exactly the same manner as the test group, a condition generally called being representative.

It is not always possible to pick a perfectly representative control group, but we should always do our best to try. Often, a control group is one of the few reliable and easy ways to truly assess the incremental impact of your marketing communications.

Here are a few practical implications of the use of the control group that are worth mentioning.
  • Often, companies use multiple marketing communications to reach out to the customer, and then try to untangle how many sales were driven by each type of communication. When we send out a direct mail piece and put advertising on TV at the same time, it is hard to determine how many of the people on the direct mail list who purchased your product were truly driven by the mail.
    That's what the control group is for. Compared to other ways of measuring the impact (separate 1-800 numbers, sales funnel, and so on), measurement against a control group captures the true incremental sales from a marketing vehicle, as the control group shows how many sales we get from everything else (TV, radio, web, spontaneous) except the DM piece.
  • The concept of "what would have happened" is the cornerstone of any effectiveness measurement. It is relatively easy to determine "what happened" - how many additional products we sold, how much revenue we got - but it is not always easy to determine what would have happened had we decided to save some money and skip the communication. The "would have happened" estimate is usually just that: an estimate, a number with a standard error (or degree of uncertainty) attached to it. This is why all estimates of the impact have to be statistically tested (see the z-test sketch after this list).

  • Sometimes the marketer is able to estimate the "associated" sales from a marketing vehicle pretty well - those who call the 1-800 number, those who click on the online ad, and so on - and the use of control groups is deemed excessive.
    In this case, we assume that 100% of associated sales can be attributed to the marketing vehicle, and that no other sales are influenced by it. This may be a reasonable way to assess effectiveness when the likelihood of a potential customer calling you spontaneously is low. For example, if you are a small consulting company sending out a brochure, chances are that a call to your number from a recipient of the brochure was driven by the mail.
    However, if you are a consumer company with a high rate of unsolicited walk-ins/call-ins, the situation may be very different. If your DM piece yielded a 2% call rate, and you expect a 1.5% spontaneous call rate over the same measurement period from the same target population, all of a sudden the ROI on your marketing communication does not look as attractive (see the ROI sketch after this list).

  • Many marketers are confused by the use of control groups when they have multiple overlapping marketing campaigns. Some suggest that no clean measurement can be achieved unless we exclude the same group of targets/controls from all other campaigns.
    This is not true. Again, this is what control groups are for - to control for overlapping campaigns. As long as we exclude the control from the particular communication we are measuring, the results are valid.
    For example, if we have a marketing campaign consisting of 3 consecutive letters, we can employ a different random control group to measure the effectiveness of each part of the campaign (by "different" I mean "separately selected", which probably means that some of the control targets will overlap - again, no big deal). Suppressing the same group of customers from all 3 pieces will give you an estimate of the effectiveness of the 3 pieces together. Suppressing a control group only from the last mailing will give you an estimate of the effectiveness of that last piece (i.e. is a two-letter campaign less effective, and if so, by how much). See the holdout-selection sketch after this list.

  • Building on the previous argument, it is not necessary to create a unified control group if you have several DM campaigns in the market. For example, say you have two campaigns to a similar/overlapping target with response windows that also overlap. Inevitably, you will have targets that received two pieces of communication and purchased your product - but which campaign drove the results? In this case, the best way to measure is to have two random control groups, one for each campaign, so we can measure the effectiveness of each campaign against its own control group. The point of contention is usually that the targets in the control group for the second campaign received the mail piece of the first campaign. However, this does not muddy up the measurement of the second campaign, because the groups are still representative of each other: the same percentage of the test and control groups received the first communication.
    If there is no difference between the response rates of the test and control groups for the second mail piece, it is not because the control group received the first piece; it is simply because sending the second piece did not make any difference - exactly what we were trying to determine by having a control group. Having the same control group for both campaigns will not help you determine the effect of each campaign separately, only the impact of the two campaigns together.

  • What if selecting a representative control group is next to impossible? In this case, the marketer should employ all available methods to understand what would have happened if the marketing communication had not happened. One way is to use an imperfect control group that is reasonably representative of the test group, and adjust for the differences based on customer attributes and behavior in the period before the test (see the difference-in-differences sketch after this list).

  • Sometimes the language of control groups appears in what I would call an "A vs B" test. This type of test is used when two different marketing communications (A and B) are tested on audiences that are representative of each other. In some cases, one of the groups is called a control group. Personally, I don't have an issue with the naming as long as the marketers understand that the test results are limited to the comparison only, i.e. they only give information about the relative effectiveness of methods A and B, not their absolute effectiveness. Absolute effectiveness needs to be measured against a real control group, which does not receive the marketing communication we are measuring.

  • Precision of the estimate is another consideration in testing, and it is usually a function of the size of the groups and how representative of each other they are. There are a lot of calculators out there that help one estimate the confidence interval of a measurement for a given sample size. Two very large, near-perfectly representative groups (think of a mass mailing randomly split into test and control groups) may give extraordinary precision of measurement - in my practice, down to 0.1%. Further precision is usually limited by either the size of the sample or the sampling methodology, which is not always the purest of random. Though we often assume random splits, machine-generated pseudo-random distributions do have a threshold of "randomness", which can become noticeable in some high-sample-size measurements, usually over 100K trackable targets in the smallest group. Another consideration for precision is the break-even point for the marketing communication: if your break-even lift is 1%, it would not be very practical to measure with 5% precision (see the precision sketch below).
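
The z-test sketch, in Python: a minimal example of comparing test and control response rates and checking significance with a pooled two-proportion z-test. All counts are hypothetical, and this is just one reasonable way to run the test.

```python
from statistics import NormalDist
import math

def incremental_lift(test_buyers, test_n, ctrl_buyers, ctrl_n):
    """Return (lift, z, p_value) for test vs control response rates."""
    p_test, p_ctrl = test_buyers / test_n, ctrl_buyers / ctrl_n
    lift = p_test - p_ctrl
    # Pooled rate under the null hypothesis of "no difference"
    p_pool = (test_buyers + ctrl_buyers) / (test_n + ctrl_n)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / test_n + 1 / ctrl_n))
    z = lift / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return lift, z, p_value

# 100,000 mailed with 2.0% buying; 10,000 held out with 1.6% buying
lift, z, p = incremental_lift(2_000, 100_000, 160, 10_000)
print(f"lift {lift:.2%}, z = {z:.2f}, p = {p:.4f}")  # z ~ 2.75, significant
```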
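
The ROI sketch: the same 2% call rate from the associated-sales example looks very different once the 1.5% spontaneous rate revealed by the control group is netted out. The cost and margin figures are invented for illustration.

```python
mailed = 100_000
cost_per_piece = 0.50     # hypothetical cost of printing and postage
margin_per_sale = 40.0    # hypothetical margin per sale

associated_sales = mailed * 0.020             # calls traced to the 1-800 number
incremental_sales = mailed * (0.020 - 0.015)  # net of the spontaneous call rate

cost = mailed * cost_per_piece
print(f"return on associated sales:  {associated_sales * margin_per_sale / cost:.1f}x")
print(f"return on incremental sales: {incremental_sales * margin_per_sale / cost:.1f}x")
# 1.6x vs 0.4x: the campaign loses money once only true lift is counted.
```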
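
The holdout-selection sketch for the three-letter example: each mailing wave draws its own random control, so some targets land in more than one control group, which, again, is no big deal. The audience names and holdout rate are hypothetical.

```python
import random

audience = [f"cust_{i}" for i in range(100_000)]
holdout_rate = 0.05
controls = {}

for wave in ("letter_1", "letter_2", "letter_3"):
    random.shuffle(audience)
    cut = int(len(audience) * holdout_rate)
    controls[wave] = set(audience[:cut])   # held out from this wave only
    mail_list = audience[cut:]             # everyone else gets this letter
    print(f"{wave}: mail {len(mail_list):,}, control {len(controls[wave]):,}")

# Independent draws overlap by chance (expect ~250 of 100,000 here)
overlap = controls["letter_1"] & controls["letter_2"]
print(f"targets in both wave-1 and wave-2 controls: {len(overlap):,}")
```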
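
The difference-in-differences sketch: one simple way to formalize the pre-period adjustment for an imperfect control group. The purchase rates below are invented; a real adjustment would also use customer attributes.

```python
pre_test, pre_ctrl = 0.030, 0.026    # purchase rates before the mailing
post_test, post_ctrl = 0.052, 0.041  # purchase rates after the mailing

naive_lift = post_test - post_ctrl
# Remove the pre-existing gap between the imperfectly matched groups
adjusted_lift = (post_test - pre_test) - (post_ctrl - pre_ctrl)

print(f"naive lift:    {naive_lift:.1%}")     # 1.1%
print(f"adjusted lift: {adjusted_lift:.1%}")  # 0.7%, after netting out the
                                              # pre-existing 0.4% gap
```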
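
Finally, the precision sketch: the 95% confidence interval half-width on a measured lift shrinks roughly with the square root of the group size. The response rates are illustrative.

```python
from math import sqrt
from statistics import NormalDist

def ci_half_width(p_test, p_ctrl, n_per_group, conf=0.95):
    """Half-width of the confidence interval around the measured lift."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    se = sqrt(p_test * (1 - p_test) / n_per_group
              + p_ctrl * (1 - p_ctrl) / n_per_group)
    return z * se

for n in (10_000, 100_000, 1_000_000):
    print(f"n = {n:>9,} per group: lift 0.50% +/- {ci_half_width(0.02, 0.015, n):.2%}")
# ~0.36% at 10K, ~0.11% at 100K, ~0.04% at 1M per group
```
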
I have written more posts on the use of control groups on my personal web page: http://zyabkina.com/thoughts.html