Monday, October 12, 2009

That's what control groups are for

Gary Loveman famously said that there are three ways to get fired from Harrah's: steal from the company, harass women, and not use control groups in testing. He is correct, particularly in cases where we are trying to properly measure the effectiveness of marketing communications for a fairly popular consumer product.

What is a control group? In most cases, a control group (sometimes called a holdout group) is a group that does not receive the communication we are measuring. It is used to assess the effectiveness of that particular piece of communication. By effectiveness I mean the impact of the communication on the test group that is not observed in the control group. Therefore, it is important to have a control group that behaves in exactly the same manner as the test group, a condition generally called being representative.

It is not always possible to pick a perfectly representative control group, but we should always do our best to try. Often, a control group is one of the few reliable and easy ways to truly assess the incremental impact of your marketing communications.

Here are a few practical implications of the use of the control group that are worth mentioning.
  • Often, companies use multiple marketing communications to reach out to the customer, and then try to untangle how many sales were driven by each type of communication. When we send out a direct mail piece and put advertising on TV at the same time, it is hard to determine how many of the people on the direct mail list who purchased your product were truly driven by the mail.
    That's what the control group is for. Compared to other ways of measuring the impact (separate 1-800 numbers, sales funnel, and so on), measurement against a control group captures the true incremental sales from a marketing vehicle, as the control group shows how many sales we get from everything else (TV, radio, web, spontaneous) except the DM piece.
  • The concept of "what would have happened" is the cornerstone of any effectiveness measurement. It is relatively easy to determine "what happened" - how many products we sold, how much revenue we got - but it is not always easy to determine what would have happened had we decided to save some money and not run the communication. The "would have happened" estimate is usually just that... an estimate, a number with a standard error (or degree of uncertainty) attached to it. This is why all estimates of the impact have to be statistically tested.
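
The measure-and-test step above can be sketched in a few lines. This is a minimal illustration with hypothetical campaign numbers (the function name and all figures are mine, not from the post):

```python
# Incremental lift of a test group over its control, plus a two-proportion
# z-test to check whether the lift is statistically significant.
from math import sqrt

def incremental_lift(test_buyers, test_size, control_buyers, control_size):
    """Return (lift, z_score) for a test vs. control comparison."""
    p_test = test_buyers / test_size
    p_control = control_buyers / control_size
    lift = p_test - p_control  # the incremental response rate
    # Pooled standard error under the null hypothesis of "no difference"
    p_pool = (test_buyers + control_buyers) / (test_size + control_size)
    se = sqrt(p_pool * (1 - p_pool) * (1 / test_size + 1 / control_size))
    return lift, lift / se

# Hypothetical: 2.4% response in a 100K test group vs 2.0% in a 10K control
lift, z = incremental_lift(2400, 100_000, 200, 10_000)
print(f"lift = {lift:.2%}, z = {z:.2f}")  # z above 1.96: significant at 95%
```

A z-score above roughly 1.96 means the observed lift would be unlikely to appear by chance if the communication had no effect.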

  • Sometimes the marketer is able to estimate the "associated" sales from a marketing vehicle pretty well - those who call the 1-800 number, those who click on the online ad, and so on - and the use of control groups is deemed excessive.
    In this case, we assume that 100% of associated sales can be attributed to the marketing vehicle, and that no other sales are influenced by it. It may be a good way to assess effectiveness in cases where the likelihood that a potential customer will call you spontaneously is low. For example, if you are a small consulting company sending out a brochure, chances are that a call on your number from a recipient of the brochure was driven by the mail.
    However, if you are a consumer company with a high rate of unsolicited walk-ins/call-ins, the situation may be very different. If your DM piece yielded a 2% call rate, and you expect a 1.5% spontaneous call rate over the same measurement period from the same target population, all of a sudden the ROI on your marketing communication does not look as attractive.
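
To make the arithmetic concrete, here is a small sketch. The 2% and 1.5% rates come from the example above; the mailing size, cost, and margin are made-up assumptions:

```python
# Naive ROI counts all "associated" sales; true ROI counts only the sales
# the mailing actually caused, i.e. observed rate minus spontaneous rate.
mail_size = 50_000        # assumed number of pieces mailed
cost_per_piece = 0.60     # assumed production + postage cost
margin_per_sale = 40.0    # assumed margin per responding customer

observed_rate = 0.020     # 2.0% of mailed targets called
spontaneous_rate = 0.015  # 1.5% would have called anyway (from the control)

cost = mail_size * cost_per_piece
naive_sales = mail_size * observed_rate
incremental_sales = mail_size * (observed_rate - spontaneous_rate)

naive_roi = naive_sales * margin_per_sale / cost        # ~1.33: looks fine
true_roi = incremental_sales * margin_per_sale / cost   # ~0.33: below break-even
print(naive_roi, true_roi)
```

Under these assumptions, the campaign looks comfortably profitable on associated sales and loses money on incremental ones.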

  • Many marketers are confused by the use of control groups when they have multiple overlapping marketing campaigns. Some suggest that no clean measurement can be achieved unless we exclude the same group of targets/controls from all other campaigns.
    This is not true. Again, this is what control groups are for - to control for overlapping campaigns. As long as we exclude the control from the particular communication we are measuring, the results are valid.
    For example, if we have a marketing campaign consisting of 3 consecutive letters, we can employ a different random control group to measure the effectiveness of each part of the campaign (by "different" I mean "separately selected", which probably means that some of the control targets will overlap - again, no big deal). Suppressing the same group of customers from all 3 pieces will give you an estimate of the effectiveness of the 3 pieces together. Suppressing a control group only from the last mailing will give you an estimate of the effectiveness of that last piece (i.e., whether a two-letter campaign is less effective, and if so, by how much).
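
The selection logic for separately drawn controls can be sketched as follows; the audience size, the 10% holdout, and the customer IDs are all hypothetical:

```python
# Draw an independent random control group for each of three letters.
# Each letter is suppressed only for its own control; overlap between
# the three control groups is expected and does not bias the measurement.
import random

random.seed(42)                         # for a reproducible illustration
audience = list(range(10_000))          # eligible customer IDs

controls = [set(random.sample(audience, 1_000)) for _ in range(3)]
mailings = [[c for c in audience if c not in controls[i]] for i in range(3)]

overlap = controls[0] & controls[1]
print(len(overlap))  # independently drawn controls share some targets
```

Because each control is a fresh random draw, it stays representative of its mailing's audience even though some customers sit in more than one control group.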

  • Building on the previous argument, it is not necessary to create a unified control group if you have several DM campaigns in the market. For example, you have two campaigns to a similar/overlapping target with response windows that also overlap. Inevitably, you will have targets that received two pieces of communication and purchased your product - but which campaign drove the results? In this case, the best way to measure is to have two random control groups, one for each campaign, so we can measure the effectiveness of each campaign against its own control group. The point of contention is usually that the targets in the control group for the second campaign received the mail piece of the first campaign. However, this does not muddy up the measurement of the second campaign, because the groups are still representative of each other: the same percentage of the test and control groups received the first communication.
    If there is no difference between the response rates of the test and control groups for the second mail piece, it is not because the control group received the first piece; it is simply because sending the second piece did not make any difference - exactly what we were trying to determine by having a control group. Having the same control group for both campaigns will not help you determine the effect of each campaign separately, but rather the impact of the two campaigns together.

  • What if selecting a representative control group is next to impossible? In this case, the marketer should try to employ all available methods to understand what would have happened if the marketing communication had not gone out. One of the ways is to use an imperfect control group that is reasonably representative of the test group, and adjust for the differences based on customer attributes and behavior in the period before the test.
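
One common form of that pre-period adjustment is a difference-in-differences calculation: subtract the gap that already existed between the groups before the campaign from the gap observed after it. A minimal sketch, with hypothetical purchase rates:

```python
# Adjust an imperfect control by the gap that already existed before
# the campaign (a simple difference-in-differences). All rates are made up.
pre_test, pre_control = 0.050, 0.040    # purchase rates before the campaign
post_test, post_control = 0.072, 0.050  # purchase rates after the campaign

naive_lift = post_test - post_control      # 2.2 pts, overstated
baseline_gap = pre_test - pre_control      # 1.0 pt gap that was always there
adjusted_lift = naive_lift - baseline_gap  # 1.2 pts attributable to the campaign
print(round(adjusted_lift, 3))
```

This assumes the pre-period gap would have persisted unchanged through the campaign window, which is itself an assumption worth checking.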

  • Sometimes the language of control groups appears in what I would call an "A vs B" test. This type of test is used when two different marketing communications (A and B) are tested on representative [of each other] audiences. In some cases, one of the groups is called a control group. Personally, I don't have an issue with the naming as long as the marketers understand that the test results are limited to the comparison only, i.e., they only give information about the relative effectiveness of methods A and B, and not absolute effectiveness. Absolute effectiveness needs to be measured against a real control, which does not receive the marketing communication we are measuring.

  • Precision of the estimate is another consideration in testing, and it is usually a function of the size of the groups and how representative of each other they are. There are a lot of calculators out there that help one estimate the confidence interval of a measurement depending on the sample size. Two very large, near-perfectly representative groups (think of a mass mailing, randomly split into test and control groups) may give extraordinary precision of measurement - in my practice, up to 0.1%. Further precision is usually limited either by the size of the sample or by the sampling methodology, which is not always the purest of random. Though we often assume random splits, machine-generated pseudo-random distributions do have a threshold of "randomness", which can become noticeable in some high-sample-size measurements, usually over 100K trackable targets in the smallest group. Another consideration for precision is related to the break-even point for marketing communications. For example, if your break-even lift is 1%, it would not be very practical to measure with 5% precision.
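
The standard arithmetic behind those sample-size calculators can be sketched directly. This assumes simple random sampling and a normal approximation; the 2% response rate is a hypothetical input:

```python
# 95% margin of error for an observed response rate, and the per-group
# sample size needed to hit a target margin of error.
from math import sqrt, ceil

def margin_of_error(p, n, z=1.96):
    """Half-width of the 95% confidence interval for rate p on n targets."""
    return z * sqrt(p * (1 - p) / n)

def required_n(p, margin, z=1.96):
    """Per-group sample size to measure rate p within +/- margin."""
    return ceil(z * z * p * (1 - p) / (margin * margin))

print(margin_of_error(0.02, 100_000))  # ~0.00087, i.e. under 0.1 pts
print(required_n(0.02, 0.001))         # ~75K per group for 0.1-pt precision
```

By this arithmetic, measuring a 2% response rate to within 0.1 percentage points takes roughly 75K targets in the smaller group, in line with the sample sizes mentioned above.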
I have written more posts on the use of control groups on my personal web page: http://zyabkina.com/thoughts.html