Friday, December 11, 2009

When comparing, make sure the groups are representative

Sometimes people call them "matched", which is a layman's term for representative. So, why do they have to be matched? Because non-representative groups can be so misleading that your analysis results may be the opposite of what they should be.

Here is a quick example. Let's say you have two groups of customers, each consisting of customers of two types (segments). Sometimes you may not even be aware that there are two types of customers in your groups. Let's assume those segments exhibit different behavior. The sample behavior I chose is churn, but it could be anything. Let's say we applied some sort of treatment to Group #2, and its churn went down by 1 percentage point in both segments. We are trying to use Group #1 to establish a baseline for Group #2, i.e. our best estimate of what would have happened to Group #2 had we not applied the treatment. However, because the two groups have very different segment compositions, we get exactly the opposite result for the total: Group #2 appears to have higher, not lower, churn. The reason is that most of Group #2 comes from Segment #1, the high-churn segment, while most of Group #1 comes from low-churn Segment #2. See the table below.

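|                      | Group 1 | Group 2 | Difference |
|----------------------|--------:|--------:|-----------:|
| Segment #1 customers |   1,000 |   5,000 |            |
| Segment #1 churn     |    5.0% |    4.0% |      -1.0% |
| Segment #2 customers |   5,000 |   1,000 |            |
| Segment #2 churn     |    2.0% |    1.0% |      -1.0% |
| Total customers      |   6,000 |   6,000 |            |
| Total churn          |    2.5% |    3.5% |      +1.0% |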
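To double-check the arithmetic, here is a small Python sketch that recomputes the totals from the table. The second function is my own addition, not part of the analysis above: it re-weights Group #2's per-segment churn rates to Group #1's segment sizes (a standard technique known as direct standardization), which is one way to put the two groups on a matched footing after the fact. The names `total_churn` and `churn_with_mix` are just illustrative.

```python
# Segment sizes and churn rates taken from the table above.
# Each group maps segment name -> (number of customers, churn rate).
groups = {
    "Group 1": {"Segment #1": (1_000, 0.05), "Segment #2": (5_000, 0.02)},
    "Group 2": {"Segment #1": (5_000, 0.04), "Segment #2": (1_000, 0.01)},
}


def total_churn(segments):
    """Overall churn rate: churned customers divided by all customers."""
    churned = sum(n * rate for n, rate in segments.values())
    customers = sum(n for n, _ in segments.values())
    return churned / customers


def churn_with_mix(segments, mix):
    """Churn of `segments` re-weighted to the segment sizes in `mix`.

    Keeps each segment's own churn rate but uses the other group's
    composition (direct standardization), so both groups are compared
    on the same segment mix.
    """
    churned = sum(mix[name][0] * rate for name, (_, rate) in segments.items())
    customers = sum(n for n, _ in mix.values())
    return churned / customers


g1, g2 = groups["Group 1"], groups["Group 2"]
print(f"Raw totals:      {total_churn(g1):.1%} vs {total_churn(g2):.1%}")
print(f"Matched to mix:  {total_churn(g1):.1%} vs {churn_with_mix(g2, g1):.1%}")
```

Run as-is, it should print 2.5% vs 3.5% for the raw totals, and 2.5% vs 1.5% once Group #2 is re-weighted to Group #1's segment mix, which recovers the 1-point drop we see inside each segment.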

