Thursday, March 12, 2015

Continue on another platform

I am going to continue posting my thoughts on my own website, which I have just redone. If you enjoyed reading this blog, check out the new entries here:
http://zyabkina.com/

Wednesday, January 13, 2010

One well designed metric is better than multiple poorly designed metrics

Obvious, isn't it? However, in practice we often go for quantity rather than quality. How about looking at this metric by region? What about product categories? Customer status? Average revenue? Those are all familiar ways to get to the answer by numbers rather than by insight. Instead of going on a slicing and dicing rampage, it often pays to think twice about where you want to cut next, and cut very diligently. This is one of my zen habits of analytics - to answer the question with the least amount of data possible. Too many numbers on the report/screen translate into junk in the head.
An interesting take on the issue of choice and simplicity is described in the book The Paradox of Choice by Barry Schwartz. It adds to the eternal struggle of wanting more, but being unhappy when we get more. Simple is beautiful.

Wednesday, January 6, 2010

The most powerful analytical approach I have seen to date

This sounds a little pompous, but it is true. I have used this approach several times in different industries, and in different customer analytics settings, and every time it was a huge success. Fortunately, the method is very simple (me like simple!), and can be replicated in a variety of situations.

1. Find a natural break of your total sales into units and dollars (aka rate and volume).
2. Pull the data by various groupings and trend it over a few years. Find the contribution of each part to the total and to the trend.
3. Take your volume variable, and repeat steps 1 and 2.

Here is an example. Total retail sales by week were broken down into the number of transactions and an average dollar amount per transaction, and trended over five years. That in itself was quite a revelation. On the next level, we looked at the number of units per transaction and dollars per unit. Next, we looked at the breakdown of units and dollars per unit by category over time, and then split the change into mix shift (a growing share of more expensive items that have a longer life and lower sales volume) and inflation (a change in the price of the same SKU). This was probably the simplest, most perfect, insightful, and successful analytical project I have led.
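Here is a minimal sketch of the first two levels of this decomposition in Python. The column names and numbers are hypothetical, made up for illustration, not the actual project data:

```python
import pandas as pd

# Hypothetical weekly sales data; column names are assumptions.
sales = pd.DataFrame({
    "week":         pd.date_range("2015-01-05", periods=4, freq="W"),
    "transactions": [12_000, 11_500, 12_400, 13_100],
    "units":        [30_000, 28_200, 31_500, 33_400],
    "dollars":      [450_000, 430_000, 472_000, 505_000],
})

# Level 1: total dollars = transactions x dollars per transaction.
sales["dollars_per_txn"] = sales["dollars"] / sales["transactions"]

# Level 2: dollars per transaction = units per transaction x dollars per unit.
sales["units_per_txn"] = sales["units"] / sales["transactions"]
sales["dollars_per_unit"] = sales["dollars"] / sales["units"]

# Trend each component over time; in practice you would chart these
# over several years and by category to separate mix shift from inflation.
print(sales[["week", "transactions", "dollars_per_txn",
             "units_per_txn", "dollars_per_unit"]])
```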

Simple analytics is the best kind of analytics

"Make everything as simple as possible, but not simpler." -- Albert Einstein

Due to the nature of my job, I get to have three measurement development and five measurement explanation conversations a week. Thus, it is a topic near and dear to my heart. So dear, I sometimes want to shoot myself rather than have another conversation about it. Here are my suggestions on how to build an easier road to measurement.

1. Concentrate on the bottom line. Did you make money or lose money? Did you improve overall sales or not? Did you improve overall churn? If you are not using control groups, concentrate on overall numbers. Granted, it is much easier to show stellar performance in one group of customers, but it is usually a false assessment due to self-selection. If you have not moved the overall needle, you have not moved the needle. Overall revenue change is good. Overall revenue change year over year is even better. Revenue for a category or a certain selected sub-group of customers may be misleading.

2. Use control groups, if you can. Control groups are the great equalizer of metrics. They are the only panacea against a poorly designed measurement. Let's face it, shit happens, and metrics don't always work out the way we want them to. Yet, if you measure against a matched control group, even the worst metric will usually point you in the right direction (see the sketch after this list).

3. Resist going into the weeds. Weeds require careful use of judgment; otherwise, they will turn into the hell of 100,000 numbers. You all know the drill: after the first look at the report, the executive asks, "and how about we look at those measures by cutting them by X... and Y, and of course, Z". Depending on how many categories are in X, Y, and Z, the numbers in your report all of a sudden explode into an incomprehensible mess. So, this is where the analyst must go into the best diplomatic dance they can manage, saying that of course, we will look and see how that turns out, and surely report back. That's your chance to look at the weeds and see which ones are worth going into. If you see real insight that sheds light on the question at hand, the cut is worth reporting on. If it is just another iteration of the same numbers, simply cut into smaller pieces, then stay away.
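To illustrate point 2, here is a minimal sketch of why a matched control group saves even a crude metric. The scenario is hypothetical: both groups drift upward for reasons unrelated to the campaign, and only the difference between them is the real effect:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical matched groups: this month's revenue per customer.
# Both grew from last month's $50 average for market-wide reasons.
control = rng.normal(loc=52.0, scale=10.0, size=5_000)
treated = rng.normal(loc=54.0, scale=10.0, size=5_000)  # campaign adds ~$2

# Naive read: treated customers grew vs. history, so the campaign "worked"
# -- but the untreated control group grew too.
print(f"Treated lift vs. history: {treated.mean() - 50.0:+.2f}")
print(f"Control lift vs. history: {control.mean() - 50.0:+.2f}")

# Control-group read: only the gap between matched groups is attributable
# to the treatment.
print(f"Incremental impact:       {treated.mean() - control.mean():+.2f}")
```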

It is my firm belief that a report should contain the least amount of data it needs to provide insight. More data clogs your understanding of the picture just as too many clothes clog up your closet, so be ruthless with the stuff that does not add to the understanding. It may lead people down the wrong path, too.

Monday, December 21, 2009

Want instant results? Concentrate on improving the worst performers.

"He who joyfully marches to music in rank and file has already earned my contempt. He has been given a large brain by mistake, since for him the spinal cord would suffice." -- Albert Einstein

Concentrating on the worst performers can be particularly fruitful in a well-managed organization, where performance is distributed in a close to random manner.

Mind you, we are not talking real improvements here. We are talking about putting yourself in a situation where there is nothing but upside.

Here is how you can spot the right type of "worst" performers. They are usually smaller in size compared to an average performer. Whether it is smaller regions or smaller segments, groups of smaller sample size tend to have higher variance, and thus a higher probability of being an outlier, including the "worst" kind. You also need to make sure there are no systemic factors driving the poor performance; otherwise, the simple random nature of the world will not be able to compensate for them, and you will have to actually do something to improve the performance of your "worst" group. So, check your averages over several time periods to confirm that your worst performers merely had a particularly hard time when you picked them.

After you have validated the random (or even semi-random) nature of your "worst" performance group, you can create a project to improve how it is doing. You would not expect your "worst" performer group to stay exactly the same every period, would you? If not, there are pretty good chances that next period you will "pull" a good number of the worst performers closer to the average, or out of the "worst" group. Congratulations, you just got yourself a "real" quantifiable result!
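Here is a minimal simulation sketch of this effect (regression to the mean), assuming performance is purely random from period to period; all numbers are made up:

```python
import numpy as np

rng = np.random.default_rng(0)

n_groups = 200
# Hypothetical performance scores for two periods, drawn independently:
# nothing actually improves between them.
period_1 = rng.normal(loc=100.0, scale=15.0, size=n_groups)
period_2 = rng.normal(loc=100.0, scale=15.0, size=n_groups)

# Pick the bottom 5% in period 1 and watch them "improve".
worst = np.argsort(period_1)[: n_groups // 20]
print(f"Worst performers, period 1: {period_1[worst].mean():6.1f}")
print(f"Same groups,      period 2: {period_2[worst].mean():6.1f}")
# The second number lands near the overall mean of 100 -- an instant
# "result" with no change in the underlying process.
```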

Remember, there is always a bottom 5% to work on!

Simplicity is king

"If you can't explain it simply, you don't understand it well enough." -- Albert Einstein


Today I had a conversation about a very interesting churn model that we may try to build. The model will let us assess the impact of different factors on churn; one of those factors is price, or to be precise, pricing changes. When the conversation ventured to the problem at hand, which is to quantify the impact of the most recent price change, I had to explain that I did not want to put this price change into the model. This is anathema to someone with an interest in econometrics; however, the decision is driven not so much by scientific truth as by communication, i.e., being able to explain your results. Though adding the most recent data would improve the model, it is unlikely to help those with little knowledge of regression understand the issue at hand. Having a known coefficient is good, but it is hard to explain what that coefficient means to a layperson. Even if you express it in the form of elasticity (let's say your churn goes up by 1.5% for every percent of price increase), it does not quite mean anything to most executives.

The alternative approach we agreed upon was to build the model on data from before the price increase, and then determine the churn baseline for every segment we are tracking. Then we can compare post-price-change churn to that baseline to show the difference. For example: you had a 2% rate increase for this group of customers, and their churn was 6% compared to the 3% we would have expected with no price increase. That is something people can understand.
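A minimal sketch of that baseline comparison, with hypothetical segments and made-up churn and rate numbers (the baselines would come from a model fit on pre-increase data):

```python
import pandas as pd

# Hypothetical data; segments, rates, and churn figures are made up.
df = pd.DataFrame({
    "segment":        ["A", "B", "C"],
    "baseline_churn": [0.030, 0.045, 0.025],  # expected churn, no price change
    "actual_churn":   [0.060, 0.050, 0.026],  # observed churn after the change
    "rate_increase":  [0.02, 0.01, 0.00],
})

# The headline per segment: observed vs. expected, in plain terms.
df["excess_churn"] = df["actual_churn"] - df["baseline_churn"]
for row in df.itertuples():
    print(f"Segment {row.segment}: {row.rate_increase:.0%} rate increase, "
          f"churn {row.actual_churn:.1%} vs. expected {row.baseline_churn:.1%} "
          f"({row.excess_churn:+.1%})")
```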

Another example of simplification to aid communication is a correlation analysis I did a few years ago. For every variable X correlated to my output Y (sales), I would create a bar chart of Y by grouping subjects into "low X", "medium X" and "high X". This spoke better than any scatterplot or correlation number. The one exception is correlation over time between two variables: when shown on a nice chart and visibly correlated, two trended lines make the best case, and make executives feel smart.
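A minimal sketch of the bucketed bar chart, on made-up data for a hypothetical driver X and outcome Y:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)

# Hypothetical subjects: a driver X and an outcome Y (sales), loosely related.
x = rng.normal(size=500)
y = 100 + 20 * x + rng.normal(scale=30, size=500)

# Bucket X into terciles and chart mean Y per bucket -- this reads better
# in a meeting than a scatterplot or a correlation coefficient.
buckets = pd.qcut(x, 3, labels=["low X", "medium X", "high X"])
pd.Series(y).groupby(buckets, observed=True).mean().plot.bar(ylabel="mean sales")
plt.tight_layout()
plt.show()
```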

Friday, December 11, 2009

When comparing, make sure the groups are representative

Sometimes people call them "matched", which is a layman's term for representative. So, why do they have to be matched? Because non-representative groups may be so misleading that your analysis results come out the opposite of what they should be.

Here is a quick example. Let's say you have two groups of customers, each consisting of customers of two types/segments. Sometimes you may not even be aware that there are two types of customers in your groups. Let's assume those segments exhibit different behavior. The sample behavior I chose is churn, but it could be anything. Let's say we applied some sort of treatment to Group #2, and their churn went down by 1% in both segments. We are trying to use Group #1 to establish a baseline for Group #2 (what would have happened, to the best of our knowledge, had we not applied the treatment). However, because the composition of the groups is not representative of each other, we get exactly the opposite result for the total: Group #2 appears to have higher, not lower, churn. See the table below.

                 Group 1   Group 2   Difference
Segment #1         1,000     5,000
Seg #1 Churn        5.0%      4.0%       -1.0%
Segment #2         5,000     1,000
Seg #2 Churn        2.0%      1.0%       -1.0%
Total              6,000     6,000
Total Churn         2.5%      3.5%       +1.0%
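A quick sketch reproducing the table's totals shows how the reversal (a textbook case of Simpson's paradox) falls out of the weighted averages; the numbers are the ones from the table above:

```python
# Each group maps segment -> (size, churn rate), per the table.
group_1 = {"seg_1": (1_000, 0.05), "seg_2": (5_000, 0.02)}
group_2 = {"seg_1": (5_000, 0.04), "seg_2": (1_000, 0.01)}

def total_churn(group):
    # Weighted average churn across segments.
    churned = sum(n * rate for n, rate in group.values())
    return churned / sum(n for n, _ in group.values())

# Churn fell 1% in every segment, yet the blended total flips the other way.
print(f"Group 1 total churn: {total_churn(group_1):.1%}")  # 2.5%
print(f"Group 2 total churn: {total_churn(group_2):.1%}")  # 3.5%
```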