May 28, 2012

In statistics, statistical dispersion (also called statistical variability or variation) is variability or spread in a variable or a probability distribution. Common examples of measures of statistical dispersion are the variance, standard deviation and interquartile range.

A measure of statistical dispersion is a nonnegative real number that is zero if all the data are the same and increases as the data become more diverse (Wikipedia).
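As a rough illustration of that definition, here is a minimal Python sketch that computes the three measures mentioned above using only the standard library. The function name `dispersion_summary` and both sample lists are made up for the example. They are not taken from any real report.

```python
import statistics


def dispersion_summary(values):
    """Return population variance, standard deviation and interquartile range."""
    # quantiles(n=4) returns the three quartile cut points: Q1, median, Q3
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "variance": statistics.pvariance(values),
        "std_dev": statistics.pstdev(values),
        "iqr": q3 - q1,
    }


# Hypothetical data: every value is the same, so every measure is zero
identical = [5, 5, 5, 5, 5, 5]

# Hypothetical data: values are spread out, so every measure is positive
spread = [1, 1, 8, 10, 12, 15]

print(dispersion_summary(identical))
print(dispersion_summary(spread))
```

The first call should report zero for all three measures, and the second should report positive values, which is exactly the behavior the definition describes.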

Dispersion analysis is one of the most basic and powerful statistical analyses: it lets you understand how homogeneous the population or sample you are analyzing is. Why is that important? In digital analytics in particular, one of the biggest problems is that we love to use averages (or even rates): average time on site, average pages per visit, bounce rate, and so on. The problem with averages and rates is that they put apples and oranges in the same bag, which leads to very inconsistent decisions. Dispersed points can give you a wrong picture of reality and, ergo, lead you to wrong decisions.

One way of dealing with dispersion is to remove the dispersed points from the analysis. For example, if most of your visits have between 8 and 15 pageviews, but some have just one pageview (possibly bounces, if the visitor can’t perform any other measured action on that particular page), and you know those are unqualified visitors, you can simply take them out of the analysis. That lets you analyze a more homogeneous set of data, which increases the certainty of the decisions you base on it.
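To see the effect outside of any analytics tool, here is a small Python sketch of the same idea. The pageview numbers are invented for illustration, and `without_single_page_visits` is a hypothetical helper, not part of any Google Analytics API.

```python
from statistics import mean, pstdev

# Hypothetical pageviews per visit: four one-page visits mixed in
# with visits that viewed between 8 and 15 pages
pages_per_visit = [1, 1, 1, 1, 8, 9, 10, 12, 15]


def without_single_page_visits(visits):
    """Keep only the visits that viewed more than one page."""
    return [pages for pages in visits if pages > 1]


segment = without_single_page_visits(pages_per_visit)

# Compare the average and the spread before and after filtering
print("all visits:", round(mean(pages_per_visit), 2), round(pstdev(pages_per_visit), 2))
print("segment:   ", round(mean(segment), 2), round(pstdev(segment), 2))
```

With this made-up data the filtered segment has a higher average and a smaller standard deviation than the full set, which is what a more homogeneous set of data looks like in numbers.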

So, let’s do this in Google Analytics. In order to select a particular set of data we will use Advanced Segments.

1. Take the set of data you want to analyze. In this case, let’s use the example mentioned above: average pages / visit.

2. Create a new segment that includes only the data you are looking for. In this simple case we will select all the traffic except the visits that viewed just one page.

3. Analyze the information with the new set of data. Take a look at the difference in the results between one set of data and the other.

Looks like a huge difference, right? But it is much more than that: even when removing the dispersed points does not make a huge difference in the results, you have the certainty of using the correct information.