Variance
Variance is a measure of how widely data points are spread (dispersed) around the mean. In other words, we are measuring how far the data points lie from the mean. You can visualize it by looking at the image below.
Let's get to calculating the variance. Here is the formula:
$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$
Here σ² is the variance, X is a value in the data set, μ is the mean, and N is the number of values in the data set. ∑ denotes a sum: subtract the mean from each value in the data set, square the result, and add up all of these squares.
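The formula translates almost line for line into code. Below is a minimal Python sketch; the function name `population_variance` is our own, hypothetical choice (not from any library), and it assumes a non-empty list of numbers.

```python
def population_variance(data):
    """Population variance: the mean of the squared deviations from the mean.

    `population_variance` is a hypothetical helper name used for illustration.
    Assumes `data` is a non-empty sequence of numbers.
    """
    n = len(data)                                  # N: count of values
    mu = sum(data) / n                             # μ: the mean
    squared_devs = [(x - mu) ** 2 for x in data]   # (X - μ)² for each X
    return sum(squared_devs) / n                   # Σ(X - μ)² / N


print(population_variance([1, 2, 3, 4, 5, 6, 7]))  # 4.0
```

Each line of the function body maps to one symbol in the formula, which makes it easy to check against the definition.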
You might ask why we square the differences.
Earlier, we mentioned that we are measuring distance. Now, the funny thing about distance is that it is never negative. However, unless every value in the data set is identical, some data points will be smaller than the mean. When you subtract the mean from these, you get negative values, which cancel out the positive values that come from the data points larger than the mean. In fact, the differences from the mean always add up to exactly zero.
Hard to imagine? Here's an example:
Data set: {1, 2, 3, 4, 5, 6, 7}
Mean: 4
Values that you get when you subtract the mean from each data point: {-3, -2, -1, 0, 1, 2, 3}
Sum of these values: ZERO!
Does this mean there is no variance in the data? No! You get the point.
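The cancellation is easy to reproduce. This short Python sketch uses the example data set from above, subtracts the mean from each value, and sums the results.

```python
data = [1, 2, 3, 4, 5, 6, 7]
mean = sum(data) / len(data)             # 4.0

deviations = [x - mean for x in data]    # subtract the mean from each value
print(deviations)       # [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
print(sum(deviations))  # 0.0: the negative values cancel the positive ones
```

This is not a quirk of this particular data set: the sum of the deviations is the sum of the values minus N times the mean, which is always zero (up to floating-point rounding for less tidy data).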
To prevent this from happening, these values are squared before being summed.
Squaring the values from the example, { (-3)^2, ...}, gives {9, 4, 1, 0, 1, 4, 9}.
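Squaring also gives extra weight to large deviations: a data point 3 away from the mean contributes 9 to the sum, while a point 1 away contributes only 1. As a result, data sets whose values lie farther from the mean have a much larger variance.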
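To see this amplification, here is a minimal Python sketch comparing two small, made-up data sets that share the same mean of 4: in one, the values sit 1 away from the mean; in the other, 3 away. The helper `population_variance` is a hypothetical name defined here for illustration.

```python
def population_variance(data):
    # Hypothetical helper: mean of the squared deviations (divides by N).
    mu = sum(data) / len(data)
    return sum((x - mu) ** 2 for x in data) / len(data)


narrow = [3, 4, 5]   # made-up data: values 1 away from the mean of 4
wide = [1, 4, 7]     # made-up data: same mean, values 3 away from it

print(population_variance(narrow))  # squared deviations 1, 0, 1 -> 2 / 3
print(population_variance(wide))    # squared deviations 9, 0, 9 -> 6.0
```

Tripling the distance from the mean multiplies the variance by nine (from 2/3 to 6), because each deviation is squared.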
Let's finish the calculation for our example. The sum of squares is 9 + 4 + 1 + 0 + 1 + 4 + 9 = 28. Dividing it by N, the number of data points (7), gives a variance of 28 / 7 = 4.
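To double-check the arithmetic, the sketch below repeats each step in Python and compares the result with `statistics.pvariance` from Python's standard library, which computes the population variance (dividing by N).

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7]
n = len(data)                                  # N = 7
mean = sum(data) / n                           # μ = 4.0
squares = [(x - mean) ** 2 for x in data]      # [9.0, 4.0, 1.0, 0.0, 1.0, 4.0, 9.0]
sum_of_squares = sum(squares)                  # 28.0
variance = sum_of_squares / n                  # 28 / 7 = 4.0

print(variance)                     # 4.0
print(statistics.pvariance(data))   # standard-library population variance; agrees with the above
```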
In conclusion, what we calculated here is the variance for a population. The calculation for a sample is quite similar, although the population and sample variances can vary (no pun intended); we are not going into that here.
P.S.: It is common practice to use a denominator of n - 1 (the number of sample data points minus one) when calculating the variance of a sample. This adjustment, known as Bessel's correction, makes the sample variance an unbiased estimate of the population variance. We will not be going into this area in this book, but feel free to research it if you are curious.
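For the curious, here is a short Python sketch of the difference, treating our example data set as if it were a sample purely for illustration. Python's standard `statistics` module provides both versions: `pvariance` divides by N, and `variance` divides by n - 1.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7]   # treated as a sample only for illustration
n = len(data)
mean = sum(data) / n
sum_of_squares = sum((x - mean) ** 2 for x in data)   # 28.0

population_var = sum_of_squares / n        # divide by N:     28 / 7 = 4.0
sample_var = sum_of_squares / (n - 1)      # divide by n - 1: 28 / 6 ≈ 4.67

# Standard-library equivalents: pvariance divides by N, variance by n - 1.
print(population_var, statistics.pvariance(data))
print(sample_var, statistics.variance(data))
```

Dividing by the smaller n - 1 always yields a slightly larger value, and the gap shrinks as the sample grows.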