OdinSchool

Median


Data is what it is. You don't get to decide whether it contains outliers. You have to work with it no matter what.

Given this, what are your alternatives to the mean?

First up, we have the median.

It's easy to compute. You start by sorting the data in ascending or descending order. Yes, either way works.

Once you have ordered the data, you just have to pick the middle value.

If you have 9 ordered data points, you pick the 5th value as the median. The fifth value splits the data in half, with four values on either side.

What if you have an even number of values, say 10? Simple: you pick the two values in the middle, the 5th and 6th, add them, and divide the sum by 2.
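The two rules above can be sketched in a few lines of Python. This is only a minimal illustration: the function name `median_by_hand` and the sample numbers are made up for this example and are not part of the lesson.

```python
def median_by_hand(values):
    """Return the median of a non-empty list of numbers."""
    if not values:
        raise ValueError("median of an empty data set is undefined")
    ordered = sorted(values)          # ascending order; descending works too
    count = len(ordered)
    middle = count // 2               # 0-based index of the upper middle value
    if count % 2 == 1:
        # Odd count: one middle value, e.g. the 5th of 9 (index 4)
        return ordered[middle]
    # Even count: average the two middle values, e.g. the 5th and 6th of 10
    return (ordered[middle - 1] + ordered[middle]) / 2


nine_points = [3, 9, 1, 7, 5, 8, 2, 6, 4]    # made-up data, 9 values
ten_points = nine_points + [10]              # made-up data, 10 values

print(median_by_hand(nine_points))   # 5
print(median_by_hand(ten_points))    # 5.5
```

Note that positions in the text count from 1, while Python indexes count from 0, which is why the 5th value sits at index 4.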

Let's work with a couple of examples.

Back to our tennis players data.

Here it is for your convenience.

Novak Djokovic - 77

Daniil Medvedev - 83

Rafael Nadal - 85

Dominic Thiem - 82

Stefanos Tsitsipas - 85

Alexander Zverev - 86

Andrey Rublev - 85

Roger Federer - 85

Diego Sebastian Schwartzman - 65

Matteo Berrettini - 95

We have 10 values here and they are not ordered. Let's start by ordering them.

65 | 77 | 82 | 83 | 85 | 85 | 85 | 85 | 86 | 95

Given that we have an even number of values (10), we pick the two values in the middle of the data. How do we know which two they are?

We first divide the count (10) by 2. That gives us 5 (call this k).

The values at positions k and k+1 are our two middle values. In this case, they are the 5th and 6th values.

Looking up our ordered data above, we find these to be 85 and 85. The median works out to be (85 + 85) / 2 = 85.
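Here is the same calculation as a short Python sketch, using the ten values listed above. The variable names are chosen for this illustration, and `statistics.median` from Python's standard library is used only as a cross-check.

```python
import statistics

# Values in the order they are listed in the lesson
values = [77, 83, 85, 82, 85, 86, 85, 85, 65, 95]

ordered = sorted(values)
count = len(ordered)                 # 10, an even count
half = count // 2                    # 5

# Positions in the text are 1-based, Python indexes are 0-based,
# so the 5th and 6th values live at indexes 4 and 5.
lower_middle = ordered[half - 1]
upper_middle = ordered[half]
median_value = (lower_middle + upper_middle) / 2

print(ordered)        # [65, 77, 82, 83, 85, 85, 85, 85, 86, 95]
print(median_value)   # 85.0
assert median_value == statistics.median(values)
```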

What if we bring Yamamotoyama in again?

The data set looks like this:

65 | 77 | 82 | 83 | 85 | 85 | 85 | 85 | 86 | 95 | 272

We now have 11 values in the data set. To find the median when the number of values is odd (11 here), we take the number of values n, add 1 to it, and divide the total by 2. The result is the position of the median in the ordered data.

Median (for a data set with an odd number of values) = value at position (n + 1) / 2, where n is the number of values in the data set.

In our example we have 11 values, so the median sits at position 6 (you get this with (11 + 1) / 2). The value at the 6th position is 85.
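A short Python sketch of this odd-count case follows, with the mean computed alongside for contrast. The variable names are assumptions made for this example.

```python
import statistics

# The ten ordered values plus the new value 272
ordered = [65, 77, 82, 83, 85, 85, 85, 85, 86, 95, 272]

n = len(ordered)                       # 11, an odd count
position = (n + 1) // 2                # 6, the 1-based position of the median
median_value = ordered[position - 1]   # convert to a 0-based index

print(median_value)                    # 85
assert median_value == statistics.median(ordered)

# The mean, by contrast, is pulled up by the single large value:
# the values sum to 1100, and 1100 / 11 = 100.
mean_value = sum(ordered) / n
print(mean_value)                      # 100.0
```

The median is still 85, exactly what it was with the original ten values, while the mean lands above every value in the data set except the two largest.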

Shocking, is it not?

Clearly, the median has stood the test of Yamamotoyama!

But does that make it the default choice? Not really.

Imagine you are trying to make a statement about the average income of a country. It may not be realistic to collect every individual income and sort the lot to find the median. The mean only needs the total income and the number of people, so in such a case it can be the more practical choice.