Mean, Median, and Mode in Statistics

Nhan Tran
4 min read · Mar 14, 2019

Every numerical data set can be summarized by an average: a single value that represents the center of the data. There are several different kinds of average. Today we will introduce three of the most popular: mean, median, and mode.

Example of central tendency values in graph

Mean, median, and mode are averages, also called measures of central tendency, of a numerical data set. Before diving into each term, let's take a look at the example below:

Observation of pizza prices in NY and LA

Mean

The first measure we will study is the mean, also known as the arithmetic average. The mean is calculated by adding up all the data points and dividing the total by the number of data points.

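Mean formula: mean = (x1 + x2 + ... + xn) / n, where x1, ..., xn are the data points and n is the number of data points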

Applying this to the NY observations, the mean of pizza prices in NY is $11.00:

Finding Mean of pizza prices in NY

Note: The mean is the most common measure of central tendency, but it has a major downside: it is easily affected by outliers, that is, values that are significantly greater (or smaller) than the other values in the data set.
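To make the outlier effect concrete, here is a minimal Python sketch. The price list is an assumption: the original observation table is not reproduced in the text, so these illustrative values were chosen only to agree with the figures quoted in this article (11 NY observations, a mean of 11, and a $66.00 outlier). The helper name `mean` is ours as well.

```python
# Illustrative NY prices (assumed values, picked to match the figures
# quoted in the article, not copied from the original table).
ny_prices = [1, 2, 3, 3, 5, 6, 7, 8, 9, 11, 66]


def mean(values):
    """Add up all data points and divide by the number of data points."""
    return sum(values) / len(values)


# Mean of the full data set, outlier included.
mean_with_outlier = mean(ny_prices)

# Mean of the same data with the single $66 outlier left out.
mean_without_outlier = mean([p for p in ny_prices if p != 66])

print(mean_with_outlier)     # 11.0
print(mean_without_outlier)  # 5.5
```

With these assumed values, one extreme price is enough to double the mean, which is why the mean alone can give a misleading picture of a typical price.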

Median

The median is the middle value of a sorted data set. It is found by ordering all data points and picking the one in the middle (or, if there are two middle numbers, taking the mean of those two numbers). Let's find the median of each of our data sets.

Finding Median of pizza prices in NY and LA

As you can see, we have a total of 11 observations for NY, so the middle position is the 6th, which can be calculated as (11+1)/2=6. The value at that position is $6.00, so the median of pizza prices in NY is $6.00.

What about LA? We have 10 observations in LA, so the middle position falls between the 5th and 6th, which can be calculated as (10+1)/2=5.5. The median is therefore the mean of the 5th and 6th values, so the median of pizza prices in LA is $5.50.

Note: The median is not affected by outliers (such as the $66.00 price).
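The position rule above can be sketched in Python as follows. Both price lists are assumptions chosen to match the counts and medians quoted in the text (11 and 10 observations, medians of 6 and 5.5); the function name `median` and the replacement price of 12 are also just for illustration.

```python
# Illustrative prices (assumed values, consistent with the article's figures).
ny_prices = [1, 2, 3, 3, 5, 6, 7, 8, 9, 11, 66]
la_prices = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]


def median(values):
    """Return the middle value of the sorted data.

    For an even number of points, return the mean of the two middle values.
    """
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2  # 0-based index of the upper middle element
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


median_ny = median(ny_prices)  # 6th of 11 sorted values
median_la = median(la_prices)  # mean of the 5th and 6th of 10 sorted values

# Replace the $66 outlier with a more ordinary price: the median does not move.
median_ny_no_outlier = median([12 if p == 66 else p for p in ny_prices])

print(median_ny)             # 6
print(median_la)             # 5.5
print(median_ny_no_outlier)  # 6
```

Only the order of the values matters for the median, so changing how extreme the largest price is leaves the result untouched.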

Mode

The mode is the most frequent value, that is, the value that occurs the highest number of times.

Finding Mode of pizza prices in NY and LA

In the NY data set, you can see that $3.00 appears twice, more often than any other value. So the mode of pizza prices in NY is $3.00.

In the LA data set, no value appears twice (or more). So we can say there is no mode of pizza prices in LA.

In general, data sets with 2 or 3 modes are quite common. You can pick one of them depending on the purpose of your work.
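The counting behind the mode can be sketched with `collections.Counter` from the Python standard library. The price lists are assumed, illustrative values, and the helper `modes` is hypothetical: it follows this article's convention that a data set with no repeated value has no mode, and it returns every tied value when there is more than one mode.

```python
from collections import Counter

# Illustrative prices (assumed values, consistent with the article's figures).
ny_prices = [1, 2, 3, 3, 5, 6, 7, 8, 9, 11, 66]
la_prices = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]


def modes(values):
    """Return all most frequent values, or [] if no value repeats."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        # Every value appears exactly once: treat this as "no mode".
        return []
    return [value for value, count in counts.items() if count == highest]


modes_ny = modes(ny_prices)
modes_la = modes(la_prices)
modes_two = modes([2, 2, 5, 5, 7])

print(modes_ny)   # [3]
print(modes_la)   # []
print(modes_two)  # [2, 5]
```

Returning a list keeps the choice between several modes in the caller's hands, which fits the advice to pick one depending on the purpose of the work.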

Which measure is the best?

There is no single best measure, but relying on only one is definitely the worst choice!

These measures of central tendency should be used together rather than independently. Depending on the scenario, one measure may be more meaningful than the others, but looking at all of them together tells you more than any one of them alone.

Finding Mean, Median, and Mode in Microsoft Excel and Python

Excel, part of Microsoft's Office package, is one of the most popular and easy-to-use tools for working with data. In Excel, there are three formulas to find the mean, median, and mode:

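Mean, Median, and Mode formulas in Excel: =AVERAGE(your_data_set), =MEDIAN(your_data_set), and =MODE(your_data_set)

Note: your_data_set is the cell range that holds your data, and it should be a one-dimensional array (a single row or a single column).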

Python is more powerful and flexible than Excel. But it is a programming language, so you need an environment in which to write and run your code. Python is interpreted, so there is no separate compile step. We highly recommend using Spyder for this.

First, you need to import the statistics library.

…after that, you can call the statistics library by its short name, stats. You need to create a list containing the data set for NY and another for LA (lines 2 and 9 of the script). Then you can find the mean, median, and mode using the library's predefined functions.
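The original script is not reproduced in the text, so the following is only a sketch of what such a script could look like, not the author's exact code. The price lists are assumed, illustrative values chosen to match the results quoted in this article, and apart from `mode_la` the variable names are our own. One behavior worth knowing: `statistics.mode` raises `StatisticsError` on data with no unique mode in Python versions before 3.8, while from 3.8 onward it returns the first value encountered among the ties. The sketch handles both cases.

```python
import statistics as stats

# Illustrative NY prices (assumed values, consistent with the article's figures).
ny = [1, 2, 3, 3, 5, 6, 7, 8, 9, 11, 66]
mean_ny = stats.mean(ny)
median_ny = stats.median(ny)
mode_ny = stats.mode(ny)

# Illustrative LA prices (assumed values; no price repeats).
la = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
mean_la = stats.mean(la)
median_la = stats.median(la)
try:
    # Python 3.8+: returns the first value when every value is equally frequent.
    mode_la = stats.mode(la)
except stats.StatisticsError:
    # Python < 3.8: raises because there is no unique most common value.
    mode_la = None

print(mean_ny, median_ny, mode_ny)
print(mean_la, median_la, mode_la)
```

Because of this version difference, a value reported for `mode_la` on a data set without repeats should not be read as a real mode. On Python 3.8 or later, `statistics.multimode` returns all tied values and makes the situation explicit.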

Let's check the Variable explorer window in Spyder:

Variable explorer window in Spyder

You can see that the mean, median, and mode for the NY data set are 11, 6, and 3, exactly the same as the values we got from the manual calculation and from Excel. The same holds for the mean and median of the LA data set, but not for its mode. Why? Let's print out the value of mode_la:

print(mode_la)

That is because there is no mode for pizza prices in LA. As mentioned before, no value in the LA data set appears twice or more, so we can say "there is no mode for the LA data set".

Conclusion:

  • Mean: the average value.
  • Median: the middle value of a sorted data set.
  • Mode: the most frequently occurring value in the data set.
  • There is no single best measure (among mean, median, and mode), but relying on only one is definitely the worst choice!
