Percentiles and Quartiles

John's accountancy firm is advertising for new staff to join the company and has set an entrance test to examine the ability of candidates to answer questions on statistics. In a statement on the application form, the company states that all candidates above the 80th percentile will be offered an interview. What does this mean?

The median is a very special example of a percentile. It is placed exactly halfway through a list of ordered data so that 50% of the data is smaller than the median. Positions other than the median can, however, also be useful.

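The tenth percentile, for example, lies such that 10% of the data is smaller than its value. The 75th percentile lies such that 75% of the values are smaller than its value. In the same way, a candidate above the 80th percentile has scored higher than at least 80% of the candidates who took the test.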
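As a rough illustration of this idea, the Python sketch below counts what percentage of a data set lies below a chosen value. The function name `percent_below` and the list of scores are made up for this example; they are not taken from the firm's test.

```python
def percent_below(data, value):
    """Return the percentage of items in data that are strictly smaller than value."""
    smaller = sum(1 for x in data if x < value)
    return 100 * smaller / len(data)


# Hypothetical test scores for ten candidates (illustrative only).
scores = [35, 42, 48, 51, 55, 60, 64, 70, 77, 85]

# Eight of the ten scores are below 77, so 77 sits at the 80th percentile.
print(percent_below(scores, 77))  # 80.0
```

With these made-up scores, only the candidate who scored 85 lies above the 80th percentile.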

Quartiles

Two very important percentiles are the lower and upper quartiles. These lie 25% and 75% of the way through the data respectively.

Use the following rules to estimate the positions of each quartile within a set of ordered data:

$\displaystyle {{Q}_{1}}$ = lower quartile = value in position $\displaystyle \frac{1}{4}(n+1)$

$\displaystyle {{Q}_{2}}$ = median

$\displaystyle {{Q}_{3}}$ = upper quartile = value in position $\displaystyle \frac{3}{4}(n+1)$

If the position does not turn out to be a whole number, you simply find the mean of the pair of values on either side. For example, if the position of the lower quartile turns out to be 5.25, then you find the mean of the 5th and 6th values.
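The position rules can be sketched in Python as follows. This is only an illustration of the rule described here; the function names `quartile_positions` and `ranks_to_average` are assumptions chosen for this example.

```python
import math


def quartile_positions(n):
    """Return the (lower, upper) quartile positions for n ordered values."""
    return (n + 1) / 4, 3 * (n + 1) / 4


def ranks_to_average(position):
    """Return the 1-based ranks whose values are used at this position.

    A whole-number position uses that single rank; otherwise the two
    ranks on either side are used and their values are averaged.
    """
    if position == int(position):
        return [int(position)]
    return [math.floor(position), math.ceil(position)]


# Positions for a few data set sizes.
for n in (7, 12, 35):
    q1_pos, q3_pos = quartile_positions(n)
    print(n, q1_pos, ranks_to_average(q1_pos), q3_pos, ranks_to_average(q3_pos))

# A position of 5.25 means averaging the 5th and 6th values.
print(ranks_to_average(5.25))  # [5, 6]
```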

Interquartile range

As with the range, the interquartile range gives a measure of how spread out or consistent the data is. The main difference is that the interquartile range (IQR) avoids using extreme data by finding the difference between the lower and upper quartiles. You are, effectively, measuring the spread of the central 50% of the data.

$\displaystyle IQR={{Q}_{3}}-{{Q}_{1}}$

If one set of data has a smaller IQR than another set, then the first set is more consistent and less spread out. This can be a useful comparison tool.
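A comparison of this kind might be sketched in Python using the standard library's `statistics.quantiles`. Note the assumption: that function interpolates between neighbouring values instead of taking their mean, so it matches the rule in this section only when the quartile positions are whole numbers. The two data sets below are illustrative values with seven items each, chosen so that this is the case, and the helper name `iqr` is made up for this example.

```python
from statistics import quantiles


def iqr(data):
    """Return Q3 - Q1 using the standard library's quartile cut points."""
    # quantiles() sorts the data itself; its default method is "exclusive",
    # which places the quartiles at positions (n + 1)/4 and 3(n + 1)/4.
    q1, _median, q3 = quantiles(data, n=4)
    return q3 - q1


# Illustrative data sets (seven values each).
set_a = [8, 9, 10, 14, 14, 15, 19]
set_b = [11, 12, 12, 13, 13, 14, 15]

print(iqr(set_a), iqr(set_b))  # 6.0 2.0
```

Here the second set has the smaller IQR, so it is the more consistent of the two.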

Example 1: For each of the following sets of data calculate the median, upper and lower quartiles. In each case calculate the interquartile range.

a) 13, 12, 8, 6, 11, 14, 8, 5, 1, 10, 16, 12

First sort the data into ascending order.

1, 5, 6, 8, 8, 10, 11, 12, 12, 13, 14, 16

There is an even number of items (12). So, for the median, you find the mean of the middle pair, the first of which is in position $\displaystyle \frac{{12}}{2}=6$. So the median is $\displaystyle \frac{{(10+11)}}{2}=10.5$

There are 12 items so, for quartiles, you calculate the positions $\displaystyle \frac{1}{4}(12+1)=3.25$ and $\displaystyle \frac{3}{4}(12+1)=9.75$

Notice that these are not whole numbers, so the lower quartile will be the mean of the 3rd and 4th values, and the upper quartile will be the mean of the 9th and 10th values.

$\displaystyle {{Q}_{1}}=\frac{{(6+8)}}{2}=7$ and $\displaystyle {{Q}_{3}}=\frac{{(12+13)}}{2}=12.5$

Thus, $\displaystyle IQR=12.5-7=5.5$

b) 14, 10, 8, 19, 15, 14, 9

The ordered data is:

8, 9, 10, 14, 14, 15, 19

The number of data items is odd, so the median will be in position $\displaystyle \frac{{(7+1)}}{2}=4$. The median is 14.

There are seven items, so calculate $\displaystyle \frac{1}{4}(7+1)=2$ and $\displaystyle \frac{3}{4}(7+1)=6$

These are whole numbers so the lower quartile is in position two and the upper quartile is in position six.

So $\displaystyle {{Q}_{1}}=9$ and $\displaystyle {{Q}_{3}}=15$

$\displaystyle IQR=15-9=6$
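The working in Example 1 can be checked with a short Python sketch that applies the same position rules, taking the mean of the two neighbouring values when a position is not a whole number. The function names `value_at` and `summary` are assumptions made for this illustration.

```python
def value_at(ordered, position):
    """Return the value at a 1-based position in ordered data.

    A fractional position gives the mean of the values on either side.
    """
    lower = int(position)  # rank at, or just below, the position
    if position == lower:
        return ordered[lower - 1]
    return (ordered[lower - 1] + ordered[lower]) / 2


def summary(data):
    """Return (median, Q1, Q3, IQR) using the (n + 1) position rules."""
    ordered = sorted(data)
    n = len(ordered)
    median = value_at(ordered, (n + 1) / 2)
    q1 = value_at(ordered, (n + 1) / 4)
    q3 = value_at(ordered, 3 * (n + 1) / 4)
    return median, q1, q3, q3 - q1


print(summary([13, 12, 8, 6, 11, 14, 8, 5, 1, 10, 16, 12]))  # (10.5, 7.0, 12.5, 5.5)
print(summary([14, 10, 8, 19, 15, 14, 9]))                   # (14, 9, 15, 6)
```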

Example 2: Two companies sell sunflower seeds. Over the period of a year, seeds from Allbright produce flowers with a median height of 98 cm and an IQR of 13 cm. In the same year, seeds from Barstows produce flowers with a median height of 95 cm and an IQR of 4 cm. Which seeds would you buy if you wanted to enter a competition for growing the tallest sunflower, and why?

I would buy Barstows seeds. Although Allbright sunflowers seem taller (with a higher median), they are less consistent. So, whilst there is a chance of a very big sunflower, there is also a good chance of a small sunflower. Barstows sunflowers are a bit shorter, but are more consistent in their heights, so you are more likely to get flowers around the height of 95 cm.

Example 3: The back-to-back stem and leaf diagram shows the concentration of low density lipoprotein (bad) cholesterol in the blood, in milligrams per 100 ml of blood (mg/dl), for 70 adults, half of whom are smokers and half of whom are non-smokers.

a) Determine the median for each group.

The data is already ordered and there are 35 values in each set. $\displaystyle \frac{1}{2}(35+1)=18$, so the median is the 18th value.

Non-smokers: median = 128

Smokers: median = 164

b) Find the range for:

i) non-smokers: Range = 173 - 90 = 83

ii) smokers: Range = 215 - 112 = 103

c) Determine the interquartile range for:

First determine the positions of Q1 and Q3.

The lower quartile is in position $\displaystyle \frac{1}{4}(35+1)=9$, so it is the 9th value.

The upper quartile is in position $\displaystyle \frac{3}{4}(35+1)=27$, so it is the 27th value.

i) non-smokers

IQR=Q3-Q1=142-116=26 for non-smokers

ii) smokers

IQR=Q3-Q1=180-145=35 for smokers

d) LDL levels of <130 are desirable, levels of 130-160 are considered borderline high and levels >160 are considered high risk (more so for people with medical conditions that increase risk). Using these figures, comment on what the distribution on the stem and leaf diagram suggests.

For non-smokers, the data is skewed toward the lower levels on the stem and leaf diagram. More than half the values are in the desirable range, with only three in the high risk range. For smokers, the data is more spread out: only 3 values are in the desirable range and 20 are in the high risk category, suggesting that smokers have higher levels of bad cholesterol in general. However, without considering other risk factors or medical history, you cannot say this for certain from one set of data.

Tip! Remember to count the data in ascending order when you work with the left hand side. The lowest values are closest to the stem in each row.