Section 5 Lesson material
After completing this section you should be able to:
|
Measures of central tendency give us measures of where the middle of a set of data occurs, but this is not enough to characterize a set of data. Consider the following 2 data sets:
50 | 60 | 70 | 80 | 90
and
69 | 69 | 70 | 71 | 71
Both these data sets have a mean of 70. Yet the first data set is more widely dispersed than the second data set. So a measure of variation is clearly needed.
Consider the following data. It represents the actual weights (in ounces) of 20 steaks sold as 20 oz steaks at a restaurant. We will use this data throughout this section.
17 | 20 | 21 | 18 | 20 | 20 | 20 | 18 | 19 | 19
20 | 19 | 22 | 20 | 18 | 20 | 18 | 19 | 20 | 19
The range is the difference between the highest value and the lowest value in a data set. To compute it, simply subtract the lowest value from the highest value. In the example above the range is 22 - 17 = 5.
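As a minimal sketch, the same range calculation can be done in Python. The variable names (`weights`, `data_range`) are just illustrative choices and are not part of the lesson.

```python
# Steak weights (oz) from the example above.
weights = [17, 20, 21, 18, 20, 20, 20, 18, 19, 19,
           20, 19, 22, 20, 18, 20, 18, 19, 20, 19]

# Range = highest value minus lowest value.
data_range = max(weights) - min(weights)
print(data_range)  # 5
```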
Range can be misleading since it does not take into consideration every value. Consider each of the following data sets:
1 | 10 | 10 | 10 | 10
and
1 | 2 | 5 | 8 | 10
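Both have a range of 9, yet in the first data set four of the five values are identical, while the values in the second data set are spread across the whole interval. The range cannot tell these two situations apart.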
Standard Deviation and variance
A more accurate measure of variation can be given by the standard deviation of the data.
The standard deviation of a set of sample scores is a measure of variation of scores about the mean. It is calculated by
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
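Your calculator most likely has a button for standard deviation, but calculating at least one by hand gives you an idea of why this is such a valid measure of variation. Notice that the distance from each data point to the mean is calculated (this is $x - \bar{x}$ in the formula). If we were to add these all up, some would be positive and some would be negative, and the sum would be zero. To avoid this not-so-useful number we square each difference (this is $(x - \bar{x})^2$ in the formula) and add them all up (this gives $\sum (x - \bar{x})^2$ in the formula). We divide by $n - 1$ to average these variations, then we take the square root to account for the squaring we did earlier. The procedure for finding the standard deviation is as follows:
1. Find the mean $\bar{x}$ of the data.
2. Subtract the mean from each value to get $x - \bar{x}$.
3. Square each of these differences to get $(x - \bar{x})^2$.
4. Add up all the squared differences to get $\sum (x - \bar{x})^2$.
5. Divide this sum by $n - 1$.
6. Take the square root of the result.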
Here is an example using the data given above (the steak example):
First calculate the mean of this data; it is 19.35.
x | x - x̄ | (x - x̄)^2
17 | -2.35 | 5.5225
20 | 0.65 | 0.4225
21 | 1.65 | 2.7225
18 | -1.35 | 1.8225
20 | 0.65 | 0.4225
20 | 0.65 | 0.4225
20 | 0.65 | 0.4225
18 | -1.35 | 1.8225
19 | -0.35 | 0.1225
19 | -0.35 | 0.1225
20 | 0.65 | 0.4225
19 | -0.35 | 0.1225
20 | 0.65 | 0.4225
22 | 2.65 | 7.0225
18 | -1.35 | 1.8225
20 | 0.65 | 0.4225
18 | -1.35 | 1.8225
19 | -0.35 | 0.1225
20 | 0.65 | 0.4225
19 | -0.35 | 0.1225
The sum of the last column is 26.55. Dividing this by n - 1 = 19 yields 1.397368. Finally, taking the square root gives a standard deviation of about 1.182.
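If you want to check the hand calculation, here is a short Python sketch that follows the same steps on the steak data. The variable names are illustrative assumptions, not anything the lesson prescribes.

```python
import math

# Steak weights (oz) from the example above.
weights = [17, 20, 21, 18, 20, 20, 20, 18, 19, 19,
           20, 19, 22, 20, 18, 20, 18, 19, 20, 19]

n = len(weights)
mean = sum(weights) / n                    # 19.35

# Distance of each value from the mean; these add up to (about) zero.
deviations = [x - mean for x in weights]

# Square each distance, then add the squares up.
sum_sq = sum(d ** 2 for d in deviations)   # about 26.55

# Divide by n - 1, then undo the squaring with a square root.
std_dev = math.sqrt(sum_sq / (n - 1))

print(round(mean, 2), round(sum_sq, 2), round(std_dev, 3))  # 19.35 26.55 1.182
```

Note that the deviations cancel out when added without squaring, which is why the squaring step is needed.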
The sample variance is the standard deviation squared. To calculate it, do all the steps for the standard deviation except taking the final square root. Here is the formula:
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$
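In place of a calculator, Python's standard `statistics` module can play the role of the "technology". This is only a sketch to show that the variance is the square of the standard deviation for the steak data.

```python
import statistics

# Steak weights (oz) from the example above.
weights = [17, 20, 21, 18, 20, 20, 20, 18, 19, 19,
           20, 19, 22, 20, 18, 20, 18, 19, 20, 19]

s2 = statistics.variance(weights)  # sample variance (divides by n - 1)
s = statistics.stdev(weights)      # sample standard deviation

print(round(s2, 4), round(s, 3))   # 1.3974 1.182
```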
There is an alternative form for calculating the standard deviation. You can read about it if you want, but usually you will be using your technology (calculator) to calculate this value, so that formula is probably not useful.
What is useful, however, is a formula for estimating the standard deviation from a frequency table. The estimation formula used for this is
$$s = \sqrt{\frac{n \sum (f \cdot x^2) - \left(\sum (f \cdot x)\right)^2}{n(n - 1)}}$$
where x is the class mark, f is the class frequency and n is the sample size (the sum of the frequencies).
It looks more complicated than it is - here is an example using the ACT score data from section 1. Recall the frequency table
ACT SCORE | Frequency | Class Mark
0-4 | 3 | 2
5-9 | 10 | 7
10-14 | 17 | 12
15-19 | 35 | 17
20-24 | 25 | 22
25-29 | 15 | 27
30-34 | 5 | 32
So here is the estimate:
ACT SCORE | Frequency, f | Class Mark, x | f·x | f·x^2
0-4 | 3 | 2 | 6 | 12
5-9 | 10 | 7 | 70 | 490
10-14 | 17 | 12 | 204 | 2448
15-19 | 35 | 17 | 595 | 10115
20-24 | 25 | 22 | 550 | 12100
25-29 | 15 | 27 | 405 | 10935
30-34 | 5 | 32 | 160 | 5120
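So $n = 110$, $\sum (f \cdot x) = 1990$ and $\sum (f \cdot x^2) = 41220$, hence the estimated standard deviation is:
$$s = \sqrt{\frac{110(41220) - (1990)^2}{110(109)}} = \sqrt{\frac{574100}{11990}} \approx 6.92$$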
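The frequency-table estimate can also be sketched in Python. This assumes the usual grouped-data estimate, s = sqrt((n·Σ(f·x²) − (Σ(f·x))²) / (n(n − 1))), and uses the class marks and frequencies from the ACT table; the variable names are illustrative.

```python
import math

# Class marks and frequencies from the ACT score table above.
class_marks = [2, 7, 12, 17, 22, 27, 32]
frequencies = [3, 10, 17, 35, 25, 15, 5]

n = sum(frequencies)                                                  # 110
sum_fx = sum(f * x for f, x in zip(frequencies, class_marks))         # 1990
sum_fx2 = sum(f * x ** 2 for f, x in zip(frequencies, class_marks))   # 41220

# Grouped-data estimate of the sample standard deviation (assumed formula).
s_est = math.sqrt((n * sum_fx2 - sum_fx ** 2) / (n * (n - 1)))

print(n, sum_fx, sum_fx2, round(s_est, 2))  # 110 1990 41220 6.92
```

This is only an estimate because every score in a class is treated as if it were equal to the class mark.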
More notation:
$s$ denotes the standard deviation of a set of sample data
$\sigma$ denotes the standard deviation of a set of population data
$s^2$ denotes the variance of a set of sample data
$\sigma^2$ denotes the variance of a set of population data
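To see how the sample and population versions differ in practice, here is a small Python sketch using the standard `statistics` module. Treating the steak data as a whole population is purely an assumption for illustration; the sample version divides by n - 1, while the population version divides by n.

```python
import statistics

# Steak weights (oz) from the example above.
weights = [17, 20, 21, 18, 20, 20, 20, 18, 19, 19,
           20, 19, 22, 20, 18, 20, 18, 19, 20, 19]

s = statistics.stdev(weights)       # sample standard deviation (n - 1)
sigma = statistics.pstdev(weights)  # population standard deviation (n)

print(round(s, 3), round(sigma, 3))  # 1.182 1.152
```

The population value is always a little smaller for the same data, since the same sum is divided by a larger number.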
Interpretation of standard deviation:
Here are some ideas to remember about standard deviation and variance:
A small standard deviation means the data is close together; a large standard deviation means the data is widely spread.
The range rule of thumb states that for typical data sets, the range of the data is about 4 standard deviations wide, so the standard deviation is about the range divided by 4. This is a very rough estimate.
The 68-95-99.7 rule states that about 68% of all scores fall within 1 standard deviation of the mean, about 95% of all scores fall within 2 standard deviations of the mean and about 99.7% of all scores fall within 3 standard deviations of the mean. This only works for data that is approximately bell shaped. The picture below gives the idea.
The above rule tells us that data more than 2 standard deviations from the mean is unusual, while data within 2 standard deviations is normal.
Chebyshev's Theorem states that at least 75% of all scores fall within 2 standard deviations of the mean and at least 89% fall within 3 standard deviations of the mean. This works for ANY distribution (not just bell shaped).
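The ideas above can be tried out on the steak data with a short Python sketch. The helper names `share_within` and `chebyshev_bound` are hypothetical, made up for this illustration, and the bound uses the general form of Chebyshev's Theorem, 1 - 1/k², which gives the 75% and 89% figures for k = 2 and k = 3.

```python
import statistics

# Steak weights (oz) from the example above.
weights = [17, 20, 21, 18, 20, 20, 20, 18, 19, 19,
           20, 19, 22, 20, 18, 20, 18, 19, 20, 19]

mean = statistics.mean(weights)
s = statistics.stdev(weights)

# Range rule of thumb: standard deviation is roughly range / 4.
rough_s = (max(weights) - min(weights)) / 4


def share_within(data, k):
    """Fraction of values within k standard deviations of the mean."""
    m = statistics.mean(data)
    sd = statistics.stdev(data)
    return sum(1 for x in data if abs(x - m) <= k * sd) / len(data)


def chebyshev_bound(k):
    """Minimum fraction within k standard deviations, for any distribution."""
    return 1 - 1 / k ** 2


print(rough_s, round(s, 3))                          # 1.25 1.182
print(share_within(weights, 2), chebyshev_bound(2))  # 0.95 0.75

# Values more than 2 standard deviations from the mean are "unusual".
unusual = [x for x in weights if abs(x - mean) > 2 * s]
print(unusual)                                       # [22]
```

Here the rough estimate 1.25 is close to the calculated 1.182, and the share within 2 standard deviations is well above the Chebyshev minimum, as it must be.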
