CHAPTER 2

Section 5

Lesson material

Measures of variation

Range

Standard Deviation and Variance

Interpreting the Standard Deviation

Objectives: after completing this section you should be able to:

  1. Construct the range, standard deviation and variance for any data set
  2. Interpret these results, especially the standard deviation

Measures of Variation

Measures of central tendency give us measures of where the middle of a set of data occurs, but this is not enough to characterize a set of data. Consider the following 2 data sets:

50, 60, 70, 80, 90

and

69, 69, 70, 71, 71

Both these data sets have a mean of 70. Yet the first data set is more widely dispersed than the second data set. So a measure of variation is clearly needed.

Consider the following data, which represent the actual weights, in ounces, of 20 servings of a 20 oz steak at a restaurant. We will use these data throughout this section.

17, 20, 21, 18, 20

20, 20, 18, 19, 19

20, 19, 22, 20, 18

20, 18, 19, 20, 19

Range

The range is the difference between the highest value and the lowest value in a data set. To compute it, simply subtract the lowest value from the highest value. In the example above the range is 22 - 17 = 5.

Range can be misleading since it does not take into consideration every value. Consider each of the following data sets:

1, 10, 10, 10, 10

and

1, 2, 5, 8, 10

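Both have a range of 9, yet the values are distributed very differently: in the first data set four of the five values are identical, while in the second the values are scattered across the whole interval from 1 to 10.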
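The range calculations above are easy to check with a few lines of code. The following is a minimal Python sketch; the function name `data_range` is an illustrative choice, not part of the lesson.

```python
def data_range(values):
    """Return the range: highest value minus lowest value."""
    return max(values) - min(values)

# Steak weights (in ounces) from the example above
steaks = [17, 20, 21, 18, 20, 20, 20, 18, 19, 19,
          20, 19, 22, 20, 18, 20, 18, 19, 20, 19]

# The two small data sets that share a range
set_a = [1, 10, 10, 10, 10]
set_b = [1, 2, 5, 8, 10]

print(data_range(steaks))  # 22 - 17 = 5
print(data_range(set_a))   # 10 - 1 = 9
print(data_range(set_b))   # 10 - 1 = 9
```

Only the two extreme values enter the calculation, which is why the two small data sets get the same range even though their values are arranged so differently.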

Standard Deviation and variance

A more accurate measure of variation can be given by the standard deviation of the data.

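The standard deviation of a set of sample scores is a measure of variation of scores about the mean. It is calculated by

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

where $x$ is each individual score, $\bar{x}$ is the sample mean and $n$ is the number of scores.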

Your calculator most likely has a button for standard deviation, but calculating at least one by hand gives you an idea of why this is such a useful measure of variation. Notice that the distance from each data point to the mean is calculated (this is $x - \bar{x}$ in the formula). If we were to add these all up, some would be positive and some would be negative, and the sum would be zero. To avoid this not-so-useful number we square each difference (this is $(x - \bar{x})^2$ in the formula) and add them all up (this gives $\sum (x - \bar{x})^2$ in the formula). We divide by n-1 to average these squared differences, then we take the square root to undo the squaring we did earlier. The procedure for finding the standard deviation is as follows:

  1. Find the mean of the scores
  2. Subtract the mean from each individual score
  3. Square each of the values in step 2
  4. Add up all the squares obtained in step 3
  5. Divide the total in step 4 by n-1
  6. Find the square root of step 5.
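The six steps above can be sketched in Python as follows. This is a minimal illustration rather than a replacement for your calculator; the function name `sample_std_dev` is made up for this example, and the data are the two small data sets from the start of this section.

```python
import math

def sample_std_dev(scores):
    """Sample standard deviation, following the six steps one at a time."""
    n = len(scores)
    mean = sum(scores) / n                   # step 1: find the mean
    deviations = [x - mean for x in scores]  # step 2: subtract the mean
    squares = [d ** 2 for d in deviations]   # step 3: square each difference
    total = sum(squares)                     # step 4: add up the squares
    variance = total / (n - 1)               # step 5: divide by n-1
    return math.sqrt(variance)               # step 6: take the square root

wide = [50, 60, 70, 80, 90]
narrow = [69, 69, 70, 71, 71]

print(round(sample_std_dev(wide), 2))    # 15.81
print(round(sample_std_dev(narrow), 2))  # 1.0
```

Both data sets have a mean of 70, but the standard deviation is about 15.81 for the first and 1 for the second, which agrees with the earlier observation that the first is more widely dispersed.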

Here is an example using the data given above (the steak example):

First calculate the mean of this data: it is 19.35.

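| $x$ | $x - \bar{x}$ | $(x - \bar{x})^2$ |
| --- | --- | --- |
| 17 | -2.35 | 5.5225 |
| 20 | 0.65 | 0.4225 |
| 21 | 1.65 | 2.7225 |
| 18 | -1.35 | 1.8225 |
| 20 | 0.65 | 0.4225 |
| 20 | 0.65 | 0.4225 |
| 20 | 0.65 | 0.4225 |
| 18 | -1.35 | 1.8225 |
| 19 | -0.35 | 0.1225 |
| 19 | -0.35 | 0.1225 |
| 20 | 0.65 | 0.4225 |
| 19 | -0.35 | 0.1225 |
| 20 | 0.65 | 0.4225 |
| 22 | 2.65 | 7.0225 |
| 18 | -1.35 | 1.8225 |
| 20 | 0.65 | 0.4225 |
| 18 | -1.35 | 1.8225 |
| 19 | -0.35 | 0.1225 |
| 20 | 0.65 | 0.4225 |
| 19 | -0.35 | 0.1225 |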

The sum of the last column is 26.55. Dividing this by n-1 = 19 yields 1.397368. Finally, taking the square root gives a standard deviation of about 1.182.

The sample variance is the standard deviation squared. To calculate it, do all the steps for the standard deviation except taking the final square root. Here is the formula:

$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$
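If you have Python available instead of a calculator, its standard `statistics` module provides `stdev` and `variance` functions that divide by n-1 in the same way. The sketch below applies them to the steak data as a check on the hand calculation.

```python
import statistics

# Steak weights (in ounces) from the example above
steaks = [17, 20, 21, 18, 20, 20, 20, 18, 19, 19,
          20, 19, 22, 20, 18, 20, 18, 19, 20, 19]

s = statistics.stdev(steaks)             # sample standard deviation
s_squared = statistics.variance(steaks)  # sample variance

print(round(statistics.mean(steaks), 2))  # 19.35
print(round(s_squared, 6))                # 1.397368
print(round(s, 3))                        # 1.182
```

The variance is the value obtained just before the final square root, so squaring the standard deviation returns it.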

There is an alternative form for calculating the standard deviation. You can read this if you want, but usually you will be using your technology (calculator) to calculate this value, so this formula is probably not useful.

What is useful, however, is a formula for estimating the standard deviation from a frequency table. The estimation formula used for this is

$$s = \sqrt{\frac{n \sum (f \cdot x^2) - \left(\sum f \cdot x\right)^2}{n(n - 1)}}$$

where $x$ is the class mark, $f$ is the class frequency and $n$ is the sample size (the total of the frequencies).

It looks more complicated than it is - here is an example using the ACT score data from section 1. Recall the frequency table

| ACT Score | Frequency | Class Mark |
| --- | --- | --- |
| 0-4 | 3 | 2 |
| 5-9 | 10 | 7 |
| 10-14 | 17 | 12 |
| 15-19 | 35 | 17 |
| 20-24 | 25 | 22 |
| 25-29 | 15 | 27 |
| 30-34 | 5 | 32 |

So here is the estimate:

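| ACT Score | Frequency, $f$ | Class Mark, $x$ | $f \cdot x$ | $f \cdot x^2$ |
| --- | --- | --- | --- | --- |
| 0-4 | 3 | 2 | 6 | 12 |
| 5-9 | 10 | 7 | 70 | 490 |
| 10-14 | 17 | 12 | 204 | 2448 |
| 15-19 | 35 | 17 | 595 | 10115 |
| 20-24 | 25 | 22 | 550 | 12100 |
| 25-29 | 15 | 27 | 405 | 10935 |
| 30-34 | 5 | 32 | 160 | 5120 |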

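So $\sum f \cdot x = 1990$ and $\sum f \cdot x^2 = 41220$, with $n = 110$, hence the estimated standard deviation is:

$$s = \sqrt{\frac{110(41220) - (1990)^2}{110(109)}} = \sqrt{\frac{574100}{11990}} \approx 6.92$$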
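The frequency-table estimate can also be sketched in Python. The code below assumes the estimation formula takes the common form sqrt([n Σ(f·x²) - (Σ f·x)²] / [n(n-1)]), which matches the f·x and f·x² columns worked out above; the function name `grouped_std_dev` is illustrative.

```python
import math

def grouped_std_dev(class_marks, frequencies):
    """Estimate the sample standard deviation from a frequency table."""
    n = sum(frequencies)  # sample size = total of the frequencies
    sum_fx = sum(f * x for x, f in zip(class_marks, frequencies))
    sum_fx2 = sum(f * x ** 2 for x, f in zip(class_marks, frequencies))
    # Assumed formula: sqrt of [n*sum(f*x^2) - (sum(f*x))^2] / [n*(n-1)]
    return math.sqrt((n * sum_fx2 - sum_fx ** 2) / (n * (n - 1)))

# ACT score table: class marks and class frequencies
marks = [2, 7, 12, 17, 22, 27, 32]
freqs = [3, 10, 17, 35, 25, 15, 5]

print(round(grouped_std_dev(marks, freqs), 2))  # 6.92
```

This is only an estimate, because every score in a class is treated as if it were equal to the class mark.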

More notation:

$s$ denotes the standard deviation of the set of sample data

$\sigma$ denotes the standard deviation of the set of population data

$s^2$ denotes the variance of the set of sample data

$\sigma^2$ denotes the variance of the set of population data

Interpretation of standard deviation:

There are some ideas you should remember about standard deviation and variance:

A small standard deviation means the data are close together; a large standard deviation means the data are widely spread.

The range rule of thumb states that for typical data sets, the range of the data is about 4 standard deviations wide, so the standard deviation is about the range divided by 4. This is a very rough estimate.

The 68-95-99.7 rule states that about 68% of all scores fall within 1 standard deviation of the mean, about 95% fall within 2 standard deviations of the mean and about 99.7% fall within 3 standard deviations of the mean. This only works for data that is approximately bell shaped. The picture below gives the idea.

The above rule tells us that data more than 2 standard deviations from the mean is unusual, while data within 2 standard deviations is usual.

Chebyshev's Theorem states that at least 75% of all scores fall within 2 standard deviations of the mean and at least 89% fall within 3 standard deviations of the mean. This works for ANY distribution (not just bell shaped).
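To see how these rules of thumb behave on real numbers, the sketch below applies them to the steak data. It is only an illustration: the helper names are made up, the Chebyshev helper uses the general 1 - 1/k² form of the theorem (the 75% and 89% figures are its k = 2 and k = 3 cases), and with just 20 values the percentages can only roughly follow the bell-shaped pattern.

```python
import statistics

# Steak weights (in ounces) from the example earlier in this section
steaks = [17, 20, 21, 18, 20, 20, 20, 18, 19, 19,
          20, 19, 22, 20, 18, 20, 18, 19, 20, 19]
mean = statistics.mean(steaks)
s = statistics.stdev(steaks)

def fraction_within(k):
    """Fraction of the steak weights within k standard deviations of the mean."""
    inside = [x for x in steaks if abs(x - mean) <= k * s]
    return len(inside) / len(steaks)

def chebyshev_bound(k):
    """Chebyshev's minimum fraction within k standard deviations (k > 1)."""
    return 1 - 1 / k ** 2

# Range rule of thumb: the standard deviation is roughly range / 4
print((max(steaks) - min(steaks)) / 4)  # 1.25, compared with s of about 1.182

for k in (2, 3):
    print(k, fraction_within(k), round(chebyshev_bound(k), 2))
# 2 0.95 0.75
# 3 1.0 0.89
```

Here 95% of the weights fall within 2 standard deviations of the mean, comfortably above the 75% that Chebyshev's Theorem guarantees, and the only weight more than 2 standard deviations away (the 22 oz steak) is the one the rule would call unusual.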

Need more help with standard deviation and variance? See these web sites:

Exploring data

Think no one cares about standard deviation? Read this

For help doing standard deviation on the TI-83 (from straight data and NOT a frequency table) click here

For help doing standard deviation on the TI-83 from a frequency table click here
