Section 2 - Summarizing data with frequency tables
Objectives (what you should know how to do) after completing this section
SUMMARIZING DATA WITH FREQUENCY TABLES
Data is often collected to answer a question or to give us insight into some situation. For example, we might look at college entrance exam scores to determine how well prepared students are for college. Whenever we look at data, we want a variety of tools to help us understand the data set. The material in this chapter gives us those tools.
When analyzing a data set, we should first consider whether the data comes from a complete population (remember, this means every member of the group of interest, for example all the test scores in this class) or from a sample. Methods of descriptive statistics are used to summarize the important characteristics of a set of population data.
One easy way to summarize data is the frequency table. A frequency table lists categories (called classes) of scores along with the number of scores that fall into each category. Consider the example below, which involves the ACT math scores of incoming students at a small college.
ACT Score | Frequency
0-4 | 3
5-9 | 10
10-14 | 17
15-19 | 35
20-24 | 25
25-29 | 15
30-34 | 5
This frequency table has 7 classes (0-4, 5-9, 10-14, 15-19, 20-24, 25-29, 30-34). The frequency is the number of students whose score falls in that class.
Here are some important terms for frequency tables:
Lower class limits are the smallest numbers that can actually belong to a class (in the table above these are 0, 5, 10, 15, 20, 25, 30).
Upper class limits are the largest numbers that can actually belong to a class (in the table above these are 4, 9, 14, 19, 24, 29, 34).
Class boundaries are the numbers used to separate classes, but without the gaps created by class limits. To get class boundaries, find the size of the gap between the upper class limit of one class and the lower class limit of the next class. Add half of this gap to each upper class limit to find the upper class boundaries, and subtract half of it from each lower class limit to find the lower class boundaries. For example, in the table above the gap between the upper class limit of one class and the lower class limit of the next (such as 4 and 5) is 1, so you add 1/2 to each upper class limit and subtract 1/2 from each lower class limit. Here is a chart of the class boundaries:
Class Boundaries
-0.5 to 4.5
4.5 to 9.5
9.5 to 14.5
14.5 to 19.5
19.5 to 24.5
24.5 to 29.5
29.5 to 34.5
Class marks are the midpoints of the classes. To calculate them, add the lower class limit and the upper class limit and divide by two. In the table above they are 2, 7, 12, 17, 22, 27, 32.
Class width is the difference between two consecutive lower class limits or two consecutive lower class boundaries (in the example above it is 5).
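The definitions above can be checked with a few lines of Python. This is only a sketch for the ACT table: the variable names (`classes`, `gap`, `boundaries`, `marks`, `width`) are made up for illustration and are not part of the original notes.

```python
# Class limits from the ACT table, as (lower limit, upper limit) pairs.
classes = [(0, 4), (5, 9), (10, 14), (15, 19), (20, 24), (25, 29), (30, 34)]

# Gap between the upper limit of one class and the lower limit of the next.
gap = classes[1][0] - classes[0][1]

# Class boundaries: move each limit outward by half the gap.
boundaries = [(lo - gap / 2, hi + gap / 2) for lo, hi in classes]

# Class marks: midpoint of the lower and upper class limits.
marks = [(lo + hi) / 2 for lo, hi in classes]

# Class width: difference between two consecutive lower class limits.
width = classes[1][0] - classes[0][0]

print("boundaries:", boundaries)
print("marks:", marks)
print("width:", width)
```

Note that the upper boundary of each class equals the lower boundary of the next one, which is exactly what "without the gap" means.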
HOW TO CREATE YOUR OWN FREQUENCY TABLE
Let's work out the method with an example. Here is some data on distance to school for all kindergarten students at Sunnyflowery Elementary. The distances are given to the nearest 0.1 mile.
Student ID | Miles | Student ID | Miles | Student ID | Miles | Student ID | Miles
1362 | 1.5 | 2877 | 1.0 | 4355 | 1.2 | 6573 | 0.4
1486 | 2.1 | 2964 | 0.5 | 4454 | 1.5 | 8436 | 2.8
1587 | 1.3 | 3491 | 0.8 | 4531 | 1.7 | 8592 | 0.3
1877 | 0.2 | 3588 | 0.3 | 5482 | 2.3 | 8854 | 0.1
1932 | 2.5 | 3711 | 1.5 | 5533 | 1.4 | 8964 | 2.2
1946 | 0.7 | 3780 | 0.2 | 5717 | 8.5 | |
2103 | 3.5 | 3921 | 1.3 | 6307 | 6.2 | |
The ID number data is interesting, but not particularly important. We want to create a frequency table for the miles the students must travel to school.
To create a frequency table, we first need to decide how many classes we want. Normally we choose between 5 and 20 classes (people can't seem to handle more than 20 easily). Looking at the data, I see that students travel from 0.1 to 8.5 miles to school each day. I'm going to choose 15 classes.
We now need to determine the class width. You can do this by taking the range of the data (highest value - lowest value) and dividing by the number of classes. For our example this is (8.5 - 0.1)/15 = 0.56. Always round this number up (not off), to the same number of decimal places as the original data; in this case we get 0.6.
Now choose as the lower limit of the first class either the lowest value or a convenient value slightly less than it (but no lower than the lowest data point minus the class width). Our lowest value is 0.1, so I'll choose 0. Add the class width to the starting point to get the second lower class limit (in our case 0.6), and keep doing this for all classes. Here is what your chart should look like with just the lower class limits:
0.0-
0.6-
1.2-
1.8-
2.4-
3.0-
3.6-
4.2-
4.8-
5.4-
6.0-
6.6-
7.2-
7.8-
8.4-
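Now that we have the lower class limits in a column, we can easily identify the upper class limits. Each one is the largest value, at the precision of the data (0.1 here), that is still below the next lower class limit: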
0.0-0.5
0.6-1.1
1.2-1.7
1.8-2.3
2.4-2.9
3.0-3.5
3.6-4.1
4.2-4.7
4.8-5.3
5.4-5.9
6.0-6.5
6.6-7.1
7.2-7.7
7.8-8.3
8.4-8.9
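The class width and class limit steps can be sketched in Python as follows. The helper name `class_limits` and its parameters are hypothetical, introduced only for this illustration. The sketch assumes the data are recorded to one decimal place and does its arithmetic in whole tenths of a mile, so that decimals such as 0.6 do not pick up floating-point error.

```python
import math

def class_limits(lowest, highest, n_classes, start, decimals=1):
    """Return (class width, list of (lower limit, upper limit)) pairs.

    Hypothetical helper: all arithmetic is done in integer units
    (tenths when decimals=1) and converted back at the end.
    """
    scale = 10 ** decimals
    lo_u, hi_u, start_u = (round(v * scale) for v in (lowest, highest, start))
    # Class width = range / number of classes, always rounded UP.
    width_u = math.ceil((hi_u - lo_u) / n_classes)
    limits = []
    for i in range(n_classes):
        lower = start_u + i * width_u
        upper = lower + width_u - 1  # one data unit below the next lower limit
        limits.append((lower / scale, upper / scale))
    return width_u / scale, limits

# Kindergarten data: lowest 0.1, highest 8.5, 15 classes, starting at 0.
width, limits = class_limits(0.1, 8.5, 15, start=0.0)
print("class width:", width)
for lower, upper in limits:
    print(f"{lower:.1f}-{upper:.1f}")
```

Working in tenths is a design choice for this sketch: adding 0.6 to itself repeatedly in floating point can drift slightly, while adding the integer 6 cannot.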
Now count the number of students who fall into each class. Here is the finished frequency table:
Miles | Frequency
0.0-0.5 | 7
0.6-1.1 | 3
1.2-1.7 | 8
1.8-2.3 | 3
2.4-2.9 | 2
3.0-3.5 | 1
3.6-4.1 | 0
4.2-4.7 | 0
4.8-5.3 | 0
5.4-5.9 | 0
6.0-6.5 | 1
6.6-7.1 | 0
7.2-7.7 | 0
7.8-8.3 | 0
8.4-8.9 | 1
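The counting step can also be done by machine. The sketch below assumes the 26 distances are typed in from the student table (the list name `miles` and the constants are illustrative) and again works in tenths of a mile: with a first lower limit of 0 and a class width of 0.6, a distance of t tenths falls in class number t // 6.

```python
# The 26 distances from the student table, read row by row.
miles = [1.5, 1.0, 1.2, 0.4, 2.1, 0.5, 1.5, 2.8, 1.3, 0.8, 1.7, 0.3, 0.2,
         0.3, 2.3, 0.1, 2.5, 1.5, 1.4, 2.2, 0.7, 0.2, 8.5, 3.5, 1.3, 6.2]

WIDTH_TENTHS = 6  # class width of 0.6 miles, expressed in tenths
N_CLASSES = 15

freq = [0] * N_CLASSES
for m in miles:
    tenths = round(m * 10)             # e.g. 2.3 miles -> 23 tenths
    freq[tenths // WIDTH_TENTHS] += 1  # index of the class containing m

# Print the frequency table in the same layout as the notes.
for i, count in enumerate(freq):
    lower = i * WIDTH_TENTHS / 10
    upper = (i * WIDTH_TENTHS + WIDTH_TENTHS - 1) / 10
    print(f"{lower:.1f}-{upper:.1f} | {count}")
```

The frequencies should add up to the number of data values, which is a quick check that no value was missed or counted twice.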
A relative frequency table gives the proportion (or percent) of items in each class of the frequency table. For example, in our kindergarten example above, we would calculate the proportion of students in each class. To calculate the relative frequencies you use: relative frequency = (class frequency) / (sum of all frequencies)
For example, for the class 0.0-0.5 in the table we have a count of 7. This means that 7 students live within 0.5 miles of Sunnyflowery Elementary. There are 26 students in the table overall (to get this number, just add up the frequency column). This gives a relative frequency of 7/26 = 0.269. Doing this for all the classes yields the following table:
Miles | Relative Frequency
0.0-0.5 | 0.269
0.6-1.1 | 0.115
1.2-1.7 | 0.308
1.8-2.3 | 0.115
2.4-2.9 | 0.077
3.0-3.5 | 0.038
3.6-4.1 | 0.000
4.2-4.7 | 0.000
4.8-5.3 | 0.000
5.4-5.9 | 0.000
6.0-6.5 | 0.038
6.6-7.1 | 0.000
7.2-7.7 | 0.000
7.8-8.3 | 0.000
8.4-8.9 | 0.038
The relative frequencies should add up to 1 (for 100%) or be very close (our chart adds up to 0.998 due to roundoff error).
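A short Python sketch of the relative frequency calculation, starting from the frequency column of the finished frequency table. The names `freq`, `total`, and `rel` are illustrative. Values are rounded (not truncated) to three decimal places, so 2/26 = 0.0769... is reported as 0.077.

```python
# Frequency column of the finished kindergarten frequency table.
freq = [7, 3, 8, 3, 2, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1]

total = sum(freq)  # total number of students

# Relative frequency = class frequency / sum of all frequencies.
rel = [round(f / total, 3) for f in freq]

for f, r in zip(freq, rel):
    print(f"{f:2d} | {r:.3f}")

# The rounded values sum to slightly less than 1 because of roundoff.
print("sum of rounded relative frequencies:", round(sum(rel), 3))
```

To show percents instead of proportions, multiply each relative frequency by 100.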
Sometimes we want to know the cumulative total instead of the total for individual classes. In this case we use a cumulative frequency table, in which the frequency column is replaced with a running total. For example, look at the second row of the table below: it combines the count from 0.0 to 0.5 (this is 7) and the count from 0.6 to 1.1 (this is 3) to give a total of 10. The next row contains these ten values plus the ones from 1.2 to 1.7, and so on. Note the labels in the first column: "less than 1.8" means 0 to 1.7 (the "less than" notation is typical). Here is the cumulative frequency table for our kindergarten data:
Miles | Cumulative Frequency
Less than 0.6 | 7
Less than 1.2 | 10
Less than 1.8 | 18
Less than 2.4 | 21
Less than 3.0 | 23
Less than 3.6 | 24
Less than 4.2 | 24
Less than 4.8 | 24
Less than 5.4 | 24
Less than 6.0 | 24
Less than 6.6 | 25
Less than 7.2 | 25
Less than 7.8 | 25
Less than 8.4 | 25
Less than 9.0 | 26
Of course, the last number in the cumulative frequency column should be the total number of values in the data set.
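The running total can be sketched with `itertools.accumulate` from the Python standard library. The list `freq` is the frequency column of the kindergarten table; the labels are built from the class width of 0.6 (6 tenths), and the variable names are illustrative only.

```python
from itertools import accumulate

# Frequency column of the kindergarten frequency table.
freq = [7, 3, 8, 3, 2, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1]

# Running total: each entry is the sum of all frequencies up to that class.
cum = list(accumulate(freq))

for i, c in enumerate(cum):
    next_lower = (i + 1) * 6 / 10  # lower class limit of the following class
    print(f"Less than {next_lower:.1f} | {c}")
```

The last entry of `cum` equals `sum(freq)`, matching the check described above.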
REMEMBER WHEN CONSTRUCTING TABLES
Here is another worked example:
Example: Heights of students
The heights (in inches) of 30 students are as follows:
66 | 68 | 64 | 70 | 67 | 67 | 68 | 64 | 65 | 68
64 | 70 | 72 | 71 | 69 | 64 | 63 | 70 | 71 | 63
68 | 67 | 67 | 65 | 69 | 65 | 67 | 66 | 69 | 61
Let's create a frequency table, a relative frequency table, and a cumulative frequency table.
The values run from about 60 to about 75, so let's choose 5 classes.
Class width calculation: (72 - 61)/5 = 2.2. Round UP to 3 (the data are whole numbers) and start the first class at 60, so the class limits in our table look like this:
Height range | Number of Students
60-62 |
63-65 |
66-68 |
69-71 |
72-74 |
Now we count the number of data points in each class and finish the table:
Height range | Number of Students
60-62 | 1
63-65 | 9
66-68 | 11
69-71 | 8
72-74 | 1
For relative frequency we take each value in Number of Students column and divide by the total number of students (which is 30) to get relative frequency.
Height range | Relative Frequency
60-62 | 0.03
63-65 | 0.30
66-68 | 0.37
69-71 | 0.27
72-74 | 0.03
Finally, to change to cumulative frequency, just keep a running total of the frequency data:
Height | Cumulative Frequency
Less than 63 | 1
Less than 66 | 10
Less than 69 | 21
Less than 72 | 29
Less than 75 | 30
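The whole heights example can be reproduced in one short Python sketch. The variable names are illustrative, and the starting value of 60 is the convenient lower limit chosen in the example, not something computed from the data.

```python
import math
from itertools import accumulate

# Heights (in inches) of the 30 students, read row by row.
heights = [66, 68, 64, 70, 67, 67, 68, 64, 65, 68,
           64, 70, 72, 71, 69, 64, 63, 70, 71, 63,
           68, 67, 67, 65, 69, 65, 67, 66, 69, 61]

n_classes = 5
start = 60  # chosen first lower class limit

# Class width: (72 - 61) / 5 = 2.2, rounded UP to a whole number.
width = math.ceil((max(heights) - min(heights)) / n_classes)

# Frequency table: class index is how many whole widths h lies above start.
freq = [0] * n_classes
for h in heights:
    freq[(h - start) // width] += 1

# Relative frequency (2 decimals) and cumulative frequency.
rel = [round(f / len(heights), 2) for f in freq]
cum = list(accumulate(freq))

for i in range(n_classes):
    lower = start + i * width
    upper = lower + width - 1
    print(f"{lower}-{upper} | {freq[i]} | {rel[i]:.2f} | less than {upper + 1}: {cum[i]}")
```

Each printed row shows the class limits, the frequency, the relative frequency, and the cumulative frequency, so it can be compared line by line with the three tables above.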