They say a picture is worth a thousand words, and what holds for pictures also holds for data. If you present your data in a visual fashion you can get the message across in an interesting and effective manner. You can represent data in a variety of forms. In this section we look at many options:
A histogram is like a bar chart: it consists of a horizontal scale for the values of the data being represented, a vertical scale for frequencies, and bars representing the frequency of each class of values. A relative frequency histogram has the same shape and horizontal scale as the histogram, but the vertical scale is marked with relative frequencies. Theoretically each bar should be marked with the lower class boundary at the left and the upper class boundary at the right. THIS IS HOW I WANT YOU TO DO THEM! However, most histograms in real life do not do this; they use the class marks or class limits instead. Here are some examples of histograms on the web:
Sea Range Climatology - a traditional histogram from the Dept of Navy
GRAIN CROPS BREEDING AND GENETICS RESEARCH, at McGill University - another traditional histogram
A circular histogram - very interesting!
To construct a histogram
Statisticians like histograms because you can see the distribution of the data easily. However, histograms can be deceiving. If you make your class width too large you can distort the distribution and give a false impression. A nice web site that illustrates this is the histogram Java applet developed by the University of South Carolina. If you load this applet you can see that changing the class width (called bin width in the applet) can give a very different impression of the data. Try setting the class width about halfway above 1 and note that the data look evenly distributed. If you set the class width close to 0, you get a clearer impression of the data.
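If you do not have the applet handy, the effect of class width can be sketched in a few lines of Python. This is only an illustration: it borrows the daily high temperatures from the stem-and-leaf example in this section, and the starting class boundary of 59.5 and the widths 5 and 20 are my own assumptions, as are the function names.

```python
import math

# Daily high temperatures from the stem-and-leaf example in this section
temps = [78, 76, 82, 75, 85, 82, 78, 74, 83, 90,
         70, 76, 85, 92, 87, 67, 65, 68, 73, 74,
         83, 88, 86, 85, 92, 90, 82, 75, 69, 80,
         85, 77, 86, 85, 90, 85, 80, 70, 65, 60]

def frequency_table(data, lower_boundary, class_width):
    """Return rows of (lower boundary, upper boundary, frequency, relative frequency)."""
    n_classes = math.floor((max(data) - lower_boundary) / class_width) + 1
    counts = [0] * n_classes
    for x in data:
        # Index of the class whose boundaries contain x
        counts[int((x - lower_boundary) // class_width)] += 1
    table = []
    for i, count in enumerate(counts):
        low = lower_boundary + i * class_width
        table.append((low, low + class_width, count, count / len(data)))
    return table

def print_histogram(data, lower_boundary, class_width):
    """Print one text bar per class, labelled with its class boundaries."""
    for low, high, freq, rel in frequency_table(data, lower_boundary, class_width):
        print(f"{low:5.1f} to {high:5.1f} | {'#' * freq} ({freq}, {rel:.3f})")

# Assumed settings: start at boundary 59.5, compare a narrow and a wide class
print_histogram(temps, 59.5, 5)
print()
print_histogram(temps, 59.5, 20)
```

With a width of 5 you see seven bars and the peak in the mid 80s; with a width of 20 the same data collapse into two nearly equal bars, which hides that peak.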
Want to read more about histograms? Take this link to Exploring Data
Tired of constructing histograms? Try these Java-based pages for interactive help in histogram generation:
Histogram program - from UCLA
DOTPLOTS
A dotplot is similar to a histogram except that instead of a bar, each actual data point is represented by a dot. Like a histogram, the dotplot gives us a feel for the distribution of the data. Here are some examples:
From Exploring data - different ways to do dotplots
A variety of dotplots - kind of hard to read
To construct a dotplot, first make a frequency table, then choose a scale for the horizontal and vertical axes. Then plot the dots accordingly. Like histograms, dotplots can be completed easily using technology.
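The steps above can be sketched in Python as a plain-text dotplot. The sample values and the function name are made up for illustration, and the sketch assumes whole-number data so that each value gets its own column.

```python
from collections import Counter

def dotplot_lines(data):
    """Build a text dotplot: one column per value, one dot per data point."""
    freq = Counter(data)                      # step 1: frequency table
    values = range(min(data), max(data) + 1)  # step 2: horizontal scale
    height = max(freq.values())               # step 2: vertical scale
    lines = []
    for level in range(height, 0, -1):        # step 3: plot dots, top row first
        row = "".join("  ." if freq[v] >= level else "   " for v in values)
        lines.append(row.rstrip())
    lines.append("".join(f"{v:>3}" for v in values))  # axis labels
    return lines

# Hypothetical sample of ten whole-number measurements
sample = [74, 75, 76, 78, 78, 82, 82, 83, 85, 90]
for line in dotplot_lines(sample):
    print(line)
```

Values that occur twice (78 and 82 here) get two stacked dots, so the heights play the same role as the bars in a histogram.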
A Stem-and-Leaf Plot is very useful. It can show the distribution of the data, yet not lose the actual data points. The Stem-and-Leaf plot is most easily explained using an example.
Consider the following data, which represent 40 daily high temperatures recorded for a city:
78 | 76 | 82 | 75 | 85 | 82 | 78 | 74 | 83 | 90
70 | 76 | 85 | 92 | 87 | 67 | 65 | 68 | 73 | 74
83 | 88 | 86 | 85 | 92 | 90 | 82 | 75 | 69 | 80
85 | 77 | 86 | 85 | 90 | 85 | 80 | 70 | 65 | 60
We can see that the data range from 60 to 92. We sort the data from lowest to highest:
60 | 65 | 65 | 67 | 68 | 69 | 70 | 70 | 73 | 74
74 | 75 | 75 | 76 | 76 | 77 | 78 | 78 | 80 | 80
82 | 82 | 82 | 83 | 83 | 85 | 85 | 85 | 85 | 85
85 | 86 | 86 | 87 | 88 | 90 | 90 | 90 | 92 | 92
Now we can create the stem and leaf graph as follows:
Stem | Leaves
6 | 055789
7 | 003445566788
8 | 00222335555556678
9 | 00022
If you look at the page sideways you can see the distribution of the data. The same rule that says you should use 5-20 classes of data in a histogram applies to a stem and leaf diagram. The stem and leaf diagram could clearly be expanded to include more rows, or condensed to include fewer rows.
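If you would rather let a computer do the sorting and splitting, here is a minimal Python sketch of the same procedure. It assumes two-digit whole numbers, with the tens digit as the stem and the ones digit as the leaf; the function name is my own.

```python
from collections import defaultdict

# The 40 daily high temperatures from the example above
temps = [78, 76, 82, 75, 85, 82, 78, 74, 83, 90,
         70, 76, 85, 92, 87, 67, 65, 68, 73, 74,
         83, 88, 86, 85, 92, 90, 82, 75, 69, 80,
         85, 77, 86, 85, 90, 85, 80, 70, 65, 60]

def stem_and_leaf(data):
    """Map each stem (tens digit) to a string of its sorted leaves (ones digits)."""
    rows = defaultdict(list)
    for x in sorted(data):           # sort first so leaves come out in order
        stem, leaf = divmod(x, 10)   # e.g. 78 -> stem 7, leaf 8
        rows[stem].append(str(leaf))
    return {stem: "".join(leaves) for stem, leaves in sorted(rows.items())}

for stem, leaves in stem_and_leaf(temps).items():
    print(f"{stem} | {leaves}")
```

Note that this sketch simply skips a stem that has no leaves; in a hand-drawn plot you would normally keep the empty row so the gap in the distribution stays visible.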
Here are some more examples and help with stem and leaf plots:
More explanation of stem and leaf
Hyperstat Online - gives a stem and leaf display; try it with the above data
WebStat - a wonderful stat software package that does it all including stem and leaf
A Pareto chart is a bar graph for qualitative data, with the bars arranged in order according to frequencies. Here are some links to examples:
An example of a Pareto chart from engineering
Pareto charts can include relative frequencies instead of frequencies (like a histogram)
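To see the ordering idea in action, here is a small Python sketch that arranges categories by frequency and reports relative frequencies alongside. The defect categories and counts are invented purely for illustration, and the function name is my own.

```python
def pareto_table(counts):
    """Return (category, frequency, relative frequency) rows, largest frequency first."""
    total = sum(counts.values())
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return [(category, freq, freq / total) for category, freq in ordered]

# Hypothetical qualitative data: number of defects of each type
defects = {"Scratches": 12, "Dents": 30, "Misalignment": 5, "Paint": 3}

for category, freq, rel in pareto_table(defects):
    print(f"{category:<13} {'#' * freq} ({freq}, {rel:.2f})")
```

The tallest bar always comes first, which is what makes a Pareto chart handy for spotting the categories that matter most.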
You are probably familiar with pie charts. They are used to depict qualitative data in a way that makes them more understandable. Here are some examples of pie charts:
Java Pie Chart - an example and some neat software
More pie! Java based applet
To construct a pie chart you need the relative frequencies of your items. Multiply each relative frequency by 360, the number of degrees in a circle, to get the angle of that item's slice. Most modern pie charts are created using some form of technology that draws the slices in the proper proportions.
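The arithmetic is simple enough to check by hand, but here is a short Python sketch of it anyway. The survey categories and counts are hypothetical, and the function name is my own.

```python
def pie_angles(counts):
    """Return the slice angle in degrees for each category: relative frequency times 360."""
    total = sum(counts.values())
    return {category: 360 * freq / total for category, freq in counts.items()}

# Hypothetical survey: how 40 students travel to class
travel = {"Car": 20, "Bus": 10, "Bike": 6, "Walk": 4}

for category, angle in pie_angles(travel).items():
    print(f"{category:<5} {angle:6.1f} degrees")
```

Half of the students come by car, so that slice takes up half the circle, and the angles of all the slices add up to 360 degrees.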
A scatter diagram is used to show the relationship between two items of data. It pairs each value from one data set with the corresponding value from another data set and plots the pair as a point. For example, we may want to plot the tar and nicotine values of some brands of cigarettes. Here is some sample data for brands of cigarettes:
Tar | Nicotine
16 | 1.2
16 | 1.2
16 | 1.0
9 | 0.8
1 | 0.1
8 | 0.8
10 | 0.8
16 | 1.0
14 | 1.0
13 | 1.0
13 | 1.1
15 | 1.2
9 | 0.7
5 | 0.5
18 | 1.4
Here is a scatter diagram representing this data:
For an interactive web based scatter diagram generator take this link
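A rough text version of the scatter diagram can also be produced with a few lines of Python. This is only a sketch: it assumes the tar values are whole numbers and the nicotine values are given to one decimal place, as in the table above, and the function name is my own.

```python
# Tar and nicotine values from the table above, paired brand by brand
tar = [16, 16, 16, 9, 1, 8, 10, 16, 14, 13, 13, 15, 9, 5, 18]
nicotine = [1.2, 1.2, 1.0, 0.8, 0.1, 0.8, 0.8, 1.0, 1.0, 1.0, 1.1, 1.2, 0.7, 0.5, 1.4]

def scatter_lines(xs, ys):
    """Build a text scatter diagram: x across the page, y (in tenths) up the page."""
    # Each point becomes a grid cell (x, y in tenths); repeated points share a cell
    cells = {(x, round(y * 10)) for x, y in zip(xs, ys)}
    max_x = max(xs)
    top = max(row for _, row in cells)
    lines = []
    for row in range(top, 0, -1):  # highest y value on the first line
        marks = "".join("*" if (x, row) in cells else " " for x in range(max_x + 1))
        lines.append(f"{row / 10:3.1f} |{marks}".rstrip())
    lines.append("    +" + "-" * (max_x + 1))
    lines.append(f"     tar: 0 (left) to {max_x} (right)")
    return lines

for line in scatter_lines(tar, nicotine):
    print(line)
```

The stars drift from the lower left to the upper right, which is the pattern you expect when nicotine tends to increase with tar. Two pairs of brands have identical values, so 15 brands show up as only 13 stars.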
SUMMARY
Using graphs to illustrate your data can be very useful. We have investigated a number of graphs in this section:
Histograms can be used to illustrate the frequency or relative frequency of your data, and they show the distribution of the data. Be careful with class width when using histograms.
Dotplots are like histograms except that they show every data point as a dot instead of using bars. They also show the distribution of the data.
Stem and Leaf plots show the distribution of the data without losing the actual data points. They require the data to be sorted, which also makes it easier to rank the data.
Pareto charts and pie charts show the distribution of qualitative data.
Scatter diagrams relate values from two data sets.