PROJECT #1 DATE DUE: 2/2/2000 Spring Semester 2000
Instructions:
Please complete the following project and deliver it to me (via e-mail, fax,
mail or by hand) by Wed., Feb. 2.
As you recall, Chapter 2 is about describing, exploring and comparing data. In
this project I will ask you to look at some specific data sites and apply some
of the techniques you have used in Chapter 2 to the data. This project will
also show you how to read and interpret data sets.
In each part below you
will look at a data set, do some type of graph and/or descriptive statistics
and be asked to do some interpretation. Please write up your answers to include
all the stated information. HAVE FUN!!
Part I: In the microprocessor industry, computer microchips are placed on
extremely small boards called wafers. These wafers are coated with a layer of
silicon. Businesses that manufacture microprocessors would like to ensure that
wafers are coated as evenly as possible with silicon. This is not an easy
process! To check on their manufacturing process, a company will take a random
sample of the wafers and select a random number of sites on each wafer to test
for thickness. The following Internet site contains data for one such test
(you may want to print this data):
http://lib.stat.cmu.edu/jasadata/hughes-l-d-g
The data is organized as follows: each row contains the data for one wafer.
The wafer label is in column 1 (for example, A16). Ignore columns 2 and 3.
Columns 4 through 16 contain the thickness of the silicon on the wafer,
measured at 13 different spots. For the wafers labelled A16, A20, B5, and B21,
do the following:
1. Compute the mean, variance, standard deviation, five-number summary, range
and interquartile range for the 13 thickness measures (these are in columns 4
through 16).
2. On a single graph, display boxplots for the four wafers, appropriately
labeled. Correctly indicate any outliers in the boxplots.
3. Write a short report on the results of the quality test on silicon
thickness done by this company. Include in your report the raw data, the
boxplots, and the descriptive statistics done in part 1, with an
interpretation of what these values mean in terms of each wafer. Your boss
wishes to select one of these wafers for further development. Based on the
criterion of even thickness, which one of these four wafers would you select,
and why?
NOTE: YOU ONLY HAVE TO TURN IN PART 3 - but part 3 should contain the
information from parts 1 and 2!
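As a hedged illustration of the statistics requested in Part I, here is a minimal Python sketch. The function names `five_number_summary` and `describe` are hypothetical, and the readings are made-up values, not data from the wafer file. The quartiles are taken as the medians of the lower and upper halves of the sorted data, and outliers are flagged with the common 1.5 x IQR rule; both are assumptions, so use whichever conventions your textbook or software follows.

```python
from statistics import mean, median, stdev, variance


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves convention."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]          # values below the median position
    upper = data[half + n % 2:]  # values above the median position
    return data[0], median(lower), median(data), median(upper), data[-1]


def describe(values):
    """Collect the descriptive statistics for one set of measurements."""
    lo, q1, med, q3, hi = five_number_summary(values)
    iqr = q3 - q1
    # Assumed outlier rule: more than 1.5 * IQR beyond the quartiles.
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "mean": mean(values),
        "variance": variance(values),  # sample variance (divides by n - 1)
        "std_dev": stdev(values),      # sample standard deviation
        "five_number": (lo, q1, med, q3, hi),
        "range": hi - lo,
        "iqr": iqr,
        "outliers": [v for v in values if v < low_fence or v > high_fence],
    }


# Hypothetical readings for illustration only, NOT taken from the data set.
sample = [2, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 20]
summary = describe(sample)
for name, value in summary.items():
    print(name, value)
```

A wafer with a small standard deviation and a small interquartile range has a more even coating than one with large values, which is the comparison the report asks for.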
Part II: Many of you probably watch baseball on T.V. Sports are a major part
of American culture, and professional athletes are rewarded with large
salaries. The following Internet site contains a number of data files:
http://www.amstat.org/publications/jse/archive.htm
Look for the file marked baseball.dat and save it (just click on the link).
Now open it in your word processor. This file holds the data for baseball
players during the 1992 season. The columns are organized as follows:
#1 the salary of the baseball player in thousands of dollars
#2 the player's batting average
#3 the player's on-base percentage
#4 the number of runs scored by the player
#5 the number of hits by the player
We will also look at column #8, which is the number of home runs. Since there
are 337 players, we will take a systematic sample of the data. Do the
following:
1. Choose every 10th player in the data set (that is, every tenth row). Using
this sample, find the mean, variance, standard deviation, range, midrange,
mode and five-number summary for EACH of the following variables in the
sample: salary, batting average, and number of home runs.
2. Create two scatter plots using your sample. In the first scatter plot,
compare salary and batting average. In the second scatter plot, compare salary
and home runs.
3. Write up the results of your investigation as follows. Include the raw data
in your sample, the scatter plots and a discussion of the following question:
does it seem that baseball players with better performance (higher batting
average, more home runs) are paid higher salaries?
NOTE: YOU ONLY HAVE TO TURN IN PART 3 - but part 3 should contain the
information from parts 1 and 2!
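The systematic sample in Part II can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names `systematic_sample` and `midrange` are hypothetical, the list of row numbers stands in for the real file, and starting at the 10th row (rather than the 1st) is one reasonable reading of "every 10th player".

```python
from statistics import multimode


def systematic_sample(rows, step=10, start=0):
    """Return every `step`-th row, beginning at list index `start`."""
    return rows[start::step]


def midrange(values):
    """Midrange: the value halfway between the minimum and the maximum."""
    return (min(values) + max(values)) / 2


# Hypothetical stand-in for the file: row numbers 1 through 337.
rows = list(range(1, 338))

# start=9 is list index 9, i.e. the 10th row; this yields rows 10, 20, ..., 330.
sample = systematic_sample(rows, step=10, start=9)
print(len(sample), sample[0], sample[-1])

# multimode returns every most-frequent value, so a tie gives several modes.
print(multimode([1, 2, 2, 3, 3]))
print(midrange([1, 9]))
```

Starting from the first row instead (`start=0`) gives 34 rows rather than 33, so state in your write-up which starting row you used.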
Part III: Everyone has a car, and I know many people who would like to have a
better car. Here is a site with some data about cars from 1993:
http://www.amstat.org/publications/jse/archive.htm
Look for the link to the file 93cars.dat. Click on this link to save the file.
Now open it in a word processor. This time I would like you to determine how
the data is organized. Here is a file that explains the data organization:
http://www.amstat.org/publications/jse/datasets/93cars.txt
For this data set, do the following:
1. Create a frequency table, histogram and relative frequency histogram for
the cost of all cars classified as compact. Create the same for all cars
marked sporty. You should determine the appropriate number of classes for your
table and histogram.
2. Interpret the graphs in part 1. What information can you glean from them?
Can you draw any preliminary conclusions about the relative costs of sporty
cars vs. compact cars?
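For Part III, a frequency table with relative frequencies can be built along these lines. This is a minimal sketch, assuming equal-width classes that span the minimum to the maximum value; the function name `frequency_table` is hypothetical, the prices are made-up values rather than entries from 93cars.dat, and the number of classes is left as a parameter because choosing it is part of the exercise.

```python
def frequency_table(values, num_classes):
    """Return (lower, upper, frequency, relative frequency) per equal-width class."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are equal; classes would have zero width")
    width = (hi - lo) / num_classes
    counts = [0] * num_classes
    for v in values:
        # The maximum value goes in the last class instead of opening a new one.
        index = min(int((v - lo) / width), num_classes - 1)
        counts[index] += 1
    n = len(values)
    return [
        (lo + i * width, lo + (i + 1) * width, count, count / n)
        for i, count in enumerate(counts)
    ]


# Hypothetical prices in thousands of dollars, NOT taken from 93cars.dat.
prices = [8, 9, 10, 11, 11, 12, 13, 14, 16, 18]
table = frequency_table(prices, num_classes=5)
for lower, upper, freq, rel in table:
    print(f"{lower:5.1f} to {upper:5.1f}  {freq:2d}  {rel:.2f}")
```

The relative frequencies always sum to 1, which makes the compact and sporty histograms comparable even though the two groups contain different numbers of cars.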
That's all folks - please direct questions to me if any of this is unclear,
especially if you have trouble reading the data after you save the file. I can
send the data to you directly!