Stat 2005

CHAPTER 1

PROJECT #1

DATE DUE: 2/2/2000

Spring Semester 2000

Instructions:

Please complete the following project and deliver it to me (via e-mail, fax, mail, or by hand) by Wed., Feb. 2.

As you recall, Chapter 2 is about describing, exploring, and comparing data. In this project I will ask you to look at some specific data sites and apply some of the techniques you have used in Chapter 2 to the data. This project will also show you how to read and interpret data sets.

In each part below you will look at a data set, make a graph and/or compute descriptive statistics, and interpret the results. Please write up your answers so that they include all of the requested information. HAVE FUN!!

Part I: In the microprocessor industry, computer microchips are placed on extremely small boards called wafers. These wafers are coated with a layer of silicon. Businesses that manufacture microprocessors would like to ensure that wafers are coated as evenly as possible with silicon. This is not an easy process! To check on their manufacturing process, a company will take a random sample of the wafers and select a number of sites at random on each wafer to test for thickness. The following Internet site contains data for one such test (you may want to print this data):

http://lib.stat.cmu.edu/jasadata/hughes-l-d-g

The data is organized as follows: each row contains the data for one wafer. The wafer label is in column 1 (for example, A16); ignore columns 2 and 3; columns 4 through 16 contain the thickness of the silicon on the wafer at 13 different spots. For the wafers labeled A16, A20, B5, and B21, do the following:

1.     Compute the mean, variance, standard deviation, five-number summary, range, and interquartile range for the 13 thickness measurements (these are in columns 4 through 16).

2.     On a single graph, display boxplots for the four wafers appropriately labeled. Correctly indicate any outliers in the boxplots.

3.     Write a short report on the results of the quality test on silicon thickness done by this company. Include in your report the raw data, the boxplots, and the descriptive statistics from part 1, with an interpretation of what these values mean for each wafer. Your boss wishes to select one of these wafers for further development. Based on the criterion of even thickness, which one of these four would you select, and why?

NOTE: YOU ONLY HAVE TO TURN IN PART 3 - but part 3 should contain the information from parts 1 and 2!
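Items 1 and 2 can be done by hand or with any statistics software. As a cross-check, here is a minimal Python sketch that computes the item 1 statistics and flags boxplot outliers for one wafer. It assumes each row of the data page is whitespace-separated text with the wafer label first, as described above; the function names are hypothetical. Two conventions are also assumptions: variance and standard deviation are the sample versions (dividing by n - 1), and Q1 and Q3 are the medians of the lower and upper halves of the data, so your textbook or software may report slightly different quartiles. Outliers use the common 1.5 × IQR rule for modified boxplots.

```python
import statistics
from typing import Dict, List, Sequence, Tuple


def parse_wafer_row(line: str) -> Tuple[str, List[float]]:
    """Return the wafer label (column 1) and the 13 thickness values (columns 4-16)."""
    fields = line.split()
    thickness = [float(x) for x in fields[3:16]]  # columns 2 and 3 are skipped
    if len(thickness) != 13:
        raise ValueError(f"expected 13 thickness values, got {len(thickness)}")
    return fields[0], thickness


def quartiles(values: Sequence[float]) -> Tuple[float, float]:
    """Q1 and Q3 as medians of the lower and upper halves.

    When n is odd (13 here) the overall median is left out of both halves.
    """
    data = sorted(values)
    half = len(data) // 2
    return statistics.median(data[:half]), statistics.median(data[half + len(data) % 2:])


def describe(values: Sequence[float]) -> Dict[str, object]:
    """Sample statistics plus modified-boxplot outliers (1.5 x IQR fences)."""
    data = sorted(values)
    q1, q3 = quartiles(data)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "mean": statistics.mean(data),
        "variance": statistics.variance(data),  # sample variance: divides by n - 1
        "std_dev": statistics.stdev(data),      # square root of the sample variance
        "five_number": (data[0], q1, statistics.median(data), q3, data[-1]),
        "range": data[-1] - data[0],
        "iqr": iqr,
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


# Example use: paste one row from the data page as a string.
# label, values = parse_wafer_row(row_text)
# print(label, describe(values))
```

Run it once per wafer (A16, A20, B5, B21). A smaller standard deviation, range, or IQR means the 13 measurements sit closer together, that is, the coating is more even. Plotting libraries such as matplotlib draw boxplot whiskers with the same 1.5 × IQR rule by default (the `whis` parameter), though their quartile method may differ slightly from the one above.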

Part II: Many of you probably watch baseball on TV. Sports are a major part of American culture, and professional athletes are rewarded with large salaries. The following Internet site contains a number of data files:

http://www.amstat.org/publications/jse/archive.htm

Look for the file marked baseball.dat and save it (just click on the link). Now open it in your word processor. This is the data for baseball players during the 1992 season. The columns are organized as follows:

#1 the salary of the baseball player in thousands of dollars

#2 the player's batting average

#3 the player's on-base percentage

#4 the number of runs scored by the player

#5 the number of hits by the player
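If you would rather not copy values by hand, a short script can pull out the columns this project uses: salary (column 1), batting average (column 2), and home runs (column 8, introduced in the next paragraph). The Python sketch below assumes baseball.dat is plain text with one player per line and whitespace-separated numeric columns; check that against the file you download. The `Player` and `parse_baseball` names are hypothetical.

```python
from typing import List, NamedTuple


class Player(NamedTuple):
    salary: float           # column 1, in thousands of dollars
    batting_average: float  # column 2
    home_runs: float        # column 8


def parse_baseball(text: str) -> List[Player]:
    """Parse whitespace-separated rows, keeping columns 1, 2, and 8 (counting from 1)."""
    players = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # skip blank lines or any line too short to hold column 8
        players.append(
            Player(
                salary=float(fields[0]),
                batting_average=float(fields[1]),
                home_runs=float(fields[7]),
            )
        )
    return players


# Example use (assumes baseball.dat was saved in the current directory):
# with open("baseball.dat") as f:
#     players = parse_baseball(f.read())
```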

We will also look at column #8, which is the number of home runs. Since there are 337 players, we will take a systematic sample of the data. Do the following:

1.     Choose every 10th player in the data set (that is, every tenth row). Using this sample, find the mean, variance, standard deviation, range, midrange, mode, and five-number summary for EACH of the following variables in the sample: salary, batting average, and number of home runs.

2.     Create two scatter plots using your sample: in the first, compare salary and batting average; in the second, compare salary and home runs.

3.     Write up the results of your investigation as follows. Include the raw data in your sample, the scatter plots, and a discussion of the following question: does it seem that baseball players with better performance (higher batting average, more home runs) are paid higher salaries?
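Part II starts from a systematic sample. Below is a minimal Python sketch of that step, plus the two statistics that Python's standard library does not name directly: the midrange and the mode. Assumptions: rows are numbered from 1 in file order, and the sample starts at row 10 (rows 10, 20, ..., 330, giving 33 players). Starting at row 1 instead (rows 1, 11, ..., 331) gives 34 players and is an equally valid systematic sample. The mode follows the usual textbook convention that a data set with no repeated value has no mode; Python's `statistics.multimode` would instead return every value in that case. The mean, variance, and standard deviation can come from `statistics.mean`, `statistics.variance`, and `statistics.stdev`. The function names are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def systematic_sample(rows: Sequence[T], k: int = 10, start: int = 10) -> List[T]:
    """Every k-th row, numbering rows from 1 and beginning at row `start`."""
    if not 1 <= start <= k:
        raise ValueError("start should be between 1 and k")
    return list(rows[start - 1::k])


def midrange(values: Sequence[float]) -> float:
    """Average of the smallest and largest values."""
    return (min(values) + max(values)) / 2


def modes(values: Sequence[float]) -> List[float]:
    """All most-frequent values; an empty list means no value repeats (no mode)."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []
    return sorted(v for v, c in counts.items() if c == top)


# Row numbers picked from a 337-row file:
print(systematic_sample(list(range(1, 338)))[:3])  # → [10, 20, 30]
```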

NOTE: YOU ONLY HAVE TO TURN IN PART 3 - but part 3 should contain the information from parts 1 and 2!

Part III: Everyone has a car, and I know many people who would like to have a better car. Here is a site with some data about cars from 1993:

http://www.amstat.org/publications/jse/archive.htm

Look for the link to the file 93cars.dat. Click on this link to save the file, then open it in a word processor. This time I would like you to determine how the data is organized. Here is a file that explains the data organization:

http://www.amstat.org/publications/jse/datasets/93cars.txt

For this data set, do the following:

1.     Create a frequency table, histogram, and relative frequency histogram for the cost of all cars classified as compact. Create the same for all cars classified as sporty. You should determine the appropriate number of classes for your table and histogram.

2.     Interpret the graphs in part 1. What information can you glean from them? Can you draw any preliminary conclusions about the relative costs of sporty cars vs. compact cars?
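Choosing the classes is the judgment call in item 1. One common starting point is Sturges' rule (about 1 + log2 n classes, rounded up); many introductory texts instead suggest anywhere from 5 to 20 classes, so treat either as a guideline rather than a requirement. The Python sketch below assumes you have already copied the prices you chose to use for one car type (compact or sporty) into a list; the function names are hypothetical. Each class includes its lower limit and excludes its upper limit, and the relative frequency is the class frequency divided by the number of cars.

```python
import math
from bisect import bisect_right
from typing import List, Optional, Sequence, Tuple


def sturges_classes(n: int) -> int:
    """Sturges' rule: about 1 + log2(n) classes, rounded up."""
    return math.ceil(1 + math.log2(n))


def frequency_table(
    values: Sequence[float], width: float, start: Optional[float] = None
) -> List[Tuple[float, float, int, float]]:
    """Rows of (lower limit, upper limit, frequency, relative frequency).

    Each class includes its lower limit and excludes its upper limit.
    """
    if width <= 0:
        raise ValueError("class width must be positive")
    lo, hi = min(values), max(values)
    first = lo if start is None else start
    if first > lo:
        raise ValueError("start must not exceed the smallest value")
    # Class limits: first, first + width, ... until the last limit passes the maximum.
    limits = [first]
    while limits[-1] <= hi:
        limits.append(first + len(limits) * width)
    counts = [0] * (len(limits) - 1)
    for v in values:
        counts[bisect_right(limits, v) - 1] += 1  # class i: limits[i] <= v < limits[i+1]
    n = len(values)
    return [(limits[i], limits[i + 1], counts[i], counts[i] / n) for i in range(len(counts))]


# Example use (prices is your list of prices for one car type):
# k = sturges_classes(len(prices))
# width = max(1, math.ceil((max(prices) - min(prices)) / k))  # rounded up to a whole number
# for lower, upper, freq, rel in frequency_table(prices, width):
#     print(f"{lower:6.1f} to {upper:6.1f}  {freq:3d}  {rel:.2f}  " + "*" * freq)
```

A relative frequency histogram has the same bars as the ordinary histogram, with the vertical axis showing proportions instead of counts. Using the same class limits for the compact and the sporty cars makes the comparison in item 2 easier.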

That's all, folks! Please direct questions to me if any of this is unclear, especially if you have trouble reading the data after you save the file. I can send the data to you directly!