A bit of review to motivate us!
Let's review with the following example:
Example: According to DuPont Automotive,
15% of sport compact cars are dark green. Assume 10 sport compact cars are
randomly selected. Let X be the random variable that is the number of dark green
cars in the sample. Find a probability distribution for the random variable X.
Solution: Is X binomial? Yes! There are
certainly two outcomes (dark green or not dark green), there is a fixed number
of trials (10), the trials are independent, and the probability is constant
between trials. So using the binomial probability formula (or Statdisk), we get
the following distribution:

Now you know that
the probability that exactly 5 cars in the sample of 10 are dark green is
0.00849.
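If you would rather compute the distribution than read it from software output, the binomial probability formula can be sketched in a few lines of Python. This is only an illustration: the helper name binomial_pmf is made up for this sketch, and n = 10 and p = 0.15 are the values from the example.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for a binomial random variable with n trials and success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.15  # 10 cars sampled, 15% chance that each is dark green

# Probability distribution: one probability for each possible value of X.
distribution = {k: binomial_pmf(k, n, p) for k in range(n + 1)}

for k, prob in distribution.items():
    print(f"P(X = {k:2d}) = {prob:.5f}")
```

The line for X = 5 should agree with the 0.00849 quoted above, and the eleven probabilities should add up to 1.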
The binomial distribution is great for DISCRETE random
variables. The only problem is that not all random variables are DISCRETE. Here
is another example:
Example: Mr. Wildman wishes to purchase a
new sport or compact car. He can't seem to decide, so, being into statistics, he decides
to select randomly from all the different models available. What is the
probability that the car he selects will have an average city miles per gallon
that exceeds 42 mpg?
Here the random variable X represents the average city mpg
of the car - this is not discrete (why??), but a continuous random variable. I
can't write down a probability distribution like we did in the binomial example
above. So we need some new tools!
The Standard Normal Distribution
Many continuous random variables follow what is known as the
normal distribution:
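A continuous random variable has a normal distribution if
that distribution is symmetric and bell-shaped and the distribution fits the
equation:
$$y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}$$
where $\mu$ is the mean and $\sigma$ is the standard deviation.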
Yuck!
Fortunately for us, we don't need to use this formula. We do need to know,
though, that the distribution is bell-shaped. What we need to use and understand
is the idea that there is a correspondence between the area under the
distribution of a continuous random variable and probability.
To see this connection, consider this example.
Example: Suppose that in a certain
manufacturing process the temperature is controlled to range between 0 degrees
and 5 degrees, with all values being equally likely. Let X be the temperature in
this manufacturing process. Since all values between 0 and 5 degrees are
equally likely, we get a relative frequency histogram that looks as follows:

In other words, we
get a relative frequency histogram that has five bars, each of height 0.2 (or
20%). This is an example of what is known as a uniform probability distribution:
a probability distribution where every value of the random variable
is equally likely. Here is another definition:
The graph of a continuous probability distribution is called a density
curve. It has the following properties:
1) The total area under the curve is 1.
2) Every point on the curve has a height of 0 or greater.
Certainly the uniform distribution given above has these
properties.
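The total-area property can be checked numerically. The sketch below is an illustration, not part of the course materials: it adds up thin rectangles (the midpoint rule) under the uniform density from the example and under the standard normal density. The function names are made up for this sketch, and the cutoffs of -8 and 8 are an assumption that stands in for the whole number line, since almost no area lies beyond them.

```python
from math import exp, pi, sqrt

def uniform_density(x: float, low: float = 0.0, high: float = 5.0) -> float:
    """Height of the uniform density on [low, high]; 0 outside that interval."""
    return 1.0 / (high - low) if low <= x <= high else 0.0

def standard_normal_density(z: float) -> float:
    """Height of the normal curve with mean 0 and standard deviation 1 at z."""
    return exp(-z * z / 2) / sqrt(2 * pi)

def area_under(density, a: float, b: float, steps: int = 100_000) -> float:
    """Approximate the area under `density` between a and b with the midpoint rule."""
    width = (b - a) / steps
    return sum(density(a + (i + 0.5) * width) for i in range(steps)) * width

uniform_area = area_under(uniform_density, 0, 5)
normal_area = area_under(standard_normal_density, -8, 8)

print(round(uniform_area, 6), round(normal_area, 6))
```

Both areas should come out equal to 1 up to rounding, which is exactly what the definition of a density curve requires.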
Example: Consider the uniform distribution given above. What
is:
a) $P(X \ge 1)$
b) $P(1 \le X \le 3)$
If you think about what each of these problems is asking, it
is easy to determine these probabilities. Since we have temperatures evenly distributed
between 0 and 5, the probability of getting a temperature greater than or equal to
1 is 4/5. Similarly, the probability of getting a temperature greater than or
equal to 1 and less than or equal to 3 should be 2/5.
Notice in the relative frequency graph given above that the
total area under the graph is 1 (0.2 * 5 = 1). In part a we want the
probability that the temperature we select is greater than or equal to 1. The
area under the curve to the right of 1 is 4/5 (0.2 * 4 = 0.8). Notice that
the probability corresponds to the area under the curve from
1 to 5! In part b we want a temperature that is between 1 and 3. The area
under the curve between 1 and 3 is 2/5
(0.2 * 2 = 0.4). Again the area under the curve and the probability are the
same!
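The "probability equals area" idea for the uniform example amounts to base times height for a rectangle. Here is a small sketch of that arithmetic; the function name uniform_probability is invented for illustration, and the defaults of 0 and 5 are the temperature limits from the example.

```python
def uniform_probability(a: float, b: float, low: float = 0.0, high: float = 5.0) -> float:
    """P(a <= X <= b) for X uniform on [low, high], computed as a rectangle area."""
    height = 1.0 / (high - low)              # 0.2 for the 0-to-5 example
    left, right = max(a, low), min(b, high)  # keep only the part where the density is nonzero
    return max(right - left, 0.0) * height   # base * height

part_a = uniform_probability(1, 5)  # temperature greater than or equal to 1
part_b = uniform_probability(1, 3)  # temperature between 1 and 3

print(part_a, part_b)
```

The two results should match the 4/5 and 2/5 found above.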
It is easy to find the area under the curve with a uniform
distribution (it is always a rectangle). What about the normal distribution?
If you graph the normal distribution you get a graph like
the following:

To calculate a probability under this curve we need 2
things:
1) We need a standard to compare against. This is called
the standard normal distribution. It is the normal distribution with a mean of 0
and a standard deviation of 1.
2) We have to be able to find the area under the standard
normal curve. There are 2 ways to do this: using Table A-2 in your book or
using a TI-83 calculator.
Here is another example:
Suppose we manufacture thermometers that are supposed to
give readings of 0 degrees at the freezing point. We test a large sample and
find that the mean of the sample is 0 degrees with a standard deviation of 1
degree and that the distribution is bell-shaped. If one thermometer is randomly
selected, what is the probability that the reading at the freezing point is
between 0 and 1.32 degrees?
Solution: Since the distribution is bell-shaped we will
assume that it is the normal distribution and since its mean is 0 and the
standard deviation 1, we will assume it is the standard normal distribution.
To find the probability that a thermometer reads
between 0 and 1.32 degrees at the freezing point, we need to find the area under
the standard normal curve between 0 and 1.32. First we will do this using Table A-2.
Notice that Table A-2 gives the area under the curve
between 0 and any point z. Since we want the area under the curve between 0 and
1.32, look down the column labelled z until you get to 1.3, then go across the
row until you are under the label .02 (since 1.3 + .02 = 1.32). The entry there,
0.4066, is the area under the curve between 0 and 1.32, and therefore it is the
probability that the thermometer measures between 0 and 1.32 at the freezing point.
To do this on a TI-83:
Hit 2nd VARS to get the distr menu and choose item 2 for
normalcdf.
Type in the following:
normalcdf(0,1.32) and hit enter to get the answer.
Note that the calculator answer carries more decimal places than the
four-place table entry.
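If you have neither the table nor a TI-83 handy, the same area can be found with Python's standard library. This is a sketch under one assumption worth stating: the helper below is named normalcdf only to mirror the calculator command. It is not a built-in Python function; the real work is done by statistics.NormalDist.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)  # mean 0, standard deviation 1

def normalcdf(lower: float, upper: float) -> float:
    """Area under the standard normal curve between lower and upper."""
    return standard_normal.cdf(upper) - standard_normal.cdf(lower)

area = normalcdf(0, 1.32)
print(f"{area:.4f}")  # rounded to four places, the same precision as the table
```

The cdf method gives the area to the left of a point, so subtracting two of them leaves the area in between.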
To test your skills, find the probability that a thermometer
selected at random measures between 0 and 1.47 degrees.
In the table, look in the row for 1.4 and the column for .07 (since 1.4 + .07 = 1.47).
On the calculator use normalcdf(0,1.47) to get the answer.
More problems:
1) What is the probability that one selected measures
between -2.57 and 0 degrees?
Solution: To determine this you need the area between
-2.57 and 0 on the normal curve, which at first glance does not seem to be in
the table. But since the normal curve is symmetric about zero, this is
identical to the area between 0 and 2.57. So looking up 2.57 in the table we
get 0.4949. Likewise, on the calculator we use normalcdf(0,2.57).
2) What is the probability that one selected measures
greater than 1.53 degrees?
Solution: Table A-2 gives the area between 0 and 1.53, BUT
the area under the whole curve is 1 (since it is a density curve), and so the
area under the positive half of the curve is 0.5. The table gives us 0.4370
as the area under the curve from 0 to 1.53; therefore the area to the right of
1.53 is 0.5 - 0.4370 = 0.0630. This is the probability that one selected
measures above 1.53.
On the calculator use 2nd VARS to get the distr menu and then
select 2 for normalcdf(. Now enter 1.53 and a comma, hit 2nd comma for the E
notation, and enter 99 and a parenthesis. When you are finished it should look
like this:
normalcdf(1.53,E99)
E99 stands for 10^99, a number so large that it plays the role of infinity, so
this gives the area under the normal curve from 1.53 to infinity.
3) What is the probability that one is selected and it
measures between -1.31 and 1.46 at the freezing point?
Solution: Via Table A-2: Look up 1.31. This gives the area
between 0 and 1.31, which by the symmetry of the normal curve is the same as the
area between -1.31 and 0. The answer you get is 0.4049. Now look up 1.46. This
gives the area between 0 and 1.46, which is 0.4279. Since
Area between -1.31 and 1.46 = Area between 0 and 1.46 + Area between -1.31 and 0
the final answer is 0.4279 + 0.4049 = 0.8328.
Via the calculator you just need to use normalcdf(-1.31,1.46).
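The three tricks used in these problems (symmetry, subtracting from 0.5, and splitting an interval at 0) can be replayed in Python as a check on the table work. This is a sketch that uses statistics.NormalDist from the standard library; the variable names are made up for illustration.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Problem 1: by symmetry, the area from -2.57 to 0 equals the area from 0 to 2.57.
left_side = Z.cdf(0) - Z.cdf(-2.57)
right_side = Z.cdf(2.57) - Z.cdf(0)

# Problem 2: the area above 1.53 is half the curve minus the area from 0 to 1.53.
upper_tail = 0.5 - (Z.cdf(1.53) - Z.cdf(0))

# Problem 3: the area from -1.31 to 1.46, split at 0 as the table requires,
# and then computed in one step as the calculator does.
split_sum = (Z.cdf(0) - Z.cdf(-1.31)) + (Z.cdf(1.46) - Z.cdf(0))
direct = Z.cdf(1.46) - Z.cdf(-1.31)

print(f"{left_side:.4f} {upper_tail:.4f} {split_sum:.4f}")
```

The split and direct answers for problem 3 should agree with each other, and each printed value should match the table answer to within rounding in the last decimal place.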
Finding Z-scores when given Probabilities
Sometimes we wish to find a decile, percentile or quartile
for a standard normal distribution. Consider these problems:
Example 1: Find the temperature of the 80th
percentile
Solution: Since the 80th percentile means 80% of the scores
are below this value, and since the area under the curve is 1, the area below
the score representing the 80th percentile is 0.8. To use the table, remember
that it gives only the positive half of the curve and that the area under the
negative half is 0.5. This means you are looking for a score whose table area
is 0.8 - 0.5 = 0.3. Looking at the BODY of the table, find the value closest
to 0.3. You will see it sits in the row and column that correspond to 0.84, so a
z-score of 0.84 is the 80th percentile.
It is easier on a calculator: hit 2nd VARS for the distr
menu, but select 3 for invNorm. Now enter .8 and a parenthesis. You should get
this:
invNorm(.8) = .84162
Notice once again that the calculator is a bit more accurate.
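Going from an area back to a z-score is an inverse lookup, and Python's standard library offers one as NormalDist.inv_cdf. The sketch below is an illustration rather than part of the course tools; it finds the 80th percentile and then checks the table reasoning (0.8 - 0.5 = 0.3) by computing the area between 0 and 0.84.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Inverse lookup: which z-score has 80% of the area to its left?
z_80 = Z.inv_cdf(0.80)

# Sanity check: the area to the left of that z-score should be 0.8 again.
area_below = Z.cdf(z_80)

# Table-style check: the area between 0 and 0.84 should be close to 0.3.
table_entry = Z.cdf(0.84) - 0.5

print(f"{z_80:.5f} {table_entry:.4f}")
```

The first number printed should agree with the calculator's invNorm(.8), and the second shows why 0.84 is the closest table row and column for an area of 0.3.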
Example 2: Find the temperature below which 10% of the readings fall.
Solution: This is the 10th percentile, so 10% of the scores
are below this value. The area under the curve below this value is 0.1 (again
the total area under the curve is 1, so 0.10 * 1 = 0.1). By symmetry, we can
instead look for the point whose area ABOVE is 0.1 and then change its sign.
The area between 0 and that point is 0.5 - 0.1 = 0.4, so using the table we
need to find the value closest to 0.4 in the body of the table. The
closest entry corresponds to 1.28. This is the point with 10% of the scores above it, so
-1.28 is the point with 10% of the scores below it.
On the calculator just use invNorm(0.1) to get the answer.
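The symmetry argument in this solution can also be checked in Python. In this sketch (an illustration using statistics.NormalDist, with variable names invented here), the 10th percentile is computed directly and compared with the negative of the 90th percentile, which is the point with 10% of the area above it.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

z_10 = Z.inv_cdf(0.10)  # point with 10% of the area below it
z_90 = Z.inv_cdf(0.90)  # point with 10% of the area above it

print(f"{z_10:.5f} {z_90:.5f}")
```

The two values should be equal in size and opposite in sign, and both should round to the 1.28 found in the table.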