# EXAM 3 has been moved to November 28 !

```
NOTE: For the purpose of review, I have added some additional parts not
found on the original exam. These parts are indicated with a ** beside them.
Statistics 224 Solution key to EXAM 2
Friday 11/2/07
Professor Michael Iltis (Lecture 2)
FALL 2007
(Do 5 out of 6 problems at 20 points each)
NOTE: We have decided to move the date of EXAM 3 to Friday November
28 after Thanksgiving rather than Wednesday Nov 26th since that works out
better for your TA. It will cover roughly the same amount of material as the
exam planned for the earlier date Nov 19th but gives you a little more chance
to study and gives me more time to write it.
1. The fracture strengths (MPa) for a random sample of n = 100 ceramic bars fired in a
particular kiln resulted in a sample mean x̄ = 91.20 and sample standard deviation
s = 3.81.
a) (10 pts) Calculate a 95% confidence interval for the true average fracture strength μ.
What assumptions are you making about the distribution of fracture strengths ?
By the central limit theorem, for a large sample size (n = 100 here) the random variable
Z below is approximately standard normal and (by definition of the z-critical value z_{α/2})
satisfies, with probability 1 − α, the inequality
    −z_{α/2} ≤ Z = (X̄ − μ)/(s/√n) ≤ z_{α/2} .
This statement (which in essence says that the sample mean is approximately normal so
the standardized sample mean is approximately standard normal) is true without making
any assumptions on the distribution of the original population other than that the population
variance exists and is finite:
    σ^2 < ∞ .
If we re-write this by multiplying all sides of this inequality by the denominator s/√n
and then solving for the population mean μ, we find that the above is
equivalent to the statement
    X̄ − z_{α/2} s/√n ≤ μ ≤ X̄ + z_{α/2} s/√n .
Once we plug in a particular observed sample mean we don't know whether the statement
holds or not, but we know that if we repeat this procedure with many random samples it
will be true approximately 100(1 − α)% of the time. For the sample given above we
get
    x̄ − z_{α/2} s/√n = 91.20 − 1.96(3.81/10) ≤ μ ≤ 91.20 + 1.96(3.81/10) = x̄ + z_{α/2} s/√n
that is,
    90.4532 ≤ μ ≤ 91.9468
is our 95% confidence interval for μ.
b) (10 points) Suppose investigators believe a priori that the population standard
deviation for fracture strength is σ = 4 MPa. How large a sample would then be
required to estimate μ to within 0.5 MPa (one half of an MPa) with 95% confidence ?
The above confidence interval statement about μ can be re-written (with the assumed
population standard deviation σ = 4 in place of s) as
    |X̄ − μ| ≤ z_{α/2} σ/√n ≤ E = .5 = maximum error
Saying that the maximum error is E = .5 (in units of MPa) when estimating μ by the
sample mean can be re-written as
    z_{α/2} σ/E ≤ √n
or
    n ≥ (1.96 ⋅ 4/.5)^2 = 245.86 so n ≥ 246
For sample size n = 246 the maximum error is bounded by .5 with probability 95%.
2. Customers at an optometry center select either contact lenses (A), regular eyeglasses
(B), or bifocals (C). Assume that successive customers make independent selections
with probabilities P(A) = .5, P(B) = .4, and P(C) = .1.
a) (10 points) Among the next 15 customers what is the probability P(X = 3) that
exactly 3 will purchase regular eyeglasses ? What kind of random variable is X ?
(Hint: the probability that a customer will not purchase regular glasses is 1 − P(B) = .6)
X = the number of customers among the next 15 who purchase regular glasses
is a binomial random variable with parameters n = 15 and p = .4.
Either a customer chooses regular glasses with probability p = .4 or he doesn't with
probability 1 − p = .6, and by independence these probabilities multiply to give
(since there are 15 choose 3 ways to choose exactly 3 customers out of 15 who get
regular glasses, leaving the other 15 − 3 = 12 customers who don't):
    P(X = 3) = C(15,3) (.4)^3 (.6)^12 = 455(.064)(.00217678) = .0633879
b) (10 points) Among the next 20 customers, what are the mean and variance of the
number who select bifocals ?
This number is a binomial random variable with n = 20 customers and p = .1, the
probability that a customer will select bifocals. A binomial random variable has
mean: n p = 20(.1) = 2
and variance: n p (1 − p) = 20(.1)(.9) = 1.8
**c) In a (large) sample of size n = 100 what are the mean and variance of the number Y
who select bifocals, and how would you use the continuity correction to approximate
the (discrete binomial) probability P(13 ≤ Y ≤ 16) by a continuous standard normal ?
    E[Y] = np = 100(.1) = 10,   V[Y] = np(1 − p) = 9   so   σ_Y = 3 .
For large n, the binomial r.v. Y, being a sum of a large number of Bernoulli 0 or 1 valued
random variables, is approximately normal by the central limit theorem (this special case
gives the normal approximation to the binomial), so standardizing we have
    P(13 ≤ Y ≤ 16) = P(12.5 < Y < 16.5)
                   = P( (12.5 − 10)/3 < Z = (Y − 10)/3 < (16.5 − 10)/3 )
                   = P(5/6 < Z < 13/6) = F(13/6) − F(5/6) = .98487 − .79767 = .18720
where F is the standard normal cumulative distribution function.
**2 d) What is the probability P(Y = 12) that the first customer encountered who
wants bifocals is the 12th customer to arrive? What kind of random variable is Y?
Y, the number of independent trials until the first "success", is a geometric random
variable with parameter p = .1 ("success" probability that a customer wears bifocals).
For this to happen first for the 12th customer means that the previous 11 customers
failed, each with probability 1 − p = .9, while the 12th succeeded with probability p = .1,
so by independence
    P(Y = 12) = (.9)^11 (.1) = .0313811
gives this geometric probability to seven places.
**2 e) Find the multinomial probability P(X = 5, Y = 8, Z = 2) that in a random sample
of n = 15 customers exactly 5 select contacts, 8 regular glasses, and 2 bifocals:
The multinomial coefficient counts the number of ways in which this can happen and is
    15!/(5! 8! 2!) = C(15,5) C(10,8)
(first choose 5 with contacts from 15 and then choose 8 regular
glasses from the remaining 10).
Each way has the same probability, found by independence by multiplying the three
probabilities .5, .4, .1 of each selection the appropriate number of times. This gives the
probability of one of the ways as
    (.5)^5 (.4)^8 (.1)^2
so the total probability is
    P(X = 5, Y = 8, Z = 2) = [15!/(5! 8! 2!)] (.5)^5 (.4)^8 (.1)^2
I am happy if you leave your answer in the above form, but if you want to give the
numerical value it is
    = 135135 × .0000002048 = .027675648
3. If the optometry center sees n = 1000 customers a day and on average one customer
every two days must be referred to a cataract specialist, so that p = .0005 is the small
probability that a customer has cataracts, the average number of customers per day who
have cataracts is λ = n p = 1000(.0005) = .5 .
a) (5 points) What kind of random variable (and with what parameter λ = μ) would you
use to approximate the binomial random variable (with n = 3000, p = .0005)
X(3) = the number of customers with cataracts who arrive in a 3 day period ?
In a three day period 3000 customers arrive. n is large and p is small so we use the
Poisson random variable approximation to the binomial with parameter λ = the mean
μ = n p = 1.5 = E[X(3)] = variance of the Poisson.
b) (5 points) Using this approximation, what are the mean and the standard deviation of
the number X(3) of customers with cataracts who arrive in a 3 day period ?
The mean μ equals the variance σ^2 equals the parameter λ = n p = 1.5. So the S.D. is
    σ = √λ = √1.5 = 1.2247448
Note this approximation holds exactly only in the limit when n goes to infinity and p
goes to 0 with λ = n p = 1.5 fixed.
The actual standard deviation of the binomial random variable is a bit different, being
    √(np(1 − p)) = √(1.5(1 − .0005)) = 1.2244386
which differs in the fourth decimal place.
c) (5 points) Using this approximation estimate the probability P(X(3) = 2) that
exactly 2 customers with cataracts arrive in a 3 day period.
    P(X(3) = 2) = (λ^2/2!) e^(−λ) = ((1.5)^2/2!) e^(−1.5) = .2510214
The actual probability is
    C(3000,2) (.0005)^2 (1 − .0005)^2998 = .25109467
which differs in the 5th decimal place.
d) (5 points) What kind of random variable is T (what parameter) and what is the
probability
    P(T > t) = P(X(t) = 0) = 1 − P(T ≤ t)
that the time T until the first customer with cataracts arrives exceeds t days, or
equivalently that zero customers with cataracts arrive in a t day period ? Note:
E[X(t)] = λt = .5t. The latter is the parameter (mean) of the Poisson count X(t) so
    P(T > t) = P(X(t) = 0) = ((.5t)^0/0!) e^(−.5t) = e^(−.5t) = 1 − P(T ≤ t)
gives the tail probability (1 minus the cumulative distribution) of an exponential
random variable T with parameter λ = .5. (The waiting time T to the first Poisson
event, and also the inter-arrival time between Poisson events, is an exponential r.v.)
4. Participants from a day long conference choose from 2 lunch specials : Sirloin tips
\$10 or Spinach lasagna \$5 and from 3 dinner specials : Lobster \$25, Crab legs \$20 , or
Thai Stir fried vegetables \$10. For a randomly selected participant the joint probability
distribution p(x,y) for
X = cost of lunch
and Y = cost of dinner
is given by
p(x,y) :
                       Y
               \$10    \$20    \$25
          --------------------------
     \$5  |    .30     .20     .10  |  .6 = p_X(5) = .3 + .2 + .1
  X       |                         |
     \$10 |    .20     .10     .10  |  .4 = p_X(10) = .2 + .1 + .1
          --------------------------
                .5      .3      .2
a) (5 points) Find the marginal probability mass functions p_X(x) and p_Y(y) for X
and for Y.
The row sums give the marginal p.m.f. for X: p_X(5) = .6 and p_X(10) = .4.
The column sums give the marginal p.m.f. for Y: p_Y(10) = .5, p_Y(20) = .3 and p_Y(25) = .2.
b) (5 points) Find the expected cost of meals for a randomly selected participant
E[X + Y].
We could directly calculate this from the joint p.m.f. via
    E[X + Y] = ∑_{x,y} (x + y) p(x, y)
= (5+10)(.3) + (5+20)(.2) + (5+25)(.1) + (10+10)(.2) + (10+20)(.1) + (10+25)(.1) = 23
but it is easier to use the marginal probabilities via the property of expectation:
    E[X + Y] = E[X] + E[Y] = ∑_x x p_X(x) + ∑_y y p_Y(y)
= [ 5(.6) + 10(.4) ] + [ 10(.5) + 20(.3) + 25(.2) ]
= 7 + 16
= 23
** b') For purposes of the covariance calculation below I could have asked instead using
the marginals above, for the means and variances E[ X ] , E[Y ] , V [ X ] , V [Y ]
We already found the means. The variances are calculated below in the covariance
calculation that follows.
4. c) (6 points) Compute the correlation
    ρ_{X,Y} = Cov[X, Y]/(σ_X σ_Y)   where   Cov[X, Y] = E[(X − μ_X)(Y − μ_Y)]
is the covariance of X and Y, and σ_X, σ_Y are the standard deviations of X and Y.
Using the marginal probabilities
    V[X] = E[X^2] − μ_X^2 = 5^2(.6) + 10^2(.4) − 7^2 = 55 − 49 = 6
    V[Y] = E[Y^2] − μ_Y^2 = 10^2(.5) + 20^2(.3) + 25^2(.2) − 16^2 = 50 + 120 + 125 − 256 = 295 − 256 = 39
so
    σ_X σ_Y = √(6⋅39) = 3√26 = 15.29706 .
    Cov[X, Y] = ∑_{x,y} (x − 7)(y − 16) p(x, y)
      = (5−7)(10−16)(.3) + (5−7)(20−16)(.2) + (5−7)(25−16)(.1)
        + (10−7)(10−16)(.2) + (10−7)(20−16)(.1) + (10−7)(25−16)(.1)
      = 12(.3) − 8(.2) − 18(.1) − 18(.2) + 12(.1) + 27(.1)
      = .5
so the correlation is .5 / 15.29706 = .03269
d) (4 points) Are X and Y independent ? Explain.
No. If X and Y were independent the covariance would be 0, but it is not.
Alternatively, if they were independent then every single entry in the joint probability
table would factor as the product of the corresponding marginal probabilities, but this
does not hold: for example p(5,20) = .20 whereas p_X(5) p_Y(20) = (.6)(.3) = .18.
5. The bolt diameter X is normally distributed with mean 1.200 cm and standard
deviation .003 cm. The washer diameter Y is normally distributed with mean 1.204 cm
and standard deviation .004 cm. Assume X and Y are independent.
a) (7 points) What is the probability that a randomly selected bolt will have diameter X
exceeding 1.204 cm ?
We standardize X to convert it to a standard normal by subtracting its mean 1.2 and
dividing by its standard deviation .003.
Whatever we do to one side of the inequality we must do to all sides so
(using the symmetry of the normal in the last step):
    P(X > 1.204) = P( Z = (X − 1.2)/.003 > (1.204 − 1.2)/.003 = .004/.003 = 4/3 = 1.333 )
                 = P(Z < −1.333) = .0912 .
b) (7 points) What is the probability P(X > Y) that a randomly selected bolt will have
a diameter X that exceeds the diameter Y of a randomly selected washer ? [ Hint:
P(X > Y) = P(X − Y > 0). Find the mean and variance of X − Y. What kind of
random variable is X − Y ? ]
X − Y is a linear combination of independent normals so is normally distributed with mean
    E[X − Y] = E[X] − E[Y] = 1.2 − 1.204 = −.004
and variance
    σ^2_{X−Y} = V[X − Y] = V[X] + (−1)^2 V[Y] = .003^2 + .004^2 = .005^2
so the standard deviation is
    σ_{X−Y} = .005 .
Then standardizing the normal random variable X − Y gives
    P(X > Y) = P(X − Y > 0) = P( Z = ((X − Y) − (−.004))/.005 > (0 − (−.004))/.005 = .8 )
which by symmetry equals
    P(Z < −.8) = .2119
5. c) (6 points) If all we know is that the bolt diameter X is normally distributed and that
a sample of size n = 9 bolts has (sample) mean diameter 1.204 and (sample) standard
deviation .003 cm, should we believe that the population mean is 1.200 ? It may be
useful to know the t-critical values (8 degrees of freedom) t_{.005} = 3.355 and t_{.001} = 4.501.
[ Hint: what kind of random variable is (X̄ − μ)/(S/√n) ? ] Explain.
    T = (X̄ − μ)/(S/√n)
is a random variable having a t-distribution with parameter ν = n − 1 = 8 degrees of
freedom. For the given sample this gives a value of
    t = (1.204 − 1.200)/(.003/√9) = .004/.001 = 4
which lies between the two t-critical values given.
This says that at significance level α = .01 = 2(.005) we would reject the null
hypothesis
    H_0 : μ = 1.200
that the population mean is 1.200 cm, since the observed t value exceeds the t-critical
value t_{α/2} = t_{.005} = 3.355,
but for a more stringent test with a higher standard having significance level
α = .002 and t-critical value t_{α/2} = t_{.001} = 4.501
we would accept the null hypothesis that the mean is 1.2. It is not really right to mix up
the language of confidence intervals with that of rejection regions since confidence
intervals involve the random quantity X̄ (the sample mean) whereas the rejection
region is fixed once we fix the type I error probability (significance level) α.
Note that the p-value, which is the probability of seeing data as (or more) extreme as the
observed sample given that the null hypothesis is true, would lie somewhere between .002
and .01. The p-value is twice the tail probability α/2 whose t-critical value is
t_{α/2} = 4; we don't know this probability exactly here, only that it lies between the
tail probabilities α/2 = .001 and α/2 = .005 of the two t-critical values given. It is
doubled because for a two sided test like this one we would also have seen equally strong
evidence in favor of rejecting the null hypothesis had we seen a negative value of t less
than or equal to
    t = −4
instead of a t value greater than or equal to
    t = 4.
**5 c') Alternately I could have phrased the previous hypothesis test instead using the
language of confidence intervals:
If all we know is that the bolt diameter X is normally distributed and that a
sample of size n = 9 bolts has (sample) mean diameter 1.204 and (sample) standard
deviation .003 cm, give a 99% confidence interval for the true population mean
diameter. At level α = .01 should we then believe that the population mean is 1.200 ?
With probability 1 − α = .99 (i.e. 99% confidence) we can write
    −t_{α/2} = −3.355 ≤ T = (X̄ − μ)/(S/√n) ≤ 3.355 = t_{α/2}
which can be rearranged to give the confidence interval
    X̄ ± t_{α/2} S/√n = 1.204 ± 3.355(.003/3) = [1.200645, 1.207355] for μ.
Since the null hypothesis value of the mean 1.200 just misses this confidence interval,
at level α = .01 we would reject the null hypothesis.
Note that we would reach the same conclusion for any other null hypothesis value of the
mean that did not lie in this confidence interval.
Note that the above confidence interval statement can be recast as saying that under the
null hypothesis H_0 : μ = μ_0 = 1.200, the observed value
    t = (x̄ − μ_0)/(s/√n) = 4
falls outside the acceptance region
    −t_{α/2} < T = (X̄ − μ_0)/(S/√n) < t_{α/2} = 3.355
**5 d) If for the above sample of size n = 9 the sample standard deviation had instead
been .004 cm, would we reject the null hypothesis H_0 : σ = .003 in favor of the
alternative H_a : σ > .003 at significance level α = .01 ? Note the chi-squared critical
value (for n − 1 = 8 degrees of freedom) is χ^2_{.01} = 20.09.
Since
    χ^2 = (n − 1) s^2/σ^2 = (8 × 16)/9 = 14.222 < χ^2_{.01} = 20.09
here we would not reject the null hypothesis at level .01 when s = .004, but had we seen a
sample standard deviation of s = .005 instead, we would then have
    χ^2 = (n − 1) s^2/σ^2 = (8 × 25)/9 = 22.222 > χ^2_{.01} = 20.09
so we would reject H_0 : σ = .003 at level .01 (for data with s = .005 instead).
6. Let X_1 = number of radios, X_2 = number of pocket calculators and
X_3 = number of headphones sold in a one hour period in the electronics section of a
discount store. Assuming these purchases are made independently of one another, and
that the mean and variance of these numbers of items and the costs are given by the table
    i    μ_i = E[X_i]    σ_i^2 = V[X_i]    Cost
    1        16               10           \$ 3 per radio
    2        12                8           \$ 4 per calculator
    3         8                5           \$ 6 per pair of headphones
a) (10 points) Find the expected sales revenue of these items in a single hour.
[ Hint: the revenue per hour in dollars is 3 X_1 + 4 X_2 + 6 X_3 ]
    E[3 X_1 + 4 X_2 + 6 X_3] = 3 E[X_1] + 4 E[X_2] + 6 E[X_3]
      = 3⋅16 + 4⋅12 + 6⋅8 = 3⋅48 = \$144
This property of expectation of a linear combination of the X_i's does not require
independence of the random variables.
The corresponding property of the variance does use independence however:
b) (10 points) Find the standard deviation of the revenue in a single hour.
Using the independence of X_1, X_2, and X_3 we have the formula
for the variance of the hourly revenue:
    V[3 X_1 + 4 X_2 + 6 X_3] = 3^2 V[X_1] + 4^2 V[X_2] + 6^2 V[X_3]
      = 9⋅10 + 16⋅8 + 36⋅5 = 90 + 128 + 180 = 398 .
Thus the standard deviation of the hourly revenue is
    σ = √398 ≈ \$19.95 .
Note : Although the laws of expectation and variance used in this problem may seem
very easy to use once properly understood, that does not mean these are not important.
In many ways these are the most important techniques we have used since many of the
formulas we have encountered during this course are a direct application of these simple
laws, and knowledge of these properties often saves us the burden of having to
memorize formulas since they help us derive many formulas.
**6 c) Find E[3 X_1^2]
By the linearity of expectation E[3 X_1^2] = 3 E[X_1^2],
but now from the formula for variance V[X] = E[X^2] − E[X]^2
(or equivalently E[X^2] = V[X] + E[X]^2), using the given table the above equals
    3(V[X_1] + E[X_1]^2) = 3(10 + 16^2) = 798
**6 d) If we are also told that in addition to being independent, the random variables
X_1, X_2, X_3 are approximately normal, what can we then say about the
distribution of the revenue ? Since the revenue R = 3 X_1 + 4 X_2 + 6 X_3 is a linear
combination of independent (approximately) normally distributed random variables, we
know it must also be approximately normal with parameters the mean 144 and
variance 398 already determined above. Thus if we wanted to find the
probability of the (normally distributed) revenue exceeding \$200, say, we would
standardize the inequality R > 200 to get the equivalent statement
    Z = (R − 144)/√398 > (200 − 144)/√398 = 2.807
and use symmetry to write (interpolating values)
    P(Z > 2.807) = P(Z < −2.807) = .00253
from the standard Z table.
```
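
The arithmetic in several of the solutions above can be checked numerically. The following is a minimal sketch in Python using only the standard library; it is not part of the original solution key, and all variable names are illustrative assumptions. It uses the table value z = 1.96 for Problem 1a and takes the \$3 radio price in Problem 6 from the hint's revenue formula.

```python
from math import comb, exp, factorial, sqrt
from statistics import NormalDist

# Problem 1a: large-sample 95% confidence interval for the mean.
n, xbar, s = 100, 91.20, 3.81
z = round(NormalDist().inv_cdf(0.975), 2)   # rounded to the table value 1.96
half_width = z * s / sqrt(n)
ci = (xbar - half_width, xbar + half_width)

# Problem 2a: binomial P(X = 3) with n = 15, p = .4.
p_binom = comb(15, 3) * 0.4**3 * 0.6**12

# Problem 2d: geometric P(Y = 12) with success probability p = .1.
p_geom = 0.9**11 * 0.1

# Problem 2e: multinomial P(X = 5, Y = 8, Z = 2) with n = 15.
coef = factorial(15) // (factorial(5) * factorial(8) * factorial(2))
p_multi = coef * 0.5**5 * 0.4**8 * 0.1**2

# Problem 3c: Poisson approximation versus the exact binomial probability.
lam = 1.5
p_pois = lam**2 * exp(-lam) / factorial(2)
p_exact = comb(3000, 2) * 0.0005**2 * 0.9995**2998

# Problem 4: means, variances, covariance and correlation from the joint p.m.f.
joint = {(5, 10): .30, (5, 20): .20, (5, 25): .10,
         (10, 10): .20, (10, 20): .10, (10, 25): .10}
ex = sum(x * p for (x, _), p in joint.items())
ey = sum(y * p for (_, y), p in joint.items())
vx = sum((x - ex) ** 2 * p for (x, _), p in joint.items())
vy = sum((y - ey) ** 2 * p for (_, y), p in joint.items())
cov = sum((x - ex) * (y - ey) * p for (x, y), p in joint.items())
rho = cov / sqrt(vx * vy)

# Problem 6a/6b: mean and standard deviation of the hourly revenue.
costs, means, variances = (3, 4, 6), (16, 12, 8), (10, 8, 5)
rev_mean = sum(c * m for c, m in zip(costs, means))
rev_var = sum(c * c * v for c, v in zip(costs, variances))  # uses independence
rev_sd = sqrt(rev_var)

print(f"1a  CI = ({ci[0]:.4f}, {ci[1]:.4f})")
print(f"2a  P(X=3) = {p_binom:.7f}")
print(f"2d  P(Y=12) = {p_geom:.7f}")
print(f"2e  coefficient = {coef}, probability = {p_multi:.9f}")
print(f"3c  Poisson = {p_pois:.7f}, exact binomial = {p_exact:.8f}")
print(f"4   E[X] = {ex:.2f}, E[Y] = {ey:.2f}, Cov = {cov:.3f}, rho = {rho:.5f}")
print(f"6   mean = {rev_mean}, variance = {rev_var}, SD = {rev_sd:.2f}")
```

Computing the covariance directly from the joint table, rather than from the shortcut E[XY] − E[X]E[Y], mirrors the sum written out in Problem 4 c), so each term can be compared with the hand calculation.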