NOTE: For the purpose of review, I have added some additional parts not found on the original exam. These parts are indicated with a ** beside them.

Statistics 224
Solution key to EXAM 2, Friday 11/2/07
Professor Michael Iltis (Lecture 2), FALL 2007
(Do 5 out of 6 problems at 20 points each)

EXAM 3 has been moved to November 28!

NOTE: We have decided to move the date of EXAM 3 to Friday November 28 after Thanksgiving rather than Wednesday Nov 26th since that works out better for your TA. It will cover roughly the same amount of material as the exam planned for the earlier date Nov 19th but gives you a little more chance to study and gives me more time to write it.

1. The fracture strengths (MPa) for a random sample of n = 100 ceramic bars fired in a particular kiln resulted in a sample mean \( \bar{x} = 91.20 \) and sample standard deviation s = 3.81.

a) (10 pts) Calculate a 95% confidence interval for the true average fracture strength \( \mu \). What assumptions are you making about the distribution of fracture strengths?

By the central limit theorem for large sample size (n = 100 here) the random variable Z below will be approximately normal and will (by definition of the z-critical value \( z_{\alpha/2} \)) with probability \( 1-\alpha \) satisfy the inequality

\[ -z_{\alpha/2} \le Z = \frac{\bar{X}-\mu}{s/\sqrt{n}} \le z_{\alpha/2} . \]

This statement (which in essence says that the sample mean is approximately normal, so the standardized sample mean is approximately standard normal) is true without making any assumptions on the distribution of the original population other than that the population variance exists and is finite: \( \sigma^2 < \infty \). If we rewrite this by multiplying all sides of the inequality by the denominator \( s/\sqrt{n} \) and then solving for the population mean \( \mu \), we find that the above is equivalent to the statement

\[ \bar{X} - z_{\alpha/2}\, s/\sqrt{n} \le \mu \le \bar{X} + z_{\alpha/2}\, s/\sqrt{n} . \]
Once we plug in a particular observed sample mean \( \bar{x} \) we don't know if the statement holds or not, but we know that if we repeat this procedure with many random samples it will be true approximately \( 100(1-\alpha)\% \) of the time. For the sample given above we get

\[ \bar{x} - z_{\alpha/2}\, s/\sqrt{n} = 91.20 - 1.96(3.81)/10 \le \mu \le 91.20 + 1.96(3.81)/10 = \bar{x} + z_{\alpha/2}\, s/\sqrt{n} \]

or \( 90.4532 \le \mu \le 91.9468 \) is our 95% confidence interval for \( \mu \).

b) (10 points) Suppose investigators believe a priori that the population standard deviation for fracture strength is \( \sigma = 4 \) MPa. How large a sample would then be required to estimate \( \mu \) to within 0.5 MPa (one half of an MPa) with 95% confidence?

The above confidence interval statement about \( \mu \) can be rewritten as

\[ |\bar{X} - \mu| \le z_{\alpha/2}\, \frac{s}{\sqrt{n}} \le E = .5 = \text{maximum error} . \]

Saying that the maximum error is E = .5 (in units of MPa) when estimating \( \mu \) by the sample mean can be rewritten as

\[ \frac{z_{\alpha/2}\, s}{E} \le \sqrt{n} \quad\text{or}\quad n \ge \left( \frac{1.96(3.81)}{.5} \right)^2 = 223.06 \quad\text{so}\quad n \ge 224 . \]

For sample size n = 224 the maximum error is bounded by .5 with probability 95%. (This calculation uses the sample value s = 3.81. With the a priori value \( \sigma = 4 \) stated in the question, the same formula gives \( n \ge (1.96(4)/.5)^2 = 245.86 \), so \( n \ge 246 \).)

2. Customers at an optometry center select either contact lenses (A), regular eyeglasses (B), or bifocals (C). Assume that successive customers make independent selections with probabilities P(A) = .5, P(B) = .4, and P(C) = .1.

a) (10 points) Among the next 15 customers what is the probability P(X = 3) that exactly 3 will purchase regular eyeglasses? What kind of random variable is X?
(Hint: the probability that a customer will not purchase regular glasses is 1 − P(B) = .6)

X = the number of customers among the next 15 who purchase regular glasses is a binomial random variable with parameters n = 15 and p = .4. Either a customer chooses regular glasses with probability p = .4 or he doesn't with probability 1 − p = .6, and by independence these probabilities multiply to give (since there are 15 choose 3 ways to choose exactly 3 customers out of 15 who get regular glasses, leaving the other 15 − 3 = 12 customers who don't):

\[ P(X=3) = \binom{15}{3} (.4)^3 (.6)^{12} = 455(.064)(.00217678) = .0633879 \]

b) (10 points) Among the next 20 customers, what are the mean and variance of the number who select bifocals?

This number is a binomial random variable where n = 20 is the number of customers and p = .1 is the probability a customer will select bifocals. A binomial random variable has mean n p = 20(.1) = 2 and variance n p (1 − p) = 20(.1)(.9) = 1.8.

**c) In a (large) sample of size n = 100 what are the mean and variance of the number Y who select bifocals, and how would you use the continuity correction to approximate the (discrete binomial) probability \( P(13 \le Y \le 16) \) by a continuous standard normal?

\( E[Y] = np = 100(.1) = 10 \), \( V[Y] = np(1-p) = 9 \) so \( \sigma_Y = 3 \). For large n, the binomial r.v. Y, being a sum of a large number of Bernoulli (0 or 1 valued) random variables, is approximately normal by the central limit theorem (this special case gives the normal approximation to the binomial), so standardizing we have

\[ P(13 \le Y \le 16) = P(12.5 < Y < 16.5) = P\left( \frac{12.5-10}{3} < Z = \frac{Y-10}{3} < \frac{16.5-10}{3} \right) = P(5/6 < Z < 13/6) \]
\[ = F(13/6) - F(5/6) = .98487 - .79767 = .18720 \]

**2 d) What is the probability P(Y = 12) that the first customer encountered who wants bifocals is the 12th customer to arrive? What kind of random variable is Y?
Y, the number of independent trials until the first "success", is a geometric random variable with parameter p = .1 (the "success" probability that a customer wears bifocals). For this to happen first for the 12th customer means that the previous 11 customers failed, each with probability 1 − p = .9, while the 12th succeeded with probability p = .1, so by independence

\[ P(Y=12) = (.9)^{11}(.1) = .0313811 \]

gives this geometric probability to seven places.

**2 e) Find the multinomial probability P(X = 5, Y = 8, Z = 2) that in a random sample of n = 15 customers exactly 5 select contacts, 8 regular glasses, and 2 bifocals.

The multinomial coefficient counts the number of ways in which this can happen and is

\[ \frac{15!}{5!\,8!\,2!} = \binom{15}{5}\binom{10}{8} \]

(first choose 5 with contacts from 15 and then choose 8 with regular glasses from the remaining 10). The equal probability of each way is found by independence, multiplying the three probabilities .5, .4, .1 of each selection the appropriate number of times. This gives the probability of one of the ways as \( (.5)^5 (.4)^8 (.1)^2 \), so the total probability is

\[ P(X=5, Y=8, Z=2) = \frac{15!}{5!\,8!\,2!} (.5)^5 (.4)^8 (.1)^2 . \]

I am happy if you leave your answer in the above form, but if you want to give the calculator answer this equals 135135 × .0000002048 = .027675648.

3. If the optometry center sees n = 1000 customers a day and on average one customer every two days must be referred to a cataract specialist, so that p = .0005 is the small probability that a customer has cataracts, the average number of customers per day who have cataracts is \( \lambda = np = 1000(.0005) = .5 \).

a) (5 points) What kind of random variable (and with what parameter \( \lambda \) = ?) would you use to approximate the binomial random variable (with n = 3000, p = .0005) X(3) = the number of customers with cataracts who arrive in a 3 day period?

In a three day period 3000 customers arrive.
n is large and p is small so we use the Poisson random variable approximation to the binomial, with parameter \( \lambda \) = the mean = n p = 1.5 = E[X(3)] = the variance of the Poisson.

b) (5 points) Using this approximation, what are the mean and the standard deviation of the number X(3) of customers with cataracts who arrive in a 3 day period?

The mean \( \mu \) equals the variance \( \sigma^2 \) equals the parameter \( \lambda = np = 1.5 \). So the S.D. is \( \sigma = \sqrt{\lambda} = \sqrt{1.5} = 1.2247448 \). Note this approximation holds exactly only in the limit when n goes to infinity and p goes to 0 with \( \lambda = np = 1.5 \) fixed. The actual standard deviation of the binomial random variable is a bit different, being \( \sqrt{np(1-p)} = \sqrt{1.5(1-.0005)} = 1.2244386 \), which differs in the fourth decimal place.

c) (5 points) Using this approximation estimate the probability P(X(3) = 2) that exactly 2 customers with cataracts arrive in a 3 day period.

\[ P(X(3)=2) = \frac{\lambda^2}{2!} e^{-\lambda} = \frac{1.5^2}{2!} e^{-1.5} = .2510214 \]

The actual probability is

\[ \binom{3000}{2} (.0005)^2 (1-.0005)^{2998} = .25109467 \]

which differs in the 5th decimal place.

d) (5 points) What kind of random variable is T (with what parameter), and what is the probability \( P(T > t) = P(X(t) = 0) = 1 - P(T \le t) \) that the time T until the first customer with cataracts arrives exceeds t days, or equivalently that zero customers with cataracts arrive in a t day period? Note: \( E[X(t)] = \lambda t = .5t \).

The latter is the parameter (mean) of a Poisson process so

\[ P(T > t) = P(X(t) = 0) = \frac{(.5t)^0}{0!} e^{-.5t} = e^{-.5t} = 1 - P(T \le t) \]

gives the tail probability (1 minus the cumulative distribution) of an exponential random variable T with parameter \( \lambda = .5 \). (The waiting time T to the first Poisson event, and also the inter-arrival time between Poisson events, is an exponential r.v.)

4. Participants from a day long conference choose from 2 lunch specials: Sirloin tips $10 or Spinach lasagna $5, and from 3 dinner specials: Lobster $25, Crab legs $20, or Thai Stir fried vegetables $10.
For a randomly selected participant the joint probability distribution p(x,y) for X = cost of lunch and Y = cost of dinner is given by

p(x,y):
                      Y
              $10    $20    $25
          ------------------------
     $5  |    .30    .20    .10   |  .6 = p_X(5)  = .3 + .2 + .1
  X      |                        |
     $10 |    .20    .10    .10   |  .4 = p_X(10) = .2 + .1 + .1
          ------------------------
              .5     .3     .2

a) (5 points) Find the marginal probability mass functions \( p_X(x) \) and \( p_Y(y) \) for X and for Y.

The row sums give the marginal p.m.f. for X: \( p_X(5) = .6 \) and \( p_X(10) = .4 \). The column sums give the marginal p.m.f. for Y: \( p_Y(10) = .5 \), \( p_Y(20) = .3 \) and \( p_Y(25) = .2 \).

b) (5 points) Find the expected cost of meals for a randomly selected participant, E[X+Y].

We could directly calculate this from the joint p.m.f. via

\[ E[X+Y] = \sum_{x,y} (x+y)\, p(x,y) \]
= (5+10)(.3) + (5+20)(.2) + (5+25)(.1) + (10+10)(.2) + (10+20)(.1) + (10+25)(.1) = 23

but it is easier to use the marginal probabilities via the property of expectation:

\[ E[X+Y] = E[X] + E[Y] = \sum_x x\, p_X(x) + \sum_y y\, p_Y(y) \]
= [ 5(.6) + 10(.4) ] + [ 10(.5) + 20(.3) + 25(.2) ] = 7 + 16 = 23

** b') For purposes of the covariance calculation below I could have asked instead, using the marginals above, for the means and variances E[X], E[Y], V[X], V[Y]. We already found the means. The variances are calculated below in the covariance calculation that follows.

4. c) (6 points) Compute the correlation \( \rho_{X,Y} = \frac{Cov[X,Y]}{\sigma_X \sigma_Y} \) where \( Cov[X,Y] = E[(X-\mu_X)(Y-\mu_Y)] \) is the covariance of X and Y and \( \sigma_X, \sigma_Y \) are the standard deviations of X and Y.

Using the marginal probabilities

\[ V[X] = E[X^2] - \mu_X^2 = 5^2(.6) + 10^2(.4) - 7^2 = 55 - 49 = 6 \]
\[ V[Y] = E[Y^2] - \mu_Y^2 = 10^2(.5) + 20^2(.3) + 25^2(.2) - 16^2 = 50 + 120 + 125 - 256 = 295 - 256 = 39 \]

so \( \sigma_X \sigma_Y = \sqrt{6 \cdot 39} = 3\sqrt{26} = 15.29706 \).

\[ Cov[X,Y] = \sum_{x,y} (x-7)(y-16)\, p(x,y) \]
= (5−7)(10−16)(.3) + (5−7)(20−16)(.2) + (5−7)(25−16)(.1) + (10−7)(10−16)(.2) + (10−7)(20−16)(.1) + (10−7)(25−16)(.1)
= 12(.3) − 8(.2) − 18(.1) − 18(.2) + 12(.1) + 27(.1) = .5

so the correlation is .5 / 15.29706 = .03269.

d) (4 points) Are X and Y independent? Explain.

No, if X and Y were independent the covariance would be 0 but it is not.
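The Problem 4 calculations can be re-derived mechanically from the joint table. The following is only a sketch for checking the arithmetic, not part of the original key; the names `joint`, `marginal`, `p_x`, `p_y` and so on are illustrative choices.

```python
import math

# Joint p.m.f. from the Problem 4 table: keys are (lunch cost x, dinner cost y).
joint = {
    (5, 10): 0.30, (5, 20): 0.20, (5, 25): 0.10,
    (10, 10): 0.20, (10, 20): 0.10, (10, 25): 0.10,
}

def marginal(pmf, axis):
    """Sum the joint p.m.f. over the other variable (axis 0 gives X, axis 1 gives Y)."""
    out = {}
    for key, p in pmf.items():
        out[key[axis]] = out.get(key[axis], 0.0) + p
    return out

p_x = marginal(joint, 0)   # row sums
p_y = marginal(joint, 1)   # column sums

mean_x = sum(x * p for x, p in p_x.items())
mean_y = sum(y * p for y, p in p_y.items())
var_x = sum((x - mean_x) ** 2 * p for x, p in p_x.items())
var_y = sum((y - mean_y) ** 2 * p for y, p in p_y.items())

# Covariance straight from its definition E[(X - mu_X)(Y - mu_Y)].
cov = sum((x - mean_x) * (y - mean_y) * p for (x, y), p in joint.items())
corr = cov / math.sqrt(var_x * var_y)

# Independence would require p(x, y) = p_X(x) * p_Y(y) in every cell of the table.
independent = all(
    math.isclose(p, p_x[x] * p_y[y], abs_tol=1e-12)
    for (x, y), p in joint.items()
)

print(f"E[X+Y] = {mean_x + mean_y:.2f}")
print(f"V[X] = {var_x:.2f}, V[Y] = {var_y:.2f}")
print(f"Cov[X,Y] = {cov:.4f}, correlation = {corr:.5f}")
print(f"independent: {independent}")
```

The cell-by-cell factoring test is the stricter of the two checks: a zero covariance alone would not establish independence, whereas a single cell that fails to factor rules it out.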
Alternately, if X and Y were independent then every single entry in the joint probability table would factor as the product of the corresponding marginal probabilities, but this clearly does not hold (for example p(5,20) = .20 while \( p_X(5)\, p_Y(20) = .18 \)).

5. The bolt diameter X is normally distributed with mean 1.200 cm and standard deviation .003 cm. The washer diameter Y is normally distributed with mean 1.204 cm and standard deviation .004 cm. Assume X and Y are independent.

a) (7 points) What is the probability that a randomly selected bolt will have diameter X exceeding 1.204 cm?

We standardize X to convert it to a standard normal by subtracting its mean 1.2 and dividing by its standard deviation .003. Whatever we do to one side of the inequality we must do to all sides, so (using the symmetry of the normal in the last step):

\[ P(X > 1.204) = P\left( Z = \frac{X-1.2}{.003} > \frac{1.204-1.2}{.003} = \frac{.004}{.003} = \frac{4}{3} = 1.333 \right) = P(Z < -1.333) = .0912 . \]

b) (7 points) What is the probability P(X > Y) that a randomly selected bolt will have a diameter X that exceeds the diameter Y of a randomly selected washer? [Hint: P(X > Y) = P(X − Y > 0). Find the mean and variance of X − Y. What kind of random variable is X − Y?]

X − Y is a linear combination of independent normals so it is normally distributed with mean

\[ E[X-Y] = E[X] - E[Y] = 1.2 - 1.204 = -.004 \]

and variance

\[ \sigma^2_{X-Y} = V[X-Y] = V[X] + (-1)^2 V[Y] = .003^2 + .004^2 = .005^2 \]

so the standard deviation is \( \sigma_{X-Y} = .005 \). Then standardizing the normal random variable X − Y gives

\[ P(X > Y) = P(X-Y > 0) = P\left( Z = \frac{X-Y-(-.004)}{.005} > \frac{0-(-.004)}{.005} = .8 \right) \]

which by symmetry equals \( P(Z < -.8) = .2119 \).

5. c) (6 points) If all we know is that the bolt diameter X is normally distributed and that a sample of size n = 9 bolts has (sample) mean diameter 1.204 cm and (sample) standard deviation .003 cm, should we believe that the population mean is 1.200? It may be useful to know the t-critical values (8 degrees of freedom) \( t_{.005} = 3.355 \) and \( t_{.001} = 4.501 \). [Hint: what kind of random variable is \( \frac{\bar{X}-\mu}{S/\sqrt{n}} \)?] Explain.

\( T = \frac{\bar{X}-\mu}{S/\sqrt{n}} \) is a random variable having a t-distribution with parameter \( \nu = n-1 = 8 \) degrees of freedom. For the given sample this gives a value of

\[ t = \frac{1.204-1.200}{.003/\sqrt{9}} = \frac{.004}{.001} = 4 \]

which lies between the two t-critical values given.

This says that at significance level \( \alpha = .01 = 2(.005) \) we would reject the null hypothesis \( H_0 : \mu = 1.200 \) that the population mean is 1.200 cm, since the observed t value exceeds the t-critical value \( t_{\alpha/2} = t_{.005} = 3.355 \). For a more stringent test with a higher standard, having significance level \( \alpha = .002 \) and t-critical value \( t_{\alpha/2} = t_{.001} = 4.501 \), we would fail to reject (accept) the null hypothesis that the mean is 1.2.

It is not really right to mix up the language of confidence intervals with that of rejection regions, since confidence intervals involve the random quantity \( \bar{X} \) (the sample mean) whereas the rejection region is fixed once we fix the type I error probability (significance level) \( \alpha \).

Note that the p-value, which is the probability of seeing data as extreme as (or more extreme than) the observed sample given that the null hypothesis is true, would lie somewhere between .002 and .01. The p-value is twice the tail probability \( \alpha/2 \) corresponding to the t-critical value \( t_{\alpha/2} = 4 \), which we don't know exactly here; we know only that it lies between twice the probabilities \( \alpha/2 = .005 \) and \( \alpha/2 = .001 \) for the two t-critical values given. The factor of two arises because for a two sided test like this one we would have seen equally strong evidence in favor of rejecting the null hypothesis had we seen a negative value of t less than or equal to t = −4 instead of a t value greater than or equal to t = 4.

**5 c') Alternately I could have phrased the previous hypothesis test using the language of confidence intervals: If all we know is that the bolt diameter X is normally distributed and that a sample of size n = 9 bolts has (sample) mean diameter 1.204 cm and (sample) standard deviation .003 cm, give a 99% confidence interval for the true population mean diameter.
At level \( \alpha = .01 \) should we then believe that the population mean is 1.200?

With probability \( 1-\alpha = .99 \) (i.e. 99% confidence) we can write

\[ -t_{\alpha/2} = -3.355 \le T = \frac{\bar{X}-\mu}{S/\sqrt{n}} \le 3.355 = t_{\alpha/2} \]

which can be rearranged to give the confidence interval

\[ \bar{X} \pm t_{\alpha/2}\, S/\sqrt{n} = 1.204 \pm 3.355(.003/3) = [1.200645,\ 1.207355] \]

for \( \mu \). Since the null hypothesis value of the mean, 1.200, just misses this confidence interval, at level \( \alpha = .01 \) we would reject the null hypothesis. Note that we would reach the same conclusion for any other null hypothesis value of the mean that did not lie in this confidence interval.

Note that the above confidence interval statement can be recast as saying that under the null hypothesis \( H_0 : \mu = \mu_0 = 1.200 \), \( t = \frac{\bar{x}-\mu_0}{S/\sqrt{n}} = 4 \) falls outside the acceptance region

\[ -t_{\alpha/2} \le T = \frac{\bar{X}-\mu_0}{S/\sqrt{n}} \le t_{\alpha/2} = 3.355 . \]

**5 d) If for the above sample of size n = 9 the sample standard deviation had instead been .004 cm, would we reject the null hypothesis \( H_0 : \sigma = .003 \) in favor of the alternative \( H_a : \sigma > .003 \) at significance level \( \alpha = .01 \)? Note the chi-squared critical value (for n − 1 = 8 degrees of freedom) is \( \chi^2_{.01} = 20.09 \).

Since

\[ \chi^2 = \frac{(n-1)s^2}{\sigma^2} = \frac{8 \times 16}{9} = 14.222 < \chi^2_{.01} = 20.09 \]

here we would not reject the null hypothesis at level .01 when s = .004, but had we seen a sample standard deviation of s = .005 instead, we would then have

\[ \chi^2 = \frac{(n-1)s^2}{\sigma^2} = \frac{8 \times 25}{9} = 22.222 > \chi^2_{.01} = 20.09 \]

so we would reject \( H_0 : \sigma = .003 \) at level .01 (for data with s = .005 instead).

6. Let \( X_1 \) = number of radios, \( X_2 \) = number of pocket calculators and \( X_3 \) = number of headphones sold in a one hour period in the electronics section of a discount store. Assume these purchases are made independently of one another, and that the mean and variance of these numbers of items and the costs are given by the table

   i    mean E[X_i]    variance V[X_i]    Cost
   1        16               10           $3 per radio
   2        12                8           $4 per calculator
   3         8                5           $6 per pair of headphones

a) (10 points) Find the expected sales revenue of these items in a single hour.
[Hint: the revenue per hour in dollars is \( 3X_1 + 4X_2 + 6X_3 \).]

\[ E[3X_1 + 4X_2 + 6X_3] = 3E[X_1] + 4E[X_2] + 6E[X_3] = 3 \cdot 16 + 4 \cdot 12 + 6 \cdot 8 = 3 \cdot 48 = \$144 \]

This property of the expectation of a linear combination of the \( X_i \)'s does not require independence of the random variables. The corresponding property of the variance does use independence, however:

b) (10 points) Find the standard deviation of the revenue in a single hour.

Using the independence of \( X_1 \), \( X_2 \), and \( X_3 \) we have the formula for the variance of the hourly revenue:

\[ V[3X_1 + 4X_2 + 6X_3] = 3^2 V[X_1] + 4^2 V[X_2] + 6^2 V[X_3] = 9 \cdot 10 + 16 \cdot 8 + 36 \cdot 5 = 90 + 128 + 180 = 398 . \]

Thus the standard deviation of the hourly revenue is \( \sigma = \sqrt{398} \approx \$19.95 \).

Note: Although the laws of expectation and variance used in this problem may seem very easy to use once properly understood, that does not mean they are unimportant. In many ways these are the most important techniques we have used, since many of the formulas we have encountered during this course are a direct application of these simple laws, and knowledge of these properties often saves us the burden of memorizing formulas since they help us derive many of them.

**6 c) Find \( E[3X_1^2] \).

By the linearity of expectation \( E[3X_1^2] = 3E[X_1^2] \), but now from the formula for variance \( V[X] = E[X^2] - E[X]^2 \) (or equivalently \( E[X^2] = V[X] + E[X]^2 \)), using the given table the above equals

\[ 3\left( V[X_1] + E[X_1]^2 \right) = 3(10 + 16^2) = 798 . \]

**6 d) If we are also told that, in addition to being independent, the random variables \( X_1, X_2, X_3 \) are approximately normal, what can we then say about the distribution of the revenue?

Since the revenue \( 3X_1 + 4X_2 + 6X_3 \) is a linear combination of (approximately) normally distributed random variables, we know it must also be approximately normal, with parameters the mean 144 and variance 398 already determined above.
Thus if we wanted to find the probability of the (normally distributed) revenue R exceeding $200, say, we would standardize the inequality R > 200 to get the equivalent statement

\[ Z = \frac{R-144}{\sqrt{398}} > \frac{200-144}{\sqrt{398}} = 2.807 \]

and use symmetry to write (interpolating values from the standard Z table)

\[ P(Z > 2.807) = P(Z < -2.807) = .00253 . \]
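The Problem 6 results follow from two rules (linearity of expectation, and additivity of variance under independence), so they are easy to check by machine. The snippet below is a minimal sketch under the problem's own assumptions; the variable names are illustrative, and `statistics.NormalDist` from the Python standard library (3.8+) stands in for the Z table, so the tail probability it reports agrees with the table-interpolated value only to about four decimal places.

```python
import math
from statistics import NormalDist

prices = [3, 4, 6]        # dollars per radio, calculator, pair of headphones
means = [16, 12, 8]       # E[X_i] from the table
variances = [10, 8, 5]    # V[X_i] from the table

# 6a: linearity of expectation (no independence needed).
rev_mean = sum(c * m for c, m in zip(prices, means))

# 6b: variance of a linear combination (this step does use independence).
rev_var = sum(c ** 2 * v for c, v in zip(prices, variances))
rev_sd = math.sqrt(rev_var)

# 6c: E[3 X_1^2] = 3 (V[X_1] + E[X_1]^2).
e_3x1_sq = 3 * (variances[0] + means[0] ** 2)

# 6d: tail probability P(R > 200) assuming R is normal(rev_mean, rev_sd).
z = (200 - rev_mean) / rev_sd
tail = 1 - NormalDist().cdf(z)

print(f"mean revenue = {rev_mean}, variance = {rev_var}, sd = {rev_sd:.2f}")
print(f"E[3 X_1^2] = {e_3x1_sq}")
print(f"z = {z:.3f}, P(R > 200) = {tail:.5f}")
```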

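As a cross-check on the counting probabilities in Problems 2 and 3, the sketch below recomputes them with the Python standard library only. It is not part of the original key: the helper `binom_pmf` and the variable names are illustrative, and the exact binomial sum for 2c is included only for comparison with the continuity-corrected normal approximation.

```python
import math
from statistics import NormalDist

def binom_pmf(k, n, p):
    """Binomial probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

# 2a: exactly 3 of 15 customers choose regular glasses (p = .4).
p_2a = binom_pmf(3, 15, 0.4)

# 2c: P(13 <= Y <= 16) for Y ~ Binomial(100, .1), exact versus
# normal approximation with continuity correction (mean 10, sd 3).
exact_2c = sum(binom_pmf(k, 100, 0.1) for k in range(13, 17))
normal = NormalDist(mu=10, sigma=3)
approx_2c = normal.cdf(16.5) - normal.cdf(12.5)

# 2d: geometric probability that the first bifocal customer is the 12th.
p_2d = 0.9 ** 11 * 0.1

# 2e: multinomial probability of 5 contacts, 8 regular glasses, 2 bifocals.
coef = math.factorial(15) // (
    math.factorial(5) * math.factorial(8) * math.factorial(2)
)
p_2e = coef * 0.5 ** 5 * 0.4 ** 8 * 0.1 ** 2

# 3c: Poisson approximation versus exact binomial for X(3) = 2.
lam = 3000 * 0.0005
poisson_3c = lam ** 2 / math.factorial(2) * math.exp(-lam)
binom_3c = binom_pmf(2, 3000, 0.0005)

print(f"2a: {p_2a:.7f}")
print(f"2c: exact {exact_2c:.5f}, normal approx {approx_2c:.5f}")
print(f"2d: {p_2d:.7f}")
print(f"2e: coefficient {coef}, probability {p_2e:.9f}")
print(f"3c: Poisson {poisson_3c:.7f}, binomial {binom_3c:.8f}")
```

Comparing `exact_2c` with `approx_2c`, and `binom_3c` with `poisson_3c`, gives a feel for how close the two approximations are at these sample sizes.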