Solutions for some HW problems


I will post answers and/or solutions for most of the problems not to be turned in for credit here shortly after they are assigned, and I will post answers and/or solutions for most of the problems turned in for credit here after the grace period expires for their submission. Below are solutions for some of the problems.

Problem 1

composition of sample      number of samples
0 mutants, 5 nonmutants            3
1 mutant,  4 nonmutants            3
2 mutants, 3 nonmutants            2
3 mutants, 2 nonmutants            2
4 mutants, 1 nonmutant             0
5 mutants, 0 nonmutants            0



Problem 3

(a)
1016/6549 = 0.155. (Note: Normally, I'd put a dot over the = to indicate approximately equal (since it's not an exact equality --- I rounded to the nearest thousandth), but I don't know how to do that using HTML. Since this will be a problem in expressing many answers, I won't bother to state each time that an indicated equality may only be an approximate equality.)

(b) 2480/6549 = 0.379.

(c) 1016/6549 + 2480/6549 - 526/6549 = 0.454.

(d) 526/6549 = 0.080.



Problem 4

One can use a tree diagram to identify two ways of getting a positive test. One can either pick a pregnant woman and have her test positive, or one can pick a woman who isn't pregnant and get a false positive result. Summing the probabilities for the two pertinent branches of the tree (out of four possible paths altogether), we get
P(picking a pregnant woman) * P(pregnant woman produces positive test result) + P(picking a woman who isn't pregnant) * P(woman who isn't pregnant produces positive test result) = (0.1)(0.98) + (0.9)(0.01) = 0.098 + 0.009 = 0.107.
(Defining the event A to be that the chosen woman tests positive, and the event B to be that the chosen woman is pregnant, we can justify the tree approach used above by making use of Rule 4 (p. 89) and Rule 7 (p. 91 --- Rule 7 is used twice below, by the way) as follows:
P(A) = P( (A ∩ B) ∪ (A ∩ Bᶜ) ) = P(A ∩ B) + P(A ∩ Bᶜ) = P(A|B)*P(B) + P(A|Bᶜ)*P(Bᶜ).
This is just a way to use symbols to express more formally what is expressed using more words above --- but the numbers are plugged in the same way as shown above to produce the answer of 0.107.)
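(The arithmetic above can also be checked outside of SPSS. The short Python sketch below is not part of the course software --- it's just an independent check of the total-probability computation over the two branches of the tree.)

```python
# Total probability: sum over the two branches of the tree that lead to
# a positive test (pregnant & positive, not pregnant & false positive)
p_pregnant = 0.10
p_pos_given_pregnant = 0.98
p_pos_given_not_pregnant = 0.01

p_positive = (p_pregnant * p_pos_given_pregnant
              + (1 - p_pregnant) * p_pos_given_not_pregnant)
print(round(p_positive, 3))  # 0.107
```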



Problem 5

One can use a tree diagram to identify two ways of getting a positive test. One can either pick a person who has the disease and have him/her test positive, or one can pick a person who doesn't have the disease and get a false positive result. Summing the probabilities for the two pertinent branches of the tree (out of four possible paths altogether), we get
P(picking a person who has the disease) * P(person who has the disease produces positive test result) + P(picking a person who doesn't have the disease) * P(person who doesn't have the disease produces positive test result) = (0.1)(0.92) + (0.9)(0.06) = 0.092 + 0.054 = 0.146.


Problem 6

(a)
1213/6549 = 0.185.

(b) 247/2115 = 0.117.

(c) If the event of selecting a smoker and the event of selecting a person with high income were independent, the probabilities requested in parts (a) and (b) would be equal. Since they aren't equal, the desired answer is no.



Problem 7

If the event that the husband smokes and the event that the wife smokes are independent events, then the percentage of couples for which both the husband and wife smoke would be 6% (since 30% of the husbands smoke and 20% of the wives smoke). Since it's 8%, it can be concluded that the event that the husband smokes and the event that the wife smokes are not independent events, and so the desired answer is no. (One can also note that the percentage of husbands who smoke given that their wife smokes is 40%, which does not equal the overall percentage of husbands who smoke, and the percentage of wives who smoke given that their husband smokes is about 26.67%, which does not equal the overall percentage of wives who smoke.)



Problem 8

There are 130 + 26 + 3 + 1 = 160 broods of size 7 or larger, and 5000 broods in all. If we select a brood randomly (with all 5000 broods equally likely to be selected), the chance the brood is of size 7 or larger (which is the desired probability) is simply 160/5000 = 0.032.



Problem 9

There are 3*610 = 1830 young birds in broods of size 3, and 22435 birds in all. If we select a young bird randomly (with all 22435 birds equally likely to be selected), the chance the bird is from a brood of size 3 (which is the desired probability) is simply 1830/22435, or about (upon rounding) 0.0816.



Problem 10

The expected value of a random variable is the weighted average of its possible outcomes, with the weights being the probabilities with which the outcomes occur. So we have E(Y) = 1*P(Y = 1) + 2*P(Y = 2) + ... + 10*P(Y = 10) = 1*(90/5000) + 2*(230/5000) + ... + 10*(1/5000) = 4.487.



Problem 11

The expected value of a random variable is the weighted average of its possible outcomes, with the weights being the probabilities with which the outcomes occur. So we have E(Y) = (0.343)*0 + (0.441)*1 + (0.189)*2 + (0.027)*3 = 0.441 + 0.378 + 0.081 = 0.9 (which is also equal to n*p = 3*(0.3), the mean of a binomial (3, 0.3) random variable).
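(As a quick check of the weighted-average arithmetic, one can carry out the same computation in a few lines of Python --- again, just an independent check, not part of the course software.)

```python
# E(Y) as a weighted average of the outcomes, weights = probabilities
probs = {0: 0.343, 1: 0.441, 2: 0.189, 3: 0.027}
expected = sum(y * p for y, p in probs.items())
print(round(expected, 4))  # 0.9 (matches n*p = 3*0.3)
```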



Problem 12

The desired percentage is 25% + 12% + 7% = 44%.



Problem 13

(a)
The desired probability is 0.41 + 0.21 = 0.62.

(b)
The desired probability is 0.41 + 0.21 + 0.03 = 0.65.

(c)
The desired probability is 0.01 + 0.34 = 0.35.



Problem 14

The desired probability is P(Y=5), where Y is a binomial (10, 0.6) random variable. This probability equals 10C5 (0.6)^5 (0.4)^5 = 0.201. (Note: One can also use SPSS (as described here) to obtain this probability.)
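(For those who prefer to check binomial probabilities without SPSS or tables, the general formula can be coded directly. The Python sketch below defines a small helper function --- the name binom_pmf is mine, not from the book.)

```python
from math import comb  # Python 3.8+

def binom_pmf(k, n, p):
    """P(Y = k) for a binomial(n, p) random variable."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

print(round(binom_pmf(5, 10, 0.6), 3))  # 0.201
```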



Problem 15

The desired probability is P(Y=2), where Y is a binomial (4, p) random variable, with p = 105/(105 + 100). This probability equals 4C2 p^2 (1-p)^2 = 6 (105/205)^2 (100/205)^2 = 0.375. (Note: One can also use SPSS (as described here) to obtain this probability.)



Problem 16

(a)
The probability that all six are not albino is 0.75^6, which is about 0.178. (One could also use the binomial distribution. Letting Y be the number of albinos out of 6, Y has a binomial (6, 0.25) distribution. The desired probability is P(Y = 0). This leads to the same value given above.)

(b) The event that one or more are albino is the complement of the event that none are albino --- so the desired probability is just 1 - 0.75^6, which is about 0.822. (Using the binomial distribution as indicated above, the desired probability is P(Y >= 1) = 1 - P(Y = 0). This leads to the same value given above.)
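(A quick numerical check of both parts, using the complement rule --- an independent Python check, not part of the course software:)

```python
# (a) all six not albino; (b) at least one albino, via the complement rule
p_albino = 0.25
p_none = (1 - p_albino) ** 6

print(round(p_none, 3))      # 0.178
print(round(1 - p_none, 3))  # 0.822
```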



Problem 17

(a)
The 1% indicated in the problem corresponds to a probability of 0.01 for a randomly selected patient experiencing damage, which gives us a probability of 0.99 for no damage. Making an assumption of independence, the probability of all fifty having no damage is 0.99^50, which is about 0.605. (One could also use the binomial distribution. Letting Y be the number out of 50 that experience damage, Y has a binomial (50, 0.01) distribution. The desired probability is P(Y = 0). This leads to the same value given above. (Note: One can use SPSS (as described here) to obtain this probability associated with a binomial distribution.))

(b) The event that one or more experience damage is the complement of the event that none experience damage --- so the desired probability is just 1 - 0.99^50, which is about 0.395. (Using the binomial distribution as indicated above, the desired probability is P(Y >= 1) = 1 - P(Y = 0). This leads to the same value given above. (Note: One can use SPSS (as described here) to obtain this probability associated with a binomial distribution.))



Problem 19

(a)
The desired probability is approximately equal (noting that the distribution of brain weights is only approximately normal) to the probability that a standard normal random variable is less than or equal to (1325 - E(Y))/std.dev.(Y) = (1325 - 1400)/100 = -0.75. (Note that 1325 is 0.75 standard deviations below the mean of Y's distribution, and -0.75 is 0.75 standard deviations below the mean of a standard normal distribution.) So we have that P(Y <= 1325) is approximately equal to Φ( -0.75 ), or about 0.23. (Tables (e.g., see pp. 675-676 of S&W) or SPSS (as described here) can be used to find that Φ(-0.75) = 0.2266. (Alternatively, if one uses SPSS, one can obtain P(Y <= 1325) directly --- one does not have to first put the probability in terms of the cdf of the standard normal distribution.)) (Note: I intentionally rounded to only give two significant digits, since stating the probability with more than two digits seems to express a degree of accuracy that isn't warranted, given that the brain weight distribution is only approximately normal.)

(b) The desired probability is equal to P(Y <= 1600) - P(Y < 1475). Similar to above, we have that P(Y <= 1600) = Φ(2.00) and P(Y < 1475) = Φ(0.75). Tables (e.g., see pp. 675-676 of S&W) or SPSS (as described here) can be used to find that Φ(2.00) = 0.9772 and Φ(0.75) = 0.7734, and so altogether the desired probability is about 0.20. (Alternatively, if one uses SPSS, one can obtain P(Y <= 1600) and P(Y < 1475) directly --- one does not have to first put the probabilities in terms of the cdf of the standard normal distribution.)
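(Besides tables and SPSS, one can evaluate Φ directly with Python's standard library --- the snippet below is just an independent check of parts (a) and (b), modeling the brain weights as normal with mean 1400 and SD 100.)

```python
from statistics import NormalDist  # Python 3.8+

# Brain weights modeled as (approximately) normal, mean 1400, SD 100
brain = NormalDist(mu=1400, sigma=100)

print(round(brain.cdf(1325), 4))                    # (a) 0.2266
print(round(brain.cdf(1600) - brain.cdf(1475), 4))  # (b) 0.2039, about 0.20
```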



Problem 20

The easy way to get the desired value is to find it on the infinity row of Table 4 on p. 677, in the 0.10 column. (That row corresponds to the standard normal distribution.) The desired standard normal critical value is about 1.282.



Problem 21

The easy way to get the desired value is to find it on the infinity row of Table 4 on p. 677, in the 0.01 column. (That row corresponds to the standard normal distribution.) The desired standard normal critical value is about 2.326. (Note: One can use SPSS (as described here) to obtain that the needed z critical value is 2.326.)
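(Besides the infinity row of Table 4 and SPSS, the standard normal critical values in Problems 20 and 21 can be obtained from the inverse cdf in Python's standard library --- again, just an independent check.)

```python
from statistics import NormalDist  # Python 3.8+

std_normal = NormalDist()
# upper-tail area 0.10 -> look up the 0.90 quantile, and so on
print(round(std_normal.inv_cdf(0.90), 3))  # 1.282 (Problem 20)
print(round(std_normal.inv_cdf(0.99), 3))  # 2.326 (Problem 21)
```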



Problem 22

The desired probability is equal to the probability that a standard normal random variable is greater than or equal to (180 - E(Y))/std.dev.(Y) = (180 - 176)/30 = 2/15 = 0.1333. So we have that P(Y >= 180) is approximately equal to Φ(-0.1333), recalling that, to find an upper-tail probability for a standard normal random variable, we can look up the additive inverse (i.e., we change the sign) of the value in the table of the cdf of the standard normal distribution. In this case, we should interpolate 1/3 of the way from the -0.13 value toward the -0.14 value, which gives us about 0.447.



Problem 24

(a) 1.44

(b) 1.41

(c) 1.58

(d) 0.18

(e) 0.7

(f) -0.1

All of the above values, as well as the requested graphics, can be obtained by SPSS, using Analyze > Descriptive Statistics > Explore. (Select the gain variable by highlighting it and clicking the arrow to put it into the Dependent List box. The defaults of Explore don't produce percentile estimates. To obtain them, click Statistics near the bottom of the window that opened with Explore. Click the box in front of Percentiles to put a check in the box. Then click Continue to close the window. Also, the defaults of Explore don't produce the requested graphics (although some other plots are produced). To obtain them, click Plots near the bottom of the window that opened with Explore. Click the boxes in front of Histogram and Normality plots with tests to put checks in the boxes. Then click Continue to close the window. Finally, click OK to close the window that opened with Explore, and the desired output should be produced.)

Some claim that the sample mean should be rounded to the place indicated by the second significant digit of the estimated standard error. For the data at hand, the estimated standard error associated with the sample mean is given to be 0.02932, which would mean rounding the outputted sample mean value to the nearest thousandth. But since the data values have been reported to the nearest hundredth, I don't think it is proper to express so much accuracy in the estimate of the mean. So in this case I chose to use one less decimal place than was indicated by the "rule." I also rounded to the nearest hundredth for parts (c) and (d). (The answer to part (b) did not have to be rounded since the middle order statistic is already a value rounded to the nearest hundredth.)

For the estimate of the 75th percentile, I used the one SPSS produced labeled Weighted Average (Definition 1), as opposed to the one labeled Tukey's Hinges. (From what I can gather, the weighted average estimate is from one of the standard estimators that people use for estimating percentiles. But it's not the best choice for the data under consideration. For this data, I would use another estimator to produce a value of 1.57 --- a value that I would regard as being slightly better. But you can see that it makes little difference, especially when it needs to be kept in mind that both of these estimates are subject to error (due to the fact that all we have to work with is a smallish sample of observations).)

For parts (e) and (f) I rounded to the nearest tenth. The estimates of the skewness and kurtosis have so much uncertainty associated with them that giving more precise values seems silly --- it would be indicating way more accuracy than is warranted. (Plus, even if the estimates were more reliable, for most purposes, it would make no difference whether a skewness or kurtosis value was say 0.47 or 0.53.) (Note: For homework problems, round values as indicated in the assignment. If I do not indicate how to round, then make use of comments like the ones above in order to determine how much, if any, you should round.)

Although Explore can be used to obtain the desired graphics, I'll also indicate that the histogram can be obtained using Graphs > Histogram. (One just has to click the gain variable into the Variable box, and then click OK.) Also, the normality plot can be obtained using Graphs > Q-Q. (One just has to click the gain variable into the Variables box, and then click OK.)



Problem 25

(a) 98.3

(b) 94.5

(c) 67.3

(d) 40.38

(e) 0.82



Problem 26

Upon looking at the 5 probit plots (obtained using the default settings of SPSS's Q-Q plot routine), the two that are most strikingly nonlinear in appearance are the ones from the moisture and glucose data sets. For these data sets, a determination of skewness is called for, with the moisture data set appearing to have come from a negatively skewed distribution, and the glucose data set appearing to have come from a positively skewed distribution. Upon looking at the other three plots, one should seek consistent patterns of curvature in order to identify the heavy-tailed and light-tailed distributions. If one looks at the lamb-wt plot, one should be able to see a gentle S-like curvature. (It helps if you can completely ignore the straight line that has been annoyingly placed on the plot with the plotted points. If you can do that, and can imagine the plotted points as being a country road, you should see that the road first gently curves in one direction, and then gently curves in the other direction.) Similarly, the radish plot shows a consistent curvature ... like a road that gently curves in one direction, and then curves in the other direction. (Continuing with the road analogy, the plots obtained from skewed distributions often show the road just curving one way. However, some plots from skewed distributions could indicate a bit of opposite curvature at the other end of the road (the end that isn't the more clearly curved), but the degree of curvature will generally be appreciably different at the two ends --- the plots will show a lack of "balance.") The lamb-wt and radish plots both have consistent patterns of curvature, and show a good degree of "balance" (thus giving no indication of appreciable skewness).
The S-like curvature of the lamb-wt plot is indicative of heavy tails, while the curvature of the radish plot (where the road appears to continue more in a north and south direction, as opposed to the more east and west appearance of the lamb-wt plot --- although each road still has a southwest and northeast slant to it) is indicative of light tails. For the peppers data, there is not a consistent pattern of deviation from a straight line that is indicative of clear positive or negative skewness, or of heavy or light tails. (If viewed as a road, it changes direction in its slight curvature several times.) So, it's consistent with an approximately normal distribution. It should be noted that, with these plots, the skewness patterns were more pronounced than the heavy-tailed and light-tailed patterns. But for other data sets, the skewness patterns can be milder, and the heavy-tailed pattern can be much more pronounced. (The radish plot gives a fairly strong indication of light tails, but the lamb-wt plot is more subtle --- it could pass for approximate normality if one doesn't look carefully (but then when matching the 5 descriptions to the 5 data sets, the lamb-wt data should be the one selected as being from a heavy-tailed distribution ... none of the others show a heavy-tailed (approximately) symmetric pattern).)

(a) [E]

(b) [B]

(c) [D]

(d) [C]

(e) [A]



Problem 27

One can obtain the desired probabilities using a binomial random variable. Let Y be a binomial (3, 0.39) random variable, representing the number of mutants in a sample of size 3.

(a) The desired probability is P(Y = 0) = 0.61^3, which is about 0.227.

(b) The desired probability is P(Y = 1) = 3C1 (0.39)^1 (0.61)^2 = 3(0.39)(0.61)^2, which is about 0.435.



Problem 28

The sample proportion will equal 0.4 if the sample of size 5 contains 2 mutants and 3 nonmutants. Using the table indicated in the book, the desired probability can be seen to be 0.35. (A direct calculation yields a value of about 0.345.)



Problem 29

Here we can use Theorem 5.1 on p. 159 of S&W. Using part (a) of item 3 of the theorem, the sample mean has a normal distribution as its sampling distribution. Also, we have that the mean of the sampling distribution is 176. The standard deviation associated with the cholesterol level of a randomly chosen member of the population is 30. So the standard error of a sample mean of nine is 30 divided by the square root of 9, or 30/3 = 10. (This is the standard deviation of the sampling distribution of the sample mean.) The desired probability is the probability that a normal random variable assumes a value within 1 standard deviation of its mean, which is about 0.683.
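(The probability at the end can be verified numerically --- the Python sketch below builds the sampling distribution of the sample mean described above and computes the probability of landing within one standard error of the mean. Just an independent check, not part of the course software.)

```python
from math import sqrt
from statistics import NormalDist  # Python 3.8+

mu, sigma, n = 176, 30, 9
se = sigma / sqrt(n)  # standard error of the sample mean: 10.0
sampling_dist = NormalDist(mu=mu, sigma=se)

# probability the sample mean falls within one standard error of the mean
p = sampling_dist.cdf(mu + se) - sampling_dist.cdf(mu - se)
print(round(p, 3))  # 0.683
```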



Problem 30

This is similar to the immediately preceding problem. Here the sample mean is normally distributed with a standard error of 400 divided by the square root of 15, or about 103.28. The desired probability is the probability that the sample mean assumes a value within 100 of its mean, which is the probability that a normal random variable assumes a value within 100/103.28 = 0.96825 standard deviations of its mean, which is about 0.667. (This value can be obtained by looking up the probability corresponding to a z value of 0.96825 (which, interpolating between the values given for 0.96 and 0.97, is about 0.83356), and subtracting from it the probability corresponding to a z value of -0.96825 (which, interpolating between the values given for -0.96 and -0.97, is about 0.16644). It can be noted that the answer given in the back of the book is a bit off (due to too much rounding error in steps prior to reporting the final answer). If you consider part (c) of the problem in the book, the answer is increase --- if the sample size is larger, the probability that the sample mean is within a specified amount of the distribution mean is larger ... if it wasn't, then we'd have a worse estimator even though the amount of information is greater, which isn't sensible.)



Problem 31

This is similar to the immediately preceding problem. Here the sample mean is normally distributed with a standard error of 400 divided by the square root of 60, which gives us a value of 400/sqrt(60) = 51.6398. The desired probability is the probability that the sample mean assumes a value within 100 of its mean, which is the probability that a normal random variable assumes a value within 100/51.6398 = 1.9365 standard deviations of its mean, which is about 0.947.



Problem 32

The answers can be obtained by dividing the standard deviation, 145, by the square root of the sample size. (Note: The book should have requested the standard error of the sample mean --- the mean isn't a random variable, and so it has no standard error ... the sample mean is an estimator which can be used to estimate the distribution/population mean, and since it's a statistic (which is a random variable), we can refer to its standard error. If 145 is the sample standard deviation instead of the true standard deviation, then the request should have been for the estimated standard error of the sample mean.)

(a) 51.3

(b) 26.5



Problem 33

The answer can be obtained by dividing the sample standard deviation, 15, by the square root of the sample size. (Note: The book should have requested the estimated standard error of the sample mean --- the mean isn't a random variable, and so it has no standard error ... the sample mean is an estimator which can be used to estimate the distribution/population mean, and since it's a statistic (which is a random variable), we can refer to its standard error, but since it involves the population standard deviation, which is unknown, we must be content with an estimate of the standard error.) The desired value is 3.



Problem 34

Of the two choices given, the answer is the SD. The standard deviation is a measure of how much variation exists in a population, whereas the standard error pertains to the variability of a statistic, which is typically a function of the sample size. But if I saw the statement "Rats weighing 150 +/- 10g were injected" I would take it to mean that all of the rats injected weighed between 140 g and 160 g. So, in my opinion, the exercise in the book isn't a good one, but I hope that you learned something from it and my comments anyway.



Problem 35

Here we should take SD and SE to refer to the sample standard deviation and the estimated standard error of the sample mean. (Note: We can refer to the standard error of any statistic --- it doesn't just pertain to the sample mean --- and so one should always make sure that the relevant statistic is clearly indicated.)

(a) SE (The standard error of an estimator is a measure of how spread out the probability mass of its sampling distribution is about the mean of its sampling distribution. Since the sample mean is unbiased, the standard error of the sample mean is a measure of how spread out the probability mass of the sampling distribution of the sample mean is about the mean of the parent distribution (the estimand), which is of course related to the probability that the estimator assumes a value close to the estimand --- that is, it's related to the accuracy of the estimator.)

(b) SD (The sample standard deviation is a consistent estimator of the population/distribution standard deviation --- so it converges to some constant value as the sample size increases. For large sample sizes, it's close to the true value with high probability --- it doesn't tend to get larger or smaller as the sample size increases, but rather tends to fluctuate about the true value as observations are added to the sample.)

(c) SE (Since the sample standard deviation converges to some constant value as the sample size increases, the sample standard deviation divided by the square root of the sample size (which is the estimated standard error of the sample mean) tends to decrease as the sample size increases.)



Problem 36

(a) 3.9 (This can be obtained from SPSS's Explore procedure or One-Sample T Test procedure. To use Explore, type in the data and then use Analyze > Descriptive Statistics > Explore. The value of 3.9 can be obtained from the Std. Error column, beside the value of the sample mean. To use One-Sample T Test, follow the instructions given for part (b) below. The desired standard error is given in the One-Sample Statistics part of the output. Alternatively, in this case, one could take the given standard deviation and divide by the square root of 5, but in general if you want two accurate significant digits, you should worry that too much rounding before the final answer (i.e., the sample standard deviation is not exactly 8.7) could lead to the 2nd digit being wrong.)

(b) (23.4, 40.0) (This can be obtained from SPSS's One-Sample T Test procedure. With the data entered, use Analyze > Compare Means > One-Sample T Test. Select the variable (the name of the column containing the data), and then click Options. Change the value in the Confidence Interval box from 95 to 90. Click Continue and then OK. The desired confidence interval is given as part of the One-Sample Test output. Alternatively, in this case, one could use the given sample mean and standard deviation, along with the t critical value from Table 4, and arrive at the correct answer, but in general you should worry that too much rounding before the final answer (e.g., the sample mean is not exactly 31.7 and the sample standard deviation is not exactly 8.7) could lead to inaccuracy in the confidence bounds.)
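(For anyone wanting to replicate the hand calculation outside of SPSS, here is a short Python sketch using the rounded summary statistics (sample mean 31.7, sample standard deviation 8.7, n = 5) and the t critical value 2.132 for 4 df with upper-tail area 0.05. As noted above, SPSS working from the full data is the more accurate route --- here the rounded values happen to give the same answers.)

```python
from math import sqrt

ybar, s, n = 31.7, 8.7, 5  # rounded summary statistics, as given
t_crit = 2.132             # t critical value, 4 df, upper-tail area 0.05

se = s / sqrt(n)
lower, upper = ybar - t_crit * se, ybar + t_crit * se
print(round(se, 1))                      # (a) 3.9
print(round(lower, 1), round(upper, 1))  # (b) 23.4 40.0
```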



Problem 37

(20.9, 42.6) (This can be obtained from SPSS's One-Sample T Test procedure. If one used the given sample mean and standard deviation, along with the t critical value from Table 4, the upper confidence bound rounds to 42.5. This answer suffers from too much rounding error. If you want two accurate significant digits, you should worry that too much rounding before the final answer (e.g., the sample standard deviation is not exactly 8.7) could lead to the last reported digit being wrong.)



Problem 38

(a) False (We are 100% confident that the sample mean is in the stated interval.)

(b) True (The method has a success rate of (about) 95%. (I put about because we can be certain that the sample did not come from exactly a normal distribution, and so the nominal confidence level may not correspond exactly to the coverage probability --- but assuming the distribution is not too nonnormal, since the sample size is 86, the coverage probability ought to be close to 0.95 for this situation.))



Problem 39

False (A 95% confidence interval is supposed to trap the distribution mean with probability 0.95, and not necessarily trap 95% of the data. If the population is approximately normal, an interval centered on the sample mean and including values two sample standard deviations above and below it might contain about 95% of the data values in the sample --- but this interval will be much wider than the stated confidence interval since the confidence interval includes points about two estimated standard errors on either side of the sample mean, and the estimated standard error is much smaller than the sample standard deviation (since it is the sample standard deviation divided by the square root of the sample size).)



Problem 40

The widest interval (the last one) is the 90% confidence interval, and the narrowest interval (the first one) is the 80% confidence interval (and the other one is the 85% confidence interval) --- the greater the confidence level, the wider the confidence interval.



Problem 41

Yes* (Really, the answer is no, if exact normality is meant --- I don't think that the parent distribution for the data is exactly normal. But a probit plot suggests that the parent distribution may be nearly normal, and the confidence interval procedure should work decently. The parent distribution doesn't have to be exactly normal for the method to be okay to use --- if exact normality were to be required, the method would be seldom, if ever, used!)



Problem 42

Table 4 doesn't have the desired t critical value based on 35 df, but one might guess that it is close to 2.03, doing a crude slightly nonlinear interpolation procedure (2.03 is nearly halfway between the critical values based on 30 df and 40 df). So the desired confidence interval is (6.21 +/- 2.03*1.84/6), noting that 6 is the square root of 36 (the sample size). Rounding to give only two significant digits for the confidence bounds, due to the interpolation, the fact that the summary statistics and data values themselves may have been rounded, and because the parent distribution isn't (we can assume) exactly normal, the desired interval is (5.6, 6.8). (Note: One can use SPSS (as described here) to obtain that the needed t critical value is 2.030.)
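(The arithmetic for this interval can be checked with a few lines of Python, plugging in the summary statistics and the interpolated t critical value from above --- just an independent check of the hand calculation.)

```python
from math import sqrt

ybar, s, n = 6.21, 1.84, 36
t_crit = 2.03  # interpolated t critical value for 35 df

half_width = t_crit * s / sqrt(n)
print(round(ybar - half_width, 1), round(ybar + half_width, 1))  # 5.6 6.8
```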



Problem 43

Table 4 gives 1.984 as the desired t critical value based on 100 df. So the desired confidence interval is (10.3 +/- 1.984*0.9/(101)^(1/2)). If we take 0.9 as the exact sample standard deviation, then upon dividing by the square root of 101 to obtain the estimated standard error, it can be seen that the 2nd significant digit is in the thousandths location. But since the summary statistics and data values themselves may have been rounded, and because the parent distribution isn't (we can assume) exactly normal, the confidence bounds for the desired interval shouldn't be reported indicating too much precision, and so rounding each bound to the nearest hundredth or nearest tenth seems like the sensible thing to do. So, the desired interval is (10.12, 10.48) (or (10.1, 10.5)).



Problem 45

The coverage probability would be about 0.68.



Problem 46

(a) (5.0, 5.4)

(b) (4.9, 5.4)

(c) (4.8, 5.5)



Problem 47

For the center of the interval, we use the adjusted sample proportion, (y + z^2/2)/(n + z^2), which gives us 0.20686. For the estimated standard error we have
( 0.20686*(1 - 0.20686)/342.84 )**0.5 = 0.021876,
which suggests that the confidence bounds should be rounded to the nearest thousandth (since the 2nd significant digit is in the thousandth location). Upon multiplying the estimated standard error by the appropriate standard normal critical value, 1.96, we obtain
( 0.20686 +/- 1.96*0.021876 ) = ( 0.20686 +/- 0.04288 ) = (0.1640, 0.2497),
which, upon further rounding, gives us (0.164, 0.250).
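(The same arithmetic can be checked in a few lines of Python, plugging in the adjusted sample proportion and adjusted sample size from above --- an independent check, not part of the course software.)

```python
from math import sqrt

# Adjusted sample proportion and adjusted sample size, as computed above
p_tilde, n_tilde, z = 0.20686, 342.84, 1.96

se = sqrt(p_tilde * (1 - p_tilde) / n_tilde)  # about 0.021876
lower, upper = p_tilde - z * se, p_tilde + z * se
print(f"({lower:.3f}, {upper:.3f})")          # (0.164, 0.250)
```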



Problem 48

The number of "successes" is 151 (because 151 is the only integer to put in the numerator to combine with 959 in the denominator to yield a sample proportion which rounds to 0.157). For the center of the interval, we use the adjusted sample proportion, (y + z^2/2)/(n + z^2) = (151 + 1.645^2/2)/(959 + 1.645^2), which gives us 0.15842. For the estimated standard error we have
( 0.15842*(1 - 0.15842)/961.71)**0.5 = 0.011774,
which suggests that the confidence bounds should be rounded to the nearest thousandth (since the 2nd significant digit is in the thousandth location). Upon multiplying the estimated standard error by the appropriate standard normal critical value, 1.645, we obtain
( 0.15842 +/- 1.645*0.011774 ) = ( 0.15842 +/- 0.01937 ) = (0.13905, 0.17779),
which, upon further rounding, gives us (0.139, 0.178).

Be sure to express any interval estimate as an interval. (Why is this so hard for students to grasp?) That is, the answer is (0.139, 0.178), and not 0.158 +/- 0.019, and certainly not 0.139 < p < 0.178 (since we don't know that the estimand is actually between the lower and upper confidence bounds). Also, while there is seldom a good reason to express a p-value using more than 2 significant digits, we don't necessarily want to round point estimates and confidence bounds to two significant digits. (In some cases, when rounded to 2 significant digits, the upper bound will be the same as the lower bound, and that doesn't make a good interval. Also, if we round so much, sometimes there is little point in using a superior method instead of an inferior one.) S&W's suggestion of using the place of the 2nd significant digit of the (estimated) standard error to determine to what place point estimates and confidence bounds should be rounded is generally a good method. I sometimes deviate from this method if I believe that the raw data has been rounded so much that fewer significant digits should be used to express the estimate obtained from the data.



Problem 49

We have
( 4.3*4.3/6 + 5.7*5.7/12 )**0.5 = ( 3.0817 + 2.7075 )**0.5 = ( 5.7892 )**0.5 = 2.406,
which, upon further rounding, gives us 2.4.



Problem 50

We have
( 44.2*44.2/10 + 28.7*28.7/10 )**0.5 = ( 195.36 + 82.37 )**0.5 = ( 277.73 )**0.5 = 16.665,
which, upon further rounding, gives us 17.
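The estimated standard errors in Problems 49 and 50 both come from the same formula, the square root of s1²/n1 + s2²/n2; a minimal sketch (the helper name se_diff is mine):

```python
import math

def se_diff(s1, n1, s2, n2):
    # Estimated standard error of the difference in two sample means.
    return math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)

se_49 = se_diff(4.3, 6, 5.7, 12)     # Problem 49
se_50 = se_diff(44.2, 10, 28.7, 10)  # Problem 50
```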



Problem 51

I used SPSS to obtain the confidence interval and p-value, and also used it to produce normality plots and distribution skewness estimates in order to address part (c).

(a) (-7, 10).

(b) 0.78

(c) The results are reasonably trustworthy. The normality plots indicate that both distributions may be positively skewed, but the estimated skewness of 0.8 (0.7) for the caffeine distribution and the estimated skewness of 0.4 (0.7) for the decaf distribution aren't too different. Due to a central limit theorem effect for the sample means (i.e., their sampling distributions are quite a bit less skewed than are the parent distributions of the individual data values) and also a cancellation of skewness effect (due to using the difference in the sample means), the robustness properties of Welch's test ought to make it perform decently in the given situation. The biggest worry is that with such small sample sizes, we don't have a good idea about what the underlying distributions are like.



Problem 52

I used SPSS to obtain the confidence interval and p-value, and also used it to produce normality plots and distribution skewness estimates in order to address part (c). (See my comments here about Example 7.7 to see how to use SPSS to obtain the desired confidence interval. It can be noted that the output produced when obtaining the confidence interval also contains the desired p-value for part (b).)

(a) The 95% confidence interval for the mean of the red light distribution minus the mean of the green light distribution is (-1.6, 0.4). So while the point estimate suggests that the mean of the green light distribution may be the larger one, the fact that the interval includes 0 and also some positive values means that we can't be highly confident that the mean of the green light distribution is actually larger than the mean of the red light distribution. (It can be noted that the SPSS output gives intervals obtained from both Student's t procedure and Welch's method. It also gives the result of a test for nonequality of variances, but I don't think that the test result needs to be considered here --- I don't think there is any reason to give the benefit of the doubt to the null hypothesis of homoscedasticity (equal variances), and that's what the test does ... rather, the safe thing to do is to allow for heteroscedasticity (unequal variances), knowing that if the variances are really equal, incorrectly using Welch's method should have little adverse effect. (Similarly, in part (b), Welch's test is used, instead of Student's two-sample t test.))

(b) 0.26

(c) The results are reasonably trustworthy. The normality plots indicate that both distributions are negatively skewed, but the estimated skewness of -1.4 (0.6) for the red light distribution and the estimated skewness of -1.1 (0.5) for the green light distribution aren't too different. Due to a central limit theorem effect for the sample means (i.e., their sampling distributions are quite a bit less skewed than are the parent distributions of the individual data values) and also a cancellation of skewness effect (due to using the difference in the sample means), the robustness properties of Welch's test ought to make it perform decently in the given situation.



Problem 53

Using the given information, the value of Welch's statistic is
(3840 - 5310)/1064 = -1470/1064 = -1.3816.
Since this value is not less than -2.145 or greater than 2.145 (where 2.145 is the 0.025 critical value for the T distribution with 14 df), one cannot reject the null hypothesis of equal means with a size 0.05 test. (Note: Sometimes you may only be given the summary statistics, and not the complete data. So it's important to know how to compute the value of the test statistic from the summary information, and determine whether or not you can reject at a given level, since one might not be able to load the data into a statistical software package and compute a p-value. However, using the table on p. 677 of S&W one can further determine that the p-value satisfies
0.1 < p-value < 0.2,
and using software it can be determined that the p-value is about 0.19. (Note: One can use SPSS (as described here, only using CDF.T instead of CDFNORM) to obtain the probability that a T14 random variable will assume a value less than or equal to -1.3816. This probability needs to be doubled to obtain the p-value for a two-tailed test.) Making a statement about the p-value can provide more information than merely stating whether or not one can reject at a particular level. It should be kept in mind that without having the raw data to work with, but instead only having the summary measures, we cannot assess the approximate normality of the distributions, or determine if they are nonnormal in such a way as to make Welch's test unreliable. This is particularly bothersome when the sample sizes are small, and we can't rely on robustness due to large samples.)
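The computation above is just arithmetic on the quoted summary values; as a sketch (an exact p-value would additionally require the CDF of the T distribution with 14 df, e.g. via statistical software as noted above):

```python
# Welch's statistic from the summary values given above.
t_stat = (3840 - 5310) / 1064

# Compare |t| to the 0.025 critical value for 14 df quoted above.
reject_at_05 = abs(t_stat) > 2.145
```

Since reject_at_05 comes out False, one cannot reject at level 0.05, in agreement with the discussion above.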



Problem 54

Using the given information, the value of the estimated standard error is
( 2.87*2.87/60 + 3.52*3.52/50 )**0.5 = ( 0.38509 )**0.5 = 0.62056,
and the value of Welch's statistic is
(78.42 - 80.44)/0.62056 = -2.02/0.62056 = -3.26.
To do a size 0.05 test, we need to compare the magnitude of the test statistic to t94, 0.025. This value is in between t80, 0.025 = 1.990 and t100, 0.025 = 1.984, and so we should reject the null hypothesis of equal means in favor of the alternative hypothesis of unequal means (since the absolute value of the test statistic exceeds the 0.025 critical value).
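As a sketch, the statistic and an approximate df can be computed from the summary values (the helper name welch is mine; the df line is the Welch-Satterthwaite formula, which gives roughly 94 here, consistent with the comparison to t94, 0.025 above):

```python
import math

def welch(mean1, s1, n1, mean2, s2, n2):
    # Welch's statistic and the Welch-Satterthwaite degrees of freedom.
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

t, df = welch(78.42, 2.87, 60, 80.44, 3.52, 50)
```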



Problem 55

Noting that the sample mean of the males is less than the sample mean of the females, it can be concluded that we have
p-value > 0.5.
We can actually be more specific about the p-value by noting that we're doing an upper-tailed test with a test statistic value that is -3.26. Since the absolute value of the test statistic, 3.26, is in between the 0.005 and 0.0005 critical values for both 80 df and 100 df, it is also in between those critical values for 94 df. So the lower-tail probability is between 0.0005 and 0.005, which gives us
0.995 < p-value < 0.9995
(since the p-value for an upper-tail test is the probability mass associated with values greater than the observed value of the test statistic).



Problem 57

No, since the distributions appear to overlap quite a bit, a given tibia length could be consistent with both male tibia lengths and female tibia lengths. (That the distributions overlap quite a bit can be guessed from an examination of the sample means and sample standard deviations. If one assumes approximate normality for the length distributions, then it can be guessed that the bulk of the male lengths range from about 73 to about 84, and that the bulk of the female lengths range from about 73 to about 87. Note that a test about the means doesn't do much to help us answer the question posed for this part --- whether or not there is statistically significant evidence that the distribution means differ, there can be appreciable overlap of the ranges of values commonly observed from each distribution, which makes it hard to confidently guess which distribution an individual observation is associated with.)



Problem 58

(a) The statement is true. Recalling that the p-value is the smallest level for which one can reject based on the observed data, since the p-value is 0.03, we can surely reject at level 0.03. Since a level 0.05 test has a larger rejection region than a level 0.03 test, one can also reject at level 0.05 since one can reject at level 0.03. (That is, if the observed value of the test statistic is in the rejection region for a size 0.03 test, it must be in the rejection region for a size 0.05 test.) In general, it can be noted that one can reject at level alpha whenever
p-value <= alpha.
(b) The statement is true, for the reason highlighted above in part (a).

(c) The statement is true, since the p-value for the test equals the null probability of obtaining a test statistic value at least as extreme as the observed value of the test statistic.



Problem 59

(a) The statement is true. Recalling that the p-value is the smallest level for which one can reject based on the observed data, since the p-value is 0.07, we can surely reject at level 0.07. Since a level 0.1 test has a larger rejection region than a level 0.07 test, one can also reject at level 0.1 since one can reject at level 0.07. (That is, if the observed value of the test statistic is in the rejection region for a size 0.07 test, it must be in the rejection region for a size 0.1 test.) In general, it can be noted that one can reject at level alpha if
p-value <= alpha,
and one cannot reject at level alpha if
p-value > alpha.
(b) The statement is false, for the reason highlighted above in part (a).

(c) The statement is false, since the observed value of the sample mean for the first sample is either greater than the observed value of the sample mean for the second sample or it's not --- and so the probability that it's greater is either 1 or 0. (If we take the sample means to be random variables, then there isn't enough information given to determine the probability that the one sample mean is greater than the other one. But there is no reason why this probability should equal the p-value.)
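The rejection rule used in Problems 58 and 59 can be expressed as a one-line check (a sketch; the function name is mine):

```python
def can_reject(p_value, alpha):
    # One can reject at level alpha exactly when p-value <= alpha.
    return p_value <= alpha
```

For instance, can_reject(0.03, 0.05) is True (Problem 58), while can_reject(0.07, 0.05) is False and can_reject(0.07, 0.1) is True (Problem 59).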



Problem 60

SPSS yields 0.049 as the p-value for a two-tailed Welch's test. Since the sample means are in the order consistent with the alternative hypothesis, the appropriate p-value for the indicated one-tailed test is half of the two-tailed test p-value. So the p-value is about 0.025. (When we divide 0.049 in half we get 0.0245, but whether we should report 0.024 or 0.025 as the one-tailed test p-value depends on whether the two-tailed test p-value is actually greater than or less than 0.049. It's impossible to easily determine this from the SPSS output. But with another software package, I got 0.025 as the one-tailed test p-value.)

It is appropriate to say something about the validity of Welch's test, since an examination of the data indicates that both distributions are skewed. Because the degree of skewness appears to be similar for the two distributions, and the sample sizes are equal and not really small, Welch's test ought to behave decently (due to a cancellation of a lot of the effect due to skewness on the sampling distributions of the sample means). One could also consider using the Mann-Whitney test. It produces a p-value of about 0.026. If we are willing to believe that if the two distributions differ, then one is stochastically larger than the other, we can use the M-W test to do a test about the distribution means. The data suggest that it's reasonable to make the sufficient assumption (but unfortunately, with SPSS, there doesn't seem to be a good way to do the graphical check that I like to do (and so I used other software to help me reach this conclusion)), and so we could consider reporting the p-value from the M-W test, since it's nearly as small as the one obtained from Welch's test, and we don't have the nonnormality to worry about. (It's nice when the p-values from two reasonable tests agree. This need not be the case, since in some settings one test may yield a smaller p-value due to higher power, but sometimes you do get a nice confirmation about the strength of the evidence.)



Problem 61

(a) 0.13 (Since the sample sizes are larger than 10, the tables of the exact null distribution of U cannot be used. However, we can make use of SPSS to obtain an approximate p-value, using Analyze > Nonparametric Tests > 2 Independent Samples. One should click height into the Test Variable List box. Unfortunately, color cannot be used as the Grouping Variable since red and green aren't accepted when defining the groups --- unlike what worked for Welch's test, this part of SPSS requires that the grouping variable be coded with numerical values. So, to make it work, I created a new column, entering 17 values of 1 followed by 25 values of 2. If I name this new variable gr, I click gr into the Grouping Variable box, and then click Define Groups and enter 1 and 2 as the group labels. Finally, I can click Continue, followed by OK, to obtain the approximate p-value of 0.127 (which I round to 0.13 since it results from using an approximation). (Note: On the outputted display, the p-value is labeled with Asymp. Sig. (2-tailed).))

(b) 0.28 (This can be obtained from SPSS using Analyze > Compare Means > Independent-Samples T Test. One should click height into the Test Variable(s) box. Then click color into the Grouping Variable box, and next click Define Groups and enter red and green as the group labels. Finally, I can click Continue, followed by OK, to obtain the approximate p-value of 0.275 (which I round to 0.28 since it results from using a test designed for normal distributions on nonnormal distributions (and so we must rely on the robustness of the procedure to give us an approximate p-value)). (Note: On the outputted display, the p-value is found in the Equal variances assumed row of the Sig. (2-tailed) column. (The assumption of equal variances is proper, since if the null hypothesis of no effect is true, the two distributions underlying the data are identical (and thus have equal variances).))) (The nonnormality and unequal sample sizes hurt the robustness of Student's t test, creating some concern about its (approximate) validity. But since the clearly valid Mann-Whitney test produces a smaller p-value, and it's not even highly significant, there isn't a lot of reason to wonder about the validity of the t test. (Often skewness hurts the power of the t test to detect differences, and it is often the case that with skewed distributions the M-W test is more powerful.))



Problem 62

(a) 0.69 (For this data, we can make use of SPSS to obtain an exact p-value, using Analyze > Nonparametric Tests > 2 Independent Samples. One should click weight into the Test Variable List box, and then click group into the Grouping Variable box. Next, click Define Groups and enter 14 and 15 as the group labels. Finally, I can click Continue, followed by OK, to obtain the exact p-value of 0.690 (which I round to 0.69 since a high degree of stated accuracy isn't important when the p-value is such a large, statistically nonsignificant, value).) (Since SPSS uses the exact distribution (in this case (as can be seen from the output), but not always) to obtain the p-value, there is no need to use the table. But to use the table, one could first compare each of the 5 "14 days" observations to each of the 5 "15 days" observations, and note that the 14 days observation is larger in 15 of the 25 comparisons. This gives us that the observed value of the M-W test statistic, u, is equal to 15. Since u is larger than the mean of the test statistic if the null hypothesis is true, the p-value for a two-tailed test is equal to 2P0(U ≥ u) = 2P0(U ≥ 15) = 2P0(U ≤ 25 - 15) = 2P0(U ≤ 10) = 2(0.3452) = 0.6904. (For some further explanation, the mean of the null hypothesis sampling distribution of U is just the product of the two sample sizes divided by 2, which is equal to (5)(5)/2 = 12.5. Since the tabulated values are lower-tail probabilities, to determine an upper-tail probability we can convert it to an equal lower-tail probability using P0(U ≥ u) = P0(U ≤ nXnY - u). In using the table, one sets k1 equal to the minimum of the sample sizes and sets k2 equal to the maximum of the sample sizes. In this case, both of these values are equal to 5, and so we go to the k1 = 5 section of the table, and use the k2 = 5 column. Then going down the column one should see that the value corresponding to a = 10 is 0.3452.
(The tabulated values are the null probabilities that the test statistic takes a value less than or equal to a. In our case, we want the probability of a value less than or equal to 10, and so we need to use 10 as the value of a.)) One could reverse the roles of the two groups and let u be the number of times, out of 25 comparisons, a 15 days value is larger. This would yield u = 10, and since 10 is less than the null hypothesis mean, the p-value is 2P0(U ≤ 10). Either way one does it, the same p-value results.)
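For small samples, the tabulated lower-tail probabilities used above can be checked by brute-force enumeration of the C(m+n, m) equally likely rank patterns under the null hypothesis (a sketch; exact_lower_tail is my name for the helper, and it assumes continuous distributions, so no ties):

```python
from itertools import combinations
from math import comb

def exact_lower_tail(m, n, a):
    """P0(U <= a) for the Mann-Whitney statistic with sample sizes m and n,
    by enumerating every equally likely placement of the m "x" ranks."""
    count = 0
    for positions in combinations(range(m + n), m):
        # With sorted x-positions s_0 < ... < s_{m-1} among the m + n ranks,
        # U (the number of (x, y) pairs with x above y) is sum(s_k - k).
        u = sum(positions) - m * (m - 1) // 2
        if u <= a:
            count += 1
    return count / comb(m + n, m)
```

For instance, exact_lower_tail(5, 5, 10) gives about 0.3452, matching the tabulated value used above, and exact_lower_tail(6, 6, 4) gives about 0.0130, the value used in Problem 66.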

(b) 0.63 (This can be obtained from SPSS using Analyze > Compare Means > Independent-Samples T Test.) (Note: One may wonder whether Student's two-sample t test is valid here. If we do a test of the general two-sample problem (of the null hypothesis of identical distributions against the general alternative that the distributions differ), then if the null hypothesis is true, the variances are equal (since the distributions are the same). This supports the choice of Student's t test over Welch's test. But what about possible nonnormality? Well, since the sample sizes are equal, if the null hypothesis is true the two sample means have the same sampling distribution, and so a complete cancellation of any skewness effect will occur (because one sample mean is subtracted from the other, which means that the sampling distribution of the difference in sample means will be symmetric (if the null hypothesis is true)). Other types of nonnormality could cause problems with validity, but some types of nonnormality, like heavy tails, will lead to a conservative test instead of one that isn't valid. Still, having only 5 observations per sample makes it very hard to tell much about the nature of the nonnormality, and with such small sample sizes it's also the case that the "Central Limit Theorem effect" may not be too good (i.e., the sampling distributions of the sample means may not be as normal as is needed to make the T distribution a good approximation of the actual null sampling distribution of the test statistic). All in all, it is somewhat risky to rely on the robustness of Student's t test in this small sample setting, even though the facts that the general two-sample problem is being addressed and that there are equal sample sizes lead to some robustness (like a cancellation effect for skewness). So it is nice that a nonparametric test which is clearly valid gives a p-value almost as small as the one which is suspect (although with such large p-values there is not much to get excited about anyway).)



Problem 66

(a) 0.026. (Since SPSS uses the exact distribution (in this case (as can be seen from the output), but not always) to obtain the p-value, there is no need to use the table. But to use the table, one could first compare each of the 6 Toluene observations to each of the 6 Control observations, and note that the Toluene observation is larger in 32 of the 36 comparisons. This gives us that the observed value of the M-W test statistic, u, is equal to 32. Since u is larger than the mean of the test statistic if the null hypothesis is true, the p-value for a two-tailed test is equal to 2P0(U ≥ u) = 2P0(U ≥ 32) = 2P0(U ≤ 36 - 32) = 2P0(U ≤ 4) = 2(0.0130) = 0.0260. (For some further explanation, the mean of the null hypothesis sampling distribution of U is just the product of the two sample sizes divided by 2, which is equal to (6)(6)/2 = 18. Since the tabulated values are lower-tail probabilities, to determine an upper-tail probability we can convert it to an equal lower-tail probability using P0(U ≥ u) = P0(U ≤ nXnY - u). In using the table, one sets k1 equal to the minimum of the sample sizes and sets k2 equal to the maximum of the sample sizes. In this case, both of these values are equal to 6, and so we go to the k1 = 6 section of the table, and use the k2 = 6 column. Then going down the column one should see that the value corresponding to a = 4 is 0.0130. (The tabulated values are the null probabilities that the test statistic takes a value less than or equal to a. In our case, we want the probability of a value less than or equal to 4, and so we need to use 4 as the value of a.)) One could reverse the roles of the two groups and let u be the number of times, out of 36 comparisons, a Control value is larger. This would yield u = 4, and since 4 is less than the null hypothesis mean, the p-value is 2P0(U ≤ 4). Either way one does it, the same p-value results.)

(b) 0.020 (This can be obtained from SPSS using Analyze > Compare Means > Independent-Samples T Test.) (Note: One may wonder whether Student's two-sample t test is valid here. If we do a test of the general two-sample problem (of the null hypothesis of identical distributions against the general alternative that the distributions differ), then if the null hypothesis is true, the variances are equal (since the distributions are the same). This supports the choice of Student's t test over Welch's test. But what about possible nonnormality? Well, since the sample sizes are equal, if the null hypothesis is true the two sample means have the same sampling distribution, and so a complete cancellation of any skewness effect will occur (because one sample mean is subtracted from the other, which means that the sampling distribution of the difference in sample means will be symmetric (if the null hypothesis is true)). Other types of nonnormality could cause problems with validity, but some types of nonnormality, like heavy tails, will lead to a conservative test instead of one that isn't valid. Still, having only 6 observations per sample makes it very hard to tell much about the nature of the nonnormality, and with such small sample sizes it's also the case that the "Central Limit Theorem effect" may not be too good (i.e., the sampling distributions of the sample means may not be as normal as is needed to make the T distribution a good approximation of the actual null sampling distribution of the test statistic). All in all, it is somewhat risky to rely on the robustness of Student's t test in this small sample setting, even though the facts that the general two-sample problem is being addressed and that there are equal sample sizes lead to some robustness (like a cancellation effect for skewness). So it is nice that a nonparametric test which is clearly valid gives a p-value almost as small as the one which is suspect.)



Problem 67

One should not conclude that living in Arizona exacerbates respiratory problems, since the information provided does not give us strong evidence that people in Arizona would have less respiratory trouble if they lived elsewhere. It could be that a greater than average proportion of people who have respiratory problems just happen to live in Arizona (and it may be that some such people moved to Arizona because they believe that the climate there will be better for them and their respiratory problems --- that is, Arizona may attract people with existing respiratory problems as opposed to causing respiratory problems).



Problem 68

(a) daily coffee intake

(b) heart disease indicator (a binary variable indicating whether or not a subject has coronary heart disease)

(c) the 1040 subjects



Problem 69

Because of what they may have heard or read, some people who think they are consuming olestra might expect to have a problem, giving us the possibility of a nocebo effect. To "balance out" a possible nocebo effect, subjects should not be told what type of potato chips they are given. If subjects self-report whether or not they experienced problems, then a double-blind experiment is not necessary, but if they are interviewed by people helping to conduct the experiment, and these helpers make a judgement or help the subjects to reach a judgement about whether or not problems occurred, then double-blinding is desirable.



Problem 70

The alternative design is not good, because there is no good way to determine if there are differences between the 5 treatments (one actually a control), since any observed differences could be totally due to differences between the litters (and variation among the piglets in the litters). That is, due to confounding, if two groups were appreciably different, one would not know if this was due to a difference between treatments or a difference between litters. (With an observational study, one cannot avoid confounding, and this makes the interpretation of the results less crisp than what one might get from a well-designed experiment. But when designing an experiment one needs to avoid problems with confounding --- in order to attribute observed differences to, at least in part, treatment differences, one needs to have large enough differences between observations for which all of the factors are held constant except for the treatments (one needs the observed differences to be beyond what can be anticipated due to experimental noise (due to variation among experimental units), which can be assessed using statistical methods). A common way of doing this is with blocks, since within each block key factors are constant, or nearly constant, and any small differences can be absorbed into the experimental noise by the use of proper randomization (and may in fact be the major source of noise).)



Problem 71

Plan III is the best plan, Plan I is the second best plan, and Plan II is the worst plan. II is worst because of a problem with confounding --- there is no way to determine if observed differences are due to treatment effects, or due to differences in the lighting conditions. I allows for a fair assessment of treatment differences, since randomization can account for (but not control/reduce) observed differences due to different lighting conditions. But III will provide increased power to detect treatment effects, because some of the experimental noise due to the random assignment of animals to the different lighting conditions will be cancelled out by having equal numbers of each treatment group at each level. (Another way to think of it is that by considering differences between treatments at each level, the lighting is held constant (or nearly so), and observed differences between groups at a given level can be attributed to treatment differences (and differences between animals that must be accounted for --- i.e., the observed differences between treatments will have to be detected above the magnitude of differences that are easily explained by uncontrolled experimental noise).)



Problem 72

Plan II is the better plan. Confounding is a problem with Plan I --- if rain allowed the last variety to remain in the field growing for an appreciably longer period of time than the times for the other varieties, and the last variety produced the greatest yield, one wouldn't know if the large yield was due to the rain or due to that variety of corn being superior. However, if rain interrupted the harvesting when using II, the different growing times might lead to some appreciable differences between the total yields for the blocks, but when comparing the varieties of corn, such differences in yield due to growing time differences would cancel out, and one could still fairly assess, and make specific statements about, differences between varieties of corn.



Problem 73

False --- The primary reason for using a randomized block design is to reduce the variability due to extraneous factors. By reducing the experimental noise, observed differences can be taken more seriously as evidence of differences between treatments. If blocking is not done, differences in experimental units due to extraneous factors can cloud the results of the experiment, since more allowance has to be made for the observed differences being due to factors other than the differences between treatments. A completely randomized design, without blocking, can reduce bias, but by combining blocking with randomization, one can reduce variability in addition to bias.



Problem 74

Because 3 measurements were made using each of 2 different flasks, it's not good to assume that they are 6 iid observations from the same distribution --- the nested design leads to a lack of independence, and the estimated standard error should be obtained by first computing 2 sample means (1 from each set of 3 observations), and then computing the sample standard deviation using those two values. (See p. 336 for a similar situation.)



Problem 75

To obtain the best set of blocks, one just needs to go down the columns, letting the first two subjects be one block, the next two subjects be another block, and so on. The assignment of the subjects in the blocks to the treatments can be done randomly, using one coin flip for each block. (It can be noted that this creation of blocks puts the two 56 year old males in different blocks, but if they are put in the same block, then another block will have two males rather different in age.)



Problem 76

(a) 0.001 (With SPSS, one can use Analyze > Compare Means > One-Sample T Test. One should click protprod into the Test Variable box, change the Test Value to 450, and click OK. SPSS does a two-tailed test, but since the test statistic value is 3.525, which is positive, it can be concluded that the two-tailed test p-value of 0.002 corresponds to doubling an upper-tail area of 0.001. This upper-tail area is the p-value for an upper-tailed test. (It's too bad that SPSS supplies so few digits for the reported p-values, since in some cases we may want more. In this case one can conclude that the two-tailed test p-value, if reported with more digits, is in the interval (0.0015, 0.0025), since it rounds to 0.002. It follows that the one-tail probability belongs to (0.00075, 0.00125). So if we wanted to express a bit more precision, we cannot easily tell from the SPSS output whether we should put 0.0008, 0.0009, 0.0010, 0.0011, or 0.0012. In this case, such extra indicated accuracy is perhaps not warranted, and so there is nothing wrong with stating that the p-value is about/approximately 0.001. If one did want to indicate more precision, CDF.T could be used to determine that the upper-tail area is about 0.00077, and so one could state that the p-value is about 0.0008.))

(b) 0.98 (With SPSS, one can use Analyze > Compare Means > One-Sample T Test. One should click protprod into the Test Variable box, change the Test Value to 500, and click OK. SPSS does a two-tailed test, but since the test statistic value is -2.213, which is negative, it can be concluded that the two-tailed test p-value of 0.036 corresponds to doubling a lower-tail area of 0.018. The corresponding upper-tail area, which is the p-value for an upper-tailed test, is 1 - 0.018 = 0.982, which can be rounded to 0.98 (since with a p-value so large more precision isn't really needed, and since nonnormality makes the test only approximate anyway).)

(c) The p-values should be regarded as being fairly trustworthy. With SPSS, using Analyze > Descriptive Statistics > Explore, one can see that the sample skewness is about -0.1, that the sample kurtosis is about -0.9, and that the probit plot suggests an approximately normal distribution. It seems that any skewness is of a negligible nature, and if the tails are indeed light, it should be remembered that light tails don't cause a problem unless the sample size is about 10 or less.



Problem 77

(a) 0.091 (With SPSS, one can use Analyze > Compare Means > One-Sample T Test, changing the Test Value to 90. SPSS does a two-tailed test, but since the test statistic value is -1.367, which is negative, it can be concluded that the two-tailed test p-value of 0.182 corresponds to doubling a lower-tail area of 0.091. This lower-tail area is the p-value for a lower-tailed test.)

(b) 0.001 (With SPSS, one can use Analyze > Compare Means > One-Sample T Test, changing the Test Value to 95. SPSS does a two-tailed test, but since the test statistic value is -3.487, which is negative, it can be concluded that the two-tailed test p-value of 0.002 corresponds to doubling a lower-tail area of 0.001. This lower-tail area is the p-value for a lower-tailed test. (It's too bad that SPSS supplies so few digits for the reported p-values, since in some cases we may want more. In this case one can conclude that the two-tailed test p-value, if reported with more digits, is in the interval (0.0015, 0.0025), since it rounds to 0.002. It follows that the one-tailed probability belongs to (0.00075, 0.00125). So if we wanted to express a bit more precision, we cannot easily tell from the SPSS output whether we should put 0.0008, 0.0009, 0.0010, 0.0011, or 0.0012. In this case, such extra indicated accuracy is perhaps not warranted, and so there is nothing wrong with stating that the p-value is about/approximately 0.001. If one did want to indicate more precision, CDF.T could be used to determine that the lower-tail area is about 0.00076, and so one could state that the p-value is about 0.0008.))

(c) The p-values should not be regarded as being trustworthy. With SPSS, using Analyze > Descriptive Statistics > Explore, one can see that the sample skewness is about 1.6, that the sample kurtosis is about 3.4, and that the probit plot suggests an appreciably skewed distribution. With such a small sample size, this degree of skewness creates a problem. (Doing a lower-tailed test with data from a positively skewed distribution results in there being a danger that type I errors can occur with a fairly large probability if the null hypothesis is true --- p-values can be misleadingly small.)



Problem 78

(a) 0.093 (With SPSS, one can use Analyze > Compare Means > One-Sample T Test, changing the Test Value to 5. SPSS does a two-tailed test, but since the test statistic value is positive, it can be concluded that the two-tailed test p-value of 0.186 corresponds to doubling an upper-tail area of 0.093. This upper-tail area is the p-value for an upper-tailed test.)

(b) 0.91 (In part (a), it was determined that the null probability that the test statistic, based on a μ0 value of 5, exceeds the observed value is about 0.093. For the lower-tailed test of this part, the p-value is the null probability that the test statistic, based on a μ0 value of 5, is less than its observed value. Since the value of the test statistic is the same for parts (a) and (b) (since the data is the same, and μ0 is the same), the desired lower-tail probability (which is the p-value) is about 1 - 0.093, or about 0.91.)

(c) Given that it appears that we are dealing with a slightly heavy-tailed distribution, but one that is not appreciably skewed, the t test performs conservatively, and the p-values should not be regarded as being misleadingly small.



Problem 79

(a) 0.28 (With SPSS, one can use Analyze > Compare Means > One-Sample T Test, putting a variable corresponding to the matched-pairs differences into the Test Variable box. Alternatively, one can use Analyze > Compare Means > Paired-Samples T Test, clicking two columns of observations into the Paired Variables box. (Either way, the resulting p-value for a two-tailed test is about 0.28.))

(b) Since we may be dealing with a light-tailed distribution (with a sample size of only 9 there is considerable uncertainty, but indications are that we have data from a light-tailed distribution), and the sample size is rather small, the p-value should not be regarded as being trustworthy, because in such situations the t test performs anticonservatively, which compromises the validity of the test. (While this means that the p-value may be misleadingly small, in this particular situation there really isn't a problem since the p-value isn't small, and so one should not claim that there is statistically significant evidence that the mean differs from 0, and so a type I error will not be made.)



Problem 80

(a) 0.044 (With SPSS, one can use Analyze > Compare Means > One-Sample T Test, putting a variable corresponding to the matched-pairs differences into the Test Variable box. Alternatively, one can use Analyze > Compare Means > Paired-Samples T Test, clicking two columns of observations into the Paired Variables box. (Either way, the resulting p-value for a two-tailed test is 0.044.))

(b) The p-value should not be regarded as being trustworthy, mainly because with a sample size of 4 we have no good way to learn much about the nature of the underlying distribution, and if there is appreciable nonnormality, with a sample of size 4 we don't have any of the large sample robustness effects to help counter the nonnormality.



Problem 81

There are infinitely many possibilities. Basically, you need to have all of the di be the same value in order to have the estimated standard error of the mean difference equal 0 (because the only way that a sample standard deviation can equal 0 is to have all of the values in the sample be the same value). For the two sample means (of the xi and the yi) to be different, the common value for the di cannot be 0. In order for the estimated standard errors of the sample means of the xi and the yi to be nonzero, one cannot have all of the xi be the same value, nor all of the yi be the same value. Below is one set of values that works.

i xi yi di
1 3 5 2
2 6 8 2
3 5 7 2
4 7 9 2
5 4 6 2
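(One can quickly verify that a proposed set of values meets all three requirements. Here is a small Python check of the table above:)

```python
import statistics

# The table above: all differences equal 2, while neither the x's
# nor the y's are constant.
x = [3, 6, 5, 7, 4]
y = [5, 8, 7, 9, 6]
d = [yi - xi for xi, yi in zip(x, y)]

assert statistics.stdev(d) == 0                             # SE of the mean difference is 0
assert statistics.mean(y) != statistics.mean(x)             # the two sample means differ
assert statistics.stdev(x) > 0 and statistics.stdev(y) > 0  # nonzero SEs for each sample
```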



Problem 82

(a) 0.34 (I obtained this value from the SPSS output created to do part (b). One could take the sample standard deviation value of 0.68 from p. 355 and divide by the square root of 4 to obtain the same answer.)

(b) 0.016

(c) One should worry that the p-value may be smaller than it should be, due to nonnormality and the very small sample size. When using the t test to test for a treatment effect, the only worry is that light tails can lead to an anticonservative test which produces p-values that are misleadingly small. Typically, if there are about a dozen or more cases, it can be assumed that the anticonservativeness is negligible. But with only 4 cases, unless the differences were close to being normally distributed, one might worry that light tails (which cannot be detected well with only 4 observations --- there isn't a good way to check for light tails with so few observations) can easily lead to a small p-value even though the null hypothesis is true.

(d) 0.23

(e) 0.125 (Note: The p-value of 0.068 which results from SPSS (using Analyze > Nonparametric Tests > 2 Related Samples) should not be trusted, since the normal approximation (particularly the version SPSS uses) is not guaranteed to work decently when the sample size is so small. The correct value of 0.125 can be obtained from the table that I supplied in class. It should be noted that 0.125 is the smallest p-value possible for a two-tailed signed-rank test when there are only 4 observations. So it should be clear that one needs more than 4 cases in such a setting where a test for a treatment effect is desired, since nonnormality can throw a t test off when the sample size is so small, and nonparametric tests (0.125 is also the smallest possible p-value for a two-tailed sign test) cannot possibly produce a small p-value.)
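(The claim that 0.125 is the smallest possible two-tailed signed-rank p-value with n = 4 can be confirmed by brute force, since there are only 2^4 = 16 equally likely sign patterns under the null hypothesis. A short Python sketch:)

```python
from itertools import product

# Null distribution of the signed-rank statistic W+ for n = 4:
# each of the 2^4 = 16 patterns of signs is equally likely under H0.
ranks = [1, 2, 3, 4]
sums = [sum(r for r, s in zip(ranks, signs) if s)
        for signs in product([False, True], repeat=4)]

# The most extreme two-tailed outcomes are W+ = 0 and W+ = 10 (the max),
# each achieved by exactly one sign pattern.
n_extreme = sum(1 for w in sums if w == 0 or w == 10)
p_min = n_extreme / len(sums)   # smallest possible two-tailed p-value: 2/16
```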



Problem 83

I used SPSS to obtain most of the information needed to supply the answers below.

(a) 0.25

(b) 0.12

(c) 0.29

(d) (-0.74, 0.98)

(e) The probit plot suggests approximate normality, and in particular, there are no signs of appreciable skewness. Plus, a confidence interval has some robustness against slight skewness (the main worry concerning the validity of the procedure) due to a cancellation effect (not because of using a difference in sample means, but because the skewness leads to an anticonservative phenomenon that is partially cancelled out by a conservative phenomenon). So, the interval produced should be reasonably accurate.

(f) 0.69

(g) When using the t test for a test for a treatment effect, there is little to worry about with regard to validity. If the null hypothesis is true, the distribution of the differences will be symmetric (about 0). So if distribution skewness contributes to a rejection of the null hypothesis, this is good, since skewness by itself is evidence of a treatment effect. Heavy tails make the test conservative, and light tails shouldn't cause a problem as long as there are about a dozen or more observations. So, the results are reasonably trustworthy.

(h) 0.80

(i) Here one has a choice: one can either use the conservative approach to assign integer ranks so that the table of the exact distribution can be used, or one can use midranks (which results in a noninteger value for the test statistic), and rely on the normal approximation to produce an approximate p-value. The conservative approach results in a p-value of 1, while the normal approximation based on midranks results in an approximate p-value of 0.80.

(j) The signed-rank test is always valid as a test for a treatment effect, as long as one has independent units producing the matched pairs data. (If an approximation is used to obtain the p-value, one may be a bit concerned about the quality of the approximation, but one doesn't have to worry about nonnormality.)



Problem 84

No, it does not. The range of the confidence interval for the mean does not necessarily correspond to the range of the typical values in the data set. However, from the confidence interval, one can extract the value of the sample standard deviation, and that provides information pertaining to the scatter of the values about the sample mean. But without knowing something more about the values in the sample, one cannot extract enough information about the sample of magnitudes of the differences to be able to make a confident statement about the "typical" magnitude. (Note: I don't like to make vague references about a typical member of the population, since it's unclear as to what is specifically meant.)

In order to address the magnitude of the difference for the "typical" member of the population, one needs to consider the sample of magnitudes of differences. In this case we have that 16 of the 29 dogs have a difference that is less, in magnitude, than 1.1. So perhaps we would say that the typical dog has a difference of less than 1.1. However, if a sign test is done to determine if there is statistically significant evidence that the median absolute difference is less than 1.1, the resulting p-value is about 0.36, and so there is a lack of strong evidence that the median absolute difference is less than 1.1.

Bottom line: In general, we cannot extract precise information from the confidence interval for the mean difference with which to make a strong statement about a typical value. One can do more than is indicated in S&W, but it generally isn't possible to extract enough information to make a confident statement about a typical value.




Problem 85

I used SPSS to obtain answers for parts (a) and (c), and used a table of the exact null distribution of the signed-rank test to obtain the answer for part (b).

(a) 0.004

(b) 0.004

(c) 0.004



Problem 86

For the signed-rank test, there are 2^n = 2^9 = 512 sets of ranks which are equally likely to be the set of ranks corresponding to the positive differences if the null hypothesis is true. Two of these sets, one corresponding to a test statistic value of 45, and the other corresponding to a test statistic value of 0, give a value of the test statistic as extreme as or more extreme than the observed value of the test statistic. So the p-value is 2/512 = 1/256, which is about 0.0039.

For the sign test, one can use the binomial(9, 0.5) distribution to determine that the exact p-value is 2/512 = 1/256, which is about 0.0039.
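(Both of these calculations are small enough to reproduce by brute force. The following Python sketch enumerates all 512 rank sets for the signed-rank test and uses the binomial distribution for the sign test:)

```python
from itertools import product
from math import comb

# Signed-rank test, n = 9: enumerate all 2^9 = 512 equally likely
# assignments of the ranks 1..9 to the positive differences under H0.
sums = [sum(r for r, s in zip(range(1, 10), signs) if s)
        for signs in product([False, True], repeat=9)]
extreme = sum(1 for w in sums if w >= 45 or w <= 0)  # W+ = 45 (the max) or W+ = 0
p_signed_rank = extreme / len(sums)                  # 2/512

# Sign test: S ~ Binomial(9, 0.5) under H0; the two-tailed p-value
# doubles the one-tail probability P(S <= 0) = 1/512.
p_sign = 2 * comb(9, 0) / 2**9                       # also 2/512
```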



Problem 87

I used SPSS to obtain answers for parts (a) and (c), and used a table of the exact null distribution of the signed-rank test to obtain the answer for part (b).

(a) 0.031

(b) 0.039

(c) 0.070



Problem 88

0.12. (Since SPSS uses the exact binomial distribution (in this case (as can be seen from the output), but not always) to obtain the p-value, one could report 0.118, using three significant digits, but there is typically little point in being so precise (since a p-value of 0.118, 0.120, or even 0.122 provides just about the same strength of evidence against the null hypothesis). Using the table I supplied in class, one could double 0.0592, to obtain 0.1184, which should definitely be rounded to at least 3 significant digits unless one confirms that the 4 is correct (because it could be off due to rounding error (and in fact it is, since the exact value, rounded to 4 decimal places, is 0.1185)).) (Note: To obtain the p-value with SPSS, use Analyze > Nonparametric Tests > 2 Related Samples, and click the two columns of measurements into the Test Pair(s) List. Then check the Sign box under Test Type, and click OK. From the output it can be noted that there are 2 differences of 0. These are ignored, and so the effective sample size is 15. To use the tables, we use the column for a sample size of 15, and go down to a test statistic value of 4, since there are 4 positive differences. The tabulated value of 0.0592 is the null probability that the sign test statistic assumes a value of 4 or less. Since we are doing a two-tailed test, and since the observed value of the test statistic is less than the null mean of 15/2 = 7.5, this lower-tail probability is doubled to obtain the desired p-value.)
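(The tabulated and doubled values can be reproduced directly from the binomial(15, 0.5) distribution; a small Python sketch:)

```python
from math import comb

def sign_test_lower_tail(s, n):
    # P(S <= s) under H0, where S ~ Binomial(n, 0.5).
    return sum(comb(n, k) for k in range(s + 1)) / 2**n

lower = sign_test_lower_tail(4, 15)   # the tabulated value, about 0.0592
two_tailed = 2 * lower                # about 0.1185
```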



Problem 89

0.50. (Since 4 of the observations are greater than 3.50, the value of the test statistic is 4. Since we are to determine if there is significant evidence that the distribution median is less than 3.50, we need to do a lower-tailed test and obtain the null probability of getting a test statistic value of 4 or less. Using the column for a sample of size 9 on the sign test tables I supplied, one sees that the desired probability is listed as 0.5000.) (Note: To obtain the p-value with SPSS, put 9 values of 3.5 into the first column and then enter the 9 sample values into the second column. Next use Analyze > Nonparametric Tests > 2 Related Samples, and click the two columns into the Test Pair(s) List. (This makes the differences of the form observation - 3.5, so that the number of positive differences is the number of sample values greater than 3.5.) Then check the Sign box under Test Type, and click OK. An exact p-value of 1.000 is given for a two-tailed test. Since the value of the test statistic, 4, is less than the null mean of 9/2 = 4.5, the two-tailed test p-value results from doubling the lower-tail null probability of the test statistic assuming a value less than or equal to 4. This lower-tail probability, which must be 0.500, is the desired p-value for the lower-tailed test.)



Problem 90

28 of the 39 observations exceed 1.30, and so the observed value of the test statistic is 28. So the desired p-value is P0(S >= 28). This probability is the same as P0(S <= 11), which is a value that can be obtained from the table I distributed in class. We have that the desired p-value is about 0.0047.
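(This tail probability is easy to compute directly from the binomial(39, 0.5) distribution, using the symmetry noted above; in Python:)

```python
from math import comb

n = 39
# By the symmetry of Binomial(39, 0.5): P0(S >= 28) = P0(S <= 11).
p = sum(comb(n, k) for k in range(12)) / 2**n   # about 0.0047
```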



Problem 91

SPSS can be used to obtain that the p-value is about 0.19. (Note: SPSS uses the standard chi-square approximation, with an adjustment for ties. There is no way to get an exact p-value, and even if there were a way, I'm confident that to two significant digits, the exact p-value would match the approximate p-value.)



Problem 92

(a) SS(between) will equal 0 only if all of the sample means are equal. SS(within) will exceed 0 as long as not all of the sample values are equal to the sample means.

(b) SS(between) will exceed 0 as long as not all of the sample means are equal. SS(within) will equal 0 only if each sample value is equal to its sample mean.
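(These conditions are easy to illustrate numerically. The Python sketch below computes the two sums of squares for small made-up samples --- the data are hypothetical, chosen only to exhibit each case:)

```python
import statistics

def ss_between_within(samples):
    # One-way ANOVA sums of squares for a list of samples.
    grand = statistics.mean(x for s in samples for x in s)
    means = [statistics.mean(s) for s in samples]
    ss_between = sum(len(s) * (m - grand) ** 2 for s, m in zip(samples, means))
    ss_within = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
    return ss_between, ss_within

# (a) equal sample means but unequal values: SS(between) = 0, SS(within) > 0
b, w = ss_between_within([[1, 3], [0, 4], [2, 2]])

# (b) each sample constant but means differ: SS(within) = 0, SS(between) > 0
b2, w2 = ss_between_within([[1, 1], [2, 2], [3, 3]])
```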



Problem 93

(a) 0.006

(b) 0.005

(c) μ3 > μ2

(d) There is evidence of slight positive skewness. (The skewness definitely isn't strong, but there is enough of a consistent pattern to indicate mild skewness. (Note: In this case, the sample sizes are large enough to make it so that it is better to look at individual probit plots for each of the samples. If this is done, one might guess that the distributions have different shapes.))

(e) (Note: The plot does not provide strong evidence of heteroscedasticity.)

(f) 0.009

(g) 0.006

(h) μ3 > μ2

(i) 0.011



Problem 94

(a) 0.006

(b) 0.006

(c) (d) There is clear evidence of positive skewness. (Note: If there is heteroscedasticity, then it isn't good to put much emphasis on the probit plot of the pooled residuals. In this case, due to the small sample sizes, it isn't clear that there is appreciable heteroscedasticity, and so I think looking at the probit plot of the pooled residuals is a very good idea. If we go with the assumption of a common error term distribution, then the plot suggests that this distribution is appreciably skewed, and in that case, with samples of size 10, it's possible that an appearance of possible heteroscedasticity is due to some of the small samples containing an outlier due to the stretched-out upper tail of the error term distribution, and some of the samples, just by chance, not containing an outlier.)

(f) 0.016

(g) 0.036

(h) μ2 < μ5

(i) 0.006



Problem 95

All of the results presented below were obtained using Analyze > General Linear Model > Univariate in SPSS.

(a) 0.79

(b) 0.79

(c) 0.41

(There isn't statistically significant evidence that either sex or dose affects the response variable, level.)



Problem 96

(a) p-value < 0.0005

(b) p-value < 0.0005

(c) 0.14



Problem 97

(a) 0.003

(b) 0.009

(c) 0.19

(e) p-value < 0.0005 (based on negative inverse square root transformation)

(f) p-value < 0.0005 (based on negative inverse square root transformation)

(g) p-value < 0.0005 (based on negative inverse square root transformation)



Problem 98

The best transformation to have selected is the inverse.

(e) p-value < 0.0005

(f) p-value < 0.0005

(g) 0.39

(h) -0.35 (skewness), -0.23 (kurtosis)



Problem 99

SPSS produces a test statistic value of 9.0, and gives 0.029 as an approximate p-value. Since the case of 4 treatments and 3 blocks is covered by the tables I distributed in class, they should be used to obtain the exact (rounded) p-value of 0.002. (Using the table for k = 4 and n = 3, you can go down the column labeled x to 9.0 (the observed value of the test statistic), and read off the p-value of 0.002 right next to the 9.0, looking in the P0(S ≥ x) column.) Note that in this case the chi-square approximation performed horribly. This demonstrates the importance of using the tables of the exact distributions when they are available.
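(Since there are only (4!)^3 = 13824 equally likely rank arrangements under the null hypothesis, the exact p-value from the table can be reproduced by complete enumeration. A Python sketch --- note that the observed statistic value of 9.0 is the largest possible for this design:)

```python
from itertools import permutations, product

# Exact null distribution of the Friedman statistic for k = 4 treatments
# and n = 3 blocks: within each block, every ordering of the ranks 1..4
# is equally likely under H0.
k, n = 4, 3

def friedman_stat(blocks):
    col_sums = [sum(col) for col in zip(*blocks)]
    return 12 / (n * k * (k + 1)) * sum(r * r for r in col_sums) - 3 * n * (k + 1)

perms = list(permutations(range(1, k + 1)))
stats = [friedman_stat(blocks) for blocks in product(perms, repeat=n)]

# P0(S >= 9.0); the small tolerance guards against floating-point fuzz.
p_exact = sum(1 for s in stats if s >= 9.0 - 1e-9) / len(stats)
```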



Problem 100

SPSS produces a test statistic value of 6.5, and gives 0.039 as an approximate p-value. (See the instructions given with the statement of the problem on the homework web page for how to produce the SPSS output for Friedman's test.) Since the case of 3 treatments (in this case, 3 different models of washing machines) and 4 blocks (in this case, 4 different detergents) is covered by the tables I distributed in class, they should be used to obtain the exact (rounded) p-value of 0.042. (Using the table for k = 3 and n = 4, you can go down the column labeled x to 6.5 (the observed value of the test statistic), and read off the p-value of 0.042 right next to the 6.5, looking in the P0(S ≥ x) column.)
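(The same enumeration idea works here, with (3!)^4 = 1296 equally likely rank arrangements; a Python sketch:)

```python
from itertools import permutations, product

# Exact null distribution of the Friedman statistic for k = 3 treatments
# (washing machine models) and n = 4 blocks (detergents).
k, n = 3, 4

def friedman_stat(blocks):
    col_sums = [sum(col) for col in zip(*blocks)]
    return 12 / (n * k * (k + 1)) * sum(r * r for r in col_sums) - 3 * n * (k + 1)

stats = [friedman_stat(b)
         for b in product(permutations(range(1, k + 1)), repeat=n)]

# P0(S >= 6.5); the small tolerance guards against floating-point fuzz.
p_exact = sum(1 for s in stats if s >= 6.5 - 1e-9) / len(stats)
```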



Problem 101

0.002



Problem 102

To get all of the values needed for parts (a), (b), and (c), I used Analyze > Correlate > Bivariate. I clicked fatfree and energy into the Variables box, and I clicked to check the Spearman box so that I would get the Spearman coefficient in addition to the Pearson coefficient. Finally, I clicked OK.

(a) 0.981

(b) p-value < 0.0005 (SPSS reports the p-value as .000, which means that when rounded to the nearest thousandth, it's 0.000 --- but I prefer to give the upper bound on the p-value.)

(c) 0.964



Problem 104

To get all of the values needed for parts (a), (b), and (c), I used Analyze > Regression > Linear. I clicked leucine into the Dependent box, and I clicked time into the Independent box. Next I clicked Statistics and clicked to check the Confidence intervals box so that I would get interval estimates in addition to the point estimates. Finally, I clicked Continue, and then OK. To get the result needed for part (d), do the regression again, keeping everything the same except before clicking OK, click Options and then click to uncheck the Include constant in equation box. (You can see that the estimated slope changes only slightly.)

(a) 0.986

(b) -0.047

(c) (-0.195, 0.100)

(d) 0.028 x (where x is time)
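(For reference, the no-intercept fit replaces the usual slope formula Sxy/Sxx with Σxy/Σx². The Python sketch below uses made-up (time, leucine) pairs --- not the problem's actual data, which isn't reproduced here --- just to show the two computations:)

```python
# Hypothetical (time, leucine) pairs, chosen to be roughly linear with a
# small intercept, so the two slope estimates come out close together.
x = [0, 10, 20, 30, 40, 50, 60]
y = [0.02, 0.25, 0.54, 0.81, 1.12, 1.40, 1.69]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

# With an intercept: slope = Sxy / Sxx, intercept = ybar - slope * xbar.
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = ybar - slope * xbar

# Through the origin: slope = sum(x*y) / sum(x^2).
slope0 = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
```

(With data like these, where the intercept estimate is near 0, the two slope estimates differ only slightly, which is the phenomenon noted in the answer above.)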



Problem 110

(a) -0.210 (Since I didn't specify the direction in which the difference was to be computed, 0.210 is also okay.)

(b) 9.73

(c) 0.066 (Since I didn't specify the direction in which the difference was to be computed, -0.066 is also okay.)



Problem 112

(a) Time

(b) (c)