Solutions for some HW problems
I will post answers and/or solutions for most of the problems not to be turned in for credit here shortly after they are
assigned, and I will post answers and/or solutions for most of the problems turned in for credit here after the grace
period expires for their submission.
Below are solutions for
- Problem 1,
- Problem 3,
- Problem 4,
- Problem 5,
- Problem 6,
- Problem 7,
- Problem 8,
- Problem 9,
- Problem 10,
- Problem 11,
- Problem 12,
- Problem 13,
- Problem 14,
- Problem 15,
- Problem 16,
- Problem 17,
- Problem 19,
- Problem 20,
- Problem 21,
- Problem 22,
- Problem 24,
- Problem 25,
- Problem 26,
- Problem 27,
- Problem 28,
- Problem 29,
- Problem 30,
- Problem 31,
- Problem 32,
- Problem 33,
- Problem 34,
- Problem 35,
- Problem 36,
- Problem 37,
- Problem 38,
- Problem 39,
- Problem 40,
- Problem 41,
- Problem 42,
- Problem 43,
- Problem 45,
- Problem 46,
- Problem 47,
- Problem 48,
- Problem 49,
- Problem 50,
- Problem 51,
- Problem 52,
- Problem 53,
- Problem 54,
- Problem 55,
- Problem 57,
- Problem 58,
- Problem 59,
- Problem 60,
- Problem 61,
- Problem 62,
- Problem 66,
- Problem 67,
- Problem 68,
- Problem 69,
- Problem 70,
- Problem 71,
- Problem 72,
- Problem 73,
- Problem 74,
- Problem 75,
- Problem 76,
- Problem 77,
- Problem 78,
- Problem 79,
- Problem 80,
- Problem 81,
- Problem 82,
- Problem 83,
- Problem 84,
- Problem 85,
- Problem 86,
- Problem 87,
- Problem 88,
- Problem 89,
- Problem 90,
- Problem 91,
- Problem 92,
- Problem 93,
- Problem 94,
- Problem 95,
- Problem 96,
- Problem 97,
- Problem 98,
- Problem 99,
- Problem 100,
- Problem 101,
- Problem 102,
- Problem 104,
- Problem 110,
- Problem 112.
Problem 1
composition of sample | number of samples
--- | ---
0 mutants, 5 nonmutants | 3
1 mutant, 4 nonmutants | 3
2 mutants, 3 nonmutants | 2
3 mutants, 2 nonmutants | 2
4 mutants, 1 nonmutant | 0
5 mutants, 0 nonmutants | 0
Problem 3
(a)
1016/6549 =
0.155.
(Note: Normally, I'd put a dot over the = to indicate
approximately equal (since it's not an exact equality --- I rounded to
the nearest thousandth), but I don't know how to do that using HTML.
Since this will be a problem in expressing many answers, I won't bother
to state each time that an indicated equality may only be an approximate equality.)
(b)
2480/6549 =
0.379.
(c)
1016/6549 + 2480/6549 - 526/6549 =
0.454.
(d)
526/6549 =
0.080.
Problem 4
One can use a tree diagram to identify two ways of getting a positive
test. One can either pick a pregnant woman and have her test positive,
or one can pick a woman who isn't pregnant and get a false positive
result. We have
- P(picking a pregnant woman) = 0.1 (from 100/1000),
- P(picking a woman who isn't pregnant) = 0.9 (from 900/1000),
- P(pregnant woman produces positive test result) = 0.98 (given),
- P(woman who isn't pregnant produces positive test result) = 0.01 (given).
Summing the probabilities for the two pertinent branches of the tree
(out of four possible paths altogether), we get
P(picking a pregnant woman) *
P(pregnant woman produces positive test result) +
P(picking a woman who isn't pregnant) *
P(woman who isn't pregnant produces positive test result) =
(0.1)(0.98) + (0.9)(0.01) = 0.098 + 0.009 =
0.107.
(Defining the event A to be that the chosen woman tests positive,
and the event B to be that the chosen woman is pregnant, we can
justify the tree approach used above by making use of Rule 4 (p. 89) and
Rule 7 (p. 91 --- Rule 7 is used twice below, by the way) as follows:
P(A) = P( (A ∩ B) ∪ (A ∩ Bᶜ) ) = P(A ∩ B) + P(A ∩ Bᶜ) = P(A|B)*P(B) + P(A|Bᶜ)*P(Bᶜ).
This is just a way to use symbols to express more formally
what is expressed using
more words above --- but the numbers are plugged in the same way as
shown above to produce the answer of 0.107.)
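(Note: If you'd rather check this arithmetic with code than with a tree diagram, here is a minimal Python sketch of the same total-probability calculation; the variable names are mine, and the numbers are those given in the problem.)

```python
# Law of total probability, matching the tree-diagram calculation above.
p_pregnant = 0.1           # 100/1000
p_not_pregnant = 0.9       # 900/1000
p_pos_given_pregnant = 0.98
p_pos_given_not = 0.01     # false positive rate

p_positive = (p_pregnant * p_pos_given_pregnant
              + p_not_pregnant * p_pos_given_not)
print(round(p_positive, 3))  # 0.107
```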
Problem 5
One can use a tree diagram to identify two ways of getting a positive
test. One can either pick a person who has the disease and have him/her
test positive,
or one can pick a person who doesn't have the disease and get a false positive
result. We have
- P(picking a person who has the disease) = 0.1,
- P(picking a person who doesn't have the disease) = 0.9,
- P(person who has the disease produces positive test result) = 0.92,
- P(person who doesn't have the disease produces positive test result) = 0.06.
Summing the probabilities for the two pertinent branches of the tree
(out of four possible paths altogether), we get
P(picking a person who has the disease) *
P(person who has the disease produces positive test result) +
P(picking a person who doesn't have the disease) *
P(person who doesn't have the disease produces positive test result) =
(0.1)(0.92) + (0.9)(0.06) = 0.092 + 0.054 =
0.146.
Problem 6
(a)
1213/6549 =
0.185.
(Note: Normally, I'd put a dot over the = to indicate
approximately equal (since it's not an exact equality --- I rounded to
the nearest thousandth), but I don't know how to do that using HTML.
Since this will be a problem in expressing many answers, I won't bother
to state each time that an indicated equality may only be an approximate equality.)
(b)
247/2115 =
0.117.
(c)
If the
event of selecting a smoker
and the event of
selecting a person with high income were independent,
the probabilities requested in
parts (a) and (b) would be equal. Since they aren't equal, the desired
answer is
no.
Problem 7
If the event that the husband smokes and the event that the wife smokes
are independent events, then the percentage of couples for which both
the husband and wife smoke would be 6% (since 30% of the husbands smoke
and 20% of the wives smoke). Since it's 8%, it can be concluded that
the event that the husband smokes and the event that the wife smokes
are not independent events, and so the desired answer is
no.
(One can also note that the percentage of husbands who smoke given that
their wife smokes is 40%, which does not equal the overall percentage
of husbands who smoke, and
the percentage of wives who smoke given that
their husband smokes is about 26.67%, which does not equal the overall percentage
of wives who smoke.)
Problem 8
There are 130 + 26 + 3 + 1 = 160 broods of size 7 or larger, and 5000
broods in all.
If we select a brood randomly (with all 5000 broods equally likely to be
selected), the chance the brood is of size 7 or larger
(which is the
desired probability) is simply 160/5000 =
0.032.
Problem 9
There are 3*610 = 1830 young birds in broods of size 3, and 22435 birds in all.
If we select a young bird randomly (with all 22435 birds equally likely to be
selected), the chance the bird is from a brood of size 3 (which is the
desired probability) is simply 1830/22435, or about (upon rounding)
0.0816.
Problem 10
The expected value of a random variable is the weighted average of its
possible outcomes, with the weights being the probabilities with which
the outcomes occur. So we have E(Y) =
1*P(Y = 1) +
2*P(Y = 2) + ... +
10*P(Y = 10) =
1*(90/5000) +
2*(230/5000) + ... +
10*(1/5000) =
4.487.
Problem 11
The expected value of a random variable is the weighted average of its
possible outcomes, with the weights being the probabilities with which
the outcomes occur. So we have E(Y) =
(0.343)*0 + (0.441)*1 + (0.189)*2 + (0.027)*3
= 0.441 + 0.378 + 0.081 =
0.9
(which is also equal to n*p = 3*(0.3), the mean of a
binomial (3, 0.3) random variable).
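(Note: For those who want a numerical check, here is a short Python sketch, assuming scipy is installed; the lists simply mirror the probabilities stated above.)

```python
from scipy.stats import binom

values = [0, 1, 2, 3]
probs = [0.343, 0.441, 0.189, 0.027]

# E(Y): weighted average of the outcomes, weights = probabilities.
ey = sum(y * p for y, p in zip(values, probs))
print(round(ey, 3))                   # 0.9
print(round(binom.mean(3, 0.3), 3))   # 0.9, the binomial(3, 0.3) mean n*p
```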
Problem 12
The desired percentage is 25% + 12% + 7% =
44%.
Problem 13
(a)
The desired probability is 0.41 + 0.21 =
0.62.
(b)
The desired probability is 0.41 + 0.21 + 0.03 =
0.65.
(c)
The desired probability is 0.01 + 0.34 =
0.35.
Problem 14
The desired probability is P(Y=5), where Y is a
binomial (10, 0.6) random variable.
This probability equals
10C5 (0.6)^5 (0.4)^5 =
0.201.
(Note: One can also use SPSS (as described
here) to obtain this probability.)
Problem 15
The desired probability is P(Y=2), where Y is a
binomial (4, p) random variable, with p = 105/(105 + 100).
This probability equals
4C2 p^2 (1-p)^2 =
6 (105/205)^2 (100/205)^2 =
0.375.
(Note: One can also use SPSS (as described
here) to obtain this probability.)
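(For Problems 14 and 15, the binomial point probabilities can also be checked with a couple of lines of Python, assuming scipy is available.)

```python
from scipy.stats import binom

# Problem 14: P(Y = 5) for Y ~ binomial(10, 0.6)
print(round(binom.pmf(5, 10, 0.6), 3))     # 0.201

# Problem 15: P(Y = 2) for Y ~ binomial(4, 105/205)
print(round(binom.pmf(2, 4, 105/205), 3))  # 0.375
```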
Problem 16
(a)
The probability that all six are not albino is
0.75^6, which is about
0.178.
(One could also use the binomial distribution. Letting Y be the
number of albinos out of 6, Y has a binomial (6,
0.25) distribution. The desired probability is P(Y = 0). This
leads to the same value given above.)
(b)
The event that one or more are albino is the complement of the
event that none are albino --- so the desired probability is just
1 - 0.75^6, which is about
0.822.
(Using the binomial distribution as indicated above,
the desired probability is
P(Y >= 1) = 1 -
P(Y = 0). This
leads to the same value given above.)
Problem 17
(a)
The 1% indicated in the problem corresponds to a probability of 0.01 for
a randomly selected patient experiencing damage, which gives us a
probability of 0.99 for no damage. Making an assumption of
independence, the probability of all fifty having no damage is
0.99^50, which is about
0.605.
(One could also use the binomial distribution. Letting Y be the
number out of 50 that experience damage, Y has a binomial (50,
0.01) distribution. The desired probability is P(Y = 0). This
leads to the same value given above.
(Note: One can use SPSS (as described
here) to obtain this probability
associated with a binomial distribution.))
(b)
The event that one or more experience damage is the complement of the
event that none experience damage --- so the desired probability is just
1 - 0.99^50, which is about
0.395.
(Using the binomial distribution as indicated above,
the desired probability is
P(Y >= 1) = 1 -
P(Y = 0). This
leads to the same value given above.
(Note: One can use SPSS (as described
here) to obtain this probability
associated with a binomial distribution.))
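(Again, the complement-rule calculations in Problems 16 and 17 are easy to verify in Python, assuming scipy is available.)

```python
from scipy.stats import binom

# Problem 16: six independent offspring, each albino with probability 0.25
print(round(0.75**6, 3))                 # 0.178, P(no albinos)
print(round(1 - 0.75**6, 3))             # 0.822, P(one or more albinos)

# Problem 17: fifty patients, each with damage probability 0.01
print(round(binom.pmf(0, 50, 0.01), 3))  # 0.605, P(no damage)
print(round(binom.sf(0, 50, 0.01), 3))   # 0.395, P(Y >= 1)
```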
Problem 19
(a)
The desired probability is approximately equal (noting that the
distribution of brain weights is only approximately normal)
to the probability that a standard
normal random variable is less than or equal to
(1325 - E(Y))/std.dev.(Y) =
(1325 - 1400)/100 = -0.75. (Note that 1325 is 0.75 standard deviations
below the mean of Y's distribution, and -0.75 is 0.75 standard
deviations below the mean of a standard normal distribution.) So we
have that
P(Y <= 1325) is approximately equal to Φ( -0.75 ), or about
0.23.
(Tables (e.g., see pp. 675-676 of S&W) or SPSS
(as described
here) can be used to find that
Φ(-0.75) = 0.2266.
(Alternatively, if one uses SPSS, one can obtain
P(Y <= 1325)
directly
--- one does not have to first put the probability in terms of the cdf
of the standard normal distribution.))
(Note: I intentionally rounded to only give two significant
digits, since stating the probability with more than two digits seems to
express a degree of accuracy that isn't warranted, given that the brain
weight distribution is only approximately normal.)
(b)
The desired probability is equal to
P(Y <= 1600) -
P(Y < 1475). Similar to above, we have that
P(Y <= 1600) = Φ(2.00) and
P(Y < 1475) = Φ(0.75).
Tables (e.g., see pp. 675-676 of S&W) or SPSS
(as described
here) can be used to find that
Φ(2.00) = 0.9772 and
Φ(0.75) = 0.7734, and so altogether the desired probability is about
0.20. (Alternatively, if one uses SPSS, one can obtain
P(Y <= 1600) and
P(Y < 1475) directly
--- one does not have to first put the probabilities in terms of the cdf
of the standard normal distribution.)
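(If one doesn't have SPSS handy, the same normal probabilities can be obtained in Python with scipy; the sketch below uses the mean of 1400 and standard deviation of 100 from the problem.)

```python
from scipy.stats import norm

# Part (a): P(Y <= 1325), with Y approximately normal(1400, 100)
print(round(norm.cdf(1325, loc=1400, scale=100), 4))   # 0.2266

# Part (b): P(1475 <= Y <= 1600)
p = norm.cdf(1600, 1400, 100) - norm.cdf(1475, 1400, 100)
print(round(p, 4))                                     # 0.2039, about 0.20
```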
Problem 20
The easy way to get the desired value is to find it on the
infinity row of Table 4 on p. 677, in the 0.10 column. (That row
corresponds to the standard normal distribution.) The desired
standard normal critical value is about
1.282.
Problem 21
The easy way to get the desired value is to find it on the
infinity row of Table 4 on p. 677, in the 0.01 column. (That row
corresponds to the standard normal distribution.) The desired
standard normal critical value is about
2.326.
(Note:
One can use SPSS (as described
here) to obtain that the needed z critical value is 2.326.)
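(Both of these standard normal critical values can also be obtained in Python, assuming scipy: the 0.10 and 0.01 upper-tail critical values are the 0.90 and 0.99 quantiles.)

```python
from scipy.stats import norm

print(round(norm.ppf(0.90), 3))   # 1.282, upper-tail area 0.10 (Problem 20)
print(round(norm.ppf(0.99), 3))   # 2.326, upper-tail area 0.01 (Problem 21)
```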
Problem 22
The desired probability is equal
to the probability that a standard
normal random variable is greater than or equal to
(180 - E(Y))/std.dev.(Y) =
(180 - 176)/30 = 2/15 = 0.1333.
So we have that
P(Y >= 180) is approximately equal to Φ( -0.1333 ),
recalling that to find an upper-tail probability for a standard normal
random variable, we can look up the additive inverse (i.e., we change
the sign) of the value in the table of the cdf of the standard normal
distribution. In this case, we should interpolate 1/3 the way between
the -0.13 value toward the -0.14 value, which gives us about
0.447.
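(A Python cross-check avoids the interpolation altogether: scipy's survival function gives the upper-tail probability directly.)

```python
from scipy.stats import norm

z = (180 - 176) / 30          # 0.1333...
print(round(norm.sf(z), 3))   # 0.447, P(Z >= 0.1333)
```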
Problem 24
(a)
1.44
(b)
1.41
(c)
1.58
(d)
0.18
(e)
0.7
(f)
-0.1
(All of the above values, as well as the requested graphics, can be obtained
by SPSS, using Analyze > Descriptive Statistics > Explore. (Select
the gain variable by highlighting it and clicking the arrow to put it
into the Dependent List box. The defaults of Explore don't
produce percentile estimates. To obtain them, click Statistics near
the bottom of the window that opened with Explore. Click the box
in front of Percentiles to put a check in the box. Then click Continue
to close the window.
Also, the defaults of Explore don't
produce the requested graphics (although some other plots are produced).
To obtain them, click Plots near
the bottom of the window that opened with Explore. Click the boxes
in front of Histogram and Normality plots with tests to put
checks in the boxes. Then click Continue
to close the window. Finally, click OK to close the window that opened with
Explore, and the desired output should be produced.) Some claim that the
sample mean should be rounded to the place indicated by the second significant digit
of the estimated standard error. For the data at hand, the estimated standard error
associated with the sample mean is given to be 0.02932, which would mean rounding the
outputted sample mean value to the nearest thousandth. But since the data values
have been reported to the nearest hundredth, I don't think it is proper to express so
much accuracy in the estimate of the mean. So in this case I chose to use one less
decimal place than was indicated by the "rule." I also rounded to the nearest hundredth
for parts (c) and (d). (The answer to part (b) did not have to be rounded since the
middle order statistic is already a value rounded to the nearest hundredth.) For the
estimate of the 75th percentile, I used the one SPSS produced labeled Weighted Average
(Definition 1), as opposed to the one labeled Tukey's Hinges. (From what
I can gather, the weighted average estimate is from one of the standard estimators that
people use for estimating percentiles. But it's not the best choice for the data under
consideration. For this data, I would use another estimator to produce a value of
1.57 --- a value that I would regard as being slightly better. But you can see that
it makes little difference, especially when it needs to be kept in mind that both of
these estimates are subject to error (due to the fact that all we have to work with is a
smallish sample of observations).)
For parts
(e) and (f) I rounded to the nearest tenth. The estimates of the skewness and kurtosis
have so much uncertainty associated with them, giving more precise values seems silly
--- it would be indicating way more accuracy than is warranted. (Plus, even if the
estimates were more reliable, for most purposes, it would make no difference whether
a skewness or kurtosis value was say 0.47 or 0.53.)
(Note: For homework problems, round values as indicated in the assignment.
If I do not indicate how to round, then make use of comments like the ones above
in order to determine how much, if any, you should round.)
(Although Explore can be used to obtain the desired graphics, I'll also
indicate that the histogram can be obtained using Graphs > Histogram.
(One just has to click the gain variable into the Variable box, and then
click OK.)
Also, the normality plot can be obtained using Graphs > Q-Q.
(One just has to click the gain variable into the Variables box, and then
click OK.))
Problem 25
(a)
98.3
(b)
94.5
(c)
67.3
(d)
40.38
(e)
0.82
Problem 26
Upon looking at the 5 probit plots (obtained using the default settings of SPSS's Q-Q plot routine),
the two that are most strikingly nonlinear in appearance are the ones from the moisture and glucose data sets.
For these data sets,
a determination of skewness is called for, with the moisture data set appearing to have come from a negatively skewed
distribution, and the glucose data set appearing to have come from a positively skewed distribution. Upon looking at the other
three plots, one should seek consistent patterns of curvature in order to identify the heavy-tailed and light-tailed distributions.
If one looks at the lamb-wt plot, one should be able to see a gentle S-like curvature. (It helps if you can completely
ignore the straight line that has been annoyingly placed on the plot with the plotted points. If you can do that, and can imagine the
plotted points as being a country road, you should see that the road first gently curves in one direction, and then gently curves in
the other direction.) Similarly, the radish plot shows a consistent curvature ... like a road that gently curves in one
direction, and then curves in the other direction. (Continuing with the road analogy, the plots obtained from skewed distributions
often show the road just curving one way. However, some plots from skewed distributions could indicate a bit of opposite curvature at
the other end of the road (the end that isn't the more clearly curved), but the degree of curvature will generally be appreciably
different at the two ends --- the plots will show a lack of "balance.") The lamb-wt and radish plots both have consistent
patterns of curvature, and show a good degree of "balance" (thus giving no indication of appreciable skewness). The S-like
curvature of the lamb-wt plot is indicative of heavy-tails, while the curvature of the radish plot (where the road
appears to continue more in a north and south direction, as opposed to the more east and west appearance from the lamb-wt plot
--- although each road still has a southwest and northeast slant to it) is indicative of light tails. For the peppers data,
there is not a consistent pattern of deviation from a straight line pattern that is indicative of clear positive or negative skewness,
or of heavy or light tails. (If viewed as a road, it changes direction in its slight curvature several times.) So, it's consistent
with an approximately normal distribution. It should be noted that with these plots, the skewness patterns were more pronounced
than the heavy-tailed and light-tailed patterns. But for other data sets, the skewness patterns can be milder, and the heavy-tailed
pattern can be much more pronounced. (The radish plot gives a fairly strong indication of light tails, but the lamb-wt
plot is more subtle --- it could pass for approximate normality if one doesn't look carefully (but then when matching the 5
descriptions to the 5 data sets, the lamb-wt data should be the one selected as being from a heavy-tailed distribution ... none
of the others show a heavy-tailed (approximately) symmetric pattern).)
(a)
[E]
(b)
[B]
(c)
[D]
(d)
[C]
(e)
[A]
Problem 27
One can obtain the desired probabilities using a binomial random
variable. Let Y be a binomial (3, 0.39) random variable,
representing the number of mutants in a sample of size 3.
(a)
The desired probability is P(Y = 0) =
0.61^3, which is about
0.227.
(b)
The desired probability is P(Y = 1) =
3C1 (0.39)^1 (0.61)^2 =
3 (0.39)(0.61)^2,
which is about
0.435.
Problem 28
The sample proportion will equal 0.4 if the sample of size 5 contains
2 mutants and 3 nonmutants. Using the table indicated in the book, the
desired probability can be seen to be
0.35.
(A direct calculation yields a value of about 0.345.)
Problem 29
Here we can use Theorem 5.1 on p. 159 of S&W. Using part (a) of item 3
of the theorem, the sample mean has a normal distribution as its
sampling distribution. Also, we
have that the mean of the sampling distribution is 176.
The standard deviation associated with the
cholesterol level of a randomly chosen member of the population is
30. So the standard error of a sample mean of nine is 30 divided by
the square root of 9, or 30/3 = 10. (This is the standard deviation of
the sampling distribution of the sample mean.)
The desired probability is the probability that a normal random variable
assumes a value within 1 standard deviation of its mean, which is about
0.683.
Problem 30
This is similar to the immediately preceding problem. Here the sample
mean is normally distributed with a standard error of 400 divided by the
square root of 15, or about 103.28.
The desired probability is the probability that the sample mean
assumes a value within 100 of its mean, which is the probability that a
normal random variable assumes a value within 100/103.28 = 0.96825
standard deviations
of its mean, which is about
0.667.
(This value can be obtained by looking up the probability corresponding
to a z value of 0.96825 (which, interpolating between the values
given for 0.96 and 0.97, is about 0.83356), and subtracting from it the
probability corresponding
to a z value of -0.96825 (which, interpolating between the values
given for -0.96 and -0.97, is about 0.16644). It can be noted that the
answer given in the back of the book is a bit off (due to too much
rounding error in steps prior to reporting the final answer). If you
consider part (c) of the problem in the book, the answer is
increase --- if the sample size is larger, the probability that
the sample mean is within a specified amount of the distribution mean is
larger ... if it wasn't, then we'd have a worse estimator even though
the amount of information is greater, which isn't sensible.)
Problem 31
This is similar to the immediately preceding problem. Here the sample
mean is normally distributed with a standard error of 400 divided by the
square root of 60, which gives us a value of 400/sqrt(60) = 51.6398.
The desired probability is the probability that the sample mean
assumes a value within 100 of its mean, which is the probability that a
normal random variable assumes a value within 100/51.6398 = 1.9365
standard deviations
of its mean, which is about
0.947.
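(For Problems 30 and 31, the interpolation can be checked in Python; in each case the standard error is the population standard deviation divided by the square root of the sample size. The helper name below is mine.)

```python
from math import sqrt
from scipy.stats import norm

def p_within(dist_sd, n, amount):
    """P(sample mean falls within 'amount' of the distribution mean)."""
    z = amount / (dist_sd / sqrt(n))
    return norm.cdf(z) - norm.cdf(-z)

print(round(p_within(400, 15, 100), 3))  # 0.667 (Problem 30)
print(round(p_within(400, 60, 100), 3))  # 0.947 (Problem 31)
```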
Problem 32
The answers can be obtained by dividing the standard deviation, 145, by
the square root of the sample size. (Note: The book should have
requested the standard error of the sample mean --- the mean
isn't a random variable, and so it has no standard error ... the sample
mean is an estimator
which can be used to estimate the
distribution/population mean, and since it's a statistic (which is a random
variable), we can
refer to its standard error. If 145 is the sample standard deviation instead of the true standard deviation, then the request should
have been for the estimated standard error of the sample mean.)
(a)
51.3
(b)
26.5
Problem 33
The answer can be obtained by dividing the sample standard deviation, 15, by
the square root of the sample size. (Note: The book should have
requested the estimated standard error of the sample mean --- the mean
isn't a random variable, and so it has no standard error ... the sample
mean is an estimator
which can be used to estimate the
distribution/population mean, and since it's a statistic (which is a random
variable), we can
refer to its standard error, but since it involves the population
standard deviation, which is unknown, we must be content with an estimate of the standard
error.) The desired value is
3.
Problem 34
Of the two choices given, the answer is the
SD.
The standard deviation is a measure of how much variation
exists in a population, whereas the standard error pertains to the
variability of a statistic, which is typically a function of the sample
size. But if I saw the statement "Rats weighing 150 +/- 10g were
injected" I would
take it to mean that all of the rats injected weighed between 140 g and
160 g. So, in my opinion, the exercise in the book isn't a good one,
but I hope that you learned something from it and my comments anyway.
Problem 35
Here we should take SD and SE to refer to the sample standard deviation
and the estimated standard error of the sample mean.
(Note: We can refer to the standard error of any statistic --- it
doesn't just pertain to the sample mean --- and so one should always
make sure that the relevant statistic is clearly indicated.)
(a)
SE
(The standard error of an estimator is a measure of how
spread out the probability mass of its sampling distribution is about
the mean of its sampling distribution. Since the sample mean is unbiased, the
standard error of the
sample mean is a measure of how spread out the probability mass of the
sampling distribution of the sample mean is about the mean of the parent
distribution (the estimand), which is of course related to the probability that the
estimator assumes a value close to the estimand --- that is, it's
related to the accuracy of the estimator.)
(b)
SD
(The sample standard deviation is a consistent estimator of
the population/distribution standard deviation --- so it converges to
some constant value as the sample size increases. For large sample
sizes, it's close to the true value with high probability --- it doesn't
tend to get larger or smaller as the sample size increases, but rather
tends to fluctuate about the true value as observations are added to the
sample.)
(c)
SE
(Since the sample standard deviation
converges to
some constant value as the sample size increases, the sample standard
deviation divided by the square root of the sample size (which is the
estimated standard error of the sample mean) tends to decrease as the
sample size increases.)
Problem 36
(a)
3.9
(This can be obtained from SPSS's Explore procedure or
One-Sample T Test procedure.
(To use Explore, type in the data and then use
Analyze > Descriptive Statistics > Explore. The value of 3.9 can be obtained from
the Std. Error column, beside the value of the sample mean.
To use
One-Sample T Test, follow the instructions given for part (b) below.
The desired standard error is given in the One-Sample Statistics part of the output.
Alternatively, in this case, one could take the given
standard deviation and divide by the square root of 5, but in general if
you want two accurate significant digits, you should worry that too much
rounding before the final answer (i.e., the sample standard deviation is
not exactly 8.7) could lead to the 2nd digit being wrong.)
(b)
(23.4, 40.0)
(This can be obtained from SPSS's
One-Sample T Test procedure.
With the data entered, use Analyze > Compare Means > One-Sample T Test.
Select the variable (the name of the column containing the data), and then click
Options. Change the value in the Confidence Interval box from 95
to 90. Click Continue and then OK. The desired confidence interval is given as part of the
One-Sample Test output.
Alternatively, in this case, one could use the given
sample mean and standard deviation, along with the t critical
value from Table 4, and arrive at the correct answer,
but in general
you should worry that too much
rounding before the final answer (e.g., the sample mean is not exactly 31.7 and
the sample standard deviation is
not exactly 8.7) could lead to inaccuracy in the confidence bounds.)
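(Here is a Python sketch of the same interval from the rounded summary statistics; since the sample mean and standard deviation used below are rounded values, the bounds may differ slightly from what SPSS gives from the raw data.)

```python
from math import sqrt
from scipy.stats import t

n, mean, sd = 5, 31.7, 8.7    # rounded summary statistics
tcrit = t.ppf(0.95, n - 1)    # 2.132, for a 90% interval with 4 df
half = tcrit * sd / sqrt(n)
print(round(mean - half, 1), round(mean + half, 1))   # 23.4 40.0
```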
Problem 37
(20.9, 42.6)
(This can be obtained from SPSS's
One-Sample T Test procedure. If one used the given
sample mean and standard deviation, along with the t critical
value from Table 4, the upper confidence bound rounds to 42.5.
This answer suffers from too much rounding error. If
you want two accurate significant digits, you should worry that too much
rounding before the final answer (e.g., the sample standard deviation is
not exactly 8.7) could lead to the last reported digit being wrong.)
Problem 38
(a)
False
(We are 100% confident that the sample mean is in the stated
interval.)
(b)
True
(The method has a success rate of (about) 95%. (I put "about"
because we can be certain that the sample did not come from exactly a
normal distribution, and so the nominal confidence level may not
correspond exactly to the coverage probability --- but assuming the
distribution is not too nonnormal, since the sample size is 86, the
coverage probability ought to be close to 0.95 for this situation.))
Problem 39
False
(A 95% confidence interval is supposed to trap the distribution mean
with probability 0.95, and not necessarily trap 95% of the data. If the
population is approximately normal, an interval centered on the sample
mean and including values two sample standard deviations above and below it
might contain about 95% of the data values in the sample --- but this
interval will be much wider than the stated confidence interval since
the confidence interval includes points about two estimated standard
errors on either side of the sample mean, and the estimated standard
error is much smaller than the sample standard deviation (since it is
the sample standard deviation divided by the square root of the sample
size).)
Problem 40
The widest interval (the last one) is the 90% confidence interval,
and the narrowest interval (the first one) is the 80% confidence
interval (and the other one is the 85% confidence interval)
--- the
greater the confidence level, the wider the confidence interval.
Problem 41
Yes*
(Really, the answer is no, if exact normality is meant ---
I don't think that the parent distribution for the data is exactly
normal. But a probit plot suggests that the parent distribution may be
nearly normal, and the confidence interval procedure should work
decently. The parent distribution doesn't have to be exactly normal for
the method to be okay to use --- if exact normality were to be required,
the method would be seldom, if ever, used!)
Problem 42
Table 4 doesn't have the desired t critical value based on
35 df, but one might guess that it is close to 2.03, doing a crude
slightly nonlinear interpolation procedure (2.03 is nearly halfway
between the critical values based on 30 df and 40 df).
So the desired confidence interval is
(6.21 +/- 2.03*1.84/6), noting that 6 is the square root of 36 (the
sample size). Rounding to give only two significant digits for the
confidence bounds, due to the interpolation, the fact that the summary
statistics and data values themselves may have been rounded, and because
the parent distribution isn't (we can assume) exactly normal, the
desired interval is
(5.6, 6.8).
(Note:
One can use SPSS (as described
here) to obtain that the needed t critical value is 2.030.)
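(In Python, scipy gives the exact critical value without interpolation.)

```python
from scipy.stats import t

# 0.025 upper-tail t critical value with 35 df (the 0.975 quantile)
print(round(t.ppf(0.975, 35), 3))   # 2.030
```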
Problem 43
Table 4 gives 1.984 as the desired t critical value based on
100 df.
So the desired confidence interval is
(10.3 +/- 1.984*0.9/(101)**0.5).
If we take 0.9 as the exact sample standard deviation, then upon
dividing by the square root of 101 to obtain the estimated standard
error, it can be seen that the 2nd significant digit is in the
thousandths location. But since
the summary
statistics and data values themselves may have been rounded, and because
the parent distribution isn't (we can assume) exactly normal, the
confidence bounds for the desired interval shouldn't be reported
indicating too much precision, and so rounding each bound to the
nearest hundredth or nearest tenth seems like the sensible thing to do.
So, the desired interval is
(10.12, 10.48) (or (10.1, 10.5)).
Problem 45
The coverage probability would be about
0.68.
Problem 46
(a)
(5.0, 5.4)
(b)
(4.9, 5.4)
(c)
(4.8, 5.5)
Problem 47
For the center of the interval,
- the adjusted numerator is 69 + 1.96*1.96/2 = 70.921;
- the adjusted denominator is 339 + 1.96*1.96 = 342.84;
which gives us 0.20686. For the estimated standard error we have
( 0.20686*(1 - 0.20686)/342.84 )**0.5 = 0.021876,
which suggests that the confidence bounds should be rounded to the
nearest thousandth (since the 2nd significant digit is in the thousandth
location). Upon multiplying the estimated standard error by the
appropriate standard normal critical value, 1.96, we obtain
( 0.20686 +/- 1.96*0.021876 ) =
( 0.20686 +/- 0.04288 ) = (0.1640, 0.2497),
which, upon further rounding, gives us
(0.164, 0.250).
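(The adjusted-proportion interval computed above is easy to script; here is a Python sketch following exactly the same steps, with variable names of my choosing.)

```python
from math import sqrt

z = 1.96                  # 0.025 standard normal critical value
y, n = 69, 339            # number of "successes" and sample size

p_adj = (y + z*z/2) / (n + z*z)              # 0.20686, center of interval
se = sqrt(p_adj * (1 - p_adj) / (n + z*z))   # 0.021876
print(f"({p_adj - z*se:.3f}, {p_adj + z*se:.3f})")   # (0.164, 0.250)
```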
Problem 48
The number of "successes" is 151 (because 151 is the only integer to put
in the numerator to combine with 959 in the denominator to yield a
sample proportion which rounds to 0.157).
For the center of the interval,
- the adjusted numerator is 151 + 1.645*1.645/2 = 152.35;
- the adjusted denominator is 959 + 1.645*1.645 = 961.71;
which gives us 0.15842. For the estimated standard error we have
( 0.15842*(1 - 0.15842)/961.71)**0.5 = 0.011774,
which suggests that the confidence bounds should be rounded to the
nearest thousandth (since the 2nd significant digit is in the thousandth
location). Upon multiplying the estimated standard error by the
appropriate standard normal critical value, 1.645, we obtain
( 0.15842 +/- 1.645*0.011774 ) =
( 0.15842 +/- 0.01937 ) = (0.13905, 0.17779),
which, upon further rounding, gives us
(0.139, 0.178).
Be sure to express any interval estimate as an
interval. (Why is this so hard for students to grasp?) That is,
the answer is (0.139, 0.178), and not 0.158 +/- 0.019, and certainly not
0.139 < p < 0.178 (since we don't know that the estimand is
actually between the lower and upper confidence bounds). Also, while
there is seldom a good reason to express a p-value using more than 2
significant digits, we don't necessarily want to round point estimates
and confidence bounds to two significant digits. (In some cases, when
rounded to 2 significant digits, the upper bound will be the same as the
lower bound, and that doesn't make a good interval. Also, if we round
so much, sometimes there is little point in using a superior method
instead of an inferior one.) S&W's suggestion of using the place of the
2nd significant digit of the (estimated) standard error to determine to
what place point estimates and confidence bounds should be rounded is
generally a good method. I sometimes deviate from this method if I
believe that the raw data has been rounded so much that fewer
significant digits should be used to express the estimate obtained from
the data.
Problem 49
We have
( 4.3*4.3/6 + 5.7*5.7/12 )**0.5 =
( 3.0817 + 2.7075 )**0.5 =
( 5.7892 )**0.5 = 2.406,
which, upon further rounding, gives us
2.4.
Problem 50
We have
( 44.2*44.2/10 + 28.7*28.7/10 )**0.5 =
( 195.36 + 82.37 )**0.5 =
( 277.73 )**0.5 = 16.665
which, upon further rounding, gives us
17.
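(Problems 49 and 50 use the same formula for the estimated standard error of a difference in sample means, so one small Python helper, named by me, covers both.)

```python
from math import sqrt

def se_diff(s1, n1, s2, n2):
    """Estimated standard error of the difference in two sample means."""
    return sqrt(s1**2 / n1 + s2**2 / n2)

print(round(se_diff(4.3, 6, 5.7, 12), 1))   # 2.4 (Problem 49)
print(round(se_diff(44.2, 10, 28.7, 10)))   # 17  (Problem 50)
```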
Problem 51
I used SPSS to obtain the confidence interval and p-value, and also used
it to produce normality plots and distribution skewness estimates in
order to address part (c).
(a)
(-7, 10).
(b)
0.78
(c)
The results are reasonably trustworthy. Although the normality plots
indicate that both distributions may be positively skewed, the
estimated skewness of 0.8 (0.7) for the caffeine distribution and the
estimated skewness of 0.4 (0.7) for the decaf distribution aren't
too different. Due to a central limit theorem effect
for the sample means (i.e., their sampling distributions are quite a bit
less skewed than are the parent distributions of the individual data
values) and also a cancellation of skewness
effect (due to using the
difference in the sample means), the robustness properties of Welch's test ought to
make it perform decently in the given situation.
The biggest worry is
that with such small sample sizes, we don't have a good idea about what
the underlying distributions are like.
Problem 52
I used SPSS to obtain the confidence interval and p-value, and also used
it to produce normality plots and distribution skewness estimates in
order to address part (c).
(See my comments
here about Example 7.7 to see how to use SPSS to obtain the desired confidence
interval. It can be noted that the output produced when obtaining the confidence interval also contains the desired p-value for part
(b).)
(a)
The 95% confidence interval for the mean of the red light
distribution minus the mean of the green light distribution is
(-1.6, 0.4).
So while the point estimate suggests that the mean of the green light
distribution may be the larger one, the fact that the interval includes
0 and also some positive values means that we can't be highly confident
that the mean of the green light distribution is actually larger than
the mean of the red light distribution.
(It can be noted that the SPSS output gives intervals obtained from both
Student's t
procedure and Welch's method. It also gives the result of a test for
nonequality of variances, but I don't think that the test result needs
to be considered here --- I don't think there is any reason to give the
benefit of the doubt to the null hypothesis of homoscedasticity (equal
variances), and that's what the test does ... rather, the safe thing
to do is to allow for heteroscedasticity (unequal variances), knowing
that if the variances are really equal, incorrectly using Welch's method
should have little adverse effect. (Similarly, in part (b), Welch's
test is used, instead of Student's two-sample t test.))
(b)
0.26
(c)
The results are reasonably trustworthy. Although the normality plots
indicate that both distributions are negatively skewed, the
estimated skewness of -1.4 (0.6) for the red light distribution and the
estimated skewness of -1.1 (0.5) for the green light distribution aren't
too different. Due to a central limit theorem effect
for the sample means (i.e., their sampling distributions are quite a bit
less skewed than are the parent distributions of the individual data
values) and also a cancellation of skewness
effect (due to using the
difference in the sample means), the robustness properties of Welch's test ought to
make it perform decently in the given situation.
Problem 53
Using the given information, the value of Welch's statistic is
(3840 - 5310)/1064 = -1470/1064 = -1.3816.
Since this value is not less than -2.145 or greater than 2.145 (where
2.145 is the 0.025 critical value for the T distribution with 14
df), one
cannot reject the null hypothesis of equal means with a size
0.05 test.
(Note: Sometimes you may only be given the summary
statistics, and not the complete data. So it's important to know how to
compute the value of the test statistic from the summary information,
and determine whether or not you can reject at a given level, since one
might not be able to load the data into a statistical software package
and compute a p-value. However, using the table on p. 677 of S&W one
can further determine that the p-value satisfies
0.1 < p-value < 0.2,
and using software it can be determined that the p-value is about 0.19.
(Note: One can use SPSS (as described
here, only using CDF.T instead of CDFNORM) to obtain the probability that
a T random variable with 14 df will assume a value less than or equal to -1.3816. This probability needs to be doubled to
obtain the p-value for a two-tailed test.)
Making a statement about the p-value can provide more information than
merely stating whether or not one can reject at a particular level. It
should be kept in mind that without having the raw data to work with,
but instead only having the summary measures, we cannot assess the
approximate normality of the distributions, or determine if they are
nonnormal in such a way as to make Welch's test reliable. This is
particularly bothersome when the sample sizes are small, and we can't
rely on robustness due to large samples.)
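(Taking the stated 14 df at face value, the test statistic and the two-tailed p-value can be reproduced in Python from the summary numbers alone.)

```python
from scipy.stats import t

tstat = (3840 - 5310) / 1064           # Welch statistic from the summaries
print(round(tstat, 4))                 # -1.3816
print(round(2 * t.cdf(tstat, 14), 2))  # 0.19, two-tailed p-value
```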
Problem 54
Using the given information,
the value of the estimated standard error is
( 2.87*2.87/60 + 3.52*3.52/50 )**0.5 = ( 0.38509 )**0.5 = 0.62056,
and the value of Welch's statistic is
(78.42 - 80.44)/0.62056 = -2.02/0.62056 =
-3.26.
To do a size 0.05 test, we need to compare the magnitude of the test
statistic to
the 0.025 t critical value based on 94 df.
This value is in between
the 0.025 critical values based on 80 df (1.990)
and 100 df (1.984), and so we should
reject the null hypothesis of equal means in favor of the alternative
hypothesis of unequal means (since the absolute value of the test
statistic exceeds the 0.025 critical value).
Problem 55
Noting that the sample mean of the males is less than the sample
mean of the females, it can be concluded that we have
p-value > 0.5.
We can actually be more specific about the p-value by noting that we're
doing an upper-tailed test with a test statistic value that is -3.26.
Since the absolute value of the test statistic, 3.26, is in between the 0.005
and 0.0005 critical values for both 80 df and 100 df, it is also in
between those critical values for 94 df.
So the lower-tail probability is between 0.0005 and 0.005, which gives us
0.995 < p-value < 0.9995
(since the p-value for an upper-tail test is the probability mass
associated with values greater than the observed value of the test
statistic).
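(Using software instead of bracketing with the table: with 94 df, the upper-tailed p-value is the probability to the right of -3.26, which scipy gives directly.)

```python
from scipy.stats import t

# P(T >= -3.26) with 94 df, i.e., the upper-tailed p-value
print(round(t.sf(-3.26, 94), 4))   # 0.9992, inside (0.995, 0.9995)
```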
Problem 57
No, since the distributions appear to overlap quite a bit, a given
tibia length could be consistent with both the values of male tibia
lengths and female tibia lengths.
(That the distributions overlap quite a bit can be guessed from an
examination of the sample means and sample standard deviations. If one
assumes approximate normality for the length distributions, then it can
be guessed the bulk of the male lengths range from about 73 to about 84,
and that the bulk of the female lengths range from about 73 to about 87.
Note that a test about the means doesn't do much to help us answer the
question posed for this part --- whether or not there is statistically
significant evidence that the distribution means differ, there can be
appreciable overlap of the ranges of values commonly observed from each
distribution, which makes it hard to confidently guess which
distribution an individual observation is associated with.)
Problem 58
(a)
The statement is
true.
Recalling that the p-value is the smallest level for which one can
reject based on the observed data, since the p-value is 0.03, we can
surely reject at level 0.03. Since a level 0.05 test has a larger
rejection region than a level 0.03 test, one can also reject at level
0.05 since one can reject at level 0.03. (That is, if the observed
value of the test
statistic is in the rejection region for a size 0.03 test, it must
be in the rejection region for a size 0.05 test.)
In general, it can be noted
that
one can reject at level alpha whenever
p-value <= alpha.
(b)
The statement is
true, for the reason highlighted above in part (a).
(c)
The statement is
true, since the p-value for the test equals the null probability
of obtaining a test statistic value at least as extreme as the observed
value of the test statistic.
Problem 59
(a)
The statement is
true.
Recalling that the p-value is the smallest level for which one can
reject based on the observed data, since the p-value is 0.07, we can
surely reject at level 0.07. Since a level 0.1 test has a larger
rejection region than a level 0.07 test, one can also reject at level
0.1 since one can reject at level 0.07. (That is, if the observed
value of the test
statistic is in the rejection region for a size 0.07 test, it must
be in the rejection region for a size 0.1 test.)
In general, it can be noted
that
one can reject at level alpha if
p-value <= alpha,
and one cannot reject at level alpha if
p-value > alpha.
(b)
The statement is
false, for the reason highlighted above in part (a).
(c)
The statement is
false, since the observed value of the sample mean for the first sample
is either greater than the observed value of the sample mean for the
second sample or it's not --- and so the probability that it's greater is
either 1 or 0. (If we take the sample means to be random
variables, then there isn't enough information given to determine the
probability that the one sample mean is greater than the other one. But
there is no reason why this probability should equal the p-value.)
Problem 60
SPSS yields 0.049 as the p-value for a two-tailed Welch's test. Since
the sample means are in the order consistent with the alternative
hypothesis, the appropriate p-value for the indicated one-tailed test is
half of the two-tailed test p-value. So the
p-value is about
0.025.
(When we divide 0.049 in half we get 0.0245, but whether we should
report 0.024 or 0.025 as the one-tailed test p-value depends on whether
the two-tailed test p-value is actually greater than or less than 0.049.
It's impossible to easily determine this from the SPSS output. But with
another software package, I got 0.025 as the one-tailed test p-value.)
It is appropriate to put something about the validity of Welch's test,
since an examination of the data indicates that both distributions are
skewed. Because the degree of skewness appears to be similar for the
two distributions, and the sample sizes are equal and not really small,
Welch's test ought to behave decently (due to a cancellation of a lot of
the effect due to skewness on the sampling distributions of the sample means).
One could also consider using the Mann-Whitney test. It produces a
p-value of about 0.026. If we are willing to believe that if the two
distributions differ, then one is stochastically larger than the other,
we can use the M-W test to do a test about the distribution means. The
data supports that it's reasonable to make the sufficient assumption
(but unfortunately, with SPSS, there doesn't seem to be a good way to do
the graphical check that I like to do (and so I used other software to
help me reach this conclusion)),
and so we could consider reporting the p-value from the M-W test, since
it's nearly as small as the one obtained from Welch's test, and we don't
have the nonnormality to worry about. (It's nice when the p-values from
two reasonable tests agree. This need not be the case, since in some
settings one test may yield a smaller p-value due to higher power, but
sometimes you do get a nice confirmation about the strength of the
evidence.)
Problem 61
(a)
0.13
(Since the sample sizes are larger than 10, the tables of the exact null distribution of U cannot be used.
However, we can make use of SPSS to obtain an approximate p-value, using
Analyze > Nonparametric Tests > 2 Independent Samples. One should click height into the Test Variable List box.
Unfortunately, color cannot be used as the Grouping Variable since red and green aren't accepted when defining the
groups --- unlike what worked for Welch's test, this part of SPSS requires that the grouping variable be coded with numerical values.
So, to make it work, I created a new column, entering 17 values of 1 followed by 25 values of 2. If I name this new
variable gr, I click gr into the
Grouping Variable box, and then click Define Groups and enter 1 and 2 as the group labels.
Finally, I can click Continue, followed by OK, to obtain the approximate p-value of 0.127 (which I round to 0.13 since
it results from using an approximation).
(Note: On the outputted display, the p-value is labeled with
Asymp. Sig. (2-tailed).))
(b)
0.28
(This can be obtained from SPSS using
Analyze > Compare Means > Independent-Samples T Test.
One should click height into the Test Variable(s) box.
Then click color into the
Grouping Variable box, and next click Define Groups and enter red and green as the group labels.
Finally, I can click Continue, followed by OK, to obtain the approximate p-value of 0.275 (which I round to 0.28 since
it results from using a test designed for normal distributions on nonnormal distributions (and so we must rely on the robustness of
the procedure to give us an approximate p-value)).
(Note: On the outputted display, the p-value is found in the Equal variances assumed row of the
Sig. (2-tailed) column. (The assumption of equal variances is proper, since if the null hypothesis of no effect is true, the
two distributions underlying the data are identical (and thus have equal variances).)))
(The nonnormality and unequal sample sizes hurt the robustness of Student's t test, creating some concern about its
(approximate) validity. But since the clearly valid Mann-Whitney test produces a smaller p-value, and it's not even highly
significant, there isn't a lot of reason to wonder about the validity of the t test. (Often skewness hurts the power of the
t test to detect differences, and it is often the case that with skewed distributions the M-W test is more powerful.))
Problem 62
(a)
0.69
(For this data, we can make use of SPSS to obtain an exact p-value, using
Analyze > Nonparametric Tests > 2 Independent Samples. One should click weight into the Test Variable List box,
and then click group into the
Grouping Variable box. Next, click Define Groups and enter 14 and 15 as the group labels.
Finally, I can click Continue, followed by OK, to obtain the exact p-value of 0.690 (which I round to 0.69 since
a high degree of stated accuracy isn't important when the p-value is such a large, statistically nonsignificant, value).)
(Since SPSS uses the exact distribution (in this case (as can
be seen from the output), but not always) to obtain the p-value,
there is no need to use the table.
But to use the table, one could first compare each of the five "14 days"
observations to each of the five "15 days" observations, and note that
the "14 days" observation is larger in 15 of the 25 comparisons.
This gives us that the observed value of the M-W test statistic, u,
is equal to 15. Since u is larger than the mean of the test statistic if the null hypothesis is true,
the p-value for a two-tailed test is equal to
2P0(U ≥ u) = 2P0(U ≥ 15) = 2P0(U ≤ 25 - 15) = 2P0(U ≤ 10) = 2(0.3452) = 0.6904.
(For some further explanation, the mean of the null hypothesis sampling distribution of U is just the product of the two sample
sizes divided by 2, which is equal to (5)(5)/2 = 12.5. Since the tabulated values are lower-tail probabilities, to determine an
upper-tail probability we can convert it to an equal lower-tail probability using
P0(U ≥ u) = P0(U ≤ nX*nY - u), where nX and nY are the two sample sizes. In using the table, one sets
k1 equal to the minimum of the sample sizes and sets
k2 equal to the maximum of the sample sizes. In this case, both of these values are equal to 5, and so we go to the
k1 = 5 section of the table, and use the
k2 = 5 column. Then going down the column one should see that the value corresponding to a = 10 is 0.3452.
(The tabulated values are the null probabilities that the test statistic takes a value less than or equal to a. In our case,
we want the probability of a value less than or equal to 10, and so we need to use 10 as the value of a.)) One could reverse the
roles of the two groups and let u be the number of times, out of 25 comparisons, a "15 days" value is larger. This would
yield u = 10, and since 10 is less than the null hypothesis mean, the p-value is
2P0(U ≤ 10). Either way one does it, the same p-value results.)
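(Note: For small samples, the exact null distribution of U can also be brute-forced in Python, since under the null hypothesis every assignment of the combined ranks to the first sample is equally likely. The sketch below reproduces the tabled value of 0.3452 and the p-value above; the variable names are mine.)

```python
from itertools import combinations

n, m = 5, 5     # the two sample sizes
counts = {}
# U = (sum of first-sample ranks) - n(n+1)/2, over all C(10,5) = 252
# equally likely assignments of ranks 1..10 to the first sample.
for ranks in combinations(range(1, n + m + 1), n):
    u = sum(ranks) - n * (n + 1) // 2
    counts[u] = counts.get(u, 0) + 1

total = sum(counts.values())
p_le_10 = sum(c for u, c in counts.items() if u <= 10) / total
print(round(p_le_10, 4))       # 0.3452, the tabled P0(U <= 10)
print(round(2 * p_le_10, 4))   # 0.6904, the two-tailed p-value
```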
(b)
0.63
(This can be obtained from SPSS using
Analyze > Compare Means > Independent-Samples T Test.)
(Note: One may wonder whether Student's two-sample t test is valid here.
If we do a test of the general two-sample problem (of the null hypothesis of identical distributions against the general alternative
that the distributions differ), then if the null hypothesis is true, the variances are equal (since the distributions are the same).
This supports the choice of Student's t test over Welch's test. But what about possible nonnormality?
Well, since the sample sizes are equal, if the null hypothesis is true the two sample means have the same sampling
distribution, and so a complete cancellation of any skewness effect will occur (because one sample mean is subtracted from the other,
which means that the sampling distribution of the difference in sample means will be symmetric if the null hypothesis is true).
Other types of nonnormality could cause problems with validity, but some types of nonnormality, like heavy tails, will lead to a
conservative test instead of one that isn't valid. Still, having only 5 observations per sample makes it very hard to tell much about
the nature of the nonnormality, and with such small sample sizes it's also the case that the "Central Limit Theorem effect" may not be
too good (i.e., the sampling distributions of the sample means may not be as normal as is needed to make the T distribution
a good approximation of the actual null sampling distribution of the test statistic). All in all, it is somewhat risky to rely on the
robustness of Student's t test in this small sample setting, even though the facts that the general two-sample problem is being
addressed and that there are equal sample sizes lead to some robustness (like a cancellation effect for skewness). So it is nice
that a nonparametric test which is clearly valid gives a p-value almost as small as the one which is suspect (although with such large
p-values there is not much to get excited about anyway).)
Problem 66
(a)
0.026.
(Since SPSS uses the exact distribution (in this case (as can
be seen from the output), but not always) to obtain the p-value,
there is no need to use the table.
But to use the table, one could first compare each of the 6 Toluene
observations to each of the 6 Control observations, and note that
the Toluene observation is larger in 32 of the 36 comparisons.
This gives us that the observed value of the M-W test statistic, u,
is equal to 32. Since u is larger than the mean of the test statistic if the null hypothesis is true,
the p-value for a two-tailed test is equal to
2P0(U ≥ u) = 2P0(U ≥ 32) = 2P0(U ≤ 36 - 32) = 2P0(U ≤ 4) = 2(0.0130) = 0.0260.
(For some further explanation, the mean of the null hypothesis sampling distribution of U is just the product of the two sample
sizes divided by 2, which is equal to (6)(6)/2 = 18. Since the tabulated values are lower-tail probabilities, to determine an
upper-tail probability we can convert it to an equal lower-tail probability using
P0(U ≥ u) = P0(U ≤ nX*nY - u), where nX and nY are the two sample sizes. In using the table, one sets
k1 equal to the minimum of the sample sizes and sets
k2 equal to the maximum of the sample sizes. In this case, both of these values are equal to 6, and so we go to the
k1 = 6 section of the table, and use the
k2 = 6 column. Then going down the column one should see that the value corresponding to a = 4 is 0.0130.
(The tabulated values are the null probabilities that the test statistic takes a value less than or equal to a. In our case,
we want the probability of a value less than or equal to 4, and so we need to use 4 as the value of a.)) One could reverse the
roles of the two groups and let u be the number of times, out of 36 comparisons, a Control value is larger. This would
yield u = 4, and since 4 is less than the null hypothesis mean, the p-value is
2P0(U ≤ 4). Either way one does it, the same p-value results.)
(b)
0.020
(This can be obtained from SPSS using
Analyze > Compare Means > Independent-Samples T Test.)
(Note: One may wonder whether Student's two-sample t test is valid here.
If we do a test of the general two-sample problem (of the null hypothesis of identical distributions against the general alternative
that the distributions differ), then if the null hypothesis is true, the variances are equal (since the distributions are the same).
This supports the choice of Student's t test over Welch's test. But what about possible nonnormality?
Well, since the sample sizes are equal, if the null hypothesis is true the two sample means have the same sampling
distribution, and so a complete cancellation of any skewness effect will occur (because one sample mean is subtracted from the other,
which means that the sampling distribution of the difference in sample means will be symmetric if the null hypothesis is true).
Other types of nonnormality could cause problems with validity, but some types of nonnormality, like heavy tails, will lead to a
conservative test instead of one that isn't valid. Still, having only 6 observations per sample makes it very hard to tell much about
the nature of the nonnormality, and with such small sample sizes it's also the case that the "Central Limit Theorem effect" may not be
too good (i.e., the sampling distributions of the sample means may not be as normal as is needed to make the T distribution
a good approximation of the actual null sampling distribution of the test statistic). All in all, it is somewhat risky to rely on the
robustness of Student's t test in this small sample setting, even though the facts that the general two-sample problem is being
addressed and that there are equal sample sizes lead to some robustness (like a cancellation effect for skewness). So it is nice
that a nonparametric test which is clearly valid gives a p-value almost as small as the one which is suspect.)
Problem 67
One should
not conclude that living in Arizona exacerbates respiratory
problems, since the information provided does not give us strong evidence
that people in Arizona would have less respiratory trouble if they lived
elsewhere. It could be that a greater than average proportion of
people who have respiratory problems just happen to live in Arizona (and
it may be that some such people moved to Arizona because they believe
that
the climate there will be better for them and their respiratory problems
--- that is, Arizona may attract people with existing respiratory
problems as opposed to causing respiratory problems).
Problem 68
(a)
daily coffee intake
(b)
heart disease indicator
(a binary variable indicating whether or not a subject has coronary
heart disease)
(c)
the 1040 subjects
Problem 69
Because of what they may have heard or read, some people who think they are
consuming olestra might expect to have a problem, giving us the
possibility of a nocebo effect. To "balance out" a possible
nocebo effect, subjects should not be told what type of potato chips
they are given. If subjects self-report whether or not they experienced
problems, then a double-blind experiment is not necessary, but if they
are interviewed by people helping to conduct the experiment, and these
helpers make a judgement or help the subjects to reach a judgement about
whether or not problems occurred, then double-blinding is desirable.
Problem 70
The alternative design is not good, because there is no good way
to determine if there are differences between the 5 treatments (one
actually a control), since any observed differences could be totally due
to differences between the litters (and variation among the piglets in
the litters). That is, due to confounding, if two groups were
appreciably different, one would not know if this was due to a difference
between treatments or a difference between litters.
(With an observational study, one cannot avoid confounding, and this
makes the interpretation of the results less crisp than what one
might get from a well-designed experiment. But when designing an
experiment one needs to avoid problems with confounding --- in order to
attribute observed differences, at least in part, to treatment differences,
one needs to have large enough differences between observations for which
all of the factors are held constant except for the treatments (One
needs the observed differences to be beyond what can be anticipated
due to experimental noise (due to variation among experimental units),
which can be assessed using statistical methods). A common way of
doing this is with blocks, since within each block key factors are
constant, or nearly constant, and any small differences can be absorbed
into the experimental noise by the use of proper randomization
(and may in fact be the major source of
noise).)
Problem 71
Plan III is the best plan,
Plan I is the second best plan,
and Plan II is the worst plan. II is worst because of a
problem with confounding --- there is no way to determine if observed
differences
are due to treatment effects, or due to differences in the lighting
conditions. I allows for a fair assessment of treatment differences,
since randomization can account for (but not control/reduce)
observed differences due to
different lighting conditions. But III will provide increased power to
detect treatment effects, because some of the experimental noise due to
the random assignment of animals to the different lighting conditions
will be cancelled out by having equal numbers of each treatment group at
each level. (Another way to think of it is that by considering
differences between treatments at each level, the lighting is
held constant (or nearly so), and observed differences between groups at
a given level can be attributed to treatment differences (and
differences between animals that must be accounted for --- i.e., the
observed differences between treatments will have to be detected above the
magnitude of differences that are easily explained by uncontrolled
experimental noise).)
Problem 72
Plan II is the better plan.
Confounding is a problem with Plan I ---
if rain allowed the last variety to remain in the field growing for an
appreciably longer period of time than the times for the other
varieties, and the last variety produced the greatest yield, one
wouldn't know if the large yield was due to the rain or due to that
variety of corn being superior. However, if rain interrupted the
harvesting when using II, the different growing times might lead
to some appreciable differences between the total yields for the blocks,
but when comparing the varieties of corn, such differences in yield due to
growing time differences would cancel out, and one could still fairly
assess, and make specific statements about, differences between
varieties of corn.
Problem 73
False --- The primary reason for using a randomized block design
is to reduce the variability due to extraneous factors.
By reducing the experimental noise, observed differences can be
taken more seriously as evidence of differences between treatments.
If blocking is not done, differences in experimental units due to
extraneous factors can cloud the results of the experiment, since more
allowance has to be made for the observed differences being due to
factors other than the differences between treatments. A completely
randomized design, without blocking, can reduce bias, but by
combining
blocking with randomization, one can reduce variability in addition to
bias.
Problem 74
Because 3 measurements were made using each of 2 different flasks,
it's not good to assume that they are 6 iid observations from the same
distribution
--- the nested design leads to a lack of independence,
and the estimated standard error should be obtained by first computing 2
sample means (1 from each set of 3 observations), and then computing the
sample standard deviation using those two values. (See p. 336 for a
similar situation.)
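(A minimal sketch of the correct computation, using hypothetical triplicate measurements in place of the real data: reduce each flask to a single mean, then compute the standard error from those two means.)

    import statistics

    # Hypothetical triplicate measurements from each of the 2 flasks
    # (placeholder numbers, not the problem's data).
    flask1 = [10.2, 10.5, 10.1]
    flask2 = [11.0, 10.8, 11.3]

    # Wrong approach: treating all 6 values as iid ignores the
    # flask-to-flask dependence. Correct approach: first compute the
    # 2 flask means, then get the standard error from those 2 values.
    means = [statistics.mean(flask1), statistics.mean(flask2)]
    se = statistics.stdev(means) / len(means) ** 0.5
    print(se)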
Problem 75
To obtain the best set of blocks, one just needs to go down the
columns, letting the first two subjects be one block, the next two
subjects be another block, and so on.
The assignment of the subjects in the blocks to the treatments can be
done randomly, using one coin flip for each block.
(It can be noted that this creation of blocks puts the two 56-year-old
males in different blocks, but if they are put in the same block, then
another block will have two males rather different in age.)
Problem 76
(a)
0.001
(With SPSS, one can use
Analyze > Compare Means > One-Sample T Test. One should click protprod into the Test Variable box,
change the Test Value to 450, and click OK. SPSS does a two-tailed test, but since the test statistic
value is 3.525, which is positive, it can be concluded that the two-tailed test p-value of 0.002 corresponds to doubling
an upper-tail area of 0.001. This upper-tail area is the p-value for an upper-tailed test. (It's too bad that SPSS
supplies so few digits for the reported p-values, since in some cases we may want more. In this case one can conclude
that the two-tailed test p-value, if reported with more digits, is in the interval (0.0015, 0.0025), since it rounds to
0.002. It follows that the one-tail probability belongs to (0.00075, 0.00125). So if we wanted to express a bit more
precision, we cannot easily tell from the SPSS output whether we should put 0.0008, 0.0009, 0.0010, 0.0011, or 0.0012.
In this case, such extra indicated accuracy is perhaps not warranted, and so there is nothing wrong with stating that
the p-value is about/approximately 0.001. If one did want to indicate more precision, CDF.T could be used to
determine that the upper-tail area is about 0.00077, and so one could state that the p-value is about 0.0008.))
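(The tail-area bookkeeping above can be mimicked outside SPSS with scipy, which plays the role of CDF.T. The degrees of freedom below are a placeholder, since the sample size isn't reproduced here; only the conversion logic between two-tailed and one-tailed p-values is being illustrated.)

    from scipy import stats

    # Hypothetical stand-ins: the test statistic quoted in the text and a
    # placeholder df (df = n - 1, with n the actual sample size).
    t_stat, df = 3.525, 24

    p_two = 2 * stats.t.sf(abs(t_stat), df)   # two-tailed p-value, as SPSS reports it
    p_upper = stats.t.sf(t_stat, df)          # upper-tailed p-value (half of p_two here)
    p_lower = stats.t.cdf(t_stat, df)         # lower-tailed p-value (1 - p_upper)
    print(p_two, p_upper, p_lower)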
(b)
0.98
(With SPSS, one can use
Analyze > Compare Means > One-Sample T Test. One should click protprod into the Test Variable box,
change the Test Value to 500, and click OK. SPSS does a two-tailed test, but since the test statistic
value is -2.213, which is negative, it can be concluded that the two-tailed test p-value of 0.036 corresponds to doubling
a lower-tail area of 0.018. The corresponding upper-tail area, which is the p-value for an upper-tailed test,
is 1 - 0.018 = 0.982, which can be rounded to 0.98 (since with a p-value so large more precision isn't really needed,
and since nonnormality makes the test only approximate anyway).)
(c)
The p-values should be regarded as being fairly trustworthy.
With SPSS, using Analyze > Descriptive Statistics > Explore, one can see that the sample skewness is about -0.1, that the
sample kurtosis is about -0.9, and that the probit plot suggests an approximately normal distribution.
It seems that any skewness is of a negligible nature, and if the tails are indeed light, it should be remembered that
light tails don't cause a problem unless the sample size is about 10 or less.
Problem 77
(a)
0.091
(With SPSS, one can use
Analyze > Compare Means > One-Sample T Test,
changing the Test Value to 90. SPSS does a two-tailed test, but since the test statistic
value is -1.367, which is negative, it can be concluded that the two-tailed test p-value of 0.182 corresponds to doubling
a lower-tail area of 0.091. This lower-tail area is the p-value for a lower-tailed test.)
(b)
0.001
(With SPSS, one can use
Analyze > Compare Means > One-Sample T Test,
changing the Test Value to 95. SPSS does a two-tailed test, but since the test statistic
value is -3.487, which is negative, it can be concluded that the two-tailed test p-value of 0.002 corresponds to doubling
a lower-tail area of 0.001. This lower-tail area is the p-value for a lower-tailed test.
(It's too bad that SPSS
supplies so few digits for the reported p-values, since in some cases we may want more. In this case one can conclude
that the two-tailed test p-value, if reported with more digits, is in the interval (0.0015, 0.0025), since it rounds to
0.002. It follows that the one-tailed probability belongs to (0.00075, 0.00125). So if we wanted to express a bit more
precision, we cannot easily tell from the SPSS output whether we should put 0.0008, 0.0009, 0.0010, 0.0011, or 0.0012.
In this case, such extra indicated accuracy is perhaps not warranted, and so there is nothing wrong with stating that
the p-value is about/approximately 0.001. If one did want to indicate more precision, CDF.T could be used to
determine that the lower-tail area is about 0.00076, and so one could state that the p-value is about 0.0008.))
(c)
The p-values should not be regarded as being trustworthy.
With SPSS, using Analyze > Descriptive Statistics > Explore, one can see that the sample skewness is about 1.6, that the
sample kurtosis is about 3.4, and that the probit plot suggests an appreciably skewed distribution.
With such a small sample size, this degree of skewness creates a problem. (Doing a lower-tailed test with data from
a positively skewed distribution results in there being a danger that type I errors can occur with a fairly large
probability if the null hypothesis is true --- p-values can be misleadingly small.)
Problem 78
(a)
0.093
(With SPSS, one can use
Analyze > Compare Means > One-Sample T Test,
changing the Test Value to 5. SPSS does a two-tailed test, but since the test statistic
value is positive, it can be concluded that the two-tailed test p-value of 0.186 corresponds to doubling
an upper-tail area of 0.093. This upper-tail area is the p-value for an upper-tailed test.)
(b)
0.91
(In part (a), it was determined that the null probability that the test statistic, based on a μ0 value
of 5, exceeds the observed value is about 0.093. For the lower-tailed test of this part, the p-value is the
null probability that the test statistic, based on a μ0 value of 5, is less than its observed value.
Since the value of the test statistic is the same for parts (a) and (b) (since the data is the same, and
μ0 is the same), the desired lower-tail probability (which is the p-value) is about 1 - 0.093, or about
0.91.)
(c)
Given that it appears that we are dealing with a slightly heavy-tailed distribution, but one that is not appreciably
skewed,
the t test performs conservatively, and the
p-values should not be regarded as being misleadingly small.
Problem 79
(a)
0.28
(With SPSS, one can use
Analyze > Compare Means > One-Sample T Test, putting a variable corresponding to the matched-pairs differences
into the
Test Variable box. Alternatively, one can use
Analyze > Compare Means > Paired-Samples T Test, clicking two columns of observations into the
Paired Variables box. (Either way, the resulting p-value for a two-tailed test is about 0.28.))
(b)
Since we may be dealing with a light-tailed distribution (with a sample size of only 9 there is considerable
uncertainty, but indications are that we have data from a light-tailed distribution), and the sample size is rather
small,
the p-value should not be regarded as being trustworthy, because
in such situations the t test performs anticonservatively, which compromises the validity of the test.
(While this means that the p-value may be misleadingly small, in this particular situation there really isn't a
problem since the p-value isn't small, and so one should not claim that there is statistically significant evidence
that the mean differs from 0, and so a type I error will not be made.)
Problem 80
(a)
0.044
(With SPSS, one can use
Analyze > Compare Means > One-Sample T Test, putting a variable corresponding to the matched-pairs differences
into the
Test Variable box. Alternatively, one can use
Analyze > Compare Means > Paired-Samples T Test, clicking two columns of observations into the
Paired Variables box. (Either way, the resulting p-value for a two-tailed test is 0.044.))
(b)
The p-value should not be regarded as being trustworthy, mainly because with a sample size of 4 we
have no good way to learn much about the nature of the underlying distribution, and if there is appreciable
nonnormality, with a sample of size 4 we don't have any of the large sample robustness effects to help counter the
nonnormality.
Problem 81
There are infinitely many possibilities. Basically, you need to have all of
the di be the same value
in order to have the estimated standard
error of the mean difference equal to 0 (because the only way that a
sample standard deviation can equal 0 is to have all of the values in
the sample be the same value). For the two sample means (of the
xi and the
yi)
to be different, the common value for the
di cannot be 0. In order for the estimated
standard errors of the sample means of the
xi
and the yi to be nonzero, one cannot have all of the
xi be the same value, nor all of the
yi be the same value. Below is one set of values
that works.
i | xi | yi | di
1 |  3 |  5 |  2
2 |  6 |  8 |  2
3 |  5 |  7 |  2
4 |  7 |  9 |  2
5 |  4 |  6 |  2
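(One can verify numerically that these values behave as claimed: the estimated standard error of the mean difference is 0, while the standard errors of the two sample means are not. A quick Python check using the table's values:)

    import statistics

    # The values from the table above.
    x = [3, 6, 5, 7, 4]
    y = [5, 8, 7, 9, 6]
    d = [yi - xi for xi, yi in zip(x, y)]    # every difference equals 2

    n = len(d)
    print(statistics.stdev(d) / n ** 0.5)    # 0.0: SE of the mean difference
    print(statistics.stdev(x) / n ** 0.5)    # nonzero SE for the x sample mean
    print(statistics.stdev(y) / n ** 0.5)    # nonzero SE for the y sample mean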
Problem 82
(a)
0.34
(I obtained this value from the SPSS output created to do part (b). One
could take the sample standard deviation value of 0.68 from p. 355 and
divide by the square root of 4 to obtain the same answer.)
(b)
0.016
(c)
One should worry that the p-value may be smaller than it should be,
due to nonnormality, and the very small sample size. When using the
t test to test for a treatment effect, the only worry is that
light tails can lead to an anticonservative test which produces p-values that
are misleadingly small. Typically, if there are about a dozen or more cases,
it can be assumed that the anticonservativeness is negligible. But with
only 4 cases, unless the differences were close to being normally
distributed, one might worry that light tails (which cannot be detected
well with only 4 observations --- there isn't a good way to check for
light tails with so few observations) can easily lead to a small p-value
even though the null hypothesis is true.
(d)
0.23
(e)
0.125
(Note: The p-value of 0.068 which results from SPSS (using Analyze > Nonparametric Tests > 2 Related Samples)
should not be
trusted, since the normal approximation (particularly the version SPSS
uses) is not guaranteed to work decently when the sample size is so
small. The correct value of 0.125 can be obtained from the table that I
supplied in class. It should be noted that 0.125 is the smallest
p-value possible for a two-tailed signed-rank test when there are only 4
observations. So it should be clear that one needs more than 4 cases in
such a setting where a test for a treatment effect is desired, since
nonnormality can throw a t test off when the sample size is so
small, and nonparametric tests (0.125 is also the smallest possible
p-value for a two-tailed sign test) cannot possibly produce a small
p-value.)
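(The claim that 0.125 is the smallest possible two-tailed signed-rank p-value with 4 observations can be confirmed by enumerating the exact null distribution; under the null hypothesis each of the 2^4 = 16 sign patterns is equally likely. A small Python sketch of my own:)

    from itertools import product

    n = 4
    ranks = range(1, n + 1)
    # The statistic is the sum of the ranks attached to positive differences;
    # enumerate it over all 16 equally likely sign patterns.
    values = [sum(r for r, pos in zip(ranks, signs) if pos)
              for signs in product([0, 1], repeat=n)]

    # The most extreme values (0 and 10) each occur once, so the smallest
    # possible two-tailed p-value is 2/16 = 0.125.
    extreme = values.count(min(values)) + values.count(max(values))
    print(extreme / len(values))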
Problem 83
I used SPSS to obtain most of the information needed to supply the
answers below.
(a)
0.25
(b)
0.12
(c)
0.29
(d)
(-0.74, 0.98)
(e)
The probit plot suggests approximate normality, and in
particular,
there are no signs of appreciable skewness. Plus, we
have that a confidence interval has some robustness against slight
skewness (the main worry concerning the validity of the procedure)
due to a cancellation effect (not because of using a difference in
sample means, but because the skewness leads to an anticonservative
phenomenon that is partially cancelled out by a conservative phenomenon).
So, the interval produced should be
reasonably accurate.
(f)
0.69
(g)
When using the t test for a test for a treatment effect, there is
little to worry about with regard to validity. If the null hypothesis
is true, the distribution of the differences will be symmetric (about
0). So if distribution skewness contributes to a rejection of the null
hypothesis, this is good, since skewness by itself is evidence of a
treatment effect. Heavy tails make the test conservative, and light
tails shouldn't cause a problem as long as there are about a dozen or
more observations. So,
the results are reasonably trustworthy.
(h)
0.80
(i)
Here one has a choice: one can either use the conservative approach to
assign integer ranks so that the table of the exact distribution can be
used, or one can use mid-ranks (which results in a noninteger value for
the test statistic), and rely on the normal approximation to produce an
approximate p-value. The conservative approach results in a p-value of
1,
while the normal approximation based on midranks results in an
approximate p-value of
0.80.
(j)
The signed-rank test is always valid as a test for a treatment
effect, as long as one has independent units producing the matched
pairs data.
(If an approximation is used to obtain the p-value, one may be a bit
concerned about the
quality of the approximation, but one doesn't have to worry about
nonnormality.)
Problem 84
No, it does not. The range of the confidence interval for the
mean does not necessarily correspond to the range of the typical values
in the data set. However, from the confidence interval, one can extract the
value of the sample standard deviation, and that provides information
pertaining to the scatter of the values about the sample mean.
But without knowing something more about the values in the sample, one
cannot extract enough information about the sample of magnitudes of the
differences to be able to make a confident statement about the "typical"
magnitude. (Note: I don't like to make vague references about a
typical member of the population, since it's unclear as to what is
specifically meant.)
In order to address the magnitude of the difference for the "typical"
member of the population, one needs to consider the sample of magnitudes
of differences. In this case we have that 16 of the 29 dogs have a
difference that is less, in magnitude, than 1.1. So perhaps we would
say that the typical dog has a difference of less than 1.1.
However, if a sign test is done to determine if there is statistically
significant evidence that the median absolute difference is less than
1.1, the resulting p-value is about 0.36, and so there is a lack of
strong evidence that the median absolute difference is less than 1.1.
Bottom line: In
general, we cannot extract precise information from the confidence interval
for the mean difference with which to make a strong statement about a typical
value. One can do more than is indicated in S&W, but it generally isn't
possible to extract enough information to make a confident statement
about a typical value.
Problem 85
I used SPSS to obtain
answers for parts (a) and (c), and used a table of the exact null
distribution of the signed-rank test to obtain the answer for part (b).
(a)
0.004
(b)
0.004
(c)
0.004
Problem 86
For the signed-rank test, there are 2^n = 2^9 = 512 sets of ranks which are equally likely to be the set
of ranks corresponding to the positive differences if the null
hypothesis is true. Two of these sets, one corresponding to a test
statistic value of 45, and the other corresponding to a test statistic
value of 0, give a value of the test statistic as extreme or more
extreme as the observed value of the test statistic. So the p-value is
2/512 = 1/256, which is about
0.0039.
For the sign test, one can use the binomial (9, 0.5)
distribution
to determine that the exact p-value is
2/512 = 1/256, which is about
0.0039.
Problem 87
I used SPSS to obtain
answers for parts (a) and (c), and used a table of the exact null
distribution of the signed-rank test to obtain the answer for part (b).
(a)
0.031
(b)
0.039
(c)
0.070
Problem 88
0.12.
(Since SPSS uses the exact binomial distribution (in this case (as can
be seen from the output), but not always) to obtain the p-value,
one could report 0.118, using three significant digits, but there is
typically little point in being so precise (since a p-value of 0.118,
0.120, or even 0.122 provides just about the same strength of evidence
against the null hypothesis). Using the table I supplied in class, one
could double 0.0592, to obtain 0.1184, which should definitely be
rounded to at least 3 significant digits unless one confirms that the 4
is correct (because it could be off due to rounding error (and in fact
it is, since the exact value, rounded to 4 decimal places, is 0.1185)).)
(Note: To obtain the p-value with SPSS, use Analyze > Nonparametric Tests > 2 Related Samples, and click the two columns
of measurements into the Test Pair(s) List. Then check the Sign box under Test Type, and click OK.
From the output it can be noted that there are 2 differences of 0. These are ignored, and so the effective sample size is 15.
To use the tables, we use the column for a sample size of 15, and go down to a test statistic value of 4, since there are 4 positive
differences. The tabulated value of 0.0592 is the null probability that the sign test statistic assumes a value of 4 or less.
Since we are doing a two-tailed test, and since the observed value of the test statistic is less than the null mean of 15/2 = 7.5,
this lower-tail probability is doubled to obtain the desired p-value.)
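(The exact binomial computation behind the tabled 0.0592, and the doubled value 0.1185 mentioned above, can be reproduced directly; a short Python check:)

    from scipy.stats import binom

    # 15 nonzero differences, 4 of them positive.
    n, s = 15, 4
    p_lower = binom.cdf(s, n, 0.5)    # P0(S <= 4), about 0.0592
    print(p_lower, 2 * p_lower)       # two-tailed p-value, about 0.1185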
Problem 89
0.50.
(Since 4 of the observations are greater than 3.50, the value of the test statistic is 4. Since we are to determine if there is
significant evidence that the distribution median is less than 3.50, we need to do a lower-tailed test and obtain the null probability
of getting a test statistic value of 4 or less. Using the column for a sample of size 9 on the sign test tables I supplied, one sees
that the desired probability is listed as 0.5000.)
(Note: To obtain the p-value with SPSS, put
9 values of 3.5 into the first column and then enter the 9 sample values into the second column. Next use Analyze > Nonparametric Tests > 2 Related Samples, and click the two columns
into the Test Pair(s) List. (This makes the differences of the form observation - 3.5, so that the number of positive
differences is the number of sample values greater than 3.5.) Then check the Sign box under Test Type, and click OK.
An exact p-value of 1.000 is given for a two-tailed test. Since the value of the test statistic, 4, is less than the null mean of 9/2
= 4.5, the two-tailed test p-value results from doubling the lower-tail null probability of the test statistic assuming a value less
than or equal to 4. This lower-tail probability, which must be 0.500, is the desired p-value for the lower-tailed test.)
Problem 90
28 of the 39 observations exceed 1.30, and so the observed value of the
test statistic is 28. So the desired p-value is
P0(S ≥ 28). This probability is the same as
P0(S ≤ 11), which is a value that can be obtained
from the table I distributed in class. We have that the desired
p-value is about
0.0047.
Problem 91
SPSS can be used to obtain that the p-value is about
0.19.
(Note: SPSS uses the standard chi-square approximation, with an
adjustment for ties. There is no way to get an exact p-value, and even
if there were a way, I'm confident that to two significant digits, the
exact p-value would match the approximate p-value.)
Problem 92
(a)
SS(between) will equal 0 only if all of the sample means are equal.
SS(within) will exceed 0 as long as not all of the sample values are
equal to the sample means.
(b)
SS(between) will exceed 0 as long as not all of the sample means are equal.
SS(within) will equal 0 only if each sample value is equal to its
sample mean.
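(A small sketch of the two sums of squares, using hypothetical groups, makes both conditions easy to see: SS(between) vanishes only when all the group means coincide, and SS(within) vanishes only when every value equals its group mean.)

    # Hypothetical data for three groups (placeholder values).
    groups = [[4.0, 5.0, 6.0], [5.0, 5.0, 5.0], [6.0, 7.0, 8.0]]

    grand_mean = sum(sum(g) for g in groups) / sum(len(g) for g in groups)
    group_means = [sum(g) / len(g) for g in groups]

    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)
    print(ss_between, ss_within)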
Problem 93
(a)
0.006
(b)
0.005
(c)
μ3 > μ2
(d)
There is evidence of slight positive skewness.
(The skewness definitely isn't strong, but there is enough of a consistent pattern
to indicate mild skewness. (Note: In this case, the sample sizes are large enough to make it so that it is
better to look at individual probit plots for each of the samples. If this is done, one might guess that the
distributions have different shapes.))
(e)
(Note: The plot does not provide strong evidence of heteroscedasticity.)
(f)
0.009
(g)
0.006
(h)
μ3 > μ2
(i)
0.011
Problem 94
(a)
0.006
(b)
0.006
(c)
(d)
There is clear evidence of positive skewness.
(Note: If there is heteroscedasticity, then it isn't good to put much emphasis on the probit plot of the
pooled residuals. In this case, due to the small sample sizes, it isn't clear that there is appreciable
heteroscedasticity, and so I think looking at the probit plot of the pooled residuals is a very good idea.
If we go with the assumption of a common error term distribution, then the plot suggests that this distribution is
appreciably skewed, and in that case, with samples of size 10, it's possible that an appearance of possible
heteroscedasticity is due to some of the small samples containing an outlier due to the stretched-out upper tail of
the error term distribution, and some of the samples, just by chance, not containing an outlier.)
(f)
0.016
(g)
0.036
(h)
μ2 < μ5
(i)
0.006
Problem 95
All of the results presented below were obtained using
Analyze > General Linear Model > Univariate in SPSS.
(a)
0.79
(b)
0.79
(c)
0.41
(There isn't statistically significant evidence that either sex
or dose affects the response variable, level.)
Problem 96
(a)
p-value < 0.0005
(b)
p-value < 0.0005
(c)
0.14
Problem 97
(a)
0.003
(b)
0.009
(c)
0.19
(e)
p-value < 0.0005
(based on negative inverse square root transformation)
(f)
p-value < 0.0005
(based on negative inverse square root transformation)
(g)
p-value < 0.0005
(based on negative inverse square root transformation)
Problem 98
The best transformation to have selected is the inverse.
(e)
p-value < 0.0005
(f)
p-value < 0.0005
(g)
0.39
(h)
-0.35
(skewness),
-0.23
(kurtosis)
Problem 99
SPSS produces a test statistic value of 9.0, and gives 0.029 as an
approximate p-value.
Since the case of 4 treatments
and 3 blocks
is covered by the tables I
distributed in class, they should be used to obtain the exact (rounded)
p-value of
0.002.
(Using the table for k = 4 and n = 3, you can go down the
column labeled x to 9.0 (the observed value of the test
statistic), and read off the p-value of 0.002 right next to the 9.0,
looking in the P0(S ≥ x) column.) Note that in this case the chi-square approximation
performed horribly. This demonstrates the importance of using the tables of the exact distributions when they are
available.
Problem 100
SPSS produces a test statistic value of 6.5, and gives 0.039 as an
approximate p-value. (See the instructions given with the statement of
the problem on the
homework web page for how to produce
the SPSS output for Friedman's test.) Since the case of 3 treatments
(in this case, 3 different models of washing machines) and 4 blocks (in
this case, 4 different detergents) is covered by the tables I
distributed in class, they should be used to obtain the exact (rounded)
p-value of
0.042.
(Using the table for k = 3 and n = 4, you can go down the
column labeled x to 6.5 (the observed value of the test
statistic), and read off the p-value of 0.042 right next to the 6.5,
looking in the P0(S ≥ x) column.)
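(For reference, scipy's friedmanchisquare reports the same chi-square approximation that SPSS does, so with small numbers of treatments and blocks it carries the same caveat, and the exact tables remain preferable. A sketch with placeholder data, not the washing-machine measurements:)

    from scipy.stats import friedmanchisquare

    # Hypothetical responses for 3 treatments across 4 blocks
    # (each list holds one treatment's values, block by block).
    t1 = [8.1, 7.9, 8.4, 8.0]
    t2 = [7.6, 7.5, 7.9, 7.7]
    t3 = [8.3, 8.2, 8.6, 8.5]

    stat, p_approx = friedmanchisquare(t1, t2, t3)
    print(stat, p_approx)    # chi-square approximation only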
Problem 101
0.002
Problem 102
To get all of the values needed for parts (a), (b), and (c), I used
Analyze > Correlate > Bivariate.
I clicked fatfree and energy into the Variables box, and I clicked to
check the Spearman box so that I would get the Spearman coefficient in addition to the Pearson coefficient.
Finally, I clicked OK.
(a)
0.981
(b)
p-value < 0.0005
(SPSS reports the p-value as .000, which means that when rounded to the nearest thousandth,
it's 0.000 --- but I prefer to give the upper bound on the p-value.)
(c)
0.964
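(The analogous computation outside SPSS uses scipy's pearsonr and spearmanr. The arrays below are placeholders, not the actual fatfree and energy columns.)

    from scipy.stats import pearsonr, spearmanr

    # Hypothetical stand-ins for the fatfree and energy variables.
    fatfree = [52.1, 60.3, 55.7, 48.9, 63.2, 58.4]
    energy = [1510, 1720, 1600, 1420, 1800, 1660]

    r, p = pearsonr(fatfree, energy)         # Pearson coefficient and two-tailed p-value
    rho, p_s = spearmanr(fatfree, energy)    # Spearman rank correlation and p-value
    print(r, p, rho, p_s)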
Problem 104
To get all of the values needed for parts (a), (b), and (c), I used
Analyze > Regression > Linear.
I clicked leucine into the Dependent box, and I clicked
time into the Independent box. Next I clicked Statistics and clicked to
check the Confidence intervals box so that I would get interval estimates in addition to the point estimates.
Finally, I clicked Continue, and then OK.
To get the result needed for part (d), do the regression again, keeping everything the same except before clicking
OK, click Options and then click to uncheck the Include constant in equation box.
(You can see that the estimated slope changes only slightly.)
(a)
0.986
(b)
-0.047
(c)
(-0.195, 0.100)
(d)
0.028 x
(where x is time)
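(Both fits can be reproduced with ordinary least squares: including a column of ones gives the usual intercept-and-slope fit, and dropping it forces the line through the origin, as in part (d). The arrays below are placeholders for the time and leucine columns.)

    import numpy as np

    # Hypothetical stand-ins for the time and leucine variables.
    time = np.array([0.0, 10.0, 20.0, 30.0, 40.0, 50.0])
    leucine = np.array([0.02, 0.25, 0.54, 0.81, 1.10, 1.38])

    # With an intercept: design matrix has a column of ones.
    X = np.column_stack([np.ones_like(time), time])
    intercept, slope = np.linalg.lstsq(X, leucine, rcond=None)[0]

    # Without an intercept: regress on time alone.
    slope0 = np.linalg.lstsq(time.reshape(-1, 1), leucine, rcond=None)[0][0]
    print(intercept, slope, slope0)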
Problem 110
(a)
-0.210
(Since I didn't specify how the difference was to be done, 0.210 is also okay.)
(b)
9.73
(c)
0.066
(Since I didn't specify how the difference was to be done, -0.066 is also okay.)
Problem 112
(a)
Time
(b)
- 0.786
- p-value < 0.0005
- 0.038
(c)
- 0.983
- The
product of Time and Temp
has a p-value of
0.027.
- The value of the F statistic is
(90.488 - 7.265)/(3*7.265/18) =
68.73,
and we have that the p-value is
less than 0.0005.
(The p-value prints out as 0.0000000.)
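(As a check, the F statistic and its p-value can be recomputed from the quantities quoted above: the drop in the error sum of squares from 90.488 to 7.265 when the 3 extra terms are added, with 18 df left for error.)

    from scipy.stats import f

    # Same arithmetic as above, written as (drop in SSE / 3) / (SSE / 18).
    f_stat = ((90.488 - 7.265) / 3) / (7.265 / 18)
    p_value = f.sf(f_stat, 3, 18)    # upper-tail area of F(3, 18)
    print(f_stat, p_value)           # about 68.73; p-value far below 0.0005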