Answers for HW #5
Fall 2004
Problem 1
part (a)
There is no good reason to rely on the
robustness of Student's t test,
since all of the two-sample nonparametric procedures
are valid,
and one of them produces a smaller p-value.
0.000076
is the smallest of the valid p-values, and it is produced by
Fisher's exact test (exact version).
Other p-values are given below. (Note: Welch's test should not be used
since under the null hypothesis the variances are equal.)
- 0.00011 (Fisher's exact test (chi-square approx., w/ c.c.)),
- 0.000036 (Fisher's exact test (chi-square approx., w/o c.c.)),
- 0.0012 (W-M-W test (normal approx., w/ c.c., using midranks, adj. for
ties)),
- 0.075 (W-W runs test (exact)),
- 0.076 (W-W runs test (normal approx., w/ c.c.)),
- 0.059 (W-W runs test (normal approx., w/o c.c.)),
- 0.0054 (Student's t test),
- 0.0054 (Welch's test).
part (b)
The value of the test statistic is about -4.082, and the resulting
p-value is about
0.0002. (The p-value rounds to 0.00020, but since it's approximate, and the approximation may not be
very good for such a small p-value, it makes sense to just report it as being about 0.0002.)
part (c)
Due to the lack of a shift model, the Hodges-Lehmann estimate shouldn't
be used. (The empirical Q-Q plot isn't close to being linear with a slope of 1,
and the sample skewness and kurtosis values differ by quite a bit.) Due to the strong skewness of one distribution,
and the mild skewness and light tails of the other distribution, the
difference in the sample means,
which is
0.38, should be preferred over the difference in
alternative estimates of the distribution means. (For each distribution, the sample mean is the best estimator of
the distribution mean.)
(Note: Since the difference in sample means has an estimated
standard error of about 0.132, the point estimate should be rounded to the nearest hundredth (using the second
significant digit guideline).)
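The rounding guideline in the note above can be sketched in a few lines of Python. This is only an illustration: the function name round_by_se and the unrounded difference 0.3817 are made up for the example (only the standard error 0.132 comes from the note), and I'm assuming the guideline means "round the estimate to the decimal place of the second significant digit of its standard error."

```python
import math

def round_by_se(estimate, se):
    """Round estimate to the decimal place of the 2nd significant digit of se."""
    if se <= 0:
        raise ValueError("standard error must be positive")
    # Place of the first significant digit of se: 0.132 -> -1 (the tenths place).
    first_digit_place = math.floor(math.log10(se))
    # The second significant digit sits one place further to the right.
    decimals = 1 - first_digit_place
    return round(estimate, decimals)

# 0.3817 is a hypothetical unrounded difference; 0.132 is the quoted standard error.
print(round_by_se(0.3817, 0.132))  # prints 0.38
```

With a standard error of 0.132, the second significant digit (the 3) is in the hundredths place, so the estimate is reported to two decimal places.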
part (d)
The Old distribution appears to be strongly negatively skewed, and so the results of my study indicate that,
due to the small sample size, a lower-tailed trimmed mean may be a good choice. Since the magnitude of the skewness
is greater than what was considered in my study, it could be that trimming more than 5% (maybe trimming 2 or 3
observations) will be good. A variety of estimates are given below.
- 9.864 (lower-tailed trimmed mean, g = 1)
- 9.905 (lower-tailed trimmed mean, g = 2)
- 9.949 (lower-tailed trimmed mean, g = 3)
- 9.975 (sample median)
- 9.954 (H-D estimate)
- 9.897 (two-tailed 10% trimmed mean)
- 9.939 (two-tailed 20% trimmed mean)
- 9.935 (Huber M-estimate)
Of the bottom four estimates given above, perhaps the last two should be considered to be the best
(recall the 20% trimmed mean outperformed the H-D estimator in my study), and they are very similar
(and not too different from the H-D estimate). But my study indicates that when the skewness is strong, some
lower-tailed trimmed mean may be a better choice. The problem is it isn't clear how much to trim.
The Huber M-estimate resulted from L and H values of 5 and 1, and so it could be that trimming more
than 1 or 2 values will be good. In the end, there is a lot of uncertainty with this situation, but I'll guess that
an estimate of 9.93 or 9.94 may not be too bad.
The New distribution appears to be slightly negatively skewed and not at all heavy-tailed, and so the results of
my study indicate that
due to the small sample size a lower-tailed trimmed mean, with just a mild amount of trimming, may be a good choice
--- perhaps trimming just 1
observation will be good, which gives an estimate not too different from the H-D estimate.
- 9.454 (lower-tailed trimmed mean, g = 1)
- 9.483 (lower-tailed trimmed mean, g = 2)
- 9.457 (H-D estimate)
- 9.455 (sample median)
I'll go with the first estimate given above (but one can note that three of the estimates are nearly equal to one
another), and subtract it from the average of 9.93 and 9.94 to arrive at an
estimated difference of
0.48. This estimate should be considered to be superior to the ones given below.
(Recall that the H-L estimate only makes sense if a shift model can be assumed, and the sample median has been shown
to be inferior to other estimators of the median in small sample size situations.)
- 0.44 (H-L estimate)
- 0.52 (difference in sample medians)
- 0.50 (difference in H-D estimates)
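For anyone wanting to reproduce the kind of computation used in this part, here is a minimal sketch of a lower-tailed trimmed mean, together with the arithmetic behind the 0.48. I'm assuming that "lower-tailed trimmed mean, g" means dropping the g smallest observations and averaging the rest; the function name and the sample are hypothetical (the sample is NOT the homework data).

```python
def lower_trimmed_mean(values, g):
    """Mean of the sample after dropping its g smallest observations."""
    if not 0 <= g < len(values):
        raise ValueError("g must satisfy 0 <= g < len(values)")
    kept = sorted(values)[g:]
    return sum(kept) / len(kept)

# Hypothetical negatively skewed sample (not the homework data).
sample = [9.1, 9.6, 9.8, 9.9, 9.9, 10.0, 10.0, 10.1]
for g in range(4):
    print(g, round(lower_trimmed_mean(sample, g), 3))

# Arithmetic behind the reported difference:
old_estimate = (9.93 + 9.94) / 2   # average of the two guessed values for Old
new_estimate = 9.454               # lower-tailed trimmed mean, g = 1, for New
print(round(old_estimate - new_estimate, 2))  # prints 0.48
```

Note how each additional trimmed observation pulls the estimate upward when the left tail is long, which is why the choice of g matters so much for the Old sample.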
part (e)
The Old distribution appears to be strongly negatively skewed, and so the results of my study indicate that,
due to the small sample size, the modified E1 estimator is the best choice (clearly better than the H-D estimator).
A variety of estimates are given below.
- 8.80 (modified E1)
- 8.91 (Harrell-Davis estimate)
- 8.76 (E8)
- 8.75 (estimate given on p. 115 of class notes)
- 8.72 (E6)
- 8.72 (E2)
- 8.73 (E4)
- 8.72 (E1)
The New distribution appears to be slightly negatively skewed and not at all heavy-tailed. A probit plot
suggests that the left tail may be approximately normal, or perhaps a bit light-tailed (like a uniform distribution).
The plot indicates that the negative skewness, which is rather mild anyway, may be primarily due to a stubby right
tail as opposed to an elongated left tail. All of this suggests that the H-D estimate should be considered to be
the best choice.
- 8.78 (modified E1)
- 8.72 (Harrell-Davis estimate)
- 8.73 (E8)
- 8.72 (estimate given on p. 115 of class notes)
- 8.68 (E6)
- 8.68 (E2)
- 8.69 (E4)
- 8.68 (E1)
I'll use the
difference between the
modified E1 estimate (for the Old distribution) and the
H-D estimate (for the New distribution),
and report
0.08 as my estimate.
- 0.18 (difference in H-D estimates)
- 0.02 (difference in modified E1 estimates)
- 0.44 (H-L estimate)
Problem 2
It should be noted that we have matched pairs instead of two independent samples.
One can begin by examining some possibilities. Here are p-values
from a variety of tests:
- 0.021 (sign test),
- 0.011 (signed-rank test, approximate),
- 0.0059 (signed-rank test, exact),
- 0.012 (normal scores test, approximate (I didn't expect you to try this)),
- 0.010 (trimmed mean t test, g = 1),
- 0.020 (trimmed mean t test, g = 2),
- 0.0059 (Student's t test),
- 0.0037 (Johnson's modified t test).
The
Wilcoxon signed-rank test should be chosen, since it
produces the smallest p-value of all of the reasonable candidates, and it's a
perfectly valid test when doing a test for a treatment effect. The
desired p-value is about
0.0059.
(Note: If one doubles the one-tailed probability from the table, the
result is 0.0058. However, that method is subject to a rounding error
effect, and a careful computation of the exact p-value results in a
value which rounds to 0.0059.)
Johnson's test should not be used, since it is for tests about the mean of
a skewed distribution. The accuracy of the t test and the tests
based on the trimmed means should be decent, though perhaps not great. But since these
tests are not exact
tests, and they don't produce the smallest p-value, there is no need to
rely on them. The sign test is perfectly valid, but it doesn't produce
a p-value which is smaller than the perfectly valid one which results
from the signed-rank test.
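The rounding effect mentioned in the note about the exact signed-rank p-value can be checked with a short sketch. The function below counts subsets of the ranks by dynamic programming, assuming no ties and no zero differences. The values n = 10 and a statistic of 2 are my assumptions, not something stated above; I picked them because they reproduce both 0.0058 and 0.0059.

```python
def signed_rank_lower_tail(n, t):
    """Exact null P(T <= t), T = sum of the ranks (1..n) carrying one sign."""
    max_sum = n * (n + 1) // 2
    # counts[s] = number of subsets of {1, ..., n} whose elements sum to s
    counts = [0] * (max_sum + 1)
    counts[0] = 1
    for rank in range(1, n + 1):
        # Go downward so each rank is used at most once per subset.
        for s in range(max_sum, rank - 1, -1):
            counts[s] += counts[s - rank]
    return sum(counts[: t + 1]) / 2 ** n

n, t = 10, 2  # ASSUMED values, chosen to match the quoted p-values
one_tail = signed_rank_lower_tail(n, t)   # 3/1024
print(round(2 * round(one_tail, 4), 4))   # table-style doubling: 0.0058
print(round(2 * one_tail, 4))             # careful computation: 0.0059
```

Doubling a one-tail probability that has already been rounded to four places loses a little, which is the whole difference between the two values.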
Problem 3
part (a)
There is no good reason to rely on the
robustness of Student's t test,
since all of the two-sample nonparametric procedures
are valid,
and one of them produces a smaller p-value.
0.24
is the smallest of the valid p-values, and it is produced by the
Wald-Wolfowitz runs test (exact version).
Other p-values are given below. (Note: Welch's test should not be used
since under the null hypothesis the variances are equal.)
- 1.00 (Fisher's exact test (exact)),
- 0.68 (Fisher's exact test (chi-square approx., w/ c.c.)),
- 1.00 (Fisher's exact test (chi-square approx., w/o c.c.)),
- 0.93 (W-M-W test (normal approx., w/ c.c., using midranks, adj. for
ties)),
- 0.23 (W-W runs test (normal approx., w/ c.c.)),
- 0.17 (W-W runs test (normal approx., w/o c.c.)),
- 0.58 (Student's t test),
- 0.69 (Welch's test).
part (b)
Because there is evidence that neither
distribution is stochastically larger than the
other one, we should not use nonparametric tests to do a
test about the means. (Whether one uses E8 or E9 to estimate the percentiles,
the distribution for the smokers has a smaller estimated 25th percentile and a larger estimated 75th percentile
than the distribution for the nonsmokers.)
So we should rely on the robustness of a normal theory procedure.
Because there isn't a strong reason to assume that the variances are
equal,
Welch's test should be chosen over Student's t
test. The resulting p-value is about
0.69. (Some other p-values are given above.)
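As a rough illustration of what Welch's test computes, here is a sketch of the test statistic and the Satterthwaite approximate degrees of freedom, using only the standard library. The function name and the two samples are hypothetical (they are not the smokers/nonsmokers data), and the p-value itself would still have to come from a t table or statistical software.

```python
import math
from statistics import mean, variance

def welch_statistic(x, y):
    """Return (t, df) for Welch's unequal-variance two-sample test."""
    vx = variance(x) / len(x)   # estimated variance of the mean of x
    vy = variance(y) / len(y)   # estimated variance of the mean of y
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    # Satterthwaite approximation to the degrees of freedom
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df

# Hypothetical samples with visibly different spreads.
x = [12.1, 15.3, 9.8, 20.4, 7.7, 18.9]
y = [13.0, 13.6, 12.8, 14.1, 13.3, 13.9]
t, df = welch_statistic(x, y)
print(round(t, 3), round(df, 2))
```

When the two sample variances and sample sizes are equal, df reduces to the pooled value (total sample size minus 2); the more the variances differ, the smaller df becomes.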
Problem 4
Both distributions appear to be positively skewed, and it appears
(based on an empirical Q-Q plot) that
one distribution is stochastically larger than the other. But it
doesn't seem appropriate to assume a shift model. (The sample standard
deviations differ by more than a factor of 2 and the sample skewness values are quite different.)
part (a)
Since it
seems safe to assume that one distribution is stochastically larger than
the other if they aren't identical, the two-sample nonparametric tests
can be used for a test about the means, and so one doesn't have to
strongly consider whether or not Welch's test is an appropriate choice,
since it doesn't produce the smallest p-value.
(If one were to use Welch's test, its robustness would have to be relied
upon, which may be a problem due to the small sample sizes and the fact that one distribution appears to be much more
strongly skewed than the other one is.)
We should remove Student's t test from consideration because the
variances may be rather unequal.
0.0011
is the smallest of the valid p-values, and it is produced by the
W-M-W test (normal approx., w/ c.c., and using midranks).
Other p-values are given below.
- 0.42 (W-W runs test (exact)),
- 0.42 (W-W runs test (normal approx., w/ c.c.)),
- 0.020 (Fisher's exact test (exact)),
- 0.021 (Fisher's exact test (chi-square approx., w/ c.c.)),
- 0.0072 (Fisher's exact test (chi-square approx., w/o c.c.)),
- 0.0020 (Welch's test),
- 0.0013 (Student's t test).
(Note: One cannot directly address the one-sided alternative with the W-W runs test, but
the p-value from the W-W runs test can be applied to the two-sided alternative, and then, given the
indication from the data of which mean is the greater one if they differ, that two-sided alternative p-value
can be used with the one-sided alternative. This inability to directly address a one-sided alternative with a W-W
runs test generally results in low power when the test result is applied to a one-sided alternative.)
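To make the note about the W-W runs test a bit more concrete, here is a sketch of the exact null distribution of the number of runs, using the standard counting formulas. The function names are mine, the sample sizes in the usage line are hypothetical, and ties between the two samples are assumed away. The p-value is the probability of seeing as few runs as were observed (or fewer), and that is the same no matter which sample tends to be larger, which is why the test can't distinguish the direction of a difference.

```python
from math import comb

def runs_pmf(m, n, r):
    """Null P(R = r) for the number of runs R among m X's and n Y's."""
    if r < 2:
        return 0.0
    k, odd = divmod(r, 2)
    if not odd:
        # r = 2k: k runs of each kind, and either kind can come first
        ways = 2 * comb(m - 1, k - 1) * comb(n - 1, k - 1)
    else:
        # r = 2k + 1: one kind has k + 1 runs, the other has k
        ways = (comb(m - 1, k) * comb(n - 1, k - 1)
                + comb(m - 1, k - 1) * comb(n - 1, k))
    return ways / comb(m + n, m)

def runs_lower_tail(m, n, r):
    """Exact p-value P(R <= r); few runs is evidence against the null."""
    return sum(runs_pmf(m, n, j) for j in range(2, r + 1))

# Hypothetical sample sizes and observed number of runs.
print(round(runs_lower_tail(6, 7, 4), 4))
```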
part (b)
Because it doesn't seem appropriate to assume a shift model, the
interval associated with the W-M-W procedure, which is (3.0, 45.0),
should not be used. Furthermore, since there is no good reason to
assume equal variances, the
Welch interval,
which is
(2.8, 40.7),
should be preferred to Student's t interval, which is (3.6, 39.9).
(It should be noted that while it's not reasonable to believe
that the assumption of normality underlying Welch's interval is met,
relying on its robustness is the best that we can do given the procedures covered in STAT 554, since the other
procedures can be rather inaccurate if the variances differ greatly.)
Problem 5
One needs to work with two independent samples of matched pairs differences.
Although Student's t test is rather robust for the general
two-sample problem when the sample sizes are
equal and not too small, there is no good reason to rely on
its robustness, since all of the two-sample nonparametric procedures
are valid,
and one of them produces a smaller p-value.
0.00031
is the smallest of the valid p-values, and it is produced by the
W-M-W test (exact version).
(Note: Using the tables, one gets 0.0004 by doubling the indicated one-tail probability of 0.0002.
But the exact one-tail probability is a bit less than 0.0002, even though it rounds to 0.0002. The exact one-tail
probability can be obtained by noting that only one of the equally-likely patterns yields a M-W test statistic value
of 0, and only one of them yields a test statistic value of 1. So the exact lower-tail null probability corresponding to
u = 1 is just 2 divided by the total number of equally-likely patterns.)
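The counting argument in the note above can be checked with a short sketch. The recursion counts the equally-likely patterns giving each value of the M-W statistic, assuming no ties. The sample sizes of 8 and 8 are my assumption (they aren't stated above); I chose them because they reproduce the quoted 0.0002 and 0.00031. The function names are also just for illustration.

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def count_u(m, n, u):
    """Number of orderings of m X's and n Y's with M-W statistic equal to u."""
    if u < 0:
        return 0
    if m == 0 or n == 0:
        return 1 if u == 0 else 0
    # The largest observation is an X (it exceeds all n Y's) or a Y (adds 0).
    return count_u(m - 1, n, u - n) + count_u(m, n - 1, u)

def mw_lower_tail(m, n, u):
    """Exact null probability P(U <= u), assuming no ties."""
    return sum(count_u(m, n, j) for j in range(u + 1)) / comb(m + n, m)

m = n = 8  # ASSUMED sample sizes, chosen to match the quoted p-values
one_tail = mw_lower_tail(m, n, 1)   # 2 / 12870
print(comb(m + n, m), round(one_tail, 4), round(2 * one_tail, 5))
# prints: 12870 0.0002 0.00031
```

Under this assumption the one-tail probability is 2/12870, which is about 0.000155, so it rounds to 0.0002 in the table even though doubling the unrounded value gives 0.00031 rather than 0.0004.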
Other p-values are given below. (Note: Welch's test should not be used
since under the null hypothesis the variances are equal. Also, approximate
versions of tests that can be done exactly should not be used due to the
small sample sizes.)
- 0.0089 (W-W runs test (exact)),
- 0.0099 (W-W runs test (normal approx., w/ c.c.)),
- 0.0048 (W-W runs test (normal approx., w/o c.c.)),
- 0.010 (Fisher's exact test (exact)),
- 0.012 (Fisher's exact test (chi-square approx., w/ c.c.)),
- 0.0027 (Fisher's exact test (chi-square approx., w/o c.c.)),
- 0.0014 (W-M-W test (normal approx., w/ c.c.)),
- 0.0011 (W-M-W test (normal approx., w/o c.c.)),
- 0.0048 (Student's t test),
- 0.012 (Welch's test).