Answers for HW #5
Spring 2008
Problem 1
part (a)
There is no good reason to rely on the
robustness of Student's t test,
since all of the two-sample nonparametric procedures
are valid,
and one of them produces a smaller p-value than Student's t test does.
0.070
is the smallest of the valid p-values, and it is produced by
Fisher's exact test.
Other p-values are given below. (Note: Welch's test should not be used
since under the null hypothesis the variances are equal.)
- 0.10 (W-M-W test (approximate version, using midranks and adjusting for ties)),
- 0.14 (W-M-W test (exact, breaking ties in a conservative manner)),
- 0.072 (Fisher's exact test (chi-square approx., w/ c.c.)),
- 0.025 (Fisher's exact test (chi-square approx., w/o c.c.)),
- 0.59 (W-W runs test (exact)),
- 0.59 (W-W runs test (normal approx., w/ c.c.)),
- 0.50 (W-W runs test (normal approx., w/o c.c.)),
- 0.18 (Student's t test),
- 0.19 (Welch's test).
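For readers who want to see where an exact Fisher p-value comes from, here is a minimal sketch. It assumes the two samples have already been reduced to a 2x2 table of counts (for example, by classifying each observation as above or not above the combined-sample median; that reduction is an assumption made for illustration, not something stated above). The function name and the counts are hypothetical and are not the homework data.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Return (upper one-sided, two-sided) exact p-values for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability that the top-left cell equals k, margins fixed.
        return comb(row1, k) * comb(row2, col1 - k) / total

    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = {k: prob(k) for k in range(lo, hi + 1)}
    # One-sided: tables with a top-left count at least as large as the observed one.
    upper = sum(p for k, p in probs.items() if k >= a)
    # Two-sided: add up every table no more probable than the observed one.
    cutoff = probs[a] * (1 + 1e-9)
    two_sided = sum(p for p in probs.values() if p <= cutoff)
    return upper, two_sided

# Hypothetical counts, not the homework data.
upper, two_sided = fisher_exact_2x2(3, 1, 1, 3)
print(round(upper, 4), round(two_sided, 4))
```

The two-sided rule used here (summing the probabilities of all tables no more likely than the observed one) is only one common convention; other conventions can give a different two-sided value.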
part (b)
The value of the test statistic is about 1.6845, and the resulting
p-value is about
0.11.
Problem 2
part (a)
Since it
seems safe to assume that one distribution is stochastically larger than
the other if they aren't identical, the two-sample nonparametric tests
can be used for a test about the means, and so one doesn't have to
strongly consider whether or not Welch's test is an appropriate choice,
since it doesn't produce the smallest p-value.
(If one were to use Welch's test, its robustness would have to be relied
upon, which may be a problem due to the small sample sizes and the fact that one distribution may be more
strongly skewed than the other one is.)
We should remove Student's t test from consideration because the
variances may be rather unequal.
(Note:
One always has to make some assumptions, and in this case the data
provides no evidence against the assumption that one distribution is
stochastically larger than the other if they differ, but the data does
provide some evidence of nonnormality and unequal variances.
The fact that heteroscedasticity makes Welch's test
the choice over Student's t test causes a problem since with such
small sample sizes, the estimated df to use may be inaccurate.
So, all in all, I worry less about making the assumption of one
distribution being stochastically larger than the other if they differ,
than I do about the robustness of Welch's test.)
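To make the concern about the estimated df concrete, the following sketch computes Welch's statistic together with the Welch-Satterthwaite df estimate. The df depends on the two sample variances, which are themselves poorly estimated when the samples are small. The function name and the data are hypothetical; they are not the homework data.

```python
from math import sqrt
from statistics import mean, variance

def welch_statistic_and_df(x, y):
    """Welch's t statistic and the Welch-Satterthwaite estimate of its df."""
    # Estimated variance of each sample mean (sample variance divided by n).
    vx = variance(x) / len(x)
    vy = variance(y) / len(y)
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    # The df is estimated from the data, so it inherits the noise in vx and vy.
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df

# Hypothetical samples with clearly unequal spread.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
t_stat, df = welch_statistic_and_df(x, y)
print(round(t_stat, 3), round(df, 2))
```

With equal variances and equal sample sizes the estimate would be n1 + n2 - 2 = 8; here the unequal sample variances pull it down to about 5.9.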
0.021
is the smallest of the valid p-values, and it is produced by the
W-M-W test (using the exact version, breaking ties conservatively).
(Note: The only tied values occur in the same sample, and so the same sum of ranks is obtained whether one uses
midranks or only integer ranks.)
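The point about ties can be checked with a small sketch. When the tied values all belong to the same sample, the ranks they occupy are merely shared out differently within that sample, so the rank sum is unchanged. The helper names and the data below are hypothetical.

```python
def integer_ranks(values):
    """Ranks 1..N in sorted order; tied values get consecutive integers."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks

def midranks(values):
    """Each value gets the average of the integer ranks its tie group occupies."""
    s = sorted(values)
    # s.index(v) is the 0-based first position of v; the group has s.count(v) members.
    return [s.index(v) + (s.count(v) + 1) / 2 for v in values]

# Hypothetical samples; the only tie (the two 5s) lies inside x.
x = [3, 5, 5]
y = [1, 7]
combined = x + y
sum_int = sum(integer_ranks(combined)[:len(x)])
sum_mid = sum(midranks(combined)[:len(x)])
print(sum_int, sum_mid)
```

If one of the tied values were in the other sample instead, the two rank sums could differ, and that is the situation in which the choice of how to break ties matters.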
Other p-values are given below.
- 0.023 (W-M-W test (normal approx., w/ c.c., using midranks, adj. for ties)),
- 0.051 (Fisher's exact test (exact)),
- 0.053 (Fisher's exact test (chi-square approx., w/ c.c.)),
- 0.015 (Fisher's exact test (chi-square approx., w/o c.c.)),
- 0.034 (Welch's test),
- 0.020 (Student's t test),
- 0.23 (W-W runs test (exact)),
- 0.22 (W-W runs test (normal approx., w/ c.c.)),
- 0.15 (W-W runs test (normal approx., w/o c.c.)).
(Note: One cannot directly address the one-sided alternative with the W-W runs test. However,
the p-value from the W-W runs test applies to the two-sided alternative, and given the
indication from the data of which mean is the greater one if they differ, that two-sided p-value
can be used with the one-sided alternative. This inability to directly address a one-sided alternative with a W-W
runs test generally results in low power when the test result is applied to a one-sided alternative.)
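A minimal sketch of the exact runs test shows why it cannot distinguish directions: the statistic is just the number of runs of sample labels in the ordered combined sample, and that count is the same whichever sample tends to be larger. The sketch assumes there are no ties between the two samples; the function names and the data are hypothetical.

```python
from math import comb

def count_runs(x, y):
    """Number of runs of sample labels in the sorted combined sample (no x-y ties assumed)."""
    labels = [lab for _, lab in sorted([(v, "x") for v in x] + [(v, "y") for v in y])]
    return 1 + sum(1 for a, b in zip(labels, labels[1:]) if a != b)

def runs_pmf(r, m, n):
    """P(R = r) under the null hypothesis, for sample sizes m and n."""
    total = comb(m + n, m)
    k = r // 2
    if r % 2 == 0:
        return 2 * comb(m - 1, k - 1) * comb(n - 1, k - 1) / total
    return (comb(m - 1, k) * comb(n - 1, k - 1)
            + comb(m - 1, k - 1) * comb(n - 1, k)) / total

def runs_test_p_value(x, y):
    """Exact p-value P(R <= observed): few runs is the evidence against the null."""
    m, n = len(x), len(y)
    r_obs = count_runs(x, y)
    return sum(runs_pmf(r, m, n) for r in range(2, r_obs + 1))

# Hypothetical samples: completely separated, then the same data with labels swapped.
p_xy = runs_test_p_value([1, 2, 3], [4, 5, 6])
p_yx = runs_test_p_value([4, 5, 6], [1, 2, 3])
print(p_xy, p_yx)
```

Both calls give the same p-value even though the direction of the difference is reversed, which is the sense in which the test only speaks to the two-sided alternative.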
part (b)
Because it doesn't seem appropriate to assume a shift model, the
interval associated with the W-M-W procedure, which is (4, 38),
should not be used. Furthermore, since there is no good reason to
assume equal variances, the
Welch interval,
which is
(-2, 39),
should be preferred to Student's t interval, which is (1, 36).
(It should be noted that while it's not reasonable to believe
that the assumption of normality underlying Welch's interval is met,
relying on its robustness is the best that we can do given the procedures covered in STAT 554, since the other
procedures can be more inaccurate if the variances differ.)
Although the usual rule of thumb applied to the estimated standard error of the difference in sample means indicates that rounding to the
nearest tenth is appropriate, the data values are only recorded to the nearest integer, and violations of the assumptions make the
procedure subject to some inaccuracy, so I will report the interval estimate as
(-2, 39).
Problem 3
There is no good reason to rely on the
robustness of Student's t test,
since all of the two-sample nonparametric procedures
are valid,
and one of them produces a smaller p-value than Student's t test does.
0.60
is the smallest of the valid p-values, and it is produced by the
Wald-Wolfowitz runs test (exact version).
Other p-values are given below. (Note: Welch's test should not be used
since under the null hypothesis the variances are equal.)
- 1.00 (Fisher's exact test (exact)),
- 0.83 (Fisher's exact test (chi-square approx., w/ c.c.)),
- 0.95 (Fisher's exact test (chi-square approx., w/o c.c.)),
- 0.89 (W-M-W test (normal approx., w/ c.c., using midranks, adj. for ties)),
- 0.60 (W-W runs test (normal approx., w/ c.c.)),
- 0.54 (W-W runs test (normal approx., w/o c.c.)),
- 0.75 (Student's t test),
- 0.74 (Welch's test).
Problem 4
It should be noted that we have matched pairs instead of two independent samples.
One can begin by examining some possibilities. Here are p-values
from a variety of tests:
- 0.0074 (sign test),
- 0.0026 (signed-rank test, exact, breaking ties conservatively),
- 0.004 (signed-rank test, approximate),
- 0.010 (trimmed mean t test, g = 2),
- 0.0045 (trimmed mean t test, g = 1),
- 0.0033 (Student's t test),
- 0.0012 (Johnson's modified t test).
The
Wilcoxon signed-rank test should be chosen, since it
produces the smallest p-value of all of the reasonable candidates, and it's a
perfectly valid test when doing a test for a treatment effect. The
desired p-value is about
0.0026.
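As an illustration of why the signed-rank test is exact, here is a minimal sketch that obtains an upper-tail p-value by enumerating all sign assignments. Note that it uses midranks for tied absolute differences and drops zero differences, which is a simpler convention than the conservative tie-breaking used for the value reported above, so it would not necessarily reproduce that value. The function names and the differences are hypothetical.

```python
from itertools import product

def midranks(values):
    """Each value gets the average of the integer ranks its tie group occupies."""
    s = sorted(values)
    return [s.index(v) + (s.count(v) + 1) / 2 for v in values]

def signed_rank_exact_p(diffs):
    """Upper-tail exact p-value P(W+ >= observed) for paired differences."""
    d = [v for v in diffs if v != 0]  # assumption: zero differences are dropped
    ranks = midranks([abs(v) for v in d])
    w_obs = sum(r for r, v in zip(ranks, d) if v > 0)
    # Under the null, each rank is equally likely to carry a + or a - sign,
    # so every one of the 2**n sign patterns has the same probability.
    count = 0
    for signs in product((0, 1), repeat=len(ranks)):
        if sum(r for r, s in zip(ranks, signs) if s) >= w_obs:
            count += 1
    return count / 2 ** len(ranks)

# Hypothetical paired differences (treatment minus control).
p_all_positive = signed_rank_exact_p([1, 2, 3, 4, 5])
p_one_negative = signed_rank_exact_p([-1, 2, 3, 4, 5])
print(p_all_positive, p_one_negative)
```

The enumeration grows as 2 to the power n, so this brute-force approach is only practical for small numbers of pairs.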
Johnson's test should not be used, since it is for tests about the mean of
a skewed distribution. The accuracy of the t test and the tests
based on the trimmed means should be decent, though perhaps not great. But since these
tests are not exact
tests, and they don't produce the smallest p-value, there is no need to
rely on them. The sign test is perfectly valid, but it doesn't produce
a p-value which is smaller than the perfectly valid one which results
from the signed-rank test.
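The sign test's exactness can be seen in a similar sketch: under the null hypothesis, the number of positive differences among the nonzero ones is binomial with success probability 1/2, so the p-value is a binomial tail. The function name and the data are hypothetical, and dropping zero differences is an assumption of the sketch.

```python
from math import comb

def sign_test_p(diffs):
    """Return (upper one-sided, two-sided) exact sign test p-values."""
    d = [v for v in diffs if v != 0]  # assumption: zero differences are dropped
    n = len(d)
    pos = sum(1 for v in d if v > 0)
    # Binomial(n, 1/2) tail probabilities for the count of positive differences.
    upper = sum(comb(n, k) for k in range(pos, n + 1)) / 2 ** n
    lower = sum(comb(n, k) for k in range(0, pos + 1)) / 2 ** n
    two_sided = min(1.0, 2 * min(upper, lower))
    return upper, two_sided

# Hypothetical differences: 9 positive and 1 negative.
sign_upper, sign_two_sided = sign_test_p([1] * 9 + [-1])
print(sign_upper, sign_two_sided)
```

Because the sign test uses only the signs and ignores the magnitudes, it typically gives a larger p-value than the signed-rank test when the positive differences are also the larger ones.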