HW #5 solutions, Fall '04, STAT 554

Answers for HW #5

Fall 2004

Problem 1

part (a)

There is no good reason to rely on the robustness of Student's t test, since all of the two-sample nonparametric procedures are valid, and one of them produces a smaller p-value.

0.000076 is the smallest of the valid p-values, and it is produced by Fisher's exact test (exact version). Other p-values are given below. (Note: Welch's test should not be used since under the null hypothesis the variances are equal.)

0.00011 (Fisher's exact test (chi-square approx., w/ c.c.)),
0.000036 (Fisher's exact test (chi-square approx., w/o c.c.)),
0.0012 (W-M-W test (normal approx., w/ c.c., using midranks, adj. for ties)),
0.075 (W-W runs test (exact)),
0.076 (W-W runs test (normal approx., w/ c.c.)),
0.059 (W-W runs test (normal approx., w/o c.c.)),
0.0054 (Student's t test),
0.0054 (Welch's test).
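The exact version of Fisher's exact test is a hypergeometric tail sum. Below is a minimal Python sketch of the one-sided p-value for a 2×2 table with all margins fixed. How the two samples are turned into the table (for instance, counts above and below the pooled median) is an assumption here. The function name and example counts are hypothetical and are not taken from the HW data.

```python
from math import comb

def fisher_exact_upper_p(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    With all margins fixed, the upper-left cell A is hypergeometric under H0,
    and the p-value is P(A >= a).
    """
    row1 = a + b                 # size of the first sample
    col1 = a + c                 # total count in the first column
    total = a + b + c + d        # combined sample size
    a_max = min(row1, col1)      # largest value the upper-left cell can take
    ways = sum(comb(col1, k) * comb(total - col1, row1 - k) for k in range(a, a_max + 1))
    return ways / comb(total, row1)

# Hypothetical table: all 3 first-sample values in column 1, none of the 3 second-sample values.
print(fisher_exact_upper_p(3, 0, 0, 3))  # 1/20 = 0.05
```

A two-sided version needs an extra rule for handling the other tail, so only the one-sided case is sketched.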

part (b)

The value of the test statistic is about -4.082, and the resulting p-value is about 0.0002. (The p-value rounds to 0.00020. However, it is only approximate, and the approximation may not be very good for such a small p-value, so it makes sense to just report it as being about 0.0002.)

part (c)

Due to the lack of a shift model, the Hodges-Lehmann estimate shouldn't be used. (The empirical Q-Q plot isn't close to being linear with a slope of 1, and the sample skewness and kurtosis values differ by quite a bit.) Due to the strong skewness of one distribution, and the mild skewness and light tails of the other distribution, the difference in the sample means, which is 0.38, should be preferred over the difference in alternative estimates of the distribution means. (For each distribution, the sample mean is the best estimator of the distribution mean.) (Note: Since the difference in sample means has an estimated standard error of about 0.132, the point estimate should be rounded to the nearest hundredth (using the second significant digit guideline).)
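The second significant digit guideline can be sketched in a few lines of Python. The idea is to find the decimal place of the standard error's second significant digit and round the point estimate there. This is a minimal sketch under that reading of the guideline. The unrounded estimate 0.3847 in the example is hypothetical.

```python
from math import floor, log10

def decimals_from_se(se: float) -> int:
    """Decimal place of the second significant digit of the standard error.

    For example, 0.132 has its first significant digit in the tenths place
    (10**-1), so its second one is in the hundredths place -> 2 decimals.
    """
    first_digit_exponent = floor(log10(abs(se)))
    return 1 - first_digit_exponent

def round_estimate(estimate: float, se: float) -> float:
    """Round a point estimate according to the second significant digit guideline."""
    return round(estimate, decimals_from_se(se))

print(decimals_from_se(0.132))        # 2
print(round_estimate(0.3847, 0.132))  # 0.38 (the unrounded 0.3847 is hypothetical)
```

A negative return value (for example, -1 when the standard error is 250) simply tells `round` to round to the tens place.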

part (d)

The Old distribution appears to be strongly negatively skewed, and so the results of my study indicate that, due to the small sample size, a lower-tailed trimmed mean may be a good choice. Since the magnitude of the skewness is greater than what was considered in my study, it could be that trimming more than 5% (maybe trimming 2 or 3 observations) will be good. A variety of estimates are given below.

9.864 (lower-tailed trimmed mean, g = 1)
9.905 (lower-tailed trimmed mean, g = 2)
9.949 (lower-tailed trimmed mean, g = 3)
9.975 (sample median)
9.954 (H-D estimate)
9.897 (two-tailed 10% trimmed mean)
9.939 (two-tailed 20% trimmed mean)
9.935 (Huber M-estimate)
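The two kinds of trimming used above can be sketched as follows. A lower-tailed trimmed mean drops only the g smallest observations, while a two-tailed trimmed mean drops the same number of observations from each end. This is a minimal Python sketch. Taking floor(percent × n / 100) observations from each end in the two-tailed version is an assumption about the convention, and the example data are hypothetical.

```python
from statistics import mean

def lower_trimmed_mean(data, g):
    """Mean after discarding the g smallest observations (lower-tailed trimming).

    Intended for an elongated left tail (negative skewness)."""
    xs = sorted(data)
    return mean(xs[g:])

def trimmed_mean(data, percent):
    """Two-tailed trimmed mean: drop floor(percent * n / 100) values from each end.

    Integer arithmetic avoids floating-point surprises in the trimming count."""
    xs = sorted(data)
    g = percent * len(xs) // 100
    return mean(xs[g:len(xs) - g])

sample = [1, 8, 9, 10, 11]                    # hypothetical, negatively skewed
print(lower_trimmed_mean(sample, 1))          # 9.5
print(trimmed_mean(list(range(1, 11)), 20))   # 5.5
```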

Of the bottom four estimates given above, perhaps the last two should be considered to be the best (recall the 20% trimmed mean outperformed the H-D estimator in my study), and they are very similar (and not too different from the H-D estimate). But my study indicates that when the skewness is strong, some lower-tailed trimmed mean may be a better choice. The problem is it isn't clear how much to trim. The Huber M-estimate resulted from L and H values of 5 and 1, and so it could be that trimming more than 1 or 2 values will be good. In the end, there is a lot of uncertainty with this situation, but I'll guess that an estimate of 9.93 or 9.94 may not be too bad.

The New distribution appears to be slightly negatively skewed and not at all heavy-tailed, and so the results of my study indicate that, due to the small sample size, a lower-tailed trimmed mean with just a mild amount of trimming may be a good choice. Perhaps trimming just 1 observation will be good; this gives an estimate not too different from the H-D estimate.

9.454 (lower-tailed trimmed mean, g = 1)
9.483 (lower-tailed trimmed mean, g = 2)
9.457 (H-D estimate)
9.455 (sample median)

I'll go with the first estimate given above (but one can note that three of the estimates are nearly equal to one another), and subtract it from the average of 9.93 and 9.94 to arrive at an estimated difference of 0.48. This estimate should be considered to be superior to the ones given below. (Recall that the H-L estimate only makes sense if a shift model can be assumed, and the sample median has been shown to be inferior to other estimators of the median in small sample size situations.)

0.44 (H-L estimate)
0.52 (difference in sample medians)
0.50 (difference in H-D estimates)
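For comparison, the two-sample Hodges-Lehmann estimate is the median of all m·n pairwise differences between the two samples. This construction targets a shift, which is why the estimate is only sensible when a shift model is reasonable. Here is a minimal Python sketch with hypothetical data:

```python
from statistics import median

def hodges_lehmann_shift(x, y):
    """Two-sample Hodges-Lehmann estimate of the shift: median of all x_i - y_j."""
    return median(xi - yj for xi in x for yj in y)

print(hodges_lehmann_shift([3, 5], [1, 2]))  # differences 2, 1, 4, 3 -> median 2.5
```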

part (e)

The Old distribution appears to be strongly negatively skewed, and so the results of my study indicate that, due to the small sample size, the modified E1 estimator is the best choice (clearly better than the H-D estimator). A variety of estimates are given below.

8.80 (modified E1)
8.91 (Harrell-Davis estimate)
8.76 (E8)
8.75 (estimate given on p. 115 of class notes)
8.72 (E6)
8.72 (E2)
8.73 (E4)
8.72 (E1)

The New distribution appears to be slightly negatively skewed and not at all heavy-tailed. A probit plot suggests that the left tail may be approximately normal, or perhaps a bit light-tailed (like a uniform distribution). The plot indicates that the negative skewness, which is rather mild anyway, may be primarily due to a stubby right tail as opposed to an elongated left tail. All of this suggests that the H-D estimate should be considered to be the best choice.

8.78 (modified E1)
8.72 (Harrell-Davis estimate)
8.73 (E8)
8.72 (estimate given on p. 115 of class notes)
8.68 (E6)
8.68 (E2)
8.69 (E4)
8.68 (E1)
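The Harrell-Davis estimate of the pth percentile is a weighted average of all of the order statistics. The weight on the ith order statistic is the probability that a Beta(p(n+1), (1 - p)(n+1)) random variable falls in ((i - 1)/n, i/n]. Below is a minimal, standard-library-only Python sketch. It computes the incomplete beta function with the usual continued-fraction method; with SciPy one could instead call scipy.stats.beta.cdf. The percentile used in part (e) isn't restated here, so the example data and p are hypothetical.

```python
from math import exp, lgamma, log

def _betacf(a, b, x, max_iter=300, eps=3e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def beta_cdf(x, a, b):
    """Regularized incomplete beta function I_x(a, b), i.e. the Beta(a, b) CDF at x."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = lgamma(a + b) - lgamma(a) - lgamma(b) + a * log(x) + b * log(1.0 - x)
    front = exp(log_front)
    # Use the continued fraction where it converges quickly; otherwise use symmetry.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def harrell_davis(data, p):
    """Harrell-Davis estimate of the pth quantile (0 < p < 1)."""
    xs = sorted(data)
    n = len(xs)
    a, b = p * (n + 1), (1 - p) * (n + 1)
    # Weight on the ith order statistic is P((i-1)/n < B <= i/n) for B ~ Beta(a, b).
    weights = [beta_cdf(i / n, a, b) - beta_cdf((i - 1) / n, a, b) for i in range(1, n + 1)]
    return sum(w * x for w, x in zip(weights, xs))

print(harrell_davis([1, 2, 3, 4, 5], 0.5))  # about 3.0 (symmetric hypothetical data)
```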

I'll use the difference between the modified E1 estimate (for the Old distribution) and the H-D estimate (for the New distribution), and report 0.08 as my estimate.

0.18 (difference in H-D estimates)
0.02 (difference in modified E1 estimates)
0.44 (H-L estimate)

Problem 2

It should be noted that we have matched pairs instead of two independent samples.

One can begin by examining some possibilities. Here are p-values from a variety of tests:

0.021 (sign test),
0.011 (signed-rank test, approximate),
0.0059 (signed-rank test, exact),
0.012 (normal scores test, approximate (I didn't expect you to try this)),
0.010 (trimmed mean t test, g = 1),
0.020 (trimmed mean t test, g = 2),
0.0059 (Student's t test),
0.0037 (Johnson's modified t test).

The Wilcoxon signed-rank test should be chosen, since it produces the smallest p-value of all of the reasonable candidates, and it's a perfectly valid test when doing a test for a treatment effect. The desired p-value is about 0.0059. (Note: If one doubles the one-tailed probability from the table, the result is 0.0058. However, that method is subject to a rounding error effect, and a careful computation of the exact p-value results in a value which rounds to 0.0059.)
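The exact p-value can be computed by counting sign patterns, which avoids relying on a rounded table entry. Under the null hypothesis, each of the 2^n assignments of signs to the ranks 1, ..., n is equally likely. The number of patterns giving each value of W+ (the sum of the positive ranks) follows from a simple subset-sum recursion. This is a minimal Python sketch that assumes no zero differences and no tied absolute differences. The values of n and W+ for the HW data aren't restated here, so the example is hypothetical.

```python
from fractions import Fraction

def signed_rank_null_counts(n):
    """counts[w] = number of the 2**n equally likely sign patterns with W+ = w."""
    max_w = n * (n + 1) // 2
    counts = [1] + [0] * max_w       # with no ranks yet, W+ = 0 in exactly one way
    for r in range(1, n + 1):        # rank r either joins the positive ranks or doesn't
        for w in range(max_w, r - 1, -1):
            counts[w] += counts[w - r]
    return counts

def signed_rank_two_sided_p(w_plus, n):
    """Exact two-sided p-value: twice the smaller tail probability, capped at 1."""
    counts = signed_rank_null_counts(n)
    total = 2 ** n
    lower = Fraction(sum(counts[: w_plus + 1]), total)
    upper = Fraction(sum(counts[w_plus:]), total)
    return min(Fraction(1), 2 * min(lower, upper))

print(signed_rank_two_sided_p(0, 3))  # 1/4
```

Exact fractions are used so that doubling happens before any rounding, which is the point made in the note above.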

Johnson's test should not be used, since it is for tests about the mean of a skewed distribution. The accuracy of the t test and the tests based on the trimmed means should be decent, though perhaps not great. But since these tests are not exact tests, and they don't produce the smallest p-value, there is no need to rely on them. The sign test is perfectly valid, but it doesn't produce a p-value which is smaller than the perfectly valid one which results from the signed-rank test.
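The sign test's exact p-value is a binomial tail probability. Under the null hypothesis, the number of positive differences is Binomial(n, 1/2), where n excludes any zero differences. Here is a minimal Python sketch; the counts in the example are hypothetical.

```python
from fractions import Fraction
from math import comb

def sign_test_two_sided_p(n_plus, n):
    """Exact two-sided sign test p-value; n excludes zero differences."""
    k = min(n_plus, n - n_plus)                       # count in the smaller tail
    tail = Fraction(sum(comb(n, i) for i in range(k + 1)), 2 ** n)
    return min(Fraction(1), 2 * tail)

print(sign_test_two_sided_p(1, 10))  # 11/512
```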

Problem 3

part (a)

There is no good reason to rely on the robustness of Student's t test, since all of the two-sample nonparametric procedures are valid, and one of them produces a smaller p-value.

0.24 is the smallest of the valid p-values, and it is produced by the Wald-Wolfowitz runs test (exact version). Other p-values are given below. (Note: Welch's test should not be used since under the null hypothesis the variances are equal.)

1.00 (Fisher's exact test (exact)),
0.68 (Fisher's exact test (chi-square approx., w/ c.c.)),
1.00 (Fisher's exact test (chi-square approx., w/o c.c.)),
0.93 (W-M-W test (normal approx., w/ c.c., using midranks, adj. for ties)),
0.23 (W-W runs test (normal approx., w/ c.c.)),
0.17 (W-W runs test (normal approx., w/o c.c.)),
0.58 (Student's t test),
0.69 (Welch's test).
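The exact W-W runs test p-value comes from the null distribution of the total number of runs R in the combined ordered sample. Under the null hypothesis, every arrangement of the m X's and n Y's is equally likely, and small values of R are evidence against the null hypothesis. Below is a minimal Python sketch using the standard counting formulas. It assumes m, n ≥ 1 and no ties between the samples, and the sizes and run count in the example are hypothetical.

```python
from fractions import Fraction
from math import comb

def runs_pmf(r, m, n):
    """P(R = r) for the number of runs R in a random arrangement of m X's and n Y's."""
    total = comb(m + n, m)
    if r < 2:
        return Fraction(0)
    if r % 2 == 0:
        k = r // 2                   # k runs of X's and k runs of Y's
        ways = 2 * comb(m - 1, k - 1) * comb(n - 1, k - 1)
    else:
        k = (r - 1) // 2             # k + 1 runs of one letter, k runs of the other
        ways = comb(m - 1, k) * comb(n - 1, k - 1) + comb(m - 1, k - 1) * comb(n - 1, k)
    return Fraction(ways, total)

def runs_lower_tail_p(r_obs, m, n):
    """Exact p-value P(R <= r_obs); too few runs is evidence against H0."""
    return sum((runs_pmf(r, m, n) for r in range(2, r_obs + 1)), Fraction(0))

print(runs_lower_tail_p(3, 2, 2))  # 2/3
```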

part (b)

Because there is evidence that neither distribution is stochastically larger than the other one, we should not use nonparametric tests to do a test about the means. (Whether one uses E8 or E9 to estimate the percentiles, the distribution for the smokers has a smaller estimated 25th percentile and a larger estimated 75th percentile than the distribution for the nonsmokers.) So we should rely on the robustness of a normal theory procedure. Because there isn't a strong reason to assume that the variances are equal, Welch's test should be chosen over Student's t test. The resulting p-value is about 0.69. (Some other p-values are given above.)
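Welch's test uses the unpooled standard error and the Welch-Satterthwaite approximation for the degrees of freedom. Here is a minimal standard-library Python sketch of the statistic and degrees of freedom, with hypothetical data. Turning these into a p-value requires a t distribution CDF, which the standard library lacks. SciPy's scipy.stats.ttest_ind(x, y, equal_var=False) carries out the whole test.

```python
from math import sqrt
from statistics import mean, variance

def welch_statistic(x, y):
    """Welch's t statistic and Welch-Satterthwaite approximate degrees of freedom."""
    vx = variance(x) / len(x)        # squared standard error of mean(x)
    vy = variance(y) / len(y)        # squared standard error of mean(y)
    se2 = vx + vy
    t = (mean(x) - mean(y)) / sqrt(se2)
    df = se2 ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df

t, df = welch_statistic([1, 2, 3], [4, 5, 6, 7])
print(round(t, 3), round(df, 2))  # -4.041 4.96
```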

Problem 4

Both distributions appear to be positively skewed, and it appears (based on an empirical Q-Q plot) that one distribution is stochastically larger than the other. But it doesn't seem appropriate to assume a shift model. (The sample standard deviations differ by more than a factor of 2 and the sample skewness values are quite different.)

part (a)

Since it seems safe to assume that one distribution is stochastically larger than the other if they aren't identical, the two-sample nonparametric tests can be used for a test about the means. So one doesn't need to dwell on whether Welch's test is an appropriate choice, since it doesn't produce the smallest p-value anyway. (If one were to use Welch's test, its robustness would have to be relied upon, which may be a problem due to the small sample sizes and the fact that one distribution appears to be much more strongly skewed than the other one is.) We should remove Student's t test from consideration because the variances may be rather unequal.

0.0011 is the smallest of the valid p-values, and it is produced by the W-M-W test (normal approx., w/ c.c., and using midranks). Other p-values are given below.

0.42 (W-W runs test (exact)),
0.42 (W-W runs test (normal approx., w/ c.c.)),
0.020 (Fisher's exact test (exact)),
0.021 (Fisher's exact test (chi-square approx., w/ c.c.)),
0.0072 (Fisher's exact test (chi-square approx., w/o c.c.)),
0.0020 (Welch's test),
0.0013 (Student's t test).

(Note: One cannot directly address the one-sided alternative with the W-W runs test. However, the p-value from the W-W runs test can be applied to the two-sided alternative. Then, given the data's indication of which mean is greater if they differ, that two-sided p-value can be used with the one-sided alternative. This inability to directly address a one-sided alternative generally results in low power when a W-W runs test result is applied to a one-sided alternative.)
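The approximate W-M-W p-value reported above uses midranks, a tie-adjusted null variance, and a continuity correction. Here is a minimal standard-library Python sketch of the one-sided version, where the alternative is that the first sample's distribution is stochastically larger. The sketch works with the rank sum W of the first sample, which is equivalent to working with the Mann-Whitney U. The example data are hypothetical.

```python
from math import erfc, sqrt

def midranks(values):
    """Ranks 1..N, with tied values given the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1            # average of positions i..j, as 1-based ranks
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def wmw_normal_upper_p(x, y):
    """One-sided W-M-W p-value (x stochastically larger) via the normal approximation,
    with a continuity correction and the tie-adjusted null variance."""
    m, n = len(x), len(y)
    N = m + n
    pooled = list(x) + list(y)
    w = sum(midranks(pooled)[:m])        # rank sum of the x sample
    mean_w = m * (N + 1) / 2
    # Tie adjustment: subtract sum(t^3 - t) over groups of t tied values.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var_w = m * n / 12 * ((N + 1) - tie_term / (N * (N - 1)))
    z = (w - mean_w - 0.5) / sqrt(var_w)  # continuity correction for P(W >= w)
    return 0.5 * erfc(z / sqrt(2))        # upper-tail standard normal probability

print(wmw_normal_upper_p([4, 5, 6], [1, 2, 3]))  # about 0.040
```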

part (b)

Because it doesn't seem appropriate to assume a shift model, the interval associated with the W-M-W procedure, which is (3.0, 45.0), should not be used. Furthermore, since there is no good reason to assume equal variances, the Welch interval, which is (2.8, 40.7), should be preferred to Student's t interval, which is (3.6, 39.9). (It should be noted that while it's not reasonable to believe that the assumption of normality underlying Welch's interval is met, relying on its robustness is the best that we can do given the procedures covered in STAT 554, since the other procedures can be rather inaccurate if the variances differ greatly.)

Problem 5

One needs to work with two independent samples of matched pairs differences.

Although Student's t test is rather robust for the general two-sample problem when the sample sizes are equal and not too small, there is no good reason to rely on its robustness here: all of the two-sample nonparametric procedures are valid, and one of them produces a smaller p-value.

0.00031 is the smallest of the valid p-values, and it is produced by the W-M-W test (exact version). (Note: Using the tables, one gets 0.0004 by doubling the indicated one-tail probability of 0.0002. But the exact one-tail probability is a bit less than 0.0002, even though it rounds to 0.0002. The exact one-tail probability can be obtained by noting that only one of the equally likely patterns yields an M-W test statistic value of 0, and only one of them yields a test statistic value of 1. So the exact lower-tail null probability corresponding to u = 1 is just 2 divided by the total number of equally likely patterns.) Other p-values are given below. (Note: Welch's test should not be used since under the null hypothesis the variances are equal. Also, approximate versions of tests that can be done exactly should not be used due to the small sample sizes.)

0.0089 (W-W runs test (exact)),
0.0099 (W-W runs test (normal approx., w/ c.c.)),
0.0048 (W-W runs test (normal approx., w/o c.c.)),
0.010 (Fisher's exact test (exact)),
0.012 (Fisher's exact test (chi-square approx., w/ c.c.)),
0.0027 (Fisher's exact test (chi-square approx., w/o c.c.)),
0.0014 (W-M-W test (normal approx., w/ c.c.)),
0.0011 (W-M-W test (normal approx., w/o c.c.)),
0.0048 (Student's t test),
0.012 (Welch's test).
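The counting argument in the note above extends to the whole null distribution of the Mann-Whitney statistic U, the number of (x, y) pairs with y < x. The recursion conditions on whether the largest observation is an x, which adds n to U, or a y, which adds nothing. Below is a minimal Python sketch. The sample sizes 8 and 8 in the example are an assumption, since they aren't stated in this section. With those sizes, 4/C(16, 8) is about 0.00031, consistent with the value reported above.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def mw_null_counts(m, n):
    """counts[u] = number of the C(m+n, m) equally likely patterns with U = u (no ties)."""
    if m == 0 or n == 0:
        return (1,)                              # only U = 0 is possible
    with_x_largest = mw_null_counts(m - 1, n)    # largest obs is an x: it beats all n y's
    with_y_largest = mw_null_counts(m, n - 1)    # largest obs is a y: adds nothing to U
    counts = [0] * (m * n + 1)
    for u, c in enumerate(with_x_largest):
        counts[u + n] += c
    for u, c in enumerate(with_y_largest):
        counts[u] += c
    return tuple(counts)

def mw_two_sided_p(u_obs, m, n):
    """Exact two-sided p-value: twice the smaller tail probability, capped at 1."""
    counts = mw_null_counts(m, n)
    total = comb(m + n, m)
    lower = Fraction(sum(counts[: u_obs + 1]), total)
    upper = Fraction(sum(counts[u_obs:]), total)
    return min(Fraction(1), 2 * min(lower, upper))

print(round(float(mw_two_sided_p(1, 8, 8)), 5))  # 0.00031 (hypothetical sizes 8 and 8)
```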