Answers for HW #4

Fall 2003


Note: The format used below is not what I expected you to use --- you should have given some plots, and need not have given the results for procedures which shouldn't be considered. I'm giving results obtained from many different methods in order to make it easier for me to grade the papers (since I suspect that not everyone will have consistently used the best methods).

Problem 1

A symmetry plot shows a clear pattern indicating negative skewness. The skewness can also be seen from a probit plot, although heavy tails create something of an S shape, and this perhaps makes the skewness a bit harder for some to see. A sample skewness of about -0.92 also provides evidence of negative skewness.
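The two diagnostics mentioned above can be sketched as follows. This is only an illustration: the moment-based skewness formula (no small-sample correction), the axis convention for the symmetry plot, and the made-up data are assumptions, and the software used for the course may compute these quantities slightly differently.

```python
def sample_skewness(xs):
    """Moment-based sample skewness m3 / m2**1.5 (no small-sample correction)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


def symmetry_plot_points(xs):
    """Pairs (median - i-th smallest, i-th largest - median), i = 1, 2, ...

    For a symmetric sample the two distances in each pair are about equal.
    With negative skewness the lower distances tend to be the larger ones.
    """
    s = sorted(xs)
    n = len(s)
    med = s[n // 2] if n % 2 else 0.5 * (s[n // 2 - 1] + s[n // 2])
    return [(med - s[i], s[n - 1 - i] - med) for i in range(n // 2)]


# Made-up data with a long lower tail (not the HW data).
data = [1.0, 6.0, 8.0, 9.0, 9.5, 10.0]
skew = sample_skewness(data)
points = symmetry_plot_points(data)
```

Plotting the second coordinate of each pair against the first, together with the line y = x, gives the symmetry plot; points falling consistently on one side of the line are the "clear pattern" referred to above.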

part (a)

Based on the results of my studies, E9 and a modified version of E1 are the best of the possibilities. (Note: The guidelines I supplied with HW #4 were not intended to supply you with all of the information that you need. I gave some specific recommendations concerning quantile estimation, but did not give a recommendation to cover every possible situation. I was rather surprised that no one took me up on my offer to show you my graphs pertaining to quantile estimation again, since an examination of those graphs would have taken some of the guesswork out of the process, and would have allowed you to back up your choice of estimation procedure (if you picked the correct estimator) or avoid making a poor choice.) For normal and logistic distributions, E9 is clearly the better choice for the 10th percentile when the sample size is at least 50. For the exponential distribution, which has a skewness of 2, and so is more skewed than the distribution underlying the data, it's less clear --- for a sample size of 50, E1 is a bit better (for estimating the 90th percentile), and for a sample size of 100, E9 seems appreciably better. So for a sample size of 63, there is perhaps little difference in the ability of E1 and E9 to estimate the 90th percentile, and this would suggest that for a negatively skewed distribution shaped somewhat like an exponential distribution (only negatively skewed), E9 and a modified E1 would perform similarly, on average. But since the distribution that we want to make an inference about isn't as skewed as an exponential distribution, I think it's best to favor E9 over a modified E1. So, the estimate is 0.99, obtained using the Harrell-Davis estimator (E9).
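For reference, here is a rough sketch of how a Harrell-Davis estimate can be computed. It assumes the usual definition: a weighted average of the order statistics, with the weight on the i-th order statistic equal to the probability that a Beta((n+1)p, (n+1)(1-p)) random variable falls in ((i-1)/n, i/n). To stay self-contained it integrates the beta density numerically with Simpson's rule instead of calling a statistics library, and it only handles 1/(n+1) < p < n/(n+1). The function name and the `steps` parameter are made up for this illustration.

```python
import math


def harrell_davis(xs, p, steps=200):
    """Harrell-Davis estimate of the p-th quantile (numerical sketch).

    steps must be even (Simpson's rule); requires 1/(n+1) < p < n/(n+1).
    """
    s = sorted(xs)
    n = len(s)
    a = (n + 1) * p
    b = (n + 1) * (1 - p)
    if a <= 1 or b <= 1:
        raise ValueError("this sketch needs 1/(n+1) < p < n/(n+1)")

    def density(t):
        # Unnormalized Beta(a, b) density; it is zero at 0 and 1 since a, b > 1.
        if t <= 0.0 or t >= 1.0:
            return 0.0
        return math.exp((a - 1) * math.log(t) + (b - 1) * math.log(1 - t))

    def simpson(lo, hi):
        h = (hi - lo) / steps
        total = density(lo) + density(hi)
        for k in range(1, steps):
            total += (4 if k % 2 else 2) * density(lo + k * h)
        return total * h / 3

    # One weight per order statistic; dividing by the sum of the weights
    # normalizes them, so the beta function itself is never needed.
    weights = [simpson(i / n, (i + 1) / n) for i in range(n)]
    return sum(w * x for w, x in zip(weights, s)) / sum(weights)
```

Because every order statistic receives some weight, the estimate is smoother than one based on just one or two order statistics, which is consistent with it doing well when the quantile is not too far out in a tail.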

For purposes of comparison (and for grading purposes), some other estimates are:

part (b)

The possibilities for a valid test are the sign test and a t test done after employing the transformation ploy, although the latter requires that a transformation to symmetry be found (so that a test about the mean of the transformed distribution can be taken to be a test about the median of the transformed distribution).

Although a power transformation with a power of 2.25 results in a sample skewness near zero, symmetry and probit plots suggest that this transformation doesn't provide a symmetric distribution. (Note: Although symmetry implies a skewness of 0, a skewness of 0 doesn't imply symmetry.) It can also be noted that the transformation results in a sample mean that is somewhat less than estimates of the distribution median. So even though the transformation method yielded a smaller p-value (0.0022), I don't think that it should be considered to be trustworthy. So the p-value of 0.012, resulting from the sign test, should be preferred.
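As a reminder of what the sign test does, here is a minimal sketch of the exact version: it counts observations above and below the hypothesized median and refers the count to a binomial(n, 1/2) distribution. Dropping observations equal to the hypothesized median, and doubling the one-sided tail for the two-sided p-value, are assumptions of this sketch. The function name and the `alternative` argument are made up for illustration.

```python
from math import comb


def sign_test_p_value(xs, median0, alternative="two-sided"):
    """Exact sign test of H0: median = median0."""
    above = sum(1 for x in xs if x > median0)
    below = sum(1 for x in xs if x < median0)
    n = above + below  # observations equal to median0 are dropped

    def upper_tail(k):
        # P(B >= k) for B ~ binomial(n, 1/2)
        return sum(comb(n, j) for j in range(k, n + 1)) / 2 ** n

    if alternative == "greater":  # H1: median > median0
        return upper_tail(above)
    if alternative == "less":     # H1: median < median0
        return upper_tail(below)
    return min(1.0, 2 * upper_tail(max(above, below)))
```

Since the test uses only the signs of the differences, it requires no symmetry assumption, which is why it remains valid here even though a transformation to symmetry could not be found.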

For purposes of comparison (and for grading purposes), some other p-values are:

part (c)

Since the desired test is about the mean of an apparently skewed distribution, Johnson's modified t test is the clear choice, and so the desired p-value is 0.00009 (or 0.0001). The other tests are either not valid or have poor power (due to conservativeness in the situation under consideration).


Problem 2

A symmetry plot suggests that the distribution is nearly symmetric, and a sample skewness of about -0.08 lends support. Although the sample kurtosis is only about 1.1, various Q-Q plots indicate that the distribution may be appreciably heavy-tailed (with perhaps a kurtosis of about 3).

part (a)

Since the sample size isn't very small, the distribution appears to have little skewness, and the quantile is not in a far tail of the distribution, the Harrell-Davis estimator (E9) should be preferred, and the resulting estimate of 8.47 reported.

For purposes of comparison (and for grading purposes), some other estimates are:

part (b)

Based upon indications that the kurtosis may be as large as 3, one might think that trimming about 20% or 15% (as opposed to about 10%) may be in order. However, upon looking at the sample, it appears that only 3 observations from each end of the ordered sample are contributing significantly to the appearance of a heavy-tailed distribution. Because of all of this, it may be prudent to use an M-estimator, which can trim and adjust adaptively (using the data to determine how much trimming and adjusting is done). Using a "middle of the road" sized bend seems reasonable, given the mixed signals, and so my estimate is 9.27, obtained from an M-estimator with a bend of 1.345.
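To make the "trim and adjust adaptively" idea concrete, here is a minimal sketch of a Huber-type M-estimator of location. It assumes the scale is held fixed at MAD/0.6745 and that the estimate is found by iteratively reweighted averaging starting from the sample median; the software used for the course may differ in these details, so small differences in the last digit should not be surprising. The function name and the `tol` and `max_iter` parameters are made up for illustration.

```python
from statistics import median


def huber_m_estimate(xs, bend=1.345, tol=1e-8, max_iter=200):
    """Huber M-estimate of location with a fixed MAD-based scale (sketch)."""
    xs = list(xs)
    mu = median(xs)
    scale = median([abs(x - mu) for x in xs]) / 0.6745
    if scale == 0:
        return mu  # more than half of the sample is tied at the median
    for _ in range(max_iter):
        weights = []
        for x in xs:
            u = abs(x - mu) / scale
            # Full weight inside the bend; weight shrinks in proportion
            # to the distance outside it.
            weights.append(1.0 if u <= bend else bend / u)
        new_mu = sum(w * x for w, x in zip(weights, xs)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu
```

Observations within `bend` scaled units of the current estimate are used as they are, while more extreme observations are pulled in, so the amount of adjustment is determined by the data instead of being fixed in advance as with a trimmed mean.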

Below are some other estimates:

It's impossible to know if 9.27 is a better choice than 9.26 or 9.25, but the results of the studies that I've done (and presented to you) make it pretty clear that when the sample size is small, the sample median is not a good choice, and the Harrell-Davis estimator is also inferior to many other choices.


Problem 3

part (a)

One can begin by examining some possibilities. Here are p-values from a variety of tests:

The Wilcoxon signed-rank test should be chosen, since it produces the smallest p-value of all of the reasonable candidates, and it's a perfectly valid test when doing a test for a treatment effect. The desired p-value is about 0.012.

part (b)

For this part, we need to take into account the strong skewness which is apparent in the difference distribution. (If the null hypothesis of no treatment effect is true, which doesn't seem reasonable, given the result of part (a), then we would have symmetry about 0 --- and no need to estimate the mean. So we allow for the possibility that there is a treatment effect.)

Due to the skewness, which can be easily detected using a symmetry plot and/or a probit plot, with the sample skewness in agreement, Johnson's modified t procedure should be chosen, and the resulting confidence interval of (2.0, 18.8) reported. (Note: Johnson's method results in an interval of exactly the same width as the standard t interval. So it isn't preferred because it is shorter --- rather it is preferred because an adjustment for distribution skewness centers it differently and by doing so greater accuracy is achieved.)
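The remark about equal widths can be seen from a sketch of the interval. The form assumed here is the commonly cited one, in which the usual t interval is shifted by (estimated third central moment) / (6 s^2 n); the use of the divisor n when estimating the third central moment is an assumption of this sketch. The critical value `t_crit` must be supplied from a t table (n - 1 degrees of freedom), and the function name is made up for illustration.

```python
import math


def johnson_interval(xs, t_crit):
    """Johnson-type modified t interval for a mean (sketch).

    t_crit: Student's t critical value for n - 1 degrees of freedom,
    looked up by the caller.
    """
    n = len(xs)
    mean = sum(xs) / n
    s2 = sum((x - mean) ** 2 for x in xs) / (n - 1)  # sample variance
    m3 = sum((x - mean) ** 3 for x in xs) / n        # third central moment
    shift = m3 / (6 * s2 * n)                        # skewness adjustment
    half_width = t_crit * math.sqrt(s2 / n)          # same as the t interval
    center = mean + shift
    return center - half_width, center + half_width
```

The half-width is exactly that of the standard t interval; only the center moves, upward for positive sample skewness and downward for negative sample skewness.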

Some other intervals (which are inappropriate) are:

Problem 4

A symmetry plot shows a clear pattern indicating positive skewness. The skewness can also be clearly seen from a probit plot. A sample skewness of about 1.6 also provides evidence of positive skewness.

part (a)

Based on the results of my studies, E1 is the best choice for estimating the 95th percentile, given the skewness and sample size --- since the 95th percentile is in the far upper tail, the skewness is appreciable, and the sample size is smallish. (My study indicates that for estimating the 90th percentile of an exponential distribution from a sample of size 50, E1 does better than E9. Estimating a more extreme quantile from a smaller sample favors the choice of E1 even more.) The resulting estimate is 1.27.

For purposes of comparison (and for grading purposes), some other estimates are:

part (b)

The possibilities for a valid interval are the one based on the sign procedure, and the one obtained by employing the transformation ploy (forming an interval based on Student's t and then applying the inverse transformation), although the latter requires that a transformation to symmetry be found (so that an interval for the mean of the transformed distribution can be taken to be an interval for the median of the transformed distribution).

A power transformation with a power of 2/17 results in a sample skewness near zero, and symmetry and probit plots suggest that this transformation may result in something very close to symmetry. However, the resulting confidence interval isn't any shorter than the one obtained using the sign procedure, and so one might as well choose the standard sign procedure interval, which is (0.32, 0.48).
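The sign procedure interval is formed from a pair of order statistics, as in the following minimal sketch. It assumes the exact (non-interpolated) version: it picks the largest k for which the interval from the k-th smallest to the k-th largest observation has binomial(n, 1/2) coverage of at least the nominal level, so the actual coverage is typically somewhat above the nominal level. The function name and return format are made up for illustration.

```python
from math import comb


def sign_interval(xs, conf=0.95):
    """Order-statistic confidence interval for the median (sign procedure).

    Returns (lower, upper, exact coverage probability).
    """
    s = sorted(xs)
    n = len(s)

    def coverage(k):
        # P(k <= B <= n - k) for B ~ binomial(n, 1/2)
        return 1 - 2 * sum(comb(n, j) for j in range(k)) / 2 ** n

    k = 0
    while k + 1 <= n // 2 and coverage(k + 1) >= conf:
        k += 1
    if k == 0:
        raise ValueError("sample too small for the requested confidence level")
    return s[k - 1], s[n - k], coverage(k)
```

No transformation is needed, and the interval is valid for the median whether or not the distribution is symmetric.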

For purposes of comparison (and for grading purposes), some other intervals are:

part (c)

Due to the apparent skewness, the sample mean is the only thing that makes sense. The bias of any of the alternative estimators is too great to overcome when the skewness is appreciable. So, the estimate is 0.50, from the sample mean.

part (d)

Johnson's modified t test is the clear choice, because of the apparent skewness, and so the desired p-value is 0.013. The other tests are not valid for doing lower-tailed tests about the means of positively skewed distributions, and so should not be used even though they yield smaller p-values.

For purposes of comparison (and for grading purposes), some other p-values are: