Answers for HW #4
Spring 2008
Note: The format used below is not what I expected you to
use --- you should have given some plots, and need not have given the
results for procedures which shouldn't be considered. I'm giving
results obtained from many different methods in order to make it easier
for me to grade the papers (since I suspect that not everyone will have
consistently used the best methods).
Problem 1
A probit plot shows a clear pattern indicating fairly strong positive skewness.
The skewness can also be seen from a symmetry plot, and
a sample skewness of about 1.8 also provides
evidence of positive skewness.
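The sample skewness and kurtosis figures quoted in these answers can be computed as below. This is a minimal sketch. It assumes the adjusted Fisher-Pearson estimators (the versions Minitab and Excel report). The course software may use a slightly different small-sample adjustment, and the data values are hypothetical.

```python
import math


def sample_skewness(x):
    """Adjusted Fisher-Pearson sample skewness (G1), as Minitab/Excel report it."""
    n = len(x)
    mean = sum(x) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in x) / (n - 1))  # sample SD
    return n / ((n - 1) * (n - 2)) * sum(((v - mean) / s) ** 3 for v in x)


def sample_kurtosis(x):
    """Adjusted sample excess kurtosis (G2); near 0 for normal data."""
    n = len(x)
    mean = sum(x) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in x) / (n - 1))
    fourth = sum(((v - mean) / s) ** 4 for v in x)
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * fourth
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


# Hypothetical right-skewed sample (not the homework data).
data = [2, 3, 3, 4, 5, 6, 8, 11, 15, 24]
print(round(sample_skewness(data), 3), round(sample_kurtosis(data), 3))
```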
part (a)
Due to the skewness, which is large in magnitude relative to the kurtosis, the sample mean, 60,
should be chosen as the estimate of the distribution mean. (Recall that the sample kurtosis is not
as easy to interpret when there is appreciable skewness: large skewness can by itself cause the
kurtosis to be large.)
Some other (bad) estimates are given below
(for grading purposes --- they should not be considered to be competitive here):
- 22 (sample median),
- 49.1 (trimmed mean (g = 2)),
- 40.4 (M-estimate (with bend parameter 1.345)).
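To make concrete why these estimates are not competitive for the mean, consider an exponential distribution purely as an illustration. Its skewness is 2, not far from the sample skewness of about 1.8. The following is a worked example under that assumption, not a model fitted to the data:

```latex
X \sim \mathrm{Exp}(\lambda): \qquad
\mu = E(X) = \frac{1}{\lambda}, \qquad
\tilde{\mu} = F^{-1}(0.5) = \frac{\ln 2}{\lambda} \approx 0.693\,\mu .
```

So even with unlimited data the sample median would settle near 69% of the mean. Trimmed means and M-estimates are pulled toward the median in the same direction, though typically less severely.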
part (b)
Since the desired test is about the mean of an apparently skewed distribution,
Johnson's modified t test is the clear choice, and the desired p-value is about 0.028.
The other tests are not valid.
For grading purposes,
some other p-values are:
- 0.0023 (Student's t test),
- 0.003 (signed-rank test (Minitab's normal approx. using midranks)),
- 0.00016 (sign test).
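The following is a minimal sketch of Johnson's modified t test. It uses the statistic as given in Johnson (1978): t1 = [(x̄ - μ0) + μ̂3/(6s²n) + μ̂3/(3s⁴)(x̄ - μ0)²] / (s/√n), referred to a t distribution with n - 1 df. Two choices are assumptions: the small-sample form of the third-moment estimate μ̂3, and which alternative (one- or two-sided) the homework intends. The t tail probability is computed by numerical integration, so only the standard library is needed.

```python
import math


def t_two_sided_p(t, df, steps=2000):
    """P(|T| >= |t|) for Student's t with df degrees of freedom.

    With u = sqrt(df) * tan(theta), the t density becomes a constant times
    cos(theta)**(df - 1) on [0, pi/2]; both integrals use Simpson's rule
    (steps must be even).
    """
    f = lambda th: math.cos(th) ** (df - 1)

    def simpson(a, b):
        h = (b - a) / steps
        s = f(a) + f(b) + sum((4 if i % 2 else 2) * f(a + i * h) for i in range(1, steps))
        return s * h / 3

    theta0 = math.atan(abs(t) / math.sqrt(df))
    return 1.0 - simpson(0.0, theta0) / simpson(0.0, math.pi / 2)


def johnson_t_test(x, mu0, alternative="two-sided"):
    """Johnson's modified t statistic t1 and its p-value (t with n - 1 df)."""
    n = len(x)
    xbar = sum(x) / n
    s2 = sum((v - xbar) ** 2 for v in x) / (n - 1)
    # Estimated third central moment; this small-sample form is an assumption.
    mu3 = n / ((n - 1) * (n - 2)) * sum((v - xbar) ** 3 for v in x)
    d = xbar - mu0
    t1 = (d + mu3 / (6 * s2 * n) + mu3 / (3 * s2 ** 2) * d ** 2) / math.sqrt(s2 / n)
    p2 = t_two_sided_p(t1, n - 1)
    if alternative == "greater":
        return t1, (p2 / 2 if t1 > 0 else 1 - p2 / 2)
    if alternative == "less":
        return t1, (p2 / 2 if t1 < 0 else 1 - p2 / 2)
    return t1, p2
```

For example, `johnson_t_test(data, mu0, "greater")` gives a one-sided result, where `data` and `mu0` stand for the homework values (not reproduced here). When μ̂3 is zero the statistic reduces to the ordinary t statistic. This reduction is why the two procedures agree for nearly symmetric data.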
part (c)
Due to the skewness, the possibilities are the sign test and the transformation ploy, in which the data
are transformed to symmetry (or very near symmetry) and the t test (or signed-rank test) is then applied
to the transformed data.
With the positive skewness and only positive data values, power transformations using powers less than 1
can be investigated. While a power of 0.1 results in a sample skewness near zero for the
transformed data, probit and symmetry plots suggest that the transformed distribution is not symmetric.
(It's possible for a distribution to have a skewness of zero and at the same time not be symmetric.)
Therefore the transformation ploy should not be used, even though the resulting p-value is smaller than
the one obtained with the sign test.
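The search over powers less than 1 can be sketched as follows. The code computes the sample skewness of x**p for a grid of candidate powers. It assumes positive data and uses the adjusted skewness estimator. The grid and the data are hypothetical. A near-zero value only flags a candidate: probit and symmetry plots of the transformed data still have to be checked.

```python
import math


def sample_skewness(x):
    """Adjusted Fisher-Pearson sample skewness (G1)."""
    n = len(x)
    mean = sum(x) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in x) / (n - 1))
    return n / ((n - 1) * (n - 2)) * sum(((v - mean) / s) ** 3 for v in x)


def skewness_by_power(x, powers):
    """Sample skewness of x**p for each candidate power p (requires x > 0)."""
    if min(x) <= 0:
        raise ValueError("these power transformations assume positive data")
    return {p: sample_skewness([v ** p for v in x]) for p in powers}


# Hypothetical positive, right-skewed data (not the homework data).
data = [math.exp(z / 2) for z in range(-4, 5)]
for p, g1 in skewness_by_power(data, [0.1, 0.25, 0.5, 0.75, 1.0]).items():
    print(f"power {p}: skewness {g1:.3f}")
```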
It seems best to just rely on the exact result from the sign test and report a p-value of 0.00016.
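The exact sign test only needs binomial probabilities, as in the sketch below. Values equal to the hypothesized median are dropped, which is the usual convention. The function name and the data in the check are hypothetical.

```python
from math import comb


def sign_test(x, m0, alternative="two-sided"):
    """Exact sign test of H0: median = m0 (observations equal to m0 are dropped)."""
    above = sum(v > m0 for v in x)
    below = sum(v < m0 for v in x)
    n = above + below

    def cdf(k):
        # P(B <= k) for B ~ Binomial(n, 1/2)
        return sum(comb(n, j) for j in range(k + 1)) / 2 ** n

    if alternative == "greater":  # H1: median > m0, so few values should fall below m0
        return cdf(below)
    if alternative == "less":
        return cdf(above)
    return min(1.0, 2 * cdf(min(above, below)))
```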
part (d)
Due to the skewness, the interval associated with the sign test can be considered, as can an interval resulting from
the transformation to symmetry ploy.
With the positive skewness and only positive data values, power transformations using powers less than 1
can be investigated. While a power of 0.1 results in a sample skewness near zero for the
transformed data, probit and symmetry plots suggest that the transformed distribution is not symmetric.
(It's possible for a distribution to have a skewness of zero and at the same time not be symmetric.)
Therefore the transformation ploy should not be used.
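The parenthetical claim that zero skewness does not imply symmetry can be checked with a small three-point distribution. The distribution is constructed for illustration: P(X = -3) = 0.1, P(X = -1) = 0.5, P(X = 2) = 0.4. Exact fractions avoid rounding issues.

```python
from fractions import Fraction as F

# Hypothetical three-point distribution: value -> probability
dist = {-3: F(1, 10), -1: F(1, 2), 2: F(2, 5)}

mean = sum(v * p for v, p in dist.items())
third_central = sum((v - mean) ** 3 * p for v, p in dist.items())
# Symmetry about the mean needs P(X = mean + d) == P(X = mean - d) for every d.
symmetric = all(dist.get(2 * mean - v, 0) == p for v, p in dist.items())
print(mean, third_central, symmetric)  # 0 0 False
```

The third central moment is exactly zero, so the skewness is zero, yet the distribution is plainly not symmetric.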
I'll go with the interval based on the sign test.
The resulting confidence interval is (14, 62).
Some other intervals are:
- (37, 82) (Student's t procedure),
- (28, 66) (signed-rank procedure).
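The sign-test interval is a pair of order statistics, (X_(k), X_(n-k+1)). Its coverage is 1 - 2P(B ≤ k - 1) with B ~ Binomial(n, 1/2). The sketch below picks the largest k whose coverage is at least the nominal level. It does not interpolate, and its name is hypothetical.

```python
from math import comb


def sign_interval(x, conf=0.95):
    """Distribution-free CI for the median from order statistics (no interpolation).

    Returns (lower, upper, achieved coverage). For very small n even
    (X_(1), X_(n)) may fall short of conf; the achieved coverage shows that.
    """
    xs = sorted(x)
    n = len(xs)

    def cdf(j):
        # P(B <= j) for B ~ Binomial(n, 1/2)
        return sum(comb(n, i) for i in range(j + 1)) / 2 ** n

    k = 1
    while k + 1 <= n // 2 and 1 - 2 * cdf(k) >= conf:
        k += 1
    return xs[k - 1], xs[n - k], 1 - 2 * cdf(k - 1)
```

For n = 25 at 95%, this gives (X_(8), X_(18)) with actual coverage of about 95.7%.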
part (e)
Due to the strong positive skewness, the smallish sample size, and the
fact that the estimand is a rather extreme quantile in the
stretched-out right tail of the distribution, E1 is the best choice, and the estimate is 120.
(Note: The skewness is almost as strong as that of an exponential
distribution, and my studies indicate that for a sample of size 25,
E1 is appreciably better than E9 for estimating the 90th percentile
of an exponential distribution (and E1 is also a bit better for a sample size
as large as 50). So E1 seems to be the clear choice here.)
For purposes of comparison (and for grading purposes), some other estimates are:
- 120 (E6),
- 120 (E2),
- 183 (E9),
- 173 (E8),
- 187 (estimator on p. 115 of class notes),
- 183 (estimator on pp. 3-4 of handout from 7th lecture),
- 215 (E4).
Problem 2
A probit plot and a symmetry plot, along with the sample skewness, suggest the possibility of somewhat mild negative skewness,
but the evidence for skewness is not conclusive. Still, given the possibility of skewness, I think it may be best to choose
Johnson's modified t procedure, although in the end we can see that the same interval results from using
the ordinary t interval.
The resulting confidence interval is
(-2.3×10³, 3.2×10³). (Since the estimated standard error of the sample mean
is 925, a rule of thumb suggests that the confidence bounds should be expressed using one additional significant digit. However,
one can note that the data values seem to have been rounded to the nearest hundred, and so in this case it doesn't seem
appropriate to express additional accuracy in the interval estimate.)
Some other intervals are:
- (-2.3×10³, 3.2×10³) (Student's t procedure),
- (-2.5×10³, 3.2×10³) (signed-rank procedure),
- (-1.6×10³, 3.9×10³) (sign procedure (approximate, based on nonlinear interpolation)).
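The approximate sign interval interpolates between two adjacent order-statistic intervals. This sketch assumes the nonlinear interpolation is the Hettmansperger-Sheather scheme that Minitab documents for its sign confidence interval. Let γ_d be the coverage of (X_(d), X_(n-d+1)), with γ_d ≥ conf > γ_(d+1). Then I = (γ_d - conf)/(γ_d - γ_(d+1)) and λ = (n - d)I / (d + (n - 2d)I). The course's procedure may differ in detail.

```python
from math import comb


def sign_interval_nli(x, conf=0.95):
    """Sign-procedure CI for the median with (assumed) Hettmansperger-Sheather interpolation."""
    xs = sorted(x)
    n = len(xs)

    def cdf(j):
        # P(B <= j) for B ~ Binomial(n, 1/2)
        return sum(comb(n, i) for i in range(j + 1)) / 2 ** n

    def gamma(d):
        # Exact coverage of (X_(d), X_(n-d+1))
        return 1 - 2 * cdf(d - 1)

    d = 1
    while d + 1 <= n // 2 and gamma(d + 1) >= conf:
        d += 1
    if d + 1 > n // 2 or gamma(d) <= conf:
        return xs[d - 1], xs[n - d]  # nothing to interpolate toward
    i_frac = (gamma(d) - conf) / (gamma(d) - gamma(d + 1))
    lam = (n - d) * i_frac / (d + (n - 2 * d) * i_frac)
    lower = (1 - lam) * xs[d - 1] + lam * xs[d]
    upper = (1 - lam) * xs[n - d] + lam * xs[n - d - 1]
    return lower, upper
```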
Problem 3
A symmetry plot and a probit plot suggest that the distribution is symmetric, or nearly symmetric, and
a sample skewness of about 0.06 supports this conclusion.
The probit plot also suggests that the underlying distribution is slightly light-tailed.
part (a)
Given the appearance of a slightly light-tailed symmetric distribution,
Student's t interval should work fine.
The resulting confidence interval is
(3.82, 4.30). (Since the estimated standard error of the sample mean
is 0.0889, a rule of thumb suggests that the confidence bounds should be expressed using one additional significant digit. However,
one can note that the data values seem to have been rounded to the nearest hundredth, and so in this case it doesn't seem
appropriate to express additional accuracy in the interval estimate.)
Some other intervals are:
- (3.82, 4.30) (Johnson's modified t procedure),
- (3.80, 4.32) (signed-rank procedure),
- (3.69, 4.46) (sign procedure (approximate, based on nonlinear interpolation)).
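Student's t interval can be sketched with only the standard library. The critical value comes from bisection on a numerically integrated t tail probability. The function also returns the estimated standard error, the quantity (0.0889 in part (a)) behind the remark about how many digits to report. The function names are hypothetical, and the check uses made-up data.

```python
import math


def t_two_sided_p(t, df, steps=2000):
    """P(|T| >= |t|) for Student's t, via u = sqrt(df)*tan(theta) and Simpson's rule."""
    f = lambda th: math.cos(th) ** (df - 1)

    def simpson(a, b):
        h = (b - a) / steps  # steps must be even
        s = f(a) + f(b) + sum((4 if i % 2 else 2) * f(a + i * h) for i in range(1, steps))
        return s * h / 3

    theta0 = math.atan(abs(t) / math.sqrt(df))
    return 1.0 - simpson(0.0, theta0) / simpson(0.0, math.pi / 2)


def t_critical(conf, df):
    """Value c with P(|T| <= c) = conf, found by bisection on the tail probability."""
    lo, hi = 0.0, 1.0
    while t_two_sided_p(hi, df) > 1 - conf:  # grow hi until it brackets c
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_two_sided_p(mid, df) > 1 - conf:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def t_interval(x, conf=0.95):
    """Student's t interval for the mean, plus the estimated standard error."""
    n = len(x)
    xbar = sum(x) / n
    se = math.sqrt(sum((v - xbar) ** 2 for v in x) / (n - 1) / n)
    c = t_critical(conf, n - 1)
    return xbar - c * se, xbar + c * se, se
```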
part (b)
Since the sample size is not too small, and there is very little or no skewness,
E9 is the best choice, and the estimate is 4.98.
(Note: My studies indicate that even for a sample size as small as 25,
E9 is appreciably better than E1 for estimating the 90th percentile
of a normal distribution.)
For purposes of comparison (and for grading purposes), some other estimates are:
- 4.98 (E8),
- 4.99 (estimator on pp. 3-4 of handout from 7th lecture),
- 4.99 (estimator on p. 115 of class notes),
- 4.93 (E1),
- 4.90 (E2),
- 4.90 (E6),
- 5.02 (E4).
Problem 4
There are signs of heavy tails, and possibly skewness.
(Recall, it's harder to judge the skewness/symmetry of a heavy-tailed
distribution --- the wildness due to the heavy tails of a symmetric
distribution can create an appearance of mild skewness.)
Since we can't be sure about the symmetry/skewness issue, the safe thing
to do is to use a Huber M-estimate since it should be decent whether the
underlying distribution is mildly skewed or symmetric.
(When heavy tails are the dominant feature, it really doesn't matter much if the
distribution is perfectly symmetric or a little bit skewed.)
Since the tail weight appears to be more than just slightly greater than that
of a normal distribution (based on an inspection of several Q-Q plots, as well
as the sample kurtosis), we should avoid using 1.5 as a bend (since
that choice would be appropriate for something only slightly
heavy-tailed, like a logistic distribution), and instead use a bend of
1.345, or perhaps even 1.2. A 20% trimmed mean could also be a decent
choice.
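A one-step Huber M-estimate with a chosen bend can be sketched as a single Newton step from the sample median. The scale is MADN = MAD/0.6745. Observations more than `bend` MADNs from the median are pulled in to the boundary. The update divides by the count of observations inside the band, since Huber's ψ has derivative 1 there and 0 outside. This is one common form. The class notes' version may differ in details such as the starting value or scale.

```python
import statistics


def madn(x):
    """Normalized median absolute deviation, MAD / 0.6745."""
    m = statistics.median(x)
    return statistics.median(abs(v - m) for v in x) / 0.6745


def huber_one_step(x, bend=1.345):
    """One-step Huber M-estimate of location, starting from the median."""
    m = statistics.median(x)
    scale = madn(x)
    if scale == 0:
        return m
    u = [(v - m) / scale for v in x]
    psi_sum = sum(max(-bend, min(bend, ui)) for ui in u)  # Huber psi clips at +/- bend
    inside = sum(abs(ui) <= bend for ui in u)             # where psi'(u) = 1
    return m + scale * psi_sum / inside
```

Changing `bend` from 1.5 to 1.345 or 1.2 clips more observations, which is the adjustment the paragraph recommends for heavier tails.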
Among the trimmed means, the ones having 15%, 20%, and 25% trimming
have the lowest estimated standard errors (although with such a small
sample size we shouldn't take the estimated standard errors too
seriously). Also, these give the same value, 0.43, as the two M-estimates when
rounded to two significant digits.
(It's nice when several of the top candidates all give the same estimate.)
It should be noted that trimming just 10% is too little, since various Q-Q plots
suggest more than just a slightly heavy-tailed distribution. (It should also be
recalled that the sample kurtosis based on only 20 observations may not be
very accurate, and in this case it seems a bit low given the tail weight indicated by the
Q-Q plots.)
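Trimmed means, including the upper-trimmed versions listed next, and an estimated standard error can be sketched as follows. The standard error here is the Tukey-McLaughlin form s_w / ((1 - 2g/n)√n), where s_w is the standard deviation of the g-Winsorized sample. That form is a common choice but may not be the formula behind the estimated standard errors mentioned in the paragraph. For n = 20, 15% trimming corresponds to g = 3.

```python
import math


def trimmed_mean(x, g_low, g_high=None):
    """Mean after dropping the g_low smallest and g_high largest values."""
    g_high = g_low if g_high is None else g_high
    xs = sorted(x)
    kept = xs[g_low:len(xs) - g_high]
    return sum(kept) / len(kept)


def trimmed_mean_se(x, g):
    """Tukey-McLaughlin SE of a symmetric g-trimmed mean (assumed formula)."""
    xs = sorted(x)
    n = len(xs)
    w = [min(max(v, xs[g]), xs[n - g - 1]) for v in xs]  # Winsorize g from each end
    wbar = sum(w) / n
    s_w = math.sqrt(sum((v - wbar) ** 2 for v in w) / (n - 1))
    return s_w / ((1 - 2 * g / n) * math.sqrt(n))
```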
Various estimates are:
- 0.41 (10% upper trimmed mean (trimming 2 from upper end)),
- 0.43 (5% upper trimmed mean (trimming 1 from upper end)),
- 0.45 (sample mean),
- 0.45 (5% trimmed mean (trimming 1 from each end)),
- 0.44 (10% trimmed mean (trimming 2 from each end)),
- 0.43 (15% trimmed mean (trimming 3 from each end)),
- 0.43 (20% trimmed mean (trimming 4 from each end)),
- 0.43 (one-step Huber M-estimate using bend of 1.345),
- 0.43 (one-step Huber M-estimate using bend of 1.2),
- 0.44 (one-step Huber M-estimate using bend of 1.5),
- 0.43 (25% trimmed mean (trimming 5 from each end)),
- 0.44 (30% trimmed mean (trimming 6 from each end)),
- 0.44 (35% trimmed mean (trimming 7 from each end)),
- 0.44 (sample median),
- 0.44 (Harrell-Davis estimate),
- 0.44 (Hodges-Lehmann estimate).
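For reference, the Hodges-Lehmann estimate in the list is the median of the Walsh averages (x_i + x_j)/2 taken over all pairs with i ≤ j. A short sketch (the check values are made up):

```python
import statistics
from itertools import combinations_with_replacement


def hodges_lehmann(x):
    """Median of the Walsh averages (x_i + x_j)/2 over all pairs i <= j."""
    return statistics.median((a + b) / 2 for a, b in combinations_with_replacement(x, 2))
```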
(It should be noted that the results of the studies that I've done (and presented to you)
make it pretty clear that when the sample size is smallish, the sample
median is typically not a good choice, and the Harrell-Davis estimator is also
generally inferior to several other choices.)
(FYI, the value of MADN is about 0.1038.)
Problem 5
A probit plot suggests negative skewness.
The skewness is also suggested by a symmetry plot, and
a sample skewness of about -0.6 also provides
evidence of mild negative skewness.
part (a)
Since the desired test is about the mean of an apparently skewed distribution,
Johnson's modified t test is the clear choice, and the desired p-value is about
0.01 (rounded from 0.012).
The other tests are not valid.
For grading purposes,
some other p-values are:
- 0.020 (Student's t test),
- 0.052 (signed-rank test (Minitab's normal approx. using midranks)),
- 0.25 (sign test).
part (b)
Due to the skewness, the possibilities are the sign test and the transformation ploy, in which the data
are transformed to symmetry (or very near symmetry) and the t test (or signed-rank test) is then applied
to the transformed data.
With the negative skewness and only positive data values, power transformations using powers greater than 1
can be investigated, but no simple power transformation will do much to correct the skewness. However, the
transformation y = (x - 9.55)**2.45 does pretty well,
as do the transformations
y = (x - 9.5)**2.7 and
y = (x - 9.4)**3.25.
All of these transformations result in a p-value of about 0.06 when a t test is done.
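The shifted-power version of the transformation ploy can be sketched as follows. It assumes part (b) concerns the median, as the sign test being a candidate suggests. The transformation y = (x - c)**p is increasing when every x exceeds c. So it maps the hypothesized median m0 to (m0 - c)**p, and if Y is symmetric its mean equals that median, which is what the t test checks. The p-value is two-sided, so halve it for a one-sided alternative. Example calls are `shifted_power_t_test(data, m0, 9.55, 2.45)`, `(..., 9.5, 2.7)` and `(..., 9.4, 3.25)`, with the homework data and null value (not reproduced here).

```python
import math


def t_two_sided_p(t, df, steps=2000):
    """P(|T| >= |t|) for Student's t, via u = sqrt(df)*tan(theta) and Simpson's rule."""
    f = lambda th: math.cos(th) ** (df - 1)

    def simpson(a, b):
        h = (b - a) / steps  # steps must be even
        s = f(a) + f(b) + sum((4 if i % 2 else 2) * f(a + i * h) for i in range(1, steps))
        return s * h / 3

    theta0 = math.atan(abs(t) / math.sqrt(df))
    return 1.0 - simpson(0.0, theta0) / simpson(0.0, math.pi / 2)


def sample_skewness(x):
    """Adjusted Fisher-Pearson sample skewness (G1)."""
    n = len(x)
    mean = sum(x) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in x) / (n - 1))
    return n / ((n - 1) * (n - 2)) * sum(((v - mean) / s) ** 3 for v in x)


def shifted_power_t_test(x, m0, shift, power):
    """t test of H0: median(Y) = (m0 - shift)**power on y = (x - shift)**power.

    Returns (skewness of y, t statistic, two-sided p-value). Only meaningful
    if y looks symmetric, so check plots of y as well as its skewness.
    """
    if min(x) <= shift or m0 <= shift:
        raise ValueError("all values (and m0) must exceed the shift")
    y = [(v - shift) ** power for v in x]
    target = (m0 - shift) ** power
    n = len(y)
    ybar = sum(y) / n
    se = math.sqrt(sum((v - ybar) ** 2 for v in y) / (n - 1) / n)
    t = (ybar - target) / se
    return sample_skewness(y), t, t_two_sided_p(t, n - 1)
```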
Even though none of these transformations may be perfect, the fact that they are all pretty good and
result in the same p-value (when rounded appropriately) suggests that some similar transformation
which does achieve perfect symmetry will result in about the same p-value. (That is, the transformation
technique appears to be pretty robust in this setting.) So it seems best to use a transformation followed by a t
test. (Note: Since applying the transformation method is pretty tricky in this case, I didn't really expect anyone
to go this route and feel confident about doing so. Therefore, if you simply reported the p-value of 0.25 from the sign test
on the original data, and at least gave the transformation ploy
adequate consideration, I'll give you almost full credit (even though that p-value is about 4 times larger than the one resulting
from the transformation ploy).)