Some Notes Pertaining to Ch. 10 of E&T
This chapter focuses on bias. While typically not as important as standard error, there are times when one
may want to estimate it.
Two methods of using bootstrapping to estimate bias will be covered, with the simple, straightforward method being
inferior to the other method (which is covered in more detail in Ch. 23). The jackknife estimate of bias
will also be covered.
The bias of an estimator is given by (10.1) on p. 124, and the (ideal) bootstrap estimate of bias is given by
(10.2) on p. 125 --- it's just the plug-in estimate of bias (i.e., the empirical distribution takes the place of the
unknown underlying distribution in (10.1)). The plug-in estimate of θ takes the place of the unknown θ in
(10.1), and the expectation is with respect to the known empirical distribution instead of the unknown
(real world) distribution. Except for a few easy cases, the expected value in (10.2) has to be estimated using the
sample mean of the bootstrap replicates of the estimate of θ.
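The procedure just described can be sketched in a few lines of R. This is only an illustrative sketch; the data, the choice of statistic (the plug-in variance, which is known to be biased downward), and the value of B are my own, not E&T's:

```r
# Simple bootstrap estimate of bias: the sample mean of the bootstrap
# replicates takes the place of the expected value in (10.2).
# The data, the statistic (plug-in variance), and B are my own choices.
set.seed(1)
x <- rnorm(20)                                  # hypothetical sample
n <- length(x)
plugin.var <- function(v) mean((v - mean(v))^2) # plug-in estimate of the variance

theta.hat <- plugin.var(x)                      # plug-in estimate of theta
B <- 2000
theta.star <- replicate(B, plugin.var(sample(x, n, replace = TRUE)))

bias.hat <- mean(theta.star) - theta.hat        # estimate of (10.2)
```

Since the plug-in variance underestimates the true variance, the resulting bias estimate should come out negative.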
Before continuing in E&T, I encourage you to read over this
description of the bootstrap estimate of bias that I prepared for a special
topics class in the summer of 2002. Even though it is short and simple, I think it may give some of you some
additional insight about the procedure of using bootstrapping to estimate bias.
Here is some of the data in Table 10.1 on p. 127 of E&T.
This R code can be used to obtain the estimate given by (10.11) on p. 128 of E&T, and produce results
similar to those given by (10.13) and Fig. 10.1.
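Since the linked code isn't reproduced here, here is a hedged sketch of the same computation for a ratio statistic of the form (10.10). The paired data (z, y) below are hypothetical placeholders, not the patch data of Table 10.1; B = 400 matches the number of replicates E&T use:

```r
# Hedged sketch of a bootstrap bias estimate for a ratio statistic like
# (10.10). The data below are made up, NOT the patch data of Table 10.1.
set.seed(8)
z <- rnorm(8, mean = 6000, sd = 1000)  # hypothetical, stands in for the z_i
y <- rnorm(8, mean = -100, sd = 400)   # hypothetical, stands in for the y_i
ratio <- function(idx) mean(y[idx]) / mean(z[idx])

theta.hat  <- ratio(1:8)
theta.star <- replicate(400, ratio(sample(8, 8, replace = TRUE)))
bias.hat   <- mean(theta.star) - theta.hat      # analogous to (10.11)
```

A histogram of theta.star would give a plot similar in spirit to Fig. 10.1.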
From the result given in (10.14) it follows that if the magnitude of the bias is no more than 1/4 the value of the
standard error, then the root mean square error (RMSE) is no more than about 3.1% greater than the standard error,
and so if the bias is rather small compared to the standard error, its contribution to the RMSE is perhaps of small
importance. (We also have that
if the magnitude of the bias is no more than 1/4 the value of the
standard error, then the mean square error (MSE) is no more than 6.25% greater than the variance (the squared standard error).)
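The arithmetic behind these two percentages can be checked directly:

```r
# Checking the two percentages: with |bias| = se/4,
#   RMSE = sqrt(se^2 + bias^2) = se * sqrt(1 + 1/16), about 1.0308 * se,
#   MSE  = se^2 + bias^2       = (1 + 1/16) * se^2  = 1.0625 * se^2.
se   <- 1
bias <- se / 4
rmse.ratio <- sqrt(se^2 + bias^2) / se   # about 1.0308 (3.1% above se)
mse.ratio  <- (se^2 + bias^2) / se^2     # exactly 1.0625 (6.25% above the variance)
```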
The approximate equality in (10.14) is due to a Maclaurin series approximation. (I'll try to remember to go over it
in class.)
The results given on the top portion of p. 130 suggest that unless the estimated standard error divided by the square
root of B is small relative to the estimate of the bias, B may be too small to allow us to treat
the estimate of the bias as being reasonably accurate. E&T point out that while 400 is typically more than enough
bootstrap replicates if an estimate of the standard error is desired, it may be that many more replicates are needed
to estimate the bias accurately. (Note: If the improved bootstrap estimate of bias described in Sec. 10.4 is used
instead of the simple bootstrap estimate of bias described in Sec. 10.2, then 400 (or even fewer)
replicates may be adequate to
estimate the bias.)
Sec. 10.4 describes a better bootstrap estimate of bias. (An explanation of why it works better is delayed until
Ch. 23.) The improved bootstrap estimate of bias can only be used when the estimator is the plug-in estimator of the
estimand.
The improved estimate of bias makes use of the resampling vectors. A
resampling vector gives the proportions of the various members of the original sample in a bootstrap sample. (See
(10.17) and (10.18) on the bottom portion of p. 130.)
In comparing the simple estimate given by (10.24) to the improved estimate given by (10.25), it can be seen
that the plug-in estimate of θ that is in the simple estimate is replaced by something else in the improved
estimate. While the plug-in estimate is exactly what is prescribed by the ideal estimate (given by (10.2)), the
substitution used in the better estimate can be thought of as compensating for the fact that the Monte Carlo estimate
of the expected value in the ideal estimate may not exactly equal the desired expected value. In the better bootstrap
estimate, the substitution for the plug-in estimate is in a way more compatible with the estimated expected value
than the plug-in estimate is, and this results in a better estimate of bias. (In the simple estimate, if
the average of the observed resampling vectors is not equal to the vector given in (10.21), which corresponds to the
empirical distribution, then the estimated expected value corresponds to a distribution other than the empirical
distribution. So we don't have a perfect match between the two terms of the simple estimate. In the improved
estimate, we change the second term to make it more compatible with the first term. (Note: The first term is the
same for both bootstrap estimates of bias.))
For an example of the improved estimator performing better, we can consider the sample mean as an estimator for the
distribution mean. The sample mean is unbiased, but the simple bootstrap estimator of bias will typically not yield
an estimate of 0. However, the improved bootstrap estimator will yield 0 as the estimated bias every time.
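A small R sketch (with made-up data) illustrates this. For the sample mean, t(P*) is just the inner product of the resampling vector with the data, so the improved estimate (10.25) comes out exactly 0, while the simple estimate (10.24) typically does not:

```r
# Simple (10.24) vs improved (10.25) bootstrap bias estimates for the sample
# mean, with made-up data. Row b of P is the resampling vector of bootstrap
# sample b, and t(P*) for the mean is the inner product of P* with x.
set.seed(2)
x <- rnorm(10)
n <- length(x)
B <- 500
P <- t(replicate(B, tabulate(sample(n, n, replace = TRUE), nbins = n) / n))
theta.star <- as.vector(P %*% x)        # bootstrap replicates of the mean

simple   <- mean(theta.star) - mean(x)              # (10.24): usually not 0
improved <- mean(theta.star) - sum(colMeans(P) * x) # (10.25): exactly 0 here
```

The improved estimate is 0 (up to floating-point error) because the mean of the replicates is itself the average resampling vector applied to x.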
This R code also gives results similar to those given on p. 132, and given in Figure 10.2 on p. 133.
Sec. 10.5 introduces the jackknife estimates of bias and standard error. Basically just the formulas for the
estimates are given in Ch. 10, and more details are supplied in Ch. 11.
The ith jackknife sample is given by (10.28) on p. 133. (Note: Although the notation is like order
statistics notation, since order statistics don't make sense for a vector, there should be no confusion --- you just
have to get used to the notation.)
The ith jackknife replicate is given by (10.29) on p. 134. It's just the statistic of interest
evaluated with the sample being
the ith jackknife sample.
The jackknife estimate of bias is given by (10.30) and (10.31) on p. 134. It only applies to estimates of
bias for plug-in estimators, and only for smooth statistics. (Note: The concept of smoothness is
addressed below.) The computation of the jackknife estimate of bias is faster than the computation of a bootstrap
estimate of bias, since for the jackknife estimate only n additional evaluations of the statistic of interest
are needed, whereas for the improved bootstrap estimate, at least 200 are needed, and for the simple bootstrap
estimate many times that number are needed. (Note: The simple bootstrap estimate of bias is the only one which does
not require that the estimator be a plug-in estimator.) Another nice feature of the jackknife estimate is that no
randomness is involved.
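A minimal sketch of the jackknife bias estimate, using made-up data and the plug-in variance as the (smooth, plug-in) statistic:

```r
# Jackknife estimate of bias, (10.30)-(10.31): only n leave-one-out
# evaluations of the statistic are needed. Made-up data; the statistic
# here is the plug-in variance.
set.seed(3)
x <- rnorm(15)
n <- length(x)
plugin.var <- function(v) mean((v - mean(v))^2)

theta.hat  <- plugin.var(x)
theta.jack <- sapply(1:n, function(i) plugin.var(x[-i]))  # ith jackknife replicate
bias.jack  <- (n - 1) * (mean(theta.jack) - theta.hat)    # (10.30)
```

For this particular statistic the correction happens to be exact: theta.hat - bias.jack equals the usual unbiased sample variance var(x).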
An example of a statistic which is not smooth
is the sample median, and an example of a smooth statistic is the sample mean.
In Ch. 11, E&T address smoothness by considering what happens to the value of a statistic as one data value (or the
sample in general) is changed gradually. (I don't like part of their description of smoothness in Ch. 11 --- see
my Ch. 11 web page for what I don't like.)
In Ch. 10 they address smoothness by considering the statistic t(P*). (Note: It doesn't
make sense that they use a lower-case t everywhere except for the one place where they have T.)
We can express the sample mean as P1* x1 + P2* x2 + ... + Pn* xn. If we change the Pi* gradually, the sample
mean changes gradually. But for the sample median this is not the case --- gradual changes in the Pi* can result
in abrupt changes in the value of the sample median. For example, suppose that we have that the sum of the Pi*
values corresponding to the first 4 order statistics of the original sample is slightly less than 0.5, and the
sum of the Pi* values corresponding to the first 5 order statistics of the original sample is slightly greater
than 0.5, making the 5th order statistic of the original sample the sample median of a bootstrap sample
corresponding to P*. But if we change two of the Pi* values slightly so that the sum of the Pi* values
corresponding to the first 4 order statistics of the original sample becomes slightly greater than 0.5,
the sample median of the bootstrap sample corresponding to P* changes abruptly from the value of the 5th order
statistic of the original sample to the value of the 4th order statistic of the original sample.
E&T state that (10.20) must be "twice differentiable" --- this means that we
can differentiate (10.20) with respect to any of the Pi* twice.
The sample mean of a bootstrap sample can be expressed as P1* x1 + P2* x2 + ... + Pn* xn. We can differentiate
this with respect to, say, P2* and obtain x2. Differentiating again with respect to P2* gives us 0. The sample
mean, as expressed above, is twice differentiable with respect to any of the Pi*. The sample median, expressed
in terms of the Pi*, is more difficult to investigate. But it can be shown that the sample median is not a
continuous function of the Pi* (there are jump discontinuities), and is therefore not everywhere differentiable
(and so it's certainly not twice differentiable).
This R code can be used to obtain the jackknife estimate of bias given by (10.32) on p. 134.
E&T point out that it is no surprise that it is rather close to the ideal bootstrap estimate of bias (which
was estimated using B = 100,000). This is due to the fact (shown in Ch. 20) that the jackknife estimate is a
quadratic Taylor series approximation of the ideal bootstrap estimate (aka the plug-in estimate).
All three of the estimates of bias presented in Ch. 10 (the two bootstrap estimates, and the jackknife estimate) are
attempts to approximate the ideal estimate. It should be kept in mind that the ideal estimate isn't perfect. By
letting B tend to infinity, the variability due to Monte Carlo sampling is eliminated, but the inaccuracy due
to the fact that the empirical distribution is just an estimate of the true underlying distribution (used to
determine the true bias) doesn't go away as B increases. That is, as in most every case dealing with
estimation, the sample size limits our ability to always produce an estimate as close as we want to the estimand.
The bottom 60% of p. 135 and pp. 136-137 of E&T may be a bit hard to follow. The second new paragraph on p. 136
suggests that one could investigate the variability of the bootstrap estimate of bias by using bootstrapping to
estimate the standard error of the bootstrap estimate. But this would involve double bootstrapping (aka
nested bootstrapping) --- for each
bootstrap replicate of the bias estimate needed for the standard error estimate, bootstrapping would have to be done.
Thus we'd have a bootstrapping routine nested within a bootstrapping routine, and that requires a lot of
computation.
(To perhaps make this clearer, I'll expand on the explanation. To obtain a bootstrap estimate of the standard
error of the bootstrap estimator of bias, we need B bootstrap replicates of the estimate of bias. So we need
B bootstrap samples drawn from the original data. From each of these B bootstrap samples, we need a
bootstrap estimate of bias. But this involves bootstrapping. So C bootstrap samples are drawn from each of
the B bootstrap samples. So altogether, B + BC = B(1 + C) bootstrap
samples are drawn, and computations have to be done using each one of this large number of bootstrap samples.)
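To make the nesting concrete, here is a sketch with made-up data and deliberately small B and C; the inner level produces a bias estimate, and the outer level estimates the standard error of that estimator:

```r
# A small double (nested) bootstrap, with made-up data: B outer samples,
# and C inner samples drawn from each outer sample, B*(1 + C) samples in all.
set.seed(4)
x <- rnorm(12)
plugin.var <- function(v) mean((v - mean(v))^2)
boot.bias <- function(v, C) {
  m <- length(v)
  mean(replicate(C, plugin.var(sample(v, m, replace = TRUE)))) - plugin.var(v)
}
B <- 50
C <- 100
bias.reps  <- replicate(B, boot.bias(sample(x, length(x), replace = TRUE), C))
se.of.bias <- sd(bias.reps)    # bootstrap SE of the bias estimator
```

Even with these small values, 50 * (1 + 100) = 5050 bootstrap samples are used, which shows how quickly the computation grows.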
So E&T decided that they'd investigate the variability of the jackknife estimator of bias since doing so
would not involve double bootstrapping, but one can still get an idea about the amount of variability in bias
estimators due to the fact that we have to use a limited amount of data to estimate aspects of an unknown
distribution.
Even though the jackknife estimate of bias is more complicated than many of the estimates considered thus far in E&T,
upon creating B = 50 bootstrap samples from the original data, 50 replicates of the jackknife estimate of bias
(of the ratio estimator corresponding to (10.10) on p. 128) can be computed, and a bootstrap estimate of the standard
error of the jackknife estimator of bias can be obtained. E&T got the estimated standard error to be about 0.0081,
which is larger than their original jackknife estimate of 0.0080. Thus it may be that the standard deviation of the
estimator is larger than the value being estimated, which doesn't give us much confidence about the accuracy of the
original jackknife point estimate of bias. (Note: This shouldn't be taken as a suggestion that the jackknife
method for estimating bias is defective --- rather, one should conclude that it just doesn't work so well in this
very small sample size (n = 8) situation.)
Despite the variability of the jackknife estimator of bias, E&T point out on p. 136 that the exercise of estimating
that variability still provides some useful information. Even though we shouldn't be highly
confident that the bias of the ratio estimator is close to 0.0080, the bootstrap replicates of the bias estimate do
suggest that it may be reasonably safe to assume that the bias of the ratio estimator is no more than 0.025. Even
though 0.025 is more than three times the original estimate of bias, it is still considerably less than the estimated
standard error of the ratio estimator, which is 0.105 (a value obtained by bootstrapping and given right below
(10.12) on p. 128.) So, using (10.14) on p. 128, it can be concluded that the bias of the ratio estimator is
somewhat negligible compared to the standard error of the estimator. This means that it isn't so important that we
cannot estimate the bias with high accuracy.
The conclusion above is predicated on the assumption that the bootstrap estimate of the standard error of the ratio
estimator is reasonably accurate. To check on the accuracy of the bootstrap estimator of standard error which
supplied the estimate of 0.105 used above, one could obtain a bootstrap estimate of the standard error of the
bootstrap estimator of the standard error of the ratio estimator. But this would involve a double bootstrapping
procedure. So E&T opted to estimate the standard error of the jackknife estimator of the standard error of the ratio
estimator instead (since one could then use the jackknife estimator of standard error to learn something about the
value of the standard error of interest and the uncertainty associated with an estimate of it).
The jackknife estimate of standard error is given by (10.34) on p. 136. (An explanation of the jackknife
estimate of standard error is delayed until Ch. 11, and some additional information will be given in later chapters.)
The jackknife estimate of standard error should be used only with smooth estimators (as is the case with the
jackknife estimate of bias). In some situations (more on this later), the jackknife estimator of standard error will
be clearly inferior to the bootstrap estimators of standard error.
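Here is a sketch of (10.34) with made-up data. For the sample mean, the jackknife standard error formula reduces algebraically to sd(x)/sqrt(n), which gives an easy check on the computation:

```r
# Jackknife estimate of standard error, (10.34), with made-up data.
# The statistic is the sample mean, for which (10.34) reduces to sd(x)/sqrt(n).
set.seed(5)
x <- rnorm(25)
n <- length(x)
theta.jack <- sapply(1:n, function(i) mean(x[-i]))  # leave-one-out means
se.jack <- sqrt(((n - 1) / n) * sum((theta.jack - mean(theta.jack))^2))
```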
This R code can also be used to obtain the jackknife estimate of standard error given by (10.35) on p. 137.
E&T used 200 bootstrap replicates of the jackknife estimate of the standard error of the ratio estimator to estimate
the standard error of the jackknife estimator of the standard error of the ratio estimator. They found that, in a
relative sense, there is considerably less uncertainty associated with the jackknife standard error estimator in this
case, and in other settings as well it is generally true that standard error can be estimated with less relative
uncertainty than bias.
Near the end of Sec. 10.5, on the bottom part of p. 137, E&T consider an alternative type of bias, median
bias, given by (10.37). Basically, a median just replaces an expected value in the usual definition of bias.
The median bias of an estimator can be estimated using bootstrapping in a very obvious way. However, median bias is
not used a lot. Nevertheless, since the estimate of median bias can be computed with little extra work once the
replicates are obtained to produce a bootstrap estimate of (ordinary) bias,
this R code
can be used to estimate the median bias of the ratio estimator considered in Sec. 10.5.
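Since the linked code isn't shown here, a hedged sketch of the idea follows, with made-up (skewed) data and the plug-in variance standing in for the ratio estimator of Sec. 10.5:

```r
# Median bias, (10.37), estimated by bootstrapping: the median of the
# bootstrap replicates replaces their sample mean. Made-up data; the
# plug-in variance stands in for the ratio estimator of Sec. 10.5.
set.seed(6)
x <- rexp(20)
n <- length(x)
plugin.var <- function(v) mean((v - mean(v))^2)

theta.hat  <- plugin.var(x)
theta.star <- replicate(2000, plugin.var(sample(x, n, replace = TRUE)))

bias.hat        <- mean(theta.star)   - theta.hat  # ordinary bias estimate
median.bias.hat <- median(theta.star) - theta.hat  # median bias estimate
```

As noted above, the same replicates serve for both estimates, so the extra work is just one call to median.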
Sec. 10.6 considers the dangerous practice of using an estimator of bias to develop a bias-corrected estimator.
The obvious form of a bias-corrected estimator is given by (10.40) on p. 138. If one uses the simple bootstrap
estimator of bias, (10.40) is equal to (10.41). Although bias-correction may seem like a good idea, the addition of
the bias correction term to an estimator can result in an increase in variance that is not offset by the reduction
in bias. (One way to check on this would be to use bootstrapping to obtain estimates of the standard errors of the
bias-corrected estimator, and the uncorrected estimator.)
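A sketch of the correction, using made-up data and the plug-in variance; with the simple bootstrap bias estimate, (10.40) simplifies to the 2*theta.hat - mean(theta.star) form of (10.41):

```r
# Bias-corrected estimator (10.40)/(10.41) using the simple bootstrap bias
# estimate: theta.hat - bias.hat = 2*theta.hat - mean(theta.star).
# Made-up data; the plug-in variance is the (downward-biased) estimator.
set.seed(7)
x <- rnorm(20)
n <- length(x)
plugin.var <- function(v) mean((v - mean(v))^2)

theta.hat  <- plugin.var(x)
theta.star <- replicate(2000, plugin.var(sample(x, n, replace = TRUE)))
corrected  <- 2 * theta.hat - mean(theta.star)   # (10.41)
```

Here the correction pushes the estimate upward (toward the unbiased sample variance), but whether such a correction is worthwhile in general depends on the variance penalty discussed above.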
I've never had much success with bias-corrected estimators, but there are some situations in which they perform well.
On p. 138, E&T suggest that bias correction may work okay with the ratio estimator from Sec. 10.5. Also, for
prediction error estimation, bias correction can be useful, since some estimators of prediction error have a bias
which is large relative to the standard error. (This last topic is considered in Ch. 17 of E&T.)