Some Notes Pertaining to Ch. 10 of E&T
This chapter focuses on bias. While typically not as important as standard error, there are times when one
may want to estimate it.
Two methods of using bootstrapping to estimate bias will be covered, with the simple, straightforward method being
inferior to the other method (which is covered in more detail in Ch. 23). The jackknife estimate of bias
will also be covered.
The bias of an estimator is given by (10.1) on p. 124, and the (ideal) bootstrap estimate of bias is given by
(10.2) on p. 125 --- it's just the plug-in estimate of bias (i.e., the empirical distribution takes the place of the
unknown underlying distribution in (10.1)). The plug-in estimate of θ takes the place of the unknown θ in
(10.1), and the expectation is with respect to the known empirical distribution instead of the unknown
(real world) distribution. Except for a few easy cases, the expected value in (10.2) has to be estimated using the
sample mean of the bootstrap replicates of the estimate of θ.
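The procedure just described can be sketched in a few lines of R. This is only an illustrative sketch; the data, the choice of statistic (the plug-in variance, which is known to be biased downward), and the value of B are my own, not E&T's:

```r
# Simple bootstrap estimate of bias: the sample mean of the bootstrap
# replicates takes the place of the expected value in (10.2).
# The data, the statistic (plug-in variance), and B are my own choices.
set.seed(1)
x <- rnorm(20)                                  # hypothetical sample
n <- length(x)
plugin.var <- function(v) mean((v - mean(v))^2) # plug-in estimate of the variance

theta.hat <- plugin.var(x)                      # plug-in estimate of theta
B <- 2000
theta.star <- replicate(B, plugin.var(sample(x, n, replace = TRUE)))

bias.hat <- mean(theta.star) - theta.hat        # estimate of (10.2)
```

Since the plug-in variance underestimates the true variance, the resulting bias estimate should come out negative.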
Before continuing in E&T, I encourage you to read over this
description of the bootstrap estimate of bias that I prepared for a special
topics class in the summer of 2002. Even though it is short and simple, I think it may give some of you some
additional insight about the procedure of using bootstrapping to estimate bias.
Here is some of the data in Table 10.1 on p. 127 of E&T.
This R code can be used to obtain the estimate given by (10.11) on p. 128 of E&T, and produce results
similar to those given by (10.13) and Fig. 10.1.
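Since the linked code isn't reproduced here, here is a hedged sketch of the same computation for a ratio statistic of the form (10.10). The paired data (z, y) below are hypothetical placeholders, not the patch data of Table 10.1; B = 400 matches the number of replicates E&T use:

```r
# Hedged sketch of a bootstrap bias estimate for a ratio statistic like
# (10.10). The data below are made up, NOT the patch data of Table 10.1.
set.seed(8)
z <- rnorm(8, mean = 6000, sd = 1000)  # hypothetical, stands in for the z_i
y <- rnorm(8, mean = -100, sd = 400)   # hypothetical, stands in for the y_i
ratio <- function(idx) mean(y[idx]) / mean(z[idx])

theta.hat  <- ratio(1:8)
theta.star <- replicate(400, ratio(sample(8, 8, replace = TRUE)))
bias.hat   <- mean(theta.star) - theta.hat      # analogous to (10.11)
```

A histogram of theta.star would give a plot similar in spirit to Fig. 10.1.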
From the result given in (10.14) it follows that if the magnitude of the bias is no more than 1/4 the value of the
standard error, then the root mean square error (RMSE) is no more than about 3.1% greater than the standard error,
and so if the bias is rather small compared to the standard error, its contribution to the RMSE is perhaps of small
importance. (We also have that
if the magnitude of the bias is no more than 1/4 the value of the
standard error, then the mean square error (MSE) is no more than 6.25% greater than the variance (the squared standard error).)
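The arithmetic behind these two percentages can be checked directly:

```r
# Checking the two percentages: with |bias| = se/4,
#   RMSE = sqrt(se^2 + bias^2) = se * sqrt(1 + 1/16), about 1.0308 * se,
#   MSE  = se^2 + bias^2       = (1 + 1/16) * se^2  = 1.0625 * se^2.
se   <- 1
bias <- se / 4
rmse.ratio <- sqrt(se^2 + bias^2) / se   # about 1.0308 (3.1% above se)
mse.ratio  <- (se^2 + bias^2) / se^2     # exactly 1.0625 (6.25% above the variance)
```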
The approximate equality in (10.14) is due to a Maclaurin series approximation. (I'll try to remember to go over it
in class.)
The results given on the top portion of p. 130 suggest that unless the estimated standard error divided by the square
root of B is small relative to the estimate of the bias, B may be too small to allow us to treat
the estimate of the bias as being reasonably accurate. E&T point out that while 400 is typically more than enough
bootstrap replicates if an estimate of the standard error is desired, it may be that many more replicates are needed
to estimate the bias accurately. (Note: If the improved bootstrap estimate of bias described in Sec. 10.4 is used
instead of the simple bootstrap estimate of bias described in Sec. 10.2, then 400 (or even fewer)
replicates may be adequate to
estimate the bias.)
Sec. 10.4 describes a better bootstrap estimate of bias. (An explanation of why it works better is delayed until
Ch. 23.) The improved bootstrap estimate of bias can only be used when the estimator is the plug-in estimator of the
estimand.
The improved estimate of bias makes use of the resampling vectors. A
resampling vector gives the proportions of the various members of the original sample in a bootstrap sample. (See
(10.17) and (10.18) on the bottom portion of p. 130.)
In comparing the simple estimate given by (10.24) to the improved estimate given by (10.25), it can be seen
that the plug-in estimate of θ that is in the simple estimate is replaced by something else in the improved
estimate. While the plug-in estimate is exactly what is prescribed by the ideal estimate (given by (10.2)), the
substitution used in the better estimate can be thought of as compensating for the fact that the Monte Carlo estimate
of the expected value in the ideal estimate may not exactly equal the desired expected value. In the better bootstrap
estimate, the substitution for the plug-in estimate is in a way more compatible with the estimated expected value
than the plug-in estimate is, and this results in a better estimate of bias. (In the simple estimate, if
the average of the observed resampling vectors is not equal to the vector given in (10.21), which corresponds to the
empirical distribution, then the estimated expected value corresponds to a distribution other than the empirical
distribution. So we don't have a perfect match between the two terms of the simple estimate. In the improved
estimate, we change the second term to make it more compatible with the first term. (Note: The first term is the
same for both bootstrap estimates of bias.))
For an example of the improved estimator performing better, we can consider the sample mean as an estimator for the
distribution mean. The sample mean is unbiased, but the simple bootstrap estimator of bias will typically not yield
an estimate of 0. However, the improved bootstrap estimator will yield 0 as the estimated bias every time.
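A small R sketch (with made-up data) illustrates this. For the sample mean, t(P*) is just the inner product of the resampling vector with the data, so the improved estimate (10.25) comes out exactly 0, while the simple estimate (10.24) typically does not:

```r
# Simple (10.24) vs improved (10.25) bootstrap bias estimates for the sample
# mean, with made-up data. Row b of P is the resampling vector of bootstrap
# sample b, and t(P*) for the mean is the inner product of P* with x.
set.seed(2)
x <- rnorm(10)
n <- length(x)
B <- 500
P <- t(replicate(B, tabulate(sample(n, n, replace = TRUE), nbins = n) / n))
theta.star <- as.vector(P %*% x)        # bootstrap replicates of the mean

simple   <- mean(theta.star) - mean(x)              # (10.24): usually not 0
improved <- mean(theta.star) - sum(colMeans(P) * x) # (10.25): exactly 0 here
```

The improved estimate is 0 (up to floating-point error) because the mean of the replicates is itself the average resampling vector applied to x.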
This R code also gives results similar to those given on p. 132, and given in Figure 10.2 on p. 133.
Sec. 10.5 introduces the jackknife estimates of bias and standard error. Basically just the formulas for the
estimates are given in Ch. 10, and more details are supplied in Ch. 11.
The ith jackknife sample is given by (10.28) on p. 133. (Note: Although the notation is like order
statistics notation, since order statistics don't make sense for a vector, there should be no confusion --- you just
have to get used to the notation.)
The ith jackknife replicate is given by (10.29) on p. 134. It's just the statistic of interest
evaluated with the sample being
the ith jackknife sample.
The jackknife estimate of bias is given by (10.30) and (10.31) on p. 134. It only applies to estimates of
bias for plug-in estimators, and only for smooth statistics. (Note: The concept of smoothness is
addressed below.) The computation of the jackknife estimate of bias is faster than the computation of a bootstrap
estimate of bias, since for the jackknife estimate only n additional evaluations of the statistic of interest
are needed, whereas for the improved bootstrap estimate, at least 200 are needed, and for the simple bootstrap
estimate many times that number are needed. (Note: The simple bootstrap estimate of bias is the only one which does
not require that the estimator be a plug-in estimator.) Another nice feature of the jackknife estimate is that no
randomness is involved.
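A minimal sketch of the jackknife bias estimate, using made-up data and the plug-in variance as the (smooth, plug-in) statistic:

```r
# Jackknife estimate of bias, (10.30)-(10.31): only n leave-one-out
# evaluations of the statistic are needed. Made-up data; the statistic
# here is the plug-in variance.
set.seed(3)
x <- rnorm(15)
n <- length(x)
plugin.var <- function(v) mean((v - mean(v))^2)

theta.hat  <- plugin.var(x)
theta.jack <- sapply(1:n, function(i) plugin.var(x[-i]))  # ith jackknife replicate
bias.jack  <- (n - 1) * (mean(theta.jack) - theta.hat)    # (10.30)
```

For this particular statistic the correction happens to be exact: theta.hat - bias.jack equals the usual unbiased sample variance var(x).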
An example of a statistic which is not smooth
is the sample median, and an example of a smooth statistic is the sample mean.
In Ch. 11, E&T address smoothness by considering what happens to the value of a statistic as one data value (or the
sample in general) is changed gradually. (I don't like part of their description of smoothness in Ch. 11 --- see
my Ch. 11 web page for what I don't like.)
In Ch. 10 they address smoothness by considering the statistic t(P*). (Note: It doesn't
make sense that they use a lower-case t everywhere except for the one place where they have T.)
We can express the sample mean as P1* x1 + P2* x2 + ... + Pn* xn. If we change the Pi* gradually, the sample
mean changes gradually. But for the sample median this is not the case --- gradual changes in the Pi* can result
in abrupt changes in the value of the sample median. For example, suppose that we have that the sum of the Pi*
values corresponding to the first 4 order statistics of the original sample is slightly less than 0.5, and the
sum of the Pi* values corresponding to the first 5 order statistics of the original sample is slightly greater
than 0.5, making the 5th order statistic of the original sample the sample median of a bootstrap sample
corresponding to P*. But if we change two of the Pi* values slightly so that the sum of the Pi* values
corresponding to the first 4 order statistics of the original sample becomes slightly greater than 0.5,
the sample median of the bootstrap sample corresponding to P* changes abruptly from the value of the 5th order
statistic of the original sample to the value of the 4th order statistic of the original sample.
E&T state that (10.20) must be "twice differentiable" --- this means that we
can differentiate (10.20) with respect to any of the Pi* twice.
The sample mean of a bootstrap sample can be expressed as P1* x1 + P2* x2 + ... + Pn* xn. We can differentiate
this with respect to, say, P2* and obtain x2. Differentiating again with respect to P2* gives us 0. The sample
mean, as expressed above, is twice differentiable with respect to any of the Pi*. The sample median, expressed
in terms of the Pi*, is more difficult to investigate. But it can be shown that the sample median is not a
continuous function of the Pi* (there are jump discontinuities), and is therefore not everywhere differentiable
(and so it's certainly not twice differentiable).
This R code can be used to obtain the jackknife estimate of bias given by (10.32) on p. 134.
E&T point out that it is no surprise that it is rather close to the ideal bootstrap estimate of bias (which
was estimated using B = 100,000). This is due to the fact (shown in Ch. 20) that the jackknife estimate is a
quadratic Taylor series approximation of the ideal bootstrap estimate (aka the plug-in estimate).
All three of the estimates of bias presented in Ch. 10 (the two bootstrap estimates, and the jackknife estimate) are
attempts to approximate the ideal estimate. It should be kept in mind that the ideal estimate isn't perfect. By
letting B tend to infinity, the variability due to Monte Carlo sampling is eliminated, but the inaccuracy due
to the fact that the empirical distribution is just an estimate of the true underlying distribution (used to
determine the true bias) doesn't go away as B increases. That is, as in most every case dealing with
estimation, the sample size limits our ability to always produce an estimate as close as we want to the estimand.
The bottom 60% of p. 135 and pp. 136-137 of E&T may be a bit hard to follow. The second new paragraph on p. 136
suggests that one could investigate the variability of the bootstrap estimate of bias by using bootstrapping to
estimate the standard error of the bootstrap estimate. But this would involve double bootstrapping (aka
nested bootstrapping) --- for each
bootstrap replicate of the bias estimate needed for the standard error estimate, bootstrapping would have to be done.
Thus we'd have a bootstrapping routine nested within a bootstrapping routine, and that requires a lot of
computation.
(To perhaps make this clearer, I'll expand on the explanation. To obtain a bootstrap estimate of the standard
error of the bootstrap estimator of bias, we need B bootstrap replicates of the estimate of bias. So we need
B bootstrap samples drawn from the original data. From each of these B bootstrap samples, we need a
bootstrap estimate of bias. But this involves bootstrapping. So C bootstrap samples are drawn from each of
the B bootstrap samples. So altogether, B + BC = B(1 + C) bootstrap
samples are drawn, and computations have to be done using each one of this large number of bootstrap samples.)
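To make the nesting concrete, here is a sketch with made-up data and deliberately small B and C; the inner level produces a bias estimate, and the outer level estimates the standard error of that estimator:

```r
# A small double (nested) bootstrap, with made-up data: B outer samples,
# and C inner samples drawn from each outer sample, B*(1 + C) samples in all.
set.seed(4)
x <- rnorm(12)
plugin.var <- function(v) mean((v - mean(v))^2)
boot.bias <- function(v, C) {
  m <- length(v)
  mean(replicate(C, plugin.var(sample(v, m, replace = TRUE)))) - plugin.var(v)
}
B <- 50
C <- 100
bias.reps  <- replicate(B, boot.bias(sample(x, length(x), replace = TRUE), C))
se.of.bias <- sd(bias.reps)    # bootstrap SE of the bias estimator
```

Even with these small values, 50 * (1 + 100) = 5050 bootstrap samples are used, which shows how quickly the computation grows.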
So E&T decided that they'd investigate the variability of the jackknife estimator of bias since doing so
would not involve double bootstrapping, but one can still get an idea about the amount of variability in bias
estimators due to the fact that we have to use a limited amount of data to estimate aspects of an unknown
distribution.
Even though the jackknife estimate of bias is more complicated than many of the estimates considered thus far in E&T,
upon creating B = 50 bootstrap samples from the original data, 50 replicates of the jackknife estimate of bias
(of the ratio estimator corresponding to (10.10) on p. 128) can be computed, and a bootstrap estimate of the standard
error of the jackknife estimator of bias can be obtained. E&T got the estimated standard error to be about 0.0081,
which is larger than their original jackknife estimate of 0.0080. Thus it may be that the standard deviation of the
estimator is larger than the value being estimated, which doesn't give us much confidence about the accuracy of the
original jackknife point estimate of bias. (Note: This shouldn't be taken as a suggestion that the jackknife
method for estimating bias is defective --- rather, one should conclude that it just doesn't work so well in this
very small sample size (n = 8) situation.)
Despite the variability of the jackknife estimator of bias, E&T point out on p. 136 that the exercise of estimating
that variability still provides some useful information. Even though we shouldn't be highly
confident that the bias of the ratio estimator is close to 0.0080, the bootstrap replicates of the bias estimate do
suggest that it may be reasonably safe to assume that the bias of the ratio estimator is no more than 0.025. Even
though 0.025 is more than three times the original estimate of bias, it is still considerably less than the estimated
standard error of the ratio estimator, which is 0.105 (a value obtained by bootstrapping and given right below
(10.12) on p. 128.) So, using (10.14) on p. 128, it can be concluded that the bias of the ratio estimator is
somewhat negligible compared to the standard error of the estimator. This means that it isn't so important that we
cannot estimate the bias with high accuracy.
The conclusion above is predicated on the assumption that the bootstrap estimate of the standard error of the ratio
estimator is reasonably accurate. To check on the accuracy of the bootstrap estimator of standard error which
supplied the estimate of 0.105 used above, one could obtain a bootstrap estimate of the standard error of the
bootstrap estimator of the standard error of the ratio estimator. But this would involve a double bootstrapping
procedure. So E&T opted to estimate the standard error of the jackknife estimator of the standard error of the ratio
estimator instead (since one could then use the jackknife estimator of standard error to learn something about the
value of the standard error of interest and the uncertainty associated with an estimate of it).
The jackknife estimate of standard error is given by (10.34) on p. 136. (An explanation of the jackknife
estimate of standard error is delayed until Ch. 11, and some additional information will be given in later chapters.)
The jackknife estimate of standard error should be used only with smooth estimators (as is the case with the
jackknife estimate of bias). In some situations (more on this later), the jackknife estimator of standard error will
be clearly inferior to the bootstrap estimators of standard error.
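Here is a sketch of (10.34) with made-up data. For the sample mean, the jackknife standard error formula reduces algebraically to sd(x)/sqrt(n), which gives an easy check on the computation:

```r
# Jackknife estimate of standard error, (10.34), with made-up data.
# The statistic is the sample mean, for which (10.34) reduces to sd(x)/sqrt(n).
set.seed(5)
x <- rnorm(25)
n <- length(x)
theta.jack <- sapply(1:n, function(i) mean(x[-i]))  # leave-one-out means
se.jack <- sqrt(((n - 1) / n) * sum((theta.jack - mean(theta.jack))^2))
```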
This R code can also be used to obtain the jackknife estimate of standard error given by (10.35) on p. 137.
E&T used 200 bootstrap replicates of the jackknife estimate of the standard error of the ratio estimator to estimate
the standard error of the jackknife estimator of the standard error of the ratio estimator. They found that, in a
relative sense, there is considerably less uncertainty associated with the jackknife standard error estimator in this
case, and in other settings as well it is generally true that standard error can be estimated with less relative
uncertainty than bias.
Near the end of Sec. 10.5, on the bottom part of p. 137, E&T consider an alternative type of bias, median
bias, given by (10.37). Basically, a median just replaces an expected value in the usual definition of bias.
The median bias of an estimator can be estimated using bootstrapping in a very obvious way. However, median bias is
not used a lot. Nevertheless, since the estimate of median bias can be computed with little extra work once the
replicates are obtained to produce a bootstrap estimate of (ordinary) bias,
this R code
can be used to estimate the median bias of the ratio estimator considered in Sec. 10.5.
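Since the linked code isn't shown here, a hedged sketch of the idea follows, with made-up (skewed) data and the plug-in variance standing in for the ratio estimator of Sec. 10.5:

```r
# Median bias, (10.37), estimated by bootstrapping: the median of the
# bootstrap replicates replaces their sample mean. Made-up data; the
# plug-in variance stands in for the ratio estimator of Sec. 10.5.
set.seed(6)
x <- rexp(20)
n <- length(x)
plugin.var <- function(v) mean((v - mean(v))^2)

theta.hat  <- plugin.var(x)
theta.star <- replicate(2000, plugin.var(sample(x, n, replace = TRUE)))

bias.hat        <- mean(theta.star)   - theta.hat  # ordinary bias estimate
median.bias.hat <- median(theta.star) - theta.hat  # median bias estimate
```

As noted above, the same replicates serve for both estimates, so the extra work is just one call to median.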
Sec. 10.6 considers the dangerous practice of using an estimator of bias to develop a bias-corrected estimator.
The obvious form of a bias-corrected estimator is given by (10.40) on p. 138. If one uses the simple bootstrap
estimator of bias, (10.40) is equal to (10.41). Although bias-correction may seem like a good idea, the addition of
the bias correction term to an estimator can result in an increase in variance that is not offset by the reduction
in bias. (One way to check on this would be to use bootstrapping to obtain estimates of the standard errors of the
bias-corrected estimator, and the uncorrected estimator.)
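A sketch of the correction, using made-up data and the plug-in variance; with the simple bootstrap bias estimate, (10.40) simplifies to the 2*theta.hat - mean(theta.star) form of (10.41):

```r
# Bias-corrected estimator (10.40)/(10.41) using the simple bootstrap bias
# estimate: theta.hat - bias.hat = 2*theta.hat - mean(theta.star).
# Made-up data; the plug-in variance is the (downward-biased) estimator.
set.seed(7)
x <- rnorm(20)
n <- length(x)
plugin.var <- function(v) mean((v - mean(v))^2)

theta.hat  <- plugin.var(x)
theta.star <- replicate(2000, plugin.var(sample(x, n, replace = TRUE)))
corrected  <- 2 * theta.hat - mean(theta.star)   # (10.41)
```

Here the correction pushes the estimate upward (toward the unbiased sample variance), but whether such a correction is worthwhile in general depends on the variance penalty discussed above.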
I've never had much success with bias-corrected estimators, but there are some situations in which they perform well.
On p. 138, E&T suggest that bias correction may work okay with the ratio estimator from Sec. 10.5. Also, for
prediction error estimation, bias correction can be useful, since some estimators of prediction error have a bias
which is large relative to the standard error. (This last topic is considered in Ch. 17 of E&T.)