Some Notes Pertaining to Ch. 11 of E&T



Some of the material on pp. 141-142 repeats material presented in Ch. 10. (Note: I'll point this out one last time and then quit harping on it, but I think it's bad that in the first sentence of the chapter E&T have "estimating the bias and standard error of an estimate" (italics are mine). I think it's proper to refer to the bias and standard error of an estimator, since the definitions involve moments of random variables (and estimators are random variables). But estimates are numerical values (constants), and I like to refer to the bias and standard error associated with (as opposed to of) an estimate.)
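
To be concrete about the definitions (this is my notation, not a quote from E&T): if T is an estimator of a parameter θ, then

  bias(T) = E(T) - θ   and   se(T) = {Var(T)}^(1/2) = {E([T - E(T)]^2)}^(1/2),

and these expectations make sense because T is a random variable; plugging the observed data values into T produces an estimate, which is just a number.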

The formulas for the jackknife bias and standard error estimates are similar to those for the bootstrap estimates. But upon an initial look at the jackknife formulas, it may seem as though the constants, n - 1 for the bias, and (n - 1)/n for the standard error, are wrong. The first paragraph on p. 142 gives an intuitive explanation of why one might expect to need some sort of "inflation factor" when working with the jackknife deviations to arrive at an estimate of standard error. Then in the middle portion of p. 142 it is explained that the constant used in the jackknife estimate of standard error is exactly what is needed in order to make the jackknife estimate match the usual estimate in the special case of the sample mean. Problem 11.1 of E&T confirms this --- here is my solution for Problem 11.1. (Note: Near the middle of p. 142, E&T indicate that the usual estimator of the standard error of the sample mean, S/n^(1/2), is unbiased, but this is incorrect. S^2/n is unbiased for the variance of the sample mean, but S/n^(1/2) is not unbiased for the standard error.)
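
To make the check concrete, here is a minimal R sketch (my own code, not E&T's; the function name jack.se and the simulated data are my choices). It computes the jackknife standard error estimate from the leave-one-out values and compares it to the usual estimate S/n^(1/2) for the sample mean:

  jack.se <- function(x, theta) {
    n <- length(x)
    theta.i <- sapply(1:n, function(i) theta(x[-i]))      # leave-one-out values
    sqrt((n - 1)/n * sum((theta.i - mean(theta.i))^2))    # jackknife SE estimate
  }
  set.seed(1)
  x <- rnorm(20)
  jack.se(x, mean)         # jackknife estimate for the sample mean
  sd(x)/sqrt(length(x))    # usual estimate S/n^(1/2) --- the two values agree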

The constant in the jackknife estimate of bias is what is needed in order to obtain the desired result for the biased plug-in estimator of variance (the bias-corrected estimate works out to be the usual unbiased sample variance). The jackknife bias formula also gives the right result (a bias estimate of zero) for the sample mean. Problem 11.8 and Problem 11.7 confirm these facts --- here are my solutions for Problem 11.8 and Problem 11.7.
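
Again just as a sketch (my code and function names, nothing from E&T), here is the jackknife bias estimate, (n - 1) times (the mean of the leave-one-out values minus the original estimate), applied to the two cases just mentioned:

  jack.bias <- function(x, theta) {
    n <- length(x)
    theta.i <- sapply(1:n, function(i) theta(x[-i]))   # leave-one-out values
    (n - 1)*(mean(theta.i) - theta(x))                 # jackknife bias estimate
  }
  var.plugin <- function(x) mean((x - mean(x))^2)      # biased plug-in variance
  set.seed(1)
  x <- rnorm(20)
  jack.bias(x, mean)                         # zero (up to rounding) for the sample mean
  var.plugin(x) - jack.bias(x, var.plugin)   # bias-corrected value ...
  var(x)                                     # ... equals the unbiased sample variance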



Note: In the interest of saving time for other material, I'll skip discussing Sec. 11.3 in class. The jackknife example from Ch. 10 that I covered and presented R code for should be adequate for now as an application of jackknifing.

Similarly, I'm not going to say much about Sec. 11.4. E&T don't emphasize pseudo-values, and they state that "it is not clear whether they are a useful way of thinking about the jackknife." I suppose that if one were trying to develop a formula for a standard error estimator (as opposed to being content to just get a numerical value for the estimate using software), then using (11.14) and (11.15) would give you a way to check your work if you were going to use (11.5) to obtain the desired estimator.
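
For what it's worth, here is a small R sketch (mine, not from E&T) illustrating the pseudo-value idea for the plug-in variance statistic: the mean of the pseudo-values is the bias-corrected jackknife estimate, and treating the pseudo-values like an i.i.d. sample and computing the usual standard error of their mean reproduces the jackknife standard error estimate from (11.5):

  set.seed(1)
  x <- rnorm(20)
  n <- length(x)
  theta <- function(z) mean((z - mean(z))^2)           # plug-in variance as the example statistic
  theta.hat <- theta(x)
  theta.i <- sapply(1:n, function(i) theta(x[-i]))     # leave-one-out values
  pseudo <- n*theta.hat - (n - 1)*theta.i              # pseudo-values
  mean(pseudo)                                         # jackknife bias-corrected estimate
  sd(pseudo)/sqrt(n)                                   # SE of their mean ...
  sqrt((n - 1)/n * sum((theta.i - mean(theta.i))^2))   # ... matches the (11.5) jackknife SE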



Sec. 11.5 deals with performance comparisons of the jackknife and bootstrap estimators of bias and standard error. In summary, it can be said that there are some situations where it makes little difference which is used (provided that B is chosen to be large enough for the bootstrap estimators), and there are other situations in which the bootstrap estimators are clearly superior. There are no situations in which the jackknife estimators are clearly superior, although in situations where it makes little difference whether one does jackknifing or bootstrapping, some might prefer jackknifing due to the fact that it doesn't make use of a random number generator. (Note: To dislike bootstrapping because of its random element is perhaps a bit silly. We're dealing with a random sample of observations to start with --- there is typically nothing "golden" about the particular data values at hand --- and if B is chosen to be large enough, the added noise due to the use of random numbers in bootstrapping is dwarfed by the noise that we face when we deal with a random sample of observations.)
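
To put a little substance behind that last remark, here is a quick R sketch (my own, with arbitrary choices for the data, B, and the number of repetitions) showing that the Monte Carlo noise in a bootstrap standard error estimate shrinks as B grows, and for a reasonably large B is small compared to the sample-to-sample noise in the usual standard error estimate:

  set.seed(1)
  x <- rnorm(25)
  boot.se <- function(x, B) sd(replicate(B, mean(sample(x, replace = TRUE))))
  sd(replicate(50, boot.se(x, 25)))     # Monte Carlo spread of the estimate when B = 25
  sd(replicate(50, boot.se(x, 1000)))   # much smaller spread when B = 1000
  sd(replicate(50, { y <- rnorm(25); sd(y)/sqrt(25) }))   # sample-to-sample spread of S/n^(1/2), for comparison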

To give more details, one can view the jackknife estimate of standard error as an approximation to the bootstrap estimate of standard error. (Some of the problems at the end of Ch. 11 address this fact, but I won't take time to present any information about those problems.) The approximation is very good when the statistic is a linear statistic of the form given in (11.17) on p. 146. (The sample mean and the sample second moment are examples of linear statistics.) When used with linear statistics, jackknifing and bootstrapping perform similarly. But with nonlinear statistics the jackknife estimator does not perform as well as the bootstrap estimator, and with highly nonlinear statistics (statistics for which a 1st-order Taylor series approximation of the statistic is poor) the difference in performance quality can be rather large. (Examples of nonlinear statistics are the sample variance, the sample skewness, the sample correlation coefficient, and the inverse of the sample mean (i.e., 1 over the sample mean).) Fig. 11.3 on p. 147 of E&T shows how the jackknife and bootstrap estimators of standard error perform similarly for a linear statistic, but the bootstrap estimator performs better (as an estimator, it itself has a smaller standard error) for the nonlinear sample correlation statistic.
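
As a rough illustration (my code; the statistics, data, and B are arbitrary choices), one can compute both estimates for the sample mean (linear) and for 1 over the sample mean (nonlinear). For the linear statistic the two values essentially agree; for the nonlinear one they can differ more, although the bootstrap estimator's real advantage --- its smaller variability over repeated samples --- is what Fig. 11.3 shows and isn't something one can see from a single data set:

  set.seed(1)
  x <- rexp(30)
  B <- 2000
  jack.se <- function(x, theta) {
    n <- length(x)
    theta.i <- sapply(1:n, function(i) theta(x[-i]))
    sqrt((n - 1)/n * sum((theta.i - mean(theta.i))^2))
  }
  boot.se <- function(x, theta, B) sd(replicate(B, theta(sample(x, replace = TRUE))))
  c(jack.se(x, mean), boot.se(x, mean, B))            # linear statistic: nearly identical
  inv.mean <- function(z) 1/mean(z)
  c(jack.se(x, inv.mean), boot.se(x, inv.mean, B))    # nonlinear statistic: can differ more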

Similarly, one can view the jackknife estimate of bias as an approximation to the bootstrap estimate of bias. (Some of the problems at the end of Ch. 11 address this fact, but I won't take time to present any information about those problems.) The approximation is very good when the statistic is a quadratic statistic of the form given in (11.18) on p. 147. (The sample variance is an example of a quadratic statistic.) When used with quadratic statistics, jackknifing and bootstrapping perform similarly. But with many other statistics the jackknife estimator does not perform as well as the bootstrap estimator. (Note: Many linear statistics are clearly unbiased (e.g., the sample mean and the sample second moment), and therefore estimators of bias may not even be of interest for many linear statistics.)
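
As a quick check (again my code, with arbitrary data and B), the two bias estimates can be compared for the quadratic plug-in variance statistic; the two values come out close to one another (apart from a little Monte Carlo noise in the bootstrap one):

  set.seed(1)
  x <- rnorm(30)
  n <- length(x)
  B <- 20000
  theta <- function(z) mean((z - mean(z))^2)   # plug-in variance (a quadratic statistic)
  theta.hat <- theta(x)
  theta.i <- sapply(1:n, function(i) theta(x[-i]))
  (n - 1)*(mean(theta.i) - theta.hat)                                # jackknife bias estimate
  mean(replicate(B, theta(sample(x, replace = TRUE)))) - theta.hat   # bootstrap bias estimate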



Sec. 11.6 deals with an example of jackknife failure: For the sample median, the jackknife estimator of standard error does not perform well, even for large values of n.
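
One way to see the problem concretely (this is my illustration, not one from E&T): when n is even, the leave-one-out medians can take only two distinct values --- the two middle order statistics --- no matter how large n is, so the jackknife standard error estimate for the median is based on very little information about the sample.

  set.seed(1)
  x <- rnorm(10)                                      # n even
  med.i <- sapply(1:10, function(i) median(x[-i]))    # leave-one-out medians
  table(round(med.i, 6))    # only two distinct values occur ...
  sort(x)[5:6]              # ... namely the 5th and 6th order statistics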

The sample median isn't a smooth function of the observations, and the jackknife should not be used with unsmooth estimators. On p. 148 E&T have that "the idea of smoothness is that small changes in the data set cause only small changes in the statistic." They go on to describe how the sample median changes as one of the sample observations changes. Since the description corresponds to a continuous function, which means that there are no jump discontinuities, one may wonder why this doesn't satisfy the idea of smoothness as previously given by E&T. To address this, I would say that E&T's "idea of smoothness" isn't good, since by that description it does seem like any continuous function would be smooth. Although not everyone seems to agree on a precise definition of smoothness (what a shock), it does seem to be agreed upon that continuity isn't enough to make a function a smooth function. A smooth function doesn't have sharp bends in it (like what is described for the median on p. 148). At the very least we need a continuous first derivative for a function to be considered smooth, so that changes in the function in a sense occur gradually. So instead of a sharp change in slope there must be a gradual ramping up or ramping down. (Some books state that a smooth function must be twice differentiable, which is a stronger requirement than that it have a continuous first derivative.) To sum it up another way, even though the function described for the median is continuous, the rate of change of the function (its first derivative) is not continuous --- the rate of change changes abruptly. (The first derivative is 0, then it is a positive constant over an interval, and then it is 0 again. (Note: The derivative doesn't exist at the points corresponding to the two bends.)) Ch. 10 addressed smoothness in terms of the resampling vector --- one needs the statistic, expressed in terms of the resampling vector, to be twice differentiable with respect to each component of the resampling vector. With the sample median, we don't even have continuity as a function of the resampling vector (there are jump discontinuities), and thus we certainly don't have the function being twice differentiable.
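
To visualize the flat-ramp-flat behavior described above for the median as one observation is moved, here is a tiny R sketch (mine; the particular values are arbitrary). Eight observations are held fixed and a ninth one is moved across a grid; the resulting medians are constant, then increase with slope 1 over an interval, and then are constant again --- continuous, but with two sharp bends:

  x.fixed <- 1:8                     # eight observations held fixed
  v.grid <- seq(0, 9, by = 0.25)     # values for the ninth observation
  med.path <- sapply(v.grid, function(v) median(c(x.fixed, v)))
  cbind(v.grid, med.path)   # median is 4 for v <= 4, equals v for 4 <= v <= 5, and is 5 for v >= 5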

The last two sentences of Sec. 11.6 are a bit screwy, since E&T claim that the jackknife estimator of the standard error of the sample median is not consistent, which would mean that it fails to converge to the true standard error as n increases to infinity. But since the true standard error tends to 0 as n tends to infinity, and so does the jackknife estimate of standard error for the sample median, something is clearly off with that way of stating it --- for one thing, since the true standard error is itself a function of n, it doesn't quite make sense to let n go to infinity in the jackknife estimate and ask whether it reaches the true standard error. It would be okay if they had said that the jackknife estimator of standard error multiplied by the square root of n does not converge to the correct value.



Sec. 11.7 deals with a generalization of the common form of jackknifing. Although this alternative version of jackknifing can work appreciably better than the common form in situations where the common form does not work well, it is more computationally intensive, and it doesn't necessarily work better than bootstrapping. Since it's not clearly better than bootstrapping and it's not commonly used, I won't emphasize this alternative form of jackknifing.