Some Notes Pertaining to Ch. 11 of E&T
Some of the material on pp. 141-142 repeats material presented in Ch. 10. (Note: I'll point this out one last time
and then quit harping on it, but I think it's bad that in the first sentence of the chapter E&T have "estimating the
bias and standard error of an estimate" (italics are mine). I think it's proper to refer to the bias and
standard error of an estimator, since the definitions involve moments of random variables (and estimators are random
variables). But estimates are numerical values (constants), and I like to refer to the bias and standard error
associated with (as opposed to of) an estimate.)
The formulas for the jackknife bias and standard error estimates are similar to those for the bootstrap estimates.
But upon an initial look at the jackknife formulas, it may seem as though the constants, n - 1 for the bias,
and (n - 1)/n for the standard error, are wrong. The first paragraph on p. 142 gives some sort of
intuitive explanation of why one might expect to need some sort of "inflation factor" when working with the jackknife
deviations to arrive at an estimate of standard error. Then in the middle portion of p. 142 it is explained that the
constant used in the jackknife estimate of standard error is exactly what is needed in order to make the jackknife
estimate match the usual estimate in the special case of the sample mean. Problem 11.1 of E&T confirms this
--- here is
my solution for Problem 11.1. (Note: Near the middle of p. 142, E&T indicate
that the usual estimator of the standard error of the sample mean, S/n^(1/2), is unbiased, but this is
incorrect. S^2/n is unbiased for the variance of the sample mean, but S/n^(1/2) is not unbiased for the
standard error.)
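To see that matching numerically, here is a quick sketch in Python (the helper name jackknife_se and the small data set are my own, just for illustration):

```python
import math

def jackknife_se(x, stat):
    """Jackknife SE estimate: sqrt( ((n-1)/n) * sum_i (theta_(i) - theta_(.))^2 )."""
    n = len(x)
    loo = [stat(x[:i] + x[i + 1:]) for i in range(n)]   # leave-one-out values
    loo_mean = sum(loo) / n
    return math.sqrt((n - 1) / n * sum((t - loo_mean) ** 2 for t in loo))

x = [2.1, 4.4, 7.0, 1.3, 9.8, 5.5, 3.2]   # made-up data
n = len(x)
xbar = sum(x) / n
s = math.sqrt(sum((xi - xbar) ** 2 for xi in x) / (n - 1))   # the usual S

# For the sample mean, the jackknife estimate reproduces S/n^(1/2) exactly.
print(jackknife_se(x, lambda v: sum(v) / len(v)))
print(s / math.sqrt(n))
```

The two printed values agree to machine precision, which is exactly what the choice of the (n - 1)/n constant is designed to accomplish.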
The constant in the jackknife estimate of bias is what is needed in order to obtain the desired result
for the biased plug-in estimator of variance. The jackknife bias formula also gives the right result for the sample
mean.
Problem 11.8 and Problem 11.7 confirm these facts
--- here are
my solutions for Problem 11.8 and Problem 11.7.
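Here is a small Python sketch of that result (my own helper names and made-up data): applying the jackknife bias correction to the plug-in (divide-by-n) variance recovers the usual unbiased (divide-by-(n - 1)) sample variance exactly.

```python
import math

def jackknife_bias(x, stat):
    """Jackknife bias estimate: (n - 1) * (theta_(.) - theta_hat)."""
    n = len(x)
    loo = [stat(x[:i] + x[i + 1:]) for i in range(n)]   # leave-one-out values
    return (n - 1) * (sum(loo) / n - stat(x))

def plugin_var(x):
    """Plug-in (biased) variance estimate: divides by n."""
    m = sum(x) / len(x)
    return sum((xi - m) ** 2 for xi in x) / len(x)

x = [2.1, 4.4, 7.0, 1.3, 9.8, 5.5, 3.2]   # made-up data
n = len(x)
corrected = plugin_var(x) - jackknife_bias(x, plugin_var)
s2 = sum((xi - sum(x) / n) ** 2 for xi in x) / (n - 1)   # unbiased sample variance
print(corrected)
print(s2)   # identical to the corrected value
```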
Note: In the interest of saving time for other material, I'll skip discussing Sec. 11.3 in class. The jackknife
example from Ch. 10 that I covered and presented R code for should be adequate for now as an application of jackknifing.
Similarly, I'm not going to say much about Sec. 11.4. E&T don't emphasize pseudo-values, and they state that
"it is not clear whether they are a useful way of thinking about the jackknife." I suppose that if one was trying to
develop a formula for a standard error estimator (as opposed to being content to just get a numerical value for the
estimate using software), then using (11.14) and (11.15) would give you a way to check you work if you were going to
use (11.5) to obtain the desired estimator.
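As a sketch of that kind of check (my own helper names; the sample range is just an arbitrary illustrative statistic), one can verify numerically that the pseudo-value route and the direct route of (11.5) give the same standard error estimate:

```python
import math

def loo_values(x, stat):
    """Leave-one-out replications theta_(i)."""
    return [stat(x[:i] + x[i + 1:]) for i in range(len(x))]

def se_direct(x, stat):
    """Direct jackknife SE, as in (11.5)."""
    n = len(x)
    loo = loo_values(x, stat)
    m = sum(loo) / n
    return math.sqrt((n - 1) / n * sum((t - m) ** 2 for t in loo))

def se_pseudo(x, stat):
    """SE via pseudo-values p_i = n*theta_hat - (n-1)*theta_(i)."""
    n = len(x)
    theta_hat = stat(x)
    p = [n * theta_hat - (n - 1) * t for t in loo_values(x, stat)]
    pbar = sum(p) / n
    return math.sqrt(sum((pi - pbar) ** 2 for pi in p) / (n * (n - 1)))

x = [2.1, 4.4, 7.0, 1.3, 9.8, 5.5]        # made-up data
stat = lambda v: max(v) - min(v)          # sample range, just for illustration
print(se_direct(x, stat))
print(se_pseudo(x, stat))                 # the two routes agree
```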
Sec. 11.5 deals with performance comparisons of the jackknife and bootstrap estimators of bias and standard error.
In summary, it can be said that there are some situations where it makes little difference which is used (provided
that B is chosen to be large enough for the bootstrap estimators), and there are other situations in which the
bootstrap estimators are clearly superior. There are no situations in which the jackknife estimators are clearly
superior, although in situations where it makes little difference whether one does jackknifing or bootstrapping,
some might prefer jackknifing due to the fact that it doesn't make use of a random number generator.
(Note: To dislike bootstrapping because of its random element is perhaps a bit silly. We're dealing with a random
sample of observations to start with --- there is typically nothing "golden" about the particular data values at hand
--- and if B is chosen to be large enough the added noise factor due to the use of random numbers in
bootstrapping is dwarfed by the noise that we face when we deal with a random sample of observations.)
To give more details, one can view the jackknife estimate of standard error as an approximation to the bootstrap
estimate of standard error. (Some of the problems at the end of Ch. 11 address this fact, but I won't take time to
present any information about those problems.) The approximation is very good when the statistic is a linear
statistic of the form given in (11.17) on p. 146. (The sample mean and the sample second moment are examples of
linear statistics.) When used with linear statistics, jackknifing and bootstrapping perform similarly. But with
nonlinear statistics the jackknife estimator does not perform as well as the bootstrap estimator, and with highly
nonlinear statistics (statistics for which a 1st-order Taylor series approximation of the statistic is poor) the
difference in performance quality can be rather large. (Examples of nonlinear statistics are the sample variance,
the sample skewness, the sample correlation coefficient, and the inverse of the sample mean (i.e., 1 over the sample
mean).)
Fig. 11.3 on p. 147 of E&T shows how the jackknife and bootstrap estimators of standard error perform similarly for
a linear statistic, but the bootstrap estimator performs better (it has a smaller standard error) for the nonlinear
sample correlation statistic.
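A small simulation can illustrate the comparison (a sketch with my own helper names and simulated data; the bootstrap SE carries Monte Carlo noise that shrinks as B grows):

```python
import math
import random

def jackknife_se(x, stat):
    """Jackknife SE, as in (11.5)."""
    n = len(x)
    loo = [stat(x[:i] + x[i + 1:]) for i in range(n)]
    m = sum(loo) / n
    return math.sqrt((n - 1) / n * sum((t - m) ** 2 for t in loo))

def bootstrap_se(x, stat, B=4000, seed=1):
    """Bootstrap SE from B resamples (subject to Monte Carlo noise)."""
    rng = random.Random(seed)
    n = len(x)
    reps = [stat([x[rng.randrange(n)] for _ in range(n)]) for _ in range(B)]
    m = sum(reps) / B
    return math.sqrt(sum((r - m) ** 2 for r in reps) / (B - 1))

random.seed(2)
x = [random.gauss(10, 2) for _ in range(25)]   # simulated data

mean = lambda v: sum(v) / len(v)        # a linear statistic
inv_mean = lambda v: 1.0 / mean(v)      # a nonlinear statistic

print(jackknife_se(x, mean), bootstrap_se(x, mean))          # very close
print(jackknife_se(x, inv_mean), bootstrap_se(x, inv_mean))  # can differ more
```

Note that a single data set like this only shows the two estimates tracking each other; the performance claims in Sec. 11.5 are about the sampling behavior of the estimators over repeated samples.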
Similarly, one can view the jackknife estimate of bias as an approximation to the bootstrap
estimate of bias. (Some of the problems at the end of Ch. 11 address this fact, but I won't take time to
present any information about those problems.) The approximation is very good when the statistic is a quadratic
statistic of the form given in (11.18) on p. 147. (The sample variance is an example of a quadratic
statistic.) When used with quadratic statistics, jackknifing and bootstrapping perform similarly. But with
many other statistics the jackknife estimator does not perform as well as the bootstrap estimator.
(Note: Many linear statistics (e.g., the sample mean and the sample second moment) are clearly unbiased, and so
estimators of bias may not even be of interest for them.)
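A quick numerical sketch of that agreement for the plug-in variance (my own helper names and simulated data; the bootstrap bias estimate carries Monte Carlo noise, so the two values agree only approximately):

```python
import random

def jackknife_bias(x, stat):
    """Jackknife bias estimate: (n - 1) * (theta_(.) - theta_hat)."""
    n = len(x)
    loo = [stat(x[:i] + x[i + 1:]) for i in range(n)]
    return (n - 1) * (sum(loo) / n - stat(x))

def bootstrap_bias(x, stat, B=25000, seed=3):
    """Bootstrap bias estimate: mean of bootstrap replications minus theta_hat."""
    rng = random.Random(seed)
    n = len(x)
    reps = [stat([x[rng.randrange(n)] for _ in range(n)]) for _ in range(B)]
    return sum(reps) / B - stat(x)

def plugin_var(x):
    """Plug-in (biased) variance estimate: divides by n."""
    m = sum(x) / len(x)
    return sum((xi - m) ** 2 for xi in x) / len(x)

random.seed(4)
x = [random.gauss(0, 1) for _ in range(20)]   # simulated data

print(jackknife_bias(x, plugin_var))   # negative, as it should be
print(bootstrap_bias(x, plugin_var))   # close to the jackknife value
```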
Sec. 11.6 deals with an example of jackknife failure: For the sample median, the jackknife estimator of standard error
does not perform well, even for large values of n.
The sample median isn't a smooth function of the observations, and the jackknife should not be used with unsmooth
estimators. On p. 148 E&T have that "the idea of smoothness is that small changes in the data set cause only
small changes in the statistic." They go on to describe how the sample median changes as one of the sample
observations changes. Since the description corresponds to a continuous function, which means that there are no jump
discontinuities, one may wonder why this doesn't satisfy the idea of smoothness as previously given by E&T. To
address this, I would say that E&T's "idea of smoothness" isn't good, since it does seem like any continuous function
would be smooth. Although not everyone seems to agree on a precise definition of smoothness (what a shock),
it does seem to be agreed upon that continuity isn't enough to make a function a smooth function. A smooth function
doesn't have sharp bends in it (like what is described for the median on p. 148). At the very least we need a
continuous first derivative for a function to be considered to be a smooth function, so that changes in the function
in a sense occur gradually. So instead of a sharp change in slope there must be a gradual ramping up or ramping
down. (Some books state that a smooth function must be twice differentiable, which is a stronger requirement
than that it have a continuous first derivative.) To sum it up another way, even though the function described for
the median is continuous, the rate of change of the function (its first derivative) is not continuous --- the rate
of change changes abruptly. (The first derivative is 0, then it is a positive constant over an interval, and then it
is 0 again. (Note: The derivative doesn't exist at the points corresponding to the two bends.)) Ch. 10 addressed
smoothness in terms of the resampling vector --- one needs the statistic, expressed in terms of the resampling vector,
to be twice differentiable with respect to each component of the resampling vector. With the sample median, we don't
even have continuity (there are jump discontinuities), and thus we certainly don't have the function being twice
differentiable.
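The non-smoothness shows up starkly in the leave-one-out values themselves. A sketch (my own helper names and simulated data): for a sample with no ties, the n leave-one-out medians take at most three distinct values, so the jackknife SE is built from just two gaps between order statistics, and it never settles down as n grows.

```python
import random
import statistics

def loo_values(x, stat):
    """Leave-one-out replications theta_(i)."""
    return [stat(x[:i] + x[i + 1:]) for i in range(len(x))]

random.seed(5)
x = [random.gauss(0, 1) for _ in range(101)]   # simulated data, n = 101

loo_medians = loo_values(x, statistics.median)
# With n = 101 and no ties, deleting an observation below the middle, at the
# middle, or above the middle gives one of only three possible medians.
print(sorted(set(loo_medians)))
```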
The last two sentences of Sec. 11.6 are a bit screwy, since E&T claim that the jackknife estimator of the standard error
of the sample median
is not consistent, which would mean that it fails to converge to the true standard error as n increases to infinity.
But since the true standard error tends to 0 as n tends to
infinity, and so does the jackknife estimate of standard error for the sample median, something is clearly wrong
--- for one thing, since the true standard error is itself a function of n, it doesn't make sense to let n go to
infinity in the jackknife estimate and expect it to converge to the true standard error.
It would be okay if they had said that the jackknife estimator of standard error multiplied by the square root of
n does not converge to the correct value.
Sec. 11.7 deals with a generalization of the common form of jackknifing. Although the alternative version of
jackknifing can work appreciably better than the common form of jackknifing in situations where the common form
does not work well, the alternative form of jackknifing is more computationally intensive, and it doesn't
necessarily work better than bootstrapping. Since it's not clearly better than bootstrapping and it's not commonly
used, I won't emphasize this alternative form of jackknifing.