Some Notes Pertaining to Ch. 6 of E&T
Sec. 6.1 indicates that bootstrapping may be used to estimate the standard error of an estimator no matter how
complicated the estimator is. (Note: It is assumed that we're dealing with a random sample, and it should be noted
that by random sample E&T mean that we have independent observations from the same distribution. If the sampling
is done without replacement from a finite population and we have what is known as a simple random sample, or perhaps
something more complicated if say stratified sampling is used, then modifications may have to be made in order to use
bootstrapping effectively.)
s(x) is used to denote the specific estimator of theta being considered --- it's the estimator for
which an estimated standard error is desired. (Don't assume that s
is the sample standard deviation.) It may or may not be a plug-in estimate.
Some of the material in Sec. 6.2 repeats what has been given previously in E&T. For example, Sec. 6.2 describes
bootstrap samples, bootstrap replicates, and the bootstrap estimate of an estimator's standard error, all of which
were introduced in Ch. 2. Something new (but hinted at in Ch. 2 with the limiting values of the bootstrap estimates)
is the ideal bootstrap estimate of standard error given by (6.3). Note that it doesn't require that
bootstrap samples actually be drawn! The last paragraph of Sec. 6.2 gives a more usable expression for the ideal
estimate, and it also indicates that obtaining the ideal estimate may be impractical unless the sample size is really
small. It can be noted that the ideal estimate is the plug-in estimate. (Keep in mind that although we may not have
a formula like (5.4) on p. 40 to apply the plug-in principle to (and arrive at a tidy expression like (5.12) on p.
43), it's the case that the standard error is always the
square root of the variance of the estimator, and so we can just obtain the plug-in estimate of that variance in a
very straightforward, though somewhat clunky, manner.)
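As a concrete illustration of that clunky-but-straightforward route (a Python sketch with made-up data, not anything from E&T), the ideal estimate for a tiny sample can be obtained by brute-force enumeration of all n^n equally likely bootstrap samples; for the sample mean it should match the known closed form.

```python
import itertools
import math

def ideal_bootstrap_se(x, stat):
    # Ideal (B = infinity) bootstrap SE: the exact standard deviation of
    # stat over all n^n equally likely bootstrap samples -- i.e. the
    # plug-in estimate of the estimator's SE.  Feasible only for tiny n.
    n = len(x)
    reps = [stat(s) for s in itertools.product(x, repeat=n)]
    mean = sum(reps) / len(reps)
    var = sum((r - mean) ** 2 for r in reps) / len(reps)
    return math.sqrt(var)

# Check against the known closed form for the sample mean:
# se_ideal = sqrt(plug-in variance / n).
x = [1.0, 4.0, 7.0, 10.0]                      # made-up data, n = 4
se_enum = ideal_bootstrap_se(x, lambda s: sum(s) / len(s))
xbar = sum(x) / len(x)
plugin_var = sum((xi - xbar) ** 2 for xi in x) / len(x)
se_formula = math.sqrt(plugin_var / len(x))
print(se_enum, se_formula)                     # the two should agree
```

For n = 4 this is only 256 bootstrap samples, but at n = 15 it would already be 15^15 (over 4 * 10^17), which is why the ideal estimate is impractical except for very small samples.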
(6.7) on p. 47 tells us that the limit (as B goes to infinity) of the usual bootstrap estimate of standard
error, which uses bootstrap samples that are actually drawn, is equal to the ideal estimate. (For large values of
B, the proportion of times that each possible bootstrap sample appears in the collection of the B
bootstrap samples should be close to the probabilities (the weights, the wj) given in (6.8) on p.
49.) This suggests that for B sufficiently large, the typical bootstrap estimate of standard error, based on
actual bootstrap samples, should (with high probability) be close to the ideal bootstrap estimate (which is good
since it may be impractical to obtain the ideal estimate).
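The weights idea can be seen in miniature with a Python sketch (toy data, not from E&T): for n = 3, the observed proportions of the distinct (unordered) bootstrap samples should approach their theoretical probabilities, which are multinomial coefficients divided by n^n.

```python
import math
import random
from collections import Counter

random.seed(0)
x = [1, 2, 3]                 # toy data: n = 3, so 3^3 = 27 ordered samples
n = len(x)

def weight(multiset):
    # Probability of an unordered bootstrap sample: the number of orderings
    # (a multinomial coefficient) divided by n^n -- the w_j idea of (6.8).
    coef = math.factorial(n)
    for c in Counter(multiset).values():
        coef //= math.factorial(c)
    return coef / n ** n

# Draw B bootstrap samples and compare observed proportions to the weights.
B = 100_000
freq = Counter(tuple(sorted(random.choices(x, k=n))) for _ in range(B))
max_err = max(abs(freq[m] / B - weight(m)) for m in freq)
print(f"largest |observed proportion - w_j| = {max_err:.4f}")
```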
This R code can be used to obtain some results similar to those
presented in Table 6.1 and on the left in Figure 6.2.
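(The R code itself isn't reproduced in these notes. A rough Python analogue is sketched below, using synthetic bivariate data of size n = 15 as a stand-in for the data behind Table 6.1 -- the actual values are in E&T -- with the sample correlation coefficient as the estimator. Running it shows the bootstrap standard error estimates settling down as B grows.)

```python
import math
import random
import statistics

random.seed(1)

# Synthetic correlated pairs standing in for the n = 15 data of Table 6.1.
n = 15
data = []
for _ in range(n):
    z = random.gauss(0, 1)
    data.append((z, 0.8 * z + 0.6 * random.gauss(0, 1)))

def corr(pairs):
    # Pearson correlation coefficient.
    xs, ys = zip(*pairs)
    mx, my = statistics.mean(xs), statistics.mean(ys)
    num = sum((a - mx) * (b - my) for a, b in pairs)
    den = math.sqrt(sum((a - mx) ** 2 for a in xs)
                    * sum((b - my) ** 2 for b in ys))
    if den == 0:
        return 0.0            # degenerate resample (all points identical)
    return num / den

def boot_se(pairs, stat, B):
    # Ordinary (nonparametric) bootstrap SE based on B resamples.
    reps = [stat(random.choices(pairs, k=len(pairs))) for _ in range(B)]
    return statistics.stdev(reps)

for B in (25, 50, 100, 200, 400, 800, 1600, 3200):
    print(f"B = {B:4d}   se_boot = {boot_se(data, corr, B):.4f}")
```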
The last paragraph of Sec. 6.3 points out that since we actually have complete knowledge of the population, we can
obtain a very good Monte Carlo estimate of the true standard error of the estimator being considered. (In
principle we could obtain the exact value of the estimator's standard error.) It can be noted that the bootstrap
estimate obtained from just the sample of size 15 and using the largest value tried for B is better than the
bootstrap estimates based on smaller values of B. However, it perhaps should also have been pointed out that
this is somewhat of a coincidence --- if another sample of size 15 had been drawn and used to obtain the estimated
standard error, it may be that for even larger values of B the bootstrap estimate won't be nearly so close to
the true standard error (or the very good Monte Carlo estimate of it). With a small random sample, unless the sample
has similar characteristics to the population, the bootstrap estimate may not have great accuracy no matter how large
B is made. It may be that the bootstrap estimate is about as good as we can do, but its accuracy is usually
much more dependent on the sample size, n, than it is on the number of bootstrap samples used, B.
Bootstrap estimators of standard error tend to have relatively little bias, and their variances decrease as B
increases. (Note: On p. 51 E&T refer to the bias and standard deviation of a standard error estimate. I
think that to be proper, we should say that an estimate, which is not a random variable, doesn't have a mean or
standard deviation. But the estimator is a random variable and we can refer to its bias and standard deviation.)
Formula (6.9) on p. 52 of E&T shows that after a certain point, increasing B won't help much, because for
B sufficiently large the contribution to the variability of the bootstrap estimator due to the part of (6.9) that
depends on B is somewhat negligible compared to the part that doesn't depend on B. Inaccuracy due to
having a small sample size, n, cannot be overcome by using a really large value for B. For a lot of
situations encountered in practice, making B larger than 200 may do little to improve accuracy. But making
B larger won't hurt, and so if speed is not really an important factor, one might routinely use a value of at
least 400 for B --- this will protect against the somewhat rare cases for which appreciable improvement can be
obtained by making B larger than 200.
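The diminishing returns from increasing B can be seen directly with a Python sketch (made-up data; the sample mean as the estimator). Holding one sample fixed and repeating the whole bootstrap many times shows how variable the standard error estimate itself is for each B; this isolates the part of the variability that depends on B, while the part that doesn't (variation due to the sample itself) is unaffected by any choice of B.

```python
import random
import statistics

random.seed(2)
x = [random.gauss(0, 1) for _ in range(15)]    # one fixed sample, n = 15

def boot_se(sample, B):
    # Bootstrap SE of the sample mean based on B resamples.
    means = [statistics.mean(random.choices(sample, k=len(sample)))
             for _ in range(B)]
    return statistics.stdev(means)

# Repeat the whole bootstrap 200 times for each B to see how variable
# the resulting se_hat is.  The spread shrinks (roughly like 1/sqrt(B))
# as B increases, but no value of B fixes a too-small n.
sd_of_se = {}
for B in (50, 200, 2000):
    reps = [boot_se(x, B) for _ in range(200)]
    sd_of_se[B] = statistics.stdev(reps)
    print(f"B = {B:4d}   sd of se_hat over 200 repeats = {sd_of_se[B]:.4f}")
```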
For parametric bootstrapping one obtains the bootstrap samples differently. Instead of resampling from the
original data, the data is used to fit a parametric model and then the bootstrap samples are generated using the
fitted model. (The values in the bootstrap samples can be, and will typically be, different from the values in the
original data set.)
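A minimal Python sketch of the parametric version (the normal model and the data here are made up for illustration): fit the model by maximum likelihood, then draw each bootstrap sample from the fitted model instead of resampling the data.

```python
import math
import random
import statistics

random.seed(3)
x = [random.gauss(10, 2) for _ in range(20)]   # "observed" data (simulated)

# Fit a parametric model -- here a normal distribution, by maximum
# likelihood.
mu_hat = statistics.mean(x)
sigma_hat = statistics.pstdev(x)               # MLE uses the 1/n divisor

def parametric_boot_se(stat, B=1000):
    # Each bootstrap sample is drawn from the *fitted* model, not by
    # resampling the data; everything after that is as before.
    reps = [stat([random.gauss(mu_hat, sigma_hat) for _ in range(len(x))])
            for _ in range(B)]
    return statistics.stdev(reps)

se_mean = parametric_boot_se(statistics.mean)
se_theory = sigma_hat / math.sqrt(len(x))      # known answer for the mean
print(f"parametric bootstrap: {se_mean:.4f}   theory: {se_theory:.4f}")
```

Note that the values appearing in the parametric bootstrap samples will typically not be values from the original data set at all, since they come from a continuous fitted distribution.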
Since bootstrapping can be done without having to assume a parametric model, which may be incorrect, nonparametric
bootstrapping is typically what is used.