Some Notes Pertaining to Ch. 6 of E&T
Sec. 6.1 indicates that bootstrapping may be used to estimate the standard error of an estimator no matter how
complicated the estimator is. (Note: It is assumed that we're dealing with a random sample, and it should be noted
that by random sample E&T mean that we have independent observations from the same distribution. If the sampling
is done without replacement from a finite population and we have what is known as a simple random sample, or perhaps
something more complicated if say stratified sampling is used, then modifications may have to be made in order to use
bootstrapping effectively.)
s(x) is used to denote the specific estimator of theta being considered --- it's the estimator for
which an estimated standard error is desired. (Don't assume that s
is the sample standard deviation.) It may or may not be a plug-in estimate.
Some of the material in Sec. 6.2 repeats what has been given previously in E&T. For example, Sec. 6.2 describes
bootstrap samples, bootstrap replicates, and the bootstrap estimate of an estimator's standard error, all of which
were introduced in Ch. 2. Something new (but hinted at in Ch. 2 with the limiting values of the bootstrap estimates)
is the ideal bootstrap estimate of standard error given by (6.3). Note that it doesn't require that
bootstrap samples actually be drawn! The last paragraph of Sec. 6.2 gives a more usable expression for the ideal
estimate, and it also indicates that obtaining the ideal estimate may be impractical unless the sample size is really
small. It can be noted that the ideal estimate is the plug-in estimate. (Keep in mind that although we may not have
a formula like (5.4) on p. 40 to apply the plug-in principle to (and arrive at a tidy expression like (5.12) on p.
43), it's the case that the standard error is always the
square root of the variance of the estimator, and so we can just obtain the plug-in estimate of that variance in a
very straightforward, though somewhat clunky, manner.)
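As a concrete illustration of that clunky-but-straightforward route (a Python sketch with made-up data, not anything from E&T), the ideal estimate for a tiny sample can be obtained by brute-force enumeration of all n^n equally likely bootstrap samples; for the sample mean it should match the known closed form.

```python
import itertools
import math

def ideal_bootstrap_se(x, stat):
    # Ideal (B = infinity) bootstrap SE: the exact standard deviation of
    # stat over all n^n equally likely bootstrap samples -- i.e. the
    # plug-in estimate of the estimator's SE.  Feasible only for tiny n.
    n = len(x)
    reps = [stat(s) for s in itertools.product(x, repeat=n)]
    mean = sum(reps) / len(reps)
    var = sum((r - mean) ** 2 for r in reps) / len(reps)
    return math.sqrt(var)

# Check against the known closed form for the sample mean:
# se_ideal = sqrt(plug-in variance / n).
x = [1.0, 4.0, 7.0, 10.0]                      # made-up data, n = 4
se_enum = ideal_bootstrap_se(x, lambda s: sum(s) / len(s))
xbar = sum(x) / len(x)
plugin_var = sum((xi - xbar) ** 2 for xi in x) / len(x)
se_formula = math.sqrt(plugin_var / len(x))
print(se_enum, se_formula)                     # the two should agree
```

For n = 4 this is only 256 bootstrap samples, but at n = 15 it would already be 15^15 (over 4 * 10^17), which is why the ideal estimate is impractical except for very small samples.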
(6.7) on p. 47 tells us that the limit (as B goes to infinity) of the usual bootstrap estimate of standard
error, which uses bootstrap samples that are actually drawn, is equal to the ideal estimate. (For large values of
B, the proportion of times that each possible bootstrap sample appears in the collection of the B
bootstrap samples should be close to the probabilities (the weights, the wj) given in (6.8) on p.
49.) This suggests that for B sufficiently large, the typical bootstrap estimate of standard error, based on
actual bootstrap samples, should (with high probability) be close to the ideal bootstrap estimate (which is good
since it may be impractical to obtain the ideal estimate).
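The weights idea can be seen in miniature with a Python sketch (toy data, not from E&T): for n = 3, the observed proportions of the distinct (unordered) bootstrap samples should approach their theoretical probabilities, which are multinomial coefficients divided by n^n.

```python
import math
import random
from collections import Counter

random.seed(0)
x = [1, 2, 3]                 # toy data: n = 3, so 3^3 = 27 ordered samples
n = len(x)

def weight(multiset):
    # Probability of an unordered bootstrap sample: the number of orderings
    # (a multinomial coefficient) divided by n^n -- the w_j idea of (6.8).
    coef = math.factorial(n)
    for c in Counter(multiset).values():
        coef //= math.factorial(c)
    return coef / n ** n

# Draw B bootstrap samples and compare observed proportions to the weights.
B = 100_000
freq = Counter(tuple(sorted(random.choices(x, k=n))) for _ in range(B))
max_err = max(abs(freq[m] / B - weight(m)) for m in freq)
print(f"largest |observed proportion - w_j| = {max_err:.4f}")
```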
This R code can be used to obtain some results similar to those
presented in Table 6.1 and on the left in Figure 6.2.
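(The R code itself isn't reproduced in these notes. A rough Python analogue is sketched below, using synthetic bivariate data of size n = 15 as a stand-in for the data behind Table 6.1 -- the actual values are in E&T -- with the sample correlation coefficient as the estimator. Running it shows the bootstrap standard error estimates settling down as B grows.)

```python
import math
import random
import statistics

random.seed(1)

# Synthetic correlated pairs standing in for the n = 15 data of Table 6.1.
n = 15
data = []
for _ in range(n):
    z = random.gauss(0, 1)
    data.append((z, 0.8 * z + 0.6 * random.gauss(0, 1)))

def corr(pairs):
    # Pearson correlation coefficient.
    xs, ys = zip(*pairs)
    mx, my = statistics.mean(xs), statistics.mean(ys)
    num = sum((a - mx) * (b - my) for a, b in pairs)
    den = math.sqrt(sum((a - mx) ** 2 for a in xs)
                    * sum((b - my) ** 2 for b in ys))
    if den == 0:
        return 0.0            # degenerate resample (all points identical)
    return num / den

def boot_se(pairs, stat, B):
    # Ordinary (nonparametric) bootstrap SE based on B resamples.
    reps = [stat(random.choices(pairs, k=len(pairs))) for _ in range(B)]
    return statistics.stdev(reps)

for B in (25, 50, 100, 200, 400, 800, 1600, 3200):
    print(f"B = {B:4d}   se_boot = {boot_se(data, corr, B):.4f}")
```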
The last paragraph of Sec. 6.3 points out that since we actually have complete knowledge of the population, we can
obtain a very good Monte Carlo estimate of the true standard error of the estimator being considered. (In
principle we could obtain the exact value of the estimator's standard error.) It can be noted that the bootstrap
estimate obtained from just the sample of size 15 and using the largest value tried for B is better than the
bootstrap estimates based on smaller values of B. However, it perhaps should also have been pointed out that
this is somewhat of a coincidence --- if another sample of size 15 had been drawn and used to obtain the estimated
standard error, it may be that for even larger values of B the bootstrap estimate won't be nearly so close to
the true standard error (or the very good Monte Carlo estimate of it). With a small random sample, unless the sample
has similar characteristics to the population, the bootstrap estimate may not have great accuracy no matter how large
B is made. It may be that the bootstrap estimate is about as good as we can do, but its accuracy is usually
much more dependent on the sample size, n, than it is on the number of bootstrap samples used, B.
Bootstrap estimators of standard error tend to have relatively little bias, and their variances decrease as B
increases. (Note: On p. 51 E&T refer to the bias and standard deviation of a standard error estimate. I
think that to be proper, we should say that an estimate, which is not a random variable, doesn't have a mean or
standard deviation. But the estimator is a random variable and we can refer to its bias and standard deviation.)
Formula (6.9) on p. 52 of E&T shows that after a certain point, increasing B won't help much, because for
B sufficiently large the contribution to the variability of the bootstrap estimator due to the part of (6.9) that
depends on B is somewhat negligible compared to the part that doesn't depend on B. Inaccuracy due to
having a small sample size, n, cannot be overcome by using a really large value for B. For a lot of
situations encountered in practice, making B larger than 200 may do little to improve accuracy. But making
B larger won't hurt, and so if speed is not really an important factor, one might routinely use a value of at
least 400 for B --- this will protect against the somewhat rare cases for which appreciable improvement can be
obtained by making B larger than 200.
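The diminishing returns from increasing B can be seen directly with a Python sketch (made-up data; the sample mean as the estimator). Holding one sample fixed and repeating the whole bootstrap many times shows how variable the standard error estimate itself is for each B; this isolates the part of the variability that depends on B, while the part that doesn't (variation due to the sample itself) is unaffected by any choice of B.

```python
import random
import statistics

random.seed(2)
x = [random.gauss(0, 1) for _ in range(15)]    # one fixed sample, n = 15

def boot_se(sample, B):
    # Bootstrap SE of the sample mean based on B resamples.
    means = [statistics.mean(random.choices(sample, k=len(sample)))
             for _ in range(B)]
    return statistics.stdev(means)

# Repeat the whole bootstrap 200 times for each B to see how variable
# the resulting se_hat is.  The spread shrinks (roughly like 1/sqrt(B))
# as B increases, but no value of B fixes a too-small n.
sd_of_se = {}
for B in (50, 200, 2000):
    reps = [boot_se(x, B) for _ in range(200)]
    sd_of_se[B] = statistics.stdev(reps)
    print(f"B = {B:4d}   sd of se_hat over 200 repeats = {sd_of_se[B]:.4f}")
```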
For parametric bootstrapping one obtains the bootstrap samples differently. Instead of resampling from the
original data, the data is used to fit a parametric model and then the bootstrap samples are generated using the
fitted model. (The values in the bootstrap samples can be, and will typically be, different from the values in the
original data set.)
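A minimal Python sketch of the parametric version (the normal model and the data here are made up for illustration): fit the model by maximum likelihood, then draw each bootstrap sample from the fitted model instead of resampling the data.

```python
import math
import random
import statistics

random.seed(3)
x = [random.gauss(10, 2) for _ in range(20)]   # "observed" data (simulated)

# Fit a parametric model -- here a normal distribution, by maximum
# likelihood.
mu_hat = statistics.mean(x)
sigma_hat = statistics.pstdev(x)               # MLE uses the 1/n divisor

def parametric_boot_se(stat, B=1000):
    # Each bootstrap sample is drawn from the *fitted* model, not by
    # resampling the data; everything after that is as before.
    reps = [stat([random.gauss(mu_hat, sigma_hat) for _ in range(len(x))])
            for _ in range(B)]
    return statistics.stdev(reps)

se_mean = parametric_boot_se(statistics.mean)
se_theory = sigma_hat / math.sqrt(len(x))      # known answer for the mean
print(f"parametric bootstrap: {se_mean:.4f}   theory: {se_theory:.4f}")
```

Note that the values appearing in the parametric bootstrap samples will typically not be values from the original data set at all, since they come from a continuous fitted distribution.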
Since bootstrapping can be done without having to assume a parametric model, which may be incorrect, nonparametric
bootstrapping is typically what is used.