Some Notes Pertaining to Ch. 12 of E&T
Ch. 12 is the first of three consecutive chapters dealing with confidence intervals. (Ch. 22 also deals
with confidence intervals.)
In Sec. 12.1, we can see another use for estimated standard errors. (Previously in E&T they have been used primarily
as a measure of accuracy for point estimators.) Estimated standard errors can be used in approximate confidence
intervals for approximately normal estimators.
If an estimator is normally distributed and unbiased, then the probability that it takes a value within 1.6449 standard errors
of the estimand is 0.9 (and the
probability that it takes a value within 1.9600 standard errors
of the estimand is 0.95).
If an estimator is normally distributed and unbiased, and the sample size is large enough so that the estimator's
standard error can be estimated very well, then the probability that it takes a value within 1.6449
estimated standard errors
of the estimand is approximately 0.9 (and the
probability that it takes a value within 1.9600 estimated standard errors
of the estimand is approximately 0.95). This can be shown using Slutsky's theorem.
Also,
if an estimator is approximately normally distributed and is unbiased or has a negligible bias (relative to its
standard error), then the probability that it takes a value within 1.6449 standard errors
of the estimand is approximately 0.9 (and the
probability that it takes a value within 1.9600 standard errors
of the estimand is approximately 0.95).
Finally,
if an estimator is approximately normally distributed and is unbiased or has a negligible bias (relative to its
standard error), and the sample size is large enough so that the estimator's standard error can be estimated very
well, then the probability that it takes a value within 1.6449 estimated standard errors
of the estimand is approximately 0.9 (and the
probability that it takes a value within 1.9600 estimated standard errors
of the estimand is approximately 0.95).
While it may be rare to work with a normally distributed estimator in practice, many estimators are approximately normal,
and so the last two results are quite useful --- the method of (approximate) pivots leads to approximate confidence
intervals of the form given by (12.6). E&T refer to such intervals as standard confidence intervals. (Notes:
(1) I will write zα instead of
z(1-α).
(2) I typically use α/2 instead of α to get a confidence interval having approximate
coverage probability
1 - α instead of 1 - 2α.)
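For concreteness, here is a short Python sketch (the code linked in these notes is in R; this is just an illustration with made-up data) of a standard interval of the form (12.6), using α/2 on each side per note (2):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=100)   # made-up data; estimand: the mean

theta_hat = x.mean()                       # point estimate
se_hat = x.std(ddof=1) / np.sqrt(len(x))   # estimated standard error

alpha = 0.05                               # alpha/2 on each side, per note (2)
z = norm.ppf(1 - alpha / 2)                # 1.9600
interval = (theta_hat - z * se_hat, theta_hat + z * se_hat)
```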
Estimators which are asymptotically normal are only approximately normal with a finite sample size, and for sample
sizes which aren't large enough, the approximation suffers.
In such cases it is often possible to use bootstrapping
(in a variety of ways) to obtain better approximate confidence intervals. Bootstrapping can also be used effectively
in some cases in which the estimator is not approximately normal. (Some estimators aren't even asymptotically
normal.)
There's not a lot of importance in Sec. 12.2. The main focus is on what E&T refer to as accurate confidence
intervals. They designate a confidence interval as accurate if the probability that it misses covering the estimand
because the upper confidence bound falls below the estimand equals the probability that it misses
because the lower confidence bound falls above the estimand, with both of these probabilities being (about)
α for an (approximate) confidence interval having a nominal coverage probability of 1 - 2α.
E&T seem to put a lot of emphasis on having what they refer to as an accurate confidence interval, but a lot of
others do not
do this. Some strive to have a confidence interval procedure which is valid (meaning that the actual coverage
probability is not less than the nominal coverage probability), and which produces intervals which are on the average
as short as possible.
Sec. 12.3 isn't too important as far as learning about bootstrapping is concerned. It establishes the viewpoint that
a confidence interval represents the plausible values for an unknown estimand by using aspects of hypothesis testing.
It may be a bit hard to follow upon an initial reading, but in class I can easily explain what they are getting at by
claiming that values above the interval and values below the interval are implausible.
(Basically, values below the lower confidence bound are implausible because if such a value was the true value of
θ, then it would be unlikely to obtain an estimate as large as the one observed,
and values above the upper confidence bound are implausible because if such a value was the true value of
θ, then it would be unlikely to obtain an estimate as small as the one observed.)
The second to the last sentence in the section indicates that a test of hypotheses can be carried out using a confidence
interval. I'll briefly discuss this in class.
(12.19) on p. 159 gives the formula for Student's one-sample t confidence interval. (It should be noted that
on p. 158 the estimator was specified to be the sample mean.) Especially for the case of normal random variables,
the t interval represents an improvement over (12.16) for the case of the estimator being the sample mean
(sometimes referred to as the z interval) because the t interval takes into account that an estimated
standard error is being used rather than the actual standard error.
While the t interval adjusts for the fact that the standard error is unknown, it doesn't adjust for
nonnormality, and thus is only an approximate interval when the underlying distribution of the observations is
nonnormal. Also, the t interval given in this section is for estimating a distribution mean, and we don't
have a lot of similar intervals for estimating other distribution measures except in a relatively small number of
special cases. Using bootstrapping we can adjust for nonnormality and improve upon the t interval for
estimating a distribution mean, and we can also develop confidence intervals for other distribution measures without
assuming parametric models which may not accurately model the phenomenon of interest.
Sec. 12.5 is a relatively short, relatively simple, and somewhat important section. It describes the first of the
bootstrap confidence interval methods covered by E&T.
The bootstrap t method is based on the same pivot used for standard intervals (the pivot of (12.17) and
(12.18)), but instead of assuming a standard normal or T distribution, the sampling distribution of the pivot
--- or rather just the two quantiles of it that are needed to obtain a confidence interval by inverting the pivot ---
is approximated using bootstrapping. Bootstrap replicates of the pivot (see (12.20)) are created, and they are used
to obtain the estimates of the sampling distribution's quantiles that are needed for the confidence interval given by
(12.22). (In class I'll show how (12.22) can be obtained.)
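Here is a Python sketch of the bootstrap-t interval for a sample mean, where s*/sqrt(n) serves as the estimated standard error in each replicate of the pivot (made-up data; the quantile indexing follows the (B + 1)α convention discussed below):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.exponential(scale=1.0, size=40)   # made-up skewed data
n = len(x)
theta_hat = x.mean()
se_hat = x.std(ddof=1) / np.sqrt(n)

B = 999
z_star = np.empty(B)
for b in range(B):
    xb = rng.choice(x, size=n, replace=True)      # bootstrap sample
    se_b = xb.std(ddof=1) / np.sqrt(n)            # s*/sqrt(n) works for the mean
    z_star[b] = (xb.mean() - theta_hat) / se_b    # replicate of the pivot (12.20)

z_star.sort()
alpha = 0.05
k = round((B + 1) * alpha / 2)                    # 25 when B = 999
lo_q = z_star[k - 1]                              # estimated alpha/2 quantile
hi_q = z_star[B + 1 - k - 1]                      # estimated 1 - alpha/2 quantile
ci = (theta_hat - hi_q * se_hat, theta_hat - lo_q * se_hat)  # interval (12.22)
```

Note that the upper quantile sets the lower confidence bound and vice versa, which is how (12.22) inverts the pivot.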
I don't like to use (12.21) to determine the estimated percentiles that are needed --- doing so is to use a method which lacks
symmetry. E.g., if B = 1000, there are 49 replicates below the estimated 5th percentile and 50 replicates
above the estimated 95th percentile. I like to use (B + 1)α with B = 999 instead of
Bα with B = 1000. Using
(B + 1)α with B = 999 puts 49 replicates below the estimated 5th percentile and 49
replicates above the estimated 95th percentile. (Note: 100 or 200 (or 99 or 199) is not nearly large enough for
B --- one needs B to be at least 10 times larger in order to be able to estimate the extreme
percentiles accurately enough.)
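The counting can be checked directly (a trivial sketch of the (B + 1)α rule with B = 999):

```python
B, alpha = 999, 0.05

k_lo = round((B + 1) * alpha)   # the 50th smallest replicate estimates the 5th percentile
k_hi = B + 1 - k_lo             # the 950th smallest estimates the 95th percentile

below = k_lo - 1                # replicates strictly below the 5th percentile estimate
above = B - k_hi                # replicates strictly above the 95th percentile estimate
print(below, above)             # 49 49
```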
(12.20) is for the case of the estimator the pivot is based on being the plug-in estimator. However, one could use
some other estimate on the left in the numerator. But in that case the plug-in estimate still needs to be used on
the right in the numerator (because the plug-in estimate is the true value of the population measure of interest in
the bootstrap world). It should also be noted that unless the estimator the pivot is based on is the sample mean, so
that s*/sqrt(n) can be used as the estimated standard error, or
a linear estimator, so that the jackknife estimate of standard error obtained from the bootstrap sample will be good
to use, nested bootstrapping may be needed
in order to get the estimated standard error to use in (12.20). (We cannot use the bootstrap estimate of standard
error based on the original data in the replicates and still maintain the proper correspondence between things in the
bootstrap world and things in the real world. Estimating the standard error from each bootstrap sample
allows us to properly account for the variability in the estimated standard error --- to make the pivot similar to a
t statistic instead of a z statistic.)
The bootstrap t method is particularly good to use with location statistics, and in general, for large sample
sizes, its coverage probability tends to be closer to the nominal level than is the case for standard confidence
intervals. A bootstrap t confidence interval can have confidence bounds which are not the same distance below
and above the point estimate. This is a big factor in why they can work appreciably better than standard intervals
when the sampling distribution of the estimator is not symmetric.
Here is
R code to obtain a bootstrap-t confidence interval for a situation
addressed in Sec. 12.5.
The first part of
this R code
can be used to obtain a bootstrap-t confidence interval for a situation
addressed in Sec. 12.6 but using the general method described in Sec. 12.5.
The first page of Sec. 12.6 indicates that there are two problems associated with the use of the bootstrap-t
confidence procedure: (1) typically there is no simple way to estimate the standard error of an estimator, and so
nested bootstrapping (which adds greatly to the run time) may be needed; and (2) sometimes the
bootstrap-t procedure behaves poorly (especially with small sample sizes). It turns out that using a
variance stabilizing transformation along with the
bootstrap-t confidence interval procedure can simultaneously address both of these concerns. However, since
the proper transformation is seldom easy to identify, one may have to rely on a data-driven computer-intensive
numerical method in order to carry out the transformation ploy. (Note: Typically trying to use a small amount of
data to fit a highly flexible method is somewhat dangerous. So my guess is that if the sample size is rather small
this complex alternative to a straightforward application of the bootstrap-t may not be trustworthy.)
To make a case for the use of variance stabilizing transformations, E&T consider the previously used law school data
for which the estimand of interest is the correlation coefficient. If the underlying distribution is a bivariate
normal distribution, then one can get decent performance using (12.24) and (12.25). (I'll work through in class how
(12.24) and (12.25) may be used to obtain a confidence interval for the correlation without using bootstrapping.
Here is a .pdf file that shows the main steps of employing the Fisher transformation to obtain a
confidence interval for ρ.)
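In case it's helpful, here is a short Python sketch of the no-bootstrap Fisher-transformation interval just described, using the law school sample correlation (r = .776 with n = 15):

```python
import numpy as np

# Fisher-transformation interval for rho, assuming bivariate normality:
# (12.24) is z = (1/2) log((1 + r)/(1 - r)) = atanh(r), and (12.25) treats z
# as approximately N(atanh(rho), 1/(n - 3)).
r, n = 0.776, 15                 # law school sample correlation and sample size
alpha = 0.05
z = np.arctanh(r)                # the transformation (12.24)
half = 1.96 / np.sqrt(n - 3)     # normal critical value times the (known) standard error
lo, hi = np.tanh(z - half), np.tanh(z + half)   # back-transform to the rho scale
```

The back-transformed bounds necessarily stay inside (-1, 1), so the interval cannot contain impossible values for ρ.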
But if the underlying distribution is not a bivariate normal distribution, then (12.25) may not hold, and the
method of using (12.24) and (12.25) may not perform well. It turns out that a good solution is to use the
bootstrap-t procedure
with (12.24), which allows us to not have to assume that (12.25) holds (since the
bootstrap-t procedure
estimates the needed aspects of the sampling distribution of the "studentized" statistic based on the transformation
(12.24)). The
bootstrap-t intervals
based on the transformation are shorter than the
bootstrap-t intervals
which resulted from not using the transformation, and they didn't contain impossible values for the estimand, unlike
the intervals resulting from not using the transformation. (Note: While short confidence intervals are preferred,
no results are presented to indicate that the intervals based on the transformation are not too short --- we don't
know that the overall confidence interval procedure produces intervals having the correct coverage probability.
(Later, we will cover results in E&T that show that in some cases bootstrap confidence intervals really do
perform appreciably better than other
confidence intervals.))
In most situations, unlike the law school correlation example, one wouldn't know what transformation would be good to
use. Ideally one might seek a transformation which results in a pivot that is approximately normal and doesn't
involve an estimated standard error, since estimating a standard error may necessitate nested bootstrapping. But in most cases that is too
much to ask for, and it turns out to be better to focus on finding a transformation for which the transformed
estimator's variance does not depend on the value of the estimand, since the
bootstrap-t procedure
can deal with the nonnormality but suffers if the variance of the transformed estimator depends on the estimand.
Even with the relaxed goal of using a variance-stabilizing transformation, in most cases it still won't be clear what
transformation to use. In class I'll show you how a first-order Taylor series approximation of g(x)
leads to (12.29) on p. 167, and how if one uses a g which satisfies (12.26) on p. 164, the result would be
that if X is a random variable with mean
θ and standard deviation
s(θ), then
g(X)
has a variance which is close to 1 no matter what value
θ is.
(Note:
These answers for Problem 12.4 on p. 167 of E&T show part of my classroom
presentation.)
So while X's standard deviation depends on
θ,
g(X)'s standard deviation does not.
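To see numerically what variance stabilization buys, here is a small Monte Carlo sketch of my own (not an E&T example): for a Poisson(θ) count X, s(θ) = sqrt(θ), so applying (12.26)/(12.27) gives g(x) = 2·sqrt(x), and the variance of g(X) should stay near 1 as θ changes:

```python
import numpy as np

# My own illustration: X ~ Poisson(theta) has standard deviation sqrt(theta),
# so g'(x) = 1/sqrt(x) and g(x) = 2*sqrt(x) should stabilize the variance near 1.
rng = np.random.default_rng(4)
thetas = (5.0, 50.0, 500.0)
raw_vars, stab_vars = [], []
for theta in thetas:
    x = rng.poisson(theta, size=200_000).astype(float)
    raw_vars.append(x.var())                     # grows like theta
    stab_vars.append((2.0 * np.sqrt(x)).var())   # stays near 1 for every theta
```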
Letting the estimator of the estimand of interest assume the role of X, and assuming that the estimator has
negligible bias, if we knew how the standard deviation of the estimator depended on the value of the estimand, we
could apply (12.26) (or equivalently (12.27)) to determine a transformation to use that would approximately stabilize
the variance of the transformed estimator, allowing us to use a pivot that does not contain an estimated
standard error and thus eliminating the need for nested bootstrapping!
Since we seldom will know how the standard error of an estimator varies with the value of the estimand, we don't have
the function s in (12.26) and (12.27), and so we can't use s to obtain the desired
transformation g. But we can use bootstrapping to approximate s, and ultimately obtain a confidence
interval for
θ, using the following steps.
- B1
bootstrap samples can be drawn from the original data. (Typically it's okay to use a number around 100 or 200 for
B1.)
- From each of the
B1
bootstrap samples,
B2 second-level bootstrap samples (a smaller number, say 50 or 100) can be drawn in order to get an estimate of the standard error
associated with the value of the bootstrap replicate of the estimate of θ.
- The collection of the
B1
(replicate of the estimate of θ, estimated standard error) pairs can be used to estimate the function s, and from the estimated
s, numerical integration can be used to obtain an estimate of g, using (12.27).
- Finally,
B3 (being at least 1000) bootstrap samples are drawn from the original data, and from these
replicates of the g-transformed estimate of
θ
are obtained. These are used to estimate the needed percentiles of the simplified pivot with the standard error set
equal to 1; the estimated quantiles are used to obtain a confidence interval for g(θ); and then
the inverse transformation is applied to
g(θ)'s confidence bounds to obtain confidence bounds for
θ. (Note: Since s is a positive function, from (12.27) it is clear that g will be a
continuous monotone increasing function, and so its inverse g^(-1) will exist.)
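To make the scheme concrete, here is a rough Python sketch of the four steps (my own illustration, not code from E&T), using the mean of some made-up skewed data as the estimand. A crude piecewise-linear estimate of s and left-endpoint numerical integration stand in for the smoothing E&T suggest, and np.interp clamps g and g^(-1) at the ends of the grid. (For a mean one wouldn't really need the nested bootstrap, since s*/sqrt(n) is available; a mean is used only to keep the sketch short.)

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.exponential(scale=1.0, size=40)   # made-up data; estimand: the mean
n = len(x)
theta_hat = x.mean()

# Steps 1-2: small nested bootstrap to learn how the standard error varies
# with the value of the estimate.
B1, B2 = 100, 50
est, se = np.empty(B1), np.empty(B1)
for i in range(B1):
    xb = rng.choice(x, size=n, replace=True)
    est[i] = xb.mean()
    inner = np.array([rng.choice(xb, size=n, replace=True).mean() for _ in range(B2)])
    se[i] = inner.std(ddof=1)

# Step 3: crude estimate of s(.) from the (estimate, standard error) pairs,
# then g from (12.27) by left-endpoint numerical integration of 1/s.
order = np.argsort(est)
grid = est[order]
s_grid = np.maximum(se[order], 1e-8)
g_grid = np.concatenate(([0.0], np.cumsum(np.diff(grid) / s_grid[:-1])))

def g(t):       # piecewise-linear g, clamped outside the grid
    return np.interp(t, grid, g_grid)

def g_inv(u):   # inverse exists since g is increasing
    return np.interp(u, g_grid, grid)

# Step 4: bootstrap-t on the g scale, with the standard error set equal to 1.
B3 = 999
piv = np.sort([g(rng.choice(x, size=n, replace=True).mean()) - g(theta_hat)
               for _ in range(B3)])
alpha = 0.05
k = round((B3 + 1) * alpha / 2)          # 25
lo_g = g(theta_hat) - piv[B3 - k]        # subtract the upper pivot quantile
hi_g = g(theta_hat) - piv[k - 1]         # subtract the lower pivot quantile
ci = (float(g_inv(lo_g)), float(g_inv(hi_g)))   # back-transform to the theta scale
```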
Obviously, using transformations with the
bootstrap-t procedure
is a rather involved method for obtaining confidence intervals. However, not only can performance be improved over
the basic
bootstrap-t procedure, but the total number of bootstrap samples needed can be greatly reduced due to the fact
that a variance-stabilizing transformation eliminates the necessity for large-scale nested bootstrapping. (The first
two steps indicated above constitute small-scale nested bootstrapping. For a sense of the large-scale alternative,
consider carrying out the method of Sec. 12.5 when we don't have a formula for the estimated standard error: if we
need 999 replicates of the pivot, and 100 replicates of the estimate from each of the 999 bootstrap samples in order
to estimate the standard error for the denominators of the pivot replicates, a total of 999 + 999*100 = 100,899
bootstrap samples would be needed. But making use of the transformation scheme described above, one could use a total
of only 200*100 + 999 = 20,999 bootstrap samples (about one fifth as many as before), or perhaps as few as
100*100 + 999 = 10,999 bootstrap samples, and possibly obtain a better confidence interval.)
The last portion of
this R code can be used to obtain a variance-stabilized bootstrap-t confidence interval for a situation
addressed in Sec. 12.6, working with the law school data. (It also includes code for obtaining confidence intervals
based on the Fisher transformation of the correlation.)