Introduction to Bootstrapping (and the course): Some Notes Pertaining to Ch. 1 of E&T



In STAT 554, statistical inferences were made using normal-theory methods, nonparametric methods, and robust procedures. Although normal-theory methods can be rather inaccurate and/or inefficient (i.e., have low power), sometimes there is no nonparametric test that addresses the issue one is concerned about (e.g., if one wants to do a two-sample test about means but it isn't safe to assume that if the distributions differ, one is stochastically larger than the other), and it may also be questionable whether any robust procedure is reliable. So even though one may know how to perform many different statistical procedures, there may be times when one still feels too limited, and so it would be nice if one's arsenal of methods could be expanded to include bootstrap methods.

Bootstrapping is a computer-intensive technique that can be used in many different ways to make statistical inferences. It can be used to estimate the standard error, bias, and mean squared error of various estimators, even if the distribution underlying the data is unknown. (It can be used to deal with an estimator of some complicated function of a parameter in a parametric model, or it can be used to deal with an estimator of some distribution measure (e.g., a correlation) when no parametric model is being assumed.) It can also be used to construct confidence intervals for parameters and distribution measures --- again, even if the underlying distribution is unknown. Other uses of bootstrapping include doing tests of hypotheses and estimating prediction errors. (Note: E&T refer to "the bootstrap," but I don't like to use bootstrap as a noun.)

The common element of bootstrap methods is that they all make use of bootstrap samples, created by resampling the original data. But that aspect of bootstrapping is the easy part --- the hard parts are knowing what to do with the bootstrap samples, and knowing in what situations various bootstrap methods can be considered to be trustworthy. (Note: While E&T indicate some situations when bootstrap methods may not work so well, they don't do a good job of indicating how large the sample sizes must be to get reliable results in a lot of cases. It may be that not a lot of guidelines for use are readily available, and so we will want to learn how we can gain some indication about whether or not a specific bootstrap method is reliable in a certain situation.)
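
To make the resampling idea concrete, here is a minimal R sketch (using a small made-up data set) that creates bootstrap samples and uses them to estimate the standard error and bias of the sample median. It's only an illustration of the basic mechanism, and it isn't one of the code examples referred to later in these notes.

  set.seed(1)                # any seed will do; it just makes the results reproducible

  # a small made-up data set (hypothetical values, for illustration only)
  x <- c(2.1, 3.7, 4.4, 5.0, 6.2, 7.9, 8.5, 9.3, 11.0, 14.6)

  B <- 1000                  # number of bootstrap samples
  med.star <- numeric(B)     # to hold the bootstrap replications of the median
  for (b in 1:B) {
    x.star <- sample(x, length(x), replace = TRUE)   # a bootstrap sample
    med.star[b] <- median(x.star)                    # the statistic computed from it
  }

  sd(med.star)               # bootstrap estimate of the standard error of the median
  mean(med.star) - median(x) # bootstrap estimate of the bias of the sample median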

Bootstrapping was developed by Brad Efron of Stanford University in the late 1970s. The term bootstrapping was inspired by the commonly used phrase "to pull oneself up by one's bootstraps." (Notes: (1) Julian Simon introduced similar ideas earlier, but the development of sophisticated bootstrap methods began with Efron's work. (2) With Efron's work on bootstrap methods and Friedman's big role in the development of CART and other methods, the statistics department of Stanford University was a hotbed for computational statistics during the 1980s (and I consider myself very fortunate to have been a graduate student there at that time (even though I didn't have sense enough to jump onto the computational statistics bandwagon until a year or so after I left)).) Although bootstrapping has proven to be effective in making accurate inferences in a wide variety of situations, it hasn't really caught on the way its early advocates thought it would. But I think it is becoming more widely used, and I'm convinced that it's worthwhile for you to learn about bootstrap methods.


an example (for which we really don't need bootstrapping)

Ch. 1 of E&T covers an example of bootstrapping for a situation where bootstrapping really isn't needed. But a nice thing about the example is that we can see that bootstrapping produces results in line with those of more traditional methods.

The example has to do with an estimate of relative risk, which is denoted here (and in E&T's Ch. 1) by θ. Based on the data described on p. 2 of E&T, a point estimate of θ can easily be found to be 0.55. But typically we aren't happy with just having a point estimate --- we are concerned about its accuracy, and so often we want to produce an interval estimate as well, or at least come up with some measure of the point estimate's accuracy.
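
For instance, using the counts from Table 1.1 of E&T (quoted here from memory, so check them against p. 2 of the text), the point estimate is just the ratio of the two groups' estimated heart attack rates, which can be verified with a couple of lines of R:

  # counts from E&T's Table 1.1, as I recall them (check against p. 2):
  # 104 heart attacks among 11037 subjects taking aspirin,
  # 189 heart attacks among 11034 subjects taking placebo
  p.aspirin <- 104 / 11037    # estimated heart attack rate, aspirin group
  p.placebo <- 189 / 11034    # estimated heart attack rate, placebo group
  p.aspirin / p.placebo       # estimated relative risk: about 0.55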

If one has access to the proper books and/or software, developing a confidence interval for θ is possible. An easy way to do it is to use StatXact software --- such a confidence interval can be obtained by selecting from the menus on the newer versions of StatXact. A harder way to obtain a confidence interval is to make use of material in the book Statistical Methods for Rates and Proportions, 2nd Ed., by Fleiss. (Pages 74-75 indicate that one just needs to determine the 2 by 2 tables corresponding to the endpoints of a confidence interval for the odds ratio and then use the relative risk values associated with these two tables. Sec. 5.6 of Fleiss describes how to obtain confidence intervals for odds ratios using an iterative procedure. (Note: Even though I think the relative risk is a more "natural" quantity than is the odds ratio, it seems to be a lot harder to find information about statistical procedures pertaining to relative risk.))

The 95% confidence interval for θ obtained from traditional methods is (0.43, 0.70). (Notes: (1) One way to obtain this interval is to use the default settings after selecting the asymptotic confidence interval for relative risk using StatXact. (2) I don't like the way E&T express the confidence interval result in (1.2) on p. 4.) This R code can be used to obtain almost the same result using bootstrapping instead of theory. (Notes: (1) Also included is the code to produce the second confidence interval given on p. 4, and the estimate of the point estimator's standard error given on p. 5. (2) I think a good way to learn R is by studying many examples --- you are expected to study all of the R examples I provide even though I don't go over all of them in class.)
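
Since that code file isn't reproduced here, the following is only a rough sketch of one way such a bootstrap computation might go (resampling each group separately and using a crude percentile interval). The counts are again quoted from memory, and B = 2000 is an arbitrary but fairly common choice for the number of replications.

  set.seed(321)   # any seed will do

  # recreate the two samples of 0/1 outcomes from the counts in E&T's Table 1.1
  aspirin <- c(rep(1, 104), rep(0, 11037 - 104))   # 1 indicates a heart attack
  placebo <- c(rep(1, 189), rep(0, 11034 - 189))

  B <- 2000                   # number of bootstrap replications
  theta.star <- numeric(B)
  for (b in 1:B) {
    asp.star <- sample(aspirin, length(aspirin), replace = TRUE)  # resample each
    pla.star <- sample(placebo, length(placebo), replace = TRUE)  # group separately
    theta.star[b] <- mean(asp.star) / mean(pla.star)   # bootstrap relative risk
  }

  sd(theta.star)                         # bootstrap estimate of the standard error
  quantile(theta.star, c(0.025, 0.975))  # crude percentile 95% confidence interval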

Although the bootstrap procedures used to obtain the results in Ch. 1 may seem reasonable, it is natural to have some questions about them --- like "Aren't we assuming that the observed data accurately represent the unknown underlying population?" (If so, why not take it one step further and just assume that the point estimate is 100% accurate?)



Students: Be sure to read Sec. 1.3 on p. 9 in order to get familiar with the notational conventions used in E&T.