Some Notes Pertaining to Ch. 7 of E&T



Sec. 7.1 suggests that there are two objections to relying on traditional ways of obtaining estimated standard errors:
  1. many formulas for estimated standard errors rely on distributional assumptions which may not be true;
  2. for many situations, no formula for the estimated standard error of the estimator exists.
The correlation setting from Sec. 6.5 is one for which (7.1) on p. 60 can be used for the estimated standard error if one is willing to assume bivariate normality, or something close to it. But if the underlying distribution differs too much from a bivariate normal distribution, the estimator indicated by (7.1) may perform poorly. The complicated estimation situation addressed in Sec. 7.2 is one for which traditional methods offer no easy way to obtain estimated standard errors --- but bootstrapping can be used to estimate the desired standard error even though the estimator under consideration is rather complicated.
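
To make the first objection concrete, here is a small R sketch (mine, not E&T's code) contrasting a normal-theory estimate with a bootstrap estimate of the standard error of a sample correlation, using made-up bivariate data. I'm taking (7.1) to be the usual normal-theory estimate (1 - r^2)/sqrt(n - 3); if E&T's version differs in detail, the comparison is the same in spirit.

  set.seed(1)
  n <- 30
  x <- rnorm(n)
  y <- 0.6 * x + rnorm(n, sd = 0.8)        # fake bivariate data
  r <- cor(x, y)

  se.normal <- (1 - r^2) / sqrt(n - 3)     # normal-theory estimate

  B <- 2000
  r.star <- replicate(B, {
    i <- sample(n, replace = TRUE)         # resample (x, y) pairs
    cor(x[i], y[i])
  })
  se.boot <- sd(r.star)                    # bootstrap estimate

  c(normal = se.normal, bootstrap = se.boot)

If the data really are close to bivariate normal, the two estimates should be broadly similar; the point of objection 1 is that the normal-theory estimate can be misleading when they aren't.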



I'm going to skip Sec. 7.2, except that I may just say a little bit about it in class. (I'm skipping it because it's about an example dealing with multivariate statistics, and many students in the class have had little or no exposure to multivariate statistics and all of the eigenvector and eigenvalue stuff.)

If we view theta-hat and se-hat in (7.13) on p. 66 as estimators (random variables) instead of estimates (observed values of estimators), then it's perfectly okay to say that the resulting confidence interval covers the estimand (the true value of theta) with the stated coverage probability. But the last two lines on p. 66 are bad --- theta either belongs to an observed confidence interval or it doesn't, so it makes no sense to attach a probability to an interval that has already been computed. (E&T seem to be careless with this sort of thing.)
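
To illustrate the distinction, here's a quick simulation (mine, not from E&T) for a simple normal-mean setting: before the data are observed, the random interval covers the true value with roughly the stated probability, but any one computed interval either contains the true value or it doesn't.

  set.seed(1)
  theta <- 5                    # true mean
  n <- 25
  B <- 10000
  covered <- replicate(B, {
    x <- rnorm(n, mean = theta, sd = 2)
    half <- qnorm(0.975) * sd(x) / sqrt(n)
    (mean(x) - half <= theta) && (theta <= mean(x) + half)
  })
  mean(covered)                 # proportion of intervals covering theta; close to 0.95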



Sec. 7.3 deals with a regression example. There is a response variable, improvement, which is denoted by y, and one predictor variable, compliance, which is denoted by z in E&T, but which I'll denote by x. I have the data shown in Table 7.4 on p. 72 of E&T in a file which we can view as a web page. The first portion of this R code shows how we can read the data into R and view it in a scatter plot (like Figure 7.5 on p. 73 of E&T).
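
The first portion of the linked R code does something along the following lines; the file name and column names here are hypothetical, so adjust them to match the actual file.

  d <- read.table("cholostyramine.dat", header = TRUE)        # hypothetical file name
  # assume columns: x = compliance, y = improvement
  plot(d$x, d$y, xlab = "compliance", ylab = "improvement")   # cf. Figure 7.5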

Since Sec. 7.3 deals with regression, I'll cover some basic material about ordinary least squares regression in class. (This web page covers regression basics, but it goes into greater detail than what I intend to present in class.) Returning to the R code, we can see how to use R to fit a simple regression model, as well as higher-order polynomial regression models. E&T choose a quadratic regression model as a candidate because the quadratic fit generally follows the trend indicated by the unbiased plug-in estimator (whose estimated response "curve" is too rough to be believed). Higher-order polynomial fits can be considered, but if too many terms are used in the model, the estimated response can be too wiggly, and overfitting becomes a serious concern. With 164 observations, it wouldn't be unreasonable to consider higher-order models, but it's not clear that a cubic fit is better than the quadratic fit.
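
Here is a rough sketch (not the linked code) of the sort of fits involved, continuing with the data frame d from above.

  fit1 <- lm(y ~ x, data = d)              # simple linear fit
  fit2 <- lm(y ~ poly(x, 2), data = d)     # quadratic fit (E&T's candidate)
  fit3 <- lm(y ~ poly(x, 3), data = d)     # cubic fit

  xg <- data.frame(x = seq(min(d$x), max(d$x), length.out = 200))
  plot(d$x, d$y, xlab = "compliance", ylab = "improvement")
  lines(xg$x, predict(fit2, xg), lwd = 2)  # quadratic
  lines(xg$x, predict(fit3, xg), lty = 2)  # cubic, for comparison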

E&T also consider a local regression fit. They use a specific type of local regression known as loess. (Note: Some local regression methods are also known as smoothers, or scatterplot smoothers.) Since I suspect that a large proportion of the class may not be very familiar with local regression, I'll go over the basics in class. (This web page pertains to kernel methods for regression, including loess, but it goes into greater detail than what I intend to present in class.) Returning to the R code, we can see how loess fits can be obtained with R.
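
A basic loess fit can be obtained as sketched below (again, my illustration rather than the linked code; E&T's loess settings may differ from R's defaults).

  fitlo <- loess(y ~ x, data = d, span = 0.75, degree = 1)   # span controls the amount of smoothing
  xg <- data.frame(x = seq(min(d$x), max(d$x), length.out = 200))
  plot(d$x, d$y, xlab = "compliance", ylab = "improvement")
  lines(xg$x, predict(fitlo, xg), lwd = 2)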

E&T consider several estimators in this section. (There are three estimands.) We have least squares and loess estimators of E(Y|x = 60), least squares and loess estimators of E(Y|x = 100), and least squares and loess estimators of θ, which is defined by (7.29) on p. 80 of E&T. (It can be noted that three different values are given for the loess estimate of θ: 1.84 appears in (7.28) on p. 79, 1.59 appears twice on p. 80, and 1.63 appears on p. 82. The value 1.84 is consistent with the numbers used in (7.28), but those numbers aren't all correct --- two different values are given for the loess estimate of E(Y|x = 80). Using R, it can be determined that the loess estimate of E(Y|x = 80) is 37.50. If the other value, 32.50, is changed to 37.50, the estimate of θ is 1.59 (which is what I get using R).) Returning to the R code, we can see how R can be used to get the various estimates. We can also see how bootstrapping can be done to produce estimated standard errors for the various estimators. (Note: The last complete sentence on p. 79 of E&T is not consistent with Figure 7.10 on p. 82 of E&T.)
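
As an illustration of the bootstrapping, here is a sketch (not the linked code) of one way to estimate the standard error of the quadratic least squares estimate of E(Y|x = 60) by resampling cases; the same loop can be repeated with the loess fit, with x = 100, or with the estimator of θ. (E&T's exact resampling scheme may differ from this.)

  set.seed(1)
  B <- 1000
  n <- nrow(d)
  est.star <- replicate(B, {
    i  <- sample(n, replace = TRUE)              # resample (x, y) pairs
    fb <- lm(y ~ poly(x, 2), data = d[i, ])
    predict(fb, data.frame(x = 60))
  })
  sd(est.star)                                   # bootstrap estimated standard error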

With the sentence that starts at the bottom of p. 78, E&T claim that Table 7.5 shows that the loess estimates are less accurate. While the loess estimators appear to have greater variability, I wouldn't conclude, just based on Table 7.5, that they are less accurate. In regression settings, some estimators can be highly biased, and accuracy should reflect both bias and variance (e.g., mean squared error is the variance plus the squared bias). Due to the local nature of loess and the global nature of ordinary least squares fits, my strong guess is that the loess estimators have smaller bias than the least squares estimators, and so overall it's not clear which estimators are really more accurate. (E&T comment on the bias and variability of loess estimators below (7.29) on p. 80.)



Sec. 7.4 of E&T deals with an example for which bootstrapping fails. To me, it's not clear that bootstrapping fails just because the distribution of bootstrap replicates shown on the left in Figure 7.11 (on p. 83 of E&T) appears somewhat odd compared to other bootstrap replicate distributions shown in the book. However, this R code produces results which indeed indicate bootstrap failure (because the bootstrap estimate of standard error is much much worse than a parametric estimate of standard error).
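
For reference, the Sec. 7.4 example has the data coming from a Uniform(0, θ) distribution with θ estimated by the sample maximum. The sketch below (mine, not the linked code) computes a nonparametric bootstrap standard error and a parametric one (resampling from Uniform(0, θ-hat)), along with the true standard error for comparison. A single run may or may not show a dramatic difference; the failure of the nonparametric bootstrap also shows up in the lumpy distribution of its replicates (cf. the left panel of Figure 7.11).

  set.seed(1)
  theta <- 1; n <- 50; B <- 2000
  x <- runif(n, 0, theta)
  theta.hat <- max(x)

  rep.np <- replicate(B, max(sample(x, n, replace = TRUE)))   # nonparametric bootstrap replicates
  rep.p  <- replicate(B, max(runif(n, 0, theta.hat)))         # parametric bootstrap replicates

  se.true <- theta * sqrt(n / ((n + 1)^2 * (n + 2)))          # true SE of the sample maximum
  c(nonparametric = sd(rep.np), parametric = sd(rep.p), true = se.true)

  hist(rep.np, breaks = 50)    # lumpy, like the left panel of Figure 7.11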