Some Notes Pertaining to Ch. 7 of E&T
Sec. 7.1 suggests that there are two objections to relying on traditional ways of obtaining estimated standard
errors:
- many formulas for estimated standard errors rely on distributional assumptions which may not be true;
- for many situations, no formula for the estimated standard error of the estimator exists.
The correlation setting from Sec. 6.5 is one for which (7.1) on p. 60 can be used for the estimated standard error if
one is willing to assume bivariate normality, or something close to it. But if the underlying distribution differs
too much from a bivariate normal distribution, the estimator indicated by (7.1) may perform poorly.
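As a small illustration, here's how the normal-theory estimated standard error can be computed in R for simulated data. The data-generating setup below is of course made up, and I'm taking (7.1) to be se-hat = (1 - r^2)/sqrt(n - 3); check that against the book before relying on it.

```r
# Normal-theory estimated standard error for a sample correlation,
# a sketch assuming roughly bivariate normal data.
set.seed(1)
n <- 50
x <- rnorm(n)
y <- 0.6 * x + rnorm(n, sd = 0.8)     # correlated pair, roughly bivariate normal
r <- cor(x, y)
se.normal <- (1 - r^2) / sqrt(n - 3)  # the (7.1)-style estimate
r
se.normal
```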
The complicated estimation situation addressed in Sec. 7.2 is
one for which traditional methods offer no easy way to obtain estimated standard errors --- but bootstrapping can be
used to estimate the desired standard error even though the estimator under consideration is rather complicated.
I'm going to skip Sec. 7.2, except that I may just say a little bit about it in class. (I'm skipping it because it's
about an example dealing with multivariate statistics, and many students in the class have had little or no exposure
to multivariate statistics and all of the eigenvector and eigenvalue stuff.)
If we view theta-hat and se-hat in (7.13) on p. 66 as estimators (random variables) instead of estimates (observed
values of estimators), then it's perfectly okay to indicate that the confidence interval covers the estimand (the
true value of theta) with the stated coverage probability. But the last two lines on p. 66 are bad --- theta either
belongs to an observed confidence interval or it doesn't. (E&T seem to be careless with this sort of thing.)
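A quick simulation makes the estimator/estimate distinction concrete: before the data are observed, the random interval theta-hat +/- 1.96 se-hat covers theta with (approximately) the stated probability, but any one observed interval either contains theta or it doesn't. The normal-mean setup below is just an illustration, not the book's example.

```r
# Coverage is a property of the random interval, not of any observed interval.
set.seed(1)
theta <- 5; n <- 25; B <- 10000
covered <- replicate(B, {
  x <- rnorm(n, mean = theta)
  se <- sd(x) / sqrt(n)
  abs(mean(x) - theta) <= 1.96 * se   # did this interval cover theta?
})
mean(covered)   # close to 0.95 (a bit low, since 1.96 ignores the t correction)
```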
Sec. 7.3 deals with a regression example. There is a response variable, improvement, which is denoted by y,
and one predictor variable, compliance, which is denoted by z in E&T, but which I'll denote by x.
I have the data shown in Table 7.4 on p. 72 of E&T in a
file which we can view as a web page.
The first portion of
this R code shows how we can read the data into R and view it in a
scatter plot (like Figure 7.5 on p. 73 of E&T).
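The reading-and-plotting step looks something like the sketch below; the file name and column names here are assumptions (they need to match whatever is actually in the posted file), so treat this as a template rather than the actual code.

```r
# Sketch: read the compliance/improvement data and make a scatter plot
# (like Figure 7.5). File name and column names are placeholders.
d <- read.table("cholostyramine.txt", header = TRUE)
plot(d$x, d$y, xlab = "compliance (x)", ylab = "improvement (y)")
```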
Since Sec. 7.3 deals with regression, I'll cover some basic material about ordinary least squares regression in
class.
(This web page covers regression basics, but it goes into greater detail than
what I intend to present in class.)
Returning to
the R code, we can see how to use R to fit a simple regression
model, as well as higher-order polynomial regression models. E&T choose a quadratic regression model as a candidate
because the quadratic fit generally follows the trend indicated by the unbiased plug-in estimator
(which yields an estimated response "curve" that is too rough to be believed).
Higher-order polynomial fits can be considered, but if too many terms are used in the model, the estimated response
can be too wiggly, and the danger of overfitting needs to be taken seriously. With 164 observations, it
wouldn't be unreasonable to consider higher-order models, but it's not clear that a cubic fit is better.
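In R, the competing polynomial fits can be obtained with lm(), and nested fits can be compared with an F test via anova(). The data below are made up (stand-ins for the Table 7.4 values), so only the mechanics carry over.

```r
# Fitting linear, quadratic, and cubic models with lm(); made-up data
# stand in for the actual compliance/improvement values.
set.seed(1)
x <- runif(100, 0, 100)
y <- 5 + 0.4 * x + 0.003 * x^2 + rnorm(100, sd = 8)
fit1 <- lm(y ~ x)
fit2 <- lm(y ~ poly(x, 2))   # quadratic
fit3 <- lm(y ~ poly(x, 3))   # cubic
anova(fit2, fit3)            # F test: does the cubic term earn its keep?
```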
E&T also consider a local regression fit.
They use a specific type of local regression known as loess.
(Note: Some local regression methods are also known as smoothers, or scatterplot smoothers.)
Since I suspect that a large proportion of the class may not be very
familiar with local regression, I'll go over the basics in class.
(This web page pertains to kernel methods for regression, including loess,
but it goes into greater detail than
what I intend to present in class.)
Returning to
the R code, we can see how loess fits can be obtained with R.
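The basic mechanics are in the sketch below, again with made-up data in place of the actual values. R's loess() does local polynomial (degree 2 by default) fitting, and predict() evaluates the fitted curve.

```r
# Obtaining a loess fit and an estimated response curve with loess().
set.seed(1)
x <- runif(100, 0, 100)
y <- 5 + 0.4 * x + 0.003 * x^2 + rnorm(100, sd = 8)
fit <- loess(y ~ x, degree = 2)   # local quadratic fit (the default degree)
plot(x, y)
xs <- seq(min(x), max(x), length.out = 200)
lines(xs, predict(fit, data.frame(x = xs)))
```

Note that, by default, predict() for a loess fit returns NA outside the range of the observed x values, which is why the grid above is restricted to [min(x), max(x)].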
E&T consider several estimators in this section. (There are three estimands.) We have:
- least squares and loess estimators of E(Y|x = 60);
- least squares and loess estimators of E(Y|x = 100);
- least squares and loess estimators of θ, which is defined by (7.29) on p. 80 of E&T.
(It can be noted that three different values are given for the loess estimate of θ --- 1.84 appears in (7.28)
on p. 79, 1.59 appears twice on p. 80, and 1.63 appears on p. 82. 1.84 is consistent with the numbers used in
(7.28), but those numbers aren't all correct --- two different values are given for
the loess estimate of E(Y|x = 80). Using R it can be determined that
the loess estimate of E(Y|x = 80) is 37.50. If the other value, 32.50, is changed to 37.50,
the estimate of θ is 1.59 (which is what I get using R).)
Returning to
the R code, we can see how R can be used to get the various
estimates. We can also see how bootstrapping can be done to produce estimated standard errors for the various
estimators. (Note: The last complete sentence on p. 79 of E&T is not consistent with Figure 7.10 on p. 82 of E&T.)
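The bootstrap step amounts to resampling the (x, y) pairs, refitting, and taking the standard deviation of the replicates. Below is a sketch for the two fitted-value estimators from the quadratic fit, using made-up data in place of Table 7.4 (the loess version is the same with loess() in place of lm(), and θ would be computed from the fitted curve per (7.29)).

```r
# Bootstrapping pairs (cases) to estimate standard errors of fitted-value
# estimators; made-up data stand in for the actual 164 observations.
set.seed(1)
n <- 164
x <- runif(n, 0, 100)
y <- 5 + 0.4 * x + 0.003 * x^2 + rnorm(n, sd = 8)
est <- function(xx, yy) {
  fit <- lm(yy ~ poly(xx, 2))
  predict(fit, data.frame(xx = c(60, 100)))   # estimates of E(Y|x=60), E(Y|x=100)
}
B <- 1000
reps <- replicate(B, {
  i <- sample(n, replace = TRUE)   # resample the cases
  est(x[i], y[i])
})
apply(reps, 1, sd)   # bootstrap estimated standard errors
```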
With the sentence that starts at the bottom of p. 78, E&T claim that Table 7.5 shows that the loess estimates
are less accurate. While the loess estimators appear to have greater variability, I wouldn't conclude, just based on
Table 7.5, that they are less accurate. In regression settings, some estimators can be highly biased. Due to the
local nature of loess and the global nature of ordinary least squares fits, my strong guess is that the loess
estimators have smaller bias compared to the least squares estimators, and so overall it's not clear which estimators
are really more accurate, since bias and variance should both be taken into account when considering estimator
accuracy. (E&T comment on the bias and variability of loess estimators below (7.29) on p. 80.)
Sec. 7.4 of E&T deals with an example for which bootstrapping fails. To me, it's not clear that bootstrapping fails
just because the distribution of bootstrap replicates shown on the left in Figure 7.11 (on p. 83 of E&T) appears
somewhat odd compared to other bootstrap replicate distributions shown in the book. However,
this R code produces results which indeed indicate bootstrap failure
(because the bootstrap estimate of standard error is much, much worse than a parametric estimate of standard error).
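For what it's worth, the classic way this failure is usually demonstrated (which, as I understand it, is the setting of Sec. 7.4) involves estimating θ from a Uniform(0, θ) sample using the sample maximum. The sketch below is my own illustration, not the book's code; the parametric standard error estimate plugs theta-hat into the exact variance of the maximum, Var = n θ² / ((n+1)² (n+2)).

```r
# Why the bootstrap struggles with the sample maximum of Uniform(0, theta):
# a large fraction of bootstrap samples reproduce the observed maximum exactly.
set.seed(1)
theta <- 1; n <- 50
x <- runif(n, 0, theta)
thetahat <- max(x)
B <- 2000
reps <- replicate(B, max(sample(x, n, replace = TRUE)))
mean(reps == thetahat)   # roughly 1 - (1 - 1/n)^n, i.e. about 0.63
sd(reps)                                    # bootstrap SE estimate
thetahat * sqrt(n) / ((n + 1) * sqrt(n + 2))  # parametric SE estimate
```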