Assorted Comments Related to Ch. 3 Material Covered Week 4

(and the first portion of Week 5)



I'll just point out one more thing regarding using an F test to assess the significance of a qualitative (categorical) predictor having more than two levels, instead of relying on two or more t tests about the individual adjustments for the levels. This relates to the statement in the 2nd to last paragraph of p. 86 of the text (pertaining to the coefficients and p-values for the individual levels): "the coefficients and their p-values do depend on the choice of dummy variable coding." Suppose there are 3 levels: A, B, and C. If A is the baseline group, and the adjustment for B is slightly negative, and the adjustment for C is slightly positive, then neither of the t tests for the B and C coefficients may be significant, because while the B-A and C-A differences are nonzero, neither is large enough in magnitude to produce a significant result. But if instead B served as the baseline group, the t test for the C coefficient could be significant, since the difference between C and B is the largest in magnitude of the 3 pairwise differences (A-B, C-B, and C-A).
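To see this concretely, here is a small R sketch (using simulated data I made up, not anything from the text) in which the individual t tests change with the choice of baseline level, while the overall F test for the factor does not:

    set.seed(1)
    grp <- factor(rep(c("A", "B", "C"), each = 30))
    y <- c(rnorm(30, mean = 10), rnorm(30, mean = 9.8), rnorm(30, mean = 10.2))
    fitA <- lm(y ~ grp)                      # A is the baseline level
    fitB <- lm(y ~ relevel(grp, ref = "B"))  # B is the baseline level
    summary(fitA)            # t tests for the B-A and C-A adjustments
    summary(fitB)            # t tests for the A-B and C-B adjustments
    anova(lm(y ~ 1), fitA)   # overall F test; identical if fitB is used instead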



In the last complete paragraph of p. 87 of the text, the possibility that "spending money on radio advertising actually increases the effectiveness of TV advertising" is considered. Another explanation for the interaction term could be that spending too much on one type of advertising results in diminishing returns. That is, it may not be so much that spending appreciably on both types of advertising causes one type to increase the effectiveness of the other as it is that, when most of the spending goes to just one of the two types, overspending past a certain "saturation threshold" doesn't produce the same "bang for the buck" that increasing spending for the other type of advertising would have brought.
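As a reminder of how an interaction term of this sort is specified in R, a model like the one the text considers could be fit as follows (this assumes the Advertising data from the book's web site has already been read into a data frame named Advertising with columns TV, radio, and sales; it isn't part of an R package):

    fit.int <- lm(sales ~ TV + radio + TV:radio, data = Advertising)
    summary(fit.int)   # lm(sales ~ TV*radio, data = Advertising) is equivalent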

In the example on the middle portion of p. 88, one can say that increasing the number of workers increases the effectiveness of increasing the number of lines (or that increasing the number of lines increases the effectiveness of increasing the number of workers). One could also argue that increasing one beyond a certain point without increasing the other appropriately would lead to diminishing returns, but for this example I think the notion that increasing one increases the effectiveness of the other is a good way to describe the situation.



Some statisticians shy away from using transformations of the independent variables (except for maybe replacing x by log x if they replace y by log y), because they think it doesn't properly "penalize" the adjustment for nonlinearity. (I.e., if instead of replacing x by 1/x, one approximated this nonlinear relationship by adding quadratic and cubic terms, you'd now have three coefficients to estimate instead of just one ... and some think it's somehow "cheating" to avoid incorporating the extra coefficients.) But I think this attitude is a bit silly. (I mean, who wants to be penalized, right?) In some cases there may be a very good reason why the mean response is related to the inverse of a predictor instead of being linearly related to the predictor (and so in some sense, the inverse of the predictor is the natural variable to use). E.g., it follows from Boyle's law that for a fixed amount of gas at a constant temperature, the pressure is directly proportional to the inverse of the volume. This gives us that if v_i is the ith volume considered, and Y_i is the measured pressure for the ith volume, then the model
Y_i = β (v_i)^(-1) + ε_i,
where ε_i is an error term due to measurement error, might be accurate.
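If one had data of this sort, a model like the one above (note that it has no intercept) could be fit in R along the following lines, where pressure and volume are hypothetical vectors of measurements:

    fit.boyle <- lm(pressure ~ I(1/volume) - 1)   # the -1 drops the intercept
    summary(fit.boyle)   # the single coefficient is the estimate of beta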



In class I mentioned component plus residual plots (see p. 3-38 of the class notes (toward the bottom)). The bottom portion of this web page from my Summer 2005 class, under the heading variable transformation, contains a description of such plots (which are easy to construct). (One way to create such a plot for each of the predictors in a multiple regression model is to use the function crPlots from the car package. If you've created the object fit1 using the lm function, and have loaded the car package, then crPlots(fit1) will create a component plus residuals plot for each of the predictors.)
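Putting those pieces together, the calls look like this (the data frame and predictor names here are just placeholders):

    library(car)                                  # install.packages("car") if needed
    fit1 <- lm(y ~ x1 + x2 + x3, data = mydata)
    crPlots(fit1)   # one component plus residual plot per predictor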



Even though I don't discuss Sec. 3.4 of the text in my lectures, I encourage you to read it, and look back at the earlier results in the chapter pertaining to this running example while you do so. This will be a good way to review some of the steps one takes when building a regression model from a data set.



I talk a little about the curse of dimensionality when I discuss nearest neighbors regression, but be sure to carefully go over the comments in the text pertaining to FIGURE 3.20, and notice how increasing the number of "noise predictors" can greatly hurt the performance of KNN regression, while this is not the case for OLS regression (because with OLS regression the noise variables will be ignored for the most part, but with KNN regression the method gives the same weight to the noise variables as it does to the meaningful predictors). (The portion about FIGURE 3.20 starts on the bottom portion of p. 107 and continues to the end of Sec. 3.5.)
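If you'd like to see this effect for yourself, here is a small simulation sketch (my own, not from the text) with one meaningful predictor and an increasing number of pure noise predictors; it uses a hand-rolled 5-nearest-neighbors predictor so that no extra package is needed:

    set.seed(1)
    n <- 100
    for (p.noise in c(0, 2, 10)) {
      x.tr <- matrix(runif(n * (1 + p.noise)), n)   # only column 1 matters
      x.te <- matrix(runif(n * (1 + p.noise)), n)
      y.tr <- 2 * x.tr[, 1] + rnorm(n, sd = 0.2)
      y.te <- 2 * x.te[, 1] + rnorm(n, sd = 0.2)
      ols <- lm(y.tr ~ x.tr)
      pred.ols <- cbind(1, x.te) %*% coef(ols)
      pred.knn <- apply(x.te, 1, function(pt) {     # 5-NN prediction for each test point
        d <- sqrt(colSums((t(x.tr) - pt)^2))
        mean(y.tr[order(d)[1:5]])
      })
      cat(p.noise, "noise predictors: OLS test MSE =", round(mean((y.te - pred.ols)^2), 3),
          " KNN test MSE =", round(mean((y.te - pred.knn)^2), 3), "\n")
    }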



As will always be the case with the text's Lab sections, I encourage you to work through the R stuff in Sec. 3.6. Here, I'll add a few comments pertaining to using R for regression.