Week 8, STAT 472

Some Comments About Ch. 6 of Text

The part of the book about C_p, AIC, BIC, and Adjusted R² on pp. 233-234, along with the corresponding HTOC video (in which Daniela Witten misspoke more than a few times), is terribly confusing regarding the meaning of d. Slide 20 of the HTOC Ch. 6 slides correctly identifies d in the C_p formula as the number of parameters. The book, however, gives it as the number of predictors, which only makes sense if the column of 1s in the design matrix (corresponding to the intercept term) is counted as a predictor. Also, in the formulas for AIC and BIC, d is the number of parameters, so d = 2 for a simple regression model that has slope and intercept parameters. But in the formula for Adjusted R², d is the number of actual predictor variables and so does not include the intercept. For example, a simple regression model with a single explanatory variable, x, has d = 1. A quadratic model that includes both x and x² has d = 2, because x² counts as a second predictor variable.
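To make the two meanings of d concrete, here is a small sketch in Python/NumPy. This is a substitution: the course itself works in R. The sketch fits a simple linear model and a quadratic model to simulated data. It then evaluates the four criteria in the forms I understand ISL to use: C_p = (RSS + 2dσ̂²)/n, AIC = (RSS + 2dσ̂²)/(nσ̂²), BIC = (RSS + log(n)dσ̂²)/n, and Adjusted R² = 1 − [RSS/(n − d − 1)]/[TSS/(n − 1)]. Following the convention described above, d for the first three is the number of design-matrix columns (intercept included), while d for Adjusted R² excludes the intercept. Several details are assumptions made only for illustration: the simulated data, the helper function names, and the choice to estimate σ̂² from the largest (quadratic) model.

```python
import numpy as np

def rss_of(X, y):
    """Residual sum of squares from a least-squares fit of y on design matrix X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

# ISL-style criteria (hypothetical helper names); here d counts ALL parameters,
# i.e. the intercept column of 1s is included.
def cp(rss, n, d, sigma2):
    return (rss + 2 * d * sigma2) / n

def aic(rss, n, d, sigma2):
    return (rss + 2 * d * sigma2) / (n * sigma2)

def bic(rss, n, d, sigma2):
    return (rss + np.log(n) * d * sigma2) / n

# Adjusted R^2; here d counts predictor variables only (intercept NOT included).
def adj_r2(rss, tss, n, d):
    return 1 - (rss / (n - d - 1)) / (tss / (n - 1))

# Hypothetical simulated data, for illustration only.
rng = np.random.default_rng(0)
n = 50
x = rng.uniform(-2, 2, n)
y = 1 + 2 * x + 0.5 * x**2 + rng.normal(0, 1, n)
tss = float(((y - y.mean()) ** 2).sum())

designs = {
    "linear": np.column_stack([np.ones(n), x]),           # columns: 1, x
    "quadratic": np.column_stack([np.ones(n), x, x**2]),  # columns: 1, x, x^2
}
rss = {name: rss_of(X, y) for name, X in designs.items()}

# Assumption: sigma^2 is estimated from the largest model considered.
p_big = designs["quadratic"].shape[1]
sigma2 = rss["quadratic"] / (n - p_big)

for name, X in designs.items():
    n_params = X.shape[1]        # d for C_p, AIC, BIC (2 or 3)
    n_predictors = n_params - 1  # d for Adjusted R^2 (1 or 2)
    r = rss[name]
    print(f"{name:9s}  d_params={n_params}  d_predictors={n_predictors}  "
          f"Cp={cp(r, n, n_params, sigma2):.4f}  "
          f"AIC={aic(r, n, n_params, sigma2):.4f}  "
          f"BIC={bic(r, n, n_params, sigma2):.4f}  "
          f"AdjR2={adj_r2(r, tss, n, n_predictors):.4f}")
```

When every candidate model has an intercept, the choice of convention matters less for some criteria than for others. Counting the intercept or not changes d by 1 for every model. That shifts C_p (by 2σ̂²/n), AIC (by 2/n), and BIC (by log(n)σ̂²/n) by the same constant across all models, so it does not change which model they favor. Adjusted R² has no such invariance, because d enters it through the divisor n − d − 1.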

The version of C_p given in the footnote on p. 233 of the text (the version called Mallows' C_p) is what the summary() function reports, in its cp component, when applied to an object created by a regsubsets() fit. (I suspect that in most places where C_p appears in statistical software output, it will be Mallows' C_p, and not the nonstandard version given in ISL and ESL.)
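A second sketch (again Python/NumPy rather than R, with made-up data) compares two versions of C_p over a nested sequence of models. The first is the standard form of Mallows' C_p, RSS/σ̂² + 2p − n, where p is the number of parameters including the intercept. The second is the ISL-style C_p = (RSS + 2pσ̂²)/n. The two are related by ISL C_p = σ̂²(Mallows' C_p + n)/n, so they always select the same model. Estimating σ̂² from the largest model mirrors what I understand leaps' summary() to do. That detail is an assumption here and has not been checked against the leaps source. The function names are hypothetical.

```python
import numpy as np

def rss_of(X, y):
    """Residual sum of squares from a least-squares fit of y on design matrix X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return float(r @ r)

def mallows_cp(rss, sigma2, p, n):
    """Mallows' C_p = RSS/sigma^2 + 2p - n, p = number of parameters (incl. intercept)."""
    return rss / sigma2 + 2 * p - n

def isl_cp(rss, sigma2, d, n):
    """ISL-style C_p = (RSS + 2 d sigma^2) / n."""
    return (rss + 2 * d * sigma2) / n

# Hypothetical data: 4 candidate predictors, only the first two matter.
rng = np.random.default_rng(1)
n = 100
Z = rng.normal(size=(n, 4))
y = 3 + 2 * Z[:, 0] - Z[:, 1] + rng.normal(size=n)

# Nested candidate models: intercept plus the first k predictors, k = 0..4.
designs = [np.column_stack([np.ones(n), Z[:, :k]]) for k in range(5)]
rss = [rss_of(X, y) for X in designs]

# Assumption: sigma^2 is estimated from the largest model in the search.
full_p = designs[-1].shape[1]
sigma2 = rss[-1] / (n - full_p)

cp_m = [mallows_cp(r, sigma2, X.shape[1], n) for r, X in zip(rss, designs)]
cp_i = [isl_cp(r, sigma2, X.shape[1], n) for r, X in zip(rss, designs)]

for k, (a, b) in enumerate(zip(cp_m, cp_i)):
    print(f"k={k}: Mallows Cp={a:8.3f}   ISL Cp={b:8.4f}")
```

With σ̂² taken from the largest model, that model's Mallows' C_p always equals its own p exactly, because its RSS/σ̂² is n − p.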

Four lines from the top of p. 242 of the text, the term "sparse models" is used, and a sparse model is described as one that involves only a subset of the variables. So, according to the text, if even a single predictor is removed, we have a sparse model. But I'm not sure how widely accepted this meaning of sparse is. I believe that some would take a sparse model to mean that most of the variables have been omitted, just as a sparse matrix is one in which most of the elements are 0.