Some Comments About Ch. 6 of Text



The discussion of Cp, AIC, BIC, and adjusted R^2 on pp. 233-234 of the book, along with the corresponding HTOC video (in which Daniela Witten misspoke more than a few times), is terribly confusing about the meaning of d. Slide 20 of the Ch. 6 HTOC slides correctly identifies d in the Cp formula as the number of parameters, but the book gives it as the number of predictors (which only makes sense if the column of 1s in the design matrix, corresponding to the intercept term, is counted as a predictor). Likewise, in the formulas for AIC and BIC, d is the number of parameters (so d = 2 for a simple regression model, which has slope and intercept parameters). But in the formula for adjusted R^2, d is the number of actual predictor variables, and so doesn't include the intercept. E.g., for a simple regression model with a single explanatory variable x, d = 1, while for a quadratic model that includes both x and x^2, d = 2 (because x^2 is counted as a second predictor variable).
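
To make the two counting conventions concrete, here is a minimal Python sketch (Python is used only for illustration; the function names and all the numbers are made up, not taken from the text). It assumes the forms Cp = (RSS + 2 d sigma2)/n, BIC = (RSS + log(n) d sigma2)/n, and adjusted R^2 = 1 - [RSS/(n - d - 1)]/[TSS/(n - 1)], with sigma2 an estimate of the error variance supplied by the caller.

```python
import math

def isl_cp(rss, n, d, sigma2):
    """Cp in the assumed form (RSS + 2*d*sigma2) / n."""
    return (rss + 2 * d * sigma2) / n

def isl_bic(rss, n, d, sigma2):
    """BIC in the assumed form (RSS + log(n)*d*sigma2) / n."""
    return (rss + math.log(n) * d * sigma2) / n

def adjusted_r2(rss, tss, n, d):
    """Adjusted R^2. Here d counts predictor variables only: the '- 1'
    in (n - d - 1) is what accounts for the intercept."""
    return 1 - (rss / (n - d - 1)) / (tss / (n - 1))

# Hypothetical summary numbers for a simple regression (one predictor x).
n, rss, tss, sigma2 = 50, 120.0, 400.0, 2.5
n_predictors = 1                 # x only
n_parameters = n_predictors + 1  # slope and intercept

cp_predictors = isl_cp(rss, n, n_predictors, sigma2)  # d = number of predictors
cp_parameters = isl_cp(rss, n, n_parameters, sigma2)  # d = number of parameters
cp_gap = cp_parameters - cp_predictors                # equals 2 * sigma2 / n
adj_r2 = adjusted_r2(rss, tss, n, n_predictors)

print(cp_predictors, cp_parameters, cp_gap, adj_r2)
```

Under these assumptions, if every candidate model includes an intercept, then switching d from the predictor count to the parameter count adds the same constant to Cp (2 sigma2 / n) and to BIC (log(n) sigma2 / n) for every model, so the ranking of models does not change. The n - d - 1 in adjusted R^2 is the residual degrees of freedom only when d excludes the intercept.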



The version of Cp given in the footnote on p. 233 of the text (the version known as Mallows' Cp) is what summary() reports when it is applied to an object returned by regsubsets(). (I suspect that in most places where Cp appears in statistical software output, it will be Mallows' Cp, and not the nonstandard version given in ISL and ESL.)
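
The two versions of Cp can be compared numerically. The sketch below is in Python purely for illustration, and its function names and candidate models are hypothetical. It assumes the footnote form is C'p = RSS/sigma2 + 2d - n and the main-text form is Cp = (RSS + 2 d sigma2)/n. Simple algebra then gives Cp = sigma2 (C'p + n)/n, so for a fixed sigma2 and n, each is an increasing linear function of the other.

```python
def isl_cp(rss, n, d, sigma2):
    """Main-text form (assumed): (RSS + 2*d*sigma2) / n."""
    return (rss + 2 * d * sigma2) / n

def mallows_cp(rss, n, d, sigma2):
    """Footnote form (assumed): RSS / sigma2 + 2*d - n."""
    return rss / sigma2 + 2 * d - n

def isl_from_mallows(cp_prime, n, sigma2):
    """Convert the footnote form to the main-text form."""
    return sigma2 * (cp_prime + n) / n

# Hypothetical candidate models as (d, RSS) pairs; sigma2 is held fixed,
# as it would be when estimated once from the largest model.
n, sigma2 = 50, 2.5
candidates = [(1, 150.0), (2, 120.0), (3, 118.0)]

mallows = [mallows_cp(rss, n, d, sigma2) for d, rss in candidates]
isl = [isl_cp(rss, n, d, sigma2) for d, rss in candidates]

# Index of the model each version would select (smallest value wins).
best_mallows = min(range(len(candidates)), key=lambda i: mallows[i])
best_isl = min(range(len(candidates)), key=lambda i: isl[i])

print(mallows, isl, best_mallows, best_isl)
```

Because the conversion is monotone, both versions pick the same model here; only the scale of the reported values differs.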



Four lines from the top of p. 242 of the text, the term sparse models is used, and a sparse model is described as one that involves only a subset of the variables. So, according to the text, removing even a single predictor yields a sparse model. But I'm not sure how widely accepted this meaning of sparse is. I believe some would take a sparse model to be one from which most of the variables have been omitted (much as a sparse matrix is one in which most of the elements are 0).