Assorted Comments Related to Ch. 2 Material Covered Week 2

(and the first portion of Week 3)



Since my Ch. 2 lecture notes are fairly comprehensive, particularly with regard to expression (2.3) on p. 19 of the text, and expression (2.7) on p. 34 of the text, I'm not going to add a lot about those topics here. (My lecture notes also give some additional details pertaining to Bayes classifiers.)

During my Ch. 2 lectures, I didn't discuss Figure 2.7 on p. 25, so please look over it on your own. Basically, we can see that methods yielding highly interpretable models tend to be the ones with low flexibility, while the highly flexible methods tend to produce models that are harder to interpret. Some methods, like GAMs and trees, are somewhat "middle-of-the-road" with respect to both flexibility and interpretability. It should be kept in mind that any of the methods can sometimes be the best one for a particular application.

(Note: I believe that by Least Squares the authors mean fitting just a basic 1st-order model using all available predictor variables, with no "bells and whistles" such as variable transformations, higher-order terms, interaction terms, or variable selection techniques. Lasso is deemed more interpretable than Least Squares because it eliminates weak predictors and keeps only the most important ones in the fitted model, as opposed to Least Squares using all of the predictors. The Subset Selection in the upper left corner is Least Squares enhanced by a variable selection method, which makes it more similar to Lasso. If we augmented Least Squares by using higher-order terms and interactions, it would increase in flexibility but decrease in interpretability.)
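
(To make the Least Squares / Lasso contrast a little more concrete, here is a minimal R sketch. The simulated data frame and the variable names x1, x2, and x3 are just placeholders I've made up for illustration, and the lasso part assumes you have the glmnet package installed.)

    # Hypothetical data frame 'dat' with response y and predictors x1, x2, x3
    set.seed(1)
    dat <- data.frame(x1 = rnorm(100), x2 = rnorm(100), x3 = rnorm(100))
    dat$y <- 1 + 2*dat$x1 - dat$x2 + rnorm(100)

    # (a) Basic least squares: 1st-order model, all predictors, easy to interpret
    fit.ls <- lm(y ~ x1 + x2 + x3, data = dat)

    # (b) Augmented least squares: interactions and a squared term add
    #     flexibility but make the coefficients harder to interpret
    fit.flex <- lm(y ~ (x1 + x2 + x3)^2 + I(x1^2), data = dat)

    # (c) Lasso (glmnet package): shrinks weak coefficients to exactly zero,
    #     so the fitted model may end up using only the strongest predictors
    library(glmnet)
    x <- model.matrix(y ~ x1 + x2 + x3, data = dat)[, -1]
    cv <- cv.glmnet(x, dat$y, alpha = 1)
    coef(cv, s = "lambda.min")   # note which coefficients are exactly 0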

On the bottom half of p. 30 of the text, it isn't completely clear whether the Ave in expression (2.6) is intended to be an expected value, which would make the book's "test MSE" equal to what I'm calling the MSPE, or whether a data-based expression like the one I have on the bottom portion of p. 2-11 of the class notes is intended. (As a data-based expression, it's only an estimate of the actual MSPE, but if the number of test points used to compute it is very large, the estimate should be close to the actual MSPE.) However, I believe that when the book uses "test MSE" on subsequent pages, it means the actual MSPE.
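
(Here is a small simulation sketch of that last point. The true regression function, the cubic polynomial fit, and the test-set size of 100,000 are all arbitrary choices of mine, not anything taken from the text.)

    set.seed(2)
    f <- function(x) sin(2*x)                  # made-up true regression function
    train <- data.frame(x = runif(50, 0, 3))
    train$y <- f(train$x) + rnorm(50, sd = 0.5)

    fit <- lm(y ~ poly(x, 3), data = train)    # some fitted model

    # Large independent test set: the data-based "test MSE" (the Ave in (2.6))
    test <- data.frame(x = runif(100000, 0, 3))
    test$y <- f(test$x) + rnorm(100000, sd = 0.5)
    mean((test$y - predict(fit, newdata = test))^2)  # should be close to the actual MSPE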

The last major Ch. 2 topic covered by my Ch. 2 videos is K-Nearest Neighbors (KNN) classification. (My video on this topic is the first video in the Week 3 folder of the Blackboard site.) Here is a link to a 2012 journal article I wrote about nearest-neighbor methods. (It's not a typical journal article about new research; rather, it's more of a summary of the topic that the journal's editors asked me to write. However, the comments concerning classification using multiple queries known to be from the same class are original ideas that arose from my work on a research project dealing with writer identification based on scanned handwritten documents.) I'll discuss KNN classifiers more in class when we cover Ch. 4. (You don't have to read my article now: it's just some extra information I'm supplying, and not crucial at this time.)
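
(If you'd like to try KNN classification in R before we get to Ch. 4, here is a minimal sketch using the knn() function from the class package. The simulated two-class data and the choice K = 3 are just for illustration.)

    library(class)                      # provides knn()
    set.seed(3)

    # Two simulated classes in two dimensions
    train.X <- rbind(matrix(rnorm(100), ncol = 2),
                     matrix(rnorm(100, mean = 1.5), ncol = 2))
    train.cl <- factor(rep(c("A", "B"), each = 50))
    test.X <- rbind(matrix(rnorm(100), ncol = 2),
                    matrix(rnorm(100, mean = 1.5), ncol = 2))
    test.cl <- factor(rep(c("A", "B"), each = 50))

    # KNN with K = 3: the predicted class is the majority vote of the
    # 3 nearest training points (Euclidean distance)
    pred <- knn(train.X, test.X, train.cl, k = 3)
    mean(pred != test.cl)               # estimated test error rate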

It should be noted that for a two-class problem, K is usually taken to be an odd number to eliminate the possibility of a tie in the voting to determine a predicted class. So I find it odd that the example on pp. 40-41 of the text uses K = 10 and K = 100 (in addition to K = 1).
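
(Here is a tiny illustration of the tie issue, using a hypothetical set of neighbor labels. How an actual tie gets broken depends on the software; I believe knn() in R's class package, for example, breaks ties at random.)

    # Hypothetical class labels of the K = 10 nearest neighbors of a test point
    votes <- c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B")
    table(votes)        # 5 vs. 5: no majority, so the vote is a tie
    # With an odd K, say K = 9, a two-class vote can never split evenly:
    table(votes[1:9])   # 5 vs. 4: class "A" wins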

Although I'll spend relatively little time in class lecturing about R, I expect you to carefully go through each of the text's Lab sections. (Note: You can easily copy the R commands used in each Lab from the text's web site and paste them into R.) Here are some comments about the text's Lab in Sec. 2.3.
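
(If you haven't opened the Lab yet, the basic commands it works through are along the following lines. This is just a rough sampler with arbitrary values, and it is no substitute for running the Lab's own code.)

    x <- c(1, 3, 2, 5)               # creating a vector
    length(x)
    A <- matrix(1:16, nrow = 4)      # creating a matrix
    A[2, 3]                          # indexing an entry
    set.seed(1303)                   # for reproducible random numbers
    y <- rnorm(50)                   # a sample from a standard normal
    mean(y); var(y); sd(y)
    plot(y, rnorm(50), xlab = "y", ylab = "z", main = "a simple scatterplot")
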
Finally, I'll inform you that the Ch. 2 videos from the HTOC mention a few topics (e.g., nearest-neighbors regression (as opposed to nearest-neighbors classification) and the "curse of dimensionality") that we'll cover in later chapters.