Assorted Comments Related to Ch. 2 Material Covered in Week 2
(and the first portion of Week 3)
Since my Ch. 2 lecture notes are fairly comprehensive, particularly with regard to expression (2.3) on p. 19 of the text, and expression (2.7) on p. 34 of the text,
I'm not going to add a lot about those topics here. (My lecture notes also give some additional details pertaining to Bayes classifiers.)
During my Ch. 2 lectures, I didn't discuss Figure 2.7 on p. 25, so please look it over on your own. Basically, we can see that methods that yield highly
interpretable models tend to be those having low flexibility, while the highly flexible methods tend to produce models that are harder to interpret.
Some methods, like GAMs and trees, are somewhat "middle-of-the-road" with respect to both flexibility and interpretability. It should be kept in mind that any
of the methods can sometimes be the best one for particular applications. (Note: I believe that by Least Squares they mean fitting just a basic 1st-order
model, using all available predictor variables (and so no "bells and whistles" such as variable transformations, higher-order terms, interaction terms, and
variable selection techniques). Lasso is deemed to be more interpretable than Least Squares because it eliminates weak predictors and only uses the most
important predictors in the fitted model (as opposed to Least Squares using all of the predictors). The Subset Selection in the upper left corner is Least Squares
enhanced by a variable selection method (which makes it more similar to Lasso). If we augment Least Squares by using higher-order terms and interactions, it
would increase in flexibility, but decrease in interpretability.)
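To make that last point concrete, here is a small sketch in R (using simulated data of my own, not an example from the text): the first fit is a basic 1st-order least squares model, and the second adds higher-order and interaction terms, gaining flexibility at the cost of interpretability.

    set.seed(1)
    dat <- data.frame(x1 = runif(100), x2 = runif(100))
    dat$y <- 1 + 2*dat$x1 - dat$x2 + rnorm(100, sd = 0.3)      # simulated response
    fit_basic <- lm(y ~ x1 + x2, data = dat)                   # basic 1st-order least squares fit
    fit_flex  <- lm(y ~ poly(x1, 2)*poly(x2, 2), data = dat)   # higher-order terms and interactions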
On the bottom half of p. 30 of the text, it isn't completely clear whether the Ave in expression (2.6) is intended to be an expected value, which would make
the book's "test MSE" equal to what I'm calling the MSPE, or whether a data-based expression like I have on the bottom portion of p. 2-11 of the class notes is
intended. (As a data-based expression, it's an estimate of the actual MSPE. But if the number of test points used to provide the estimate is very large,
the estimate should be close to the actual MSPE.) However, on subsequent pages, I believe that when the book uses "test MSE" it is intended to be the actual MSPE.
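To make the data-based version concrete, here is a small sketch in R (with simulated data, just for illustration) that computes the average squared prediction error over a set of test points; with a large number of test points, this average should be close to the actual MSPE.

    set.seed(1)
    n <- 200
    x <- runif(n)
    y <- sin(2*pi*x) + rnorm(n, sd = 0.2)                       # simulated data
    train <- sample(n, n/2)                                     # indices of the training points
    fit <- lm(y ~ poly(x, 3), subset = train)                   # fit using only the training data
    pred <- predict(fit, newdata = data.frame(x = x[-train]))   # predictions at the test points
    mean((y[-train] - pred)^2)                                  # test MSE: an estimate of the MSPE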
The last major Ch. 2 topic covered by my Ch. 2 videos is K-Nearest Neighbors (KNN) classification. (My video on this topic is the first video in the Week 3 folder of the Blackboard site.) Here is a link to a 2012
journal article I wrote about nearest-neighbor methods. (It's not a typical journal article about new research, although the comments concerning
classification using multiple queries known to be from the same class are original ideas that arose from my work on a research project dealing with writer identification
based on scanned handwritten documents. Rather, the article is more of a summary of the topic, which the journal's editors asked me to write.) I'll discuss
KNN classifiers more in class when we cover Ch. 4. (You don't have to read my article now: it's just some extra information I'm
supplying, and not crucial at this time.)
It should be noted that for a two-class problem, K is usually taken to be an odd number to eliminate the possibility of a tie in the voting to determine a
predicted class. So I find it odd that the example on pp. 40-41 of the text uses K = 10 and K = 100 (in addition to K = 1).
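For what it's worth, here is a minimal sketch of KNN classification in R with an odd K, using the knn() function from the class package (one of several KNN implementations in R); the data are simulated just for illustration and are not the example from the text.

    library(class)
    set.seed(123)                                    # knn() may break ties at random
    train_X <- matrix(rnorm(200), ncol = 2)          # 100 training points, 2 predictors
    train_y <- factor(ifelse(train_X[,1] + train_X[,2] > 0, "A", "B"))
    test_X  <- matrix(rnorm(40), ncol = 2)           # 20 test points
    knn(train_X, test_X, cl = train_y, k = 5)        # K = 5 (odd) avoids voting ties with 2 classes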
Although I'll spend relatively little time in class lecturing about R, I expect you to carefully go through each of the text's Lab sections.
(Note: You can easily copy the R commands used in each Lab from the text's web site and paste them into R.) Here are some comments about
the text's Lab in Sec. 2.3.
- The set.seed() function introduced on p. 45 of the text will be very useful to us.
Any time you use random numbers in an R program, it's generally a good idea to use this
function. Setting the random number seed allows you to get exactly the same sequence of random numbers each time you run the program, which can be useful when trying to debug a program.
It should be kept in mind that sometimes random numbers are used, even though you don't explicitly make use of a function such as
runif()
or rnorm(). (E.g., when using R to classify with one of the available KNN functions (there is more than one), the KNN function may call
for a random number in order to randomly choose a neighbor when 2 or more points are tied for being the Kth nearest neighbor, or select a predicted
class when two or more classes are tied for having the largest number of votes.) When using R to do HW problems and there is a possibility that
random numbers will be used, I might ask you to set the seed to 123 (using
set.seed(123)), or some other specified integer (I don't always specify the same seed), so that all of us should get the same result.
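For example, the following prints the same three numbers twice, which is exactly what makes results reproducible (and debugging easier):

    set.seed(123)
    runif(3)      # three "random" numbers
    set.seed(123)
    runif(3)      # the same three numbers again, since the seed was reset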
- The function t used on the first line of p. 47 is the transpose function, which outputs the transpose of an input matrix.
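For example (not from the lab itself):

    x <- matrix(1:6, nrow = 2, ncol = 3)   # a 2 by 3 matrix
    t(x)                                   # its transpose, a 3 by 2 matrix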
- In the 3rd shaded box on p. 48, to make the outputted 6 8 appear as a 2 by 1 matrix instead of being printed horizontally as a plain vector (i.e., to make it appear in column form,
since the idea is to extract part of the 2nd column of A), you can use A[-c(1,3),-c(1,3,4),drop=F].
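For example (if I recall the lab correctly, A was created there as A <- matrix(1:16, 4, 4)):

    A <- matrix(1:16, 4, 4)
    A[-c(1,3), -c(1,3,4)]                   # prints 6 8 as a plain vector (dimensions dropped)
    A[-c(1,3), -c(1,3,4), drop = FALSE]     # prints a 2 by 1 matrix, i.e., in column form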
- The read.table command given on the 1st line of the first shaded box on p. 49 won't work for you unless you already have the Auto.data file located in the right
place on your computer. My advice is to not worry about this part of the lab for now. As we do homework exercises throughout the course, I'll have you reading in
data in a variety of ways from a variety of locations.
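(If you do want to try it now, here are a couple of sketches: either give read.table() the full path to wherever you saved Auto.data, or use file.choose() to browse to the file interactively. The path below is just a placeholder, and I believe the header = TRUE and na.strings = "?" options match what the lab uses for this file.)

    Auto <- read.table("C:/your/folder/Auto.data", header = TRUE, na.strings = "?")  # placeholder path
    Auto <- read.table(file.choose(), header = TRUE, na.strings = "?")               # browse to the file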
- On the last line of the second shaded box on p. 51, the breaks argument specifies the (approximate) number of histogram bins to be used, and not the width of the bins.
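For example, the following (self-contained, not the lab's Auto data) requests roughly 25 bins; it does not make each bin 25 units wide:

    x <- rnorm(500)
    hist(x, breaks = 25)   # about 25 bins (R may adjust the exact number slightly)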
Finally, I'll inform you that the Ch. 2 videos from the HTOC mention a few topics (e.g., nearest-neighbors regression (as opposed to nearest-neighbors classification)
and the "curse of dimensionality") that we'll cover in later chapters.