Some Comments About the Last Portion of Ch. 4 of Text



Be sure to carefully read the discussion of the six scenarios on pp. 161-163 (and FIGURE 4.11 and FIGURE 4.12 on pp. 162-163). For Scenario 5 it's stated that the data was generated from a normal distribution. But unlike the first 4 scenarios, for which a different normal distribution was used to generate the data for each class, in the 5th scenario a single bivariate normal distribution was used to generate all of the data (for both classes), and then each data point was randomly assigned to Class 1 or Class 2 according to probabilities that depend on the values of x1 and x2. The way the random assignments were done (i.e., the function of x1 and x2 that gives the probability of a Class 1 assignment) produced a more highly nonlinear optimal decision boundary than the decision boundaries for the other scenarios.
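
To make the Scenario 5 idea concrete, here is a minimal sketch of that kind of data generation: all points come from one bivariate normal distribution, and class labels are then assigned at random with probabilities given by a nonlinear function of x1 and x2. The particular probability function used below (a logistic function of quadratic terms) is just an assumed, illustrative form, not necessarily the one the text used.

## Draw all points from a single bivariate normal (here, two independent
## standard normals), then randomly assign classes with probabilities that
## depend nonlinearly on x1 and x2.
set.seed(1)
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)

## Assumed form of the assignment probability: logistic in x1^2, x2^2, x1*x2
p_class1 <- plogis(2 * x1^2 + 2 * x2^2 - 3 * x1 * x2 - 1)

## Assign each point to Class 1 with probability p_class1, else Class 2
y <- factor(ifelse(runif(n) < p_class1, "Class1", "Class2"))

plot(x1, x2, col = as.integer(y), pch = 19,
     main = "Scenario 5-style data (illustrative)")

Because the assignment probabilities involve quadratic and interaction terms in x1 and x2, the optimal decision boundary for data generated this way is curved rather than linear.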



Subsection 4.6.7, which covers KNN classification using R's knn() function, isn't terribly important, because I think you'll prefer using the functions in R's kknn package instead. (I'll discuss the kknn package in a Week 7 video, and feature it on HW 7.)
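
For anyone who wants to look ahead, here is a minimal sketch of KNN classification with the kknn package. The data set (iris), the choice of k, and the kernel are purely illustrative here, not the setup we'll use on HW 7.

## install.packages("kknn")   # if not already installed
library(kknn)

set.seed(1)
train_rows <- sample(nrow(iris), 100)
train <- iris[train_rows, ]
test  <- iris[-train_rows, ]

## Formula interface; kernel = "rectangular" gives ordinary (unweighted) KNN,
## while other kernels weight neighbors according to their distances
fit <- kknn(Species ~ ., train = train, test = test,
            k = 5, kernel = "rectangular")

## Predicted classes for the test set, and a simple accuracy check
pred <- fitted(fit)
mean(pred == test$Species)

One reason to prefer kknn over knn() is the formula interface and the built-in options (such as distance-weighted kernels); more on that in the Week 7 video.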