Homework


Homework exercises will be assigned throughout the summer session. These exercises will be discussed in class, but will not be submitted for grading.
The due dates given below refer to the date of the class discussion.

due June 9

1) (Introduction to R) Use R to execute all of the commands in the command file that I started discussing in class on June 7.


due June 14

2) (Bayes classifier) Suppose that for class 0 X has a uniform (0, 1) distribution, and for class 1 X has a beta distribution having pdf
f(x) = 2x I_(0, 1)(x).
Assuming that cases to be classified are just as likely to come from class 1 as from class 0, obtain the Bayes classifier and give its error rate. Also give the Bayes classifier and its error rate if cases to be classified are twice as likely to come from class 1 as from class 0.
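A hand-derived error rate can be checked by simulation. The sketch below is written in Python purely for illustration, and it assumes the candidate rule has the threshold form "predict class 1 when x > t". The function name, the threshold t, and the sample size are hypothetical choices, not part of the assignment.

```python
import random

def simulated_error_rate(t, prior1, n=200_000, seed=0):
    """Estimate the error rate of the rule 'predict class 1 when x > t'.

    prior1 is the probability that a case comes from class 1.
    """
    rng = random.Random(seed)
    errors = 0
    for _ in range(n):
        label = 1 if rng.random() < prior1 else 0
        u = rng.random()
        # Class 0: X ~ uniform(0, 1).
        # Class 1: pdf 2x on (0, 1), whose cdf is x^2, so X = sqrt(U).
        x = u ** 0.5 if label == 1 else u
        predicted = 1 if x > t else 0
        errors += predicted != label
    return errors / n

# t = 0.3 is an arbitrary threshold used for illustration, not a claimed answer.
print(simulated_error_rate(0.3, prior1=0.5))
print(simulated_error_rate(0.3, prior1=2 / 3))
```

Trying several values of t for each prior gives a rough picture of where the error rate is smallest, which can then be compared with the derived classifier.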


due June 23

3) (Sampling distribution of nearest neighbor) Do Ex. 2.3 in HTF (which is to derive equation (2.24)).
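Once equation (2.24) has been derived, it can be compared against a simulation. The sketch below (Python, for illustration only) assumes the setting of the exercise: N points drawn uniformly from the unit ball in p dimensions, with the quantity of interest being the median distance from the origin to the closest point. The function names, the number of replications, and the values of N and p are hypothetical.

```python
import math
import random
import statistics

def closest_distance(n_points, p, rng):
    """Distance from the origin to the nearest of n_points uniform points in the unit p-ball."""
    best = float("inf")
    for _ in range(n_points):
        # Uniform direction: normalize a standard Gaussian vector.
        g = [rng.gauss(0.0, 1.0) for _ in range(p)]
        norm_g = math.sqrt(sum(v * v for v in g))
        # Radius with density proportional to r^(p-1) on (0, 1).
        radius = rng.random() ** (1.0 / p)
        point = [radius * v / norm_g for v in g]
        dist = math.sqrt(sum(v * v for v in point))
        best = min(best, dist)
    return best

def simulated_median(n_points, p, reps=400, seed=0):
    """Median of the closest distance over reps independent samples."""
    rng = random.Random(seed)
    return statistics.median(closest_distance(n_points, p, rng) for _ in range(reps))

def formula_2_24(n_points, p):
    """Median closest distance as stated in HTF equation (2.24)."""
    return (1.0 - 0.5 ** (1.0 / n_points)) ** (1.0 / p)

# Compare the simulated median with the closed form for one (N, p) pair.
print(simulated_median(100, 5), formula_2_24(100, 5))
```

The two numbers should agree up to simulation noise. Agreement does not replace the derivation asked for in the exercise.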

4) (CART Walkabout) Go through the Interactive Walkabout for CART. (It takes 30 minutes or so to go through it carefully.) Make a note of any questions that you may have about it, and I will try to answer some of your questions in class. (Note: It seems to me that the Walkabout may pertain to a slightly older version of CART. I don't think it's necessary to specify so much information about the target variable for classification and other categorical variables --- I think that the latest version of CART figures out that information on its own.) (I wonder if the variable importance values given by CART would be useful for selecting predictor variables to be used in k-nearest-neighbors classification --- one wouldn't want to let weak predictors contribute the same as strong ones in determining the distances, and so some way of omitting weak predictors would be good.)

5) (Tactics for multiple regression when there are a relatively large number (compared to the sample size) of explanatory variables) Go through the R example I have pertaining to the lasso, ridge regression, and principal components regression, and also go through the R example I have pertaining to the example introduced on p. 17 of HTF. Make a note of any questions that you may have about them, and I will try to answer some of your questions in class.


due June 28

6) (Linear regression spline) Go through the R example I posted on the lecture supplements web page. Then, without looking at it, use R to fit a linear regression spline for the data used in the example (using knots at 30, 60, and 90), and use the software output to produce a piecewise expression for E(Y|x), with each piece being of the form ax + b.
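The last step, turning fitted coefficients into pieces of the form ax + b, is plain arithmetic. The sketch below (Python, for illustration only) assumes the spline was fit with the truncated-linear basis 1, x, (x - 30)+, (x - 60)+, (x - 90)+. If a different basis was used, such as the B-spline basis produced by R's bs(), this conversion does not apply directly. The function name and the coefficient values are hypothetical and are not output from the course example.

```python
def piecewise_lines(intercept, slope, knot_coefs, knots):
    """Convert truncated-linear-basis coefficients into (lower, upper, a, b) pieces.

    The fitted mean is assumed to be
        intercept + slope * x + sum_j knot_coefs[j] * max(x - knots[j], 0),
    and on each interval [lower, upper) it equals a * x + b.
    """
    pieces = []
    a, b = slope, intercept
    lower = float("-inf")
    for knot, c in zip(knots, knot_coefs):
        pieces.append((lower, knot, a, b))
        # Past this knot the term c * (x - knot) switches on:
        # the slope gains c and the intercept loses c * knot.
        a += c
        b -= c * knot
        lower = knot
    pieces.append((lower, float("inf"), a, b))
    return pieces

# Hypothetical coefficients, chosen only to show the bookkeeping.
for lower, upper, a, b in piecewise_lines(2.0, 0.5, [1.0, -2.0, 0.5], [30, 60, 90]):
    print(f"{lower} <= x < {upper}: {a} x + {b}")
```

A quick sanity check on any result of this kind is that adjacent pieces give the same value at the knot they share, since a linear spline is continuous.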

7) (Cubic splines for regression modeling) Go through the R example I have pertaining to the use of natural cubic splines in a multiple regression setting. (Here is some information about the data used in the example.)


due June 30

8) (MARS Walkabout) Go through the Interactive Walkabout for MARS.


due July 7

9) (Local methods for classification) Go through the R example I posted on the lecture supplements web page, pertaining to the example on p. 17 of HTF, and focus on the newly added parts pertaining to local methods for classification.

10) (Local methods for regression) Go through the R example I posted on the lecture supplements web page, pertaining to local regression and kernel regression.


due July 12

11) (C_p) Go through the two R demos I posted on the lecture supplements web page (under the Lecture 9 heading), pertaining to C_p. (The first is a demo previously used for local regression methods, which I've modified to include a study of using C_p. The second demo is very similar to the first one, but I changed the way that I generated the data, which makes the results nicer for use with a C_p study.)

12) (Bagging) Read the rather short section on bagging in this book chapter that I wrote.

13) (Random Forests) Read through p. 52 in the RandomForests manual (and also read Ch. 7 if you have time). Note that p. 25 gives good information about how to read in Excel files. (My guess is that these guidelines apply to the other Salford Systems products as well.)


due July 19

14) (Boosting) Read the rather short section on boosting in this book chapter that I wrote.