Homework, STAT 789, Summer 2005
Homework
Homework exercises will be assigned throughout the summer session. These exercises will be discussed in class, but
will not be submitted for grading.
The due dates given below refer to the date of the class discussion.
due June 9
1) (Introduction to R)
Use R to execute all of the commands in
the command file that I started discussing in class on June 7.
due June 14
2) (Bayes classifier) Suppose that for class 0, X has a uniform (0, 1) distribution, and for
class 1, X has a beta distribution with pdf
f(x) = 2x I_(0, 1)(x).
Assuming that cases to be classified are just as likely to
come from class 1 as from class 0, obtain the Bayes classifier and give its error rate.
Also give the Bayes classifier and its error rate if cases to be classified are twice as likely to come from class 1
as from class 0.
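A hand-derived answer to exercise 2 can be checked numerically. The following is a minimal sketch in Python (not the R used in class), assuming the Bayes error rate is computed as the integral over (0, 1) of the smaller of the two prior-weighted densities. The function and variable names are hypothetical, chosen only for illustration.

```python
def bayes_error(prior1, f0, f1, n_grid=100_000):
    """Approximate the Bayes error rate on (0, 1) with a midpoint rule.

    The Bayes classifier picks the class with the larger value of
    prior * density, so the error contribution at x is the smaller
    of the two products.
    """
    prior0 = 1.0 - prior1
    h = 1.0 / n_grid
    total = 0.0
    for i in range(n_grid):
        x = (i + 0.5) * h  # midpoint of the i-th grid cell
        total += min(prior0 * f0(x), prior1 * f1(x)) * h
    return total


def uniform_pdf(x):
    """Class 0 density: uniform on (0, 1)."""
    return 1.0


def beta_pdf(x):
    """Class 1 density: f(x) = 2x on (0, 1)."""
    return 2.0 * x


# Equal priors, then class 1 twice as likely as class 0 (prior 2/3).
print(bayes_error(0.5, uniform_pdf, beta_pdf))
print(bayes_error(2.0 / 3.0, uniform_pdf, beta_pdf))
```

The printed values are only a check on the arithmetic; the exercise still asks for the classification rule itself, which comes from comparing the two prior-weighted densities.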
due June 23
3) (Sampling distribution of nearest neighbor)
Do Ex. 2.3 in HTF (which is to derive equation (2.24)).
4) (CART Walkabout) Go through the Interactive Walkabout for CART. (It takes 30 minutes or so to go through it
carefully.) Make a note of any questions that you may have about it, and I will try to answer
some of your questions in class. (Note: It seems to me that the Walkabout may pertain to a slightly
older version of CART. I don't think it's necessary to specify so much information about the target
variable for classification and other categorical variables; I think that the latest
version of CART figures out that information on its own.) (I wonder if the variable importance values given
by CART would be useful for selecting predictor variables to be used in k-nearest-neighbors classification.
One wouldn't want to let weak predictors contribute the same as strong ones in determining the distances,
and so some way of omitting weak predictors would be good.)
5) (Tactics for multiple regression when the number of explanatory variables is relatively large compared to the
sample size) Go through the R example I have pertaining to the lasso, ridge regression, and
principal components regression, and also go
through the R example I have pertaining to the example introduced on p. 17 of HTF.
Make a note of any questions that you may have about them, and I will try to answer
some of your questions in class.
due June 28
6) (Linear regression spline)
Go through the R example I posted on the lecture supplements web page. Then, without looking at it, use R to fit a
linear regression spline for the data used in the example (using knots at 30, 60, and 90), and use the software output
to produce a piecewise expression for E(Y|x), with each piece being of the form ax + b.
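The last step of exercise 6, turning fitted coefficients into one line per interval, can be sketched as follows. This is a minimal Python illustration (not the R used in class), and it assumes the spline was fit with the truncated linear basis 1, x, (x - 30)+, (x - 60)+, (x - 90)+. If the posted R example uses a different basis (a B-spline basis, for instance), the coefficients do not convert this way. The coefficient values below are hypothetical, not output from the example data.

```python
def piecewise_lines(intercept, slope, knot_coefs, knots):
    """Convert truncated-linear-basis coefficients to (a, b) pairs.

    Assumed model:
        E(Y|x) = intercept + slope * x
                 + sum_j knot_coefs[j] * max(x - knots[j], 0)
    To the right of knot j, term j contributes
        knot_coefs[j] * x - knot_coefs[j] * knots[j],
    so each knot adds to the slope and subtracts from the intercept.
    """
    a, b = slope, intercept
    pieces = [(a, b)]  # piece to the left of the first knot
    for c, k in zip(knot_coefs, knots):
        a += c
        b -= c * k
        pieces.append((a, b))
    return pieces


knots = [30, 60, 90]
# Hypothetical coefficients, for illustration only.
pieces = piecewise_lines(2.0, 0.5, [-0.3, 0.4, -0.8], knots)

bounds = [float("-inf")] + knots + [float("inf")]
for (a, b), lo, hi in zip(pieces, bounds[:-1], bounds[1:]):
    print(f"{lo} < x <= {hi}: E(Y|x) = {a:.3f} x + {b:.3f}")
```

A quick sanity check on any such conversion is continuity: adjacent pieces should give the same value at the knot they share.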
7) (Cubic splines for regression modeling) Go through the R example I have pertaining to the use of natural cubic
splines in a multiple regression setting.
(Here is some information about the data used in the example.)
due June 30
8) (MARS Walkabout) Go through the
Interactive Walkabout for MARS.
due July 7
9) (Local methods for classification)
Go through the R example I posted on the lecture supplements web page, pertaining to the example on p. 17 of
HTF, and focus on the newly added parts pertaining to local methods for classification.
10) (Local methods for regression)
Go through the R example I posted on the lecture supplements web page, pertaining to local regression and
kernel regression.
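For orientation on exercise 10, kernel regression in its simplest form estimates E(Y|x0) as a weighted average of the responses, with weights that decay with distance from x0. The following is a minimal Python sketch (not the posted R example), assuming a Gaussian kernel and a single fixed bandwidth. The function names and the toy data are hypothetical.

```python
import math


def gaussian_kernel(u):
    """Unnormalized Gaussian kernel; the constant cancels in the ratio."""
    return math.exp(-0.5 * u * u)


def kernel_regression(x0, xs, ys, bandwidth):
    """Kernel-weighted average of ys (locally constant fit) at x0."""
    weights = [gaussian_kernel((x - x0) / bandwidth) for x in xs]
    total = sum(weights)
    # Assumes at least one weight is nonzero; a tiny bandwidth with x0 far
    # from all data could underflow every weight to 0.
    return sum(w * y for w, y in zip(weights, ys)) / total


# Toy data, for illustration only.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.0, 1.0, 2.0, 3.0, 4.0]
for h in (0.5, 1.0, 2.0):
    print(h, kernel_regression(1.0, xs, ys, h))
```

Local regression differs in that it fits a weighted least squares line (or low-degree polynomial) at each x0 rather than a weighted average, which reduces bias near the edges of the data.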
due July 12
11) (C_p)
Go through the two R demos I posted on the lecture supplements web page (under Lecture 9 heading), pertaining
to C_p. (One of them is a demo previously used for local regression methods, which I've modified
to include a study of using C_p. The second demo is very similar to the first one, but I changed the way that I
generated the data, which makes the results nicer for use with a C_p study.)
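As a reminder of what C_p measures, here is a minimal Python sketch (not the posted R demos). It assumes the form of C_p used in HTF's discussion of model assessment: the average squared training residual plus a penalty of 2 (d / n) times an estimate of the error variance, where d is the (effective) number of parameters. The demos may use a differently scaled version, such as Mallows' original statistic. The function name and the toy numbers are hypothetical.

```python
def cp_statistic(residuals, df, sigma2_hat):
    """C_p = (training error) + 2 * (df / n) * sigma2_hat.

    residuals  : training residuals y_i - yhat_i
    df         : number of parameters, or effective degrees of freedom
                 for a smoother such as local regression
    sigma2_hat : error variance estimate, typically from a low-bias fit
    """
    n = len(residuals)
    train_err = sum(r * r for r in residuals) / n
    return train_err + 2.0 * df / n * sigma2_hat


# Toy residuals, for illustration only.
print(cp_statistic([1.0, -1.0, 1.0, -1.0], df=2, sigma2_hat=1.0))
```

When comparing fits of different complexity, the fit with the smallest C_p is the one preferred under this criterion.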
12) (Bagging)
Read the rather short section on bagging in this
book chapter that I wrote.
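The basic mechanics of bagging can be sketched in a few lines: draw bootstrap samples, fit the base learner to each, and average the predictions. The following minimal Python illustration is not taken from the book chapter. It assumes a 1-nearest-neighbor regressor as the base learner purely to keep the code short (bagging is more commonly applied to trees), and all names and data are hypothetical.

```python
import random


def nearest_neighbor_predict(x0, xs, ys):
    """Base learner: response of the training point closest to x0."""
    i = min(range(len(xs)), key=lambda j: abs(xs[j] - x0))
    return ys[i]


def bagged_predict(x0, xs, ys, n_boot=200, seed=0):
    """Average the base learner's predictions over bootstrap samples."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(xs)
    preds = []
    for _ in range(n_boot):
        # Bootstrap sample: n draws with replacement from the training set.
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        preds.append(nearest_neighbor_predict(x0, bx, by))
    return sum(preds) / n_boot


# Toy data, for illustration only.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.0, 1.0, 4.0, 9.0, 16.0, 25.0]
print(nearest_neighbor_predict(2.4, xs, ys))
print(bagged_predict(2.4, xs, ys))
```

The single 1-nearest-neighbor prediction jumps between observed responses, while the bagged prediction is an average over many resampled fits and so varies more smoothly with x0.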
13) (Random Forests)
Read through p. 52 in the RandomForests manual (and also read Ch. 7 if you have time). Note that p. 25 gives
good information about how to read in Excel files. (My guess is that these guidelines apply to the other Salford
Systems products as well.)
due July 19
14) (Boosting)
Read the rather short section on boosting in this
book chapter that I wrote.