Announcement of C. Sutton's Summer Course for 2005

STAT 789

Advanced Topics in Statistics: Computer-intensive Methods for Classification and Regression


Time
The class will meet 7:20-10:00 PM on Tuesdays and Thursdays, starting June 7 and ending with the final exam on July 28. (Note: The printed schedule of classes incorrectly has the class starting at 7:10 PM.) Because I do not wish to deal with Incompletes, it is important that students be able to complete the work for the course within the official time period.
Prerequisite
Permission of instructor, but essentially STAT 652 or STAT 656. (It would be great if students had graduate-level coursework in statistical inference, regression, categorical data analysis, and multivariate statistics, but it is unreasonable for me to expect to fill up a summer session class if I require all of this. Since the material covered by the course is rather advanced, however, students should have taken at least one of STAT 652 and STAT 656, or else have sufficient prior experience with modern methods of regression, classification, machine learning, data mining, etc.) People wishing to register for the course will have to obtain a permission slip from me and then register in person. Because students will present their projects during the last two class meetings prior to the final exam, the class size is capped at 20 to keep the number of presentations manageable.
Description
This course will cover many methods of classification and regression. Most of them are somewhat modern computer-intensive methods, but a few classical methods will be covered as well. An emphasis will be placed on the methods implemented by Salford Systems in their CART, MARS, TreeNet, and RandomForests software. (Note: Students will be able to download, at no charge, 90-day trial versions of this software once I send Salford Systems a class list. Please wait until I announce that you should download the software before attempting to do so. We will also use Weka and R, both of which can be downloaded for free.) Lectures will be based on part of the material covered by the book The Elements of Statistical Learning: Data Mining, Inference, and Prediction by Hastie, Tibshirani, and Friedman, but other sources of material will be used, e.g., the book Classification and Regression Trees by Breiman, Friedman, Olshen, and Stone, a book chapter on Classification and Regression Trees, Bagging, and Boosting which I wrote, and some journal articles. A preliminary list (subject to change) of topics to be covered is given below.
Grading
I may tweak this slightly prior to the start of the class, but currently I plan to give equal weight to each of the following:
Unlike other courses I teach, where I give grades of A- and A to only about 40-50% of the class, for this summer course I may give an appreciably higher proportion of good grades if the performance of the class is decent. There may be some students taking the class who enter it with a rather extensive background in some of the methods covered, and if so, I will keep that in mind when assigning grades so that hard-working students entering the class with the minimum prerequisite have a decent chance to earn a good grade. The grading system I am using differs from my usual system of giving 50% weight each to homework and a final exam. Because of the quizzes and class participation, attendance will be important. (Note: I will assign homework exercises, and we can discuss them in class, but I won't have students hand them in for grading.)