Announcement of C. Sutton's Summer Course for 2005
STAT 789
Advanced Topics in Statistics: Computer-intensive Methods for Classification and Regression
- Time
- The class will meet 7:20-10:00 PM on Tuesdays and Thursdays, starting June 7 and ending with the final exam on
July 28. (Note: The printed schedule of classes incorrectly has the class starting at 7:10 PM.)
Because I do not wish to deal with Incompletes, it is important that students be able to complete the work for
the course within the official time period.
- Prerequisite
- Permission of instructor, but essentially STAT 652 or STAT 656
(It would be great if students had graduate-level coursework in statistical inference,
regression, categorical data analysis, and multivariate statistics, but it is unreasonable for me to expect to fill
up a summer session class if I require all of this. But since the material covered by the course is rather advanced,
students should have taken at least one of STAT 652 and STAT 656, or else have sufficient
prior experience with modern methods
of regression, classification, machine learning, data mining, etc.)
People wishing to register for the course will have to obtain a permission slip from me and then register in person.
Because students will present their projects during the last two class meetings prior to the final exam,
the class size is capped at 20 to keep the number of presentations manageable.
- Description
- This course will cover many methods of classification and regression. Most are relatively modern
computer-intensive methods, but a few classical methods will be covered as well. An emphasis will be placed on the
methods implemented by
Salford Systems with their CART, MARS, TreeNet, and RandomForests
software.
(Note: Students will be able to download, at no charge, 90-day trial versions of this software once I send
Salford Systems a class list. Please wait until I announce that you should download the software before attempting
to do so. We will also use Weka and R, both of which can be downloaded for free.)
Lectures will be based on part of the material covered by the book
The Elements of Statistical Learning: Data Mining, Inference, and Prediction by Hastie, Tibshirani, and
Friedman, but other sources of material will be used, e.g., the book
Classification and Regression Trees by Breiman, Friedman, Olshen, and Stone,
a book chapter on Classification and Regression Trees, Bagging, and Boosting which I wrote, and some journal
articles. A preliminary list (subject to change) of topics to be covered is given below.
- introduction to prediction and modeling in classification and regression settings --- comparing model-based methods to
locally-adaptive methods
- linear methods for regression, and regression modeling strategies
- linear methods for classification (including linear discriminant analysis and logistic regression)
- tree-based methods for classification and regression (CART)
- basis expansions, splines (linear splines, cubic splines, and MARS), and regularization
- kernel methods (local regression, density estimation and mixture models for classification, naive Bayes
classifiers)
- model assessment and selection (cross-validation, bootstrap methods, and other ways of evaluating model
complexity and prediction accuracy)
- perturb and combine methods and ensemble classifiers (bagging, boosting, arcing, and random forests),
forward stagewise additive modeling (TreeNet)
- neural networks, support vector machines, and prototype methods
- Grading
- I may tweak this slightly prior to the start of the class, but currently I plan to give equal weight to each of
the following:
- best 10 of 12 short quizzes and class participation,
- presentation of project (oral and written),
- final exam.
Unlike other courses I teach, in which I give grades of A- or A to only about 40-50% of the class, for this summer
course I may give an appreciably higher proportion of good grades if the performance of the class is decent. There
may be some students taking the class who enter it with a rather extensive background in some of the methods covered,
and if so, I will keep that in mind when assigning grades so that hard-working students entering the
class with the minimum prerequisite have a decent chance to earn a good grade. The grading system I am using differs
from my usual system of giving 50% weight each to homework and a final exam. Because of the quizzes and class
participation, attendance will be important.
(Note: I will assign homework exercises, and we can discuss them in class, but I won't have students hand them
in for grading.)