George Mason University
School of Information Technology and Engineering
Department of Applied and Engineering Statistics


STAT 789

Advanced Topics in Statistics: Computer-intensive Methods for Classification and Regression

Summer Session, 2005
Tuesdays and Thursdays from 7:20 to 10:00 PM (starting June 7, other dates given below)

Location: room 207 of Innovation Hall

Instructor: Clifton D. Sutton

Contact Information (phone, fax, e-mail, etc.)
Office Hours: 6:00-7:00 & 10:00-10:30 PM on class nights


Texts:


Prerequisite:

permission of instructor (it would be great if students had graduate-level coursework in statistical inference, regression, categorical data analysis, and multivariate statistics, but it is unreasonable for me to expect to fill up a summer session class if I require all of this, so I will try to present the material in such a way that only a course such as STAT 554 is required); students also need access to a computer on which they can download and install software


Description:

This course will cover many methods of classification and regression. Most of them are fairly modern computer-intensive methods, but a few classical methods will be covered as well. An emphasis will be placed on the methods implemented by Salford Systems in their CART, MARS, TreeNet, and RandomForests software. (Note: Students will be able to download, at no charge, 90-day trial versions of this software once I send Salford Systems a class list. Please wait until I announce that you should download the software before attempting to do so.) We will also use Weka and R (both of which can be downloaded for free).


Approximate week-by-week content:

[1] Tu June 7:
introduction to prediction and modeling in classification and regression settings --- comparing model-based methods to locally-adaptive methods
[2] Th June 9:
linear methods for regression, and regression modeling strategies
[3] Tu June 14:
linear methods for classification (including linear discriminant analysis and logistic regression)
[4] Th June 16:
tree-based methods for classification and regression (CART)
[5] Tu June 21:
more on CART
[6] Th June 23:
basis expansions, splines (linear splines, cubic splines, and MARS), and regularization
[7] Tu June 28:
more on MARS
[8] Th June 30:
kernel methods (local regression, density estimation and mixture models for classification, naive Bayes classifiers)
[**] Tu July 5:
(No class due to 4th of July holiday break)
[9] Th July 7:
model assessment and selection (cross-validation, bootstrap methods, and other ways of evaluating model complexity and prediction accuracy)
[10] Tu July 12:
perturb-and-combine methods and ensemble classifiers (bagging, boosting, arcing, and random forests)
[11] Th July 14:
forward stagewise additive modeling (TreeNet)
[12] Tu July 19:
neural networks, support vector machines, and prototype methods
[13] Th July 21:
presentation of regression projects, review for exam
[14] Tu July 26:
presentation of classification projects, review for exam
[**] Th July 28:
Final Exam (note: exam period is from 7:30 to 10:15 PM)

Grading:


Additional Comments: