George Mason University
School of Information Technology and Engineering
Department of Applied and Engineering Statistics
STAT 789
Advanced Topics in Statistics: Computer-intensive Methods for Classification and Regression
Summer Session, 2005
Tuesdays and Thursdays from 7:20 to 10:00 PM (starting June 7; other dates given below)
Location: Room 207, Innovation Hall
Contact Information (phone, fax, e-mail, etc.)
Office Hours: 6:00-7:00 PM and 10:00-10:30 PM on class nights
Texts:
Prerequisite:
permission of instructor.
(It would be great if students had graduate-level coursework in statistical inference,
regression, categorical data analysis, and multivariate statistics, but I cannot reasonably expect to fill
a summer session class if I require all of this. Therefore, I will try to present the material in such a way that
only a course such as STAT 554 is required.) Students need access to a computer on which they can download and
install software.
Description:
This course will cover many methods of classification and regression. Most of them are somewhat modern
computer-intensive methods, but a few classical methods will be covered as well. An emphasis will be placed on the
methods implemented by Salford Systems in their CART, MARS, TreeNet, and RandomForests software.
(Note: Students will be able to download, at no charge, 90-day trial versions of this software once I send
Salford Systems a class list. Please wait until I announce that you should download the software before attempting
to do so.) We will also use Weka and R (both of which can be downloaded for free).
Approximate week-by-week content:
- [1] Tu June 7:
- introduction to prediction and modeling in classification and regression settings, comparing model-based methods to
locally-adaptive methods
- [2] Th June 9:
- linear methods for regression and regression modeling strategies
- [3] Tu June 14:
- linear methods for classification (including linear discriminant analysis and logistic regression)
- [4] Th June 16:
- tree-based methods for classification and regression (CART)
- [5] Tu June 21:
- more on CART
- [6] Th June 23:
- basis expansions, splines (linear splines, cubic splines, and MARS), and regularization
- [7] Tu June 28:
- more on MARS
- [8] Th June 30:
- kernel methods (local regression, density estimation and mixture models for classification, naive Bayes
classifiers)
- [**] Tu July 5:
- no class (4th of July holiday break)
- [9] Th July 7:
- model assessment and selection (cross-validation, bootstrap methods, and other ways of evaluating model
complexity and prediction accuracy)
- [10] Tu July 12:
- perturb and combine methods and ensemble classifiers (bagging, boosting, arcing, and random forests)
- [11] Th July 14:
- forward stagewise additive modeling (TreeNet)
- [12] Tu July 19:
- neural networks, support vector machines, and prototype methods
- [13] Th July 21:
- presentation of regression projects, review for exam
- [14] Tu July 26:
- presentation of classification projects, review for exam
- [**] Th July 28:
- Final Exam (note: exam period is from 7:30 to 10:15 PM)
Grading:
- 10% for useful participation (which is not the same thing as talking)
- 40% for presentation (oral and written) of the project, which will be a thorough analysis of a data set that you
choose, subject to my approval, using several of the methods emphasized in class
- 50% for the open-book, open-notes final exam
Additional Comments:
- Put STAT 789 in the subject line when you send me e-mail
(due to spam, I sometimes delete messages without reading them, based
on the subject line).
- Be sure to note that there is no class meeting scheduled for July 5 (due to the 4th of July holiday
break). However, if any class meetings are canceled prior to July 5 (perhaps due to a power outage),
July 5 may be used to make up for the missed class.
- I may be able to arrange to meet with you outside of my scheduled hours; however,
on Tuesdays and Thursdays I do not like to be bothered from 7:00 to 7:17,
and on Fridays I'm tied up with other activities for most of the day.
- Please do not leave long messages on my voice-mail,
and since I often don't get around to returning calls until the evening,
you should state what time you plan to go to sleep. Always leave your
phone number, speaking slowly, even though you might have
given it to me previously. I find it better to communicate with people
in person or via e-mail --- phone tag is frustrating and sometimes the
GMU voice-mail system doesn't work the way it is supposed to.
- You are expected to familiarize yourself with the
George Mason University honor code and abide by it.
In particular, it will be considered a violation of the honor code if you
give or receive aid on the final exam.
- You are expected to take the final exam during the
designated time slot. Incompletes will
not be granted except under very unusual circumstances.
- Please abide by the university policy that cell phone ringers be
turned off while class is in session.
- Note that eating and drinking in electronic classrooms are not allowed; please do not
violate the rules. (We will have a 10-minute break during each lecture period.)
- Any class meetings canceled by the university due to
snow, sleet, power outage, bombing,
etc. will be made up if possible (at a time to be agreed upon by me and
as many members of the class as possible if the university doesn't
specify a particular make-up date). With regard to bad weather, I will
plan to teach class if the university is open and not teach it if the
university is closed. So instead of calling me to find out if I plan
to have class, just find out if the university is open or closed.
- Caveat: The schedule and procedures described here for this course are subject to change (and it is the responsibility of
students to attend all class meetings and keep themselves informed of
any changes).