Welcome to CSI 771 / STAT 751

Computational Statistics

Spring, 2007

Instructor: James Gentle

Class meets on Mondays from 7:20pm to 10:00pm in Innovation Hall, room 133.

The final exam is on May 14 from 7:30pm to 10:15pm.

This course is about modern, computationally intensive methods in statistics. It emphasizes the role of computation as a fundamental tool of discovery in data analysis, in statistical inference, and in the development of statistical theory and methods.

The text for the course is Elements of Computational Statistics.

The general description of the course is available at mason.gmu.edu/~jgentle/csi771/

Project schedule

Lectures / assignments / exams schedule

January 22

Course overview; websites; software; brief introduction to R or S-Plus.
Monte Carlo studies in statistics.

Brief introduction to random number generation.
Simulation of stochastic data generating processes in R or S-Plus.

Preliminaries of computational statistics.

Monte Carlo studies in statistics.

Some introductory material on R.

Assignment: Read Section 2.1 (pages 39-53) and Appendices A and B (pages 337-362).
Make a web page for your project. Choose two articles in the statistics literature that report Monte Carlo studies, and write brief descriptions of them on your web page.

Two examples from the December 2006 issue of the Journal of the American Statistical Association are the article by Zou on the use of the lasso in variable selection and the article by Mao on a method of estimating the number of species in a population. Several other articles in that issue also use Monte Carlo simulation to study statistical methods.

You can use articles from any peer-reviewed scientific journal. Many are available online. For example, you can reach the Journal of the American Statistical Association from the GMU library home page: go to E-Journal Finder and enter "Journal of the American Statistical Association" in the "Journal" box. Several options come up next. The first one, "ABI/INFORM Complete", works, although all you can do is print the article.

A good source of PDF files of journal articles that are at least a few years old is JSTOR. Go to the GMU library home page, then to Databases, then select "J" and finally JSTOR. At that point, choose "Browse" and select "Statistics". For example, the September 2001 issue contains several articles that use Monte Carlo simulation studies; see the one by Bunzel et al. beginning on p. 1088.

January 29

Discussion of methods of statistical inference.
The role of optimization in statistical estimation: minimization of residuals; maximization of likelihood; etc.
The functional approach to statistical estimation.

Assignments: Read Chapter 1; work problems 1.2, 1.3, 1.7, and 1.9 to turn in (as hard copies).
Put a brief description of your project on your web page. You will add to this description as the semester progresses.

February 5

Discussion of Monte Carlo studies; student presentations of descriptions of articles (first project milestone).
Continue discussion of material from Chapter 1 on least squares and methods of optimization.
EM examples.
Random number generation; methods to convert uniform variates to other random variables (inverse CDF; acceptance/rejection).
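The two conversion methods can be sketched briefly in Python. This is only an illustration: the course itself uses R or S-Plus, Python's standard `random` module stands in for the uniform generator, and the two target distributions were chosen for the example rather than taken from the text. The inverse-CDF method is shown for Exponential(rate). Acceptance/rejection is shown for Beta(2, 2) with a U(0, 1) proposal.

```python
import math
import random


def exponential_inverse_cdf(rate, n, rng):
    """Draw n Exponential(rate) variates by the inverse-CDF method.

    F(x) = 1 - exp(-rate * x), so F^{-1}(u) = -log(1 - u) / rate.
    """
    # rng.random() lies in [0, 1), so 1 - u lies in (0, 1] and the log is finite.
    return [-math.log(1.0 - rng.random()) / rate for _ in range(n)]


def beta22_accept_reject(n, rng):
    """Draw n Beta(2, 2) variates by acceptance/rejection.

    Target density f(x) = 6 x (1 - x) on [0, 1]; proposal g is U(0, 1).
    Since f(x) <= c * g(x) with c = 1.5, accept a candidate Y when
    U <= f(Y) / (c * g(Y)) = 4 Y (1 - Y).
    """
    out = []
    while len(out) < n:
        y = rng.random()  # candidate from the proposal g
        u = rng.random()  # independent uniform for the acceptance test
        if u <= 4.0 * y * (1.0 - y):
            out.append(y)
    return out


rng = random.Random(771)
exp_sample = exponential_inverse_cdf(2.0, 100_000, rng)
beta_sample = beta22_accept_reject(100_000, rng)
print(sum(exp_sample) / len(exp_sample))    # should be near 1/rate = 0.5
print(sum(beta_sample) / len(beta_sample))  # should be near 0.5
```

Each candidate is accepted with probability 1/c = 2/3. On average, the sampler therefore uses 1.5 candidate pairs per accepted variate. A tighter envelope (smaller c) wastes fewer uniforms.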

Assignment: Read Chapter 2; work problems 2.2, 2.4, and 2.7 to turn in Feb 19.

February 12

Discussion of projects (second project milestone).
Review of acceptance/rejection and Markov chain methods.
Inference using Monte Carlo: Monte Carlo tests, and "parametric bootstrap".
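The following Python sketch illustrates both ideas. Every specific choice in it is hypothetical: the null hypothesis (iid N(0, 1) data), the statistic |sample mean|, the exponential model, and the function names. In a Monte Carlo test, the null distribution of the test statistic is approximated by simulating data under H0. In a parametric bootstrap, the resamples are drawn from the fitted parametric model rather than from the observed data.

```python
import random
import statistics


def monte_carlo_test(observed, statistic, simulate_null, m, rng):
    """One-sided Monte Carlo test: large values of the statistic count against H0.

    Returns (1 + #{T_sim >= T_obs}) / (m + 1).
    """
    t_obs = statistic(observed)
    n = len(observed)
    # Count simulated statistics at least as extreme as the observed one.
    exceed = sum(statistic(simulate_null(n, rng)) >= t_obs for _ in range(m))
    return (1 + exceed) / (m + 1)


def parametric_bootstrap_se(data, fit, simulate_fitted, estimator, b, rng):
    """Standard error of `estimator` from b samples drawn from the fitted model."""
    theta_hat = fit(data)
    n = len(data)
    replicates = [estimator(simulate_fitted(theta_hat, n, rng)) for _ in range(b)]
    return statistics.stdev(replicates)


def abs_mean(xs):
    return abs(statistics.fmean(xs))


def normal_null(n, gen):
    # Hypothetical H0 for illustration: X_i iid N(0, 1).
    return [gen.gauss(0.0, 1.0) for _ in range(n)]


def exponential_fitted(rate, n, gen):
    # Sample of size n from the fitted exponential model.
    return [gen.expovariate(rate) for _ in range(n)]


rng = random.Random(751)

# Data deliberately generated away from H0 (mean 1 instead of 0).
shifted = [rng.gauss(1.0, 1.0) for _ in range(50)]
p_value = monte_carlo_test(shifted, abs_mean, normal_null, 999, rng)

# Parametric bootstrap SE of the sample median under a fitted exponential model.
exp_data = [rng.expovariate(1.0) for _ in range(100)]
se_median = parametric_bootstrap_se(
    exp_data,
    fit=lambda xs: 1.0 / statistics.fmean(xs),  # MLE of the exponential rate
    simulate_fitted=exponential_fitted,
    estimator=statistics.median,
    b=500,
    rng=rng,
)
print(f"Monte Carlo p-value: {p_value:.4f}")
print(f"parametric bootstrap SE of the median: {se_median:.4f}")
```

The p-value counts the observed statistic in both the numerator and the denominator, so it can never be smaller than 1/(m + 1). This is why values such as m = 999 are common. `statistics.fmean` requires Python 3.8 or later.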

Assignments: Work problems 2.8 (note the typo in the solution on p. 379), 2.9, and 2.10 to turn in Feb 19.
Read Chapter 3.

February 19

Brief student presentations of the third project milestone.
Randomization and data partitioning.
Bootstrap methods.
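Below is a minimal Python sketch of the nonparametric bootstrap, written for illustration rather than taken from the text. It resamples the data with replacement, recomputes the estimator on each resample, and uses the spread of the replicates. The simulated data set and the function names are hypothetical. The percentile interval is only one of several bootstrap interval methods.

```python
import random
import statistics


def bootstrap_replicates(data, estimator, b, rng):
    """Apply `estimator` to b resamples of `data` drawn with replacement."""
    n = len(data)
    return [estimator(rng.choices(data, k=n)) for _ in range(b)]


def percentile_interval_95(replicates):
    """95% percentile interval: the 2.5% and 97.5% quantiles of the replicates."""
    cuts = statistics.quantiles(replicates, n=40)  # 39 cut points: 2.5%, 5%, ..., 97.5%
    return cuts[0], cuts[-1]


rng = random.Random(2007)
data = [rng.gauss(10.0, 2.0) for _ in range(40)]  # hypothetical sample

reps = bootstrap_replicates(data, statistics.fmean, 2000, rng)
se_boot = statistics.stdev(reps)
ci_low, ci_high = percentile_interval_95(reps)
print(f"bootstrap SE of the mean: {se_boot:.4f}")
print(f"95% percentile interval: ({ci_low:.3f}, {ci_high:.3f})")
```

The sample mean gives a quick check on the simulation. Its ideal bootstrap standard error equals the plug-in value sigma_hat / sqrt(n), where sigma_hat^2 = (1/n) sum (x_i - xbar)^2. The value of `se_boot` should therefore be close to `statistics.pstdev(data) / len(data) ** 0.5`. `statistics.quantiles` requires Python 3.8 or later.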

Assignment: Read Chapter 4; work problems 3.6, 4.1, 4.5, 4.9 to discuss in class.

February 26

Measures of similarity.
Assignment: Read Chapter 5; work problems 5.1, 5.2, 5.5, 5.8 to discuss in class.

March 5

Review material and work exercises from Chapters 1-5.

March 12

No class; GMU spring break.

March 19

In-class midterm exam. Covers material from Chapters 1-5.
This will be open book and open notes.
Assignment: Read Chapter 6.

March 26

Estimation of functions.
Nonparametric estimation of probability density functions.
The equation at the top of page 221 should be
$k_{rs} = \frac{r}{2B(s+1,\,1/r)}$
You get this by making the change of variable $x = t^r$, integrating over $[0,1]$, and then doubling the integral.
A better way of writing the value is to use $B(1/r,\,s+1)$, because that is the most obvious integral you see after the substitution. Of course, $B(1/r,\,s+1) = B(s+1,\,1/r)$, so it doesn't matter.
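The corrected constant can be checked numerically. This check rests on an assumption: that the kernel on page 221 has the form $K_{rs}(t) = k_{rs}(1-|t|^r)^s$ for $|t| \le 1$ and 0 otherwise, which is consistent with integrating over $[0,1]$ and doubling. Under that form, $\int_{-1}^{1}(1-|t|^r)^s\,dt = 2\int_0^1(1-t^r)^s\,dt = \frac{2}{r}\int_0^1 x^{1/r-1}(1-x)^s\,dx = \frac{2}{r}B(1/r,\,s+1)$, which gives the value of $k_{rs}$ above. The Python sketch below checks two special cases. With r = 2, s = 1 the constant should be the Epanechnikov value 3/4, and with r = 2, s = 2 it should be the biweight value 15/16. The sketch also integrates the assumed kernel with a midpoint rule. The function names are hypothetical.

```python
import math


def beta_fn(a, b):
    """Complete beta function B(a, b) = Gamma(a) Gamma(b) / Gamma(a + b)."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)


def k_rs(r, s):
    """Corrected normalizing constant r / (2 B(s + 1, 1/r))."""
    return r / (2.0 * beta_fn(s + 1.0, 1.0 / r))


def kernel(t, r, s):
    """Assumed kernel form: k_rs * (1 - |t|^r)^s on [-1, 1], zero outside."""
    if abs(t) > 1.0:
        return 0.0
    return k_rs(r, s) * (1.0 - abs(t) ** r) ** s


def integrate_midpoint(f, a, b, m=20_000):
    """Composite midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / m
    return h * sum(f(a + (i + 0.5) * h) for i in range(m))


print(k_rs(2, 1))  # should be close to 3/4 (Epanechnikov)
print(integrate_midpoint(lambda t: kernel(t, 3, 2), -1.0, 1.0))  # should be close to 1
```

If the kernel integrates to 1 for any (r, s), the constant is right. A wrong constant, such as one that omits the factor 2, would give an integral of 2 instead.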

Assignments: Work problems 6.6, 6.7, 6.9, 6.10 to turn in April 2.
Read Chapter 9; work problems 9.3, 9.5, 9.10, 9.11 to turn in April 9.

April 2

Discuss midterm.
Brief presentations of projects, which may provide feedback for the final reports.
More on nonparametric estimation of probability density functions.
Assignment: Read Chapter 10.

April 9

Return Chapter 6 homework; discuss.
Clustering and classification.
Assignments: Work problems 10.1, 10.5, 10.7, 10.12. Read Chapter 11; work problem 11.1.

April 16

Return Chapter 9 homework; discuss.
Continuation of clustering and classification.
Models of dependencies.
Simulated annealing.

April 23

Student presentations of projects (final project milestone).

April 30

Continuation of student presentations of projects as necessary.
More on classification.
Discuss exercises in Chapters 10 and 11.
Hand out the take-home portion of the final exam.

May 14

7:30pm - 10:15pm. In-class final exam.
This will be closed book and closed notes.
Exam with solutions.

Other Resources

The most important web repository of statistical resources (datasets, programs, general information, links to other sites, etc.) is the StatLib Index at Carnegie Mellon.

Another important repository for scientific computing is Netlib.