Welcome to CSI 771 / STAT 751

Computational Statistics

Fall, 2005

Instructor: James Gentle

Class meets on Mondays from 7:20pm to 10:00pm in Innovation Hall, room 131.

Final exam is December 12 from 7:30pm to 10:15pm.

This course is about modern, computationally intensive methods in statistics. It emphasizes the role of computation as a fundamental tool of discovery in data analysis, in statistical inference, and in the development of statistical theory and methods.

The text for the course is Elements of Computational Statistics.

The general description of the course is available at mason.gmu.edu/~jgentle/csi771/

Project schedule

Lectures / assignments / exams schedule

August 29

Course overview; websites; software; brief introduction to R or S-Plus.
Monte Carlo studies in statistics.

Brief introduction to random number generation.
Simulation of stochastic data generating processes in R or S-Plus.

Assignment: Read Section 2.1 (pages 39-53) and Appendices A and B (pages 337-362).
Make a web page for your project. Choose two articles in statistics literature that report Monte Carlo studies and write brief descriptions of them on your web page.

Two examples from the March 2005 issue of the Journal of the American Statistical Association are the one by Romano and Wolf on problems of multiple hypothesis testing and the one by Lahiri and Larsen on regression with linked data. There are several more articles in that issue that use Monte Carlo simulation to study statistical methods.

You can use articles from any peer-reviewed scientific journal. Many are available online; for example, the Journal of the American Statistical Association is available by going to the GMU library home page, then to E-Journal Finder, and then entering "American Statistical Association" in the "keyword" box. Several options come up next. The first one, "ABI/INFORM Complete", works. (Thanks to Pragyansmita Nayak for pointing this out!)

September 5

Labor Day; no classes.

September 12

Discussion of Monte Carlo studies; student presentations of descriptions of articles (first project milestone).
Discussion of methods of statistical inference.
The role of optimization in statistical estimation: minimization of residuals; maximization of likelihood; etc.
The functional approach to statistical estimation.

Assignments: Read Chapter 1; work problems 1.2, 1.3, 1.7, and 1.9 to turn in (as hardcopies), Sept 19.
Put a brief description of your project on your web page. You will add to this description as the semester progresses.

September 19

Discussion of projects if necessary (second project milestone)
Continue discussion of some material from Chapter 1 on least squares, and methods of optimization.
EM examples.
Random number generation; methods to convert uniform variates to other random variables (inverse CDF; acceptance/rejection).
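
The two conversion methods named above can be sketched in a few lines. This is only an illustration, not code from the text: the function names, the seed, the choice of an Exponential target for the inverse CDF method, and the choice of a Beta(2,2) target with a Uniform(0,1) majorizing density for acceptance/rejection are all assumptions made for the example. It is written in Python rather than R or S-Plus.

```python
import math
import random

def exponential_inverse_cdf(rate, rng):
    """Draw one Exponential(rate) variate by inverting F(x) = 1 - exp(-rate * x)."""
    u = rng.random()                      # uniform on [0, 1)
    return -math.log(1.0 - u) / rate      # 1 - u lies in (0, 1], so the log is defined

def beta22_accept_reject(rng):
    """Draw one Beta(2,2) variate, with density f(x) = 6x(1-x) on [0, 1].

    The majorizing density is Uniform(0,1) with constant c = 1.5 = max f,
    so a candidate y is accepted when u * c <= f(y).
    """
    c = 1.5
    while True:
        y = rng.random()                  # candidate from the majorizing density
        u = rng.random()                  # independent uniform for the acceptance test
        if u * c <= 6.0 * y * (1.0 - y):
            return y

rng = random.Random(771)                  # arbitrary fixed seed for reproducibility
n = 20000
exp_sample = [exponential_inverse_cdf(2.0, rng) for _ in range(n)]
beta_sample = [beta22_accept_reject(rng) for _ in range(n)]
exp_mean = sum(exp_sample) / n            # theoretical mean is 1/rate = 0.5
beta_mean = sum(beta_sample) / n          # theoretical mean is 0.5
```

With c = 1.5 the acceptance/rejection loop accepts about two candidates in three on average, since the acceptance probability is 1/c.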

Assignment: Read Chapter 2; work problems 2.2, 2.4, and 2.7 to turn in, Sept 26.

September 26

Brief general discussion of projects (third project milestone).
Review acceptance/rejection, Markov chain methods.
Inference using Monte Carlo: Monte Carlo tests, and "parametric bootstrap".
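
A Monte Carlo test can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure from the text: the data values are made up, the null hypothesis is taken to be that the observations are an i.i.d. N(0,1) sample, the test statistic is the sample range, and all function names are hypothetical. The p-value uses the common (count + 1)/(m + 1) form, which counts the observed statistic as one of the m + 1 values.

```python
import random

def range_statistic(xs):
    """Test statistic: sample range (large values count against the null)."""
    return max(xs) - min(xs)

def standard_normal_sample(n, rng):
    """Simulate one data set of size n under the assumed null, i.i.d. N(0,1)."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

def monte_carlo_p_value(observed, statistic, simulate_null, m, rng):
    """Estimate a one-sided p-value from m data sets simulated under the null."""
    t_obs = statistic(observed)
    count = 0
    for _ in range(m):
        if statistic(simulate_null(len(observed), rng)) >= t_obs:
            count += 1
    return (count + 1) / (m + 1)

rng = random.Random(751)                  # arbitrary fixed seed
# Made-up observations, for illustration only.
observed = [-0.3, 1.2, 0.4, -2.9, 0.1, 3.4, -0.8, 0.6, -0.2, 1.0]
p_value = monte_carlo_p_value(observed, range_statistic,
                              standard_normal_sample, 999, rng)
```

If the null distribution has unknown parameters and the simulator draws from the model fitted to the observed data, the same loop becomes a parametric bootstrap test.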

Assignments: Work problems 2.8 (note the typo in the solution on p. 379), 2.9, and 2.10 to turn in, Oct 3. Read Chapter 3.

October 3

Brief student presentations of the third project milestone.
Randomization and data partitioning.
Bootstrap methods.
Assignment: Read Chapter 4; work problems 3.6, 4.1, 4.5, 4.9 to turn in, Oct 11.
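
As a rough illustration of the bootstrap topic for this session, the sketch below estimates the standard error of a sample median and a simple percentile interval by resampling with replacement. The data values, the seed, the number of replicates, and the function name are assumptions made for the example; it is written in Python rather than R or S-Plus.

```python
import random
import statistics

def bootstrap_replicates(data, statistic, b, rng):
    """Return b values of the statistic, each computed on a resample
    of the same size as data, drawn with replacement."""
    n = len(data)
    return [statistic([data[rng.randrange(n)] for _ in range(n)])
            for _ in range(b)]

rng = random.Random(2005)                 # arbitrary fixed seed
# Made-up observations, for illustration only.
data = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 6.1, 3.9, 4.4]
b = 2000
reps = bootstrap_replicates(data, statistics.median, b, rng)

# Bootstrap estimate of the standard error of the median.
se_median = statistics.stdev(reps)

# Simple 95% percentile interval from the sorted replicates.
reps_sorted = sorted(reps)
ci_95 = (reps_sorted[(25 * b) // 1000], reps_sorted[(975 * b) // 1000 - 1])
```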

October 11 (Tuesday)

Measures of similarity.
Assignment: Read Chapter 5; work problems 5.1, 5.2, 5.5, 5.8 to turn in, Oct 17.

October 17


October 24

In-class midterm exam. Covers material from Chapters 1-5.
This will be open book and open notes.

October 31

Discuss midterm.
Estimation of functions.
Assignment: Read Chapter 6; work problems 6.6, 6.7, 6.9, 6.10 to turn in (due November 7).

November 7

Nonparametric estimation of probability density functions.
The equation at the top of page 221 should be
k_{rs} = r/(2B(s+1,1/r))
You get this by making the change of variable x=t^r, and integrating over [0,1] and then doubling the integral.
A better way of writing the value is to use B(1/r,s+1) because that is the most obvious integral you see when you rewrite it. Of course, B(1/r,s+1)=B(s+1,1/r), so it doesn't matter.
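
The corrected constant can be checked numerically. The sketch below assumes the kernel in question has the form K(t) = k_{rs}(1 - |t|^r)^s on [-1,1], which is what the change of variable above implies; the function names and the midpoint-rule integration are choices made only for this check. It computes the beta function from gamma functions and confirms that k_{rs} times the integral of (1 - |t|^r)^s over [-1,1] is 1.

```python
import math

def beta_function(a, b):
    """B(a, b) = Gamma(a) * Gamma(b) / Gamma(a + b)."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

def k_rs(r, s):
    """Normalizing constant k_rs = r / (2 B(1/r, s+1))."""
    return r / (2.0 * beta_function(1.0 / r, s + 1.0))

def kernel_integral(r, s, n=100000):
    """Midpoint-rule approximation of the integral of (1 - |t|^r)^s over [-1, 1]."""
    h = 2.0 / n
    total = 0.0
    for i in range(n):
        t = -1.0 + (i + 0.5) * h          # midpoint of the i-th subinterval
        total += (1.0 - abs(t) ** r) ** s
    return total * h

# Each product should be close to 1 if the constant is right.
checks = {(r, s): k_rs(r, s) * kernel_integral(r, s)
          for (r, s) in [(1, 1), (2, 1), (2, 2), (3, 3)]}
```

For r = 2 and s = 1 the formula gives 3/4, and for r = 2 and s = 2 it gives 15/16, the familiar constants for the Epanechnikov and biweight kernels.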

Assignments: Read Chapter 9; work problems 9.3, 9.5, 9.10, 9.11 to turn in (due Nov 14; can be turned in Nov 21).

November 14

Brief presentations of projects. This may provide feedback for the final reports.
More on nonparametric estimation of probability density functions.
Assignment: Read Chapter 10.

November 21

Clustering and classification.
Models of dependencies.
Assignments: Work problems 10.1, 10.5, 10.7, 10.12 to turn in. Read Chapter 11; work problem 11.1 to turn in (due Nov 28).

November 28

Student presentations of projects (final project milestone).

December 5

Continuation of student presentations of projects as necessary.
More on classification. Review.
Hand out take-home portion of final exam.

December 12

7:30pm - 10:15pm. In-class final exam.
This will be closed book and closed notes.

Other Resources

The most important WWW repository of statistical stuff (datasets, programs, general information, connection to other sites, etc.) is StatLib Index at Carnegie Mellon.

Another important repository for scientific computing is Netlib.

Student Webpages