Computational Statistics
Fall 2005
Class meets on Mondays from 7:20pm to 10:00pm in Innovation Hall, room 131.
Final exam is December 12 from 7:30pm to 10:15pm.
This course is about modern, computationally intensive
methods in statistics.
It emphasizes the role of computation as
a fundamental tool of discovery in data analysis, in statistical
inference, and in the development of statistical theory and methods.
The text for the course is
Elements of Computational Statistics.
The general description of the course is available at
mason.gmu.edu/~jgentle/csi771/
Lectures / assignments / exams schedule
August 29
Course overview;
websites; software;
brief introduction to R or S-Plus.
Monte Carlo studies in statistics.
Brief introduction to random number generation.
Simulation of stochastic data generating processes in R or S-Plus.
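The course software is R or S-Plus; purely as an illustration (the estimators compared and all names below are my own choices, not from the text), here is a minimal Monte Carlo study in Python that compares the sample mean and the sample median as estimators of the center of a normal distribution:

```python
import random
import statistics

random.seed(42)

def monte_carlo_study(n_reps=2000, n=25):
    """Estimate, by simulation, the variance of the sample mean and
    the sample median for N(0,1) samples of size n."""
    means, medians = [], []
    for _ in range(n_reps):
        sample = [random.gauss(0.0, 1.0) for _ in range(n)]
        means.append(statistics.mean(sample))
        medians.append(statistics.median(sample))
    return statistics.variance(means), statistics.variance(medians)

var_mean, var_median = monte_carlo_study()
# For normal data the mean is the more efficient estimator, so
# var_mean should come out smaller than var_median.
print(var_mean, var_median)
```

This is exactly the pattern of the articles you are asked to read: simulate many samples from a known data-generating process and summarize how a statistical method behaves across them.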
Assignment: Read Section 2.1 (pages 39-53) and Appendices A and B
(pages 337-362).
Make a web page for your project.
Choose two articles in statistics literature that report Monte Carlo studies
and write brief descriptions of them on your web page.
Two examples from the March 2005 issue of the Journal of the
American Statistical Association are the one by Romano and Wolf
on problems of multiple hypothesis testing and the one by
Lahiri and Larsen on regression with linked data.
There
are several more articles in that issue that use Monte Carlo simulation
to study statistical methods.
You can use articles from any peer-reviewed scientific journal. Many are
available online; for example, the Journal of the
American Statistical Association is available by going to the GMU
library home page, then to the E-Journal Finder, and then entering "American
Statistical Association" in the "keyword" box. Several options come up
next. The first one, "ABI/INFORM Complete", works.
(Thanks to Pragyansmita Nayak for pointing this out!)
September 5
Labor Day; no classes.
September 12
Discussion of Monte Carlo studies; student presentations of
descriptions of articles (first project milestone).
Discussion of methods of statistical inference.
The
role of optimization in statistical estimation:
minimization of residuals;
maximization of likelihood; etc.
The functional approach to statistical estimation.
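The role of optimization in estimation can be sketched briefly. The course uses R or S-Plus; this Python fragment (the simulated data, the grid, and all names are illustrative assumptions, not from the text) maximizes an exponential log-likelihood by brute-force grid search and checks the answer against the closed-form maximum likelihood estimate:

```python
import math
import random

random.seed(5)

# Simulated Exponential(3) data; the estimation target is the rate lam.
data = [random.expovariate(3.0) for _ in range(200)]
n, s = len(data), sum(data)

def loglik(lam):
    # Exponential log-likelihood: n*log(lam) - lam * sum(x_i)
    return n * math.log(lam) - lam * s

# Crude grid search over lam in (0, 10); any real optimizer would do.
grid = [0.01 * k for k in range(1, 1000)]
lam_grid = max(grid, key=loglik)

# Closed-form MLE for comparison: lam_hat = n / sum(x_i)
lam_mle = n / s
print(lam_grid, lam_mle)
```

In most problems no closed form exists, and the numerical optimizer is all you have; that is the point of treating optimization as a basic tool of estimation.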
Assignments: Read Chapter 1; work problems 1.2, 1.3, 1.7, and 1.9 to
turn in (as hardcopies), Sept 19.
Comments/solutions.
Put a brief description of your project on your web page. You will
add to this description as the semester progresses.
September 19
Discussion of projects if necessary (second project milestone).
Continue discussion of some material from Chapter 1 on least squares, and
methods of optimization.
EM examples.
Random number generation; methods to convert uniform variates to other
random variables (inverse CDF; acceptance/rejection).
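The two conversion methods can be sketched in a few lines each. Python is used here for illustration (the course software is R or S-Plus, and the function names are my own):

```python
import math
import random

random.seed(0)

def exponential_inverse_cdf(lam, n):
    """Inverse-CDF method: F(x) = 1 - exp(-lam*x) inverts to
    F^{-1}(u) = -log(1 - u)/lam, so plugging in U ~ Uniform(0,1)
    yields Exponential(lam) variates."""
    return [-math.log(1.0 - random.random()) / lam for _ in range(n)]

def half_normal_ar(n):
    """Acceptance/rejection: sample |Z|, Z ~ N(0,1), using an
    Exponential(1) envelope; a proposal y is accepted with
    probability exp(-(y - 1)^2 / 2)."""
    out = []
    while len(out) < n:
        y = -math.log(1.0 - random.random())        # Exponential(1) proposal
        if random.random() <= math.exp(-0.5 * (y - 1.0) ** 2):
            out.append(y)
    return out

xs = exponential_inverse_cdf(2.0, 100000)
zs = half_normal_ar(50000)
print(sum(xs) / len(xs))   # close to the true mean 1/lam = 0.5
print(sum(zs) / len(zs))   # close to the half-normal mean sqrt(2/pi), about 0.80
```

The inverse CDF works whenever F is easy to invert; acceptance/rejection trades some wasted proposals for freedom from that requirement.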
Assignment: Read Chapter 2; work problems 2.2, 2.4, and 2.7 to
turn in, Sept 26.
Comments/solutions.
September 26
Brief general discussion of projects (third project milestone).
Review acceptance/rejection,
Markov chain methods.
Inference using Monte Carlo: Monte Carlo tests, and
"parametric bootstrap".
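The "parametric bootstrap" idea fits in a short sketch (Python rather than the course's R/S-Plus; the data and all names are illustrative, not from the text): fit a parametric model, then resample from the fitted model to approximate the sampling distribution of a statistic.

```python
import random
import statistics

random.seed(1)

# Illustrative observed data
data = [random.gauss(10.0, 2.0) for _ in range(30)]

# Fit a normal model to the observed data.
mu_hat = statistics.mean(data)
sigma_hat = statistics.stdev(data)

# Resample from the *fitted* model, not from the data itself.
boot_means = []
for _ in range(5000):
    resample = [random.gauss(mu_hat, sigma_hat) for _ in range(len(data))]
    boot_means.append(statistics.mean(resample))

# Simple 95% percentile interval for the mean
boot_means.sort()
lo, hi = boot_means[124], boot_means[4874]
print(lo, hi)
```

A Monte Carlo test follows the same pattern: simulate the test statistic under the null model and compare the observed statistic against the simulated reference distribution.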
Assignments: Work problems 2.8 (note the typo in the solution
on p. 379), 2.9, and 2.10 to
turn in, Oct 3. Read Chapter 3.
Comments/solutions.
October 3
Brief student presentations of the third project milestone.
Randomization and data partitioning.
Bootstrap methods.
Outline.
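The nonparametric bootstrap can be sketched briefly (Python for illustration, with toy data and names of my own; the course software is R or S-Plus): resample the observed data with replacement to estimate the variability of a statistic whose sampling distribution is hard to derive.

```python
import random
import statistics

random.seed(2)

# Illustrative skewed data
data = [random.expovariate(1.0) for _ in range(40)]

# Resample the data *with replacement* to estimate the
# standard error of the sample median.
boot_medians = []
for _ in range(2000):
    resample = [random.choice(data) for _ in range(len(data))]
    boot_medians.append(statistics.median(resample))

se_median = statistics.stdev(boot_medians)
print(se_median)
```

Unlike the parametric bootstrap, no model is fitted; the empirical distribution of the data stands in for the unknown population.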
Assignment: Read Chapter 4; work problems
3.6, 4.1, 4.5, 4.9 to turn in, Oct 11.
Comments/solutions.
October 11 (Tuesday)
Measures of similarity.
Transformations.
Outline.
Assignment: Read Chapter 5; work problems
5.1, 5.2, 5.5, 5.8 to turn in, Oct 17.
Comments/solutions.
October 17
Review.
October 24
In-class midterm exam.
Covers material from Chapters 1-5.
This will be open book and open notes.
October 31
Discuss midterm.
Estimation of functions.
Assignment: Read Chapter 6; work problems
6.6, 6.7, 6.9, 6.10 to turn in (due November 7).
Comments/solutions.
November 7
Nonparametric estimation of probability density functions.
The equation at the top of page 221 should be
k_{rs} = r/(2B(s+1, 1/r)).
This follows from making the change of variable x = t^r, integrating
over [0,1], and then doubling the integral.
A more natural way to write the value is B(1/r, s+1), because that is
the integral you see most directly after the change of variable. Of course,
B(1/r, s+1) = B(s+1, 1/r), so it does not matter.
Assignments: Read Chapter 9; work problems
9.3, 9.5, 9.10, 9.11 to turn in (due Nov 14; may be turned in as late as Nov 21).
Comments/solutions.
November 14
Brief presentations of projects. This may provide feedback for the
final reports.
More on nonparametric estimation of probability density functions.
Assignment: Read Chapter 10.
November 21
Clustering and classification.
Models of dependencies.
Assignments: Work problems
10.1, 10.5, 10.7, 10.12 to turn in. Read Chapter 11; work problem
11.1 to turn in (due Nov 28).
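As a sketch of the clustering idea (Python for illustration; this is plain one-dimensional k-means with my own names and toy data, not an algorithm taken from the text):

```python
import random

random.seed(4)

def kmeans_1d(data, k=2, iters=20):
    """Minimal 1-D k-means: alternate between assigning each point to
    its nearest center and recomputing each center as a cluster mean."""
    centers = random.sample(data, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x in data:
            j = min(range(k), key=lambda c: abs(x - centers[c]))
            clusters[j].append(x)
        # Keep the old center if a cluster happens to be empty.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return sorted(centers)

# Two well-separated groups, centered near 0 and near 5
data = [random.gauss(0.0, 0.5) for _ in range(50)] + \
       [random.gauss(5.0, 0.5) for _ in range(50)]
print(kmeans_1d(data))
```

Classification differs in that class labels are known for the training data; clustering has to discover the group structure from the data alone.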
November 28
Student presentations of projects (final project milestone).
December 5
Continuation of student presentations of projects as necessary.
More on classification. Review.
Handout take-home portion of final exam.
December 12
7:30pm - 10:15pm. In-class final exam.
This will be closed book and closed notes.
Other Resources
The most important WWW repository of statistical resources (datasets,
programs,
general information, links to other sites, etc.) is the
StatLib Index at Carnegie
Mellon.
Another important repository for scientific computing is
Netlib.