Computational Statistics
Spring, 2007
Class meets on Mondays from 7:20pm to 10:00pm in Innovation Hall, room 133.
Final exam is May 14 from 7:30pm to 10:15pm.
This course is about modern, computationally-intensive
methods in statistics.
It emphasizes the role of computation as
a fundamental tool of discovery in data analysis, in statistical
inference, and in the development of statistical theory and methods.
The text for the course is
Elements of Computational Statistics.
The general description of the course is available at
mason.gmu.edu/~jgentle/csi771/
Lectures / assignments / exams schedule
January 22
Course overview;
websites; software;
brief introduction to R or S-Plus.
Monte Carlo studies in statistics.
Brief introduction to random number generation.
Simulation of stochastic data generating processes in R or S-Plus.
Preliminaries of computational statistics.
Monte Carlo studies in statistics.
Some introductory material on R.
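As a concrete starting point, here is a minimal Monte Carlo sketch in R of the
kind of simulation study listed above; the normal data generating process, the
sample size, and the mean-versus-median comparison are illustrative choices only,
not part of any assignment.

## Minimal Monte Carlo study: compare the mean and the median as
## estimators of the center of a normal population.
set.seed(771)          # for reproducibility
nrep <- 1000           # number of Monte Carlo replications
n    <- 25             # sample size in each replication
est.mean   <- numeric(nrep)
est.median <- numeric(nrep)
for (i in 1:nrep) {
  x <- rnorm(n, mean = 0, sd = 1)   # simulate the data generating process
  est.mean[i]   <- mean(x)
  est.median[i] <- median(x)
}
## Monte Carlo estimates of bias and variance of each estimator
c(bias = mean(est.mean),   variance = var(est.mean))
c(bias = mean(est.median), variance = var(est.median))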
Assignment: Read Section 2.1 (pages 39-53) and Appendix A and B
(pages 337-362).
Make a web page for your project.
Choose two articles in the statistics literature that report Monte Carlo studies
and write brief descriptions of them on your web page.
Two examples from the December 2006 issue of the Journal of the
American Statistical Association are the one by Zou on the use
of lasso in variable selection and the one by
Mao on a method of estimating the number of species in a population.
There
are several more articles in that issue that use Monte Carlo simulation
to study statistical methods.
You can use articles from any peer-reviewed scientific journal. Many are
available online; for example, the Journal of the
American Statistical Association is available by going to the GMU
library home page, then to E-Journal Finder, and then entering "Journal of the American
Statistical Association" in the "Journal" box. Several options come up
next. The first one, "ABI/INFORM Complete", works, although all you can
do is print it.
A good source of pdf files of journal articles that are at least a few
years old is JSTOR. Go to the GMU
library home page, then to Databases, then select "J" and finally JSTOR.
At that point, "Browse" and select "Statistics". For example, the
September 2001 issue contains several articles that use Monte Carlo simulation
studies. See the one beginning on p. 1088 by Bunzel et al.
January 29
Discussion of methods of statistical inference.
The
role of optimization in statistical estimation:
minimization of residuals;
maximization of likelihood; etc.
The functional approach to statistical estimation.
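To make the optimization view of estimation concrete, here is a short R sketch
of maximum likelihood computed numerically; the gamma model, the simulated data,
and the use of optim() on the log scale are assumptions made only for illustration.

## Maximum likelihood as numerical optimization (illustrative sketch).
## optim() minimizes, so we pass the negative log-likelihood; parameters
## are optimized on the log scale so they stay positive.
set.seed(771)
x <- rgamma(50, shape = 2, rate = 1)     # simulated data
negloglik <- function(ltheta) {
  shape <- exp(ltheta[1])
  rate  <- exp(ltheta[2])
  -sum(dgamma(x, shape = shape, rate = rate, log = TRUE))
}
fit <- optim(c(0, 0), negloglik)         # Nelder-Mead from shape = rate = 1
exp(fit$par)                             # MLEs of shape and rate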
Assignments: Read Chapter 1; work problems 1.2, 1.3, 1.7, and 1.9 to
turn in (as hardcopies).
Comments/solutions.
Put a brief description of your project on your web page. You will
add to this description as the semester progresses.
February 5
Discussion of Monte Carlo studies; student presentations of
descriptions of articles (first project milestone).
Continue discussion of some material from Chapter 1 on least squares, and
methods of optimization.
EM examples.
Random number generation; methods to convert uniform variates to other
random variables (inverse CDF; acceptance/rejection).
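Here is a small R sketch of the two conversion methods mentioned above; the
exponential and Beta(2,2) targets are chosen only to illustrate the mechanics.

## 1. Inverse CDF method: if U ~ U(0,1), then F^{-1}(U) has CDF F.
##    For an exponential with rate lambda, F^{-1}(u) = -log(1 - u)/lambda.
set.seed(771)
lambda <- 2
u <- runif(1000)
x.exp <- -log(1 - u) / lambda            # exponential(rate = 2) variates

## 2. Acceptance/rejection: to sample from a density f dominated by c*g,
##    generate Y ~ g and U ~ U(0,1), and accept Y when U <= f(Y)/(c*g(Y)).
##    Here f is Beta(2,2), g is U(0,1), and c = max f = 1.5.
f <- function(y) dbeta(y, 2, 2)
c.bound <- 1.5
y <- runif(5000)
u <- runif(5000)
x.beta <- y[u <= f(y) / c.bound]         # accepted draws are Beta(2,2)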
Assignment: Read Chapter 2; work problems 2.2, 2.4, and 2.7 to
turn in Feb 19.
Comments/solutions.
February 12
Discussion of projects (second project milestone)
Review acceptance/rejection,
Markov chain methods.
Inference using Monte Carlo: Monte Carlo tests, and
"parametric bootstrap".
Assignments: Work problems 2.8 (note the typo in the solution on p. 379),
2.9, and 2.10 to
turn in Feb 19.
Comments/solutions.
Read Chapter 3.
February 19
Brief student presentations of the third project milestone.
Randomization and data partitioning.
Bootstrap methods.
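For contrast with the parametric version above, here is a minimal R sketch of
the nonparametric bootstrap; the exponential data and the sample median are
illustrative choices only.

## Nonparametric bootstrap: resample the observed data with replacement
## and recompute the statistic on each resample.
set.seed(771)
x <- rexp(40, rate = 1)                  # stands in for the observed data
n <- length(x)
B <- 2000
boot.med <- numeric(B)
for (b in 1:B) {
  xstar <- sample(x, size = n, replace = TRUE)
  boot.med[b] <- median(xstar)
}
sd(boot.med)                             # bootstrap standard error
quantile(boot.med, c(0.025, 0.975))      # simple percentile interval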
Assignment: Read Chapter 4; work problems
3.6, 4.1, 4.5, 4.9 to discuss in class.
Comments/solutions.
February 26
Measures of similarity.
Transformations.
Outline.
Assignment: Read Chapter 5; work problems
5.1, 5.2, 5.5, 5.8 to discuss in class.
Comments/solutions.
March 5
Review material and work exercises from Chapters 1-5.
March 12 No class; GMU spring break.
March 19
In-class midterm exam.
Covers material from Chapters 1-5.
This will be open book and open notes.
Assignment: Read Chapter 6.
March 26
Estimation of functions.
Nonparametric estimation of probability density functions.
The equation at the top of page 221 should be
k_{rs} = r/(2B(s+1,1/r))
You get this by making the change of variable x = t^r, integrating
over [0,1], and then doubling the integral.
A more natural way of writing the value uses B(1/r,s+1), because that is
the beta integral that appears directly after the change of variable. Of course,
B(1/r,s+1) = B(s+1,1/r), so it doesn't matter.
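Here is a worked version of that calculation (written in LaTeX), under the
assumption that the kernel family in question is K(t) = k_{rs}(1 - |t|^r)^s
on [-1, 1]:

\begin{align*}
\int_{-1}^{1} (1 - |t|^r)^s \, dt
  &= 2 \int_{0}^{1} (1 - t^r)^s \, dt
     && \text{(symmetry: double the integral over } [0,1]\text{)} \\
  &= \frac{2}{r} \int_{0}^{1} x^{1/r - 1} (1 - x)^s \, dx
     && \text{(change of variable } x = t^r\text{)} \\
  &= \frac{2}{r} \, B\!\left(\tfrac{1}{r},\, s + 1\right),
\end{align*}
so setting the integral of K equal to 1 gives
\[
  k_{rs} = \frac{r}{2 B(1/r,\, s+1)} = \frac{r}{2 B(s+1,\, 1/r)} .
\]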
Assignments: Work problems
6.6, 6.7, 6.9, 6.10 to turn in April 2.
Read Chapter 9; work problems
9.3, 9.5, 9.10, 9.11 to turn in April 9.
Comments/solutions.
April 2
Discuss midterm.
Brief presentations of projects. This may provide feedback for the
final reports.
More on nonparametric estimation of probability density functions.
Assignment: Read Chapter 10.
April 9
Return Chapter 6 homework;
discuss.
Clustering and classification.
Assignments: Work problems
10.1, 10.5, 10.7, 10.12. Read Chapter 11; work problem
11.1.
April 16
Return Chapter 9 homework;
discuss.
Continuation of clustering and classification.
Models of dependencies.
Simulated annealing.
April 23
Student presentations of projects (final project milestone).
April 30
Continuation of student presentations of projects as necessary.
More on classification.
Discuss exercises in Chapters 10 and 11.
Review.
Handout take-home portion of final exam.
May 14
7:30pm - 10:15pm. In-class final exam.
This will be closed book and closed notes.
Exam with solutions.
Other Resources
The most important WWW repository of statistical stuff (datasets,
programs,
general information, connection to other sites, etc.) is
StatLib Index at Carnegie
Mellon.
Another important repository for scientific computing is
Netlib.