CSI 771 / STAT 751: Computational Statistics
James Gentle
This course is about modern, computationally intensive
methods in statistics.
It emphasizes the role of computation as
a fundamental tool for discovery in data analysis, for statistical
inference, and for the development of statistical theory and methods.
What is computational statistics?
Here's some background.
Topics
- Monte Carlo studies in statistics
- Computational inference
- Data partitioning and resampling
- Numerical methods in statistics ("statistical computing")
- Nonparametric probability density estimation
- Statistical models and data fitting
Prerequisites
- a course in applied statistics, such as STAT 554
- a course in statistical inference, such as CSI 672 / STAT 652
Text
The text for the course is
Elements of Computational Statistics.
There are some errata and other notes. Let me know of any other errors (including
minor typos) that you find.
You should look over the notation descriptions and definitions in Appendix
C beginning on page 363.
Grading
Student work in the course (and the relative weighting of this work
in the overall grade) will consist of
- a number of small assignments, problems, etc. (15)
- a semester project to replicate and extend a published Monte
  Carlo study (30)
  The project will be graded on
  - design and conduct of the study
  - written report
  - presentation
- an in-class midterm (25)
- a final exam consisting of an in-class component and a
take-home component (30)
Project
The project is first to identify an article in the scientific/statistical
literature that reports on a Monte Carlo study that was conducted
to evaluate a statistical method. Well over half of the articles
in the current statistical literature use such studies.
(There are many other uses of Monte Carlo methods, and many articles
that report on results that were obtained by Monte Carlo,
such as use of Monte Carlo to evaluate a complicated integral. There are
many other articles that discuss ways of using Monte Carlo.
Notice that the article chosen for the class project is to be one that
uses Monte Carlo methods to study, evaluate, and/or compare statistical
methods, such as statistical estimators or statistical tests;
it is not to be just any article that does something with Monte Carlo.)
After an article has been selected, the project is to design, conduct, and
report on a similar study. The Monte Carlo study in the project should
replicate at least a portion of the study reported in the chosen article.
The project should extend the Monte Carlo experiment reported to include
additional factors or additional levels of factors.
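As a rough illustration of the kind of study meant here, the following is a
minimal sketch of a Monte Carlo experiment that compares two estimators. It is
written in Python using only the standard library, although the course itself
uses S-Plus or R. Everything specific in it is an assumption chosen for
illustration and is not taken from any particular article: the estimators
(sample mean and sample median), the contaminated normal model, the factor
levels, the number of replications, and the function names.

```python
import random
import statistics


def draw_sample(rng, n, contamination):
    """Draw n values from N(0, 1); each value is instead drawn from
    N(0, 10^2) with probability `contamination` (illustrative model)."""
    return [
        rng.gauss(0.0, 10.0) if rng.random() < contamination else rng.gauss(0.0, 1.0)
        for _ in range(n)
    ]


def monte_carlo_mse(n, contamination, n_reps=2000, seed=12345):
    """Estimate the mean squared error of the sample mean and the sample
    median as estimators of the true location, which is 0 in this model."""
    rng = random.Random(seed)  # fixed seed so the study can be replicated
    sq_err_mean = 0.0
    sq_err_median = 0.0
    for _ in range(n_reps):
        x = draw_sample(rng, n, contamination)
        # The true location is 0, so the squared error is the estimate squared.
        sq_err_mean += statistics.fmean(x) ** 2
        sq_err_median += statistics.median(x) ** 2
    return sq_err_mean / n_reps, sq_err_median / n_reps


if __name__ == "__main__":
    # Two factors, each at two levels: sample size and contamination rate.
    for n in (10, 50):
        for eps in (0.0, 0.1):
            mse_mean, mse_median = monte_carlo_mse(n, eps)
            print(f"n={n:3d} eps={eps:.1f} "
                  f"MSE(mean)={mse_mean:.4f} MSE(median)={mse_median:.4f}")
```

Extending a study of this form would amount to adding levels to the loops
(more sample sizes or contamination rates) or adding a factor, such as a third
estimator or a different contaminating distribution.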
There are milestones for the various stages of the project during the
semester. The final report of the project must be posted on the web,
and must be presented orally in class.
Students' Webpages
You must have an account on a system that has a web server.
The CSI system is scs.gmu.edu. There are several other possibilities,
including the university systems mason.gmu.edu and osf1.gmu.edu, and
systems in IT&E. If you do not have an account yet, you can get one
on scs.gmu.edu by filling out a request form that you can get from
the SCS office in 103 Science & Technology I.
The scs.gmu.edu system requires a secure login (ssh) and secure ftp.
You can get information about the system and options for accessing
it at www.scs.gmu.edu/computing/
Each student will prepare a Web page for presentation of
the project and for some of the smaller assignments.
There are several programs that help you write html. I do not use any of these,
but you may find them useful. You can also produce html output directly from
Microsoft Word. I do not use that for html either. (In fact, I use Word as
infrequently as possible.)
Writing
Clear communication is as important as any other aspect of scientific work.
In my opinion, the best way to produce scientific text documents is by
use of TeX, together with the add-on package LaTeX.
LaTeX can produce beautiful mathematical symbols
(even if the underlying mathematics is not beautiful!). Graphical
displays stored in PostScript files (as well as in other formats) can
easily be incorporated. Cross-references, tables of contents, indexes,
and the other features of documents, large and small, can be produced
in LaTeX.
For computer-projector presentations and for dissemination of scientific
documents generally, Adobe's PDF format is best.
TeX and LaTeX are freely available.
Both html and LaTeX are mark-up languages, meaning that processors that
handle them accept plain text files that
include display or typesetting directives.
The best way of producing plain text files is to use either emacs or,
on MS Windows, Crimson Editor. Both are freely available.
Information about Crimson Editor,
including links for downloading, can be obtained at
http://www.crimsoneditor.com/
Software for Computational Statistics
The main analysis software used in the course will be S-Plus or R.
Information about R, including links for downloading, can be obtained at
http://www.r-project.org/