CSI 771 / STAT 751: Computational Statistics
James Gentle
This course is about modern, computationally intensive
methods in statistics.
It emphasizes the role of computation as
a fundamental tool for discovery in data analysis, for statistical
inference, and for the development of statistical theory and methods.
What is computational statistics?
Here's some background.
Topics
- Monte Carlo studies in statistics
- Numerical methods in statistics ("statistical computing")
- Computational inference
- Data partitioning and resampling
- Nonparametric probability density estimation
- Statistical models and data fitting
Prerequisites
- a course in applied statistics such as STAT 554
- a course in statistical inference such as CSI 672 / STAT 652
Grading
Student work in the course (and the relative weighting of this work
in the overall grade) will consist of
- a number of small assignments, problems, etc. (15)
- a semester project to replicate and extend a published Monte Carlo study (30)
  The project will be graded on
  - design and conduct of the study
  - written report
  - presentation
- an in-class midterm, plus possibly a take-home component (25)
- an in-class final exam, plus possibly a take-home component (30)
Project
The project is first to identify an article in the scientific/statistical
literature that reports on a Monte Carlo study that was conducted
to evaluate a statistical method. Well over half of the articles
in the current statistical literature use such studies.
(There are many other uses of Monte Carlo methods, and many articles
that report on results that were obtained by Monte Carlo,
such as use of Monte Carlo to evaluate a complicated integral. There are
many other articles that discuss ways of using Monte Carlo.
Notice that the article chosen for the class project is to be one that
uses Monte Carlo methods to study, evaluate, and/or compare statistical
methods, such as statistical estimators or statistical tests;
it is not to be just any article that does something with Monte Carlo.)
After an article has been selected, the project is to design, conduct, and
report on a similar study. The Monte Carlo study in the project should
replicate at least a portion of the study reported in the chosen article.
The project should extend the Monte Carlo experiment reported to include
additional factors or additional levels of factors.
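
As a rough illustration of what such a study looks like in code, the sketch below compares two estimators of location, the sample mean and the sample median, by their simulated mean squared error. Everything specific in it is an assumption made for illustration and is not taken from any particular article: the two distributions, the sample sizes, the number of replications, the seed, and the function names `draw_sample` and `run_study`. The two factors (distribution and sample size) stand in for the factors of a published study; extending the study would mean adding levels to these lists or adding a further factor. The sketch is in Python using only the standard library; the same design carries over directly to R or S-Plus.

```python
import random
import statistics


def draw_sample(dist, n, rng):
    """Draw n observations from one of two illustrative distributions centered at 0."""
    if dist == "normal":
        return [rng.gauss(0.0, 1.0) for _ in range(n)]
    if dist == "contaminated":
        # Assumed mixture: 90% N(0, 1) and 10% N(0, 5^2), a heavy-tailed alternative
        return [rng.gauss(0.0, 5.0 if rng.random() < 0.1 else 1.0) for _ in range(n)]
    raise ValueError(f"unknown distribution: {dist}")


def run_study(dists, sample_sizes, n_reps=2000, seed=771):
    """Estimate the MSE of the sample mean and sample median in each design cell."""
    rng = random.Random(seed)  # fixed seed so the experiment can be reproduced
    results = {}
    for dist in dists:
        for n in sample_sizes:
            sq_err_mean = 0.0
            sq_err_median = 0.0
            for _ in range(n_reps):
                x = draw_sample(dist, n, rng)
                # The true location is 0, so the squared error is the estimate squared
                sq_err_mean += statistics.fmean(x) ** 2
                sq_err_median += statistics.median(x) ** 2
            results[(dist, n)] = (sq_err_mean / n_reps, sq_err_median / n_reps)
    return results


if __name__ == "__main__":
    table = run_study(["normal", "contaminated"], [10, 40])
    print(f"{'distribution':<14}{'n':>4}{'MSE(mean)':>12}{'MSE(median)':>13}")
    for (dist, n), (mse_mean, mse_median) in table.items():
        print(f"{dist:<14}{n:>4}{mse_mean:>12.4f}{mse_median:>13.4f}")
```

Both estimators are computed on the same simulated samples in each cell, which tends to make the comparison between them less noisy than using independent samples would. A written report on a study of this kind would normally also give the Monte Carlo standard errors of the estimated quantities.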
There are milestones for the various stages of the project during the
semester. The final report of the project must be posted on the web,
and must be presented orally in class.
Writing
Clear communication is as important as any other aspect of scientific work.
In my opinion, the best way to produce scientific text documents is by
use of TeX, together with the add-on package LaTeX.
LaTeX is capable of producing beautiful mathematical symbols
(even if the underlying mathematics is not beautiful!). Graphical
displays coded in PostScript files (as well as other codings) can
easily be incorporated. Cross-references, tables of contents, indexes,
and the other features of documents, large and small, can be produced
in LaTeX.
For computer-projector presentations and for dissemination of scientific
documents generally, Adobe's PDF format is best.
TeX and LaTeX are freely available.
Both HTML and LaTeX are mark-up languages, meaning that processors that
handle them accept plain text files that
include display or typesetting directives.
The best way of producing plain text files is to use either Emacs or,
on MS Windows, Crimson Editor. Both are freely available.
Information about Crimson Editor,
including links for downloading, can be obtained at
http://www.crimsoneditor.com/
Software for Computational Statistics
The main analysis software used in the course will be S-Plus or R.
Information about R, including links for downloading, can be obtained at
http://www.r-project.org/
- Fall, 2013
- Fall, 2011
- Fall, 2009
- Spring, 2007
- Fall, 2005
- Fall, 2002
- Fall, 2000
- Fall, 1998
- Fall, 1996
- Fall, 1994