CSI 771 / STAT 751: Computational Statistics

James Gentle


This course is about modern, computationally intensive methods in statistics. It emphasizes the role of computation as a fundamental tool for discovery in data analysis, for statistical inference, and for the development of statistical theory and methods.

What is computational statistics? Here's some background.


Topics


Prerequisites


Grading

Student work in the course (and the relative weighting of this work in the overall grade) will consist of


Project

The first step of the project is to identify an article in the scientific or statistical literature that reports on a Monte Carlo study conducted to evaluate a statistical method. Well over half of the articles in the current statistical literature use such studies. (Monte Carlo methods have many other uses: many articles report results obtained by Monte Carlo, such as the evaluation of a complicated integral, and still other articles discuss ways of using Monte Carlo. Note that the article chosen for the class project must be one that uses Monte Carlo methods to study, evaluate, or compare statistical methods, such as statistical estimators or statistical tests; it is not to be just any article that does something with Monte Carlo.)

After an article has been selected, the project is to design, conduct, and report on a similar study. The Monte Carlo study in the project should replicate at least a portion of the study reported in the chosen article, and it should extend the reported experiment to include additional factors or additional levels of factors.
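To make the idea of "factors" and "levels" concrete, here is a minimal sketch of a small Monte Carlo study, written in Python rather than S-Plus or R. Everything in it is an illustrative assumption, not taken from any particular article: it compares the sample mean and the sample median as estimators of the center (true value 0) of a symmetric distribution, using mean squared error (MSE) as the criterion. The two factors are the sampling distribution (levels: normal, Laplace) and the sample size (levels: 10, 50), crossed in a full factorial design. The function name `mc_compare_estimators` is hypothetical.

```python
import random
import statistics


def mc_compare_estimators(n, dist, n_reps=2000, seed=1):
    """Estimate the MSE of the sample mean and sample median as estimators
    of the center (true value 0) of a symmetric distribution, using
    n_reps simulated samples of size n. Hypothetical helper for illustration."""
    rng = random.Random(seed)  # fixed seed so the study is reproducible
    if dist == "normal":
        draw = lambda: rng.gauss(0.0, 1.0)
    elif dist == "laplace":
        # The difference of two iid Exp(1) variables is Laplace(0, 1).
        draw = lambda: rng.expovariate(1.0) - rng.expovariate(1.0)
    else:
        raise ValueError(f"unknown distribution: {dist}")

    se_mean = 0.0
    se_median = 0.0
    for _ in range(n_reps):
        x = [draw() for _ in range(n)]
        # Squared error of each estimator relative to the true center, 0.
        se_mean += statistics.fmean(x) ** 2
        se_median += statistics.median(x) ** 2
    return {"mean": se_mean / n_reps, "median": se_median / n_reps}


# Full factorial design: every combination of distribution and sample size.
results = {}
for dist in ("normal", "laplace"):
    for n in (10, 50):
        results[(dist, n)] = mc_compare_estimators(n, dist)

for (dist, n), mse in results.items():
    print(f"{dist:8s} n={n:3d}  MSE(mean)={mse['mean']:.4f}  "
          f"MSE(median)={mse['median']:.4f}")
```

Extending such a study, as the project requires, amounts to adding a factor (for example, a contamination proportion) or adding levels to an existing one (for example, another distribution or sample size) and rerunning the same loop. Under these assumptions, one would expect the mean to have the smaller MSE for normal data and the median to have the smaller MSE for Laplace data.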

The project has milestones for its various stages during the semester. The final report must be posted on the web and presented orally in class.


Writing

Clear communication is as important as any other aspect of scientific work.

In my opinion, the best way to produce scientific documents is to use TeX together with the macro package LaTeX. LaTeX can produce beautiful mathematical symbols (even if the underlying mathematics is not beautiful!). Graphical displays stored as PostScript files (or in other formats) are easily incorporated. Cross-references, tables of contents, indexes, and the other features of documents, large and small, can all be produced in LaTeX.

For computer-projector presentations and for dissemination of scientific documents generally, Adobe's PDF format is best.

TeX and LaTeX are freely available.

Both HTML and LaTeX are markup languages: the programs that process them accept plain text files containing display or typesetting directives.

The best way to produce plain text files is to use either Emacs or, on MS Windows, Crimson Editor. Both are freely available. Information about Crimson Editor, including download links, is available at http://www.crimsoneditor.com/


Software for Computational Statistics

The main analysis software used in the course will be S-Plus or R.
Information about R, including links for downloading, can be obtained at http://www.r-project.org/