## CSI 771 / STAT 751: Computational Statistics

James Gentle

This course is about modern, computationally intensive methods in statistics. It emphasizes the role of computation as a fundamental tool for discovery in data analysis, for statistical inference, and for the development of statistical theory and methods.

What is computational statistics? Here's some background.

#### Topics

• Monte Carlo studies in statistics
• Numerical methods in statistics ("statistical computing")
• Computational inference
• Data partitioning and resampling
• Nonparametric probability density estimation
• Statistical models and data fitting

#### Prerequisites

• a course in applied statistics such as STAT 554
• a course in statistical inference such as CSI 672 / STAT 652

Student work in the course (and the relative weighting of this work in the overall grade) will consist of

• a number of small assignments, problems, etc. (15)
• a semester project to replicate and extend a published Monte Carlo study (30), comprising the design and conduct of the study, a written report, and a presentation
• an in-class midterm, plus possibly a take-home component (25)
• an in-class final exam, plus possibly a take-home component (30)

#### Project

The first step of the project is to identify an article in the scientific or statistical literature that reports on a Monte Carlo study conducted to evaluate a statistical method. Well over half of the articles in the current statistical literature use such studies. (Monte Carlo methods have many other uses. Many articles report results that were obtained by Monte Carlo, such as the evaluation of a complicated integral, and many others discuss ways of using Monte Carlo. The article chosen for the class project must be one that uses Monte Carlo methods to study, evaluate, or compare statistical methods, such as statistical estimators or statistical tests; it is not to be just any article that does something with Monte Carlo.)

After an article has been selected, the project is to design, conduct, and report on a similar study. The Monte Carlo study in the project should replicate at least a portion of the study reported in the chosen article. The project should extend the Monte Carlo experiment reported to include additional factors or additional levels of factors.
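As a rough illustration of what such a study looks like, here is a minimal sketch of a Monte Carlo experiment with two factors (the sampling distribution and the sample size), each at two levels. It compares the sample mean and the sample median as estimators of the center of a symmetric distribution, using mean squared error as the criterion. The choice of estimators, distributions, levels, and the names `draw_sample` and `run_study` are all assumptions made for this example; they are not taken from any particular article or from the course materials, and Python is used here only for convenience.

```python
import random
import statistics


def draw_sample(dist, n, rng):
    """Draw n observations from a distribution centered at 0."""
    if dist == "normal":
        return [rng.gauss(0.0, 1.0) for _ in range(n)]
    if dist == "laplace":
        # The difference of two independent Exp(1) variates is standard Laplace.
        return [rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(n)]
    raise ValueError(f"unknown distribution: {dist}")


def run_study(dists=("normal", "laplace"), sizes=(10, 50), reps=2000, seed=12345):
    """Estimate the MSE of the sample mean and sample median in each design cell.

    Returns a dict mapping (distribution, n) to (MSE of mean, MSE of median).
    """
    rng = random.Random(seed)  # fixed seed so the experiment can be rerun exactly
    results = {}
    for dist in dists:          # factor 1: sampling distribution
        for n in sizes:         # factor 2: sample size
            sq_err_mean = 0.0
            sq_err_median = 0.0
            for _ in range(reps):
                x = draw_sample(dist, n, rng)
                # The true center is 0, so the squared error is the estimate squared.
                sq_err_mean += statistics.fmean(x) ** 2
                sq_err_median += statistics.median(x) ** 2
            results[(dist, n)] = (sq_err_mean / reps, sq_err_median / reps)
    return results


if __name__ == "__main__":
    for (dist, n), (mse_mean, mse_median) in run_study().items():
        print(f"{dist:8s} n={n:3d}  MSE(mean)={mse_mean:.4f}  MSE(median)={mse_median:.4f}")
```

Extending a study of this kind, in the sense described above, could mean adding a level to an existing factor (say, a third sample size) or adding a new factor (say, a contamination rate); in this sketch either change amounts to another entry in a tuple or another loop.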

There are milestones for the various stages of the project during the semester. The final report of the project must be posted on the web, and must be presented orally in class.

#### Writing

Clear communication is as important as any other aspect of scientific work.

In my opinion, the best way to produce scientific text documents is to use TeX, together with the add-on package LaTeX. LaTeX can produce beautiful mathematical symbols (even if the underlying mathematics is not beautiful!). Graphical displays stored as PostScript files (as well as in other formats) can easily be incorporated. Cross-references, tables of contents, indexes, and the other features of documents, large and small, can be produced in LaTeX.

For computer-projector presentations and for dissemination of scientific documents generally, Adobe's PDF format is best.

TeX and LaTeX are freely available.

Both HTML and LaTeX are mark-up languages, meaning that the processors that handle them accept plain text files that include display or typesetting directives.

The best way of producing plain text files is to use either emacs or, on MS Windows, Crimson Editor. Both are freely available. Information about Crimson Editor, including links for downloading, can be obtained at http://www.crimsoneditor.com/

#### Software for Computational Statistics

The main analysis software used in the course will be S-Plus or R.