Welcome to STAT 751
Computational Statistics
Fall, 2015
Instructor:
James Gentle
Lectures: Thursday, 4:30-7:10pm; Nguyen Engineering Building 1110
Some of the lectures will be based on notes posted on this
website. Some lectures will be accompanied only by notes written on the board.
This course is about modern, computationally intensive
methods in statistics.
It emphasizes the role of computation as
a fundamental tool for discovery in data analysis, for statistical
inference, and for the development of statistical theory and methods.
Topics
- Monte Carlo studies in statistics
- Numerical methods in statistics ("statistical computing")
- Computational inference
- Data partitioning and resampling
- Nonparametric probability density estimation
- Statistical models and data fitting
The general description of the course is available at
mason.gmu.edu/~jgentle/csi771/
Prerequisites:
- a course in applied statistics such as STAT 554
- a course in statistical inference such as CSI 672 / STAT 652
Text:
Elements of Computational Statistics
ISBN 978-1441930248.
List of probability distributions.
Computational Software:
The main computational software that I use is R.
R is open source and free. It is installed on some GMU computers, but
various binary executables are available at
the main R website,
and it is best to install it on your own computer.
A good way to learn R is just to use it for
progressively more complicated problems. While there are many books on R, the
various PDF manuals that come with the installation (use "Help" on the GUI)
should be sufficient.
Document Development Software:
The main document development software that I use is TeX.
TeX was created by Donald Knuth; the name is a trademark of the American
Mathematical Society. It is free. There are various
implementations, and it is installed on some GMU computers. One version is
MiKTeX. It is available at
miktex.org,
and it is best to install it on your own computer.
There are many books on TeX, but
a good way to learn TeX is just to use it for
progressively more complicated writing projects.
Email Communication
The primary means of communication outside of class will be by email.
Students must use their Mason email accounts to receive important University
information, including messages related to this class.
(You may, of course, forward email from
your Mason email account to one that you check regularly.)
If you send email to the instructor,
please put "CSI 771" or "STAT 751" in the subject line.
Grading
Student work in the course (and the relative weighting of this work
in the overall grade) will consist of
- a number of small assignments, problems, etc. (20)
- a semester project to replicate and extend a published Monte Carlo study (25)
- an in-class midterm (25)
- an in-class final exam (30)
Homework
Each homework will be graded based on 100 points, and 5 points will be deducted
for each day that the homework is late.
Start each problem on a new sheet of paper and label it clearly.
Homework will not be accepted as computer files; it must be submitted on
paper.
Project
The course requires each student to complete a project that involves a
Monte Carlo study of a statistical method.
The project will be graded on
- design and conduct of the study
- written report
- presentation
Collaboration and Academic Integrity
Each student enrolled in this course must assume the
responsibilities of an active participant in GMU's scholarly
community in which everyone's academic work and behavior are
held to the highest standards of honesty. The GMU policy on
academic conduct will be followed in this course.
Collaborative work
Students are free to discuss homework problems or other topics
with each other or anyone else, and are
free to use any reference sources. Group work and discussion outside of
class are encouraged, but do not copy homework solutions.
Make sure that work that is supposed to be yours is indeed your own.
With cut-and-paste capabilities on webpages, it is easy to plagiarize.
Sometimes it is even accidental, because it results from legitimate note-taking;
nevertheless, it is plagiarism and it is illegal.
Although the likelihood of "getting caught" should not influence your ethical
standards, you should be aware of the fact that web searches can often
identify plagiarism, and that there is even specialized software to facilitate
such searches. Whenever I encounter phrases in a student's work
that seem to be inconsistent with the usual language that the student uses,
I routinely search the web for documents containing those phrases.
Some good guidelines are here:
http://ori.dhhs.gov/education/products/plagiarism/
See especially the entry "26 Guidelines at a Glance".
Self-plagiarism
The definition of "plagiarism" applies to the "work of others",
so copying your own work does not fall within the scope of the crime of plagiarism.
Generally, of course, you are free to copy what you've written. I do this
all the time with class notes, for example. Whenever you reuse any material,
except for relatively brief background or supporting material, you should
reference your original source. In the case of my class notes that have not
appeared in formal publications, I do not include references to my earlier work.
Representing a rehash or restatement of earlier work as original work
is wrong. Such self-plagiarism becomes a breach of academic honor, for example,
when a paper submitted for credit in one instance is subsequently submitted
for credit in another instance.
Students with disabilities
Certification of a disability that requires accommodations must
be made by the
Office of Disability Services (ODS).
If you are a student with a disability and desire academic accommodations,
please contact ODS and inform me during the first two weeks of classes.
All academic accommodations must be arranged through ODS.
Lectures / assignments / exams schedule
Week 1, September 3
Course overview.
Brief introduction to R.
R functions.
Random number generation in R.
Saving graphics files in R.
Monte Carlo methods in statistics.
Random number generation.
I may not always post lecture slides (you're expected to attend class
and take notes as you see fit!),
but since this lecture included many little picky things about R,
here they are.
Assignments:
(1) Read Appendix A in text (pages 337-350).
(2) Choose two articles in JASA from 2010 to the present
that report Monte Carlo studies
and write brief descriptions of them
(about a page for each), telling specifically what questions were studied by
Monte Carlo.
Week 2, September 10
Brief discussion of Monte Carlo methods in statistics and project.
Objectives and methods of computational statistics.
Use of the ECDF.
Statistical methods as optimization problems.
Lecture notes
Assignments: Read revised Chapter 1.
Work problems 1.3, 1.4, 1.11, 1.18, 1.20, and 1.26 in revised Chapter 1
to turn in (as hardcopy) on October 1. (These problem numbers correspond to the
current version of the revised Chapter 1.)
Week 3, September 17
Computer arithmetic.
Optimization in statistics.
Methods of optimization.
Week 4, September 24
EM methods.
Linear transformations.
Assignment, due October 8: Prepare and write up your plan for your project.
This includes a brief description of the Monte Carlo study in the published
paper you chose.
(What are the statistical methods being evaluated? What scenarios
were studied? What are the "treatments" (that is, methods) you will study?
What scenarios (that is, blocks in your experiment) will you study?)
I expect this write-up to be about 5 to 10 pages long.
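To make the "treatments" and "scenarios" vocabulary concrete, here is a minimal sketch of a Monte Carlo study laid out as an experiment. It is only an illustration, not the design of any published study: the two treatments (sample mean and sample median), the two scenarios (standard normal and Laplace data), the sample size, the number of replications, and the function name run_study are all assumptions chosen for the example. It is written in Python with only the standard library so that it is self-contained; the same structure carries over directly to R.

```python
import random
import statistics


def run_study(n=20, reps=2000, seed=751):
    """Estimate the MSE of two location estimators under two scenarios."""
    rng = random.Random(seed)  # fixed seed so the study is reproducible

    # Scenarios (blocks): both distributions are symmetric about 0.
    scenarios = {
        "normal": lambda: rng.gauss(0.0, 1.0),
        # A Laplace(0, 1) draw is the difference of two Exp(1) variates.
        "laplace": lambda: rng.expovariate(1.0) - rng.expovariate(1.0),
    }
    # Treatments: the estimators being compared.
    treatments = {"mean": statistics.fmean, "median": statistics.median}

    results = {}
    for s_name, draw in scenarios.items():
        sq_err = {t_name: 0.0 for t_name in treatments}
        for _ in range(reps):
            # Every treatment sees the same sample within a replication.
            sample = [draw() for _ in range(n)]
            for t_name, estimator in treatments.items():
                sq_err[t_name] += estimator(sample) ** 2  # true location is 0
        for t_name in treatments:
            results[(s_name, t_name)] = sq_err[t_name] / reps
    return results


results = run_study()
for (scenario, treatment), mse in sorted(results.items()):
    print(f"{scenario:8s} {treatment:7s} MSE = {mse:.4f}")
```

Applying every treatment to the same simulated sample within a scenario is what makes the scenario act as a block: the comparison between methods is not confounded with sampling variation between data sets. A real study would also report Monte Carlo standard errors for each estimated MSE.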
Week 5, October 1
Matrix operations.
The QR decomposition.
See additional slides on Householder reflections.
Random number generation and Monte Carlo methods.
- Simulation of nonuniform probability distributions
- Monte Carlo estimation and other methods for statistical inference.
- Markov chains in Monte Carlo
Assignments
- Work problems 5.1, 5.2, 5.3, 5.4, 5.5, and 5.6 in Chapter 5.
- Work problems 2.1, 2.2, and 2.5 in Chapter 2.
- Work problems A and B.
Week 6, October 8
Review; discussion of homework and miscellaneous problems.
Chapter 1.
The .Machine variable in R
Chapter 2.
Chapter 5.
Problems A and B.
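As a reminder of what the machine constants describe, the following minimal sketch computes the double-precision machine epsilon by repeated halving. It is written in Python so that it is self-contained, and it assumes IEEE 754 double precision; the function name machine_epsilon is hypothetical. In R, the corresponding value is .Machine$double.eps.

```python
import sys


def machine_epsilon():
    """Halve eps until 1 + eps/2 is indistinguishable from 1."""
    eps = 1.0
    while 1.0 + eps / 2.0 > 1.0:
        eps /= 2.0
    return eps


eps = machine_epsilon()

# The loop result agrees with the value the language reports.
print(eps == sys.float_info.epsilon)

# eps is the gap between 1 and the next larger double.
print(1.0 + eps > 1.0, 1.0 + eps / 2.0 == 1.0)

# Decimal fractions are generally not exact in binary floating point.
print(0.1 + 0.2 == 0.3)
```

The last line is the usual warning against testing floating-point results for exact equality; compare against a tolerance scaled by the machine epsilon instead.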
Week 7, October 15
Midterm exam.
Closed book, closed notes, and closed computers except for one sheet (front and back) of
prewritten notes.
Week 8, October 22
Week 9, October 29
Week 10, November 5
- Structure in data; clustering and classification (continuation).
Week 11, November 12
Week 12, November 19
November 26
Class does not meet this week
Week 13, December 3
Project due
This should be a hardcopy document that identifies the article you
used, describes the problem studied, describes the design of your Monte Carlo study
and how it compares with the one in the article and what you did to extend the
study, summarizes the results of your
study and how they compare with those in the article, and states the conclusions of
your study.
Presentations of projects.
Week 14, December 10
Presentations of projects.
Review.
Class evaluations.
December 17
4:30pm - 7:15pm Final Exam.