Reading Guide
After a brief introduction, during the first class I'll begin discussing
bootstrapping. Two articles that I want you to read (but not
necessarily prior to the first lecture) are:
- Statistical data analysis in the computer age
- Efron, B., and Tibshirani, R.
- Science, Vol. 253 (1991), pp. 390-395
- A leisurely look at the bootstrap, the jackknife, and
cross-validation
- Efron, B. and Gong, G.
- The American Statistician, Vol. 37 (1983), pp. 36-48
I tried to put a link to an e-journal version of the first article on this web
page, but I couldn't get it to work. However, I think you can easily
read this article (and can print it) by going to the
JSTOR web site, and doing a
search. (From the home page of the
JSTOR site, click on SEARCH.
Then go down to item 2 of that page and click on Expand the journal
list, and check the box in front of the listing of the journal
Science under the General Science heading. Now go down
to item 3 of the same page and type 1991 into both of the Date
Range boxes. Next, go back up to item 1 and type Efron into the
author box (3rd box down) and computer age into the title box (4th box
down). Finally, click the Search button, and the desired article should
come up as the only one that matches the search constraints. Upon
clicking the View Article link, you should see the text of the
article.)
The second article doesn't appear to be available in e-journal format,
so I'll plan to give you a photocopy at the first class meeting.
Also, I'll give you some handouts of my lecture notes on bootstrapping.
Since bootstrapping and jackknifing will be two of the topics covered
this summer (with bootstrapping being a topic that will get more
emphasis than most topics), some of you may want to buy or borrow a book
that covers these subjects.
If so, I strongly recommend
- An Introduction to the
Bootstrap,
- by Efron and Tibshirani
- (Chapman & Hall, 1993),
-
(fairly easy to read, with an emphasis on understanding and applications,
rather than theory).
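To give you a first taste of the main topic, the basic bootstrap idea fits in a few lines of code. This is a minimal generic sketch (the data values and the number of resamples B = 2000 are arbitrary choices of mine, not from any of the books):

```python
import random

def bootstrap_se(data, stat, B=2000, seed=0):
    """Estimate the standard error of stat(data) by drawing B
    resamples of size n with replacement and taking the standard
    deviation of the replicated statistic values."""
    rng = random.Random(seed)
    n = len(data)
    reps = []
    for _ in range(B):
        resample = [data[rng.randrange(n)] for _ in range(n)]
        reps.append(stat(resample))
    mean_rep = sum(reps) / B
    return (sum((r - mean_rep) ** 2 for r in reps) / (B - 1)) ** 0.5

# Example: bootstrap standard error of the sample mean
data = [4.1, 5.6, 3.8, 7.2, 5.0, 6.3, 4.9, 5.5]
se = bootstrap_se(data, lambda x: sum(x) / len(x))
```

For the sample mean the answer can be checked against the textbook formula s/sqrt(n); the payoff of the bootstrap is that the same resampling loop works for statistics with no simple standard-error formula.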
For future reference,
here are some other books
on bootstrapping (I don't recommend that you read them for this
course):
- Data Analysis by Resampling: Concepts and Applications,
- C. Lunneborg,
- Duxbury (2000),
- (chock full of nonstandard notation/terminology, but includes
information about software);
- Bootstrap Methods: A Practitioner's Guide,
- M. Chernick,
- Wiley (1999),
- (provides a good summary of the relevant literature and
contains a lot of references);
- Bootstrap Methods and their Applications,
- Davison and Hinkley,
- Cambridge (1997),
- (gives more mathematical details than the other books; more
advanced and harder to read).
- Computational Statistics Handbook with MATLAB,
- Martinez and Martinez (both GMU graduates),
- Chapman & Hall/CRC (2002),
- (contains information on bootstrapping, jackknifing, and other
computer-intensive statistical methods, along with appropriate MATLAB
code; I haven't had time to look at this book much, but the
MATLAB information may be of interest to some).
A more advanced article on jackknifing, which I'll list here primarily
for your further reference (and which is available through
JSTOR), is:
- The jackknife --- a review
- Rupert G. Miller, Jr.
- Biometrika, Vol. 61 (1974), pp. 1-15
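The leave-one-out idea that Miller reviews is also easy to sketch. Below is a generic jackknife standard-error estimate (my own minimal illustration, not anything taken from the article):

```python
def jackknife_se(data, stat):
    """Leave-one-out jackknife estimate of the standard error of stat:
    recompute the statistic n times, each time omitting one observation."""
    n = len(data)
    loo = [stat(data[:i] + data[i + 1:]) for i in range(n)]
    mean_loo = sum(loo) / n
    return ((n - 1) / n * sum((v - mean_loo) ** 2 for v in loo)) ** 0.5

data = [4.1, 5.6, 3.8, 7.2, 5.0, 6.3, 4.9, 5.5]
se = jackknife_se(data, lambda x: sum(x) / len(x))
```

For the sample mean the jackknife reproduces the usual s/sqrt(n) exactly; for other statistics it gives a cheap (n recomputations rather than B) alternative to the bootstrap.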
During the first 2 or 3 weeks of the summer session, I may also
briefly discuss several topics from the first
chapter of the Miller book. So, when you can find the time, you should
read this chapter,
and also read my related
comments about
the Miller book. Among topics that I may address
(see the
June 10 announcement for
further information) are:
- the Shapiro-Francia test (p. 15 of Miller),
- transformations and tests about the mean (pp. 16-18 of Miller),
- trimmed means (pp. 29-31 of Miller),
- dealing with correlated observations (pp. 34-36 of Miller).
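To preview the trimmed-means topic: a trimmed mean simply averages the data after discarding a fixed fraction from each tail. Here is a bare-bones sketch (the 10% trimming fraction and the data are only an example, not from Miller):

```python
def trimmed_mean(data, alpha=0.10):
    """Mean of the remaining observations after removing the
    floor(alpha * n) smallest and floor(alpha * n) largest values."""
    xs = sorted(data)
    g = int(alpha * len(xs))
    kept = xs[g:len(xs) - g] if g > 0 else xs
    return sum(kept) / len(kept)

data = [3, 4, 5, 5, 6, 6, 7, 8, 9, 40]   # one gross outlier
tm = trimmed_mean(data, alpha=0.10)       # discards the 3 and the 40
```

With the outlier discarded the trimmed mean is 6.25, while the ordinary mean is pulled up to 9.3, which is the basic motivation for trimming.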
During the 4th week of the course, and part of the 5th week, I plan to cover some material related
to Chapter 3 of Miller's book.
- I'll start with Sec. 3.3, but present some methods for dealing with
heteroscedasticity different from the transformation approach that
Miller describes.
- Next, I will cover monotone alternatives (subsection 3.1.3 of
Miller).
- Then I'll turn attention to the random effects model, and consider
point estimates and confidence intervals for the variance components,
including a confidence interval based on jackknifing. (Pertinent pages
from Miller's book are pages 95-98, the bottom portion of p. 100 and
the top portion
of p. 101, and pages 108 and 109.)
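As background for the variance-components discussion, the standard ANOVA (method-of-moments) point estimates for a balanced one-way random effects model can be sketched as follows. This is a generic illustration of the estimators, not material taken from Miller's book:

```python
def variance_components(groups):
    """ANOVA estimates of sigma^2_a (between-group) and sigma^2_e
    (within-group) for a balanced one-way random effects model.
    groups: list of k equal-length lists, one per random level."""
    k = len(groups)          # number of groups
    n = len(groups[0])       # observations per group
    grand = sum(sum(g) for g in groups) / (k * n)
    means = [sum(g) / n for g in groups]
    msa = n * sum((m - grand) ** 2 for m in means) / (k - 1)   # between-group MS
    mse = sum((y - m) ** 2
              for g, m in zip(groups, means) for y in g) / (k * (n - 1))  # within MS
    sigma2_e = mse
    sigma2_a = max((msa - mse) / n, 0.0)   # truncate a negative estimate at 0
    return sigma2_a, sigma2_e

groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
s2a, s2e = variance_components(groups)
```

Note the truncation at zero: (MSA - MSE)/n can come out negative, which is one of the awkward features of these estimators that motivates looking at alternative intervals, including the jackknife-based one.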
For the class on Thursday, July 11, I'll start by giving a brief
description of multiple regression modelling, and then I'll explain
tree-structured regression, using the last several overheads of my CART
presentation. Then I'll cover the 6 pages of the class notes about
trimmed means. I'll next start a presentation about robust regression.
It would be good to read over my paper, A Comparison of Regression
Methods, and also look at the small section of notes I gave you on
M-estimators. (I don't think I'll have time to go through everything I
have planned on this subject on 7/11, but I'll continue on 7/16. I
think you'll find my presentation easier to follow if you read over the
photocopied material first.)
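To get oriented before reading the M-estimator notes, here is a minimal sketch of Huber M-estimation of a location parameter, computed by iteratively reweighted averaging. This is a generic illustration, not the method from my notes; the tuning constant c = 1.345 and the crude MAD scale estimate are common but arbitrary choices:

```python
def huber_location(data, c=1.345, tol=1e-8, max_iter=100):
    """Huber M-estimate of location via iteratively reweighted means:
    observations within c * scale of the current estimate get full
    weight, more distant ones are downweighted proportionally."""
    mu = sorted(data)[len(data) // 2]                    # start at the median
    mad = sorted(abs(x - mu) for x in data)[len(data) // 2]  # crude scale
    s = mad if mad > 0 else 1.0
    for _ in range(max_iter):
        w = [1.0 if abs(x - mu) <= c * s else c * s / abs(x - mu)
             for x in data]
        new_mu = sum(wi * xi for wi, xi in zip(w, data)) / sum(w)
        if abs(new_mu - mu) < tol:
            break
        mu = new_mu
    return mu

data = [4.9, 5.1, 5.0, 5.2, 4.8, 20.0]   # one gross outlier
mu = huber_location(data)
```

The Huber estimate stays near 5 despite the outlier, while the ordinary mean is dragged to 7.5; the same downweighting idea, applied to residuals, is what underlies the robust regression methods we'll discuss.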