Information Pertaining to Final Exam


Extra Office Hours

I'll hold the following extra office hours prior to the exam:

I could also make myself available in the late afternoon and/or early evening on Saturday, July 20, if you want to stop by my office in Sci-Tech 2 with questions (but you should set up an appointment for a specific time on Saturday via e-mail prior to 2 PM on Friday, July 19).

I won't have office hours on Thursday, July 18.

Hints About the 3 Big Problems

The exam is worth 25 points. There will be three 5-point problems, and I'll count your best two of three scores on these, so 40% of the weight of the final (10% of the weight of your overall course average) will come from these three problems.

One problem will deal with the first three sections of the article by Efron and Gong, with a focus on Tables 1 and 2.

One problem will pertain to the Satterthwaite approximation. Rather than have you do anything really messy for the exam, I'll construct a somewhat artificial situation, but in order for you to supply the answers I want, you should understand the derivation of the df formulas for the various Satterthwaite approximations that we've dealt with this summer (in particular, see pages E21, E22, E30, and E31 of the class notes).
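
As a rough illustration of the kind of df formula involved, here is a minimal sketch of the standard textbook Satterthwaite approximation for a linear combination of independent mean squares. It is not copied from the class notes, and the function name and arguments are hypothetical. The idea is to match the first two moments of the combination to those of a scaled chi-square random variable.

```python
def satterthwaite_df(coeffs, mean_squares, dfs):
    """Approximate df for L = sum(a_i * MS_i), with independent MS_i.

    Standard textbook form (an assumption here, not taken from the notes):
        df = (sum a_i*MS_i)^2 / sum((a_i*MS_i)^2 / df_i)
    """
    terms = [a * ms for a, ms in zip(coeffs, mean_squares)]
    numerator = sum(terms) ** 2
    denominator = sum(t * t / df for t, df in zip(terms, dfs))
    return numerator / denominator


# Two-sample (Welch) case: a_i = 1/n_i, MS_i = s_i^2, df_i = n_i - 1.
n1, n2 = 10, 10
s1_sq, s2_sq = 4.0, 4.0
df = satterthwaite_df([1 / n1, 1 / n2], [s1_sq, s2_sq], [n1 - 1, n2 - 1])
print(df)  # approximately 18, i.e. n1 + n2 - 2, since sizes and variances match
```

With equal sample sizes and equal sample variances the approximation gives back the pooled df, and with a single mean square it returns that mean square's own df, which are two quick sanity checks on the formula.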

One problem will pertain to classification. It won't focus on the details of how CART works. To prepare for it, study pages 4, 5, 6, and 10-16 of the CART handout (from the overhead projector presentation), as well as pages H5 and H15 of the class notes. (Note that page 10 of the CART handout is particularly important. Also note that on p. 13, the prior probabilities for the two classes are assumed to be equal, as was the case on p. 12. On pages 14 and 15, an assumption of equal prior probabilities is also being used, although it's not stated.)

Hints About the 20 Small Problems/Questions

There will be twenty 1-point problems/questions, and you'll need to choose 15 of them to answer. I'm not going to count your best fifteen of twenty scores on these, since most will be of the multiple choice and true/false variety and I don't want guessing to play too large a role in determining grades. So you'll have to X out 5 of them, and I'll just grade the fifteen you choose to submit answers for, so 60% of the weight of the final (15% of the weight of your overall course average) will come from these twenty problems/questions.

Rather than try to cover the entire course somewhat uniformly, I'll instead concentrate heavily on a few topics, and then also have some of the problems/questions dealing with the rest of the material. (I think students learn more when they study for an exam if they focus on a subset of the material, as opposed to trying to review all of the material covered by a course. The downside of this scheme is that I won't cover a lot of important topics, because I'm concentrating heavily on a few, but I think my practice of narrowing the scope of coverage and giving hints actually gets students to study more, and get more out of their studying.)

Four questions will deal with jackknifing, with an emphasis on pseudo-values (and procedures involving pseudo-values). (I suggest that you study pages C1 through C9 of the class notes. For information about pseudo-values, see pages C7, E33, E34, F3, and G4.)
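
For review purposes, here is a minimal sketch of how pseudo-values are usually computed, assuming the standard definition (n times the full-sample estimate, minus n - 1 times the leave-one-out estimate). The function names are hypothetical and the code is not taken from the class notes.

```python
from math import sqrt
from statistics import mean, pvariance, stdev


def jackknife_pseudo_values(data, estimator):
    """Return the n pseudo-values  n*theta_hat - (n - 1)*theta_hat_(-i)."""
    n = len(data)
    theta_full = estimator(data)
    pseudo = []
    for i in range(n):
        leave_one_out = data[:i] + data[i + 1:]  # drop observation i
        theta_i = estimator(leave_one_out)
        pseudo.append(n * theta_full - (n - 1) * theta_i)
    return pseudo


def jackknife_summary(data, estimator):
    """Jackknife estimate (mean of pseudo-values) and its standard error."""
    pseudo = jackknife_pseudo_values(data, estimator)
    return mean(pseudo), stdev(pseudo) / sqrt(len(pseudo))


data = [2.0, 4.0, 4.0, 7.0, 9.0]
# For the sample mean, the pseudo-values are just the observations.
print(jackknife_pseudo_values(data, mean))
# Jackknifing the divide-by-n variance removes its bias.
print(jackknife_summary(data, pvariance))
```

Two facts worth checking by hand: the pseudo-values of the sample mean reproduce the data, and the jackknife estimate built from the divide-by-n variance equals the usual unbiased sample variance.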

Five questions will deal with histogram density estimators. (I won't have anything on the somewhat messy cross-validation method for selecting an optimal bin width.) In particular, some of the material on pages H5 through H9 of the class notes will be covered on the exam.
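
As a reminder of the basic definition, here is a minimal sketch of a histogram density estimator with equal-width bins, assuming the usual form: the estimate at x is the count in the bin containing x, divided by n times the bin width. The function name and the choice of origin are hypothetical, not taken from the class notes.

```python
from math import floor


def histogram_density(data, origin, bin_width):
    """Return f_hat, where f_hat(x) = (count in x's bin) / (n * bin_width).

    Bin k covers [origin + k*bin_width, origin + (k + 1)*bin_width).
    """
    n = len(data)
    counts = {}
    for x in data:
        k = floor((x - origin) / bin_width)
        counts[k] = counts.get(k, 0) + 1

    def f_hat(x):
        k = floor((x - origin) / bin_width)
        return counts.get(k, 0) / (n * bin_width)

    return f_hat


data = [0.1, 0.4, 0.6, 1.2, 1.9]
f_hat = histogram_density(data, origin=0.0, bin_width=1.0)
print(f_hat(0.5))  # 0.6: three of the five points fall in [0, 1)
print(f_hat(2.5))  # 0.0: no points fall in [2, 3)
```

Dividing by n times the bin width (rather than just n) is what makes the estimate integrate to 1, so it is a density and not a relative frequency.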

Four questions will be based on the article I wrote titled A Comparison of Regression Methods, and the presentation I gave about the article. The questions won't focus on the details of how the study was performed, or how the various regression methods work. Rather, the focus will be on the results of the study, and conclusions that can be drawn from them.

One question will deal with prediction error. Studying pages J1 through J4 of the class notes is recommended, but you don't have to worry about any questions based on the rest of Section J.

Six problems/questions will deal with things such as expected value of an estimator, bias of an estimator, variance of an estimator, standard error of an estimator, mean squared error of an estimator, sampling distributions and approximate sampling distributions of statistics, and the concept of one distribution being stochastically larger than another distribution. Quiz #8, Quiz #9, Quiz #13, and Quiz #14 would be good to review (but some of the final exam problems aren't as complicated as some of the problems from the quizzes). (Note: You can also glance over Quiz #5, Quiz #7, and Quiz #11, but since the exam questions that are related to these quizzes are rather loosely related, I don't think you should spend too much time studying these three quizzes.)

Note: Some questions/problems can be viewed as falling into more than one of the areas indicated above. For example, one could say that there are six questions about histogram density estimators instead of five, because I counted the sixth such question as belonging more to the last area indicated above.

As a final comment about what to prepare for, I'll state that some of the questions/problems, from at least two of the areas indicated above, pertain to the one factor random effects model, first introduced in the class notes on page D2. While I'm not going to have any problems based on Section D of the notes, you can glance over page D2, and then make note of some of the distribution theory facts given on page E30, and note that various estimators for the variances are given on page E32. (The exam won't have anything on it about the more complicated estimators on page E32, but may make use of a couple of the simpler estimators, like the unbiased estimators, and the Hodges-Lehmann estimator.) You'll notice an absence of emphasis on bootstrapping, cross-validation, and the material from the first part of Section E. It's not that these topics aren't important, but rather that I just wanted to have you focus on a relatively small subset of the material that was covered this summer.

Reminder

The official exam period is 7:30-10:15 PM on Tuesday, July 23.