Information Pertaining to Final Exam
Extra Office Hours
I'll hold the
following extra office hours prior to the exam:
- Sunday, July 21, 6:00-8:00 PM in room 25 of the Central Module,
- Monday, July 22, 7:00-9:00 PM in room 25 of the Central Module.
I could also make myself available in the late afternoon and/or early
evening on Saturday, July 20, if you want to stop by my office in
Sci-Tech 2 with questions (but you should set up an appointment for a
specific time on Saturday via e-mail prior to 2 PM on Friday, July 19).
I won't have office hours on Thursday, July 18.
Hints About the 3 Big Problems
The exam is worth 25 points. There will be three 5-point problems, and
I'll count your best two of three scores on these, so 40% of the
weight of the final (10% of the weight of your overall course average)
will come from these three problems.
One problem will deal with the first three sections of the article by
Efron and Gong, with a focus on Tables 1 and 2.
One problem will pertain to the Satterthwaite approximation.
Rather than have you do anything really messy for the exam, I'll construct
a somewhat artificial situation, but in order for you to supply the
answers I want, you should understand the derivation of the df
formulas for the various Satterthwaite approximations that we've dealt
with this summer (and in particular, see pages E21, E22, E30, and E31
of the class notes).
One problem will pertain to classification. It won't focus on the
details of how CART works. To prepare for it, study pages 4, 5, 6, and
10-16 of the CART handout (from the overhead projector presentation), as
well as pages H5 and H15 of the class notes. (Note that page 10 of the
CART handout is particularly important. Also note that on p. 13, the
prior probabilities for the two classes
are assumed to be equal, as was the case on p. 12. On pages 14 and 15, an
assumption of equal prior probabilities is also being used, although it's not
stated.)
Hints About the 20 Small Problems/Questions
There will be twenty 1-point problems/questions, and
you'll need to choose 15 of them to answer.
I'm not going to count your best fifteen of twenty scores on these,
since most will be of the multiple choice and true/false variety and I
don't want guessing to play too large a role in determining grades.
So you'll have to X out 5 of them, and I'll just grade the
fifteen you choose to submit answers for. That means 60% of the
weight of the final (15% of the weight of your overall course average)
will come from these twenty problems/questions.
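The weights quoted for the two parts of the exam can be checked with a few lines of arithmetic. This is only a sketch: the 25% weight of the final in the course average is not stated outright and is an assumption inferred from the "40% of the final is 10% of the course" figure, and the names `EXAM_POINTS`, `FINAL_WEIGHT`, and `shares` are made up for illustration.

```python
from fractions import Fraction

EXAM_POINTS = 25                 # total points on the final
big_points = 2 * 5               # best two of the three 5-point problems
small_points = 15 * 1            # the fifteen 1-point questions you choose

# Assumption (inferred, not stated): the final is 25% of the course average.
FINAL_WEIGHT = Fraction(1, 4)


def shares(points):
    """Return (share of the final, share of the overall course average)."""
    within_final = Fraction(points, EXAM_POINTS)
    return within_final, within_final * FINAL_WEIGHT


big_shares = shares(big_points)
small_shares = shares(small_points)

print("big problems:   ", [float(s) for s in big_shares])
print("small questions:", [float(s) for s in small_shares])
```

The two parts together account for all 25 points, since 10 + 15 = 25.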
Rather than try to cover the entire course somewhat uniformly, I'll
instead concentrate heavily on a few topics, and then also have some of
the problems/questions dealing with the rest of the material.
(I think students learn more when they study for an exam if they
focus on a subset of the material, as opposed to trying to review all
of the material covered by a course. The downside of this scheme is
that I won't cover a lot of important topics, because I'm concentrating
heavily on a few, but I think my practice of narrowing the scope of
coverage and giving hints actually gets students to study more, and get
more out of their studying.)
Four questions will deal with jackknifing, with an emphasis on
pseudo-values (and procedures involving pseudo-values). (I suggest that
you study pages C1 through C9 of the class notes. For information
about pseudo-values, see pages C7, E33, E34, F3, and G4.)
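As a study aid, here is a minimal sketch of jackknife pseudo-values using the usual textbook definition: the i-th pseudo-value is n times the full-sample estimate minus (n - 1) times the estimate computed with observation i left out. The class notes may use different notation, and the function names and the sample data below are made up for illustration.

```python
import math
from statistics import mean, pvariance, stdev, variance


def pseudo_values(data, estimator):
    """Jackknife pseudo-values: n*theta_hat - (n-1)*theta_hat_without_i."""
    n = len(data)
    full = estimator(data)
    return [
        n * full - (n - 1) * estimator(data[:i] + data[i + 1:])
        for i in range(n)
    ]


def jackknife(data, estimator):
    """Return (jackknife estimate, jackknife standard error).

    The estimate is the mean of the pseudo-values; the standard error is
    their sample standard deviation divided by sqrt(n).
    """
    pv = pseudo_values(data, estimator)
    return mean(pv), stdev(pv) / math.sqrt(len(pv))


# Hypothetical sample, for illustration only.
sample = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]

# For the sample mean, the pseudo-values are just the observations.
print(pseudo_values(sample, mean))

# Jackknifing the divisor-n variance recovers the divisor-(n-1) variance.
print(jackknife(sample, pvariance)[0], variance(sample))
```

Two standard facts make good sanity checks: the pseudo-values of the sample mean are the original observations, and jackknifing the divisor-n variance estimator gives back the unbiased sample variance.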
Five questions will deal with histogram density estimators. (I
won't have anything on the somewhat messy cross-validation method
for selecting an optimal bin width.) In particular, some of the material on pages H5
through H9 of the class
notes will be covered on the exam.
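For review, a histogram density estimator can be sketched in a few lines. This uses the standard definition (the estimate at x is the count in the bin containing x, divided by n times the bin width); the class notes may set things up differently, and the bin origin, bin width, function name, and data below are all made up for illustration.

```python
import math


def histogram_density(data, origin, width):
    """Build a histogram density estimate with bins
    [origin + k*width, origin + (k+1)*width).

    Returns (f_hat, counts), where f_hat(x) = (count in x's bin) / (n * width).
    """
    n = len(data)
    counts = {}
    for x in data:
        k = math.floor((x - origin) / width)
        counts[k] = counts.get(k, 0) + 1

    def f_hat(x):
        k = math.floor((x - origin) / width)
        return counts.get(k, 0) / (n * width)

    return f_hat, counts


# Hypothetical data, for illustration only.
data = [0.1, 0.4, 0.6, 1.2, 1.3, 1.9, 2.5, 2.7]
f_hat, counts = histogram_density(data, origin=0.0, width=1.0)

print(f_hat(0.5), f_hat(2.6), f_hat(5.0))

# The estimate is a density: the bar areas add up to 1.
total_area = sum(c / (len(data) * 1.0) * 1.0 for c in counts.values())
print(total_area)
```

Changing `origin` or `width` changes the estimate, which is why bin width selection matters for these estimators.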
Four questions will be based on the article I wrote titled "A
Comparison of Regression Methods," and the presentation I gave about
the article. The questions won't focus on the details of how the study
was performed, or how the various regression methods work. Rather, the
focus will be on the results of the study, and conclusions that can be drawn
from them.
One question will deal with prediction error. Studying pages J1 through
J4 of the class notes is recommended, but you don't have to worry about
any questions based on the rest of Section J.
Six problems/questions will deal with things such as
- expected value of an estimator,
- bias of an estimator,
- variance of an estimator,
- standard error of an estimator,
- mean squared error of an estimator,
- sampling distributions and approximate sampling distributions of
statistics, and
- the concept of one distribution being stochastically larger than
another distribution.
Quiz #8,
Quiz #9,
Quiz #13, and
Quiz #14 would be good to review (but some of the final exam problems
aren't as complicated as some of the problems from the quizzes).
(Note: You can also glance over
Quiz #5,
Quiz #7,
and
Quiz #11, but since the exam questions connected to these quizzes
are only loosely related to them, I don't think you should spend too much time
studying these three quizzes.)
Note: Some questions/problems can be viewed as falling into more
than one of the areas indicated above. For example, one could say that
there are six questions about histogram density estimators instead of five,
because I counted the sixth such question as belonging more to the last
area indicated above.
As a final comment about what to prepare for, I'll state that some of
the questions/problems, from at least two of the areas indicated above,
pertain to the one factor random effects model, first introduced in the
class notes on page D2. While I'm not going to have any problems based
on Section D of the notes, you can glance over page D2, and then make
note of some of the distribution theory facts given on page E30, and
note that various estimators for the variances are given on page E32.
(The exam won't have anything on it about the more complicated
estimators on page E32, but may make use of a couple of the simpler
estimators, like the unbiased estimators and the Hodges-Lehmann
estimator.) You'll notice an absence of emphasis on bootstrapping,
cross-validation, and the material from the first part of Section E.
It's not that these topics aren't important, but rather that I just
wanted to have you focus on a relatively small subset of the material
that was covered this summer.
Reminder
The official exam period is 7:30-10:15 PM on Tuesday, July 23.