Comments about Ch. 5 of Applied Logistic Regression, 2nd Ed.
- (p. 146) Despite the fact that the square of the denominator in
(5.1) looks different from the usual denominator used for Pearson's
chi-square goodness-of-fit statistic, (5.2) is indeed equal to the
usual Pearson's statistic for the corresponding 2 by J table. The key
is that the sum of the two terms for a column in the table yields just
one term in the (5.2) representation. (I can show you this if you
want, but it's pretty easy if you want to try it.) Seeing where the
chi-square distribution comes from is easy if you write it as in (5.2):
you have the sum of the squares of J asymptotically normal
random variables. While J - (p+1) df results from
following the usual "rules", to me it isn't easy to see why those
are the proper df.
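The column-by-column identity mentioned above can be checked numerically. The sketch below uses made-up counts and fitted probabilities (the names m, y, and pi_hat are my own assumptions, not anything from H&L) and compares the sum of squared Pearson residuals with the usual sum of (observed - expected)^2 / expected over all 2J cells.

```python
import math

# Hypothetical data for J = 4 covariate patterns (not from the book).
m = [10, 25, 8, 40]                 # cases per covariate pattern
y = [3, 14, 2, 31]                  # observed successes per pattern
pi_hat = [0.25, 0.60, 0.30, 0.75]   # fitted success probabilities

def pearson_from_residuals(m, y, pi_hat):
    """Sum of squared Pearson residuals, one term per covariate pattern."""
    return sum((yj - mj * pj) ** 2 / (mj * pj * (1 - pj))
               for mj, yj, pj in zip(m, y, pi_hat))

def pearson_from_table(m, y, pi_hat):
    """Usual (O - E)^2 / E summed over both rows of the 2 by J table."""
    total = 0.0
    for mj, yj, pj in zip(m, y, pi_hat):
        total += (yj - mj * pj) ** 2 / (mj * pj)                     # success cell
        total += ((mj - yj) - mj * (1 - pj)) ** 2 / (mj * (1 - pj))  # failure cell
    return total

x2_resid = pearson_from_residuals(m, y, pi_hat)
x2_table = pearson_from_table(m, y, pi_hat)
print(x2_resid, x2_table)
```

The two agree because both cells in a column have the same squared deviation, and 1/(m pi) + 1/(m (1 - pi)) = 1/(m pi (1 - pi)).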
- (p. 146) I think the deviance residual isn't quite as
straightforward as the Pearson residual. It seems like its square can be
described in terms of a difference in log-likelihoods: twice the
contribution of the covariate pattern to the log-likelihood of the
saturated model minus the log-likelihood of the fitted model under
consideration.
Recall that at our first meeting there was some
confusion about the term saturated model --- confusion that I
blame on the authors of the book. P. 13 indicates that the likelihood
of the saturated model is 1, which can only be if each factor in the
likelihood product is 1, which can only be if each success has an
associated success probability of 1 and each failure has an associated
success probability of 0. Since H&L were referring to the setting in Table 1.1,
where there are outcomes of both 0 and 1 for cases having the same
covariate pattern, one can only conclude that the saturated model is
some sort of an ideal model incorporating variables not in the data set,
and that these phantom variables can be used to fully account for
outcomes of both 0 and 1 for cases having the same covariate pattern of
the observed covariates (and not the phantom ones). However, in other
places in the book, H&L use "saturated model" as Dr. Bolstein did at the
first meeting. If there are J < n distinct covariate
patterns, and outcomes of both 0 and 1 occur for some cases having the
same covariate pattern, then one creates a fit for a saturated model by
fitting each distinct
covariate pattern using a sample proportion, and one does not have a
likelihood of 1. This matches material on p. 166, where the
log-likelihood of the saturated model is 0 (so likelihood is 1) only in
the case of J = n (although it seems to me that the
log-likelihood could be 0 even if J < n).
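To make the last point concrete, here is a minimal sketch of the log-likelihood obtained by fitting each covariate pattern with its sample proportion. The function name and the toy data are my own assumptions, and I use the product of individual Bernoulli factors (no binomial coefficients), which is what makes a likelihood of 1 possible at all.

```python
import math

def saturated_loglik(patterns):
    """Log-likelihood when each covariate pattern is fit by its sample proportion.

    patterns: list of (m_j, y_j) pairs, i.e. cases and successes per pattern.
    """
    total = 0.0
    for m_j, y_j in patterns:
        p = y_j / m_j
        # Patterns with p = 0 or p = 1 contribute log(1) = 0.
        if 0 < p < 1:
            total += y_j * math.log(p) + (m_j - y_j) * math.log(1 - p)
    return total

# Hypothetical example: n = 6 cases and J = 3 patterns in both data sets.
mixed = [(2, 1), (2, 2), (2, 0)]   # first pattern has both a 0 and a 1
pure = [(2, 2), (2, 2), (2, 0)]    # every pattern is all 0s or all 1s

ll_mixed = saturated_loglik(mixed)
ll_pure = saturated_loglik(pure)
print(ll_mixed, ll_pure)
```

With J < n the log-likelihood is negative as soon as one pattern contains both outcomes, but it is still 0 when every pattern happens to be all 0s or all 1s.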
- (p. 146, first sentence after (5.4)) (5.4) seems like (1.8) on p.
13 rather than (1.10) on p. 14.
- (p. 148) I find it odd that H&L put a hat on their test statistic
--- usually the hat notation is used to denote an estimate.
- (p. 150, first paragraph and Table 5.1) It's not clear where the
expected value of 12.7 comes from. Maybe it's as stated in the first
paragraph, but in that case it's not clear where the probability 0.234
in the 5th row of the table comes from. That is, I don't see the
correspondence between the probabilities given in the table and the
expected counts given in the table. I would have guessed the expected
count for the 5th row would be 58*0.234, but that is about 13.6, not 12.7.
- (p. 152) The description of the Tsiatis test at the top of the page
doesn't seem very clear --- but I'm going to choose not to worry about
the details for now.
- (p. 153) To understand why the procedure described is
sensible, I suppose one has to look elsewhere; the book seems to give
a description without sufficient motivation.
- (p. 157) PMC can be made close to 0.5 by putting
theta1 equal to 0.5 and beta1 =
mu close to 0.
- (p. 162) Is it proper to call the area under the ROC curve the ROC,
as is indicated by the general rule given on this page?
- (p. 162) I wonder how good the guidelines suggested by the
general rule are. I know in some classification settings one wouldn't
be happy with less than 90% correct predictions ... but how does this
relate to ROC? Recall that Jill brought this up before --- with
something like CART, can we get an ROC curve, and if so how? (I think
the simple way would be to change the priors --- I'm pretty sure this
would work fine.) Why is the area under the ROC curve of great
interest? In the end, don't we choose a specific classifier (e.g.,
logistic regression with a specific value for the probability cutoff),
and once that is done, aren't we interested in its misclassification
rate, or perhaps both its sensitivity and specificity?
The area under the curve seems to relate to the performance of a
collection of classifiers.
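One way to see the "collection of classifiers" point is to trace the curve by hand. The sketch below is my own illustration with made-up fitted probabilities and outcomes. It sweeps the cutoff to get one (1 - specificity, sensitivity) point per classifier, integrates with the trapezoid rule, and compares the result with the fraction of (outcome 1, outcome 0) pairs in which the outcome-1 case has the higher fitted probability, which is a standard interpretation of the area. It also reports the misclassification rate for one specific cutoff.

```python
def roc_points(scores, labels):
    """(1 - specificity, sensitivity) for each cutoff, from strictest to loosest."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]  # cutoff above every score: nothing classified as 1
    for c in sorted(set(scores), reverse=True):
        # Classify a case as 1 when its score is at least the cutoff c.
        tp = sum(1 for s, l in zip(scores, labels) if s >= c and l == 1)
        fp = sum(1 for s, l in zip(scores, labels) if s >= c and l == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc_trapezoid(points):
    """Area under the piecewise-linear curve through the ROC points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

def auc_pairs(scores, labels):
    """Fraction of (1, 0) pairs ranked correctly; ties count one half."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def misclassification_rate(scores, labels, cutoff):
    """Error rate of the single classifier defined by one cutoff."""
    wrong = sum(1 for s, l in zip(scores, labels) if (s >= cutoff) != (l == 1))
    return wrong / len(labels)

# Hypothetical fitted probabilities and observed outcomes.
scores = [0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 1, 0, 1, 0, 1, 1]

pts = roc_points(scores, labels)
auc_t = auc_trapezoid(pts)
auc_p = auc_pairs(scores, labels)
err_at_half = misclassification_rate(scores, labels, 0.5)
print(auc_t, auc_p, err_at_half)
```

So the area summarizes how well the fitted probabilities rank the cases over all cutoffs, while the misclassification rate (or sensitivity and specificity) describes the one classifier actually chosen.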
- (p. 164) The first paragraph of subsection 5.2.5 indicates that the
R^2 measures aren't really measures of goodness-of-fit,
even though they do relate to the degree of improvement over the
constant model relative to the improvement of the perfect model over the
constant model.
The same is true in ordinary regression: one can have a high value of
R^2 with a model that is clearly wrong (as indicated by
a residual analysis), and one can have a low value when the model is
correct (if the error term variance is relatively large). (An
interesting thing about logistic regression is that the variation is
related to the probabilities which are being fit, whereas in ordinary
regression, the variation need not be at all related to what is being
fit.)
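The ordinary-regression claim is easy to illustrate. In this sketch (my own toy data, nothing from the book) a straight line is fit to exactly quadratic data, giving a high R^2 for a wrong model, and then to data that really are linear but very noisy, giving a low R^2 for a correct model. The seed and the noise standard deviation of 10 are arbitrary assumptions.

```python
import random

def r_squared(x, y):
    """R^2 from a least-squares straight-line fit of y on x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    syy = sum((yi - ybar) ** 2 for yi in y)
    return sxy ** 2 / (sxx * syy)

# Wrong model, high R^2: the truth is quadratic but a line is fit.
x1 = list(range(11))
y1 = [xi ** 2 for xi in x1]
r2_wrong_model = r_squared(x1, y1)

# Correct model, low R^2: the truth is linear but the error variance is large.
random.seed(0)
x2 = [10 * i / 199 for i in range(200)]
y2 = [xi + random.gauss(0, 10) for xi in x2]
r2_right_model = r_squared(x2, y2)

print(r2_wrong_model, r2_right_model)
```

A residual plot would expose the curvature in the first fit immediately, even though its R^2 is above 0.9.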
- (p. 165) Recall that Ed Prokop indicated that he was having trouble
matching some of the values given on this page.
- (p. 173) Given that the approximation of the residual is
(1 - h_j) y_j, why isn't the factor
(1 - h_j) squared in the expression for its variance
(which is the first displayed expression on p. 173)? (It can be noted
that (5.14) and (5.15) are consistent with not squaring this
factor, and so I hesitate to think that the square was accidentally
omitted from the expression for the variance.) Also, since the
probability is being estimated, it should be referred to as an estimated
variance.
Whoa! I initially failed to note that h_j is not a
constant. From (5.13) on p. 169 it can be seen that it's a function of
all of the responses (through pi hat), and so a random variable.
- (p. 181) The first part of the paragraph at the bottom of the page,
pertaining to pattern 31, is evidence that the diagnostic statistics can
be somewhat misleading. The approximations aren't really good --- note
that H&L claim that the approximations are only modestly correlated
with the actual observed changes.
- (p. 182) The issue of whether or not to delete the 5 cases is
interesting. There are 628 cases in all, so if one fit a model to all
but 5 of them, it could be thought to be good about 99% of the time.
But then to use the model, should one check to see if a subject under
consideration has a covariate pattern similar to one of the 5 that were
ignored? Since the subject matter experts did not think the 5 covariate
patterns were particularly unusual, the decision to not delete them was
made. Would you have made the same decision?
- (p. 187) It puzzles me greatly why the test described on the top
portion of this page should be a two-tailed test! One could reject
because the validation sample was in too much agreement with the model,
which doesn't seem at all sensible to me.
I think H&L have a(nother) mistake! In subsection 1.2 on p. 1146 of the
pertinent JASA article, it states "Typically, large values of
the test statistic ... indicate a lack of fit" --- so I think one should
use an upper-tailed rejection region.
- (p. 192) The RACE/SITE results are interesting. Taking the loose
relative risk interpretation of odds ratio for the sake of simplicity,
nonwhites are twice as likely as whites to remain drug free at Site A, but
only half as likely at Site B. This serves to demonstrate the need for
the interaction term for these two variables, but at the same time it
may seem puzzling --- what is it about the different sites that is
responsible for this?
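To spell out how an interaction term produces site-specific odds ratios, here is a small sketch. The coefficients are hypothetical, picked only to reproduce the round figures above (odds ratio 2 at Site A, 0.5 at Site B); they are not H&L's estimates, and the 0/1 coding of SITE with Site A as 0 is my assumption.

```python
import math

# Hypothetical coefficients (not H&L's estimates).
b_race = math.log(2.0)          # RACE main effect (nonwhite vs. white)
b_race_site = -math.log(4.0)    # RACE x SITE interaction

def race_odds_ratio(site):
    """Odds ratio for RACE at a given site (0 = Site A, 1 = Site B)."""
    return math.exp(b_race + b_race_site * site)

or_site_a = race_odds_ratio(0)
or_site_b = race_odds_ratio(1)
print(or_site_a, or_site_b)
```

Without the interaction coefficient the model would force the same RACE odds ratio at both sites, which is why the term is needed to capture a reversal like this one.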
- (p. 198) H&L again get screwed up with percent increases. (Were
they absent the day this was covered in the 6th grade?) If 11 vs. 10 is
a 10% increase, why isn't 2 vs. 1 a 100% increase (as opposed to their
claim that it's a 50% increase)?
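For the record, the arithmetic behind this complaint, using the usual definition of a percent increase (change divided by the old value). The function name is my own.

```python
def percent_increase(old, new):
    """Percent increase of new over old, relative to the old value."""
    return 100.0 * (new - old) / old

inc_11_vs_10 = percent_increase(10, 11)
inc_2_vs_1 = percent_increase(1, 2)
print(inc_11_vs_10, inc_2_vs_1)
```

Dividing by the new value instead of the old one would give 50% for 2 vs. 1, which is perhaps where their figure comes from, but then 11 vs. 10 would not be 10% either.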