Some Comments about Chapter 10 of Hollander & Wolfe
While you might be surprised to see methods for the analysis of nominal
data in a nonparametric statistics book, thinking that such methods are more
appropriately included in books on categorical data analysis, H&W isn't
the only nonparametric statistics book to include such methods. While I
don't believe that such methods really belong in a course on
nonparametric statistics, it can be pointed out that methods for the
analysis of ordinal data are perhaps closer in spirit to nonparametric
statistics methods than they are to methods for the analysis of
categorical data. (If we had another week or two in the semester, I
would work in (more) coverage on the analysis of ordinal data.) For
example, Kendall's test for association can be applied to ordinal data,
since the concepts of concordant and discordant pairs of bivariate
observations extend easily to ordinal data. However, typically there
are a lot of ties, and so the tables designed for use with continuous
data cannot be used. (StatXact tends to be a good software package to use
with ordinal data, because it can handle ties in an exact manner.) It
can also be noted that to obtain the null hypothesis sampling
distribution of some test statistics used with nominal data,
randomization concepts like those used in nonparametric statistics are
employed.
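As a quick illustration of the tie-handling point, SciPy's kendalltau computes the tau-b statistic, which adjusts for ties, so it can be applied directly to ordinal data coded as integers. (The data below are made up for illustration; note that with heavy ties the p-value comes from a normal approximation, so for an exact treatment StatXact remains the better tool.)

```python
from scipy.stats import kendalltau

# Hypothetical ordinal data (e.g., ratings on a 1-3 scale) with many ties
x = [1, 1, 2, 2, 2, 3, 3, 1, 2, 3]
y = [1, 2, 2, 3, 2, 3, 3, 1, 1, 3]

# kendalltau computes tau-b, which corrects for ties; with this many ties
# the p-value is based on a normal approximation, not exact tables
tau, p_value = kendalltau(x, y)
print(tau, p_value)
```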
Section 10.1
A lot of the material of this section is adequately covered in STAT 554,
and so I won't spend too much time on it. A good plan will be for
you to ask questions about whatever parts of it you want me to comment
on. This will be a good place for me to insert some material on
Pearson's chi-square goodness-of-fit test, and other goodness-of-fit
procedures covered in StatXact. (Note: Even though
Pearson's chi-square goodness-of-fit test is covered in STAT 554,
StatXact does an exact version of it that is accurate for small
sample sizes (but doesn't have a chi-square distribution).)
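Since the exact version of the goodness-of-fit test isn't spelled out in the text, here is a rough sketch of the idea: for a small multinomial sample, enumerate every possible outcome vector with the same total, and sum the null probabilities of the outcomes whose Pearson statistic is at least as large as the observed one. (The counts and null probabilities below are made up; this brute-force enumeration is only feasible for small n and few cells.)

```python
import itertools

from scipy.stats import chi2, multinomial

obs = [5, 1, 0]             # hypothetical small-sample cell counts
p0 = [1/3, 1/3, 1/3]        # null cell probabilities
n, k = sum(obs), len(obs)
exp = [n * pi for pi in p0]

def pearson_stat(counts):
    return sum((c - e) ** 2 / e for c, e in zip(counts, exp))

stat_obs = pearson_stat(obs)

# Exact p-value: sum the null probabilities of every outcome whose
# statistic is at least as extreme as the observed statistic
exact_p = sum(
    multinomial.pmf(counts, n, p0)
    for counts in itertools.product(range(n + 1), repeat=k)
    if sum(counts) == n and pearson_stat(counts) >= stat_obs - 1e-9
)

# Usual large-sample approximation: chi-square with k - 1 df
approx_p = chi2.sf(stat_obs, k - 1)
print(stat_obs, exact_p, approx_p)
```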
I'll offer some specific comments
about the text below.
- p. 459
- Note that while Barnard's test seems like a natural test to
consider, it's not at all commonly used. (We can use StatXact to
do Barnard's test --- but otherwise it would be much harder to do.)
With regard to the expression
P_p(D >= D_obs),
it should have been
stressed that the probability under the case of the null hypothesis
being true is being considered. (Also, I would have just used
d instead of D_obs.)
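SciPy (version 1.7 and later) does provide Barnard's exact test for 2 by 2 tables, so StatXact is no longer the only convenient way to do it. A minimal sketch, with made-up counts:

```python
from scipy.stats import barnard_exact

# Hypothetical 2x2 table: rows = treatments, columns = (success, failure)
table = [[7, 12],
         [1, 8]]

# barnard_exact maximizes the p-value over the nuisance parameter
# (the common success probability under the null)
res = barnard_exact(table, alternative="two-sided")
print(res.statistic, res.pvalue)
```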
- p. 464
- Minitab's warning about the expected cell counts being less
than 5 is good I guess, but I wonder if people who would need such a
warning will know the significance of it, or know what sort of
adjustment they should make. (A warning could also be given because one
of the observed cell counts is less than 2.)
- p. 467, Comment 2
- (error in book) The estimator
given by (10.12) (and not (10.5)) should be the denominator of (10.31).
- p. 469, Comment 6
- I don't think that adequate explanation is given for the estimated
standard deviation, and for the z statistic that assumes the
value of 1.71. In class, I'll explain McNemar's test by making use of
the first two sentences of the first new paragraph on p. 469. Really,
since the exact test is so simple (see bottom portion of p. 469), I
don't see the point in introducing the large-sample procedure
(particularly if it's not explained).
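The exact McNemar test referred to at the bottom of p. 469 is just a binomial test on the discordant pairs: under the null hypothesis, each discordant pair is equally likely to fall in either off-diagonal cell. A minimal sketch with made-up counts:

```python
from scipy.stats import binomtest

# Hypothetical discordant-pair counts from a paired 2x2 table
b = 3    # pairs that changed in one direction
c = 10   # pairs that changed in the other direction

# Under H0, each discordant pair is a fair coin flip, so the exact
# two-sided p-value is a binomial test of b successes in b + c trials
result = binomtest(b, n=b + c, p=0.5, alternative="two-sided")
print(result.pvalue)
```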
- p. 470, Comment 6
- (error in book) H&W should have
"approximate" and "exact" instead of "approximate
exact".
- p. 471, Properties
- Property 2 is the asymptotic equivalence of Pearson's
chi-square test and the generalized likelihood ratio test. Since we
will have little need for approximate versions of these tests, I'll
emphasize that exact versions of these two tests can be viewed as
competitors to Fisher's exact test.
Section 10.2
This is a short section about a test that I cover in STAT 554, and so I
won't spend a lot of time on it (although I will spend a fair amount of
time on some related material --- other exact tests that we can easily
do using StatXact). It can be noted that while exact tests
for 2 by 2 contingency tables are fairly commonly done, they seem to be
less often used for tables other than 2 by 2 tables. But when one has
expected cell counts less than 5, and observed cell counts less than 2,
approximate p-values can be off by quite a bit, and so obtaining exact
p-values using StatXact (or obtaining them some other way) can be
a good thing to do at times.
I'll offer some specific comments
about the text below.
- p. 473
- (error in book)
Two ns are missing from the numerator on the right side of
(10.37).
- p. 475, Comment 11
- Many users of the commonly employed approximate version of
Pearson's chi-square test don't realize how inaccurate the test can be,
even when the sample sizes aren't really small. For example, on p. 495 of
the StatXact manual, the sample sizes considered are 77 and 350,
and the smallest null expected cell count is about 32.1. Yet the exact
p-value from Fisher's exact test (and the exact version of Pearson's
test) is about 0.0551, and the approximate p-value from the commonly
used approximate version of Pearson's test is about 0.0437, and so not
even one significant digit is correct, and the approximate p-value
suggests that one should reject with a size 0.05 test, while the exact
p-value indicates that one should not reject.
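This kind of discrepancy is easy to explore yourself by comparing SciPy's Fisher exact p-value with the usual (uncorrected) approximate Pearson chi-square p-value. (The 2 by 2 table below is made up; it is not the table from the StatXact manual.)

```python
from scipy.stats import chi2_contingency, fisher_exact

# Hypothetical 2x2 table: rows are groups, columns are (success, failure)
table = [[10, 67],
         [15, 335]]

# Exact p-value from Fisher's exact test
_, p_exact = fisher_exact(table, alternative="two-sided")

# Approximate p-value from Pearson's chi-square test (no continuity correction)
stat, p_approx, dof, expected = chi2_contingency(table, correction=False)
print(p_exact, p_approx)
```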
Section 10.3
The StatXact manual states (on p. 444) that "One often prefers to
estimate parameters like the difference of proportions or the ratio of
proportions because they are easier to interpret than the odds ratio."
I agree --- and I often find the odds ratio to be annoying to deal with,
and hard to explain to others. To me, the ratio of success
probabilities, which is the relative risk introduced at the very
bottom of p. 483 of H&W (in the Problems section), is easier to explain
to people, and usually more meaningful than the odds ratio. (The
StatXact manual also has the following on p. 446: "Often it is
clear that the real parameter of interest is not the odds ratio, but the
ratio or difference of proportions.") It is nice that the latest
version of StatXact has included some relatively new inference
procedures for the ratio and difference of the probabilities of success.
Although in class I'll spend more time on the odds ratio (because it is
commonly used, and also because it's harder to understand and so
requires more time), I'll mention some of the newish StatXact
procedures, and I'll encourage you to keep them in mind as an
alternative to the odds ratio route.
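The distinction is easy to see numerically. With made-up counts, the relative risk is the ratio of the two estimated success probabilities, while the odds ratio is the cross-product ratio, and the two can differ appreciably unless the success probabilities are small:

```python
# Hypothetical 2x2 table: (successes, failures) in each of two groups
a, b = 30, 70    # group 1
c, d = 15, 85    # group 2

p1 = a / (a + b)            # estimated success probability, group 1
p2 = c / (c + d)            # estimated success probability, group 2

relative_risk = p1 / p2           # ratio of success probabilities
odds_ratio = (a * d) / (b * c)    # cross-product (odds) ratio
print(relative_risk, odds_ratio)
```

Here the relative risk is 2, but the odds ratio is about 2.43, so the two estimands answer genuinely different questions.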
I'll offer a specific comment
about the text below.
- p. 479, Confidence Intervals
- Note that "symmetric" (2 lines above (10.56)) doesn't mean that
the endpoints are equidistant from the point estimate. Rather, it means
that the interval is just as likely to fail to cover by having the upper
bound below the true value as it is to fail to cover by having the lower
bound above the true value. But really, it's only asymptotically
symmetric in this sense.