Comments on Beyond ANOVA: Basics of Applied Statistics by
R. G. Miller, Jr.
Below are some comments about the various chapters of the book.
(You can use these links to jump down to comments about
Ch. 1,
Ch. 2,
Ch. 3,
Ch. 4,
Ch. 6,
and Ch. 7.)
I may add comments about Ch. 5 later, but my presentation of regression
in STAT 554 more closely matches the one in the book by Bhattacharyya
and Johnson.
The number of comments does not reflect the importance of the various parts
of the book --- rather I have just added comments that I think may be
helpful to you as you read through the book, and I'll let some parts of
the book stand on their own.
This book matches a lot of what I cover in my course fairly well. In my
course notes and my lectures I won't have time to go into all of the
more detailed information in Miller's book, but I may
briefly mention some issues that are in the book which it would be
great for you to look into on your own.
For some topics my course notes supply
a bit more explanation than does Miller's book.
It may be a good idea
to read the course notes first, then listen to the lecture, and then read the
appropriate parts of Miller's book (and if you choose this type of
attack, you can wait until after Week 4 to begin reading Miller's book).
My advice is to consult the
reading guide to see which parts of
the book correspond to which lectures, and then read Miller's book while
referring to the notes that I have below. Don't worry if you find
yourself not getting all of the material mastered as you read through
the book --- at this stage it should be okay to understand the material
at the level at which I present it during my lectures.
- (p. 2) The 2nd and 3rd sentences of the 1st complete paragraph on
this page are in agreement with my emphasis on reporting p-values over just
stating whether or not one rejects the null hypothesis with a test of a
certain size or level.
- (p. 2) I agree with Miller in that for a lot of applied settings,
the Bayesian approach doesn't seem appropriate. (I cover some material
on Bayesian statistics in STAT 652, so if you take that course you'll be
told what the strategy is, and I'll discuss when it may be appropriate.)
- (pp. 2-3) Because the likelihood function approach (like the
Bayesian approach) requires that a parametric model be specified, I
don't mention it in STAT 554 (but may briefly mention it in STAT 652
(652 uses likelihood functions a lot, but the particular
likelihood function technique that Miller is referring to isn't stressed
a lot)). Except for methods that are for approximately normal
distributions, most of the methods covered in STAT 554 don't assume a
particular parametric model, since in a lot of applied work one doesn't
have enough information to confidently choose a parametric model.
- (p. 3) I generally like to report p-values using 2 significant
digits, although if the p-value results from an approximation or relies
heavily on robustness that may be somewhat suspect because of a small
sample size, I may just use one significant digit to report a p-value.
If the p-value is small (say less than 0.005, 0.001, or 0.0005 in many
situations), then I may just state that fact rather than report using 1 or 2
significant digits (since approximations of small probabilities tend to
break down when one gets far enough out into the tail of the
distribution).
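A small helper along these lines may make the convention concrete (just a sketch; the cutoff and the two-significant-digit convention are the ones I describe above, not anything standard):

```python
def report_p_value(p, small_cutoff=0.001, sig_digits=2):
    """Format a p-value for reporting: round to a given number of
    significant digits, but just state "p < cutoff" for very small values."""
    if p < small_cutoff:
        return f"p < {small_cutoff}"
    # rounding via scientific notation keeps significant (not decimal) digits
    rounded = float(f"{p:.{sig_digits - 1}e}")
    return f"p = {rounded}"

print(report_p_value(0.03471))   # p = 0.035
print(report_p_value(0.0002))    # p < 0.001
```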
- (p. 3) The paragraph at the bottom of the page gets into the
relationship between the magnitude of p-values and sample size.
The power of a test typically depends on both the sample size and how
"far" what is true is from the null hypothesis, and power relates to the
likelihood of obtaining small p-values. It's important to
note that with a large sample size the p-value can be small even though
the truth is only slightly different from the null hypothesis (e.g.,
maybe there is only a very minor treatment effect), and with a small
sample size, the p-value may be somewhat large even though the
alternative hypothesis is true because of low power.
One should keep in mind that the p-value is a measure of how
compatible/incompatible the observed data is with the null hypothesis,
and isn't a measure of how far the truth is from the null hypothesis
(although it's related to this "distance", one also needs to bring the sample
size into consideration).
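To make the sample-size point concrete, here is a quick illustration (a two-sided z-test with an assumed known standard deviation of 1, chosen just for simplicity): the same small observed effect of 0.1 yields very different p-values as n grows.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_z_pvalue(effect, sigma, n):
    """Two-sided p-value for a z-test when the observed sample mean
    differs from the null value by `effect` (known sd `sigma`)."""
    z = effect / (sigma / sqrt(n))
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Same observed effect, increasing n: the p-value shrinks steadily.
for n in (25, 100, 400, 1600):
    print(n, round(two_sided_z_pvalue(0.1, 1.0, n), 4))
```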
- (p. 4) In expressions (1.2) and (1.3), I think mu_0 should
be replaced by mu_X, or just mu.
(I think mu_0 should just be used in hypothesis
testing. In (1.2) and (1.3) Miller means the true mean of the distribution, and
we don't typically use
mu_0 for that.)
- (p. 4) When one sees the +/- notation, it may not be clear
what is meant. Some follow the +/- with the half-width of a
confidence interval, others with the estimated standard error, and
others with some other value. I think it's best to report a confidence
interval since sometimes +/- one standard error isn't a very
meaningful thing. (Similarly, when one sees "error bars" in a graph, it
may not be clear what is intended unless it is stated that it's a
confidence interval, or +/- one standard error.)
- (p. 4) Some would disagree with the last sentence on the page.
I think it's okay in some cases to do a one-sided test even though it isn't clear that
the effect has to be in a certain direction if there is an effect.
To me, the important thing is to specify the direction of the
alternative before looking at the data.
However, in some situations, the fair thing may be to do a two-sided test.
For example, with employee sex discrimination cases, I think a two-sided
test is appropriate. Since a company can be criticized for promoting
men in favor of women or women in favor of men, to allow
one-sided tests would put the company in double jeopardy (since the
chance of a p-value less than or equal to 0.05 when there is no
discrimination becomes 0.1 if we allow "attacks" on the company with
one-sided tests in both directions). Of course, I think the use of
simple statistical tests in discrimination cases is quite suspect in any
case, since I think one cannot adequately model employees as
equally-likely selections when it comes to selecting people to get
promoted.
- (pp. 6-7) Miller has "For most distributions, the central limit
theorem will have had time to weave its magic on y-bar by n =
10" --- you should note that he's referring to the distribution of
the sample mean, and not the t statistic (when one has to
estimate the standard deviation, the convergence to normality may not be
quite as quick).
- (very bottom of p. 7 and very top of p. 8)
Note that the indication that the effect on the t test is small
pertains to two-tailed tests, which aren't as sensitive (due to
cancellation of errors effect). Certainly a skewness of magnitude 0.7
can have an appreciable effect on a one-tailed t test (unless the
sample size is sufficiently large).
- (p. 8, top of page) The actual p-value refers to the one
which would be obtained if the true (but unknown) null sampling distribution
of the test statistic could be used. The stated p-value refers
to the one that results from using the T_{n-1}
distribution. (Recall that if the X_i don't have a
normal distribution, then the t statistic won't have a
T_{n-1} null sampling distribution. So in practice, if
we use the
T_{n-1} distribution to obtain a p-value, we're relying
on robustness, hoping that the sampling distribution will be close to a
T_{n-1} distribution even if the
X_i aren't perfectly normal.)
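A quick simulation makes the actual-versus-stated distinction concrete. In this rough sketch (the 5% critical value for 19 df is hard-coded), the data are exponential with mean 1, so the null hypothesis about the mean is true, yet the lower-tailed t test rejects noticeably more often than 5%.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(1)
n, reps = 20, 20000
t_crit = -1.729          # lower 5% point of the t distribution with 19 df
rejections = 0
for _ in range(reps):
    x = [random.expovariate(1.0) for _ in range(n)]  # true mean is 1
    t = (mean(x) - 1.0) / (stdev(x) / sqrt(n))
    if t < t_crit:       # lower-tailed test of H0: mu = 1
        rejections += 1
print("estimated type I error rate:", rejections / reps)
```

The estimated rate comes out well above the stated 0.05, illustrating how positive skewness hurts a one-tailed t test.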
- (p. 8) The first full sentence on this page refers to symmetric
distributions (even though Miller doesn't indicate that this is the
case). Skewness can change the story, as the next sentence suggests.
- (p. 8)
Although the Gayen article may have just considered positively skewed
distributions, it's important to keep in mind that negative skewness can
be just as bad. I think any guidelines pertaining to skewness should
use the absolute value of gamma_1. (I think the
guidelines pertaining to skewness and kurtosis suggested by line 7
and previous comments in the paragraph are
not so good anyway. I tend to want to make some sort of adjustment for
milder cases of nonnormality. For example, Johnson's modified t
test can offer an appreciable improvement to the t test when the
magnitude of the skewness is less than 1 and the sample size is small.
(I don't think Miller was aware of Johnson's modified t test,
which is often a decent way to deal with distribution skewness when
making inferences about the mean. Bootstrap methods are also sometimes
effective for dealing with skewed distributions, as is a test due to
Chen which is similar to Johnson's test. Unfortunately, Miller's book
just doesn't present good alternative methods to cope with skewness when
making inferences about a mean.)
Also, I looked over Gayen's paper, and I couldn't really see where the
kurtosis greater than 4 guideline is coming from. One can note
that the Laplace (double exponential) distribution, which has a kurtosis
of 3, is a distribution for which the sign test is much more efficient
than the t test (although the Laplace distribution is not a
typical case in that it's quite peaked in the middle (which favors the
sign test) compared to many other heavy-tailed distributions).)
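For reference, here is a sketch of the sort of skewness adjustment I have in mind. The exact form of the correction below is my recollection of Johnson's modified t statistic; check Johnson's 1978 paper before relying on it.

```python
from math import sqrt
from statistics import mean, stdev

def johnson_t(x, mu0):
    """Skewness-adjusted one-sample t statistic (a Johnson-type
    correction; the exact form here is from memory and should be
    checked against Johnson's 1978 paper)."""
    n = len(x)
    xbar, s = mean(x), stdev(x)
    mu3_hat = sum((xi - xbar) ** 3 for xi in x) / n   # third central moment
    d = xbar - mu0
    adj = d + mu3_hat / (6 * s**2 * n) + mu3_hat * d**2 / (3 * s**4)
    return adj / (s / sqrt(n))

# For right-skewed data the adjusted statistic exceeds the ordinary t.
x = [0.1, 0.2, 0.3, 0.5, 0.9, 1.7, 3.3]
t_ord = (mean(x) - 1.0) / (stdev(x) / sqrt(len(x)))
print(t_ord, johnson_t(x, 1.0))
```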
- (p. 9) The test that Miller refers to in the last complete sentence
on the page is the type of test dealt with in STAT 652, where the focus
is on deriving procedures tailored to particular parametric models.
- (p. 10) Miller's description of outliers differs a bit from what I
give in the STAT 554 class notes. (I like my more general description
better.)
- (p. 11) Normal probability paper was still in use in the early
1980s, but now it seems completely silly since it's easier to use
software to produce probit plots.
- (p. 14) It should also be pointed out that formal goodness-of-fit
tests for nonnormality give the benefit of the doubt to the null
hypothesis of normality, and this doesn't seem like a good idea to me,
especially since such tests can have rather low power to detect
nonnormality, particularly when the sample size is small.
- (p. 15) I've done studies that suggest that the estimates I give in
the class notes are generally superior to those given in expression
(1.16). (Others also promote the ones given in the class notes.
Miller's are simpler, and would be fine with large enough samples, but
tend not to perform as well when the sample size is smallish.)
- (p. 15) Tests about the skewness and kurtosis give too much benefit
of the doubt to the null hypothesis (which coincides with normality).
- (p. 16) V is the covariance matrix of the Z(i), not the
y(i).
- (p. 18, about mid page) I disagree that "the correspondence is
sufficient for practical purposes." Examples can be given in which the
transformation ploy will routinely give absurd results (and the method
tends to get worse as the sample size increases).
- (p. 18) The transformation method is fine if you're making
inferences about the median instead of the mean, or if you're content to
make an inference about E(Y) = E(g(X)) instead of E(X).
- (p. 20, p. 23, and other places in the book)
Miller doesn't emphasize that the use of a continuity correction often
improves the approximation of an integer-valued random variable's
distribution by a normal distribution. I've found that the use of such
a continuity correction is typically a good idea.
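To see the improvement, compare the normal approximation to a binomial probability with and without the correction (a quick check, with the exact value computed from the binomial pmf):

```python
from math import comb
from statistics import NormalDist

n, p, k = 20, 0.5, 12
# Exact P(X <= k) for X ~ Bin(n, p)
exact = sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))
mu, sd = n * p, (n * p * (1 - p)) ** 0.5
plain = NormalDist(mu, sd).cdf(k)            # without continuity correction
corrected = NormalDist(mu, sd).cdf(k + 0.5)  # with continuity correction
print(exact, plain, corrected)
```

Here the corrected approximation lands much closer to the exact probability than the uncorrected one.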
- (p. 20, 3rd line from bottom) The upper confidence bound is
wrong --- it should be n-s instead of n-s-1.
- (p. 21) I don't like the 2nd paragraph on this page. I don't think
it's good to think of the sign test as a quick screening device (with
software, other tests can be done quickly too), and I don't think it's
necessarily good to get a client out of the office quickly (I think it's
better to take a bit more time to try to few different tests, in order
to get a good feel for the data analysis situation at hand).
- (pp. 21-22) If zeros are ignored when testing, it's possible to
reject the null hypothesis and at the same time arrive at a point
estimate for the median which is compatible with the null hypothesis (if
zeros are not ignored when obtaining the estimate (and there is no
reason to ignore zeros when obtaining the estimate)). Despite the
possibly screwy situation (which should seldom occur in most settings),
the common practice is to ignore zeros when testing.
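A minimal sign test along the lines described (zeros dropped for the test, as is the common practice; two-sided p-value from the binomial distribution):

```python
from math import comb

def sign_test_pvalue(x, m0):
    """Two-sided sign test of H0: median = m0. Observations equal to m0
    are dropped, following the common practice discussed above."""
    diffs = [xi - m0 for xi in x if xi != m0]
    n = len(diffs)
    s = sum(d > 0 for d in diffs)          # number of positive signs
    k = min(s, n - s)
    # doubled binomial tail probability under Bin(n, 1/2)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * tail)

print(sign_test_pvalue([1, 2, 2, 3, 5, 7, 8, 9, 9], 2))   # 0.125
```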
- (p. 22) Miller doesn't indicate that the signed-rank test has
usefulness as a test of the null hypothesis of no treatment effect
against the general alternative. (If one can assume independence, the
test is valid, unlike when using the signed-rank test to do a test about
the mean or median for which one also has to assume symmetry. Note that
Miller does indicate that a small p-value cannot necessarily be taken to
indicate that the mean or median differs from a specified value --- but
by assuming symmetry one can use the signed-rank test to do tests about
the mean or median.)
- (p. 23) Miller has "the reader can verify with a little thought or
mathematical induction" --- and while this may be true, I wonder if it's
the most productive way to spend your time.
- (p. 24) One need not bother with trying to understand the graphical
method described on this page since Minitab can be used to obtain both
the confidence interval and the Hodges-Lehmann estimate.
- (p. 24) On the 9th line from the bottom, n needs to be
replaced with n(n+1)/2.
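Since software makes the graphical method unnecessary, here is a direct computation of the Hodges-Lehmann estimate as the median of the n(n+1)/2 Walsh averages (a simple O(n^2) sketch):

```python
from statistics import median

def hodges_lehmann(x):
    """One-sample Hodges-Lehmann estimate: the median of all n(n+1)/2
    pairwise (Walsh) averages (x_i + x_j)/2 with i <= j."""
    n = len(x)
    walsh = [(x[i] + x[j]) / 2 for i in range(n) for j in range(i, n)]
    assert len(walsh) == n * (n + 1) // 2
    return median(walsh)

print(hodges_lehmann([1, 2, 3, 10]))   # 2.75
```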
- (p. 26) Miller indicates that the normal scores test is
inconvenient since it requires specialized tables. I'll point out that
there is an approximate version of the test that works quite well and
isn't too difficult to perform.
- (p. 27) Miller indicates that the permutation test is "clumsy to
carry out" --- but it's easy to perform with StatXact (a software
package).
- (p. 30) For STAT 554, please use ss_{dW,g}/[h(h-1)],
where h = n - 2g, and g is the number trimmed from each
end of the ordered sample, to estimate the variance (squared standard
error) of a trimmed mean. I think this should generally result in increased
accuracy.
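A sketch of this estimate, with ss_{dW,g} taken to be the sum of squared deviations of the g-Winsorized sample about the Winsorized mean (my reading of the notation; verify the exact definition against the class notes):

```python
from statistics import mean

def trimmed_mean_and_se2(x, g):
    """Trimmed mean and the estimate ss_{dW,g}/[h(h-1)] of its squared
    standard error, where h = n - 2g.  ss_{dW,g} is computed here as the
    sum of squared deviations of the g-Winsorized sample about its own
    mean (an interpretation to be checked against the class notes)."""
    xs = sorted(x)
    n = len(xs)
    h = n - 2 * g
    trimmed = mean(xs[g:n - g])
    # Winsorize: pull the g smallest/largest values in to the boundary values
    winsorized = [xs[g]] * g + xs[g:n - g] + [xs[n - g - 1]] * g
    wbar = mean(winsorized)
    ss = sum((w - wbar) ** 2 for w in winsorized)
    return trimmed, ss / (h * (h - 1))

print(trimmed_mean_and_se2(list(range(10)), 1))
```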
- (p. 31) I think the Huber M-estimator is more popular than
the Tukey bisquare (biweight) version, although I like the Tukey
version better in general (but in a lot of cases, it will matter
little which version is used).
- (p. 32) Miller has "Some hardy souls recommend continued use of
symmetric robust estimators ... but I cannot recommend this." I guess
if the sample size is smallish, I'm one of those hardy souls --- and
I've done studies to justify my position. (Even if the estimator is
biased for the mean or median, a reduction in variance may offset the
contribution to the MSE (mean squared error) due to the bias.)
- (pp. 32-37) In STAT 554, I just don't have a lot of time to address
things covered by Miller's Sec. 1.3. I suggest that you read through
the end of p. 33, and then skim through expression (1.52) on p. 36 in
order to get some appreciation for the topics covered. A course in time
series methods should address some of the issues.
- (p. 34) In expression (1.47), it should be j not equal to 0,
1, and -1.
- (pp. 36-37) The sentence that begins on p. 36 and ends on p. 37
isn't clear to me.
- (p. 37) To me, Exercise 1 would be more appropriate for STAT 652
than STAT 554.
- (p. 38) Exercise 2 is a bit too theoretical in nature too to be a
good STAT 554 problem.
- (p. 40) In the 1st paragraph Miller has "the problem remains a
one-sample problem of comparing the mean difference with zero." I think
Miller places too much emphasis on the mean. In some situations the
median may be of more interest, or perhaps the proportion of cases in
which the treatment would be effective will be the main interest. Or
perhaps one could start by just testing for the presence of a treatment
effect (of whatever sort), and then focus on characterizing the nature
of the treatment effect if one is found.
- (p. 42) The 1st sentence pertains to the equal variance case. (If
the variances differ, then the central limit theorem does not
necessarily make Student's two-sample t test asymptotically valid.)
- (p. 43) The effect of skewness on Student's t statistics
isn't as easy to summarize in the
two-sample case as it is in the one-sample case.
I think additional studies are called for --- and an industrious M.S.
student could make a nice individualized study project of such a study.
- (p. 46) The last sentence of the 2nd paragraph doesn't make sense:
in order to have the expected number in each cell (under the null
hypothesis)
to be at least 5, only 10 observations are required for each sample ---
and indicating that one needs 10 in each sample is not being
conservative (since for fewer than 10, the chi-square approximation may
not work well enough).
- (p. 49) Miller has "assign i to the ith largest
observation." It should be:
assign i to the ith smallest
observation. (The 4th smallest observation gets rank 4. The 4th
largest gets rank n_1 + n_2 - 3. (The largest gets rank
n_1 + n_2, the 2nd largest gets rank
n_1 + n_2 - 1, and so on.))
- (p. 50) Note the nifty argument given for expression (2.17) at the
top of the page.
- (p. 55) The description of the test given in the class notes is
just a bit different (and leads to a slightly more accurate test).
- (p. 56) You may find the details easier to follow the way they are
presented in the class notes.
- (p. 57) In the 2nd paragraph, Miller has "The effect on the P value is
not large." Then the example given has a reported value of 0.05 vs. a
true value of 0.03. Some may find such a difference bothersome. In
practice, a difference of 0.04 vs. 0.06 may be quite bothersome.
- (p. 57) In the last line, note that the actual type I error rate
can be 0.22 when the nominal level is 0.05 --- and so the effect of
unequal variances on Student's t can be very great.
- (p. 58) I think Miller should have described how the transformation
ploy can sometimes yield very silly results. I think it is bad practice
to make inferences with the transformed values and then report results
about the original scale.
- (p. 59) Right below expression (2.35) Miller suggests that the
delta method approximations can be justified asymptotically if y
is asymptotically normal. I don't follow: the distribution is what
it is and isn't going to change as the sample size increases. (If
we were concerned with a sample mean instead of a single random
variable, then the sample mean would be asymptotically normal.)
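For reference, the usual delta-method approximation that Miller is invoking, stated for a sample mean (which is where the asymptotic justification actually applies):

```latex
% Delta method for a smooth function g of a sample mean:
% if \sqrt{n}\,(\bar{Y} - \mu) \to N(0, \sigma^2), then
\sqrt{n}\,\bigl(g(\bar{Y}) - g(\mu)\bigr) \;\xrightarrow{d}\;
  N\bigl(0,\; [g'(\mu)]^2 \sigma^2\bigr),
\qquad
\operatorname{Var}\bigl(g(\bar{Y})\bigr) \approx [g'(\mu)]^2\,\frac{\sigma^2}{n}.
```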
- (p. 61) You may find the description in the class notes easier to
follow.
- (pp. 62-63) In the last sentence of Section 2.3 reference
is made to Potthoff's 1963 paper. I believe that Miller's summary is
perhaps a bit inaccurate. The paper seems to claim that a
modification of the Wilcoxon test is robust for the generalized
Behrens-Fisher problem, but that the ordinary Wilcoxon procedure is
anticonservative (see the last sentence of Section 3 of the Potthoff
paper). Miller seems to imply that while the Wilcoxon test is affected
by unequal variances, it more or less does okay. I believe that it
should be mentioned that a modification of the Wilcoxon test should be
preferred (and that other methods may be even better).
- (p. 64) The 1st exercise is a STAT 652 type of problem, and the
next three exercises are more theoretically oriented than I assign in
STAT 554 (but Exercise 4 is a very interesting one).
- (p. 67) Calling the e_ij the "random (unexplained)
variation" is "okay" --- typically it is called the "error term" and
I'll often call it that as a matter of habit or convenience, but since
measurement error may only account for a small portion of the variation,
perhaps "error term" is a bit misleading. Usually, the variation is due
to both (1) measurement imprecision and (2) natural variation of values in
the population (and this variation becomes random when we randomly
select population members to measure).
- (p. 68) This page is a particularly good one. The first full
paragraph describes the important concept of random effects,
the second full paragraph gives a nice example, and the third full
paragraph describes how we sometimes model things as a random effect
even if we don't make random selections --- all of this stuff should
give you a better understanding of random effects.
- (p. 69) Perhaps the constraint in the footnote seems more sensible
(since otherwise the treatment effect magnitudes are related to the
sample sizes, and one may prefer to think of them as fixed values that
shouldn't depend on sample size), but the constraint given in the top
half of the page is related to the noncentrality parameter of the
F distribution of the test statistic.
- (p. 70) You may be unsure about the notation in expression (3.4).
Consider the 2nd of the 3 distribution facts. We have that
SS(A)/sigma^2 has a chi-square distribution with
I-1 df and noncentrality parameter given by the ratio in the
parentheses. If the null hypothesis is true, the noncentrality parameter
equals 0 and we just have an ordinary (central) chi-square
distribution for
SS(A)/sigma^2. The way it's expressed in the book,
with
SS(A) not divided by sigma^2, gives us that
the distribution of SS(A) is that of a chi-square random variable
multiplied by the constant
sigma^2 (and so a scale-altered chi-square
distribution).
- (p. 71) Miller wrote that the F test "has several
deficiencies" by which he means that since it's not optimal (it's not a
UMP test), even if the error term distribution is normal, the "door is open" to alternative methods that may have
higher power in some situations.
- (p. 72) On the last line probability is misspelled.
- (p. 73) The last sentence of the paragraph that begins the page is
important.
- (p. 74) Neither the F test nor the studentized range test
dominates the other --- but the studentized range intervals dominate the
Scheffe intervals, which are related to the F test. You may be
wondering how this can be. Well, the key thing to note is that the
Scheffe intervals are not in complete correspondence with the F
test. Note the inequality in expression (3.10). It allows for cases in
which all of the intervals include 0, but the test still rejects,
and so in a sense the confidence intervals are too wide, and the
studentized range intervals can be shorter even though the F test
yields a smaller p-value.
- (pp. 75-76) In STAT 554 I don't have time to cover contrasts in
general, and so you can skip or skim the bottom part of p. 75 and the
top part of p. 76.
- (p. 76) In STAT 554 I don't have a lot of time to spend on monotone
alternatives, and so after getting through the top half of p. 77 (in
order to gain some understanding of what the main issues are with
monotone alternatives),
you can skip or skim the rest of subsection 3.1.3 (however, I will make
some comments in order to possibly assist the interested few who tackle
the rest of the subsection).
- (pp. 78-79) Keep in mind that the c_i sum to
0. (This follows from p. 75.)
- (p. 79) The values of the c_i given in expression
(3.21) for the cases in which I is odd may seem defective in a
sense. For example, in testing against the alternative
mu_1 <= mu_2 <= mu_3, the value of the sample mean of the 2nd sample does
not come into play --- it could be way out of line with the
alternative and not have an effect on the p-value ... in a sense meaning
there is no accounting for a strong piece of evidence suggesting that the
alternative hypothesis isn't true. This fact stresses the importance of
believing that either the null hypothesis is true or the alternative
hypothesis is true --- in which case the test should perform okay with
regard to the type I error rate. If it could be that neither the null
or the alternative hypothesis is true (and some other ordering exists
for the means), then the test can misbehave. In particular, there can
be a high probability of getting a rejection of the null hypothesis even
though the alternative hypothesis isn't true.
- (p. 80) The first sentence of subsection 3.2.1 is generally true
for the case of iid error terms. If the distributions are not
identical, and in particular if the variances aren't close to being
equal, then the null sampling distribution of the test statistic can be
appreciably different from the nominal F distribution.
- (p. 81) In the first full paragraph on p. 81 it's not clear to me
what J is. (Perhaps Miller got the notation from the journal
article confused with the notation he's using in his book. Maybe
J should be n.)
- (p. 82) I don't like it that Miller doesn't warn that one has to be
careful using the transformation ploy (since one can get very misleading
results by doing so).
- (p. 83) Towards the bottom of the page, Miller indicates that the
exact distribution (multivariate hypergeometric) is "too difficult to
work with" but StatXact can be used to produce exact p-values based on this
distribution.
- (p. 89) Towards the top of the page, Miller has one sentence about
robust estimation. Two books by Rand R. Wilcox (1997 & 2001) contain
updated information about robust methods for multiple comparisons.
- (pp. 89-90) Note that expression (3.38) weights the variances
roughly proportional to sample size, whereas, in a lot of cases,
expression (3.39) gives them
roughly equal weight. (Miller gives more info on this at the bottom of
p. 90 and the top of p. 91.)
- (p. 90) Note that the sum in expression (3.40) equals 0 if
all of the variances are equal, but is otherwise greater than 0.
- (p. 90) About 75-80% down the page, Miller points out that the
F test can be anticonservative (the resulting p-value can easily
be smaller than it deserves to be), which is bad.
Miller claims that the effect is not large, but Rand R. Wilcox indicates
that it can be large, even if the sample sizes are equal (and I think this is the case: it can be
large, but in a lot of situations it may be smallish).
If the sample sizes are appreciably different, then the test can badly
misbehave.
- (p. 91) The square of the denominator estimates the true variance
of the contrast if the variances are equal, but estimates expression
(3.42) if the variances differ. This mismatch results in misbehavior.
- (p. 92) The last sentence of subsection 3.3.1 suggests a good
project for a student to do.
- (p. 92) The 2nd to the last sentence of subsection 3.3.2 is typical
of many such statements in the book by Miller. He treats "substantial
extra computation" as perhaps a reason "not to go there" but I think we
shouldn't fret so much over the extra work --- create macros or functions
to do the extra work for you.
- (p. 92) About 50% down the page, correct is incorrectly
spelled.
- (p. 92 & p. 93) From expression (3.44) and from two lines above it, it
follows that one needs the derivative of g to be proportional to
1/h, which leads to the integral expression, (3.45).
- (p. 93) Three lines below expression (3.45), y-bar (and not
y) should be under the square root.
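To spell out the variance-stabilizing recipe behind (3.44)-(3.45) in symbols (writing h(mu) for the function that, per the comment above, the derivative of g must be inversely proportional to):

```latex
% Variance stabilization: choose g with
g'(\mu) \propto \frac{1}{h(\mu)},
\qquad
g(\mu) = \int^{\mu} \frac{c}{h(t)}\, dt .
% Example: if h(\mu) = \sqrt{\mu}, then g(\mu) = 2c\sqrt{\mu},
% i.e., the square-root transformation does the stabilizing.
```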
- (p. 93) The last sentence of the second to the last paragraph deals
with the performance of the nonparametric tests if there are unequal
variances. If one is doing a test for the general k sample
problem, unequal variances are okay. For tests about means, they can be
okay in some circumstances, but if one cannot assume that one
distribution is stochastically larger than the other one whenever two
distributions differ, then unequal variances can hurt the validity of
the nonparametric tests, and I suspect that in some cases the effect can
be quite large.
- (p. 93) The last paragraph refers to multiple comparison procedures
for unequal variance settings. The one I discuss in class is chosen for
its simplicity, and perhaps some of the other procedures may work
better.
- (p. 94) The first paragraph points out that the Abelson-Tukey test
can be easily adjusted to deal with unequal variances --- one may need
to consult the 1946 Satterthwaite article to see how to find the proper
df (but for the I = 3 case, the df formula for Welch's test can
be used, since the test statistic just uses two of the three samples).
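For convenience, here is the Welch-Satterthwaite df computation referred to (the standard two-sample formula; the sketch below just evaluates it):

```python
def welch_satterthwaite_df(v1, n1, v2, n2):
    """Approximate df for the two-sample unequal-variance problem:
    (v1/n1 + v2/n2)^2 / [ (v1/n1)^2/(n1-1) + (v2/n2)^2/(n2-1) ],
    where v1, v2 are the sample variances."""
    a, b = v1 / n1, v2 / n2
    return (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))

# Equal variances and equal n give df = 2(n - 1), here 18;
# unequal variances pull the df down.
print(welch_satterthwaite_df(1.0, 10, 1.0, 10))
print(welch_satterthwaite_df(1.0, 10, 9.0, 10))
```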
- (p. 94) In the 2nd paragraph of Sec. 3.4, the serial "correlation in
blocks" refers to two-way and higher-way designs.
- (pp. 97-98) In STAT 554, I don't have time to cover a lot of what's
on these two pages. It can be noted that the estimators on these two
pages are all for balanced designs, and thus cannot always be used.
- (pp. 101-104) I skip subsection 3.5.3 completely in STAT 554 (due
to lack of time).
- (p. 104) In expression (3.71) the denominator of the first term
should be N instead of NI. (Of course, one can express
N as either In or nI.)
- (p. 108 (last line) and p. 109 (1st and 2nd lines)) The three
subscripts of k should be i (or else the index of
summation should be k instead of i).
- (p. 110) Consider the last paragraph of Sec. 3.7. Miller indicates
that nonparametric procedures cannot be used to estimate the variances
(they don't yield variance estimates), but in the unequal variance case
(the focus of Sec. 3.7), there is no (single) error term variance ---
the variances are different, and one would just have to use the various
samples to supply a set of estimates. (In comment 27 above, I indicate
that nonparametric tests can sometimes be used to do a test about the
mean even if the variances appear to be different.) Miller finally puts something
negative
about transformations in the last sentence of the
paragraph (but he doesn't explain himself very well).
- (pp. 110-111) I skip Sec. 3.8 completely in STAT 554 (due
to lack of time).
- (p. 111) I work out Exercise 1 in my class notes.
- (pp. 115-116) The data for Exercise 11 is also used in Exercise 11
of Ch. 4. Based on the experimental design, Ch. 4 methods
are more appropriate than Ch. 3 methods.
- (p. 119) I'm not really sure what Miller means by the sentence right
before subsection 4.1.1. Maybe the intermediate models are ones in
which the interaction terms are random, and are thus
governed/constrained in a way that doesn't allow for completely
arbitrary cell means (and at the same time makes the model nonadditive).
- (p. 122) The first full paragraph on this page is important: if
there are interactions, the interpretation of the main effects is more
complicated.
- (pp. 122-123) The paragraph that starts on the bottom of p. 122 and
continues on p. 123 describes a method that I won't have time to cover
in STAT 554 (since it is based on multiple regression, and I barely have
time to cover the basics of simple regression).
- (p. 125)
It should be degrees (instead of degress).
- (p. 126) What is meant by
"In
other instances"
in the sentence that is right under the table
is if interactions are
present and nonignorable.
- (p. 127) You can skip over the description of the technique on p.
127. Advanced ploys such as this one might be covered in a course
that has ample time to go into details of experimental design and ANOVA.
- (p. 127)
To be consistent with the previous page, SS(E) should be changed
to SS(AB) in three places: once before (4.17), once in (4.17),
and once after (4.17) (although it may be better to change
AB to E on p. 126 and leave p. 127 alone).
- (p. 128) In the 1st full paragraph on this page, Miller returns to
the case where n > 1 (he had been addressing the n = 1
case). The approximate ANOVA scheme is something I sometimes give a HW
problem about in 554, but not always --- it's certainly something you
should be aware of.
- (p. 128) The 2nd and 3rd full paragraphs go beyond the scope of
STAT 554. The Miller book is chock full of nice things that at some
point in your life you may find useful, but we just don't have time to
get to them all in STAT 554.
(Note: In GMU's
M.S. level course on ANOVA, unbalanced designs are not covered due to
lack of time. Miller's book touches on some pretty advanced stuff!)
- (p. 128) The sentence that begins at the bottom of the page and
continues on the next page pertains to the approximate ANOVA scheme.
I'll point out that the difference between the approximate ANOVA scheme
and the multiple regression scheme referred to in item 3 above can be
very slight, and that statistical software packages often have a method
to handle unbalanced designs without making the user go to a whole lot
of trouble.
- (p. 129) This page is okay for STAT 554, but you can skip the rest
of subsection 4.1.2. (It gets messy on p. 130.)
- (p. 129)
The first line of the paragraph that begins on the bottom portion of the
page should have nij instead of ni.
- (pp. 131-134) You can skip subsection 4.1.3. It'll be nice if you
can fight through the monotone alternatives subsection in Ch. 3, and just
note that the methods can be extended to more complex designs.
- (p. 134) The word the is misspelled in the 4th line from the
top.
- (p. 135) The last sentence of the 1st paragraph of Sec. 4.2
indicates that nonnormality has little effect on the test procedures.
I think it's more accurate to indicate that it has little effect on the
validity (type I error rate), since heavy tails can have a large effect
on the power characteristics.
- (p. 137) I'm not so sure that transformations "do not seem to be as
frequently used with the two-way classification" --- many experienced
statisticians immediately turn to transformations if they see a funnel
shape when residuals are plotted against the cell means ... and
transformations are okay in such cases as long as you are willing to
deal with the transformed model (i.e., don't try to "backtransform" or
apply results obtained from the transformed model to the original
model). Miller suggests in several places in this chapter that one can
sometimes find a transformation that tends to make
everything nice: you get (approximate) normality, homoscedasticity
(constant variance), and maybe even additivity. At times this may
suggest that you have stumbled across a better description for the
phenomenon: you get a simple model when you model log Y or the
square root of Y, instead of a messier model if you simply try to
model Y.
- (p. 138) Page's test is covered in the last full paragraph. I
cover this test in STAT 657 (nonparametric statistics).
- (p. 138-139) The paragraph that starts on the bottom of the page and
continues onto p. 139 refers to methods that I haven't found time to
cover in STAT 657 --- once again indicating that Miller's book is just
chock full of references to numerous methods that you may not be exposed
to elsewhere in our M.S. in statistical science program.
The last paragraph on p. 139 addresses yet another nonparametric method
that I don't have time to work into STAT 657.
- (p. 140) Two books on robust statistics by Rand R. Wilcox can
supply updated information on robust estimation (and testing). For this
material, his 1997 book has more details. (Note: I don't agree with
everything that is in Wilcox's books, but they are chock full of good
references.)
- (p. 142) I'll try to supply some insight about the paragraph in the
middle of the page. If the observations in each column are positively
correlated, then the standard errors for the column means will be
underestimated, and so the standard errors for the differences of two
column effects will be too small, making the effects seem different when
in fact they could be equal and the column means differ only due to
random chance. If observations are negatively correlated, then the
standard errors will be overestimated, and true differences may not
be detected. (The inflated standard error estimates will render the
observed differences statistically nonsignificant.)
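To see the positive-correlation case concretely, here is a little simulation I put together (my own sketch, not from Miller's book): observations that share a common random effect are positively correlated, and a naive t test that assumes independence rejects a true null hypothesis far more often than its nominal 5% rate, because the usual standard error estimate is too small.

```python
import numpy as np
from scipy import stats

# Sketch: positively correlated observations (induced by a shared random
# effect) make the naive standard error too small and inflate the type I
# error rate of a test that assumes independence.
rng = np.random.default_rng(0)
n, reps = 10, 5000
rejections = 0
for _ in range(reps):
    shared = rng.normal(0, 1)            # common effect, mean 0 (H0 is true)
    y = shared + rng.normal(0, 1, n)     # n positively correlated observations
    _, p = stats.ttest_1samp(y, 0)       # naive test assumes independence
    rejections += p < 0.05
print(rejections / reps)  # far above the nominal 0.05
```

With these settings roughly half the simulated samples reject, even though the null hypothesis holds in every one of them.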
- (pp. 142-143) Although we don't have time to deal with repeated
measures in STAT 554, it'll be nice if you at least know a little
bit about them. Miller addresses them in the last paragraph of Sec.
4.4. Basically you have a repeated measures design if the same
subjects/plots of land/experimental units are observed at different
times. So you have different time periods, but the observations at each
time are made on the same subjects, as opposed to having different
subjects for each time period.
- (p. 143) The first full paragraph on the page is a good one.
- (p. 144) It's okay if you don't grasp everything in the last
paragraph --- the typical STAT 554 student doesn't have a sufficient
background to make mastering this material easy.
- (p. 157) In the 1st paragraph Miller is suggesting that if one
doesn't reject the null hypothesis of no factor B effect, then one might
combine SS(B) and SS(E) to get a more powerful test.
What this amounts to is ignoring the different levels of factor B and
doing a one-way ANOVA to test for differences due to factor A. But many
may be hesitant to do this. It should be recalled that just because one
doesn't reject the null hypothesis, it doesn't mean the null hypothesis
is true. So one could have mild differences due to factor B that are
not statistically significant due to low power. I'd say that when in
doubt, one should respect the original experimental design and do the
analysis on the nested model. When you ignore factor B, you're in
effect assuming that observations under the same level of factor A
are iid, and they may not be (due to
factor B effects (even though they may not be statistically
significant)).
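The equivalence between pooling and the one-way analysis is easy to verify numerically (a sketch with simulated data, not Miller's example): folding SS(B) and its degrees of freedom into the error term reproduces the one-way ANOVA F statistic for factor A exactly.

```python
import numpy as np
from scipy import stats

# Sketch (simulated data, one observation per cell): pooling SS(B) into
# SS(E) gives the same F statistic as a one-way ANOVA that ignores B.
rng = np.random.default_rng(3)
a, b = 3, 4
alpha = np.array([0.0, 0.5, 1.0])          # factor A effects
beta = rng.normal(0, 0.2, b)               # mild factor B effects
y = alpha[:, None] + beta[None, :] + rng.normal(0, 1.0, (a, b))

grand = y.mean()
ss_a = b * ((y.mean(axis=1) - grand) ** 2).sum()
ss_b = a * ((y.mean(axis=0) - grand) ** 2).sum()
ss_e = ((y - y.mean(axis=1, keepdims=True)
           - y.mean(axis=0, keepdims=True) + grand) ** 2).sum()

# Pooled F for factor A: fold SS(B) and its df into the error term ...
f_pool = (ss_a / (a - 1)) / ((ss_b + ss_e) / ((b - 1) + (a - 1) * (b - 1)))
# ... which equals the one-way ANOVA on the rows of y, ignoring factor B.
one_way = stats.f_oneway(*y)
print(f_pool, one_way.statistic)  # identical (up to floating point)
```

The identity holds because SS(B) + SS(E) is exactly the within-groups sum of squares once the B classification is ignored, with matching degrees of freedom.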
- (p. 158) Miller is skimpy on the details here --- the paragraph
containing expression (4.53) leads to a confidence interval for
mu based on a t statistic. For beginning M.S. students,
it seems a bit far-fetched that it's easy to arrive at this result based
on what is given.
- (pp. 159-161) Most of the Ch. 4 exercises have a theoretical bent to
them, and your background may be insufficient for you to easily solve
them. (Miller assumes a lot more probability / distribution theory than
most M.S. students have.)
- (p. 162) Exercise 10 deals with a Latin square design, but Miller's
book doesn't cover the basics of Latin square designs.
- (p. 241) Miller indicates that
E(y)/E(x) is the focus more often than
E(y/x) is, but I wonder if that's correct. I seem to think of
more cases that involve the mean ratio rather than the ratio of the
means. The mean ratio, E(y/x), is easier to handle: basically one
just puts wi
= yi / xi, and applies Ch. 1 methods to the
wi. Because of this, Ch. 6 of Miller's book deals
with E(y)/E(x), which is a tougher target for inferences.
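Here is a sketch of that reduction (simulated data, my own illustration, not from the book): for the mean ratio one just forms wi = yi/xi and applies the usual one-sample t interval from Ch. 1.

```python
import numpy as np
from scipy import stats

# Sketch: inference for the mean ratio E(y/x) by reducing to one-sample
# methods on w_i = y_i / x_i.  (Simulated data; the true ratio is 2.)
rng = np.random.default_rng(1)
x = rng.uniform(5, 10, 30)
y = 2.0 * x + rng.normal(0, 1, 30)

w = y / x
ci = stats.t.interval(0.95, len(w) - 1, loc=w.mean(), scale=stats.sem(w))
print(w.mean(), ci)  # ordinary Ch. 1 style t interval for the mean ratio
```

Nothing here is specific to ratios; once the wi are formed, every one-sample tool applies.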
- (pp. 244-245) In my summer offering of STAT 789 (a special topics
course), I'll go through the derivation of the confidence interval dealt
with on these two pages. It's a bit trickier than some of the ones I
cover in STAT 554 since one doesn't seem to directly trap the estimand
between a LCB and an UCB. (My STAT 652 course addresses a similar
confidence interval problem involving a quadratic equation.)
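For reference, here is a sketch of a Fieller-type interval of the kind Miller derives (my own implementation for paired samples; I haven't matched it line by line against pp. 244-245): the interval is the set of values rho for which (ybar - rho*xbar)^2 <= t^2 * Var(ybar - rho*xbar), which expands to a quadratic in rho, so the estimand is trapped indirectly rather than between an explicit LCB and UCB.

```python
import numpy as np
from scipy import stats

def fieller_ci(x, y, alpha=0.05):
    # Set of rho with (ybar - rho*xbar)^2 <= t^2 * Var(ybar - rho*xbar);
    # expanding gives the quadratic a*rho^2 - 2*b*rho + c <= 0.
    n = len(x)
    xb, yb = x.mean(), y.mean()
    vx = x.var(ddof=1) / n
    vy = y.var(ddof=1) / n
    vxy = np.cov(x, y)[0, 1] / n
    t2 = stats.t.ppf(1 - alpha / 2, n - 1) ** 2
    a = xb ** 2 - t2 * vx
    b = xb * yb - t2 * vxy
    c = yb ** 2 - t2 * vy
    disc = b ** 2 - a * c
    if a <= 0 or disc < 0:
        return None  # xbar not clearly away from 0: the set is unbounded
    return ((b - np.sqrt(disc)) / a, (b + np.sqrt(disc)) / a)

rng = np.random.default_rng(4)
x = rng.normal(10, 1, 40)
y = 2.0 * x + rng.normal(0, 1, 40)
lo, hi = fieller_ci(x, y)
print(lo, hi)  # interval for E(y)/E(x); the true value here is 2
```

Note the branch where the quadratic fails to give a bounded interval: that is exactly the awkward feature that makes this derivation trickier than the standard ones.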
- (p. 248) I don't think the equality in expression (6.18) is exact
--- I think it should be approximately equal, but not exactly. (Maybe
the same is true of expression (6.17).)
- (p. 251) The flow of this page is a bit confusing. The paragraph
in the middle of the page is like an aside, and the paragraph below it
returns to the topic dealt with on p. 250 and the top of p. 251.
- (p. 260) I would include a factor of (n - 1) in the
numerator of the pivotal statistic given a bit more than half way down
the page, since doing it this way would result in a pivotal statistic
having a chi-square distribution.
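With that factor included, the pivot (n - 1)s^2/sigma^2 has a chi-square distribution with n - 1 degrees of freedom, and it inverts directly into a confidence interval for the variance (a standard sketch in my notation, not Miller's):

```python
import numpy as np
from scipy import stats

# Pivot: (n-1)*s^2 / sigma^2 ~ chi-square with n-1 df.
# Inverting the pivot gives a 95% confidence interval for sigma^2.
x = np.array([4.1, 5.2, 3.8, 4.9, 5.5, 4.4, 4.0, 5.1])
n = len(x)
s2 = x.var(ddof=1)
lo = (n - 1) * s2 / stats.chi2.ppf(0.975, n - 1)
hi = (n - 1) * s2 / stats.chi2.ppf(0.025, n - 1)
print(s2, (lo, hi))
```

Note the interval is not symmetric about s2, since the chi-square distribution is skewed.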
- (p. 261) Miller indicates that the likelihood ratio test has
critical constants slightly different than those given in expression
(7.6) (although (7.6) is correct if the sample sizes are equal). I
address this in my course notes for STAT 652 (but unfortunately, there
usually isn't a lot of time that I can spend explaining the details in
class).
- (p. 262) Note that expression (7.9) is equal to the MSE from
a one-way ANOVA.
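That equality is easy to check numerically (my own sketch): the pooled variance estimate, a weighted average of the group sample variances with weights ni - 1, coincides with SSE/(N - k) from the one-way ANOVA.

```python
import numpy as np

# Sketch: the pooled variance (weighted average of the group sample
# variances, weights n_i - 1) equals the one-way ANOVA MSE = SSE/(N - k).
groups = [np.array([3.1, 2.8, 3.5, 3.0]),
          np.array([4.2, 4.8, 4.0, 4.5, 4.1]),
          np.array([2.9, 3.3, 3.1])]
pooled = sum((len(g) - 1) * g.var(ddof=1) for g in groups) \
         / sum(len(g) - 1 for g in groups)
sse = sum(((g - g.mean()) ** 2).sum() for g in groups)
mse = sse / (sum(len(g) for g in groups) - len(groups))
print(pooled, mse)  # identical
```

This works for unequal sample sizes too, as the example shows.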
- (p. 262) It's an important point that Hartley's test and Cochran's
test are only for the equal sample sizes setting.
- (p. 263) There is strong evidence that two variances differ if the
interval indicated in expression (7.15) does not contain 1.
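A sketch of the standard F interval for a ratio of two normal variances (my own code; I haven't verified that it matches the exact parametrization of Miller's (7.15)):

```python
import numpy as np
from scipy import stats

# (s1^2/s2^2) / (sigma1^2/sigma2^2) ~ F(n1-1, n2-1); invert for a 95% CI
# on sigma1^2/sigma2^2.
x1 = np.array([5.1, 4.8, 5.6, 5.0, 4.7, 5.3, 4.9, 5.2])
x2 = np.array([5.0, 6.1, 3.9, 5.8, 4.2, 6.4, 3.6, 5.5])
n1, n2 = len(x1), len(x2)
r = x1.var(ddof=1) / x2.var(ddof=1)
lo = r / stats.f.ppf(0.975, n1 - 1, n2 - 1)
hi = r / stats.f.ppf(0.025, n1 - 1, n2 - 1)
print((lo, hi))  # evidence the variances differ if 1 falls outside
```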
- (p. 263) The last paragraph on this page suggests a possible M.S.
thesis project.
- (p. 264) The first sentence on this page is very important.
- (p. 264) In the 1st paragraph, note that light tails result in a
conservative test, and heavy tails result in an anticonservative test
--- this is the opposite of what is true for normal theory tests about means.
The 3rd paragraph indicates that the effect of nonnormality can be quite
large --- the actual size of the test can differ from its nominal level
by a factor of 5 or more.
- (p. 269) While one of the variations for Levene's test referred to
in the 4th paragraph may better achieve approximate normality, the
original version indicated in the 2nd line of the page corresponds
better to the distribution variance --- the mean of the
zij approximates the variance of the ith
distribution if the
zij are squared deviations from the sample mean.
If it appears that a scale model holds, meaning that the
distributions have the same shape but differ only in scale, then the
variations in the 4th paragraph can be applied without worry, and I'd go
with whichever one best achieves approximate normality. But if the
distributions seem to have different shapes, the original version based
on the squared deviations from the sample means seems more appropriate
(and one would just have to rely on the robustness of the t test
if the
zij appear to be rather nonnormal). Also, if a scale model
can be assumed, then Student's t test makes more sense than
Welch's test, but otherwise I wonder if Welch's test could sometimes be
more appropriate.
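The variants are easy to compare in code (a sketch with simulated data; note that scipy's levene implements the absolute-deviation versions, while the original squared-deviation version is just a one-way ANOVA on the zij):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
g1 = rng.normal(0, 1, 100)
g2 = rng.normal(0, 3, 100)   # same shape, three times the scale

# Original idea: one-way ANOVA on z_ij = (x_ij - xbar_i)^2, whose group
# means estimate the group variances.
z1 = (g1 - g1.mean()) ** 2
z2 = (g2 - g2.mean()) ** 2
_, p_squared = stats.f_oneway(z1, z2)

# Common variations: absolute deviations from the mean, or from the
# median (the Brown-Forsythe version).
_, p_mean = stats.levene(g1, g2, center='mean')
_, p_median = stats.levene(g1, g2, center='median')
print(p_squared, p_mean, p_median)  # all should detect the scale difference
```

All three versions apply the same one-way ANOVA machinery; they differ only in how the zij are formed, which is exactly the choice discussed above.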
- (p. 273) Something seems missing right at the start of the page.
I think it should be product-moment correlation coefficient
instead of just coefficient (although perhaps transformed
correlation coefficient would be more accurate).
- (p. 276) There should be a space between and and
identically in Exercise 3.