Comments on Beyond ANOVA: Basics of Applied Statistics by
R. G. Miller, Jr.
Below are some comments about the various chapters of the book.
(You can use these links to jump down to comments about
Ch. 1,
Ch. 2,
Ch. 3,
Ch. 4,
Ch. 6,
and Ch. 7.)
I may add comments about Ch. 5 later, but my presentation of regression
in STAT 554 more closely matches the one in the book by Bhattacharyya
and Johnson.
The number of comments does not reflect the importance of the various parts
of the book --- rather I have just added comments that I think may be
helpful to you as you read through the book, and I'll let some parts of
the book stand on their own.
This book matches a lot of what I cover in my course fairly well. In my
course notes and my lectures I won't have time to go into all of the
more detailed information in Miller's book, but I may
briefly mention some issues that are in the book which it would be
great for you to look into on your own.
For some topics my course notes supply
a bit more explanation than does Miller's book.
It may be a good idea
to read the course notes first, then listen to the lecture, and then read the
appropriate parts of Miller's book (and if you choose this type of
attack, you can wait until after Week 4 to begin reading Miller's book).
My advice is to consult the
reading guide to see which parts of
the book correspond to which lectures, and then read Miller's book while
referring to the notes that I have below. Don't worry if you find
yourself not getting all of the material mastered as you read through
the book --- at this stage it should be okay to understand the material
at the level at which I present it during my lectures.
- (p. 2) The 2nd and 3rd sentences of the 1st complete paragraph on
this page are in agreement with my emphasis on reporting p-values over just
stating whether or not one rejects the null hypothesis with a test of a
certain size or level.
- (p. 2) I agree with Miller in that for a lot of applied settings,
the Bayesian approach doesn't seem appropriate. (I cover some material
on Bayesian statistics in STAT 652, so if you take that course you'll be
told what the strategy is, and I'll discuss when it may be appropriate.)
- (pp. 2-3) Because the likelihood function approach (like the
Bayesian approach) requires that a parametric model be specified, I
don't mention it in STAT 554 (but may briefly mention it in STAT 652
(652 uses likelihood functions a lot, but the particular
likelihood function technique that Miller is referring to isn't stressed
a lot)). Except for methods that are for approximately normal
distributions, most of the methods covered in STAT 554 don't assume a
particular parametric model, since in a lot of applied work one doesn't
have enough information to confidently choose a parametric model.
- (p. 3) I generally like to report p-values using 2 significant
digits, although if the p-value results from an approximation or relies
heavily on robustness that may be somewhat suspect because of a small
sample size, I may just use one significant digit to report a p-value.
If the p-value is small (say less than 0.005, 0.001, or 0.0005 in many
situations), then I may just state that fact rather than report using 1 or 2
significant digits (since approximations of small probabilities tend to
break down when one gets far enough out into the tail of the
distribution).
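A small helper along these lines may make the convention concrete (just a sketch; the cutoff and the two-significant-digit convention are the ones I describe above, not anything standard):

```python
def report_p_value(p, small_cutoff=0.001, sig_digits=2):
    """Format a p-value for reporting: round to a given number of
    significant digits, but just state "p < cutoff" for very small values."""
    if p < small_cutoff:
        return f"p < {small_cutoff}"
    # rounding via scientific notation keeps significant (not decimal) digits
    rounded = float(f"{p:.{sig_digits - 1}e}")
    return f"p = {rounded}"

print(report_p_value(0.03471))   # p = 0.035
print(report_p_value(0.0002))    # p < 0.001
```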
- (p. 3) The paragraph at the bottom of the page gets into the
relationship between the magnitude of p-values and sample size.
The power of a test typically depends on both the sample size and how
"far" what is true is from the null hypothesis, and power relates to the
likelihood of obtaining small p-values. It's important to
note that with a large sample size the p-value can be small even though
the truth is only slightly different from the null hypothesis (e.g.,
maybe there is only a very minor treatment effect), and with a small
sample size, the p-value may be somewhat large even though the
alternative hypothesis is true because of low power.
One should keep in mind that the p-value is a measure of how
compatible/incompatible the observed data is with the null hypothesis,
and isn't a measure of how far the truth is from the null hypothesis
(although it's related to this "distance", one also needs to bring the sample
size into consideration).
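To make the sample-size point concrete, here is a quick illustration (a two-sided z-test with an assumed known standard deviation of 1, chosen just for simplicity): the same small observed effect of 0.1 yields very different p-values as n grows.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_z_pvalue(effect, sigma, n):
    """Two-sided p-value for a z-test when the observed sample mean
    differs from the null value by `effect` (known sd `sigma`)."""
    z = effect / (sigma / sqrt(n))
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Same observed effect, increasing n: the p-value shrinks steadily.
for n in (25, 100, 400, 1600):
    print(n, round(two_sided_z_pvalue(0.1, 1.0, n), 4))
```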
- (p. 4) In expressions (1.2) and (1.3), I think mu_0 should
be replaced by mu_X, or just mu.
(I think mu_0 should just be used in hypothesis
testing. In (1.2) and (1.3) Miller means the true mean of the distribution, and
we don't typically use
mu_0 for that.)
- (p. 4) When one sees the +/- notation, it may not be clear
what is meant. Some follow the +/- with the half-width of a
confidence interval, others with the estimated standard error, and
others with some other value. I think it's best to report a confidence
interval since sometimes +/- one standard error isn't a very
meaningful thing. (Similarly, when one sees "error bars" in a graph, it
may not be clear what is intended unless it is stated that it's a
confidence interval, or +/- one standard error.)
- (p. 4) Some would disagree with the last sentence on the page.
I think it's okay in some cases to do a one-sided test even though it isn't clear that
the effect has to be in a certain direction if there is an effect.
To me, the important thing is to specify the direction of the
alternative before looking at the data.
However, in some situations, the fair thing may be to do a two-sided test.
For example, with employee sex discrimination cases, I think a two-sided
test is appropriate. Since a company can be criticized for promoting
men in favor of women or women in favor of men, to allow
one-sided tests would put the company in double jeopardy (since the
chance of a p-value less than or equal to 0.05 when there is no
discrimination becomes 0.1 if we allow "attacks" on the company with
one-sided tests in both directions). Of course, I think the use of
simple statistical tests in discrimination cases is quite suspect in any
case, since I think one cannot adequately model employees as
equally-likely selections when it comes to selecting people to get
promoted.
- (pp. 6-7) Miller has "For most distributions, the central limit
theorem will have had time to weave its magic on y-bar by n =
10" --- you should note that he's referring to the distribution of
the sample mean, and not the t statistic (when one has to
estimate the standard deviation, the convergence to normality may not be
quite as quick).
- (very bottom of p. 7 and very top of p. 8)
Note that the indication that the effect on the t test is small
pertains to two-tailed tests, which aren't as sensitive (due to
cancellation of errors effect). Certainly a skewness of magnitude 0.7
can have an appreciable effect on a one-tailed t test (unless the
sample size is sufficiently large).
- (p. 8, top of page) The actual p-value refers to the one
which would be obtained if the true (but unknown) null sampling distribution
of the test statistic could be used. The stated p-value refers
to the one that results from using the T_{n-1}
distribution. (Recall that if the X_i don't have a
normal distribution, then the t statistic won't have a
T_{n-1} null sampling distribution. So in practice, if
we use the
T_{n-1} distribution to obtain a p-value, we're relying
on robustness, hoping that the sampling distribution will be close to a
T_{n-1} distribution even if the
X_i aren't perfectly normal.)
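A quick simulation makes the actual-versus-stated distinction concrete. In this rough sketch (the 5% critical value for 19 df is hard-coded), the data are exponential with mean 1, so the null hypothesis about the mean is true, yet the lower-tailed t test rejects noticeably more often than 5%.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(1)
n, reps = 20, 20000
t_crit = -1.729          # lower 5% point of the t distribution with 19 df
rejections = 0
for _ in range(reps):
    x = [random.expovariate(1.0) for _ in range(n)]  # true mean is 1
    t = (mean(x) - 1.0) / (stdev(x) / sqrt(n))
    if t < t_crit:       # lower-tailed test of H0: mu = 1
        rejections += 1
print("estimated type I error rate:", rejections / reps)
```

The estimated rate comes out well above the stated 0.05, illustrating how positive skewness hurts a one-tailed t test.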
- (p. 8) The first full sentence on this page refers to symmetric
distributions (even though Miller doesn't indicate that this is the
case). Skewness can change the story, as the next sentence suggests.
- (p. 8)
Although the Gayen article may have just considered positively skewed
distributions, it's important to keep in mind that negative skewness can
be just as bad. I think any guidelines pertaining to skewness should
use the absolute value of gamma_1. (I think the
guidelines pertaining to skewness and kurtosis suggested by line 7
and previous comments in the paragraph are
not so good anyway. I tend to want to make some sort of adjustment for
milder cases of nonnormality. For example, Johnson's modified t
test can offer an appreciable improvement to the t test when the
magnitude of the skewness is less than 1 and the sample size is small.
(I don't think Miller was aware of Johnson's modified t test,
which is often a decent way to deal with distribution skewness when
making inferences about the mean. Bootstrap methods are also sometimes
effective for dealing with skewed distributions, as is a test due to
Chen which is similar to Johnson's test. Unfortunately, Miller's book
just doesn't present good alternative methods to cope with skewness when
making inferences about a mean.)
Also, I looked over Gayen's paper, and I couldn't really see where the
kurtosis greater than 4 guideline is coming from. One can note
that the Laplace (double exponential) distribution, which has a kurtosis
of 3, is a distribution for which the sign test is much more efficient
than the t test (although the Laplace distribution is not a
typical case in that it's quite peaked in the middle (which favors the
sign test) compared to many other heavy-tailed distributions).)
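For reference, here is a sketch of the sort of skewness adjustment I have in mind. The exact form of the correction below is my recollection of Johnson's modified t statistic; check Johnson's 1978 paper before relying on it.

```python
from math import sqrt
from statistics import mean, stdev

def johnson_t(x, mu0):
    """Skewness-adjusted one-sample t statistic (a Johnson-type
    correction; the exact form here is from memory and should be
    checked against Johnson's 1978 paper)."""
    n = len(x)
    xbar, s = mean(x), stdev(x)
    mu3_hat = sum((xi - xbar) ** 3 for xi in x) / n   # third central moment
    d = xbar - mu0
    adj = d + mu3_hat / (6 * s**2 * n) + mu3_hat * d**2 / (3 * s**4)
    return adj / (s / sqrt(n))

# For right-skewed data the adjusted statistic exceeds the ordinary t.
x = [0.1, 0.2, 0.3, 0.5, 0.9, 1.7, 3.3]
t_ord = (mean(x) - 1.0) / (stdev(x) / sqrt(len(x)))
print(t_ord, johnson_t(x, 1.0))
```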
- (p. 9) The test that Miller refers to in the last complete sentence
on the page is the type of test dealt with in STAT 652, where the focus
is on deriving procedures tailored to particular parametric models.
- (p. 10) Miller's description of outliers differs a bit from what I
give in the STAT 554 class notes. (I like my more general description
better.)
- (p. 11) Normal probability paper was still in use in the early
1980s, but now it seems completely silly since it's easier to use
software to produce probit plots.
- (p. 14) It should also be pointed out that formal goodness-of-fit
tests for nonnormality give the benefit of the doubt to the null
hypothesis of normality, and this doesn't seem like a good idea to me,
especially since such tests can have rather low power to detect
nonnormality, particularly when the sample size is small.
- (p. 15) I've done studies that suggest that the estimates I give in
the class notes are generally superior to those given in expression
(1.16). (Others also promote the ones given in the class notes.
Miller's are simpler, and would be fine with large enough samples, but
tend not to perform as well when the sample size is smallish.)
- (p. 15) Tests about the skewness and kurtosis give too much benefit
of the doubt to the null hypothesis (which coincides with normality).
- (p. 16) V is the covariance matrix of the Z(i), not the
y(i).
- (p. 18, about mid page) I disagree that "the correspondence is
sufficient for practical purposes." Examples can be given in which the
transformation ploy will routinely give absurd results (and the method
tends to get worse as the sample size increases).
- (p. 18) The transformation method is fine if you're making
inferences about the median instead of the mean, or if you're content to
make an inference about E(Y) = E(g(X)) instead of E(X).
- (p. 20, p. 23, and other places in the book)
Miller doesn't emphasize that the use of a continuity correction often
improves the approximation of an integer-valued random variable's
distribution by a normal distribution. I've found that the use of such
a continuity correction is typically a good idea.
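To see the improvement, compare the normal approximation to a binomial probability with and without the correction (a quick check, with the exact value computed from the binomial pmf):

```python
from math import comb
from statistics import NormalDist

n, p, k = 20, 0.5, 12
# Exact P(X <= k) for X ~ Bin(n, p)
exact = sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))
mu, sd = n * p, (n * p * (1 - p)) ** 0.5
plain = NormalDist(mu, sd).cdf(k)            # without continuity correction
corrected = NormalDist(mu, sd).cdf(k + 0.5)  # with continuity correction
print(exact, plain, corrected)
```

Here the corrected approximation lands much closer to the exact probability than the uncorrected one.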
- (p. 20, 3rd line from bottom) The upper confidence bound is
wrong --- it should be n-s instead of n-s-1.
- (p. 21) I don't like the 2nd paragraph on this page. I don't think
it's good to think of the sign test as a quick screening device (with
software, other tests can be done quickly too), and I don't think it's
necessarily good to get a client out of the office quickly (I think it's
better to take a bit more time to try to few different tests, in order
to get a good feel for the data analysis situation at hand).
- (pp. 21-22) If zeros are ignored when testing, it's possible to
reject the null hypothesis and at the same time arrive at a point
estimate for the median which is compatible with the null hypothesis (if
zeros are not ignored when obtaining the estimate (and there is no
reason to ignore zeros when obtaining the estimate)). Despite the
possibly screwy situation (which should seldom occur in most settings),
the common practice is to ignore zeros when testing.
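A minimal sign test along the lines described (zeros dropped for the test, as is the common practice; two-sided p-value from the binomial distribution):

```python
from math import comb

def sign_test_pvalue(x, m0):
    """Two-sided sign test of H0: median = m0. Observations equal to m0
    are dropped, following the common practice discussed above."""
    diffs = [xi - m0 for xi in x if xi != m0]
    n = len(diffs)
    s = sum(d > 0 for d in diffs)          # number of positive signs
    k = min(s, n - s)
    # doubled binomial tail probability under Bin(n, 1/2)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * tail)

print(sign_test_pvalue([1, 2, 2, 3, 5, 7, 8, 9, 9], 2))   # 0.125
```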
- (p. 22) Miller doesn't indicate that the signed-rank test has
usefulness as a test of the null hypothesis of no treatment effect
against the general alternative. (If one can assume independence, the
test is valid, unlike when using the signed-rank test to do a test about
the mean or median for which one also has to assume symmetry. Note that
Miller does indicate that a small p-value cannot necessarily be taken to
indicate that the mean or median differs from a specified value --- but
by assuming symmetry one can use the signed-rank test to do tests about
the mean or median.)
- (p. 23) Miller has "the reader can verify with a little thought or
mathematical induction" --- and while this may be true, I wonder if it's
the most productive way to spend your time.
- (p. 24) One need not bother with trying to understand the graphical
method described on this page since Minitab can be used to obtain both
the confidence interval and the Hodges-Lehmann estimate.
- (p. 24) On the 9th line from the bottom, n needs to be
replaced with n(n+1)/2.
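Since software makes the graphical method unnecessary, here is a direct computation of the Hodges-Lehmann estimate as the median of the n(n+1)/2 Walsh averages (a simple O(n^2) sketch):

```python
from statistics import median

def hodges_lehmann(x):
    """One-sample Hodges-Lehmann estimate: the median of all n(n+1)/2
    pairwise (Walsh) averages (x_i + x_j)/2 with i <= j."""
    n = len(x)
    walsh = [(x[i] + x[j]) / 2 for i in range(n) for j in range(i, n)]
    assert len(walsh) == n * (n + 1) // 2
    return median(walsh)

print(hodges_lehmann([1, 2, 3, 10]))   # 2.75
```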
- (p. 26) Miller indicates that the normal scores test is
inconvenient since it requires specialized tables. I'll point out that
there is an approximate version of the test that works quite well and
isn't too difficult to perform.
- (p. 27) Miller indicates that the permutation test is "clumsy to
carry out" --- but it's easy to perform with StatXact (a software
package).
- (p. 30) For STAT 554, please use ss_{dW,g}/[h(h-1)],
where h = n - 2g, and g is the number trimmed from each
end of the ordered sample, to estimate the variance (squared standard
error) of a trimmed mean. I think this should generally result in increased
accuracy.
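A sketch of this estimate, with ss_{dW,g} taken to be the sum of squared deviations of the g-Winsorized sample about the Winsorized mean (my reading of the notation; verify the exact definition against the class notes):

```python
from statistics import mean

def trimmed_mean_and_se2(x, g):
    """Trimmed mean and the estimate ss_{dW,g}/[h(h-1)] of its squared
    standard error, where h = n - 2g.  ss_{dW,g} is computed here as the
    sum of squared deviations of the g-Winsorized sample about its own
    mean (an interpretation to be checked against the class notes)."""
    xs = sorted(x)
    n = len(xs)
    h = n - 2 * g
    trimmed = mean(xs[g:n - g])
    # Winsorize: pull the g smallest/largest values in to the boundary values
    winsorized = [xs[g]] * g + xs[g:n - g] + [xs[n - g - 1]] * g
    wbar = mean(winsorized)
    ss = sum((w - wbar) ** 2 for w in winsorized)
    return trimmed, ss / (h * (h - 1))

print(trimmed_mean_and_se2(list(range(10)), 1))
```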
- (p. 31) I think the Huber M-estimator is more popular than
the Tukey bisquare (biweight) version, although I like the Tukey
version better in general (but in a lot of cases, it will matter
little which version is used).
- (p. 32) Miller has "Some hardy souls recommend continued use of
symmetric robust estimators ... but I cannot recommend this." I guess
if the sample size is smallish, I'm one of those hardy souls --- and
I've done studies to justify my position. (Even if the estimator is
biased for the mean or median, a reduction in variance may offset the
contribution to the MSE (mean squared error) due to the bias.)
- (pp. 32-37) In STAT 554, I just don't have a lot of time to address
things covered by Miller's Sec. 1.3. I suggest that you read through
the end of p. 33, and then skim through expression (1.52) on p. 36 in
order to get some appreciation for the topics covered. A course in time
series methods should address some of the issues.
- (p. 34) In expression (1.47), it should be j not equal to 0,
1, and -1.
- (pp. 36-37) The sentence that begins on p. 36 and ends on p. 37
isn't clear to me.
- (p. 37) To me, Exercise 1 would be more appropriate for STAT 652
than STAT 554.
- (p. 38) Exercise 2 is a bit too theoretical in nature too to be a
good STAT 554 problem.
- (p. 40) In the 1st paragraph Miller has "the problem remains a
one-sample problem of comparing the mean difference with zero." I think
Miller places too much emphasis on the mean. In some situations the
median may be of more interest, or perhaps the proportion of cases in
which the treatment would be effective will be the main interest. Or
perhaps one could start by just testing for the presence of a treatment
effect (of whatever sort), and then focus on characterizing the nature
of the treatment effect if one is found.
- (p. 42) The 1st sentence pertains to the equal variance case. (If
the variances differ, then the central limit theorem does not
necessarily make Student's two-sample t test asymptotically valid.)
- (p. 43) The effect of skewness on Student's t statistics
isn't as easy to summarize in the
two-sample case as it is in the one-sample case.
I think additional studies are called for --- and an industrious M.S.
student could make a nice individualized study project of such a study.
- (p. 46) The last sentence of the 2nd paragraph doesn't make sense:
in order to have the expected number in each cell (under the null
hypothesis)
to be at least 5, only 10 observations are required for each sample ---
and indicating that one needs 10 in each sample is not being
conservative (since for fewer than 10, the chi-square approximation may
not work well enough).
- (p. 49) Miller has "assign i to the ith largest
observation." It should be:
assign i to the ith smallest
observation. (The 4th smallest observation gets rank 4. The 4th
largest gets rank n_1 + n_2 - 3. (The largest gets rank
n_1 + n_2, the 2nd largest gets rank
n_1 + n_2 - 1, and so on.))
- (p. 50) Note the nifty argument given for expression (2.17) at the
top of the page.
- (p. 55) The description of the test given in the class notes is
just a bit different (and leads to a slightly more accurate test).
- (p. 56) You may find the details easier to follow the way they are
presented in the class notes.
- (p. 57) In the 2nd paragraph, Miller has "The effect on the P value is
not large." Then the example given has a reported value of 0.05 vs. a
true value of 0.03. Some may find such a difference bothersome. In
practice, a difference of 0.04 vs. 0.06 may be quite bothersome.
- (p. 57) In the last line, note that the actual type I error rate
can be 0.22 when the nominal level is 0.05 --- and so the effect of
unequal variances on Student's t can be very great.
- (p. 58) I think Miller should have described how the transformation
ploy can sometimes yield very silly results. I think it is bad practice
to make inferences with the transformed values and then report results
about the original scale.
- (p. 59) Right below expression (2.35) Miller suggests that the
delta method approximations can be justified asymptotically if y
is asymptotically normal. I don't follow: the distribution is what
it is and isn't going to change as the sample size increases. (If
we were concerned with a sample mean instead of a single random
variable, then the sample mean would be asymptotically normal.)
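For reference, the usual delta-method approximation that Miller is invoking, stated for a sample mean (which is where the asymptotic justification actually applies):

```latex
% Delta method for a smooth function g of a sample mean:
% if \sqrt{n}\,(\bar{Y} - \mu) \to N(0, \sigma^2), then
\sqrt{n}\,\bigl(g(\bar{Y}) - g(\mu)\bigr) \;\xrightarrow{d}\;
  N\bigl(0,\; [g'(\mu)]^2 \sigma^2\bigr),
\qquad
\operatorname{Var}\bigl(g(\bar{Y})\bigr) \approx [g'(\mu)]^2\,\frac{\sigma^2}{n}.
```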
- (p. 61) You may find the description in the class notes easier to
follow.
- (pp. 62-63) In the last sentence of Section 2.3 reference
is made to Potthoff's 1963 paper. I believe that Miller's summary is
perhaps a bit inaccurate. The paper seems to claim that a
modification of the Wilcoxon test is robust for the generalized
Behrens-Fisher problem, but that the ordinary Wilcoxon procedure is
anticonservative (see the last sentence of Section 3 of the Potthoff
paper). Miller seems to imply that while the Wilcoxon test is affected
by unequal variances, it more or less does okay. I believe that it
should be mentioned that a modification of the Wilcoxon test should be
preferred (and that other methods may be even better).
- (p. 64) The 1st exercise is a STAT 652 type of problem, and the
next three exercises are more theoretically oriented than I assign in
STAT 554 (but Exercise 4 is a very interesting one).
- (p. 67) Calling the e_ij the "random (unexplained)
variation" is "okay" --- typically it is called the "error term" and
I'll often call it that as a matter of habit or convenience, but since
measurement error may only account for a small portion of the variation,
perhaps "error term" is a bit misleading. Usually, the variation is due
to both (1) measurement imprecision and (2) natural variation of values in
the population (and this variation becomes random when we randomly
select population members to measure).
- (p. 68) This page is a particularly good one. The first full
paragraph describes the important concept of random effects,
the second full paragraph gives a nice example, and the third full
paragraph describes how we sometimes model things as a random effect
even if we don't make random selections --- all of this stuff should
give you a better understanding of random effects.
- (p. 69) Perhaps the constraint in the footnote seems more sensible
(since otherwise the treatment effect magnitudes are related to the
sample sizes, and one may prefer to think of them as fixed values that
shouldn't depend on sample size), but the constraint given in the top
half of the page is related to the noncentrality parameter of the
F distribution of the test statistic.
- (p. 70) You may be unsure about the notation in expression (3.4).
Consider the 2nd of the 3 distribution facts. We have that
SS(A)/sigma^2 has a chi-square distribution with
I-1 df and noncentrality parameter given by the ratio in the
parentheses. If the null hypothesis is true, the noncentrality parameter
equals 0 and we just have an ordinary (central) chi-square
distribution for
SS(A)/sigma^2. The way it's expressed in the book,
with
SS(A) not divided by sigma^2, gives us that
the distribution of SS(A) is that of a chi-square random variable
multiplied by the constant
sigma^2 (and so a scale-altered chi-square
distribution).
- (p. 71) Miller wrote that the F test "has several
deficiencies" by which he means that since it's not optimal (it's not a
UMP test), even if the error term distribution is normal, the "door is open" to alternative methods that may have
higher power in some situations.
- (p. 72) On the last line probability is misspelled.
- (p. 73) The last sentence of the paragraph that begins the page is
important.
- (p. 74) Neither the F test nor the studentized range test
dominates the other --- but the studentized range intervals dominate the
Scheffe intervals, which are related to the F test. You may be
wondering how this can be. Well, the key thing to note is that the
Scheffe intervals are not in complete correspondence with the F
test. Note the inequality in expression (3.10). It allows for cases in
which all of the intervals include 0, but the test still rejects,
and so in a sense the confidence intervals are too wide, and the
studentized range intervals can be shorter even though the F test
yields a smaller p-value.
- (pp. 75-76) In STAT 554 I don't have time to cover contrasts in
general, and so you can skip or skim the bottom part of p. 75 and the
top part of p. 76.
- (p. 76) In STAT 554 I don't have a lot of time to spend on monotone
alternatives, and so after getting through the top half of p. 77 (in
order to gain some understanding of what the main issues are with
monotone alternatives),
you can skip or skim the rest of subsection 3.1.3 (however, I will make
some comments in order to possibly assist the interested few who tackle
the rest of the subsection).
- (pp. 78-79) Keep in mind that the c_i sum to
0. (This follows from p. 75.)
- (p. 79) The values of the c_i given in expression
(3.21) for the cases in which I is odd may seem defective in a
sense. For example, in testing against the alternative
mu_1 <= mu_2 <= mu_3, the value of the sample mean of the 2nd sample does
not come into play --- it could be way out of line with the
alternative and not have an effect on the p-value ... in a sense meaning
there is no accounting for a strong piece of evidence suggesting that the
alternative hypothesis isn't true. This fact stresses the importance of
believing that either the null hypothesis is true or the alternative
hypothesis is true --- in which case the test should perform okay with
regard to the type I error rate. If it could be that neither the null
or the alternative hypothesis is true (and some other ordering exists
for the means), then the test can misbehave. In particular, there can
be a high probability of getting a rejection of the null hypothesis even
though the alternative hypothesis isn't true.
- (p. 80) The first sentence of subsection 3.2.1 is generally true
for the case of iid error terms. If the distributions are not
identical, and in particular if the variances aren't close to being
equal, then the null sampling distribution of the test statistic can be
appreciably different from the nominal F distribution.
- (p. 81) In the first full paragraph on p. 81 it's not clear to me
what J is. (Perhaps Miller got the notation from the journal
article confused with the notation he's using in his book. Maybe
J should be n.)
- (p. 82) I don't like it that Miller doesn't warn that one has to be
careful using the transformation ploy (since one can get very misleading
results by doing so).
- (p. 83) Towards the bottom of the page, Miller indicates that the
exact distribution (multivariate hypergeometric) is "too difficult to
work with" but StatXact can be used to produce exact p-values based on this
distribution.
- (p. 89) Towards the top of the page, Miller has one sentence about
robust estimation. Two books by Rand R. Wilcox (1997 & 2001) contain
updated information about robust methods for multiple comparisons.
- (pp. 89-90) Note that expression (3.38) weights the variances
roughly proportional to sample size, whereas, in a lot of cases,
expression (3.39) gives them
roughly equal weight. (Miller gives more info on this at the bottom of
p. 90 and the top of p. 91.)
- (p. 90) Note that the sum in expression (3.40) equals 0 if
all of the variances are equal, but is otherwise greater than 0.
- (p. 90) About 75-80% down the page, Miller points out that the
F test can be anticonservative (the resulting p-value can easily
be smaller than it deserves to be), which is bad.
Miller claims that the effect is not large, but Rand R. Wilcox indicates
that it can be large, even if the sample sizes are equal (and I think this is the case: it can be
large, but in a lot of situations it may be smallish).
If the sample sizes are appreciably different, then the test can badly
misbehave.
- (p. 91) The square of the denominator estimates the true variance
of the contrast if the variances are equal, but estimates expression
(3.42) if the variances differ. This mismatch results in misbehavior.
- (p. 92) The last sentence of subsection 3.3.1 suggests a good
project for a student to do.
- (p. 92) The 2nd to the last sentence of subsection 3.3.2 is typical
of many such statements in the book by Miller. He treats "substantial
extra computation" as perhaps a reason "not to go there" but I think we
shouldn't fret so much over the extra work --- create macros or functions
to do the extra work for you.
- (p. 92) About 50% down the page, correct is incorrectly
spelled.
- (p. 92 & p. 93) From expression (3.44) and from two lines above it, it
follows that one needs the derivative of g to be proportional to
1/h, which leads to the integral expression, (3.45).
- (p. 93) Three lines below expression (3.45), y-bar (and not
y) should be under the square root.
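To spell out the variance-stabilizing recipe behind (3.44)-(3.45) in symbols (writing h(mu) for the function that, per the comment above, the derivative of g must be inversely proportional to):

```latex
% Variance stabilization: choose g with
g'(\mu) \propto \frac{1}{h(\mu)},
\qquad
g(\mu) = \int^{\mu} \frac{c}{h(t)}\, dt .
% Example: if h(\mu) = \sqrt{\mu}, then g(\mu) = 2c\sqrt{\mu},
% i.e., the square-root transformation does the stabilizing.
```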
- (p. 93) The last sentence of the second to the last paragraph deals
with the performance of the nonparametric tests if there are unequal
variances. If one is doing a test for the general k sample
problem, unequal variances are okay. For tests about means, they can be
okay in some circumstances, but if one cannot assume that one
distribution is stochastically larger than the other one whenever two
distributions differ, then unequal variances can hurt the validity of
the nonparametric tests, and I suspect that in some cases the effect can
be quite large.
- (p. 93) The last paragraph refers to multiple comparison procedures
for unequal variance settings. The one I discuss in class is chosen for
its simplicity, and perhaps some of the other procedures may work
better.
- (p. 94) The first paragraph points out that the Abelson-Tukey test
can be easily adjusted to deal with unequal variances --- one may need
to consult the 1946 Satterthwaite article to see how to find the proper
df (but for the I = 3 case, the df formula for Welch's test can
be used, since the test statistic just uses two of the three samples).
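For convenience, here is the Welch-Satterthwaite df computation referred to (the standard two-sample formula; the sketch below just evaluates it):

```python
def welch_satterthwaite_df(v1, n1, v2, n2):
    """Approximate df for the two-sample unequal-variance problem:
    (v1/n1 + v2/n2)^2 / [ (v1/n1)^2/(n1-1) + (v2/n2)^2/(n2-1) ],
    where v1, v2 are the sample variances."""
    a, b = v1 / n1, v2 / n2
    return (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))

# Equal variances and equal n give df = 2(n - 1), here 18;
# unequal variances pull the df down.
print(welch_satterthwaite_df(1.0, 10, 1.0, 10))
print(welch_satterthwaite_df(1.0, 10, 9.0, 10))
```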
- (p. 94) In the 2nd paragraph of Sec. 3.4, the serial "correlation in
blocks" refers to two-way and higher-way designs.
- (pp. 97-98) In STAT 554, I don't have time to cover a lot of what's
on these two pages. It can be noted that the estimators on these two
pages are all for balanced designs, and thus cannot always be used.
- (pp. 101-104) I skip subsection 3.5.3 completely in STAT 554 (due
to lack of time).
- (p. 104) In expression (3.71) the denominator of the first term
should be N instead of NI. (Of course, one can express
N as either In or nI.)
- (p. 108 (last line) and p. 109 (1st and 2nd lines)) The three
subscripts of k should be i (or else the index of
summation should be k instead of i).
- (p. 110) Consider the last paragraph of Sec. 3.7. Miller indicates
that nonparametric procedures cannot be used to estimate the variances
(they don't yield variance estimates), but in the unequal variance case
(the focus of Sec. 3.7), there is no (single) error term variance ---
the variances are different, and one would just have to use the various
samples to supply a set of estimates. (In comment 27 above, I indicate
that nonparametric tests can sometimes be used to do a test about the
mean even if the variances appear to be different.) Miller finally puts something
negative
about transformations in the last sentence of the
paragraph (but he doesn't explain himself very well).
- (pp. 110-111) I skip Sec. 3.8 completely in STAT 554 (due
to lack of time).
- (p. 111) I work out Exercise 1 in my class notes.
- (pp. 115-116) The data for Exercise 11 is also used in Exercise 11
of Ch. 4. Based on the experimental design, Ch. 4 methods
are more appropriate than Ch. 3 methods.
- (p. 119) I'm not really sure what Miller means by the sentence right
before subsection 4.1.1. Maybe the intermediate models are ones in
which the interaction terms are random, and are thus
governed/constrained in a way that doesn't allow for completely
arbitrary cell means (and at the same time makes the model nonadditive).
- (p. 122) The first full paragraph on this page is important: if
there are interactions, the interpretation of the main effects is more
complicated.
- (pp. 122-123) The paragraph that starts on the bottom of p. 122 and
continues on p. 123 describes a method that I won't have time to cover
in STAT 554 (since it is based on multiple regression, and I barely have
time to cover the basics of simple regression).
- (p. 125)
It should be degrees (instead of degress).
- (p. 126) What is meant by
"In
other instances"
in the sentence that is right under the table
is if interactions are
present and nonignorable.
- (p. 127) You can skip over the description of the technique on p.
127. Advanced ploys such as this one might be covered in a course
that has ample time to go into details of experimental design and ANOVA.
- (p. 127)
To be consistent with the previous page, SS(E) should be changed
to SS(AB) in three places: once before (4.17), once in (4.17),
and once after (4.17) (although it may be better to change
AB to E on p. 126 and leave p. 127 alone).
- (p. 128) In the 1st full paragraph on this page, Miller returns to
the case where n > 1 (he had been addressing the n = 1
case). The approximate ANOVA scheme is something I sometimes give a HW
problem about in 554, but not always --- it's certainly something you
should be aware of.
- (p. 128) The 2nd and 3rd full paragraphs go beyond the scope of
STAT 554. The Miller book is chock full of nice things that at some
point in your life you may find useful, but we just don't have time to
get to them all in STAT 554.
(Note: In GMU's
M.S. level course on ANOVA, unbalanced designs are not covered due to
lack of time. Miller's book touches on some pretty advanced stuff!)
- (p. 128) The sentence that begins at the bottom of the page and
continues on the next page pertains to the approximate ANOVA scheme.
I'll point out that the difference between the approximate ANOVA scheme
and the multiple regression scheme referred to in item 3 above can be
very slight, and that statistical software packages often have a method
to handle unbalanced designs without making the user go to a whole lot
of trouble.
- (p. 129) This page is okay for STAT 554, but you can skip the rest
of subsection 4.1.2. (It gets messy on p. 130.)
- (p. 129)
The first line of the paragraph that begins on the bottom portion of the
page should have nij instead of ni.
- (pp. 131-134) You can skip subsection 4.1.3. It'll be nice if you
can fight through the monotone alternatives subsection in Ch. 3, and just
note that the methods can be extended to more complex designs.
- (p. 134) The word the is misspelled in the 4th line from the
top.
- (p. 135) The last sentence of the 1st paragraph of Sec. 4.2
indicates that nonnormality has little effect on the test procedures.
I think it's more accurate to indicate that it has little effect on the
validity (type I error rate), since heavy tails can have a large effect
on the power characteristics.
- (p. 137) I'm not so sure that transformations "do not seem to be as
frequently used with the two-way classification" --- many experienced
statisticians immediately turn to transformations if they see a funnel
shape when residuals are plotted against the cell means ... and
transformations are okay in such cases as long as you are willing to
deal with the transformed model (i.e., don't try to "backtransform" or
apply results obtained from the transformed model to the original
model). Miller suggests in several places in this chapter that one can
sometimes find a transformation that tends to make
everything nice: you get (approximate) normality, homoscedasticity
(constant variance), and maybe even additivity. At times this may
suggest that you have stumbled across a better description for the
phenomenon: you get a simple model when you model log Y or the
square root of Y, instead of a messier model if you simply try to
model Y.
- (p. 138) Page's test is covered in the last full paragraph. I
cover this test in STAT 657 (nonparametric statistics).
- (p. 138-139) The paragraph that starts on the bottom of the page and
continues onto p. 139 refers to methods that I haven't found time to
cover in STAT 657 --- once again indicating that Miller's book is just
chock full of references to numerous methods that you may not be exposed
to elsewhere in our M.S. in statistical science program.
The last paragraph on p. 139 addresses yet another nonparametric method
that I don't have time to work into STAT 657.
- (p. 140) Two books on robust statistics by Rand R. Wilcox can
supply updated information on robust estimation (and testing). For this
material, his 1997 book has more details. (Note: I don't agree with
everything that is in Wilcox's books, but they are chock full of good
references.)
- (p. 142) I'll try to supply some insight about the paragraph in the
middle of the page. If the observations in each column are positively
correlated, then the standard errors for the column means will be
underestimated, and so the standard errors for the differences of two
column effects will be too small, making the effects seem different when
in fact they could be equal and the column means differ only due to
random chance. If observations are negatively correlated, then the
standard errors will be overestimated, and true differences may not
be detected. (The inflated standard error estimates will render the
observed differences statistically nonsignificant.)
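To see the positive-correlation case concretely, here is a little simulation I put together (my own sketch, not from Miller's book): observations that share a common random effect are positively correlated, and a naive t test that assumes independence rejects a true null hypothesis far more often than its nominal 5% rate, because the usual standard error estimate is too small.

```python
import numpy as np
from scipy import stats

# Sketch: positively correlated observations (induced by a shared random
# effect) make the naive standard error too small and inflate the type I
# error rate of a test that assumes independence.
rng = np.random.default_rng(0)
n, reps = 10, 5000
rejections = 0
for _ in range(reps):
    shared = rng.normal(0, 1)            # common effect, mean 0 (H0 is true)
    y = shared + rng.normal(0, 1, n)     # n positively correlated observations
    _, p = stats.ttest_1samp(y, 0)       # naive test assumes independence
    rejections += p < 0.05
print(rejections / reps)  # far above the nominal 0.05
```

With these settings roughly half the simulated samples reject, even though the null hypothesis holds in every one of them.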
- (pp. 142-143) Although we don't have time to deal with repeated
measures in STAT 554, it'll be nice if you at least know a little
bit about them. Miller addresses them in the last paragraph of Sec.
4.4. Basically you have a repeated measures design if the same
subjects/plots of land/experimental units are observed at different
times. So you have different time periods, but the observations at each
time are made on the same subjects, as opposed to having different
subjects for each time period.
- (p. 143) The first full paragraph on the page is a good one.
- (p. 144) It's okay if you don't grasp everything in the last
paragraph --- the typical STAT 554 student doesn't have a sufficient
background to make mastering this material easy.
- (p. 157) In the 1st paragraph Miller is suggesting that if one
doesn't reject the null hypothesis of no factor B effect, then one might
combine SS(B) and SS(E) to get a more powerful test.
What this amounts to is ignoring the different levels of factor B and
doing a one-way ANOVA to test for differences due to factor A. But many
may be hesitant to do this. It should be recalled that just because one
doesn't reject the null hypothesis, it doesn't mean the null hypothesis
is true. So one could have mild differences due to factor B that are
not statistically significant due to low power. I'd say that when in
doubt, one should respect the original experimental design and do the
analysis on the nested model. When you ignore factor B, you're in
effect assuming that observations under the same level of factor A
are iid, and they may not be (due to
factor B effects (even though they may not be statistically
significant)).
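The equivalence between pooling and the one-way analysis is easy to verify numerically (a sketch with simulated data, not Miller's example): folding SS(B) and its degrees of freedom into the error term reproduces the one-way ANOVA F statistic for factor A exactly.

```python
import numpy as np
from scipy import stats

# Sketch (simulated data, one observation per cell): pooling SS(B) into
# SS(E) gives the same F statistic as a one-way ANOVA that ignores B.
rng = np.random.default_rng(3)
a, b = 3, 4
alpha = np.array([0.0, 0.5, 1.0])          # factor A effects
beta = rng.normal(0, 0.2, b)               # mild factor B effects
y = alpha[:, None] + beta[None, :] + rng.normal(0, 1.0, (a, b))

grand = y.mean()
ss_a = b * ((y.mean(axis=1) - grand) ** 2).sum()
ss_b = a * ((y.mean(axis=0) - grand) ** 2).sum()
ss_e = ((y - y.mean(axis=1, keepdims=True)
           - y.mean(axis=0, keepdims=True) + grand) ** 2).sum()

# Pooled F for factor A: fold SS(B) and its df into the error term ...
f_pool = (ss_a / (a - 1)) / ((ss_b + ss_e) / ((b - 1) + (a - 1) * (b - 1)))
# ... which equals the one-way ANOVA on the rows of y, ignoring factor B.
one_way = stats.f_oneway(*y)
print(f_pool, one_way.statistic)  # identical (up to floating point)
```

The identity holds because SS(B) + SS(E) is exactly the within-groups sum of squares once the B classification is ignored, with matching degrees of freedom.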
- (p. 158) Miller is skimpy on the details here --- the paragraph
containing expression (4.53) leads to a confidence interval for
mu based on a t statistic. For beginning M.S. students,
it seems a bit far-fetched that it's easy to arrive at this result based
on what is given.
- (pp. 159-161) Most of the Ch. 4 exercises have a theoretical bent to
them, and your background may be insufficient for you to easily solve
them. (Miller assumes a lot more probability / distribution theory than
most M.S. students have.)
- (p. 162) Exercise 10 deals with a Latin square design, but Miller's
book doesn't cover the basics of Latin square designs.
- (p. 241) Miller indicates that
E(y)/E(x) is the focus more often than
E(y/x) is, but I wonder if that's correct. I seem to think of
more cases that involve the mean ratio rather than the ratio of the
means. The mean ratio, E(y/x), is easier to handle: basically one
just puts wi
= yi / xi, and applies Ch. 1 methods to the
wi. Because of this, Ch. 6 of Miller's book deals
with E(y)/E(x), which is a tougher target for inferences.
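Here is a sketch of that reduction (simulated data, my own illustration, not from the book): for the mean ratio one just forms wi = yi/xi and applies the usual one-sample t interval from Ch. 1.

```python
import numpy as np
from scipy import stats

# Sketch: inference for the mean ratio E(y/x) by reducing to one-sample
# methods on w_i = y_i / x_i.  (Simulated data; the true ratio is 2.)
rng = np.random.default_rng(1)
x = rng.uniform(5, 10, 30)
y = 2.0 * x + rng.normal(0, 1, 30)

w = y / x
ci = stats.t.interval(0.95, len(w) - 1, loc=w.mean(), scale=stats.sem(w))
print(w.mean(), ci)  # ordinary Ch. 1 style t interval for the mean ratio
```

Nothing here is specific to ratios; once the wi are formed, every one-sample tool applies.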
- (pp. 244-245) In my summer offering of STAT 789 (a special topics
course), I'll go through the derivation of the confidence interval dealt
with on these two pages. It's a bit trickier than some of the ones I
cover in STAT 554 since one doesn't seem to directly trap the estimand
between a LCB and an UCB. (My STAT 652 course addresses a similar
confidence interval problem involving a quadratic equation.)
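For reference, here is a sketch of a Fieller-type interval of the kind Miller derives (my own implementation for paired samples; I haven't matched it line by line against pp. 244-245): the interval is the set of values rho for which (ybar - rho*xbar)^2 <= t^2 * Var(ybar - rho*xbar), which expands to a quadratic in rho, so the estimand is trapped indirectly rather than between an explicit LCB and UCB.

```python
import numpy as np
from scipy import stats

def fieller_ci(x, y, alpha=0.05):
    # Set of rho with (ybar - rho*xbar)^2 <= t^2 * Var(ybar - rho*xbar);
    # expanding gives the quadratic a*rho^2 - 2*b*rho + c <= 0.
    n = len(x)
    xb, yb = x.mean(), y.mean()
    vx = x.var(ddof=1) / n
    vy = y.var(ddof=1) / n
    vxy = np.cov(x, y)[0, 1] / n
    t2 = stats.t.ppf(1 - alpha / 2, n - 1) ** 2
    a = xb ** 2 - t2 * vx
    b = xb * yb - t2 * vxy
    c = yb ** 2 - t2 * vy
    disc = b ** 2 - a * c
    if a <= 0 or disc < 0:
        return None  # xbar not clearly away from 0: the set is unbounded
    return ((b - np.sqrt(disc)) / a, (b + np.sqrt(disc)) / a)

rng = np.random.default_rng(4)
x = rng.normal(10, 1, 40)
y = 2.0 * x + rng.normal(0, 1, 40)
lo, hi = fieller_ci(x, y)
print(lo, hi)  # interval for E(y)/E(x); the true value here is 2
```

Note the branch where the quadratic fails to give a bounded interval: that is exactly the awkward feature that makes this derivation trickier than the standard ones.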
- (p. 248) I don't think the equality in expression (6.18) is exact
--- I think it should be approximately equal, but not exactly. (Maybe
the same is true of expression (6.17).)
- (p. 251) The flow of this page is a bit confusing. The paragraph
in the middle of the page is like an aside, and the paragraph below it
returns to the topic dealt with on p. 250 and the top of p. 251.
- (p. 260) I would include a factor of (n - 1) in the
numerator of the pivotal statistic given a bit more than half way down
the page, since doing it this way would result in a pivotal statistic
having a chi-square distribution.
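With that factor included, the pivot (n - 1)s^2/sigma^2 has a chi-square distribution with n - 1 degrees of freedom, and it inverts directly into a confidence interval for the variance (a standard sketch in my notation, not Miller's):

```python
import numpy as np
from scipy import stats

# Pivot: (n-1)*s^2 / sigma^2 ~ chi-square with n-1 df.
# Inverting the pivot gives a 95% confidence interval for sigma^2.
x = np.array([4.1, 5.2, 3.8, 4.9, 5.5, 4.4, 4.0, 5.1])
n = len(x)
s2 = x.var(ddof=1)
lo = (n - 1) * s2 / stats.chi2.ppf(0.975, n - 1)
hi = (n - 1) * s2 / stats.chi2.ppf(0.025, n - 1)
print(s2, (lo, hi))
```

Note the interval is not symmetric about s2, since the chi-square distribution is skewed.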
- (p. 261) Miller indicates that the likelihood ratio test has
critical constants slightly different than those given in expression
(7.6) (although (7.6) is correct if the sample sizes are equal). I
address this in my course notes for STAT 652 (but unfortunately, there
usually isn't a lot of time that I can spend explaining the details in
class).
- (p. 262) Note that expression (7.9) is equal to the MSE from
a one-way ANOVA.
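That equality is easy to check numerically (my own sketch): the pooled variance estimate, a weighted average of the group sample variances with weights ni - 1, coincides with SSE/(N - k) from the one-way ANOVA.

```python
import numpy as np

# Sketch: the pooled variance (weighted average of the group sample
# variances, weights n_i - 1) equals the one-way ANOVA MSE = SSE/(N - k).
groups = [np.array([3.1, 2.8, 3.5, 3.0]),
          np.array([4.2, 4.8, 4.0, 4.5, 4.1]),
          np.array([2.9, 3.3, 3.1])]
pooled = sum((len(g) - 1) * g.var(ddof=1) for g in groups) \
         / sum(len(g) - 1 for g in groups)
sse = sum(((g - g.mean()) ** 2).sum() for g in groups)
mse = sse / (sum(len(g) for g in groups) - len(groups))
print(pooled, mse)  # identical
```

This works for unequal sample sizes too, as the example shows.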
- (p. 262) It's an important point that Hartley's test and Cochran's
test are only for the equal sample sizes setting.
- (p. 263) There is strong evidence that two variances differ if the
interval indicated in expression (7.15) does not contain 1.
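A sketch of the standard F interval for a ratio of two normal variances (my own code; I haven't verified that it matches the exact parametrization of Miller's (7.15)):

```python
import numpy as np
from scipy import stats

# (s1^2/s2^2) / (sigma1^2/sigma2^2) ~ F(n1-1, n2-1); invert for a 95% CI
# on sigma1^2/sigma2^2.
x1 = np.array([5.1, 4.8, 5.6, 5.0, 4.7, 5.3, 4.9, 5.2])
x2 = np.array([5.0, 6.1, 3.9, 5.8, 4.2, 6.4, 3.6, 5.5])
n1, n2 = len(x1), len(x2)
r = x1.var(ddof=1) / x2.var(ddof=1)
lo = r / stats.f.ppf(0.975, n1 - 1, n2 - 1)
hi = r / stats.f.ppf(0.025, n1 - 1, n2 - 1)
print((lo, hi))  # evidence the variances differ if 1 falls outside
```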
- (p. 263) The last paragraph on this page suggests a possible M.S.
thesis project.
- (p. 264) The first sentence on this page is very important.
- (p. 264) In the 1st paragraph, note that light tails result in a
conservative test, and heavy tails result in an anticonservative test
--- this is the opposite of what is true for normal theory tests about means.
The 3rd paragraph indicates that the effect of nonnormality can be quite
large --- the actual size of the test can differ from its nominal level
by a factor of 5 or more.
- (p. 269) While one of the variations for Levene's test referred to
in the 4th paragraph may better achieve approximate normality, the
original version indicated in the 2nd line of the page corresponds
better to the distribution variance --- the mean of the
zij approximates the variance of the ith
distribution if the
zij are squared deviations from the sample mean.
If it appears that a scale model holds, meaning that the
distributions have the same shape but differ only in scale, then the
variations in the 4th paragraph can be applied without worry, and I'd go
with whichever one best achieves approximate normality. But if the
distributions seem to have different shapes, the original version based
on the squared deviations from the sample means seems more appropriate
(and one would just have to rely on the robustness of the t test
if the
zij appear to be rather nonnormal). Also, if a scale model
can be assumed, then Student's t test makes more sense than
Welch's test, but otherwise I wonder if Welch's test could sometimes be
more appropriate.
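The variants are easy to compare in code (a sketch with simulated data; note that scipy's levene implements the absolute-deviation versions, while the original squared-deviation version is just a one-way ANOVA on the zij):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
g1 = rng.normal(0, 1, 100)
g2 = rng.normal(0, 3, 100)   # same shape, three times the scale

# Original idea: one-way ANOVA on z_ij = (x_ij - xbar_i)^2, whose group
# means estimate the group variances.
z1 = (g1 - g1.mean()) ** 2
z2 = (g2 - g2.mean()) ** 2
_, p_squared = stats.f_oneway(z1, z2)

# Common variations: absolute deviations from the mean, or from the
# median (the Brown-Forsythe version).
_, p_mean = stats.levene(g1, g2, center='mean')
_, p_median = stats.levene(g1, g2, center='median')
print(p_squared, p_mean, p_median)  # all should detect the scale difference
```

All three versions apply the same one-way ANOVA machinery; they differ only in how the zij are formed, which is exactly the choice discussed above.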
- (p. 273) Something seems missing right at the start of the page.
I think it should be product-moment correlation coefficient
instead of just coefficient (although perhaps transformed
correlation coefficient would be more accurate).
- (p. 276) There should be a space between and and
identically in Exercise 3.