Comments on Statistical Concepts and Methods by Bhattacharyya and Johnson


Below are some comments about the various chapters of the book. (You can use these links to jump down to comments about Ch. 1, Ch. 3, Ch. 4, Ch. 5, Ch. 6, Ch. 7, and Ch. 8. Later in the semester I'll try to supply comments about other chapters.) The number of comments does not reflect the importance of the various parts of the book --- rather I have just added comments that I think may be helpful to you as you read through the book, and I'll let some parts of the book stand on their own.

This book matches a lot of what I cover in my course fairly well, but is somewhat elementary and lacks sufficient information about dealing with nonnormality and heteroscedasticity. The books by Miller and Wilcox, as well as my lecture presentations, will supply you with information about newer and/or more advanced methods that address these concerns. But reading through Bhattacharyya and Johnson should help you master the basics, and set you up to better understand the more advanced material.

My advice is to consult the reading guide to see what parts of the book correspond to which lectures, and then start at the beginning and try to read the first part of the book in a more or less linear manner to ensure continuity, but perhaps skipping material on probability with which you are sufficiently well acquainted. This will require a lot of reading during the first few weeks of the semester, but hopefully the reading will go fast and you'll find most everything easy to follow. (I won't get into the material of the Miller book until the last half of the 4th lecture, so you can leave it alone at the beginning of the semester, and focus on this book. But you may want to examine the reading guide to determine how you want to pace yourself through the Wilcox book. It's somewhat like this one in that you should read a lot of it during the first half of the semester.)

Since Bhattacharyya and Johnson puts all of its information about nonparametric methods in a single chapter towards the end of the book, to better prepare for the 5th lecture you may want to deviate from a linear reading of the book and read some of the material in Ch. 15. Also, in the second half of the semester, I cover topics in a different order than the book does, and so you might find yourself skipping about a bit.

Chapter 1

  1. In my lectures, I mention some of the specific things covered in Ch. 1, but for other parts of the chapter, I assume you'll tie things together and get the "big picture" as we go along. Reading Ch. 1 ought to help you get a better understanding of what applied statistics is about. (In my first lecture we start to get into some of the details of a simple situation and I'm a little skimpy with regard to motivating the relevance of data analysis to experimenters from various fields.)
  2. In some fields, a typical M.S. thesis may be concerned with an investigation as to whether a certain hypothesis appears to be true. For example, in biology, a student may want to determine if the presence of a certain type of artificial aquatic plant attracts a particular species of fish. Experimental data could be collected to determine if there is evidence that the artificial plants attract the fish. Because the number of fish observed at a particular location could fluctuate due to many reasons, statistical methods should be carefully applied to the observations resulting from a properly designed experiment. A poorly designed and executed experiment and/or incorrect use of statistical methods could lead to erroneous conclusions (or there could be so much uncertainty reflected by the data that no firm conclusion could be reached). (Comment: Too often people from other fields consult with statisticians after all of the data has been collected, only to then learn that the experimental design that was used was poorly suited for the hypotheses to be addressed.)
  3. (p. 5) Descriptive statistics and inferential statistics are mentioned. Modelling could also be mentioned --- it's somewhat like descriptive statistics in that the goal is to provide a mathematical model for a certain phenomenon, but to build the model, inferential methods are used.
  4. (Sec. 1.6) The population variability, combined with the fact that you may just have a smallish sample of observations to work with, leads to the challenging problem of making accurate inferences about the larger population based on the smaller sample.
  5. (p. 7) Note the distinction between a sampling unit and an observation. The sampling unit may be a tree, but the observation may be a wood density measurement for the tree.
  6. (p. 8) Note that the population is comprised of (potential) observations, not of sampling units.
  7. (p. 8) Note that a sample is a collection of observations. So if I have a set of 50 wood density measurements, I have a single sample of size 50 (50 observations). We say that the sample size is 50. In some fields, people seem to use the term sample like statisticians tend to use the term observation. Some might say "I have 10 samples" when they have 10 sampling units (say specimens of dirt) from which observations will be made. In statistics, 10 samples would typically imply 10 sets of observations made from 10 populations.
  8. (p. 8) In some cases, it's better not to think of a sample as being a subset of a finite population. Rather we think of observations as arising from distributions, and we want to make an inference about some aspect of the distribution underlying the observations. (For example, the observations may be measurements of some aspect of air quality made at a particular location at various times. Since time is measured on a continuous scale, there is not a finite population of measurements that could be made, but rather an infinite number, and so it's hard to think of the measurements as being a subset of a finite population.)
  9. (p. 9) The boxed information is important, as is the sentence that immediately precedes Sec. 1.7.

Chapter 3


Chapter 4


Chapter 5

  1. (pp. 141-150) Hopefully, you're already rather familiar with the Bernoulli and binomial distributions.
  2. (pp. 144-145) In Example 5.2, it isn't so important that the patients aren't physically identical. If we regard each patient in the study as being randomly selected, then we can sort of think that there is a constant probability of getting an S (where the "sort of" is due to the fact that it may be better to model the situation with a hypergeometric dist'n (see pp. 152-154 of the book), since if we sample without replacement from the population of all people who have the disease we won't have independent trials). If we sample with replacement from all people having the disease then we will have iid Bernoulli trials (even if not all people are identical, since on each trial there is a constant probability (just the proportion of curable patients) of getting an outcome of cure (S)). (The sketch following this list compares the two distributions numerically.)
  3. (pp. 150-151) You can skip the subsection on "Other tables."
  4. (pp. 152-154) We'll briefly deal with the hypergeometric distribution during the 4th lecture (and on HW #2).
  5. (p. 153) The pmf given in the box for the hypergeometric distribution is nonzero for all values x satisfying max{ 0, n - (N - D) } <= x <= min{ n, D }. (This gives the support of X even if n > D or n > N - D. The sketch following this list checks this formula numerically.)
  6. (pp. 154-159) You can skip Sections 5.7 and 5.8. (Although the geometric and the Poisson distributions are important in general, we won't do anything with them in this class.)
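
To make the hypergeometric distribution and its connection to the binomial more concrete, here is a minimal sketch in Python (using scipy). The values of N, D, and n below are made up for illustration and don't come from the book's examples. The sketch checks the support formula from p. 153, and shows that when the population is large relative to the sample, the binomial pmf with p = D/N closely approximates the hypergeometric pmf (sampling without replacement is then nearly the same as sampling with replacement).

```python
# A numerical look at the hypergeometric distribution and its binomial
# approximation. The numbers N, D, and n are made up for illustration.
from scipy.stats import hypergeom, binom

N, D, n = 20, 12, 15   # population size, # of S's in population, # of draws

# Support of X (the number of S's in the sample), per the formula on p. 153:
lo = max(0, n - (N - D))   # can't draw more F's than the N - D available
hi = min(n, D)             # can't draw more S's than n draws or D available
print(f"support: {lo} <= x <= {hi}")   # here: 7 <= x <= 12

# scipy's parameterization is hypergeom(M, n, N) = (pop. size, # S's, # draws)
rv = hypergeom(N, D, n)
for x in range(lo, hi + 1):
    print(x, round(rv.pmf(x), 4))

# With a large population, sampling without replacement is nearly the same
# as sampling with replacement, so Binomial(n, p = D/N) approximates well:
N2, D2 = 20000, 12000
rv2 = hypergeom(N2, D2, n)
approx = binom(n, D2 / N2)
print(rv2.pmf(9), approx.pmf(9))   # the two pmf values should be close
```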

Chapter 6

  1. (p. 166, 1st 2 lines) You'll see in the pages to come that H will take the role of the research hypothesis (alternate hypothesis), H1, and its complement will take the role of the null hypothesis, H0.
  2. (p. 166) In the Typical conclusion I don't like the phrase "highly unlikely that the statistical hypothesis is true." To me, the word likely suggests probability, but I don't like to think of the hypothesis being true or not true according to a probability distribution. A hypothesis either is or is not true --- we just don't know for sure which one. If the data is rather incompatible with a hypothesis, then it may suggest that the hypothesis may not be true. (Note: My objection is with the book's use of the word likely. It's a small matter perhaps, and some may think I'm being too picky. (I know I use likely in this manner at times too --- but I don't think I should.))
  3. (p. 167, near bottom of page, & p. 168) Note the explanation for the word null. More explanation is given in the 1st full paragraph on p. 168.
  4. (pp. 167-168) The sentence that begins on p. 167 and continues onto p. 168 doesn't make a lot of sense to me --- it seems as though they have some words wrong.
  5. (p. 169, 4th line) I don't particularly like the phrase "test of the null hypothesis," although it's not that uncommon. I prefer to say that I'm testing to determine if the data provides statistically significant evidence for the alternative hypothesis (except instead of saying alternative hypothesis, I'd put into words whatever the particular alternative is for the case at hand (e.g., p > 0.6)).
  6. (p. 170) Note: I use alpha for the maximum (really the supremum) of the probabilities of a type I error, and don't usually concern myself with alpha(p). Also, the book uses beta for the probability of a type II error, while I use it for the power function (and the book uses gamma for the power function). It's unfortunate that terminology and notation aren't consistent among statisticians, and it's particularly unfortunate that the book's usage isn't the same as mine. I try to go with what's most proper, or most common if one choice doesn't seem more or less proper than another. With beta, a lot of undergraduate-level books use it for the probability of type II error (like this book does), but it's frequently used for power at the graduate level.
  7. (p. 171) Although one could use tables to obtain the values in TABLE 6.1, I encourage you to see if you can get them using software. (The first sketch following this list shows one way such computations can be done.)
  8. (p. 170) Instead of "power of the test at the alternative," some use "power against the alternative." Some students think the choice against (I say "against the alternative" a lot) is odd, and think it'd be better to say "power for the alternative." I believe the choice of against seems sensible if one thinks of a plot of the power function (like Fig. 6.2 on p. 173) in which the power is plotted using the vertical axis against the parameter value on the horizontal axis.
  9. (p. 173) This page is important (but keep in mind that the book uses beta for the probability of a type II error instead of for power).
  10. (p. 174) Really, size would be a better choice than "level of significance." (As I note in my class notes on p. 31, there is a distinction between level and size, but often people say level when size would be a better choice (and sometimes I mess up and say level when I'd prefer to say size).) One reason the distinction is often overlooked is that if a test statistic has a continuous distribution (as opposed to a discrete one), one can make the size of the test match any chosen level (w/o using a randomized test).
  11. (p. 174) The paragraph right before Sec. 6.5 is important --- one should choose the rejection region (and thus the size and power characteristics) taking into account the consequences of type I and type II errors. As for the last sentence of the paragraph, keep in mind that the 5 errors pertain to 100 testing situations in which the null hypothesis is true. The expected number of type I errors could be less than 5 since the value 0.05 is the maximum prob. of type I error, and the actual prob. of type I error could be less than 0.05. Also, note that even if the prob. of a type I error is exactly 0.05 if the null hyp. is true, it doesn't mean that the expected number of type I errors in 100 tests is 5 unless the null hypothesis is true in every case. If the alternate hyp. is sometimes true, then the expected number of type I errors in 100 tests will be less than 5 (since it's impossible to make a type I error if the alternate hyp. is true). (For example, if the null hyp. is true in only 60 of the 100 situations, the expected number of type I errors is at most 0.05 x 60 = 3.)
  12. (p. 174) The last 3 sentences are important.
  13. (p. 175) p-value is more commonly used than significance probability. (Also, the book's notation of P with an asterisk isn't common (although some do use just P).)
  14. (p. 177) Note that getting a power value for a two-tailed test is a bit more work than getting a power value for a one-tailed test (but it's not too bad --- one just has to add two probabilities; the second sketch following this list illustrates this).
  15. (pp. 178-179) Steps (a) through (e) give a nice summary of the general scheme in hypothesis testing. In step (e), note that even in the case where the null hyp. is rejected and it can be said that there is strong evidence to support the alternate hypothesis, one should not say that the null has been proven false --- there is no definite proof since one cannot absolutely rule out the possibility that a type I error has been made.
  16. (p. 180, top half of page) While in some cases theory does lead to an optimal test, in other cases there is no best test and a good test must be selected in some way. (In such cases, the generalized likelihood ratio approach often yields a reasonable test, but sometimes one cannot go this route due to lack of sufficiently detailed knowledge of the distribution underlying the data. That is, if we don't have a firm parametric model, the likelihood ratio approach cannot be applied, in which case one might use a nonparametric method or else rely on the robustness of a test derived for a model that may not be true for the situation at hand.)
  17. (pp. 180-181) Example 6.2 nicely illustrates hypothesis testing.
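
Regarding comments 6, 7, and 10 above: here is a minimal Python sketch of the kind of software computation I have in mind. The setup (X ~ binomial(20, p), an alternative of p > 0.6, and the particular rejection cutoff) is made up for illustration and isn't meant to reproduce the book's TABLE 6.1; it just shows how such values can be obtained, and how the discreteness of X limits the sizes that are attainable.

```python
# Computing the type I error probability and power for a one-sided
# binomial test. Hypothetical setup (not the book's TABLE 6.1):
# H0: p = 0.6 vs H1: p > 0.6, based on X ~ Binomial(n = 20, p),
# rejecting H0 when X >= c.
from scipy.stats import binom

n, p0, c = 20, 0.6, 17

# P(X >= c) = binom.sf(c - 1, n, p), since sf(k) gives P(X > k)
alpha = binom.sf(c - 1, n, p0)          # prob. of a type I error
print(f"alpha = P(X >= {c} | p = {p0}) = {alpha:.4f}")

# The power function evaluated at a few alternatives p > 0.6:
for p in (0.7, 0.8, 0.9):
    print(f"power at p = {p}: {binom.sf(c - 1, n, p):.4f}")

# Because X is discrete, only certain sizes are attainable (cf. the
# level vs. size distinction in comment 10): for this n and p0, no
# choice of cutoff c gives a size of exactly 0.05.
for cc in range(14, 19):
    print(f"c = {cc}: size = {binom.sf(cc - 1, n, p0):.4f}")
```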
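And regarding comment 14: the power of a two-tailed test at an alternative is just the sum of two tail probabilities. A minimal sketch, again with made-up numbers (X ~ binomial(20, p), rejecting the null hyp. p = 0.5 when X <= 5 or X >= 15):

```python
# Power for a two-tailed binomial test is a sum of two tail probabilities.
# Hypothetical setup: H0: p = 0.5 vs H1: p != 0.5, X ~ Binomial(20, p),
# rejecting H0 when X <= 5 or X >= 15.
from scipy.stats import binom

n, c_low, c_high = 20, 5, 15

def power(p):
    # P(X <= c_low) + P(X >= c_high) --- just add the two tail probabilities
    return binom.cdf(c_low, n, p) + binom.sf(c_high - 1, n, p)

print(f"size (power at p = 0.5): {power(0.5):.4f}")
for p in (0.2, 0.35, 0.65, 0.8):
    print(f"power at p = {p}: {power(p):.4f}")
```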

Chapter 7


Chapter 8

  1. (pp. 233-237) It's important to keep in mind that just because I don't make a comment about a particular page, it doesn't mean that the page isn't important. My comments are meant to add something extra when I think some clarification may be in order.
  2. (p. 237) In my class notes I don't emphasize the term standard error. But since it's commonly used, I should emphasize it a bit more. So please make a note of its simple definition at the bottom of the page.
  3. (pp. 238-239) Stating that an approximate 95.4% error bound for the sample mean is +/- 2 estimated S.E. overstates the precision unless the sample size is rather large. Unless the random variables are normally distributed, the sample mean won't be exactly normally distributed, but only approximately so. Also, precision is lost when the true standard deviation is replaced by an estimate of the standard deviation.
  4. (p. 240) Point (b) is good. People aren't consistent in their use of the a +/- b notation, and so unless more information is provided, one doesn't know how to interpret something like 53.4 +/- 4.6. Also, it isn't clear what is meant by the phrase "margin of error." One way to get around the confusion is to report a confidence interval (and state that you're reporting a confidence interval, being sure to give the confidence level).
  5. (p. 246) The 1st 3 sentences are extremely important. They give the dos and don'ts of how one should state the results when a confidence interval is determined. Also, I prefer to always write the result as an interval, e.g. (41.1, 44.3), instead of writing something like 41.1 < mu < 44.3 (because we don't know for sure that mu is trapped between the two confidence bounds). (Points (a), (b), and (c) on p. 247 provide a good summary.)
  6. (p. 248) Note that the book isn't indicating what is meant by large (with regard to the value of n). How large n has to be for the approximation to be good depends on (i) how good we want it to be, & (ii) the type and degree of nonnormality of the distribution underlying the data. In some cases n = 50 (or even less) may result in a great approximate confidence interval, and in other cases n = 500 may not be large enough. (The simulation sketch following this list illustrates this.)
  7. (pp. 248-249) I think it would be a good idea to use more significant digits than the book does in the example. If one wants 2 significant digits to be reported in the final answer, then keep 4 or more digits in the calculations, and round to 2 digits only at the final step. Similarly, if one wants to report 3 digits, then keep 5 or more digits until the final step. Also, I do think one should avoid reporting a lot of significant digits in the final answer (that is, rounding at the final step is good). Not only are figures with a lot of significant digits somewhat hard to digest with a quick glance, but they reflect more accuracy than is warranted, since in most cases we are making assumptions and approximations (and so it's misleading to report a lot of digits).
  8. (p. 250) Just because the book designates n < 30 as being small, one shouldn't assume that n >= 30 is large enough for the large sample approximate confidence interval given on p. 248 to be highly accurate in all cases. In some cases (particularly if the distribution underlying the data is highly skewed), n = 300 may not be large enough (although such cases are somewhat rare). But when n < 30, we shouldn't think that the distribution underlying the data has to be exactly normal, since if we insisted on that we could hardly ever make use of the interval (it would be very rare indeed to have data from exactly a normal dist'n).
  9. (p. 262) Again, it can be dangerous to take the n >= 30 rule of thumb too seriously --- how large the sample size needs to be depends very much on the nature of the nonnormality.
  10. (p. 266) Note carefully the last sentence of the 1st paragraph in Sec. 8.7 (and the last sentence on p. 269) --- the test and confidence interval derived under the assumption of normality can perform poorly (and be misleading) if the assumption of normality is too badly violated.
  11. (pp. 266-269) I cover material related to this section in Unit 5 of my course notes (which will be covered only briefly during lecture close to the end of the semester --- but I will suggest that you read my short Unit 5 and work a bonus homework problem which pertains to variances (which I'll give you towards the end of the semester)).
  12. (p. 270) Point (b) is very important. But they are wrong to suggest that serious errors are not to be worried about if n is at least 15.
  13. (p. 271) Point (d) is important.
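
Regarding comments 3, 6, 8, and 9 above: a small Monte Carlo simulation can show how the coverage of the large-sample interval xbar +/- 1.96(s/sqrt(n)) depends on the sample size when the distribution underlying the data is skewed. This sketch is my own illustration (not from the book); the choice of exponential data and the particular sample sizes are arbitrary.

```python
# Monte Carlo check of how well the large-sample interval
# xbar +/- 1.96 * s / sqrt(n) covers the true mean when the data come
# from a skewed distribution. Illustrative setup: exponential data
# (true mean = 1); the sample sizes tried are arbitrary choices.
import numpy as np

rng = np.random.default_rng(1)
reps, true_mean = 10_000, 1.0

for n in (15, 30, 100, 500):
    x = rng.exponential(scale=true_mean, size=(reps, n))
    xbar = x.mean(axis=1)
    s = x.std(axis=1, ddof=1)
    half = 1.96 * s / np.sqrt(n)    # half-width of each interval
    covered = (xbar - half <= true_mean) & (true_mean <= xbar + half)
    # estimated coverage should fall short of the nominal 0.95 for small n
    print(f"n = {n:4d}: estimated coverage = {covered.mean():.3f}")
```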