Some Comments about Chapter 4 of Samuels & Witmer



Note that even though the chapter is titled The Normal Distribution, there is not just one normal distribution, but rather a whole family of them.

I plan to cover this chapter rather quickly in class, and in particular I don't want to spend a lot of time explaining the use of Table 3 on pp. 675-676 (since you should be able to figure it out on your own given the examples in the book, and since software can be used to help obtain such probabilities). In order to have more time to discuss more advanced material, we'll have to not spend too much time on the simpler stuff.


Section 4.1

  1. (p. 119, 2nd paragraph) Normal distributions are often (too often, actually) used as a model for many phenomena. For example, observations in a randomly selected sample of otter weights might be approximately modeled as a sample from an appropriate normal distribution (i.e., one having a suitable mean and variance). (Also see Example 4.1 on p. 119 and Fig. 4.1 on p. 120.) If an assumption of (approximate) normality is justified, an inference method derived by assuming a sample from a normal distribution ought to be a good one to use. (However, it should be kept in mind that even though statistical procedures based on assumptions of normality are commonly used, such an inference method might not be so good to use if a normal model is not a good one in a particular situation. We don't expect to have natural phenomena follow normal distributions exactly, but we need to be aware that if the parent distribution of the sample is too much different from a normal distribution, inferences based on an assumption of (approximate) normality may be appreciably nonoptimal, especially if the sample size is rather small.) A particular normal distribution, called the standard normal distribution (which is just the normal distribution having mean 0 and variance 1), is also useful for approximating the sampling distribution of an estimator or test statistic in many cases. (Sampling distributions are the subject of Ch. 5.)
  2. (p. 121) If one takes many measurements of the same object, then any variation in the values can be attributed to measurement error. (See Example 4.4 on p. 121.) If one has a very precise way of measuring and takes one measurement of many different objects, then the observed variation is almost entirely due to differences among individuals in the population. Often variation among observations in a sample is due to a combination of measurement error and population differences. (See Example 2.14 on pp. 23-24.) In such a case, one way to get a good idea about the relative contributions of the two sources of variation is to make more than one measurement on each of the individuals selected from the population.
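
To make the last point concrete, here is a minimal simulation sketch (in Python, assuming numpy is available; the sample sizes and standard deviations are made up for illustration and are not from S&W). It generates two measurements on each of 50 individuals and then uses the usual balanced one-way decomposition to estimate the measurement-error variance and the between-individual variance separately.

    import numpy as np

    rng = np.random.default_rng(1)

    # Hypothetical setup: 50 individuals, each measured twice.
    n_indiv, n_meas = 50, 2
    true_sd, error_sd = 10.0, 3.0      # population spread vs. measurement error

    true_values = rng.normal(100.0, true_sd, size=n_indiv)
    data = true_values[:, None] + rng.normal(0.0, error_sd, size=(n_indiv, n_meas))

    # Balanced one-way (random effects) decomposition:
    indiv_means = data.mean(axis=1)
    grand_mean = data.mean()
    ms_between = n_meas * np.sum((indiv_means - grand_mean) ** 2) / (n_indiv - 1)
    ms_within = np.sum((data - indiv_means[:, None]) ** 2) / (n_indiv * (n_meas - 1))

    error_var_est = ms_within                             # estimates measurement-error variance
    between_var_est = (ms_between - ms_within) / n_meas   # estimates between-individual variance

    print(f"measurement-error variance estimate: {error_var_est:.1f} (true value 9)")
    print(f"between-individual variance estimate: {between_var_est:.1f} (true value 100)")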

Section 4.2

  1. (p. 122) While some books do use the notation indicated on the 4th line of Sec. 4.2, I think more commonly the value of the variance is given as the 2nd parameter value instead of the value of the standard deviation. That is, in a lot of books, N(100, 25) is used to denote the normal distribution with mean 100 and variance 25, while in S&W this would indicate a standard deviation of 25.
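
As a quick illustration of the two conventions, here is a sketch in Python (assuming scipy is available; scipy's norm is parameterized by the mean, loc, and the standard deviation, scale).

    from scipy.stats import norm

    # S&W's convention: N(100, 25) means mean 100 and standard deviation 25.
    # The more common convention: N(100, 25) means mean 100 and variance 25,
    # i.e., standard deviation 5.
    sw_reading = norm(loc=100, scale=25)
    variance_reading = norm(loc=100, scale=5)

    print(sw_reading.cdf(125))        # P(X <= 125) is about 0.841 when the sd is 25
    print(variance_reading.cdf(125))  # nearly 1 when the sd is 5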

Section 4.3

  1. (p. 125) More precisely, 95% of the probability mass is within about 1.96 (as opposed to 2) standard deviations of the mean (which means that a normal random variable assumes a value within about 1.96 standard deviations of its mean with probability 0.95). (I put "about" here because the precise value corresponding to 95% of the probability mass isn't exactly 1.96 --- but rounded to 4 decimal places, it's 1.9600.) (See the short check following this list.)
  2. (p. 129) Most books use z0.05 instead of Z0.05 (and in fact I cannot recall seeing an upper case letter used like this in any other book). I plan to use lower case for this, and I want you to use lower case too. Some books use z0.95 to indicate the value that separates the upper 5% of the probability mass from the lower 95%, but z0.05 is more commonly used for this purpose.
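
These values are easy to check with software. Here is a small sketch in Python (assuming scipy is available).

    from scipy.stats import norm

    # The central 95% of a standard normal: the cutoff is the 0.975 quantile.
    print(norm.ppf(0.975))    # about 1.959964, i.e., 1.9600 to 4 decimal places

    # z_0.05: the value separating the upper 5% of the probability mass
    # from the lower 95%.
    print(norm.ppf(0.95))     # about 1.6449

    # Check: probability within 1.96 standard deviations of the mean.
    print(norm.cdf(1.96) - norm.cdf(-1.96))   # about 0.9500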

Section 4.4

  1. (p. 133) The facts referred to on the bottom of p. 133 are sometimes collectively called "the empirical rule." I've never bothered to use the "empirical rule" to assess approximate normality, since there are better methods to employ.
  2. (pp. 134-135, Normal Probability Plots) These plots are called many different things. I tend to call them probit plots. They are a special case of q-q plots (quantile-quantile plots). They should be your chief method for assessing approximate normality. (A short software sketch for making such a plot follows this list.)
  3. (p. 134) The last sentence on this page is a good one --- using a histogram is not a good way to assess approximate normality.
  4. (p. 135) Even if the parent distribution of the data is normal, the points in the plot are not expected to be on a perfectly straight line. This point is made on p. 136.
  5. (p. 136, Fig. 4.26) The plots shown are all based on a sample of size 20. For larger sample sizes, the straight line pattern should be more prominent, but for smaller sample sizes we can generally expect more wiggle. Plots made from samples for which the parent distribution is approximately normal are hard to distinguish from those made from samples for which the parent distribution is exactly normal. Fortunately, in applied statistics, we only have to determine if the parent distribution is approximately normal (we never expect to have data from a normal distribution).
  6. (p. 138, Computer note) Before the widespread use of computers, statisticians used special graph paper to make the kind of plots which are now so easy to make using software. SPSS, Minitab, and some other commonly used statistical software packages plot the data values on the horizontal axis and the normal scores on the vertical axis, and this is the way I'm used to doing it. (I'll show you in class how the plots on p. 137 would look if they were redone the way that SPSS makes such plots. The shapes indicated on p. 137 should not be used to interpret the plots made using SPSS.)
  7. (pp. 138-139, Transformations for Nonnormal Data) While such transformations are good for some applications, for certain kinds of analyses (that I'll discuss later) they can lead to misleading results.
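
As an illustration of how easy such plots now are to make, here is a minimal sketch in Python (assuming numpy, scipy, and matplotlib are available; the sample is simulated, not from the book). Note that scipy's probplot puts the normal scores on the horizontal axis and the data on the vertical axis, so the axes are swapped below to match the SPSS/Minitab orientation described in item 6.

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy import stats

    rng = np.random.default_rng(0)
    x = rng.normal(loc=50, scale=8, size=20)   # simulated sample of size 20

    # probplot returns the normal scores (theoretical quantiles) and the sorted data.
    (normal_scores, sorted_data), _ = stats.probplot(x, dist="norm")

    # SPSS/Minitab-style orientation: data on the horizontal axis,
    # normal scores on the vertical axis.
    plt.scatter(sorted_data, normal_scores)
    plt.xlabel("observed value")
    plt.ylabel("normal score")
    plt.title("normal probability (probit) plot")
    plt.show()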

Section 4.5

  1. (p. 142 & p. 144) The last paragraph of the section on p. 144 explains when to use the continuity correction, and part (a) on p. 142 shows a typical application.
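
To show what the continuity correction does, here is a small sketch in Python (assuming scipy is available; the values n = 50, p = 0.3, and the event X <= 18 are made up for illustration and are not the book's example).

    from math import sqrt
    from scipy.stats import binom, norm

    n, p = 50, 0.3                    # hypothetical binomial setup
    mu, sd = n * p, sqrt(n * p * (1 - p))

    k = 18
    exact = binom.cdf(k, n, p)                        # exact P(X <= 18)
    no_correction = norm.cdf((k - mu) / sd)           # normal approximation, no correction
    with_correction = norm.cdf((k + 0.5 - mu) / sd)   # with the continuity correction

    print(f"exact:           {exact:.4f}")
    print(f"no correction:   {no_correction:.4f}")
    print(f"with correction: {with_correction:.4f}")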

Section 4.6

  1. (p. 145) The second-to-last paragraph of the section states that methods developed specifically for samples from normal distributions can work well even if the parent distribution of the data is not normal. It should also be added that in some situations other methods will perform much better than those based on an assumption of normality --- the type and degree of nonnormality affect the robustness of normal-theory procedures to nonnormality.
  2. (p. 145) The last paragraph of the section suggests that skewed distributions are better models for some phenomena than normal distributions are.