Some Comments about Chapter 6 of Samuels & Witmer
Section 6.1
- (p. 179) The 1st paragraph is a good summary of some things that
have been touched on before. With Ch. 6 we will start to make some
inferences beyond just giving the value of a point estimate. To add a
bit more to what is in the paragraph, there are two strategies that will
be investigated: (1) giving a point estimate, along with an estimate of
its standard error in order to indicate something about the uncertainty
associated with the point estimate; (2) giving an interval estimate (also
known as a confidence interval) in
order to have a more precise way to express an estimate in the presence
of uncertainty.
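As a small illustration of both strategies, here is a hedged sketch in Python (my own example, not from S&W; the data values are made up, and numpy/scipy are assumed to be available):

    import numpy as np
    from scipy import stats

    # hypothetical sample of measurements (not the book's data)
    y = np.array([14.2, 15.1, 13.8, 14.9, 15.4, 14.0, 14.7, 15.2])
    n = len(y)

    ybar = y.mean()                     # point estimate of the distribution mean
    se = y.std(ddof=1) / np.sqrt(n)     # estimated standard error of the sample mean

    # strategy (1): report the point estimate along with its estimated standard error
    print(f"ybar = {ybar:.2f}, SE = {se:.2f}")

    # strategy (2): report a 95% confidence interval (t-based, covered in Sec. 6.3)
    tcrit = stats.t.ppf(0.975, df=n - 1)
    print(f"95% CI: ({ybar - tcrit*se:.2f}, {ybar + tcrit*se:.2f})")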
- (pp. 179-180, Example 6.1) Sometimes I like to refer to the
estimand as the distribution mean (which is the mean of the
parent distribution associated with the observed sample) instead of the
population mean. In a setting like this one, it's not like we view the
sample as a random sample selected from some finite population.
Although I suppose we could view the seeds used to produce the
plants as a sample that is representative of some population of
seeds, not every seed in that population produces a plant grown under
the same conditions as those in the observed sample, so the population
corresponding to the sample seems a bit nebulous --- its members are the
measurements that would be made if more plants were grown from other
seeds under the same conditions. Because of this, I like to view the
observations as the observed values of random variables that have some
distribution --- and if more plants were grown under the same conditions,
their stem length measurements would be the observed values of other random
variables having the same distribution. *** The last two sentences of the
example (on p. 180) serve to remind us of some things that have been
stated previously.
Section 6.2
- (p. 180) I'm one of the "Some statisticians" referred to in the
footnote --- the standard error is the actual standard deviation of the
statistic's sampling distribution, and not the estimate of this value.
(Note: If one considers just a single random variable, one can
refer to its standard deviation. But the standard deviation of a
statistic, which is just a function of a sample of random variables, is
called its standard error --- but it's just a standard deviation.
Calling it the standard error makes it less confusing since we sometimes
want to also refer to the standard deviation of the parent distribution
of the data.)
- (p. 181, paragraph following the first example) If an estimator
has approximately a normal distribution, then in many cases it'll take a
value within one standard error of the estimand with a probability of
about 0.68, and it'll take a
value within two standard errors of the estimand with a probability of
about 0.95. But the sampling distribution of some estimators is not
approximately normal, and in these cases the standard error isn't as
meaningful, and it's better to find some way to give a confidence
interval. *** Also, the last sentence of the paragraph touches upon
some important points.
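The 0.68 and 0.95 figures come straight from the normal distribution, and
they are easy to check in software; a quick sketch (assuming scipy is available):

    from scipy import stats

    # P(|Z| <= 1) and P(|Z| <= 2) for a standard normal random variable Z
    print(stats.norm.cdf(1) - stats.norm.cdf(-1))   # about 0.683
    print(stats.norm.cdf(2) - stats.norm.cdf(-2))   # about 0.954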
- (p. 181, Standard Error Versus Standard Deviation) This
paragraph relates to the parenthetical note I have above in my comments
about p. 180. One thing to keep in mind is that almost every statistic
has a standard error --- not just the sample mean. Because of this,
one shouldn't just say or write standard error unless it is clear
what estimator is being referred to --- if there would be any doubt,
better to put standard error of the sample mean (or whatever the
estimator is). (Also, it isn't really proper to refer to the standard
error of an estimate, since an estimate is just a number, and not
a random variable. However, we can refer to the estimated standard
error associated with an estimate --- it is the estimated standard error
of the estimator which produced the estimate.) For estimators other
than the sample mean, the standard error isn't just the standard
deviation divided by the square root of n.
(I've known of both students and faculty members here at GMU to make
mistakes regarding this point.)
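To illustrate the point that s divided by the square root of n estimates the
standard error of the sample mean only, here is a hedged sketch (made-up data;
numpy assumed) that uses the bootstrap, which is one common way --- though not
the only way --- to estimate the standard error of some other statistic, here
the sample median:

    import numpy as np

    rng = np.random.default_rng(1)
    y = rng.exponential(scale=10, size=25)      # hypothetical skewed sample

    # s/sqrt(n) is an estimate of the standard error of the sample MEAN only
    se_mean = y.std(ddof=1) / np.sqrt(len(y))

    # for another statistic (the median), one way to estimate its standard error
    # is the bootstrap: resample, recompute the statistic, take the SD of the results
    boot_medians = [np.median(rng.choice(y, size=len(y), replace=True))
                    for _ in range(5000)]
    se_median = np.std(boot_medians, ddof=1)

    print(se_mean, se_median)                   # generally not the same value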
- (p. 181, footnote at bottom of page) While there perhaps are other
reasonable schemes, this seems like a nice one to follow. It is bad
practice to report estimates based on random samples using too many
digits --- it just doesn't always make sense to give 4 or 5 significant digits
(see Appendix 6.1 if you need a refresher about significant digits,
rounding, and scientific notation)
if sampling variation results in an estimate for which there is no
guarantee that even 1 or 2 significant digits are correct (with correct
meaning that they match the actual unknown value being estimated).
Reporting values with a lot of significant digits seems to imply more
accuracy than is really warranted.
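If it helps, here is one way (a sketch of my own, not a rule from the book) to
round a reported value to a given number of significant digits in Python:

    from math import floor, log10

    def round_sig(x, digits=2):
        # round x to the given number of significant digits
        if x == 0:
            return 0.0
        return round(x, -int(floor(log10(abs(x)))) + (digits - 1))

    print(round_sig(31.62))       # 32.0
    print(round_sig(0.004271))    # 0.0043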
- (p. 182, 2nd and 3rd lines) The phrase "the variability from one
lamb to the next" isn't a good one to use here. The sample standard
deviation is a measure pertaining to the dispersion of the values in a
sample about the sample mean, and not really a measure giving some
specific information about consecutive values in a sample.
- (p. 182, Fig. 6.3) Note that when the sample size is
increased by a factor of 100, from 28 to 2800, the standard error only
decreases by a factor of 10. Similarly, due to the square root of
n in the denominator of the standard error of the sample mean,
the sample has to be made 4 times larger to cut the standard error in
half (see the little numerical sketch at the end of this comment). So if
you do an experiment and find that the standard error is way too large,
using just a few more observations wouldn't have made much difference.
(Sometimes I urge people to increase their sample size from 8 to 12 ---
not so much because it will greatly reduce the standard error, but
usually so that certain approximations become more trustworthy, and
better diagnostic checks can be done.)
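The square-root effect is easy to see numerically; a small sketch (the standard
deviation used here is made up):

    import numpy as np

    s = 20.0    # hypothetical sample standard deviation
    for n in (28, 112, 448, 2800):
        print(n, s / np.sqrt(n))
    # quadrupling n only halves s/sqrt(n);
    # going from n = 28 to n = 2800 (100 times larger) only divides it by 10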
- (p. 183, Fig. 6.5) People aren't consistent in their use of
error bars --- some use bars for +/- 1 estimated standard error, others
use bars for a confidence interval (which is often close to +/- 2
estimated standard errors), and still others use +/- 1 sample standard
deviation, or use bars in place of a boxplot (so covering the range of the middle
50% of the data). Also see first paragraph on p. 184.
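Because of this inconsistency, a figure caption should always state what the
bars represent. Here is a minimal matplotlib sketch of +/- 1 estimated standard
error bars (the group labels and numbers are made up):

    import matplotlib.pyplot as plt

    means = [14.5, 16.2]    # hypothetical sample means for two groups
    ses = [0.6, 0.8]        # hypothetical estimated standard errors
    x = [0, 1]

    plt.errorbar(x, means, yerr=ses, fmt="o", capsize=4)
    plt.xticks(x, ["control", "treated"])               # hypothetical group labels
    plt.ylabel("mean response (+/- 1 estimated SE)")    # say what the bars show!
    plt.xlim(-0.5, 1.5)
    plt.show()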
Section 6.3
- (p. 187) t_{0.025} is often written t_{n-1, 0.025}, or
something like t_{19, 0.025} if the sample size is 20.
I don't think it's good to refer to it as "the two-tailed 5% critical
value" --- most books refer to it as the 0.025 critical value, or
something similar to this.
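In software, the 0.025 critical value with n - 1 degrees of freedom is just an
upper quantile of the t distribution; for example (a sketch, assuming scipy is
available):

    from scipy import stats

    n = 20
    t_crit = stats.t.ppf(1 - 0.025, df=n - 1)   # t_{19, 0.025}
    print(t_crit)                               # about 2.093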
- (p. 188, Fig. 6.9) The probit plot in part (b) is
consistent with approximate normality, but I wouldn't feel nearly as
comfortable arriving at such a conclusion if all I had to go on was the
histogram of part (a).
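A probit (normal probability) plot is easy to produce with scipy and
matplotlib; a minimal sketch, using made-up data:

    import numpy as np
    from scipy import stats
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(2)
    y = rng.normal(loc=50, scale=5, size=30)    # hypothetical sample

    stats.probplot(y, dist="norm", plot=plt)    # points close to the line are
    plt.show()                                  # consistent with approximate normality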
- (p. 190, Remark) This part isn't really important.
- (p. 191) Be careful to note that the first probability expression
presented under the figure isn't a proper thing to write, for the reason
stated in the book. I don't even like expressions such as (6.3) at the
bottom of p. 190 (and the two similar expressions on p. 189), because to
me they indicate that the actual mean is between the two numerical
values, and we don't know that this is the case. Instead I like the
interval expressions given on pages 192 and 193 --- when one gives a
point estimate, one gives a single value (a point); when one gives an
interval estimate, one should express it as an interval (and it
should be kept in mind that the interval is still an estimate --- we
don't know for sure that the estimand is between the two endpoints of the
interval).
- (p. 191, last 5 lines) This part is really important --- the 95% is
because the method has a probability of 0.95 of working (that is, of
producing an interval that contains the estimand).
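This "probability that the method works" interpretation can be checked by
simulation; here is a sketch (my own, not from the book) that repeatedly draws
samples from a normal distribution with a known mean and counts how often the
95% t interval contains that mean:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    true_mean, n, reps = 10.0, 15, 10000
    tcrit = stats.t.ppf(0.975, df=n - 1)

    hits = 0
    for _ in range(reps):
        y = rng.normal(loc=true_mean, scale=3.0, size=n)
        se = y.std(ddof=1) / np.sqrt(n)
        lo, hi = y.mean() - tcrit * se, y.mean() + tcrit * se
        if lo <= true_mean <= hi:
            hits += 1

    print(hits / reps)   # should be close to 0.95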
- (p. 193) Look at the computer output. Since the
(estimated) standard error is
36 when rounded to two significant digits, the endpoints of the
confidence interval should be rounded to the nearest integer. But
instead of rounding the estimated standard error and the sample mean
first and then creating the interval (as seems to have been done in the
preceding example), it's better to carry more significant digits through
to the final step, and to round to the nearest integer only when
reporting the final interval estimate (see the sketch at the end of this
comment). Of course, we can often let the
software do a lot of the grubby work for us. But we won't always use
the number of significant digits indicated in the output. For the
output on this page, we can conclude that rounding should be to the
nearest integer, and then we would look at the interval in the output
and report it as (255, 384) (as opposed to using 385 for the upper
confidence bound). Note that we have three significant digits for each
of the confidence bounds (the endpoints of the interval), and this is
plenty considering that the interval is so wide, suggesting that we
cannot estimate the mean very accurately given the observations
available.
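Here is the kind of "round only at the end" calculation I have in mind, as a
sketch (the data values are made up, not the ones behind the output on p. 193;
numpy/scipy assumed):

    import numpy as np
    from scipy import stats

    y = np.array([210., 380., 290., 410., 250., 340., 300., 270.])   # hypothetical
    n = len(y)

    ybar = y.mean()
    se = y.std(ddof=1) / np.sqrt(n)
    tcrit = stats.t.ppf(0.975, df=n - 1)

    lo, hi = ybar - tcrit * se, ybar + tcrit * se    # full precision kept here
    print(round(lo), round(hi))                      # round only when reporting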
Section 6.4
The first paragraph makes a very important point: large standard errors
associated with your estimates may make them of little practical value.
I've seen plenty of studies done by GMU students, some involving data
collection over a two year period, where no firm conclusions could be
reached due to too much uncertainty associated with the inferences.
The second through the fourth paragraphs point out that the standard
error associated with the sample mean depends upon two things: the
variability associated with the individual measurements, and the sample
size. Although in some cases one may design an experiment in such a way
as to reduce experimental noise due to factors not of central interest,
there is typically some natural variability due to the fact that members
of the population of interest are different --- this variability is part
of the phenomenon/population/process being studied, and we just have to
deal with it as we make inferences.
The rest of the section focuses on determining the proper sample size so
that the standard error of the sample mean won't be too large. While
this is nice, and at times may prevent you from thinking that you need a
larger sample size than you do, my experience is that for the vast majority
of cases when people do
such calculations, the sample size arrived at is unreasonably large
given their resources, and they reach the conclusion that they
should use the largest sample size that they can manage. It's seldom
the case that when all of the data has been collected and the analysis is done, it is determined that
the sample size was perfect, or too large --- typically, one wishes
that a larger sample size could have been used in order to reduce some
of the uncertainty associated with the results.
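For the record, the calculation the section describes amounts to solving
s/sqrt(n) <= (desired SE) for n; a hedged sketch (the guessed standard
deviation and the target are made up):

    import math

    s_guess = 20.0      # guessed SD of the individual measurements (hypothetical)
    desired_se = 2.5    # largest standard error of the mean we're willing to accept

    n_needed = math.ceil((s_guess / desired_se) ** 2)
    print(n_needed)     # 64 here --- and often discouragingly large in practice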
Section 6.5
- (p. 201) As is stated in S&W, how large n needs to be
depends on the degree of nonnormality. For mild and "nice"
nonnormality, a dozen observations might be fine. (Note that if one has
fewer than 10 observations it's very hard to check for approximate
normality. Histograms are of little use, and some probit plot patterns
that are consistent with approximate normality can result from a
sample from a distribution which isn't approximately normal. Sometimes
one might hope that the inferences will be okay if the probit plot is
consistent with approximate normality, but one shouldn't be highly
confident about such inferences.) For highly skewed distributions, over
100 observations may be needed to obtain an accurate interval estimate
(and it may take a few hundred observations for hypothesis testing).
- (p. 201, Table 6.4) It's odd that the performance with
samples of sizes 2 and 4 is better than the performance with a sample of
size 8 for 99% confidence intervals for the mean of Population 3. But it's
worth noting that in none of the cases did the intervals cover 99% of the
time. Also, the intervals could be so wide as to be nearly useless as
estimates. You'll see that even with 10 observations it's common to
have intervals which are so wide that they don't do a good job of
providing a meaningful estimate of the unknown mean.
- (p. 202, blue box) Condition 2 (b) would be better if "large" were
replaced with sufficiently large, since otherwise it seems to
imply that there is some threshold that may always work, whereas the
threshold depends upon the degree of nonnormality. (This is suggested
by the three lines following the blue box, but it would be better to get
the point across in the statement of the conditions.)
- (p. 203, 3rd paragraph) I seldom bother to make a histogram. With
regard to checking approximate normality, I don't think a histogram adds
anything if I've already done a probit plot.
- (p. 204) The last paragraph of the section makes an important
point. Transformations to normality are okay if you're content to make
a statement about the distribution associated with the transformed
values, but there is no way to take an inference about the mean based on
transformed values and do some sort of inverse transformation to arrive
at a good inference about the mean of the parent distribution of the
original (untransformed) observations.
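A sketch of the problem (the lognormal distribution is my choice for
illustration, not the book's): exponentiating a t interval built from
log-transformed data gives an interval that tends to cover the median of the
original distribution, not its mean.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(4)
    y = rng.lognormal(mean=1.0, sigma=1.0, size=40)   # skewed hypothetical data
    # true mean = exp(1.5), about 4.48; true median = exp(1), about 2.72

    logy = np.log(y)
    se = logy.std(ddof=1) / np.sqrt(len(y))
    tcrit = stats.t.ppf(0.975, df=len(y) - 1)
    lo, hi = logy.mean() - tcrit * se, logy.mean() + tcrit * se

    # back-transformed interval: it targets the MEDIAN (about 2.72), not the mean
    print(np.exp(lo), np.exp(hi))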
Section 6.6
- (p. 207) The alternative estimate, (y + 2)/(n + 4),
isn't recommended as a replacement of the simple sample proportion,
y/n, as a point estimate. But it works nicely as the
center of a confidence interval for p. In a sense, it's a way of
doing a better central limit theorem approximation than the simple one
which is given in most introductory books.
- (p. 207) The blue box doesn't give the standard error of the
alternative estimator of p --- it gives an estimate of the
standard error (and not the most obvious or best estimate to use, but
rather one that is relatively simple and seems to work okay).
- (p. 208) The blue box formula works only if the desired confidence
level is 95%. If some other level is desired, one needs to make the
adjustments indicated in the Other Confidence Levels subsection
on pp. 209-210. SPSS doesn't seem to do any confidence intervals for
p (and some other statistical software packages are similar --- I
guess the thinking is that one can easily do them with a calculator),
and so for the sake of simplicity I'll just have you do 95% confidence
intervals in this setting (and you can read in the book to find out what
adjustments should be made for other confidence levels).
It can be noted that the interval in the blue box is equivalent to the
standard approximate interval given in the footnote at the bottom of p.
208 if one adds 2 successes and 2 failures to the sample.
(Note:
S&W have some things wrong in Appendix 6.2 on pp. 621-622.
The Wilson interval is the same as the score interval.
It works better
than the simpler Wald interval because it uses a normal approximation in
a better way.
The interval given in the blue box on p. 208 is the
Agresti-Coull interval, and it's just an approximation of the 95%
Wilson/score interval, with the approximation making
things simpler).
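Here is a sketch of the 95% interval in the blue box (the Agresti-Coull
interval), alongside the simpler Wald interval from the footnote, using made-up
counts:

    import math

    y, n = 9, 40    # hypothetical number of successes and sample size
    z = 1.96        # for 95% only; other levels need other z values

    # Wald (standard approximate) interval
    p_hat = y / n
    se_wald = math.sqrt(p_hat * (1 - p_hat) / n)
    wald = (p_hat - z * se_wald, p_hat + z * se_wald)

    # Agresti-Coull: add 2 successes and 2 failures, then do the Wald calculation
    p_tilde = (y + 2) / (n + 4)
    se_ac = math.sqrt(p_tilde * (1 - p_tilde) / (n + 4))
    agresti_coull = (p_tilde - z * se_ac, p_tilde + z * se_ac)

    print(wald, agresti_coull)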
- (p. 209, Example 6.17) When a confidence bound winds up
outside of the set of possible values for the parameter being estimated
(in this case the lower bound is negative but we know that p
cannot be smaller than 0), it indicates that the approximation should
not be used!
(S&W have done a bad bad thing here!)
For values of p near 0 or 1, n has to be rather large for
the approximations to work sufficiently well. In this case, 15 is just
too small a sample size. In a case like this, I'll certainly use an
exact confidence interval of some sort, even though such intervals tend
to be messy (but that's not really a problem with some software). One
exact method gives
(0, 0.22) as a 95% confidence interval. Note that not only is the lower
bound reasonable, but the upper bound isn't as large as the one given in
S&W. So even though some don't seem to like the exact methods (see
bottom of p. 622), in this
case the exact method is more accurate, and it yields a shorter interval
(which is desirable).
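One exact method is the Clopper-Pearson interval, which can be computed from
beta distribution quantiles; a sketch assuming (purely for illustration) 0
successes in 15 trials, which reproduces roughly the (0, 0.22) interval
mentioned above:

    from scipy import stats

    y, n, alpha = 0, 15, 0.05    # hypothetical counts, chosen for illustration

    lower = 0.0 if y == 0 else stats.beta.ppf(alpha / 2, y, n - y + 1)
    upper = 1.0 if y == n else stats.beta.ppf(1 - alpha / 2, y + 1, n - y)
    print(lower, upper)          # roughly (0, 0.22)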
Section 6.7
- (p. 214, the blue box) Note that the proper t critical
values are indicated for 90% and 99% intervals. This isn't so important
to us because SPSS will allow us to specify whatever confidence level
is desired. Nevertheless, hopefully you can see why those values are
needed for 90% and 99% intervals.
- (p. 214, the blue box) It should be stated that n has to be
sufficiently large in order for the approximate confidence interval
methods to work well. (See my comments related to
Example 6.17 above.)