Some Comments about Chapter 6 of Samuels & Witmer
Section 6.1
- (p. 179) The 1st paragraph is a good summary of some things that
have been touched on before. With Ch. 6 we will start to make some
inferences beyond just giving the value of a point estimate. To add a
bit more to what is in the paragraph, there are two strategies that will
be investigated: (1) giving a point estimate, along with an estimate of
its standard error in order to indicate something about the uncertainty
associated with the point estimate; (2) giving an interval estimate (also
known as a confidence interval) in
order to have a more precise way to express an estimate in the presence
of uncertainty.
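As a small illustration of both strategies, here is a hedged sketch in Python (my own example, not from S&W; the data values are made up, and numpy/scipy are assumed to be available):

    import numpy as np
    from scipy import stats

    # hypothetical sample of measurements (not the book's data)
    y = np.array([14.2, 15.1, 13.8, 14.9, 15.4, 14.0, 14.7, 15.2])
    n = len(y)

    ybar = y.mean()                     # point estimate of the distribution mean
    se = y.std(ddof=1) / np.sqrt(n)     # estimated standard error of the sample mean

    # strategy (1): report the point estimate along with its estimated standard error
    print(f"ybar = {ybar:.2f}, SE = {se:.2f}")

    # strategy (2): report a 95% confidence interval (t-based, covered in Sec. 6.3)
    tcrit = stats.t.ppf(0.975, df=n - 1)
    print(f"95% CI: ({ybar - tcrit*se:.2f}, {ybar + tcrit*se:.2f})")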
- (pp. 179-180, Example 6.1) Sometimes I like to refer to the
estimand as the distribution mean (which is the mean of the
parent distribution associated with the observed sample) instead of the
population mean. In a setting like this one, it's not like we view the
sample as a random sample selected from some finite population.
Although I suppose we could view the seeds used to produce the
plants as a sample that is representative of some population of
seeds, not every seed in that population produces a plant grown under
the same conditions as those in the observed sample, so the population
corresponding to the sample seems a bit nebulous --- its members are the
measurements that would be made if more plants were grown from other
seeds under the same conditions. Because of this, I like to view the
observations as the observed values of random variables that have some
distribution --- and if more plants were grown under the same conditions,
their stem length measurements would be the observed values of other random
variables having the same distribution. *** The last two sentences of the
example (on p. 180) serve to remind us of some things that have been
stated previously.
Section 6.2
- (p. 180) I'm one of the "Some statisticians" referred to in the
footnote --- the standard error is the actual standard deviation of the
statistic's sampling distribution, and not the estimate of this value.
(Note: If one considers just a single random variable, one can
refer to its standard deviation. But the standard deviation of a
statistic, which is just a function of a sample of random variables, is
called its standard error --- but it's just a standard deviation.
Calling it the standard error makes it less confusing since we sometimes
want to also refer to the standard deviation of the parent distribution
of the data.)
- (p. 181, paragraph following the first example) If an estimator
has approximately a normal distribution, then in many cases it'll take a
value within one standard error of the estimand with a probability of
about 0.68, and it'll take a
value within two standard errors of the estimand with a probability of
about 0.95. But the sampling distribution of some estimators is not
approximately normal, and in these cases the standard error isn't as
meaningful, and it's better to find some way to give a confidence
interval. *** Also, the last sentence of the paragraph touches upon
some important points.
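The 0.68 and 0.95 figures come straight from the normal distribution, and
they are easy to check in software; a quick sketch (assuming scipy is available):

    from scipy import stats

    # P(|Z| <= 1) and P(|Z| <= 2) for a standard normal random variable Z
    print(stats.norm.cdf(1) - stats.norm.cdf(-1))   # about 0.683
    print(stats.norm.cdf(2) - stats.norm.cdf(-2))   # about 0.954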
- (p. 181, Standard Error Versus Standard Deviation) This
paragraph relates to the parenthetical note I have above in my comments
about p. 180. One thing to keep in mind is that almost every statistic
has a standard error --- not just the sample mean. Because of this,
one shouldn't just say or write standard error unless it is clear
what estimator is being referred to --- if there would be any doubt,
better to put standard error of the sample mean (or whatever the
estimator is). (Also, it isn't really proper to refer to the standard
error of an estimate, since an estimate is just a number, and not
a random variable. However, we can refer to the estimated standard
error associated with an estimate --- it is the estimated standard error
of the estimator which produced the estimate.) For estimators other
than the sample mean, the standard error isn't just the standard
deviation divided by the square root of n.
(I've known of both students and faculty members here at GMU to make
mistakes regarding this point.)
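To illustrate the point that s divided by the square root of n estimates the
standard error of the sample mean only, here is a hedged sketch (made-up data;
numpy assumed) that uses the bootstrap, which is one common way --- though not
the only way --- to estimate the standard error of some other statistic, here
the sample median:

    import numpy as np

    rng = np.random.default_rng(1)
    y = rng.exponential(scale=10, size=25)      # hypothetical skewed sample

    # s/sqrt(n) is an estimate of the standard error of the sample MEAN only
    se_mean = y.std(ddof=1) / np.sqrt(len(y))

    # for another statistic (the median), one way to estimate its standard error
    # is the bootstrap: resample, recompute the statistic, take the SD of the results
    boot_medians = [np.median(rng.choice(y, size=len(y), replace=True))
                    for _ in range(5000)]
    se_median = np.std(boot_medians, ddof=1)

    print(se_mean, se_median)                   # generally not the same value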
- (p. 181, footnote at bottom of page) While there perhaps are other
reasonable schemes, this seems like a nice one to follow. It is bad
practice to report estimates based on random samples using too many
digits --- it just doesn't always make sense to give 4 or 5 significant digits
(see Appendix 6.1 if you need a refresher about significant digits,
rounding, and scientific notation)
if sampling variation results in an estimate for which there is no
guarantee that even 1 or 2 significant digits are correct (with correct
meaning that they match the actual unknown value being estimated).
Reporting values with a lot of significant digits seems to imply more
accuracy than is really warranted.
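If it helps, here is one way (a sketch of my own, not a rule from the book) to
round a reported value to a given number of significant digits in Python:

    from math import floor, log10

    def round_sig(x, digits=2):
        # round x to the given number of significant digits
        if x == 0:
            return 0.0
        return round(x, -int(floor(log10(abs(x)))) + (digits - 1))

    print(round_sig(31.62))       # 32.0
    print(round_sig(0.004271))    # 0.0043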
- (p. 182, 2nd and 3rd lines) The phrase "the variability from one
lamb to the next" isn't a good one to use here. The sample standard
deviation is a measure pertaining to the dispersion of the values in a
sample about the sample mean, and not really a measure giving some
specific information about consecutive values in a sample.
- (p. 182, Fig. 6.3) Note that when the sample size is
increased by a factor of 100, from 28 to 2800, the standard error only
decreases by a factor of 10. Similarly, due to the square root of
n in the denominator of the standard error of the sample mean,
the sample has to be made 4 times larger to cut the standard error in
half (see the little numerical sketch at the end of this comment). So if
you do an experiment and find that the standard error is way too large,
using just a few more observations wouldn't have made much difference.
(Sometimes I urge people to increase their sample size from 8 to 12 ---
not so much because it will greatly reduce the standard error, but
usually so that certain approximations become more trustworthy, and
better diagnostic checks can be done.)
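The square-root effect is easy to see numerically; a small sketch (the standard
deviation used here is made up):

    import numpy as np

    s = 20.0    # hypothetical sample standard deviation
    for n in (28, 112, 448, 2800):
        print(n, s / np.sqrt(n))
    # quadrupling n only halves s/sqrt(n);
    # going from n = 28 to n = 2800 (100 times larger) only divides it by 10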
- (p. 183, Fig. 6.5) People aren't consistent in their use of
error bars --- some use bars for +/- 1 estimated standard error, others
use bars for a confidence interval (which is often close to +/- 2
estimated standard errors), and still others use +/- 1 sample standard
deviation, or use bars in place of a boxplot (so covering the range of the middle
50% of the data). Also see first paragraph on p. 184.
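Because of this inconsistency, a figure caption should always state what the
bars represent. Here is a minimal matplotlib sketch of +/- 1 estimated standard
error bars (the group labels and numbers are made up):

    import matplotlib.pyplot as plt

    means = [14.5, 16.2]    # hypothetical sample means for two groups
    ses = [0.6, 0.8]        # hypothetical estimated standard errors
    x = [0, 1]

    plt.errorbar(x, means, yerr=ses, fmt="o", capsize=4)
    plt.xticks(x, ["control", "treated"])               # hypothetical group labels
    plt.ylabel("mean response (+/- 1 estimated SE)")    # say what the bars show!
    plt.xlim(-0.5, 1.5)
    plt.show()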
Section 6.3
- (p. 187) t_{0.025} is often written t_{n-1, 0.025}, or
something like t_{19, 0.025} if the sample size is 20.
I don't think it's good to refer to it as "the two-tailed 5% critical
value" --- most books refer to it as the 0.025 critical value, or
something similar to this.
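In software, the 0.025 critical value with n - 1 degrees of freedom is just an
upper quantile of the t distribution; for example (a sketch, assuming scipy is
available):

    from scipy import stats

    n = 20
    t_crit = stats.t.ppf(1 - 0.025, df=n - 1)   # t_{19, 0.025}
    print(t_crit)                               # about 2.093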
- (p. 188, Fig. 6.9) The probit plot in part (b) is
consistent with approximate normality, but I wouldn't feel nearly as
comfortable arriving at such a conclusion if all I had to go on was the
histogram of part (a).
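A probit (normal probability) plot is easy to produce with scipy and
matplotlib; a minimal sketch, using made-up data:

    import numpy as np
    from scipy import stats
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(2)
    y = rng.normal(loc=50, scale=5, size=30)    # hypothetical sample

    stats.probplot(y, dist="norm", plot=plt)    # points close to the line are
    plt.show()                                  # consistent with approximate normality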
- (p. 190, Remark) This part isn't really important.
- (p. 191) Be careful to note that the first probability expression
presented under the figure isn't a proper thing to write, for the reason
stated in the book. I don't even like expressions such as (6.3) at the
bottom of p. 190 (and the two similar expressions on p. 189), because to
me they indicate that the actual mean is between the two numerical
values, and we don't know that this is the case. Instead I like the
interval expressions given on pages 192 and 193 --- when one gives a
point estimate, one gives a single value (a point); when one gives an
interval estimate, one should express it as an interval (and it
should be kept in mind that the interval is still an estimate --- we
don't know for sure that the estimand is between the two endpoints of the
interval).
- (p. 191, last 5 lines) This part is really important --- the 95% is
because the method has a probability of 0.95 of working (that is, of
producing an interval that contains the estimand).
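This "probability that the method works" interpretation can be checked by
simulation; here is a sketch (my own, not from the book) that repeatedly draws
samples from a normal distribution with a known mean and counts how often the
95% t interval contains that mean:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    true_mean, n, reps = 10.0, 15, 10000
    tcrit = stats.t.ppf(0.975, df=n - 1)

    hits = 0
    for _ in range(reps):
        y = rng.normal(loc=true_mean, scale=3.0, size=n)
        se = y.std(ddof=1) / np.sqrt(n)
        lo, hi = y.mean() - tcrit * se, y.mean() + tcrit * se
        if lo <= true_mean <= hi:
            hits += 1

    print(hits / reps)   # should be close to 0.95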
- (p. 193) Look at the computer output. Since the
(estimated) standard error is
36 when rounded to two significant digits, the endpoints of the
confidence interval should be rounded to the nearest integer. But
instead of rounding the estimated standard error and the sample mean
first and then creating the interval (as seems to have been done in the
preceding example), it's better to carry more significant digits through
to the final step, and to round to the nearest integer only when
reporting the final interval estimate (see the sketch at the end of this
comment). Of course, we can often let the
software do a lot of the grubby work for us. But we won't always use
the number of significant digits indicated in the output. For the
output on this page, we can conclude that rounding should be to the
nearest integer, and then we would look at the interval in the output
and report it as (255, 384) (as opposed to using 385 for the upper
confidence bound). Note that we have three significant digits for each
of the confidence bounds (the endpoints of the interval), and this is
plenty considering that the interval is so wide, suggesting that we
cannot estimate the mean very accurately given the observations
available.
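Here is the kind of "round only at the end" calculation I have in mind, as a
sketch (the data values are made up, not the ones behind the output on p. 193;
numpy/scipy assumed):

    import numpy as np
    from scipy import stats

    y = np.array([210., 380., 290., 410., 250., 340., 300., 270.])   # hypothetical
    n = len(y)

    ybar = y.mean()
    se = y.std(ddof=1) / np.sqrt(n)
    tcrit = stats.t.ppf(0.975, df=n - 1)

    lo, hi = ybar - tcrit * se, ybar + tcrit * se    # full precision kept here
    print(round(lo), round(hi))                      # round only when reporting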
Section 6.4
The first paragraph makes a very important point: large standard errors
associated with your estimates may make them of little practical value.
I've seen plenty of studies done by GMU students, some involving data
collection over a two year period, where no firm conclusions could be
reached due to too much uncertainty associated with the inferences.
The second through the fourth paragraphs point out that the standard
error associated with the sample mean depends upon two things: the
variability associated with the individual measurements, and the sample
size. Although in some cases one may design an experiment in such a way
as to reduce experimental noise due to factors not of central interest,
there is typically some natural variability due to the fact that members
of the population of interest are different --- this variability is part
of the phenomenon/population/process being studied, and we just have to
deal with it as we make inferences.
The rest of the section focuses on determining the proper sample size so
that the standard error of the sample mean won't be too large. While
this is nice, and at times may prevent you from thinking that you need a
larger sample size than you do, my experience is that for the vast majority
of cases when people do
such calculations, the sample size arrived at is unreasonably large
given their resources, and they reach the conclusion that they
should use the largest sample size that they can manage. It's seldom
the case that when all of the data has been collected and the analysis is done, it is determined that
the sample size was perfect, or too large --- typically, one wishes
that a larger sample size could have been used in order to reduce some
of the uncertainty associated with the results.
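For the record, the calculation the section describes amounts to solving
s/sqrt(n) <= (desired SE) for n; a hedged sketch (the guessed standard
deviation and the target are made up):

    import math

    s_guess = 20.0      # guessed SD of the individual measurements (hypothetical)
    desired_se = 2.5    # largest standard error of the mean we're willing to accept

    n_needed = math.ceil((s_guess / desired_se) ** 2)
    print(n_needed)     # 64 here --- and often discouragingly large in practice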
Section 6.5
- (p. 201) As is stated in S&W, how large n needs to be
depends on the degree of nonnormality. For mild and "nice"
nonnormality, a dozen observations might be fine. (Note that if one has
fewer than 10 observations it's very hard to check for approximate
normality. Histograms are of little use, and some probit plot patterns
that are consistent with approximate normality can result from a
sample from a distribution which isn't approximately normal. Sometimes
one might hope that the inferences will be okay if the probit plot is
consistent with approximate normality, but one shouldn't be highly
confident about such inferences.) For highly skewed distributions, over
100 observations may be needed to obtain an accurate interval estimate
(and it may take a few hundred observations for hypothesis testing).
- (p. 201, Table 6.4) It's odd that the performance with
samples of sizes 2 and 4 is better than the performance with a sample of
size 8 for 99% confidence intervals for the mean of Population 3. But it's
worth noting that in none of the cases did the intervals cover 99% of the
time. Also, the intervals could be so wide as to be nearly useless as
estimates. You'll see that even with 10 observations it's common to
have intervals which are so wide that they don't do a good job of
providing a meaningful estimate of the unknown mean.
- (p. 202, blue box) Condition 2 (b) would be better if "large" were
replaced with sufficiently large, since otherwise it seems to
imply that there is some threshold that may always work, whereas the
threshold depends upon the degree of nonnormality. (This is suggested
by the three lines following the blue box, but it would be better to get
the point across in the statement of the conditions.)
- (p. 203, 3rd paragraph) I seldom bother to make a histogram. With
regard to checking approximate normality, I don't think a histogram adds
anything if I've already done a probit plot.
- (p. 204) The last paragraph of the section makes an important
point. Transformations to normality are okay if you're content to make
a statement about the distribution associated with the transformed
values, but there is no way to take an inference about the mean based on
transformed values and do some sort of inverse transformation to arrive
at a good inference about the mean of the parent distribution of the
original (untransformed) observations.
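A sketch of the problem (the lognormal distribution is my choice for
illustration, not the book's): exponentiating a t interval built from
log-transformed data gives an interval that tends to cover the median of the
original distribution, not its mean.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(4)
    y = rng.lognormal(mean=1.0, sigma=1.0, size=40)   # skewed hypothetical data
    # true mean = exp(1.5), about 4.48; true median = exp(1), about 2.72

    logy = np.log(y)
    se = logy.std(ddof=1) / np.sqrt(len(y))
    tcrit = stats.t.ppf(0.975, df=len(y) - 1)
    lo, hi = logy.mean() - tcrit * se, logy.mean() + tcrit * se

    # back-transformed interval: it targets the MEDIAN (about 2.72), not the mean
    print(np.exp(lo), np.exp(hi))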
Section 6.6
- (p. 207) The alternative estimate, (y + 2)/(n + 4),
isn't recommended as a replacement of the simple sample proportion,
y/n, as a point estimate. But it works nicely as the
center of a confidence interval for p. In a sense, it's a way of
doing a better central limit theorem approximation than the simple one
which is given in most introductory books.
- (p. 207) The blue box doesn't give the standard error of the
alternative estimator of p --- it gives an estimate of the
standard error (and not the most obvious or best estimate to use, but
rather one that is relatively simple and seems to work okay).
- (p. 208) The blue box formula works only if the desired confidence
level is 95%. If some other level is desired, one needs to make the
adjustments indicated in the Other Confidence Levels subsection
on pp. 209-210. SPSS doesn't seem to do any confidence intervals for
p (and some other statistical software packages are similar --- I
guess the thinking is that one can easily do them with a calculator),
and so for the sake of simplicity I'll just have you do 95% confidence
intervals in this setting (and you can read in the book to find out what
adjustments should be made for other confidence levels).
It can be noted that the interval in the blue box is equivalent to the
standard approximate interval given in the footnote at the bottom of p.
208 if one adds 2 successes and 2 failures to the sample.
(Note:
S&W have some things wrong in Appendix 6.2 on pp. 621-622.
The Wilson interval is the same as the score interval.
It works better
than the simpler Wald interval because it uses a normal approximation in
a better way.
The interval given in the blue box on p. 208 is the
Agresti-Coull interval, and it's just an approximation of the 95%
Wilson/score interval, with the approximation making
things simpler).
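Here is a sketch of the 95% interval in the blue box (the Agresti-Coull
interval), alongside the simpler Wald interval from the footnote, using made-up
counts:

    import math

    y, n = 9, 40    # hypothetical number of successes and sample size
    z = 1.96        # for 95% only; other levels need other z values

    # Wald (standard approximate) interval
    p_hat = y / n
    se_wald = math.sqrt(p_hat * (1 - p_hat) / n)
    wald = (p_hat - z * se_wald, p_hat + z * se_wald)

    # Agresti-Coull: add 2 successes and 2 failures, then do the Wald calculation
    p_tilde = (y + 2) / (n + 4)
    se_ac = math.sqrt(p_tilde * (1 - p_tilde) / (n + 4))
    agresti_coull = (p_tilde - z * se_ac, p_tilde + z * se_ac)

    print(wald, agresti_coull)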
- (p. 209, Example 6.17) When a confidence bound winds up
outside of the set of possible values for the parameter being estimated
(in this case the lower bound is negative but we know that p
cannot be smaller than 0), it indicates that the approximation should
not be used!
(S&W have done a bad bad thing here!)
For values of p near 0 or 1, n has to be rather large for
the approximations to work sufficiently well. In this case, 15 is just
too small a sample size. In a case like this, I'll certainly use an
exact confidence interval of some sort, even though such intervals tend
to be messy (but that's not really a problem with some software). One
exact method gives
(0, 0.22) as a 95% confidence interval. Note that not only is the lower
bound reasonable, but the upper bound isn't as large as the one given in
S&W. So even though some don't seem to like the exact methods (see
bottom of p. 622), in this
case the exact method is more accurate, and it yields a shorter interval
(which is desirable).
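One exact method is the Clopper-Pearson interval, which can be computed from
beta distribution quantiles; a sketch assuming (purely for illustration) 0
successes in 15 trials, which reproduces roughly the (0, 0.22) interval
mentioned above:

    from scipy import stats

    y, n, alpha = 0, 15, 0.05    # hypothetical counts, chosen for illustration

    lower = 0.0 if y == 0 else stats.beta.ppf(alpha / 2, y, n - y + 1)
    upper = 1.0 if y == n else stats.beta.ppf(1 - alpha / 2, y + 1, n - y)
    print(lower, upper)          # roughly (0, 0.22)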
Section 6.7
- (p. 214, the blue box) Note that the proper t critical
values are indicated for 90% and 99% intervals. This isn't so important
to us because SPSS will allow us to specify whatever confidence level
is desired. Nevertheless, hopefully you can see why those values are
needed for 90% and 99% intervals.
- (p. 214, the blue box) It should be stated that n has to be
sufficiently large in order for the approximate confidence interval
methods to work well. (See my comments related to
Example 6.17 above.)