Some Comments about Chapter 5 of Samuels & Witmer
Section 5.1
- (p. 149) While the first sentence of this section is okay (and can
be extended beyond the field of biology), to me it's suggestive of
hypothesis testing (which is introduced in Ch. 7) and not so much of the
types of things considered in Ch. 5 and Ch. 6, where the focus is on
estimation. For example, with the examples of Ch. 1, a key focus of the
data analysis can be whether there are differences due to some treatment,
or differences between different populations. What makes this hard to
determine in some cases is that we expect some differences in the
samples even if there are no differences in the parent distributions
associated with the samples. Hypothesis testing is concerned with
determining if there is strong evidence of real differences in the
presence of sampling variability. With estimation the focus is
different: the desire is to estimate the mean, median, or some other
summary measure of the parent distribution of the sample, from only the
observations belonging to the sample at hand. Because the sample doesn't
give us enough information to determine a value associated with the
population or distribution exactly,
we have to be content with producing a guess about the unknown value
of interest, and sampling variability leaves uncertainty
associated with our best guess. With estimation the focus is to
minimize the error in estimation due to the sampling variability, and to
quantify the amount of uncertainty associated with an estimate.
- (p. 149) One usually refers to the sampling distribution of
some specific statistic (e.g., an estimator or a test statistic). Knowledge
about the
sampling distribution of a statistic lets us know how sampling
variability affects its value, and this allows us to study the
distribution of the error in estimation or to determine if the data
provides strong evidence of a treatment effect or of population
differences (as opposed to the data being consistent with a hypothesis
of no difference).
- (p. 150, Example 5.1) This is similar to Problem 1 of the
homework (a problem that you are supposed to do, but not turn in). It
shows that when you produce an estimate (on p. 150, the sample mean can
be used as an estimate of the population mean), different samples can
produce different values of the estimate, and that an estimate of the
mean need not equal the actual population mean. (The example shows the
effects of sampling variation.)
- (p. 150) The 2nd sentence after Table 5.2 is important. In
some cases it may seem that the sample is a subset of a finite
population. In such a case, the random variables associated with the
observations are not independent. If we know the population size, we
can adjust for the lack of independence and obtain a more accurate
inference. But often we don't know the population size. However, if we
know that the population size, whatever it is, is much larger than
the sample size, the adjustment that should be made is negligible (see
the long footnote at the bottom of p. 158), and so
it is common to ignore the lack of independence and make inferences
under the assumption of independent observations.
(Although this is nothing
to worry about if you don't understand the explanation, some may find it
interesting, and at first a bit odd, that when we take a subset of a
finite population, which is referred to as sampling without
replacement, the random variables associated with the sample
are not independent, even though one is guaranteed not to pick the
same population member more than once, but when you sample with
replacement, observing a subject and then returning it to the population
so that the same subject can be picked again to contribute another
observation, the random variables associated with the sample are
independent, even though two or more of the observations in the sample
can be due to the same subject.
As a concrete example, suppose that there is some unknown number of type
A objects in a population of size 100, with the rest of the
objects being type B objects. When we draw a sample without
replacement, we have a lack of independence, since whatever type of
object is selected first influences the proportion of type A objects
remaining at the time of drawing the second object. However, if we just
observe the type of the first object and put it back in the population
so that it can possibly be selected again, the distribution at the time
of the second drawing of an object doesn't depend on what type of object
was drawn first, which means that the first two observations are
independent of one another. It should be noted that sampling without
replacement is preferable, since there is less uncertainty when we
observe a sample based on n different population members than
there is in a sample of size n that can include certain
population members more than once. But when n is small compared
to the population size, it can make very little difference which type of
sampling is used.)
There is another viewpoint that makes the lack of knowledge of the
population size even more unimportant. As an example, we could say that
we're not interested in just the treatment of current cancer patients,
but also in the treatment of those who will be diagnosed in the future.
By the time the results of some study can lead to something useful, some
of the cancer patients available at the time the study was done may have
died, and new patients will have been identified. So in one sense the
population of interest is always
changing --- but another take on it could be that it is infinite, and
with such a viewpoint it is legitimate to view the sample at hand as
resulting from independent random variables.
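To illustrate the point about the two sampling schemes numerically, here is a minimal simulation sketch (assuming Python with numpy is available; the population of 100 objects with 30 of type A matches the concrete example above, while the sample size of 10 and the number of repetitions are arbitrary choices):

    import numpy as np

    rng = np.random.default_rng(1)
    population = np.array([1] * 30 + [0] * 70)   # 30 type A objects (coded 1) out of 100
    n, reps = 10, 50_000                         # sample size and number of simulated samples

    # Sample proportion of type A objects under the two sampling schemes
    p_without = [rng.choice(population, size=n, replace=False).mean() for _ in range(reps)]
    p_with = [rng.choice(population, size=n, replace=True).mean() for _ in range(reps)]

    print("mean without replacement:", np.mean(p_without))       # both close to p = 0.3
    print("mean with replacement:   ", np.mean(p_with))
    print("variance without replacement:", np.var(p_without))
    print("variance with replacement:   ", np.var(p_with))       # close to p(1-p)/n

Both versions of the sample proportion have mean p, but the without-replacement version has the smaller variance, by the finite population correction factor (N - n)/(N - 1), which is close to 1 when n is small relative to the population size N (consistent with the remark above about the adjustment being negligible).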
Section 5.2
- (p. 151) Dichotomous observations are associated with
variables that have only two possible outcomes. When the outcomes are
coded as 0 and 1, dichotomous observations can be referred to as
binary observations. It is common to use dichotomous and binary
interchangeably, although I suppose to be technical, the use of the term
binary should be restricted to the 0/1 coding (as opposed to coding with
A and B, for example). Often binary/dichotomous variables are modeled
as iid Bernoulli random variables, but it is important to keep in mind
that this should be done only if we have independence and
a constant probability of "success" on each trial.
- (p. 152, Example 5.4) From Table 5.3 it can be seen
that the mean of the sample proportion is 0.3, which is the value of
p. It can be shown that
whatever value p is, the mean of the sample proportion
is equal to p. Because of this, the sample proportion is
referred to as an unbiased estimator. (In general, if the
expected value of an estimator is equal to the estimand (the
value being estimated), whatever value the estimand may be, the
estimator is said to be unbiased.) I think that unbiasedness is an
overrated property, and that the distribution of the error in estimation
(perhaps summarized by some average value associated with the magnitude
of the error in estimation) should be the focus --- by itself,
unbiasedness just means that in a sense the overestimates would be of
the same average magnitude as the underestimates in repeated
applications of the estimator (and so unbiasedness doesn't focus on the
magnitude of the error in estimation, but only on a balance in the
tendency to overestimate and underestimate).
Table 5.3 shows that even though the sample proportion is
unbiased, it doesn't produce an estimate close to the estimand with high
probability. In this case, it isn't that the estimator is defective
(the sample proportion is the best estimator to use in this situation),
but rather its poor performance is due to the sample size being so
small --- with only two observations, one cannot expect to have a good
estimate of p. Note that with
n = 2 the most likely value for the sample proportion is 0,
whereas with
n = 20 the most likely value for the sample proportion is 0.3
(see
Table 5.4 on p. 154),
which is the actual value of p.
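A quick way to verify the unbiasedness and the most likely values mentioned above is to compute the distribution of the sample proportion directly from the binomial distribution (a minimal sketch, assuming Python with scipy is available, and using p = 0.3 as in the example):

    from scipy.stats import binom

    p = 0.3
    for n in (2, 20):
        probs = [binom.pmf(k, n, p) for k in range(n + 1)]
        phat = [k / n for k in range(n + 1)]                 # possible values of the sample proportion
        mean = sum(ph * pr for ph, pr in zip(phat, probs))   # expected value of the sample proportion
        mode = phat[probs.index(max(probs))]                 # most likely value of the sample proportion
        print(f"n = {n:2d}: E(sample proportion) = {mean:.3f}, most likely value = {mode}")

The expected value comes out to 0.3 for both sample sizes, while the most likely value is 0 for n = 2 and 0.3 for n = 20, matching Tables 5.3 and 5.4.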
- (p. 155, Example 5.7) For all values of n, the mean
of the sample proportion is equal to the estimand, p. But the
variance of the sample proportion is
p(1 - p)/n, which is a decreasing function of
n. From Fig. 5.5 it can be seen that the probability mass
becomes more highly concentrated near p as n increases,
and this fact is also shown in
Table 5.5. If
Fig. 5.5 were to be extended to include a really large sample
size, it could be seen that with very high probability the sample
proportion will take a value extremely close to p. (Recall that
the law of large numbers gives us that a sample mean, which is
what the sample proportion is (if we view the number of successes as
being a sum of n Bernoulli random variables), converges to the
mean of the parent distribution, which is p for the Bernoulli
random variables underlying the sample proportion.)
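The increasing concentration of the probability mass near p can also be checked numerically using the exact binomial distribution (a minimal sketch, assuming Python with scipy; the sample sizes and the window of width 0.05 on each side of p are arbitrary choices):

    import math
    from scipy.stats import binom

    p = 0.3
    for n in (20, 100, 1000, 10000):
        # P(|sample proportion - p| <= 0.05), from the exact binomial distribution
        lo = math.ceil((p - 0.05) * n)
        hi = math.floor((p + 0.05) * n)
        prob = binom.cdf(hi, n, p) - binom.cdf(lo - 1, n, p)
        print(f"n = {n:5d}: P(sample proportion within 0.05 of p) = {prob:.3f}")

The probability climbs toward 1 as n increases, which is the law of large numbers at work.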
Section 5.3
- (p. 158, 1st paragraph) The facts given here about the sampling
distribution of the sample mean assume that the sample mean is based on
iid (independent identically distributed) random variables --- this is
implied on the bottom of p. 157 with the reference to random samples.
The mean and variance of the sampling distribution of the sample mean
(or one could just refer to the mean and variance of the sample mean
when taking it to be a random variable (so upper-case), as opposed to its
value based on a particular sample), can be obtained using the four
rules on pp. 100-101, as I showed in class as I led up to the law of
large numbers.
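As a numerical check on those facts about the mean and standard deviation of the sample mean, here is a minimal simulation sketch (assuming Python with numpy; the exponential parent distribution, which has mean and standard deviation both equal to 2 here, and the sample size of 25 are arbitrary choices):

    import numpy as np

    rng = np.random.default_rng(2)
    mu, sigma, n, reps = 2.0, 2.0, 25, 200_000   # exponential with mean 2 also has sd 2

    # Each row is an iid sample of size n; take the mean of each row
    xbar = rng.exponential(scale=mu, size=(reps, n)).mean(axis=1)

    print("mean of the sample means:", xbar.mean())   # close to mu = 2
    print("sd of the sample means:  ", xbar.std())    # close to sigma / sqrt(n) = 0.4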
- (p. 158, Example 5.8)
The first part of the 3rd sentence
should never have been written.
- (p. 159, Theorem 5.1) The first two parts are addressed at
the top of p. 158. The third part is new, and will be shown to be very
important as we cover the next two chapters. Note that all of the parts
together give us that the sampling distribution of the sample mean is
either normal or approximately normal, with a mean equal to the mean of
the distribution of the random variables making up the sample mean, and
a variance which decreases as the sample size increases. All of this is
consistent with the law of large numbers --- as the variance gets
smaller with increasing sample size, the probability mass associated
with the sample mean becomes more
highly concentrated about the mean of the parent distribution of the
observations, and so for very large n the sample mean will assume
a value very close to the mean of the parent distribution (which is what
is meant by stating that the sample mean converges to the mean of the
parent distribution).
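The approximate normality asserted by the third part of the theorem can be seen by simulating sample means from a skewed parent distribution and comparing a few of their quantiles with those of the normal distribution having the same mean and standard deviation (a minimal sketch, assuming Python with numpy and scipy; the exponential parent with mean 1 and the sample size of 40 are arbitrary choices):

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(3)
    n, reps = 40, 200_000
    xbar = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)   # skewed parent: mean 1, sd 1

    for q in (0.025, 0.25, 0.5, 0.75, 0.975):
        sim = np.quantile(xbar, q)
        approx = norm.ppf(q, loc=1.0, scale=1.0 / np.sqrt(n))        # normal approximation
        print(f"quantile {q:5.3f}: simulated {sim:.3f}, normal approximation {approx:.3f}")

The agreement isn't perfect (the parent distribution is quite skewed and n = 40 isn't huge), but it is already fairly close.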
- (p. 159, Fig. 5.7) Although the general idea is to show how
the probability mass is more highly concentrated about the mean
for the sampling distribution of the sample mean compared to the
distribution associated with just a single observation, the labeling of
the axes is rather screwy (and I advise not spending a lot of time
trying to figure it out).
- (p. 161, Example 5.10) This example is similar in spirit to
Example 5.7 from the preceding section.
- (p. 163, Example 5.12) This example shows that histograms
aren't necessarily good estimates of the density of the parent
distribution of a sample, since 8 different samples from the same
distribution produce rather different histograms.
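One can reproduce the phenomenon with a small simulation (a minimal sketch, assuming Python with numpy; the standard normal parent, the sample size of 25, and the bin boundaries are arbitrary choices, not the ones used in Example 5.12):

    import numpy as np

    rng = np.random.default_rng(4)
    bins = np.linspace(-3, 3, 7)                    # six equal-width bins on (-3, 3)

    # Eight samples of size 25, all from the same standard normal distribution
    for i in range(8):
        sample = rng.normal(size=25)
        counts, _ = np.histogram(sample, bins=bins)
        print(f"sample {i + 1}: bin counts {counts}")

The bin counts vary noticeably from sample to sample even though every sample comes from the same parent distribution.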
Section 5.4
- (p. 168, lines 7 & 8) The phrase "violently skewed" is just a bit
too whacky for me --- I think that highly skewed is a
more suitable expression to use.
- (p. 169, 1st paragraph) This paragraph touches upon some important
points. (However, even though one might see statements similar to the
2nd to the last sentence of the paragraph in other
books, I think that there are situations in which the distribution mean
is still a very relevant summary measure for a highly skewed distribution.
Even though the distribution mean isn't a typical value for a single
observation from the distribution, if one considers a sample of values,
the mean of the parent distribution can be a typical value for the
sample mean.)
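To put a number on that parenthetical remark, one can compare how often a single observation versus a sample mean falls near the distribution mean of a highly skewed distribution (a minimal sketch, assuming Python with numpy; the exponential distribution with mean 1, the sample size of 50, and the window of plus or minus 10% are arbitrary choices):

    import numpy as np

    rng = np.random.default_rng(5)
    mu, n, reps = 1.0, 50, 100_000                  # exponential with mean 1 is highly skewed

    singles = rng.exponential(scale=mu, size=reps)                  # single observations
    xbars = rng.exponential(scale=mu, size=(reps, n)).mean(axis=1)  # sample means, n = 50

    near = lambda x: np.mean(np.abs(x - mu) <= 0.1 * mu)
    print("P(single observation within 10% of the distribution mean):", near(singles))
    print("P(sample mean within 10% of the distribution mean):       ", near(xbars))

The first probability is small (the mean isn't a typical value for one observation), while the second is much larger, so the distribution mean is a quite reasonable summary when one is thinking about sample means.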
Section 5.5
- (p. 172, Example 5.17) Although the availability of
statistical software makes using the approximation to obtain
(approximate) probabilities less important than it used to be, there
still can be times when one may want to employ the approximation (e.g.,
if one doesn't have suitable software handy, or n is so large
that it causes a problem for the software). Even though we may not use
the approximation so much for numerical work, the approximation as
expressed in
Theorem 5.2 on p. 170 is still very important for the
justification of certain statistical procedures.
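For what it's worth, here is what using the approximation versus an exact computation looks like side by side (a minimal sketch, assuming Python with scipy; the choices n = 100, p = 0.3, the event X <= 25, and the use of a continuity correction are mine and not taken from the example):

    from math import sqrt
    from scipy.stats import binom, norm

    n, p = 100, 0.3
    mu, sd = n * p, sqrt(n * p * (1 - p))

    exact = binom.cdf(25, n, p)                    # exact P(X <= 25) for X ~ Binomial(100, 0.3)
    approx = norm.cdf((25 + 0.5 - mu) / sd)        # normal approximation with a continuity correction
    print("exact probability:      ", exact)
    print("approximate probability:", approx)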
- (p. 174) The rule of thumb given may not be the best one to use.
An alternate rule of thumb is to require that n be at least as
large as 9 times the larger of p/(1 - p) and (1 - p)/p.
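A small worked check of this alternate rule of thumb (a minimal sketch, assuming Python; the values of p are arbitrary):

    def min_n(p):
        # n should be at least 9 times the larger of p/(1-p) and (1-p)/p
        return 9 * max(p / (1 - p), (1 - p) / p)

    for p in (0.5, 0.3, 0.1, 0.01):
        print(f"p = {p}: require n >= {min_n(p):.0f}")

So p = 0.5 requires only n >= 9, while p = 0.01 requires n >= 891: the farther p is from 1/2, the larger the sample size needed before the normal approximation is trustworthy.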
Section 5.6
- (p. 176) The last paragraph is very important --- we can compare
the anticipated performances of various statistics by comparing their
sampling distributions (or sometimes, in practice, their estimated or
approximated sampling distributions). Also, while it is true that the
sample median is better than the sample mean for estimating the
mean/median/center of some symmetric distributions, the distribution has
to be pretty odd for this to be the case. However, for many
heavy-tailed symmetric distributions, some other estimator of the
mean/median/center can be superior to both the sample mean and the
sample median.
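As an illustration of comparing estimators through their (approximated) sampling distributions, here is a minimal simulation sketch (assuming Python with numpy and scipy; the t distribution with 3 degrees of freedom stands in for a heavy-tailed symmetric distribution centered at 0, the 20% trimmed mean stands in for "some other estimator," and the sample size of 25 is an arbitrary choice):

    import numpy as np
    from scipy.stats import trim_mean

    rng = np.random.default_rng(6)
    n, reps = 25, 100_000

    # Heavy-tailed symmetric parent: t distribution with 3 degrees of freedom (center 0)
    samples = rng.standard_t(df=3, size=(reps, n))

    estimators = {
        "sample mean": samples.mean(axis=1),
        "sample median": np.median(samples, axis=1),
        "20% trimmed mean": trim_mean(samples, 0.2, axis=1),
    }
    for name, est in estimators.items():
        print(f"{name:16s}: mean squared error = {np.mean(est ** 2):.4f}")

With this particular parent distribution the trimmed mean tends to beat both the sample mean and the sample median in mean squared error, which is the sort of comparison the last paragraph of the section has in mind.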