Some Comments about Chapter 1 of Samuels & Witmer
Section 1.1
This short section gives a nice description of the type of data analysis
that we'll be dealing with: the analysis of data from designed
experiments. We need to know how to make good inferences from
experiments that have some type of experimental "noise" associated with
the results; e.g., how to account for the variability due to the fact that
measurements were only taken for a random sample instead of the entire
population.
Section 1.2
Example 1.1
On p. 3 the point is made that if the sample sizes had been smaller, the
lack of variability of outcomes within the two treatment groups
shouldn't be taken to be conclusive evidence of a treatment effect.
E.g., if there were only 3 in each group, an observed outcome of all
surviving in the treatment group and all having died in the control
group would occur with a probability of 1/20 under one scenario
consistent with the hypothesis that the vaccine treatment has no effect
whatsoever. While the smallish probability of 0.05 suggests that the
treatment may be effective, the point is that we can feel much more
comfortable in reaching that conclusion given the results of Table 1.1
based on the larger sample sizes, since with 24 in each group, the
probability that just the random selection of the animals for the
treatment group led to the observed results is way less than one
billionth, instead of being 1/20. Noting that the observed results are
very inconsistent with the hypothesis of no treatment effect, since
random selection would yield such a clear pattern with very small
probability, it's reasonable to think that something else, namely a
very effective vaccine, is responsible for the observed difference in
the outcomes for the two groups. (At this point in the course, you
don't have to know how to obtain such probabilities.)
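For the curious, here is a sketch of the counting argument behind those
two numbers. Under one scenario consistent with no treatment effect, a
fixed half of the animals would survive no matter how the groups were
formed, every way of choosing the treatment group is equally likely, and
only one of those choices puts all of the would-be survivors together in
the treatment group:

    P(\text{observed split}) = \frac{1}{\binom{6}{3}} = \frac{1}{20} = 0.05
    \qquad \text{(3 animals per group)}

    P(\text{observed split}) = \frac{1}{\binom{48}{24}}
    \approx \frac{1}{3.2 \times 10^{13}}
    \qquad \text{(24 animals per group)}

and 1/(3.2 x 10^13) is indeed far smaller than one billionth.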
Example 1.2
In contrast to the observed results of the experiment of the previous
example, the results here suggest a variable response under both
treatments. One explanation is that the mice in a group are not all
identical: some of them develop a tumor under the conditions of the
treatment, and others do not. But if we
acknowledge that differences in the mice can lead to different
responses, we need to be concerned about the possibility that the
observed differences could be due only to differences in the
mice, and that the differences in the treatments may not contribute to
the different cancer rates at all. To arrive at a claim of solid
evidence for a difference due to treatment choice, we need to show that
the observed results are rather inconsistent with the possibility that
just random selection of the mice for the two groups could lead to such
a large observed difference of the cancer rates.
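To make that last idea concrete, here is a minimal randomization-test
sketch in Python. The tumor counts are invented for illustration (they
are not the counts from the book), but the logic is the one just
described: re-assign the mice to the two groups at random many times and
see how often chance alone yields a difference in cancer rates at least
as large as the observed one.

    # Randomization-test sketch; the counts below are hypothetical.
    import random

    group_a = [1]*7  + [0]*13   # say 7 of 20 mice got a tumor
    group_b = [1]*15 + [0]*5    # say 15 of 20 mice got a tumor
    observed = abs(sum(group_b)/len(group_b) - sum(group_a)/len(group_a))

    pooled = group_a + group_b
    n, reps, hits = len(group_a), 100_000, 0
    for _ in range(reps):
        random.shuffle(pooled)          # re-assign mice at random
        a, b = pooled[:n], pooled[n:]
        if abs(sum(b)/len(b) - sum(a)/len(a)) >= observed:
            hits += 1
    print("approximate p-value:", hits / reps)

A tiny printed proportion says that random selection of the mice for the
two groups is a poor explanation for so large a difference.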
Example 1.3
Suppose that the flooding had no effect on the ATP. In such a case, all
8 values can be viewed as having come from the same distribution, and it
can be shown that the chance that the values in one set of 4 are all
larger than the values in another set of 4 is only 1/70. In this case,
even though there are only 4 observations in each group, we have fairly
strong evidence that the flooding affects ATP.
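The 1/70 comes from the same style of counting argument sketched under
Example 1.1: if flooding has no effect, all \binom{8}{4} = 70 ways of
labeling 4 of the 8 values as the flooded ones are equally likely, and
exactly one of those labelings makes every flooded value exceed every
control value:

    P = \frac{1}{\binom{8}{4}} = \frac{1}{70} \approx 0.014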
Example 1.4
Some issues concerning the sampling and measurements are brought up.
It's always best to have random samples of subjects, and if treatments
are assigned after the subjects are selected, it's best to use
randomization in the assignment of the treatments (although in this
example, the role of the treatments is assumed by the 3 classes of
patients, and the patients were diagnosed as opposed to being randomly
assigned to the treatments --- and so here three random samples needed
to be drawn from three different populations).
Sometimes, subjects are not actually randomly drawn from a larger
population, but instead the people, animals, plants, plots of land, etc.
used in the experiment are just the ones which are convenient to use.
(E.g., a GMU student wanted to make a comparison between soil in the
Piedmont region of VA and soil in the Plains region of VA. While in
principle she could have randomly picked locations using a map of VA,
since she wouldn't necessarily have access to the locations randomly
selected, she instead used 3 Piedmont sites and 3 Plains sites that
she had access to and which weren't so far apart in order to make it
easier to collect the necessary data. She chose to view the sites
used as being representative of the regions, but perhaps she should not
have done so, since they were not randomly selected. From her experimental
results, she could safely make conclusions about the soil in the parts
of the Piedmont and Plains regions which are similar to the sites she
used, but it was stretching it a bit to claim anything about the
Piedmont and Plains regions in general.)
Example 1.5
Since there is a worry about lack of independence (one larva may
make a choice, deliberate or not, and others may simply follow), with the
design indicated in Fig. 1.3 (b) on p. 5, we can claim to have 6
independent trials even if the larvae do not act independently. (The
Chapter Notes on p. 633 state that actually 11 dishes were used instead
of 6, which greatly improves the quality of the experiment.)
Example 1.6
The Chapter Notes on p. 633 touch upon some of the extra things I
mentioned when I extended the example in class from what is on pp. 5-6.
The entire article can be found using the E-Journal Finder on the
GMU libraries web site. All of the measurements were obtained from dead
people from southern California hospitals. The subjects used were put
into matched sets of three --- matching one homosexual man, one
heterosexual man, and one heterosexual woman in each set of three
subjects by age in order to reduce any variation due to age.
That being the case, a two-way ANOVA should have been
used, but the article indicates that a one-way ANOVA was used --- which
is so wrong! (By creating the matched sets and doing a one-way
ANOVA they eliminated some of the differences due to age, but they
did not account for the variation due to age; had they done so, a
smaller allowance would have been needed for the possibility that the
observed differences were merely natural variation among the
subjects. In this particular case, the important differences were large
enough so that the failure to reduce the unexplained experimental noise
didn't hurt them so much, but in general what they did is very bad
practice.) Even using a two-way ANOVA, there is a worry about
inadequate matching, since so many more of the homosexual men died of AIDS
(and matching by age doesn't directly reduce variation due to other
subject differences, like whether or not they died of AIDS). An
alternative approach would be to create a multiple regression model, in
which differences between subjects can be adjusted for instead of
attempting to use matching to cancel out the effects of the differences.
(If we can randomly assign the subjects in the matched sets to the
various treatment groups, then we don't have to worry as much about
having only a loose matching. But in a case like this one, where we do
not randomly assign the subjects to the treatment groups, we don't have
randomization to give us a fair experiment even if matching isn't so
good, and so it may be better to try to incorporate more variables into
the model instead of hoping that the effects due to other variables
cancel out due to the matching.) Upon reading the article, I can see
that they thought about trying to adjust for other differences in an
attempt to isolate differences due to sexual orientation, but I also
suspect
that the statistical analysis wasn't done in the best possible way. In
addition to not using a two-way ANOVA where it was appropriate, there is
also mention of a certain multiple comparison procedure which may not
have been the best one to use. (That the statistical analysis was not
done properly does not surprise me, although medical studies tend to be
better than others. I've witnessed improper statistical analysis here
at GMU, and have been greatly disappointed that in some cases the
researchers were resistant to changing to better methods once
inadequacies were made known to them.)
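To see the one-way versus two-way distinction in code, here is a
minimal sketch using Python with pandas and statsmodels. The response
values, labels, and column names are all made up for illustration (this
is not the article's data), but the layout mirrors the design: one
subject from each of the three groups in each matched set.

    # Made-up data: 3 matched sets, one subject per group in each set.
    import pandas as pd
    import statsmodels.formula.api as smf
    from statsmodels.stats.anova import anova_lm

    df = pd.DataFrame({
        "volume":      [110, 95, 92, 118, 101, 99, 125, 108, 104],
        "group":       ["HM", "HetM", "HetW"] * 3,
        "matched_set": ["s1"]*3 + ["s2"]*3 + ["s3"]*3,
    })

    # One-way ANOVA: ignores the matching, so the set-to-set (age)
    # variation stays in the error term.
    one_way = smf.ols("volume ~ C(group)", data=df).fit()
    print(anova_lm(one_way))

    # Two-way ANOVA: the matched set enters as a blocking factor, so
    # the age-related variation is pulled out of the error term.
    two_way = smf.ols("volume ~ C(group) + C(matched_set)", data=df).fit()
    print(anova_lm(two_way))

The thing to notice in the output is that the error mean square shrinks
in the two-way fit, which is exactly the reduction in unexplained noise
discussed above.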
Example 1.7
Even if there is no interaction between sex and dose, it's still a good
idea to adjust for sex when determining the effect of the dose difference.
If sex differences contribute to the observed variation, and we can adjust
for some of the variation due to sex, we have less variation to muddy
the waters when trying to assess the effect of the dose difference.
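A quick simulation (the numbers are invented and have nothing to do
with the book's example) shows the mechanism: with a genuine sex effect
and no interaction, centering each sex at its own mean leaves far less
variation to cloud the dose comparison.

    # Simulated illustration: a real sex effect, no interaction.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 1000
    sex  = rng.integers(0, 2, n)                  # 0 = female, 1 = male
    dose = rng.integers(0, 2, n)                  # 0 = low, 1 = high
    y = 2.0*dose + 5.0*sex + rng.normal(0, 1, n)  # additive model

    # Spread within each dose group when sex is ignored (the sex effect
    # sits in the noise):
    print(y[dose == 0].std(), y[dose == 1].std())   # roughly 2.7 each

    # Center each sex at its own mean, then look again:
    y_adj = y.copy()
    for s in (0, 1):
        y_adj[sex == s] -= y[sex == s].mean()
    print(y_adj[dose == 0].std(), y_adj[dose == 1].std())  # roughly 1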
Example 1.8
I'd like to know how the energy expenditures were measured. It appears
that GMU doesn't have the 1981 volume of the journal, and it doesn't
seem to be available using the E-Journal Finder.