Some Comments about Chapter 1 of Samuels & Witmer
Section 1.1
This short section gives a nice description of the type of data analysis
that we'll be dealing with: the analysis of data from designed
experiments. We need to know how to make good inferences from
experiments that have some type of experimental "noise" associated with
the results; e.g., how to account for the variability due to the fact that
measurements were only taken for a random sample instead of the entire
population.
Section 1.2
Example 1.1
On p. 3 the point is made that if the sample sizes had been smaller, the
lack of variability of outcomes within the two treatment groups
shouldn't be taken to be conclusive evidence of a treatment effect.
E.g., if there were only 3 in each group, an observed outcome of all
surviving in the treatment group and all having died in the control
group would occur with a probability of 1/20 under one scenario
consistent with the hypothesis that the vaccine treatment has no effect
whatsoever. While the smallish probability of 0.05 suggests that the
treatment may be effective, the point is that we can feel much more
comfortable in reaching that conclusion given the results of Table 1.1
based on the larger sample sizes, since with 24 in each group, the
probability that just the random selection of the animals for the
treatment group led to the observed results is way less than one
billionth, instead of being 1/20. Noting that the observed results are
very inconsistent with the hypothesis of no treatment effect, since
random selection would yield such a clear pattern with very small
probability, it's reasonable to think that something else, namely a
very effective vaccine, is responsible for the observed difference in
the outcomes for the two groups. (At this point in the course, you
don't have to know how to obtain such probabilities.)
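For the curious, here is a sketch of the counting argument behind those
two numbers. Under one scenario consistent with no treatment effect, a
fixed half of the animals would survive no matter how the groups were
formed, every way of choosing the treatment group is equally likely, and
only one of those choices puts all of the would-be survivors together in
the treatment group:

    P(\text{observed split}) = \frac{1}{\binom{6}{3}} = \frac{1}{20} = 0.05
    \qquad \text{(3 animals per group)}

    P(\text{observed split}) = \frac{1}{\binom{48}{24}}
    \approx \frac{1}{3.2 \times 10^{13}}
    \qquad \text{(24 animals per group)}

and 1/(3.2 x 10^13) is indeed far smaller than one billionth.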
Example 1.2
In contrast to the observed results of the experiment of the previous
example, the results here suggest a variable response under both
treatments. One explanation is that the mice in a group are not all
identical: some of them develop a tumor under the conditions of the
treatment, and others do not. But if we
acknowledge that differences in the mice can lead to different
responses, we need to be concerned about the possibility that the
observed differences could be due only to differences in the
mice, and that the differences in the treatments may not contribute to
the different cancer rates at all. To arrive at a claim of solid
evidence for a difference due to treatment choice, we need to show that
the observed results are rather inconsistent with the possibility that
just random selection of the mice for the two groups could lead to such
a large observed difference of the cancer rates.
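To make that last idea concrete, here is a minimal randomization-test
sketch in Python. The tumor counts are invented for illustration (they
are not the counts from the book), but the logic is the one just
described: re-assign the mice to the two groups at random many times and
see how often chance alone yields a difference in cancer rates at least
as large as the observed one.

    # Randomization-test sketch; the counts below are hypothetical.
    import random

    group_a = [1]*7  + [0]*13   # say 7 of 20 mice got a tumor
    group_b = [1]*15 + [0]*5    # say 15 of 20 mice got a tumor
    observed = abs(sum(group_b)/len(group_b) - sum(group_a)/len(group_a))

    pooled = group_a + group_b
    n, reps, hits = len(group_a), 100_000, 0
    for _ in range(reps):
        random.shuffle(pooled)          # re-assign mice at random
        a, b = pooled[:n], pooled[n:]
        if abs(sum(b)/len(b) - sum(a)/len(a)) >= observed:
            hits += 1
    print("approximate p-value:", hits / reps)

A tiny printed proportion says that random selection of the mice for the
two groups is a poor explanation for so large a difference.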
Example 1.3
Suppose that the flooding had no effect on the ATP. In such a case, all
8 values can be viewed as having come from the same distribution, and it
can be shown that the chance that the values in one set of 4 are all
larger than the values in another set of 4 is only 1/70. In this case,
even though there are only 4 observations in each group, we have fairly
strong evidence that the flooding affects ATP.
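The 1/70 comes from the same style of counting argument sketched under
Example 1.1: if flooding has no effect, all \binom{8}{4} = 70 ways of
labeling 4 of the 8 values as the flooded ones are equally likely, and
exactly one of those labelings makes every flooded value exceed every
control value:

    P = \frac{1}{\binom{8}{4}} = \frac{1}{70} \approx 0.014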
Example 1.4
Some issues concerning the sampling and measurements are brought up.
It's always best to have random samples of subjects, and if treatments
are assigned after the subjects are selected, it's best to use
randomization in the assignment of the treatments (although in this
example, the role of the treatments is assumed by the 3 classes of
patients, and the patients were diagnosed as opposed to being randomly
assigned to the treatments --- and so here three random samples needed
to be drawn from three different populations).
Sometimes, subjects are not actually randomly drawn from a larger
population, but instead the people, animals, plants, plots of land, etc.
used in the experiment are just the ones which are convenient to use.
(E.g., a GMU student wanted to make a comparison between soil in the
Piedmont region of VA and soil in the Plains region of VA. While in
principle she could have randomly picked locations using a map of VA,
since she wouldn't necessarily have access to the locations randomly
selected, she instead used 3 Piedmont sites and 3 Plains sites that
she had access to and which weren't so far apart in order to make it
easier to collect the necessary data. She chose to view the sites
used as being representative of the regions, but perhaps she should not
have done so, since they were not randomly selected. From her experimental
results, she could safely make conclusions about the soil in the parts
of the Piedmont and Plains regions which are similar to the sites she
used, but it was stretching it a bit to claim anything about the
Piedmont and Plains regions in general.)
Example 1.5
Since there is a worry about lack of independence (one larva may
make a choice, deliberate or not, and others may simply follow), with the
design indicated in Fig. 1.3 (b) on p. 5, we can claim to have 6
independent trials even if the larvae do not act independently. (The
Chapter Notes on p. 633 state that actually 11 dishes were used instead
of 6, which greatly improves the quality of the experiment.)
Example 1.6
The Chapter Notes on p. 633 touch upon some of the extra things I
mentioned when I extended the example in class from what is on pp. 5-6.
The entire article can be found using the E-Journal Finder on the
GMU libraries web site. All of the measurements were obtained from dead
people from southern California hospitals. The subjects used were put
into matched sets of three --- matching one homosexual man, one
heterosexual man, and one heterosexual woman in each set of three
subjects by age in order to reduce any variation due to age.
That being the case, a two-way ANOVA should have been
used, but the article indicates that a one-way ANOVA was used --- which
is so wrong! (By creating the matched sets and doing a one-way
ANOVA they eliminated some of the differences due to age, but they
did not account for the variation due to age; had they done so, a
smaller allowance would have been needed for the possibility that the
observed differences were merely natural variation among the
subjects. In this particular case, the important differences were large
enough so that the failure to reduce the unexplained experimental noise
didn't hurt them so much, but in general what they did is very bad
practice.) Even using a two-way ANOVA, there is a worry about
inadequate matching, since so many more of the homosexual men died of AIDS
(and matching by age doesn't directly reduce variation due to other
subject differences, like whether or not they died of AIDS). An
alternative approach would be to create a multiple regression model, in
which differences between subjects can be adjusted for instead of
attempting to use matching to cancel out the effects of the differences.
(If we can randomly assign the subjects in the matched sets to the
various treatment groups, then we don't have to worry as much about
having only a loose matching. But in a case like this one, where we do
not randomly assign the subjects to the treatment groups, we don't have
randomization to give us a fair experiment even if matching isn't so
good, and so it may be better to try to incorporate more variables into
the model instead of hoping that the effects due to other variables
cancel out due to the matching.) Upon reading the article, I can see
that they thought about trying to adjust for other differences in an
attempt to isolate differences due to sexual orientation, but I also
suspect
that the statistical analysis wasn't done in the best possible way. In
addition to not using a two-way ANOVA where it was appropriate, there is
also mention of a certain multiple comparison procedure which may not
have been the best one to use. (That the statistical analysis was not
done properly does not surprise me, although medical studies tend to be
better than others. I've witnessed improper statistical analysis here
at GMU, and have been greatly disappointed that in some cases the
researchers were resistant to changing to better methods once
inadequacies were made known to them.)
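To see the one-way versus two-way distinction in code, here is a
minimal sketch using Python with pandas and statsmodels. The response
values, labels, and column names are all made up for illustration (this
is not the article's data), but the layout mirrors the design: one
subject from each of the three groups in each matched set.

    # Made-up data: 3 matched sets, one subject per group in each set.
    import pandas as pd
    import statsmodels.formula.api as smf
    from statsmodels.stats.anova import anova_lm

    df = pd.DataFrame({
        "volume":      [110, 95, 92, 118, 101, 99, 125, 108, 104],
        "group":       ["HM", "HetM", "HetW"] * 3,
        "matched_set": ["s1"]*3 + ["s2"]*3 + ["s3"]*3,
    })

    # One-way ANOVA: ignores the matching, so the set-to-set (age)
    # variation stays in the error term.
    one_way = smf.ols("volume ~ C(group)", data=df).fit()
    print(anova_lm(one_way))

    # Two-way ANOVA: the matched set enters as a blocking factor, so
    # the age-related variation is pulled out of the error term.
    two_way = smf.ols("volume ~ C(group) + C(matched_set)", data=df).fit()
    print(anova_lm(two_way))

The thing to notice in the output is that the error mean square shrinks
in the two-way fit, which is exactly the reduction in unexplained noise
discussed above.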
Example 1.7
Even if there is no interaction between sex and dose, it's still a good
idea to adjust for sex when determining the effect of the dose difference.
If sex differences contribute to the observed variation, and we can adjust
for some of the variation due to sex, we have less variation to muddy
the waters when trying to assess the effect of the dose difference.
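A quick simulation (the numbers are invented and have nothing to do
with the book's example) shows the mechanism: with a genuine sex effect
and no interaction, centering each sex at its own mean leaves far less
variation to cloud the dose comparison.

    # Simulated illustration: a real sex effect, no interaction.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 1000
    sex  = rng.integers(0, 2, n)                  # 0 = female, 1 = male
    dose = rng.integers(0, 2, n)                  # 0 = low, 1 = high
    y = 2.0*dose + 5.0*sex + rng.normal(0, 1, n)  # additive model

    # Spread within each dose group when sex is ignored (the sex effect
    # sits in the noise):
    print(y[dose == 0].std(), y[dose == 1].std())   # roughly 2.7 each

    # Center each sex at its own mean, then look again:
    y_adj = y.copy()
    for s in (0, 1):
        y_adj[sex == s] -= y[sex == s].mean()
    print(y_adj[dose == 0].std(), y_adj[dose == 1].std())  # roughly 1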
Example 1.8
I'd like to know how the energy expenditures were measured. It appears
that GMU doesn't have the 1981 volume of the journal, and it doesn't
seem to be available using the E-Journal Finder.