MTB > # Let's first examine the data a bit.
MTB > dotplot c1-c3;
SUBC> same.
[Character dotplots of Normals, Alloxan, and Allox+in on a common axis from 0 to 600]
MTB > desc c1-c3
             N     MEAN   MEDIAN   TRMEAN    STDEV   SEMEAN
Normals     20    186.1    124.5    169.6    158.8     35.5
Alloxan     18    181.8    139.5    172.6    144.8     34.1
Allox+in    19    112.9     82.0     97.8    105.8     24.3
           MIN      MAX       Q1       Q3
Normals   14.0    655.0     92.0    274.7
Alloxan   13.0    499.0     70.3    276.0
Allox+in  18.0    465.0     44.0    133.0
MTB > let k90 = 1
MTB > exec 'qqnorm'
Executing from file: qqnorm.MTB
[Normal probability plot for k90 = 1: C92 on the vertical axis (ticks at -1.2, 0.0, 1.2), C90 on the horizontal axis (0 to 600)]
MTB > exec 'skku'
Executing from file: skku.MTB
skewness 1.62836
kurtosis 2.94431
MTB > let k90 = 2
MTB > exec 'qqnorm'
Executing from file: qqnorm.MTB
[Normal probability plot for k90 = 2: C92 on the vertical axis (ticks at -1.2, 0.0, 1.2), C90 on the horizontal axis (0 to 500)]
MTB > exec 'skku'
Executing from file: skku.MTB
skewness 1.13609
kurtosis 0.371562
MTB > let k90 = 3
MTB > exec 'qqnorm'
Executing from file: qqnorm.MTB
[Normal probability plot for k90 = 3: C92 on the vertical axis (ticks at -1.2, 0.0, 1.2), C90 on the horizontal axis (0 to 500)]
MTB > exec 'skku'
Executing from file: skku.MTB
skewness 2.30892
kurtosis 6.40979
MTB > # Because the sample sizes are nearly the same, the normal theory
MTB > # procedures ought to be fairly accurate for tests of the general
MTB > # k sample problem. (If the null hypothesis is true, the distributions
MTB > # are identical, and in addition to the variances being equal, there
MTB > # would be tremendous cancellation of skewness. So if the null
MTB > # hypothesis is true, the test statistic's actual sampling distribution
MTB > # shouldn't be too different from what it is in the case of iid normal
MTB > # random variables. If differences in skewness and variance lead to a
MTB > # rejection, then fine ... if there are differences in the distributions
MTB > # we want to get a rejection of the null hypothesis of identical dist'ns.)
MTB > # Still, although the normal theory procedures are fairly robust for
MTB > # validity, it could be that some of the nonparametric procedures are
MTB > # more powerful.
MTB > # (Note: From a previous session, I've already got the three samples stacked
MTB > # into c5, with the groups indicated by c6. Also, I have already named the
MTB > # Minitab columns. (I'm using a Minitab worksheet I had saved previously.))
MTB > oneway c5 c6;
SUBC> tukey 0.1.
ANALYSIS OF VARIANCE ON albumen
SOURCE      DF        SS        MS        F        p
tr group     2     64357     32178     1.67    0.197
ERROR       54   1037470     19212
TOTAL       56   1101827
INDIVIDUAL 95 PCT CI'S FOR MEAN
BASED ON POOLED STDEV
LEVEL N MEAN STDEV --+---------+---------+---------+----
1 20 186.1 158.8 (---------*---------)
2 18 181.8 144.8 (----------*----------)
3 19 112.9 105.8 (----------*---------)
--+---------+---------+---------+----
POOLED STDEV = 138.6 60 120 180 240
Tukey's pairwise comparisons
Family error rate = 0.100
Individual error rate = 0.0407
Critical value = 2.97
Intervals for (column level mean) - (row level mean)
            1         2
   2      -90
           99
   3      -20       -27
          166       165
MTB > # Since all of the confidence intervals indicated above contain 0,
MTB > # we can conclude that the p-value for the Tukey-Kramer test exceeds
MTB > # 0.10. After trying various values with the tukey subcommand, I
MTB > # concluded that the p-value is about 0.23 or 0.24.
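The F statistic, pooled standard deviation, and Tukey-Kramer intervals above can be checked from the group summaries alone. The following is a minimal Python sketch, not part of the Minitab session. It assumes the usual Tukey-Kramer half-width (q/sqrt(2)) * s_pooled * sqrt(1/n_i + 1/n_j) and takes the critical value 2.97 straight from Minitab's output. The variable and function names are illustrative only.

```python
import math

# Group summaries copied from the desc output: level -> (n, mean, stdev).
groups = {1: (20, 186.1, 158.8), 2: (18, 181.8, 144.8), 3: (19, 112.9, 105.8)}
Q_CRIT = 2.97  # critical value printed by Minitab for family error rate 0.10

n_total = sum(n for n, _, _ in groups.values())
grand_mean = sum(n * m for n, m, _ in groups.values()) / n_total

# Sums of squares rebuilt from rounded summaries, so they differ slightly
# from the values in Minitab's ANOVA table.
ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups.values())
ss_error = sum((n - 1) * s ** 2 for n, _, s in groups.values())
df_between = len(groups) - 1
df_error = n_total - len(groups)

f_stat = (ss_between / df_between) / (ss_error / df_error)
pooled_sd = math.sqrt(ss_error / df_error)


def tukey_kramer_interval(col, row):
    """Interval for (column level mean) - (row level mean)."""
    n_c, m_c, _ = groups[col]
    n_r, m_r, _ = groups[row]
    half_width = (Q_CRIT / math.sqrt(2)) * pooled_sd * math.sqrt(1 / n_c + 1 / n_r)
    diff = m_c - m_r
    return diff - half_width, diff + half_width


print(f"F = {f_stat:.2f}, pooled sd = {pooled_sd:.1f}")
for col, row in [(1, 2), (1, 3), (2, 3)]:
    lo, hi = tukey_kramer_interval(col, row)
    print(f"level {col} - level {row}: ({lo:.0f}, {hi:.0f})")
```

Because the inputs are rounded summaries, agreement with Minitab should be expected only to about the number of digits Minitab prints.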
*******************************************************************************
*** StatXact ***
To do this test in StatXact use
Basic_Statistics > ANOVA...
Then click Var1 into the Value box (assuming the 57 observations from the three
samples are stacked into Var1) and click Var2 into the Factor 1 box (assuming
Var2 has 20 1s, followed by 18 2s, followed by 19 3s). Finally, click OK. The
resulting p-value is in agreement with Minitab's p-value.
********************************************************************************
MTB > mood c5 c6
Mood median test of albumen
Chisquare = 3.32 df = 2 p = 0.191
Individual 95.0% CI's
tr group N<= N> Median Q3-Q1 ---+---------+---------+---------+---
1 10 10 125 183 (-+------------------)
2 7 11 139 206 (---------+-------------)
3 13 6 82 89 (-----+-------)
---+---------+---------+---------+---
60 120 180 240
Overall median = 122
MTB > # If one incorporates the adjustment factor given near the middle of p. 10-6
MTB > # of the class notes, the value of the adjusted statistic is about 3.26 and
MTB > # the corresponding p-value is 0.196.
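The median test statistic is an ordinary Pearson chi-square computed on the 3 by 2 table of counts at or below, and above, the overall median. The Python sketch below (not part of the Minitab session) recomputes it from the N<= and N> columns. The adjustment factor is ASSUMED here to be (N - 1)/N; the class notes are not reproduced in this document, but that factor does give the adjusted value quoted above.

```python
import math

# (N<=, N>) counts for groups 1, 2, 3, copied from the mood output.
counts = [(10, 10), (7, 11), (13, 6)]


def pearson_chi2(table):
    """Pearson chi-square statistic and total count for a two-way table."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, n


chi2, n = pearson_chi2(counts)
p_value = math.exp(-chi2 / 2)        # chi-square upper tail; valid for df = 2 only

chi2_adj = chi2 * (n - 1) / n        # ASSUMPTION: adjustment factor is (N - 1)/N
p_value_adj = math.exp(-chi2_adj / 2)

print(f"chi-square = {chi2:.2f}, p = {p_value:.3f}")
print(f"adjusted   = {chi2_adj:.2f}, p = {p_value_adj:.3f}")
```

The shortcut exp(-x/2) for the tail probability works only because there are 2 degrees of freedom here.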
***********************************************************************************
*** StatXact ***
To do this test in StatXact use
Nonparametrics > K Independent Samples > Median...
Then click Var1 into the Response box (assuming the 57 observations from the three
samples are stacked into Var1) and click Var2 into the Population box (assuming
Var2 has 20 1s, followed by 18 2s, followed by 19 3s). Next, click to select Exact
under Compute, and then click OK. The resulting asymptotic p-value is in close
agreement with Minitab's p-value (0.190 for StatXact vs. 0.191 for Minitab). But
the preferred exact p-value is about 0.202.
***********************************************************************************
MTB > krus c5 c6
LEVEL    NOBS    MEDIAN  AVE. RANK   Z VALUE
    1      20    124.50       32.2      1.05
    2      18    139.50       32.4      1.06
    3      19     82.00       22.4     -2.11
OVERALL    57                 29.0
H = 4.44 d.f. = 2 p = 0.109
H = 4.45 d.f. = 2 p = 0.109 (adj. for ties)
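The unadjusted H statistic depends on the data only through the group sizes and average ranks. As a rough check, the Python sketch below (not part of the Minitab session) recomputes H, its chi-square p-value, and the Z values. It uses the more precise average ranks quoted later in this session (32.15, 32.42, 22.45), because the one-decimal values in the table are too coarse to reproduce H. The Z formula used, (average rank - (N+1)/2) / sqrt((N+1)(N/n_i - 1)/12), is an assumption about how the Z VALUE column is computed, and no tie correction is applied.

```python
import math

sizes = [20, 18, 19]
avg_ranks = [32.15, 32.42, 22.45]   # rounded values quoted in the session

n = sum(sizes)
mean_rank = (n + 1) / 2             # overall average rank, 29.0

# Kruskal-Wallis statistic without the tie adjustment.
h = 12 / (n * (n + 1)) * sum(
    m * (r - mean_rank) ** 2 for m, r in zip(sizes, avg_ranks)
)
p_value = math.exp(-h / 2)          # chi-square upper tail; valid for df = 2 only

# ASSUMED form of the Z VALUE column.
z_values = [
    (r - mean_rank) / math.sqrt((n + 1) * (n / m - 1) / 12)
    for m, r in zip(sizes, avg_ranks)
]

print(f"H = {h:.2f}, p = {p_value:.3f}")
print("Z values:", [round(z, 2) for z in z_values])
```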
****************************************************************************************
*** StatXact ***
To do this test in StatXact use
Nonparametrics > K Independent Samples > Kruskal-Wallis...
Then click Var1 into the Response box (assuming the 57 observations from the three
samples are stacked into Var1) and click Var2 into the Population box (assuming
Var2 has 20 1s, followed by 18 2s, followed by 19 3s). The sample sizes are too large
for an exact p-value to be obtained, so the Monte Carlo option will be used. Click to
select Exact using Monte Carlo under Compute, and then click Options. Next click the
Monte Carlo tab (when the Options box appears). Change the Crude Monte Carlo Sample Size
from 10000 to 1000000. Change the Random Number Seed from Clock to Fixed, and use the
default seed of 23456. Then click OK to close the Options box, and finally click OK to
run the K-W test routine. The resulting asymptotic p-value is in close agreement with
Minitab's p-value (0.108 for StatXact vs. 0.109 for Minitab). The Monte Carlo estimate of
the exact p-value is about 0.108, and since the interval estimate for the exact p-value
is about (0.107, 0.109) we can be fairly confident that the p-value rounds to 0.11.
****************************************************************************************
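The reason for raising the Monte Carlo sample size can be seen from the width of the interval estimate. The Python sketch below assumes a normal-approximation binomial interval at the 99% level; both the method and the level are assumptions, since StatXact's own interval calculation is not described here. The function name is illustrative.

```python
import math


def monte_carlo_interval(p_hat, n_samples, z=2.576):
    """Normal-approximation interval for a Monte Carlo p-value estimate.

    z = 2.576 corresponds to a 99% level (an assumption, see the lead-in).
    """
    se = math.sqrt(p_hat * (1 - p_hat) / n_samples)
    return p_hat - z * se, p_hat + z * se


lo_big, hi_big = monte_carlo_interval(0.108, 1_000_000)
lo_small, hi_small = monte_carlo_interval(0.108, 10_000)

print(f"1000000 samples: ({lo_big:.3f}, {hi_big:.3f})")
print(f"  10000 samples: ({lo_small:.3f}, {hi_small:.3f})")
```

Under these assumptions the default of 10000 samples would leave an interval roughly ten times as wide, too wide to pin down the second decimal place of the p-value.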
MTB > # The average ranks for the three groups are about 32.15, 32.42, and 22.45.
MTB > # Using these values it can be determined that the value of the rank analog
MTB > # of the Tukey-Kramer test statistic is about 2.582. It can be concluded
MTB > # that the (approx.) p-value exceeds 0.10.
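The statistic quoted above can be reproduced from the average ranks. The Python sketch below assumes the large-sample form max |Rbar_i - Rbar_j| / sqrt( N(N+1)/24 * (1/n_i + 1/n_j) ) with no tie correction; the class notes may define it slightly differently. The critical value 2.902 is the upper 10% point of the studentized range for 3 groups and infinite degrees of freedom, taken from standard tables.

```python
import math
from itertools import combinations

sizes = {1: 20, 2: 18, 3: 19}
avg_ranks = {1: 32.15, 2: 32.42, 3: 22.45}   # rounded values from the session
n = sum(sizes.values())
Q_CRIT_10 = 2.902   # table value: studentized range, 3 groups, infinite df, 0.10


def rank_q(i, j):
    """Studentized-range-scale difference in average ranks (ASSUMED form)."""
    se = math.sqrt(n * (n + 1) / 24 * (1 / sizes[i] + 1 / sizes[j]))
    return abs(avg_ranks[i] - avg_ranks[j]) / se


q_values = {pair: rank_q(*pair) for pair in combinations(sizes, 2)}
q_max = max(q_values.values())

for pair, q in q_values.items():
    print(pair, round(q, 3))
print("statistic:", round(q_max, 3), "exceeds 0.10 critical value:", q_max > Q_CRIT_10)
```

Since the largest pairwise value falls short of the 0.10 critical value, the approximate p-value exceeds 0.10, in line with the comment above.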
****************************************************************************************
*** StatXact ***
Although StatXact doesn't do this test, it can be of some help. If the data values are
in Var1 and the indicators of group/population are in Var2, one can put the ranks of the
data values in Var3 using
DataEditor > Compute Scores...
Click Var1 into the Response box, type Var3 in the Target Variable box. Make sure that
Wilcoxon (Mid-Rank) is selected under Score, and click OK. Now
Basic_Statistics > Descriptive Statistics...
can be used to obtain the average ranks. Click Var3 into the Selected Variables box, and
click Var2 into the By Variable 1 box. (This will result in summaries for each of the 3
groups.) Under Central Tendency click to select Mean, and under Summary click to select Sum.
Then click OK. Using a calculator to divide the sums by the sample sizes avoids rounding
error in the reported means (which correspond to the average ranks).
****************************************************************************************
MTB > # Doing M-W tests on all pairs of the three samples results in
MTB > # the following values of u_ij:
MTB > # u_12 = 179,
MTB > # u_13 = 254,
MTB > # & u_23 = 231.5.
MTB > # From these it can be obtained that the value of the Steel-Dwass
MTB > # test statistic is about 2.600. It can be concluded (using tables
MTB > # of critical values from studentized range distributions) that the
MTB > # (approx.) p-value exceeds 0.10 (since the test statistic value is
MTB > # less than the 0.10 critical value).
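The step from the u_ij values to the Steel-Dwass statistic can be sketched in Python as follows. The form assumed is max |u_ij - n_i n_j / 2| / sqrt( n_i n_j (n_i + n_j + 1) / 24 ), with no tie correction; that form is an assumption, not taken from the class notes. The critical value 2.902 is the tabled upper 10% point of the studentized range for 3 groups and infinite degrees of freedom.

```python
import math

sizes = {1: 20, 2: 18, 3: 19}
u = {(1, 2): 179.0, (1, 3): 254.0, (2, 3): 231.5}   # M-W statistics from the session
Q_CRIT_10 = 2.902   # table value: studentized range, 3 groups, infinite df, 0.10


def steel_dwass_component(i, j):
    """Standardized M-W statistic on the studentized range scale (ASSUMED form)."""
    n_i, n_j = sizes[i], sizes[j]
    null_mean = n_i * n_j / 2
    scale = math.sqrt(n_i * n_j * (n_i + n_j + 1) / 24)
    return abs(u[(i, j)] - null_mean) / scale


components = {pair: steel_dwass_component(*pair) for pair in u}
statistic = max(components.values())

for pair, value in components.items():
    print(pair, round(value, 3))
print("statistic:", round(statistic, 3), "exceeds 0.10 critical value:", statistic > Q_CRIT_10)
```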
**********************************************************************************************
*** StatXact ***
Although StatXact doesn't do this test, it can be of some help. In the CaseData editor, copy
and paste values in Var1 (assuming that's where the data values are) and Var2 (assuming that's
where the indicators of group/population are) to create 6 new columns (Vars). In the first two
new columns, copy and paste the data values and group indicators for samples 1 and 2. In the
next two columns, copy and paste the data values and group indicators for samples 1 and 3. And
then in the last two new columns, copy and paste the data values and group indicators for samples
2 and 3. Then use
Nonparametrics > Two Independent Samples > Wilcoxon-Mann-Whitney...
to do a Wilcoxon rank sum test on samples 1 and 2. In the output, the value under Observed is the
sum of the ranks for sample 1. To get the value of the M-W U statistic, subtract off n_1(n_1 + 1)/2.
Proceed in a similar manner to get the other two M-W statistic values needed for the Steel-Dwass test,
remembering to subtract off n_2(n_2 + 1)/2 instead of n_1(n_1 + 1)/2 when you do the test on samples
2 and 3.
**********************************************************************************************
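The conversion from a rank sum to the M-W U statistic described above can be sketched in Python. The two small samples are hypothetical, chosen only to include a tie; they are not the albumen data, and the function names are illustrative.

```python
def midranks(values):
    """Return midranks (ties share the average rank) in the original order."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mid = (start + 1 + end + 1) / 2   # average of 1-based ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = mid
        start = end + 1
    return ranks


def mann_whitney_u(first, second):
    """U for the first sample: its rank sum minus n(n + 1)/2, with n = len(first)."""
    ranks = midranks(list(first) + list(second))
    n_first = len(first)
    rank_sum_first = sum(ranks[:n_first])
    return rank_sum_first - n_first * (n_first + 1) / 2


# Hypothetical toy samples (NOT the data from this session).
x = [3.1, 5.0, 7.2]
y = [1.0, 5.0, 6.0, 2.5]
u_xy = mann_whitney_u(x, y)
u_yx = mann_whitney_u(y, x)
print(u_xy, u_yx, u_xy + u_yx == len(x) * len(y))
```

The two U values always sum to the product of the sample sizes, which is a handy check when the subtraction is done by hand.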
*** StatXact ***
Other tests can be done with StatXact. One can do the normal scores test described in
the class notes (using van der Waerden scores) using
Nonparametrics > K Independent Samples > Normal-Scores...
Using the Monte Carlo option (with 1000000 Monte Carlo trials) one gets a p-value of about
0.15. (The skewness of the distributions makes the K-W test a better performer. But the normal
scores test yields a smaller p-value than the one-way ANOVA F test; the extreme observations
resulting from the skewness make the F test a bit conservative.)
One can also use Savage scores, using
Nonparametrics > K Independent Samples > Savage...
Using the Monte Carlo option (with 1000000 Monte Carlo trials) one gets a p-value of about
0.15. (I'll confess that I had expected the Savage scores test to yield a smaller p-value
since it typically does well for moderately skewed distributions.)
One can do a k-sample permutation test, using
Nonparametrics > K Independent Samples > ANOVA with General Scores...
Using the Monte Carlo option (with 1000000 Monte Carlo trials) one gets a p-value of about
0.20 (close to the ANOVA F test result).
To do a percentile modified rank test one first needs to create the desired scores. Suppose
we want to use
-24, -23, -22, ..., -3, -2, -1, 0, 0, 0, ..., 0, 0, 1, 2, 3, ..., 22, 23, 24.
To create these scores, first bring the CaseData editor to the front, and then use
DataEditor > Compute Scores...
Click Var1 into the Response box and type Var3 in the Target Variable box. Select Wilcoxon
under Score, and then click OK. Now in the next column of the CaseData editor (Var4) type
-24 in Var4 next to 1 in Var3, type -23 in Var4 next to 2 in Var3, and so on, eventually
typing -1 in Var4 next to 24 in Var3. Next type 24 in Var4 next to 57 in Var3, type 23 in
Var4 next to 56 in Var3, ..., and type 1 in Var4 next to 34 in Var3. Fill in the other Var4
spots with nine 0s. (Note: If noninteger midranks are encountered in Var3, also use midranks
appropriately in Var4.) Now use
Nonparametrics > K Independent Samples > ANOVA with General Scores...
Click Var4 (the percentile modified rank scores) in as the Response. Using the Monte Carlo
option (with 1000000 Monte Carlo trials) one gets a p-value of about 0.12.
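Typing the 57 scores by hand is error prone, so it can help to generate them and compare. The Python sketch below builds the score for each rank 1, ..., 57. For tied observations it ASSUMES that "use midranks appropriately" means giving each tied observation the average of the scores for the ranks the tie occupies; that reading is an interpretation. The function names are illustrative.

```python
def percentile_modified_scores(n_total, n_tail):
    """Scores for ranks 1..n_total: -n_tail..-1, then zeros, then 1..n_tail."""
    scores = []
    for rank in range(1, n_total + 1):
        if rank <= n_tail:
            scores.append(rank - n_tail - 1)            # ranks 1..24 -> -24..-1
        elif rank > n_total - n_tail:
            scores.append(rank - (n_total - n_tail))    # ranks 34..57 -> 1..24
        else:
            scores.append(0)                            # middle ranks -> 0
    return scores


def tied_score(scores, first_rank, last_rank):
    """ASSUMED tie rule: average the scores over ranks first_rank..last_rank."""
    block = scores[first_rank - 1:last_rank]
    return sum(block) / len(block)


scores = percentile_modified_scores(57, 24)
print(scores[:3], scores[23:34], scores[-3:])
print("zeros:", scores.count(0), "sum:", sum(scores))
print("two values tied at ranks 24 and 25 would each get", tied_score(scores, 24, 25))
```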
Finally, one can use StatXact to do the extension of the median test described on p. 10-9 of
the class notes. One can use m = 3, with t_1 = t_2 = t_3 = N/3 = 19. The midranks in Var3
of the CaseData editor can be used to determine the observations in the lower third, middle
third, and upper third of the ordered combined sample. These lead to the following 3 by 3
table.
           lower  middle   upper
         -------------------------
sample 1 |   4   |   8   |   8   |  20
         -------------------------
sample 2 |   5   |   5   |   8   |  18
         -------------------------
sample 3 |  10   |   6   |   3   |  19
         -------------------------
            19      19      19
Use
File > New...
and then select Table Data from the available file types and click OK. Then request 1 table
with 3 rows and 3 columns, and click OK. Next fill in the counts shown above into the 3 by 3
table. Then use
Nonparametrics > Unordered R x C Table > Pearson's Chi-square...
Select Exact under Compute and click OK. The exact p-value is about 0.175 (whereas the chi-square
approximation results in an approximate p-value of about 0.165).
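The chi-square approximation for the 3 by 3 table can be checked with a short Python sketch. The tail probability uses a closed form that holds only for even degrees of freedom (here df = 4). The exact p-value needs StatXact's exact routine and is not reproduced by this sketch. The function names are illustrative.

```python
import math

# Rows: samples 1-3; columns: lower, middle, upper third of the combined sample.
table = [[4, 8, 8], [5, 5, 8], [10, 6, 3]]


def pearson_chi2(counts):
    """Pearson chi-square statistic for a two-way table of counts."""
    row_tot = [sum(row) for row in counts]
    col_tot = [sum(col) for col in zip(*counts)]
    n = sum(row_tot)
    return sum(
        (counts[i][j] - row_tot[i] * col_tot[j] / n) ** 2 / (row_tot[i] * col_tot[j] / n)
        for i in range(len(counts))
        for j in range(len(counts[0]))
    )


def chi2_sf_even_df(x, df):
    """Chi-square upper-tail probability; this closed form needs an even df."""
    if df % 2 != 0:
        raise ValueError("closed form requires an even number of degrees of freedom")
    half = x / 2
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))


chi2 = pearson_chi2(table)
df = (len(table) - 1) * (len(table[0]) - 1)
p_approx = chi2_sf_even_df(chi2, df)
print(f"chi-square = {chi2:.3f}, df = {df}, approximate p = {p_approx:.3f}")
```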