*** Note: StatXact info is inserted below.
MTB > # I'll enter the data.
MTB > set c1
DATA> 4.6 4.9 5.0 5.7 6.3 6.8 7.4 7.9
DATA> end
MTB > set c2
DATA> 4.7 5.0 5.1 5.8 6.4 6.6 7.1 8.3
DATA> end
MTB > set c3
DATA> 5.6 5.9 6.6 6.7 6.8 7.4 8.3 9.6
DATA> end
MTB > set c4
DATA> 6.0 6.8 8.1 8.4 8.6 8.9 9.8 11.5
DATA> end
MTB > name c1 'Age 15' c2 'Age 20' c3 'Age 25' c4 'Age 30'
MTB > # Now let's look at the data and some summary statistics.
MTB > print c1-c4
ROW Age 15 Age 20 Age 25 Age 30
1 4.6 4.7 5.6 6.0
2 4.9 5.0 5.9 6.8
3 5.0 5.1 6.6 8.1
4 5.7 5.8 6.7 8.4
5 6.3 6.4 6.8 8.6
6 6.8 6.6 7.4 8.9
7 7.4 7.1 8.3 9.8
8 7.9 8.3 9.6 11.5
MTB > dotplot c1-c4;
SUBC> same.
          . :    .   .  .   .   .
      ---+---------+---------+---------+---------+---------+---Age 15
          . ..    .   ..  .       .
      ---+---------+---------+---------+---------+---------+---Age 20
                . .    .:   .     .        .
      ---+---------+---------+---------+---------+---------+---Age 25
                   .    .        . .. .     .           .
      ---+---------+---------+---------+---------+---------+---Age 30
        4.5       6.0       7.5       9.0      10.5      12.0
MTB > desc c1-c4
N MEAN MEDIAN TRMEAN STDEV SEMEAN
Age 15 8 6.075 6.000 6.075 1.226 0.433
Age 20 8 6.125 6.100 6.125 1.221 0.432
Age 25 8 7.113 6.750 7.113 1.308 0.462
Age 30 8 8.512 8.500 8.512 1.697 0.600
MIN MAX Q1 Q3
Age 15 4.600 7.900 4.925 7.250
Age 20 4.700 8.300 5.025 6.975
Age 25 5.600 9.600 6.075 8.075
Age 30 6.000 11.500 7.125 9.575
MTB > # Before trying the J-T test and the rank test in the spirit of the
MTB > # Abelson-Tukey test (described on p. 88 of Beyond ANOVA), I'm curious
MTB > # as to what the ANOVA F test and the K-W test will give as p-values.
MTB > stack c1-c4 c5;
SUBC> subs c6.
MTB > name c5 'min dist' c6 'age gr'
MTB > oneway c5 c6
ANALYSIS OF VARIANCE ON min dist
SOURCE DF SS MS F p
age gr 3 31.31 10.44 5.50 0.004
ERROR 28 53.09 1.90
TOTAL 31 84.40
INDIVIDUAL 95 PCT CI'S FOR MEAN
BASED ON POOLED STDEV
LEVEL N MEAN STDEV --------+---------+---------+--------
1 8 6.075 1.226 (--------*-------)
2 8 6.125 1.221 (-------*-------)
3 8 7.113 1.308 (-------*--------)
4 8 8.512 1.697 (-------*-------)
--------+---------+---------+--------
POOLED STDEV = 1.377 6.0 7.2 8.4
MTB > krus c5 c6
LEVEL NOBS MEDIAN AVE. RANK Z VALUE
1 8 6.000 11.4 -1.78
2 8 6.100 11.8 -1.63
3 8 6.750 17.8 0.46
4 8 8.500 25.0 2.96
OVERALL 32 16.5
H = 11.11 d.f. = 3 p = 0.011
H = 11.13 d.f. = 3 p = 0.011 (adj. for ties)
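The average ranks and the unadjusted H in the K-W output can be cross-checked outside Minitab. The following is a minimal Python sketch, not part of the original session; the helper name midranks and the variable names are illustrative only. It uses the usual formula H = 12/(N(N+1)) * sum of n_i*(average rank_i - (N+1)/2)^2 and does not apply the tie adjustment.

```python
# Data from worksheet columns c1-c4.
groups = [
    [4.6, 4.9, 5.0, 5.7, 6.3, 6.8, 7.4, 7.9],    # Age 15
    [4.7, 5.0, 5.1, 5.8, 6.4, 6.6, 7.1, 8.3],    # Age 20
    [5.6, 5.9, 6.6, 6.7, 6.8, 7.4, 8.3, 9.6],    # Age 25
    [6.0, 6.8, 8.1, 8.4, 8.6, 8.9, 9.8, 11.5],   # Age 30
]


def midranks(values):
    """Return the midrank of each value (tied values share the average rank)."""
    ordered = sorted(values)
    rank_of = {}
    for v in set(ordered):
        first = ordered.index(v) + 1           # smallest rank in the tie group
        count = ordered.count(v)               # size of the tie group
        rank_of[v] = first + (count - 1) / 2   # average of the tied ranks
    return [rank_of[v] for v in values]


pooled = [x for g in groups for x in g]
ranks = midranks(pooled)
n_total = len(pooled)

# Average rank within each group (groups are stored consecutively in pooled).
avg_ranks = []
start = 0
for g in groups:
    avg_ranks.append(sum(ranks[start:start + len(g)]) / len(g))
    start += len(g)

# Kruskal-Wallis H without the adjustment for ties.
overall = (n_total + 1) / 2
h_stat = 12 / (n_total * (n_total + 1)) * sum(
    len(g) * (r - overall) ** 2 for g, r in zip(groups, avg_ranks)
)
print(avg_ranks, round(h_stat, 2))
```

The unrounded average ranks are 11.375, 11.8125, 17.8125, and 25, which Minitab displays to one decimal place.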
MTB > # To get the M-W U statistic values needed for the J-T test, I can do a
MTB > # bunch of M-W tests. The output gives W, the sum of the ranks for the 1st
MTB > # sample; subtracting 1+2+3+4+5+6+7+8 = 36 from W gives the number of
MTB > # comparisons (out of 8 times 8 = 64) for which the observation in the
MTB > # 1st sample is larger than the observation in the 2nd sample, with each
MTB > # tie counting as 1/2.
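The relation between the rank sum and the count of "larger" comparisons can be checked directly. This is a small Python sketch, not part of the Minitab session; the function names are hypothetical, and ties are scored as 1/2, which is the convention implied by the half-integer U values.

```python
def mann_whitney_u(first, second):
    """Count pairs (x, y) with x > y, scoring each tie x == y as 1/2."""
    u = 0.0
    for x in first:
        for y in second:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u


def rank_sum_w(first, second):
    """Sum of the midranks of `first` within the combined sample."""
    combined = first + second
    total = 0.0
    for x in first:
        smaller = sum(1 for v in combined if v < x)
        equal = sum(1 for v in combined if v == x)   # includes x itself
        total += smaller + (equal + 1) / 2            # midrank of x
    return total


age15 = [4.6, 4.9, 5.0, 5.7, 6.3, 6.8, 7.4, 7.9]
age20 = [4.7, 5.0, 5.1, 5.8, 6.4, 6.6, 7.1, 8.3]

w = rank_sum_w(age20, age15)       # rank sum for the 1st sample (Age 20)
u = mann_whitney_u(age20, age15)   # direct pairwise count
print(w, u, w - 36)                # w - 36 should equal u
```

For samples of size n the constant to subtract is n(n+1)/2, which is 36 when n = 8.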
MTB > mann c2 c1
Mann-Whitney Confidence Interval and Test
Age 20 N = 8 Median = 6.100
Age 15 N = 8 Median = 6.000
Point estimate for ETA1-ETA2 is 0.100
95.9 pct c.i. for ETA1-ETA2 is (-1.500,1.500)
W = 70.5
Test of ETA1 = ETA2 vs. ETA1 n.e. ETA2 is significant at 0.8336
The test is significant at 0.8335 (adjusted for ties)
Cannot reject at alpha = 0.05
MTB > mann c3 c1
Mann-Whitney Confidence Interval and Test
Age 25 N = 8 Median = 6.750
Age 15 N = 8 Median = 6.000
Point estimate for ETA1-ETA2 is 1.000
95.9 pct c.i. for ETA1-ETA2 is (-0.600,2.400)
W = 81.0
Test of ETA1 = ETA2 vs. ETA1 n.e. ETA2 is significant at 0.1893
The test is significant at 0.1886 (adjusted for ties)
Cannot reject at alpha = 0.05
MTB > mann c4 c1
Mann-Whitney Confidence Interval and Test
Age 30 N = 8 Median = 8.500
Age 15 N = 8 Median = 6.000
Point estimate for ETA1-ETA2 is 2.350
95.9 pct c.i. for ETA1-ETA2 is (0.700,4.001)
W = 93.5
Test of ETA1 = ETA2 vs. ETA1 n.e. ETA2 is significant at 0.0087
The test is significant at 0.0086 (adjusted for ties)
MTB > mann c3 c2
Mann-Whitney Confidence Interval and Test
Age 25 N = 8 Median = 6.750
Age 20 N = 8 Median = 6.100
Point estimate for ETA1-ETA2 is 0.900
95.9 pct c.i. for ETA1-ETA2 is (-0.500,2.400)
W = 83.0
Test of ETA1 = ETA2 vs. ETA1 n.e. ETA2 is significant at 0.1278
The test is significant at 0.1272 (adjusted for ties)
Cannot reject at alpha = 0.05
MTB > mann c4 c2
Mann-Whitney Confidence Interval and Test
Age 30 N = 8 Median = 8.500
Age 20 N = 8 Median = 6.100
Point estimate for ETA1-ETA2 is 2.300
95.9 pct c.i. for ETA1-ETA2 is (0.600,3.900)
W = 93.0
Test of ETA1 = ETA2 vs. ETA1 n.e. ETA2 is significant at 0.0101
MTB > mann c4 c3
Mann-Whitney Confidence Interval and Test
Age 30 N = 8 Median = 8.500
Age 25 N = 8 Median = 6.750
Point estimate for ETA1-ETA2 is 1.500
95.9 pct c.i. for ETA1-ETA2 is (-0.600,3.000)
W = 85.5
Test of ETA1 = ETA2 vs. ETA1 n.e. ETA2 is significant at 0.0742
The test is significant at 0.0740 (adjusted for ties)
Cannot reject at alpha = 0.05
MTB > # We have that
MTB > # u_21 = 34.5, u_31 = 45, u_41 = 57.5,
MTB > # u_32 = 47, u_42 = 57,
MTB > # u_43 = 49.5,
MTB > # which gives us that b = 290.5 (where b is the observed
MTB > # value of the JT test statistic). Since the table in
MTB > # G&C doesn't have values for 4 samples of size 8, I'll
MTB > # use the normal approximation to get a p-value. Under
MTB > # the null hypothesis of identical distributions, the
MTB > # expected value is 192 and the variance is 2656/3.
MTB > # Since we reject for large values of b, using a continuity
MTB > # correction, the approximate p-value is
MTB > # 1 - Phi( ( 290.5 + 1/2 - 192 )/sqrt( 2656/3 ) )
MTB > # = Phi( (192 - 1/2 - 290.5 )/sqrt( 2656/3 ) ).
MTB > let k1 = (192 - 1/2 - 290.5)/sqrt(2656/3)
MTB > cdf k1 k2;
SUBC> norm 0 1.
MTB > name k1 'z' k2 'p-value'
MTB > print k1 k2
z -3.32722
p-value 0.000438631
MTB > # So the (approx) p-value is about 0.0004 (about 10 times smaller
MTB > # than the p-value from the nondirectional F test).
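The J-T calculation can be reproduced in a few lines. The following Python sketch is not part of the original session and its names are illustrative. It assumes the standard no-ties null moments, mean = (N^2 - sum n_i^2)/4 and variance = [N^2(2N+3) - sum n_i^2(2n_i+3)]/72, and it applies the continuity correction exactly as written in the session comments above.

```python
import math

groups = [
    [4.6, 4.9, 5.0, 5.7, 6.3, 6.8, 7.4, 7.9],    # Age 15
    [4.7, 5.0, 5.1, 5.8, 6.4, 6.6, 7.1, 8.3],    # Age 20
    [5.6, 5.9, 6.6, 6.7, 6.8, 7.4, 8.3, 9.6],    # Age 25
    [6.0, 6.8, 8.1, 8.4, 8.6, 8.9, 9.8, 11.5],   # Age 30
]


def pairs_greater(later, earlier):
    """Number of pairs with later > earlier, each tie counted as 1/2."""
    return sum(
        1.0 if x > y else 0.5 if x == y else 0.0
        for x in later
        for y in earlier
    )


# J-T statistic: sum of U values over all pairs of groups (later vs. earlier).
k = len(groups)
b = sum(
    pairs_greater(groups[j], groups[i])
    for i in range(k)
    for j in range(i + 1, k)
)

# Null mean and variance (no adjustment for ties).
sizes = [len(g) for g in groups]
n_total = sum(sizes)
mean_b = (n_total ** 2 - sum(n * n for n in sizes)) / 4
var_b = (
    n_total ** 2 * (2 * n_total + 3)
    - sum(n * n * (2 * n + 3) for n in sizes)
) / 72

# Same z as the session: (192 - 1/2 - 290.5) / sqrt(2656/3).
z = (mean_b - 0.5 - b) / math.sqrt(var_b)
p_value = 0.5 * math.erfc(-z / math.sqrt(2))   # standard normal cdf at z
print(b, mean_b, var_b, z, p_value)
```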
*************************************************************************************
*** StatXact info ***
Having put the data values in Var1 (of the CaseData editor) and the group indicators
(8 1s, followed by 8 2s, followed by 8 3s, followed by 8 4s) in Var2, use
Nonparametrics > K Independent Samples > Jonckheere-Terpstra...
Click Var1 into the Response box, Var2 into the Population box, select Exact under
Compute, and click OK.
The test statistic value and mean match what I have above, but the standard deviation
makes an adjustment for ties that I didn't bother to do ... and StatXact doesn't use a
continuity correction. StatXact's asymptotic p-value is about 0.00046, which is close
to what I got with Minitab. But the preferable exact p-value is about 0.00034.
*************************************************************************************
MTB > # Now I'll do an approximate version of the rank test in the spirit of the
MTB > # Abelson-Tukey test (described on p. 88 of Beyond ANOVA).
MTB > rank c5 c7
MTB > unstack c7 c11-c14;
SUBC> subs c6.
MTB > print c11-c14
ROW C11 C12 C13 C14
1 1.0 2.0 7.0 11
2 3.0 4.5 10.0 18
3 4.5 6.0 14.5 24
4 8.0 9.0 16.0 27
5 12.0 13.0 18.0 28
6 18.0 14.5 21.5 29
7 21.5 20.0 25.5 31
8 23.0 25.5 30.0 32
MTB > # These columns contain the midranks for the four age groups.
MTB > let k11 = mean(c11)
MTB > let k12 = mean(c12)
MTB > let k13 = mean(c13)
MTB > let k14 = mean(c14)
MTB > let k15 = k11 + 2*k12 + 3*k13 + 4*k14
MTB > # k15 contains the value of the L statistic.
MTB > print k11-k15
K11 11.3750
K12 11.8125
K13 17.8125
K14 25.0000
K15 188.437
MTB > # Under the null hypothesis, the mean of the statistic is 165
MTB > # and the variance is 55. So the approximate p-value from
MTB > # an upper-tailed test is
MTB > # 1 - Phi( ( 188.437 - 165 )/sqrt(55) )
MTB > # = Phi( (165 - 188.437)/sqrt(55) ).
MTB > let k1 = (165 - k15)/sqrt(55)
MTB > cdf k1 k2;
SUBC> norm 0 1.
MTB > print k1 k2
z -3.16031
p-value 0.000788093
MTB > # So the approximate p-value is about 0.0008.
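The null mean of 165 and variance of 55 can be derived rather than taken on faith. The sketch below is in Python, is not part of the original session, and uses illustrative names. It assumes the no-ties permutation moments of L = sum of c_i * (mean rank of group i) with weights c_i = 1, 2, 3, 4: mean = (sum c_i)(N+1)/2 and variance = N(N+1)/12 * [sum c_i^2/n_i - (sum c_i)^2/N].

```python
import math

groups = [
    [4.6, 4.9, 5.0, 5.7, 6.3, 6.8, 7.4, 7.9],    # Age 15
    [4.7, 5.0, 5.1, 5.8, 6.4, 6.6, 7.1, 8.3],    # Age 20
    [5.6, 5.9, 6.6, 6.7, 6.8, 7.4, 8.3, 9.6],    # Age 25
    [6.0, 6.8, 8.1, 8.4, 8.6, 8.9, 9.8, 11.5],   # Age 30
]
pooled = [x for g in groups for x in g]
n_total = len(pooled)


def midrank(x, sample):
    """Midrank of x within sample (ties share the average rank)."""
    smaller = sum(1 for v in sample if v < x)
    equal = sum(1 for v in sample if v == x)   # includes x itself
    return smaller + (equal + 1) / 2


# Mean midrank per group (k11-k14 in the session).
mean_ranks = [sum(midrank(x, pooled) for x in g) / len(g) for g in groups]

# L statistic with linear weights 1, 2, 3, 4 (k15 in the session).
weights = [1, 2, 3, 4]
l_stat = sum(c * r for c, r in zip(weights, mean_ranks))

# Null mean and variance, ignoring ties.
mean_l = sum(weights) * (n_total + 1) / 2
var_l = n_total * (n_total + 1) / 12 * (
    sum(c * c / len(g) for c, g in zip(weights, groups))
    - sum(weights) ** 2 / n_total
)

z = (mean_l - l_stat) / math.sqrt(var_l)
p_value = 0.5 * math.erfc(-z / math.sqrt(2))   # standard normal cdf at z
print(l_stat, mean_l, var_l, z, p_value)
```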
MTB > save 'eyes'
Saving worksheet in file: eyes.MTW
*************************************************************************************
*** StatXact info ***
This test isn't on StatXact's menus. But we can "trick" StatXact into doing the test.
Having put the data values in Var1 (of the CaseData editor) and the group indicators
(8 1s, followed by 8 2s, followed by 8 3s, followed by 8 4s) in Var2, we can set up
two additional columns. Use
DataEditor > Compute Scores...
to put the Wilcoxon ranks/midranks into Var3. Then use
DataEditor > Transform Variable...
Type Var4 into the Target Variable box. In the big box after the = sign, type Var2/8
and click OK. (Note: In general, the values in Var4 need to be i/n_i. E.g., if the
1st group had 8 observations, the 2nd group 5 observations, and the 3rd (and last)
group 4 observations, Var4 should have 8 values of 1/8 = 0.125, followed by 5 values
of 2/5 = 0.4, followed by 4 values of 3/4 = 0.75.)
Now use
Nonparametrics > K Independent Samples > Linear-by-linear Association...
Click Var3 into the Response box, Var4 into the Population box, select Exact under
Compute, and click OK.
The test statistic value and mean match what I have above, but the standard deviation
makes an adjustment for ties that I didn't bother to do. StatXact's asymptotic p-value
is about 0.00078, which is close to what I got using Minitab. But the preferable exact
p-value is about 0.00045.
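To see why scores of i/n_i turn the linear-by-linear statistic into L, here is a small Python sketch. It is an illustration only, not StatXact's implementation, and it assumes the linear-by-linear statistic is the sum over observations of (response score) times (population score). The function name scores is hypothetical.

```python
def scores(sizes):
    """Population scores: i / n_i repeated n_i times for group i = 1, 2, ..."""
    out = []
    for i, n in enumerate(sizes, start=1):
        out.extend([i / n] * n)
    return out


# Midranks as printed in columns C11-C14 of the Minitab session.
ranks = [
    [1.0, 3.0, 4.5, 8.0, 12.0, 18.0, 21.5, 23.0],
    [2.0, 4.5, 6.0, 9.0, 13.0, 14.5, 20.0, 25.5],
    [7.0, 10.0, 14.5, 16.0, 18.0, 21.5, 25.5, 30.0],
    [11.0, 18.0, 24.0, 27.0, 28.0, 29.0, 31.0, 32.0],
]

var3 = [r for g in ranks for r in g]     # Wilcoxon midranks (Var3)
var4 = scores([len(g) for g in ranks])   # i / n_i scores (Var4), here group/8

# Sum of rank times score over all 32 observations.
linear_by_linear = sum(r * s for r, s in zip(var3, var4))

# L statistic: sum of i times the mean rank of group i.
l_stat = sum(i * sum(g) / len(g) for i, g in enumerate(ranks, start=1))

print(linear_by_linear, l_stat)

# Scores for the unequal-size example in the note (8, 5, and 4 observations).
print(scores([8, 5, 4]))
```

Dividing the group index by n_i converts each group's rank sum into i times its mean rank, so the two statistics agree for any group sizes.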
*************************************************************************************