MTB > # Let's first examine the data a bit.
MTB > dotplot c1-c3;
SUBC> same.

[character dotplots of Normals, Alloxan, and Allox+in on a common axis from 0 to 600]

MTB > desc c1-c3

                N     MEAN   MEDIAN   TRMEAN    STDEV   SEMEAN
Normals        20    186.1    124.5    169.6    158.8     35.5
Alloxan        18    181.8    139.5    172.6    144.8     34.1
Allox+in       19    112.9     82.0     97.8    105.8     24.3

              MIN      MAX       Q1       Q3
Normals      14.0    655.0     92.0    274.7
Alloxan      13.0    499.0     70.3    276.0
Allox+in     18.0    465.0     44.0    133.0

MTB > let k90 = 1
MTB > exec 'qqnorm'
Executing from file: qqnorm.MTB

[character plot of C92 (vertical axis, ticks at -1.2, 0.0, 1.2) against C90 (horizontal axis, 0 to 600)]

MTB > exec 'skku'
Executing from file: skku.MTB
skewness    1.62836
kurtosis    2.94431

MTB > let k90 = 2
MTB > exec 'qqnorm'
Executing from file: qqnorm.MTB

[character plot of C92 (vertical axis, ticks at -1.2, 0.0, 1.2) against C90 (horizontal axis, 0 to 500)]

MTB > exec 'skku'
Executing from file: skku.MTB
skewness    1.13609
kurtosis    0.371562

MTB > let k90 = 3
MTB > exec 'qqnorm'
Executing from file: qqnorm.MTB

[character plot of C92 (vertical axis, ticks at -1.2, 0.0, 1.2) against C90 (horizontal axis, 0 to 500)]

MTB > exec 'skku'
Executing from file: skku.MTB
skewness    2.30892
kurtosis    6.40979

MTB > # Because the sample sizes are nearly the same, the normal theory
MTB > # procedures ought to be fairly accurate for tests of the general
MTB > # k sample problem. (If the null hypothesis is true, the distributions
MTB > # are identical, and in addition to the variances being equal, there
MTB > # would be tremendous cancellation of skewness.
MTB > # So if the null
MTB > # hypothesis is true, the test statistic's actual sampling distribution
MTB > # shouldn't be too different from what it is in the case of iid normal
MTB > # random variables. If differences in skewness and variance lead to a
MTB > # rejection, then fine ... if there are differences in the distributions
MTB > # we want to get a rejection of the null hypothesis of identical dist'ns.)
MTB > # Still, although the normal theory procedures are fairly robust for
MTB > # validity, it could be that some of the nonparametric procedures are
MTB > # more powerful.
MTB > # (Note: From a previous session, I've already got the three samples stacked
MTB > # into c5, with the groups indicated by c6. Also, I have already named the
MTB > # Minitab columns. (I'm using a Minitab worksheet I had saved previously.))
MTB > oneway c5 c6;
SUBC> tukey 0.1.

ANALYSIS OF VARIANCE ON albumen
SOURCE      DF         SS         MS        F        p
tr group     2      64357      32178     1.67    0.197
ERROR       54    1037470      19212
TOTAL       56    1101827
                                  INDIVIDUAL 95 PCT CI'S FOR MEAN
                                  BASED ON POOLED STDEV
LEVEL      N      MEAN     STDEV  --+---------+---------+---------+----
    1     20     186.1     158.8               (---------*---------)
    2     18     181.8     144.8             (----------*----------)
    3     19     112.9     105.8  (----------*---------)
                                  --+---------+---------+---------+----
POOLED STDEV =    138.6            60        120       180       240

Tukey's pairwise comparisons

    Family error rate = 0.100
Individual error rate = 0.0407

Critical value = 2.97

Intervals for (column level mean) - (row level mean)

                 1           2

     2         -90
                99

     3         -20         -27
               166         165

MTB > # Since all of the confidence intervals indicated above contain 0,
MTB > # we can conclude that the p-value for the Tukey-Kramer test exceeds
MTB > # 0.10. After trying various values with the tukey subcommand, I
MTB > # concluded that the p-value is about 0.23 or 0.24.

*******************************************************************************
*** StatXact ***
To do this test in StatXact use Basic_Statistics > ANOVA...
Then click Var1 into the Value box (assuming the 57 observations from the three samples
are stacked into Var1) and click Var2 into the Factor 1 box (assuming Var2 has 20 1s,
followed by 18 2s, followed by 19 3s). Finally, click OK. The resulting p-value is in
agreement with Minitab's p-value.
********************************************************************************

MTB > mood c5 c6

Mood median test of albumen

Chisquare = 3.32   df = 2   p = 0.191

                                        Individual 95.0% CI's
tr group   N<=    N>   Median   Q3-Q1   ---+---------+---------+---------+---
       1    10    10      125     183               (-+------------------)
       2     7    11      139     206         (---------+-------------)
       3    13     6       82      89    (-----+-------)
                                        ---+---------+---------+---------+---
                                          60        120       180       240
Overall median = 122

MTB > # If one incorporates the adjustment factor given near the middle of p. 10-6
MTB > # of the class notes, the value of the adjusted statistic is about 3.26 and
MTB > # the corresponding p-value is 0.196.

***********************************************************************************
*** StatXact ***
To do this test in StatXact use Nonparametrics > K Independent Samples > Median...
Then click Var1 into the Response box (assuming the 57 observations from the three
samples are stacked into Var1) and click Var2 into the Population box (assuming Var2
has 20 1s, followed by 18 2s, followed by 19 3s). Next, click to select Exact under
Compute, and then click OK. The resulting asymptotic p-value is in close agreement
with Minitab's p-value (0.190 for StatXact vs. 0.191 for Minitab). But the preferred
exact p-value is about 0.202.
***********************************************************************************

MTB > krus c5 c6

LEVEL    NOBS    MEDIAN  AVE. RANK   Z VALUE
    1      20    124.50       32.2      1.05
    2      18    139.50       32.4      1.06
    3      19     82.00       22.4     -2.11
OVERALL    57                 29.0

H = 4.44  d.f. = 2  p = 0.109
H = 4.45  d.f. = 2  p = 0.109 (adj. for ties)

****************************************************************************************
*** StatXact ***
To do this test in StatXact use Nonparametrics > K Independent Samples > Kruskal-Wallis...
Then click Var1 into the Response box (assuming the 57 observations from the three
samples are stacked into Var1) and click Var2 into the Population box (assuming Var2
has 20 1s, followed by 18 2s, followed by 19 3s). The sample sizes are too large for an
exact p-value to be obtained, so the Monte Carlo option will be used. Click to select
Exact using Monte Carlo under Compute, and then click Options. Next click the Monte
Carlo tab (when the Options box appears). Change the Crude Monte Carlo Sample Size from
10000 to 1000000. Change the Random Number Seed from Clock to Fixed, and use the default
seed of 23456. Then click OK to close the Options box, and finally click OK to run the
K-W test routine. The resulting asymptotic p-value is in close agreement with Minitab's
p-value (0.108 for StatXact vs. 0.109 for Minitab). The Monte Carlo estimate of the
exact p-value is about 0.108, and since the interval estimate for the exact p-value is
about (0.107, 0.109) we can be fairly confident that the p-value rounds to 0.11.
****************************************************************************************

MTB > # The average ranks for the three groups are about 32.15, 32.42, and 22.45.
MTB > # Using these values it can be determined that the value of the rank analog
MTB > # of the Tukey-Kramer test statistic is about 2.582. It can be concluded
MTB > # that the (approx.) p-value exceeds 0.10.

****************************************************************************************
*** StatXact ***
Although StatXact doesn't do this test, it can be of some help. If the data values are
in Var1 and the indicators of group/population are in Var2, one can put the ranks of the
data values in Var3 using DataEditor > Compute Scores...
Click Var1 into the Response box, and type Var3 in the Target Variable box. Make sure
that Wilcoxon (Mid-Rank) is selected under Score, and click OK. Now
Basic_Statistics > Descriptive Statistics... can be used to obtain the average ranks.
Click Var3 into the Selected Variables box, and click Var2 into the By Variable 1 box.
(This will result in summaries for each of the 3 groups.) Under Central Tendency click
to select Mean, and under Summary click to select Sum. Then click OK. Using a calculator
to divide the sums by the sample sizes avoids rounding error in the reported means
(which correspond to the average ranks).
****************************************************************************************

MTB > # Doing M-W tests on all pairs of the three samples results in
MTB > # the following values of u_ij:
MTB > #     u_12 = 179,
MTB > #     u_13 = 254,
MTB > #   & u_23 = 231.5.
MTB > # From these it can be obtained that the value of the Steel-Dwass
MTB > # test statistic is about 2.600. It can be concluded (using tables
MTB > # of critical values from studentized range distributions) that the
MTB > # (approx.) p-value exceeds 0.10 (since the test statistic value is
MTB > # less than the 0.10 critical value).

**********************************************************************************************
*** StatXact ***
Although StatXact doesn't do this test, it can be of some help. In the CaseData editor,
copy and paste values in Var1 (assuming that's where the data values are) and Var2
(assuming that's where the indicators of group/population are) to create 6 new columns
(Vars). In the first two new columns, copy and paste the data values and group
indicators for samples 1 and 2. In the next two columns, copy and paste the data values
and group indicators for samples 1 and 3. And then in the last two new columns, copy and
paste the data values and group indicators for samples 2 and 3. Then use
Nonparametrics > Two Independent Samples > Wilcoxon-Mann-Whitney... to do a Wilcoxon
rank sum test on samples 1 and 2. In the output, the value under Observed is the sum of
the ranks for sample 1. To get the value of the M-W U statistic, subtract off
n_1(n_1 + 1)/2. Proceed in a similar manner to get the other two M-W statistic values
needed for the Steel-Dwass test, remembering to subtract off n_2(n_2 + 1)/2 instead of
n_1(n_1 + 1)/2 when you do the test on samples 2 and 3.
**********************************************************************************************
*** StatXact ***
Other tests can be done with StatXact.

One can do the normal scores test described in the class notes (using van der Waerden
scores) using Nonparametrics > K Independent Samples > Normal-Scores... Using the Monte
Carlo option (with 1000000 Monte Carlo trials) one gets a p-value of about 0.15. (The
distribution skewness makes the K-W test a better performer. But the normal scores test
yields a smaller p-value than the one-way ANOVA F test. (The extreme observations
resulting from the skewness make the F test a bit conservative.))

One can also use Savage scores, using Nonparametrics > K Independent Samples > Savage...
Using the Monte Carlo option (with 1000000 Monte Carlo trials) one gets a p-value of
about 0.15. (I'll confess that I had expected the Savage scores test to yield a smaller
p-value since it typically does well for moderately skewed distributions.)

One can do a k-sample permutation test, using
Nonparametrics > K Independent Samples > ANOVA with General Scores... Using the Monte
Carlo option (with 1000000 Monte Carlo trials) one gets a p-value of about 0.20 (close
to the ANOVA F test result).

To do a percentile modified rank test one first needs to create the desired scores.
Suppose we want to use
  -24, -23, -22, ..., -3, -2, -1, 0, 0, 0, ..., 0, 0, 1, 2, 3, ..., 22, 23, 24.
To create these scores, first bring the CaseData editor to the front, and then use
DataEditor > Compute Score...
Click Var1 into the Response box and type Var3 in the Target Variable box. Select
Wilcoxon under Score, and then click OK. Now in the next column of the CaseData editor
(Var4) type -24 in Var4 next to 1 in Var3, type -23 in Var4 next to 2 in Var3, and so
on, eventually typing -1 in Var4 next to 24 in Var3. Next type 24 in Var4 next to 57 in
Var3, type 23 in Var4 next to 56 in Var3, ..., and type 1 in Var4 next to 34 in Var3.
Fill in the other Var4 spots with nine 0s. (Note: If noninteger midranks are encountered
in Var3, also use midranks appropriately in Var4.) Now use
Nonparametrics > K Independent Samples > ANOVA with General Scores... Click Var4 (the
percentile modified rank scores) in as the Response. Using the Monte Carlo option (with
1000000 Monte Carlo trials) one gets a p-value of about 0.12.

Finally, one can use StatXact to do the extension of the median test described on
p. 10-9 of the class notes. One can use m = 3, with t_1 = t_2 = t_3 = N/3 = 19. The
midranks in Var3 of the CaseData editor can be used to determine the observations in the
lower third, middle third, and upper third of the ordered combined sample. These lead to
the following 3 by 3 table.

              lower   middle  upper
            -------------------------
  sample 1  |   4   |   8   |   8   |  20
            -------------------------
  sample 2  |   5   |   5   |   8   |  18
            -------------------------
  sample 3  |  10   |   6   |   3   |  19
            -------------------------
               19      19      19

Use File > New... and then select Table Data from the available file types and click OK.
Then request 1 table with 3 rows and 3 columns, and click OK. Next fill in the counts
shown above into the 3 by 3 table. Then use
Nonparametrics > Unordered R x C Table > Pearson's Chi-square... Select Exact under
Compute and click OK. The exact p-value is about 0.175 (whereas the chi-square
approximation results in an approximate p-value of about 0.165).
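
As a rough check on the chi-square approximation quoted for the 3 by 3 table, here is a
minimal Python sketch. It is not part of the Minitab or StatXact sessions, and the
function names are made up for illustration. It recomputes Pearson's statistic from the
counts and gets the approximate p-value from the closed-form upper tail that holds when
the degrees of freedom are even. It does not attempt StatXact's exact p-value.

```python
import math

# Counts from the 3 by 3 table (rows: samples 1-3; columns: lower, middle, upper third).
observed = [
    [4, 8, 8],
    [5, 5, 8],
    [10, 6, 3],
]


def pearson_chi_square(table):
    """Return (statistic, degrees of freedom) for Pearson's chi-square on an R x C table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (count - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


def chi_square_upper_tail_even_df(x, df):
    """Upper tail probability of a chi-square variable; valid only for even df.

    For df = 2m the tail is exp(-x/2) * sum_{k=0}^{m-1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))


stat, df = pearson_chi_square(observed)
p_approx = chi_square_upper_tail_even_df(stat, df)
print(f"chi-square = {stat:.4f}, df = {df}, approximate p-value = {p_approx:.3f}")
```

With these counts the statistic comes out near 6.49 on 4 degrees of freedom, and the
approximate p-value rounds to 0.165, in line with the value reported above.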