MTB > # Analysis of data from study of DBH activity in two groups of
MTB > # schizophrenic patients.
MTB > # I'll put the data into c1 and c2 and do some exploration.
MTB > set c1
DATA> .0104 .0105 .0112 .0116 .0130 .0145 .0154 .0156 .0170 .0180 .0200
DATA> .0200 .0210 .0230 .0252
DATA> end
MTB > set c2
DATA> .0150 .0204 .0208 .0222 .0226 .0245 .0270 .0275 .0306 .0320
DATA> end
MTB > name c1 'nonpsych' c2 'psycho'
MTB > dotplot c1 c2;
SUBC> same.

[Character dotplots of nonpsych and psycho on a common axis running from
0.0120 to 0.0320; the psycho values sit noticeably to the right of the
nonpsych values.]

MTB > # Just based on this plot, one should guess that we ought to be able to
MTB > # find significant evidence of a difference.  My guess is that it's
MTB > # safe to assume that one dist'n is stochastically larger than the other
MTB > # if they aren't the same, and so I think it'll be okay to interpret the
MTB > # nonparametric tests as being tests about the means (if that's desired).
MTB > name c3 'n score'
MTB > nsco c1 c3
MTB > plot c3 c1

[Normal scores plot of c1: n score vs. nonpsych, axis from 0.0090 to 0.0240.]

MTB > nsco c2 c3
MTB > plot c3 c2

[Normal scores plot of c2: n score vs. psycho, axis from 0.0140 to 0.0315.]

MTB > desc c1 c2

                N      MEAN    MEDIAN    TRMEAN     STDEV    SEMEAN
nonpsych       15   0.01643   0.01560   0.01622   0.00470   0.00121
psycho         10   0.02426   0.02355   0.02445   0.00514   0.00163

              MIN       MAX        Q1        Q3
nonpsych  0.01040   0.02520   0.01160   0.02000
psycho    0.01500   0.03200   0.02070   0.02827

MTB > # If we're doing a test of the general two-sample problem, the dist'ns
MTB > # are identical if the null hypothesis is true.
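As a cross-check outside Minitab, the core of the desc output above (N, MEAN, MEDIAN, STDEV, SEMEAN) can be reproduced with a few lines of Python. This is a stdlib-only sketch; TRMEAN (Minitab's 5% trimmed mean) is omitted:

```python
# Reproduce the core of Minitab's "desc" output for the DBH data,
# using only the Python standard library (TRMEAN omitted).
import statistics

nonpsych = [.0104, .0105, .0112, .0116, .0130, .0145, .0154, .0156,
            .0170, .0180, .0200, .0200, .0210, .0230, .0252]
psycho = [.0150, .0204, .0208, .0222, .0226, .0245, .0270, .0275,
          .0306, .0320]

for name, xs in [("nonpsych", nonpsych), ("psycho", psycho)]:
    n = len(xs)
    mean = statistics.fmean(xs)
    median = statistics.median(xs)
    stdev = statistics.stdev(xs)      # sample (n-1) standard deviation
    semean = stdev / n ** 0.5         # standard error of the mean
    print(f"{name:9s} N={n:2d}  MEAN={mean:.5f}  MEDIAN={median:.5f}  "
          f"STDEV={stdev:.5f}  SEMEAN={semean:.5f}")
```

Rounded to five decimals, the printed values agree with the Minitab table above.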
MTB > # The probit plots suggest that in this case the common dist'n isn't
MTB > # too far from normal, and so Student's two-sample t test ought to work
MTB > # decently as a test of the general two-sample problem.
MTB > twos c1 c2;
SUBC> pool.

TWOSAMPLE T FOR nonpsych VS psycho
            N      MEAN     STDEV   SE MEAN
nonpsych   15   0.01643   0.00470    0.0012
psycho     10   0.02426   0.00514    0.0016

95 PCT CI FOR MU nonpsych - MU psycho: ( -0.0120, -0.0037)

TTEST MU nonpsych = MU psycho (VS NE): T= -3.94  P=0.0007  DF= 23

POOLED STDEV = 0.00487

MTB > # For a test about the means, one might think it's better to allow for
MTB > # unequal variances and use Welch's test instead.
MTB > twos c1 c2

TWOSAMPLE T FOR nonpsych VS psycho
            N      MEAN     STDEV   SE MEAN
nonpsych   15   0.01643   0.00470    0.0012
psycho     10   0.02426   0.00514    0.0016

95 PCT CI FOR MU nonpsych - MU psycho: ( -0.0121, -0.0036)

TTEST MU nonpsych = MU psycho (VS NE): T= -3.86  P=0.0011  DF= 18

MTB > # Note: Even though the sample standard deviations don't suggest a big
MTB > # difference in variances, and the sample sizes aren't hugely different,
MTB > # Welch's test gives a larger p-value than Student's t test did.  I wonder
MTB > # what some of the nonparametric tests will give.
MTB > # The mood command will do an approximate version of the median test.
MTB > stack c1 c2 c4;
SUBC> subs c5.
MTB > mood c4 c5

Mood median test of C4

Chisquare = 11.78    df = 1    p = 0.001

C5    N<=    N>   Median    Q3-Q1
 1     12     3   0.0156   0.0084
 2      1     9   0.0236   0.0076

[Character plot of the individual 95.0% CI's for the two medians, on an axis
running from 0.0150 to 0.0300; the two intervals do not overlap.]

Overall median = 0.0200

A 95.0% C.I. for median(1) - median(2): (-0.0135,-0.0022)

MTB > # The p-value corresponds to using a chi-square approximation w/o Yates's
MTB > # continuity correction.  I will reproduce it below.
MTB > let k3 = 25*(12*9 - 3)*(12*9 - 3)/(15*10*13*12)
MTB > cdf k3 k4;
SUBC> chis 1.
MTB > let k4 = 1 - k4
MTB > print k3 k4

K3      11.7788
K4      0.000599384

MTB > # Output from Mood's test has p-value rounded to nearest thousandth.
MTB > # Now I'll do an approximate version of Fisher's exact test using a
MTB > # chi-square approximation incorporating Yates's continuity correction.
MTB > let k1 = 25*(12*9 - 3*1 - 12.5)*(12*9 - 3*1 - 12.5)/(15*10*13*12)
MTB > cdf k1 k2;
SUBC> chis 1.
MTB > let k2 = 1 - k2
MTB > print k2

K2      0.00249916

MTB > # p-value from approx. version of Fisher's exact test (w/ c.c.)
MTB > # We can also use Minitab to do an approximate version of the Mann-Whitney
MTB > # test (using a normal approximation w/ a c.c., and adjusting the variance
MTB > # for ties).
MTB > mann c1 c2

Mann-Whitney Confidence Interval and Test

nonpsych   N =  15    Median =   0.01560
psycho     N =  10    Median =   0.02355
Point estimate for ETA1-ETA2 is  -0.00775
95.1 pct c.i. for ETA1-ETA2 is (-0.01200,-0.00350)
W = 140.0
Test of ETA1 = ETA2 vs. ETA1 n.e. ETA2 is significant at 0.0025
The test is significant at 0.0025 (adjusted for ties)

MTB > # W-W runs test w/ c.c.
MTB > let k5 = (10.5 - 13)/sqrt(5.5)
MTB > cdf k5 k6;
SUBC> norm 0 1.
MTB > print k5 k6

K5      -1.06600
K6      0.143211

MTB > # The approx. p-value is about 0.14.
MTB > save 'DBH'

Saving worksheet in file: DBH.MTW

_______________________________________________________________________________

-------------------------------------------------------------------------------

                          Info About Using StatXact

To do most of the two-sample tests, we need to put the values from both
samples into a single variable in the CaseData editor, and then in another
variable put m 1s followed by n 2s (where m is the sample size of the first
sample and n is the sample size of the second sample).

-------------------------------------------------------------------------------

*** Student's two-sample t test & Welch's test

Use Basic_Statistics > t-test > Independent...
Then click Var1 (assuming data values from both samples stacked in Var1) into
the Response box, and Var2 (assuming that's where you put the m 1s and n 2s)
into the Population box.  Select either the Equal variance button (for
Student's t test) or the Unequal variance button (for Welch's test), and then
click OK.  You should get p-values of 0.0007 and 0.0011 to match Minitab's
p-values.  (Note: Due to lack of precise normality the p-values are only
approximate.  With these rather small approximate p-values, just using one or
two significant digits is appropriate.)

Note: Oddly, StatXact doesn't give p-values for both one-sided and two-sided
tests.  The p-value given is for a 2-sided test.  So for a 1-sided test the
p-value has to be modified.  (If the 2-sided test p-value is 0.042, then the
1-sided test p-value will be either 0.021 or possibly 0.979 (if the data is
more in agreement with the null hypothesis than with the alternative
hypothesis).)

-------------------------------------------------------------------------------

*** Mann-Whitney test (equivalent to the Wilcoxon rank sum test)

Use Nonparametrics > Two Independent Samples > Wilcoxon-Mann-Whitney...
and click in Response and Population variables (as described above).  Select
Exact (under Compute), and then click OK.

The exact p-value is 0.0014.  (It's based on midranks (since two values are
tied).)  The asymptotic p-value is computed using a normal approx. w/o c.c.,
but with an adjustment to the variance for ties.  It's 0.0023 (a bit larger
than the exact p-value).  Minitab's p-value of 0.0025 is a bit larger still,
because it uses a continuity correction for its normal approximation.

-------------------------------------------------------------------------------

*** two-sample Kolmogorov-Smirnov test

Use Nonparametrics > Two Independent Samples > Kolmogorov-Smirnov...
and click in Response and Population variables.  Select Exact (under Compute),
and then click OK.

The exact p-value is 0.0023.
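The asymptotic Mann-Whitney p-values quoted above (0.0023 without a continuity correction, as in StatXact, and 0.0025 with one, as in Minitab) are easy to verify by hand. Here's a stdlib-only Python sketch using midranks and the tie-adjusted variance:

```python
# Normal approximation to the Wilcoxon-Mann-Whitney test for the DBH data:
# midranks for ties, tie-adjusted null variance, and the two-sided p-value
# with and without a continuity correction.  Stdlib-only sketch.
from math import erfc, sqrt

nonpsych = [.0104, .0105, .0112, .0116, .0130, .0145, .0154, .0156,
            .0170, .0180, .0200, .0200, .0210, .0230, .0252]
psycho = [.0150, .0204, .0208, .0222, .0226, .0245, .0270, .0275,
          .0306, .0320]

combined = sorted(nonpsych + psycho)
N = len(combined)

# Midranks (average rank within each tie group) and the tie-correction sum.
midrank = {}
tie_sum = 0
i = 0
while i < N:
    j = i
    while j < N and combined[j] == combined[i]:
        j += 1
    midrank[combined[i]] = (i + 1 + j) / 2   # average of ranks i+1 .. j
    tie_sum += (j - i) ** 3 - (j - i)        # t^3 - t for a group of t ties
    i = j

m, n = len(nonpsych), len(psycho)
W = sum(midrank[x] for x in nonpsych)        # rank sum for the first sample
mu = m * (N + 1) / 2                         # null mean of W
var = m * n * (N + 1) / 12 - m * n * tie_sum / (12 * N * (N - 1))

def two_sided_p(z):
    return erfc(abs(z) / sqrt(2))            # 2 * P(Z >= |z|)

z_no_cc = (W - mu) / sqrt(var)
z_cc = (W - mu + 0.5) / sqrt(var)            # W < mu here, so the c.c. adds 0.5
p_no_cc = two_sided_p(z_no_cc)
p_cc = two_sided_p(z_cc)
print(f"W = {W}")                                            # 140.0
print(f"without c.c.: z = {z_no_cc:.4f}, p = {p_no_cc:.4f}")  # ~0.0023
print(f"with c.c.:    z = {z_cc:.4f}, p = {p_cc:.4f}")        # ~0.0025
```

W = 140.0 matches the Minitab mann output, and the two rounded p-values match the StatXact asymptotic value (0.0023) and Minitab's value (0.0025), respectively.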
(Note that the asymptotic p-value is more than twice as large.)

-------------------------------------------------------------------------------

*** Wald-Wolfowitz runs test

Use Nonparametrics > Two Independent Samples > Wald-Wolfowitz Runs...
and click in Response and Population variables.  Select Exact (under Compute),
and then click OK.

The exact p-value is about 0.14.  (It comes from using the exact distribution
based on the two sample sizes.  Any ties in the data don't alter the null
distribution that is used to determine the p-value.  However, the value of
the test statistic can depend on how any ties are broken to create an ordered
sequence of 1s and 2s.  Using the "Largest Possible" parts of the output
matches the conservative way that most prefer --- ties are broken to maximize
the number of runs, and thus maximize the p-value.)

With this data, StatXact 8 gives that the smallest number of runs is 0, which
makes no sense!!!  (The output correctly indicates that with these two sample
sizes the min and max number of runs possible are 2 and 21.  It's not clear
to me how the ties can be broken to yield 0 runs.)  I plan to write the folks
at Cytel about this.  I hope they don't expect me to find all of their bugs.
(Although, in general, I love using StatXact, I'm beginning to wonder how
many people are using it (if bugs persist in tests that have been on the menu
for years now), and how much testing they do of their product.)

StatXact's asymptotic p-value comes from the normal approximation, with a
continuity correction.

-------------------------------------------------------------------------------

*** Fisher's exact test (aka the two-sample median test)

This test is not on StatXact's menu of choices under two independent samples
of continuous data.  But we can "trick" StatXact into doing the test.

Use DataEditor > Compute Scores...
Then type Var3 into the Target Variable box.
In the Variables list, highlight the Var corresponding to the data values
(say Var1) and click it into the Response box.  Then select Wilcoxon from the
Scores list, and click OK.  The Wilcoxon midranks should appear in Var3.

In Var4, put a 1 beside any midrank of 13 or less, and a 0 beside any midrank
of 14 or more.  (So 1s in Var4 correspond to data values in the lower portion
of the combined ordered sample, and 0s in Var4 correspond to data values in
the upper portion of the combined ordered sample.)

Now use Nonparametrics > Two Independent Samples > Permutation...
and click Var4 in as the Response variable and Var2 as the Population
variable.  Select Exact (under Compute), and then click OK.  The exact
p-value is about 0.00098.

Another way to do the test is to use File > New...
and then select Table Data and click OK.  Change the Table Settings to
request 1 Table with 2 Rows and 2 Columns, and click OK.  Now fill in the
table with the counts, letting the 1st row correspond to the 1st sample and
the 2nd row correspond to the 2nd sample.  In the 1st column put the number
of values in each sample belonging to the lower portion of the combined
ordered sample, and in the 2nd column put the number of values in each sample
belonging to the upper portion of the combined ordered sample.  So the table
should appear as below.

        ---------------
       |       |       |
       |  12   |   3   |
       |       |       |
        ---------------
       |       |       |
       |   1   |   9   |
       |       |       |
        ---------------

Use Nonparametrics > Two Independent Binomials > Fisher's Exact Test...
and click OK.  The output is a bit cluttered.  There are two rows labeled
"Exact" but only one of them gives a p-value under 2-sided.  That's the
p-value we want (0.00098).

To use StatXact to get the approximate p-value using a chi-square
approximation and w/o using Yates's c.c., use
Nonparametrics > Two Independent Binomials > Pearson's Chi-square Test...
and click OK.  Then look for the asymptotic p-value under 2-sided.
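The exact p-value of 0.00098 for the 2x2 table above can also be confirmed directly: under the null hypothesis the upper-left cell count is hypergeometric (conditional on the margins), and the usual two-sided convention (which StatXact follows here) sums the probabilities of all tables no more probable than the observed one. A stdlib-only Python sketch:

```python
# Fisher's exact test for the 2x2 table [[12, 3], [1, 9]]
# (rows: samples; columns: lower/upper half of the combined ordered sample).
# Stdlib-only sketch using the hypergeometric null distribution.
from math import comb

a, b, c, d = 12, 3, 1, 9
row1 = a + b                        # 15 (first sample size)
col1 = a + c                        # 13 (size of the lower portion)
N = a + b + c + d                   # 25

def table_prob(k):
    """P(upper-left cell = k) under the hypergeometric null."""
    return comb(col1, k) * comb(N - col1, row1 - k) / comb(N, row1)

lo = max(0, row1 + col1 - N)        # smallest feasible upper-left count (3)
hi = min(row1, col1)                # largest feasible upper-left count (13)
p_obs = table_prob(a)
# Two-sided p-value: sum over tables at most as probable as the observed one.
p = sum(table_prob(k) for k in range(lo, hi + 1) if table_prob(k) <= p_obs)
print(f"two-sided exact p = {p:.5f}")       # ~0.00098
```

Only the tables with upper-left counts 3, 12, and 13 qualify, giving 3212/3268760, which is about 0.00098.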
(Note: Since we can get an exact p-value using StatXact, there is not much point in getting an approximate one.)
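Even so, the chi-square approximations computed in the Minitab session (K4 = 0.000599384 without Yates's c.c., K2 = 0.00249916 with it) can be reproduced without any statistics package: for 1 df, the upper tail of the chi-square distribution satisfies P(X > x) = erfc(sqrt(x/2)). A stdlib-only Python sketch:

```python
# Chi-square approximations for the 2x2 table [[12, 3], [1, 9]], with and
# without Yates's continuity correction.  For 1 df the tail probability is
# P(ChiSq > x) = erfc(sqrt(x/2)), so no tables or packages are needed.
from math import erfc, sqrt

a, b, c, d = 12, 3, 1, 9
N = a + b + c + d
denom = (a + b) * (c + d) * (a + c) * (b + d)    # product of the four margins

x_plain = N * (a * d - b * c) ** 2 / denom               # Pearson chi-square
x_yates = N * (abs(a * d - b * c) - N / 2) ** 2 / denom  # with Yates's c.c.

p_plain = erfc(sqrt(x_plain / 2))    # should match Minitab's K4
p_yates = erfc(sqrt(x_yates / 2))    # should match Minitab's K2
print(f"chi-square = {x_plain:.4f}, p = {p_plain:.6f}")
print(f"with Yates: chi-square = {x_yates:.4f}, p = {p_yates:.6f}")
```

The first line reproduces K3 = 11.7788 and K4 = 0.000599; the second reproduces K2 = 0.002499 (rounded, 0.0025).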