MTB > # Analysis of data from study of DBH activity in two groups of
 MTB > # schizophrenic patients.


 MTB > # I'll put the data into c1 and c2 and do some exploration.
 
 MTB > set c1
 DATA> .0104 .0105 .0112 .0116 .0130 .0145 .0154 .0156 .0170 .0180 .0200
 DATA> .0200 .0210 .0230 .0252
 DATA> end
 MTB > set c2
 DATA> .0150 .0204 .0208 .0222 .0226 .0245 .0270 .0275 .0306 .0320
 DATA> end
 MTB > name c1 'nonpsych' c2 'psycho'

 MTB > dotplot c1 c2;
 SUBC> same.

            : ..   .  .  :   . .    :  .    .    .
           -----+---------+---------+---------+---------+---------+-nonpsych
 
                        .            ..   ..   .      ..       .  .
           -----+---------+---------+---------+---------+---------+-psycho  
           0.0120    0.0160    0.0200    0.0240    0.0280    0.0320
 
 MTB > # Just based on this plot, one should guess that we ought to be able to
 MTB > # find significant evidence of a difference.  My guess is that it's
 MTB > # safe to assume that one dist'n is stochastically larger than the other
 MTB > # if they aren't the same, and so I think it'll be okay to interpret the
 MTB > # nonparametric tests as being tests about the means (if that's desired).

 MTB > name  c3 'n score'
 MTB > nsco c1 c3
 MTB > plot c3 c1
 
  n score -
          -                                                       *
          -
       1.2+                                                *
          -                                         *
          -                                      2
          -
          -                            *  *
       0.0+                       *
          -                   *  *
          -              *
          -          *
          -        *
      -1.2+      *
          -
          -      *
          -
            +---------+---------+---------+---------+---------+------nonpsych
       0.0090    0.0120    0.0150    0.0180    0.0210    0.0240
 
 MTB > nsco c2 c3
 MTB > plot c3 c2
 
  n score -                                                      *
          -
          -
       1.0+                                                  *
          -
          -                                          *
          -                                        *
          -                                 *
       0.0+
          -                            *
          -                          *
          -                      *
          -
      -1.0+                     *
          -
          -
          -      *
            --+---------+---------+---------+---------+---------+----psycho  
         0.0140    0.0175    0.0210    0.0245    0.0280    0.0315
 
 MTB > desc c1 c2
 
                 N     MEAN   MEDIAN   TRMEAN    STDEV   SEMEAN
 nonpsych       15  0.01643  0.01560  0.01622  0.00470  0.00121
 psycho         10  0.02426  0.02355  0.02445  0.00514  0.00163
 
               MIN      MAX       Q1       Q3
 nonpsych  0.01040  0.02520  0.01160  0.02000
 psycho    0.01500  0.03200  0.02070  0.02827
 
 MTB > # If we're doing a test of the general two-sample problem, the dist'ns
 MTB > # are identical if the null hypothesis is true.  The probit plots 
 MTB > # suggest that in this case the common dist'n isn't too far from normal,
 MTB > # and so Student's two sample t test ought to work decently as a test
 MTB > # of the general two-sample problem.

 MTB > twos c1 c2;
 SUBC> pool.
 
 TWOSAMPLE T FOR nonpsych VS psycho
            N      MEAN     STDEV   SE MEAN
 nonpsych  15   0.01643   0.00470    0.0012
 psycho    10   0.02426   0.00514    0.0016
 
 95 PCT CI FOR MU nonpsych - MU psycho: ( -0.0120,  -0.0037)
 
 TTEST MU nonpsych = MU psycho (VS NE): T= -3.94  P=0.0007  DF=  23
 
 POOLED STDEV =    0.00487
 
 MTB > # For a test about the means, one might think it's better to allow for
 MTB > # unequal variances and use Welch's test instead.

 MTB > twos c1 c2
 
 TWOSAMPLE T FOR nonpsych VS psycho
            N      MEAN     STDEV   SE MEAN
 nonpsych  15   0.01643   0.00470    0.0012
 psycho    10   0.02426   0.00514    0.0016
 
 95 PCT CI FOR MU nonpsych - MU psycho: ( -0.0121,  -0.0036)
 
 TTEST MU nonpsych = MU psycho (VS NE): T= -3.86  P=0.0011  DF=  18
 
 MTB > # Note: Even though the sample standard deviations don't suggest a big
 MTB > # difference in variances, and the sample sizes aren't hugely different,
 MTB > # Welch's test gives a larger p-value than Student's t test did.  I wonder
 MTB > # what some of the nonparametric tests will give.
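
(For comparison, and not part of the Minitab session above, here is a rough Python/scipy
sketch of the same two t tests.  The data values are copied from c1 and c2.  It should
essentially reproduce Minitab's output, although scipy uses the fractional
Welch-Satterthwaite degrees of freedom, so its Welch p-value can differ slightly in the
last digit from Minitab's, which used df = 18 here.)

   import numpy as np
   from scipy import stats

   nonpsych = np.array([0.0104, 0.0105, 0.0112, 0.0116, 0.0130, 0.0145, 0.0154,
                        0.0156, 0.0170, 0.0180, 0.0200, 0.0200, 0.0210, 0.0230,
                        0.0252])
   psycho = np.array([0.0150, 0.0204, 0.0208, 0.0222, 0.0226, 0.0245, 0.0270,
                      0.0275, 0.0306, 0.0320])

   # Student's two-sample t test with pooled variance (should give t = -3.94, p = 0.0007)
   t_pool, p_pool = stats.ttest_ind(nonpsych, psycho, equal_var=True)

   # Welch's test, allowing unequal variances (should give t = -3.86, p near 0.0011)
   t_welch, p_welch = stats.ttest_ind(nonpsych, psycho, equal_var=False)

   print(t_pool, p_pool)
   print(t_welch, p_welch)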

 MTB > # The mood command will do an approximate version of the median test.

 MTB > stack c1 c2 c4;
 SUBC> subs c5.
 MTB > mood c4 c5
 
 Mood median test of C4      
 
 Chisquare = 11.78   df = 1   p = 0.001
 
                                         Individual 95.0% CI's
       C5   N<=    N>   Median    Q3-Q1  ------+---------+---------+---------+
        1    12     3   0.0156   0.0084  (------+--------)
        2     1     9   0.0236   0.0076                   (-----+---------)
                                         ------+---------+---------+---------+
                                          0.0150    0.0200    0.0250    0.0300
 Overall median = 0.0200
 
 A 95.0% C.I. for median(1) - median(2): (-0.0135,-0.0022)
 
 MTB > # The p-value corresponds to using a chi-square approximation w/o Yates's
 MTB > # continuity correction.  I will reproduce it below.
 
 MTB > let k3 = 25*(12*9 - 3)*(12*9 - 3)/(15*10*13*12)
 MTB > cdf k3 k4;
 SUBC> chis 1.
 MTB > let k4 = 1 - k4
 MTB > print k3 k4
 
 K3       11.7788
 K4       0.000599384

 MTB > # Output from Mood's test has p-value rounded to nearest thousandth.
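
(As an illustrative aside, not part of the session: the same chi-square value can be
obtained from the 2x2 median-split table with scipy, by turning the continuity
correction off.)

   from scipy import stats

   # counts of values at/below and above the overall median (from the mood output)
   table = [[12, 3],    # nonpsych
            [ 1, 9]]    # psycho

   chi2, p, df, expected = stats.chi2_contingency(table, correction=False)
   print(chi2, p)       # about 11.78 and 0.0006, matching K3 and K4 above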

 MTB > # Now I'll do an approximate version of Fisher's exact test using a chi-square
 MTB > # approximation incorporating Yates's continuity correction.
 
 MTB > let k1 = 25*( 12*9 - 3*1 - 12.5)*(12*9 - 3*1 - 12.5)/(15*10*13*12)
 MTB > cdf k1 k2;
 SUBC> chis 1.
 MTB > let k2 = 1 - k2
 MTB > print k2
 
 K2       0.00249916

 MTB > # p-value from approx. version of Fisher's exact test (w/ c.c.)
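
(Again just as an illustrative aside: scipy applies Yates's continuity correction to a
2x2 table when correction=True, and reproduces this value.)

   from scipy import stats

   table = [[12, 3],
            [ 1, 9]]

   chi2_cc, p_cc, df, expected = stats.chi2_contingency(table, correction=True)
   print(chi2_cc, p_cc)     # about 9.14 and 0.0025, matching K2 above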

 MTB > # We can also use Minitab to do an approximate version of the Mann-Whitney
 MTB > # test (using a normal approximation w/ a c.c., and adjusting the variance
 MTB > # for ties).

 MTB > mann c1 c2
 
 Mann-Whitney Confidence Interval and Test
 
 nonpsych   N =  15     Median =     0.01560
 psycho     N =  10     Median =     0.02355
 Point estimate for ETA1-ETA2 is    -0.00775
 95.1 pct c.i. for ETA1-ETA2 is (-0.01200,-0.00350)
 W = 140.0
 Test of ETA1 = ETA2  vs.  ETA1 n.e. ETA2 is significant at 0.0025
 The test is significant at 0.0025 (adjusted for ties)
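
(For comparison, a Python/scipy sketch of the same test, not part of the session.  scipy
reports the Mann-Whitney U statistic rather than Minitab's rank sum W; they are related
by U = W - n1(n1+1)/2, so W = 140 corresponds to U = 20 here.  With the continuity
correction and the tie adjustment, the two-sided p-value should be close to Minitab's
0.0025.)

   import numpy as np
   from scipy import stats

   nonpsych = np.array([0.0104, 0.0105, 0.0112, 0.0116, 0.0130, 0.0145, 0.0154,
                        0.0156, 0.0170, 0.0180, 0.0200, 0.0200, 0.0210, 0.0230,
                        0.0252])
   psycho = np.array([0.0150, 0.0204, 0.0208, 0.0222, 0.0226, 0.0245, 0.0270,
                      0.0275, 0.0306, 0.0320])

   u, p = stats.mannwhitneyu(nonpsych, psycho, alternative='two-sided',
                             method='asymptotic', use_continuity=True)
   print(u, p)     # U = 20; p near 0.0025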
 
 MTB > # W-W runs test w/ c.c.
 MTB > let k5 = (10.5 - 13)/sqrt(5.5)
 MTB > cdf k5 k6;
 SUBC> norm 0 1.
 MTB > print k5 k6
 
 K5       -1.06600
 K6       0.143211

 MTB > # The approx. p-value is about 0.14.
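
(Written out in Python, the same normal approximation looks like this.  The observed
number of runs, 10, is inferred from the 10.5 in the computation above, i.e. 10 plus a
0.5 continuity correction toward the mean of 13; this sketch isn't part of the session.)

   import math
   from scipy import stats

   m, n = 15, 10      # sample sizes
   runs = 10          # observed number of runs (inferred from the 10.5 above)

   mean_runs = 2*m*n/(m + n) + 1                                    # 13
   var_runs = 2*m*n*(2*m*n - m - n)/((m + n)**2 * (m + n - 1))      # 5.5

   z = (runs + 0.5 - mean_runs)/math.sqrt(var_runs)    # continuity-corrected
   p = stats.norm.cdf(z)                               # lower tail: too few runs
   print(z, p)        # about -1.07 and 0.14, matching K5 and K6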


 MTB > save 'DBH'
 Saving worksheet in file: DBH.MTW


_______________________________________________________________________________________
---------------------------------------------------------------------------------------
 
                           Info About Using StatXact 

To do most of the two-sample tests, we need to put the values from both samples into a 
single variable in the CaseData editor, and then in another variable put m 1s followed
by n 2s (where m is the sample size of the first sample and n is the sample size of the 
second sample).
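
In Python terms (just to illustrate the layout; this isn't something you do inside
StatXact itself), the two CaseData columns would look like:

   import numpy as np

   sample1 = [0.0104, 0.0105, 0.0112, 0.0116, 0.0130, 0.0145, 0.0154, 0.0156,
              0.0170, 0.0180, 0.0200, 0.0200, 0.0210, 0.0230, 0.0252]   # m = 15
   sample2 = [0.0150, 0.0204, 0.0208, 0.0222, 0.0226, 0.0245, 0.0270, 0.0275,
              0.0306, 0.0320]                                           # n = 10

   var1 = np.concatenate([sample1, sample2])                 # all 25 values, stacked
   var2 = np.repeat([1, 2], [len(sample1), len(sample2)])    # m 1s followed by n 2s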

---------------------------------------------------------------------------------------

*** Student's two-sample t test  &  Welch's test

Use
   Basic_Statistics > t-test > Independent...

Then click Var1 (assuming data values from both samples stacked in Var1) into the 
Response box, and Var2 (assuming that's where you put the m 1s and n 2s) into the
Population box.

Select either the Equal variance button (for Student's t test) or the Unequal
variance button (for Welch's test), and then click OK.

You should get p-values of 0.0007 and 0.0011 to match Minitab's p-values.  (Note:
Due to lack of precise normality the p-values are only approximate.  With these 
rather small approximate p-values, just using one or two significant digits is
appropriate.)

Note: Oddly, StatXact doesn't give p-values for both one-sided and 
two-sided tests.  The p-value given is for a 2-sided test.  So for a 1-sided test the
p-value has to be modified.  (If the 2-sided test p-value is 0.042, then the 1-sided
test p-value will be either 0.021 or possibly 0.979, the latter if the data is more in
agreement with the null hypothesis than with the alternative hypothesis.)
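
Written as a little function (illustrative only), the conversion rule is:

   def one_sided_p(two_sided_p, direction_matches_alternative):
       # Halve the 2-sided p-value when the observed direction agrees with the
       # alternative hypothesis; otherwise take the complement of that half.
       if direction_matches_alternative:
           return two_sided_p / 2
       return 1 - two_sided_p / 2

   print(one_sided_p(0.042, True))     # 0.021
   print(one_sided_p(0.042, False))    # 0.979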

---------------------------------------------------------------------------------------

*** Mann-Whitney test (equivalent to the Wilcoxon rank sum test)

Use
   Nonparametrics > Two Independent Samples > Wilcoxon-Mann-Whitney...
and click in Response and Population variables (as described above).

Select Exact (under Compute), and then click OK.

The exact p-value is 0.0014.  (It's based on midranks (since two values are tied).)
The asymptotic p-value is done using normal approx. w/o c.c., but with adjustment to
the variance for ties.  It's 0.0023 (a bit larger than the exact p-value).  Minitab's
p-value of 0.0025 is a bit larger still, because it uses a continuity correction for
its normal approximation.
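
(To see where an "exact" midrank p-value comes from, here is an illustrative Python
sketch that approximates it by randomly reassigning the 25 values to groups of 15 and
10 and recomputing the rank sum.  Enumerating all C(25,15) = 3,268,760 assignments is
what the exact p-value refers to (StatXact computes it much more cleverly), so the
Monte Carlo estimate here should land close to 0.0014.)

   import numpy as np
   from scipy import stats

   nonpsych = np.array([0.0104, 0.0105, 0.0112, 0.0116, 0.0130, 0.0145, 0.0154,
                        0.0156, 0.0170, 0.0180, 0.0200, 0.0200, 0.0210, 0.0230,
                        0.0252])
   psycho = np.array([0.0150, 0.0204, 0.0208, 0.0222, 0.0226, 0.0245, 0.0270,
                      0.0275, 0.0306, 0.0320])

   def rank_sum(x, y):
       pooled = np.concatenate([x, y])
       ranks = stats.rankdata(pooled)      # midranks handle the tied 0.0200s
       return ranks[:len(x)].sum()         # Wilcoxon rank sum for the 1st sample

   res = stats.permutation_test((nonpsych, psycho), rank_sum,
                                permutation_type='independent',
                                alternative='two-sided',
                                vectorized=False, n_resamples=100_000)
   print(res.pvalue)       # should be close to the exact 0.0014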

---------------------------------------------------------------------------------------

*** two-sample Kolmogorov-Smirnov test

Use
   Nonparametrics > Two Independent Samples > Kolmogorov-Smirnov...
and click in Response and Population variables.

Select Exact (under Compute), and then click OK.

The exact p-value is 0.0023.  (Note that the asymptotic p-value is more than twice as
large.)
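
(For comparison, scipy can run the same test; this is just a sketch, and with the one
pair of tied values the exact calculation isn't guaranteed to reproduce StatXact's
0.0023 to all digits, but it should be close, and the asymptotic p-value should again
come out noticeably larger.)

   import numpy as np
   from scipy import stats

   nonpsych = np.array([0.0104, 0.0105, 0.0112, 0.0116, 0.0130, 0.0145, 0.0154,
                        0.0156, 0.0170, 0.0180, 0.0200, 0.0200, 0.0210, 0.0230,
                        0.0252])
   psycho = np.array([0.0150, 0.0204, 0.0208, 0.0222, 0.0226, 0.0245, 0.0270,
                      0.0275, 0.0306, 0.0320])

   print(stats.ks_2samp(nonpsych, psycho, method='exact'))
   print(stats.ks_2samp(nonpsych, psycho, method='asymp'))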

---------------------------------------------------------------------------------------

*** Wald-Wolfowitz runs test

Use
   Nonparametrics > Two Independent Samples > Wald-Wolfowitz Runs...
and click in Response and Population variables.

Select Exact (under Compute), and then click OK.

The exact p-value is about 0.14.  (It comes from using the exact distribution based on
the two sample sizes.  Any ties in the data don't alter the null distribution that
is used to determine the p-value.  However, the value of the test statistic can depend
on how any ties are broken to create an ordered sequence of 1s and 2s.  Using the 
"Largest Possible" parts of the output matches the conservative way that most prefer
--- ties are broken to maximize the number of runs, and thus maximize the p-value.)

With this data, StatXact 8 reports that the smallest number of runs is 0, which makes no
sense!!!  (The output correctly indicates that with these two sample sizes the min and
max number of runs possible are 2 and 21.  It's not clear to me how the ties can be 
broken to yield 0 runs.)   I plan to write the folks at Cytel about this.  I hope they
don't expect me to find all of their bugs.  (Although, in general, I love using StatXact, 
I'm beginning to wonder how many people are using it (if bugs persist in tests that have
been on the menu for years now), and how much testing they do of their product.)  

StatXact's asymptotic p-value comes from the normal approximation, with a continuity
correction.
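
(To see where the exact runs-test p-value comes from, here is an illustrative Python
sketch using the standard combinatorial formulas for the null distribution of the
number of runs.  The observed count of 10 runs is taken from the Minitab computation
near the end of the session above, where the 10.5 is 10 plus a continuity correction;
the lower-tail probability P(R <= 10) comes out to about 0.14.)

   from math import comb

   m, n = 15, 10
   total = comb(m + n, n)      # number of equally likely orderings of the labels

   def prob_runs(r):
       # P(exactly r runs) when m labels of one kind and n of the other
       # are arranged in random order
       if r % 2 == 0:
           k = r // 2
           count = 2 * comb(m - 1, k - 1) * comb(n - 1, k - 1)
       else:
           k = (r - 1) // 2
           count = (comb(m - 1, k) * comb(n - 1, k - 1) +
                    comb(m - 1, k - 1) * comb(n - 1, k))
       return count / total

   p_exact = sum(prob_runs(r) for r in range(2, 11))   # P(R <= 10)
   print(p_exact)                                      # about 0.144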

---------------------------------------------------------------------------------------

*** Fisher's exact test (used here to carry out the two-sample median test)

This test is not on StatXact's menu of choices under two independent samples of continuous
data.  But we can "trick" StatXact into doing the test.  Use
   DataEditor > Compute Scores...

Then type Var3 into the Target Variable box.  In the Variables list, highlight the Var
corresponding to the data values (say Var1) and click it into the Response box.  Then 
select Wilcoxon from the Scores list, and click OK.

The Wilcoxon midranks should appear in Var3.  In Var 4, put a 1 beside any midrank of
13 or less, and a 0 beside any midrank of 14 or more.  (So 1s in Var 4 correspond to 
data values in the lower portion of the combined ordered sample, and 0s in Var 4
correspond to data values in the upper portion of the combined ordered sample.)

Now use
   Nonparametrics > Two Independent Samples > Permutation...
and click Var 4 in as the Response variable and Var 2 as the Population variable.

Select Exact (under Compute), and then click OK.

The exact p-value is about 0.00098.

Another way to do the test is to use
  File > New...
and then select Table Data and click OK.  Change the Table Settings to request 1 Table
with 2 Rows and 2 Columns, and click OK.  

Now fill in the table with the counts, letting the 1st row correspond to the 1st sample
and the 2nd row correspond to the 2nd sample.  In the 1st column put the number of values
in each sample belonging to the lower portion of the combined ordered sample, and in the
2nd column put the number of values in each sample belonging to the upper portion of the 
combined ordered sample.  So the table should appear as below.
 
                     ---------------
                     |      |      |
                     |  12  |   3  |
                     |      |      |
                     ---------------
                     |      |      |
                     |   1  |   9  |
                     |      |      |
                     ---------------

Use
   Nonparametrics > Two Independent Binomials > Fisher's Exact Test...
and click OK.

The output is a bit cluttered.  There are two rows labeled "Exact" but only one of
them gives a p-value under 2-sided.  That's the p-value we want (0.00098).
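
(As a cross-check, and purely as an illustrative sketch, scipy's version of Fisher's
exact test on the same 2x2 table gives the same two-sided p-value.)

   from scipy import stats

   table = [[12, 3],
            [ 1, 9]]

   oddsratio, p = stats.fisher_exact(table, alternative='two-sided')
   print(p)       # about 0.00098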
   
To use StatXact to get the approximate p-value using a chi-square approximation
and w/o using Yates's c.c., use
   Nonparametrics > Two Independent Binomials > Pearson's Chi-square Test...
and click OK.  Then look for the asymptotic p-value under 2-sided.  (Note: Since
we can get an exact p-value using StatXact, there is not much point in getting
an approximate one.)