sibling height data


 MTB > # Heights (in inches) of Brothers and Sisters


 MTB > # First I'll enter the data and then look at a scatter plot.

 MTB > set c1
 DATA> 71 68 66 67 70 71 70 73 72 65 66
 DATA> end
 MTB > set c2
 DATA> 69 64 65 63 65 62 65 64 66 59 62
 DATA> end
 MTB > name c1 'brother' c2 'sister'
 MTB > print c1 c2
 
  ROW  brother  sister
 
    1       71      69
    2       68      64
    3       66      65
    4       67      63
    5       70      65
    6       71      62
    7       70      65
    8       73      64
    9       72      66
   10       65      59
   11       66      62
 
 MTB > plot c2 c1
          -                                          *
  sister  -
          -
          -
      66.5+
          -                                                 *
          -         *                          2
          -
          -                      *                                 *
      63.0+                *
          -         *                                *
          -
          -
          -
      59.5+
          -  *
          -
            --------+---------+---------+---------+---------+--------brother 
                 66.0      67.5      69.0      70.5      72.0
 
 MTB > # The plot suggests that there may be a mild positive association.


 MTB > # Using the formula near the top of p. 11-8 of the class notes, one
 MTB > # gets that tau_b is about 0.4315, in agreement with StatXact's value for
 MTB > # tau_b.  If the formula at the very top of p. 11-5 is used, one obtains that
 MTB > # tau_a is 0.4, in agreement with the tau given by StatXact.
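
The two tau values quoted above can be checked by direct pair counting. The following is a minimal Python sketch, not part of the Minitab session. It assumes the usual definitions: tau_a divides the concordant-minus-discordant count S by the number of pairs, and tau_b divides S by the square root of the product of the numbers of pairs untied on each variable. The class-notes formulas are not reproduced here, and the function names are illustrative only.

```python
from itertools import combinations
from math import sqrt

brother = [71, 68, 66, 67, 70, 71, 70, 73, 72, 65, 66]
sister = [69, 64, 65, 63, 65, 62, 65, 64, 66, 59, 62]


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def kendall_taus(x, y):
    """Return (S, tau_a, tau_b) by brute-force counting over all pairs."""
    pairs = list(combinations(range(len(x)), 2))
    # S = concordant pairs minus discordant pairs; tied pairs contribute 0.
    s = sum(sign(x[i] - x[j]) * sign(y[i] - y[j]) for i, j in pairs)
    n_pairs = len(pairs)
    x_ties = sum(x[i] == x[j] for i, j in pairs)  # pairs tied on x
    y_ties = sum(y[i] == y[j] for i, j in pairs)  # pairs tied on y
    tau_a = s / n_pairs
    tau_b = s / sqrt((n_pairs - x_ties) * (n_pairs - y_ties))
    return s, tau_a, tau_b


s, tau_a, tau_b = kendall_taus(brother, sister)
print(s, round(tau_a, 4), round(tau_b, 4))
```

For these data the brothers' heights have 3 tied pairs and the sisters' heights have 5, which is where the factors 52 and 50 in the tau_b denominator come from.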

 MTB > # Now let's consider some one-sided tests to determine if we can reject the
 MTB > # null hypothesis of independence in favor of the alternative that tau > 0.

 MTB > # Table L in the Appendix of G&C gives critical values for the case of n = 11,
 MTB > # but these assume no ties.  If Table L is used with tau_b, the conclusion is
 MTB > # that 0.025 < p-value < 0.05, but if tau_a is used we get 0.05 < p-value < 0.1.
 MTB > # StatXact's exact p-value is about 0.041.  (Note: StatXact's p-values are the
 MTB > # same whether tau_a or tau_b is used, since the two taus differ only by a
 MTB > # scaling factor, which doesn't affect the p-value.)  So, using the table to
 MTB > # bracket the p-value, tau_b gave a conclusion consistent with the exact
 MTB > # p-value, but tau_a did not.

 MTB > # To further explore testing using the two versions of the tau estimator, I'll
 MTB > # try using the normal approximation, with and without a continuity correction.

 MTB > name k1 'null se' k2 'denom a' k3 'denom b' k4 'num w cc' k5 'num wocc'
 MTB > let k1 = sqrt(54/990)
 MTB > let k2 = 55
 MTB > let k3 = sqrt(52*50)
 MTB > let k4 = 21
 MTB > let k5 = 22
 MTB > let k6 = k5/k2
 MTB > let k7 = k5/k3
 MTB > name k6 'tau_a' k7 'tau_b'
 MTB > print k1-k7
 
 null se  0.233550
 denom a  55.0000
 denom b  50.9902
 num w cc 21.0000
 num wocc 22.0000
 tau_a    0.400000
 tau_b    0.431455

 MTB > # I'll put z scores in k21-k24, multiplying each by -1 so that I can simply use the
 MTB > # cdf command to get the approximate p-values.  I'll put the approximate p-values
 MTB > # in k31-k34.

 MTB > let k21 = -k6/k1
 MTB > let k22 = -k7/k1
 MTB > let k23 = -k4/(k2*k1)
 MTB > let k24 = -k4/(k3*k1)
 MTB > cdf k21 k31;
 SUBC> norm 0 1.
 MTB > cdf k22 k32;
 SUBC> norm 0 1.
 MTB > cdf k23 k33;
 SUBC> norm 0 1.
 MTB > cdf k24 k34;
 SUBC> norm 0 1.
 MTB > print k21-k24 k31-k34
 
 K21      -1.71270
 K22      -1.84738
 K23      -1.63485
 K24      -1.76341
 K31      0.0433841
 K32      0.0323458
 K33      0.0510405
 K34      0.0389156

 MTB > # The various approximate p-values are given in the table below.
 MTB >  
 MTB > #                                 w/o c.c.   w/ c.c.
 MTB > #                       tau_a      0.043      0.051
 MTB > #                       tau_b      0.032      0.039
 MTB >  
 MTB > # Using tau_b with a continuity correction works pretty well, giving 0.039
 MTB > # against the exact p-value of about 0.041 (but so does using tau_a without
 MTB > # a continuity correction, which gives 0.043).  None of these match
 MTB > # StatXact's asymptotic p-value (of about 0.035).
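
The four normal-approximation p-values can be reproduced outside Minitab. The sketch below is hedged Python that mirrors the session's arithmetic: the same null standard error sqrt(54/990), the same two denominators, and a continuity correction that subtracts 1 from the numerator. The variable names are illustrative, and the upper tail is taken directly instead of negating z and calling a cdf.

```python
from math import erfc, sqrt


def normal_upper_tail(z):
    """P(Z > z) for a standard normal Z."""
    return 0.5 * erfc(z / sqrt(2))


n = 11
s = 22                                               # numerator without c.c. (k5)
n_pairs = n * (n - 1) // 2                           # 55, the tau_a denominator (k2)
denom_b = sqrt((n_pairs - 3) * (n_pairs - 5))        # sqrt(52*50), tau_b denominator (k3)
null_se = sqrt(2 * (2 * n + 5) / (9 * n * (n - 1)))  # sqrt(54/990) (k1)

p_values = {}
for tau_label, denom in (("tau_a", n_pairs), ("tau_b", denom_b)):
    for cc_label, numerator in (("w/o c.c.", s), ("w/ c.c.", s - 1)):
        z = numerator / (denom * null_se)
        p_values[(tau_label, cc_label)] = normal_upper_tail(z)

for key, p in p_values.items():
    print(key, round(p, 3))
```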


 MTB > # Now I'll try Spearman's correlation coefficient.
 MTB > rank c1 c11
 MTB > rank c2 c12
 MTB > corr c11 c12
 
 Correlation of C11 and C12 = 0.501
 
 MTB > # Table M of G&C, based on an assumption of no ties, suggests that we have
 MTB > # 0.05 < p-value < 0.1.  StatXact gives an exact p-value of about 0.059,
 MTB > # and so the table result is in agreement with the exact result.
 MTB > # (StatXact gives 0.5012 as the value of Spearman's statistic.)

 MTB > # I'll now try the two approximations given on p. 11-12 of the class notes.
     
 MTB > # First the normal approximation.

 MTB > let k41 = sqrt(10)*0.5012
 MTB > cdf k41 k42;
 SUBC> norm 0 1.
 MTB > let k42 = 1 - k42

 MTB > # Now the t approximation.

 MTB > let k43 = sqrt(9)*0.5012/sqrt( 1 - 0.5012**2 )
 MTB > cdf k43 k44;
 SUBC> t 9.
 MTB > let k44 = 1 - k44
 MTB > name k41 'z' k42 'z p-v' k43 't' k44 't p-v'
 MTB > print k41-k44
 
 z        1.58493
 z p-v    0.0564907
 t        1.73760
 t p-v    0.0581435

 MTB > # The t approximation result matches StatXact's asymptotic p-value
 MTB > # (since StatXact uses the same approximation).  Note that the t
 MTB > # approximation did better than the normal approximation (with
 MTB > # regard to being close to the exact p-value).
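
As a cross-check on the Spearman results, here is a small Python sketch, again not part of the session. It computes midranks, takes the Pearson correlation of the ranks, and applies the two approximations used above: z = sqrt(n-1)*r_s, and t = sqrt(n-2)*r_s/sqrt(1-r_s^2) with n-2 degrees of freedom. The t tail area is obtained by Simpson's rule on the t density, which is an assumption of convenience and not necessarily how Minitab or StatXact computes it. All function names are illustrative.

```python
from math import erfc, gamma, pi, sqrt

brother = [71, 68, 66, 67, 70, 71, 70, 73, 72, 65, 66]
sister = [69, 64, 65, 63, 65, 62, 65, 64, 66, 59, 62]


def midranks(values):
    """Ranks 1..n, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average = (start + end) / 2 + 1  # mean of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = average
        start = end + 1
    return ranks


def pearson(x, y):
    """Ordinary product-moment correlation."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, integrating the t density on [0, t] by Simpson's rule."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))

    def pdf(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    h = t / steps  # steps must be even for Simpson's rule
    total = pdf(0) + pdf(t)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * pdf(k * h)
    return 0.5 - total * h / 3


n = len(brother)
r_s = pearson(midranks(brother), midranks(sister))
z = sqrt(n - 1) * r_s
p_z = 0.5 * erfc(z / sqrt(2))          # normal approximation, upper tail
t = sqrt(n - 2) * r_s / sqrt(1 - r_s ** 2)
p_t = t_upper_tail(t, n - 2)           # t approximation, upper tail
print(round(r_s, 4), round(p_z, 4), round(p_t, 4))
```

The sketch uses the unrounded r_s, whereas the session plugs in 0.5012, so the results can differ from the printed ones in the fourth or fifth decimal place.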


 MTB > # Now I'll try Pearson's correlation coefficient.
 MTB > corr c1 c2
 
 Correlation of brother and sister = 0.558
 
 MTB > # The easy way to get the p-value is to do a simple regression.
 MTB >  
 MTB > regress c2 1 c1
 
 The regression equation is
 sister = 27.6 + 0.527 brother
 
 Predictor       Coef       Stdev    t-ratio        p
 Constant       27.64       18.04       1.53    0.160
 brother       0.5270      0.2612       2.02    0.074
 
 MTB > # The one-tailed p-value (based on an assumption of normality) is about
 MTB > # 0.037.  (The regression output gives the two-tailed p-value, 0.074,
 MTB > # which needs to be divided by 2 for a one-tailed test.)  The
 MTB > # correlation of 0.558 matches StatXact, and the p-value of 0.037
 MTB > # matches StatXact's asymptotic result.  The exact p-value obtained by
 MTB > # StatXact is about 0.038.  This is the smallest of the exact p-values.
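
The link between the regression output and the one-tailed test of Pearson's correlation can be sketched in Python as follows. This is a hedged illustration, not part of the session: it relies on the standard identity that the slope's t-ratio in a simple regression equals r*sqrt(n-2)/sqrt(1-r^2), and it gets the t tail area by Simpson's rule, which is a convenience choice. The names are illustrative.

```python
from math import gamma, pi, sqrt

brother = [71, 68, 66, 67, 70, 71, 70, 73, 72, 65, 66]
sister = [69, 64, 65, 63, 65, 62, 65, 64, 66, 59, 62]


def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, integrating the t density on [0, t] by Simpson's rule."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))

    def pdf(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    h = t / steps  # steps must be even for Simpson's rule
    total = pdf(0) + pdf(t)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * pdf(k * h)
    return 0.5 - total * h / 3


n = len(brother)
mean_b, mean_s = sum(brother) / n, sum(sister) / n
sxy = sum((b - mean_b) * (s - mean_s) for b, s in zip(brother, sister))
sxx = sum((b - mean_b) ** 2 for b in brother)
syy = sum((s - mean_s) ** 2 for s in sister)

r = sxy / sqrt(sxx * syy)                    # Pearson correlation
slope = sxy / sxx                            # regression of sister on brother
intercept = mean_s - slope * mean_b
t_ratio = r * sqrt(n - 2) / sqrt(1 - r * r)  # equals the slope's t-ratio
p_one_tailed = t_upper_tail(t_ratio, n - 2)
p_two_tailed = 2 * p_one_tailed              # what the regression table reports
print(round(r, 3), round(slope, 4), round(intercept, 2), round(t_ratio, 2))
print(round(p_one_tailed, 3), round(p_two_tailed, 3))
```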


 MTB > save 'sibs'
 Saving worksheet in file: sibs.MTW