MTB > # Heights (in inches) of Brothers and Sisters
MTB > # First I'll enter the data and then look at a scatter plot.
MTB > set c1
DATA> 71 68 66 67 70 71 70 73 72 65 66
DATA> end
MTB > set c2
DATA> 69 64 65 63 65 62 65 64 66 59 62
DATA> end
MTB > name c1 'brother' c2 'sister'
MTB > print c1 c2
 ROW  brother  sister
   1       71      69
   2       68      64
   3       66      65
   4       67      63
   5       70      65
   6       71      62
   7       70      65
   8       73      64
   9       72      66
  10       65      59
  11       66      62
MTB > plot c2 c1
[Character plot of sister (vertical axis, ticks at 59.5, 63.0, 66.5) against brother (horizontal axis, ticks at 66.0, 67.5, 69.0, 70.5, 72.0)]
MTB > # The plot suggests that there may be a mild positive association.
MTB > # Using the formula near the top of p. 11-8 of the class notes, one
MTB > # gets that tau_b is about 0.4315, in agreement with StatXact's value for
MTB > # tau_b. If the formula at the very top of p. 11-5 is used, one obtains that
MTB > # tau_a is 0.4, in agreement with the tau given by StatXact.
MTB > # Now let's consider some one-sided tests to determine if we can reject the
MTB > # null hypothesis of independence in favor of the alternative that tau > 0.
MTB > # Table L in the Appendix of G&C gives critical values for the case of n = 11,
MTB > # but these assume no ties. If Table L is used with tau_b, the conclusion is
MTB > # that 0.025 < p-value < 0.05, but if tau_a is used we get 0.05 < p-value < 0.1.
MTB > # StatXact's exact p-value is about 0.041. (Note: StatXact's p-values are the
MTB > # same whether tau_a or tau_b is used, since the two taus differ only by a
MTB > # scaling factor, which doesn't affect the p-value.) So, using the table to
MTB > # bracket the p-value, tau_b gave a conclusion consistent with the exact
MTB > # p-value, but tau_a did not.
MTB > # To further explore testing using the two versions of the tau estimator, I'll
MTB > # try using the normal approximation, with and without a continuity correction.
MTB > name k1 'null se'
MTB > name k1 'null se' k2 'denom a' k3 'denom b' k4 'num w cc' k5 'num wocc'
MTB > let k1 = sqrt(54/990)
MTB > let k2 = 55
MTB > let k3 = sqrt(52*50)
MTB > let k4 = 21
MTB > let k5 = 22
MTB > let k6 = k5/k2
MTB > let k7 = k5/k3
MTB > name k6 'tau_a' k7 'tau_b'
MTB > print k1-k7
null se 0.233550
denom a 55.0000
denom b 50.9902
num w cc 21.0000
num wocc 22.0000
tau_a 0.400000
tau_b 0.431455
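
As a cross-check outside Minitab, the constants typed in by hand above (22, 55, sqrt(52*50), and 54/990) can be recomputed from the raw data. The following is a minimal Python sketch, not part of the original session. It assumes the usual definitions: tau_a divides the concordant-minus-discordant count by the number of pairs, tau_b divides by the geometric mean of the pair counts untied on each variable, and the null standard error uses the standard no-ties formula 2(2n+5)/(9n(n-1)). The variable names are illustrative only.

```python
from itertools import combinations
from math import sqrt

brother = [71, 68, 66, 67, 70, 71, 70, 73, 72, 65, 66]
sister = [69, 64, 65, 63, 65, 62, 65, 64, 66, 59, 62]
n = len(brother)


def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


pairs = list(combinations(range(n), 2))
n_pairs = len(pairs)  # n(n-1)/2, the tau_a denominator

# Concordant minus discordant pairs; a pair tied on either variable adds 0.
s = sum(sign(brother[i] - brother[j]) * sign(sister[i] - sister[j])
        for i, j in pairs)

# Number of pairs tied on each variable (needed for the tau_b denominator).
tied_b = sum(brother[i] == brother[j] for i, j in pairs)
tied_s = sum(sister[i] == sister[j] for i, j in pairs)

tau_a = s / n_pairs
tau_b = s / sqrt((n_pairs - tied_b) * (n_pairs - tied_s))

# Null standard error of tau, assuming no ties.
null_se = sqrt(2 * (2 * n + 5) / (9 * n * (n - 1)))

print(s, n_pairs, n_pairs - tied_b, n_pairs - tied_s)
print(tau_a, tau_b, null_se)
```

If the sketch is right, the first printed line gives the numerator and the three pair counts behind k5, k2, and k3, and the second line should agree with the tau_a, tau_b, and null se values printed by Minitab.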
MTB > # I'll put z scores in k21-k24, multiplying each by -1 so that I can simply use the
MTB > # cdf command to get the approximate p-values. I'll put the approximate p-values
MTB > # in k31-k34.
MTB > let k21 = -k6/k1
MTB > let k22 = -k7/k1
MTB > let k23 = -k4/(k2*k1)
MTB > let k24 = -k4/(k3*k1)
MTB > cdf k21 k31;
SUBC> norm 0 1.
MTB > cdf k22 k32;
SUBC> norm 0 1.
MTB > cdf k23 k33;
SUBC> norm 0 1.
MTB > cdf k24 k34;
SUBC> norm 0 1.
MTB > print k21-k24 k31-k34
K21 -1.71270
K22 -1.84738
K23 -1.63485
K24 -1.76341
K31 0.0433841
K32 0.0323458
K33 0.0510405
K34 0.0389156
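
The four approximate p-values can also be reproduced without the sign-flip trick by computing the upper tail of the standard normal directly. This is a small Python sketch under the same inputs as the session (numerator 22, reduced to 21 for the continuity correction, and denominators 55 and sqrt(52*50)). The function and dictionary names are made up for illustration.

```python
from math import erfc, sqrt


def upper_tail(z):
    """P(Z >= z) for a standard normal Z."""
    return 0.5 * erfc(z / sqrt(2))


s = 22                    # concordant minus discordant pairs (k5)
null_se = sqrt(54 / 990)  # null standard error of tau (k1)
denoms = {"tau_a": 55, "tau_b": sqrt(52 * 50)}  # k2 and k3

p_values = {}
for name, d in denoms.items():
    # Without continuity correction: z = tau / se.
    p_values[name, "w/o c.c."] = upper_tail(s / (d * null_se))
    # With continuity correction: subtract 1 from the numerator first.
    p_values[name, "w/ c.c."] = upper_tail((s - 1) / (d * null_se))

for key, p in p_values.items():
    print(key, round(p, 4))
```

Using the upper tail of z is equivalent to applying the cdf to -z, which is what the session does with k21-k24.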
MTB > # The various approximate p-values are given in the table below.
MTB >
MTB > #            w/o c.c.    w/ c.c.
MTB > #   tau_a     0.043       0.051
MTB > #   tau_b     0.032       0.039
MTB >
MTB > # Using tau_b with a continuity correction works pretty well (0.039 vs. the
MTB > # exact 0.041), but so does using tau_a without a continuity correction
MTB > # (0.043). None of these match StatXact's asymptotic p-value (of about 0.035).
MTB > # Now I'll try Spearman's correlation coefficient.
MTB > rank c1 c11
MTB > rank c2 c12
MTB > corr c11 c12
Correlation of C11 and C12 = 0.501
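
The rank-then-correlate steps above can be sketched in Python as follows. This is not part of the original session. It assumes that tied values receive the average of the ranks they span (midranks), which is consistent with the 0.501 reported here. The helper names `midranks` and `pearson` are hypothetical.

```python
from math import sqrt

brother = [71, 68, 66, 67, 70, 71, 70, 73, 72, 65, 66]
sister = [69, 64, 65, 63, 65, 62, 65, 64, 66, 59, 62]


def midranks(values):
    """Rank from 1, giving tied values the average of the ranks they span."""
    ordered = sorted(values)
    # First 1-based position of v plus half the number of extra copies.
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in values]


def pearson(x, y):
    """Ordinary product-moment correlation of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Spearman's coefficient is the Pearson correlation of the ranks.
r_s = pearson(midranks(brother), midranks(sister))
print(round(r_s, 4))
```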
MTB > # Table M of G&C, based on an assumption of no ties, suggests that we have
MTB > # 0.05 < p-value < 0.1. StatXact gives an exact p-value of about 0.059,
MTB > # and so the table result is in agreement with the exact result.
MTB > # (StatXact gives 0.5012 as the value of Spearman's statistic.)
MTB > # I'll now try the two approximations given on p. 11-12 of the class notes.
MTB > # First the normal approximation.
MTB > let k41 = sqrt(10)*0.5012
MTB > cdf k41 k42;
SUBC> norm 0 1.
MTB > let k42 = 1 - k42
MTB > # Now the t approximation.
MTB > let k43 = sqrt(9)*0.5012/sqrt( 1 - 0.5012**2 )
MTB > cdf k43 k44;
SUBC> t 9.
MTB > let k44 = 1 - k44
MTB > name k41 'z' k42 'z p-v' k43 't' k44 't p-v'
MTB > print k41-k44
z 1.58493
z p-v 0.0564907
t 1.73760
t p-v 0.0581435
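
For comparison, the same two approximations can be sketched in Python with only the standard library. The statistics follow the let commands above: z = sqrt(n-1) r_s, and t = sqrt(n-2) r_s / sqrt(1 - r_s^2) on n-2 degrees of freedom. Since the standard library has no t distribution, the tail area is obtained here by numerically integrating the t density with Simpson's rule. That integration is a convenience chosen for this sketch, not how Minitab or StatXact computes it.

```python
from math import erfc, gamma, pi, sqrt

n, r_s = 11, 0.5012  # sample size and Spearman's coefficient from above


def normal_upper(z):
    """P(Z >= z) for a standard normal Z."""
    return 0.5 * erfc(z / sqrt(2))


def t_upper(t, df, steps=2000):
    """P(T >= t) for t >= 0: 0.5 minus the Simpson integral of the density on [0, t]."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))

    def density(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    h = t / steps  # steps must be even for Simpson's rule
    total = density(0) + density(t)
    total += sum((4 if i % 2 else 2) * density(i * h) for i in range(1, steps))
    return 0.5 - total * h / 3


z = sqrt(n - 1) * r_s
p_z = normal_upper(z)

t = sqrt(n - 2) * r_s / sqrt(1 - r_s ** 2)
p_t = t_upper(t, n - 2)

print(z, p_z)
print(t, p_t)
```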
MTB > # The t approximation result matches StatXact's asymptotic p-value
MTB > # (since StatXact uses the same approximation). Note that the t
MTB > # approximation did better than the normal approximation (with
MTB > # regard to being close to the exact p-value).
MTB > # Now I'll try Pearson's correlation coefficient.
MTB > corr c1 c2
Correlation of brother and sister = 0.558
MTB > # The easy way to get the p-value is to do a simple regression.
MTB >
MTB > regress c2 1 c1
The regression equation is
sister = 27.6 + 0.527 brother

Predictor       Coef       Stdev    t-ratio        p
Constant       27.64       18.04       1.53    0.160
brother       0.5270      0.2612       2.02    0.074
MTB > # The one-tailed p-value (based on an assumption of normality) is about
MTB > # 0.037. (The regression output gives a two-tailed p-value of 0.074, which
MTB > # needs to be divided by 2 for a one-tailed test.) The correlation of 0.558
MTB > # matches StatXact, and the p-value of 0.037 matches StatXact's asymptotic
MTB > # result. The exact p-value obtained by StatXact is about 0.038. This is the
MTB > # smallest of the three exact p-values (0.041 for Kendall's tau and 0.059
MTB > # for Spearman's coefficient).
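
The link between the correlation and the regression t-ratio can be checked with a short Python sketch (again, not part of the original session). It assumes the usual identity t = r sqrt(n-2) / sqrt(1 - r^2) for the slope in a simple regression, and it halves the two-tailed p-value from the regression output rather than computing it afresh.

```python
from math import sqrt

brother = [71, 68, 66, 67, 70, 71, 70, 73, 72, 65, 66]
sister = [69, 64, 65, 63, 65, 62, 65, 64, 66, 59, 62]
n = len(brother)

mx, my = sum(brother) / n, sum(sister) / n
sxx = sum((x - mx) ** 2 for x in brother)
syy = sum((y - my) ** 2 for y in sister)
sxy = sum((x - mx) * (y - my) for x, y in zip(brother, sister))

r = sxy / sqrt(sxx * syy)    # Pearson's correlation coefficient
slope = sxy / sxx            # least-squares slope of sister on brother
intercept = my - slope * mx  # least-squares intercept

# t-ratio for the slope, which is also the test statistic for r (n - 2 df).
t_ratio = r * sqrt(n - 2) / sqrt(1 - r ** 2)

# One-tailed p-value: half of the two-tailed 0.074 in the regression output.
one_tailed_p = 0.074 / 2

print(round(r, 3), round(slope, 4), round(intercept, 2), round(t_ratio, 2))
print(one_tailed_p)
```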
MTB > save 'sibs'
Saving worksheet in file: sibs.MTW