MTB > # Heights (in inches) of Brothers and Sisters
MTB > # First I'll enter the data and then look at a scatter plot.
MTB > set c1
DATA> 71 68 66 67 70 71 70 73 72 65 66
DATA> end
MTB > set c2
DATA> 69 64 65 63 65 62 65 64 66 59 62
DATA> end
MTB > name c1 'brother' c2 'sister'
MTB > print c1 c2

 ROW   brother   sister

   1        71       69
   2        68       64
   3        66       65
   4        67       63
   5        70       65
   6        71       62
   7        70       65
   8        73       64
   9        72       66
  10        65       59
  11        66       62

MTB > plot c2 c1

[Character plot of sister (vertical axis, ticks at 59.5, 63.0, 66.5) against brother (horizontal axis, ticks at 66.0, 67.5, 69.0, 70.5, 72.0). The two coincident points at (70, 65) are plotted as "2".]

MTB > # The plot suggests that there may be a mild positive association.
MTB > # Using the formula near the top of p. 11-8 of the class notes, one
MTB > # gets that tau_b is about 0.4315, in agreement with StatXact's value for
MTB > # tau_b. If the formula at the very top of p. 11-5 is used, one obtains that
MTB > # tau_a is 0.4, in agreement with the tau given by StatXact.
MTB > # Now let's consider some one-sided tests to determine if we can reject the
MTB > # null hypothesis of independence in favor of the alternative that tau > 0.
MTB > # Table L in the Appendix of G&C gives critical values for the case of n = 11,
MTB > # but these assume no ties. If Table L is used with tau_b, the conclusion is
MTB > # that 0.025 < p-value < 0.05, but if tau_a is used we get 0.05 < p-value < 0.1.
MTB > # StatXact's exact p-value is about 0.041. (Note: StatXact's p-values are the
MTB > # same whether tau_a or tau_b is used, because the two taus differ only by a
MTB > # scaling factor, which doesn't affect the p-value.) So, using the table to
MTB > # bracket the p-value, tau_b gave a conclusion consistent with the exact
MTB > # p-value, but tau_a did not.
MTB > # To further explore testing using the two versions of the tau estimator, I'll
MTB > # try using the normal approximation, with and without a continuity correction.
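As a cross-check outside Minitab, the tau values quoted above and the normal approximation the session turns to next can be sketched in Python. This is only an illustrative sketch, not part of the original session: the function names are hypothetical, the null standard error is taken to be sqrt(2(2n+5)/(9n(n-1))) (which gives sqrt(54/990) for n = 11), and the continuity correction is assumed to mean reducing the numerator C - D by 1.

```python
from itertools import combinations
from math import erfc, sqrt

# Data as entered in the Minitab session.
brother = [71, 68, 66, 67, 70, 71, 70, 73, 72, 65, 66]
sister = [69, 64, 65, 63, 65, 62, 65, 64, 66, 59, 62]


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def kendall_parts(x, y):
    """Return (C - D, tau_a denominator, tau_b denominator) by pairwise counting."""
    pairs = list(combinations(range(len(x)), 2))
    # Concordant pairs contribute +1, discordant pairs -1, tied pairs 0.
    score = sum(sign(x[i] - x[j]) * sign(y[i] - y[j]) for i, j in pairs)
    untied_x = sum(x[i] != x[j] for i, j in pairs)  # pairs not tied in x
    untied_y = sum(y[i] != y[j] for i, j in pairs)  # pairs not tied in y
    return score, len(pairs), sqrt(untied_x * untied_y)


def upper_tail(z):
    """Upper-tail probability of a standard normal variate."""
    return 0.5 * erfc(z / sqrt(2))


n = len(brother)
score, denom_a, denom_b = kendall_parts(brother, sister)
tau_a = score / denom_a
tau_b = score / denom_b

# Assumed null standard error of tau under independence with no ties.
null_se = sqrt(2 * (2 * n + 5) / (9 * n * (n - 1)))

# One-sided approximate p-values, without and with the assumed
# continuity correction (numerator reduced by 1).
p_a = upper_tail(score / (denom_a * null_se))
p_b = upper_tail(score / (denom_b * null_se))
p_a_cc = upper_tail((score - 1) / (denom_a * null_se))
p_b_cc = upper_tail((score - 1) / (denom_b * null_se))

print(f"C - D = {score}, tau_a = {tau_a:.4f}, tau_b = {tau_b:.4f}")
print(f"tau_a: {p_a:.3f} (no c.c.), {p_a_cc:.3f} (c.c.)")
print(f"tau_b: {p_b:.3f} (no c.c.), {p_b_cc:.3f} (c.c.)")
```

Counting pairs directly is quadratic in n, which is fine for 11 observations and keeps the handling of ties easy to follow.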
MTB > name k1 'null se'
MTB > name k1 'null se' k2 'denom a' k3 'denom b' k4 'num w cc' k5 'num wocc'
MTB > let k1 = sqrt(54/990)
MTB > let k2 = 55
MTB > let k3 = sqrt(52*50)
MTB > let k4 = 21
MTB > let k5 = 22
MTB > let k6 = k5/k2
MTB > let k7 = k5/k3
MTB > name k6 'tau_a' k7 'tau_b'
MTB > print k1-k7

null se     0.233550
denom a     55.0000
denom b     50.9902
num w cc    21.0000
num wocc    22.0000
tau_a       0.400000
tau_b       0.431455

MTB > # I'll put z scores in k21-k24, multiplying each by -1 so that I can simply use the
MTB > # cdf command to get the approximate p-values. I'll put the approximate p-values
MTB > # in k31-k34.
MTB > let k21 = -k6/k1
MTB > let k22 = -k7/k1
MTB > let k23 = -k4/(k2*k1)
MTB > let k24 = -k4/(k3*k1)
MTB > cdf k21 k31;
SUBC> norm 0 1.
MTB > cdf k22 k32;
SUBC> norm 0 1.
MTB > cdf k23 k33;
SUBC> norm 0 1.
MTB > cdf k24 k34;
SUBC> norm 0 1.
MTB > print k21-k24 k31-k34

K21      -1.71270
K22      -1.84738
K23      -1.63485
K24      -1.76341
K31      0.0433841
K32      0.0323458
K33      0.0510405
K34      0.0389156

MTB > # The various approximate p-values are given in the table below.
MTB >
MTB > #            w/o c.c.    w/ c.c.
MTB > #   tau_a     0.043       0.051
MTB > #   tau_b     0.032       0.039
MTB >
MTB > # Using tau_b with a continuity correction works pretty well (but so does
MTB > # using tau_a without a continuity correction). None of these match
MTB > # StatXact's asymptotic p-value (of about 0.035).
MTB > # Now I'll try Spearman's correlation coefficient.
MTB > rank c1 c11
MTB > rank c2 c12
MTB > corr c11 c12

Correlation of C11 and C12 = 0.501

MTB > # Table M of G&C, based on an assumption of no ties, suggests that we have
MTB > # 0.05 < p-value < 0.1. StatXact gives an exact p-value of about 0.059,
MTB > # and so the table result is in agreement with the exact result.
MTB > # (StatXact gives 0.5012 as the value of Spearman's statistic.)
MTB > # I'll now try the two approximations given on p. 11-12 of the class notes.
MTB > # First the normal approximation.
MTB > let k41 = sqrt(10)*0.5012
MTB > cdf k41 k42;
SUBC> norm 0 1.
MTB > let k42 = 1 - k42
MTB > # Now the t approximation.
MTB > let k43 = sqrt(9)*0.5012/sqrt( 1 - 0.5012**2 )
MTB > cdf k43 k44;
SUBC> t 9.
MTB > let k44 = 1 - k44
MTB > name k41 'z' k42 'z p-v' k43 't' k44 't p-v'
MTB > print k41-k44

z        1.58493
z p-v    0.0564907
t        1.73760
t p-v    0.0581435

MTB > # The t approximation result matches StatXact's asymptotic p-value
MTB > # (since StatXact uses the same approximation). Note that the t
MTB > # approximation did better than the normal approximation (with
MTB > # regard to being close to the exact p-value).
MTB > # Now I'll try Pearson's correlation coefficient.
MTB > corr c1 c2

Correlation of brother and sister = 0.558

MTB > # The easy way to get the p-value is to do a simple regression.
MTB >
MTB > regress c2 1 c1

The regression equation is
sister = 27.6 + 0.527 brother

Predictor       Coef     Stdev   t-ratio       p
Constant       27.64     18.04      1.53   0.160
brother       0.5270    0.2612      2.02   0.074

MTB > # The p-value (based on an assumption of normality) is about 0.037.
MTB > # (One needs to divide the regression result p-value by 2 for a
MTB > # one-tailed test.) The correlation of 0.558 matches StatXact, and
MTB > # the p-value of 0.037 matches StatXact's asymptotic result. The
MTB > # exact p-value obtained by StatXact is about 0.038. This is the
MTB > # smallest of the exact p-values.
MTB > save 'sibs'

Saving worksheet in file: sibs.MTW
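The Spearman and Pearson statistics from the session can likewise be sketched in Python as a rough cross-check. This is not part of the original session: the helper names `midranks` and `pearson` are hypothetical, ties are assumed to receive average ranks (as Minitab's rank command does), and only the normal-approximation p-value is computed because the standard library has no t distribution. The t statistics would still need a t CDF with 9 degrees of freedom to turn them into p-values.

```python
from math import erfc, sqrt

# Data as entered in the Minitab session.
brother = [71, 68, 66, 67, 70, 71, 70, 73, 72, 65, 66]
sister = [69, 64, 65, 63, 65, 62, 65, 64, 66, 59, 62]


def midranks(values):
    """Rank values from 1, giving tied values the average of their ranks."""
    order = sorted(values)
    # A tied group of size c starting at 0-based position i occupies ranks
    # i+1 .. i+c, whose average is i + (c + 1) / 2.
    return [order.index(v) + (order.count(v) + 1) / 2 for v in values]


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


n = len(brother)
r = pearson(brother, sister)                         # Pearson's r
rho = pearson(midranks(brother), midranks(sister))   # Spearman's coefficient

# Normal approximation for Spearman: z = sqrt(n - 1) * rho, upper tail.
z = sqrt(n - 1) * rho
z_p = 0.5 * erfc(z / sqrt(2))

# t statistics on n - 2 degrees of freedom (p-values need a t CDF).
t_spearman = sqrt(n - 2) * rho / sqrt(1 - rho ** 2)
t_pearson = sqrt(n - 2) * r / sqrt(1 - r ** 2)

print(f"Spearman = {rho:.4f}, z = {z:.4f}, z p-value = {z_p:.4f}")
print(f"Spearman t = {t_spearman:.4f}")
print(f"Pearson r = {r:.3f}, t = {t_pearson:.2f}")
```

The Pearson t statistic computed this way is the same quantity as the t-ratio for the slope in the simple regression, which is why halving the regression p-value gives the one-sided test of the correlation.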