HW STAT 657

Homework

Due Thursday, September 19

Exercise 1 (5 points): Consider Problem 11 on p. 51 of H&W. (The correct answer to Problem 11 is 4/2ⁿ = 2^(-(n-2)).) Change the critical region to consist of the eight values, 0, 1, 2, 3, [n(n+1)/2] - 3, [n(n+1)/2] - 2, [n(n+1)/2] - 1, and [n(n+1)/2], and give the size of the test.
Exercise 2 (5 extra credit points): Do Problem 35 on p. 59 of H&W.
Exercise 3 (5 points): Do Problem 54 on p. 71 of H&W. (You don't have to discuss the results as H&W requests.)
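
For Exercise 1, the size of a test based on the signed-rank statistic can be checked numerically. The following is a minimal sketch, assuming no ties and no zeros, so that under the null hypothesis each of the 2ⁿ subsets of the ranks 1, ..., n is equally likely to be the set of positive ranks. The function names are illustrative (they are not from H&W), and the four-value region in the usage lines is an assumption about the region used in Problem 11.

```python
from fractions import Fraction

def signed_rank_null_counts(n):
    """counts[t] = number of subsets of {1, ..., n} whose elements sum to t."""
    max_sum = n * (n + 1) // 2
    counts = [0] * (max_sum + 1)
    counts[0] = 1  # the empty subset
    for rank in range(1, n + 1):
        # Work downward so that each rank is used at most once.
        for t in range(max_sum, rank - 1, -1):
            counts[t] += counts[t - rank]
    return counts

def region_size(n, critical_values):
    """Null probability that the signed-rank statistic lands in the region."""
    counts = signed_rank_null_counts(n)
    favorable = sum(counts[t] for t in set(critical_values))
    return Fraction(favorable, 2 ** n)

# Usage with a hypothetical sample size (n = 10 is not from the exercise).
n = 10
m = n * (n + 1) // 2
print(region_size(n, [0, 1, m - 1, m]))  # assumed four-value region
```

Passing a different list of critical values gives the size of the corresponding test.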

Due Thursday, September 26

Exercise 4 (10 points): Consider the five values given in Problem 9 on p. 51 of H&W. Using the exact null sampling distribution of the signed-rank statistic based on using midranks to handle the ties (as StatXact would do), give the p-value which results from an upper-tail test (6 points). Then, using the test statistic value based on the midranks, incorrectly use Table A.4 to obtain an incorrect p-value (1 point). Finally, breaking ties conservatively for an upper-tail test, use Table A.4 to obtain a p-value (3 points).
Exercise 5 (5 points): Consider Problem 55 on p. 71 of H&W, and give the expected value of the (random) p-value which results from the randomization process. (You don't have to discuss the results as H&W requests.)
Exercise 6 (5 points): Using the data given in Table 3.9 on p. 82 of the text, give the p-value which results from testing the null hypothesis that the 25th percentile of the underlying distribution is greater than or equal to 20 against the alternative hypothesis that the 25th percentile of the underlying distribution is less than 20.
Exercise 7 (10 points): Consider iid random variables having a distribution with a pdf having the same shape as
f(x) = |x| c^(-2) I_{(-c, c)}(x),
but unknown median, and determine the ARE of the sign test with respect to the t test, and the ARE of the signed-rank test with respect to the t test.
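
As a reminder for Exercise 7, the Pitman efficiencies are usually written as below for a density f that is symmetric about its median (taken to be 0 here) with finite variance σ². This is stated from the standard theory rather than copied from H&W, so check it against the forms given in the text before using it.

```latex
\mathrm{ARE}(\text{sign},\, t) = 4\,\sigma^{2}\, f(0)^{2},
\qquad
\mathrm{ARE}(\text{signed-rank},\, t)
  = 12\,\sigma^{2} \left( \int_{-\infty}^{\infty} f(x)^{2}\, dx \right)^{2}.
```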

Due Thursday, October 10

Exercise 8 (5 points): Using the data given in Table 3.9 on p. 82 of the text, give the p-value which results from testing the null hypothesis that the 10th percentile of the underlying distribution is less than or equal to 15 against the alternative hypothesis that the 10th percentile of the underlying distribution is greater than 15.
Exercise 9 (10 points): Consider iid random variables
X₁, X₂, ..., X_n,
having a uniform (a, a+c) distribution, and iid random variables
Y₁, Y₂, ..., Y_n,
having a uniform (b, b+c) distribution, and let
Z_i = Y_i - X_i.
Now consider using the Z_i to test for a treatment effect, and determine the ARE of the sign test with respect to the t test, and the ARE of the signed-rank test with respect to the t test. (Hint: Let a = b = 0. I think you'll find it easier to deal with this way.)
Exercise 10 (5 points): This exercise was distributed in class on 9/26. The intention is for you to use StatXact to obtain the five requested p-values. Please correctly round each p-value to 2 significant digits, and place your answers in the answer boxes on the sheet I gave you on 9/26.
Exercise 11 (5 points): This exercise was distributed in class on 9/26. The intention is for you to use StatXact to obtain the five requested p-values. Please correctly round each p-value to 2 significant digits, and place your answers in the answer boxes on the sheet I gave you on 9/26.
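
The percentile tests of Exercises 6 and 8 reduce to binomial tail probabilities. The sketch below assumes the usual sign-test-type argument: count the observations below the hypothesized value and refer the count to a binomial distribution. The data in the usage lines are made up (they are not the Table 3.9 values), and the function names are illustrative.

```python
from math import comb

def binom_pmf(n, p, j):
    return comb(n, j) * p ** j * (1 - p) ** (n - j)

def percentile_pvalue(data, q, value, alternative):
    """One-sided p-value for a test about the q-th quantile.

    alternative="less":    H1 says the quantile is below `value`, so a large
                           count of observations < value is evidence for H1.
    alternative="greater": H1 says the quantile is above `value`, so a small
                           count of observations <= value is evidence for H1.
    """
    n = len(data)
    if alternative == "less":
        b = sum(x < value for x in data)
        return sum(binom_pmf(n, q, j) for j in range(b, n + 1))
    if alternative == "greater":
        b = sum(x <= value for x in data)
        return sum(binom_pmf(n, q, j) for j in range(0, b + 1))
    raise ValueError("alternative must be 'less' or 'greater'")

# Usage with hypothetical data.
sample = [12, 18, 19, 21, 25, 30, 33, 41]
print(percentile_pvalue(sample, 0.25, 20, "less"))
print(percentile_pvalue(sample, 0.10, 15, "greater"))
```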

Due Thursday, October 17

Exercise 12 (15 points)

This exercise will be / was distributed in class on 10/3. Please correctly round each p-value to 2 significant digits, and place your answers in the answer boxes.

Comments: For column B, you are to assume that if the two distributions differ then one is stochastically larger than the other one. Note that this means that if the two distributions are different, then one mean is greater than the other mean. A small p-value can be taken to be strong evidence that the two distributions are different, and so it would also imply that it is reasonable to assume that the means are different. But your task for column B is to determine if the test addresses the alternative hypothesis that the mean of the manual distribution is less than the mean of the mechanical distribution. Clearly, given the "stochastically larger" assumption, the tests can be deemed to be addressing whether or not the means are equal, but (as a hint) not all of the tests can be used with a one-sided alternative hypothesis involving the means, even if a one-tailed rejection region is used, because a small p-value can result with high probability in some cases for which one distribution is stochastically larger, but could also result with high probability for some cases for which the other distribution is stochastically larger. You shouldn't reason that all of the tests can be used for one-sided tests about the means because if a small p-value indicates a difference in means then graphics can be used to determine which distribution is stochastically larger than the other one. Instead, you should only report p-values in column B for those tests in which a small value of the test statistic could only occur with high probability when one particular mean is larger than the other mean, and a large value of the test statistic could only occur with high probability when the other distribution mean is the larger of the two. That is, use only those tests for which the value of the test statistic indicates which of the two distributions is the stochastically larger one.

Exercise 13 (5 points)

This exercise will be / was explained in class on 10/3. Basically you are to do the test of variances I will / have cover(ed) in class that is based on ranking absolute values of differences between pairs of observations. Please correctly round each p-value to 2 significant digits, and place your answers somewhere on the answer sheet for Exercise 12. (Note: I think what I have below matches the verbal description that I gave in class, but if not please let me know. When I post the solutions, I'll also give results for different ways of pairing the observations. (E.g., one could pair the first observation with the last, the second with the second to the last, and so forth.))

(a): Pair the first two observations in each sample, the second two observations in each sample, and so forth, and then determine the absolute difference for each pair of observations. Then just run a W-M-W test on the new samples of absolute differences, for which the sample sizes are half of the original sample sizes.
(b): Start as above, but don't use consecutive integers for the scores (as is done for the W-M-W test). Instead, do a permutation test using the squared integers (i.e., 1, 4, 9, 16, 25, ... ) as scores.
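
The two parts of Exercise 13 can be sketched as follows. This is an illustration under stated assumptions, not a substitute for StatXact: it assumes no ties among the absolute differences, reports a one-sided (upper-tail) exact permutation p-value for the second sample being more variable, and uses made-up data. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def paired_abs_diffs(sample):
    """|x1 - x2|, |x3 - x4|, ...; an unpaired last observation is dropped."""
    return [abs(sample[i] - sample[i + 1]) for i in range(0, len(sample) - 1, 2)]

def ranks(values):
    """Ranks 1..N of the values, assuming there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result

def exact_upper_pvalue(x, y, score=lambda r: r):
    """P(S >= s_obs), where S is the sum of the scores attached to y's ranks."""
    combined = list(x) + list(y)
    scores = [score(r) for r in ranks(combined)]
    observed = sum(scores[len(x):])
    count = total = 0
    for idx in combinations(range(len(combined)), len(y)):
        total += 1
        if sum(scores[i] for i in idx) >= observed:
            count += 1
    return Fraction(count, total)

# Hypothetical samples; the second is visibly more spread out.
x = paired_abs_diffs([50, 51, 47, 49, 52, 48])
y = paired_abs_diffs([40, 55, 62, 41, 30, 58])
print(exact_upper_pvalue(x, y))                          # part (a): rank scores
print(exact_upper_pvalue(x, y, score=lambda r: r * r))   # part (b): squared ranks
```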

Due Thursday, October 24

Exercise 14 (10 points (2 points for (i) and 1 point each for the others))

Using the data in Table 6.8 on p. 215 of H&W, do each of the following parts and place your answers in the boxes on the answer sheet that I will distribute in class, rounding each value entered on the answer sheet to show 2 significant digits.

(a): Do a Monte Carlo approximation of an exact Kruskal-Wallis test using the guidelines given here, and report the point estimate of the exact p-value.
(b): Give the upper confidence bound for the exact p-value of the Kruskal-Wallis test.
(c): Give the approximate p-value of the Kruskal-Wallis test (making use of the commonly used chi-square approximation).
(d): Give the exact p-value of the Mood-Brown (median) test.
(e): Do a Monte Carlo approximation of an exact normal scores test using the guidelines given here, and report the point estimate of the exact p-value.
(f): Do a Monte Carlo approximation of an exact Savage scores test using the guidelines given here, and report the point estimate of the exact p-value.
(g): If you use StatXact's k-sample ANOVA with General Scores test, using the data values as the scores, to do a Monte Carlo approximation of an exact permutation test using the guidelines given here, you'll get an estimated p-value of 0. Since this seems undesirable, increase the number of Monte Carlo trials, using 10 times the number suggested here, and report the point estimate of the exact p-value.
(h): Use the Fligner-Wolfe test to test the null hypothesis that the distribution of fasting metabolic rate doesn't change against the alternative that at least one of the other time periods has a distribution that is stochastically smaller than the July-Aug. distribution and report the exact p-value.
(i): Use the Mack-Wolfe test of 6.3.A of H&W to test the null hypothesis that the distribution of fasting metabolic rate doesn't change against the umbrella alternative with a peak in May-June (as opposed to July-Aug. as is considered in the text) and report the approximate p-value.

(Note: I suggest that you save your data in StatXact, since we may use the same data for some other exercises to be assigned later.)
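
To make the idea behind parts (a) and (b) of Exercise 14 concrete, here is a rough sketch of a Monte Carlo approximation of an exact Kruskal-Wallis p-value. It is not StatXact's algorithm: the number of trials, the seed, and the normal-approximation 99% upper bound are all assumptions chosen for illustration, and the data are made up. The tie correction is omitted because it is constant over permutations of the same pooled data.

```python
import random

def kruskal_wallis_h(groups):
    """Kruskal-Wallis H (no tie correction), using midranks for tied values."""
    pooled = sorted(x for g in groups for x in g)
    n_total = len(pooled)
    rank_of = {}
    i = 0
    while i < n_total:
        j = i
        while j < n_total and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # average of ranks i+1, ..., j
        i = j
    total = sum(sum(rank_of[x] for x in g) ** 2 / len(g) for g in groups)
    return 12 * total / (n_total * (n_total + 1)) - 3 * (n_total + 1)

def monte_carlo_pvalue(groups, trials=2000, seed=0):
    """Return (point estimate, approximate 99% upper bound) for the p-value."""
    rng = random.Random(seed)
    observed = kruskal_wallis_h(groups)
    pooled = [x for g in groups for x in g]
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(trials):
        rng.shuffle(pooled)
        start, shuffled = 0, []
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if kruskal_wallis_h(shuffled) >= observed - 1e-12:
            hits += 1
    p_hat = hits / trials
    se = (p_hat * (1 - p_hat) / trials) ** 0.5
    return p_hat, p_hat + 2.326 * se  # 2.326 is roughly the 0.99 normal quantile

# Hypothetical data with three groups.
print(monte_carlo_pvalue([[1.2, 3.4, 2.2], [4.1, 5.0, 2.9], [6.3, 5.5, 7.7]]))
```

Note that this simple bound collapses to 0 when no simulated statistic reaches the observed one, which is the situation described in part (g); more trials or a different kind of bound would be needed there.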

Exercise 15 (2 points)

Using just the first three groups (Jan.-Feb. through May-June) of the data in Table 6.8 on p. 215 of H&W, do each of the following parts and place your answers in the boxes on the answer sheet that I will distribute in class, rounding each value entered on the answer sheet to show 2 significant digits.

(a): Do a J-T test to determine if there is statistically significant evidence that fasting metabolic rate is increasing from winter to summer, and report the exact p-value (rounded).
(b): Use StatXact's Linear-by-linear Association test to perform a one-way k-sample test which is similar to Page's test of monotonicity for a two-way design to determine if there is statistically significant evidence that fasting metabolic rate is increasing from winter to summer, and report the exact p-value (rounded).

Exercise 16 (3 points)

Consider the three table entries for the k = 5, p = 3, n = 5 case of Table A.14 on p. 663 of H&W. Use the normal approximation to obtain approximate p-values corresponding to observed values of the test statistic of

(a): 97,
(b): 103,
(c): 113.

Give two answers for each part --- the first based on not using a continuity correction, and the second based on using a continuity correction. (You should see that when the p-value is not very small, using the continuity correction gives a closer approximation, but that for the smallest p-value, not using the continuity correction is preferred (because both approximate p-values will be too large, but the one based on no continuity correction is better). The exact p-values are about 0.0944, 0.0449, and 0.0090, as can be seen from the table.)
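
The mechanics of the two approximations in Exercise 16 can be sketched as below. The null mean and standard deviation have to come from the formulas in H&W for the statistic in question; the values in the usage lines are placeholders, not the ones for this exercise, and the function names are illustrative.

```python
from math import erfc, sqrt

def normal_upper_tail(z):
    """P(Z >= z) for a standard normal Z."""
    return 0.5 * erfc(z / sqrt(2))

def approx_upper_pvalue(observed, null_mean, null_sd, continuity=False):
    """Upper-tail normal approximation for an integer-valued statistic."""
    shift = 0.5 if continuity else 0.0  # continuity correction moves toward the mean
    return normal_upper_tail((observed - shift - null_mean) / null_sd)

# Placeholder null mean and standard deviation (hypothetical values).
for obs in (60, 70):
    print(obs,
          approx_upper_pvalue(obs, 50.0, 10.0),
          approx_upper_pvalue(obs, 50.0, 10.0, continuity=True))
```

For an upper-tail probability, the continuity-corrected value is always the larger of the two.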

Due Thursday, October 31

Exercise 17 (5 points): Using the Diabetic Mice data (to be distributed in class), perform the S-D-C-F test, making as accurate of a statement about the p-value as you can. (Show some work (no need for a lot) --- certainly giving the value of the test statistic.) (As a way to check your procedure, I'll give you some results here for the bending strengths / parameters for 3 species of Canadian softwoods data that I distributed at the same time I distributed the Diabetic Mice data for this exercise. For the wood data, you should find that the expression in the square brackets in (6.62) on p. 241 equals about 1.52 for samples 1 and 2, 1.60 for samples 1 and 3, and 1.12 for samples 2 and 3. The "overall test statistic" equals about 2.265. From Table A.16 all we can determine is that the p-value exceeds 0.1041 (since the size 0.1041 critical value is about 2.944). The large-sample approximation suggests that the p-value exceeds 0.2.)
Exercise 18 (5 points): Give the (constant) value of A from Quade's test statistic in the case of continuous random variables which yield no tie situations with probability 1. Your answer should be a function of k and n. (Suggestion: Check your work by making up an example having k = 3 and n = 4, doing the hand calculation of A and B, checking the value of A against your general formula for A, and then checking the overall value of the test statistic, T_Q, with StatXact.)

Due Thursday, November 7

Exercise 19 (5 points)

You are to determine the null sampling distribution of the test statistic of Sec. 7.3 for the case of k = n = 3. (I did this for the case of k = 3 and n = 2 on a handout that I distributed in class on 10/17. There I considered 36 equally-likely possibilities under the null hypothesis. I needed to consider the 36 possibilities due to the fact that I wanted to use my work to also determine the null sampling distribution of Page's test statistic. But if I was only interested in the test of Sec. 7.3, I could have got by with just considering 6 equally-likely possibilities due to a symmetry argument. (See Comment 26 of p. 297 of H&W (which refers back to Comment 8 of p. 278 of H&W).) This means that you can use the table showing the 36 equally-likely possibilities for the n = 2 case to rather quickly arrive at the table (of 36 possibilities) that you need for the n = 3 case. (You can test your ability to "extend" from the n = 2 case to the n = 3 case by seeing if you can obtain the sampling distribution for the n = 2 case by considering just 6 possibilities instead of the 36 possibilities that I dealt with on my handout.)) Letting M denote the test statistic, give the values for the probabilities indicated below. (As a check, I'll give you that P(M = 1) = 0 and P(M = 6) = 1/36.)

(a): P(M = 0)
(b): P(M = 2)
(c): P(M = 3)
(d): P(M = 4)
(e): P(M = 5)
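
A brute-force enumeration is one way to check hand work for Exercise 19. The sketch below assumes that the statistic of Sec. 7.3 is the largest absolute difference between the treatment rank sums, and that under the null hypothesis every within-block ranking is equally likely and the blocks are independent. It enumerates all (k!)ⁿ configurations rather than using the symmetry shortcut described above, and the function name is made up.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations, product

def rank_sum_range_distribution(k, n):
    """Exact null distribution of max |R_u - R_v| for k treatments, n blocks."""
    rankings = list(permutations(range(1, k + 1)))
    counts = Counter()
    for blocks in product(rankings, repeat=n):
        # Rank sum for each treatment, adding over the n blocks.
        sums = [sum(block[j] for block in blocks) for j in range(k)]
        counts[max(sums) - min(sums)] += 1
    total = len(rankings) ** n
    return {m: Fraction(c, total) for m, c in sorted(counts.items())}

dist = rank_sum_range_distribution(3, 3)
for m, prob in dist.items():
    print(m, prob)
```

For k = n = 3 the output agrees with the two check values given above: no mass at M = 1, and probability 1/36 at M = 6.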

Due Thursday, November 14

Note that for some of the tests below, the appropriate statement about the p-value (or approximate p-value) may be of the form p-value < a, b < p-value < c, or p-value > d, where a, b, c, and d represent values from one of the tables of H&W.

Exercise 20 (7 points)

Consider the cement testing experiment, and test for differences between breakers. For parts (a) through (c) and (g), reduce the data to one observation per cell using the sample mean for each cell.

(a): Obtain an exact p-value using Friedman's test.
(b): Obtain an exact p-value using Quade's test.
(c): Obtain an exact p-value using the test of Sec. 7.3 of H&W. (Note: The table in the back of the book won't supply you with the p-value. From the table, you can conclude that the p-value exceeds 0.028, but I want the exact p-value. So you need a probability that isn't in the table in H&W. If you've done Exercise 19 (or if I am nice enough to post the answer to Exercise 19 promptly), you can just make use of your work there (or my posted answer) to help obtain the desired p-value for this part.)
(d): Use the Table on p. 721 of H&W to make as accurate of a statement as is possible (from the use of the table) about the p-value of the Mack-Skillings test of Sec. 7.9.
(e): Use the tables in H&W to make as accurate of a statement as is possible (from the use of the tables) about the (approximate) p-value which results from the approximate version (corresponding to (7.75) on p. 340 of H&W) of the Mack-Skillings test of Sec. 7.10.
(f): Use the tables in H&W to make as accurate of a statement as is possible (from the use of the tables) about the conservative p-value which results from the conservative version (corresponding to (7.76) on p. 342 of H&W) of the Mack-Skillings test of Sec. 7.10.
(g): Obtain an approximate p-value using Doksum's test of Sec. 7.11. (Don't use the silly chart in H&W to get an approximate p-value. Rather, use statistical software or something else to report the exact upper-tail probability from the appropriate chi-square distribution as your approximate p-value.)
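
For parts such as (g), where an exact chi-square upper-tail probability is wanted, any statistical package will do. If none is handy, the following standard-library sketch works for positive integer degrees of freedom. It relies on the recurrence Q(a + 1, y) = Q(a, y) + yᵃ e^(-y) / Γ(a + 1) for the regularized upper incomplete gamma function, started from Q(1, y) = e^(-y) or Q(1/2, y) = erfc(√y). The function name is illustrative.

```python
from math import erfc, exp, gamma, sqrt

def chi2_upper_tail(x, df):
    """P(X >= x) for a chi-square variable with positive integer df."""
    if x <= 0:
        return 1.0
    y = x / 2
    if df % 2 == 0:
        a, q = 1.0, exp(-y)        # Q(1, y)
    else:
        a, q = 0.5, erfc(sqrt(y))  # Q(1/2, y)
    while a < df / 2:
        q += y ** a * exp(-y) / gamma(a + 1)
        a += 1
    return q

# Usage with a hypothetical statistic value and degrees of freedom.
print(chi2_upper_tail(6.2, 2))
```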

Exercise 21 (3 points)

Consider the chicken data, and for parts (a) and (b) perform tests using the monotone alternative of increasing weight with increasing dosage (but as usual, allowing for equality between adjacent groups as long as there is at least one difference in the right direction).

(a): Obtain an exact p-value using Page's test.
(b): Obtain an approximate p-value using Hollander's test of Sec. 7.12. (Use statistical software or something else to obtain the exact upper-tail probability of the standard normal distribution.)
(c): Now test to determine if there is statistically significant evidence that at least one of the two treatments results in greater chicken weight than the control of standard feed (and no dosage of the drug). Report the exact p-value (within the accuracy of the appropriate table) which results from the test of Sec. 7.4 of H&W.

Exercise 22 (5 points)

Consider the detergent data, and perform tests to determine if there is statistically significant evidence that at least two of the detergents differ (with regard to their useful "life" per washing session).

(a): Obtain an approximate p-value using Durbin's test of Sec. 7.6. (Don't use the silly chart in H&W to get an approximate p-value. Rather, use statistical software or something else to report the exact upper-tail probability from the appropriate chi-square distribution as your approximate p-value.)
(b): Use the tables in H&W to make as accurate of a statement as is possible (from the use of the tables) about the (approximate) p-value which results from the Skillings-Mack test of Sec. 7.7.

Due Thursday, November 21

Exercise 23 (10 points (2 points for parts (b) and (h), and 1 point each for the others))

Consider the data of Table 8.3 on p. 377 of H&W. Turn in answers for the various parts given below. For the parts that can be done rather simply using StatXact, you don't need to show any supporting work (but do take time to make sure that you get the data entered correctly). For parts (b) and (h), you can show a lot of work if you wish, just as long as you organize your solutions neatly. Whatever you choose to turn in, please clearly indicate your final answers by highlighting them, or drawing boxes around them. (You may find it useful to look at a scatterplot of the data. If you do so, then you shouldn't find it too surprising when you get some rather large p-values.)

(a): Give a point estimate of Kendall's tau.
(b): Give a 95% confidence interval for Kendall's tau using the method of Samara and Randles.
(c): Use Kendall's statistic to test the null hypothesis of independence against the general alternative, and report the p-value. Also give the value of the test statistic.
(d): Use Spearman's statistic to test the null hypothesis of independence against the general alternative, and report the p-value. Also give the value of the test statistic.
(e): Use Pearson's statistic to test the null hypothesis of independence against the general alternative, and report a Monte Carlo estimate of the exact p-value (using the guidelines given here).
(f): Use Pearson's statistic to test the null hypothesis of independence against the general alternative, and report the approximate p-value based on Student's T distribution. (It would be an exact p-value if the underlying distribution is a bivariate normal distribution, but otherwise it is only an approximate p-value.)
(g): Give a point estimate of Pearson's rho (aka, Pearson's (population / distribution) product moment correlation coefficient).
(h): Use Hoeffding's statistic to test the null hypothesis of independence against the general alternative, and make as precise a statement as you can about the approximate p-value using the techniques and/or tables recommended in H&W. Also give the value of the test statistic.
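
As a way to check data entry for parts (a), (c), and (d), the point estimates can be computed directly. The sketch below assumes no ties in either variable and uses made-up data rather than the Table 8.3 values; the function names are illustrative.

```python
def kendall_k(x, y):
    """Kendall's K: number of concordant pairs minus number of discordant pairs."""
    n = len(x)
    k = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            prod = (x[j] - x[i]) * (y[j] - y[i])
            k += (prod > 0) - (prod < 0)
    return k

def kendall_tau_hat(x, y):
    """Point estimate of tau: K divided by the number of pairs."""
    n = len(x)
    return 2 * kendall_k(x, y) / (n * (n - 1))

def spearman_rs(x, y):
    """Spearman's rank correlation via 1 - 6 * sum(d^2) / (n (n^2 - 1))."""
    def ranks(values):
        order = sorted(range(len(values)), key=lambda i: values[i])
        result = [0] * len(values)
        for rank, idx in enumerate(order, start=1):
            result[idx] = rank
        return result
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Hypothetical data.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
print(kendall_k(x, y), kendall_tau_hat(x, y), spearman_rs(x, y))
```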

Due Thursday, December 12

Exercise 24 (8 points (2 points for each part))

Consider Boscovich's data which I will give you. (Historical Note: Boscovich developed LAD regression about 50 years prior to the development of OLS regression.)

(a): Use Theil's method to test the null hypothesis that there is no relation between transformed latitude and arc length against the alternative hypothesis that there is a linear relationship having a positive slope, and report the p-value.
(b): Use Theil's method to estimate the slope for the regression model indicated in part (a).
(c): Use Theil's method to produce a 91.6% confidence interval for the slope for the regression model indicated in part (a).
(d): Use the method of Hettmansperger, McKean, and Sheather to estimate the intercept for the regression model indicated in part (a).
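
The point estimates in parts (b) and (d) can be sketched as follows. The slope is the median of the pairwise slopes. For the intercept, the code assumes the estimator is the median of the differences y_i - (slope)(x_i), so confirm that against the description in H&W. The data are made up (they are not Boscovich's), and the function names are illustrative.

```python
from statistics import median

def theil_slope(x, y):
    """Median of the slopes over all pairs of points with distinct x values."""
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i in range(len(x))
              for j in range(i + 1, len(x))
              if x[j] != x[i]]
    return median(slopes)

def median_residual_intercept(x, y, slope):
    """Median of y_i - slope * x_i (assumed form of the intercept estimator)."""
    return median([yi - slope * xi for xi, yi in zip(x, y)])

# Hypothetical data with one wild point, to show the resistance to outliers.
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 100]
b = theil_slope(x, y)
print(b, median_residual_intercept(x, y, b))
```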

Exercise 25 (2 points (1 point for each part))

Consider the data of Table 9.10 on p. 449 of H&W.

(a): Test the null hypothesis that the coefficient of Height is 0 against the general alternative and report the approximate p-value which results from the F approximation (as opposed to the chi-square approximation).
(b): Test the null hypothesis that the coefficient of Weight is 0 against the general alternative and report the approximate p-value which results from the F approximation (as opposed to the chi-square approximation).

Exercise 26 (8 points (1 point for each part))

Consider the data of Table 10.10 on p. 473 of H&W. For parts (a) through (f), test the null hypothesis that the proportion of women who have PTSD is the same for the two populations of women against the alternative that the population proportions differ, and place your answers, rounding each p-value to 2 significant digits, in the appropriate boxes on the answer sheet that I'll give you. For parts (g) and (h), give 95% confidence intervals for the difference in the population proportions, and place your answers, rounding each confidence bound to 2 significant digits, in the appropriate boxes on the answer sheet that I'll give you.

(a): Use Fisher's exact test.
(b): Use the exact generalized likelihood ratio test.
(c): Use the approximate generalized likelihood ratio test (usual chi-square approximation).
(d): Use the exact version of Pearson's test.
(e): Use the approximate version of Pearson's test (usual chi-square approximation). (Don't use the Yates continuity correction.)
(f): Use Barnard's test (exact version). (Note that this test supplies the smallest of the exact p-values.)
(g): Use the standard approximate method described in Sec. 10.1 of H&W.
(h): Use the second exact method offered by StatXact.
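
As a cross-check for part (a), a two-sided Fisher's exact test for a 2 x 2 table can be computed from the hypergeometric distribution. The sketch below assumes the common definition of the two-sided p-value (sum the probabilities of all tables, with the observed margins, that are no more likely than the observed table); whether StatXact uses exactly this definition should be checked. The counts in the usage line are made up, not the Table 10.10 counts.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))

# Hypothetical counts.
print(fisher_exact_two_sided(3, 1, 1, 3))
```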

Exercise 27 (2 points)

Do Problem 6 on p. 472 of H&W, and place the exact p-value, rounded to two significant digits, which results from a two-tailed test in the appropriate box on the answer sheet.

Here is some information about the homework component of your grade for the course, my late homework policy, and the presentation of HW solutions.