Homework



Due Tuesday, September 14
Exercise 1 (4 points)
Consider independent random variables U and V, where U has a uniform (0, 1) distribution and V has a uniform (0, 2) distribution. Letting W be the larger of U and V, obtain the pdf of W, showing some work to adequately justify your answer. (Note: W can be thought of as the sample maximum of these two independent random variables which don't have the same distribution.) (Hint: Consider the bottom half of p. 2-3 of the class notes and determine what alterations need to be made due to the fact that the random variables are not identically distributed.)
Exercise 2 (4 points)
Consider Problem 2.3 on p. 66 of G&C. Without using a lot of mathematical expressions, one can argue that the equality holds in the following manner. Consider a random sample of n iid uniform (0, 1) random variables. The left side can be viewed as the probability of event A and the right side can be viewed as the probability of event B. The two probabilities must be equal since these two events are equivalent; i.e., event A occurs if and only if event B occurs. Your task is to define event A and event B in such a way so that the equality is established using a simple word argument as indicated above. (Hint: Define both events using the n uniform random variables (or some function of these random variables).)
Exercise 3 (2 points)
Letting X be a Bernoulli (0.5) random variable, sketch the quantile function for X, plotting p on the horizontal axis and the quantile function, Q(p), on the vertical axis. (Use the text's stipulation that Q(p) should only be defined for p in the open interval (0, 1).)
Exercise 4 (1 extra credit point (Reminder: Extra credit problems are to be solved entirely on your own. You should not discuss them with anyone else.))
Consider the sample median from a random sample of n uniform (0, 1) random variables, for an odd sample size. (So the sample median is just a single order statistic.) For large values of n, the value of the sample median's pdf at 0.5 (the midpoint of the interval (0, 1)) is asymptotically equivalent to
c n^{1/2},
where c is a constant. Determine the value of c, showing a little work. (Hint: Use Stirling's formula.) (Note: It's interesting to note that the value of the sample midrange's pdf at 0.5 is n, indicating that the probability mass of the sample midrange is more concentrated around 0.5 than is the probability mass of the sample median.) Feel free to use any expression/result given in G&C or the notes I provided in class as a starting point (i.e., you don't have to derive a result that I obtained in the notes I gave you or one that is given in G&C). (Note: Consider this to be a general rule for all homework exercises this semester.)
Exercise 5 (2 extra credit points (Reminder: Extra credit problems are to be solved entirely on your own. You should not discuss them with anyone else.))
Consider a random sample of 3 iid exponential random variables, each having mean 1. Letting T be the sum of the sample minimum and the sample maximum, obtain the pdf of T, showing enough work to adequately justify your answer. (Hint: You can proceed similarly to how I did on the top 3/4 of p. 2-8 of the class notes, making appropriate changes to account for the fact that we have 3 exponential random variables instead of n uniform random variables.) (Note: As a check, one can note that the sample minimum has an expected value of 1/3, and the sample maximum has an expected value of 1/3 + 1/2 + 1, which means that the expectation of T should be 1/3 + 1/3 + 1/2 + 1 = 13/6.)
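The expectation check mentioned in the note above can also be done numerically. The following is only a rough Monte Carlo sketch using Python's standard library; the function name, the number of trials, and the seed are illustrative choices and are not part of the assignment.

```python
import random

def simulate_mean_min_plus_max(trials=200_000, seed=12345):
    """Estimate E[T], where T = min + max of 3 iid exponential(mean 1) draws."""
    rng = random.Random(seed)  # fixed seed (arbitrary choice) for reproducibility
    total = 0.0
    for _ in range(trials):
        sample = [rng.expovariate(1.0) for _ in range(3)]
        total += min(sample) + max(sample)
    return total / trials

estimate = simulate_mean_min_plus_max()
print(f"Monte Carlo estimate of E[T]: {estimate:.3f} (target 13/6 = {13 / 6:.3f})")
```

An estimate close to 13/6 only supports the expectation; it does not verify the pdf itself.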
Due Tuesday, September 21
Exercise 6 (3 points)
For a uniform (0, 1) variable U, give a function of U, V = g(U), that has a logistic distribution having cdf 1/( 1 + exp( -v ) ).
Exercise 7 (2 points)
For the case of n being even, and iid uniform (0, 1) random variables, subtract the variance of the sample median based on a sample of size n from the variance of the sample median based on a sample of size n + 1. (You can make use of the answer to Problem 2.13, and if you do that, this problem is pretty easy.) Simplify the difference of the variances to clearly show that the difference is positive (which suggests that if you are going to use the sample median to estimate the median of a uniform distribution and you have an odd sample size, you should randomly delete one of the observations and compute the sample median from the reduced sample).
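As a sanity check on the sign of the difference (not a substitute for the algebra), one could compare simulated variances for a single even n. This is a hedged sketch: the function name, the choice n = 4, the number of trials, and the seed are all assumptions made for illustration.

```python
import random
import statistics

def median_variance(n, trials=100_000, seed=7):
    """Monte Carlo variance of the sample median of n iid uniform(0, 1) draws."""
    rng = random.Random(seed)  # arbitrary fixed seed
    medians = [
        statistics.median(rng.random() for _ in range(n))
        for _ in range(trials)
    ]
    return statistics.variance(medians)

# Illustrative comparison for n = 4 (even) versus n + 1 = 5 (odd).
var_even = median_variance(4)
var_odd = median_variance(5)
print(f"n = 4: {var_even:.5f}   n = 5: {var_odd:.5f}   difference: {var_odd - var_even:.5f}")
```

The simulated difference is subject to Monte Carlo error, so it can suggest the sign but not the simplified expression the exercise asks for.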
Exercise 8 (10 points (2 pts for each part))
Consider the 3rd order statistic based on 20 iid exponential random variables having mean 1.
  • (a) Give the exact expected value, rounded to the nearest ten thousandth (which will be four significant digits). You can use the result stated in part (c) of Problem 2.21. (I used this in class on the 7th to obtain the expected values given about 1/3 the way down on p. 2-15 of the class notes.) Alternatively, you can use the result given on the first 4 lines of p. 2-15. (You can also check your answer by obtaining a Monte Carlo estimate.)
  • (b) Give the first approximation of the expected value, rounded to the nearest ten thousandth (which will be four significant digits).
  • (c) Give the second approximation of the expected value, rounded to the nearest ten thousandth (which will be four significant digits). (Note: Since the derivative of the pdf is just the pdf multiplied by a constant, the expression for the second derivative of g given near the middle of p. 2-26 of the class notes can be simplified in a manner similar to what I did for the normal distribution case near the middle of p. 2-27. Of course, one can skip the simplification and just execute the method given on the top portion of p. 2-27, since it will always work.)
  • (d) Give the exact variance, rounded to the nearest ten thousandth (which will be two significant digits). (Note: A result similar to the one stated in part (c) of Problem 2.21 can be obtained for the variance, and you can use such a result if you feel confident that you understand how to do it.) Alternatively, you can use the bottom three lines of p. 2-14 of the notes to obtain the 2nd moment and the top 4 lines of p. 2-15 to obtain the mean, and combine these to get the desired variance.
  • (e) Give the first approximation of the variance, rounded to the nearest ten thousandth (which will be two significant digits). (Note: If, just for fun, you obtain the second approximation of the variance, it will round to the same value.)
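The Monte Carlo check suggested in part (a) might look something like the sketch below. It uses only Python's standard library; the function name, the number of trials, and the seed are assumptions made for illustration, not anything prescribed in the notes.

```python
import random
import statistics

def simulate_order_statistic(n, k, trials=100_000, seed=2024):
    """Monte Carlo mean and variance of the k-th smallest of n iid exponential(mean 1) draws."""
    rng = random.Random(seed)  # arbitrary fixed seed
    values = []
    for _ in range(trials):
        sample = sorted(rng.expovariate(1.0) for _ in range(n))
        values.append(sample[k - 1])  # k is 1-based
    return statistics.fmean(values), statistics.variance(values)

mean_est, var_est = simulate_order_statistic(n=20, k=3)
print(f"mean ~ {mean_est:.4f}, variance ~ {var_est:.4f}")
```

With this many trials the estimates should agree with the exact values to roughly three decimal places, so they are useful for catching gross errors but cannot confirm the fourth decimal place.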
Due Tuesday, September 28
Exercise 9 (7 points (2 pts for (a), 4 pts for (b), and 1 pt for (c)))
Using this data, do parts (a) through (c) below.
  • (a) Give a tolerance interval corresponding to 75% of the probability mass having a tolerance coefficient of about 0.95. (Each of the endpoints of the interval you give should be one of the data values.)
  • (b) Give the exact tolerance coefficient, rounded to the nearest ten thousandth, for the tolerance interval used in part (a). (Note: Because the sample size is rather small, the answer should be obtained using an exact binomial or beta distribution computation instead of a normal approximation.)
  • (c) Letting s(x) denote the edf obtained from the sample, give the value of s(100).
(Note: No justification needs to be given for your answer to part (c). A correct answer for part (b) will serve as adequate justification for a correct answer to part (a), and since one can use software to obtain the answer for part (b), you don't have to show any work for (b) (but you can if you want to). Since you don't have to show any work to justify any of your answers for this exercise, you shouldn't just copy answers from another student, nor should you give your answers to another student, although it is okay to discuss the problems (in a general nature) with other students if you want to.)
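For readers who want to do the exact binomial computation mentioned in part (b) without special software, here is a minimal sketch. It assumes a continuous underlying distribution and an interval whose endpoints are the r-th and s-th order statistics; the function name and the values of n, r, s, and p in the usage line are hypothetical and are not taken from the data set.

```python
from math import comb

def tolerance_coefficient(n, r, s, p):
    """P(coverage of (X_(r), X_(s)) >= p) for a continuous distribution.

    The coverage has a beta(s - r, n - s + r + 1) distribution, and that beta
    tail probability equals the binomial sum P(Bin(n, p) <= s - r - 1).
    """
    k = s - r
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k))

# Hypothetical illustration only: n = 30 observations, interval (X_(1), X_(30)).
print(round(tolerance_coefficient(n=30, r=1, s=30, p=0.75), 4))
```

Trying several (r, s) pairs with the actual sample size is one way to find an interval whose coefficient is close to the target.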
Exercise 10 (1 extra credit point (Reminder: Extra credit problems are to be solved entirely on your own. You should not discuss them with anyone else.))
Use the sample of 15 observations given for Problem 4.18 on p. 151 of G&C to obtain a tolerance interval having the form
(0, x_{(s)}),
corresponding to 60% of the probability mass and having a tolerance coefficient of about 0.91. (Note: The left endpoint is set to be 0, and all of the probability mass of the nonnegative distribution underlying the failure times should be assumed to be greater than 0. If you understand the derivation of the usual tolerance interval, having two random endpoints, given in the course notes, then you ought to be able to modify that derivation to address the case of the left endpoint being set to 0 and only the right endpoint being random.)
Exercise 11 (8 points (4 pts for each part))
Consider the null hypothesis setting pertaining to Sec. 3.2 of G&C, and suppose there are 5 type 1 objects and 5 type 2 objects.
  • (a) What is the probability that there will be only one run of type 1 objects?
  • (b) What is the probability that the first run will be a run of exactly 3 type 1 objects?
(Note: While you may be able to obtain the desired answers by plugging into some of the formulas given in G&C, it might be best to study how I used simple STAT 544 probability results to obtain similar results in the class notes. (The purpose of this exercise is to give you a better appreciation of how the formulas given in the text are obtained.))
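One way to check a hand-derived answer for a question like part (a) is brute-force enumeration. The sketch below assumes that, under the null hypothesis, every distinguishable arrangement of the two types of objects is equally likely; the function name is hypothetical, and the demonstration deliberately uses a smaller case (2 objects of each type) rather than the 5-and-5 case in the exercise.

```python
from fractions import Fraction
from itertools import combinations, groupby

def prob_single_run_of_type1(n1, n2):
    """P(all type 1 objects form one run) when every arrangement is equally likely."""
    total = n1 + n2
    favorable = 0
    count = 0
    for positions in combinations(range(total), n1):
        chosen = set(positions)  # positions occupied by type 1 objects
        seq = [1 if i in chosen else 2 for i in range(total)]
        runs_of_type1 = sum(1 for value, _ in groupby(seq) if value == 1)
        favorable += (runs_of_type1 == 1)
        count += 1
    return Fraction(favorable, count)

print(prob_single_run_of_type1(2, 2))  # 1/2 (arrangements 1122, 2112, 2211 out of 6)
```

Enumeration like this confirms a number but does not show where the textbook formulas come from, which is the stated purpose of the exercise.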
Due Tuesday, October 5
Exercise 12 (10 points (2.5 pts for each part))
This link is to a data file showing the time-ordered intervals between failures of the air conditioning equipment of a specific jet aircraft. Use this data to do one-tailed runs tests of the null hypothesis that the sample resulted from iid random variables against the alternative that there is a greater tendency for alternation between large and small values. (Note: For all of the various runs tests covered, a null hypothesis of iid random variables corresponds to the "pure randomness" setting addressed in the course notes.) In each case report an exact p-value, rounded to two significant digits. (E.g., if an exact p-value is 1/63, it should be reported as 0.016, and not 0.02. Also, you should never report a p-value as 0, 0.0, or 0.00 unless the observed outcome is one which is impossible to have occurred if the null hypothesis is true.)
  • (a) Dichotomize the data using the sample median, and use the test based on the total number of runs above and below the sample median.
  • (b) Dichotomize the data using the sample median, and use the test based on the length of the longest run above or below the sample median.
  • (c) Use the test based on the total number of runs up or down.
  • (d) Use the test based on the length of the longest run up or down.
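The two-significant-digit reporting rule described in Exercise 12 can be illustrated with a small helper. This is just a sketch with a hypothetical function name; it is intended for p-values between 0 and 1 and keeps trailing zeros so that exactly two significant digits are shown.

```python
def format_sig(x, digits=2):
    """Format x (a p-value in (0, 1]) with exactly `digits` significant digits."""
    # Scientific notation rounds first, so the exponent reflects the rounded value.
    exponent = int(f"{x:.{digits - 1}e}".split("e")[1])
    decimals = max(digits - 1 - exponent, 0)
    return f"{x:.{decimals}f}"

print(format_sig(1 / 63))    # 0.016
print(format_sig(0.000023))  # 0.000023
```

Note that a very small but nonzero p-value is displayed with as many decimal places as needed, rather than being printed as 0.00.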
Exercise 13 (10 points (2.5 pts for each part))
This link is to a data file showing time-ordered annual snowfalls (in inches) for 60 consecutive years. Use this data to do one-tailed runs tests of the null hypothesis that the sample resulted from iid random variables against the alternative of a greater tendency for trending. In each case report an exact or approximate p-value, as indicated, rounded to two significant digits. (As always, you should never report a p-value as 0, 0.0, or 0.00 unless the observed outcome is one which is impossible to have occurred if the null hypothesis is true.)
  • (a) Dichotomize the data using the sample median, and use the test based on the total number of runs above and below the sample median to obtain an exact p-value.
  • (b) Repeat part (a), but use a normal approximation with a continuity correction. (Note: For the Exercise 12 data, StatXact's asymptotic p-value is based on a normal approximation incorporating a continuity correction, but for this data you had better check to determine if StatXact uses a continuity correction.)
  • (c) Use the test based on the total number of runs up or down to obtain an approximate p-value.
  • (d) Use the test based on the length of the longest run up or down to obtain an exact p-value.
Exercise 14 (2 extra credit points (Reminder: Extra credit problems are to be solved entirely on your own. You should not discuss them with anyone else.))
For general n, show that the null expectation of R is the result used in the formulas on p. 3-27 of the notes.
Due Tuesday, October 19
Exercise 15 (6 points (3 pts for each part))
For each part, report the approximate p-value which results from the rank von Neumann test (using the beta approximation instead of the normal approximation). (Note: While it's okay (but seldom necessary) to give an exact p-value using more than 2 significant digits, when giving an approximate p-value, or an estimate of an exact p-value, use exactly 2 significant digits (no more, and no less). Consider this to be a general rule for all homework exercises this semester. (Unless I specify otherwise in a specific exercise (like Exercise 16 below), I'll deduct points if you overstate the accuracy of an approximate p-value by reporting more than 2 significant digits. (Note that 0.23, 0.0023, and 0.000023 are examples of numbers with 2 significant digits.)))
  • Use the airplane air conditioning data (used in Exercise 12) and do a one-tailed test against the alternative corresponding to an increased tendency for alternation between small and large values.
  • Use the Buffalo snowfall data (used in Exercise 13) and do a one-tailed test against the alternative of having a greater tendency for trending/clustering.
Exercise 16 (4 points (1 pt for each part))
Use the data given in Problem 4.1 on pp. 148-149 of G&C to test the null hypothesis indicated in the problem against the general alternative. (Note: This is very similar to the Example considered on p. 4-13 of the class notes.) StatXact should be used for parts (a) through (c). Before entering the data, use Nonparametrics > Settings for Nonparametric Procedures ... to change the number of Monte Carlo trials to 1000000 (one million) and specify that the fixed random number seed of 23456 (the default fixed seed) be used. Be sure to check the box to Save Monte Carlo Parameters Permanently before clicking OK. (See the boxed information near the bottom of this web page for detailed instructions.) Then, to be safe, before entering the data, shut down StatXact and then start it up again. (I've found that changing the settings doesn't always take effect immediately, but if you shut down and start back up again the changes will be in effect. I've also found that they aren't necessarily saved permanently.) When you use the Monte Carlo option you can see from the output what the seed was and how many trials were done. So make sure it does exactly what we want it to do for the homework. (Note: There is no need to show any work. Work carefully, and I'll just grade the final answers. As always, you shouldn't use an answer obtained from someone else. You should do the computer work on your own (although it is okay to talk with other students about using StatXact).)
  • (a) Give the exact p-value resulting from Pearson's chi-square goodness-of-fit test (rounded to the nearest thousandth). (Note: Even though n is 1200, I'm getting an exact p-value with StatXact. (Strangely, I'm also now getting exact p-values for the examples on pages 4-8 and 4-13 in the class notes, although previously the software indicated it couldn't produce exact p-values for these data sets.))
  • (b) Give the approximate p-value resulting from Pearson's chi-square goodness-of-fit test (rounded to the nearest thousandth), using the usual chi-square approximation. (Notes: (1) This value is also obtained when you use StatXact to get the exact p-value requested in part (a). (2) Rounding to the nearest thousandth will yield 3 significant digits for the approximate p-value, but that's what I want for this part so that I can make sure you did things correctly.)
  • (c) Just to try it (we don't need to do it this way for this data), use StatXact's Monte Carlo option to get an estimate of the exact p-value resulting from Pearson's chi-square goodness-of-fit test (rounded to the nearest thousandth). Be sure to use 1000000 Monte Carlo trials and a fixed random number seed of 23456.
  • (d) Give the approximate p-value resulting from the likelihood ratio test (rounded to the nearest thousandth), using the usual chi-square approximation. (Comment: With this data, the likelihood ratio test statistic value and the resulting p-value aren't real close to the corresponding values from Pearson's test, but they're certainly "in the same ballpark.")
Due Tuesday, October 26
Exercise 17 (4 points)
Use this data to test the null hypothesis that the data are due to iid Poisson random variables against the general alternative. Use 6 categories (with only 1 of them having an expected count less than 5). Use the (usual) approximate version of Pearson's chi-square g-o-f test, and give the value of the test statistic and the approximate p-value (both rounded to the nearest thousandth). (To clarify, just do the test as described in the class notes for the case of a Poisson model without a specified mean, using the ordinary sample mean to estimate the mean, and adjusting the df accordingly. As a way to partially check your work for this problem, I'll give you that the asymptotic likelihood ratio test results in an approximate p-value of about 0.000 due to a test statistic value of about 123.611. (In practice I'd write p-value < 0.0005, but rounded to the nearest thousandth, it's 0.000.) The Pearson statistic isn't so close really, but the conclusion from the test is in agreement.) (Note: For all of the solutions due this week, it's okay to just give the answers. For all but this problem, I'm expecting you to use StatXact.)
Exercise 18 (4 points)
Consider the data (from a uniform distribution) given on p. 118 of G&C. (Don't take the square root of the values as is done in the example in the book.) Use the K-S test to test the null hypothesis that the underlying distribution is a beta distribution having mean 0.75 and variance 0.0375 and report the exact p-value (rounded to two significant digits (make sure you know what this means)) obtained when testing against the general alternative. (Note: The null hypothesis distribution has a rather simple pdf and cdf. Before doing the requested test, make sure that you've obtained the correct cdf. You can check your work by using the cdf to obtain the pdf and then using the pdf to obtain the mean and variance. If you employ the method described on the top of p. 4-29 of the class notes, you can carefully enter the data values from p. 118 of G&C into the first column of StatXact's CaseData editor (making sure that you don't make any data entry mistakes), and then use DataEditor > Transform Variable ... to transform the values in the first column and put the transformed values in the second column of the CaseData Editor. Type Var2 into the Target Variable box and a simple formula involving Var1 into the big box to the right of the = sign. For instance, if you wanted to square Var1 (which you don't for this problem), you could type Var1^2 into the big box. Finally, click OK to create the desired transform variable on which to do the K-S test.)
Exercise 19 (8 points (4 pts for each part))
Click here to see the number of male pigs in each of 221 litters of 6 Duroc-Jersey pigs.
  • (a) Use the one-sample K-S test to test the null hypothesis that the underlying distribution of male pigs in a litter of size 6 is a binomial (6, 0.5) distribution. Test against the general alternative and report a Monte Carlo estimate (based on 1000000 trials with a seed of 23456) of the exact p-value, rounded to the nearest hundredth. (Note: The data web page has some information about how to use StatXact to do the desired test.)
  • (b) Now use Pearson's Chi-square test to test the same hypotheses as in part (a), using StatXact to obtain an exact p-value, and round it to the nearest hundredth. (Again, the data web page has some information about how to use StatXact to do the desired test.)
Exercise 20 (4 points (2 pts for each part))
Using this data, for each part below, test the null hypothesis that the data are due to iid normal random variables against the general alternative and report the p-value (rounded to the nearest thousandth) which results from the indicated test. (Note: StatXact will give an asymptotic p-value for the Shapiro-Wilk test, and will give a Monte Carlo estimate of the exact p-value for Lilliefors's test. (You should use 1000000 Monte Carlo trials with a seed of 23456.) Normally I would only report at most two significant digits for the approximate p-values from these tests, but in this case both p-values round to 1.00, and to get values less than 1, one needs to round to the nearest thousandth. To see why the p-values are so high, one can look at a Q-Q plot. (In StatXact, use Plots > Q-Q Normal ..., click the PTT variable into the Variable To Plot box, and click OK. You should see a close to straight line pattern of points. (I don't know why the points are generally above the reference line instead of the line running more closely through the plotted points, but of course the main thing to look for is a close-to-straight-line pattern for the points.)))
  • (a) Lilliefors's test.
  • (b) The Shapiro-Wilk test.
Exercise 21 (1 extra credit point (Reminder: Extra credit problems are to be solved entirely on your own. You should not discuss them with anyone else.))
Do Problem 4.16 on p. 151 of G&C, giving the smallest value of n that will work.
Due Tuesday, November 2
Exercise 22 (10 points (5 pts for each part))
Consider this data.
  • (a) Give an exact confidence interval for the 50th percentile (the median) of the underlying distribution having a confidence coefficient of about 0.95. If there is more than one good choice, use an interval estimator that misses from below with about the same probability that it misses from above.
  • (b) Do a test to determine if there is strong evidence that the 90th percentile is greater than 100 and report the resulting p-value.
Exercise 23 (0 points --- not to be turned in)
Suppose that observations are to arise from iid N(μ, 1) random variables, and consider testing
H0: μ >= 0 vs. H1: μ < 0
using a sign test based on 100 observations and having {0, 1, 2, ..., 42, 43} as the rejection region. Denoting the size of this sign test by α, what sample size, n, will result in a size α z test having a power against the alternative μ = -0.1 as close as possible to the power of the specified sign test?
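A first step in this exercise is to obtain the size α of the sign test. The sketch below assumes that the test statistic is the number of positive observations, so that on the boundary of the null hypothesis (μ = 0) it has a binomial (100, 0.5) distribution and the test rejects for small values. The function name is hypothetical.

```python
from math import comb

def sign_test_size(n=100, cutoff=43):
    """P(S <= cutoff) when S ~ binomial(n, 1/2)."""
    return sum(comb(n, k) for k in range(cutoff + 1)) / 2**n

alpha = sign_test_size()
print(f"size of the sign test: {alpha:.4f}")
```

The remaining steps (the power of the sign test at μ = -0.1 and the matching sample size for the z test) are left as in the exercise.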
Due Tuesday, November 9
Exercise 24 (11 points (1 pt for each part))
Using the data given and described here, test the null hypothesis of no difference in treatments against the general alternative and report the exact or approximate p-value, as requested, using the test indicated for each part below. (Note: Since there is also a permutation test and a normal scores test for use with two independent samples, I indicate that you should do appropriate versions of these tests, meaning that you should use tests that are appropriate for use with matched pairs data.) Round each p-value to the nearest thousandth. (One shouldn't ever use more than two significant digits for an approximate p-value, and for small approximate p-values, using just one significant digit is often sensible. Even for exact p-values, two significant digits should typically be adequate.) You can do all but parts (c) and (f) using StatXact.
  • (a) Give an approximate p-value resulting from Student's t test.
  • (b) Give an exact p-value resulting from the sign test.
  • (c) Give an approximate p-value resulting from the sign test, using a normal approximation with a continuity correction.
  • (d) Give an approximate p-value resulting from the sign test, using a normal approximation without a continuity correction.
  • (e) Give an exact p-value resulting from Wilcoxon's signed-rank test.
  • (f) Give an approximate p-value resulting from Wilcoxon's signed-rank test, using a normal approximation with a continuity correction.
  • (g) Give an approximate p-value resulting from Wilcoxon's signed-rank test, using a normal approximation without a continuity correction.
  • (h) Give an exact p-value resulting from an appropriate normal scores test.
  • (i) Give an approximate p-value resulting from an appropriate normal scores test, using a normal approximation.
  • (j) Give an exact p-value resulting from an appropriate permutation test.
  • (k) Give an approximate p-value resulting from an appropriate permutation test, using a normal approximation.
(It can be noted that several of the exact p-values are just as small as the approximate p-value from the t test. It can also be noted that with such a small sample size, some of the approximate p-values are appreciably larger than the corresponding exact p-values.)
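Since part (c) has to be done outside of StatXact, a small sketch of the sign test computations may help. It assumes a two-sided p-value obtained by doubling the smaller tail probability (one common convention, which may differ from the one used in the notes), with n the number of nonzero differences and k the number of positive ones. The function names and the values in the usage line are hypothetical and are not the homework data.

```python
from math import comb, erf, sqrt

def normal_cdf(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def sign_test_p_values(n, k):
    """Two-sided sign test p-values: exact, normal with and without continuity correction."""
    m = min(k, n - k)                      # smaller of the two counts
    exact = min(1.0, 2 * sum(comb(n, j) for j in range(m + 1)) / 2**n)
    sd = sqrt(n) / 2                       # null standard deviation of the count
    with_cc = min(1.0, 2 * normal_cdf((m + 0.5 - n / 2) / sd))
    without_cc = min(1.0, 2 * normal_cdf((m - n / 2) / sd))
    return exact, with_cc, without_cc

# Hypothetical illustration: 10 nonzero differences, 1 of them positive.
print(sign_test_p_values(10, 1))
```

Comparing the three values for a small n shows how much the continuity correction matters when the sample size is small.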
Exercise 25 (4 points)
Determine the A.R.E. of the signed-rank test w.r.t. the t test when the underlying distribution is a member of the location family of triangular distributions having pdfs which are translations of
f_0(x) = (1 + x) I_{(-1, 0)}(x) + (1 - x) I_{[0, 1)}(x).
(That is, the pdf is 1+x for -1 < x < 0, and the pdf is 1-x for 0 <= x < 1.) You should show some work for this one, but you can just give answers for the preceding exercise.
Due Tuesday, November 16
Exercise 26 (15 points (1 pt for each part, except part (i) is worth 4 points))
Using the data given and described here, test the null hypothesis of
μi <= μc
against the alternative that
μi > μc,
where μi is the mean level for women with breast implants and μc is the mean level for women without breast implants. (You are to assume that one distribution is stochastically larger than the other if they differ.) Report the exact or approximate p-value, as requested, using the test indicated for each part below. Round each p-value to two significant digits. Show your work for part (i), but you can just give answers for the other parts if you wish.
  • (a) Give an approximate p-value resulting from Student's two-sample t test.
  • (b) Give an approximate p-value resulting from Welch's test.
  • (c) Give an exact p-value resulting from the Mann-Whitney test.
  • (d) Give an approximate p-value resulting from the Mann-Whitney test, using a normal approximation with a continuity correction.
  • (e) Give an approximate p-value resulting from the Mann-Whitney test, using a normal approximation without a continuity correction.
  • (f) Give an exact p-value resulting from the two-sample median test (aka Fisher's exact test).
  • (g) Give an approximate p-value resulting from the two-sample median test, incorporating Yates's continuity correction.
  • (h) Give an approximate p-value resulting from the two-sample median test, not incorporating Yates's continuity correction.
  • (i) Give an exact p-value resulting from the control median test, letting the test statistic be the number of observations from the implant group less than the 3rd smallest value from the control group.
  • (j) Give an exact p-value resulting from the Wald-Wolfowitz runs test. (Note: The p-value from a test against the general alternative should be used here. That's the best we can do with this test.)
  • (k) Give an approximate p-value resulting from the Wald-Wolfowitz runs test, using a normal approximation with a continuity correction.
  • (l) Give an exact p-value resulting from an appropriate Kolmogorov-Smirnov test.
Exercise 27 (5 points (1 pt for each part))
Using the data given and described here, test the null hypothesis of identical distributions against the general alternative, reporting the exact or approximate p-value, as requested, using the test indicated for each part below. Round each of the four exact p-values to three significant digits (to make it easier for me to determine if you did everything correctly), but round the approximate p-value for part (a) to only two significant digits. (Since StatXact can be used for all five parts, you can just give answers without showing any work.)
  • (a) Give an approximate p-value resulting from Student's two-sample t test.
  • (b) Give an exact p-value resulting from the Mann-Whitney test, using StatXact to handle the two tied values in its exact way (using midranks).
  • (c) Give an exact p-value resulting from the two-sample median test (aka Fisher's exact test).
  • (d) Give an exact p-value resulting from the Wald-Wolfowitz runs test, after breaking the tie in a conservative manner. (This can be done using StatXact with the data as is. The output will have the desired p-value (as well as one based on an anticonservative tie-breaking strategy).)
  • (e) Give an exact p-value resulting from an appropriate Kolmogorov-Smirnov test.
Due Tuesday, November 30
Exercise 28 (5 points (1 pt for each part))
Using the data given and described here, test the null hypothesis of
μi <= μc
against the alternative that
μi > μc,
where μi is the mean level for women with breast implants and μc is the mean level for women without breast implants. (You are to assume that one distribution is stochastically larger than the other if they differ.) Report the exact p-value, using the test indicated for each part below. Round each p-value to two significant digits.
  • (a) normal scores test, using van der Waerden scores
  • (b) Savage scores test
  • (c) permutation test
  • (d) percentile modified rank test, using 6 nonzero scores at each end
  • (e) Sutton scores test, using N/4 nonzero scores at each end
(Since StatXact can be used for all five parts, you can just give answers without showing any work.) (Next time I might add some parts comparing normal approximation and t approximation p-values for the normal scores test to the exact p-value from the test.)
Exercise 29 (5 points (1 pt for each part))
Using the data given and described here, test the null hypothesis of identical distributions against the general alternative, reporting the exact p-value, using the test indicated for each part below. Round each p-value to two significant digits. (Since StatXact can be used for all five parts, you can just give answers without showing any work.)
  • (a) normal scores test, using van der Waerden scores
  • (b) Savage scores test
  • (c) permutation test
  • (d) percentile modified rank test, using 8 nonzero scores at each end (use midranks to handle tied values)
  • (e) Sutton scores test, using N/4 nonzero scores at each end (use midranks to handle tied values)
Exercise 30 (5 points)
Similar to what was done on the top portion of p. 8-13 of the class notes, give the asymptotic correlation of T (as defined on the bottom of p. 8-12 of the class notes) and the Wilcoxon rank sum statistic (rounding the answer to three significant digits (after you take the limit as N tends to infinity)). (Show your work for this one.)
Exercise 31 (10 points (1 pt for each part, except that parts (g) and (h) are worth 2 pts apiece))
Using the data given and described here, test the null hypothesis of identical distributions against the general alternative, reporting the exact p-value, using the test indicated for each part below. Round each p-value to two significant digits. (Since StatXact can be used for all eight parts, you can just give answers without showing any work.)
  • (a) Mood test (for dispersion differences)
  • (b) Ansari-Bradley test
  • (c) Siegel-Tukey test
  • (d) Klotz test
  • (e) Conover test
  • (f) percentile modified rank test (for dispersion differences), using 3 nonzero scores at each end
  • (g) Westenberg's test, letting the 12 middlemost values be the inner portion of the ordered combined sample, and letting the outermost portion be the four smallest and the four largest values in the ordered combined sample
  • (h) ranklike test, pairing the 1st and 6th, 2nd and 7th, 3rd and 8th, 4th and 9th, and 5th and 10th observations in each of the two samples, and applying the Wilcoxon rank sum test to the ten (five from each sample) absolute differences
Exercise 32 (2 extra credit points (Reminder: Extra credit problems are to be solved entirely on your own. You should not discuss them with anyone else.))
Without just plugging into (9.9.2) on p. 329 of G&C, but instead using a very simple argument involving probability and counting, give the null probability that Rosenbaum's test statistic (see p. 329 of G&C) will equal m. (Give some sort of explanation for your answer for this one.)
Due Tuesday, December 7
Exercise 33 (10 points (1 pt each for parts (a) through (d) and 1.5 pts each for parts (e) through (h)))
Using the data given and described here, test the null hypothesis of identical weight change distributions against the general alternative. Round each p-value to the nearest thousandth. (Some of the p-values are approximate, and indicating too much accuracy for them isn't good.)
  • (a) one-way ANOVA F test (report approximate p-value)
  • (b) k-sample median test (report exact p-value)
  • (c) Kruskal-Wallis test (report Monte Carlo approximation of exact p-value, using 1,000,000 trials and seed 23456)
  • (d) k-sample normal scores test (report Monte Carlo approximation of exact p-value, using 1,000,000 trials and seed 23456)
  • (e) percentile modified rank test (for location differences), using 30 nonzero scores at each end (report Monte Carlo approximation of exact p-value, using 1,000,000 trials and seed 23456)
  • (f) rank analog of the Tukey-Kramer test (give value of test statistic and bracket p-value using the critical values given here)
  • (g) Steel-Dwass test (give value of test statistic and bracket p-value using the critical values given here)
  • (h) modification of the k-sample median test, using the lower third, middle third, and upper third instead of lower and upper halves (report exact p-value based on Pearson's chi-square statistic)
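The Monte Carlo approximations requested in parts (c) through (e) follow a common pattern, sketched below in Python for the Kruskal-Wallis statistic. The function names are hypothetical, and a seed of 23456 in Python's random module will not reproduce StatXact's random number stream, so the estimate will differ slightly from StatXact's.

```python
import random


def midranks(values):
    """Return midranks (average ranks for tied values) in the original order."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(ranks, sizes):
    """H in its general form, (N - 1) * between SS / total SS of the ranks.

    This form equals the usual tie-corrected statistic when midranks are used.
    It divides by zero if every observation is tied.
    """
    n = len(ranks)
    grand_mean = (n + 1) / 2
    total_ss = sum((r - grand_mean) ** 2 for r in ranks)
    between = 0.0
    start = 0
    for size in sizes:
        group = ranks[start:start + size]
        between += size * (sum(group) / size - grand_mean) ** 2
        start += size
    return (n - 1) * between / total_ss


def monte_carlo_kw_p(groups, trials=10_000, seed=23456):
    """Estimate the exact p-value by randomly reassigning ranks to groups."""
    combined = [v for g in groups for v in g]
    sizes = [len(g) for g in groups]
    ranks = midranks(combined)
    observed = kruskal_wallis_h(ranks, sizes)
    rng = random.Random(seed)
    shuffled = ranks[:]
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if kruskal_wallis_h(shuffled, sizes) >= observed - 1e-9:
            hits += 1
    return observed, hits / trials
```

Replacing the midranks with other scores (normal scores, or percentile modified scores) before shuffling gives the analogous approximation for the other k-sample tests. The default number of trials is kept small here. Pure Python would be slow at 1,000,000 trials.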
Exercise 34 (5 points (2 pts for part (a) and 3 pts for part (b)))
Using the data given and described here, test the null hypothesis of identical distributions against the alternative that increased caffeine tends to increase tapping rates. Round each p-value to two significant digits.
  • (a) Jonckheere-Terpstra test (report exact p-value)
  • (b) test described on bottom half of p. 10-19 and top half of p. 10-20 of the course notes (report exact p-value)
Exercise 35 (5 points (2 pts for part (a) and 3 pts for part (b)))
Using the data given and described here, test the null hypothesis that the medians of all three of the weight change distributions are less than or equal to 0 against the alternative that at least one of them is positive. Omit the one weight change of 0 and work with only 71 observations in all. (Doing it this way allows you to use software that does the sign test by ignoring values equal to the hypothesized median for part (a).) Round each p-value to the nearest thousandth.
  • (a) use the test statistic V described on p. 372 of G&C (and p. 10-21 of the class notes) (report exact p-value)
  • (b) use the test statistic V* described on p. 373 of G&C (and p. 10-22 of the class notes) (report approximate p-value)
Exercise 36 (9 points (3 pts for each part))
Using the data given and described here, test the null hypothesis that all of the weight change distributions are identical against the alternative that at least one of the two alternative treatment distributions is stochastically larger than the control distribution. (Note: The control sample is the middle sample of size 26.) Round each p-value to the nearest thousandth. (Some of the p-values are approximate, and indicating too much accuracy for them isn't good.)
  • (a) use the C-D test (report approximate p-value (use a continuity correction))
  • (b) use the Fligner-Wolfe test described on p. 376 of G&C (and p. 10-28 of the class notes) (report exact p-value)
  • (c) use the method described at the bottom of p. 10-28 of the class notes (use exact p-values from the M-W tests, but the final p-value will be conservative due to the use of Boole's inequality)
Exercise 37 (4 points (1 pt for each part))
Using the data given and described here, use one-tailed tests to test the null hypothesis of independence against the alternative of a positive association. Round each p-value to two significant digits. (If StatXact reports 0.0000 for any of the p-values, use Options > Global... to make it report more informative p-values.)
  • (a) report the exact p-value which results from using Kendall's statistic
  • (b) report the exact p-value which results from using Spearman's statistic
  • (c) report the exact p-value which results from using Pearson's statistic (based on StatXact's permutation scheme, which isn't covered in the book or the notes but is dealt with in my StatXact examples (e.g., see the bottom portion of this example))
  • (d) report the approximate p-value which results from using Pearson's statistic (based on making an assumption of bivariate normality ... the p-value will be exact if the assumption is true, but otherwise the p-value is only approximate)
Exercise 38 (2 points (1 pt for each part))
This link is to a data file showing the numbers of certain types of accidents on British Rail for each year from 1970 to 1983. (The accidents include collisions between trains and derailments. Accidents involving trains and motor vehicles and accidents involving trains and pedestrians are not included, perhaps because they are primarily due to errors by drivers and pedestrians and not easily prevented by the railway.) Use two-tailed tests to test the null hypothesis of no time dependence against the alternative of either an increasing or decreasing trend. Round each p-value to two significant digits. (If StatXact reports 0.0000 for any of the p-values, use Options > Global... to make it report more informative p-values.)
  • (a) report the exact p-value which results from using Mann's test (Kendall's statistic)
  • (b) report the exact p-value which results from using Daniels' test (Spearman's statistic)
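To see where an exact p-value for Mann's test comes from, here is a minimal Python sketch that builds the null distribution of the number of discordant pairs (inversions) by recursion. It assumes there are no tied values in the series, which may not hold for the accident counts, and the function names are hypothetical. With ties, the exact null distribution is different and this sketch does not apply.

```python
def inversion_counts(n):
    """counts[k] = number of permutations of n items with exactly k inversions."""
    counts = [1]
    for m in range(2, n + 1):
        # Inserting the m-th item can create 0, 1, ..., m - 1 new inversions.
        new = [0] * (len(counts) + m - 1)
        for k, c in enumerate(counts):
            for j in range(m):
                new[k + j] += c
        counts = new
    return counts


def mann_test_two_sided_p(series):
    """Exact two-tailed p-value for trend, assuming no ties in the series."""
    n = len(series)
    if len(set(series)) != n:
        raise ValueError("this sketch assumes no tied values")
    discordant = sum(1 for i in range(n) for j in range(i + 1, n)
                     if series[i] > series[j])
    counts = inversion_counts(n)
    half = n * (n - 1) / 4  # center of the symmetric null distribution
    distance = abs(discordant - half)
    extreme = sum(c for k, c in enumerate(counts) if abs(k - half) >= distance)
    return extreme / sum(counts)
```

For a made-up strictly increasing series of length 4, only the 2 most extreme of the 24 orderings qualify, giving a p-value of 1/12.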
Due Friday, December 10 (but with a grace period until 10:15 PM on Tuesday, Dec. 14)
Exercise 39 (5 extra credit points)
Using the data given and described here, test the null hypothesis of no treatment differences against the general alternative. Round each p-value to two significant digits. (Note: If you can't use StatXact to get exact p-values for the nonparametric tests, then use it to obtain Monte Carlo estimates of the exact p-values, using 1,000,000 Monte Carlo trials and a starting seed of 23456.)
  • (a) report the approximate p-value which results from an ANOVA F test (the p-value would be exact only if an additive model with iid normal error terms is correct)
  • (b) report the exact p-value which results from Friedman's test
  • (c) report the approximate p-value which results from using a chi-square approximation with Friedman's test
  • (d) report the exact p-value which results from the Friedman aligned rank test
  • (e) report the exact p-value which results from the Quade test
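For a very small design, the exact p-value for Friedman's test can be obtained by enumerating every within-block arrangement of ranks, as in the Python sketch below. It assumes no ties within a block and uses hypothetical function names. The enumeration has (k!)^b terms, so for realistic designs a Monte Carlo estimate or dedicated software is needed.

```python
from itertools import permutations, product


def within_block_ranks(row):
    """Rank the observations in one block from 1 to k (no ties assumed)."""
    order = sorted(range(len(row)), key=lambda j: row[j])
    ranks = [0] * len(row)
    for r, j in enumerate(order, start=1):
        ranks[j] = r
    return ranks


def friedman_statistic(rank_rows):
    """Friedman's statistic from a b-by-k table of within-block ranks."""
    b = len(rank_rows)
    k = len(rank_rows[0])
    col_sums = [sum(row[j] for row in rank_rows) for j in range(k)]
    return (12.0 / (b * k * (k + 1)) * sum(s * s for s in col_sums)
            - 3.0 * b * (k + 1))


def friedman_exact_p(blocks):
    """Return (observed statistic, exact p-value) by full enumeration."""
    rank_rows = [within_block_ranks(row) for row in blocks]
    observed = friedman_statistic(rank_rows)
    k = len(blocks[0])
    perms = list(permutations(range(1, k + 1)))
    count = total = 0
    # Under the null, every permutation of ranks within a block is equally likely.
    for arrangement in product(perms, repeat=len(blocks)):
        total += 1
        if friedman_statistic(arrangement) >= observed - 1e-9:
            count += 1
    return observed, count / total
```

With three made-up blocks that all order three treatments the same way, the statistic is 6 and only 6 of the 216 arrangements are as extreme.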


Click here for some information about the homework component of your grade for the course, my late homework policy, and the presentation of HW solutions.