solutions to HW, STAT 789

Solutions for some HW problems

(Note: Some HW solutions will be distributed in paper form.)

Below are solutions for

Problem 4,
Problems 6(b) and 7,
Problem 8,
Problem 11,
Problem 12,
Problem 13,
Problem 14,
Problem 15,
Problem 16,
Problem 17,
Problem 18,
Problem 19,
Problem 21,
Problem 22,
Problem 23,
Problem 24.

Problem 4

(a) The t interval with half-width t_{19, alpha/2} S/n^1/2 has coverage probability 1 - alpha, so the coverage probability of the interval with half-width z_0.025 S/n^1/2 can be found by determining the value of alpha that makes t_{19, alpha/2} equal to z_0.025. I'll use Minitab to do this.

MTB > # I'll put -z_0.025 in k1.
MTB > invcdf 0.025 k1;
SUBC> norm 0 1.
MTB > # alpha/2 is the probability mass under the T_19 distribution pdf
MTB > # to the left of -z_0.025, and the cdf command can be used to get
MTB > # this value, which I'll store in k2. Since 2*k2 equals alpha, the
MTB > # probability the confidence interval procedure doesn't cover, the
MTB > # desired coverage probability will be 1 -2*k2.
MTB > cdf k1 k2;
SUBC> t 19.
MTB > let k3 = 1 - 2*k2
MTB > print k1-k3

K1 -1.95996
K2 0.0324157
K3 0.935169

The desired coverage probability is about 0.935.
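
For anyone without Minitab, the same calculation can be cross-checked with a short Python sketch. It uses only the standard library, so the t distribution probability is obtained by numerically integrating the t density with Simpson's rule; the helper names t_pdf and t_central_prob are made up for this illustration and are not part of any package.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    const = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return const * (1 + x * x / df) ** (-(df + 1) / 2)


def t_central_prob(c, df, steps=2000):
    """P(-c < T < c), using composite Simpson's rule on [0, c] and symmetry.

    steps must be even.
    """
    h = c / steps
    total = t_pdf(0.0, df) + t_pdf(c, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 2 * total * h / 3


z = NormalDist().inv_cdf(0.975)       # z_0.025 (the session stores -z_0.025 in k1)
coverage = t_central_prob(z, df=19)   # same quantity as k3 in the session
print(round(z, 5), round(coverage, 6))
```

The value of coverage should agree with K3 above to the digits Minitab prints.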

(b) Similarly, the desired coverage probability is about 0.940.

Problems 6(b) and 7

I'll do this using Minitab. Notice that once the data is in c1, it takes only 4 let commands to compute the 2 estimates (and if one wanted to, the estimates could be computed using just 2 commands).

MTB > set c1
DATA> 3.5 5.1 2.5 6.1 4.6 3.5 2.9 3.6
DATA> end
MTB > let c2 = sum(c1) - c1
MTB > # I put the data into c1. Then I put the sums of the 8 jackknife
MTB > # samples into the 8 elements of c2. Next, I'll put the 8 jack-
MTB > # knife replications into c3.
MTB > let c3 = (c2/7)**2
MTB > # I'll put the est. of bias into k1 and the est. of se into k2,
MTB > # noting that the est. of se is the sample standard deviation of
MTB > # the replications, multiplied by (n-1)/sqrt(n).
MTB > let k1 = 7*(mean(c3) - mean(c1)**2)
MTB > let k2 = stdev(c3)*7/sqrt(8)
MTB > name k1 'est bias' k2 'est se'
MTB > print k1 k2

est bias 0.180264
est se 3.33670

The desired estimates are 0.180 (for the bias) and 3.34 (for the se).
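
The Minitab steps above translate almost line for line into Python. The sketch below is only a cross-check using the standard library; the variable names (reps, est_bias, est_se) are chosen to mirror c3, k1, and k2 and are otherwise arbitrary.

```python
from math import sqrt
from statistics import mean, stdev

data = [3.5, 5.1, 2.5, 6.1, 4.6, 3.5, 2.9, 3.6]
n = len(data)
total = sum(data)

# Squared leave-one-out sample means: the 8 jackknife replications (c3).
reps = [((total - x) / (n - 1)) ** 2 for x in data]

theta_hat = mean(data) ** 2                      # estimate from the full sample
est_bias = (n - 1) * (mean(reps) - theta_hat)    # k1
est_se = stdev(reps) * (n - 1) / sqrt(n)         # k2
print(round(est_bias, 4), round(est_se, 4))
```

Small differences from the Minitab output in the last printed digits are to be expected, since Python works in double precision.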

Problem 8

MTB > # Below I will compute the bias estimate and the se estimate using
MTB > # just one command each.
MTB > set c1
DATA> 1.6 3.3 1.6 1.7 0.6 2.4 1.8 4.7 0.2
DATA> end
MTB > let k1 = 8*(mean(8/(sum(c1) - c1)) - 1/mean(c1))
MTB > let k2 = stdev(8/(sum(c1) - c1))*8/3
MTB > name k1 'est bias' k2 'est se'
MTB > print k1 k2

est bias 0.0281930
est se 0.124306

The desired estimates are 0.0282 (for the bias) and 0.124 (for the se).
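
The one-command-each approach works because the statistic here is a simple function of the sample sum. A more general (if less compact) way to organize the computation is a function that accepts any statistic. The sketch below is one possible arrangement, with the hypothetical name jackknife; it is applied to 1/xbar to match the session above.

```python
from math import sqrt
from statistics import mean, stdev


def jackknife(data, stat):
    """Return (bias estimate, se estimate) for stat using the delete-one jackknife."""
    n = len(data)
    # Recompute the statistic with each observation left out in turn.
    reps = [stat(data[:i] + data[i + 1:]) for i in range(n)]
    bias = (n - 1) * (mean(reps) - stat(data))
    se = stdev(reps) * (n - 1) / sqrt(n)
    return bias, se


data = [1.6, 3.3, 1.6, 1.7, 0.6, 2.4, 1.8, 4.7, 0.2]
est_bias, est_se = jackknife(data, lambda xs: 1 / mean(xs))
print(round(est_bias, 5), round(est_se, 5))
```

With n = 9, the factor (n - 1)/sqrt(n) is 8/3, which is the multiplier used in the let command for k2.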

Problem 11

(a) The value of the test statistic is about 8.61, and the p-value is about 0.0012.

(b) The p-value is about 0.0015.

(c) The values of the 4 choose 2 = 6 t statistics, along with the associated integer df values, are given below. The 3rd test statistic yielded the smallest p-value (from a two-sample two-sided Welch test), and that p-value is multiplied by 6 to obtain the overall p-value of about 0.12.

t = 1.221 (df = 6)
t = 2.174 (df = 4)
t = 3.746 (df = 4)
t = 1.372 (df = 6)
t = 3.305 (df = 5)
t = 2.333 (df = 6)
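
The Bonferroni step in (c) can be checked with the Python sketch below. It takes the six t values and integer df values as given, computes each two-sided p-value by numerically integrating the t density (standard library only, with Simpson's rule), and multiplies the smallest p-value by 6. The helper names are made up for this illustration.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    const = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return const * (1 + x * x / df) ** (-(df + 1) / 2)


def t_two_sided_p(t, df, steps=2000):
    """P(|T| > |t|), from Simpson's rule applied to the density on [0, |t|]."""
    c = abs(t)
    h = c / steps
    total = t_pdf(0.0, df) + t_pdf(c, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * total * h / 3


t_stats = [(1.221, 6), (2.174, 4), (3.746, 4), (1.372, 6), (3.305, 5), (2.333, 6)]
p_values = [t_two_sided_p(t, df) for t, df in t_stats]
smallest = min(p_values)
overall = min(1.0, len(p_values) * smallest)   # Bonferroni-adjusted p-value
print([round(p, 4) for p in p_values], round(overall, 3))
```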

Problem 12

(a) The value of the test statistic is about 5.39, the 2nd df is 8, and the p-value is about 0.025.

(b) The value of the test statistic is about 9.23, and the p-value is about 0.026.

(c) The values of the 4 choose 2 = 6 t statistics, along with the associated integer df values, are given above (in the solution to Problem 11). None of the t values exceeds the 0.05 critical value corresponding to the associated df, and so the conclusion is that p-value > 0.05.

Problem 13

(a) The value of the test statistic (absolute value) is about 5.73, and the p-value is about 0.0010.

(b) The value of the test statistic (absolute value) is about 5.73, and the p-value is about 0.0010.

(c) The value of the test statistic (absolute value) is about 7.03, the 2nd df is 9, and the p-value is about 0.00061.

(d) The value of the test statistic is 34, the mean of its null sampling distribution is 80, the variance of its null sampling distribution is 226 2/3, the appropriate z-score is about -3.02, and the approximate p-value is about 0.0025.

(e) The value of the test statistic is 130, the mean of its null sampling distribution is 157.5, the variance of its null sampling distribution is 87.5, the appropriate z-score is about -2.94, and the approximate p-value is about 0.0033.
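
The normal approximations in (d) and (e) can be reproduced from the posted means and variances with the Python sketch below. Two things in it are assumptions rather than statements from the solution: that both p-values are two-sided, and that (d) uses a half-unit continuity correction while (e) uses none. Those choices are simply the ones that reproduce the posted z-scores. The function name normal_approx is hypothetical.

```python
from math import copysign, sqrt
from statistics import NormalDist


def normal_approx(stat, null_mean, null_var, continuity=0.0):
    """Return (z, two-sided p-value) from a normal approximation.

    continuity moves the statistic toward the null mean before standardizing.
    """
    diff = stat - null_mean
    if continuity:
        diff = copysign(max(abs(diff) - continuity, 0.0), diff)
    z = diff / sqrt(null_var)
    p = 2 * NormalDist().cdf(-abs(z))
    return z, p


# (d): assumed half-unit continuity correction
z_d, p_d = normal_approx(34, 80, 226 + 2 / 3, continuity=0.5)
# (e): assumed no continuity correction
z_e, p_e = normal_approx(130, 157.5, 87.5)
print(round(z_d, 2), round(p_d, 4), round(z_e, 2), round(p_e, 4))
```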

Problem 14

(a)

	sigma²_Ahat	sigma²_Ehat
usual unbiased estimators	615.3	16.2
simple improvements; e.g., H-L est'r	615.3	13.9
maximum likelihood estimators	511.9	16.2
Klotz-Milton-Zacks estimators	438.0	13.9
Stein estimators	438.0	13.9

(b)

	nuhat	nu^*	l c b	u c b
usual unbiased est'rs	4.9	5	277.9	2685.8
Stein estimators	4.9	5	277.9	2685.8

(c)

l c b	u c b
189.7	1040.9

Problem 15

(a) With only one observation per cell, there is no way to estimate the variances needed to adjust for possible heteroscedasticity, and so we need to assume homoscedasticity and estimate the assumed common variance with the MSE. An Abelson-Tukey test for a two-way layout is called for.

t: 4.9617
p-value (upper-tailed test): 0.00010.

(b) Page's test is called for.

average rank for Control group: 1
average rank for Low dose group: 2.375
average rank for High dose group: 2.625
L: 13.625
E₀(L): 12
Var₀(L): 0.25
z: 3.25
approximate p-value (upper-tailed test): 0.00058. (Note: StatXact gives an exact p-value of about 0.00035.)
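
The quantities for Page's test in (b) can be recomputed from the average ranks with the Python sketch below. The number of blocks is not stated in this solution; the sketch assumes 8, which is the value consistent with a null variance of 0.25 when L is computed from average ranks for 3 treatments.

```python
from math import sqrt
from statistics import NormalDist

avg_ranks = [1.0, 2.375, 2.625]   # Control, Low dose, High dose
k = len(avg_ranks)
n_blocks = 8                       # assumption: inferred, not stated in the solution

# Page's statistic on the average-rank scale: weights 1, 2, ..., k.
L = sum(j * r for j, r in enumerate(avg_ranks, start=1))
mean0 = k * (k + 1) ** 2 / 4
var0 = k ** 2 * (k + 1) ** 2 * (k - 1) / (144 * n_blocks)
z = (L - mean0) / sqrt(var0)
p_upper = 1 - NormalDist().cdf(z)
print(L, mean0, var0, z, round(p_upper, 5))
```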

(c)

F: 13.89
p-value: 0.00047 (which is nearly 5 times larger than the p-value from the Abelson-Tukey test against the monotone alternative).

(d)

average rank for Control group: 1
average rank for Low dose group: 2.375
average rank for High dose group: 2.625
Q: 12.25
approximate p-value (chisquare approximation): 0.002
exact p-value (from table), rounded to 1 significant digit: 0.001. (Note: StatXact gives an exact p-value of about 0.00086.)
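
The Friedman statistic in (d) and its chi-square approximation can be checked with the Python sketch below. As with Page's test, the number of blocks (8) is an assumption inferred from the posted values, not something stated in this solution. With 3 treatments the reference distribution has 2 df, and the chi-square upper-tail probability with 2 df is exp(-Q/2), so no statistical library is needed.

```python
from math import exp

avg_ranks = [1.0, 2.375, 2.625]   # Control, Low dose, High dose
k = len(avg_ranks)
n_blocks = 8                       # assumption: inferred, not stated in the solution

center = (k + 1) / 2               # null expected value of each average rank
Q = 12 * n_blocks / (k * (k + 1)) * sum((r - center) ** 2 for r in avg_ranks)
p_chi2 = exp(-Q / 2)               # upper-tail chi-square probability, 2 df
print(Q, round(p_chi2, 4))
```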

Problem 16

(a) The value of the point estimate is about 0.17034. The df formula yields a value of about 3.47, and so upon rounding, we get 3 df. The confidence interval is (0.065, 1.452).

(b) The sample mean of the pseudo-values is about 0.17014. The estimated standard error of the sample mean of the pseudo-values is about 0.08845. The confidence interval is (0.025, 0.316).

Problem 17

(a) The value of the test statistic is about 3.06, the dfs are 12 and 9, and the p-value is about 0.10.

(b) The value of the test statistic is about 2.10, the df is 13, and the p-value is about 0.056.

(c) The value of the test statistic is about 3.06, the dfs are 15 and 11, and the p-value is about 0.068.

(d) The value of the test statistic is about 2.48, the df is 20, and the p-value is about 0.022.

(Note: As in the example on p. G5 of the class notes, the ordinary F test produced the largest p-value. (Recall, in most cases for which the F test produces the smallest p-value, it's not to be trusted.))

Problem 18

I've got a hard copy version of the solution to give you that shows the calculus details, so I'll just post the answer here so that you can make use of it when doing Problem 19.

The exponential reference rule is

h^* = 2.29 xbar / n^1/3,

where xbar is the sample mean.
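
For use in Problem 19, the rule is easy to wrap in a small function. The Python sketch below assumes h is a histogram bin width (as the other rules compared in Problem 19 suggest); the function name and the sample values are hypothetical and serve only to show the call.

```python
from statistics import mean


def exponential_reference_width(sample):
    """Exponential reference rule: h* = 2.29 * xbar / n^(1/3)."""
    n = len(sample)
    xbar = mean(sample)
    return 2.29 * xbar / n ** (1 / 3)


# Hypothetical data, for illustration only (not the Problem 19 data).
sample = [0.3, 1.2, 0.7, 2.5, 0.1, 0.9, 1.8, 0.4]
h = exponential_reference_width(sample)
print(round(h, 4))
```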

Problem 19

	h	UCV(h)
normal reference	9.2	-0.0429
exponential reference	5.2	-0.0428
Freedman-Diaconis	4.6	-0.0426
oversmoothed	15.4	-0.0394
Sturges	12.1	-0.0412

Problem 21

(a) We have 0.5 = P( X < 0 ) = P( e^X < e⁰ ) = P( Y < 1 ), and so the desired median is 1.

(b) 1.007.

(c) 1.007.

(Note: It happens to be the case for this distribution that the skewness is nearly equal to the 25% trimmed mean, but it's not always the case that these two values will be close to one another.)

Problem 22

(a) 0.917.

(b) 0.109.

Problem 23

0.933.

Problem 24

(a) The value of the test statistic is about 3.87, and the p-value is about 0.038.

(b) The p-value is about 0.036.

(c) The value of the test statistic is about 8.76, the 2nd df is 10, and the p-value is about 0.0063.

(d) The value of the test statistic is about 11.27, and the p-value is about 0.0036.

(e) The values of the 3 choose 2 = 3 t statistics, along with the associated integer df values, are given below. Only the t value from the 1st and 2nd samples exceeds the 0.05 critical value (but not the 0.01 critical value) corresponding to the associated df, and so the conclusion is that
0.01 < p-value < 0.05.

t = 3.866 (df = 8)
t = 2.157 (df = 8)
t = -0.561 (df = 13)