MTB > # I'll use Minitab to compute the value of the K-S statistic for the
MTB > # data given at the top of p. 126 of G&C and the distribution proposed
MTB > # in Problem 4.19 on p. 151 of G&C. Then I'll use tables to make a
MTB > # statement about the p-value. Finally, I'll give instructions for
MTB > # using StatXact to obtain an exact p-value.
MTB > set c1
DATA> 9800 10200 9300 8700 15200 6900 8600 9600 12200 15500 11600 7200
DATA> end
MTB > # In order to get a feel for the relationship between the data and the
MTB > # proposed dist'n, we can examine a Q-Q plot. I'll plot the ordered
MTB > # pairs
MTB > # ( x_(i), F_0^{-1}( i/(n+1) ) ).
MTB > set c2
DATA> 1:12
DATA> end
MTB > let c2 = c2/13
MTB > invcdf c2 c3;
SUBC> norm 10000 2000.
MTB > sort c1 c1
MTB > name c1 'obs data' c3 'hyp e.v.'
MTB > # (Note: The hyp e.v. values approximate the expected values of the order statistics
MTB > # of 12 independent normal random variables having mean 10,000 and standard deviation 2000.)
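The plotting positions built in c1 and c3 can also be reproduced outside Minitab. This is a minimal sketch, assuming Python 3.8+ for `statistics.NormalDist`; the variable names `obs` and `hyp` are illustrative and simply mirror the Minitab column names 'obs data' and 'hyp e.v.'.

```python
from statistics import NormalDist

data = [9800, 10200, 9300, 8700, 15200, 6900, 8600, 9600, 12200, 15500, 11600, 7200]
f0 = NormalDist(mu=10000, sigma=2000)   # hypothesized F_0
n = len(data)

obs = sorted(data)                                         # x_(1) <= ... <= x_(n)
hyp = [f0.inv_cdf(i / (n + 1)) for i in range(1, n + 1)]   # F_0^{-1}( i/(n+1) )

# Each printed row is one point (x_(i), F_0^{-1}(i/(n+1))) of the Q-Q plot.
for x, q in zip(obs, hyp):
    print(f"{x:6d}  {q:9.1f}")
```

Because i/(n+1) is symmetric about 1/2, the hyp e.v. values are symmetric about 10,000, which is one quick sanity check on the output.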
MTB > plot c3 c1
[Character plot: hyp e.v. (vertical axis) versus obs data (horizontal axis, ticks from 7500 to 15000).]
MTB > # The dist'n underlying the data may be slightly skewed, but it doesn't appear
MTB > # to be highly incompatible with a normal dist'n. The sample size is too small
MTB > # to reach a strong conclusion.
MTB > desc c1
                 N    MEAN  MEDIAN  TRMEAN   STDEV  SEMEAN
obs data        12   10400    9700   10240    2773     801
               MIN     MAX      Q1      Q3
obs data      6900   15500    8625   12050
MTB > let k90 = 1
MTB > exec 'skku'
Executing from file: skku.MTB
skewness 0.830365
kurtosis -0.0487697
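The contents of skku.MTB aren't shown, so what it computes is an assumption here. The reported values appear to match the bias-adjusted sample skewness G1 and excess kurtosis G2: by hand these formulas give about 0.8304 and -0.0488 for these data. A minimal Python sketch under that assumption:

```python
import math

data = [9800, 10200, 9300, 8700, 15200, 6900, 8600, 9600, 12200, 15500, 11600, 7200]
n = len(data)
xbar = sum(data) / n

# Central moments with divisor n.
m2 = sum((x - xbar) ** 2 for x in data) / n
m3 = sum((x - xbar) ** 3 for x in data) / n
m4 = sum((x - xbar) ** 4 for x in data) / n

g1 = m3 / m2 ** 1.5          # moment coefficient of skewness
g2 = m4 / m2 ** 2 - 3.0      # moment coefficient of excess kurtosis

# Small-sample adjusted versions (assumed to be what the macro reports).
G1 = g1 * math.sqrt(n * (n - 1)) / (n - 2)
G2 = ((n + 1) * g2 + 6.0) * (n - 1) / ((n - 2) * (n - 3))

print(f"skewness {G1:.6f}")
print(f"kurtosis {G2:.6f}")
```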
MTB > # To compute the value of the K-S statistic, I'll first put the values of
MTB > # F_0( x_(i) ) into c4, the values of i/n into c5, and the values of (i-1)/n
MTB > # into c6. Then I'll put the values of i/n - F_0( x_(i) ) into c7 and the
MTB > # values of F_0( x_(i) ) - (i-1)/n into c8. The largest of the values in
MTB > # c7 and c8 will be the value of the test statistic.
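The computation just described (c4 through c8, then the overall maximum) can be sketched in Python as follows. The names `c4`, `c7`, `c8`, `d_plus`, `d_minus` and `D` are illustrative; `statistics.NormalDist` (Python 3.8+) stands in for Minitab's `cdf` command.

```python
from statistics import NormalDist

data = [9800, 10200, 9300, 8700, 15200, 6900, 8600, 9600, 12200, 15500, 11600, 7200]
f0 = NormalDist(mu=10000, sigma=2000)   # hypothesized F_0
x = sorted(data)
n = len(x)

c4 = [f0.cdf(v) for v in x]                                   # F_0( x_(i) )
c7 = [i / n - F for i, F in zip(range(1, n + 1), c4)]          # i/n - F_0( x_(i) )
c8 = [F - (i - 1) / n for i, F in zip(range(1, n + 1), c4)]    # F_0( x_(i) ) - (i-1)/n

d_plus, d_minus = max(c7), max(c8)
D = max(d_plus, d_minus)              # the K-S statistic
print(f"D+ = {d_plus:.4f}, D- = {d_minus:.4f}, D = {D:.4f}")
```

Only the two maxima matter, which is why the MAX column of the `desc` output is enough to read off the statistic.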
MTB > cdf c1 c4;
SUBC> norm 10000 2000.
MTB > set c5
DATA> 1:12
DATA> end
MTB > let c5 = c5/12
MTB > set c6
DATA> 0:11
DATA> end
MTB > let c6 = c6/12
MTB > let c7 = c5 - c4
MTB > let c8 = c4 - c6
MTB > desc c7 c8
                 N    MEAN  MEDIAN  TRMEAN   STDEV  SEMEAN
C7              12  0.0358  0.0381  0.0382  0.0655  0.0189
C8              12  0.0475  0.0452  0.0451  0.0655  0.0189
               MIN     MAX      Q1      Q3
C7         -0.0787  0.1268 -0.0225  0.0842
C8         -0.0435  0.1620 -0.0009  0.1058
Looking at the values under MAX, the test statistic is the larger of the two maxima,
D = max(0.1268, 0.1620) = 0.1620. Using Table F of G&C, it can be seen that the p-value
exceeds 0.2. Using Birnbaum's table, it can be determined that the p-value is between
0.83986 and 0.99995. (One might guess that it's somewhat close to 0.84, since the table
gives P_0( D_12 >= 2/12 ) = 0.83986, and the test statistic value of 0.1620 isn't much
different from 2/12 = 0.1667.)
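Birnbaum's table only brackets the p-value. As a cross-check, here is a minimal Python sketch that computes P_0( D_12 >= d ) directly. It uses the Marsaglia-Tsang-Wang (2003) matrix method, which is my substitution and not necessarily the method behind Birnbaum's table or StatXact; the function names `mat_mult` and `ks_cdf_exact` are hypothetical. By hand this gives about 0.862, inside Birnbaum's bracket.

```python
import math


def mat_mult(a, b):
    """Product of two square matrices stored as lists of lists."""
    m = len(a)
    return [[sum(a[i][t] * b[t][j] for t in range(m)) for j in range(m)]
            for i in range(m)]


def ks_cdf_exact(n, d):
    """P_0(D_n < d) for a fully specified continuous F_0 (Marsaglia-Tsang-Wang).
    No overflow rescaling is done, so this sketch is meant for small n only."""
    k = math.floor(n * d) + 1
    m = 2 * k - 1
    h = k - n * d
    # 1 on and below the first superdiagonal, 0 above it.
    H = [[1.0 if i - j + 1 >= 0 else 0.0 for j in range(m)] for i in range(m)]
    # Boundary corrections to the first column and the last row.
    for i in range(m):
        H[i][0] -= h ** (i + 1)
        H[m - 1][i] -= h ** (m - i)
    if 2 * h - 1 > 0:
        H[m - 1][0] += (2 * h - 1) ** m
    # Divide entries on and below the superdiagonal by (i - j + 1)!.
    for i in range(m):
        for j in range(m):
            if i - j + 1 > 0:
                H[i][j] /= math.factorial(i - j + 1)
    # H^n by repeated multiplication (fine for small m and n).
    P = H
    for _ in range(n - 1):
        P = mat_mult(P, H)
    # Scale the (k, k) entry by n! / n^n.
    s = P[k - 1][k - 1]
    for i in range(1, n + 1):
        s = s * i / n
    return s


n, d = 12, 0.1620                     # D from the Minitab output
p_exact = 1.0 - ks_cdf_exact(n, d)    # P_0( D_12 >= 0.1620 )
print(f"exact p-value: {p_exact:.4f}")
```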
To use StatXact, we put the data in the CaseData editor and then select
Nonparametrics > One-Sample Goodness-of-Fit > Kolmogorov ...
Then click the variable into the Response box using the arrow, select
Normal from the Type menu under Distribution, and enter the values for
the Mean and Std-dev. Finally click to select Exact (under Compute),
and click OK.
The value of the test statistic is 0.162, and the exact p-value is 0.8623
(and the asymptotic p-value is 0.9111).
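The asymptotic p-value presumably comes from Kolmogorov's limiting distribution of sqrt(n) D_n; that is an assumption about StatXact, though by hand the series below gives about 0.911 here. A minimal Python sketch, where `kolmogorov_sf` is a hypothetical name:

```python
import math


def kolmogorov_sf(lam, terms=100):
    """Limiting P(sqrt(n) D_n >= lam) = 2 * sum_{j>=1} (-1)^(j-1) exp(-2 j^2 lam^2)."""
    return 2.0 * sum((-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
                     for j in range(1, terms + 1))


n, d = 12, 0.1620
p_asym = kolmogorov_sf(math.sqrt(n) * d)
print(f"asymptotic p-value: {p_asym:.4f}")
```

With n = 12 the asymptotic value (about 0.91) is noticeably larger than the exact one (about 0.86), which is a good reason to prefer the exact p-value for a sample this small.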
StatXact can also be used to do Lilliefors's test and the Shapiro-Wilk test.
With the data in the CaseData editor, select
Nonparametrics > One-Sample Goodness-of-Fit > Lilliefors ...
Then click the variable into the Response box. (Since StatXact does this
test using the Monte Carlo method, before running it the number of
Monte Carlo trials should be set to 1000000 and the random number seed should
be set to the fixed value 23456.) Finally, click OK to run Lilliefors's
test.
The Monte Carlo estimate of the exact p-value is about 0.23. (Note: StatXact
reports 0.2293, but its confidence interval for the exact p-value,
(0.2282, 0.2304), shows that there is some uncertainty associated with the
estimate. So it would be silly to report the p-value using four significant
digits. I think it's better to report any approximate or estimated p-value
using only two significant digits.)
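Lilliefors's test estimates the mean and standard deviation from the data, so the null distribution of D is simulated. Here is a minimal Python sketch of that Monte Carlo approach, with several assumptions: the SD uses divisor n - 1; only 20,000 trials are run (not 1,000,000) to keep it fast; and Python's generator seeded with 23456 does not reproduce StatXact's random stream, so the estimate will differ slightly. The interval is a normal-approximation 99% interval. With 1,000,000 trials and p = 0.2293 that rule gives about ±0.0011, which looks consistent with StatXact's interval, but that is my inference rather than documented behavior. `lilliefors_stat` is a hypothetical name.

```python
import math
import random

SQRT2 = math.sqrt(2.0)


def lilliefors_stat(sample):
    """K-S distance to a normal cdf whose mean and SD are estimated from the sample."""
    x = sorted(sample)
    n = len(x)
    xbar = sum(x) / n
    s = math.sqrt(sum((v - xbar) ** 2 for v in x) / (n - 1))   # assumed n - 1 divisor
    d = 0.0
    for i, v in enumerate(x, start=1):
        F = 0.5 * (1.0 + math.erf((v - xbar) / (s * SQRT2)))     # normal cdf
        d = max(d, i / n - F, F - (i - 1) / n)
    return d


data = [9800, 10200, 9300, 8700, 15200, 6900, 8600, 9600, 12200, 15500, 11600, 7200]
d_obs = lilliefors_stat(data)

# The null distribution doesn't depend on the true mean/SD, so simulate N(0, 1) samples.
rng = random.Random(23456)
trials = 20000
hits = sum(
    lilliefors_stat([rng.gauss(0.0, 1.0) for _ in range(len(data))]) >= d_obs
    for _ in range(trials)
)
p_hat = hits / trials
half_width = 2.576 * math.sqrt(p_hat * (1.0 - p_hat) / trials)   # 99% normal approx.
print(f"D = {d_obs:.4f}, Monte Carlo p ~ {p_hat:.2f} "
      f"(99% CI {p_hat - half_width:.3f} to {p_hat + half_width:.3f})")
```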
For the Shapiro-Wilk test use
Nonparametrics > One-Sample Goodness-of-Fit > Shapiro-Wilk ...
Click the variable into the box and then click OK ... it's that simple.
Since StatXact always does this test using an approximate p-value formula,
we can round the reported p-value of 0.199 to 0.20. (Note: This p-value is
in the same ballpark as the one from Lilliefors's test.)
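Which approximation StatXact uses isn't stated here. One candidate is Royston's normalizing approximation for W, with the polynomial coefficients published in his algorithm AS R94, and treating it as StatXact's formula is an assumption. By hand this sketch gives W of about 0.908 and a p-value of about 0.199 for these data, matching the reported value. It only implements the n >= 12 branch, and `shapiro_wilk_royston` and `poly` are hypothetical names.

```python
import math
from statistics import NormalDist


def poly(c, t):
    """c[0] + c[1]*t + c[2]*t**2 + ..."""
    return sum(ci * t ** k for k, ci in enumerate(c))


def shapiro_wilk_royston(sample):
    """W and an approximate p-value via Royston's method (sketch; n >= 12 only)."""
    x = sorted(sample)
    n = len(x)
    if n < 12:
        raise ValueError("this sketch only implements the n >= 12 branch")
    z = NormalDist()
    half = n // 2
    # Approximate normal scores for the lower half (negative values).
    m = [z.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, half + 1)]
    summ2 = 2.0 * sum(v * v for v in m)
    ssumm2 = math.sqrt(summ2)
    u = 1.0 / math.sqrt(n)
    # Polynomial corrections for the two most extreme coefficients.
    a1 = poly([0.0, 0.221157, -0.147981, -2.07119, 4.434685, -2.706056], u) - m[0] / ssumm2
    a2 = poly([0.0, 0.042981, -0.293762, -1.752461, 5.682633, -3.582633], u) - m[1] / ssumm2
    fac = math.sqrt((summ2 - 2 * m[0] ** 2 - 2 * m[1] ** 2)
                    / (1 - 2 * a1 ** 2 - 2 * a2 ** 2))
    a = [a1, a2] + [-v / fac for v in m[2:]]   # positive upper-half weights
    # W = (sum a_i (x_(n+1-i) - x_(i)))^2 / sum (x - xbar)^2
    b = sum(ai * (x[n - 1 - i] - x[i]) for i, ai in enumerate(a))
    xbar = sum(x) / n
    ss = sum((v - xbar) ** 2 for v in x)
    w = b * b / ss
    # log(1 - W) is approximately normal with these mean and SD for n >= 12.
    ln_n = math.log(n)
    mu = poly([-1.5861, -0.31082, -0.083751, 0.0038915], ln_n)
    sigma = math.exp(poly([-0.4803, -0.082676, 0.0030302], ln_n))
    p = 1.0 - NormalDist(mu, sigma).cdf(math.log(1.0 - w))
    return w, p


data = [9800, 10200, 9300, 8700, 15200, 6900, 8600, 9600, 12200, 15500, 11600, 7200]
w, p_sw = shapiro_wilk_royston(data)
print(f"W = {w:.4f}, approximate p-value = {p_sw:.2f}")
```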