income example


 MTB > # I'll use Minitab to compute the value of the K-S statistic for the
 MTB > # data given at the top of p. 126 of G&C and the distribution proposed
 MTB > # in Problem 4.19 on p. 151 of G&C.  Then I'll use tables to make a
 MTB > # statement about the p-value.  Finally, I'll give instructions for
 MTB > # using StatXact to obtain an exact p-value.

 MTB > set c1
 DATA> 9800 10200 9300 8700 15200 6900 8600 9600 12200 15500 11600 7200
 DATA> end
 
 MTB > # In order to get a feel for the relationship between the data and the
 MTB > # proposed dist'n, we can examine a Q-Q plot.  I'll plot the ordered
 MTB > # pairs
 MTB > #             ( x_(i), F_0^{-1}( i/(n+1) ) ).
 MTB > set c2
 DATA> 1:12
 DATA> end
 MTB > let c2 = c2/13
 MTB > invcdf c2 c3;
 SUBC> norm 10000 2000.
 MTB > sort c1 c1
 MTB > name c1 'obs data' c3 'hyp e.v.'
 MTB > # (Note: The hyp e.v. values approximate the expected values of the order
 MTB > # statistics of a sample of 12 normal random variables having mean 10,000 and
 MTB > # standard deviation 2000.)

 MTB > plot c3 c1
 
 hyp e.v. -                                                            *
          -
          -
     12000+                                                          *
          -
          -                                      *
          -                                  *
          -
     10500+                         *
          -                      *
          -                     *
          -
          -                   *
      9000+               *
          -
          -              *
          -     *
          -
      7500+
          -   *
          -
          -
            ------+---------+---------+---------+---------+---------+----obs data
               7500      9000     10500     12000     13500     15000
 
 MTB > # The dist'n underlying the data may be slightly skewed (to the right), but it
 MTB > # doesn't appear to be highly incompatible with a normal dist'n.  The sample
 MTB > # size is too small to reach a strong conclusion.
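For readers without Minitab, here is a minimal Python sketch (Python is an assumption here, not part of the original workflow) of the Q-Q pairs computed above: it sorts the data and pairs each x_(i) with F_0^{-1}( i/(n+1) ) for the hypothesized normal distribution with mean 10,000 and standard deviation 2000. The helper name `qq_pairs` is hypothetical; `statistics.NormalDist` is from Python's standard library.

```python
from statistics import NormalDist

# Income data from the top of p. 126 of G&C (the values put into c1 above).
data = [9800, 10200, 9300, 8700, 15200, 6900, 8600, 9600, 12200, 15500, 11600, 7200]

# Hypothesized null distribution F_0: normal, mean 10,000, sd 2000.
f0 = NormalDist(mu=10000, sigma=2000)


def qq_pairs(sample, dist):
    """Return the pairs (x_(i), F_0^{-1}(i/(n+1))) for i = 1..n."""
    xs = sorted(sample)
    n = len(xs)
    return [(x, dist.inv_cdf(i / (n + 1))) for i, x in enumerate(xs, start=1)]


pairs = qq_pairs(data, f0)
for obs, hyp in pairs:
    print(f"{obs:6d}  {hyp:8.1f}")
```

Plotting the second column against the first gives the same Q-Q plot as the Minitab `plot c3 c1` command.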
 
 MTB > desc c1
 
                 N     MEAN   MEDIAN   TRMEAN    STDEV   SEMEAN
 obs data       12    10400     9700    10240     2773      801
 
               MIN      MAX       Q1       Q3
 obs data     6900    15500     8625    12050
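The `desc` output can be reproduced with Python's standard library; this is a sketch, not Minitab's code. Two details are assumptions chosen because they reproduce the numbers above: TRMEAN is taken as the mean after trimming 5% of the observations (rounded to a whole number) from each end, and Q1/Q3 use the (n+1)p position rule, which is what `statistics.quantiles(..., method="exclusive")` implements. The helper name `minitab_style_summary` is hypothetical.

```python
import math
import statistics

data = [9800, 10200, 9300, 8700, 15200, 6900, 8600, 9600, 12200, 15500, 11600, 7200]


def minitab_style_summary(sample, trim=0.05):
    """Summary statistics laid out like Minitab's DESCRIBE output (sketch)."""
    xs = sorted(sample)
    n = len(xs)
    stdev = statistics.stdev(xs)  # divisor n - 1
    # Assumed trimming rule: drop round(trim * n) observations from each end.
    k = round(trim * n)
    q1, median, q3 = statistics.quantiles(xs, n=4, method="exclusive")
    return {
        "N": n,
        "MEAN": statistics.mean(xs),
        "MEDIAN": median,
        "TRMEAN": statistics.mean(xs[k:n - k]),
        "STDEV": stdev,
        "SEMEAN": stdev / math.sqrt(n),
        "MIN": xs[0],
        "MAX": xs[-1],
        "Q1": q1,
        "Q3": q3,
    }


summary = minitab_style_summary(data)
print(summary)
```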
 
 MTB > let k90 = 1
 MTB > exec 'skku'
 Executing from file: skku.MTB
 
 skewness 0.830365
 kurtosis -0.0487697
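The macro skku.MTB isn't shown here. A plausible reading, and it is only an assumption, is that it computes the bias-adjusted sample skewness and excess kurtosis (the same formulas used by, e.g., spreadsheet SKEW and KURT functions); the Python sketch below uses those formulas, and for these data it agrees with the values printed above. The function name is hypothetical.

```python
import math

data = [9800, 10200, 9300, 8700, 15200, 6900, 8600, 9600, 12200, 15500, 11600, 7200]


def adjusted_skewness_kurtosis(sample):
    """Bias-adjusted sample skewness G1 and excess kurtosis G2 (assumed formulas)."""
    n = len(sample)
    mean = sum(sample) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    z = [(x - mean) / s for x in sample]
    skew = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return skew, kurt


skew, kurt = adjusted_skewness_kurtosis(data)
print(f"skewness {skew:.6f}")
print(f"kurtosis {kurt:.7f}")
```

A positive skewness near 0.83 is consistent with the mild right skew visible in the Q-Q plot; with n = 12, neither value is estimated precisely.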

 MTB > # To compute the value of the K-S statistic, I'll first put the values of
 MTB > # F_0( x_(i) ) into c4, the values of i/n into c5, and the values of (i-1)/n
 MTB > # into c6.  Then I'll put the values of i/n - F_0( x_(i) ) into c7 and the
 MTB > # values of F_0( x_(i) ) - (i-1)/n into c8.  The largest of the values in
 MTB > # c7 and c8 will be the value of the test statistic.
 MTB > cdf c1 c4;
 SUBC> norm 10000 2000.
 MTB > set c5
 DATA> 1:12
 DATA> end
 MTB > let c5 = c5/12
 MTB > set c6
 DATA> 0:11
 DATA> end
 MTB > let c6 = c6/12
 MTB > let c7 = c5 - c4
 MTB > let c8 = c4 - c6
 MTB > desc c7 c8
 
                 N     MEAN   MEDIAN   TRMEAN    STDEV   SEMEAN
 C7             12   0.0358   0.0381   0.0382   0.0655   0.0189
 C8             12   0.0475   0.0452   0.0451   0.0655   0.0189
 
               MIN      MAX       Q1       Q3
 C7        -0.0787   0.1268  -0.0225   0.0842
 C8        -0.0435   0.1620  -0.0009   0.1058
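The column arithmetic above (c4 through c8) is just the usual two one-sided K-S distances. Here is a minimal Python sketch of the same computation, assuming the N(10000, 2000^2) null distribution; the function name `ks_statistic` is hypothetical.

```python
from statistics import NormalDist

data = [9800, 10200, 9300, 8700, 15200, 6900, 8600, 9600, 12200, 15500, 11600, 7200]
f0 = NormalDist(mu=10000, sigma=2000)


def ks_statistic(sample, cdf):
    """Return (D, D+, D-) for a fully specified continuous null cdf."""
    xs = sorted(sample)
    n = len(xs)
    f = [cdf(x) for x in xs]                               # column c4
    d_plus = max((i + 1) / n - f[i] for i in range(n))     # max of c7 = i/n - F_0
    d_minus = max(f[i] - i / n for i in range(n))          # max of c8 = F_0 - (i-1)/n
    return max(d_plus, d_minus), d_plus, d_minus


d, d_plus, d_minus = ks_statistic(data, f0.cdf)
print(f"D+ = {d_plus:.4f}, D- = {d_minus:.4f}, D = {d:.4f}")
```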
 
Looking at the values under MAX, we see that the value of the test statistic is
0.1620.  From Table F of G&C, the p-value exceeds 0.2.  Birnbaum's table gives
P_0( D_12 >= 1/12 ) = 0.99995 and P_0( D_12 >= 2/12 ) = 0.83986, and since
1/12 < 0.1620 < 2/12, the p-value is between 0.83986 and 0.99995.  (One might
guess that it's somewhat close to 0.84, since the test statistic value of 0.1620
isn't much different from 2/12 = 0.1667.)
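The upper bound 0.99995 can be checked by hand: for 1/(2n) <= d <= 1/n the null distribution of D_n has the closed form P_0( D_n < d ) = n! (2d - 1/n)^n, so P_0( D_12 >= 1/12 ) = 1 - 12!/12^12. A short Python check (the helper name is hypothetical):

```python
import math


def ks_cdf_small_d(n, d):
    """P_0(D_n < d) for 1/(2n) <= d <= 1/n, using n! * (2d - 1/n)^n."""
    if not 1 / (2 * n) <= d <= 1 / n:
        raise ValueError("closed form only valid for 1/(2n) <= d <= 1/n")
    return math.factorial(n) * (2 * d - 1 / n) ** n


n = 12
upper_bound = 1 - ks_cdf_small_d(n, 1 / n)  # P_0(D_12 >= 1/12)
print(round(upper_bound, 5))
```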


To use StatXact, we put the data in the CaseData editor and then select
   Nonparametrics > One-Sample Goodness-of-Fit > Kolmogorov ...

Then click the variable into the Response box using the arrow, select
Normal from the Type menu under Distribution, and enter the values for
the Mean and Std-dev.  Finally, select Exact (under Compute) and click OK.

The value of the test statistic is 0.162, and the exact p-value is 0.8623
(and the asymptotic p-value is 0.9111).
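How StatXact computes its exact value isn't described here, so as an independent cross-check, here is a Python sketch (an assumption, not the author's tool) of one standard exact method for a fully specified continuous F_0: the Durbin matrix method in the form given by Marsaglia, Tsang and Wang (2003). The asymptotic value uses Kolmogorov's limiting series 2 * sum_{j>=1} (-1)^(j-1) exp(-2 j^2 n d^2). All function names are hypothetical. This version forms H^n directly, which is fine for n = 12; the published algorithm also tracks exponents to avoid overflow for large n. For these data both results should agree closely with the StatXact values.

```python
import math
from statistics import NormalDist


def mat_mult(a, b):
    m = len(a)
    return [[sum(a[i][t] * b[t][j] for t in range(m)) for j in range(m)]
            for i in range(m)]


def mat_power(a, p):
    """a**p by repeated squaring."""
    m = len(a)
    result = [[float(i == j) for j in range(m)] for i in range(m)]
    base = a
    while p > 0:
        if p & 1:
            result = mat_mult(result, base)
        base = mat_mult(base, base)
        p >>= 1
    return result


def ks_exact_cdf(n, d):
    """P_0(D_n < d) by the Durbin matrix method (Marsaglia, Tsang and Wang)."""
    k = int(n * d) + 1
    m = 2 * k - 1
    h = k - n * d
    H = [[1.0 if i - j + 1 >= 0 else 0.0 for j in range(m)] for i in range(m)]
    for i in range(m):
        H[i][0] -= h ** (i + 1)
        H[m - 1][i] -= h ** (m - i)
    if 2 * h - 1 > 0:
        H[m - 1][0] += (2 * h - 1) ** m
    for i in range(m):
        for j in range(m):
            if i - j + 1 > 0:
                H[i][j] /= math.factorial(i - j + 1)
    s = mat_power(H, n)[k - 1][k - 1]
    # Multiply by n!/n^n one factor at a time instead of forming n! and n^n.
    for i in range(1, n + 1):
        s = s * i / n
    return s


def ks_asymptotic_pvalue(n, d, terms=100):
    """Kolmogorov limiting approximation to P_0(D_n >= d)."""
    x2 = n * d * d
    return 2 * sum((-1) ** (j - 1) * math.exp(-2 * j * j * x2)
                   for j in range(1, terms + 1))


data = [9800, 10200, 9300, 8700, 15200, 6900, 8600, 9600, 12200, 15500, 11600, 7200]
xs = sorted(data)
n = len(xs)
f0 = NormalDist(mu=10000, sigma=2000)
d = max(max((i + 1) / n - f0.cdf(x), f0.cdf(x) - i / n) for i, x in enumerate(xs))

exact_p = 1 - ks_exact_cdf(n, d)
asym_p = ks_asymptotic_pvalue(n, d)
print(f"D = {d:.4f}, exact p = {exact_p:.4f}, asymptotic p = {asym_p:.4f}")
```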


StatXact can also be used to do Lilliefors's test and the Shapiro-Wilk test.
With the data in the CaseData editor, select
   Nonparametrics > One-Sample Goodness-of-Fit > Lilliefors ...
  
Then click the variable into the Response box.  StatXact does this test by
Monte Carlo simulation, so before running it, set the number of Monte Carlo
trials to 1000000 and fix the random number seed at 23456 (a fixed seed makes
the result reproducible).  Finally, click OK to run Lilliefors's test.
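The idea behind the Monte Carlo approach can be sketched in Python (an assumption, not StatXact's implementation): compute the K-S distance to a normal whose mean and standard deviation are estimated from the sample, then estimate how often samples of size 12 from a normal distribution give a distance at least that large. Because the null distribution of this statistic doesn't depend on the true mean and standard deviation, standard normal samples suffice. The function names are hypothetical, the sketch uses the n - 1 divisor for the standard deviation, and it uses only 20,000 trials to keep the pure-Python run short; Python's random number generator differs from StatXact's, so the seed 23456 will not reproduce StatXact's number exactly.

```python
import math
import random

data = [9800, 10200, 9300, 8700, 15200, 6900, 8600, 9600, 12200, 15500, 11600, 7200]


def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def lilliefors_d(sample):
    """K-S distance from the sample to a normal with estimated mean and sd."""
    xs = sorted(sample)
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))  # divisor n - 1
    f = [std_normal_cdf((x - mean) / sd) for x in xs]
    return max(max((i + 1) / n - f[i], f[i] - i / n) for i in range(n))


def lilliefors_mc_pvalue(sample, trials=20000, seed=23456):
    """Monte Carlo estimate of P_0(D >= observed D) under normality."""
    rng = random.Random(seed)
    n = len(sample)
    d_obs = lilliefors_d(sample)
    hits = sum(
        lilliefors_d([rng.gauss(0.0, 1.0) for _ in range(n)]) >= d_obs
        for _ in range(trials)
    )
    return d_obs, hits / trials


d_obs, p_mc = lilliefors_mc_pvalue(data)
print(f"D = {d_obs:.4f}, Monte Carlo p-value about {p_mc:.2f}")
```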

The Monte Carlo estimate of the exact p-value is about 0.23.  (Note: StatXact
reports 0.2293, but the accompanying confidence interval for the exact p-value,
(0.2282, 0.2304), shows that the estimate carries some Monte Carlo uncertainty,
so it would be silly to report it using four significant digits.  I think it's
better to report any sort of approximate or estimated p-value using only two
significant digits.)
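The reported interval is consistent with a normal-approximation (Wald) 99% confidence interval for a proportion estimated from 1,000,000 trials. That StatXact uses exactly this interval is an assumption, but the Python sketch below (hypothetical function name) reproduces the endpoints and shows why only the first two digits of the estimate are trustworthy.

```python
import math
from statistics import NormalDist


def mc_pvalue_ci(p_hat, trials, level=0.99):
    """Wald confidence interval for a Monte Carlo p-value estimate."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / trials)
    return p_hat - half_width, p_hat + half_width


low, high = mc_pvalue_ci(0.2293, 1_000_000)
print(f"({low:.4f}, {high:.4f})")  # prints (0.2282, 0.2304)
```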

For the Shapiro-Wilk test use
   Nonparametrics > One-Sample Goodness-of-Fit > Shapiro-Wilk ...

Click the variable into the box and then click OK ... it's that simple.
StatXact always computes this p-value from an approximation formula, so we
should round the reported p-value of 0.199 to 0.20.  (Note: This p-value is
in the same ballpark as the one from Lilliefors's test.)
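Outside StatXact, the Shapiro-Wilk test is available in SciPy as `scipy.stats.shapiro`. This sketch assumes SciPy is installed. SciPy's p-value also comes from an approximation (Royston's algorithm); whether StatXact uses the same approximation is an assumption, so the result may differ slightly from 0.199, and like StatXact's it is best reported to two significant digits.

```python
from scipy import stats

data = [9800, 10200, 9300, 8700, 15200, 6900, 8600, 9600, 12200, 15500, 11600, 7200]

w, p = stats.shapiro(data)  # returns the W statistic and an approximate p-value
print(f"W = {w:.4f}, p = {p:.2f}")
```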