MTB > # I'll use Minitab to compute the value of the K-S statistic for
MTB > # the data and proposed dist'n given in Problem 4.18 on p. 151 of
MTB > # G&C. Then I'll use tables to make a statement about the p-value.
MTB > # Finally, I'll give instructions for using StatXact to obtain an
MTB > # exact p-value.
MTB > set c1
DATA> 1.6 10.3 3.5 13.5 18.4 7.7 24.3 10.7 8.4 4.9 7.9 12.0 16.2 6.8 14.7
DATA> end
MTB >
MTB > # I'll store the data in a file so that I can then read it into StatXact.
MTB > write 'failtime' c1
Writing data to file: failtime.DAT
MTB > # In order to get a feel for the relationship between the data and the
MTB > # proposed dist'n, we can examine a Q-Q plot. I'll plot the ordered
MTB > # pairs
MTB > # ( x_(i), F_0^{-1}( i/(n+1) ) ).
MTB > # For the exponential dist'n under consideration, we have
MTB > # F_0^{-1}( i/(n+1) ) = 10*log( (n+1)/(n+1-i) ).
MTB > set c2
DATA> 1:15
DATA> end
MTB > let c3 = 10*loge( 16/(16-c2) )
MTB > name c1 'obs data' c3 'hyp e.v.'
MTB > # (Notes: (1) The hyp e.v. values are the approximated expected values for
MTB > # the order statistics from a sample of size 15 from the proposed (null hyp.)
MTB > # dist'n. (2) I must order the data values from smallest to largest in order
MTB > # to match them with the hyp e.v. values. (3) I could have used the invcdf
MTB > # command (along with the expo 10 subcommand) to obtain the desired inverse
MTB > # cdf values.)
MTB > sort c1 c1
MTB > # Here is the pertinent Q-Q plot.
MTB > width 61
MTB > height 23
MTB > plot c3 c1
-
-
28.0+ *
-
hyp e.v.-
-
-
21.0+ *
-
-
- *
-
14.0+ *
-
- *
- *
- *
7.0+ *
- *
- **
- *
- * *
0.0+ *
--------+---------+---------+---------+---------+---------+--obs data
4.0 8.0 12.0 16.0 20.0 24.0
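As a cross-check (not part of the original Minitab session), here is a sketch in Python of the same Q-Q computation, with scipy's `expon.ppf` playing the role of Minitab's invcdf/expo commands:

```python
import numpy as np
from scipy.stats import expon

data = np.array([1.6, 10.3, 3.5, 13.5, 18.4, 7.7, 24.3, 10.7,
                 8.4, 4.9, 7.9, 12.0, 16.2, 6.8, 14.7])
n = len(data)
i = np.arange(1, n + 1)

# c3 above: F_0^{-1}( i/(n+1) ) = 10*log( (n+1)/(n+1-i) ) for the
# hypothesized exponential dist'n having mean 10
hyp_ev = 10 * np.log((n + 1) / (n + 1 - i))

# equivalently, via the inverse cdf (cf. the invcdf / expo 10 note above)
hyp_ev_alt = expon.ppf(i / (n + 1), scale=10)

# the plotted Q-Q pairs: ( x_(i), F_0^{-1}( i/(n+1) ) )
pairs = list(zip(np.sort(data), hyp_ev))
```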
MTB > # There seems to be decent agreement between the plotted points and the
MTB > # comparison line (the line having slope 1 and intercept 0). But the small
MTB > # values are collectively larger than they should be (if the underlying
MTB > # dist'n is an exponential dist'n having mean 10) and the large values are
MTB > # collectively smaller than they should be. So while the mean may be close
MTB > # to 10, it appears that the standard deviation may be less than 10, and the
MTB > # true underlying dist'n may be relatively less stretched out than an exponential
MTB > # dist'n is. We can check on this by looking at the values of some summary
MTB > # statistics.
MTB > desc c1
N MEAN MEDIAN TRMEAN STDEV SEMEAN
obs data 15 10.73 10.30 10.38 6.01 1.55
MIN MAX Q1 Q3
obs data 1.60 24.30 6.80 14.70
MTB > let k90 = 1
MTB > exec 'skku'
Executing from file: skku.MTB
skewness 0.661732
kurtosis 0.427545
MTB > # (Note: The sample skewness is appreciably less than 2, which is
MTB > # the skewness of an exponential dist'n.)
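The code of the skku macro isn't shown, but its skewness value appears to be the usual adjusted (bias-corrected) sample skewness; here is a sketch in Python, under that assumption about what the macro computes:

```python
import numpy as np
from scipy.stats import skew

data = np.array([1.6, 10.3, 3.5, 13.5, 18.4, 7.7, 24.3, 10.7,
                 8.4, 4.9, 7.9, 12.0, 16.2, 6.8, 14.7])

# Adjusted sample skewness; matches the 0.661732 printed above, assuming
# that's the statistic the skku macro computes.
g1 = skew(data, bias=False)

# For reference: an exponential dist'n has skewness 2, so the sample value
# is appreciably below the null-hypothesis value.
```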
MTB > # Now I want to compute the value of the one-sample K-S test statistic. I'll
MTB > # put the values of F_0( x_(i) ) into c4, the values of i/n into c5, and the
MTB > # values of (i-1)/n into c6. Then I'll put the values of i/n - F_0( x_(i) )
MTB > # into c7 and the values of F_0( x_(i) ) - (i-1)/n into c8. The largest of
MTB > # the values in c7 and c8 will be the value of the test statistic (see p. 113
MTB > # of G&C).
MTB > cdf c1 c4;
SUBC> expo 10.
MTB > let c5 = c2/15
MTB > set c6
DATA> 0:14
DATA> end
MTB > let c6 = c6/15
MTB > let c7 = c5 - c4
MTB > let c8 = c4 - c6
MTB > print c7 c8
ROW C7 C8
1 -0.081190 0.147856
2 -0.161979 0.228645
3 -0.187374 0.254040
4 -0.226716 0.293383
5 -0.203654 0.270320
6 -0.146155 0.212822
7 -0.101623 0.168289
8 -0.109660 0.176326
9 -0.056991 0.123658
10 -0.032139 0.098806
11 -0.007426 0.074093
12 0.029926 0.036741
13 0.064565 0.002101
14 0.092151 -0.025484
15 0.088037 -0.021370
MTB > desc c7 c8
N MEAN MEDIAN TRMEAN STDEV SEMEAN
C7 15 -0.0693 -0.0812 -0.0697 0.1062 0.0274
C8 15 0.1360 0.1479 0.1363 0.1062 0.0274
MIN MAX Q1 Q3
C7 -0.2267 0.0922 -0.1620 0.0299
C8 -0.0255 0.2934 0.0367 0.2286
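The column arithmetic above can be replicated outside Minitab; here is a sketch in Python of the same computation of the K-S statistic:

```python
import numpy as np

data = np.sort(np.array([1.6, 10.3, 3.5, 13.5, 18.4, 7.7, 24.3, 10.7,
                         8.4, 4.9, 7.9, 12.0, 16.2, 6.8, 14.7]))
n = len(data)
i = np.arange(1, n + 1)

F0 = 1 - np.exp(-data / 10)    # c4: F_0( x_(i) ), exponential with mean 10
d_plus = i / n - F0            # c7: i/n - F_0( x_(i) )
d_minus = F0 - (i - 1) / n     # c8: F_0( x_(i) ) - (i-1)/n

# the K-S statistic: the largest of the values in c7 and c8
D = max(d_plus.max(), d_minus.max())
```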
MTB > # Looking at the MAX values, it can be seen that the value of the
MTB > # test statistic is 0.2934. From Table F on p. 565 of G&C it can be
MTB > # determined that we have
MTB > # 0.1 < p-value < 0.2.
MTB > # (Note: These values match the answers given for Problem 4.18 on
MTB > # p. 612 of G&C.) Since 15*d_15 is about 4.40, using Birnbaum's
MTB > # table it can be concluded that
MTB > # 0.05483 < p-value < 0.19725
MTB > # (since from the table we have P_0( D_15 < 4/15 ) = 0.80275 and
MTB > # P_0( D_15 < 5/15 ) = 0.94517, and so it follows that P_0( D_15 >=
MTB > # 0.2934 ) is some value between P_0( D_15 >= 5/15 ) = 0.05483 and
MTB > # P_0( D_15 >= 4/15 )= 0.19725). Upon combining the results, it
MTB > # can be concluded that
MTB > # 0.1 < p-value < 0.19725,
MTB > # which may seem a bit vague, but on the other hand it may be
MTB > # entirely sufficient for concluding that there is not strong
MTB > # evidence against the hypothesized distribution.
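The Birnbaum table values used above can also be obtained from scipy's exact dist'n of the two-sided K-S statistic D_n (the `kstwo` distribution, available in scipy >= 1.5); a sketch:

```python
from scipy.stats import kstwo

n, d = 15, 0.2934

# Birnbaum's bracketing values:
p_lo = kstwo.sf(5 / n, n)   # P_0( D_15 >= 5/15 ), about 0.05483
p_hi = kstwo.sf(4 / n, n)   # P_0( D_15 >= 4/15 ), about 0.19725

# and the exact p-value itself, with no bracketing needed:
p_exact = kstwo.sf(d, n)
```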
MTB > save 'failtime'
Saving worksheet in file: failtime.MTW
--------------------------------------------------------------------------------
StatXact Instructions for One-Sample K-S Test
Put the data into a column of the CaseData editor. Then (from the menus) select
Nonparametrics > One-Sample Goodness-of-Fit > Kolmogorov ...
When the box for the Kolmogorov-Smirnov test comes up, put the proper variable
into the Response box by clicking the arrow. Then under Distribution, select
Exponential from the Type menu, and type 10 into the mean box. Under Compute,
click to select Exact, and then finally click OK.
In the output, the first column under Statistic is for the two-sided test
(against the general alternative). It can be seen that the value of the
test statistic is 0.2934 (in agreement with what I got using Minitab), and
the exact p-value is 0.1224 (and the asymptotic p-value is 0.1511).
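For those without StatXact, scipy's `kstest` can reproduce both numbers; a sketch, using the exact method and specifying the exponential dist'n with mean 10 via its scale parameter:

```python
from scipy.stats import kstest

data = [1.6, 10.3, 3.5, 13.5, 18.4, 7.7, 24.3, 10.7,
        8.4, 4.9, 7.9, 12.0, 16.2, 6.8, 14.7]

# Two-sided one-sample K-S test against an exponential dist'n with mean 10
# (scipy parameterizes the exponential by (loc, scale), with scale = mean).
res = kstest(data, 'expon', args=(0, 10), method='exact')

# res.statistic should be 0.2934, and res.pvalue should agree with
# StatXact's exact p-value of 0.1224.
```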
----------------------------------------------------------------------------------
Doing Lilliefors's Test
StatXact won't do Lilliefors's test about an exponential dist'n (although it will do
Lilliefors's test of normality), but it can still be used to obtain the value of the
test statistic.
Put the data into a column of the CaseData editor. Then (from the menus) select
Basic_Statistics > Descriptive Statistics ...
When the Descriptive Statistics box comes up, put the variable into the Selected
Variables box. Since the sample mean is among the default selections, there may
be no need to select it (but click to do so if it's not preselected). Then click OK.
Note that the sample mean is 10.73 so that you can enter it when you do the K-S test.
Now select
Nonparametrics > One-Sample Goodness-of-Fit > Kolmogorov ...
When the box for the Kolmogorov-Smirnov test comes up, put the proper variable
into the Response box by clicking the arrow. Then under Distribution, select
Exponential from the Type menu, and enter the sample mean into the mean box.
Then click OK.
The p-value in the output isn't correct for Lilliefors's test. But we can note
that the test statistic value is 0.2694 and compare it to the critical values given
in Table T of G&C (p. 598). Annoyingly, n=15 isn't covered by the table. If we use
n=14, we get that the p-value is between 0.05 and 0.10. If we use n=16, we get that
it is between 0.01 and 0.05. To play it safe, you can state
0.01 < p-value < 0.1,
but this is pretty vague! (My guess is that the p-value is around 0.05. This is
partially supported by another table I have for this version of Lilliefors's test
(it isn't as accurate, but I'll give you a copy anyway since it covers some sample
sizes not included in the table in G&C), which gives the 0.05 critical value for
the n = 15 case as 0.269, which is the value of the test statistic rounded to the
nearest thousandth.)
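Lilliefors's statistic can also be computed directly; here is a sketch in Python, using the rounded sample mean 10.73 (which is what gets typed into StatXact above, and which reproduces the 0.2694):

```python
import numpy as np

data = np.sort(np.array([1.6, 10.3, 3.5, 13.5, 18.4, 7.7, 24.3, 10.7,
                         8.4, 4.9, 7.9, 12.0, 16.2, 6.8, 14.7]))
n = len(data)
i = np.arange(1, n + 1)

xbar = 10.73                   # sample mean (rounded, as entered into StatXact)
F0 = 1 - np.exp(-data / xbar)  # fitted exponential cdf

# same max-of-two-columns computation as for the ordinary K-S statistic
D = max((i / n - F0).max(), (F0 - (i - 1) / n).max())

# NB: because the mean was estimated from the data, the null dist'n of D is
# NOT the ordinary K-S dist'n; D must be referred to Lilliefors critical
# values (e.g. Table T of G&C), not to Table F.
```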
Note: I find it rather odd that, comparing the p-value from the K-S test done above
with the p-value from Lilliefors's test, we have stronger evidence that the data
didn't come from *any* exponential distribution than we have that it didn't come
from an exponential distribution having a mean of 10.