horse racing example

MTB > # I'll first use Minitab to do approximate chi-square tests, and then
MTB > # I'll give instructions for doing an exact version of Pearson's test
MTB > # using StatXact.

MTB > # I'll put the observed counts into c1. (Since all of the expected
MTB > # counts equal 18, there is no need to store them in a column.) I'll
MTB > # put the value of Pearson's statistic into k1, the asym. GLR test
MTB > # statistic into k2, and the corresponding p-values into k3 and k4.

MTB > set c1
DATA> 29 19 18 25 17 10 15 11
DATA> end

MTB > let k1 = sum( (c1 - 18)*(c1 - 18)/18 )
MTB > let k2 = 2*sum( c1*loge( c1/18 ) )
MTB > cdf k1 k3;
SUBC> chisquare 7.
MTB > let k3 = 1 - k3
MTB > cdf k2 k4;
SUBC> chisquare 7.
MTB > let k4 = 1 - k4
MTB > name k1 'q' k2 'GLR stat' k3 'Q p-val' k4 'GLRp-val'
MTB > print k1-k4

q 16.3333
GLR stat 16.1381
Q p-val 0.0222052
GLRp-val 0.0238480

MTB > # Since the tests are only approximate, I'll report the p-values using
MTB > # just two significant digits. For Pearson's test we have 0.022, and
MTB > # for the GLR test we have 0.024.
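
As a cross-check on the Minitab session above, here is a minimal Python sketch (Python is my
choice for illustration; it is not part of the original session). It recomputes both
statistics from the same counts and gets the chi-square upper-tail probabilities from the
closed form that holds for 7 degrees of freedom. The helper name chi2_sf_df7 is hypothetical.

```python
import math

observed = [29, 19, 18, 25, 17, 10, 15, 11]   # counts stored in c1 above
expected = sum(observed) / len(observed)       # 144 / 8 = 18 for every position

# Pearson's statistic (k1 above) and the GLR statistic (k2 above).
q = sum((o - expected) ** 2 / expected for o in observed)
glr = 2 * sum(o * math.log(o / expected) for o in observed)


def chi2_sf_df7(x):
    """Upper-tail probability of a chi-square distribution with 7 df.

    For odd df the tail has a closed form; with 7 df it is
    erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * (1 + x/3 + x^2/15).
    """
    poly = 1 + x / 3 + x ** 2 / 15
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * poly)


p_q = chi2_sf_df7(q)       # asymptotic p-value for Pearson's test
p_glr = chi2_sf_df7(glr)   # asymptotic p-value for the GLR test

print(f"q = {q:.4f}, GLR stat = {glr:.4f}")
print(f"Q p-val = {p_q:.5f}, GLR p-val = {p_glr:.5f}")
```

To two significant digits these p-values should match the 0.022 and 0.024 reported above.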

_______________________________________________________________________________
-------------------------------------------------------------------------------

To do an exact version of Pearson's chi-square test using StatXact, I need to
enter the data in a 1 by 8 table, and also give a "score" for each column of
the table corresponding to either the null hypothesis probability or the
expected count under the null hypothesis.

I can create an appropriate 1 by 8 table using

File > New > Table Data

and upon clicking OK, in the Table Settings box that appears I indicate that I want
1 Table,
with 1 Row
and 8 Columns,
and under Scores I click to check the Column box, and then I click OK.

Next I enter the counts (see the data set) 29 through 11 across Row 1. Then I move
down and enter 0.125 into each of the column Score boxes. Once the 8 counts and the
8 scores are all entered, I select the desired test using

Nonparametrics > One-Sample Goodness-of-Fit > Chi-Square

and in the Chi-Square Test box that appears, I click on Probability under Column Scores
and I click on Exact under Compute, and then I click OK.

After a second or two (less on a fast computer), the output should appear. The test statistic
value of 16.33 agrees with what I got using Minitab. The asymptotic p-value, which oddly is
labeled a "2-Sided P-Value," is given as 0.02224, which is a bit different from the value of
about 0.02221 obtained with Minitab. Using R to obtain an asymptotic p-value, I get 0.02224, in
agreement with StatXact, and so we might conclude that Minitab's value is a tad inaccurate. The
exact p-value is given as 0.02231.

Upon rounding to two significant digits, the asymptotic and exact p-values are in agreement,
and if we use three significant digits, they differ only slightly (0.0222 vs. 0.0223). So
in this case, where the expected counts are all relatively large (18 each), the approximate
p-value is very good.
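
If StatXact isn't available, the exact p-value can be approximated by simulation. The
following is only a sketch under my own assumptions: it is a plain Monte Carlo estimate, not
the exact algorithm StatXact uses, and the function name, the seed, and the 20,000 trials are
arbitrary illustrative choices. Each trial draws 144 winners with all 8 positions equally
likely and checks whether Pearson's statistic is at least as large as the observed one.

```python
import random

observed = [29, 19, 18, 25, 17, 10, 15, 11]
n_races = sum(observed)              # 144
n_positions = len(observed)          # 8
expected = n_races // n_positions    # 18 (divides evenly, so integers suffice)


def sum_sq_dev(counts):
    """Sum of squared deviations from 18; Pearson's statistic is this over 18."""
    return sum((c - expected) ** 2 for c in counts)


def monte_carlo_pearson_p(n_sims=20000, seed=1):
    """Estimate P(Q >= observed Q) when all positions are equally likely."""
    rng = random.Random(seed)
    ss_obs = sum_sq_dev(observed)    # 294, i.e. Q = 294 / 18 = 16.33
    hits = 0
    for _ in range(n_sims):
        counts = [0] * n_positions
        for pos in rng.choices(range(n_positions), k=n_races):
            counts[pos] += 1
        # Comparing integer sums avoids floating-point ties.
        if sum_sq_dev(counts) >= ss_obs:
            hits += 1
    return hits / n_sims


p_hat = monte_carlo_pearson_p()
print(p_hat)
```

With this many trials the estimate is only good to about two significant digits, so expect
something in the neighborhood of the exact value 0.02231 rather than a match.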

------------------------------------------------------------------------------------------------

We can also use StatXact to do the K-S test based on a discrete uniform distribution. (Usually
the K-S test is thought of as being used with continuous distributions.) To do the K-S test,
we need to put the data in the case data editor (instead of the table data editor used for the
chi-square test). To do this use
File > New...
and then select Case Data and click OK. Then enter the values 1, 2, 3, ..., 7, and 8 in the
1st column, and the counts 29, 19, 18, ..., 15, and 11 in the 2nd column. Then use
Nonparametrics > One-Sample Goodness-of-Fit > Kolmogorov...
Use the arrows to click Var 1 into the Response box, and Var 2 into the Frequency box. Under
Distribution, select Uniform Discrete for Type, and enter 1 and 8 as the Minimum and Maximum values.
StatXact doesn't seem to be able to do an exact computation of the p-value for this test, so
under Compute select Exact Using Monte Carlo, and then click OK.
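
To see what this test is computing, here is a rough Python sketch under my own assumptions. It
takes the K-S statistic to be the largest absolute gap between the observed cumulative
proportions and the discrete uniform cdf, and it estimates the p-value by simulating from the
null distribution. StatXact may scale the statistic differently and certainly uses its own
Monte Carlo routine; the function names, seed, and 20,000 trials here are illustrative only.

```python
import random
from itertools import accumulate

observed = [29, 19, 18, 25, 17, 10, 15, 11]   # winners from positions 1..8
n_races = sum(observed)                        # 144
n_positions = len(observed)                    # 8
per_position = n_races // n_positions          # 18 expected per position


def max_cum_dev(counts):
    """n * D: largest |cumulative observed count - cumulative expected count|."""
    return max(abs(cum - per_position * (i + 1))
               for i, cum in enumerate(accumulate(counts)))


dev_obs = max_cum_dev(observed)   # attained at position 4: |91 - 72|
d_obs = dev_obs / n_races         # K-S distance on the proportion scale


def monte_carlo_ks_p(n_sims=20000, seed=1):
    """Estimate P(D >= observed D) under the discrete uniform null."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        counts = [0] * n_positions
        for pos in rng.choices(range(n_positions), k=n_races):
            counts[pos] += 1
        if max_cum_dev(counts) >= dev_obs:   # integer comparison, no rounding issues
            hits += 1
    return hits / n_sims


ks_p_hat = monte_carlo_ks_p()
print(dev_obs, round(d_obs, 4), ks_p_hat)
```

Working with counts rather than proportions keeps the comparison exact, since D can only take
values that are multiples of 1/144.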

In perhaps 30 seconds to a minute you should get some output (even though while you're waiting you
might get the impression that something is wrong). Assuming you had previously set things up so
that 1,000,000 Monte Carlo trials with a seed of 23456 were used, in the 1st column of the output
you should get a Monte Carlo estimate of the exact p-value as being 0.004608. Because an interval
estimate of the exact p-value is (0.004434, 0.004782), it might be best to state that the exact
p-value is estimated to be less than 0.005.
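
The interval above is consistent with a 99% normal-approximation confidence interval for a
proportion estimated from 1,000,000 trials. The 99% level is my inference from the numbers,
not something stated in the StatXact output as quoted here, so treat this Python sketch as a
plausibility check rather than a description of what StatXact does.

```python
from statistics import NormalDist

p_hat = 0.004608        # Monte Carlo estimate quoted above
n_trials = 1_000_000    # number of Monte Carlo trials quoted above
conf_level = 0.99       # ASSUMPTION: level inferred from the quoted interval

# Normal-approximation interval for a binomial proportion.
z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
half_width = z * (p_hat * (1 - p_hat) / n_trials) ** 0.5
lower, upper = p_hat - half_width, p_hat + half_width

print(f"({lower:.6f}, {upper:.6f})")
```

Since the upper limit is below 0.005, reporting "less than 0.005" is a safe summary.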

The p-value from the K-S test is smaller than the one from Pearson's chi-square test. This is
because the K-S test focuses on the fact that it would be highly unusual to have more than 63%
of the winners come from positions 1 through 4 if all 8 positions were equally favorable.
(Pearson's test gives a somewhat small p-value because the observed proportions of winners from
the 8 positions are collectively different enough from 1/8 each to be somewhat unlikely if all
8 positions are equally favorable. The K-S test statistic further acknowledges that it'd be
rather unlikely to have all 4 of the outer positions (5 through 8) have fewer than the average
number of winners (and have so many of the winners coming from the inner 4 positions) if all
8 positions are equally favorable.)

(Note: The asymptotic p-value from the K-S test is not reliable when the hypothesized
distribution is discrete.)
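
To illustrate the note, this Python sketch evaluates the classical Kolmogorov limiting
formula, which is derived for continuous distributions, at the K-S distance for these data.
It assumes the statistic is the largest gap between the observed cumulative proportions and
the discrete uniform cdf. StatXact's own asymptotic value isn't quoted in these notes, so the
number printed here is only my illustration of how far such an approximation can drift.

```python
import math
from itertools import accumulate

observed = [29, 19, 18, 25, 17, 10, 15, 11]
n = sum(observed)                 # 144
k = len(observed)                 # 8

# K-S distance: largest gap between the empirical cdf and the uniform cdf i/8.
d = max(abs(cum / n - (i + 1) / k)
        for i, cum in enumerate(accumulate(observed)))


def kolmogorov_sf(t, terms=20):
    """Limiting P(sqrt(n) * D >= t) for a continuous null distribution."""
    return 2 * sum((-1) ** (j - 1) * math.exp(-2 * j * j * t * t)
                   for j in range(1, terms + 1))


p_asym = kolmogorov_sf(math.sqrt(n) * d)
print(round(d, 4), round(p_asym, 4))
```

The result is roughly 0.013, close to three times the Monte Carlo estimate of about 0.0046
reported above.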