StatXact Hints
Below are some comments about using StatXact.
In some cases, if you try to obtain an exact p-value with
StatXact, you will get a message indicating that the sample
size(s) is/are too large for an exact computation. In such cases,
rather than rely on an asymptotic p-value (one based on a
large-sample approximation, such as a normal or chi-square
approximation), I think it's good to use the Monte Carlo option
of StatXact.
Before using StatXact's Monte Carlo option, I think it's a
good idea to change the "settings" to things other than the default
values.
(Plus, to make it easier for me to grade the
homework, I want everyone to use the same settings that I do, so that we
all get the same answer --- if the default settings are used, it would
be possible for students to obtain different answers when using the
Monte Carlo option.)
To change the settings:
- Pull down the Options menu on the main bar, then choose Monte Carlo.
- Change the Monte Carlo Sample Size from 10,000 to 100,000.
- Click on Fixed for Random Number Seed (which deactivates the use of
the Clock), and go with the default Fixed value of 23456.
- Next, change Frequency of Intermediate Display to 10,000.
- Finally, uncheck Use Importance Sampling where available, check
Save Monte Carlo Parameters Permanently, and click OK.
(Note: In general, it may be a good idea to use
importance sampling, but when I explain in class how the Monte
Carlo option works, I'll be referring to the simple version which
does not use importance sampling, and so let's just use the simple
version for the computation of p-value estimates for the homework.)
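For what it's worth, the simple Monte Carlo scheme (the version without importance sampling) can be sketched in a few lines of Python. This is only an illustration --- the function name and the rank-sum statistic are my choices, and NumPy's random generator is of course not StatXact's --- but it shows the idea: shuffle the pooled ranks, recompute the statistic, and report the proportion of shuffles at least as extreme as what was observed. The defaults mirror the settings suggested above (100,000 samples, seed 23456).

```python
import numpy as np
from scipy.stats import rankdata

def mc_pvalue(x, y, n_mc=100_000, seed=23456):
    """Monte Carlo estimate of the upper-tail p-value for the
    rank sum of sample x (simple version, no importance sampling)."""
    rng = np.random.default_rng(seed)
    ranks = rankdata(np.concatenate([x, y]))  # Wilcoxon mid-ranks
    n = len(x)
    t_obs = ranks[:n].sum()
    # count shuffles whose statistic is at least as large as the observed one
    hits = sum(rng.permutation(ranks)[:n].sum() >= t_obs
               for _ in range(n_mc))
    return hits / n_mc
```

With a fixed seed, everyone running the same code gets the same estimate, which is exactly why I want the Fixed seed used in StatXact.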
I presented this test in class. A description of a very similar
test can be found on the
bottom half of p. 88 of Miller's Beyond ANOVA: Basics of Applied
Statistics.
Miller's version gives equal weight to each sample in the test statistic
through the use of the average rank for each sample, whereas the
version that we can easily do using StatXact gives equal weight
to each observation through the use of rank sums for each sample. When
the sample sizes are equal, both versions of the test are equivalent,
but still the null mean and variance from StatXact will differ
from what is given in Miller.
To do this test, enter your data like you would for the J-T test (i.e.,
on the CaseData spreadsheet, in one column put the group indicators, and put the data values into
another column). The group indicators should go from 1 to k in
an order corresponding to the monotone alternative.
Then, convert the CaseData to TableData by bringing the CaseData
spreadsheet to the foreground, clicking on
the CaseData menu, selecting Convert to TableData, and
then putting the group indicator variable as the rows and the observed
responses as the columns, while also indicating that you want both
Row and Column
Scores, with the column scores set to the Wilcoxon
(Mid-Rank) variety. Finally, click OK.
Then go to the K Independent Samples portion of StatXact's
Statistics menu
and select
Linear-by-linear Association. Click in the proper variables for
the Population and Response,
then click on Exact, and finally
OK.
If you use the data in Table 6.6 on p. 205 of H&W, you should get a
p-value of about 0.0198 (which is smaller than the value of about 0.0210
that results from the J-T test). If instead of using 1, 2, and 3 for the group
indicators, you use 1, 2, and 4, then the p-value is about 0.0173. The
"weights" of 1, 2, and 4 for the groups rewards you with a smaller
p-value because the last group is the one that's the most different. Using
"weights" of 1, 4, and 5 results in a larger p-value (about 0.0322). By
playing around with the weights you can get a variety of p-values, but
the most commonly used version of the test would just use weights of 1,
2, ..., k. The easiest way to change the weights is to change
the row scores on the table data layout.
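The role of the row scores ("weights") can be seen from the form of the statistic: with Wilcoxon mid-rank column scores, the linear-by-linear statistic reduces to a weighted sum of the groups' rank sums. A small sketch (the function name is mine):

```python
import numpy as np
from scipy.stats import rankdata

def lin_by_lin_stat(values, groups, row_scores):
    """Linear-by-linear association statistic with Wilcoxon mid-rank
    column scores: T = sum over groups of u_i * (mid-rank sum of
    group i), where u_i is the row score ("weight") for group i."""
    ranks = rankdata(values)              # mid-ranks of all observations
    labels = np.unique(groups)            # group indicators 1, 2, ..., k
    return sum(u * ranks[groups == g].sum()
               for g, u in zip(labels, row_scores))
```

Changing the row scores from (1, 2, 3) to, say, (1, 2, 4) changes the statistic (and so the p-value) in just the way described above.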
If one runs a
Linear-by-linear Association test using the "raw" data
(before converting to the table with the (mid)rank scores), then it's
like a permutation version of the test.
If you use the data in Table 6.6 on p. 205 of H&W, you should get a
p-value of about 0.0169.
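This raw-data version can be sketched as a Monte Carlo permutation test in which the raw values themselves play the role of the column scores (names, defaults, and the upper-tail convention below are my choices, not StatXact's):

```python
import numpy as np

def raw_perm_pvalue(values, groups, weights, n_mc=100_000, seed=23456):
    """Upper-tail Monte Carlo p-value for the raw-data (permutation)
    version: T = sum over groups of w_i * (sum of raw values in
    group i), recomputed over random shuffles of the observations."""
    rng = np.random.default_rng(seed)
    labels = np.unique(groups)
    def stat(v):
        return sum(w * v[groups == g].sum()
                   for g, w in zip(labels, weights))
    t_obs = stat(values)
    hits = sum(stat(rng.permutation(values)) >= t_obs
               for _ in range(n_mc))
    return hits / n_mc
```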
- Go to CaseData, and enter all of the observation values into one
column/variable, and in an adjacent column/variable, enter a 1 beside
each observation of the control sample, and
enter a 2 beside
each observation belonging to any of the treatment samples (whether it
be sample 2, 3, ..., or k).
- Then use these two columns/variables to do a W-M-W test.
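The same two-column comparison can also be reproduced outside StatXact, e.g. with SciPy's exact W-M-W test (the data below are made up for illustration):

```python
import numpy as np
from scipy.stats import mannwhitneyu

# hypothetical data: the control sample, and all treatment
# samples (2, 3, ..., k) pooled into one group
control = np.array([12.1, 10.8, 11.5])
treatments = np.array([13.0, 14.2, 12.9, 15.1])

# one-sided test that the treatment responses tend to be larger
res = mannwhitneyu(control, treatments,
                   alternative='less', method='exact')
print(res.statistic, res.pvalue)
```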
Although StatXact doesn't include the S-D-C-F procedure on its
menus, it can be used to do most of the grubby computations for you.
You can use StatXact to do k choose 2 W-M-W tests, and the
Standardized values reported near the top of the W-M-W output are
the z-scores that correspond to the expression in the brackets of
(6.62) on p. 241 of H&W. If you multiply the
Standardized values by the square root of 2, then you'll have the
W*ij values needed for the S-D-C-F
procedure (and you can complete the procedure by making use of the
tables in H&W).
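As a sketch of the arithmetic: the "Standardized" value should be the rank sum of one sample, centered and scaled by the standard null mean and variance (the sketch below ignores any tie correction, and the function name is mine):

```python
import numpy as np
from scipy.stats import rankdata

def w_star(x, y):
    """W*_ij for the S-D-C-F procedure: sqrt(2) times the
    standardized W-M-W statistic, using the null mean and variance
    of the rank sum of sample y (no tie correction)."""
    n, m = len(x), len(y)
    ranks = rankdata(np.concatenate([x, y]))
    w = ranks[n:].sum()                      # rank sum of sample y
    null_mean = m * (n + m + 1) / 2
    null_var = n * m * (n + m + 1) / 12
    return np.sqrt(2) * (w - null_mean) / np.sqrt(null_var)
```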
The test is on the K Related Samples portion of StatXact's
Statistics menu. To perform the test, put the observations for
each of the k treatments in a separate variable/column of the
CaseData spreadsheet. (The observations must be put into the
columns in the proper order, with the rows of the spreadsheet
corresponding to the blocks.) Then select Page's test from the
Statistics menu, and click each of the k variables into
the Populations/Treatments box, in the proper order. (They should
be listed down the box in the order corresponding to the monotone
alternative.) Then click on Exact, and finally
OK. (If the number of blocks is too large, then StatXact
may not be able to do an exact test, in which case you should choose the
Monte Carlo option, following the guidelines given above.)
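For reference, Page's statistic itself is easy to compute in a few lines (the function name is mine):

```python
import numpy as np
from scipy.stats import rankdata

def page_L(data):
    """Page's L statistic. `data` is a (blocks x treatments) array,
    with the treatment columns ordered to match the monotone
    alternative. L = sum over j of j * R_j, where R_j is the sum
    of the within-block ranks for treatment j."""
    within_block_ranks = np.apply_along_axis(rankdata, 1, data)
    col_sums = within_block_ranks.sum(axis=0)
    return (np.arange(1, data.shape[1] + 1) * col_sums).sum()
```

(Recent versions of SciPy also provide scipy.stats.page_trend_test, which appears to compute the same L.)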
As usual, StatXact does not let you specify the direction for the
one-sided test that it does. It will report either a p-value which
corresponds to the treatment effect monotonically increasing with the
order in which the treatments are listed, or
a p-value which
corresponds to the treatment effect monotonically decreasing with
that order, whichever one is smaller.
So you have to take the responsibility of checking to make sure that the
p-value given for the one-sided test is the one that you want. (As
usual, the p-value for the other one-sided test can be obtained
from the StatXact output with just a little bit of work.)
Go to CaseData, and enter the x values into one
column/variable, and enter
the y values
in an adjacent column/variable, making sure that each bivariate pair
occupies the same row.
Then go to the Ordinal Response portion of StatXact's
Statistics menu (the INFERENCE FOR MEASURES OF ASSOCIATION PART)
and select
Kendall's Tau & Somers' D. Next, click in the two variables,
click on Exact, and finally click
OK.
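If you want an independent number to compare against, Kendall's tau can also be computed with SciPy (the data below are made up; SciPy reports the point estimate and, by default, a p-value based on a large-sample approximation):

```python
import numpy as np
from scipy.stats import kendalltau

# hypothetical bivariate sample, one (x, y) pair per row of CaseData
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 1, 3, 5, 4])

tau, p = kendalltau(x, y)  # point estimate and p-value
```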
The point estimate given is fine, and the exact p-value is fine, but I
have a HUGE problem with the asymptotic p-value. (I spotted this
apparent mistake 3 years ago when using version 4 of
StatXact, but unlike when I reported some mistakes in version
3, I never got around to reporting the mistakes that I found in
version 4 (there were others as well). I had hoped that when
version 5 came out, others would have discovered the mistakes, but
apparently if they did, they didn't report them either. One would hope
that the Cytel people would have tested the software better.)
So that I can more easily focus the attention of the Cytel people on the
mistakes, I'm going to put them
here, on a separate web page.