Some Comments about Chapter 9 of Hollander & Wolfe
In the first part of Ch. 9, we'll find connections with the
material of Ch. 8 (in particular, Kendall's tau, and Mann's test for
trend). For some simple things, we can make use of
StatXact, but for other things (e.g., the multiple regression
covered in the second part of Ch. 9) Minitab may be the most
convenient software to use. (Since the
Minitab developers have connections with the Penn State
researchers who have contributed a lot to rank-based regression,
Minitab includes some rank-based regression, while most other
statistical software does not.)
Like H&W (in Sec. 9.7), I'll also give a brief
description of other types of nonparametric regression. Tree-based
regression (which can be done using CART or S-Plus) is a
form of nonparametric regression, but there isn't enough time to cover
it in STAT 657. (I covered the use of CART for nonparametric
classification and regression in
this past summer's STAT 789 course.) Prior to Sec. 9.7, the
material in Ch. 9 is based on the assumption of a particular form for a
linear regression model --- the nonparametric aspect is due to the fact
that one doesn't have to assume a parametric model for the error term
distribution. Tree-based regression, and methods referred to in Sec.
9.7, don't assume that the form of the regression model is known --- they
can be quite useful in helping one to uncover the nature of the
relationships between various variables.
Section 9.1
This section deals with a simple situation. We can make use of Mann's
test for trend (see Sec. 8.1, although H&W don't put much emphasis on
Mann's test) to do a hypothesis test about the slope in a simple
regression model.
I'll offer some specific comments
about the text below.
- p. 416, Data
- Note that, for convenience, H&W assume that
x1 < x2 < ... < xn-1 < xn.
If the data are presented with the xi in some other order, one can
simply relabel the points. But it is being assumed that none of the
xi values are the same. (The situation of ties among the xi isn't
treated in this section, although a reference to an article is given
on p. 418.)
- p. 416, Assumption A1
- I refer to the ei as the error terms. They produce the variation of
the yi about the regression line. The ei values can be due to
measurement error and/or natural variation in what is being observed
(i.e., for a given value of x, there can be more than one possible
value of what is being observed, even if there is no measurement
error). In some settings there can be practically no measurement
error, and the variation is just due to population differences, but
it's still common to refer to the ei as the error terms.
- p. 416, Assumption A2
- Note that the ei are taken to be iid random variables (H&W are using
lower case here to refer to the random variables as well as their
observed values) having a distribution with median 0 (as opposed to
mean 0, as is often assumed in other descriptions of regression).
- p. 416, Procedure
- The Di of (9.3) are the residuals, viewed as random variables.
- p. 417, Large-Sample Approximation
- When the null hypothesis is true, Di = ei, and so
Dj - Di = ej - ei. Whether or not the error term distribution is
symmetric, the distribution of ej - ei is symmetric about 0 (since
it's the difference of two iid random variables), and so Dj - Di is
equally likely to assume a positive value or a negative value. So
(making note of (9.5) on p. 416) E0(Dj - Di) = 0, and so (making note
of (9.4) on p. 416) E0(C) = 0.
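To make the procedure concrete, here is a minimal Python sketch of C
from (9.4) and its large-sample standardization. The function name,
the choice of a two-sided p-value, and the no-ties null variance
n(n-1)(2n+5)/18 (the Kendall-statistic variance, in line with the
Sec. 8.1 connection noted below) are my own choices; ties are not
handled:

```python
import numpy as np
from scipy.stats import norm

def theil_test(x, y, beta0=0.0):
    """Approximate test of H0: slope = beta0 in y = alpha + beta*x + e.

    A sketch in the spirit of Sec. 9.1; assumes distinct x values and
    no ties among the residuals.
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    order = np.argsort(x)                   # relabel so x1 < x2 < ... < xn
    d = (y - beta0 * x)[order]              # residuals, the D_i of (9.3)
    n = len(d)
    # C of (9.4): sum of sign(D_j - D_i) over all pairs with i < j
    C = sum(np.sign(d[j] - d[i]) for i in range(n) for j in range(i + 1, n))
    var0 = n * (n - 1) * (2 * n + 5) / 18   # null variance, no ties
    z = C / np.sqrt(var0)
    return C, z, 2 * norm.sf(abs(z))        # two-sided approximate p-value
```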
- p. 418, Example 9.1
- The description of the double ratio may be hard to understand at
first. First of all, it should be realized that, even with no seeding,
the rainfall in the target area can differ from the rainfall in the
control area. That is, the T/Q ratio can differ from 1
with no seeding done over the target area. So, to study the effect of
cloud seeding, one could compare the value of the
T/Q ratio for the seeded periods (periods during which the
clouds over the target area are seeded; the clouds over the
control area are never seeded) to the value of the
T/Q ratio for the unseeded periods. (Note that over the
course of the year, sometimes seeding is done, and sometimes it's not.
The amounts of rainfall are kept track of in both areas for both seeded
periods and unseeded periods, and one double ratio value is computed to
reflect the effect of seeding for each year.) Finally, in the
expression
[T/Q (seeded)]/[T/Q (unseeded)]
on p. 418, it may appear that the (seeded) and (unseeded) designations
go with the Q, contradictory to the book's indication that the
control area is never seeded, while the target area is the one that is
sometimes seeded and sometimes unseeded. Really, the
(seeded) and (unseeded) designations go with the whole
T/Q ratio.
- p. 419, Example 9.1
- Consider the work shown for this example, and note the similarity
with the Sec. 8.1 procedures. (This similarity is noted in
Comment 2 on p. 420.)
C from Sec. 9.1 is the same as K from Sec. 8.1 if we replace the
(xi, yi) pairs of Sec. 8.1 with the (xi, di) pairs of Sec. 9.1, and
note that xj - xi is positive whenever j > i.
- p. 419, Example 9.1
- It is noted that the approximate p-value of 0.071 differs a bit
from the exact p-value of about 0.117. But if a continuity correction
(in this case of 1, changing the -6 to a -5) is used, the approximate
p-value isn't so bad --- it's 0.110 --- even though n is only 5.
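As a quick check of that arithmetic (n = 5 gives a null variance of
5(4)(15)/18, so a standard deviation of about 4.08):

```python
from math import sqrt
from scipy.stats import norm

n, C = 5, -6
sd = sqrt(n * (n - 1) * (2 * n + 5) / 18)   # about 4.082
print(norm.cdf(C / sd))                     # about 0.071 (no correction)
print(norm.cdf((C + 1) / sd))               # about 0.110 (correction of 1)
```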
- p. 419, Comment 1
- The distribution of ej - ei is symmetric about 0 (since it's the
difference of two iid random variables), and so the median of
ej - ei equals 0. Since Yj - Yi is equal to ej - ei plus a constant
(namely beta(xj - xi), from the model (9.1)), the median of the
distribution of Yj - Yi is just that constant.
- p. 420, Comment 2
- Note that when the null hypothesis value of the slope is equal to
0, the test about the slope reduces to Kendall's test (or Mann's test
for trend if the xi are considered to be nonrandom).
This being the case, one could get a small p-value due to a strong
monotone relationship that isn't a linear relationship.
So, as is typically the case with regression, one has to examine the
data and confirm that a linear model makes sense. If one examines the
squirrel monkey data from Table 9.3 on p. 421, while it seems clear that
there is a strong relationship between the two variables, it's not clear
that it's a linear relationship. (Note: I plan to give you a
scatter plot of the data, so you may not want to bother to produce such a
plot.) Where Problem 4 on pp. 420-421
instructs one to "test for the presence of a linear relationship between
these two measurements" it should be kept in mind that the test of Sec.
9.1 can only be used for this purpose if (9.1) on p. 416 is assumed to
hold. I think that it may be better to treat the data as being
observations from a bivariate distribution, and use tests from Ch. 8 to
reject the null hypothesis of independence and suggest that there is a
positive association between the two variables. Then if appropriate
graphics (e.g., a residual plot) suggest that a linear relationship may
be appropriate, one may decide to assume that the model given by (9.1)
holds.
- p. 420, Comment 3
- H&W indicates that Mann's test for trend can be viewed as a special
case of the test of Sec. 9.1 about the slope, provided that the
xi represent the time order and the null value of the slope is 0.
But I think the connection with Mann's test need not be considered to
be limited in this way. One can obviously use something other than
time order with Mann's test (as long as one orders according to the
x values, it doesn't matter whether x represents time, distance, or
something else). Also, the Di can play the role of the Yi in Mann's
test, since the Di are iid if the true slope is equal to the null
hypothesis value, whether the null value is 0 or something else.
Section 9.2
This is a very short section.
I'll offer some specific comments
about the text below.
- p. 421, Procedure
- Note that N is equal to n choose 2.
- pp. 421-422, Procedure
- It's disappointing that H&W doesn't point out a connection between
the slope estimate of Sec. 9.2 and the test about the slope presented
in Sec. 9.1. Basically, the point estimate of the slope is a value
which is as compatible with the data as possible, when the test of Sec.
9.1 is used to judge compatibility. Specifically, if we use the point
estimate as the null hypothesis value of the slope, and do a two-sided
test, the p-value is 1. (Because of this, the point estimate of Sec. 9.2 has
similarities to the Hodges-Lehmann estimates of previous chapters.)
It can be determined that this is true by examination of the equation 3
lines from the bottom on p. 419, noting that the number of positive
dj - di will be equal to the number of negative dj - di, resulting in
a value of 0 for the test statistic.
(I'll go over this in class; a small computational sketch of the
estimate is given below.)
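For reference, here is a minimal Python sketch of the Theil point
estimate (the median of the N = n choose 2 pairwise slopes; the
function name is my own):

```python
import numpy as np

def theil_slope(x, y):
    """Median of the pairwise slopes S_ij = (y_j - y_i)/(x_j - x_i).

    Assumes the x values are distinct.
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i in range(n) for j in range(i + 1, n)]
    return np.median(slopes)
```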
- p. 422, Comment 4
- It is noted that Dietz claims that there are nice aspects of
the Theil estimator. Rand Wilcox also reports good things about the
estimator, which is covered in his 2001 book Fundamentals of Modern
Statistical Methods: Substantially Improving Power and Accuracy.
- p. 422, Comment 5
- While it is well known that the least squares estimator is
sensitive to outliers, it is less well known that some robust
alternatives can also produce a bad estimate given that an extreme
outlier occurs at a position where it has great influence. The Theil
estimator, being the median of the pairwise slope estimates, is very
resistant to outliers (it has a high breakdown point). One
really wild value, no matter where it occurs, cannot ruin the Theil
estimate.
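To illustrate this resistance, here is a toy example (hypothetical
data; it uses the theil_slope sketch given above):

```python
import numpy as np

x = np.arange(10, dtype=float)
y = 2.0 * x + 1.0        # true slope is 2
y[9] = 500.0             # one wild outlier

print(np.polyfit(x, y, 1)[0])   # least squares slope: about 28, ruined
print(theil_slope(x, y))        # Theil slope: still 2.0
```

Only 9 of the 45 pairwise slopes involve the wild point, so the median
of the pairwise slopes is unaffected.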
- p. 422, Comment 6
- It takes a bit of work to show that the least squares estimator is
a weighted average of the Sij values.
Section 9.3
This is another short section --- perhaps a bit too short, since it is
unfortunate that the book doesn't provide much insight into why the
confidence interval presented has the correct coverage probability.
It should be noted that sometimes the
confidence interval produced by the method of this section is shorter
than the one based on least squares, and in other instances, the
opposite is true.
I'll offer some specific comments
about the text below.
- p. 424, Procedure
- Note that in order for M and Q to be integers, the
coverage probability has to be chosen so that the critical
value k is an even integer.
- p. 424, Large-Sample Approximation
- (error in book) Since a
confidence interval procedure that produces intervals which are wider
than they need to be is conservative, and a
confidence interval procedure that produces intervals which are narrower
than they need to be is anticonservative, I think that one needs to use
the smallest integer that is greater than or equal to the right-hand
side of (9.25) instead of "the largest integer that is less than or
equal to the right-hand side of (9.25)." (The same mistake is
made on the 2nd line of p. 426.)
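Here is a rough Python sketch of the interval based on the large-sample
approximation, under my reading of the procedure: the endpoints are
order statistics of the sorted pairwise slopes, and the critical value
is approximated by z times sqrt(n(n-1)(2n+5)/18), rounded up (as argued
above). The exact indexing should be checked against (9.24):

```python
import numpy as np
from math import ceil, sqrt
from scipy.stats import norm

def theil_ci(x, y, alpha=0.05):
    """Approximate CI for the slope; indexing is my reading of Sec. 9.3."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    slopes = np.sort([(y[j] - y[i]) / (x[j] - x[i])
                      for i in range(n) for j in range(i + 1, n)])
    N = len(slopes)                       # n choose 2
    c = ceil(norm.ppf(1 - alpha / 2) * sqrt(n * (n - 1) * (2 * n + 5) / 18))
    M, Q = (N - c) // 2, (N + c) // 2     # glossing over the parity issue
    assert M >= 1, "sample too small for the large-sample approximation"
    return slopes[M - 1], slopes[Q]       # Mth and (Q+1)st smallest slopes
```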
- p. 425, Comment 9
- (error in book) H&W states that
the lower and upper confidence bounds are given by (9.27) and (9.29),
but these expressions are actually one-sided confidence intervals. The
confidence bounds are just the endpoints of the one-sided confidence
intervals.
Section 9.4
Yet another short section.
I'll offer some specific comments
about the text below.
- p. 426, Procedure
- Note that the estimator is based on a really simple idea: once the
slope estimate is determined, each of the (x, y) pairs can
be used to produce an estimate of the intercept using (9.32), and the
"overall" estimate is just the median of the n estimates obtained
from the n points. (A short sketch in code is given below.)
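In code, using the theil_slope sketch from above (the packaging into a
function is my own):

```python
import numpy as np

def theil_intercept(x, y, slope):
    """Median of the n per-point intercept estimates y_i - slope * x_i."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return np.median(y - slope * x)
```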
- p. 427, Procedure
- On the line after (9.36), I don't like that the median of the
distribution of Y given a particular value for x is
referred to as "the typical value" since a distribution median need not
be a value which is close to values which are highly likely to occur.
Section 9.5
I may not spend much time on this section in class. Since I don't plan
to assign any HW exercises based on this section, I won't take the time
to carefully go through the procedure step by step. If you want to use
the test at some point in the future, I think studying Example 9.5
on pp. 430-433
should help you perform the procedure correctly.
I'll offer some specific comments
about the text below.
- p. 429, Procedure
- The pooled estimator given by (9.40) is a weighted sum of the
k slope estimators associated with the k samples.
To see that this is the case, note that in the slope estimator for a
sample (an estimator having the form of the one given in Comment
5 on p. 422), the sample mean of the Yi can be omitted,
since it can be pulled out in front of a sum, with the sum having
the value 0.
- p. 430, Procedure
- Note that (9.42) gives residuals, and that the residuals are
ranked from 1 to ni in each sample (see the top half
of p. 433 for an example).
Section 9.6
My guess is that the most convenient way for most of you to do the
HW exercise related to this section will be to use Minitab.
(Note: The student version of Minitab that I have installed
doesn't do the rank regression with the rreg command, but it's on
the mainframe version that we all have access to. If you've never
gotten your account established on the mason/osf1 system,
you might want to do that rather soon. I'll make the HW exercise
related to this section such that you don't have to print out anything
--- you can just copy the answers from the screen. That way, you don't
have to do a lot on the computer that you may not be real familiar
with. Some information about getting onto the mainframe can be found on
this web page that I have on my STAT 554 web site.)
Although it isn't clear that the main method covered by this section
is better than robust regression methods like those based on
M-estimators, I think that everyone ought to have a way to do linear
regression in addition to ordinary least squares (OLS), and so if you don't
know
how to do any other alternative method, then the material in this
section may be quite useful to you in the future. (If nothing else, it
can serve as a way to check the reasonableness of an OLS analysis.)
Still, it should be noted that rank
regression is seldom used.
Although H&W emphasizes hypothesis testing, it should also be pointed
out that the estimation procedure used to obtain the coefficients in the
fitted model is superior to OLS estimation in a lot of cases (e.g.,
many cases for which the error term distribution has heavy tails).
Since H&W is somewhat skimpy on the details in places, I'll point out
that Ch. 6 of Alternative Methods of Regression by Birkes and
Dodge (Wiley, 1993) provides some additional details (but still doesn't
explain everything fully --- which may be just fine for most of us ...
since life is too short to worry about all of the grubby details of
every statistical procedure that we want to use).
I'll offer some specific comments
about the text below.
- p. 439, Assumptions
- Note that this section deals with the usual multiple linear
regression model. Rather than the usual assumption of a normal
error term distribution, which allows one to form confidence intervals
and prediction intervals, and do tests of hypotheses using OLS
regression fits, here that assumption is relaxed, and it is only assumed
that the error term distribution is symmetric, with a mean/median of 0.
(H&W stipulates that the median is 0, but since symmetry is assumed,
unless the error term distribution is such that the mean is not defined,
then the mean is also 0.)
- p. 439, Hypothesis
- (error in book)
On the two lines below (9.53), H&W has
"do not play significant roles"
whereas it really should be
do not play any role.
- p. 440, Procedure
- An inspection of (9.54) reveals the main difference between rank
regression and OLS regression. If the difference in the brackets were
the same as the difference in the parentheses, then minimizing (9.54)
would lead to the OLS estimates. By incorporating the ranks, the large
differences are given less influence than they are in OLS regression,
and so rank regression is less sensitive to gross outliers (and
generally performs better for heavy-tailed error term distributions).
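To give a feel for the mechanics, here is a rough Python sketch that
minimizes a Wilcoxon-score (Jaeckel-type) dispersion of the residuals,
which is the flavor of criterion in (9.54). The optimizer and the use
of the median of the residuals for the intercept (the dispersion is
unchanged by shifts, so it can't determine an intercept) are my own
choices:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import rankdata

def rank_fit(X, y):
    """Rank-based regression sketch: minimize a Jaeckel-type dispersion."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    n = len(y)

    def dispersion(beta):
        resid = y - X @ beta
        scores = rankdata(resid) - (n + 1) / 2.0   # centered Wilcoxon scores
        return np.sum(scores * resid)

    beta_start = np.linalg.lstsq(X, y, rcond=None)[0]   # OLS starting values
    beta = minimize(dispersion, beta_start, method="Nelder-Mead").x
    intercept = np.median(y - X @ beta)   # intercept from the residual median
    return intercept, beta
```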
- p. 440, Procedure
- In class, I'll explain the similarities of the testing procedure
described here with the corresponding normal theory testing procedure.
(Both use a test statistic which has a difference in a quality of fit
measure (between the full and reduced models) in the numerator, and a
measure of scale in the denominator (with the measure of scale being the
estimated error term distribution variance in the normal theory F
statistic, and being some other measure of spread related to the error
term distribution in the rank regression statistic).)
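Schematically, the rank-based statistic has the shape below (a sketch
of the general form only; the scale estimate tau_hat is a nontrivial
detail that I'm taking as given here, and the exact form should be
checked against the book):

```python
def rank_drop_test(D_reduced, D_full, q, tau_hat):
    """F-like statistic: drop in dispersion across q constraints, over a scale.

    D_reduced, D_full: minimized dispersions of the reduced and full models;
    q: number of coefficients set to 0 under H0; tau_hat: scale estimate.
    """
    return ((D_reduced - D_full) / q) / (tau_hat / 2.0)
```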
- pp. 441-446, Example 9.6
- You can do the HW exercise related to this section using
Minitab by closely
following the steps used in this example.
- p. 443, Example 9.6
- (error in book)
The 3rd row of M1 should be
0 0 1 0
instead of
0 0 0 0.
- p. 447, Comment 23
- H&W doesn't explain why the parameter tau (not the
same as Kendall's tau) comes into play when testing hypotheses
about the coefficients. The explanation isn't a simple one, and
so it would take a lot of time in class to develop it. Being that the
end of the semester is so near, we'll have to skip the explanation.
(Really, a different type of nonparametric statistics
course would be needed to
cover things like this, and the development of asymptotic results that
H&W present with little explanation. This semester's version of STAT
657 has focused on covering a large number of nonparametric methods for
a wide variety of situations, and having students use the methods to
complete the HW exercises.)
- p. 448, Problem 33
- (error in book)
On the 2nd line from the bottom of the page,
independent
should be
dependent.
Section 9.7
An important thing to make note of is that on p. 453 H&W indicates that
the phrase nonparametric regression typically refers to regression
done using methods such as those introduced in this section, and not the
type of regression covered in the first six sections of Ch. 9. (To
avoid confusion, I tend to use the phrase rank regression
to refer to the method of Sec. 9.6.)
During class, I plan to present a bit more information about
kernel
regression smoothers,
local
regression smoothers,
and spline
regression smoothers than is included in H&W, and hope to also give
brief descriptions of
CART and MARS, but there isn't time to present too much
material on any of these methods. (Some of them are rather complex, and
I would need at least a whole lecture period to adequately explain them.)
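As a small taste of the kernel approach, here is a minimal Python
sketch of a Nadaraya-Watson kernel regression smoother with a Gaussian
kernel (the function name is my own; the bandwidth h is the tuning
constant that controls the variance-bias trade-off discussed below):

```python
import numpy as np

def kernel_smoother(x0, x, y, h):
    """Nadaraya-Watson estimate of the regression function at the points x0.

    Each fitted value is a weighted average of the y's, with weights
    that decay smoothly as |x - x0| grows relative to the bandwidth h.
    """
    x0 = np.atleast_1d(np.asarray(x0, float))
    w = np.exp(-0.5 * ((np.asarray(x, float) - x0[:, None]) / h) ** 2)
    return (w * np.asarray(y, float)).sum(axis=1) / w.sum(axis=1)
```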
I'll offer some specific comments
about the text below.
- p. 454 (relates to 2nd paragraph)
- Although H&W focuses on regression based on a single predictor, the
methods of this section can be extended to more than one predictor
variable. A whole semester course would be needed to do justice to
nonparametric multiple regression, and such a course would require a
solid foundation in traditional multiple regression, and perhaps some prior
knowledge of some common computational statistics techniques. But the
difficult course would take students to the frontiers of modern
statistics. Methods which are considered by some to be methods of
data mining, knowledge discovery, and machine learning, are little
different (or no different, in many cases) from established statistical
methods used for this type of
nonparametric regression, nonparametric classification, and clustering.
A good book that covers a lot of such methods, and is written by three
leading Stanford statisticians, is The Elements of Statistical
Learning: Data Mining, Inference, and Prediction by Hastie,
Tibshirani, and Friedman.
- p. 454, Assumptions
- In the 3rd paragraph of the subsection, H&W refers to the
mu term in (9.64) as the median, but for some methods, the
resulting estimated function may be closer to the mean than the median.
In the
1st paragraph of the subsection, H&W indicates that the
ei are iid random variables, but in a lot of situations the
methods are applied when this is not thought to be the case, since often
the variance (and even the general shape) of the error term distribution is
thought to be nonconstant (i.e., depend on x).
- p. 456, General Discussion
- Note that Minitab and S-Plus include some of the more
modern regression methods referred to in this section. I guess the fact
that SAS is not referred to is an indication that it doesn't
include any of the methods referred to in this section. (If this is
indeed the case, then, as far as I know, SAS doesn't do rank
regression, robust regression, or nonparametric regression. If anyone
knows differently, please let me know.)
- p. 456, General Discussion
- H&W indicates that the variance-bias trade-off issue comes up when
choosing a particular nonparametric regression method (so really,
choosing a general method, and then also deciding how to "tune" it).
But the variance versus bias issue is also present with typical
applications of OLS regression. When one decides to omit
nonsignificant variables in the variable selection process, often the
hope is that variance can be decreased without adding too much bias.
That is, if a variable has minor (but perhaps some) influence on the
value of the response, then it can often be dropped from the model
with little consequence. But it should be kept in mind that unless the
model statement is exactly correct (not just which variables should be
included, but also how they should be included (e.g., maybe quadratic,
cubic, and interaction terms need to be added, or a variable needs to
be transformed)), there will typically be bias associated with the
estimates.
(Another reason for omitting variables from a model is to develop a
simpler model. Taking the view that most models are approximate anyway,
there is some extra beauty in a simple model, as opposed to a more
complex one, particularly when the simple model is almost as accurate as
the more complex one.)
Section 9.8
Note that, 3 lines from the bottom of p. 456, one can see that in the
case of an evenly spaced design, the
AREs of the Theil estimator with respect to the least squares estimator
are the same as the
AREs of the signed-rank test with respect to the t test.
So for large enough sample sizes, the Theil estimator is better than 95%
efficient if the error term distribution is normal, and for heavy-tailed
error term distributions, the Theil estimator can be superior.