Chapter 3:
32 -- 1st line -- should read "The equation for the family of normal
curves", to be consistent with Wilcox's later decisions [e.g., p 118, 2
places] on whether the family is of equations or of curves.
32 -- 12th line from bottom (and p 42, 7th line fr bottom, and p 55, 17 fr
bott, and other places) -- the practice of using "then" to start
the second clause of a sentence that has begun with "if" is (from an
English grammar class pt of view) I believe incorrect syntax. Nor (from
the examples I am familiar with) can it be written off as an attempt at
consistency with "prevalent" programming language structure, since
while SAS uses the "if ... then" structure, neither C++ (just "if ...
else if ... else ..." etc., no "then" involved) nor S+ (if and if else)
does. Also NB that Wilcox does not always use "then" after "if" (cf. p 65,
7th and 8th lines fr top).
[DR G cites New Am Heritage Dict to contrary]
36 -- Eqn at top of this pg should read 2 - 8 and 4 - 8, not 2 - 7 and 4 - 7;
some of the other numbers are wrong because of this 7 vs 8 mix-up.
[OK]
37 -- 5th and 6th lines down (and, also on p 37, the box-plot method of defining outliers) -- Any across-the-board rule for characterizing outliers seems inappropriate. The term "outlier" seems to carry the implication that any such observation is a candidate for down-weighting (e.g., via GLS, in the sense that outliers in a series of Y observations for a given X value are down-weighted when the mean-Y they are associated with is down-weighted) or outright exclusion from consideration (e.g., via trimmed means). What constitutes an "outlier" should therefore depend on the nature of the data being looked at.
There are many situations, that is, where what might be called an "outlier"
by any formula such as those ref'd above (2 std devs, or "outer
quartile + 1.5 * IQR") is nevertheless of central import and, far from
being up for trimming or de-emphasis, might be at the heart of a certain
calculation or judgment.
Flood plain boundaries, it is my understanding, are based on "100 yr high
level" estimates -- and affect real estate values significantly. Admiral Bull
Halsey, despite his theretofore brilliant record, had his career cut short (and
was almost court martialed) because of having let the (I think it was)
Third Fleet run afoul (with significant losses of men and ships) of two major
typhoons (one off Leyte, the other off Okinawa) in the space of about 6 months
in 1944-45 -- i.e., there, the fact that being in the path of two
such storms in 6 months might have been an "outlier-ish" event was in
no way viewed by the Navy as grounds for exoneration from blame.
And Long Term Cap Mgmt, from one account I have read, owed its demise most
directly (although with other contributing factors) to the fact that the main
interest rate spread model put together by its chief model builder ruled out
the possibility of any event more than 6 std devs away from the mean.
38 -- first 2 lines -- not clear why having "extreme" values in
far fewer than 25% of observations would not a) alter the position of the 25th
and 75th percentiles (the quartiles) and thus b) alter the definition of
"outlier", since this def depends on the value of the IQR, and hence
c) cause "masking".
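The two rules ref'd above (2 std devs; quartile +/- 1.5 * IQR) and the masking question can be tried out on a small example. What follows is only a sketch: the data are hypothetical, and the quartile convention (the "inclusive" method of Python's statistics module) is an assumption, since Wilcox's own convention is not restated here and other conventions place the fences differently.

```python
import statistics

def two_sd_outliers(values):
    """Flag values more than 2 sample std devs from the sample mean."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [v for v in values if abs(v - mean) > 2 * sd]

def boxplot_outliers(values):
    """Flag values more than 1.5 * IQR below the lower or above the upper quartile."""
    # Assumption: "inclusive" quartile convention; other conventions shift the fences.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return [v for v in values if v < q1 - 1.5 * iqr or v > q3 + 1.5 * iqr]

# Hypothetical data: two extreme values out of ten (20% of the sample).
sample_20pct = [2, 2, 3, 3, 3, 4, 4, 4, 100000, 100000]
# Hypothetical data: three extreme values out of ten (30% of the sample).
sample_30pct = [2, 2, 3, 3, 3, 4, 4, 100000, 100000, 100000]

masked_by_sd = two_sd_outliers(sample_20pct)      # extremes inflate the SD and hide themselves
caught_by_box = boxplot_outliers(sample_20pct)    # quartiles still sit inside the bulk of the data
masked_by_box = boxplot_outliers(sample_30pct)    # upper quartile now falls among the extremes

print(masked_by_sd, caught_by_box, masked_by_box)
```

On these particular numbers the 2 std dev rule flags nothing, the box-plot rule flags both extremes at 20%, and the box-plot rule itself is masked at 30%. Whether a given fraction of extremes moves the quartiles depends on the sample size and on the quartile convention, which is the point raised in (a) above.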
38-44 -- Although it seems clear CLT deals only
with size of "n", Wilcox repeatedly implies that the number of samples
(not clear whether "in addition to" or "instead of" their
common individual size, "n") is what brings the result curve close to
normal. Cf. p 51: "... if we could repeat an experiment billions of times,
we would get fairly good agreement between the plot of weighted means and the
normal curve."
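The distinction between sample size and number of samples can be checked by simulation. The sketch below is illustrative only: the exponential source curve, the seed, and the sizes are all assumptions chosen for the example, and skewness is used as a rough stand-in for "distance from normal".

```python
import random
import statistics

def skewness(values):
    """Mean cubed z-score; close to 0 for a symmetric (e.g., normal) curve."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean(((v - mean) / sd) ** 3 for v in values)

def sample_means(n, reps, rng):
    """Means of `reps` independent samples, each of size `n`, from an exponential curve."""
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
            for _ in range(reps)]

rng = random.Random(12345)  # fixed seed so the run is repeatable

# Small n: more repetitions sharpen the picture of the curve but do not normalize it.
skew_n5_reps2000 = skewness(sample_means(n=5, reps=2000, rng=rng))
skew_n5_reps20000 = skewness(sample_means(n=5, reps=20000, rng=rng))
# Large n: the curve of sample means is much closer to symmetric.
skew_n100_reps2000 = skewness(sample_means(n=100, reps=2000, rng=rng))

print(skew_n5_reps2000, skew_n5_reps20000, skew_n100_reps2000)
```

For means of n exponential observations the theoretical skewness is 2/sqrt(n), i.e., about 0.89 at n = 5 and 0.2 at n = 100, whatever the number of repetitions. The repetitions only determine how well the plot estimates that curve.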
45 -- 9th line fr bott -- "... convergence to normality is quicker when using medians." – Thus, "quicker" as used here actually means "with fewer observations".
Chapter 4:
53 -- 8th and 7th from bott -- Stmt re the mean being the
"optimal" estimator "for any probability curve we might
consider" implies that normalcy is a more prevalent condition than
nonnormalcy, an assumption unproven/dubious/unlikely-to-be-true.
56 -- 12th fr top -- Fig 4.2 line more like y = x - 10 than y = x + 10.
58 -- 13 fr bott -- "frequentist approach" undefined/unexplained.
Per Everitt's Stat Dict (cite available upon request), 132-33:
"Frequentist inference -- an approach to statistics based on a frequency view of probability in which it is assumed that it is possible to consider an infinite sequence of independent repetitions of the same statistical experiment.
Significance tests, hypothesis tests and likelihood are the main tools associated with this form of inference...."
"Significance test -- a statistical procedure that when applied to a set of observations results in a p-value relative to some hypothesis. Examples include Student's t-test, z-test, and Wilcoxon's signed rank test." 305 [i.e., tests of significance, i.e., "TOS"s; comparison of a computed test stat (or p-val) to a table test stat like t or z or w.]
"Hypothesis testing -- A general term for the procedure of assessing whether sample data is consistent or otherwise with statements made about the population." 159 [i.e., confidence interval testing]
Chapter 5:
72 -- 2d para, line 10 -- shld be larger (area), not higher
74 -- 9 and 8 fr bott -- "Norm curves are bell-shaped, but there are
infinitely many bell-shaped that are not normal." – Would be good to
indicate here (rather than later as he does) what distinguishes the two.
Chapter 6:
95 -- 13 and 12 fr bott -- "... if we could repeat a study billions of
times... "; still not clear why this concept of extensive repetition keeps
being introduced since per CLT it is the size of the sample not the number of
samples that induces normalcy.
94ff -- generally, using "percentile bootstrap" and "percentile t
bootstrap" for the two methods is a poor choice of labels, not least since
both, not just one, involve the T distribution, not to mention the fact that
they are used inconsistently. Exs:
bott 100: "... the percentile t bootstrap [versus] assuming normality" {instead of "[versus] the percentile bootstrap"};
top 103: "... discrepancy between the bootstrap [should be percentile t bootstrap] [and] what we get assuming normality [should be percentile bootstrap]";
top 115: "the percentile t bootstrap beats our reliance on the central limit theorem ..." [not clear if CLT is being used as code for Student's T or for "percentile [w/o t] bootstrap"];
115, 9th fr top: refers (apparently) to percentile t bootstrap as "modified bootstrap".
The three methods discussed in Ch 6 to assess the accuracy of hypotheses about
means -- which itself is simply a means to the end of assessing the sameness or
not of populations -- are
a) the Student approach to calculation of the t statistic;
b) the Bootstrap approach to calculation of the t statistic, with normalcy of the source population assumed; and
c) the Bootstrap approach to calculation of the t statistic, with normalcy of the source population not assumed.
Reasonable "abbreviative" labels for the three approaches could thus be
a) Student's t;
b) Bootstrap norm-assump t; and
c) Bootstrap no-norm-assump t.
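For reference when sorting out the labels, here is a minimal sketch of the two bootstrap confidence intervals for a mean as those names are commonly defined in general statistical usage. It is not taken from Wilcox's text, so whether it matches his usage on each of the pages cited is not something the sketch settles. The data, seed, and number of resamples are hypothetical.

```python
import math
import random
import statistics

def percentile_bootstrap_ci(sample, reps=2000, alpha=0.05, seed=0):
    """Percentile bootstrap: limits are percentiles of the resampled means."""
    rng = random.Random(seed)
    n = len(sample)
    means = sorted(statistics.fmean(rng.choices(sample, k=n)) for _ in range(reps))
    lower = means[int(reps * alpha / 2)]
    upper = means[int(reps * (1 - alpha / 2)) - 1]
    return lower, upper

def percentile_t_bootstrap_ci(sample, reps=2000, alpha=0.05, seed=0):
    """Percentile t bootstrap: limits use percentiles of the resampled T statistics."""
    rng = random.Random(seed)
    n = len(sample)
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    t_values = []
    for _ in range(reps):
        resample = rng.choices(sample, k=n)
        resample_sd = statistics.stdev(resample)
        if resample_sd == 0:  # degenerate resample (all values equal): skip it
            continue
        t_values.append((statistics.fmean(resample) - mean)
                        / (resample_sd / math.sqrt(n)))
    t_values.sort()
    m = len(t_values)
    t_lower = t_values[int(m * alpha / 2)]
    t_upper = t_values[int(m * (1 - alpha / 2)) - 1]
    # The upper T percentile sets the lower limit, and vice versa.
    return mean - t_upper * se, mean - t_lower * se

# Hypothetical right-skewed sample.
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 6, 7, 9, 12, 15, 25]
pct_ci = percentile_bootstrap_ci(data)
pct_t_ci = percentile_t_bootstrap_ci(data)
print(pct_ci, pct_t_ci)
```

In this sketch neither procedure consults a table of Student's T or of the normal curve. The first works with resampled means directly, and the second replaces the tabled T percentiles with percentiles of the resampled T statistics.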
98 -- middle of page -- In noting the suggestion that increasing the number
of observations increases the likelihood that bootstrapping will yield an
accurate result, one should also keep in mind that the larger the sample size
the more likely CLT is, as repeated resamplings are done from the one original
sample, to suggest that the underlying population is normal even when it isn't.
102 -- Discussion of why lower end value should be used to define upper end
cut-off not clear.
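If the discussion on p 102 concerns the percentile t interval (an assumption; the page is not quoted here), the swap of ends follows from solving the probability statement for the mean. In the sketch below, T*_L and T*_U denote the lower and upper bootstrap percentiles of T; the notation is illustrative rather than Wilcox's.

```latex
\[
P\!\left( T^*_L \le \frac{\bar{X} - \mu}{s/\sqrt{n}} \le T^*_U \right) \approx 1 - \alpha
\quad\Longrightarrow\quad
\bar{X} - T^*_U \,\frac{s}{\sqrt{n}} \;\le\; \mu \;\le\; \bar{X} - T^*_L \,\frac{s}{\sqrt{n}} .
\]
```

Multiplying through by s/sqrt(n), subtracting the sample mean, and changing sign reverses the inequalities, so the lower percentile of T ends up defining the upper limit for the mean.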
104 -- "... when using the percentile t bootstrap, the discrepancy
between the actual and nominal Type I error probabilities goes to zero at the
rate 1/sqrt(n) -- it goes to zero more quickly than when using the
Student's T." -- It would have been helpful to have had the rate for
Student's T specified as well.
107 -- 15-18 from top -- "... Type I errors are never made when there
is heteroskedasticity ... because ... a zero slope is virtually impossible;" Would
have been better to have explained the basis of this belief, whose motivation
is otherwise not clear -- i.e., that the supposed impossibility of
heteroskedasticity coinciding with a zero slope rests on the view that there is
unlikely to be a treatment that affects the variance but does not affect the
slope as well.
110-13 -- In all his extended discussion of Pearson and why zero correlation
does not mean independence, Wilcox never once mentions the simple, summary rule
that "r" and rho are measures of linear association (aka, dependence)
only.
112 -- 14 from top -- dependence does not have to mean that the variance of Y
changes with the value of X; with a quadratic shape, for example, var Y can stay
tight (even the same for all X), but there still can be strict X-vs-Y dependence.
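A small worked example of both points (r measures linear association only, and dependence need not show up as changing variance) is sketched below. The data are hypothetical: Y is an exact quadratic function of X, so the spread of Y at each X is zero, yet Pearson's r comes out at zero.

```python
import math
import statistics

def pearson_r(xs, ys):
    """Pearson's product-moment correlation, computed from its definition."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: X symmetric about 0, Y determined exactly by X.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x * x for x in xs]

r = pearson_r(xs, ys)
print(r)  # zero linear association despite complete dependence of Y on X
```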
113-4 -- Fig 6.6 suggestive not merely of "perhaps a" non-linear
rel but quite possibly and specifically of a quadratic one, something that Wilcox
doesn't mention.
115*** -- 10th and 15th from top -- "Section 6.5" and "Section 6.6" do not exist.
Chapter 7:
117 -- 1 and 2 -- No indication of relationship between mixing and contamination; i.e., that a mixed normal is one but not the only kind of contaminated.
117 last 2 lines, 118 top 2 lines -- This sentence a good example of lack of
clarity that can result from failure to use parallel structure.
118 -- line 8 -- After phrase "... 90 percent chance that..." the
words "it will appear that" should be inserted. Ditto, line 9 after
phrase "10 percent chance that".
118 -- 5th and 4th fr bott -- "little diff tween normal and mixed
normal" is awkward; better is "between true normal and mixed
normal".
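How little the mixed normal differs from the true normal in appearance, and how much in variance, can be sketched numerically. The 90/10 split follows the percentages quoted from p 118. The std dev of 10 for the wide component, the zero means, and the seed are assumptions made for the example.

```python
import random
import statistics

def mixed_normal_variance(p_standard=0.9, sd_wide=10.0):
    """Variance of a zero-mean mixture: p * 1 + (1 - p) * sd_wide**2."""
    return p_standard * 1.0 + (1.0 - p_standard) * sd_wide ** 2

def draw_mixed_normal(n, rng, p_standard=0.9, sd_wide=10.0):
    """Each value comes from N(0, 1) with prob p_standard, else from N(0, sd_wide**2)."""
    return [rng.gauss(0.0, 1.0) if rng.random() < p_standard
            else rng.gauss(0.0, sd_wide)
            for _ in range(n)]

rng = random.Random(2024)  # fixed seed so the run is repeatable
theoretical_variance = mixed_normal_variance()
simulated_variance = statistics.variance(draw_mixed_normal(50000, rng))
print(theoretical_variance, simulated_variance)
```

Under these assumed parameters the variance is 0.9 * 1 + 0.1 * 100 = 10.9, against 1 for the true normal, even though nine observations in ten come from the same standard normal curve.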
121 -- 1st line -- after words "the means" should be inserted the
words "if in fact they are different"; cf p 69 where Wilcox does in
fact define "power" this way (as he should).
121 -- 4 fr bott -- 10% change is hardly "arbitrarily small".
121 -- last 2 at bott -- the ref to power being inversely related to population
variance is in Chapter 5 (at p 71), not Chapter 4.
122 -- He uses the general phrases "departures from normality" (3
fr top) and "normality assumption is violated" (3d line of caption of
Fig 7.3) when what he really is referring to is heavy-tailedness, not
light-tailedness or any other kind of deviation from normality, such as skew
of any sort.
123 -- 8 fr bott -- refers to normal as light tailed; per Dr. Gentle comment
2 wks ago that normal should be the standard of light and heavy, Wilcox' ref
raises issue of what distribution it is that HE regards as the standard.
123 -- last sentence -- syntax completely wrong.
126 -- lines 8-11 fr top -- Fig 7.7 in no way "indicates" that
"the prob of an obs being within one std dev of the mean is .999...";
similarly, the caption of Fig 7.7, "as indicated here" is quite
wrong.
127 -- 7 fr top -- "effect size" undefined.
[Everitt says "effect" "generally [refers to] ... the change in a response variable produced by a change in one or more explanatory or factor variables".]
[Vogt's Stat Dict (cite avail upon request), at 94, says "effect size (ES)" is
"a) any of several measures of association or of the strength of a relation, such as Pearson's r or eta. ES often is thought of as a measure of practical significance.
"b) A statistic, often abbreviated D or delta, indicating the difference in outcome for [i] the average subject who received a treatment from [ii] the average subject who did not (or who received a different level of the treatment). This statistic is often used in meta-analysis. It is calculated by taking the difference between the control and experimental groups and dividing that by the standard deviation of the control group's scores -- or by the standard deviation of the scores of both groups combined.
"c) In statistical power analysis, ES is the degree to which the null hypothesis is false."]
Thus what Vogt calls effect size, version (b), is
what Wilcox calls the standardized difference, a measure of effect size.
127 -- 8 fr top -- Because effect size undefined, significance/relevance of "standardized difference" unclear. Like "common measure of the bletz is the cravis."
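Vogt's version (b) is concrete enough to compute. A minimal sketch follows, with hypothetical scores; reading "both groups combined" as the pooled within-group std dev is an assumption, since the definition could also mean the std dev of the two samples merged.

```python
import math
import statistics

def standardized_difference(treated, control, pooled=False):
    """Difference in group means divided by a standard deviation (Vogt's version b)."""
    diff = statistics.fmean(treated) - statistics.fmean(control)
    if pooled:
        # Assumption: "both groups combined" read as the pooled within-group SD.
        n1, n2 = len(treated), len(control)
        pooled_var = ((n1 - 1) * statistics.variance(treated)
                      + (n2 - 1) * statistics.variance(control)) / (n1 + n2 - 2)
        sd = math.sqrt(pooled_var)
    else:
        sd = statistics.stdev(control)  # control group's SD
    return diff / sd

# Hypothetical scores: treatment shifts every score up by 1.
control_scores = [1, 2, 3, 4, 5]
treated_scores = [2, 3, 4, 5, 6]

d_control_sd = standardized_difference(treated_scores, control_scores)
d_pooled_sd = standardized_difference(treated_scores, control_scores, pooled=True)
print(d_control_sd, d_pooled_sd)
```

Because the divisor is a std dev, anything that inflates the variance (heavy tails, for example) shrinks the standardized difference even when the difference in means is unchanged.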
129 -- 2d and 3d fr top -- words "of a probability curve" repetitive and should be deleted.
130 -- 13 fr top -- He is now calling a "regression outlier" what up to
this page he has simply called an outlier.
130 -- Saying the slope is non-0 only because the
non-0 value was generated by an "outlier" a) is an arbitrary
denigration of the value of outliers, which in fact deserve to be so treated
only if they can be shown to arise from measurement errors or certifiably unique
circumstances; and b) is inconsistent with Wilcox' own remarks about outliers
on p 113.
134 -- 2d "key point" -- He is using "probability curve" to mean at various places in the text both
a) density curve/density function/distribution curve/distribution function, and
b) population/distribution.
I.e., he is using the one term "prob curve" to mean two diff things.
See on same issue, comment re p 139 below.
Chapter 8:
139 -- "... arbitrarily small departures from normality can have devastating consequences on [sic] the population mean, particularly the population variance.... nonnormality can result in very low power..."
1) consequences for or consequences re, never consequences on;
2) why is an effect on the variance a particular sort of effect on the mean?? Syntax akin to "I like apples, particularly carrots"??
3) Nonnormality can result in any kind (more, less, unchanged) of power; only heavy tailed non norm can result in lowered power. Hence this statement misleading.
139 -- as at 134 (and 140, 143, 146, etc.), he is using "probability curve" apparently as a synonym for density curve, distribution curve, density function, distribution function, population, and distribution, which is OK except that he has used the terms pop and dist heretofore; no explanation of reason for change in nomenclature. And at 141, ln 7, he reverts to "distributions", and at 141, ln 21, to "population". The fact that the terms "prob curve" and "distrib" and "pop" are being used to refer to the same thing should be noted.
139 -- 1st full para confusedly laid out; what the phrase "first criterion" at ln 17 refers to is not clear.
144 -- ln 10 -- Comment that "outliers" can cause the sample mean to be "inaccurate" is imprecise (i.e., "inaccurate" in what respect, and judged against what standard?) and, to the extent interpretable, conclusory.
146 -- lns 6 and 7 -- "... unlikely to provide accurate information about the population mean..." but (unless the product of a measurement error) likely to provide accurate information about the mechanism generating the values that in turn generate the mean itself; so whether the outliers should be in or out depends on what one is interested in, typicality or mechanism, or something else.
146 -- 15 and 16 -- "... a normal... curve OR a curve with relatively light tails...": contradicts his statement on p 123 (8 fr bott) that a normal curve IS a curve with light tails.
151 -- Fig 8.4 --
a) Box upper left: is inappropriate illustration of 1/1 slope;
b) Box lower right: hardly any point in offering biweight as example with zero explanation here (or on pp 152 or 211 where it is mentioned again) of what it is/how it is calculated. (Vogt simply says "BW" is a system that down-weights outliers; Everitt doesn't mention it at all.)
152 -- "... we need to limit the influence of extreme values...": as usual no discussion of why, of need to distinguish between virtually-certainly measurement-error or other one-time-aberrational "extremes" vs. reasonably-possible-to-repeat extremes, purpose of analysis, whether interested in typicality or mechanism, etc.
156 -- He is not adding "an" outlier; he is adding 8 outliers, i.e., converting 40% of the entire sample to outlier status.
156-7 -- Re whole discussion here re 20% trim vs. M-estimator: in order to have 20% outperform M,
a) it seems one must have a situation where 15-20% of the sample are "outliers", but
b) with that high a percent of the sample in a single "far-from-the-mean" category, it is oxymoronic to call the values outliers.
Thus, it would appear that where there is a "true" outlier situation -- say up to 5%, 7%, perhaps a maximum of 10%, of the values at an "extreme" -- , M is clearly the better choice because it will have a smaller standard error in those circumstances. Where there is a larger percentage of values "far" from the mean, classification (controlling for some other variable at work) would be called for. The question thus comes down to, is there any circumstance when the 20% would be preferred over M in the small-percent-far-from-mean case or classification in the large-percent-far-from-mean case?
157 -- idea that it might be OK to ignore 20% + of the data is bizarre unless it was known absolutely to be measurement error or otherwise one-time aberrational.
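To make concrete what a 20% trim discards, here is a minimal sketch of a trimmed mean as commonly defined: sort, drop the lowest and highest 20% of the values, and average the rest. The data are hypothetical, and the floor rule for the number trimmed is the usual convention, assumed here rather than quoted from the text.

```python
import math
import statistics

def trimmed_mean(values, proportion=0.2):
    """Drop the lowest and highest `proportion` of the sorted values, then average."""
    ordered = sorted(values)
    g = math.floor(proportion * len(ordered))  # number dropped from EACH end
    kept = ordered[g:len(ordered) - g]
    return statistics.fmean(kept)

# Hypothetical data, with and without one wild value.
clean = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
with_wild_value = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1000]

print(statistics.fmean(clean), trimmed_mean(clean))
print(statistics.fmean(with_wild_value), trimmed_mean(with_wild_value))
```

With ten observations, two are dropped from each end, so under this definition 40% of the data, not 20%, play no direct part in the average. The ordinary mean moves from 5.5 to 104.5 when the wild value is introduced, while the trimmed mean stays at 5.5.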
158 -- " an arbit small change in curve can cause arbit large change in mean val...": is this correct??
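One way to read the claim, assuming "small change in curve" means that only a small amount of probability is moved, is the following contamination argument. The notation is illustrative: F is the original distribution with mean mu, G is a contaminating distribution with mean M, and epsilon is the fraction of contamination.

```latex
\[
F_\varepsilon = (1-\varepsilon)\,F + \varepsilon\,G,
\qquad
\mu_\varepsilon = (1-\varepsilon)\,\mu + \varepsilon\,M,
\qquad
\mu_\varepsilon - \mu = \varepsilon\,(M - \mu).
\]
\[
\text{Taking } M = \mu + K/\varepsilon \text{ gives } \mu_\varepsilon - \mu = K
\text{ for any } \varepsilon > 0 \text{ and any } K .
\]
```

The two distribution functions never differ by more than epsilon at any point, yet the shift in the mean, K, can be made as large as desired. On this reading the statement is correct; whether this is the sense Wilcox intends on p 158 is not confirmed here.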