Comments on Statistical Concepts and Methods by
Bhattacharyya and Johnson
Below are some comments about the various chapters of the book.
(You can use these links to jump down to comments about
Ch. 1,
Ch. 3,
Ch. 4,
Ch. 5,
Ch. 6,
Ch. 7,
and Ch. 8.
Later in the
semester I'll try to supply comments about other chapters.)
The number of comments does not reflect the importance of the various parts
of the book --- rather I have just added comments that I think may be
helpful to you as you read through the book, and I'll let some parts of
the book stand on their own.
This book matches a lot of what I cover in my course fairly well, but is
somewhat elementary and lacks sufficient information about dealing with
nonnormality and heteroscedasticity. The books by Miller and Wilcox, as
well as my lecture presentations, will supply you with information about
newer and/or more advanced methods that address these concerns. But reading through
Bhattacharyya and Johnson should help you master the basics, and set you
up to better understand the more advanced material.
My advice is to consult the
reading guide to see what parts of
the book correspond to which lectures, and then start at the beginning
and try to read the first part of the book in a more or less linear
manner to ensure continuity,
but perhaps skipping material on probability with which you are
sufficiently well acquainted.
This will require a lot of reading during the first few weeks of the
semester, but hopefully the reading will go fast and you'll find most
everything easy to follow. (I won't get into the material of the Miller
book until the last half of the 4th lecture, so you can leave it alone
at the beginning of the semester, and focus on this book. But you may
want to examine the reading guide to determine how you want to pace
yourself through the Wilcox book. It's somewhat like this one in that
you should read a lot of it during the first half of the semester.)
Since Bhattacharyya and Johnson puts all of its information about
nonparametric methods in a single chapter towards the end of the book,
to better prepare for the 5th lecture you may want to deviate from a
linear reading of the book and read some of the material in Ch. 15.
Also, in the second half of the semester, I cover topics in a different
order than the book does, and so you might find yourself skipping about
a bit.
Ch. 1
- In my lectures, I mention some of the specific things covered in
Ch. 1, but for other parts of the chapter, I assume you'll tie things
together and get the "big picture" as we go along.
Reading Ch. 1 ought to help you get a better understanding of what
applied statistics is about. (In my first lecture we start to
get into some of the details of a simple situation and I'm a little
skimpy with regard to motivating the relevance of data analysis to
experimenters from various fields.)
- In some fields, a typical M.S. thesis may be concerned with an
investigation as to whether a certain hypothesis appears to be true.
For example, in biology, a student may want to determine if the presence
of a certain type of artificial aquatic plant attracts a particular
species of fish. Experimental data could be collected to determine if
there is evidence that the artificial plants attract the fish. Because
the number of fish observed at a particular location could fluctuate for
many reasons, statistical methods should be carefully applied to the
observations resulting from a properly designed experiment. A poorly
designed and executed experiment and/or incorrect use of statistical
methods could lead to erroneous conclusions (or
there could be so much uncertainty reflected by the data that no firm
conclusion could be reached). (Comment: Too often people from other
fields consult with statisticians after all of the data has been
collected, only to then learn that the experimental design that was used
was poorly suited for the hypotheses to be addressed.)
- (p. 5) Descriptive statistics and inferential
statistics are mentioned. Modelling could also be mentioned
--- it's somewhat like descriptive statistics in that the goal is to
provide a mathematical model for a certain phenomenon, but to build the
model, inferential methods are used.
- (Sec. 1.6) The population variability, combined with the fact that
you may just have a smallish sample of observations to work with, leads to
the challenging problem of making accurate inferences about the larger
population based on the smaller sample.
- (p. 7) Note the distinction between a sampling unit and an
observation. The sampling unit may be a tree, but the
observation may be a wood density measurement for the tree.
- (p. 8) Note that the population consists of (potential)
observations, not of sampling units.
- (p. 8) Note that a sample is a collection of observations. So if I
have a set of 50 wood density measurements, I have a single sample of
size 50 (50 observations). We say that the sample size is 50.
In some fields, people seem to use the term sample like
statisticians tend to use the term observation. Some might say
"I have 10 samples" when they have 10 sampling units (say specimens of
dirt) from which observations will be made. In statistics, 10 samples
would typically imply 10 sets of observations made from 10 populations.
- (p. 8) In some cases, it's better not to think of a sample as being
a subset of a finite population. Rather we think of observations as
arising from distributions, and we want to make an inference about some
aspect of the distribution underlying the observations. (For example,
the observations may be measurements of some aspect of air quality made
at a particular location at various times. Since time is measured on a
continuous scale, there is not a finite population of measurements that
could be made, but rather an infinite number, and so it's hard to think
of the measurements as being a subset of a finite population.)
- (p. 9) The boxed information is important, as is the sentence that
immediately precedes Sec. 1.7.
Ch. 3
- This chapter deals with elementary probability. Of particular
importance are independent events (p. 90), and the warning given on p.
92. In STAT 554 we won't deal with Bayes' theorem (pp. 93-95).
Ch. 4
- This chapter deals with elementary probability. Things that will
be particularly useful in STAT 554 are:
- (p. 120) Properties of Expectation,
- (p. 123) Properties of Variance,
- (p. 133) the items in the three boxes.
- We'll also deal with material on pp. 129-132, but for the most part
this will come towards the end of the semester.
Ch. 5
- (pp. 141-150) Hopefully, you're already rather familiar with the
Bernoulli and binomial distributions.
- (pp. 144-145) In Example 5.2, it isn't so important that the
patients aren't physically identical. If we regard each patient in the
study as being randomly selected, then we can sort of think that there is
a constant probability of getting an S (where the sort of is due
to the fact that it may be better to model the situation with a
hypergeometric dist'n (see pp. 152-154 of the book) since if we sample
without replacement from the population of all people who have the
disease we won't have independent trials). If we sample with
replacement from all people having the disease then we will have
iid Bernoulli trials (even if not all people are identical, since on
each trial there is a constant probability (just the proportion of
curable patients) of getting an outcome of cure (S)).
- (pp. 150-151) You can skip the subsection on Other tables.
- (pp. 152-154) We'll briefly deal with the hypergeometric
distribution during the 4th lecture (and on HW #2).
- (p. 153) The pmf given in the box for the hypergeometric
distribution is nonzero for all values x satisfying
max{ 0, n - (N - D) } <= x <= min{ n, D }. (This gives the
support of X even if n > D or n > N - D.)
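If you'd like to check this support condition numerically, below is a
minimal sketch using Python's scipy (the particular values of N, D, and
n are my own, chosen just for illustration). Note that scipy
parameterizes the distribution as hypergeom(M, n, N), where M is the
population size, n is the number of "successes" in the population, and
N is the number of draws.

    from scipy.stats import hypergeom

    # Population of N = 10 items, D = 7 of which are "successes";
    # we draw n = 5 without replacement. (Illustrative values.)
    N_pop, D, n = 10, 7, 5
    lo = max(0, n - (N_pop - D))   # smallest possible number of successes
    hi = min(n, D)                 # largest possible number of successes
    rv = hypergeom(M=N_pop, n=D, N=n)
    for x in range(n + 1):
        print(x, rv.pmf(x))        # pmf is positive exactly for lo <= x <= hi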
- (pp. 154-159) You can skip Sections 5.7 and 5.8. (Although the
geometric and the Poisson distributions are important in general, we
won't do anything with them in this class.)
Ch. 6
- (p. 166, 1st 2 lines) You'll see in the pages to come that H will
take the role of the research hypothesis (alternate hypothesis), H1,
and H′ will take the role of the null hypothesis, H0.
- (p. 166) In the Typical conclusion I don't like the phrase
"highly unlikely that the statistical hypothesis is true." To me, the
word likely suggests probability, but I don't like to
think of the hypothesis being true or not true according to a
probability distribution. A
hypothesis either is or is not true --- we just don't know for sure which
one. If the data is rather incompatible with a hypothesis, then it may
suggest that the hypothesis may not be true. (Note: My objection is
with the book's use of the word likely. It's a small matter perhaps,
and some may think I'm being too picky. (I know I use likely in this
manner at times too --- but I don't think I should.))
- (p. 167, near bottom of page, & p. 168) Note the explanation for the word
null. More explanation is given in the 1st full paragraph on p.
168.
- (pp. 167-168) The sentence that begins on p. 167 and continues onto
p. 168 doesn't make a lot of sense to me --- it seems as though they
have some words wrong.
- (p. 169, 4th line) I don't particularly like the phrase "test of
the null hypothesis," although it's not that uncommon. I prefer to say
that I'm testing to determine if the data provides statistically
significant evidence for the alternative hypothesis (except instead of
saying alternative hypothesis, I'd put into words whatever the
particular alternative is for the case at hand (e.g., p > 0.6)).
- (p. 170) Note: I use alpha for the maximum (really the supremum)
probability of a type I error, and don't usually concern
myself with alpha(p). Also, the book uses beta for the
probability of a type II error, while I use it for the power function
(and the book uses gamma for the power function). It's
unfortunate that terminology and notation aren't consistent among
statisticians, and it's particularly unfortunate that the book's usage
isn't the same as mine. I try to go with what's most proper, or most
common if one choice doesn't seem more or less proper than another.
With beta, a lot of undergraduate-level books use it for the
probability of type II error (like this book does), but it's frequently
used for power at the graduate level.
- (p. 171) Although one could use tables to obtain the values in
TABLE 6.1, I encourage you to see if you can get them using software.
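For instance, here is a minimal sketch in Python (using scipy) of how
rejection probabilities for a binomial test can be computed with
software; the sample size, rejection region, and values of p below are
my own illustrative choices, not necessarily those of TABLE 6.1.

    from scipy.stats import binom

    # Illustrative setup: X ~ binomial(n, p), reject H0 when X >= c.
    n, c = 20, 15
    for p in (0.5, 0.6, 0.7, 0.8):
        # P(X >= c) = 1 - P(X <= c - 1); this is a type I error
        # probability when p is in the null hypothesis, and a power
        # value when p is in the alternative.
        print(p, 1 - binom.cdf(c - 1, n, p))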
- (p. 170) Instead of "power of the test at the alternative,"
some use "power against the alternative." Some students think
the choice against (I say "against the alternative" a lot) is odd, and
think it'd be better to say "power for the alternative." I
believe the choice of against seems sensible if one thinks of a plot of
the power function (like Fig. 6.2 on p. 173) in which the power is
plotted using the vertical axis against the parameter value on the
horizontal axis.
- (p. 173) This page is important (but keep in mind that the book
uses beta for the probability of a type II error instead of for
power).
- (p. 174) Really, size would be a better choice than
"level of significance." (As I have in my class notes on p. 31,
there is a distinction between level and size, but often people say
level when size would be a better choice (and sometimes I mess up and
say level when I'd prefer to say size). One reason the distinction is
often overlooked is that if a test statistic has a continuous
distribution (as opposed to a discrete one), one can make the size of
the test match any chosen level (w/o using a randomized test).)
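To see concretely why a discrete distribution gets in the way, consider
the following sketch (the setup is mine, just for illustration): for a
binomial test there is typically no nonrandomized rejection region
whose size is exactly 0.05.

    from scipy.stats import binom

    # X ~ binomial(10, 0.5) under H0; reject when X >= c.  The
    # attainable sizes form a discrete set, which skips over 0.05.
    n, p0 = 10, 0.5
    for c in range(7, 11):
        print(c, 1 - binom.cdf(c - 1, n, p0))   # P(X >= c | H0)
    # c = 8 gives size ~0.0547 and c = 9 gives ~0.0107, so no
    # nonrandomized test of this form has size exactly 0.05.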
- (p. 174) The paragraph right before Sec. 6.5 is important --- one
should choose the rejection region (and thus the size and power
characteristics) taking into account the consequences of type I and type
II errors. As for the last sentence of the paragraph, keep in mind that
the 5 errors pertain to 100 testing situations in which the null
hypothesis is true. The expected number of type I errors could be
less than 5 since the value 0.05 is the maximum prob. of type I error,
and the actual prob. of type I error could be less than 0.05. Also, note that
even if the prob. of a type I error is exactly 0.05 if the null hyp. is
true, it doesn't mean that the expected number of type I errors in 100
tests is 5 unless the null hypothesis is true in every case. If
the alternate hyp. is sometimes true, then the expected
number of type I errors in 100 tests will be less than 5 (since it's
impossible to make a type I error if the alternate hyp. is true).
- (p. 174) The last 3 sentences are important.
- (p. 175) p-value is more commonly used than significance
probability. (Also, the book's notation of P with an
asterisk isn't common (although some do use just P).)
- (p. 177) Note that getting a power value for a two-tailed test is a
bit more work than getting a power value for a one-tailed test (but it's
not too bad --- one just has to add two probabilities).
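Below is a minimal sketch (in Python, with illustrative numbers of my
own) of the "add two probabilities" computation for a two-tailed z-test
with known sigma; usually one of the two terms is negligible, but both
belong in the sum.

    from scipy.stats import norm

    # H0: mu = mu0 vs. H1: mu != mu0, known sigma, reject if |Z| >= z.
    mu0, mu1, sigma, n, alpha = 50.0, 52.0, 8.0, 25, 0.05
    z = norm.ppf(1 - alpha / 2)                # two-tailed critical value
    shift = (mu1 - mu0) / (sigma / n ** 0.5)   # standardized mean shift
    power = norm.cdf(-z - shift) + (1 - norm.cdf(z - shift))
    print(power)   # lower-tail piece + upper-tail piece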
- (pp. 178-179) Steps (a) through (e) give a nice summary of the
general scheme in hypothesis testing. In step (e), note that even in
the case where the null hyp. is rejected and it can be said that there
is strong evidence to support the alternate hypothesis, one should not say
that the null has been proven false --- there is no definite
proof since one cannot absolutely rule out the possibility that a type I
error has been made.
- (p. 180, top half of page) While in some cases theory does lead to
an optimal test, in other cases there is no best test and a good test
must be selected in some way. (In such cases, the generalized
likelihood ratio approach often yields a reasonable test, but sometimes
one cannot go this route due to lack of sufficiently detailed knowledge
of the distribution underlying the data. That is, if we don't have a
firm parametric model, the likelihood ratio approach cannot be applied,
in which case one might use a nonparametric method or else rely on the
robustness of a test derived for a model that may not be true for the
situation at hand.)
- (pp. 180-181) Example 6.2 nicely illustrates hypothesis testing.
Ch. 7
- This is another chapter which primarily covers basic probability
material. Of particular importance are:
- (p. 193) the results in the box,
- (p. 195) noting that the book's notation differs from what I use in
my class notes, since to me N(100, 25) indicates a normal
distribution with a variance of 25 (standard deviation of 5), but the
book takes this to mean that the standard deviation is 25 (and the
variance is 625; see the small software note after this list),
- (p. 200) the results in the box,
- (p. 202) the results in the box.
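Regarding the N(100, 25) notational issue above: software has the same
trap. As a small hedged example, Python's scipy parameterizes the
normal distribution by its standard deviation (the scale argument), so
the two readings of N(100, 25) correspond to different calls:

    from scipy.stats import norm

    book_style = norm(loc=100, scale=25)    # book's N(100, 25): sd = 25
    notes_style = norm(loc=100, scale=5)    # my notes' N(100, 25): var = 25
    print(book_style.std(), notes_style.var())   # 25.0 and 25.0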
- Here are some more comments.
- (p. 208) I don't think a random sample necessarily has to consist
of independent observations. What is known as a simple random sample is
when a subset of distinct items is selected from a finite population,
and in that case we don't have independence.
- (p. 209) The boxed statements are important.
- (pp. 210-220) Sec. 7.5 is important.
- (pp. 220-222) Sec. 7.6 can be skipped --- in class I'll describe a
simple way to check for approximate normality.
- (pp. 223-226) Sec. 7.7 doesn't deal with probability, but rather it
describes a technique that is sometimes used in applied statistics.
I don't put a lot of emphasis on this transformation technique in STAT
554, although it can be useful in some situations.
Ch. 8
- (pp. 233-237) It's important to keep in mind that just because I
don't make a comment about a particular page, it doesn't mean that the
page isn't important. My comments are meant to add something extra when
I think some clarification may be in order.
- (p. 237) In my class notes I don't emphasize the term standard
error. But since it's commonly used, I should emphasize it a bit
more. So please make a note of its simple definition at the bottom of
the page.
- (pp. 238-239) Saying that an approximate 95.4% error bound for the
sample mean is +/- 2 estimated S.E. overstates the precision unless
the sample size is rather large. Unless the random variables are
normally distributed, the sample mean won't be exactly normally
distributed, but only approximately so. Also, precision is lost when
the true standard deviation is replaced by an estimate of the standard
deviation.
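A small simulation can make this concrete. The sketch below (my own
setup, for illustration) estimates the actual coverage of
xbar +/- 2 estimated S.E. for the mean of a skewed (exponential)
population with n = 10; exact normality with known sigma would give
about 95.4%.

    import numpy as np

    rng = np.random.default_rng(1)
    n, reps, mu = 10, 100_000, 1.0            # exponential(1) has mean 1
    x = rng.exponential(mu, size=(reps, n))
    xbar = x.mean(axis=1)
    se = x.std(axis=1, ddof=1) / np.sqrt(n)   # estimated standard error
    covered = (xbar - 2 * se <= mu) & (mu <= xbar + 2 * se)
    print(covered.mean())   # typically around 0.90, well below 0.954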
- (p. 240) Point (b) is good. People aren't consistent in their use
of the a +/- b notation, and so unless more information is
provided, one doesn't know how to interpret something like 53.4 +/-
4.6. Also, it isn't clear what is meant by the phrase "margin of
error." One way to get around the confusion is to report a confidence
interval (and state that you're reporting a confidence interval, being
sure to give the confidence level).
- (p. 246) The 1st 3 sentences are extremely important. They give
the dos and don'ts of how one should state the results when a confidence
interval is determined. Also, I prefer to always write the result as an
interval, e.g. (41.1, 44.3), instead of writing something like
41.1 < mu < 44.3 (because we don't know for sure that mu
is trapped between the two confidence bounds). (Points (a), (b), and
(c) on p. 247 provide a good summary.)
- (p. 248) Note that the book isn't indicating what is meant by
large (with regard to the value of n). How large n
has to be for the approximation to be good depends on (i) how good we
want it to be, & (ii) the type and degree of nonnormality of the
distribution underlying the data. In some cases n = 50 (or even
less) may result in a great approximate confidence interval, and in
other cases n = 500 may not be large enough.
- (pp. 248-249) I think it would be a good idea to use more
significant digits than the book does in the example. If one wants 2
significant digits to be reported in the final answer, then keep 4 or
more
digits in the calculations, and round to 2 digits only at the final
step. Similarly, if one wants to report 3 digits, then keep 5 or more digits
until the final step. Also, I do think one should avoid reporting a
lot of significant digits in the final answer (that is, rounding at the
final step is good). Not only are figures with a lot of significant
digits somewhat hard to digest with a quick glance, but they reflect
more accuracy than is warranted, since in most cases we are making
assumptions and approximations (and so it's misleading to report a lot
of digits).
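As a small illustration of the rounding advice (the numbers are made
up), here is a case where rounding intermediate quantities to 2
significant digits shifts the 2-significant-digit final answer, while
carrying full precision until the end does not:

    import math

    def sig2(x):
        """Round x to 2 significant digits."""
        return round(x, 1 - int(math.floor(math.log10(abs(x)))))

    # Half-width of a 95% CI: z * s / sqrt(n)  (illustrative numbers)
    z, s, n = 1.959964, 14.9, 35
    full = z * s / math.sqrt(n)                     # keep digits throughout
    early = sig2(z) * sig2(s) / sig2(math.sqrt(n))  # round too early
    print(sig2(full), sig2(early))                  # 4.9 vs. 5.1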
- (p. 250) Just because the book designates n < 30 as being
small, one shouldn't assume that n >= 30 is large enough for the
large sample approximate confidence interval given on p. 248 to be
highly accurate in all cases. In some cases (particularly if the
distribution underlying the data is highly skewed), n = 300 may not be
large enough (although such cases are somewhat rare). But when n <
30, we shouldn't insist that the distribution underlying the data be
exactly normal, since if we required exact normality we could hardly
ever make use of the interval (it would be very rare indeed to have
data from exactly a normal dist'n).
- (p. 262) Again, it can be dangerous to take the n >= 30
rule of thumb too seriously --- how large the sample size needs to be
depends very much on the nature of the nonnormality.
- (p. 266) Note carefully the last sentence of the 1st paragraph in
Sec. 8.7 (and the last sentence on p. 269) --- the test and confidence
interval derived under the assumption of normality can
perform poorly (and be misleading) if the assumption of normality is too
badly violated.
- (pp. 266-269) I cover material related to this section in Unit 5 of
my course notes (which will be covered only briefly during lecture close
to the end of the semester --- but I will suggest that you read my short
Unit 5 and work a bonus homework problem which pertains to
variances (which I'll give you towards the end of the semester)).
- (p. 270) Point (b) is very important. But they are wrong to
suggest that serious errors are not to be worried about if n is
at least 15.
- (p. 271) Point (d) is important.