Note: I advise that you first read this web page from top to
bottom without clicking on any of the links. After you've gone
through this page once in a linear fashion, you may find the various
links useful to jump to the pertinent parts of this page for more
information, and to go to other web pages for additional information.
Is STAT 554 the right course for you?
This question, and other questions (e.g., about
prerequisites,
texts,
and how to get a head start)
are addressed below.
Some students may initially find it difficult to decide if STAT 554 is the
proper course for them to take during the coming/current semester.
STAT 554 is a bit unusual in that it is a graduate level course in applied
statistics that
doesn't assume that you've necessarily had a statistics
class before, while at the same time presenting students with material that
goes far beyond the level that is typically covered in undergraduate
statistics classes. Because of this, some students may be
wondering if the course will be too hard or
wondering if the course will be too easy.
Or some students may be
wondering if they should take the course now or wait until
later, or
wondering if they need to take the course in order to pass
the STAT 554 qualifying exam (this pertains to students in the IT Ph.D.
program).
STAT 554 is a graduate level course in applied statistics that:
- doesn't assume that you've necessarily had a statistics
class before;
- does assume that you're comfortable with elementary
probability theory.
Formally, the prerequisite is
STAT 344
or MATH 351. (You can
visit
my STAT 344 web site to learn more about the course.)
Among the most
important things you need to know from elementary probability
in order to follow along well in STAT 554 are:
- random variables and distributions (certainly binomial,
hypergeometric, and normal distributions);
- expectations and variances (for single random variables, and for sums
of random variables);
- independence and correlation;
- the law of large numbers, and the central limit theorem.
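As a quick self-check on the last three items, here is a minimal Python sketch (it is not part of the course materials; the choice of Uniform(0, 1) terms, the sample sizes, the seed, and the function name are all assumptions made for illustration). It simulates sums of independent random variables, compares the simulated mean and variance of the sum with the values that the rules for sums predict, and checks how often the standardized sum falls within 1.96 of zero, which the central limit theorem suggests should be about 95% of the time.

```python
import random
import statistics

def simulate_sums(n_terms=30, n_reps=20000, seed=554):
    """Simulate n_reps sums of n_terms independent Uniform(0, 1) draws."""
    rng = random.Random(seed)
    sums = [sum(rng.random() for _ in range(n_terms)) for _ in range(n_reps)]
    mean_of_sums = statistics.fmean(sums)
    var_of_sums = statistics.variance(sums)
    # Theory for independent terms: E[sum] = n * 1/2 and Var(sum) = n * 1/12.
    theo_mean = n_terms / 2
    theo_sd = (n_terms / 12) ** 0.5
    # CLT check: fraction of standardized sums that land within +/- 1.96.
    within = sum(abs((s - theo_mean) / theo_sd) <= 1.96 for s in sums) / n_reps
    return mean_of_sums, var_of_sums, within

mean_of_sums, var_of_sums, within = simulate_sums()
print(f"mean of sums:     {mean_of_sums:.3f} (theory 15.000)")
print(f"variance of sums: {var_of_sums:.3f} (theory 2.500)")
print(f"fraction within 1.96 SDs: {within:.3f} (CLT suggests about 0.950)")
```

If the reasoning behind the "theory" values in the comments is not familiar, that is a sign that some review of elementary probability would be worthwhile before the course starts.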
Note: In the past, some students who lacked a
formal probability background did very well in this course. For example,
my guess is that somebody with a biology background who either had a
two-semester introductory statistics sequence as an undergraduate, or had a
good one-semester statistics course and has used statistical
methods in lab work, can earn an A in the course if they are a good
student who is willing to work extremely hard
(for one thing, by self-learning some of the prerequisite probability
material during the first couple of weeks). Such a person might find
some of the homework problems rather
difficult, but since the bulk of the grade depends on how well students
learn the applied material that will be presented during the course of
the semester, earning an A should not be impossible. (Actually, many
students working in biology or environmental science have earned an A in
this course.
I would guess that anyone who will need to do statistical
analysis in their thesis or dissertation should choose this course over
a less ambitious statistics course, because an easier course may not
adequately prepare one to do their own analysis. In fact, many students
may need to use more advanced methods that we don't have time to cover in
STAT 554.
However, it may be the case that this course provides such students with
a sufficient background for them to learn additional methods on their
own, take a more advanced course, or effectively work with a more
experienced statistician acting as a consultant.)
If you don't have the prerequisite, it comes down to you
making a choice based on what you know about your abilities and how much time
you'll devote to the course. Some without the prerequisite have earned an A, but others have failed.
If you took one of the
prerequisite courses
and did well, then you should feel comfortable in STAT 554, knowing that a
lot of the other students have a similar background. (If you got an A in STAT
344, then you should be quite ready for STAT 554. If you got an A- or
lower in STAT 344, then it may be wise to spend several weeks reviewing the
most important parts of elementary probability that
pertain to STAT 554.) But keep in mind that even if you
have met the prerequisite, STAT 554 is primarily a course for students
intending to work as statisticians and is not a survey course for nonmajors,
and so the expectation is that students will put a lot of time into the
course (15 to 20 hours a week would not be an unreasonable amount
of time to spend on
the course).
If you haven't had a probability course before, then please carefully read my
comments
above. Many students not meeting the formal
prerequisite
have earned a grade of A, while many others have dropped out or suffered a
bad grade because they were completely overwhelmed by the material.
400-, 500-, and 600-level one semester statistics courses typically are
one of four different types.
- Similar to a 100- or 200-level statistics class (but with a
higher course number). Such courses are pretty much (but not completely, in all
cases) a waste of time and
do not adequately prepare you to do good statistical analyses
(more info about this here).
- Consisting of a combination of the material covered in both STAT 554 and
STAT 652
(GMU's course which focuses on the theory and methods of
statistical inference for parametric statistical models
--- you can visit my
STAT 652 web site).
While such a
course may provide one with a wonderful overview of statistics, there is
insufficient time in one semester for students in such a course to gain
a working knowledge of applied statistics. (There wouldn't be time to
cover any methods beyond the most common ones, and even for those there
wouldn't be time for the "fine points" to be presented.)
- Similar to the type described immediately above, except that the
course also attempts to present elementary probability theory in the first half
of the semester.
- Consisting of a combination of the material covered in STAT 554,
STAT 655
(GMU's design of experiments/analysis of variance course), and
STAT 656
(GMU's regression analysis course). While it is true that STAT
554 doesn't cover some of the advanced methods that some researchers
need to use, it is also true that
a course in applied statistics which attempts to cover too
much rarely covers everything well. Since it may be the case that I'm
already guilty of trying to cover too much in STAT 554, it seems
foolhardy to try to cram twice as much material into the course. With STAT
554 I've chosen to attempt to provide students with an understanding of
what statistics is about, to thoroughly train them to do a proper
analysis in several common settings, and to provide them with an
overview of how to handle more complex situations (stressing that it is
important to have an understanding of the performance and limitations of
each procedure employed and to know how to check to determine whether or
not the assumptions of the procedures are adequately satisfied). Courses
which attempt to cover too much material tend to be too "cook bookish"
and students who take such a course may obtain a dangerously weak
understanding of how to accurately perform the methods and interpret the results.
Most one semester statistics courses focus on methods that were
developed assuming that the data arose from normal distributions. While
some such methods are robust enough to work satisfactorily in some
nonnormal situations, most statistics books contain scant accurate
information about the
robustness of common techniques. Books also do little with regard to
comparing the performance of classical methods with nonparametric and
robust statistical procedures. If nonparametric methods are covered at
all, they tend to be in a chapter by themselves towards the end of the
book and little information is presented pertaining to guidelines for
choosing the best possible procedure in a given situation.
In this course we will investigate a variety of data analysis
situations, and for each one we will typically cover several different
statistical procedures that may be appropriate. I'll discuss strengths
and weaknesses of each procedure, give guidelines for their effective
use, and teach you how to obtain clues from the data that will allow you
to make a good choice. I'll give warnings about when to avoid common approximations and
discuss exact inference procedures to use instead.
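To give a flavor of why such warnings matter, here is a small Python sketch comparing an exact binomial tail probability with the usual normal approximation. The numbers (9 successes in 10 trials, null value 0.5) and the function names are assumptions chosen for illustration, not an example taken from the course notes.

```python
from math import comb, erf, sqrt

def exact_upper_tail(x, n, p0):
    """Exact P(X >= x) for X ~ Binomial(n, p0)."""
    return sum(comb(n, k) * p0**k * (1 - p0)**(n - k) for k in range(x, n + 1))

def normal_upper_tail(x, n, p0):
    """Normal approximation to P(X >= x), without a continuity correction."""
    z = (x - n * p0) / sqrt(n * p0 * (1 - p0))
    return 0.5 * (1 - erf(z / sqrt(2)))

exact_p = exact_upper_tail(9, 10, 0.5)
approx_p = normal_upper_tail(9, 10, 0.5)
print(f"exact p-value: {exact_p:.4f}")
print(f"normal approximation: {approx_p:.4f}")
```

Here the exact upper-tail p-value is 11/1024 (about 0.0107), while the uncorrected normal approximation gives roughly half of that, so with a sample this small the approximation noticeably overstates the strength of the evidence.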
The emphasis on alternative (nonparametric and robust) inference
procedures is important since many normal theory methods
can be quite inaccurate
in small sample size situations. Even in large sample size situations
(for which normal theory procedures may be reasonably robust) it is
still the case that an alternative inference procedure may be a better
choice. Some examples are given below.
- In many cases where others may rely on the
robustness of a t test to perform a test about the mean of a
distribution, or may obtain a very misleading result by improperly
applying a nonparametric procedure, I may choose to perform Johnson's
modified t test. It's not a widely known procedure, but it can often
appreciably outperform more commonly used methods.
(I don't know of a single book that describes Johnson's modified t test, but I teach you about it in this course because I am convinced that
it is a procedure worth knowing about.)
- In situations where some may use a standard one-way ANOVA F
test, I'll often use Tukey's studentized range test, the Kruskal-Wallis
test, the Steel-Dwass test, a rank analog to the Tukey-Kramer test, or
perhaps a procedure based on Welch's statistic and Boole's
inequality. Although the first two of these alternative procedures
aren't that uncommon, the others aren't widely used. But each one of them
could possibly represent the best choice of a procedure for a given
situation, and so it is good to be familiar with all of these methods.
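The claim that normal theory methods can be inaccurate with small samples can be explored with a short simulation. The following Python sketch is only an illustration under stated assumptions: the exponential distribution, the sample size of 10, the seed, and the function name are my choices for the example, and the code uses the ordinary one-sample t test (it does not implement Johnson's modified t test or any of the other alternatives named above). It estimates how often one-sided t tests at the nominal 5% level reject a true null hypothesis about the mean when the data are strongly skewed.

```python
import random
import statistics

T_CRIT_DF9 = 1.833  # upper 5% point of Student's t with 9 degrees of freedom

def tail_rejection_rates(n_reps=20000, seed=554):
    """Estimate one-sided rejection rates of the t test when H0 is true."""
    n = 10
    rng = random.Random(seed)
    lower = upper = 0
    for _ in range(n_reps):
        # Exponential data with rate 1, so the true mean is exactly 1.
        sample = [rng.expovariate(1.0) for _ in range(n)]
        t = (statistics.fmean(sample) - 1.0) / (statistics.stdev(sample) / n ** 0.5)
        lower += t < -T_CRIT_DF9  # rejects in favor of "mean < 1"
        upper += t > T_CRIT_DF9   # rejects in favor of "mean > 1"
    return lower / n_reps, upper / n_reps

lower_rate, upper_rate = tail_rejection_rates()
print(f"lower-tail rejection rate: {lower_rate:.3f} (nominal 0.050)")
print(f"upper-tail rejection rate: {upper_rate:.3f} (nominal 0.050)")
```

With right-skewed data like these, the lower-tail test tends to reject a true null hypothesis well more often than 5% of the time, and the upper-tail test well less often, which is the kind of behavior that motivates considering alternative procedures.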
Unless you have already mastered the material in
several good graduate-level courses in applied statistics, my guess is that you can get a lot out of this course. Even though some M.S.
students may enter the program having had several 400- or 500-level courses
in applied statistics, unless you have had a good course based on Miller's book,
I would recommend that you not skip this course. If early on in the semester you find some of the material repeats what you've studied previously,
challenge yourself to develop a deeper understanding of the material, and read
ahead in order to spend more time with the more advanced material. If you buy
all of the recommended books, there should be plenty for you to read and learn.
Although the idea of learning about a variety of techniques for each
type of problem and learning how to use the data to diagnose the
situation and select a good procedure hopefully seems sensible to you
(particularly if you know that
some of the normal theory procedures perform horribly in some nonnormal
situations), you may be surprised to learn that very few books contain
adequate information that allows you to easily perform statistical
analyses in this manner.
My lecture notes are based fairly heavily on
Miller's book,
Beyond ANOVA: Basics of Applied Statistics, Reissue edition,
which does
present a variety of methods for each data analysis situation
covered and gives guidelines for their use. A strengh of Miller's book
is that it gives good references that address issues that we won't have
time to cover. However, Miller's book assumes a much stronger background
than what is needed to meet the prerequisites for this course, and so
you can really consider the lecture notes that I've produced to
be the main text for the course. (I'll encourage you to at least start
looking over Miller's book once we get to the 4th week of the
semester, but don't worry if you find it difficult to follow. When the
course is over, you will hopefully think of the book as a valuable
addition to your bookshelf. Also, since I may not always present my
lectures in the same order as the class notes are written, you may find
it much easier to follow in class if you read ahead in the class notes
before each lecture.)
Since Miller's book assumes that the
reader has some knowledge of statistics, it starts at a point too
advanced for what would be appropriate in this course. You may find one
or both of the following books useful.
In addition to using these books to supplement the book by Miller and my
course notes throughout the semester,
doing some reading in these more elementary books may be a good way to better prepare yourself for the beginning
of the course.
- The book
Statistical Concepts and Methods,
by Bhattacharyya and Johnson assumes a
calculus background. Because many students have told me that it helped
them quite a bit, I've decided to require it in addition to the book by
Miller.
- If you want a review of the pertinent elementary
probability material useful for this course, read
- Ch. 3 through Ch. 5 (pp. 60-164;
especially pertinent are pp. 141-153);
- Ch. 7 (pp. 187-232;
especially pertinent are pp. 193-219).
- Some material
corresponding to the first half of the course can be found in
Ch. 6 & Ch. 8 (pp. 165-181, pp. 220-222, pp. 233-265, & pp.
297-307).
- You can find pertinent material for the second half
of the course using the index and the table of contents.
- The book
Biostatistical Analysis, 4th ed.
by Zar assumes only an algebra background.
- It
has only scant coverage of probability: Ch. 5 (pp. 47-64) and some
material in Ch. 6.
- This book has a broader coverage of applied techniques than
the one by
Bhattacharyya and Johnson.
- Material which corresponds to the first half of the
course can be found on p. 5, pp. 15-26, pp. 31-37, pp. 67-111, and
pp. 163-171.
- You can find pertinent material for the second half
of the course using the index and the table of contents.
Another book which complements the material that I cover in the course is
Fundamentals of Modern Statistical Methods: Substantially Improving Power
and Accuracy by Wilcox.
It is newer than the other books and covers some material that the older books
don't cover. Furthermore, it is written at a more elementary level than the
Miller book (Wilcox's book seems to be aimed at educating social scientists
who may have a rather limited knowledge of statistics about some of the
latest and greatest techniques), and so one can get a fair amount of knowledge
from it without expending a huge effort.
(Note: In some places the Wilcox book is sloppy (perhaps when he tried to
simplify his presentation he gave up too much accuracy), and so readers
should check out my
comments about the Wilcox book.)
Click here for information about what is
required and what is optional, and more advice about which books to buy. Basically
I recommend that you buy (and read) as many of the books as you can afford (except some may wish to pass on the more elementary book by Zar).
Click here for information about software and
the Minitab books in the bookstore.
Read this prior to going to the GMU
bookstore to buy any books and/or software for STAT 554.
If you are going to eventually take both STAT 544 (Applied Probability)
and this course, STAT 554, then if you don't take them the same semester
(which full-time students may do), it makes more sense to take 544 prior to
554. As long as you know enough probability, it's
okay to take 554 before 544 --- but I think the better you know probability,
the easier 554 will be (and the more you'll get out of the course).
No, you don't need to take the course in order to pass the STAT 554
qualifying exam, but my guess is that it'll be much easier to pass the
exam once you've completed the course.
I believe I'm accurate in claiming that
STAT 510,
which hasn't been offered recently,
falls in between
STAT 250
and STAT 554 in terms of
difficulty and sophistication, but that STAT 510 is closer to STAT 250 than it
is to STAT 554.
STAT 554 covers a very wide variety of methods that can be used
to analyze experimental data (and so it's good for students in biology and
environmental science), while STAT 510 caters more to students in public
policy and the social sciences, giving them a few simple methods to use, and
introducing them to the SPSS software.
STAT 510 has only a minimal prerequisite (MATH 108 or permission of
instructor), and it goes at a much slower pace.
My advice is that if you need to know how to do serious data analysis, then
take STAT 554 if you think you can handle it. But do be aware that
STAT 554 can consume a lot of hours (many more than STAT 510 requires), and
this is even more the case if you enter STAT 554 without the formal prerequisite. In the end you have to be the one to make an honest assessment of your
ability and your willingness to work hard (but it should be noted that some
programs do not allow STAT 510 to be taken in place of STAT 554).
Each fall my department also offers another course dealing with applied statistics,
STAT 535,
which was specially designed for students in the sciences.
If you are a graduate student in biology,
environmental science, chemistry, or some other field of science, you may
be better off taking this course, especially if you lack the formal
prerequisite for STAT 554 (although I will point out that many students
who have lacked the prerequisite have done fine in STAT 554).
STAT 535 does not go into as much detail as STAT 554 does, and
will make much less use of probability theory. Also, it will have more
time allocated to some useful advanced topics (presented at a reasonable
level) that are not covered in STAT 554.