Comments about A First Course in Probability, 10th Ed. by S. Ross
-  (p. 1) In Sec. 1.1 it is stated that the probability 3/6 = 1/2 seems reasonable.
An implicit assumption is that the ordering of the 2 good and 2 bad units will be random,
with all 6 possible arrangements being equally likely.  (It could be that the first two units were 
obtained from a batch which included a lot of defectives, and the last two were from a batch
not containing a lot of defectives, and this would make it more likely the two bad ones are lined 
up consecutively ... making the probability of 1/2 incorrect.)
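To make the equally-likely assumption concrete, here is a small sketch that lists the 6 arrangements. It assumes (my reading of Ross's setup) that the system works only when the two bad units are not next to each other; the names in the code are just for illustration.

```python
from itertools import permutations

# All distinct orderings of 2 good (G) and 2 bad (B) units.
arrangements = sorted(set(permutations("GGBB")))

# Assumption: the system fails when the two bad units are consecutive.
def bad_units_adjacent(arr):
    return any(a == "B" and b == "B" for a, b in zip(arr, arr[1:]))

working = [arr for arr in arrangements if not bad_units_adjacent(arr)]
print(len(arrangements), len(working))
```

The ratio 3/6 is a probability only if each of the 6 orderings really has probability 1/6, which is exactly the implicit assumption.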
-  (pages 2, 3, and 15)  With regard to the "basic principle of counting" on p. 2 Ross refers to 
two different experiments whereas on p. 15 he refers to an experiment consisting of two phases.  The key for
the mn result to be correct is that for each of the m possible outcomes of the first experiment/phase,
there have to be n possible outcomes for the second experiment/phase (or vice versa).  Unlike what is suggested 
by the enumeration shown in the proof on p. 2 (the m by n arrangement of ordered pairs), the n
possible outcomes for the second experiment/phase don't have to be the same for all of the m possible
outcomes for the first experiment/phase.  For example, in Example 2a on p. 2 there are 30 children 
who could possibly be chosen as child of the year (an honor I somehow missed out on as a youth), but given each of the 
10 possible choices of mother of the year there are just three choices for the child of the year (and those choices are different
for each of the 10 mothers).  Also, in Example 2e, 26 different letters are possible for the 2nd letter, but given a 
particular choice for the first letter only 25 choices are possible for the second letter (and the 25 possibilities for the 
second letter differ depending on what is chosen for the first letter).  So, once again, for
the mn result to be correct, for each of the m possible outcomes of the first experiment/phase,
there has to be the same number, n, of possible outcomes for the second experiment/phase.  The n possibilities for the 
second experiment/phase can be the same for all possible choices for the outcome of the first experiment, as in
Example 2d, or the
n possibilities for the second experiment/phase can be dependent on the outcome of the first experiment, as in
Example 2e.
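Here is a rough sketch of the point about the letters, using only the first two letters from Example 2e with repetition not allowed. The dictionary name is my own; the idea is that the set of allowed second letters changes with the first letter, but its size does not.

```python
from string import ascii_uppercase

# For each first letter, the allowed second letters (no repetition).
second_choices = {
    first: [c for c in ascii_uppercase if c != first]
    for first in ascii_uppercase
}

m = len(second_choices)                              # number of first letters
counts = {len(v) for v in second_choices.values()}   # sizes of the second-stage sets
total_pairs = sum(len(v) for v in second_choices.values())
print(m, counts, total_pairs)
```

Since every first letter leaves the same number of second letters, the total is m times n even though the n letters themselves differ.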
-  (p. 4) Example 3c is nice in that one has to use permutations at two different "levels" ... once to account for the orderings of the books 
of a given subject, and also to account for the different possible orderings of the subjects.  One also applies the "basic principle"
at two different levels ... once to account for the different orderings of books within the various subjects given a
particular ordering of the subjects (the 4! 3! 2! 1! at the start of the Solution) and again to account for the 
4! possible orderings of the 4 subjects.  (Note: I recommend going through the book's examples carefully before
trying the homework problems since some of the same "tricks" that are useful for the homework are the ones used in the
examples.)
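The two-level count can be checked with a short sketch. I am assuming subject sizes of 4, 3, 2, and 1 books (read off from the 4! 3! 2! 1! factors), and the brute-force check is run on a smaller made-up shelf so that it finishes quickly.

```python
from itertools import permutations
from math import factorial, prod

def grouped_arrangements(sizes):
    """Orderings of the subjects times orderings within each subject."""
    return factorial(len(sizes)) * prod(factorial(s) for s in sizes)

def brute_force(sizes):
    """Count shelf orderings in which each subject's books stay together."""
    books = [(subj, i) for subj, n in enumerate(sizes) for i in range(n)]
    count = 0
    for order in permutations(books):
        subjects = [subj for subj, _ in order]
        # Books are grouped exactly when the subject changes len(sizes) - 1 times.
        changes = sum(a != b for a, b in zip(subjects, subjects[1:]))
        count += (changes == len(sizes) - 1)
    return count

print(grouped_arrangements((4, 3, 2, 1)))
print(brute_force((2, 2, 1)), grouped_arrangements((2, 2, 1)))
```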
-  (p. 6) Note that the first complete sentence on p. 6 reminds us that with combinations only the membership of the
group matters and "the order of selection is not considered relevant."
-  (p. 6) Example 4b is nice in that one has to use both the combinations result as well as the 
"basic principle."  Also, one has to make an adjustment to reduce the count of 35 to 30 in order to address the 
second question posed.  The 2 choose 2 times 5 choose 1 result (equal to 5) can be obtained by noting that if the
two feuding men are part of a three man group then there are exactly 5 ways to choose the third man to fill out the 
group of three men.
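The adjustment from 35 to 30 can be seen by listing the three-man groups directly. This sketch assumes 7 men in all (consistent with the 5 choose 1 factor); the labels M1 through M7 are made up, with M1 and M2 playing the feuding pair.

```python
from itertools import combinations

men = ["M1", "M2", "M3", "M4", "M5", "M6", "M7"]
feuding = {"M1", "M2"}  # hypothetical labels for the two feuding men

all_groups = list(combinations(men, 3))
with_both = [g for g in all_groups if feuding <= set(g)]
allowed = [g for g in all_groups if not feuding <= set(g)]
print(len(all_groups), len(with_both), len(allowed))
```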
-  (pp. 6-7) Example 4c is another nice one.  Note that the solution is easy provided that one "sets it up" in a convenient way.  (Again, I
really recommend going over the book's examples carefully.  The more you study and take delight in the sometimes clever methods, the 
easier it ought to be for you to apply similar tactics when you attack homework and exam problems.)
-  (p. 7, 4th line of Solution of Example 4c) The word "between" is wrong, since two of the possible positions are at the ends.  (So Figure 1.1 shows things
correctly, and Ross's wording is just sloppy.)
-  (p. 10) The box on p. 10 refers to "distinct groups."  You can think of this in terms of the groups being labeled,
as for example in Example 5b, for which the expression in the box on p. 10 yields the correct answer.  In contrast is 
Example 5c, in which the groups aren't labeled and one needs to divide the result from
the expression in the box on p. 10 by 2 to obtain the correct answer. 
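A tiny made-up case (4 children split into two teams of 2, not the numbers from the book) shows the labeled versus unlabeled difference.

```python
from itertools import combinations

children = ["a", "b", "c", "d"]  # hypothetical children

# Labeled teams: choosing team A determines team B, and (A, B) differs from (B, A).
labeled = [
    (team_a, tuple(c for c in children if c not in team_a))
    for team_a in combinations(children, 2)
]

# Unlabeled teams: a division is just a set of two teams.
unlabeled = {frozenset([frozenset(a), frozenset(b)]) for a, b in labeled}
print(len(labeled), len(unlabeled))
```

Each unlabeled division shows up twice among the labeled ones, which is where the division by 2 comes from.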
-  (pp. 11-12)  While it's good to read all of the examples at least once, for the most part I think you're better off
trying to get a very good understanding of the simpler examples and not worrying so much about the more complicated examples like
Example 5d.
-  (p. 15)  In the second to the last sentence of the Summary on p. 15 you might want to pencil in "labeled" right below the word
"distinct" to remind you that being able to think of the groups as being labeled is a key point.  (Recall the difference between
Example 5b and Example 5c.)
-  (2nd line from bottom on p. 23 & 1st line on p. 24) The EF notation given for the intersection of E and F is not
as commonly used as the alternative notation (which I will usually use) given on the top line
of p. 24.
-  (p. 24, line 10) Sometimes mutually exclusive events are called disjoint events.  (They are definitely disjoint sets, but usually
the term mutually exclusive is used with the term events and disjoint is used with the term sets.) 
-  (p. 24, line 17) I give an alternative notation for the complement of an event on p. 2-1 of my notes.
-  (p. 24, 19th and 20th lines from bottom) Another way to state that two events/sets are equal is that they have exactly the same members.
-  (p. 24) I won't spend a lot of time on the commutative, associative, and distributive laws in class, but keep them in mind 
since they are sometimes useful in solving problems (and doing proofs).
-  (p. 27, lines 7-9) Ross poses the question "How do we know that n(E)/n will converge to some limiting value
that will be the same for each possible sequence of repetitions of the experiment?"  Note that in the last paragraph on p. 28 (that continues on p. 29), Ross indicates that
this will indeed be the case with probability 1, and that this can be shown to follow from the three simple axioms given on p. 27.
(On lines 21-23 of p. 27, Ross suggests that it's more reasonable to start with the three simple axioms and then prove the limiting
frequency result than to just assume something so complicated is true.)
-  (p. 28, paragraph after Example 3b) Note that P is a function (from the set of all measurable events to the 
interval [0, 1] ... you plug an event into the function P, and the value of the function is the event's probability).  (Note: The term measurable is 
used in the Technical Remark on p. 29, but it cannot be easily explained
without assuming a background in mathematics well beyond the prerequisites for this course.  Some of my department's 900-level courses
get into this sort of material.  (It's nice that Ross points out (8th line of p. 29) that "all events of any practical interest are
measurable."))
-  (p. 31, Example 4a) Although I think the setting for this example is silly (dealing with probabilities of liking books),
note how the probability of liking neither book is obtained using the information given (and the probability facts covered so far).
-  (p. 32) Expression (4.1) is sometimes called Boole's inequality.
-  (pp. 33-34, 1st paragraph of Sec. 2.5) Hopefully it's clear that if a sample space has equally likely outcomes then the number of outcomes
in the sample space has to be finite (because there's no way to have a countably infinite number of equal probabilities that sum to 1).
(Note: I have to put countably infinite to be correct here. If you don't know the difference between countably infinite
and uncountably infinite you can look it up if you desire, or wait until later this semester when I think I'll have to address this issue.)
-  (p. 34, Example 5b) This example is nice in that there are two valid viewpoints for the sample space, but that one gets
the correct probability either way as long as the number of members of the focused on event is counted in a way consistent with the
sample space.  (The example I have on p. 2-12 of my class notes is similar in this regard.)
-  (pp. 39-40, Example 5k) Don't feel bad if you find this example a bit hard to follow since the situation is complex enough 
that it's sometimes hard to keep things straight.  Even after you've seen the solution it may be easy to overlook some aspect of the situation
if you go back and try to solve the problem on your own later.  While you should master dealing with easier problems where one combines
combinatorics with the simple probability rules of Ch. 2, I don't think you should worry too much if you struggle with some of the more
complex examples of this chapter.
-  (pp. 41-42, Example 5m) Instead of the screwy situation of a bunch of men groping around in a room to randomly select a hat,
we can spice this example up a bit by changing it to the analysis of a swinging party.  The 1997 Ang Lee movie The Ice Storm and the
rather risque 2008 network TV show Swingtown have scenes with parties where the men put their car keys into a bowl and then the 
women randomly grab some keys and go home with whoever they belong to.  The results of this example give us the probability that no one goes 
home with their usual partner.  (An interesting thing is that although the limit as N tends to infinity is about 0.368, the answer for
finite N is close to this for nearly all values of N.  For example, for 
N = 4 we have 0.375,    
N = 5 we have (about) 0.367, and   
N = 6 we have (about) 0.368 (and it rounds to 0.368 for all larger values of N).)   
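The values above can be reproduced with a short sketch. It uses the standard alternating-sum expression for the probability of no matches (the sum of (-1)^k / k! for k from 0 to N), which is my statement of the result rather than a quote from the book.

```python
from math import exp, factorial

def prob_no_match(n):
    """Probability that none of n people ends up with their own item."""
    return sum((-1) ** k / factorial(k) for k in range(n + 1))

for n in (4, 5, 6, 10):
    print(n, round(prob_no_match(n), 3))

print(round(exp(-1), 3))  # the limiting value, 1/e
```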
-  (p. 42, Example 5n) The 19! comes from the fact that we can focus on how people are seated relative to one specific person.
That is, we can focus on one person and consider the 19! ways the other 19 people can be seated going clockwise from the focused on person.
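The "fix one person" argument can be checked by brute force on a small table. This is only a sketch with made-up sizes (5 and 6 people rather than 20), counting seatings that differ by more than a rotation.

```python
from itertools import permutations
from math import factorial

def circular_arrangements(n):
    """Count seatings of n people around a round table, up to rotation."""
    seen = set()
    for seating in permutations(range(n)):
        # Rotate so person 0 comes first: one representative per rotation class.
        i = seating.index(0)
        seen.add(seating[i:] + seating[:i])
    return len(seen)

print(circular_arrangements(5), factorial(4))
print(circular_arrangements(6), factorial(5))
```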
-  (pp. 43-44, Example 5o) This example is marked with an asterisk, meaning that the author considers it to be more challenging and
optional.  It's okay if you don't want to stress over it now.  (I cover this result when I teach STAT 657 (Nonparametric Statistics) since
it's pertinent to a test called the runs test (which is also hopefully covered in STAT 554, although it shouldn't be derived there).)
-  (pp. 44-48, Sec. 2.6) This optional section (marked with an asterisk) covers material not typically covered in a course of this level.
It's okay for you to skip it for now.  I'll skip covering it in class, but we may come back to it if there is adequate time near the end
of the semester.  (Other topics are much more important to cover.)
-  (pp. 48-49, Sec. 2.7) I've got some good comments about this section on p. 2-11 of my class notes.  I'm not a big fan of subjective
probability in a lot of situations, although the concept is perhaps useful in some situations.
-  (pp. 59-60, Example 2b) In the paragraph before the example, Ross indicates that working with a reduced sample space is often 
easier, and that this fact is illustrated in the next example.  But in the solution of the example, he begins by emphasizing use of the
definition of conditional probability rather than a reduced sample space.  He finally gets to the reduced sample space viewpoint at the
very end of his solution.
-  (p. 61, Example 2e)  This is a nice example.  In his solution to part (a), which is the first paragraph of the solution, 
he obtains the desired probability in two ways: (i) using the multiplication rule given by (2.2) on p. 60, and (ii) using results from
Chapters 1 and 2.
-  (pp. 62-63, Example 2f) This one is a bit messy.  It uses results from a messy example in Ch. 2.
-  (p. 63, Example 2g) The problem addressed in this example is the same as the one in Problem 3.14 on p. 104, which I solve on 
pages 3-4 and 3-5 of my notes.  While the way I do it (the way suggested in Problem 3.14) is rather straightforward, and the way a lot
of experienced people may first think to do it, the way Ross does it in this example takes a lot less work.  ***  For the event
E_1, I'd just put "the ace of spades is in one of the piles" and omit the word "any" (since it doesn't clarify
the situation).  To see where the probabilities Ross uses come from, hopefully it's clear that
P(E_1) = 1, since when the cards are divided into the four piles the ace of spades must be put into one of them.
The value of P(E_2|E_1) is 39/51, because we can think that given that the ace of spades is taking
up a space in one of the piles, there are 51 spaces available for the ace of hearts that are assumed to be equally likely, and 3 ×
13 = 39 of them are in the piles not containing the ace of spades.  The arguments for the other conditional probabilities used to obtain
the final answer are similar.
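Carrying the same "available spaces" argument through the remaining aces gives the product below. The factors 26/50 and 13/49 are my own application of that argument (they are not spelled out above), so treat this as a sketch rather than a transcription of the book.

```python
from fractions import Fraction

p_e1 = Fraction(1)        # the ace of spades lands in some pile
p_e2 = Fraction(39, 51)   # second ace avoids the pile holding the first ace
p_e3 = Fraction(26, 50)   # assumed: third ace avoids the two occupied piles
p_e4 = Fraction(13, 49)   # assumed: fourth ace lands in the one remaining pile

answer = p_e1 * p_e2 * p_e3 * p_e4
print(answer, float(answer))
```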
-  (pp. 64-72, Sec. 3.3)  Even though the section title is Bayes's Formula, note that (3.1) on p. 65 is not Bayes's formula.  It's a special 
case of what is sometimes called the law of total probability, of which the more general result is given by (3.4) on p. 72. 
Ross finally states Bayes's formula
(called Bayes's theorem in some books) on the bottom of p. 72, although he uses it before that (e.g., in Example 3c).
-  (p. 65, Example 3a (Part 1)) I think it would be better to indicate that during a fixed one year period (about) 40% of the 
people the insurance company classifies as being accident prone have an accident.  The way Ross has it worded, there are two classes of 
people, and everyone in the group deemed to be accident prone has a constant probability of 0.4 of having an accident in a one year
period, which just seems very unrealistic to me.  But even if we don't object to that part of the situation's description, "new
policyholder" needs to be changed to "randomly selected new policyholder" in order to justify Ross's use of the probabilities.
-  (p. 67, Example 3d) This is a common application of Bayes's formula (even though the result isn't formally
presented until p. 72).
-  (pp. 68-69, Example 3f) I really don't like this example.  For one thing, if G is the event the suspect is guilty, then
to me the only values that make sense for its probability are 0 and 1, since either the suspect is guilty or he isn't guilty.  But 
if we want to use probability as a measure of belief, then I guess a value of 0.6 is okay.  But I still have problems with this example.
For one thing, it isn't clear that 
P(C|G^c) should be 0.2.  That would make sense if the suspect for this particular case was 
randomly selected from the general population, but I hope that even the worst police don't act that way.  Also, the 
analysis given has the belief in the suspect's guilt jumping from 0.6 to about 0.882.  Suppose there is only one other possible suspect 
for this crime.  Given that "probability as a measure of belief" obeys the rules of probability, at the stage of the investigation
when the inspector comes up with (from where it's not clear) 0.6 as the "probability" the one suspect is guilty, the other suspect 
should be assigned a 0.4 "probability" of guilt.  Now if the other suspect also has the same characteristic that the first suspect
does then an analysis similar to the one given in the solution of the example would result in the updated assessment for the
second suspect as being about 0.769.  On the one hand, if the new evidence increases the belief in the guilt in the first
suspect it should also do so for the second suspect, but something is terribly wrong with all of this since the two
updated assessments of guilt, 0.882 (for the first suspect) and 0.769 (for the second suspect), sum to a value exceeding 1.
(Since both of the suspects possess the same characteristic, the new evidence really doesn't provide any meaningful information.)
All in all, to me this example makes about as much sense as a soup sandwich.
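The two updated assessments can be reproduced with a few lines. The sketch assumes, as I read the example, that a guilty person has the characteristic with probability 1 and a non-guilty person has it with probability 0.2; the function and argument names are mine.

```python
def posterior_guilt(prior, p_char_given_guilty=1.0, p_char_given_innocent=0.2):
    """Bayes update of the 'probability' of guilt after learning the suspect has the characteristic."""
    numerator = prior * p_char_given_guilty
    return numerator / (numerator + (1 - prior) * p_char_given_innocent)

first = posterior_guilt(0.6)   # first suspect
second = posterior_guilt(0.4)  # the only other suspect
print(round(first, 3), round(second, 3), round(first + second, 3))
```

Running each update separately is what lets the two values add up to more than 1.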
-  (pp. 70-71) Ross introduces the concept of odds, but I don't think I'll cover this material in class.
Giving the odds, as defined in the book, is equivalent to giving probabilities, but for some reason many lay people prefer to use the language of odds
(whereas I think dealing with probabilities is clearer).  If you're at a horse race and someone says the odds are 15 to 1, it's 
important to realize that this means the odds are 15 to 1 against the horse.  Equivalently one could say the odds are
1 to 15 for the horse.  If none of the money is being skimmed away from the bettors, one could take these odds as meaning
the probability of winning is assessed to be 1/16, but if you take the odds at the tracks and try to convert them to probabilities,
you'll find that they don't sum to 1.  As defined on p. 72, the odds are not inconsistent with probabilities, but in a real-life
gambling situation for which "the house" is guaranteed a "piece of the pie" odds reflect the payoffs to the bettors and do not
accurately reflect probabilities.  All in all, if you want to discuss probabilities in such a gambling situation, I think it's better
to state them as probabilities and steer clear of odds.
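Here is a small sketch of the conversion. The three-horse race and its posted odds are entirely made up; they are only meant to show how converted track odds can fail to sum to 1.

```python
from fractions import Fraction

def prob_from_odds_against(a, b=1):
    """Odds of a to b against an event correspond to probability b / (a + b)."""
    return Fraction(b, a + b)

print(prob_from_odds_against(15))  # 15 to 1 against

# Hypothetical posted odds (against) for a three-horse race: 1 to 1, 2 to 1, 3 to 1.
posted = [1, 2, 3]
implied = [prob_from_odds_against(a) for a in posted]
print(implied, sum(implied))
```

In this invented race the converted values sum to more than 1, so they cannot all be probabilities of the horses winning.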
-  (p. 71) Right under the box giving the definition of odds, Ross refers to the probability of a hypothesis being true.
Most well-trained statisticians (and scientists too, I presume) regard hypotheses as things which are either true or not ...
not things which are true with certain probabilities.  (Typically, statisticians use inference methods to measure the degree of compatibility 
of the available data with hypotheses, but don't talk about the probabilities of hypotheses being true.)
-  (p. 76, part (a) of Example 3n) This is a common application of equation (3.4) on p. 72 (which is sometimes called 
the law of total probability).
-  (p. 78, Example 4b) I think Ross has things backwards.  Rather than assume the probability of
(H, T) is 1/4 and indicate that establishes the independence of the outcomes of the two flips, I think it's
much better to assume independence and use that to establish that the four possibilities for the two flips are equally likely.
-  (Note: I had several comments pertaining to Ch. 4 of the 8th Edition of the book, but the errors I found were corrected and my other concerns addressed 
for the 9th Edition and the 10th Edition.)
-  (p. 193) You may find Ross's use of notation in the solution to Example 2b a bit confusing.
In the first integral on the page he has f(y), where f is the pdf of X.  This is okay since the 
argument of a density function is a "dummy variable" (but you may find it confusing).
Just a bit later he uses f_Y(x) for the pdf of Y.  This is technically okay too, but I think it's stupid to do this.  
There is no good reason not to use y as the argument of the density function here.
-  (p. 197, and other pages) I find it interesting that Ross titles Sec. 5.3 The Uniform Random Variable (italics are mine),
and titles the subsections of Sec. 5.6 The Gamma Distribution, The Weibull Distribution, etc., yet titles
Sec. 5.4 Normal Random Variables and Sec. 5.5 Exponential Random Variables.  Generally, I don't like to write or say
(although I sometimes slip up) the uniform distribution, because there isn't just one uniform distribution ... there's an 
infinitely large family of them.   It's okay to say the uniform (0, 1) distribution or the standard normal distribution, because
in such cases we're referring to a specific distribution.
-  (p. 198, Fig. 5.3) (a) could be improved by having a thick line along the horizontal axis from the -infinity direction to α
and from β to the right; and (b) could be improved by having a thick line from the -infinity direction to α ... it should
be emphasized that pdfs and cdfs are defined for all real numbers (and not just where the pdf is positive).
-  (p. 212, 1st line at the top of the page) The F(infinity) notation is a bit casual ... it means the limit of
F(x) as x approaches infinity.
-  (p. 212, sentence (in parentheses) right before Example 5b) I find this comment to be rather odd ... Sec. 4.7 doesn't seem
to address exponential dist'ns (even though there is a connection between exponential dist'ns and Poisson processes).
-  (p. 225) In the statement of Theorem 7.1, right after "if y = g(x) for some x" it should continue "in the support of X" ... the nonzero part applies to those values and the 0 part applies to all other values.  
(Note: To put it another way, the nonzero part applies to all values of y in the support of Y.
If g is an increasing function and (a, b) is the support of X, then the support of Y is (g(a), g(b)).
If g is a decreasing function and (a, b) is the support of X, then the support of Y is (g(b), g(a)).)
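As a rough numerical check of the support statement, take a made-up case: X uniform on (1, 2) and the decreasing function g(x) = 1/x, so the support of Y should be (g(2), g(1)) = (0.5, 1). The density below uses the usual change-of-variable form f_X(g^{-1}(y)) |d/dy g^{-1}(y)|, which is my paraphrase of the theorem.

```python
def f_X(x):
    """pdf of X, assumed uniform on (1, 2)."""
    return 1.0 if 1.0 < x < 2.0 else 0.0

def f_Y(y):
    """pdf of Y = 1/X: nonzero only on the support (0.5, 1), zero elsewhere."""
    if 0.5 < y < 1.0:
        x = 1.0 / y                       # g^{-1}(y)
        return f_X(x) * (1.0 / y ** 2)    # |d/dy g^{-1}(y)| = 1 / y^2
    return 0.0

def midpoint_integral(f, lo, hi, n=10_000):
    """Simple midpoint-rule approximation of the integral of f over (lo, hi)."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

total = midpoint_integral(f_Y, 0.5, 1.0)
print(total)
```

The density integrates to (approximately) 1 over (0.5, 1), and it is 0 for every y outside that interval.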
-  (p. 266, 1 line above Example 3f) X + X should be X + Y
-  (p. 339, start of subsection 7.5.2) 
The use of notation here is very good ... E[X|Y] is used to denote a function of the random variable Y.
(Contrast with p. 337: there E[X|Y = y] denotes a constant, while on p. 339
E[X|Y] is a random variable.)
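The distinction can be illustrated with a tiny discrete example. The joint pmf below is invented purely for illustration, and the function names are mine.

```python
from fractions import Fraction

# Hypothetical joint pmf p(x, y) on a tiny grid.
joint = {
    (0, 0): Fraction(1, 4), (1, 0): Fraction(1, 4),
    (0, 1): Fraction(1, 8), (1, 1): Fraction(3, 8),
}

def p_Y(y):
    """Marginal pmf of Y."""
    return sum(p for (_, yy), p in joint.items() if yy == y)

def cond_mean(y):
    """E[X | Y = y]: an ordinary number for each fixed y."""
    return sum(x * p for (x, yy), p in joint.items() if yy == y) / p_Y(y)

# E[X | Y] is the random variable that equals cond_mean(y) when Y = y.
# Averaging it over the distribution of Y recovers E[X].
e_of_cond = sum(cond_mean(y) * p_Y(y) for y in (0, 1))
e_x = sum(x * p for (x, _), p in joint.items())
print(cond_mean(0), cond_mean(1), e_of_cond, e_x)
```

Here cond_mean(0) and cond_mean(1) are constants, while E[X|Y] takes one or the other of those values depending on Y.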
-  (p. 354, start of subsection 7.5.4) 
The use of notation here is not good ... the first sentence indicates that a specific value y of Y is 
being considered, and so the proper notation for the conditional variance should be
Var[X|Y = y] and not
Var[X|Y].
(If we want to consider the conditional variance to be a random variable which is a function of the random variable Y, then
Var[X|Y] is the proper notation.)