Comments about Fundamentals of Probability with Stochastic Processes, 4th Ed. by S. Ghahramani



Chapter 1

  1. (p. 1) I find it rather odd that he uses a "fair coin" as an example at the end of the 2nd paragraph. To me, a fair coin is something that is ideal (a hypothetical model of a perfect coin), and so the probability of heads is 1/2 by definition. So there is no need to refer to the performance of an ideal fair coin in a series of experiments and, based on those results, to "postulate" that the probability of heads is 1/2 (since, to me, the probability of heads with a fair coin is 1/2 by definition).
  2. (p. 1) I think it should be cards instead of "dice" in the 6th to the last line on the page.
  3. (p. 2) In point 1 near the bottom of the page he indicates that there is "no way to analyze the error" but I don't think that's correct. Statistical methods could be used to assess the error.
  4. (p. 2) I don't like some of the examples given by the bullets in point 3. The 50,000th decimal figure of π is either 7 or it's not, and referring to the probability that it's 7 isn't sensible. Also, either Mozart was poisoned by Salieri or he wasn't. Just because we don't know for sure which of these two cases occurred, to me it isn't sensible to assign a probability to the event that he was poisoned. All in all, I don't care much for using probability as a measure of personal belief. I can see where at times it may be useful for expressing an educated guess as to what may or may not occur in the future (e.g., Las Vegas oddsmakers could seek expert opinions about the probability that a certain team will win the next Super Bowl), but I definitely don't like to use probability for whether or not someone was poisoned hundreds of years ago.
  5. (p. 3) In the first sentence of Sec. 1.2, in the description of a sample space he has that all of the possible outcomes of the experiment are predictable in advance. I think most probability books just describe that the sample space is the set of all possible outcomes and don't stipulate that they are all predictable in advance (i.e., a random experiment has a sample space whether or not we know exactly what all of the possible outcomes are).
  6. (p. 4) In Example 1.6 it's indicated that the "round-off error" is random, but that seems screwy to me. For example, if someone only used a credit card to buy items off of the dollar menu at a fast food place, and only bought one item at a time, the round-off error would always be the same. Even in more realistic credit card usage, I don't like to think of the round-off error as being "random" unless purchases are actually made in a truly random way.
  7. (p. 15) It's fine to read Remark 1.2 to be aware of the information given there, but don't worry about this (or Example 1.22 on pp. 32-33).
  8. (p. 15) In Example 1.11 it's indicated that the choice is made "at random." In this example, at random is taken to mean that all possibilities are equally likely. While at random and randomly often mean equally likely, they don't have to mean this. One can have a random selection without all possibilities being equally likely.
  9. (pp. 16-17) In Example 1.13, to arrive at the conclusion that all 9 points in the sample space are equally likely, one needs to assume independence (which won't be covered until Ch. 3).
  10. (p. 21) In Example 1.19 it's indicated that all appointments are assumed to be random, and digging further one can determine that it's assumed that the next appointment is equally likely to be any of the doctor's patients. This seems silly to me. (I really don't like such silly artificial examples used in a lot of probability books.)
  11. (p. 22) Remark 1.4 introduces the term odds. I generally prefer to put things in terms of probability instead of odds. For example, to say a probability is 1/10 seems clearer than saying that the odds are 1 to 9. (If I hear that the odds are 1 to 9 I mentally convert to a probability of 1/10.) But I guess the concept of odds is useful in wagers. For example, if the odds against me are 2 to 1, I know I need to be able to win 2 units for every unit I risk in order to have a fair wager.
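A tiny helper (my own sketch, with a made-up function name, not anything from the book) makes the odds-to-probability conversion and the fair-wager check concrete:

```python
def odds_in_favor_to_prob(a, b):
    # Odds of "a to b" in favor of an event correspond to probability a / (a + b).
    return a / (a + b)

print(odds_in_favor_to_prob(1, 9))  # odds 1 to 9 in favor -> probability 0.1

# Fair-wager check: with odds 2 to 1 against me, my win probability is 1/3,
# so winning 2 units per unit risked gives expected gain (1/3)*2 - (2/3)*1 = 0.
p = odds_in_favor_to_prob(1, 2)
print(p * 2 - (1 - p) * 1)  # expected gain of the fair wager: 0
```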
  12. (pp. 22-23, Exercises 2 and 12) I think a lot of the exercises in the book are silly in that they use probability inappropriately. I won't comment on all of them, but will comment on some. In Exercise 2 on p. 22 we are given that "33% of the people" have a certain type of blood. Given the rest of the problem, I suspect that by "the people" he means people who are eligible to become the next president. The intended method of solution is to assume that the next president will be randomly chosen from "the people" with each person being equally likely to be selected ... which of course makes about as much sense as a soup sandwich. Are we really to believe that a homeless person or the owner of a strip club is just as likely to be president as the top prospects in the two major political parties? Exercise 12 isn't as bad, but the wording could be better. The phrase "The probability that a driver is male" should be replaced by The probability that a randomly selected driver is male (like he has in the last sentence of the problem).
  13. (pp. 27-29) I plan to skip Sec. 1.5 in class.
  14. (pp. 29-30) I also skip most of Sec. 1.6 in class (but I'll incorporate some of what's in Sec. 1.6 when I discuss Sec. 1.7).
  15. (pp. 32-33) I plan to skip Example 1.22 in class. (The way it is worded, I think that it's rather hard to understand. And even if it were worded better, it could still be quite difficult for many to comprehend.)

Chapter 2

  1. (p. 45, sentence immediately prior to Theorem 2.1) The rather simple result given in Theorem 2.1 is indeed the basis for the chapter. Theorem 2.1 leads to Theorem 2.2, which leads to the various permutations results in Sec. 2.3, and the results of Sec. 2.3 can then be used to establish the main combinations result of Sec. 2.4, which leads to other results in Sec. 2.4. (So all of the key combinatorial results in Ch. 2 stem from Theorem 2.1.)
  2. (p. 45, Theorem 2.1) This can be generalized to be more helpful with two-stage random experiments. If, in a two-stage experiment, there are n1 possible outcomes for the first stage, and, whatever the outcome of the first stage, there are n2 possible outcomes for the second stage, then altogether there are n1n2 possible ways the two-stage experiment can occur. This is true even if the set of possibilities for the second stage depends on which particular outcome occurred during the first stage. The key point is that n1n2 is the right number of total possibilities as long as it's the case that whatever happens during the first stage there are n2 possibilities for the second stage. (There doesn't have to be a fixed set of possibilities for the second stage. The possible outcomes for the second stage can depend on what occurs during the first stage. For example, consider the random experiment of sequentially selecting two balls, without replacement, from an urn containing four balls numbered 1, 2, 3, and 4. Altogether, there are 4 possible outcomes for the 1st selection, and 4 possible outcomes for the 2nd selection (since each of the 4 balls can possibly be the 2nd one selected), but the number of possible outcomes for the two-stage random experiment is 4*3 = 12 instead of 4*4 = 16, because after the 1st stage occurs, there are only 3 possible outcomes for the 2nd stage.) Theorem 2.2 on p. 46 can be generalized similarly to deal with k-stage experiments. (Such generalizations allow us to apply the results to a wider variety of situations and make them more useful.)
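To see the urn example above concretely, here's a small Python sketch (my own, not from the book) that enumerates the two-stage outcomes:

```python
from itertools import permutations, product

balls = [1, 2, 3, 4]

# Sequential selection without replacement: the 2nd-stage possibilities
# depend on the 1st stage, but there are always n2 = 3 of them, so 4*3 = 12.
without_replacement = list(permutations(balls, 2))

# If the 2nd stage instead always had the same 4 possibilities
# (selection with replacement), there would be 4*4 = 16 outcomes.
with_replacement = list(product(balls, repeat=2))

print(len(without_replacement), len(with_replacement))  # 12 16
```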
  3. (p. 46, Remark 2.1) This book, like most introductory probability books, makes assumptions about independence to justify that the outcomes in certain sample spaces are equally likely, even though the notion of independence isn't addressed until Ch. 3. For example, back in Example 1.13 on pp. 16-17, in addition to having it be equally likely that each man gets off at any of the three floors, it is also assumed that the men act independently of one another to arrive at 9 equally-likely outcomes. (It's possible to have each man be equally-likely to get off at floors 2, 3, and 4, but not have them act independently. For example, suppose that one of the men randomly draws a ball from an urn having three balls numbered 2, 3, and 4. Then if both men get off at the floor corresponding to the number selected, each man is still equally likely to get off at any of the three floors, but in this case the men are not acting independently, and instead of there being 9 equally-likely possible outcomes, there are only 3.)
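A quick enumeration (my own sketch) of the elevator scenario shows the difference independence makes:

```python
from itertools import product

floors = [2, 3, 4]

# Independent choices: each man picks his floor separately -> 3*3 = 9 outcomes
independent_outcomes = set(product(floors, repeat=2))

# Dependent choices: both men follow the one ball drawn from the urn -> 3 outcomes
dependent_outcomes = {(f, f) for f in floors}

print(len(independent_outcomes), len(dependent_outcomes))  # 9 3
```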
  4. (p. 49, Remark 2.2) Persi Diaconis was one of my professors when I was in graduate school at Stanford. (I actually helped him with a small part of the paper on coincidences that is referred to.) Here is a link to an article about him that describes a card shuffling result (that it takes 7 shuffles to randomize a deck of cards), and how he trained himself to flip a coin in such a way as to obtain the same outcome 10 times in a row. And here is a link to a related video (about flipping coins).
  5. (p. 57, Examples 2.14 & 2.15) These are nice (and similar to ones I discuss in class).
  6. (p. 58, Example 2.18) This one is also nice. Note how Theorem 2.2 (p. 46) is used to obtain the denominator, and Theorem 2.4 (p. 58) is used to obtain the numerator.
  7. (p. 64, Examples 2.20 & 2.21) Example 2.20 isn't a good example! It makes little sense to me. He assumes that all possible subsets of 32 are equally likely to be the ones who gave negative responses and that isn't realistic. The answers that those three specific instructors gave, and also the responses of the other instructors, depend on their specific situations and using probability in a simple way isn't good here. In contrast, the wording of Example 2.18 (i.e., that a random sample of five of them is selected) makes it okay.
  8. (p. 65, Example 2.22) This is a good example. (So good I did the first part in class.)
  9. (p. 65, Example 2.23) This is a good example. (Note the use of the "complement rule" from Ch. 1 (Theorem 1.4 on p. 18).) The Warning that starts near the bottom of the page is particularly good to think about.
  10. (p. 66, Example 2.24) This is a good example. (So good I did the first part in class.)
  11. (pp. 66-67, Example 2.25) There is a mistake on the 7th line of p. 67. If the apples encountered at positions i and i+1 are the first two apples encountered in the ordering, then the 2nd person wouldn't get any oranges. Similarly, if those two apples were the 3rd and 4th apples encountered, then the 4th person wouldn't get any oranges. To generalize, if those two apples were the jth and (j+1)th apples encountered, then the (j+1)th person wouldn't get any oranges. (With the book's wording, if the 1st two apples encountered were next to one another in the 5th and 6th positions, the 6th person wouldn't get any oranges. But in such a case the first person would get four oranges, and the second person wouldn't get any (because there are no oranges between the 1st and 2nd apples). The number of oranges that the 6th person receives depends on the number of oranges between the 5th and 6th apples (and not the fact that there are apples in the 5th and 6th positions).)
  12. (pp. 66-67, Examples 2.25 & 2.26) I'm not going to emphasize the results given in these examples very much (they definitely won't be covered on any of your exams). They're interesting combinatorics results, but they aren't as generally useful for solving probability problems as the other results from Ch. 2 that I'll emphasize. (In some situations, the results could provide a value for the numerator of a probability ratio, but seldom would they be useful for obtaining denominators.)
  13. (p. 68, Example 2.28) This is a somewhat difficult (but classic) problem that's being addressed. It's more complicated than I'd expect you to do as part of an in-class exam. (Note: Instead of the envelopes, the same sort of problem is sometimes put in terms of men throwing their hats into a room and then grabbing a hat randomly and determining the chance that at least one of them grabs their own hat. Instead of these silly situations with envelopes and men randomly groping around in a dark room to grab a hat, a more realistic (though somewhat naughty) application pertains to swinging parties. As depicted in the 1997 Ang Lee film The Ice Storm and the 2008 television series Swingtown, in the heyday of wife-swapping parties in the 1970s, men would put their car keys in a bowl and their wives would randomly grab a set of car keys and go home with the car's owner. In this setting one could determine the probability that a wife would go home with her own husband (which sort of defeats the point of attending the party).)
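The classic matching probability (the chance that at least one man grabs his own hat, or at least one wife goes home with her own husband) can be computed by inclusion-exclusion; here's a sketch of mine (not the book's solution):

```python
from math import factorial, e

def prob_at_least_one_match(n):
    # Inclusion-exclusion for the matching problem:
    # P(at least one match) = sum over k = 1..n of (-1)^(k+1) / k!
    return sum((-1) ** (k + 1) / factorial(k) for k in range(1, n + 1))

for n in (3, 10, 52):
    print(n, prob_at_least_one_match(n))
# As n grows, the probability approaches 1 - 1/e, about 0.632
```

Note how quickly the answer stabilizes: the probability is essentially the same whether 10 or 52 people are involved.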
  14. (pp. 71-72, Example 2.34) This example gives the derivation of a result I'll work out when I cover the first part of Ch. 3. As an example of using it, the number of ways of dividing 52 playing cards among four bridge players, giving each of the players 13 cards, is 52!/(13! 13! 13! 13!).
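The bridge-deal count can be checked in a couple of lines (my sketch); note that it agrees with building up the four hands one at a time using binomial coefficients:

```python
from math import comb, factorial

# Number of ways to deal 52 cards into four distinguishable hands of 13 each
ways = factorial(52) // factorial(13) ** 4

# The same count built up hand by hand: choose 13 of 52, then 13 of the
# remaining 39, then 13 of the remaining 26, then the last 13.
ways_stepwise = comb(52, 13) * comb(39, 13) * comb(26, 13) * comb(13, 13)

print(ways == ways_stepwise)
print(ways)  # a huge number, on the order of 5 * 10**28
```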
  15. (p. 80, Section 2.5) I'll briefly mention this section in class, but it's not real important for us this semester.

Chapter 3

  1. Some parts of this chapter seem excessively "wordy" to me, and in places (e.g., Example 3.34 on pp. 127-128) the author makes things much more complicated than necessary. I think reading through my Ch. 3 class notes may be a better way to introduce yourself to the basics of conditional probabilities and independence. I'll also point out that a couple of the longer examples (e.g., Examples 3.9 & 3.30) have a somewhat paradoxical quality about them, making them not straightforward examples which clearly illustrate the important concepts pertaining to conditional probability and independence. Also, some examples and exercises are silly/flawed (e.g., Example 3.14 on p. 102), and others (e.g., Example 3.15 on pp. 102-103) are a bit too tedious and advanced for a 300-level course in probability.
  2. (p. 90, Theorem 3.1) This theorem gives us that conditional probabilities also satisfy the axioms of probability.
  3. (p. 93) The 8 points in the middle of the page give us that conditional probability obeys the same results as ordinary probability.
  4. (p. 98, Exercise 1) This exercise is silly! (Unfortunately, many of the exercises in the book are silly.) Assuming that the given probability model is good for whether or not Robert will lie, the probability he commits perjury is 0 if Susan is innocent and 0.25 if she is guilty. Since we don't know if Susan is innocent or guilty, there is no way to answer the question posed. To me, the fact that it's given that the judge is 65% sure that she's guilty is rather meaningless --- she is either guilty or not and the judge's personal assessment of 65% is just a guess. (Note that it's stated that Robert knows whether she's innocent or guilty.) To get the answer in the back of the book one is supposed to assume that Susan is guilty with probability 0.65, but again, in my mind she either is or isn't guilty, and I can't answer the question since I don't know the truth of the matter. (I have a big problem with some of the subjective probability problems found in books --- they are typically just a bunch of happy horses**t.)
  5. (p. 100, Exercise 12) To me the easy way to solve this is to say the answer is 1/14 by symmetry --- we don't really need to use conditional probability. (The ace of spades and each of the 13 hearts are all equally likely to be the first of those 14 cards to be drawn. If the ace of spades is drawn first, among those 14, then it's drawn before any hearts are. Otherwise, a heart will be drawn before the ace of spades.) Exercises 5 and 6 on p. 108 can also be answered using symmetry. If one or two cards are lost at random, there's no reason why drawing a spade should be any more or less likely than drawing a card of any other suit. So the probability is 1/4 in each case. To feel really comfortable with this problem, I'd state it "If a card is lost at random and then a card is randomly drawn ..." If both things are to happen (in the future) then the probability of drawing a spade is definitely 1/4. If the card is already lost and then a card is drawn, then of course the probability of getting a spade depends on which card was lost. But if we have no knowledge of which card is lost and we assume it was random, it's okay to still assess the probability that it will be a spade at 1/4. If at some point I learn which card was lost, I then revise my assessment and state that the probability of drawing a spade is 12/51 if a spade was lost, and it's 13/51 if a card of another suit was lost.
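If you want to convince yourself of the 1/14 symmetry answer, here's a quick Monte Carlo sketch (mine, with made-up card labels); since the other 38 cards are irrelevant to the ordering of the 14 cards that matter, it's enough to shuffle just those 14:

```python
import random

def est_prob_ace_before_hearts(trials=200_000, seed=1):
    # The 14 relevant cards: the ace of spades and the 13 hearts
    cards = ["AS"] + [f"H{i}" for i in range(1, 14)]
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        rng.shuffle(cards)
        if cards[0] == "AS":  # the ace of spades comes first among the 14
            hits += 1
    return hits / trials

print(est_prob_ace_before_hearts())  # close to 1/14, about 0.0714
```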
  6. (pp. 102-103, Example 3.15) You can read this example if you want, but I won't cover it and I won't expect you to be able to do problems like this on an exam. I cover something very similar to this when I teach STAT 544. (It's not terribly difficult, but given all that we are supposed to cover this semester, I just don't think it's good to spend the time going through this example.) You can also skip Remark 3.2 on pp. 103-104.
  7. (p. 114, first sentence of last paragraph of Example 3.20) I think that instead of "probability of guilt" it should be probability of innocence. (Note: The last two paragraphs of this example are hard to follow, especially if you don't figure out that the author has at least a couple of mistakes in them.)
  8. (p. 114, 2nd line from end of Example 3.20) I think it should be P(G|D) instead of P(I|D).
  9. (p. 115, Solution of Example 3.23) The author uses horrible notation here. Except in rare circumstances, single uppercase Roman letters (e.g., A, B, C), possibly with a subscript, should be used for events. BR can be misinterpreted to mean the intersection of events B and R (and if we do the same thing for BB, we'd have that BB = B (since intersecting a set with itself just results in the original set)). I'll also point out that, conventionally, letters from the first portion of the alphabet (A, B, C, D, E, F, G, and H) are commonly used for events, while letters from the last portion of the alphabet (T, U, V, W, X, Y, and Z) are commonly used for random variables (that we begin dealing with in the next chapter). (An exception is that I is often used for a certain type of random variable (called an indicator random variable).) Finally, I'll note that the author's use of R for an event is a particularly poor choice because R is commonly used to denote the set of all real numbers.
  10. (pp. 127-128, Example 3.34) A symmetry argument can be applied to obtain the answer of 1/4 easily (and it puzzles me greatly that the author doesn't indicate that such a simple symmetry argument is possible (although I guess he weakly hints at it on the 7th and 8th lines of p. 128)). Among the 16 cards which are face cards and aces, by symmetry each of them is equally likely to be the first of those 16 to appear. Since 4 of the 16 are aces, the probability is 1/4.
  11. (pp. 128-129, Example 3.37) You can skip this example if you're short on time. I cover problems like this in the 500-level probability course that I sometimes teach. (It's not a terribly hard problem, but it's not typical of the kind of problems you'll be expected to solve on exams. The author added a lot of slightly difficult examples to the 3rd edition of the book, making it not that much different from the STAT 544 text book.)
  12. (p. 131, Exercise 9) For the probability of 0.725 to be sensible, one would have to state that the person was randomly selected. But even then the problem isn't good, since John and Jim would be two specific people and not randomly selected ones.
  13. (p. 131, Exercise 11) "Effron" should be Efron.

Chapter 4

  1. (p. 146, Definition of random variable) A lot of introductory probability books simply state that a random variable is a real-valued function of a sample space, and they don't address the additional technicality that our text does. I think it's okay to just skip over the technicality for now. For practical situations one is commonly concerned with, a real-valued function of the sample space will satisfy the additional technicality.
  2. (p. 149, 3rd line of Sec. 4.2) Should be P(X >= 8) instead of P(X > 8).
  3. (p. 149, 2nd line of point 2 near bottom of page) It should be diverges to infinity instead of converges to infinity.
  4. (pp. 152-153, Example 4.8) The cdf is for a random variable of mixed type ... neither discrete nor continuous. While the example illustrates the results given on pp. 149-150 well, I'll point out that random variables of mixed type are rather uncommon and we won't deal with them much.
  5. (p. 154, Remark 4.1) It's okay to just skip this. Dealing with such advanced technicalities doesn't seem to be the most productive way for most beginning students of probability to learn the basics of the subject.
  6. (p. 165, Definition of expected value) I'll explain in class that authors of probability books (and others) are not consistent about whether expected values must be finite. (Some allow infinity to be a valid expected value.) This book is inconsistent with itself. The last sentence right before Example 4.18 on p. 168 indicates that the expected value won't exist in the next two examples, but in Example 4.18 (on p. 168) he has E(X) equal to infinity ... indicating that infinity is a legitimate value for E(X). (Example 4.19 also arrives at an infinite expectation.)
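For anyone who wants to see how an expectation can be infinite, here's an illustrative example of my own (not the book's Example 4.18): the pmf p(x) = 1/(x(x+1)) on x = 1, 2, 3, ... sums to 1 (the series telescopes), but E(X) = sum of x*p(x) = sum of 1/(x+1), which diverges:

```python
def partial_sums(n):
    # Partial sums for the pmf p(x) = 1/(x(x+1)), x = 1, 2, ..., n
    total_prob = sum(1 / (x * (x + 1)) for x in range(1, n + 1))
    partial_mean = sum(x / (x * (x + 1)) for x in range(1, n + 1))  # = sum 1/(x+1)
    return total_prob, partial_mean

for n in (10**2, 10**4, 10**6):
    p, m = partial_sums(n)
    print(n, round(p, 6), round(m, 2))
# The probabilities sum to 1, but the partial means grow without bound (like log n)
```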
  7. (p. 165, right below the definition) I don't think "mathematical expectation" is a commonly used term, whereas expected value, mean, and expectation are commonly used. (Another term that is sometimes used for expected value is first moment.)
  8. (p. 168, last sentence of Example 4.18) While the given chance is indeed less than 1 in a billion, I wouldn't say that it's much less. (Better to say it's slightly less than 1 in a billion.)
  9. (pp. 169-172) Examples 4.20, 4.21, and 4.22 are messy. It's the type of stuff sometimes covered in STAT 544, but I'll skip this type of stuff for STAT 346.
  10. (p. 183) You don't have to be concerned with Definition 4.6 and Theorem 4.6. They deal with a concept that isn't commonly used.

Chapter 5

  1. (p. 195) The book refers to James Bernoulli. Actually, there were several members of an extended family of Bernoullis who made important contributions to probability. Some of them were known by more than one name, and so it gets a bit confusing.
  2. (p. 195, expression (5.1)) Lots of people routinely use q for 1 - p. While I'll do so on occasion, usually I don't because I think it's better to emphasize that there is just a single parameter instead of two.
  3. (p. 197, Example 5.2) I tend to hate silly examples like this: is it realistic to think that people order randomly, with all choices being equally preferred?
  4. (pp. 198 & 200, Examples 5.4 and 5.8) These examples have a statistics flavor to them: how to use observed values to make an inference about something which is uncertain (assessing hypotheses and estimating parameter values which are unknown).
  5. (p. 201, Fig. 5.1) All 3 plots are inaccurate. Most notably: (1) the values indicated for p(0) and p(n) are way too big for the cases of n = 10 and n = 20; (2) the probabilities don't increase and decrease linearly (or nearly linearly) as shown --- the plots should appear more mound-shaped (rounded at the highest values, and curving more gently towards 0 at the ends of the ranges of possible values).
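A quick computation (assuming, as I believe the figure intends, p = 1/2 --- that's my assumption, not something stated here) shows how small the endpoint probabilities really are:

```python
from math import comb

def binom_pmf(n, p, k):
    # Binomial pmf: P(X = k) for n trials with success probability p
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Assuming p = 1/2 for the figure's plots:
for n in (10, 20):
    print(n, binom_pmf(n, 0.5, 0), binom_pmf(n, 0.5, n // 2))
# For n = 20, p(0) is under one in a million while p(10) is about 0.176,
# and the pmf rises and falls smoothly (mound-shaped), not linearly.
```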
  6. (p. 202) As I cover in class, there are easier ways to obtain the mean and variance (that use results from Ch. 8 and Ch. 11).
  7. (p. 210, Definition 5.3) Although the choice of symbol is arbitrary, and any reasonable thing can be used, I generally like to use θ instead of λ, because in the Poisson process setting one often replaces the parameter by the product λ t (a rate parameter λ multiplied by a length of time t), and I've noticed that students tend to get confused about why it's sometimes just λ and other times it's λ t.
  8. (p. 209) On the bottom portion of the page (below Definition) some guidelines are given pertaining to what should be true about n and p for the Poisson approximation to the binomial distribution to work okay. Somewhere it should also be stressed that one shouldn't use the approximation if n is small enough to make an exact binomial distribution computation possible! The approximation is for large n, but p also needs to be sufficiently small. (Note: I have no idea why the author has "appreciable λ" on the 12th line from the bottom on p. 209, since the specific approximations being referred to in that sentence work best when λ is small.)
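Here's a small sketch (mine) comparing exact binomial probabilities with the Poisson approximation in the large-n, small-p regime:

```python
from math import comb, exp, factorial

def binom_pmf(n, p, k):
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(lam, k):
    return exp(-lam) * lam ** k / factorial(k)

# Large n, small p: lambda = n*p = 2, and the two pmfs agree closely
n, p = 1000, 0.002
for k in range(5):
    print(k, round(binom_pmf(n, p, k), 5), round(poisson_pmf(n * p, k), 5))
```

Of course, with software that computes binomial probabilities exactly (as above), the practical need for the approximation is less than it once was, which is part of the point of the comment above.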
  9. (p. 211, Example 5.12) Overall, a nice little example, but note that there is no specific mention of a Poisson distribution. So based on the setting (the focus is a count --- the number of wrong-number phone calls received --- and there is no fixed number of trials) one is supposed to determine that a Poisson random variable is appropriate.
  10. (pp. 215-216, Examples 5.17 & 5.19) Often problems of this sort will just state that something occurs randomly at a specified rate, and if no other information is given, one should interpret it to mean that one is dealing with a Poisson process (but without the word randomly, one should interpret it to be that something occurs steadily, with a fixed time between occurrences). The book's use of the phrase "Poisson rate" isn't something that's very common.
  11. (p. 224, Remark near top of page) The "shortcut" indicated by the remark corresponds to using an equivalent event having a probability that's very easy to obtain.
  12. (p. 227) While letting a hypergeometric random variable represent the number of defectives is an acceptable setting for a hypergeometric distribution, many other settings are possible. In general, you just need to be considering the drawing of a random sample from a collection of two types of objects.
  13. (p. 228, Definition of hypergeometric pmf) One can generalize this --- it's not necessary to require that n be less than or equal to the smaller of D and N - D. See the class notes for the general case. (The pmf formula is still the same, except that the values of x for which the pmf is nonzero differs.)
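The general support can be coded directly; here's a sketch of mine (with a hypothetical function name) that handles the case where n exceeds D:

```python
from math import comb

def hypergeom_pmf(N, D, n, x):
    # Number of "special" objects in a random sample of n from N objects,
    # D of which are special; the pmf formula is the same as the book's,
    # but it's nonzero only for max(0, n - (N - D)) <= x <= min(n, D).
    if x < max(0, n - (N - D)) or x > min(n, D):
        return 0.0
    return comb(D, x) * comb(N - D, n - x) / comb(N, n)

# n = 8 exceeds D = 3, yet the pmf is still valid (support is x = 1, 2, 3)
probs = [hypergeom_pmf(10, 3, 8, x) for x in range(9)]
print([round(q, 4) for q in probs], round(sum(probs), 10))
```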
  14. (p. 233, Exercise 5) This one is worded stupidly --- what is needed is that the order in which they board is random.
  15. (p. 234, Exercise 12) Instead of "games" it should be hands. (Also, it should state whether the focus is on a particular player or any of the four players, and whether "three" means exactly three or at least three (because if a player has 4 aces then he/she definitely has 3 aces).)

Chapter 6

  1. (p. 243, 2nd line of 1st paragraph) Although it's okay to have that the cdf F is a function mapped to R, it's also correct (and perhaps more precise) to say that it is mapped to [0,1] (since probabilities cannot be outside of [0,1]).
  2. (pp. 245-246, part (c) of Example 6.1) This is worded stupidly. To match the answer given in the book, the wording could be changed to "will next see a person smoking in 2 to 5 minutes". Also, for the last part, it would be better to request the probability that it will be at least 7 minutes before he next sees another smoker.
  3. (pp. 249-250, Exercise 2) This is a ridiculous situation in which to use a pdf. (Why would the length of a TV show be a random variable having such a pdf? It's not sensible.)
  4. (p. 253, 1st sentence of Sec. 6.2) As is, the sentence isn't sensible. One way to fix it would be to change "density function h(X)" to the density function of h(X). (Notice that in the following sentence the density function of h(X) is referred to.)
  5. (p. 253, 1st paragraph of Sec. 6.2) The method of distribution functions is what I call the cdf method.
  6. (p. 259, Definition of expected value) "mathematical expectation" is not nearly as frequently used as the other terms are.
  7. (p. 260, Remark 6.2) The fact that the book defines a finite expected value here, and did not put such a requirement in Definition 6.2, and (on the 4th line of Remark 6.2) has an "Otherwise," leads me to believe that the author is allowing for the possibility that an expected value may be infinite. Yet he stops short of expanding the discussion to be in explicit agreement with what I have in the course notes ... that an expectation can be finite, infinite, or it can fail to exist.
  8. (p. 261, Remark 6.3) Since (I suppose) the author has allowed for the possibility that expected values don't have to be finite, I think that the real meaning of the remark is that some of the results stated in the book for expected values won't necessarily hold unless the expectations are finite. (I think that some such results will be found in Chapters 8, 9, 10, and 11.) All in all, it's extremely disappointing that the author doesn't clarify things better (although I suppose the wishy-washy stuff here is better than the incorrect position taken in some places of Ch. 4 that an expected value must be finite). Again, I'll point out that authoritative doctoral-level books on probability are in agreement with what I present in the class notes, that an expected value may be finite, infinite, or fail to exist.

Chapter 7

  1. (p. 281, Example 7.2) Although we might somewhat realistically model the arrival time of the bus with a random variable, I don't think a uniform distribution is appropriate.
  2. (pp. 282-283, Example 7.3) Although I find geometric probability interesting (I did my Ph.D. dissertation in the area of geometric probability), you don't have to concern yourself with the details of this example.
  3. (p. 286, 2nd line) I don't know why he has that np should be "appreciable." One wants n large and p small to make the approximation good.
  4. (pp. 289-290, Example 7.5) Note that the approximation is pretty good (the error is less than one percent).
  5. (p. 295, last sentence of indented paragraph) I think it would be better to change "is always represented by a normal curve" to can often be approximated by a normal curve, or can sometimes be closely approximated by a normal curve.

Chapter 8

  1. (p. 332, Theorem 8.1) Note that here it indicates that provided that the sum is absolutely convergent, the expected value is given by the stated formula. But it doesn't indicate what is true otherwise (and so the lack of precision in the writing continues).
  2. (p. 332, Corollary) Results like this are a reason why the author (in Remark 6.3 on p. 261) states that it should be assumed that expected values are finite unless otherwise indicated. Because if both expectations are finite, the result given in the corollary is definitely true. But the result is not necessarily true if both of the expected values are infinite (one in a positive sense, and the other in a negative sense). However, if one of the two expected values is finite, and the other infinite, then the result still holds (provided it is accepted that expected values can indeed be infinite).
  3. (p. 339, Definition 8.5) I think it'd be better to have the 2nd sentence start with Unless otherwise indicated, a point is said to be randomly selected ... , since it's certainly possible to have randomness in the selection of a point from a region without using a constant density over that region.
  4. (p. 348, 1st 3 lines of Sec. 8.2) These 3 lines should be labeled as a Definition.
  5. (p. 354, Example 8.15) Although I like geometric probability, if you're short on time, feel free to skip over this example.
  6. (pp. 364-365, Example 8.21) An easy way to arrive at the obvious conclusion that without any additional information it makes no difference in expected profit whether or not one switches is to realize that due to the random initial selection, the expected profit from the non-switching strategy is just (1/2)m + (1/2)2m = (3/2)m, where m is the smaller amount of money, and this is also the expected profit which results from the switching strategy. Note that the value of the conditional probability given 6 lines from the bottom on p. 365 is something that would be unknown unless additional information was given. If it is known that the larger amount is equally likely to be 1, 2, 4, or 8 dollars, then it becomes a different problem, and that additional information should be used. If the amount initially observed is $0.50, then clearly that's the smaller amount and one should switch. Similarly, if the initial amount observed is $8, then clearly that's the larger amount and switching would result in less money. If $1, $2, or $4 is initially observed, then one wouldn't know for sure whether it's the larger or smaller amount, but probability could be used to determine a good strategy.
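A short simulation (my sketch) of the no-additional-information version backs up the symmetry argument:

```python
import random

def avg_profits(m=10, trials=100_000, seed=2):
    # One envelope holds m, the other 2m; the initial pick is random.
    # Compare always keeping the first envelope with always switching.
    rng = random.Random(seed)
    keep_total = switch_total = 0.0
    for _ in range(trials):
        first, other = rng.sample([m, 2 * m], 2)
        keep_total += first
        switch_total += other
    return keep_total / trials, switch_total / trials

print(avg_profits())  # both averages are near (3/2) * m = 15
```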
  7. (Sec. 8.4) Although I'll go over the convolution result given in Theorem 8.9 (when I cover Sec. 8.2 of the text), I'm completely skipping the main portion of this section, which involves obtaining the joint pdf of two functions of two random variables. I've just never found the method that is emphasized to be very useful. While often we might want the pdf for a single function of two random variables, much less often do I find myself wanting the joint pdf of two different functions of two random variables. While to obtain the pdf of a single function of two random variables, we could make up a second function and obtain the joint pdf of two different functions and then integrate the joint pdf to obtain the marginal pdf of the function of interest, this may involve a lot of difficult work. So if I just want the pdf of a single function of two random variables, I usually just employ the cdf method that I emphasize in class.
  8. (pp. 378-379, Proof of Convolution Theorem) The convolution result can also be rather easily derived using the cdf method. It is not necessary to use Theorem 8.8 to derive the convolution result.
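As a quick sanity check on the convolution result, here is a small sketch (my own example, not from the text): for X and Y independent Uniform(0,1), the convolution integral gives the triangular pdf f(s) = s on [0,1] and f(s) = 2 − s on [1,2], which a simulation of X + Y confirms.

```python
import random

def triangular_pdf(s):
    # pdf of the sum of two independent Uniform(0,1) variables,
    # obtained from the convolution integral
    if 0 <= s <= 1:
        return s
    if 1 < s <= 2:
        return 2 - s
    return 0.0

rng = random.Random(1)
n = 200_000
sums = [rng.random() + rng.random() for _ in range(n)]

# Estimate the density near two points with a small window of half-width h.
h = 0.05
estimates = {s: sum(1 for v in sums if abs(v - s) <= h) / (n * 2 * h)
             for s in (0.5, 1.5)}
# estimates[0.5] and estimates[1.5] should both be near triangular_pdf(0.5) = 0.5
```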

Chapter 9

  1. (pp. 398-400, Example 9.3) This is just another way to work problems of the type we encountered in Ch. 3. I don't really think it's a particularly better way ... it's just another way. (I'm skipping this when I cover Ch. 9.)
  2. (p. 406, Definition 9.5) This describes just one type of a random sample. Another popular variety is a simple random sample, which is a subset of size n randomly selected from a finite population of size N, with each possible subset of size n being equally likely to be chosen.
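To illustrate the simple random sample just described, here is a minimal sketch (my own example, with a hypothetical population): every subset of size n from the finite population of size N is equally likely to be chosen.

```python
import random

rng = random.Random(0)
population = list(range(1, 21))      # a finite population with N = 20 labeled units
sample = rng.sample(population, 5)   # simple random sample of size n = 5:
                                     # every 5-element subset is equally likely
```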
  3. (p. 412, Theorem 9.5) In class I'll focus on the distributions of the 1st and nth order statistics (the sample minimum and sample maximum).
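For the sample minimum and maximum of n iid random variables with cdf F, the standard results are P(max ≤ x) = F(x)^n and P(min ≤ x) = 1 − (1 − F(x))^n. A quick simulation check (my own sketch, using Uniform(0,1) so F(x) = x):

```python
import random

rng = random.Random(2)
n, trials = 5, 100_000
x = 0.7
max_hits = min_hits = 0
for _ in range(trials):
    u = [rng.random() for _ in range(n)]
    max_hits += max(u) <= x           # event {sample maximum <= x}
    min_hits += min(u) <= x           # event {sample minimum <= x}

p_max = max_hits / trials             # should be near 0.7**5  (about 0.168)
p_min = min_hits / trials             # should be near 1 - 0.3**5  (about 0.998)
```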
  4. (p. 418, 13 lines from top) "polynomial distribution" is not a commonly-used alternative for "multinomial distribution."

Chapter 10

  1. (p. 429, Theorem 10.1; & p. 430, Corollary) Results like these are a reason why the author (in Remark 6.3 on p. 261) states that it should be assumed that expected values are finite unless otherwise indicated. If all of the expectations are finite, the result given in the corollary is definitely true; otherwise, it is not necessarily true. Still, better (more advanced) probability books allow for infinite expectations, and they acknowledge that while results like these hold if certain conditions are satisfied, they don't hold for all random variables.
  2. (p. 435, Cauchy-Schwarz inequality) Although you don't need this result for this course, you might want to remember that it's in this book since you might encounter it in other courses.
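One common form of the inequality is E(XY)² ≤ E(X²)E(Y²). Here is a small numerical illustration (my own example, with arbitrary made-up values), treating each list as the equally likely values of a discrete random variable:

```python
def mean(values):
    return sum(values) / len(values)

x = [1.0, -2.0, 3.5, 0.5]
y = [2.0, 0.5, -1.0, 4.0]

lhs = mean([a * b for a, b in zip(x, y)]) ** 2             # E(XY)^2
rhs = mean([a * a for a in x]) * mean([b * b for b in y])  # E(X^2) E(Y^2)
# lhs <= rhs holds for any two lists of equal length
```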
  3. (p. 436, last line of Exercise 5) I think it'd be better to change "should" to will.
  4. (p. 441, Example 10.9) In the example, the "perfectly related" random variables have covariance 0 and are uncorrelated. In general, it's possible for "perfectly related" random variables to have any covariance (and any correlation from -1 to 1). (Note: By perfectly related, it is meant that for each possible value that X can assume, only one value of Y is possible.)
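A tiny illustration of this phenomenon (my own example, not the one in the book): if X is equally likely to be -1, 0, or 1 and Y = X², then Y is completely determined by X, yet Cov(X, Y) = E(X³) − E(X)E(X²) = 0.

```python
xs = [-1, 0, 1]                      # equally likely values of X
ys = [x * x for x in xs]             # Y = X^2, determined by X

ex = sum(xs) / 3                     # E(X) = 0
ey = sum(ys) / 3                     # E(Y) = 2/3
exy = sum(x * y for x, y in zip(xs, ys)) / 3   # E(XY) = E(X^3) = 0
cov = exy - ex * ey                  # = 0, despite the perfect relationship
```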
  5. (Sec. 10.4 and Sec. 10.5) Although there is some nice material in these sections, I'm skipping them because we don't have time to cover everything in the book and I think other sections are more important. (If you take STAT 544 as a second course in probability, this material should be covered.)

Chapter 11

  1. (p. 483, 1st paragraph) It's not important for this course, but I want to point out that kurtosis is not really just a measure of flatness/peakedness. It is also influenced by the "tail weight" of the distribution (how quickly the density in the tails of the distribution approaches 0).
  2. (sentence that starts at bottom of p. 483 and continues on p. 484) It'd be better to state that no two different distributions have the same mgf. Notice that the 4th and 5th lines above Example 11.7 on p. 489 indicate that two random variables can have the same mgf (if they are identically distributed).
  3. (p. 488, (c) of Example 11.6) I'm sure it's more accurate to state that ln X is approximately a normal random variable.
  4. (p. 502, Theorem 11.9 (Chebyshev's Inequality)) If you look in a variety of books, you can find that there are several different versions of inequalities called Chebyshev's inequality. This is just one of them (but one of the more commonly used ones).
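One commonly stated version is P(|X − μ| ≥ kσ) ≤ 1/k². A quick simulation sketch (my own check, using an exponential(1) random variable, for which μ = σ = 1):

```python
import random

rng = random.Random(3)
trials, k = 100_000, 2
mu = sigma = 1.0
hits = sum(1 for _ in range(trials)
           if abs(rng.expovariate(1.0) - mu) >= k * sigma)
tail_prob = hits / trials            # empirical P(|X - mu| >= 2*sigma)
bound = 1 / k**2                     # Chebyshev bound = 0.25
# tail_prob (about 0.05 here) never exceeds the bound
```

Note that the bound is usually far from tight; its value is that it holds for every distribution with finite mean and variance.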
  5. (p. 513) To fully understand the material presented on this page, one needs advanced mathematics typically not encountered by engineering students during their undergraduate education.
  6. (pp. 516-517, Example 11.25) This has no practical importance.
  7. (pp. 521-522, Theorem 11.13 & the proof of Theorem 11.13) This material is perhaps too complex for you to worry about as the final exam nears.
  8. (pp. 524-525, Example 11.31) Since this is not at all important for the final exam, you may want to skip this example.