This is the Solution Manual for ‘Probability!’ and thus includes both analytical and empirical solutions to (nearly) every problem in the textbook, as well as empirical solutions to the problems reproduced from Stat 110. This document will probably be most effective if you use your computer’s search functionality to locate specific problems/sections (note that chapters aren’t numbered here, so you must search them by name).




R




0.1

There are 1.60934 kilometers in every mile. Write a function in R that converts kilometers to miles.



Empirical Solution:

#define the function; take kilometers as input
converter <- function(kilometers){
  
  #convert to miles (divide, since each mile is 1.60934 kilometers)
  miles = kilometers/1.60934
  
  #return the number of miles
  return(miles)
}
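
A quick check of the function (the example values here are just illustrative):

#1.60934 kilometers is exactly 1 mile, so this should return 1
converter(1.60934)

#a marathon is about 42.195 kilometers, or roughly 26.2 miles
converter(42.195)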




0.2

Show graphically that \(\frac{n!}{k!(n - k)!}\) where \(n! = n\cdot (n - 1)\cdot(n - 2)... \cdot 1\) (this is the binomial coefficient, which we discuss at length in Chapter @ref(counting)) is maximized at \(k = n/2\) when \(n\) is even and at \(k = \frac{n + 1}{2}, \frac{n - 1}{2}\) when \(n\) is odd.


Empirical Solution:

#define even and odd values for n
#should work for any even/odd values!
n.even = 10
n.odd = 11

#plot the even values
plot(0:n.even, choose(n.even, 0:n.even), xlim = c(0, n.odd),
     ylim = c(0, 500), col = "firebrick3", pch = 16,
     xlab = "k", ylab = "n choose k",
     main = "n choose k for n = 10, 11")

#add a line at the maximum
abline(v = n.even/2, lty = 3)

#allow us to put a new graph down on this plot
par(new = TRUE)

#plot the odd values
plot(0:n.odd, choose(n.odd, 0:n.odd), 
     xlim = c(0, n.odd),
     ylim = c(0, 500), col = "dodgerblue4", pch = 16,
     xlab = "", ylab = "")

#put lines at the maximum
abline(v = (n.odd + 1)/2, lty = 3)
abline(v = (n.odd - 1)/2, lty = 3)

#create a legend (use points to match the plotted values)
legend("topleft", legend = c("n = 10", "n = 11"),
       pch = c(16, 16),
       col = c("firebrick3", "dodgerblue4"))




0.3

Demren is wandering among the letters A to E. He starts at C and, every step, moves up or down a letter (i.e., from C he can go up to B or down to D) with equal probabilities (a 50/50 chance). Once he hits one of the endpoints A or E, he stops. Let \(X\) be the number of steps he takes. Using a simulation in R, estimate the mean and median of \(X\).


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#keep track of how many steps he takes
X = rep(0, sims)

#run the loop
for(i in 1:sims){
  
  #initialize his location; code A as 1, B as 2, etc.
  location = 3
  
  #go until we hit 1 or 5
  while(location != 1 && location != 5){
    
    #flip a coin to see if we go up or down
    flip = runif(1)
    
    #the case where he moves toward E (e.g., from C down to D)
    if(flip <= 1/2){
      location = location + 1
    }
    
    #the case where he moves toward A (e.g., from C up to B)
    if(flip > 1/2){
      location = location - 1
    }
    
    #increment
    X[i] = X[i] + 1
  }
}

#find the mean and median
mean(X); median(X)
## [1] 4.062
## [1] 4




0.4

Imagine rolling a fair, six-sided die, and then flipping a fair, two-sided coin the number of times specified with the die (i.e., if we roll a 3, flip the coin 3 times). Let \(X\) be the number of heads you get in this experiment. Use a simulation in R to estimate the mean, median and mode of \(X\).



Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#keep track of X
X = rep(0, sims)


#run the loop
for(i in 1:sims){
  
  #generate a roll
  roll = sample(1:6, 1)
  
  #flip the coin the specified number of times
  for(j in 1:roll){
    
    #flip the coin
    #recall that 'runif(1)' draws a random value between 0 and 1, so
    #   count 'heads' as getting a value below 1/2
    flip = runif(1)
    
    #see if we got heads; increment if we did
    if(flip <= 1/2){
      X[i] = X[i] + 1
    }
  }
}

#find the mean and median
mean(X); median(X)
## [1] 1.708
## [1] 2
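
The problem also asks for the mode of \(X\); a minimal way to estimate it from the simulation is to take the most frequent value in X:

#estimate the mode: the most common simulated value
#the true mode of X is 1, so we should see 1 here
as.numeric(names(which.max(table(X))))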





Counting




1.1

Consider a standard, well-shuffled deck of cards. What is the probability that the 4 Aces are all adjacent? Here, define ‘adjacent’ as taking four consecutive positions; i.e., the 1st, 2nd, 3rd and 4th cards in the deck. Adjacency does not ‘wrap around’: the first and last cards in the deck are not considered adjacent.



Analytical Solution:

We can employ the naive definition of probability. If we fix the four Aces in adjacent spots, there are 48! ways to order the other 48 cards. There are also 4! ways to order the Aces within their adjacent spots, since the Aces are all different. Finally, there are 49 possibilities for the ‘adjacent spots’: the first Ace could be the first card, the second card, up to the 49th card. By invoking the multiplication rule (and knowing that there are 52! ways to order a deck in general, which we put in the denominator), we write:

\[P(A) = \frac{48! \cdot 4! \cdot 49}{52!} \approx 1.8\cdot10^{-4}\]

where \(A\) is defined as the event that the four Aces are adjacent.


Empirical Solution:

#replicate
set.seed(110)

#use an extra number of simulations; rare events!
sims = 1000*100

#create a deck
deck = matrix(0, nrow = 52, ncol = 2)
deck = data.frame(deck)
colnames(deck) = c("Suit", "Value")

#fill in suits and values
deck$Suit = c(rep("H", 13), rep("D", 13), rep("S", 13), rep("C", 13))
deck$Value = c(rep(c("2","3","4","5","6","7","8","9","10","J","Q","K","A"), 4))

#define the vector of the 4 Aces
aces = rep("A", 4)

#indicator if we get a success
success = rep(0, sims)

#run the loop
for(i in 1:sims){
  
  #draw a sample deck
  samp = sample(deck$Value)
  
  #see if AAAA is in the deck
  #the function 'grepl' sees if the first argument is contained in the second
  #   we have to collapse our vectors using the 'paste' function so
  #   it is palatable for the 'grepl' function
  if(grepl(paste(aces, collapse = ";"), paste(samp, collapse = ";"))){
    success[i] = 1
  }
}

#these should match (empirical and analytical)
mean(success)
## [1] 0.00014
factorial(48)*factorial(4)*49/factorial(52)
## [1] 0.0001809955




1.2
  1. There are eight schools in the Ivy League (Harvard’s athletic conference): Harvard, Yale, Princeton, Brown, Columbia, Dartmouth, Cornell and Penn. The academic rankings of these schools are often a topic of interest and controversy. How many different ways are there to rank these schools?

  2. Sometimes, Harvard, Yale and Princeton are referred to as the ‘Big 3’, and are often grouped together as schools because of their extensive similarities. If indeed these three schools were identical - that is, the ordering ‘Harvard Yale Princeton’ is taken as identical to the ordering ‘Yale Princeton Harvard’ - how many ways would there now be to rank the Ivies?



Analytical Solution:

  1. Since factorials allow us to count the number of ways to ‘line things up’, we get \(8!\), or 40,320 ways.

  2. Since the three schools are now the same, \(8!\) overcounts. We have to divide out by 3!, since we are blocking 3 schools together. The answer is \(\frac{8!}{3!}= 6,720\).


Empirical Solution:

#load the packages used throughout for counting: 'gtools' (permutations, combinations)
#   and 'combinat' (permn)
library(gtools)
library(combinat)

### part a. ###

#define a vector of the ivies
Ivy = c("Harvard", "Princeton", "Columbia", "Dartmouth",
        "Cornell", "Brown", "Penn", "Yale")

#should get 40320
dim(permutations(n = 8, r = 8, v = Ivy, set = FALSE))[1]
## [1] 40320
### part b. ###

#now group the Big 3
Ivy.0 = c("Big3", "Big3", "Columbia", "Dartmouth",
        "Cornell", "Brown", "Penn", "Big3")

#generate the combinations
combs = unique(permn(Ivy.0))

#should get 6720
length(combs)
## [1] 6720




1.3
  1. You are tasked with dividing 20 kids into two kickball teams at recess. Games have been very even in the past, so you decide to make things more interesting by giving 1 team 9 players and the other team 11 players. How many ways could you make these teams?

  2. Can you write your answer to part (a) in a different format?

  3. After the first game, the team with 9 people complained because of the inherent disadvantage. For the second game, you decide to instead put 10 people on each team. How many ways can you do this?



Analytical Solution:

  1. The number of ways to choose 11 kids from a group of 20 is \({20 \choose 11}\), by the story of the binomial coefficient.

  2. By the symmetry of the binomial coefficient, \({20 \choose 11} = {20 \choose 9}\). There are only two teams and not picking someone to be on the first team is the same as picking them to be on the second team.

  3. \({20 \choose 10}\) subtly overcounts, because we are simply separating players into groups. Consider a trivial example. Say I wanted to split the letters A, B, C, D into two groups of two letters. The binomial coefficient \({4 \choose 2} = 6\) counts the choices \(\{AB/CD, AC/BD, AD/BC, BC/AD, BD/AC, CD/AB\}\); however, \(AB/CD\) and \(CD/AB\) describe the same configuration: there are two groups (‘teams’), one that has A and B and the other that has C and D. Since order doesn’t matter (these team configurations are identical), we are overcounting with the binomial coefficient, and we must divide by 2 to correct it. So, the answer is \(\frac{{20 \choose 10}}{2}\).


Empirical Solution:

#generate the kid vector; label kids 1 to 20
kids = 1:20

## part a. and part b. ##

#these should match; choose(20, 11) = choose(20, 9) = 167960
dim(combinations(20, 9, kids))[1]
## [1] 167960
dim(combinations(20, 11, kids))[1]
## [1] 167960
## part c. ##

#this should be choose(20, 10)/2 = 92378
dim(combinations(20, 10, kids))[1]/2
## [1] 92378




1.4

For this problem, assume a normal, well-shuffled 52 card deck. You are dealt 5 cards (called a ‘hand’) randomly from the deck.

  1. The best hand in Poker is a royal flush: the 10, Jack, Queen, King and Ace of the same suit. What is the probability that you are dealt a royal flush?

  2. What is the probability that you are dealt a 3 of a kind (getting exactly 3 of the same value, like three jacks)?



Analytical Solution:

  1. There are \({52 \choose 5}\) possible hands of 5 cards out of the total of 52 cards, and there are only 4 successful outcomes (a royal flush for each suit: hearts, spades, diamonds, clubs). By the naive definition of probability, the probability of a royal flush is \(\frac{4}{{52 \choose 5}} = \frac{1}{649,740}\).

  2. Consider the multiplication rule applied to an event tree. The first choice is selecting what value we want three of, from 2 to Ace. This has 13 branches. Then, we have to choose 3 of the 4 cards of that value in the deck: i.e., if we chose that we want three 6’s for our three of a kind, we have to choose three of the four 6’s in the deck (spades? diamonds? etc.). There are \({4 \choose 3}\) ways to choose 3 of the 4 sixes. We then need to count the ways to select the last two cards in the hand. The fourth card can be any of the 49 remaining cards, except for the one remaining card of the value for which we already have 3 of a kind: i.e., if we want three of a kind with 6’s, we can’t have the fourth 6, since that would give us four of a kind (we need exactly 3 for 3 of a kind). So there are 48 options for the fourth card. The fifth card can then be any of the 48 remaining cards, except for the same value as the three of a kind (would make it four of a kind) or the same value as the fourth card (would make it a full house: three of a kind and a pair!). This eliminates 4 cards, so there are 44 options for the fifth card. We multiply all of these branches and divide by the total number of hands:

\[\frac{13{4 \choose 3}(48)(44)}{{52 \choose 5}}\]

However, we are still overcounting. For example, our tree might give, as the fourth and fifth ‘garbage’ cards, the 4 of spades followed by the 9 of hearts in one hand, and the 9 of hearts followed by the 4 of spades in the other hand. In other words, order doesn’t matter, so we have to divide by 2 to correct for this (‘order not mattering’ is already adjusted for in our selection of the three of a kind). We get, by the naive definition of probability, the probability of three of a kind:

\[\frac{13{4 \choose 3}(48)(22)}{{52 \choose 5}} \approx .02\]


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#create a deck
deck = matrix(0, nrow = 52, ncol = 2)
deck = data.frame(deck)
colnames(deck) = c("Suit", "Value")

#fill in suits and values
deck$Suit = c(rep("H", 13), rep("D", 13), rep("S", 13), rep("C", 13))
deck$Value = c(rep(c("2","3","4","5","6","7","8","9","10","J","Q","K","A"), 4))


#indicators for royal flush, three of a kind
royal.flush = rep(0, sims)
three.kind = rep(0, sims)

#run the loop
for(i in 1:sims){
  
 #draw a hand
 hand = deck[sample(1:52, 5, replace = FALSE),]
 
 #first, see if we got a flush
 if(length(unique(hand$Suit)) == 1){
   
   #now see if the flush is a royal flush (check the values, 10 through Ace)
   if(length(intersect(hand$Value, "10")) == 1 &&
      length(intersect(hand$Value, "J")) == 1 &&
      length(intersect(hand$Value, "Q")) == 1 &&
      length(intersect(hand$Value, "K")) == 1 &&
      length(intersect(hand$Value, "A")) == 1){
        royal.flush[i] = 1
      }
 }


 #count how many we have of each value (for three of a kind, we want 3, 1, 1)
 counts = as.vector(table(hand$Value))
 counts = sort(counts, decreasing = TRUE)
 
 #check if we have 3 1 1 
 if(isTRUE(all.equal(counts, c(3, 1, 1)))){
   three.kind[i] = 1
 }
}

#should be very small; may not even get one, it's so rare!
mean(royal.flush)
## [1] 0
#should get .02
mean(three.kind)
## [1] 0.017
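
For reference, the exact probabilities from the analytical solutions above can be computed directly:

#exact probability of a royal flush; should get about 1.5e-06
4/choose(52, 5)

#exact probability of three of a kind; should get about .021
13*choose(4, 3)*48*44/2/choose(52, 5)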




1.5

Tony has 5 meetings to schedule this business week (Monday through Friday). Define a ‘permutation’ here as a schedule of meetings by day (i.e., two meetings on Monday, one meeting on Thursday and two meetings on Friday is one permutation). If meetings are indistinguishable, how many permutations are there if Tony does not want to have all five meetings on a single day?



Analytical Solution:

Since meetings are indistinguishable, order doesn’t matter, and we are sampling days with replacement, so by Bose-Einstein (consider the meetings as balls and the days as boxes) there are \({5 + 5 - 1 \choose 5} = {9 \choose 5}\) total permutations. There are 5 permutations with all five meetings on a single day (all meetings on Monday, all meetings on Tuesday, etc.), so we subtract these out to get \({9 \choose 5} - 5 = 121\) permutations.


Empirical Solution:

#label days 1 to 5
days = 1:5

#generate all possible permutations
perms = combinations(n = 5, r = 5, v = days, repeats.allowed = TRUE)

#iterate over the permutations and count when we get different
#   days for the meetings. Should get 121
sum(apply(perms, 1, function(x){
  if(length(unique(x)) > 1){
    return(1)
  }
  return(0)
  }))
## [1] 121




1.6

Consider a state that has 5 different counties (a ‘county’ is a collection of towns: for example, the town Burlington, Connecticut is in Hartford County), and each county has 6 different towns. Imagine that you want to visit every town in the state; however, you don’t want to visit each county more than once (you can fly from any town to any other town in the state). Define a permutation as one specific ordering of the 30 visits you make (one to each town). How many permutations are there?



Analytical Solution:

Once we visit a county, we must visit every town within the county before we visit another county. There are 6! possible orderings to visit the towns within each county, and there are 5! orderings to visit counties (i.e., visit County A first, then County B, etc.). By the multiplication rule, there are \(5!(6!)^5\) permutations.


Empirical Solution:

#for computational ease, consider 2 counties, each with 3 towns
#label the towns in the first county 1 to 3, and the second county 4 to 6
towns = 1:6

#generate all configurations, if we weren't restricted by county
perms = permutations(n = 6, r = 6, v = towns)

#keep track of the permutations where we stay within counties until visiting all towns
county.perms = integer(0)

#iterate over the permutations, find the correct ones
for(i in 1:dim(perms)[1]){
  
  #save the permutation if we stay within counties; i.e., if 
  #   the towns 1,2,3 are together (which is the same as having the
  #   sum of the town labels = 6)
  if(sum(perms[i, 1:3]) == 6 || sum(perms[i, 4:6]) == 6){
    
    #keep track of this permutation
    county.perms = rbind(county.perms, perms[i, ])
  }
}

#should get factorial(2)*(factorial(3)^2) = 72
dim(county.perms)[1]
## [1] 72




1.7

Imagine a game of tic-tac-toe where the players randomly select a blank space each turn to make their move. If \(X\) goes first, what is the probability that \(X\) wins on their third move?



Analytical Solution: Since \(X\) goes first, there will only be two \(O\)’s down when \(X\) makes the third move, so we do not have to worry about the probability of \(O\) winning. First, count the number of ways to put 3 \(X\)’s in a row. There are 8 possible ‘winning segments’ of length 3 on the 3x3 board. Order matters (putting an \(X\) down in the top left and then the top right is different from putting \(X\) down in the reverse order), so there are \(3!\) ways to put the \(X\)’s down in the winning segment. Once the \(X\)’s have been put down, there are \({6 \choose 2}\) configurations for the 2 \(O\)’s in the 6 remaining spots, and \(2!\) ways to arrange each configuration (order matters). By the multiplication rule, we get \((8)(3!){6 \choose 2}(2!)\) ways to create a winning configuration.

To find the probability of a win using the naive definition of probability, we now have to calculate the number of ways to put down the first 5 pieces. Imagine unraveling the 3x3 game board into a 1x9 vector. We are essentially now writing a 9-letter word with 3 \(X\)’s, 2 \(O\)’s, and 4 ‘blanks’. However, the \(X\)’s and \(O\)’s are distinct; the first \(X\) is different from the second \(X\), etc. (order matters). Imagine writing a word with letters \(X_1\), \(X_2\), etc. The ‘blanks’, though, are indistinguishable, so there are \(\frac{9!}{4!}\) ways to put down the first 5 pieces (\(9!\) ways to spell the ‘9-letter word’, but the blanks are indistinguishable so we are overcounting by a factor of \(4!\)). We could also consider how the first \(X\) has 9 choices, the first \(O\) has 8 choices, etc., and by the multiplication rule we get \(9 \cdot 8 \cdot ... \cdot 5 = \frac{9!}{4!}\), which agrees. By the naive definition of probability, we get the probability of \(X\) winning:

\[\frac{(8)(3!){6 \choose 2}(2!)}{\frac{9!}{4!}} = .095\]


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#imagine a 3x3 board, labeled 1 (top left) to 9 (bottom right)
#define the winning sequences
winning = matrix(c(1, 2, 3,
                   4, 5, 6,
                   7, 8, 9,
                   1, 4, 7,
                   2, 5, 8, 
                   3, 6, 9,
                   1, 5, 9,
                   3, 5, 7), nrow = 8, ncol = 3, byrow = TRUE)


#indicator if we get a win
win = rep(0, sims)

#run the loop
for(i in 1:sims){
  
  #keep track of the board; initialize here
  board = 1:9
  
  #keep track of X and O spots; initialize here
  X = integer(0)
  O = integer(0)
  
  #draw the first 5 spots
  for(j in 1:3){
    
    #sample an X from the board
    X = c(X, sample(board, 1))
    
    #mark that the X spot is taken
    board = board[board != X[j]]
    
    #if it's the 1st or second round, sample O
    if(j < 3){
      #sample an O from the board
      O = c(O, sample(board, 1))
      
      #mark that the O spot is taken
      board = board[board != O[j]]
    }
  }
  
  #sort X, see if we won (check all winning sequences)
  X = sort(X)
  for(j in 1:8){
    
    #see if X is a winning combination
    if(isTRUE(all.equal(X, winning[j, ]))){
      win[i] = 1
    }
  }
}

#should get .095
mean(win)
## [1] 0.093
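
For reference, the exact probability from the analytical solution:

#should get about .095
(8*factorial(3)*choose(6, 2)*factorial(2))/(factorial(9)/factorial(4))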




1.8

Imagine a modified game of tic-tac-toe called ‘tic-tac-max’. The only difference in tic-tac-max is that players put down \(X\)’s and \(O\)’s until the board is completely filled, not just until someone gets 3 in a row. \(X\) still goes first.

  1. Define a permutation as one game sequence; i.e., player 1 puts an \(X\) in the top left spot, then player 2 puts an \(O\) in the top right spot, etc. Permutations are considered distinct if they are completed in different orders. How many possible permutations are there to play this game?

  2. Define a ‘final board’ as the configuration of the board after each player has put all of their ‘pieces’. How many possible ‘final boards’ are there?



Analytical Solution

  1. There are 9 options for the first piece, 8 for the second, etc., so there are \(9!\) possible permutations. Order matters, so we do not need to divide out for any overcounting.

  2. We now need to adjust the \(9!\) for overcounting, because order no longer matters (it doesn’t matter if an \(X\) is put in the top left first or last; the final board will be the same. We can’t tell the order of gameplay if we just look at the final board!). There are \(5!\) ways to order the \(X\)’s (which are indistinguishable from the perspective of the final board) and \(4!\) ways to order the \(O\)’s, so dividing these out yields \(\frac{9!}{5!4!}\). This equals \({9 \choose 5} = {9 \choose 4}\), which is akin to choosing the spots for the \(X\)’s or \(O\)’s.


Empirical Solution:

#find all unique permutations
perms = permn(c(rep("X", 5), rep("O", 4)))

#should get 9! = 362880
length(perms)
## [1] 362880
#now consider only unique permutations; should get 9 choose 5 = 126
length(unique(perms))
## [1] 126




1.9

(With help from Nicholas Larus-Stone and CJ Christian)

There are \(2n\) people. How many ways are there to pair the people up?

For example, if we had four people named \(A\), \(B\), \(C\) and \(D\), one permutation would be the pairs \(\{A, B\}\) and \(\{C, D\}\). A second permutation would be the pairs \(\{A, C\}\) and \(\{B, D\}\) (one permutation is defined by pairing everyone).



Analytical Solution:

Imagine giving each person a letter, as in the prompt. In fact, we’ll continue to work with the four people named in the prompt. Now consider that we line these people up; one possible permutation is \(CBAD\). We can consider this specific permutation as pairing \(\{A, B\}\) and \(\{C, D\}\). That is, we pair the first two together, then the second two, etc. In general, if there are \(2n\) people, there are \((2n)!\) ways to line people up. Now, we must consider overcounting. It does not matter the order within pairs; that is, \(AB\) as the first two letters in the string is the same as \(BA\) as the first two letters in the string (both mean that \(A\) and \(B\) are paired). There are 2 ways to sort each pair, and \(n\) pairs, so by the multiplication rule we have to divide out by \(2^n\). Finally, it doesn’t matter what the order is across pairs. That is, \(ABCD\) is the same as \(CDAB\); in both cases, we pair \(\{A, B\}\) and \(\{C, D\}\). It doesn’t matter which we pair first! There are \(n\) of these pairs, so we have to divide out by the \(n!\) ways to permute them. Putting it all together, we get that the number of permutations is:

\[\frac{(2n)!}{2^n \cdot n!}\]


Alternatively, we can consider picking the pairs one at a time. We have \({2n \choose 2}\) choices for the first pair, \({2n - 2 \choose 2}\) choices for the second pair (since \(2n - 2\) people are left), etc. Continuing in this way, and by employing the multiplication rule, we get:

\[{2n \choose 2}{2n - 2 \choose 2} ... {2 \choose 2}\]

However, while the binomial coefficient takes care of the order within pairs, we are still overcounting the order across pairs; it doesn’t matter if we pick a pair first or last, so we again have to divide out by \(n!\), which yields:

\[\frac{{2n \choose 2} \cdot {2n - 2 \choose 2}...{2 \choose 2}}{n!}\]

We can simplify by expanding the binomial coefficients:

\[= \frac{(2n)!\cdot (2n - 2)! \cdot (2n - 4)! ... 2!}{(2n - 2)! \cdot (2n - 4)! ... 2! \cdot 0! \cdot (2!)^n \cdot n!}\]

\[= \frac{(2n)!}{2^n \cdot n!}\]

Which matches the original solution.
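
As a sanity check for a small case (\(n = 3\), so 6 people), we can enumerate all orderings in R, standardize each into a set of pairs, and count the distinct pairings; we should get \(\frac{6!}{2^3 \cdot 3!} = 15\).

#small case: 2n = 6 people
n = 3
people = 1:(2*n)

#generate all orderings of the people
perms = permutations(n = 2*n, r = 2*n, v = people)

#standardize each ordering into a pairing: pair off consecutive people,
#   sort within each pair, then sort the pairs themselves
pairings = t(apply(perms, 1, function(x){
  
  #split the ordering into pairs and sort within each pair
  pairs = matrix(x, ncol = 2, byrow = TRUE)
  pairs = t(apply(pairs, 1, sort))
  
  #sort the pairs so that order across pairs doesn't matter, then flatten
  pairs = pairs[order(pairs[, 1]), ]
  as.vector(t(pairs))
}))

#count the distinct pairings; should get factorial(6)/(2^3*factorial(3)) = 15
dim(unique(pairings))[1]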




1.10

Imagine the digits \(0,1,2,...,9\). How many ways are there to make a three-digit number and a seven-digit number from these digits (using each digit exactly once)? For example, \(\{538\}\) and \(\{1246790\}\) as the three- and seven-digit numbers is one permutation; a second permutation is \(\{835\}\) and \(\{1246790\}\).



Analytical Solution:

There are \({10 \choose 3}\) ways to choose the three digits (which then also choose the seven digits, since these are the digits we didn’t choose for the three-digit number) and then 3! ways to order the smaller number and 7! ways to order the larger number. This gives, by the multiplication rule, \({10 \choose 3}\cdot 3! \cdot 7! = 10!\).

Alternatively, we can just imagine ordering the 10 digits and drawing a line between the third and fourth digit; that is, the first three digits become the three-digit number, and the rest become the seven-digit number. There is one way to draw this line for every permutation of the 10 digits, and there are 10! permutations, so we get 10! as above.


Empirical Solution:

#for computational ease, 5 total digits, broken into 3- and 2-digit numbers

#simply allow the first 3 digits to be the larger number,
#   and the last 2 to be the smaller number. Should get 5! = 120.
length(permn(1:5))
## [1] 120




1.11

(With help from Juan Perdomo)

Consider 5 points arranged in convex position (e.g., the five corners of a regular pentagon):



Imagine selecting two points at random and drawing a straight line in between the two points. Do this 5 times, with the constraint that you cannot select the same pair twice. What is the probability that the lines and points form a pentagon (i.e., a five-sided, five-angled, closed shape)?



Analytical Solution:

There are \({5 \choose 2}\) possible pairs, and we choose 5 of these pairs to draw lines between; therefore, there are \({ {5 \choose 2} \choose 5}\) possible ways to choose pairs. Only one of these ways creates a pentagon, so by the naive definition of probability, we get:

\[\frac{1}{{ {5 \choose 2} \choose 5}} = \frac{1}{{10 \choose 5}}\]

We could also consider picking the pairs 1 by 1. We have \({5 \choose 2}\) pairs, so \({5 \choose 2}\) possibilities for the first choice, then \({5 \choose 2} - 1\) possibilities for the second choice (all pairs except the one we just picked), etc. By the multiplication rule, we get:

\[{5 \choose 2}\big({5 \choose 2} - 1\big)... \big({5 \choose 2} - 4\big)\]

Since \({5 \choose 2} = 10\), we write:

\[= 10 \cdot 9 \cdot 8 \cdot 7 \cdot 6 = 10!/5!\]

Finally, we have to consider overcounting: we don’t care the order that we pick these pairs in. There are 5 pairs, so we divide by 5! to get:

\[\frac{10!}{5!5!} = {10 \choose 5}\]

This is the number of ways to select pairs and (again) there is only one way to create a pentagon, so by the naive definition of probability we get:

\[\frac{1}{{10 \choose 5}}\]

As above.
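
A simulation sketch for this problem, assuming the 5 points sit at the corners of a regular pentagon and are labeled 1 through 5 going around (so the only pentagon uses the five pairs of adjacent points):

#replicate
set.seed(110)

#use extra simulations; this is a rare event
sims = 10000

#all possible pairs of the 5 points (one pair per row)
pairs = combinations(5, 2, 1:5)

#the 5 pairs that form the pentagon: adjacent points around the shape
pentagon = c("1 2", "2 3", "3 4", "4 5", "1 5")

#indicator for forming a pentagon
success = rep(0, sims)

#run the loop
for(i in 1:sims){
  
  #sample 5 of the 10 pairs without replacement
  draw = pairs[sample(1:10, 5, replace = FALSE), ]
  
  #see if the sampled pairs are exactly the pentagon pairs
  if(all(sort(paste(draw[, 1], draw[, 2])) == sort(pentagon))){
    success[i] = 1
  }
}

#should be close to 1/choose(10, 5), about .004
mean(success)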




1.12

There are \(n\) people, each with a left and a right foot. Each has 2 shoes (one for each foot), so there are \(2n\) shoes. Define a ‘permutation’ as an allocation of the \(2n\) shoes to the \(n\) people, such that the left shoes are on the left feet and the right shoes are on the right feet. If shoes are distinguishable, how many permutations are there?



Analytical Solution:

Consider separating the \(n\) left feet/shoes and the \(n\) right feet/shoes, and then number the shoes \(1,2,...,n\). There are \(n!\) ways to organize the right shoes, and \(n!\) ways to organize the left shoes. By the multiplication rule, there are \((n!)^2\) permutations.


Empirical Solution:

#for computational speed, consider a small n
n = 3

#define the shoes; label left shoes as small values and 
#   right shoes as large values so we can distinguish
left = 1:n
right = 100:(100 + n - 1)
shoes = c(left, right)

#generate all possible permutations
perms = permutations(n = length(shoes), r = length(shoes), v = shoes, repeats.allowed = FALSE)


#iterate over the permutations and only count when we have
#   feet separated, left and right
#   should get factorial(n)^2 = 36
sum(apply(perms, 1, function(x){
  
  #see if the first n shoes are all left shoes (left shoes have small labels)
  if(all(x[1:n] < 100)){
    
    #see if the last n shoes are all right shoes (labels of 100 or more)
    if(all(x[(n + 1):length(x)] >= 100)){
      return(1)
    }
  }
  return(0)
  }))
## [1] 36




1.13

Nick claims that, when \(n > k\), we have \({n \choose k} {k \choose n} = 1\) because, by the definition of the binomial coefficient, the multiplication in these terms cancel out. Explain why Nick is wrong, using both math and intuition.



Analytical Solution:

Writing out the second term, we have \({k \choose n} = \frac{k!}{(k - n)! n!}\), and \((k - n)\) is negative so \((k - n)!\) is undefined. Intuitively, there are no ways to choose \(n\) people out of a group of \(k\) people when \(n > k\).


Empirical Solution:

#define simple parameters
n = 5
k = 3
choose(n, k); choose(k, n)
## [1] 10
## [1] 0




1.14

Consider 10 tosses of a fair coin. Let \(A\) be the event that you get exactly 5 heads, and \(B\) be the event that you get 10 heads.

  1. Compare \(P(A)\) and \(P(B)\).

  2. Now consider the sequences \(HTHTHTHTHT\) and \(HHHHHHHHHH\). Compare the probabilities that these sequences occur.

  3. Discuss your results in parts a. and b.



Analytical Solution:

  1. There are \(2^{10}\) possible sequences, so \(P(A) = \frac{{10 \choose 5}}{2^{10}}\) and \(P(B) = \frac{1}{2^{10}}\), since there are \({10 \choose 5}\) sequences with 5 heads (choose the 5 spots for the heads) and 1 sequence with 10 heads (only 1 way to have 10 heads). So, \(P(A) = {10 \choose 5}P(B)\).

  2. Both have probability \(\frac{1}{2^{10}}\), since they are both 1 specific sequence out of \(2^{10}\) possible sequences. In general, any one specific sequence has this probability.

  3. It’s more likely to get 5 heads, but any specific sequence has the same probability as any other specific sequence. The key is that there are many more sequences with 5 heads than sequences with 10 heads.



Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#for computational speed, use n = 4
n = 4

#generate the flips; define 1 as a heads, 0 as a tails
X = rbinom(sims, n, 1/2)

#find the probability of 2 heads and 4 heads
#should get choose(4, 2)/2^4 = .375 and 1/2^4 = .0625
length(X[X == 2])/sims; length(X[X == 4])/sims
## [1] 0.369
## [1] 0.048
#indicators for HTHT and HHHH
HTHT = rep(0, sims)
HHHH = rep(0, sims)

#run the loop
for(i in 1:sims){
  
  #flip the coin
  X = rbinom(n, 1, 1/2)
  
  #see if we got HTHT, or 1010
  if(all(X == c(1,0,1,0))){
    HTHT[i] = 1
  }
  
  #see if we got HHHH, or all 1's
  if(sum(X) == 4){
    HHHH[i] = 1
  }
}

#should get 1/2^4 = .0625 for each
mean(HTHT); mean(HHHH)
## [1] 0.058
## [1] 0.062




1.15

How many possible 5-letter words are there such that no two consecutive letters within the word repeat? That is, ‘alamo’ is ok, but ‘aalmo’ is not ok, since the letter ‘a’ repeats back-to-back.



Analytical Solution:

There are 26 choices for the first letter (can select any letter) and 25 choices for each of the next letters (since we can choose any letter except for the preceding letter). By the multiplication rule, we get \(26 \cdot 25^4\).


Empirical Solution:

#create the words; for computational speed, use 3-letter words
words = permutations(n = 26, r = 3, v = letters, repeats.allowed = TRUE)


#iterate over the permutations and count when we don't
#   have repeat letters. Should get 26*25^2 = 16250
sum(apply(words, 1, function(x){
  if(x[1] != x[2] && x[2] != x[3]){
    return(1)
  }
  return(0)
  }))
## [1] 16250




1.16

(Dedicated to Diana Stone, with help from Matt Goldberg)

There is a restaurant in France called ‘7367’. After every meal, the waiter rolls four fair, 10-sided dice (numbered 1 through 10). If the dice show a 3, a 6 and two 7’s (in any order) the meal is free.

  1. If you eat at this restaurant once, what is the probability of winning a free meal?

  2. This dice game has made the restaurant very popular, but the manager of the restaurant wants to be able to adjust the game. He would like to present variations of the game for meals that are more expensive (lower probability of winning) and for meals that are less expensive (higher probability of winning).

Keeping the same structure of the dice game, how can you change the ‘winning number’ (i.e., 7367) to suit the manager’s needs, both for higher and lower probabilities of winning? Support your answer both with calculations and intuition.



Analytical Solution:

  1. There are 10 sides to each of the 4 dice, so by the multiplication rule there are \(10^4\) possible outcomes. To count the ‘successful’ 7367 outcomes, consider choosing the two dice that will show 7, then the die that will show 6 (this forces the remaining die to show 3). There are \({4 \choose 2}\) (choose 2 of the 4 dice to show 7) possible combinations for the first choice, then 2 possible combinations for the second choice (choose 1 of the 2 remaining dice to show 6) and only one combination for the remaining choice (the last die must be a 3), so by the multiplication rule there are \({4 \choose 2}\cdot 2 = 12\) ‘successful’ outcomes. By the naive definition of probability, you have a \(\frac{12}{10^4} = .0012\) probability of winning a free meal.

Alternatively, we could imagine that ‘7367’ is a ‘word’ made up of numbers instead of letters. Since the order of the dice do not matter (it doesn’t matter which dice show 7, etc.) we simply need to count the number of ways that the letters in this ‘word’ can be arranged to count the number of ‘successful’ outcomes. There are four letters and two are identical (the two 7’s), so there are \(\frac{4!}{2!} = 4 \cdot 3 = 12\), which matches our above calculation.

  2. To make a success more likely, we can make the game 7368 (replace one of the 7’s with another unique digit; that is, not 3 or 6). To make a success less likely, we can make the game 7776, or add another repeat number. There are still \(10^4\) possible outcomes for the roll and, employing the ‘word’ approach from the previous part and the naive definition of probability, we have that the more likely case has probability \(\frac{4!}{10^4} = .0024\) and the less likely case has probability \(\frac{4!}{3! \cdot 10^4} = .0004\).

Note that the more likely case is twice as likely as the original 7367 case (.0024 to .0012). This is intuitive: imagine comparing the probability of rolling two 7’s against the probability of rolling a 7 and an 8. In the former case, both dice must show 7; in the latter, the first can show 7 and the second 8, or vice versa (recall that the ordering of the dice does not matter). That is, there are two successful outcomes instead of one, so the success is twice as likely.

Further, the less likely case is a third as likely as the original 7367 case (.0004 to .0012). This is also intuitive: imagine comparing the probability of rolling two 7’s and a 3 to the probability of rolling three 7’s. In the former case, we have three choices: we choose which die shows the 3. In the latter case, all three dice must show 7, so we only have one choice. That is, since there are three successful outcomes instead of one, the success of the 7367 case is three times more likely.

It would also work to make the game 7777, 7733, etc., to make a success less likely compared to 7367.


Empirical Solution:

#replicate
set.seed(110)

#increased number of sims; rare events
sims = 10000

#indicators for success in the three cases
success = rep(0, sims)
more.success = rep(0, sims)
less.success = rep(0, sims)

#create a die
die = 1:10


#run the loop
for(i in 1:sims){
  
  #roll 4 dice
  roll = sort(sample(die, 4, replace = TRUE))
  
  #see if the roll matches the first case
  if(all(roll == c(3,6,7,7))){
    success[i] = 1
  }
  
  #see if the roll matches the 'more likely' case
  if(all(roll == c(3,6,7,8))){
    more.success[i] = 1
  }
  
  #see if the roll matches the 'less likely' case
  if(all(roll == c(6,7,7,7))){
    less.success[i] = 1
  }
}

#should get .0012, .0024 and .0004
mean(success); mean(more.success); mean(less.success)
## [1] 0.0014
## [1] 0.0034
## [1] 4e-04




1.17

Matt, Dan, Alec, Edward and Patrick are settling in for board game night. They pick their spots randomly at a round table with 5 evenly spaced seats. What is the probability that Dan is sitting next to Alec (i.e., he is sitting either directly to the left or right of Alec)?



Analytical Solution:

By symmetry (the table is round and all seats are alike) it does not matter where Alec is sitting; wherever Alec sits, there are 4 remaining seats and 2 that are adjacent to him. Therefore, Dan has a \(\frac{2}{4} = \frac{1}{2}\) probability of sitting next to Alec.

Alternatively, there are 5 choices for Alec’s seat, and then 2 choices for Dan’s seat (either side of Alec), so by the multiplication rule there are \(5 \cdot 2 = 10\) ‘successful’ combinations. In general, there are 5 choices for Alec’s seat and then 4 possible choices for Dan’s seat (any seat but Alec’s seat) so there are \(5 \cdot 4 = 20\) ‘possible’ outcomes. By the naive definition of probability, we have a \(\frac{10}{20} = \frac{1}{2}\) probability of success, which agrees with our calculation above.


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#design the table; put 1's where Alec and Dan sit
table = c(1,1,0,0,0)

#mark if Dan and Alec sit next to each other
success = rep(0, sims)


#run the loop
for(i in 1:sims){
  
  #draw a random seating
  seats = sample(table)
  
  #see if they are sitting next to each other
  if(sum(seats[1:2]) == 2 || sum(seats[2:3]) == 2 || sum(seats[3:4]) == 2
     || sum(seats[4:5]) == 2 || sum(seats[2:4]) == 0){
    
    success[i] = 1
  }
}

#should get 1/2
mean(success)
## [1] 0.506




1.18

Jon Snow and Robb Stark are both in need of knights. There are currently 100 available knights at Winterfell (Jon and Robb’s home); Jon is going to take 10 knights and go North, and Robb is going to take 30 knights and go South.

To avoid potentially displaying favoritism and upsetting the knights of the realm, Jon and Robb will both randomly select their knights. They will flip a coin to determine who goes first; the winner of the coin flip will randomly select his required number of knights (10 or 30) and the loser of the coin flip will then randomly select his required number of knights (10 or 30) from the remaining number of knights (90 or 70).

  1. Jon claims that there are \({100 \choose 10} \cdot {90 \choose 30}\) possible combinations (one single combination marks the knights that Jon took, the knights that Robb took and the knights that stayed back). Robb, though, claims that Jon is off by a factor of 2, because he forgot to account for who picks their knights first (Jon or Robb). Who is correct?

  2. Robb and Jon flip the coin, and Robb wins (the coin lands on the “direwolf” sigil, not the “crow” sigil). However, Robb is not pleased: “Great”, he says, “now I go first, which means I have a higher chance of taking Theon” (Theon is one of the 100 knights and is well known for his ineptitude). Robb and Jon will still both randomly select their knights, as the rules stipulate; does Robb have a higher chance of selecting Theon if he goes first?



Analytical Solution:

  1. Robb is essentially claiming that Order Matters; in this case, Robb is incorrect. While it is true that the same combination can be arrived at in two different ways (either Robb picks first or Jon picks first), this does not change the number of combinations (even if a combination can be arrived at in two ways, it is the same combination).

We could also envision that there are two possible states of the world: one where Robb wins the toss, and one where Jon wins the toss. If Jon wins the toss, there are \({100 \choose 10} \cdot {90 \choose 30}\) combinations by the multiplication rule; if Robb wins the toss, there are \({100 \choose 30} \cdot {70 \choose 10}\) combinations by the multiplication rule. These two values are equal (you can check this in R or by writing the binomial coefficients out). Additionally, the set of combinations if Jon goes first is the same as the set of combinations if Robb goes first (all of the potential combinations can still be arrived at, regardless of who goes first). Since the set of possible combinations is identical in both possible states of the world, it must be the original set from before the coin toss. That is, the order of the picks (result of the coin toss) does not matter.

  2. If Robb goes first, he has probability \(\frac{30}{100} = \frac{3}{10}\) of picking Theon (Robb owns 30 ‘slots’ to pick knights with, and he picks the knights randomly).

Now consider if Robb goes second. We already know the total number of overall combinations from the previous part of the problem; let’s now count the number of combinations where Jon doesn’t pick Theon and Robb does pick Theon (in this part, Jon goes first and Robb goes second). If we restrict Jon from picking Theon, he has \({99 \choose 10}\) choices (any of the 99 men but Theon). Then, we assign Theon to Robb, and Robb then has \({89 \choose 29}\) choices left (select 29 more men from the remaining 89 men to complete the 30 knights). By the multiplication rule, there are \({99 \choose 10} \cdot {89 \choose 29}\) combinations where Robb picks Theon (going second) and \({100 \choose 10} \cdot {90 \choose 30}\) overall combinations. By the naive definition of probability, Robb has probability \(\frac{{99 \choose 10} \cdot {89 \choose 29}}{{100 \choose 10} \cdot {90 \choose 30}} = .3\) of selecting Theon if he goes second.

In both cases (going first or second), Robb has probability .3 of selecting Theon; therefore, he has an equal chance of selecting Theon if he goes first or second.


Empirical Solution:

#part b. only
#replicate
set.seed(110)
sims = 1000

#mark if Robb picks Theon, depending if Robb picks first or second
success.first = rep(0, sims)
success.second = rep(0, sims)


#run the loop
for(i in 1:sims){
  
  #define the knights; mark Theon as 1
  knights = c(1, rep(0, 99))
  
  #the case where Robb goes first
  robb.first = sample(knights, 30, replace = FALSE)
  
  #see if Robb got Theon
  if(sum(robb.first) > 0){
    success.first[i] = 1
  }
  
  
  #the case where Robb goes second; let Jon pick first
  jon = sample(1:100, 10, replace = FALSE)
  
  #remove the knights Jon picked
  knights = knights[-jon]
  
  #sample for Robb among the remaining knights
  robb.second = sample(knights, 30, replace = FALSE)
  
  #see if Robb got Theon
  if(sum(robb.second) > 0){
    success.second[i] = 1
  }
}

#should both be .3
mean(success.first); mean(success.second)
## [1] 0.274
## [1] 0.304
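
For reference, the exact probability from the analytical solution (Robb going second):

#should get .3
choose(99, 10)*choose(89, 29)/(choose(100, 10)*choose(90, 30))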




1.19

Imagine a game called ‘Cards Roulette’. The four Aces are removed from the deck and are well shuffled. You take turns with a friend drawing Aces without replacement from the four cards. The first player to draw a red Ace (Hearts or Diamonds) loses.

  1. If you would like to maximize your probability of winning this game, should you go first or second?

  2. Compare this to the Russian Roulette problem from this chapter (click here for a video recap of this problem). Think about how these two problems compare in structure and make an intuitive argument comparing the two solutions.



Analytical Solution:

  1. Let’s find the probability of winning if you go first. The only way to win if you go first is to pick a black Ace and then have the other player pick a red Ace (if the other player picks a black Ace second, there will only be red Aces left for you).

There are \(\frac{4!}{2! 2!} = 6\) ways to arrange the Aces (since, for our purposes, the red Aces are identical and the black Aces are identical). There are 2 ways to arrange the Aces such that the first Ace is black and the second is red (i.e., you win by going first); we know this because we fix the first card to be black and the second to be red, and then there are two ways to arrange the last two cards: black and red or red and black (recall that two Aces of the same color are indistinguishable). Therefore, by the naive definition of probability, you have a \(1/3\) probability of winning if you go first. Since one of the players must win, this implies that you have a \(2/3\) probability of winning if you go second, which means you would rather go second.

  2. In the original Russian Roulette example, we were (probabilistically) indifferent between going first and second. To prove this, we employed a symmetry argument: essentially, going first means that you ‘own’ the first, third and fifth spot, and, by symmetry, there is a \(3/6 = 1/2\) probability that the bullet ends up in any of these slots.

In this case, we have two ‘bullets’, or ‘red cards’ (the problems are the same in structure). We are not really concerned with where both red cards end up, but where the first red card ends up (that is, as soon as the first red card is drawn, the game is over). So, in this case we own spots 1 and 3 and there are four slots, but the symmetry argument no longer applies: there is no longer an equal probability that the first red card ends up in any of the four slots. In fact, the first red card cannot end up in the fourth slot; in the most extreme scenario, the last two cards will be red, meaning that the first red card is the third card drawn. In general, since we are looking for the first of multiple red cards, it is more likely that this red card is drawn earlier on, which is why going first marks a higher probability of a loss (we will dive further into this topic when we discuss Order Statistics).


Empirical Solution:

#part a.
#replicate
set.seed(110)
sims = 1000

#define the cards; 1 means a red card
cards = c(1,1,0,0)

#keep track of losses
lose = rep(0, sims)


#run the loop
for(i in 1:sims){
  
  #shuffle the four Aces
  deck = sample(cards)
  
  #iterate through the cards (the game ends by the third card at the latest)
  for(j in 1:(length(cards) - 1)){
    
    #stop if we found a red Ace
    if(deck[j] == 1){
      
      #mark if we lost
      if(j%%2 == 1){
        lose[i] = 1
      }
      
      #next loop
      break
    }
  }
}

#should get 2/3
mean(lose)
## [1] 0.696




1.20

Using a story proof, show:

\[{n \choose x} 2^x = \frac{\big(2n\big)\big(2n - 2\big)...\big(2n - 2(x - 1)\big)}{x!}\]

Where \(n > x\).

Hint: During police interrogations, it is common to adapt a ‘good cop/bad cop’ strategy. That is, two cops enter the interrogation room, and one cop is the ‘good cop’ (they are nice to the perpetrator to try to get them to open up) while the other cop is the ‘bad cop’ (they are rude and possibly even threatening to scare the perpetrator).

Imagine that you are the police commissioner and you are presented with pairs of cops; the cops come in teams of two. It is your job to select teams, and then assign which cops will be the ‘good cops’ and which cops will be the ‘bad cops’ (each team must have one good cop and one bad cop).



Analytical Solution:

As prompted in the Hint, both sides count the number of ways that the police commissioner can choose teams of cops and then assign a ‘good cop’ and a ‘bad cop’ within each team. Imagine that there are \(n\) pairs of cops and the commissioner needs to choose \(x\) pairs of cops.

First, consider the LHS. The commissioner first selects the pairs that he will assign. There are \(n\) total pairs and he will select \(x\) pairs, so there are \({n \choose x}\) possible choices. After he has selected the pairs, he needs to assign a good cop and a bad cop in each pair. There are 2 options for each of the \(x\) pairs (either cop can be the good cop), so by the multiplication rule there are \(2^x\) possible combinations once the pairs have been decided. Putting it all together via the multiplication rule, we have \({n \choose x} 2^x\), as desired.

Next, consider the RHS. Here, the police commissioner picks a random cop to be a good cop, and then automatically assigns that cop’s partner as the bad cop of that pair; these two cops make the first pair. With the first pair gone, he then randomly selects another good cop, and proceeds in the same way. The first choice has \(2n\) possibilities (he can choose any of the cops), the second choice has \(2n - 2\) possibilities (he can choose any of the cops except for the first pair), etc., until the commissioner has chosen \(x\) pairs. By the multiplication rule, we have \(\big(2n\big)\big(2n - 2\big)...\big(2n - 2(x - 1)\big)\), or the numerator on the RHS. This overcounts by a factor of \(x!\) though, as it does not matter the order that the commissioner picks cops in; he can arrive at the same combination in different orders. Dividing this factor of \(x!\) out gives us the RHS.

Ultimately, the LHS considers when the commissioner picks the pairs first and then the good cops, and the RHS considers simply when the commissioner picks good cops and automatically assigns pairs in that way.

Empirical Solution:

#try for specific values of n and x; should always hold
n = 13
x = 3

#these should be equal; this single case doesn't prove this in general, of course!
y = seq(from = 2*n, to = 2*n - 2*(x - 1), by = -2)
choose(n, x)*2^x; prod(y)/factorial(x)
## [1] 2288
## [1] 2288




1.21

Define ‘skip-counting’ as an alternative way to count integers. Imagine counting from 1 to 10: a legal ‘skip-count’ is an ascending set of integers that starts with 1 and ends with 10 (i.e., 1-2-5-6-10 and 1-3-10 are both legal ‘skip-counts’).

How many skip-counts are there if we skip-count from 1 to some integer \(n > 1\)?



Analytical Solution:

For each integer in between 1 and \(n\), we simply have to decide if we want to include it in the skip-count. For example, if \(n = 5\), one possible skip-count is 1-2-5. In this skip-count, we include 2 and do not include 3 or 4.

There are \(n - 2\) integers strictly between 1 and \(n\), and we have 2 choices for each (include in the skip-count or not) so there are \(2^{n - 2}\) possible skip-counts (we do not have to worry about order because there is only one correct order: ascending order).
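
A quick empirical check for a small case, say counting from 1 to \(n = 6\): each of the integers 2 through 5 is either included or skipped, so enumerating these choices should give \(2^{4} = 16\) skip-counts.

#small case: count from 1 to n = 6
n = 6

#enumerate every include/skip choice for the integers 2, ..., n - 1
#each row of this grid is one skip-count; should get 2^(n - 2) = 16
dim(expand.grid(rep(list(c(0, 1)), n - 2)))[1]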




1.22

Masterminded by Alexander Lee.


The NFL (National Football League) consists of 32 teams that play 17 games amongst each other in a season (each team is given one ‘bye’ week that they have off, but for simplicity in this problem we will assume all teams play all 17 weeks). By extension, there are 16 games each week (each of the 32 teams plays one other team).

Imagine entering an NFL betting pool with the following rules: you can purchase a single entry for $1, which allows you to pick a game a week until you are wrong. That is, every week, you select a team out of the 32 that you think will win, and if you are correct (the team wins or ties) you advance to the second week. You win if you ‘survive’ for 17 weeks (pick 17 games, one in each week, correctly).

You are allowed to purchase multiple entries; simply imagine that different entries are different chances to play the game. Each entry is independent (you can pick different games, the same games, etc. with different entries) and when an entry fails (you pick a wrong game for that entry) the entry is eliminated, but other entries may continue. If any one of your entries makes it 17 weeks, you win.

What is the least amount of money you have to pay to buy enough entries that allow you to implement a strategy that guarantees victory (at least one entry survives)?



Analytical Solution:

Let’s think about surviving just one week. If we buy two entries, we are guaranteed that at least one entry will advance: we can just use one entry to pick one team, and the other entry to pick the team playing that team! That is, if the New England Patriots are playing the Atlanta Falcons, we could put the first entry on the Patriots, and the second entry on the Falcons (recall that if teams tie, both entries advance).

So, if we are in Week 17 and need to pick one more game correctly, we need 2 entries to still be ‘alive’. Extending that logic, we need 4 entries to still be alive in Week 16 (put 2 entries on one team and 2 entries on the other team in that matchup, and at least 2 entries will advance). Continuing in this way, we see that we need \(2^{17}\) entries in the first week, so we have to pay a mere \(\$131,072\).
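
A one-line check of the arithmetic:

#2 entries per remaining week, over 17 weeks; should get 131072
2^17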




BH Problems



The problems in this section are taken from @BH. The questions are reproduced here, and the analytical solutions are freely available online. Here, we will only consider empirical solutions: answers/approximations to these problems using simulations in R.




BH 1.8
  1. How many ways are there to split a dozen people into 3 teams, where one team has 2 people, and the other two teams have 5 people each?
#consider splitting 7 people into 3 teams (one team of 3, two teams of 2)
#   for computational speed

#label people 1 to 7
people = 1:7

#permute the people
perms = permutations(n = 7, r = 7, v = people)

#define the first two people as the first team, etc.
#sort within teams with the 'apply' function, then transpose
#   with the t function to get back to original structure
first.team = t(apply(perms[, 1:2], 1, function(x) sort(x)))
second.team = t(apply(perms[, 3:4], 1, function(x) sort(x)))
third.team = t(apply(perms[, 5:7], 1, function(x) sort(x)))

#bind the teams back together
teams = cbind(first.team, second.team, third.team)

#count the unique teams, divide by two because order doesn't matter when comparing
#   first and second teams
#should get factorial(7)/(factorial(2)*factorial(2)*factorial(3)*2) = 105
dim(unique(teams))[1]/2
## [1] 105


  2. How many ways are there to split a dozen people into 3 teams, where each team has 4 people?
#consider splitting 6 people into 3 teams, for computational speed
#label people 1 to 6
people = 1:6

#permute the people
perms = permutations(n = 6, r = 6, v = people)

#define the first two people as the first team, etc.
#sort within teams with the 'apply' function, then transpose
#   with the t function to get back to original structure
first.team = t(apply(perms[, 1:2], 1, function(x) sort(x)))
second.team = t(apply(perms[, 3:4], 1, function(x) sort(x)))
third.team = t(apply(perms[, 5:6], 1, function(x) sort(x)))

#bind the teams back together
teams = cbind(first.team, second.team, third.team)

#count the unique teams, divide by 3! because order doesn't matter when comparing
#   teams.
#should get factorial(6)/(factorial(2)*factorial(2)*factorial(2)*factorial(3)) = 15
dim(unique(teams))[1]/factorial(3)
## [1] 15




BH 1.9
  1. How many paths are there from the point \((0,0)\) to the point \((110,111)\) in the plane such that each step either consists of going one unit up or one unit to the right?
#for computational speed, only go to the point (3, 4)
#we need to go right 3 steps and up 4 steps
steps = c(rep("R", 3), rep("U", 4))

#count the unique permutations; should get choose(7, 4) = 35
dim(unique(permutations(n = 7, r = 7, v = steps, set = FALSE, repeats.allowed = FALSE)))[1]
## [1] 35


  2. How many paths are there from \((0,0)\) to \((210,211)\), where each step consists of going one unit up or one unit to the right, and the path has to go through \((110,111)\)?
#for computational speed, go from (0,0) to (3,4) and then to (7, 8)
#we already counted number of ways to go from (0,0) to (3,4)
#   now count number of ways to go from (3,4) to (7,8). Must go up/right 4 times each
steps.0 = c(rep("R", 4), rep("U", 4))

#by the multiplication rule, multiply the number of paths for each leg of the trip
#should get choose(7, 4)*choose(8, 4) = 2450
paths.1 = dim(unique(permutations(n = 7, r = 7, v = steps, set = FALSE, repeats.allowed = FALSE)))[1]
paths.2 = dim(unique(permutations(n = 8, r = 8, v = steps.0, set = FALSE, repeats.allowed = FALSE)))[1]
paths.1*paths.2
## [1] 2450




BH 1.16

Show that for all positive integers \(n\) and \(k\) with \(n \geq k\), \[{n \choose k} + {n \choose {k-1}} = {{n+1} \choose k},\] doing this in two ways: (a) algebraically and (b) with a story, giving an interpretation for why both sides count the same thing.


#try for specific values of n and k; should always hold
n = 10
k = 5

#these should be equal; this single case doesn't prove this in general, of course!
choose(n, k) + choose(n, k - 1); choose(n + 1, k)
## [1] 462
## [1] 462




BH 1.18
  1. Show using a story proof that \[{k \choose k} + {k+1 \choose k} + {k+2 \choose k} + ... + {n \choose k} = {n+1 \choose k+1},\] where \(n\) and \(k\) are positive integers with \(n \geq k\). This is called the hockey stick identity.
#try for a specific value of n; should always hold
n = 10
k = 1

#these should be equal; this single case doesn't prove this in general, of course!
sum(choose(k:n, k)); choose(n + 1, k + 1)
## [1] 55
## [1] 55


  2. Suppose that a large pack of Haribo gummi bears can have anywhere between 30 and 50 gummi bears. There are 5 delicious flavors: pineapple (clear), raspberry (red), orange (orange), strawberry (green, mysteriously), and lemon (yellow). There are 0 non-delicious flavors. How many possibilities are there for the composition of such a pack of gummi bears? You can leave your answer in terms of a couple binomial coefficients, but not a sum of lots of binomial coefficients.
#for computational speed, only allow the pack to have between 4 and 6 gummi bears
#define the flavors; label 1 to 5
flavors = 1:5

#generate the packs, for packs from size 4 to 6
pack4 = permutations(n = 5, r = 4, v = flavors, repeats.allowed = TRUE)
pack5 = permutations(n = 5, r = 5, v = flavors, repeats.allowed = TRUE)
pack6 = permutations(n = 5, r = 6, v = flavors, repeats.allowed = TRUE)

#sort the packs; 'apply' sorts, and then t() transposes back to the original configuration
pack4 = t(apply(pack4, 1, sort))
pack5 = t(apply(pack5, 1, sort))
pack6 = t(apply(pack6, 1, sort))

#add up the permutations
#should get choose(8, 4) + choose(9, 4) + choose(10, 4) = choose(11, 5) - choose(8, 5) = 406
dim(unique(pack4))[1] + dim(unique(pack5))[1] + dim(unique(pack6))[1]  
## [1] 406




BH 1.22

A certain family has 6 children, consisting of 3 boys and 3 girls. Assuming that all birth orders are equally likely, what is the probability that the 3 eldest children are the three girls?

#generate all possible birth orderings
kids = permn(c("G","G","G","C","C","C"))

#pack into a matrix (6 columns) then a data frame
kids = matrix(unlist(kids), ncol = 6, byrow = TRUE)
kids = data.frame(kids)

#if we don't name the columns of the data frame, they are X1, X2, ... by default
#should get .05
length(kids$X1[kids$X1 == "G" & kids$X2 == "G" & kids$X3 == "G"])/length(kids$X1)
## [1] 0.05




BH 1.23

A city with 6 districts has 6 robberies in a particular week. Assume the robberies are located randomly, with all possibilities for which robbery occurred where equally likely. What is the probability that some district had more than 1 robbery?

#replicate 
set.seed(110)
sims = 1000

#matrix that tracks where the crimes were
districts = matrix(0, nrow = sims, ncol = 6)

#mark successes (more than 1 robbery)
success = rep(1, sims)

#run the loop
for(i in 1:sims){
  
  #pick a random district for each crime
  districts[i, ] = sample(1:6, 6, replace = TRUE)
  
  #sort the sampled districts
  sort = sort(districts[i, ])
             
  #see if we got districts 1 through 6, one for each (that is, no district
  #   had more than 1 robbery)
  #the 'all.equal' function checks if vectors are equal; isTRUE checks if 'all.equal' returned TRUE
  if(isTRUE(all.equal(sort, 1:6))){
    success[i] = 0
  }
}

#analytical solution is .9846
mean(success)
## [1] 0.982




BH 1.26

A college has 10 (non-overlapping) time slots for its courses, and blithely assigns courses to time slots randomly and independently. A student randomly chooses 3 of the courses to enroll in. What is the probability that there is a conflict in the student’s schedule?

#replicate 
set.seed(110)
sims = 1000

#mark successes (overlap in classes)
success = rep(0, sims)

#run the loop
for(i in 1:sims){
  
  #pick a random slot for each class
  slots = sample(1:10, 3, replace = TRUE)
  
  #if we don't have 3 unique entries, we have an overlap
  if(length(unique(slots)) != 3){
    success[i] = 1
  }
}

#should get .285
mean(success)
## [1] 0.285




BH 1.27

For each part, decide whether the blank should be filled in with =, < or > and give a clear explanation.

  1. (probability that the total after rolling 4 fair dice is 21) vs. (probability that the total after rolling 4 fair dice is 22)
#replicate 
set.seed(110)
sims = 1000

#indicators for getting 21 and 22 
success.21 = rep(0, sims)
success.22 = rep(0, sims)

#run the loop
for(i in 1:sims){
  
  #roll the die 4 times
  roll = sum(sample(1:6, 4, replace = TRUE))
  
  #fill in the indicators if necessary
  if(roll == 21){
    success.21[i] = 1
  }
  if(roll == 22){
    success.22[i] = 1
  }
}

#21 should be twice as likely (try the simulation a few times)
mean(success.21)
## [1] 0.009
mean(success.22)
## [1] 0.01


  1. (probability that a random 2-letter word is a palindrome) vs. (probability that a random 3-letter word is a palindrome)
#replicate
set.seed(110)
sims = 1000

#the 'letters' vector is given in R

#indicator for the palindromes
two.letters = rep(0, sims)
three.letters = rep(0, sims)

for(i in 1:sims){
  
  #generate a two letter word, see if it's a palindrome
  two.word = sample(letters, 2, replace = TRUE)
  
  #see if it's the same as the reversed word (i.e., a palindrome)
  if(isTRUE(all.equal(two.word, rev(two.word)))){
    two.letters[i] = 1
  }
  
  #generate a three letter word, see if it's a palindrome
  three.word = sample(letters, 3, replace = TRUE)
  
  #see if it's the same as the reversed word (i.e., a palindrome)
  if(isTRUE(all.equal(three.word, rev(three.word)))){
    three.letters[i] = 1
  }
}

#both should be 1/26 = .038
mean(two.letters)
## [1] 0.034
mean(three.letters)
## [1] 0.03




BH 1.29

Elk dwell in a certain forest. There are \(N\) elk, of which a simple random sample of size \(n\) are captured and tagged (“simple random sample” means that all \({N \choose n}\) sets of \(n\) elk are equally likely). The captured elk are returned to the population, and then a new sample is drawn, this time with size \(m\). This is an important method that is widely used in ecology, known as capture-recapture. What is the probability that exactly \(k\) of the \(m\) elk in the new sample were previously tagged? (Assume that an elk that was captured before doesn’t become more or less likely to be captured again.)

#replicate
set.seed(110)
sims = 1000

#define simple parameters
N = 100
n = 50
m = 10

#count how many overlap
overlap = rep(NA, sims)

#run the loop
for(i in 1:sims){
  
  #pick the first sample
  first.draw = sample(1:N, n)
  
  #second sample
  second.draw = sample(1:N, m)
  
  #count how many overlap with the 'intersect' function
  overlap[i] = length(intersect(first.draw, second.draw))
}


#the histogram and PMF should be a close fit
#plot the empirical PMF
plot(table(overlap)/sims, main = "PMF",
     xlab = "# of Recaptured Elk (k)", col = "black",
     ylab = "P(k Elk are recaptured)", lwd = 3)

#plot the analytical PMF
k = 1:10
lines(choose(n, k)*choose(N - n, m - k)/choose(N, m),
      lwd = 5, col = "red", type = "p", pch = 16)

legend("topright", legend = c("Empirical PMF", "Analytical PMF"),
       lty=c(1,1), lwd=c(2.5,2.5),
       col=c("black", "red"))




BH 1.31

A jar contains \(r\) red balls and \(g\) green balls, where \(r\) and \(g\) are fixed positive integers. A ball is drawn from the jar randomly (with all possibilities equally likely), and then a second ball is drawn randomly.

  1. Explain intuitively why the probability of the second ball being green is the same as the probability of the first ball being green.
#replicate
set.seed(110)
sims = 1000

#set parameters 
r = 10
g = 10

#create a vector of balls
balls = c(rep("r", r), rep("g", r))

#indicators if the first/second ball is green
first = rep(0, sims)
second = rep(0, sims)

for(i in 1:sims){
  
  #sample 2 balls
  pick = sample(balls, 2)
  
  #see if we got green balls
  if(pick[1] == "g"){
    first[i] = 1
  }
  
  if(pick[2] == "g"){
    second[i] = 1
  }
}

#these should be equal
mean(first)
## [1] 0.479
mean(second)
## [1] 0.539


  1. Define notation for the sample space of the problem, and use this to compute the probabilities from (a) and show that they are the same.

  2. Suppose that there are 16 balls in total, and that the probability that the two balls are the same color is the same as the probability that they are different colors. What are \(r\) and \(g\) (list all possibilities)?




BH 1.32

A random 5-card poker hand is dealt from a standard deck of cards. Find the probability of each of the following possibilities (in terms of binomial coefficients).

  1. A flush (all 5 cards being of the same suit; do not count a royal flush, which is a flush with an ace, king, queen, jack, and 10).

  2. Two pair (e.g., two 3’s, two 7’s, and an ace).

#replicate
set.seed(110)
sims = 1000

#create a deck
deck = matrix(0, nrow = 52, ncol = 2)
deck = data.frame(deck)
colnames(deck) = c("Suit", "Value")

#fill in suits and values
deck$Suit = c(rep("H", 13), rep("D", 13), rep("S", 13), rep("C", 13))
deck$Value = c(rep(c("2","3","4","5","6","7","8","9","10","J","Q","K","A"), 4))


#indicators for flush, two pair
flush = rep(0, sims)
two.pair = rep(0, sims)

#run the loop
for(i in 1:sims){
  
 #draw a hand
 hand = deck[sample(1:52, 5, replace = FALSE),]
 
 #check for a flush
 if(length(unique(hand$Suit)) == 1){
   flush[i] = 1
   
   #don't count a royal flush
   if(length(intersect(hand$Suit, "10")) == 1 &&
      length(intersect(hand$Suit, "J")) == 1 &&
      length(intersect(hand$Suit, "Q")) == 1 &&
      length(intersect(hand$Suit, "K")) == 1 &&
      length(intersect(hand$Suit, "A")) == 1){
        flush[i] = 0
      }
 }


 #count how many we have of each value (for two pair, want 2 2 1)
 counts = as.vector(table(hand$Value))
 counts = sort(counts, decreasing = TRUE)
 
 #check if we have 2 2 1
 if(isTRUE(all.equal(counts, c(2,2,1)))){
   two.pair[i] = 1
 }
}

#should be 4*(choose(13,5) - 1)/choose(52,5) = .002
mean(flush)
## [1] 0.001
#should be choose(13,2)*choose(4,2)^2*44/choose(52,5) = .047
mean(two.pair)
## [1] 0.056




BH 1.40

A norepeatword is a sequence of at least one (and possibly all) of the usual 26 letters a,b,c,…,z, with repetitions not allowed. For example, “course” is a norepeatword, but “statistics” is not. Order matters, e.g., “course” is not the same as “source”.

A norepeatword is chosen randomly, with all norepeatwords equally likely. Show that the probability that it uses all 26 letters is very close to \(1/e\).

#replicate
set.seed(110)
sims = 1000

#construct a sample of random norepeatwords (select sizes randomly)
#see if we get a 26 letter norepeatword
all.letters = rep(0, sims)

#keep track of norepeatwords
norepeatwords = character(0)

#run the loop
for(i in 1:sims){
  
  #go until we get a norepeatword, then break
  while(TRUE){
    
    #sample a random number for the length
    size = sample(1:26, 1)
    
    #sample a candidate word. the 'letters' vector is stored in R
    candidate = sample(letters, size, replace = TRUE)
    
    #if the candidate word is a norepeatword, break
    if(length(unique(candidate)) == size){
      break
    }
  }
  
  #see if we got a 26 letter word
  if(length(candidate) == 26){
    all.letters[i] = 1
  }
  
  #collapse the candidate into one word
  candidate = paste(candidate, collapse = "")
  
  #tack on the candidate word
  norepeatwords = c(norepeatwords, candidate)
}


#unfortunately, this is a flawed experiment
#we won't observe any 26 letter words because it's hard to generate these randomly;
#   they all have low probabilities. The only other solution is to construct
#   the full population of norepeatwords by hand, but this is computationally large
max(all.letters)
## [1] 0
#let's force a maximum of 3 letters for a norepeatword, and then
#   find the probability of a three letter norepeatword

#generate all norepeat words of length 1, 2 and 3
norepeat1 = combinations(n = 26, r = 1, v = letters)
norepeat2 = combinations(n = 26, r = 2, v = letters)
norepeat3 = combinations(n = 26, r = 3, v = letters)

#multiply by the number of permutations (i.e., 3 letter word has 3! possible permutations)
#should get factorial(3)*choose(26, 3)/(factorial(3)*choose(26, 3) + 
#   factorial(2)*choose(26, 2) + factorial(1)*choose(26, 1)) = .958
dim(norepeat3)[1]*factorial(3)/(dim(norepeat3)[1]*factorial(3) + dim(norepeat2)[1]*factorial(2) + dim(norepeat1)[1])
## [1] 0.9584665




BH 1.48

A card player is dealt a 13-card hand from a well-shuffled, standard deck of cards. What is the probability that the hand is void in at least one suit (“void in a suit” means having no cards of that suit)?

#replicate
set.seed(110)
sims = 1000

#create a deck
deck = matrix(0, nrow = 52, ncol = 2)
deck = data.frame(deck)
colnames(deck) = c("Suit", "Value")

#fill in suits and values
deck$Suit = c(rep("H", 13), rep("D", 13), rep("S", 13), rep("C", 13))
deck$Value = c(rep(c("2","3","4","5","6","7","8","9","10","J","Q","K","A"), 4))

#indicator for success (void in at least one suit)
success = rep(0, sims)


for(i in 1:sims){
  
  #draw a hand
  hand = deck[sample(1:52, 13, replace = FALSE),]
  
  #check if there are less than 4 unique suits
  if(length(unique(hand$Suit)) < 4){
    success[i] = 1
  }
}

#should be .051
mean(success)
## [1] 0.054




BH 1.52

Alice attends a small college in which each class meets only once a week. She is deciding between 30 non-overlapping classes. There are 6 classes to choose from for each day of the week, Monday through Friday. Trusting in the benevolence of randomness, Alice decides to register for 7 randomly selected classes out of the 30, with all choices equally likely. What is the probability that she will have classes every day, Monday through Friday? (This problem can be done either directly using the naive definition of probability, or using inclusion-exclusion.)

#replicate
set.seed(110)
sims = 1000

#6 Mondays, 6 Tuesdays, etc.
classes = c(rep("M", 6), rep("Tu", 6), rep("W", 6), rep("Th", 6), rep("F", 6))

#indicator for getting a class each day
success = rep(0, sims)

#run the loop
for(i in 1:sims){
  
  #sample 7 random classes for the schedule
  schedule = sample(classes, 7, replace = FALSE)
  
  #check if we have one on each day
  if(length(unique(schedule)) == 5){
    success[i] = 1
  }
}

#should be .302
mean(success)
## [1] 0.333




BH 1.59

There are 100 passengers lined up to board an airplane with 100 seats (with each seat assigned to one of the passengers). The first passenger in line crazily decides to sit in a randomly chosen seat (with all seats equally likely). Each subsequent passenger takes his or her assigned seat if available, and otherwise sits in a random available seat. What is the probability that the last passenger in line gets to sit in his or her assigned seat? (This is a common interview problem, and a beautiful example of the power of symmetry.)

#replicate
set.seed(110)
sims = 1000

#indicator if the last passenger gets seat 100
success = rep(0, sims)

#run the loop
for(j in 1:sims){
  
  #assume person i is assigned the i^th seat; i.e., person 37 is assigned seat 37
  seats = 1:100
  
  #keep track of taken seats; numeric(0) creates an empty vector that we can tack on to
  taken = numeric(0)
  
  #take out the first seat from the crazy guy
  taken = c(taken, sample(seats, 1))
  seats = seats[-taken]
  
  #put everyone else down
  for(i in 2:100){
    
    #check if the seat is taken; if not, take it!
    if(length(seats[seats == i]) > 0){
      seats = seats[-which(seats == i)]
      taken = c(taken, i)
    }
    
    #if it's taken, take a random seat
    else if(length(seats[seats == i]) == 0){
      pick = sample(seats, 1)
      taken = c(taken, pick)
      seats = seats[-which(seats == pick)]
    }
  }
 
  #see if the person got his seat
  if(taken[100] == 100){
    success[j] = 1
  }
}

#should be 1/2
mean(success)
## [1] 0.504





Conditional Probability




2.1

Juan has \(n = 10\) different pairs of socks (\(2n\) socks total). Every morning when he wakes up, he randomly chooses socks one at a time until he gets a pair (e.g., both socks from the fifth pair). Let \(X\) be the number of socks he chooses before he gets a pair (not including the sock that completes the pair). Find the PMF of \(X\).

Hint: Define a ‘double factorial’ \(n!!\) as a factorial that skips every other number; for even numbers the factorial iterates down to 2, and for odd numbers the factorial iterates down to 1. For example, \(10!! = 10\cdot 8 \cdot 6 \cdot ... \cdot 2\) and \(9!! = 9\cdot 7 \cdot 5 \cdot ... \cdot 1\). This may be useful in counting the number of ways to select socks in a way that doesn’t create a pair.



Analytical Solution: This is similar to the birthday problem, except that every sock we select removes 2 potential socks from the pool (if we select the left sock from pair 1, we can no longer select that left sock, since we already picked it, nor the right sock if we don't want to make a pair). We will break this into two steps: the probability of selecting \(x\) unique socks, and then the probability of selecting a non-unique sock immediately after, conditioning on having selected \(x\) unique socks (thus creating a pair right after the \(x\) unique socks).

First, consider the probability of selecting \(x\) unique socks. We can employ the naive definition of probability. The denominator, the number of ways to select \(x\) socks from the 20, can be written \(20\cdot 19 \cdot ... \cdot (20 - x)\) by the multiplication rule (for the first sock, we have 20 choices, then 19 choices, etc., until we have chosen \(x\) socks). In general, we can write this as \(20!/(20 - x)!\).

Next, the numerator in this naive definition is the number of ways to select \(x\) unique socks from the 20 socks. For the first sock, we have 20 choices (all socks are unique). For the second sock, we have 18 choices: 19 socks are left (we took one out), and we can't take the sock that matches the first sock we took. Continuing in this way, we get \(20 \cdot 18 \cdots (22 - 2x)\), which in general can be written \(20!!/(20 - 2x)!!\), where \(!!\) represents the double factorial as explained in the Hint. Therefore, the probability of selecting all unique socks in the first \(x\) selections is:

\[\frac{20!!/(20 - 2x)!!}{20!/(20 - x)!}\]

Now we need the second part: the probability of selecting a sock that makes a pair on the \(x + 1^{th}\) selection, given that we’ve selected these \(x\) unique socks. Here, there are \(x\) socks left that will make a pair, and \(20 - x\) socks left total. This gives a simple probability of \(x/(20 - x)\). We then multiply the two probabilities (we are conditioning on the first stage when we find the probability of the second stage) to find the probability that they both occur, to get our PMF:

\[P(X = x) = \Big(\frac{20!!/(20 - 2x)!!}{20!/(20 - x)!}\Big) \big(\frac{x}{20 - x}\big)\]
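
As a quick sanity check, we can evaluate this PMF in R and verify that it sums to 1 over the support \(x = 1, 2, ..., 10\) when \(n = 10\). This is a rough sketch; the dfact helper below is just an ad hoc double factorial, separate from the double.factorial function defined in the empirical solution.

#ad hoc double factorial helper: product of every other integer down to 1 or 2
dfact = function(m) if(m <= 1) 1 else prod(seq(m, 2, by = -2))

#evaluate the PMF over the support x = 1, ..., 10 (2n = 20 socks)
x = 1:10
PMF = sapply(x, function(x)
  (dfact(20)/dfact(20 - 2*x))/(factorial(20)/factorial(20 - x))*(x/(20 - x)))

#a valid PMF; should sum to (essentially) 1
sum(PMF)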


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#define a function that calculates the double factorial
double.factorial <- function(x){
  
  #if we don't have a non-negative integer, return NULL
  if(x != round(x, 0) || x < 0){
    
    return()
  }
  
  #return 1 for x = 1, 0
  if(x == 1 || x == 0){
    return(1)
  }
  
  #create the sequence, return the factorial
  k = seq(from = 0, to = x - 2, by = 2)
  return(prod(x - k))
}

#define the parameter
n = 10

#keep track of X
X = rep(NA, sims)

#run the loop
for(i in 1:sims){
  
  #define our 'collection' of socks, labeled 1 to n
  drawer = rep(1:n, 2)

  #keep track of the socks we've picked; initialize here
  socks = integer(0)
  
  #go until we get a match
  while(TRUE){
    
    #pick a new sock 
    new.sock = sample(1:length(drawer), 1)
    
    #add the sock 
    socks = c(socks, drawer[new.sock])
    
    #take it out of the drawer
    drawer = drawer[-new.sock]
    
    #see if we got a pair; if we did, break
    if(length(socks) > length(unique(socks))){
      break
    }
  }
  
  #see how many socks we picked before getting the pair
  X[i] = length(socks) - 1
}


#show that the PMFs line up
#calculate the analytical PMF
k = as.numeric(rownames(table(X)))
PMF = sapply(k, function(x)
      (double.factorial(20)/double.factorial(20 - 2*x))/
      (factorial(20)/factorial(20 - x)))
PMF = PMF*k/(20 - k)


#plots should line up
#empirical
plot(k, table(X)/sims, col = "black", main = "Empirical and Analytical PMF", type = "h",
     xlab = "x", ylab = "P(X = x)", xlim = c(min(k), max(k)), ylim = c(0, 1), lwd = 3)

#analytical
lines(k, PMF, main = "Analytical PMF", ylab = "P(X = x)", xlab = "x", col = "red", pch = 20, ylim = c(0, 1), type = "p", lwd = 3)


legend("topright", legend = c("Empirical PMF", "Analytical PMF"),
       lty=c(1,20), lwd=c(2.5,2.5),
       col=c("black", "red"))




2.2

You flip a fair, two-sided coin 5 times. Let \(X\) be the length of the longest streak in the 5 flips (i.e., if you flip \(TTTTH\), the longest streak is the \(TTTT\), so \(X = 4\)). Given that you flip 3 heads, find the PMF of \(X\).



Analytical Solution: We can see that the 2 Tails will never make up the uniquely longest streak; we know we flip 2 Tails, so the longest possible streak of Tails is 2. However, if the 2 Tails are together, then there are 2 or 3 Heads that are also in a streak. So, we only have to consider the Heads.

We can see that \(X = 1,2,3\). The only way for \(X = 1\) is if we get the sequence \(HTHTH\). There are a total of \(5!/(3!2!)\) possible sequences (using the same result as counting the number of arrangements of the word PEPPER; there are 5 letters, but 3 identical H's and 2 identical T's), so the probability of this one sequence is \(\frac{1}{5!/(3!2!)} = \frac{3!2!}{5!} = .1\).

Now consider \(X = 3\). We know we must get the pattern \(HHH\) in the 5 flips, and there are 3 ways that this can happen: the first \(H\) is in position 1, position 2, or position 3. So, the probability of this occurring is \(\frac{3}{5!/(3!2!)} = .3\).

Since the PMF must sum to 1 and the support of \(X\) is \((1, 2, 3)\), we know that \(P(X = 2) = 1 - P(X = 1) - P(X = 3) = .6\). So, we get the PMF \(P(X = 1) = .1\), \(P(X = 2) = .6\), \(P(X = 3) = .3\).
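
As a quick check, we can also enumerate all \(5!/(3!2!) = 10\) equally likely arrangements directly, using the permutations function from the gtools package (used earlier in this manual); this is a small sketch rather than part of the simulation below.

#load gtools for the 'permutations' function
library(gtools)

#generate all distinct arrangements of 3 H's and 2 T's
perms = unique(permutations(n = 5, r = 5, v = c("H", "H", "H", "T", "T"), set = FALSE))

#find the longest run of H's in each arrangement
longest = apply(perms, 1, function(x){
  runs = rle(x)
  max(runs$lengths[runs$values == "H"])
})

#should get .1, .6, .3
table(longest)/nrow(perms)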

#replicate
set.seed(110)
sims = 1000

#define the flips
flips = c("H", "H", "H", "T", "T")

#count the longest streak
streak = rep(NA, sims)

#run the loop
for(i in 1:sims){
  
  #draw a sequence from the flips
  sequence = sample(flips)
  
  #first, see if we got the three H's
  #the function 'grepl' sees if the first argument is contained in the second
  #   we have to collapse our vectors using the 'paste' function so
  #   it is palatable for the 'grepl' function
  #if we did get 3 H's, skip to the next iteration
  if(grepl(c("HHH"), paste(sequence, collapse = ""))){
    streak[i] = 3
    next
  }
  
  #see if we got two H's; if we did, skip to next iteration
  if(grepl(c("HH"), paste(sequence, collapse = ""))){
    streak[i] = 2
    next
  }
  
  #if we made it this far, we must have just a streak of 1
  streak[i] = 1
}


#should get (.1, .6, .3)
table(streak)/sims
## streak
##     1     2     3 
## 0.090 0.593 0.317




2.3

CJ is trick-or-treating on a street with 10 houses. He selects houses at random to visit; however, if he visits any one house a second time, he is turned away. If CJ selects 5 houses randomly (of course, he may select the same one multiple times), what is the probability that he never gets turned away?



Analytical Solution: This is isomorphic to the birthday problem. Think of the houses as the days of the year, and CJ’s random visits as the births of people. By the naive definition of probability, the probability of no match (never getting turned away) is:

\[\frac{10\cdot 9 \cdot 8 \cdot 7 \cdot 6}{10^5} = .3024\]
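
As a quick check, this product can be evaluated directly in R:

#should get .3024
prod(10:6)/10^5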


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#indicator if he doesn't get turned away
success = rep(0, sims)

#run the loop
for(i in 1:sims){
  
  #sample the houses he visits
  houses = sample(1:10, 5, replace = TRUE)
  
  #see if he visited unique houses
  if(length(unique(houses)) == 5){
    success[i] = 1
  }
}

#should get .3024
mean(success)
## [1] 0.3
#should also match the result from the birthday problem
1 - pbirthday(n = 5, classes = 10)
## [1] 0.3024




2.4

Your friend has two six-sided dice in his pocket. One is a fair die, and thus has an equal probability of landing on each number. The other is weighted, and has the following probability distribution: \(\frac{1}{6}\) probability of rolling a 1, 2 or 3, \(\frac{1}{8}\) probability of rolling a 4 or 5, and a \(\frac{1}{4}\) probability of rolling a 6.

He takes a die blindly and randomly from his pocket and rolls it four times: the outcomes are 6, 1, 2, 3.

Given these results, what is the probability that he is rolling the fair die?



Analytical Solution: Let \(R\) be the event that our results occurred (we rolled a 6, 1, 2 and 3), \(F\) be the event that your friend pulled the fair die out of his pocket, and \(W\) be the event that your friend pulled the weighted die out of the pocket. We need to find \(P(F|R)\), or the probability that we are using the fair die given the results we saw. Using Bayes' rule, we can rewrite this:

\[P(F|R) = \frac{P(R|F)P(F)}{P(R|F)P(F) + P(R|F^c)P(F^c)}\]

We now need to find \(P(R|F), P(F), P(F^c)\), and \(P(R|F^c)\). We know that \(P(F)\) and \(P(F^c)\) are simply .5 (there is a .5 probability of selecting each die from the pocket).

First consider \(P(R|F)\), the probability of these results given we are using the fair die. Since the rolls are independent and each face is equally likely, each number has a \(\frac{1}{6}\) chance of coming up. This result, then, has a probability of \((\frac{1}{6})^4\).

To find \(P(R|F^c)\), the probability of these results given we are using the weighted die, we do a similar calculation, but just multiply by the probabilities from the PMF of the weighted die. This comes out to \((\frac{1}{4})(\frac{1}{6})^3\).

Returning to our Bayes' rule formula:

\[P(F|R) = \frac{(\frac{1}{6})^4(.5)}{(\frac{1}{6})^4(.5) + (\frac{1}{4})(\frac{1}{6})^3(.5)} = .4\]

So, given the results of the die roll, there is a .4 probability that we are using the fair die and thus a .6 probability we are using the weighted die. It makes good sense that the weighted die is slightly more probable, because we rolled a 6, the highest weighted value for the weighted die, and did not roll a 4 or 5, the lowest weighted values for the weighted die. Still, there are only four rolls, which shows why the probability is only slightly above one half (the same result with more rolls would just mean stronger evidence for the weighted die).
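
As a quick check, we can plug these pieces into Bayes' rule directly in R (a small sketch; the variable names here are just for illustration):

#P(R|F), P(R|W), and the equal priors P(F) = P(W) = 1/2
p.R.given.F = (1/6)^4
p.R.given.W = (1/4)*(1/6)^3
prior = 1/2

#P(F|R) by Bayes' rule; should get .4
(p.R.given.F*prior)/(p.R.given.F*prior + p.R.given.W*prior)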


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#define the fair and weighted die PMFs
fair.PMF = rep(1/6, 6)
weighted.PMF = c(1/6, 1/6, 1/6, 1/8, 1/8, 1/4)

#indicator if we select the fair die
fair = rep(0, sims)

#indicator if our results match the prompt
match = rep(0, sims)


#run the loop
for(i in 1:sims){
  
  #flip to see which die we select
  flip = runif(1)
  
  #select the fair die and roll it
  if(flip <= 1/2){
    rolls = sample(1:6, 4, replace = TRUE, prob = fair.PMF)
    
    #fill in that we selected the fair die
    fair[i] = 1
  }
  
  #select the weighted die and roll it
  if(flip > 1/2){
    rolls = sample(1:6, 4, replace = TRUE, prob = weighted.PMF)
  }
  
  #see if the results match
  if(isTRUE(all.equal(sort(rolls), c(1, 2, 3, 6)))){
    match[i] = 1
  }
}

#see how many times we selected the fair die, when we got the 1,2,3,6
#should get .4
mean(fair[match == 1])
## [1] 0.4




2.5

There is a disease such that \(P(D)\), the probability of contracting the disease, is \(.1\) for any random person. There are two symptoms, \(S_1\) and \(S_2\), that always occur if someone has the disease. Overall (in general), each symptom has .2 probability of occurring in a random person. Aside from the relationship to \(D\), symptoms 1 and 2 are unrelated to each other.

  1. Find the probability that you have the disease given that you experience the first symptom.

  2. Given that you don’t have the disease, what’s the probability that you still experience the first symptom?

  3. Given that you experience Symptom 2, what is the probability that you also experience Symptom 1?

  4. Discuss the dependence between \(S_1\) and \(S_2\).

  5. If we observe both symptoms, what is the probability that we have the disease?



Analytical Solution: (a) We are interested in \(P(D|S_1)\). We are given that \(P(S_1|D) = 1\) (you are guaranteed to have symptom 1 if you have the disease) and that \(P(D) = .1\). We are also given that, unconditionally, \(P(S_1) = .2\). Using the formula for conditional probability:

\[P(D|S_1) = \frac{P(S_1|D)P(D)}{P(S_1)}\]

\[=\frac{.1}{.2} = .5\]

So, if you have the first symptom, there is a .5 probability you have the disease. It makes sense that the disease probability increased from the unconditional value of .1, since this symptom is associated with the disease (even though it does not guarantee it).

  1. We are interested in \(P(S_1|D^c)\). We can use LOTP for \(P(S_1)\) and work backwards. Writing out LOTP, conditioning on \(D\), gives us

\[P(S_1) = P(S_1|D)P(D) + P(S_1|D^c)P(D^c)\]

Plugging in what’s known:

\[.2 = .1 + .9 \, P(S_1|D^c)\]

Solving yields:

\[P(S_1|D^c) = \frac{1}{9}\]

So, if you do not have a disease, you have a \(1/9\) probability of getting symptom 1. It makes sense that this probability decreased from .2 in general, since we are conditioning on not having the disease (which would guarantee that we have this symptom).

  1. We are interested in \(P(S_1|S_2)\). This is an example of where ‘wishful thinking’ is useful; we wish we knew if the person has the disease, so we just condition on it! This extra conditioning becomes:

\[P(S_1|S_2) = P(S_1|S_2, D) P(D|S_2) + P(S_1|S_2, D^c) P(D^c|S_2)\]

We can solve for all of these parts on the RHS. For example, we know \(P(S_1|S_2, D)\); conditioning on \(D\) tells us that we must have symptom 1, and knowing that we have \(S_2\) is irrelevant information, so this is 1. We also know \(P(D|S_2) = .5\) since we know \(P(D|S_1) = .5\) and the two symptoms are symmetric in this problem.

Next is \(P(S_1| S_2, D^c)\). From the prompt, we know that the only connection between \(S_1\) and \(S_2\) is \(D\), so conditioning on \(D^c\) tells us all we need to know about \(S_1\). We found this probability earlier as \(\frac{1}{9}\).

Finally, we have \(P(D^c|S_2)\). We know \(P(D|S_2) = .5\) and we know that \(P(D|S_2) + P(D^c|S_2) = 1\) (they must sum to 1, since given \(S_2\), the probability of having the disease or not must still sum to 1) so we know that \(P(D^c|S_2) = .5\). Finally, then, we plug in:

\[P(S_1|S_2) = \frac{1}{2} + \frac{1}{2} (\frac{1}{9}) = \frac{5}{9}\]

So, we get a higher probability than simply \(P(S_1)\), which makes sense, because if we observe \(S_2\) that makes \(D\) more likely, which in turn would guarantee \(S_1\).

  1. We are essentially given in the prompt that \(S_1\) and \(S_2\) are conditionally independent given \(D\), since if we know we have the disease, we have all of the information we need about the symptoms (the symptoms are otherwise unrelated).

However, the two symptoms are not independent marginally; we just showed that \(P(S_1) \neq P(S_1|S_2)\). This makes sense, since if we observe \(S_2\) but not \(D\), it gives us information that \(D\) is more likely, and thus \(S_1\) is more likely.

  1. We are looking for \(P(D | S_1 \cap S_2)\). Using Bayes’ rule (\(S_1 \cap S_2\) is just a set of its own) we get:

\[P(D | S_1 \cap S_2) = \frac{P(S_1 \cap S_2 | D)P(D)}{P(S_1 \cap S_2)}\]

We know \(P(S_1 \cap S_2 | D) = 1\), since given that we have the disease, we know we have both symptoms. We know that \(P(D) = .1\). Finally, we have to find \(P(S_1 \cap S_2)\). We can expand this:

\[P(S_1 \cap S_2) = P(S_2)P(S_1|S_2)\]

We know \(P(S_2) = .2\), and from a previous part, we know \(P(S_1|S_2) = \frac{5}{9}\). So, we plug in and get:

\[\frac{.1}{\frac{1}{5}(\frac{5}{9})} = .9\]

This makes sense, since it is larger than both the marginal probability \(P(D)\) and the probability of \(D\) conditional on just one symptom (the more symptoms we observe, the greater the probability that we have the disease).
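
As a quick check, the conditional probabilities from parts (a), (b), (c) and (e) can be computed directly from the given quantities in R (a small sketch; variable names are just for illustration):

#given quantities
p.D = .1
p.S1 = .2
p.S2 = .2

#(a) P(D|S1) = P(S1|D)P(D)/P(S1); should get .5
p.D.given.S1 = 1*p.D/p.S1
p.D.given.S1

#(b) P(S1|D^c) from LOTP; should get 1/9
p.S1.given.Dc = (p.S1 - 1*p.D)/(1 - p.D)
p.S1.given.Dc

#(c) P(S1|S2), conditioning on D (and using P(D|S2) = P(D|S1) by symmetry); should get 5/9
p.S1.given.S2 = 1*p.D.given.S1 + p.S1.given.Dc*(1 - p.D.given.S1)
p.S1.given.S2

#(e) P(D|S1, S2) = P(S1, S2|D)P(D)/(P(S2)P(S1|S2)); should get .9
1*p.D/(p.S2*p.S1.given.S2)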


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#set paths for D, S1 and S2 (indicators)
D = rep(0, sims)
S1 = rep(0, sims)
S2 = rep(0, sims)

#run the loop
for(i in 1:sims){
  
  #flip to see if we have the disease
  flip = runif(1)
  
  #if we got the disease, mark it
  if(flip <= .1){
    D[i] = 1
  }
  
  #if we have the disease, S1 and S2 are both 1
  if(D[i] == 1){
    S1[i] = 1
    S2[i] = 1
  }
  
  #if we don't have the disease, we need to flip for S1 and S2
  #without D, we must get these symptoms with probability 1/9,
  #   so that overall we see these symptoms 20% of the time, and
  #   .1*1 + .9/9 = .2 (LOTP)
  if(D[i] == 0){
    
    #flip for S1
    flip1 = runif(1)
    
    #see if we got S1
    if(flip1 < 1/9){
      S1[i] = 1
    }
    
    #flip for S2
    flip2 = runif(1)
    
    #see if we got S2
    if(flip2 < 1/9){
      S2[i] = 1
    }
  }
}


#P(D|S1), should get 1/2
mean(D[S1 == 1])
## [1] 0.5841121
#P(S1|D^c), should get 1/9
mean(S1[D == 0])
## [1] 0.1017143
#P(S1|S2), should get 5/9
mean(S1[S2 == 1])
## [1] 0.5877193
#in general, S1 = 1 affects the mean of S2
#  given D = 1 or D = 0, S1 = 1 does not affect the mean of S2
#  this does not rigorously prove conditional independence, but gives good intuition
mean(S2); mean(S2[S1 == 1]);
## [1] 0.228
## [1] 0.6261682
mean(S2[D == 1]); mean(S2[S1 == 1 & D == 1])
## [1] 1
## [1] 1
mean(S2[D == 0]); mean(S2[S1 == 1 & D == 0])
## [1] 0.1177143
## [1] 0.1011236
#P(D|S1, S2), should get .9
mean(D[S1 == 1 & S2 == 1])
## [1] 0.9328358




2.6

You are part of a diving competition. Each dive receives a score from 1 to 10 (integers only, so the possible scores are \(1,2,...,10\)), with 10 being the best. You are allowed to dive three times and take your best score; this is your overall competition score. Unfortunately, the judge is not at all qualified to be at this competition, and just assigns your scores randomly from 1 to 10. However, he won’t assign the same score twice, in case the audience catches on that he knows nothing.

Find the PMF and CDF (which you can leave as a sum) of \(C\), your overall competition score. You can leave the CDF as a summation. How could you find the expectation (no need to calculate here)?



Analytical Solution: We’re looking for the maximum score out of the three dives. The worst case, of course, would be scoring 1, 2 and 3, and thus having an overall score of 3 (remember, the judge can’t give the same score twice). So, the support of \(C\) is \(3,4,...,10\).

We’re looking for \(P(C=c)\), which is just the definition of a PMF. If \(C = c\), then \(c\) is the maximum score that we have. If \(c\) is the maximum score, this means that we scored less than \(c\) (somewhere from 1 to \(c-1\)) twice. There are \({c-1 \choose 2}\) ways to pick these scores less than c, and then, of course, we have one choice for our maximum score (since we know our maximum score is \(c\)). There’s \({10 \choose 3}\) overall ways to arrange the scores, so, by the naive definition of probability:

\[P(C = c) = \frac{{c-1 \choose 2}}{{10 \choose 3}}\]

Where \(c = 3,4,...,10\).

Given the PMF, we can find the CDF by a simple summation. Since this is the discrete case, we can just add up all the discrete cases underneath it to get the total cumulative probability:

\[P(C \leq c) = \sum_{k=3}^c \frac{{k-1 \choose 2}}{{10 \choose 3}}\]

And to find the expectation, we could multiply each possible score \(c\) by its probability \(P(C = c)\) and sum over all possible values, which is simply taking a weighted average of the support by the PMF. We discussed this formula for expectation and will formalize the concept later; it is a very powerful and generalizable tool. We get:

\[E(C) = \sum_{c=3}^{10} c\frac{{c-1 \choose 2}}{{10 \choose 3}}\]

Which, when calculated, gives 8.25.
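
As a quick check, this sum can be evaluated directly in R:

#should get 8.25
c = 3:10
sum(c*choose(c - 1, 2)/choose(10, 3))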


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#keep track of C
C = rep(NA, sims)

#run the loop
for(i in 1:sims){
  
  #simulate the scores (don't replace, the judge won't pick the same score twice)
  scores = sample(1:10, 3, replace = FALSE)
  
  #mark C, the max score
  C[i] = max(scores)
}

#calculate the analytical PMF
c = as.numeric(rownames(table(C)))
PMF.c = choose(c - 1, 2)/choose(10, 3)


#the PMFs should match
plot(table(C)/sims, ylim = c(0,1/2), main = "PMF of C", lwd = 3,
     xlab = "c", ylab = "P(C = c)")
lines(c, PMF.c, col = "red", lwd = 3, type = "p", pch = 20)
legend("topright", legend = c("Empirical PMF", "Analytical PMF"),
       lty=c(1,20), lwd=c(2.5,2.5),
       col=c("black", "red"))

#calculate the analytical CDF
c = as.numeric(rownames(table(C)))
CDF.c = sapply(c, function(x){
  k = 3:x
  return(sum(choose(k - 1, 2)/choose(10, 3)))
  })


#the CDFs should match
plot(ecdf(C), ylim = c(0, 1), main = "CDF of C", lwd = 3,
     xlab = "c", ylab = "P(C <= c)")
lines(c, CDF.c, col = "red", lwd = 3, type = "p", pch = 20)
legend("bottomright", legend = c("Empirical CDF", "Analytical CDF"),
       lty=c(1,20), lwd=c(2.5,2.5),
       col=c("black", "red"))

#should get 8.25 for the mean
mean(C)
## [1] 8.211




2.7

Many music fans claim that sound quality is far enhanced on vinyl (i.e., record players); however, people are often skeptical that vinyl sounds any different from more modern audio methods (i.e., digital speakers).

Freddie claims that he can reliably discern vinyl audio from digital audio. If he is right, then he will correctly identify the mode of audio, digital or vinyl, with probability .8. If he is wrong, as many would claim, then he has probability .5 of correctly identifying the mode of audio.

Freddie listens to 50 songs and tries to identify the mode of audio. Let \(V\) be the event that he can reliably discern vinyl from digital. Unconditionally, assume that \(P(V) = 1/2\). Let \(X\) be the number of songs he correctly identifies, and let \(P(V|X)\) be the updated probability that he can discern vinyl from digital after observing him identify \(X\) songs correctly. How large does \(X\) have to be for \(P(V|X) \geq .9\)?



Analytical Solution: By Bayes’ Rule:

\[P(V|X) = \frac{P(X|V)P(V)}{P(X|V)P(V) + P(X|V^c)P(V^c)}\] \[= \frac{P(X|V)}{P(X|V) + P(X|V^c)}\]

Since \(P(V) = P(V^c) = 1/2\), these terms cancel. Consider \(P(X|V)\). Conditioned on Freddie being able to discern vinyl from audio, \(X \sim Bin(50, .8)\), so \(P(X|V) = {50 \choose x} .8^x .2^{50 - x}\). Similarly, \(P(X|V^c) = {50 \choose x} .5^{50}\), since in the case that he cannot discern the audio modes, each trial is 50/50. We are left with:

\[\frac{.8^x .2^{50 - x}}{.8^x .2^{50 - x} + .5^{50}}\]

since the \({50 \choose x}\) terms cancel. We can run the following code in R, which finds the smallest value of \(x\) such that \(P(V|X) \geq .9\):

x = 0:50; v = .8^x*.2^(50 - x)/(.8^x*.2^(50 - x) + .5^50); min(x[v >= .9])

This gives us \(X = 35\). So, if Freddie gets 35 of the songs or more correct, the probability that he can reliably discern vinyl and digital is greater than .9.


Empirical Solution:

#replicate
set.seed(110)

#increased number of sims (rare events)
sims = 10000

#keep track of V and X
V = rep(NA, sims)
X = rep(NA, sims)

#run the loop
for(i in 1:sims){
  
  #generate V, which is marginally Bern(1/2)
  V[i] = rbinom(1, 1, 1/2)
  
  #generate X depending on V
  #case where V = 1
  if(V[i] == 1){
    
    #Freddie has an 80% chance each time
    X[i] = sum(rbinom(1, 50, .8))
  }
  
  #case where V = 0
  if(V[i] == 0){
    
    #Freddie has a 50% chance each time
    X[i] = sum(rbinom(1, 50, .5))
  }
}

#plot P(V|X) for different values of x
x = 0:50

#calculate the conditional probabilities
cond.prob = sapply(x, function(x) mean(V[X == x]))
plot(x, cond.prob, main = "P(V|X) for different x",
     xlab = "x", ylab = "P(V|X)", col = "black", pch = 16)
abline(h = .9)
abline(v = 35)




2.8

You roll a fair, six-sided die twice. Let \(X\) be the sum of the two rolls. Find \(P(X = 7)\) using a conditioning argument; that is, do not simply count the number of ways to roll a 7 and divide by the number of possible combinations for the two rolls.



Analytical Solution: Imagine conditioning on the first roll. No matter what roll we get, there is still a chance that we get a sum of 7 after rolling the second die (i.e., if we roll a 1 first, then we could roll a 6 second; if we roll a 6 first, we could roll a 1 second, etc.). Specifically, for all of the 6 possible values of the first roll, there is a 1/6 probability that we get the value on the second roll that gives us a total of 7. Therefore, \(P(X = 7) = 1/6\). This argument, as well as being cleaner than a counting argument, also gives us insight as to why 7 is the most likely sum: for any other possible total, there is at least one roll of the first die that eliminates the possibility of that sum (i.e., if we are interested in a sum of 6 and we roll a 6 on the first die, we cannot possibly get a sum of 6).
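
As a quick enumeration check, we can list all 36 equally likely outcomes of the two rolls in R:

#all 36 equally likely sums for the two rolls
sums = outer(1:6, 1:6, "+")

#should get 1/6
mean(sums == 7)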


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#generate the rolls
roll1 = sample(1:6, sims, replace = TRUE)
roll2 = sample(1:6, sims, replace = TRUE)

#add the rolls
X = roll1 + roll2

#should get 1/6
length(X[X == 7])/sims
## [1] 0.173




2.9

Imagine generating a random word by sampling \(3 < n < 26\) letters, with replacement (from the 26 letters in the alphabet). What is the probability that this word has no repeats; i.e., \(n\) unique letters?




Analytical Solution: This is analogous to the birthday problem, as if we had 26 days and \(n\) birthdays being sampled. Using the same construction, we find:

\[\frac{26 \cdot 25 \cdot ... \cdot (26 - n + 1)}{26^n}\]


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#define a simple parameter
n = 5

#indicator if we get a unique word
no.repeat = rep(0, sims)

#run the loop
for(i in 1:sims){
  
  #generate the word. the 'letters' vector is stored in R
  word = sample(letters, n, replace = TRUE)
  
  #see if it is a no repeat word
  if(length(unique(word)) == n){
    no.repeat[i] = 1
  }
}

#these should match
prod(26:(26 - n + 1))/26^n; mean(no.repeat)
## [1] 0.6643675
## [1] 0.686
#we could also use the pbirthday command
1 - pbirthday(n, classes = 26)
## [1] 0.6643675




2.10

Ali and Bill are taking a test. For any single question, Ali has equal probabilities of answering correctly or incorrectly, and Bill also has equal probabilities of answering correctly or incorrectly. For any single question, the probability that both Ali and Bill get the question correct is .4. Given that Bill gets a question wrong, what is the probability that Ali gets it right?



Analytical Solution:

Let \(A\) and \(B\) be the events that Ali and Bill get the question right, respectively. We need \(P(A|B^c)\). By LOTP, we know:

\[P(A) = P(A|B)P(B) + P(A|B^c)P(B^c)\]

We are given that \(P(A) = P(B) = P(B^c) = 1/2\), since each person has a 50/50 chance marginally of getting the question correct. We are also given that \(P(A \cap B) = .4\) in the prompt, so we can find \(P(A|B)\) by using the formula for conditional probability:

\[P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{.4}{.5} = .8\]

Now, we know everything in \(P(A) = P(A|B)P(B) + P(A|B^c)P(B^c)\) except \(P(A|B^c)\), which is what we are solving for. Plugging in what’s known and solving yields:

\[P(A|B^c) = \frac{P(A) - P(A|B)P(B)}{P(B^c)} = \frac{1/2 - 8/10 \cdot 1/2}{1/2} = .2\]
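
As a quick check, the arithmetic can be verified in R (a small sketch; variable names are just for illustration):

#given quantities
p.A = 1/2
p.B = 1/2
p.AB = .4

#P(A|B)
p.A.given.B = p.AB/p.B

#P(A|B^c) from LOTP; should get .2
(p.A - p.A.given.B*p.B)/(1 - p.B)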


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#we will see if the solution results in the correct 
#   probabilities; that is, P(A, B) = .4 and P(B) = .5

#we can generate Bill
B = rbinom(sims, 1, 1/2)

#keep track of Ali
A = rep(0, sims)

#run the loop
for(i in 1:sims){
  
  #flip to see if Ali gets it
  flip = runif(1)
  
  #Bill got it
  if(B[i] == 1){
    
    #Ali gets it with probability .8
    if(flip <= .8){
      A[i] = 1
    }
  }
  
  #Bill didn't get it
  if(B[i] == 0){
    
    #Ali gets it with probability .2
    if(flip <= .2){
      A[i] = 1
    }
  }
}

#should get .5 and .4
mean(A); length(B[B == 1 & A == 1])/sims
## [1] 0.482
## [1] 0.392




2.11

Consider the birthday problem with the usual assumptions. Previously, we’ve considered a ‘match’ as a single day with multiple birthdays; here, imagine a week match, which consists of a week with multiple birthdays. Find the probability that, among \(n \leq 52\) people, there are no week matches and no day matches. For what value of \(n\) does this probability drop below 1/2?

You may have noticed that, by daycount conventions, the ‘52 weeks’ of the year do not divide the 365 days evenly. For this problem, assume that there are 364 days in the year, not 365, just for simplicity, so that the weeks perfectly divide up the year.



Analytical Solution:

Having ‘no week match’ is a superset of having ‘no day match’. Therefore, we just have to find the probability of ‘no week match’, which is simply the birthday problem but with 52 ‘boxes’ that people can fall in. We get:

\[P(no \; week \; match) = \frac{52 \cdot 51 \cdot ... (52 - n + 1)}{52^n}\]

Per the graph, when \(n = 9\), the probability drops below 1/2.
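
As a quick check, the threshold can be found directly with the pbirthday function (used elsewhere in this manual); this is a small sketch separate from the simulation below.

#probability of no week match for n = 1, ..., 52 people
n = 1:52
p.no.match = sapply(n, function(x) 1 - pbirthday(x, classes = 52))

#smallest n for which the probability drops below 1/2; should get 9
min(n[p.no.match < 1/2])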


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#define different values of n
n = 1:52
  
#keep track of probabilities
results = rep(NA, length(n))

#iterate over n
for(j in 1:52){
  
  #keep track of the number of day and week matches
  day = rep(0, sims)
  week = rep(0, sims)
  
  #run the loop
  for(i in 1:sims){
    
    #generate bdays
    bdays = sample(1:364, n[j], replace = TRUE)
    
    #see if we got a day match
    if(length(unique(bdays)) < n[j]){
      day[i] = 1
    }
    
    #convert to weeks
    bdays = ceiling(bdays/7)
    
    #see if we got a week match
    if(length(unique(bdays)) < n[j]){
      week[i] = 1
    }
  }
  
  #mark how often we got no day or week matches
  results[j] = length(week[week == 0 & day == 0])/sims
}


#compare empirical and analytical
#calculate probabilities
probs = sapply(n, function(x) factorial(52)/(factorial(52 - x)*52^x))

#plot
plot(n, probs, main = "Empirical vs. Analytical",
     xlab = "n", ylab = "P(no day or week matches)",
     col = "red", type = "p", pch = 16)
lines(n, results, col = "black", lwd = 3)
abline(h = 1/2)


legend("topright", legend = c("Empirical Result", "Analytical Result"),
       lty=c(1,1), lwd=c(2.5,2.5),
       col=c("black", "red"))




2.12

Consider the birthday problem with the usual assumptions. Define a ‘month match’ as a month with more than one birthday. Given that there is at least one ‘month match’, find the probability that there is at least one ‘day match’ (i.e., a day where multiple people are born) among \(n \leq 12\) people. Compare this probability to the ‘unconditional’ probability of at least one day match in the standard birthday problem.

For this problem, assume 360 days in a year, and that each of the 12 months has 30 days, just so we don’t have to worry about the fact that months have irregular amounts of days.



Analytical Solution:

Let \(A\) be the event that there is at least one day match, and \(B\) be the event that there is at least one month match. We are interested in \(P(A|B)\). Using Bayes' rule, we write:

\[P(A|B) = \frac{P(B|A) P(A)}{P(B)}\]

Consider \(P(B|A)\). If we have a day match (i.e., \(A\) occurs), then we must have a month match (i.e., \(B\) occurs, since two people born on the same day are also born in the same month). We are left with:

\[=\frac{P(A)}{P(B)}\]

Where the numerator is the usual birthday problem (with 360 days because of the simplification we made above) and the denominator is the birthday problem with 12 ‘boxes’ (you can imagine that the months are days).

\[= \frac{1 - \frac{360 \cdot 359 \cdot ... (360 - n + 1)}{360^n}}{1 - \frac{12 \cdot 11 \cdot ... (12 - n + 1)}{12^n}}\]

We can plot these values, and compare it to the unconditional probability of at least one match in the Birthday Problem with \(n\) people and 360 days (plotted in red). Note that the conditional probability (in black) is greater than the unconditional probability for small \(n\), but the two get closer as \(n\) gets towards 12. This is intuitive: if \(n\) is small, say \(n = 2\), it is unlikely that we get a month match. If we condition on observing a month match, then, birthdays are closer than we expected (since month matches for such a small value of \(n\) are rare events), and thus the probability of a day match is slightly higher than in the unconditional case. As \(n\) gets closer to 12, the probability of a month match gets higher (i.e., with 11 people, there’s a very good chance of a month match) so month matches are no longer rare events and thus conditioning on them does not give us much valuable information about birthdays being close together.


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#define different values of n
n = 1:12
  
#keep track of probabilities
results = rep(NA, length(n))

#iterate over n
for(j in 1:12){
  
  #keep track of the number of day and month matches
  day = rep(0, sims)
  month = rep(0, sims)
  
  #run the loop
  for(i in 1:sims){
    
    #generate bdays
    bdays = sample(1:360, n[j], replace = TRUE)
    
    #see if we got a day match
    if(length(unique(bdays)) < n[j]){
      day[i] = 1
    }
    
    #convert to months
    bdays = ceiling(bdays/30)
    
    #see if we got a month match
    if(length(unique(bdays)) < n[j]){
      month[i] = 1
    }
  }
  
  #mark how often we got a day match, conditioned on a month match
  results[j] = mean(day[month == 1])
}


#compare empirical and analytical
#calculate probabilities
probs = sapply(n, function(x) 
  pbirthday(x, classes = 360)/(pbirthday(x, classes = 12)))

#plot
plot(n, probs, main = "Empirical vs. Analytical",
     xlab = "n", ylab = "P(at least one day match | at least one month match)",
     col = "red", type = "p", pch = 16,
     ylim = c(0, .2))
lines(n, results, col = "black", lwd = 3)


legend("topleft", legend = c("Empirical Result", "Analytical Result"),
       lty=c(1,1), lwd=c(2.5,2.5),
       col=c("black", "red"))




2.13

Cameron is wandering around on the alphabet (A, B, etc.). He goes ‘up’ a letter (i.e., from D to C) and ‘down’ a letter (i.e., D to E) with equal probabilities. He cannot go from A to Z, nor Z to A (i.e., the alphabet isn’t circular).

If he starts at M, what is the probability that Cameron spells “HI” before he spells “NO”? Here, we equate ‘spelling’ a word to wandering around on its letters in the correct order; i.e., if Cameron wanders on PQPQR as a part of his path, then he spelled PQPQR (among other words).



Analytical Solution:

Imagine labeling the letters H to O as \(0, 1, ..., 7\). We can then realize that this is a Gambler’s Ruin problem with \(p = 1/2\), \(N = 7\) and \(i = 5\). That is, if Cameron gets to O (which we label here as 7), then he has spelled “NO” (he must pass through N to reach O). If he gets to H (which we label here as 0), then he will spell “HI” before he spells “NO”: since the alphabet is not circular, the only way back up toward N is to eventually step from H to I, which spells “HI” first. So, the probability of spelling “HI” first is \(1 - 5/7 = 2/7\).


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#indicator if we spell "hi" first
hi = rep(0, sims)

#run the loop
for(i in 1:sims){
  
  #initialize the letter string
  #the 'letters' vector is stored in R
  string = letters[13]
  
  #go until we spell "hi" or "no", then break
  while(TRUE){
    
    #usual case
    if(string[length(string)] != "a" && string[length(string)] != "z"){
      
      #flip to see if we move up or down
      flip = runif(1)
      
      #go up
      if(flip <= 1/2){
        string = c(string, letters[which(letters == string[length(string)]) + 1])
      }
      
      #go down
      if(flip > 1/2){
        string = c(string, letters[which(letters == string[length(string)]) - 1])
      }
    }
      
    
    #corner cases
    if(string[length(string)] == "a"){
      #set to b
      string = c(string, "b")
    }
    if(string[length(string)] == "z"){
      #set to y
      string = c(string, "y")
    }
    
    #see if we spelled hi; mark and break if so
    if(all(c(string[length(string) - 1], string[length(string)]) == c("h", "i"))){
      hi[i] = 1
      break
    }
    
    #see if we spelled no; break if so
    if(all(c(string[length(string) - 1], string[length(string)]) == c("n", "o"))){
      break
    }
  }
}

#should get 2/7 = .29
mean(hi)
## [1] 0.312




2.14

(With help from Matt Goldberg, CJ Christian, Nicholas Larus-Stone, Juan Perdomo and Dan Fulop)

Imagine the standard Monty Hall problem, but Monty does not actually know what is behind each door; he picks one of the two remaining doors at random.

You pick Door 1 (for this problem, assume that you always pick Door 1), and Monty opens Door 2 to reveal a goat. Should you switch to Door 3?

See @rosenthal for variants and intuition on this type of problem.



Analytical Solution:

Let \(C\) be the event that the car is behind Door 1, and let \(G\) be the event that Monty opens Door 2 and reveals a Goat. We are interested in \(P(C|G)\). By Bayes’ Rule:

\[P(C|G) = \frac{P(G|C)P(C)}{P(G)}\]

Consider \(P(G|C)\), the probability of Monty opening Door 2 and revealing a goat given that the car is in Door 1. Well, given that the car is in Door 1, there must be a goat in Door 2, and the probability that Monty opens Door 2 and reveals this goat is 1/2 (recall that we are assuming throughout this problem that you select Door 1, and that Monty randomly selects one of the other two doors to open). So, \(P(G|C) = 1/2\). We then know that \(P(C) = 1/3\) by symmetry (the car has equal probability of being behind any door). Finally, consider \(P(G)\). For \(G\) to occur, we need Monty to select Door 2 and we need Door 2 to have a goat behind it; these events are independent, since Monty does not know what is behind the doors. The probability of Monty selecting Door 2 is 1/2 (again, we select Door 1, so he selects Door 2 and Door 3 with equal probabilities) and the probability that there is a goat behind Door 2 is 2/3; we multiply these probabilities (since the events are independent) to get 1/3. Putting it all together:

\[= \frac{1/2 \cdot 1/3}{1/3} = 1/2\]

So, there is a 1/2 probability that the car is behind Door 1 given that Monty opened Door 2 and revealed a goat, which means there is also a 1/2 probability that the car is behind Door 3 (the car must be behind Door 1 or Door 3 at this point, since we know that there is a goat behind Door 2). You should be indifferent between switching and staying (they have equal probabilities).
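
As a quick check, we can enumerate the six equally likely combinations of the car's location and Monty's random choice of door in R (a small sketch):

#enumerate the car's location (1, 2 or 3) and Monty's pick (door 2 or 3)
#each of the 6 combinations has probability (1/3)*(1/2) = 1/6
outcomes = expand.grid(car = 1:3, monty = 2:3)
outcomes$prob = (1/3)*(1/2)

#G: Monty opens Door 2 and reveals a goat
G = outcomes$monty == 2 & outcomes$car != 2

#P(car is behind Door 1 | G); should get 1/2
sum(outcomes$prob[G & outcomes$car == 1])/sum(outcomes$prob[G])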


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#indicators for the car being behind door 1 and
#   monty opening door 2 to reveal a goat
door1 = rep(0, sims)
door2 = rep(0, sims)

#run the loop
for(i in 1:sims){
  
  
  #randomize the doors; 1 means car, 0 means goat
  doors = sample(c(1,0,0))
  
  #mark if door 1 has the car
  if(doors[1] == 1){
    door1[i] = 1
  }
  
  #have monty open door 2 or door 3
  monty = sample(c(2,3), 1)
  
  #see if monty picked door 2 and revealed a goat
  if(monty == 2 && doors[monty] == 0){
    door2[i] = 1
  }
}

#should get 1/2
mean(door1[door2 == 1])
## [1] 0.5109034




2.15

Brandon is a cell. He splits into 2 with probability \(1/2\) and dies with probability \(1/2\). His offspring do the same, independently (each splits into 2 or dies with equal probabilities). Let \(E\) be the event that Brandon’s population goes extinct. Find \(P(E)\).

Hint: condition on the first step.



Analytical Solution:

We consider \(P(E)\) and condition on Brandon doubling or dying. Let \(D\) be the event that Brandon doubles, so \(D^c\) is the event that Brandon dies.

\[P(E) = P(E|D)P(D) + P(E|D^c)P(D^c)\]

We know that if Brandon dies (i.e., \(D^c\) occurs) then extinction has occurred, so \(P(E|D^c) = 1\). Plugging this in, as well as \(P(D) = P(D^c) = 1/2\):

\[P(E) = \frac{1}{2}P(E|D) + \frac{1}{2}\]

Consider \(P(E|D)\), or the probability of extinction given that Brandon doubled. If Brandon doubles, then we essentially have two new Brandons, and the probability that the entire population goes extinct is the probability that both of their independent branches go extinct. By symmetry, both of their branches have the same probability \(P(E)\) as Brandon of going extinct, and since the branches are independent, we can multiply:

\[P(E) = \frac{P(E)^2}{2} + \frac{1}{2}\]

Rearranging:

\[ \frac{P(E)^2}{2} - P(E) + \frac{1}{2} = 0\]

Solving for \(P(E)\) yields \(P(E) = 1\) (the quadratic has the double root 1). So, this population will definitely go extinct. This may seem a little counterintuitive; we’ve seen the power of ‘doubling’ values, and the expected number of offspring at each split is 1 (each cell has either 2 offspring or 0 offspring, with equal probabilities), yet extinction is certain.
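
As a quick check, we can solve the quadratic numerically in R:

#coefficients of 1/2 - x + x^2/2, in increasing order
#both roots should be 1 (returned as complex numbers)
polyroot(c(1/2, -1, 1/2))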


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#indicator if the population goes extinct
E = rep(0, sims)

#run the loop
for(i in 1:sims){
  
  #initialize the population (1 person at the start of the population)
  population = 1
  
  #generate up to 50 generations; hopefully the population dies by then
  for(j in 1:50){
  
    #generate the next offspring
    population = 2*sum(rbinom(population, 1, 1/2))
  }
  
  #see if the population went extinct
  if(population == 0){
    E[i] = 1
  }
}

#should always go extinct; sometimes, it takes a long time,
#   so we observe some populations that survive for a long time here
mean(E)
## [1] 0.968




2.16

The little hand on a standard clock moves clockwise one unit (i.e., from 5 to 6) or counter-clockwise 1 unit (i.e., 1 to 12) with equal probabilities.

Find the probability that, from its starting spot, the little hand makes it a full day forward (24 hours, clockwise) before it makes it a half day backward (12 hours, counterclockwise). It does not matter how long the little hand takes to get to these endpoints; we only care about the location of the little hand relative to its starting spot.



Analytical Solution:

We can imagine ‘breaking’ the circle of the clock and straightening it out so that we lay it out on the number line. Imagine starting at 12; from there, the little hand moves left or right one integer with equal probabilities, until it hits 0 (12 hours back) or 36 (24 hours ahead). This is a Gambler’s ruin problem with \(i = 12\) and \(N = 36\), and we want the probability of winning, which is \(12/36 = 1/3\). It makes sense that this is less than \(1/2\), since we have farther to go to win than to lose.


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#indicator if we go 24 hours forward first
success = rep(0, sims)


#run the loop
for(i in 1:sims){
  
  #initialize the starting spot
  X = 12
  
  #go until we hit 0 or 36
  while(X > 0 && X < 36){
    
    #flip to see if we go up or down
    flip = runif(1)
    
    #go up
    if(flip <= 1/2){
      X = X + 1
    }
    
    #go down
    if(flip > 1/2){
      X = X - 1
    }
  }
  
  #see if we had success
  if(X == 36){
    success[i] = 1
  }
}

#should get 1/3
mean(success)
## [1] 0.305




2.17

The ‘Prisoners with Three Hats’ riddle is a common interview question. The problem statement is as follows:


There are three prisoners in a room. Each will be independently given a red hat or a green hat to wear on their head (each has a 50/50 chance of a red or a green hat and, again, the colors that they are assigned are independent). The prisoners can see each other’s hats, but no prisoner can see his own hat. Each prisoner is given a chance to guess the color of his own hat (which he cannot see); they can either guess a color (red or green) or pass. If at least one prisoner correctly guesses the color of his own hat and no prisoners incorrectly guess the color of their own hat, they are free to go (‘passing’ cannot count as either a correct or incorrect guess; it is merely a pass). The prisoners are not allowed to communicate with each other in any way once in the room, and they must cast their guesses simultaneously (i.e., one prisoner cannot adapt his strategy based on another prisoner guessing). The prisoners are allowed a strategy session before where they can discuss the best approach.


Upon first hearing this riddle, it seems like the best chance the prisoners have of escaping is assigning one person to randomly guess the color of their own hat, and the other two to simply pass. This results in a 50/50 chance of success (either the person guessing gets his color, or not). However, there is a superior strategy: if a prisoner sees two hats of the same color (i.e., he sees that the other two prisoners both have red hats) he guesses the other color for his own hat. If he sees that the other two prisoners have different color hats, he passes.

  1. Find the probability of winning with the ‘superior strategy’.

  2. You should have arrived at a probability greater than .5 in part (a). Your friend Nick hears about this strategy and says “well, then, if I see that the other two prisoners both have green hats, then there is a greater than .5 probability that my hat is red”. Is Nick correct? Explain.



Analytical Solution:

  1. This strategy will win when there are 2 hats of one color and 1 hat of the other color (the prisoners with the majority color hat will pass, and the prisoner with the minority color hat will correctly guess). This strategy will fail when all three hats are the same color (each prisoner will incorrectly guess the color of his own hat). There are \(2^3 = 8\) possible outcomes (each prisoner has 2 possibilities for hat color) and only 2 outcomes where all hats are the same color (all green or all red). Therefore, in 6 of the outcomes, the prisoners win, so the probability of success with the superior strategy is \(6/8 = .75\).

  2. Nick is incorrect: recall from the problem statement that the hat colors are independent, meaning that conditioning on the colors of the other prisoners’ hats gives no information about the color of your own hat. Clearly, though, this seems at odds with part (a), where we beat the 50/50 guess.

The key in part (a) is that we do not consider any one specific prisoner; we simply refer to ‘the prisoner that sees two matching color hats’. This is a very subtle distinction, but this ‘prisoner that sees two matching color hats’ could be any of the three prisoners, not one specific prisoner. If we specifically consider one prisoner (like Nick), he will always have a 50/50 chance of getting his color right. To prove this, imagine Nick conditioning on the fact that he sees two red hats. Conditioned on this outcome, there are now two equally likely possibilities for the set of all 3 hats: \(RRR\) or \(RRG\), writing Nick’s hat last (where \(R\) is red and \(G\) is green). That is, Nick has either red or green on, and these are equally likely, 50/50 outcomes.

The crux of part (a) is that we took advantage of the fact that it is more common to have 2 matching hats and 1 hat of a different color instead of all 3 matching hats, and implemented a strategy that always worked in the first case. Again, this strategy isn’t constrained to any specific prisoner; it simply tells us to pick the prisoner in this situation that has the different colored hat (sees two matching hats).

Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#keep track of win probabilities for the superior strategy
#   and Nick's strategy (guessing other color when
#   he sees two matching hats)
win.strat = rep(0, sims)
win.nick = rep(NA, sims)


#run the loop
for(i in 1:sims){
  
  #generate the hats; 1 for red, 0 for green
  #   let nick have the first hat
  hats = sample(c(1, 0), 3, replace = TRUE)
  
  #superior strategy wins if not all the hats are the same
  if(sum(hats) == 1 || sum(hats) == 2){
    win.strat[i] = 1
  }
  
  #check if the second two hats match, and if Nick is correct
  #Nick sees two green hats
  if(sum(hats[2:3]) == 0){
    
    #Nick guesses red
    if(hats[1] == 1){
      win.nick[i] = 1
    }
    if(hats[1] == 0){
      win.nick[i] = 0
    }
  }
  
  #Nick sees two red hats
  if(sum(hats[2:3]) == 2){
    
    #Nick guesses green
    if(hats[1] == 0){
      win.nick[i] = 1
    }
    if(hats[1] == 1){
      win.nick[i] = 0
    }
  }
}

#should get 1/2
#   take out the NA cases for Nick 
#   (when he saw mismatching hats and didn't guess)
mean(win.nick, na.rm = TRUE)
## [1] 0.4782609
#should get 3/4
mean(win.strat)
## [1] 0.748




BH Problems



The problems in this section are taken from @BH. The questions are reproduced here, and the analytical solutions are freely available online. Here, we will only consider empirical solutions: answers/approximations to these problems using simulations in R.




BH 2.1

A spam filter is designed by looking at commonly occurring phrases in spam. Suppose that 80\(\%\) of email is spam. In 10\(\%\) of the spam emails, the phrase “free money” is used, whereas this phrase is only used in 1\(\%\) of non-spam emails. A new email has just arrived, which does mention “free money”. What is the probability that it is spam?

#replicate
set.seed(110)
sims = 1000

#set up the email matrix
emails = matrix(0, nrow = sims, ncol = 2)
emails = data.frame(emails)

#indicators if the email is spam/says freemoney
colnames(emails) = c("spam", "freemoney")


#run the loop
for(i in 1:sims){
  
  #decide if spam or not, with probability .8
  if(runif(1) < .8){
    emails$spam[i] = 1
  }
  
  #decide if free money or not, with probability dependent on if it's spam
  #first, the case where it is spam
  if(emails$spam[i] == 1){
  
    if(runif(1) < .1){
      emails$freemoney[i] = 1
    }
  }
  
  #case where the email is not spam
  if(emails$spam[i] == 0){
    
    if(runif(1) < .01){
      emails$freemoney[i] = 1
    }
  }
}

#should get .975
mean(emails$spam[emails$freemoney == 1])
## [1] 0.9795918




BH 2.2

A woman is pregnant with twin boys. Twins may be either identical or fraternal (nonidentical). In general, 1/3 of twins born are identical. Obviously, identical twins must be of the same sex; fraternal twins may or may not be. Assume that identical twins are equally likely to be both boys or both girls, while for fraternal twins all possibilities are equally likely. Given the above information, what is the probability that the woman’s twins are identical?

#replicate
set.seed(110)
sims = 1000

#set up the twin matrix
twins = matrix(0, nrow = sims, ncol = 3)
twins = data.frame(twins)

#indicators for twins being identical, and then if the first and second are boys
colnames(twins) = c("identical", "boy1", "boy2")

#run the loop
for(i in 1:sims){
  
  #see if the twins are identical or not, with probability 1/3
  if(runif(1) < 1/3){
    twins$identical[i] = 1
  }

  #generate the genders, depending on if we have identical twins or not
  #first, the case with identical twins
  if(twins$identical[i] == 1){
    
    #pick evenly between genders
    if(runif(1) < .5){
      twins$boy1[i] = 1
      twins$boy2[i] = 1
    }
  }
  
  #the case with fraternal twins
  if(twins$identical[i] == 0){
    
    #pick the first child
    if(runif(1) < .5){
      twins$boy1[i] = 1
    }
    
    #pick the second child
    if(runif(1) < .5){
      twins$boy2[i] = 1
    }
  }
}

#should get 1/2
mean(twins$identical[twins$boy1 == 1 & twins$boy2 == 1])
## [1] 0.5060606




BH 2.22

A bag contains one marble which is either green or blue, with equal probabilities. A green marble is put in the bag (so there are 2 marbles now), and then a random marble is taken out. The marble taken out is green. What is the probability that the remaining marble is also green?

#replicate
set.seed(110)
sims = 1000

#indicator if the first marble is green or not
first.green = rep(0, sims)

#mark if the marble we pick out is green, and if the marble left in the bag is green
pick.marble = rep(0, sims)
bag.marble = rep(0, sims)

#run the loop
for(i in 1:sims){
  
  #see if we have a green marble
  if(runif(1) < .5){
    first.green[i] = 1
  }
  
  #put a green marble in the bag
  bag = c(first.green[i], 1)
  
  #pick a marble out of the bag
  pick = sample(1:2, 1)
  
  #mark the picked marble, and the marble left in the bag
  pick.marble[i] = bag[pick]
  
  #we do '3 - pick' because if we picked marble 2, then the 3rd - 2nd = 1st marble is left in the bag.
  bag.marble[i] = bag[3 - pick]
}

#should get 2/3
mean(bag.marble[pick.marble == 1])
## [1] 0.6739974




BH 2.26

To battle against spam, Bob installs two anti-spam programs. An email arrives, which is either legitimate (event \(L\)) or spam (event \(L^c\)), and which program \(j\) marks as legitimate (event \(M_j\)) or marks as spam (event \(M^c_j\)) for \(j \in \{1,2\}\). Assume that 10% of Bob’s email is legitimate and that the two programs are each “90% accurate” in the sense that \(P(M_j|L) = P(M^c_j|L^c) = 9/10\). Also assume that given whether an email is spam, the two programs’ outputs are conditionally independent.

  1. Find the probability that the email is legitimate, given that the 1st program marks it as legitimate (simplify).
  1. Find the probability that the email is legitimate, given that both programs mark it as legitimate (simplify).
#replicate
set.seed(110)
sims = 1000

#set up a matrix for the emails
emails = matrix(0, nrow = sims, ncol = 3)
emails = data.frame(emails)

#indicators for legitimacy, marked by 1 and marked by 2
colnames(emails) = c("L", "M1", "M2")

#run the loop
for(i in 1:sims){
  
  #see if the email is legitimate, with probability .1
  if(runif(1) < .1){
    emails$L[i] = 1
  }
  
  #run the programs on the email
  
  #the case of legitimate emails
  if(emails$L[i] == 1){
    
    #run the first program
    if(runif(1) < .9){
      emails$M1[i] = 1
    }
    
    #run the second program
    if(runif(1) < .9){
      emails$M2[i] = 1
    }
  }
  
  #the case of spam
  else if(emails$L[i] == 0){
    
    #run the first program
    if(runif(1) < .1){
      emails$M1[i] = 1
    }
    
    #run the second program
    if(runif(1) < .1){
      emails$M2[i] = 1
    }
  }
}

#part a.  Analytical solution is .5
mean(emails$L[emails$M1 == 1])
## [1] 0.52
#part b.  Analytical solution is .9
mean(emails$L[emails$M1 == 1 & emails$M2 == 1])
## [1] 0.8981481




BH 2.30

A family has 3 children, creatively named \(A, B,\) and \(C\).

  1. Discuss intuitively (but clearly) whether the event “\(A\) is older than \(B\)” is independent of the event “\(A\) is older than \(C\)”.
#replicate
set.seed(110)
sims = 1000

#generate random ages for the children
#it makes sense to generate from a uniform here; from 0 to 11 (span of childhood)
A = runif(sims, 0, 11)
B = runif(sims, 0, 11)
C = runif(sims, 0, 11)

#higher chance of A being older than B if A is older than C (dependent)
length(A[A > B])/sims
## [1] 0.469
length(A[A > B & A > C])/length(A[A > C])
## [1] 0.6240157


  1. Find the probability that \(A\) is older than \(B\), given that \(A\) is older than \(C\).
#recycle vectors
#should get 2/3
length(A[A > B & A > C])/length(A[A > C]) 
## [1] 0.6240157




BH 2.31

Is it possible that an event is independent of itself? If so, when is this the case?

#replicate
set.seed(110)
sims = 1000

#generate Bern(0), Bern(1/2) and Bern(1)
A = rbinom(sims, 1, 0)
B = rbinom(sims, 1, 1/2)
C = rbinom(sims, 1, 1)


#A and C are independent of themselves, B is not (the mean changes)
mean(A); mean(A[A == 0])
## [1] 0
## [1] 0
mean(B); mean(B[B == 0])
## [1] 0.487
## [1] 0
mean(C); mean(C[C == 1])
## [1] 1
## [1] 1




BH 2.32

Consider four nonstandard dice (the Efron dice), whose sides are labeled as follows (the 6 sides on each die are equally likely).

A: 4, 4, 4, 4, 0, 0

B: 3, 3, 3, 3, 3, 3

C: 6, 6, 2, 2, 2, 2

D: 5, 5, 5, 1, 1, 1

These four dice are each rolled once. Let A be the result for die A, B be the result for die B, etc.

  1. Find P(A > B), P(B > C), P(C > D), and P(D > A).

  2. Is the event A > B independent of the event B > C? Is the event B > C independent of the event C > D? Explain.

#replicate
set.seed(110)
sims = 1000

#create the dice
A = c(4, 4, 4, 4, 0, 0)
B = c(3, 3, 3, 3, 3, 3)
C = c(6, 6, 2, 2, 2, 2)
D = c(5, 5, 5, 1, 1, 1)

#set up paths for the results
A.rolls = A[sample(1:6, sims, replace = TRUE)]
B.rolls = B[sample(1:6, sims, replace = TRUE)]
C.rolls = C[sample(1:6, sims, replace = TRUE)]
D.rolls = D[sample(1:6, sims, replace = TRUE)]

#part a. should all be 2/3.
length(A.rolls[A.rolls > B.rolls])/sims
## [1] 0.689
length(B.rolls[B.rolls > C.rolls])/sims
## [1] 0.653
length(C.rolls[C.rolls > D.rolls])/sims
## [1] 0.702
length(D.rolls[D.rolls > A.rolls])/sims
## [1] 0.643
#part b. The first pair of values below should be (approximately) equal;
#   the second pair (the third and fourth values) should not be.
(length(A.rolls[A.rolls > B.rolls])/sims)*(length(B.rolls[B.rolls > C.rolls])/sims)
## [1] 0.449917
length(A.rolls[A.rolls > B.rolls & B.rolls > C.rolls])/sims
## [1] 0.433
(length(C.rolls[C.rolls > D.rolls])/sims)*(length(B.rolls[B.rolls > C.rolls])/sims)
## [1] 0.458406
length(C.rolls[B.rolls > C.rolls & C.rolls > D.rolls])/sims
## [1] 0.355




BH 2.35

You are going to play 2 games of chess with an opponent whom you have never played against before (for the sake of this problem). Your opponent is equally likely to be a beginner, intermediate, or a master. Depending on which, your chances of winning an individual game are 90%, 50%, or 30%, respectively.

  1. What is your probability of winning the first game?

  2. Congratulations: you won the first game! Given this information, what is the probability that you will also win the second game (assume that, given the skill level of your opponent, the outcomes of the games are independent)?

  3. Explain the distinction between assuming that the outcomes of the games are independent and assuming that they are conditionally independent given the opponent’s skill level. Which of these assumptions seems more reasonable, and why?

#replicate
set.seed(110)
sims = 1000

#set up a matrix for the games
games = matrix(0, nrow = sims, ncol = 3)
games = data.frame(games)

#indicators if you win the first two games, and a column to mark skill level
colnames(games) = c("Win1", "Win2", "Skill")

#run the loop
for(i in 1:sims){
  
  #flip for the skill level
  skill = runif(1)
  
  #the case that you draw a beginner
  if(skill <= 1/3){
    
    #play the games, mark the skill level
    games$Win1[i] = rbinom(1, 1, .9)
    games$Win2[i] = rbinom(1, 1, .9)
    games$Skill[i] = "B"
  }
  
  #the case that you draw an intermediate
  else if(skill > 1/3 && skill <= 2/3){
    
    #play the games, mark the skill level
    games$Win1[i] = rbinom(1, 1, .5)
    games$Win2[i] = rbinom(1, 1, .5)
    games$Skill[i] = "I"
  }
  
  #the case that you draw a master
  else if(skill > 2/3){
    
    #play the games, mark the skill level
    games$Win1[i] = rbinom(1, 1, .3)
    games$Win2[i] = rbinom(1, 1, .3)
    games$Skill[i] = "M"
  }
}

#part a.  Should be 17/30 = .57
mean(games$Win1)
## [1] 0.584
#part b.  Should be 23/34 = .68
mean(games$Win2[games$Win1 == 1])
## [1] 0.6969178
#part c.  The games are not independent, since if you win game 1 you are more likely to win game 2
mean(games$Win2)
## [1] 0.593
mean(games$Win2[games$Win1 == 1])
## [1] 0.6969178
#if you also condition on the skill level, though, they are independent
#this doesn't rigorously prove independence, just one example!
mean(games$Win2[games$Skill == "B"])
## [1] 0.9329446
mean(games$Win2[games$Win1 == 1 & games$Skill == "B"])
## [1] 0.928125




BH 2.38
  1. Consider the following 7-door version of the Monty Hall problem. There are 7 doors, behind one of which there is a car (which you want), and behind the rest of which there are goats (which you don’t want). Initially, all possibilities are equally likely for where the car is. You choose a door. Monty Hall then opens 3 goat doors, and offers you the option of switching to any of the remaining 3 doors. Assume that Monty Hall knows which door has the car, will always open 3 goat doors and offer the option of switching, and that Monty chooses with equal probabilities from all his choices of which goat doors to open. Should you switch? What is your probability of success if you switch to one of the remaining 3 doors?
#replicate
set.seed(110)
sims = 1000

#indicators for winning if we switch or stay
win.switch = rep(0, sims)
win.stay = rep(0, sims)

#run the loop
for(i in 1:sims){
  
  #create the doors, numbered
  doors = c(1:7)
  
  #location of the car
  car = sample(1:7, 1)
  
  #the door you pick
  pick = sample(1:7, 1)
  
  #if you stay, you win, if you have the car!
  if(pick == car){
    win.stay[i] = 1
  }
  
  #monty picks three doors, not yours or the car
  monty.doors = sample(doors[c(-pick, -car)], 3)
  
  #take out the doors that monty showed us
  doors = doors[c(-monty.doors)]
  
  #switch to one of the remaining doors (excluding the door you picked by value, not position)
  new.door = sample(doors[doors != pick], 1)
  
  #see if you won
  if(new.door == car){
    win.switch[i] = 1
  }
}

#should get 2/7 = .285, 1/7= .143
mean(win.switch)
## [1] 0.246
mean(win.stay)
## [1] 0.155


  1. Generalize the above to a Monty Hall problem where there are \(n \geq 3\) doors, of which Monty opens m goat doors, with \(1 \leq m \leq n - 2\).
#define simple parameters
m = seq(from = 1, to = 48, by = 1)
n = 50

#set the paths
win.path.switch = rep(NA, length(m))
win.path.stay = rep(NA, length(m))

#iterate over n
for(j in 1:length(m)){
  
  #indicators if we win by switching or staying
  win.switch = rep(0, sims)
  win.stay = rep(0, sims)
  
  for(i in 1:sims){
    
    #create the doors, numbered
    doors = c(1:n)
    
    #location of the car
    car = sample(1:n, 1)
    
    #the door you pick
    pick = sample(1:n, 1)
    
    #if you stay, you win, if you have the car!
    if(pick == car){
      win.stay[i] = 1
    }
    
    #monty picks m[j] doors, not yours or the car
    monty.doors = sample(doors[c(-pick, -car)], m[j])
    
    #take out the doors that monty showed us
    doors = doors[c(-monty.doors)]
    
    #switch to one of the remaining doors (excluding the door you picked by value, not position);
    #   sample an index to guard against R's sample() behavior when only one door remains
    remaining = doors[doors != pick]
    new.door = remaining[sample(length(remaining), 1)]
    
    #see if you won
    if(new.door == car){
      win.switch[i] = 1
    }
  }
  
  #mark the results
  win.path.switch[j] = mean(win.switch)
  win.path.stay[j] = mean(win.stay)
}
  

#switch probability should increase with doors
plot(m, win.path.switch, xlab = "Number of Doors Monty Opens",
     main = "P(Win) for 50 doors", ylab = "Win Probability",
     type = "l", col = "black", lwd = 4)
lines(m, win.path.stay, col = "red", lwd = 4)
  

legend("topleft", legend = c("Win Probability if you switch", "Win Probability if you stay"),
       lty=c(1,1), lwd=c(2.5,2.5),
       col=c("black", "red"))




BH 2.39

Consider the Monty Hall problem, except that Monty enjoys opening door 2 more than he enjoys opening door 3, and if he has a choice between opening these two doors, he opens door 2 with probability \(p\), where \(1/2 \leq p \leq 1\).

  1. Find the unconditional probability that the strategy of always switching succeeds (unconditional in the sense that we do not condition on which of doors 2 or 3 Monty opens).

  2. Find the probability that the strategy of always switching succeeds, given that Monty opens door 2.

  3. Find the probability that the strategy of always switching succeeds, given that Monty opens door 3.

#replicate
set.seed(110)
sims = 1000

#set the paths
win.switch = rep(0, sims)
monty.open = rep(0, sims)

#define a simple parameter
p = .75

#run the loop
for(i in 1:sims){
  
  #initialize the doors
  doors = c(1:3)
  
  #pick door 1 and generate which door has the car
  pick = 1
  car = sample(1:3, 1)
  
  #if we picked the car door, monty has 2 choices
  if(pick == car){
    
    #pick monty's door
    monty.door = sample(x = doors[-c(pick)], size = 1)
    
    #special case where Monty can open door 2 or 3
    #this only occurs when we pick 1, and the car is behind 1
    if(pick == 1 && car == 1){
      monty.door = sample(2:3, 1, prob = c(p, 1 - p))
    }
  }
  
  #if we picked the non car door, monty only has one choice
  else if(pick != car){
    monty.door = doors[-c(pick, car)] 
  }
  
  #see if the switching worked
  if(car == doors[-c(monty.door, pick)]){
    win.switch[i] = 1
  }
  
  #keep track of what monty opened
  monty.open[i] = monty.door
}

#part a. Analytical solution is 2/3
mean(win.switch)
## [1] 0.636
#part b. Should be 1/(1 + p) = .57
mean(win.switch[monty.open == 2])
## [1] 0.5362563
#part c. Should be 1/(2 - p) = .8
mean(win.switch[monty.open == 3])
## [1] 0.7813268




BH 2.42

A fair die is rolled repeatedly, and a running total is kept (which is, at each time, the total of all the rolls up until that time). Let \(p_n\) be the probability that the running total is ever exactly \(n\) (assume the die will always be rolled enough times so that the running total will eventually exceed n, but it may or may not ever equal \(n\)).

  1. Write down a recursive equation for \(p_n\) (relating \(p_n\) to earlier terms \(p_k\) in a simple way). Your equation should be true for all positive integers n, so give a definition of \(p_0\) and \(p_k\) for \(k\) < 0 so that the recursive equation is true for small values of \(n\).

  2. Find \(p_7\).

#replicate
set.seed(110)
sims = 1000

#keep track if we hit 7 or not
hit7 = rep(0, sims)

#run the loop
for(i in 1:sims){
  
  #keep track of the running total  
  runsum = 0
  
  #run until we get above 7, or equal to it
  while(runsum < 7){
    
    #roll the die, add the amount
    runsum = runsum + sample(1:6, 1)
  }
  
  #see if we got 7
  if(runsum == 7){
    hit7[i] = 1
  }
}

#part b., should be about .254
mean(hit7)
## [1] 0.254


  1. Give an intuitive explanation for the fact that \(p_n \rightarrow 1/3.5 = 2/7\) as \(n \rightarrow \infty\).
#take number of sims down for computational speed
sims = 100

#keep track if we hit each number or not
n = round(seq(from = 7, to = 500, length.out = 10))
hit_n = rep(0, length(n))

#iterate over n
for(j in 1:length(n)){
  
  #keep track of hits
  hit_i = rep(0, sims)
  
  #run the loop
  for(i in 1:sims){
    
    #keep track of the running total  
    runsum = 0
    
    #run until we get above the value, or equal to it
    while(runsum < n[j]){
      
      #roll the die, add the amount
      runsum = runsum + sample(1:6, 1)
    }
    
    #see if we hit n
    if(runsum == n[j]){
      hit_i[i] = 1
    }
  }
  
  #mark the results
  hit_n[j] = mean(hit_i)
}

#asymptotically we should approach 2/7
#convergence is slow here!
plot(n, hit_n, type = "l", xlab = "n", 
     ylab = "p_n", main = "n vs. p_n",
     col = "red", lwd = 3)
abline(h = 2/7)




BH 2.44

Calvin and Hobbes play a match consisting of a series of games, where Calvin has probability p of winning each game (independently). They play with a “win by two” rule: the first player to win two games more than his opponent wins the match. Find the probability that Calvin wins the match (in terms of p), in two different ways:

  1. by conditioning, using the law of total probability.

  2. by interpreting the problem as a gambler’s ruin problem.

#replicate
set.seed(110)
sims = 1000

#define p
p = .6

#indicator if calvin won
calvin_wins = rep(0, sims)

#run the loop
for(i in 1:sims){
  
  #trackers for Calvin's and Hobbes's wins
  calvin = 0
  hobbes = 0
  
  #stop when one of them gets two ahead
  while(abs(calvin - hobbes) < 2){
    
    #see who wins
    draw = runif(1)
    
    #the case that calvin wins
    if(draw < p){
      calvin = calvin + 1
    }
    
    #the case that hobbes wins
    if(draw > p){
      hobbes = hobbes + 1
    }
  }
  
  #mark if calvin won overall
  if(calvin > hobbes){
    calvin_wins[i] = 1
  }
}    

#should be (p^2)/(p^2 + (1 - p)^2) = .69
mean(calvin_wins)
## [1] 0.707




BH 3.6

Benford’s law states that in a very large variety of real-life data sets, the first digit approximately follows a particular distribution with about a 30% chance of a 1, an 18% chance of a 2, and in general \[P(D = j) = \log_{10} \left(\frac{j+1}{j}\right), \textrm{ for } j \in \{1,2,3,\dots,9\},\] where \(D\) is the first digit of a randomly chosen element. Check that this is a valid PMF (using properties of logs, not with a calculator).

j = 1:9
PMF = log((j + 1)/j, base = 10)

#should sum to 1
sum(PMF)
## [1] 1




BH 3.21

Let \(X \sim \textrm{Bin}(n,p)\) and \(Y \sim \textrm{Bin}(m,p)\), independent of \(X\). Show that \(X-Y\) is not Binomial.

#replicate
set.seed(110)
sims = 1000

#define simple parameters
p = 1/2
m = 10
n = 20

#draw the r.v.'s
X = rbinom(sims, m, p)
Y = rbinom(sims, n, p)

#the mean is negative, which is impossible for a Binomial
mean(X - Y)
## [1] -5.127




BH 3.25

Alice flips a fair coin \(n\) times and Bob flips another fair coin \(n+1\) times, resulting in independent \(X \sim \textrm{Bin}(n,\frac{1}{2})\) and \(Y \sim \textrm{Bin}(n+1,\frac{1}{2})\).

  1. Show that \(P(X < Y) = P( n - X < n+1-Y)\).
#replicate
set.seed(110)
sims = 1000

#define simple parameters
n = 10
p = 1/2

#keep track of X and Y
X = rep(NA, sims)
Y = rep(NA, sims)

#run the loop
for(i in 1:sims){
  
  #generate the r.v.'s
  X[i] = rbinom(1, n, p)
  Y[i] = rbinom(1, n + 1, p)
}

#these should match
length(X[X < Y])/sims; length(X[(n - X) < (n + 1 - Y)])/sims
## [1] 0.517
## [1] 0.483
  1. Compute \(P(X<Y)\).

Hint: Use (a) and the fact that \(X\) and \(Y\) are integer-valued.

#recycle vectors
#should get 1/2
length(X[X < Y])/sims
## [1] 0.517




BH 3.35

Players A and B take turns in answering trivia questions, starting with player A answering the first question. Each time A answers a question, she has probability \(p_1\) of getting it right. Each time B plays, he has probability \(p_2\) of getting it right.

  1. If A answers \(m\) questions, what is the PMF of the number of questions she gets right?
#replicate
set.seed(110)
sims = 1000

#define simple parameters
p1 = 3/4
m = 10

#count how many A gets correct
A = rep(0, sims)


#run the loop
for(i in 1:sims){
  
  #run through each question, see if A gets it right
  for(j in 1:m){
    if(runif(1) < p1){
      A[i] = A[i] + 1
    }
  }
}

#calculate the analytical PMF
k = 0:m
PMF = dbinom(k, m, p1)

#PMFs should line up
#plot the empirical PMF
plot(table(A)/length(A), col = "black", 
     main = "PMF", type = "h",
     xlab = "x", ylab = "P(X = x)",
     xlim = c(min(k), max(k)),
     ylim = c(0, 1), lwd = 3)

#plot the analytical PMF
lines(k, PMF, main = "Analytical PMF", 
      ylab = "P(X = x)", xlab = "x", 
      col = "red", pch = 16, 
      ylim = c(0, 1), type = "p", lwd = 3)


legend("topright", legend = c("Empirical PMF", "Analytical PMF"),
       lty=c(1,20), lwd=c(2.5,2.5),
       col=c("black", "red"))

  1. If A answers \(m\) times and B answers \(n\) times, what is the PMF of the total number of questions they get right (you can leave your answer as a sum)? Describe exactly when/whether this is a Binomial distribution.
#define simple parameters.  Let p2 = p1 here
p1 = 3/4
m = 10
n = 15
p2 = p1

#count how many A and B get correct
A = rep(0, sims)
B = rep(0, sims)

#run the loop
for(i in 1:sims){
  
  #run through each question, see if A gets it right
  for(j in 1:m){
    if(runif(1) < p1){
      A[i] = A[i] + 1
    }
  }
  
  #run through each question, see if B gets it right
  for(j in 1:n){
    if(runif(1) < p2){
      B[i] = B[i] + 1
    }
  }
}

#add together
C = A + B

#calculate analytical PMF
k = 0:(m + n)
PMF = dbinom(k, m + n, p1)


#PMFs should line up
#plot the empirical PMF
plot(table(C)/length(C), col = "black", 
     main = "PMF", type = "h",
     xlab = "x", ylab = "P(X = x)", 
     xlim = c(min(k), max(k)), 
     ylim = c(0, 1), lwd = 3)

#plot the analytical PMF
lines(k, PMF, main = "Analytical PMF", 
      ylab = "P(X = x)", xlab = "x", 
      col = "red", pch = 16, 
      ylim = c(0, 1), type = "p", lwd = 3)


legend("topright", legend = c("Empirical PMF", "Analytical PMF"),
       lty=c(1,20), lwd=c(2.5,2.5),
       col=c("black", "red"))




BH 3.45

A new treatment for a disease is being tested, to see whether it is better than the standard treatment. The existing treatment is effective on 50% of patients. It is believed initially that there is a 2/3 chance that the new treatment is effective on 60% of patients, and a 1/3 chance that the new treatment is effective on 50% of patients. In a pilot study, the new treatment is given to 20 random patients, and is effective for 15 of them.

  1. Given this information, what is the probability that the new treatment is better than the standard treatment?
#replicate
set.seed(110)
sims = 1000

#indicator if we are in the 60% case or not
high.effect = rep(0, sims)

#count how many patients this is effective for
effective = rep(NA, sims)

#run the loop
for(i in 1:sims){
  
  #see if we are in the 60% case
  if(runif(1) < 2/3){
    
    #mark that we are in this case
    high.effect[i] = 1
    
    #see how many patients had positive results.  This is Bin(20, .6)
    effective[i] = sum(rbinom(1, 20, .6))
  }
  
  #50% case
  else{
    #see how many patients had positive results.  This is Bin(20, .5)
    effective[i] = sum(rbinom(1, 20, .5))
  } 
}

#should get (choose(20, 15)*.6^(15)*.4^5*2/3)/((choose(20, 15)*.6^(15)*.4^5*2/3) + choose(20, 15)*.5^(20)*(1/3)) = .909
mean(high.effect[effective == 15])
## [1] 0.9047619
  1. A second study is done later, giving the new treatment to 20 new random patients. Given the results of the first study, what is the PMF for how many of the new patients the new treatment is effective on? (Letting \(p\) be the answer to (a), your answer can be left in terms of \(p\).)
#we now have a new probability of being in the high effect case:
p = mean(high.effect[effective == 15])

#now we run the same code, but with this different p.


#indicator if we are in the 60% case or not
high.effect = rep(0, sims)

#count how many patients this is effective for
effective = rep(NA, sims)

#run the loop
for(i in 1:sims){
  
  #see if we are in the 60% case
  if(runif(1) < p){
    
    #mark that we are in this case
    high.effect[i] = 1
    
    #see how many patients had positive results.  This is Bin(20, .6)
    effective[i] = sum(rbinom(1, 20, .6))
  }
  
  #50% case
  else{
    #see how many patients had positive results.  This is Bin(20, .5)
    effective[i] = sum(rbinom(1, 20, .5))
  } 
}


#calculate the analytical PMF
k = 0:20
PMF = p*choose(20, k)*.6^k*.4^(20 - k) + (1 - p)*choose(20, k)*.5^20


#PMFs should line up
#plot the empirical PMF
plot(table(effective)/sims, 
     col = "black", main = "PMF", 
     type = "h", xlab = "x", 
     ylab = "P(X = x)", 
     xlim = c(min(k), max(k)), 
     ylim = c(0, 1), lwd = 3)

#analytical
lines(k, PMF, main = "Analytical PMF", 
      ylab = "P(X = x)", xlab = "x", 
      col = "red", pch = 16, 
      ylim = c(0, 1), type = "p", lwd = 3)


legend("topright", legend = c("Empirical PMF", "Analytical PMF"),
       lty=c(1,20), lwd=c(2.5,2.5),
       col=c("black", "red"))





Discrete Random Variables




3.1

With help from Matt Goldberg

You are playing a game of Russian Roulette with one other person. The rules of the game are as follows: a bullet is placed randomly in one of six chambers of a gun, and the players take turns pulling the trigger (if the bullet chamber comes up, the bullet fires and the player loses, otherwise the gun does not fire and the game continues). Every time the trigger is pulled, the chambers rotate so that a new chamber comes into the ‘firing position’ (so at maximum, the gun is fired six times).

  1. If you would like to maximize your chances of winning this game, should you go first or second?

  2. Re-solve part (a) in the case of a game of Russian Roulette using a gun with \(n\) chambers, where \(n \geq 2\) is an even number.

  3. Re-solve part (b) in the case where \(n > 2\) is an odd number. Discuss what happens as \(n \rightarrow \infty\).

  4. Return to the conventional set-up of the game as described at the start of the problem. Let \(X\) be the number of blanks fired (trigger pulls before the bullet is fired, not including the bullet firing). Remember that the game ends when the bullet is fired. Explain why \(X\) is NOT Geometric.

  5. Continuing as in (d), find \(E(X)\).



Analytical Solution

  1. We could solve this using a brute force probability calculation, but this will be less useful for part (b). Instead, let’s employ a symmetry argument: imagine that the player who goes first has ‘ownership’ over the first, third and fifth chambers (if the bullet ends up in one of these chambers, the player loses). This player’s probability of losing, then, is the probability that the bullet ends up in one of those chambers. By the naive definition of probability (6 chambers, all equally likely) this is 3/6 = 1/2. So, as a probability-maximizer, you should be indifferent between going first or second (although, as a subjective human, you may not be).

  2. The same argument in part (a.) applies. There are \(n/2\) ‘losing’ spots out of \(n\) possible slots. This yields \((n/2)/2 = 1/2\), as before.

  3. The same argument applies, but now the player that goes first has ‘ownership’ over \((n + 1)/2\) slots out of \(n\) slots, so the probability reduces to \(\frac{(n + 1)}{2n}\). This is greater than \(1/2\), which means we should go second, where we have probability \(1 - \frac{(n + 1)}{2n} = \frac{(n - 1)}{2n} < 1/2\) of losing. As \(n \rightarrow \infty\), \(\frac{n + 1}{2n} \rightarrow \frac{1}{2}\), so we again become indifferent between going first or second. This is intuitive; if there were millions of chambers, you wouldn’t mind much if you took on one extra relative to the other player!

  4. The probability of firing a blank changes after each shot, since the number of remaining chambers decreases. The \(p\) parameter in a Geometric is constant. Further, the support of a Geometric is all of the non-negative integers (it is unbounded), while \(X\) can only take on the values 0, 1, …, 5.

  5. Define \(X = I_1 + ... + I_6\) where \(I_j\) is the indicator that we fire a blank on the \(j^{th}\) round. Immediately, we can see that \(I_6\) is degenerate because it always takes value 0: either the game ends before the final round, or we get to the final round and the bullet must be in the last chamber. Taking expectations of both sides and applying linearity (even though the indicators are dependent, linearity still holds) and the fundamental bridge:

\[E(X) = E(I_1) + ... + E(I_5) = P(I_1 = 1) + ... + P(I_5 = 1)\]

We can’t apply a symmetry argument, because the probabilities are different. \(P(I_1 = 1)\) is clearly \(5/6\). However, for \(P(I_2 = 1)\), we must think conditionally. The game ends if we fire the bullet in the first round, so we need to condition on what happens in the first round. Employing LOTP and conditioning on the outcome of the first round:

\[P(I_2 = 1) = P(I_2 = 1 | I_1 = 1)P(I_1 = 1) + P(I_2 = 1 | I_1 = 0)P(I_1 = 0)\]

\[P(I_2 = 1) = P(I_2 = 1 | I_1 = 1)P(I_1 = 1) \]

We drop the second term because if the bullet is fired during the first round, the game ends and we cannot possibly fire a blank on the second round. \(P(I_2 = 1 | I_1 = 1)\) is \(4/5\), since, if we fired a blank during the first round, there are 5 chambers left and 4 of them are blank. Therefore, we find \(\frac{5}{6}\cdot\frac{4}{5}.\) Continuing in this way, we find that:

\[P(I_j = 1) = \prod_{i = 0}^{j - 1} \frac{5 - i}{6 - i}\]

For \(j = 1, 2, ..., 5\). Calculating this yields \(E(X) = 2.5\).
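
As a quick check of this arithmetic (a small sketch, not part of the original solution), we can compute the indicator probabilities with a running product in R:

#P(I_j = 1), for j = 1, ..., 5, is the running product of (5 - i)/(6 - i) over i = 0, ..., j - 1
probs = cumprod((5 - 0:4)/(6 - 0:4))

#sum the indicator probabilities; should give E(X) = 2.5
sum(probs)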


Empirical Solution:

#part a.
#replicate
set.seed(110)
sims = 1000

#indicator if the first person wins
win = rep(0, sims)

#run the loop
for(i in 1:sims){
  
  #set the chambers
  chambers = sample(c("bullet", rep("blank", 5)))
  
  #iterate through each round
  for(j in c(1,3,5)){
    
    #the first person fires; see if they got the bullet, break if they did
    if(chambers[j] == "bullet"){
      break
    }
    
    #mark it down and break if the second person gets the bullet
    if(chambers[j + 1] == "bullet"){
      win[i] = 1
      break
    }
  }
}

#should get 1/2
mean(win)
## [1] 0.485
#part b.
#try this for a different n (should work for any even n)
n = 10


#indicator if the first person wins
win = rep(0, sims)


#run the loop
for(i in 1:sims){
  
  #set the chambers
  chambers = sample(c("bullet", rep("blank", n - 1)))
  
  #iterate through each round
  for(j in seq(from = 1, to = (n - 1), by = 2)){
    
    #the first person fires; see if they got the bullet, break if they did
    if(chambers[j] == "bullet"){
      break
    }
    
    #mark it down and break if the second person gets the bullet
    if(chambers[j + 1] == "bullet"){
      win[i] = 1
      break
    }
  }
}

#should get 1/2
mean(win)
## [1] 0.51
#part c.
#try this for increasing, odd n (just an example)
n = seq(from = 5, to = 31, by = 2)

#keep track of the win probabilities
probs = rep(NA, length(n))

#iterate over n
for(k in 1:length(n)){
  
  #indicator if the second person wins
  win = rep(0, sims)
  
  
  #run the loop
  for(i in 1:sims){
    
    #set the chambers
    chambers = sample(c("bullet", rep("blank", n[k] - 1)))
    
    #iterate through each round
    for(j in seq(from = 1, to = n[k], by = 2)){
      
      #the first person fires; see if they got the bullet, break if they did
      if(chambers[j] == "bullet"){
        win[i] = 1
        break
      }
      
      #mark it down and break if the second person gets the bullet
      if(chambers[j + 1] == "bullet"){
        break
      }
    }
  }
  
  #mark the mean
  probs[k] = mean(win)
}

#should approach 1/2
plot(n, probs, main = "P(Second Person Wins) for odd n",
     xlab = "n", ylab = "P(Second Person Wins)",
     ylim = c(0, 1), pch = 16)
abline(h = 1/2, col = "red")

#parts d and e
#keep track of X
X = rep(0, sims)

#run the loop
for(i in 1:sims){
  
  #set the chambers
  chambers = sample(c("bullet", rep("blank", 5)))
  
  #iterate through each round
  for(j in c(1,3,5)){
    
    #the first person fires; see if they got the bullet, break if they did
    if(chambers[j] == "bullet"){
      break
    }
    
    #increment X if we fired a blank
    X[i] = X[i] + 1
    
    #mark it down and break if the second person gets the bullet
    if(chambers[j + 1] == "bullet"){
      break
    }
    
    #increment X if we fired a blank
    X[i] = X[i] + 1
  }
}

#histograms should be different
hist(X, col = "gray", main = "X", xlab = "x")

hist(rgeom(sims, 1/6), col = "gray", main = "Geom(1/6)", xlab = "")

#find the mean, should get 2.5
mean(X)
## [1] 2.411




3.2

(Help from Matt Goldberg) The Negative Hypergeometric distribution is a discrete distribution that takes three parameters, \(n\) = total number of balls, \(k\) = number of white balls, and \(r\) = number of black balls when the experiment is stopped (i.e., after observing \(r\) black balls, we stop drawing balls). If \(X \sim NHGeom(n, k, r)\), then \(X\) counts the number of white balls sampled from the \(n\) balls (without replacement) until we have sampled \(r\) black balls.

  1. Explain how this distribution is similar to and different from a Hypergeometric distribution.

  2. If \(X \sim NHGeom(n, k, r)\), then \(E(X) = \frac{rk}{n - k + 1}\). Use this fact to provide a more elegant solution to 3.1 (the Russian Roulette problem).



Analytical Solution

  1. Consider this compared to the Hypergeometric: in both cases, we are sampling balls without replacement out of an urn and counting successes (white balls). In the Hypergeometric case, we draw a pre-determined number of balls. In the Negative Hypergeometric case, we draw balls until we achieve a pre-determined number of failures.

  2. If we define drawing a blank as a white ball (successful trial) and firing the bullet as a failure (black ball) then we see that \(X \sim NHGeom(6, 5, 1)\). In general, this distribution has mean \(\frac{rk}{n - k + 1}\), which in this case comes out to 2.5, as we found above.


Empirical Solution

#should get 2.5
n = 6; r = 1; k = 5
(r*k)/(n - k + 1)
## [1] 2.5
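
We can also verify this with a small simulation (a sketch, not part of the original solution), using the Russian Roulette setup: the 5 blanks play the role of the white balls and the single bullet is the black ball, so \(X \sim NHGeom(6, 5, 1)\).

#replicate
set.seed(110)
sims = 1000

#count the number of white balls (blanks) drawn before the first black ball (bullet)
X = rep(0, sims)

#run the loop
for(i in 1:sims){
  
  #shuffle the balls; 1 = white (blank), 0 = black (bullet)
  balls = sample(c(rep(1, 5), 0))
  
  #X is the number of draws before the first black ball
  X[i] = which(balls == 0) - 1
}

#should be close to 2.5
mean(X)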




3.3

Let \(X \sim Pois(c\lambda)\), where \(c\) is a positive integer. Let \(Y \sim Pois(\lambda)\) and \(Z = cY\). Are \(X\) and \(Z\) identically distributed? That is, do they have the same distribution (i.e., two quarters, when flipped, are different random variables, but both have a \(Bern(1/2)\) distribution if we are counting the number of heads)?



Analytical Solution:

Immediately, we can tell that the supports are different: \(X\) is Poisson and takes on non-negative integers, while \(Z\) can only be non-negative multiples of \(c\) (this means that \(Z\) is not even Poisson).

Let’s go a bit further. First, let’s compare the PMFs. The PMF of \(X\) is simply Poisson with parameter \(c\lambda\):

\[P(X = x) = \frac{e^{-c \lambda }(c \lambda)^x}{x!}\]

We can find the PMF of \(Z\) by thinking about the PMF of \(Y\). The probability of \(Z\) taking on value \(z\) is the same as the probability of \(Y\) taking on the value \(z/c\) (since if \(Y = z/c\), then \(Z = cY = cz/c = z\)). We know the probability of \(Y\) taking on \(z/c\) because we know the PMF of \(Y\):

\[P(Z = z) = \frac{e^{-\lambda} \lambda^{z/c}}{(z/c)!}\]

Clearly, the PMFs of \(X\) and \(Z\) are different; we can see this quickly by noting that \(P(X = 0) = e^{-c \lambda}\) and \(P(Z = 0) = e^{-\lambda}\). Therefore, \(X\) and \(Z\) are not identically distributed.

Finally, we can go further and show again that \(Z\) is not even Poisson, like \(X\). First, \(Var(Z) = Var(cY) = c^2Var(Y) = c^2\lambda\). However, \(E(Z) = cE(Y) = c\lambda\), and so \(E(Z) \neq Var(Z)\) when \(c \neq 1\), meaning that \(Z\) is not Poisson and therefore does not have the same distribution as \(X\).


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#define simple parameters
lambda = 1
c = 3

#generate the r.v.'s
X = rpois(sims, c*lambda)
Z = c*rpois(sims, lambda)


#calculate the analytical PMFs
x = as.numeric(rownames(table(X)))
z = as.numeric(rownames(table(Z)))
PMF.x = exp(-c*lambda)*(c*lambda)^x/factorial(x)
PMF.z = exp(-lambda)*lambda^(z/c)/(factorial(z/c))

#the PMFs should match within plots, but not across plots
plot(table(X)/sims, ylim = c(0,1/2), 
     main = "X ~ Pois(lambda*d)", lwd = 3,
     xlab = "x", ylab = "P(X = x)")
lines(x, PMF.x, col = "red", lwd = 3, type = "p", pch = 20)
legend("topright", legend = c("Empirical PMF", "Analytical PMF"),
       lty=c(1,20), lwd=c(2.5,2.5),
       col=c("black", "red"))

plot(table(Z)/sims, ylim = c(0, 1/2), 
     main = "Z ~ d*Pois(lambda)",
     xlab = "y", ylab = "P(Y = y)")
lines(z, PMF.z, col = "red", lwd = 3, type = "p", pch = 20)
legend("topright", legend = c("Empirical PMF", "Analytical PMF"),
       lty=c(1,20), lwd=c(2.5,2.5),
       col=c("black", "red"))




3.4

Nick’s favorite word is ‘no’. In fact, he loves the word ‘no’ so much that he employs the following pattern of speech: for every word he speaks, he says ‘no’ with probability \(1/4\) and some word other than ‘no’ with probability \(3/4\), independently across words. You have a conversation with Nick where he says \(n \geq 3\) words. Find the expected number of times that he says “no no no”. If he says “no no no no”, this counts as two “no no no” phrases (the first ‘no’ to the third ‘no’, and then the second ‘no’ to the fourth ‘no’).



Analytical Solution:

Let \(X\) be the number of times Nick says ‘no no no’, and then define \(X = I_{123} + I_{234} + ... + I_{n - 2, n - 1, n}\), where \(I_{i, j, k}\) is the indicator that the \(i^{th}\), \(j^{th}\) and \(k^{th}\) words are all ‘no’. Taking expectation and using linearity and symmetry:

\(E(X) = E(I_{123} + I_{234} + ... + I_{n - 2, n - 1, n})\)

\(E(X) = E(I_{123}) + E(I_{234}) + ... + E(I_{n - 2, n - 1, n})\)

\(E(X) = (n - 2)E(I_{123})\)

Since there are \(n - 2\) indicators (\(n - 2\) slots of length 3). By the fundamental bridge, we need the probability that the first, second and third words are all ‘no’. Since each word independently has probability \(1/4\) of being ‘no’, we multiply to get \(1/4^3\). Therefore, we have \(E(X) = \frac{n - 2}{4^3}\).


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#define a simple parameter; small enough that we can run the loop quickly
n = 50

#count the number of 'no no no' occurrences
X = rep(0, sims)

#create the pattern that we will look for
pattern = "nonono"

#run the loop
for(i in 1:sims){
  
  #create a conversation; sample no and some other word
  convo = sample(c("no", ""), n, replace = TRUE, prob = c(1/4, 3/4))

  #iterate through the convo and see if we get a pattern
  for(j in 3:n){
    
    #see if we got the pattern
    if(convo[j - 2] == "no" && convo[j - 1] == "no" && convo[j] == "no"){
      X[i] = X[i] + 1
    }
  }
}

#should get (n - 2)/(4^3) = 3/4
mean(X)
## [1] 0.731




3.5

The first chord in the song Bohemian Rhapsody by Queen is a ‘B flat 6’, which is correctly played with 4 distinct keys on an 88-key piano.

  1. You randomly select 4 distinct keys on a piano and play them together (as a chord) until you play the B flat 6 (in the correct key; again, there are only 4 correct keys). Let \(X\) be the number of chords you play (including the B flat 6), and find \(E(X)\).

Hint: Remember, a Geometric random variable ‘doesn’t count the success’, but a First Success distribution does.

  1. The first lyrics in the song are “Is this the real life”. Imagine that you speak words at random until you have dictated these opening lyrics verbatim. Let \(Y\) be the number of words you speak until you have recited these lyrics. Explain why \(Y\) does not have the same distribution as \(X\) (not just the same distribution with a different parameter, but a different distribution altogether).



Analytical Solution:

  1. \(X\) has a First Success distribution (identical to a Geometric distribution except we count the ‘success’); every trial (each chord tried) is independent, with the same probability of success. The probability of success is the probability of selecting the 4 correct keys. By the naive definition of probability, there is 1 way to select the 4 correct notes, and \({88 \choose 4}\) ways to select any 4 notes. So, we get \(X \sim FS(p)\), where \(p = 1/{88 \choose 4}\). Therefore, since the mean of a \(FS(p)\) distribution is \(1/p\), we have \(E(X) = {88 \choose 4}\).

  2. \(Y\) is not First Success. Imagine the first two ‘trials’: words 1 to 5 and words 2 to 6 (the first two chunks of 5 words, since the lyric is 5 words long). These two trials are not independent, as the First Success story requires: for example, if words 1 to 5 (the first trial) are “Look up to the skies”, then words 2 to 6 (the second trial) cannot possibly be “Is this the real life” (since words 2 through 5 are already wrong).


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#generate the 'keys'; for computational speed,
#   imagine there are a total of 10 keys on the piano
#label the correct keys 1, 2, 3, 4
keys = c(1, 2, 3, 4, rep(0, 6))

#count how many chords we need to play
chords = rep(0, sims)

#run the loop
for(i in 1:sims){
  
  
  #go until we get the B flat 6, then break
  while(TRUE){
    
    #increment the number of chords played
    chords[i] = chords[i] + 1
    
    #play 4 keys
    play = sample(keys, 4)
    
    #see if we got the B flat 6 (i.e., we got the 1, 
    #  2, 3, 4, so our 'keys' add to 10)
    if(sum(play) == 10){
      break
    }
  }
}

#should get choose(10, 4) = 210
mean(chords)
## [1] 199.521
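
For reference (a quick check that is not part of the original solution), the expectation for the full 88-key piano from part (a) can be computed directly:

#E(X) = 1/p = choose(88, 4) for the full piano; should give 2331890
choose(88, 4)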




3.6

Datamatch is a Harvard Valentine’s day program where you fill out a questionnaire and are matched based on some ‘compatibility scores’ with other students who filled out the questionnaire.

Say, for the purpose of this problem, that Harvard Datamatch is undergoing some reconstruction. Instead of giving you your top 10 matches as in the past, intimacy is being brought up a notch, and you are only given one match: the top person that you were compatible with. However, sadly, that doesn’t necessarily mean that they were matched with you: they could be given anyone. Say also that, even more unfortunately, the secret love algorithm is just a random generator that assigns you your top ‘match’ completely randomly, and there is no special sauce behind the scenes (you have an equal chance of getting everyone else in Datamatch as your top match).

Finally, you can’t be matched with yourself, and assignments do not have to be unique (one person can be the top match for multiple other people). This year, 100 people decided to fill out Datamatch.

  1. Say that you are one of the 100 people doing Datamatch and are given your true love for your top intimacy match. What is the probability that they also got you as their top match?

  2. A ‘lovebird pair’ occurs when two people get each other as their top intimacy match. Let \(M\) be the number of lovebird pairs. Find \(E(M)\).



Analytical Solution:

  1. Your true love has 99 options for top matches (a total of 99 people that they could get), so the chance that they randomly get you is just \(\frac{1}{99}\).

  2. This is an opportunity to use indicator random variables. We can write \(M = \sum_{i, j} I_{i,j}\) where \(I_{i,j}\) is an indicator variable that equals 1 if the \(i^{th}\) and \(j^{th}\) person match each other and 0 otherwise. Taking the expectation of both sides, then employing linearity and symmetry (there is no significant difference between the indicator for the first and second person matching and the indicator for the fifteenth and twentieth person matching):

\[E(M) = E(I_{1, 2}) + E(I_{1, 3}) + ... + E(I_{99, 100}) = {100 \choose 2}E(I_{1,2})\]

We get \({100 \choose 2}\) because this is the number of potential pairs in the 100 people. By the fundamental bridge, \(E(I_{1,2})\) is the probability that the first and second person match. Each person has (independently) probability \(1/99\) of matching with the other, so the probability that both match with each other is \(1/99^2\).

\[E(M) = \frac{{100 \choose 2}}{99^2} = .505\]

So we expect .505 ‘lovebird pairs’.


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#imagine that you are person #1 and your true love is person #100
#sample their match, a person from 1 to 99
match = sample(1:99, sims, replace = TRUE)

#should get 1/99
length(match[match == 1])/sims
## [1] 0.011
#now keep track of the number of lovebird pairs
pairs = rep(0, sims)

#label people 1 to 100
people = 1:100

#run the loop
for(i in 1:sims){
  
  #set a path for the matches
  matches = rep(NA, 100)
  
  #iterate over the matches; people can't match with themselves
  for(j in 1:100){
    
    #sample a match, can't sample yourself
    matches[j] = sample(people[-j], 1)
  }
  
  #count lovebird matches
  for(j in 1:100){
    
    for(k in j:100){
      
      #see if we got a match
      if(matches[j] == k){
        if(matches[k] == j){
          pairs[i] = pairs[i] + 1
        }
      }
    }
  }
}

#should get .505
mean(pairs)
## [1] 0.5




3.7

Recall the ‘hospital problem’. There are \(n\) couples (two parents), each of which has exactly 1 child. There is a mix-up at the hospital, and the \(n\) children are distributed randomly among the \(n\) couples. Let \(X\) be the number of couples that get their baby back.

  1. As a refresher, find \(E(X)\).

  2. Approximate \(P(X = 0)\), the probability that no one gets their baby back, as \(n \rightarrow \infty\).



Analytical Solution:

  1. Define \(n\) indicators \(I_j\) for the \(j^{th}\) couple being reunited with their baby. Each indicator has expectation \(\frac{1}{n}\) (each couple has one correct child out of the \(n\) possible children they could receive). We have \(n\) indicators, so the overall expectation is \(n/n = 1\).

  2. We have many (loosely independent) events, each with a small probability of success. Therefore, we can use the Poisson paradigm. Our \(\lambda\) is just the sum of all of the individual probabilities. As we’ve shown with the indicators, each probability is \(1/n\), and we have \(n\) of them, so \(\lambda = 1\).

So, we approximately have \(X \sim Pois(1)\). We can now just plug in 0 to the PMF of a Poisson to get:

\[P(X = 0) = e^{-1}\]


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#define a simple parameter
n = 10

#keep track of how many people get their baby back
matches = rep(NA, sims)

#label parents and children, 1 to n
parents = 1:n
children = 1:n

#run the loop
for(i in 1:sims){
  
  #assign the children randomly to the parents
  assignments = sample(children)
  
  #calculate the ratios of parents to children
  #if the ratio is 1, that means the parent got their child back!
  ratios = parents/assignments
  
  #mark how many matches we got (ratio of 1)
  matches[i] = length(ratios[ratios == 1])
}

#should get 1
mean(matches)
## [1] 0.967
#try this for increasing n
n = round(seq(from = 10, to = 100, length.out = 10))

#keep track of the probability of 0 matches
probs = rep(NA, length(n))

#iterate over n
for(j in 1:length(n)){
  
  #keep track of how many people get their baby back
  matches = rep(NA, sims)
  
  #label parents and children, 1 to n
  parents = 1:n[j]
  children = 1:n[j]
  
  #run the loop
  for(i in 1:sims){
    
    #assign the children randomly to the parents
    assignments = sample(children)
    
    #calculate the ratios of parents to children
    #if the ratio is 1, that means the parent got their child back!
    ratios = parents/assignments
    
    #mark how many matches we got (ratio of 1)
    matches[i] = length(ratios[ratios == 1])
  }
  
  #mark down the probability of getting 0
  probs[j] = length(matches[matches == 0])/sims
}

#should approach exp(-1) eventually
plot(n, probs, main = "P(no matches) for different n",
     xlab = "n", ylab = "P(X = 0)", col = "red",
     pch = 16, ylim = c(0, 1))
abline(h = exp(-1))




3.8

There are 50 states in the USA; assume for this problem that you have been to none of them. You visit the states at random (for each ‘round’ you randomly select one of the 50 to visit, even if you have already visited it) until you have visited every state. On average, how many visits will you make before you visit every state?



Analytical Solution:

Consider our first visit: we will visit a state we haven’t yet been to with probability 1. Consider the second visit: we will visit a new state with probability \(49/50\). The number of visits we make until visiting a new state, therefore, is First Success; we have a sequence of independent trials, each with the same probability of success, and we count the trials (including the success). We continue to iterate in this way (if we have visited \(x\) states, then the number of visits until we visit a new state is distributed \(FS(\frac{50 - x}{50})\)) until there are no new states to visit (we have visited them all). The mean of a \(FS(p)\) is simply \(1/p\), so we get a sum of expectations of \(FS\) r.v.’s:

\[1 + 50/49 + 50/48 + ... + 50/1 = 224.96\]

So on average you will make about 225 visits.
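
As a quick check of the sum above (a small sketch, not part of the original solution):

#sum of the FS expectations: 50/50 + 50/49 + ... + 50/1; should give about 224.96
sum(50/(50:1))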


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#label the states 1 to 50
states = 1:50

#keep track of the number of visits
visits = rep(NA, sims)

#run the loop
for(i in 1:sims){
  
  #keep track of the stops on the trip
  stops = integer(0)
  
  #go until we visit each state, then break
  while(TRUE){
    
    #visit a new state
    stops = c(stops, sample(states, 1))
    
    #see if we've visited all 50 states; if we have, break
    if(length(unique(stops)) == length(states)){
      break
    }
  }
  
  #see how long the trip was
  visits[i] = length(stops)
}

#should get 225
mean(visits)
## [1] 226.684



3.9

Let \(I_A\) and \(I_B\) be the indicators for events \(A\) and \(B\), respectively. Let \(p_A = P(A)\) and \(p_B = P(B)\). Find the distribution of \(I_A^{I_B}\).



Analytical Solution:

We know that anything raised to 0 is 1 (even \(0^0 = 1\)), and that \(1^1\) is 1, so the only way for \(I_A^{I_B}\) to be 0 is if \(I_A = 0\) and \(I_B = 1\). That is, \(I_A^{I_B} \sim Bern(1 - p_B(1 - p_A))\), since \(I_A^{I_B}\) is 1 with probability \(1 - p_B(1 - p_A)\).


#replicate
set.seed(110)
sims = 1000

#define simple parameters
pa = 1/2
pb = 1/3

#generate the indicators
Ia = rbinom(sims, 1, pa)
Ib = rbinom(sims, 1, pb)

#should get 1 - pb*(1 - pa) = .833
mean(Ia^Ib)
## [1] 0.813




3.10

(With help from Juan Perdomo)


Recall the ‘pentagon problem’, which we will restate here.


Consider these 5 points (the figure showing them is omitted here):



Imagine selecting two points at random and drawing a straight line in between the two points. Do this 5 times, with the constraint that you cannot select the same pair twice. What is the probability that the lines and points form a pentagon (i.e., a five-sided, five-angled, closed shape)?

We saw this problem earlier in a counting context. Now, solve this problem using the Hypergeometric distribution.



Analytical Solution:

There are \({5 \choose 2} = 10\) potential pairs (we could also think about selecting the first point, which has 5 choices, then the second point, which has 4 choices, and dividing by 2 since the order of point selection doesn’t matter to get \(\frac{5 \cdot 4}{2} = 10\) possible pairs) and we are picking pairs one at a time without replacement. There are 5 ‘correct’ pairings (the configuration that produces a pentagon) and thus \(10 - 5\) ‘incorrect’ pairings. So, if we let \(X\) be the number of correct pairings that we select, \(X \sim HGeom(5, 10 - 5, 5)\) by the story of the Hypergeometric (selecting pairs without replacement, some are ‘good’, some are ‘bad’). We need \(P(X = 5)\), which is equivalent to the probability of picking all of the correct pairings and thus drawing the pentagon. Using the PMF of a Hypergeometric, we get:

\[P(X = 5) = \frac{{5 \choose 5}{5 \choose 0}}{{10 \choose 5}}\]

\[= \frac{1}{{10 \choose 5}}\]

As we saw earlier.
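Since this problem has no empirical solution in the text, here is a small added check using base R’s dhyper (the Hypergeometric PMF); with \(X \sim HGeom(5, 5, 5)\), \(P(X = 5)\) should equal \(1/{10 \choose 5} = 1/252\).

#P(X = 5) for X ~ HGeom(5, 5, 5); both lines should give 1/252, about 0.004
dhyper(5, 5, 5, 5)
1/choose(10, 5)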




3.11

There are \(n\) people with red hats and \(n\) people with blue hats in a room. The people randomly pair off (i.e., they are each randomly paired with another person). Let \(X\) be the number of pairs with matching hat colors (i.e., a pair where both people have red hats).

  1. Find \(E(X)\).
  2. Let \(Y\) be the number of pairs that don’t have matching hat colors (one in the pair has a red hat and one has a blue hat). Find \(E(Y)\).
  3. Which is larger, \(E(X)\) or \(E(Y)\)? Why? How do they compare for large \(n\)?
  4. What is \(E(X) + E(Y)\)?



Analytical Solution:

  1. Let \(X = I_1 + I_2 + ... + I_n\) where \(I_j\) is the indicator that the \(j^{th}\) pair is a ‘color match’. By linearity and symmetry, we take expectations and get \(E(X) = nE(I_1)\). By the Fundamental Bridge, we now just have to find the probability that a random pair is a color match. Imagine picking the first person of the pair. Regardless of the color of his hat, there are \(2n - 1\) choices for the second person in the pair (\(2n\) people total, and 1 has been taken out) and \(n - 1\) choices of ‘same color’ hats (i.e., if we picked a person with a red hat for the first person in the pair, there are \(n - 1\) red hats left). By the naive definition of probability, this is \(\frac{n - 1}{2n - 1}\). Putting it all together:

\[E(X) = n(\frac{n - 1}{2n - 1})\]


  2. We can use a similar approach here. Let \(Y = I_1 + ... + I_n\), where \(I_j\) is the indicator that the \(j^{th}\) pair is not a color match. By linearity and symmetry, we have \(E(Y) = nE(I_n)\). By the Fundamental Bridge, we need to find the probability that a random pair is not a color match. Once we pick the first person for the pair, we have \(2n - 1\) choices for the second person, and \(n\) of these choices have a different color hat (we haven’t taken out any of the opposite color hats). By the naive definition of probability:

\[E(Y) = n(\frac{n}{2n - 1})\]

  3. We can see that \(E(Y)\) is slightly bigger (the numerator is \(n\) vs. \(n - 1\)). This makes sense. Imagine selecting a person with a red hat as the first person in the pair. There is now one less ‘red hat person’ in the population, whereas the number of ‘blue hat people’ has stayed the same; there is therefore a lower probability of selecting a ‘red hat person’ as the second person in the pair and thus creating a ‘color match’. As \(n\) gets large, the difference between \(E(X)\) and \(E(Y)\) is small. This makes sense: in the example we just discussed, even though we take a ‘red hat person’ out of the population, the population is so large that taking one person out doesn’t make much of a difference.


  4. \(E(X) + E(Y) = n\), because there are \(n\) pairs and each pair is either color matching or not color matching (no other alternatives). We could have used this to solve part b. in a more elegant way (knowing \(E(X)\), you can simply say \(E(Y) = n - E(X)\)).


We can also use this fact to ‘sanity check’ our answers; they should add to \(n\).

\[n(\frac{n - 1}{2n - 1}) + n(\frac{n}{2n - 1})\] We add fractions since we have common denominators. \[= n\frac{n - 1 + n}{2n - 1}\] \[= n\frac{2n - 1}{2n - 1} = n\]


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#define a simple parameter
n = 10

#create a vector for people; red hats are 1, blue are 0
people = c(rep(1, n), rep(0, n))

#keep track of X and Y
X = rep(0, sims)
Y = rep(0, sims)


#run the loop
for(i in 1:sims){
  
  #generate a random pairing (the first two are paired, etc.)
  pairs = sample(people)
  
  #iterate through the pairs, see if they are matching
  for(j in 1:n){
    
    #only way to have a non-matching pair is if the sum is 1
    if(sum(pairs[(2*j - 1):(2*j)]) == 1){
      Y[i] = Y[i] + 1
    }
    
    #otherwise, we have a matching pair
    if(sum(pairs[(2*j - 1):(2*j)]) != 1){
      X[i] = X[i] + 1
    }
  }
}

#should get n*(n - 1)/(2*n - 1) and n*n/(2*n - 1), or 4.74 and 5.26
mean(X); mean(Y)
## [1] 4.736
## [1] 5.264
#should always get n
table(X + Y)
## 
##   10 
## 1000




3.12

Imagine a table with \(n\) chairs. Each round, we ‘toggle’ each chair (chairs can either be occupied or empty, and ‘toggling’ means to switch to the other state, like from empty to occupied) independently with probability 1/2 (if we don’t toggle a chair, it stays in the same state). Let \(X_t\) be the number of filled seats at round \(t\), and start at \(t = 1\).

  1. Find \(E(X_t)\).

  2. Let \(M\) be the first time that \(X_t = n\); that is, the first time that the table is full. Find \(E(M)\).



Analytical Solution:

  1. We can say that \(X_t \sim Bin(n, 1/2)\), since every round we flip states or keep states with equal probabilities. Therefore, we get \(E(X_t) = n/2\).

  2. On every round, we have probability \(\frac{1}{2^n}\) of filling the table, since we require each chair to be filled and each chair has 1/2 probability of being filled every round. So, we can say that \(M \sim FS(\frac{1}{2^n})\) and thus \(E(M) = 2^n\).


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#define simple parameters
n = 5
t = 3

#keep track of Xt and M
Xt = rep(NA, sims)
M = rep(NA, sims)


#run the loop
for(i in 1:sims){
  
  #initialize X; keep track of full chairs
  X = integer(0)
  
  #first, run for t rounds
  for(j in 1:t){
    
    #define full chairs as 1, empty as 0, sample from a Binomial
    X = c(X, sum(rbinom(n, 1, 1/2)))
  }
  
  #mark Xt
  Xt[i] = X[t]
  
  #re-initialize X
  X = integer(0)
  
  #keep a counter
  count = 0
  
  #go until we get a full table
  while(sum(X) < n){
   
    #define full chairs as 1, empty as 0, sample from a Binomial
    X = rbinom(n, 1, 1/2)
    
    #increment
    count = count + 1
  }
  
  #see how many rounds we took
  M[i] = count
}


#should get n/2 = 2.5 and 2^n = 32
mean(Xt); mean(M)
## [1] 2.526
## [1] 33.69




3.13

Imagine a restaurant with \(n > 1\) tables. Each table has \(k \leq n\) chairs. There are \(n\) people in the restaurant, and they are randomly assigned to the chairs in the restaurant. Let \(X\) be the number of empty tables. Find \(E(X)\).



Analytical Solution:

We can write \(X = I_1 + ... + I_n\), where \(I_j\) is the indicator that the \(j^{th}\) table is empty. Taking expectations:

\[E(X) = E(I_1 + ... + I_n)\] \[= E(I_1) + ... + E(I_n) = nE(I_1) = nP(I_1 = 1)\]

Where the last steps hold by linearity, symmetry and the fundamental bridge. Now consider the probability that the first table is empty. If the first table is empty, none of the \(n\) people sit in that table’s \(k\) chairs; the \(n\) occupied chairs must instead come from the other \((n - 1)k\) chairs in the restaurant. We count the ways to choose those \(n\) chairs and divide by the total number of ways to choose \(n\) chairs out of all of the \(nk\) chairs:

\[E(X) = n \frac{{(n - 1)k \choose n}}{{nk \choose n}}\]

We can employ a quick sanity check; if \(k = 1\), then we should have \(X = 0\) always, since we have \(n\) tables each with 1 chair for a total of \(n\) chairs for the \(n\) people. The people fill up the chairs and thus there are no empty tables, so \(X = 0\). If we plug \(k = 1\) into the formula above, we get \({n - 1 \choose n}\) in the numerator. Since \(n - 1 < n\), there are 0 ways to choose \(n\) chairs out of the \(n - 1\) available chairs, so the calculation comes out to 0, as desired.


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#define simple parameters
n = 10
k = 4

#define the chairs; label a chair by its table number
chairs = rep(1:n, k)

#keep track of number of empty tables
X = rep(NA, sims)

#run the loop
for(i in 1:sims){
  
  #sample tables, by sampling chairs
  tables = unique(sample(chairs, n, replace = FALSE))
  
  #see how many empty tables we have
  X[i] = n - length(tables)
}

#should get n*choose((n - 1)*k, n)/choose(n*k, n) = 3
mean(X)
## [1] 3.032




3.14

We know that the Geometric distribution is memoryless, and if \(Y \sim Geom(p)\), that \((Y + 1) \sim FS(p)\).


  1. Find the CDF of \(X\) if \(X \sim FS(p)\).

  2. Recall the technical condition for memorylessness:

\[P(X \geq n + k | X \geq n) = P(X \geq k)\]

Test to see if \(Y\) is memoryless.


  3. Provide some intuition for your result in part b.



Analytical Solution:

  1. We need to find \(P(X \leq x)\), and we know \(P(X \leq x) = 1 - P(X > x)\). Consider the RHS. To observe \(X > x\), the first \(x\) trials must all be failures (we then have \(X \geq x + 1\): at least \(x\) failures followed by the eventual success). So, \(P(X > x) = q^x\), which implies that \(P(X \leq x) = 1 - q^x\).

  2. Consider the LHS of the condition. We can expand it using the formula for conditional probability:

\[P(X \geq n + k | X \geq n) = \frac{P(X \geq n + k \cap X \geq n)}{P(X \geq n)}\]

We know that the event \(X \geq n + k\) is a subset of the event \(X \geq n\), so \(P(X \geq n + k \cap X \geq n) = P(X \geq n + k)\).

\[= \frac{P(X \geq n + k)}{P(X \geq n)}\]

And we found that \(P(X \leq x) = 1 - q^x\), so we can plug in to the formula above (recall that \(P(X \geq x) = 1 - P(X < x) = 1 - P(X \leq x - 1)\), and that we already know the CDF of \(X\)). We get:

\[=\frac{q^{n + k - 1}}{q^{n - 1}} = q^k\]

Recall that we want to compare this to \(P(X \geq k) = 1 - P(X \leq k - 1) = q^{k - 1}\). Since \(q^k \neq q^{k - 1}\), the distribution is not memoryless; the technical condition is not satisfied!

  3. The intuition is that we know that \(X\) will be at least 1. So, if we are sitting at time 0, we know that \(X\) will increase by at least 1. If we are sitting at time 1, we don’t know whether \(X\) will increase to 2 or not. This violates the memoryless property. We can especially see this in the case when \(k = 1\). Here, the RHS of the technical condition gives \(P(X \geq 1) = 1\), since the support of \(X\) is \(1,2,...\). However, the LHS is \(P(X \geq n + 1 | X \geq n)\), which is not necessarily 1.


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#define simple parameters
p = 1/2
k = 1
n = 2


#generate the r.v.s
Y = rgeom(sims, p)
X = Y + 1

#these should be different
length(X[X >= n + k])/length(X[X >= n]); length(X[X >= k])/sims
## [1] 0.4893617
## [1] 1




3.15

This problem is dedicated to Renan Carneiro, Nicholas Larus-Stone, CJ Christian, Juan Perdomo, Matt Goldberg and Dan Fulop.


“President” is a popular card game, and is played with a standard 52-card deck. Let’s consider a 4-person game. Each player is given a ‘title’ based on their past performance in the game. The best two players are “president” and “vice president” (best and second best, respectively) and the bottom two players are “scum” and “vice scum” (worst and second worst, respectively). Each player is dealt a random 13-card hand from the well-shuffled deck. The president gets to choose the two best cards in the scum’s hand; he then gives the scum his two worst cards. A similar transaction occurs between the vice president and vice scum, but with just 1 card.

The best card in the game is a 2, and since this is a standard deck, there are four 2’s in the deck (one for each suit).

  1. Find the expected number of 2’s that each player gets before the ‘transactions’ (i.e., the hand they are dealt before they swap cards).

  2. Assume that the president will always try to take as many 2’s as possible from the vice scum (and that he will never give up a 2). Let \(X\) be the number of 2’s that the president will end up with post-transaction (after he has taken two cards from the scum). Find \(E(X)\).

Hint: Consider \(Y\), the number of 2’s the scum gets post transaction.

  3. Explain why your answer to part (b) is not 2.



Analytical Solution:

  1. Let \(X_i\) be the number of 2’s dealt to the \(i^{th}\) player, for \(i = 1,2,3,4\). We know that \(X_1 + X_2 + X_3 + X_4 = 4\), since we are dealing all of the cards (and thus, all of the 2’s) to the four players (they have all four 2’s between them). By symmetry, \(E(X_i) = E(X_j)\) for all \(i, j\), so we know that each player expects to see one 2.

  2. As in the Hint, we will consider \(Y\). The only way for the scum to get 2’s is if the scum is dealt three or four 2’s (since the president will take up to two 2’s). So, we can find \(P(Y = 1)\) and \(P(Y = 2)\) as the probability that the scum is dealt three or four 2’s, respectively.

\[P(Y = 1) = \frac{{4 \choose 3}{48 \choose 10}}{{52 \choose 13}} = p_1\]

\[P(Y = 2) = \frac{{48 \choose 9}}{{52 \choose 13}} = p_2\]

Consider the first term. We choose the three 2’s that we want with the \({4 \choose 3}\) term, then we choose the rest of the hand with the \({48 \choose 10}\) term (there are 48 non-2 cards left in the deck, and we need 10 to complete a hand). We multiply these terms (multiplication rule) and divide by the total number of ways to select a 13-card hand, or \({52 \choose 13}\). We apply similar arguments to the second term (importantly, there is only one way to choose the four 2’s, or \({4 \choose 4} = 1\), so this term drops out). We then define these terms as \(p_1\) and \(p_2\) to simplify notation.

By LoTUS, then, we know that:

\[E(Y) = p_1 + 2p_2\]

Is the expected number of 2’s that the scum gets. We also know that \(E(X + Y) = 2\), since the president and scum ‘own’ 26 of the 52 cards, and by symmetry we expect half of the four 2’s to end up in their half of the cards. Solving for \(E(X)\):

\[E(X) = 2 - E(Y)\]

\[ = 2 - p_1 - 2p_2\]

Where \(p_1, p_2\) are defined above. Plugging in (a quick numeric check is added after part 3 below):

\[E(X) \approx 1.953\]

  3. 2 is the expected number of 2’s between the scum and president; however, the president doesn’t have complete control over the scum’s cards: he can only take 2 of them! For example, even if the scum has all four 2’s, the president can only take two of them. If the president could take as many cards as he wants (i.e., he has control over 26 of the 52 cards) his expected number of 2’s would be 2. However, there is a slim chance that the scum gets three or four 2’s, and the president misses out on some 2’s; again, he doesn’t have complete control.
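A quick numeric evaluation of the expressions from part 2 (added here as a check; it only uses base R’s choose and is not part of the original solution):

#evaluate p1, p2 and E(X) = 2 - p1 - 2*p2; should be about 1.953
p1 = choose(4, 3)*choose(48, 10)/choose(52, 13)
p2 = choose(48, 9)/choose(52, 13)
2 - p1 - 2*p2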


Empirical Solution:

#replicate
set.seed(110)
sims = 1000


#keep track of X
X = rep(NA, sims)

#keep track of all of the 2's between the president and scum
twos = rep(NA, sims)

#run the loop
for(i in 1:sims){
  
  #define the deck; we only need values, not suits
  deck = rep(1:13, 4)
  
  #generate the two hands
  hand1 = sample(1:52, 13)
  pres = deck[hand1]
  
  #remove the cards in the president's hand from the deck
  deck = deck[-hand1]
  
  #select the second hand
  hand2 = sample(1:39, 13)
  scum = deck[hand2]
  
  
  #count how many 2's the president gets; he can get 
  #   at most two 2's from the scum
  X[i] = length(pres[pres == 2]) + min(length(scum[scum == 2]), 2)
  
  #see how many 2's there are total
  twos[i] = length(pres[pres == 2]) + length(scum[scum == 2])
}

#should get 1.95 for the mean
mean(X)
## [1] 1.934
#should get 2 for this mean; it's slightly higher
#   than the 2's that the president gets, since the 
#   president can take at most two 2's from the scum!
mean(twos)
## [1] 1.968




3.16

You are dealt a random 5-card hand from a standard, well-shuffled 52-card deck. Let \(X\) be the number of Aces that you get. Find \(Var(X)\).



Analytical Solution:

We can recognize that \(X \sim HGeom(4, 48, 5)\), since we have 48 non-aces (unfavorable cards) and 4 aces (favorable cards) and we draw 5 cards total. You can look up the variance of a Hypergeometric (as well as key facts about all of the distributions) on William Chen’s cheatsheet. If we perform the calculation, we see that \(Var(X) = .327\).
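As a quick added check (not part of the original solution), we can plug \(N = 52\), \(n = 5\), \(p = 4/52\) into the Hypergeometric variance formula \(\frac{N - n}{N - 1}npq\):

#evaluate the Hypergeometric variance formula; should be about .327
((52 - 5)/(52 - 1))*5*(4/52)*(48/52)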


#replicate
set.seed(110)
sims = 1000

#keep track of the number of aces
X = rep(NA, sims)

#define the deck; simply 1 for aces and 0 for non-aces
deck = c(rep(1, 4), rep(0, 48))

#run the loop
for(i in 1:sims){
  
  #draw a hand
  hand = sample(deck, 5, replace = FALSE)
  
  #see how many aces we got
  X[i] = sum(hand)
}

#should get .327
var(X)
## [1] 0.3325886




3.17

Let \(I_j\) be the indicator that you roll a \(j\) on one roll of a fair die, and let \(X = I_1 + I_2 + ... + I_6\). Nick claims that \(X \sim Bin(1, 1/6)\), since \(X\) is the sum of \(Bern(1/6)\) r.v.’s. Argue for or against his claim.



Analytical Solution:

The key is that \(I_1, I_2, ..., I_6\) are all highly dependent; no more than one can take on value 1 at a time. In fact, we know that \(X = 1\), or is a constant (degenerate random variable), since the die will always take on exactly one value from 1 to 6. We could say \(X \sim Bin(1, 1)\) (one trial with probability 1 of success) but saying \(X \sim Bin(1, 1/6)\) is incorrect.


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#roll the die many times
rolls = sample(1:6, sims, replace = TRUE)

#set paths for the indicators
I1 = rep(0, sims)
I2 = rep(0, sims)
I3 = rep(0, sims)
I4 = rep(0, sims)
I5 = rep(0, sims)
I6 = rep(0, sims)

#fill in the indicators
I1[rolls == 1] = 1
I2[rolls == 2] = 1
I3[rolls == 3] = 1
I4[rolls == 4] = 1
I5[rolls == 5] = 1
I6[rolls == 6] = 1

#calculate X
X = I1 + I2 + I3 + I4 + I5 + I6

#X should always be 1
table(X)
## X
##    1 
## 1000




3.18

Juan is reading a book with 10 pages. He starts on page 3 and flips forward a page (i.e., page 3 to 4) with probability \(p\) and backwards with probability \(1 - p\). What is the probability that he flips to the end of the book (flips to page 10) before flipping backwards more than once?



Analytical Solution:

The event described in the prompt can happen in two ways: Juan gets 7 ‘forward’ flips in a row, or Juan gets 7 forward flips and one back flip in his first 8 flips (excluding the sequence ‘7 forward flips followed by 1 back flip’, since this gets us to 10) and then 1 forward flip. The first ‘sequence pattern’ (i.e., 7 forward flips in a row) has probability \(p^7\). There are \({8 \choose 7} - 1\) possible sequences that follow the second ‘sequence pattern’ (we subtract out 1 because we want to exclude ‘7 forward flips followed by 1 back flip’) and each of these sequences has probability \(p^8 q\) (since we have 7 forward and 1 back, and then 1 more forward at the end). Putting it all together, we get the probability:

\[p^7 + ({8 \choose 7} - 1)p^8 q\]


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#indicator if we have a success
success = rep(0, sims)

#define a simple parameter
p = .9

#run the loop
for(i in 1:sims){
  
  #initialize the page
  page = 3
  
  #count how many back steps we take
  back = 0
  
  #go until we hit page 10
  while(page < 10){
    
    #flip to see if we go up or down
    flip = runif(1)
    
    #go up
    if(flip <= p){
      page = page + 1
    }
    
    #go down
    if(flip > p){
      page = page - 1
      
      #increment count of back steps
      back = back + 1
      
      #if we have more than one back step, break the loop
      if(back > 1){
        break
      }
    }
  }
  
  #see if we got to 10
  if(page == 10){
    success[i] = 1
  }
}

#these should match
p^7 + (choose(8, 7) - 1)*p^8*(1 - p); mean(success)
## [1] 0.7796239
## [1] 0.778




3.19

A binary number is a number expressed in the base-2 numeral system; it only includes the digits 0 and 1. It is perhaps best understood with an example. The binary number…

\[1 \; 0 \; 1 \; 1 \; 0\]

…can easily be written in our standard, decimal (base-10) system, by multiplying each digit by the appropriate power of 2 (based on its location in the string) and summing. That is, we convert to decimal with the calculation \(2^4 + 0 + 2^2 + 2^1 + 0 = 22\) (the left-most ‘1’ is in the fifth spot from the right, so it contributes \(2^4\); the next digit is a 0, so it contributes nothing; and so on).

  1. Consider a binary number with \(n\) digits. How many possible decimal numbers can we express with this binary number?


For the next two parts, consider a binary number with \(n\) digits, where each digit is randomly assigned 0 or 1 with equal probabilities. Let \(X\) be the decimal value of this binary number (i.e., the value when we translate it to base-10).

  2. Find the probability that \(X\) is even.

  3. Find the distribution of \(X\).



Analytical Solution:

  1. Each digit has 2 choices (0 or 1) so by the multiplication rule we can express \(2^n\) different numbers.

  2. The only way that \(X\) can be even is if the right-most binary digit is 0 (if that digit were 1, we would add \(2^0 = 1\) and \(X\) would be odd). This has probability 1/2.

  3. Every distinct binary sequence (i.e., 010 is distinct from 001) maps to a unique decimal number. Since each of the \(2^n\) binary strings maps to exactly one (unique) decimal number between 0 and \(2^n - 1\), the distribution of \(X\) is uniform over \(0, 1, ..., 2^n - 1\) (i.e., \(X\) has equal probability \(1/2^n\) of taking on any of these values). You can also think of this as ‘order mattering’ in the binary string: that is, 0101 is different from 1100, even though both strings have two 1’s and two 0’s.


Empirical Solution:

#replicate
set.seed(110)
sims = 1000


#define a simple parameter
n = 4


#part a.
#the 'permutations' function used below comes from the gtools package
library(gtools)

#keep track of the permutations; initialize here
perms = 0

#iterate over n; this chooses how many 1's we have
for(i in 0:n){
  
  #add the number of permutations with i 1's and n - i 0's
  perms = perms + dim(unique(permutations(n, n, v = c(rep(1, i), rep(0, n - i)), set = FALSE)))[1]
}

#should get 2^n = 16
perms
## [1] 16
#part b. and c.
#keep track of decimal numbers 
X = rep(NA, sims)

#run the loop
for(i in 1:sims){
  
  #generate a random binary number using a Binomial
  binary = rbinom(n, 1, 1/2)
  
  #convert binary to a string; this is what the function strtoi takes
  binary = paste(as.character(binary), collapse = "")
  
  #calculate the binary value using the function 'strtoi'
  X[i] = strtoi(binary, 2)
}


#should be even half of the time
length(X[X%%2 == 0])/sims
## [1] 0.474
#should be uniform from 0 to 2^n - 1 = 15
hist(X, main = "X", col = rgb(0,1,0,1/4), xlab = "", breaks = (-1:2^n))







3.20

This riddle is a common interview problem:


A king has imprisoned 10 subjects. He offers the prisoners a strange way that they can play for their release: each day, he will randomly and independently call one of the 10 prisoners in. That prisoner will be allowed to either guess if all of the prisoners have already been called in, or ‘pass’ and return to the dungeon for another day. If a prisoner correctly guesses that all of the prisoners have been called in already, the prisoners are set free; otherwise, they lose.

The prisoners are kept in different cells and are not allowed to communicate except through one strange channel: in the king’s chamber sits a standard, two-sided coin, and every day, the prisoner that is randomly selected to be summoned may choose to flip the coin over (heads to tails or tails to heads) or simply leave the coin unflipped. The coin starts with heads showing.

The ‘random sampling’ of the king (picking one of the 10 prisoners each day) is truly random and independent across days. The prisoners are allowed to discuss a strategy session before entering their cells. How can they guarantee their escape?


The answer to the riddle is as follows: the prisoners assign one prisoner to be the ‘White Knight’ who simply keeps track of the number of prisoners that have been called. To do this, the prisoners institute a rule: if a prisoner that is not the White Knight is summoned and he sees that the coin shows heads, he flips the coin over to show tails. From there, no other non-White Knight prisoner will flip the coin (it shows tails, not heads) and when the White Knight enters and sees the coin showing tails, he will add one to his count and flip the coin back over to show heads (the process begins again). Once a non-White Knight prisoner has flipped the coin from heads to tails, he doesn’t flip it again, even if he sees heads (to avoid double-counting). When the White Knight has flipped the coin 9 times (he has counted the other 9 prisoners) he can safely tell the king that all of the prisoners have been called.

If the prisoners use this strategy, how long will it take, on average, for them to be freed? How does this compare to the actual average amount of time that it will take for each prisoner to be called in?



Analytical Solution:

First, we will answer how long it will take on average to be freed using this strategy. The total waiting time of the problem can be broken into different ‘waiting segments’; first, we have to wait for a ‘non-White Knight’ prisoner to be called (any of these prisoners will flip the coin). By the story of the First Success distribution, the number of days that we have to wait for this to happen is distributed \(FS(9/10)\); on every trial, there is a \(9/10\) chance of success (getting one of the 9 ‘non-White Knights’), and we want to count the success (the day that the non-White Knight arrives). This has expected value \(10/9\) (this makes sense; the probability of failure is low, so we expect a small number of failures).

After the first non-White Knight flips the coin, we need to wait for the White Knight to arrive to flip the coin back. Similarly, this amount of time has distribution \(FS(1/10)\) (on every draw, there is a \(1/10\) probability that we get the White Knight) so on average we have to wait 10 days. We can add this expected value to the expected value of waiting for the first non-White Knight because of linearity.

We continue to iterate in this way: every time that a coin is flipped, we either have to wait for a non-White Knight prisoner (which appear in increasingly smaller numbers) or the White Knight. We get the expectation sums as:

\[\left(\frac{10}{9} + 10\right) + \left(\frac{10}{8} + 10\right) + ... + \left(\frac{10}{1} + 10\right) \approx 118\] So, on average, the prisoners will have to wait about 118 days.


We can use a similar approach to find the average amount of time for each prisoner to be called in. The first prisoner that will be called in is guaranteed to be ‘unique’ (has not been called in before). Following that, we have a \(9/10\) probability on every draw that we get a ‘unique’ prisoner (someone other than the prisoner that was already called). The amount of time that we spend waiting for this arrival is distributed \(FS(9/10)\), so it has expectation \(10/9\). Continuing in this way, we have (by linearity) that the expected number of days is:

\[1 + 10/9 + 10/8 + ... + 10 \approx 29\]

Or much less than the average amount of time for the strategy to pan out; but at least the prisoners guarantee their release!
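As a quick numeric check of these two sums (added here; base R only, not part of the original solution):

#waiting time under the coin strategy; should be about 118.3
sum(10/(1:9)) + 9*10

#waiting time until every prisoner has simply been called; should be about 29.3
sum(10/(1:10))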


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#keep track of the number of days until freedom and 
#   days until all prisoners have actually entered
days.free = rep(0, sims)
days.enter = rep(0, sims)


#run the loop
for(i in 1:sims){

  
  #first part
  #initialize prisoners; first, all prisoners are 'unique'
  prisoners = rep(1, 10)
  
  #iterate
  while(TRUE){
    
    #increment
    days.enter[i] = days.enter[i] + 1
    
    #see if we get a unique prisoner (1)
    #   change that prisoner to non-unique if so
    if(sample(prisoners, 1) == 1){
      
      #take off the first entry
      prisoners = prisoners[-1]
      
      #append a non-unique prisoner to the end
      prisoners = c(prisoners, 0)
    }
    
    #see if we have called all prisoners
    if(sum(prisoners) == 0){
      break
    }
  }
  
  
  
  
  #second part
  #iterate over number of eligible prisoners
  for(j in 0:8){
    
    
    #first, wait for the next eligible prisoner (0)
    #   re-initialize prisoners
    prisoners = c(1, rep(1, j), rep(0, 9 - j))
    
    #draw until we get the next eligible prisoner
    while(TRUE){
      
      #increment
      days.free[i] = days.free[i] + 1
      
      #0 means we get an eligible prisoner
      if(sample(prisoners, 1) == 0){
        break
      }
    }
    
    #next, draw until we get the white knight
    while(TRUE){
      
      #increment
      days.free[i] = days.free[i] + 1
      
      #1 means we get the white knight
      if(sample(c(1, rep(0, 9)), 1) == 1){
        break
      }
    }
  }
}


#should get 29 and 118
mean(days.enter)
## [1] 29.083
mean(days.free)




BH Problems



The problems in this section are taken from @BH. The questions are reproduced here, and the analytical solutions are freely available online. Here, we will only consider empirical solutions: answers/approximations to these problems using simulations in R.




BH 3.18
  1. In the World Series of baseball, two teams (call them A and B) play a sequence of games against each other, and the first team to win four games wins the series. Let \(p\) be the probability that A wins an individual game, and assume that the games are independent. What is the probability that team A wins the series?
#replicate
set.seed(110)
sims = 1000

#define a simple parameter for P(A wins)
p = 1/2

#the number of games A wins is Bin(7, p), so we generate a r.v.
X = rbinom(sims, 7, p)

#find the probability that A won more than 3 games (i.e., won the series)
#should get p^4 + choose(4,3)*p^4*(1 - p) + choose(5, 3)*p^4*(1 - p)^2 + choose(6, 3)*p^4*(1 - p)^3 = 1/2
length(X[X > 3])/sims
## [1] 0.487
#we could also do a simulation where we stop after one team has won 4 games

#indicator for A winning 4 games
A.wins = rep(0, sims)

#run the loop
for(i in 1:sims){
  
  #initialize number of wins for A and B
  A = 0
  B = 0
  
  #go until someone wins
  while(A < 4 && B < 4){
    
    #see if A or B wins
    if(runif(1) <= p){
      A = A + 1
    }
    else{
      B = B + 1
    }
  }
  
  #see if A won, mark it
  if(A == 4){
    A.wins[i] = 1
  }
}

#should get 1/2, like above
mean(A.wins)
## [1] 0.446
  2. Give a clear intuitive explanation of whether the answer to (a) depends on whether the teams always play 7 games (and whoever wins the majority wins the series), or the teams stop playing more games as soon as one team has won 4 games (as is actually the case in practice: once the match is decided, the two teams do not keep playing more games).
#In part a., we simulated both of these cases and got the same result.




BH 3.28

There are \(n\) eggs, each of which hatches a chick with probability \(p\) (independently). Each of these chicks survives with probability \(r\), independently. What is the distribution of the number of chicks that hatch? What is the distribution of the number of chicks that survive? (Give the PMFs; also give the names of the distributions and their parameters, if applicable.)

#replicate
set.seed(110)
sims = 1000

#define simple parameters
n = 10
p = 1/2
r = 1/2

#count how many hatch, and how many survive
hatch = rep(0, sims)
survive = rep(0, sims)


#run the loop
for(i in 1:sims){
  
  #iterate for each of the n eggs
  for(j in 1:n){
    
    #see if the egg hatches
    if(runif(1) < p){
      
      #increment the number of hatches
      hatch[i] = hatch[i] + 1
      
      #see if the chick survives
      if(runif(1) < r){
        
        #increment the number of survivors
        survive[i] = survive[i] + 1  
      }
    }
  }
}


#show that the distributions are the same

hist(hatch, main = "Empirical # of Hatches", xlab = "", col = "black", ylim = c(0, sims))

hist(rbinom(sims, n, p),  main = "Bin(n, p)", xlab = "", col = "firebrick3", ylim = c(0, sims))

hist(survive, main = "Empirical # of Survivors", xlab = "", col = "firebrick3", ylim = c(0, sims))

hist(rbinom(sims, n, p*r),  main = "Bin(n, rp)", xlab = "", col = "black", ylim = c(0, sims))




BH 3.29

A sequence of \(n\) independent experiments is performed. Each experiment is a success with probability \(p\) and a failure with probability \(q=1-p\). Show that conditional on the number of successes, all valid possibilities for the list of outcomes of the experiment are equally likely.

#replicate
set.seed(110)
sims = 1000

#define simple parameters; use n = 2 so we only have two potential orderings
n = 2
p = 1/2

#keep track of the successes
successes = rep(NA, sims)

#indicators for the 0,1 and 1,0 outcomes (failure first, success first)
failure.first = rep(0, sims)
success.first = rep(0, sims)

#run the loop
for(i in 1:sims){
  
  #draw the outcome
  outcome = rbinom(2, 1, p)
  
  #see if we got a pattern
  if(isTRUE(all.equal(outcome, c(0, 1)))){
    failure.first[i] = 1
  }
  if(isTRUE(all.equal(outcome, c(1, 0)))){
    success.first[i] = 1
  }
  
  #mark the number of successes
  successes[i] = sum(outcome)
}

#conditional on 1 success, should get the same probability for each outcome
mean(success.first[successes == 1])
## [1] 0.4486166
mean(failure.first[successes == 1])
## [1] 0.5513834




BH 3.37

A message is sent over a noisy channel. The message is a sequence \(x_1,x_2, \dots, x_n\) of \(n\) bits (\(x_i \in \{0,1\}\)). Since the channel is noisy, there is a chance that any bit might be corrupted, resulting in an error (a 0 becomes a 1 or vice versa). Assume that the error events are independent. Let \(p\) be the probability that an individual bit has an error (\(0<p<1/2\)). Let \(y_1, y_2, \dots, y_n\) be the received message (so \(y_i = x_i\) if there is no error in that bit, but \(y_i = 1-x_i\) if there is an error there).

To help detect errors, the \(n\)th bit is reserved for a parity check: \(x_n\) is defined to be 0 if \(x_1 + x_2 +\dots + x_{n-1}\) is even, and 1 if \(x_1 + x_2+ \dots + x_{n-1}\) is odd. When the message is received, the recipient checks whether \(y_n\) has the same parity as \(y_1 + y_2+ \dots + y_{n-1}\). If the parity is wrong, the recipient knows that at least one error occurred; otherwise, the recipient assumes that there were no errors.

  1. For \(n=5,p=0.1\), what is the probability that the received message has errors which go undetected?
#replicate
set.seed(110)
sims = 1000

#define parameters
n = 5
p = .1

#indicator if we get an undetected error
error.und = rep(0, sims)

#run the loop
for(i in 1:sims){
  
  #create the first four bits in the message
  message = sample(c(1,0), n - 1, replace = TRUE)
  
  #add the last bit (0 if even, 1 if odd)
  if(sum(message)%%2 == 0){
    message = c(message, 0)
  }
  else if(sum(message)%%2 == 1){
    message = c(message, 1)
  }
  
  #generate the message received; first, the vector of error probabilities
  error.probs = runif(n)
  message.rec = integer(0)
  
  #count how many errors we had
  errors = 0
  
  #fill in what you receive
  for(j in 1:n){
    
    #see if there was an error
    #no error
    if(error.probs[j] > p){
      message.rec = c(message.rec, message[j])
    }
    
    #error
    if(error.probs[j] < p){
      message.rec = c(message.rec, 1 - message[j])
      
      #increment the count of errors received
      errors = errors + 1
    }  
  }
  
  #see if there was an error
  if(errors > 0){
    
    #see if the parity of the last bit matches the first n - 1 bits
    if(message.rec[n]%%2 == sum(message.rec[1:(n - 1)])%%2){
      
      #we have an undetected error
      error.und[i] = 1
    }
  }
}

#should get .073
mean(error.und)
## [1] 0.074
  2. For general \(n\) and \(p\), write down an expression (as a sum) for the probability that the received message has errors which go undetected.

  3. Give a simplified expression, not involving a sum of a large number of terms, for the probability that the received message has errors which go undetected. (A brief numeric sketch for parts b. and c. is added below.)
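The analytical answers for parts b. and c. are available online; the following added numeric sketch (not part of the original solutions) uses the fact that an undetected error means an even, nonzero number of flipped bits, and the standard identity that the probability of an even number of errors is \(\frac{1 + (1 - 2p)^n}{2}\).

#added sketch for parts b. and c.; both should match the simulation above (about .073)
n = 5
p = .1

#part b.: sum P(exactly i errors) over even, nonzero i
i = seq(2, n, by = 2)
sum(choose(n, i)*p^i*(1 - p)^(n - i))

#part c.: P(even number of errors) minus P(no errors)
(1 + (1 - 2*p)^n)/2 - (1 - p)^n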




BH 3.42

Let \(X\) be a random day of the week, coded so that Monday is 1, Tuesday is 2, etc. (so \(X\) takes values 1,2,…,7, with equal probabilities). Let \(Y\) be the next day after \(X\) (again represented as an integer between 1 and 7). Do \(X\) and \(Y\) have the same distribution? What is \(P(X < Y)\)?

#replicate
set.seed(110)
sims = 1000

#draw X and Y
X = sample(1:7, sims, replace = TRUE)
Y = X + 1

#adjust the Y; shift 8 to 1
Y[Y == 8] = 1

#see if X and Y have the same distribution

hist(X, col = "black", main = "X", xlab = "x")

hist(Y, col = "firebrick3", main = "Y", xlab = "y")

#Find P(X < Y), should be 6/7
length(X[X < Y])/sims
## [1] 0.87




BH 4.13

Are there discrete random variables \(X\) and \(Y\) such that \(E(X) > 100E(Y)\) but \(Y\) is greater than \(X\) with probability at least 0.99?

#replicate
set.seed(110)

#increased number of sims; rare events
sims = 5000

#Let Y be Bern(1), so it's always 1, and X be (10^6)*Bern(.001), so once in a
#   while it is huge; then E(X) = 1000, which is more than 100*E(Y) = 100
#generate data for x and y
x = sample(c(0, 10^6), sims, replace = T, prob = c(1 - .001, .001))
y = rep(1, sims)


#E(X) should be well above 100*E(Y) = 100
mean(x)
## [1] 800
mean(y)
## [1] 1
#y is usually greater than x
length(y[y > x])/sims
## [1] 0.9992




BH 4.17

A couple decides to keep having children until they have at least one boy and at least one girl, and then stop. Assume they never have twins, that the “trials” are independent with probability 1/2 of a boy, and that they are fertile enough to keep producing children indefinitely. What is the expected number of children?

#replicate
set.seed(110)
sims = 1000

#keep track of kids
kids = rep(NA, sims)

#run the loop
for(i in 1:sims){
  
  #indicator if we have a girl and boy
  girl = 0
  boy = 0
  
  #keep having kids until we have one of both
  while(girl == 0 || boy == 0){
    
    #have a kid
    kid = runif(1)
    
    #had a boy
    if(kid < 1/2){
      boy = boy + 1
    }
    
    #had a girl
    else if(kid > 1/2){
      girl = girl + 1
    }
  }
  
  #mark how many kids were had
  kids[i] = boy + girl
}

#should be 3
mean(kids)
## [1] 2.996
#kids looks like X + 2, where X is Geom(1/2)
#graphics
hist(kids, col = "gray", xlab = "Number of Kids", main = "Empirical Sample")

hist(rgeom(sims, 1/2) + 2, col = "firebrick3", xlab = "Number of Kids", main = "Geom(1/2) + 2")




BH 4.18

A coin is tossed repeatedly until it lands Heads for the first time. Let \(X\) be the number of tosses that are required (including the toss that landed Heads), and let \(p\) be the probability of Heads, so that \(X \sim FS(p)\). Find the CDF of \(X\), and for \(p=1/2\) sketch its graph.

Hint: Recall that a First Success (\(FS\)) distribution is a Geometric where we count the success.

#replicate
set.seed(110)
sims = 1000

#count how many flips it took
flips = rep(NA, sims)

#run the loop
for(i in 1:sims){
  
  #go until we have a heads
  heads = 0
  tails = 0
  
  #loop until we get a heads, then break
  while(heads == 0){
    
    #flip the coin
    flip = runif(1)
    
    #got a H
    if(flip > 1/2){
      heads = heads + 1
    }
    
    #got a T
    else if(flip < 1/2){
      tails = tails + 1
    }
  }
  
  #count how many flips we get
  flips[i] = tails + heads
}

#mean should be 2
mean(flips)
## [1] 1.982
#ecdf plots a function in R
plot(ecdf(flips), main = "CDF of Flips", ylab = "P(X <= x)", xlab = "x")
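We can overlay the analytic CDF of a \(FS(1/2)\), \(P(X \leq x) = 1 - (1/2)^x\), on the empirical CDF (an added check; run this after the plot call above):

#overlay the analytic CDF P(X <= x) = 1 - (1/2)^x for x = 1, 2, ..., 10
points(1:10, 1 - (1/2)^(1:10), col = "firebrick3", pch = 16)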




BH 4.20

Let \(X \sim \textrm{Bin}(n,\frac{1}{2})\) and \(Y \sim \textrm{Bin}(n+1,\frac{1}{2})\), independently. (This problem has been revised from that in the first printing of the book, to avoid overlap with Exercise 3.25.)

  1. Let \(V = \min(X,Y)\) be the smaller of \(X\) and \(Y\), and let \(W = \max(X,Y)\) be the larger of \(X\) and \(Y\). So if \(X\) crystallizes to \(x\) and \(Y\) crystallizes to \(y\), then \(V\) crystallizes to \(\min(x,y)\) and \(W\) crystallizes to \(\max(x,y)\). Find \(E(V) + E(W)\).

  2. Show that \(E|X-Y| = E(W)-E(V),\) with notation as in (a).

  3. Compute \(Var(n-X)\) in two different ways.

#replicate
set.seed(110)
sims = 1000

#define a simple parameter
n = 10

#generate the r.v.'s
x = rbinom(sims, n, 1/2)
y = rbinom(sims, n + 1, 1/2)

#keep track of the mins and maxes
v = rep(NA, sims)
w = rep(NA, sims)

#run the loop
for(i in 1:sims){
  v[i] = min(x[i], y[i])
  w[i] = max(x[i], y[i])
}

#part a., should be n + 1/2 = 10.5
mean(v) + mean(w)
## [1] 10.412
#part b., these must be equal
mean(abs(x - y))
## [1] 1.94
mean(w) - mean(v)
## [1] 1.94
#part c., should be 10/4 = 2.5
var(n - x)
## [1] 2.403082




BH 4.22

Alice and Bob have just met, and wonder whether they have a mutual friend. Each has 50 friends, out of 1000 other people who live in their town. They think that it’s unlikely that they have a friend in common, saying “each of us is only friends with 5% of the people here, so it would be very unlikely that our two 5%’s overlap.”

Assume that Alice’s 50 friends are a random sample of the 1000 people (equally likely to be any 50 of the 1000), and similarly for Bob. Also assume that knowing who Alice’s friends are gives no information about who Bob’s friends are.

  1. Compute the expected number of mutual friends Alice and Bob have.

  2. Let \(X\) be the number of mutual friends they have. Find the PMF of \(X\).

  3. Is the distribution of \(X\) one of the important distributions we have looked at? If so, which?

#replicate
set.seed(110)
sims = 1000

#count number of mutual friends
mutual = rep(NA, sims)

for(i in 1:sims){
  
  #sample both of their friends
  A = sample(1:1000, 50, replace = F)
  B = sample(1:1000, 50, replace = F)
  
  #count the overlap
  mutual[i] = length(intersect(A, B))
}

#part a., should be 2.5
mean(mutual)
## [1] 2.538
#histograms should match
#empirical
hist(mutual, col = "gray", main = "Empirical PMF of Mutual Friends",
     xlab = "Mutual Friends", freq = F)

#analytical PMF; X is distributed HGeom(50, 950, 50)
hist(rhyper(sims, 50, 950, 50), col = "firebrick3", main = "PMF of a HGeom",
     freq = F, xlab = "")
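To make part b. concrete, here is an added comparison (not part of the original solution) of a few \(HGeom(50, 950, 50)\) PMF values, via base R’s dhyper, against the empirical proportions from the simulation above:

#added check: HGeom(50, 950, 50) PMF vs. empirical proportions of mutual friends
round(dhyper(0:5, 50, 950, 50), 3)
round(table(factor(mutual, levels = 0:5))/sims, 3)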




BH 4.24

Calvin and Hobbes play a match consisting of a series of games, where Calvin has probability \(p\) of winning each game (independently). They play with a “win by two” rule: the first player to win two games more than his opponent wins the match. Find the expected number of games played.

#replicate
set.seed(110)
sims = 1000

#try different values for p
p = seq(from = 0, to = 1, by = .1)

#keep track of the means for each p
means = rep(NA, length(p))

#iterate over p
for(j in 1:length(p)){
  
  #keep track of games played
  games = rep(NA, sims)
  
  #run the loop
  for(i in 1:sims){
    
    #both start with 0 wins, and 0 games overall
    calvin = 0
    hobbes = 0
    count = 0
    
    #play until one is up by 2 (the 'win by two' rule)
    while(abs(calvin - hobbes) < 2){
      
      #flip to see who wins
      flip = runif(1)
      
      #calvin won
      if(flip <= p[j]){
        calvin = calvin + 1
      }
      
      #hobbes won
      else if(flip > p[j]){
        hobbes = hobbes + 1
      }
      
      #increment
      count = count + 1
  
    }
    
    #count how many games were played
    games[i] = count
  }
  
  #take the mean
  means[j] = mean(games)
}

#at p = .5, this should be 2/(2*(.5^2)) = 4
means[p == .5]
## [1] 4.152
plot(p, means, main = "Expected # of Games for different p", 
     ylab = "Average # of Games",
     xlab = "P(Calvin wins a random game)", col = "red", pch = 16)




BH 4.26

Let \(X\) and \(Y\) be \(Pois(\lambda)\) r.v.s, and \(T = X + Y\). Suppose that \(X\) and \(Y\) are not independent, and in fact \(X=Y\). Prove or disprove the claim that \(T \sim {Pois}(2\lambda)\) in this scenario.

#replicate
set.seed(110)
sims = 1000

#set parameter
lambda = 1

#generate the r.v.'s (use Z instead of T); in this scenario X = Y, so Z = 2X
X = rpois(sims, lambda)
Y = X
Z = X + Y

#should not look like a Pois(2);
#notice how T can't take on odd values
hist(Z, col = "gray", main = "T", ylab = "")

hist(rpois(sims, 2*lambda), col = "firebrick3", main = "Pois(2)",
     ylab = "")




BH 4.29

A discrete distribution has the memoryless property if for \(X\) a random variable with that distribution, \(P(X \geq j + k | X \geq j) = P(X \geq k)\) for all nonnegative integers \(j,k\).

  1. If \(X\) has a memoryless distribution with CDF \(F\) and PMF \(p_i = P(X=i)\), find an expression for \(P(X \geq j+k)\) in terms of \(F(j), F(k), p_j,p_k.\)

  2. Name a discrete distribution which has the memoryless property. Justify your answer with a clear interpretation in words or with a computation.

#replicate
set.seed(110)
sims = 1000

#we will try part b; show that the geometric is memoryless,
#   and the binomial is not
#define simple parameters
p = .5
n = 10

#generate the r.v.'s
geoms = rgeom(sims, p)
binoms = rbinom(sims, n, p)


#these should all be close, by memorylessness
mean(geoms[geoms > 0])
## [1] 1.93617
mean(geoms[geoms > 1]) - 1
## [1] 1.913043
mean(geoms[geoms > 2]) - 2
## [1] 1.810345
#these may not be close
mean(binoms[binoms > 0])
## [1] 4.975976
mean(binoms[binoms > 1]) - 1
## [1] 4.028398
mean(binoms[binoms > 2]) - 2
## [1] 3.186766
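For part a., memorylessness gives \(P(X \geq j + k) = P(X \geq j)P(X \geq k)\), and each factor can be written as \(1 - F(\cdot) + p_\cdot\). Here is an added check of the product form on the simulated Geometrics (it reuses the geoms vector from above; the values j = 2, k = 3 are arbitrary):

#added check for part a.: these two quantities should be close
j = 2
k = 3
mean(geoms >= j + k)
mean(geoms >= j)*mean(geoms >= k)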




BH 4.30

Randomly, \(k\) distinguishable balls are placed into \(n\) distinguishable boxes, with all possibilities equally likely. Find the expected number of empty boxes.

#replicate
set.seed(110)
sims = 1000

#define simple parameters
k = 5
n = 10

#count how many empty boxes we have
empties = rep(NA, sims)

#run the loop
for(i in 1:sims){
  
  #sample which boxes to put the objects into
  boxes = sample(1:n, k, replace = T)
  
  #count the number of empty boxes
  empties[i] = n - length(unique(boxes))
  
}

#should be n*(1 - 1/n)^k = 5.9
mean(empties)
## [1] 5.898




BH 4.31

A group of \(50\) people are comparing their birthdays (as usual, assume their birthdays are independent, are not February 29, etc.). Find the expected number of pairs of people with the same birthday, and the expected number of days in the year on which at least two of these people were born.

#replicate
set.seed(110)
sims = 1000


#set the number of people
n = 50

#store number of pairs each time, days with bdays
pairs = rep(NA, sims)
days.bdays = rep(NA, sims)

#run the loop
for(i in 1:sims){

  
  #sample the birthdays
  days = sample(1:365, n, replace = T)
  
  #get the overlaps
  overlaps = sort(as.vector(table(days)), decreasing = T)
  
  #drop when we just have 1 per day
  overlaps = overlaps[overlaps != 1]
  
  #add up all of the pairs we have
  pairs[i] = sum(choose(overlaps, 2))
  
  #count how many days have bdays
  days.bdays[i] = length(overlaps)
  
}

#should be choose(50,2)/365 = 3.35
mean(pairs)
## [1] 3.31
#should be 365*(1 - (364/365)^50 - (50/365)*(364/365)^49) = 3.07
mean(days.bdays)
## [1] 3.038




BH 4.32

A group of \(n \geq 4\) people are comparing their birthdays (as usual, assume their birthdays are independent, are not February 29, etc.). Let \(I_{ij}\) be the indicator r.v. of \(i\) and \(j\) having the same birthday (for \(i<j\)). Is \(I_{12}\) independent of \(I_{34}\)? Is \(I_{12}\) independent of \(I_{13}\)? Are the \(I_{ij}\) independent?

#replicate
set.seed(110)
sims = 1000

#for simplicity/computational ease, assume small n and only 10 days in a year
n = 4
days = 10

#keep track of I_{1,2}, I_{2,3} and I_{1,3},
#   where I_{i, j} is the indicator that i and j share a birthday
I12 = rep(0, sims)
I23 = rep(0, sims)
I13 = rep(0, sims)

#run the loop
for(i in 1:sims){
  
  #draw the birthdays for the n people
  bdays = sample(days, n, replace = TRUE)
  
  #see if we got the indicators
  if(bdays[1] == bdays[2]){
    I12[i] = 1
  }
  if(bdays[2] == bdays[3]){
    I23[i] = 1
  }
  if(bdays[1] == bdays[3]){
    I13[i] = 1
  }
}

#the indicators are pairwise independent (even when they overlap in one person,
#   like I12 and I23), so these means should be similar; this doesn't
#   rigorously prove independence, but gives an intuition
mean(I12); mean(I12[I23 == 1])
## [1] 0.091
## [1] 0.05882353
#indicators are not independent because of this 'forced' case (1 must match 2)
mean(I12); mean(I12[I23 == 1 & I13 == 1])
## [1] 0.091
## [1] 1




BH 4.33

A total of \(20\) bags of Haribo gummi bears are randomly distributed to 20 students. Each bag is obtained by a random student, and the outcomes of who gets which bag are independent. Find the average number of bags of gummi bears that the first three students get in total, and find the average number of students who get at least one bag.

#replicate
set.seed(110)
sims = 1000

#count how many first three get, how many get at least one
first.three = rep(NA, sims)
at.least.one = rep(NA, sims)


#run the loop
for(i in 1:sims){
  
  
  #assign the bags to the student; sample a student and give them a bag
  bags = sample(1:20, 20, replace = T)
  
  #count how many the first 3 get
  first.three[i] = length(bags[bags == 1 | bags == 2 | bags == 3])
  
  #count how many get at least one
  at.least.one[i] = length(unique(bags))
}

#should get 3
mean(first.three)
## [1] 3.013
#should be 20*(1 - (19/20)^20) = 12.83
mean(at.least.one)
## [1] 12.813




BH 4.40

There are 100 shoelaces in a box. At each stage, you pick two random ends and tie them together. Either this results in a longer shoelace (if the two ends came from different pieces), or it results in a loop (if the two ends came from the same piece). What are the expected number of steps until everything is in loops, and the expected number of loops after everything is in loops? (This is a famous interview problem; leave the latter answer as a sum.)

#replicate
set.seed(110)

#decreased number of sims; complex simulation
sims = 100

#keep track of steps and loops
steps = rep(0, sims)
loops = rep(0, sims)

#run the loop
for(i in 1:sims){

  #keep track of ends 
  #each end has three characteristics: what lace it is, what side it's on (left or right), and a unique ID
  ends = matrix(0, nrow = 200, ncol = 3)
  
  #number the ends
  ends[1:100, 1] = 1:100
  ends[101:200, 1] = 1:100
  
  #determine which side
  ends[1:100, 2] = rep("R", 100)
  ends[101:200, 2] = rep("L", 100)
  
  #give each end a unique ID
  ends[1:200, 3] = 1:200
  
  #form a data frame
  ends = data.frame(ends)
  colnames(ends) = c("Lace", "Side", "ID")
  

  #go until we have no laces left (only loops)
  while(length(ends$ID) > 0){
    
    #pick up two random ends, can select from all of the ends
    pick1 = sample(ends$ID, 1)
    
    #mark which end we have (convert from iD to lace number)
    end1 = ends$Lace[ends$ID == pick1]
    
    #sample the other end; anything but the one we just picked!
    pick2 = sample(ends$ID[ends$ID != pick1], 1)
    
    #corner case: 'sample' misbehaves when given a vector of length 1, so if
    #   only one other end remains, set pick2 to it directly
    if(length(ends$ID[ends$ID != pick1]) == 1){
        pick2 = ends$ID[ends$ID != pick1]
    }
    
    
    #mark which end we have (convert from ID to lace number)
    end2 = ends$Lace[ends$ID == pick2]
  
    
    #if equal, we got a loop! Take it out of the lace pile.
    if(end1 == end2){
      
      #add one to our loop count
      loops[i] = loops[i] + 1
      
    }
    
    #take out one of the laces from the pile
    #  this is arbitrary; if end1 == end2, we are taking out a loop
    #  if end1 != end2, this simulates tying lace 1 to lace 2 (and thus getting rid of lace 1)
    ends = ends[ends$Lace != end1, ]
  
    #increment number of steps
    steps[i] = steps[i] + 1
  }
}

#should get 
#n = 1:100
#sum(1/(2*n - 1)) = 3.28
mean(loops)
## [1] 3.3
#should get 100
mean(steps)
## [1] 100




BH 4.44

Let \(X\) be Hypergeometric with parameters \(w,b,n\).

  1. Find \(E {X \choose 2}\) by thinking, without any complicated calculations.

  2. Use (a) to find the variance of \(X\). You should get \[Var(X) = \frac{N-n}{N-1} npq,\] where \(N=w+b, p =w/N, q = 1-p\).

#replicate
set.seed(110)

sims = 1000

#define simple parameters
w = 10
b = 10
n = 5

#generate the r.v.
x = rhyper(sims, w, b, n)

#should get choose(n, 2)*(w/(w + b))*((w - 1)/(w + b - 1)) = 2.36
mean(choose(x, 2))
## [1] 2.247
#should get n*(w/(w + b))*(1 - (w/(w + b)))*(w + b - n)/(w + b - 1) = .986
var(x)
## [1] 0.9619369




BH 4.47

A hash table is being used to store the phone numbers of \(k\) people, storing each person’s phone number in a uniformly random location, represented by an integer between \(1\) and \(n\) (see Exercise 25 from Chapter 1 for a description of hash tables). Find the expected number of locations with no phone numbers stored, the expected number with exactly one phone number, and the expected number with more than one phone number (should these quantities add up to \(n\)?)

#replicate
set.seed(110)
sims = 1000

#define k and n, use simple examples
k = 5
n = 10

#count how many locations have no numbers, exactly one number, and more than one
none = rep(NA, sims)
one = rep(NA, sims)
multiple = rep(NA, sims)

#run the loop
for(i in 1:sims){
  
  #put the people's numbers down (k numbers into n spots)
  slots = sample(1:n, k, replace = T)
  
  #count how many have none
  none[i] = n - length(unique(slots))
  
  #count how many have just 1
  one[i] = length(as.vector(table(slots))[as.vector(table(slots)) == 1])
  
  #count how many have more than 1
  multiple[i] = length(as.vector(table(slots))[as.vector(table(slots)) > 1])
}

#should get n*(1 - 1/n)^k = 5.90
mean(none)
## [1] 5.898
#should get k*(1 - 1/n)^(k - 1) = 3.28
mean(one)
## [1] 3.291
#should get n - n*(1 - 1/n)^k - k*(1 - 1/n)^(k - 1) = .815
mean(multiple)
## [1] 0.811




BH 4.50

Consider the following algorithm, known as bubble sort, for sorting a list of \(n\) distinct numbers into increasing order. Initially they are in a random order, with all orders equally likely. The algorithm compares the numbers in positions 1 and 2, and swaps them if needed, then it compares the new numbers in positions 2 and 3, and swaps them if needed, etc., until it has gone through the whole list. Call this one “sweep” through the list. After the first sweep, the largest number is at the end, so the second sweep (if needed) only needs to work with the first \(n-1\) positions. Similarly, the third sweep (if needed) only needs to work with the first \(n-2\) positions, etc. Sweeps are performed until \(n-1\) sweeps have been completed or there is a swapless sweep.

For example, if the initial list is 53241 (omitting commas), then the following 4 sweeps are performed to sort the list, with a total of 10 comparisons: \[ 53241 \to 35241 \to 32541 \to 32451 \to 32415.\] \[ 32415 \to 23415 \to 23415 \to 23145.\] \[ 23145 \to 23145 \to 21345.\] \[ 21345 \to 12345.\]

  1. An inversion is a pair of numbers that are out of order (e.g., 12345 has no inversions, while 53241 has 8 inversions). Find the expected number of inversions in the original list.
#replicate
set.seed(110)
sims = 1000

#create a function to count inversions
invert <- function(x){
  
  #see how long a vector is
  m = length(x)
  
  #keep track of inversions; initialize here
  inv = 0
  
  #iterate over each letter
  for(i in 1:(m - 1)){
    
    #iterate from each letter to the end of the string
    for(j in (i + 1):m){
      
      #see if there is an inversion
      if(x[i] > x[j]){
        inv = inv + 1
      }
    }
  }
  
  #return the number of inversions
  return(inv)
}

#define a simple parameter
n = 5

#keep track of inversions
inversions = rep(NA, sims)

#run the loop
for(i in 1:sims){
  #find the inversions for a random list
  inversions[i] = invert(sample(1:n))
}

#should get (1/2)*choose(n, 2) = 5
mean(inversions)
## [1] 4.997
  1. Show that the expected number of comparisons is between \(\frac{1}{2} {n \choose 2}\) and \({n \choose 2}.\)
#create a function that bubble sorts and counts the number of comparison
bubble <- function(x){
  
  #find the length of the string
  m = length(x)
  
  #keep track of the comparisons
  comps = 0

  #do the sweeps
  for(i in 1:(m - 1)){
    
    #count the number of swaps
    swaps = 0
    
    #do the swaps
    for(j in 1:(m - i)){
      
      #increment comparisons
      comps = comps + 1
      
      #swap the two values if necessary
      if(x[j] > x[j + 1]){
        
        #swap the values
        storage = x[j]
        x[j] = x[j + 1]
        x[j + 1] = storage
        
        #increment
        swaps = swaps + 1
      }
    }
    
    #if we had a swapless sweep, we are done
    if(swaps == 0){
        break
    }
  }
  
  #return the number of comparisons
  return(comps)
}

#define a simple parameter
n = 5

#keep track of comparisons
comparisons = rep(NA, sims)

#run the loop
for(i in 1:sims){
  #find the comparisons for a random list
  comparisons[i] = bubble(sample(1:n))
}

#should get in between (1/2)*choose(n, 2) = 5 and choose(n, 2) = 10
mean(comparisons)
## [1] 9.237




BH 4.52

An urn contains red, green, and blue balls. Balls are chosen randomly with replacement (each time, the color is noted and then the ball is put back). Let \(r,g,b\) be the probabilities of drawing a red, green, blue ball, respectively (\(r+g+b=1\)).

  1. Find the expected number of balls chosen before obtaining the first red ball, not including the red ball itself.

  2. Find the expected number of different colors of balls obtained before getting the first red ball.

  3. Find the probability that at least 2 of \(n\) balls drawn are red, given that at least 1 is red.

#replicate
set.seed(110)
sims = 1000

#define parameters, use simple case
r = 1/3
b = 1/3
g = 1/3
n = 10

#number of balls before the first red ball
pre.red = rep(0, sims)

#number of different colors before the first red ball
pre.red.colors = rep(0, sims)

#counts number of red balls in n draws
red.balls = rep(0, sims)

#run the loop
for(i in 1:sims){
  
  #reset from last loop
  green = 0
  blue = 0
  
  #flip until we get a red ball
  ball = sample(c("R", "B", "G"), 1, prob = c(r, b, g))
  
  #go until we get a red ball
  while(ball != "R"){
    
    #we picked a non-red; count it!
    pre.red[i] = pre.red[i] + 1
    
    #count if we picked a green ball or blue ball
    if(ball == "G"){
      green = 1
    }
    if(ball == "B"){
      blue = 1
    }
    
    #draw another ball
    ball = sample(c("R", "B", "G"), 1, prob = c(r, b, g))
  }
  
  #count how many other colors we got
  pre.red.colors[i] = green + blue
  
  
  #now do a separate drawing to see if we get at least 2 reds
  balls = sample(c("R", "B", "G"), n, replace = T, prob = c(r, b, g))
  red.balls[i] = length(balls[balls == "R"])
}

#part a.  Should get (1 - r)/r = 2
mean(pre.red)
## [1] 2.05
#part b.  Should get g/(g + r) + b/(b + r) = 1
mean(pre.red.colors)
## [1] 1.03
#part c. (1 - (1 - r)^n - n*r*(1 - r)^(n - 1))/(1 - (1 - r)^n) = .911
length(red.balls[red.balls > 1])/length(red.balls[red.balls > 0])
## [1] 0.8960245




BH 4.53

Job candidates \(C_1,C_2,...\) are interviewed one by one, and the interviewer compares them and keeps an updated list of rankings (if \(n\) candidates have been interviewed so far, this is a list of the \(n\) candidates, from best to worst). Assume that there is no limit on the number of candidates available, that for any \(n\) the candidates \(C_1,C_2,...,C_n\) are equally likely to arrive in any order, and that there are no ties in the rankings given by the interviewer.

Let \(X\) be the index of the first candidate to come along who ranks as better than the very first candidate \(C_1\) (so \(C_X\) is better than \(C_1\), but the candidates after \(C_1\) but prior to \(X\) (if any) are worse than \(C_1\)). For example, if \(C_2\) and \(C_3\) are worse than \(C_1\) but \(C_4\) is better than \(C_1\), then \(X = 4\). All \(4!\) orderings of the first 4 candidates are equally likely, so it could have happened that the first candidate was the best out of the first 4 candidates, in which case \(X > 4\).

What is \(E(X)\) (which is a measure of how long, on average, the interviewer needs to wait to find someone better than the very first candidate)?

Hint: Find \(P(X>n)\) by interpreting what \(X>n\) says about how \(C_1\) compares with other candidates, and then apply the result of Theorem 4.4.8.

#replicate
set.seed(110)
sims = 1000

#count the number of candidates until we find one better than the first
cands = rep(0, sims)

#run the loop
for(i in 1:sims){
  
  #gauge the 'skill' of the first candidate by drawing from Unif(0, 1)
  #this holds with the assumption of the problem; by symmetry, any
  #   ordering of multiple candidates is equally likely
  c1 = runif(1)
  
  #draw 'skill' for new candidates
  new.cand = 0
  
  #go until we get a better candidate, then stop
  while(new.cand < c1){
    
    new.cand = runif(1)
    cands[i] = 1 + cands[i]
    
  }
}

#analytically, the solution is infinity
#in the real world, we obviously don't see infinity, which is why the mean is finite
#If you investigate the 'cands' vector, though, you will see that there are many small values 
#  and a couple of extremely large ones
mean(cands)
## [1] 7.868




BH 4.62

Law school courses often have assigned seating to facilitate the Socratic method. Suppose that there are 100 first-year law students, and each takes the same two courses: Torts and Contracts. Both are held in the same lecture hall (which has 100 seats), and the seating is uniformly random and independent for the two courses.

  1. Find the probability that no one has the same seat for both courses (exactly; you should leave your answer as a sum).

  2. Find a simple but accurate approximation to the probability that no one has the same seat for both courses.

  3. Find a simple but accurate approximation to the probability that at least two students have the same seat for both courses.

#replicate
set.seed(110)
sims = 1000

#part a.
#indicator for no one having the same seat
same.seat = rep(0, sims)

for(i in 1:sims){
  
  #see if any person got the same seat twice
  torts = sample(1:100)
  contract = sample(1:100)
  
  #if person given same seat, ratio would be 1
  ratio = torts/contract
  
  if(length(ratio[ratio == 1]) == 0){
    same.seat[i] = 1
  }
}

#should get .37
mean(same.seat)
## [1] 0.373
#part b.
#let X ~ Pois(lambda), where lambda = n*p, n = 100, p = 1/100, so X ~ Pois(1)
X = rpois(sims, 1)

#should get .37
length(X[X == 0])/sims
## [1] 0.36
#part c.
#same set-up as b.
X = rpois(sims, 1)

#should get .26
length(X[X > 1])/sims
## [1] 0.265
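
For reference (a small sketch), the exact answer to part 1 is the derangement-style sum \(\sum_{k=0}^{100} (-1)^k/k!\), which the Poisson approximation \(e^{-1}\) matches closely.

#exact probability that no one has the same seat for both courses (derangement probability)
sum((-1)^(0:100)/factorial(0:100))

#Poisson approximation
exp(-1)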




BH 4.63

A group of \(n\) people play ``Secret Santa" as follows: each puts his or her name on a slip of paper in a hat, picks a name randomly from the hat (without replacement), and then buys a gift for that person. Unfortunately, they overlook the possibility of drawing one’s own name, so some may have to buy gifts for themselves (on the bright side, some may like self-selected gifts better). Assume \(n \geq 2\).

  1. Find the expected value of the number \(X\) of people who pick their own names.

  2. Find the expected number of pairs of people, \(A\) and \(B\), such that \(A\) picks \(B\)’s name and \(B\) picks \(A\)’s name (where \(A \neq B\) and order doesn’t matter).

  3. Let \(X\) be the number of people who pick their own names. What is the approximate distribution of \(X\) if \(n\) is large (specify the parameter value or values)? What does \(P(X = 0)\) converge to as \(n \to \infty\)?

#replicate
set.seed(110)
sims = 1000

#count how many people get themselves
matches = rep(0, sims)

#vector of the people (each one assigned a number)
people = 1:100

#run the loop
for(i in 1:sims){
  
  #assign a person to each gift-giver
  gifts = sample(1:100)
  
  #if a person draws their own name, the ratio equals 1
  ratio = gifts/people
  
  #count the number of matches
  matches[i] = length(ratio[ratio == 1])
}

#should get 1
mean(matches)
## [1] 0.992
#count how many pairs there are
pairs = rep(0, sims)

#run the loop
for(i in 1:sims){
  
  #number people 1 to 100
  people = 1:100
  
  #assign people to gift-givers
  gifts = sample(1:100)
  
  #iterate through people, check if we have a pair
  for(j in 1:100){
    
    #see if the person we are giving the gift to is also giving us a gift (and they are a different person)
    if(gifts[j] != j && gifts[gifts[j]] == j){
      pairs[i] = pairs[i] + 1
    }  
  }
  

  #we overcount (count twice for each pair) so divide out
  pairs[i] = pairs[i]/2
}
mean(pairs)
## [1] 0.466
#X is approximately Pois(lambda = 1), since we saw the expected number was 1
#find P(X = 0) for large n
X = rpois(sims, 1)

#should be exp(-1) = .367
length(X[X == 0])/sims
## [1] 0.36




BH 4.65

Ten million people enter a certain lottery. For each person, the chance of winning is one in ten million, independently.

  1. Find a simple, good approximation for the PMF of the number of people who win the lottery.

  2. Congratulations! You won the lottery. However, there may be other winners. Assume now that the number of winners other than you is \(W \sim Pois(1)\), and that if there is more than one winner, then the prize is awarded to one randomly chosen winner. Given this information, find the probability that you win the prize (simplify).

#replicate
set.seed(110)
sims = 1000

#part b.
#generate the r.v.
W = rpois(sims, lambda = 1)

#indicators of you winning the prize
win = rep(0, sims)

#run the loop over W
for(i in 1:sims){
  
  #draw a winner (we are number 1)
  draw = sample(1 + W[i], 1)
  if(draw == 1){
    win[i] = 1
  }
}

#should get 1 - exp(-1) = .632
mean(win)
## [1] 0.647




BH 4.75

A group of 360 people is going to be split into 120 teams of 3 (where the order of teams and the order within a team don’t matter).

  1. How many ways are there to do this?
#for simplicity and speed, consider splitting 6 people into 3 teams of 3

#label people 1 to 6
people = 1:6

#permute the people (the 'permutations' function is from the gtools package)
library(gtools)
perms = permutations(n = 6, r = 6, v = people)

#define the first three people as the first team, etc.
#sort within teams with the 'apply' function, then transpose
#   with the t function to get back to original structure
first.team = t(apply(perms[, 1:3], 1, function(x) sort(x)))
second.team = t(apply(perms[, 4:6], 1, function(x) sort(x)))

#bind the teams back together
teams = cbind(first.team, second.team)

#count the unique teams, divide by two because order doesn't matter when comparing
#   first and second teams
#should get factorial(6)/(factorial(3)^2*factorial(2)) = 10
dim(unique(teams))[1]/2
## [1] 10
  1. The group consists of 180 married couples. A random split into teams of 3 is chosen, with all possible splits equally likely. Find the expected number of teams containing married couples.
#re-label people; the first couple are both labeled 1, etc.
#   this computation is faster, so we can use more people (45 couples, 90 people total)
people = rep(1:45, 2)

#keep track of teams with a married couple
married = rep(0, sims)

#run the loop
for(i in 1:sims){
  
  #create the teams
  teams = matrix(sample(people), ncol = 3, nrow = length(people)/3)
  
  #iterate through the teams
  for(j in 1:nrow(teams)){
    
    #mark if we have a married couple (2 unique values)
    if(length(unique(teams[j, ])) < 3){
      married[i] = married[i] + 1
    }
  }
}

#should get 30*(45*88/choose(90, 3)) = 1.011
mean(married)
## [1] 1.1




BH 4.76

The gambler de Méré asked Pascal whether it is more likely to get at least one six in 4 rolls of a die, or to get at least one double-six in 24 rolls of a pair of dice. Continuing this pattern, suppose that a group of \(n\) fair dice is rolled \(4 \cdot 6^{n-1}\) times.

  1. Find the expected number of times that “all sixes” is achieved (i.e., how often among the \(4 \cdot 6^{n-1}\) rolls it happens that all \(n\) dice land \(6\) simultaneously).
#replicate
set.seed(110)
sims = 1000

#count how many 'all sixes' we get
all.sixes = rep(0, sims)

#set a simple parameter
n = 3

#run the loop
for(i in 1:sims){
  
  #roll the specified number of times
  for(j in 1:(4*6^(n - 1))){
    
    #roll the dice
    rolls = sample(1:6, n, replace = TRUE)
    
    #see if we got all sixes
    if(length(rolls[rolls == 6]) == n){
      all.sixes[i] = all.sixes[i] + 1
    } 
  }
}

#should get 2/3
mean(all.sixes)
## [1] 0.649
  1. Give a simple but accurate approximation of the probability of having at least one occurrence of ``all sixes", for \(n\) large (in terms of \(e\) but not \(n\)).
1 - dpois(0, lambda = 2/3)
## [1] 0.4865829
  1. de Méré finds it tedious to re-roll so many dice. So after one normal roll of the \(n\) dice, in going from one roll to the next, with probability 6/7 he leaves the dice in the same configuration and with probability \(1/7\) he re-rolls. For example, if \(n=3\) and the \(7\)th roll is \((3,1,4)\), then \(6/7\) of the time the \(8\)th roll remains \((3,1,4)\) and \(1/7\) of the time the \(8\)th roll is a new random outcome. Does the expected number of times that “all sixes” is achieved stay the same, increase, or decrease (compared with (a))? Give a short but clear explanation.
#count how many matches we get
matches = rep(0, sims)

#set a simple parameter
n = 3

#do the first roll
rolls = sample(1:6, n, replace = T)

#run the loop
for(i in 1:sims){
  
  #roll the specified number of times
  for(j in 1:(4*6^(n - 1))){
    
    #see if we're going to roll again
    draw = runif(1)
    
    #we will roll again 1/7 of the time
    if(draw < 1/7){
      rolls = sample(1:6, n, replace = T)
    }
    
    #see if we got all sixes
    if(length(rolls[rolls == 6]) == n){
      matches[i] = matches[i] + 1
    } 
  }
}

#answer should stay the same at 2/3
mean(matches)
## [1] 0.691




BH 4.77

Five people have just won a $100 prize, and are deciding how to divide the $100 up between them. Assume that whole dollars are used, not cents. Also, for example, giving $50 to the first person and $10 to the second is different from vice versa.

  1. How many ways are there to divide up the $100, such that each gets at least $10?

  2. Assume that the $100 is randomly divided up, with all of the possible allocations counted in (a) equally likely. Find the expected amount of money that the first person receives.

#solve part b.

#replicate
set.seed(110)
sims = 1000

#keep track of winnings
winnings = rep(NA, sims)

#run the loop
for(i in 1:sims){
  
  #randomly draw weights for the 5 people
  weights = runif(5)

  #normalize the weights and assign the money based on the weights
  weights = weights/sum(weights)
  
  #see what the first person got (their share of the remaining $50; each person already has $10)
  winnings[i] = 10 + weights[1]*50
}

#should get 20
mean(winnings)
## [1] 20.27485
  1. Let \(A_{j}\) be the event that the \(j\)th person receives more than the first person (for \(2 \leq j \leq 5\)), when the $100 is randomly allocated as in (b). Are \(A_2\) and \(A_3\) independent?
#replicate
set.seed(110)
sims = 1000

#keep track of winnings for first three people
winnings1 = rep(NA, sims)
winnings2 = rep(NA, sims)
winnings3 = rep(NA, sims)

#run the loop
for(i in 1:sims){

  #randomly draw weights for the 5 people
  weights = runif(5)

  #normalize the weights and assign the money based on the weights
  weights = weights/sum(weights)
  
  #see what each person got (their share of the remaining $50; each person already has $10)
  winnings1[i] = 10 + weights[1]*50
  winnings2[i] = 10 + weights[2]*50
  winnings3[i] = 10 + weights[3]*50
}

#the latter should be larger
length(winnings1[winnings1 < winnings2])/sims
## [1] 0.475
length(winnings1[winnings1 < winnings2 & winnings1 < winnings3])/length(winnings1[winnings1 < winnings3])
## [1] 0.6308316




BH 4.78

Joe’s iPod has 500 different songs, consisting of 50 albums of 10 songs each. He listens to 11 random songs on his iPod, with all songs equally likely and chosen independently (so repetitions may occur).

  1. What is the PMF of how many of the 11 songs are from his favorite album?
#replicate
set.seed(110)
sims = 1000

#mark the songs, 1 if they are from his favorite album, 0 otherwise
songs = c(rep(1, 10), rep(0, 49*10))

#count how many songs are from his favorite album
songs.fav = rep(NA, sims)

#run the loop
for(i in 1:sims){
  
  #draw the 11 songs with replacement (choices are independent), count his favorites
  songs.fav[i] = sum(sample(songs, 11, replace = TRUE))
}


#plot both PMFs, empirical and analytical
plot(table(songs.fav)/sims, type = "l", 
     col = "black", lwd = 4, xlab = "k = # of Favorite Songs",
     ylab = "P(X = k)", main = "PMF")

lines(0:5, dbinom(0:5, size = 11, prob = 1/50), type = "p", col = "red", pch = 20)

legend("topright", legend = c("Empirical PMF", "Analytical PMF"),
       lty=c(1,20), lwd=c(2.5,2.5),
       col=c("black", "red"))

  1. What is the probability that there are 2 (or more) songs from the same album among the 11 songs he listens to?
#replicate
set.seed(110)
sims = 1000

#mark the songs, numbered by album
songs = c(rep(1:50, 10))

#indicator if we get 2 or more songs from the same album
matches = rep(0, sims)

#run the loop
for(i in 1:sims){
  
  #draw the 11 songs with replacement (choices are independent)
  playlist = sample(songs, 11, replace = TRUE)

  #see if we got a match
  if(length(unique(playlist)) < 11){
    matches[i] = 1
  }
}

#should get 1 - factorial(49)/(factorial(39)*50^10) = .695
mean(matches)
## [1] 0.656
  1. A pair of songs is a match if they are from the same album. If, say, the 1st, 3rd, and 7th songs are all from the same album, this counts as 3 matches. Among the 11 songs he listens to, how many matches are there on average?
#replicate
set.seed(110)
sims = 1000

#mark the songs, numbered by album
songs = c(rep(1:50, 10))

#count how many matches we get
matches = rep(NA, sims)

#run the loop
for(i in 1:sims){
  
  #draw the 11 songs with replacement (choices are independent)
  playlist = sample(songs, 11, replace = TRUE)

  #see how many overlaps we get
  counts = rev(sort(as.vector(table(playlist))))
    
  #sum up how many pairs we have
  matches[i] = sum(choose(counts, 2))
}

#should get 1.1
mean(matches)
## [1] 0.986




BH 4.79

In each day that the Mass Cash lottery is run in Massachusetts, 5 of the integers from 1 to 35 are chosen (randomly and without replacement).

  1. When playing this lottery, find the probability of guessing exactly 3 numbers right, given that you guess at least 1 of the numbers right.
#replicate
set.seed(110)
sims = 1000

#count how many you guess right
guesses = rep(NA, sims)

#run the loop
for(i in 1:sims){
  
  #pick winning numbers
  winners = sample(1:35, 5, replace = F)
  
  #make a guess
  guess = sample(1:35, 5, replace = F)
  
  #see how many you got
  guesses[i] = length(intersect(winners, guess))
  
}

#should get:
#(choose(5, 3)*choose(30, 2)/choose(35, 5))/(1 - choose(5, 0)*choose(30, 5)/choose(35, 5)) = .024
length(guesses[guesses == 3])/length(guesses[guesses > 0])
## [1] 0.02702703
  1. Find an exact expression for the expected number of days needed so that all of the \({35 \choose 5}\) possible lottery outcomes will have occurred.

  2. Approximate the probability that after 50 days of the lottery, every number from 1 to 35 has been picked at least once.

#replicate
set.seed(110)
sims = 1000


#indicator for all 35 being picked
all = rep(0, sims)

#run the loop
for(i in 1:sims){
  
  #keep track of what is picked
  picked = integer(0)
  
  #do the lottery for 50 days
  for(j in 1:50){
    
    #pick a new 35, only keep the new draws
    picked = c(picked, sample(1:35, 5, replace = F))
    picked = unique(picked)
  }
  
  #see if we got all 35
  if(length(picked) == 35){
    all[i] = 1
  }
}

#should get .98
mean(all)
## [1] 0.983
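
Part (b) asks only for an exact expression; by the standard coupon collector argument (treating each of the \(N = {35 \choose 5}\) equally likely outcomes as a coupon), the expected number of days is \(N\sum_{k = 1}^{N} 1/k\). A quick computation of that expression:

#part b. expected number of days to see all choose(35, 5) outcomes
N = choose(35, 5)
N*sum(1/(1:N))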






Continuous Random Variables




4.1

Fantasy Sports (especially Fantasy Football) are extremely popular in the United States. Essentially, ‘owners’ (regular citizens) compete by picking the football players that they think will perform the best in a specific set of real-life games. Each football player is assigned a numerical score based on their real-life performance (higher score is better), and the ‘owner’ that scores the most points in aggregate (the sum of all of the football players they picked, or their ‘lineup’) wins.

Imagine that you need to fill two more spots on your ‘lineup’ (pick two more players) and you have two options: you could select Tom Brady (widely considered the greatest player of all time) and Jimmy Graham, or Eli Manning and Rob Gronkowski. By considering historical data, you can reasonably approximate the ‘fantasy points’ that each player will independently score: \(T \sim N(27, 4)\), \(J \sim N(3, 1)\), \(E \sim N(11, 2)\), \(R \sim N(18, 5)\), where \(T\) stands for Tom Brady’s score, etc. (the independence condition is often unrealistic, especially here because Brady and Gronkowski are on the same team, but we will assume independence here). Which option (Brady and Graham, or Manning and Gronkowski) is more likely to score more points? Find the probability of this option scoring more than the other option; you can leave your answer in terms of \(\Phi\).



Analytical Solution:

We are interested in the probability \(P(T + J > E + R)\). We can re-write this as \(P(T + J - E - R > 0)\), and remember that a linear combination of Normal r.v.’s is still Normal. To find the mean of the new Normal, we simply take the expectation: \(E(T + J - E - R) = 27 + 3 - 11 - 18 = 1\). The variance is a similar calculation, made easier because the scores are independent: \(Var(T + J - E - R) = 4 + 1 + 2 + 5 = 12\). So, we are interested in \(P(T + J - E - R > 0) = P(X > 0)\) where \(X \sim N(1, 12)\). By standardizing (subtracting the mean and dividing by the standard deviation), we get \(P(\frac{X - 1}{\sqrt{12}} > \frac{-1}{\sqrt{12}})\). Since \(\frac{X - 1}{\sqrt{12}} \sim N(0, 1)\), we can write \(P(\frac{X - 1}{\sqrt{12}} > \frac{-1}{\sqrt{12}}) = 1 - \Phi(\frac{-1}{\sqrt{12}})\). If we calculate this in R with the code \(1 - pnorm(-1/sqrt(12))\), we get about .61. So, we expect the Tom Brady option to outscore the other option with probability .61.
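
A one-line check of the value mentioned above:

#evaluate 1 - pnorm(-1/sqrt(12)); should be about .61
1 - pnorm(-1/sqrt(12))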


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#generate the r.v.'s
Brady = rnorm(sims, mean = 27, sd = 2)
Graham = rnorm(sims, mean = 3, sd = 1)
Manning = rnorm(sims, mean = 11, sd = sqrt(2))
Gronk = rnorm(sims, mean = 18, sd = sqrt(5))

#aggregate into X
X = Brady + Graham - Manning - Gronk

#find the proability of X > 0, should get .61 
length(X[X > 0])/sims
## [1] 0.559




4.2
  1. Is it possible to have two i.i.d. random variables \(X\) and \(Y\) such that \(P(X > Y) \neq 1/2\)?



For the following parts, let \(X, Y\) be i.i.d. \(N(0,1)\).

  1. \(E(\Phi(X)) \; \_\_\_ \; \Phi(E(X))\)

  2. \(P(X/Y < 0) \; \_\_\_ \; P(\frac{|X - Y|}{\sqrt{2}} < 1)\)



Analytical Solution:

  1. This is very possible in the discrete case; consider if \(X, Y\) are i.i.d. \(Bern(1/2)\). In this case, \(P(X > Y) = P(X = 1 \cap Y = 0) = P(X = 1)P(Y = 0) = 1/4\). The idea here is that there is some probability of a tie, which we know is not the case in a continuous distribution.

  2. These are equal. The left side, by Universality of the Uniform, is just the expectation of a Standard Uniform, which is 1/2. The right side is \(\Phi(E(X)) = \Phi(0)\), i.e., 0 (the expectation of a Standard Normal) plugged into the Standard Normal CDF, which is also 1/2.

  3. The LHS is 1/2: you can imagine multiplying both sides by \(Y\) (when \(Y\) is negative you have to flip the inequality, but either way you are left with \(P(X < 0) = P(X > 0) = 1/2\)). On the RHS, recognize that \(X - Y \sim N(0, 2)\), so scaling by \(\sqrt{2}\) gives a Standard Normal. Asking for the probability that its absolute value is less than 1 is the same as asking for the probability that a Standard Normal is between -1 and 1, which is \(\approx .68\) by the 68-95-99.7 rule.


Empirical Solution:

#part a.
#replicate
set.seed(110)
sims = 1000

#generate the r.v.'s
X = rbinom(sims, 1, 1/2)
Y = rbinom(sims, 1, 1/2)

#should get 1/4
length(X[X > Y])/sims
## [1] 0.243
#part b.
#generate the r.v.
X = rnorm(sims)

#these should match; both are 1/2
pnorm(mean(X)); mean(pnorm(X))
## [1] 0.4909795
## [1] 0.4912469
#part c.
#generate the r.v.'s
X = rnorm(sims)
Y = rnorm(sims)

#LHS
Z = X/Y

#RHS
W = abs(X - Y)/sqrt(2)

#first is 1/2, second is about .68
length(Z[Z < 0])/sims
## [1] 0.465
length(W[W < 1])/sims
## [1] 0.682
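
The exact value of the RHS in part 3 can also be computed directly (a small check, not part of the original solution): it is \(P(-1 < Z < 1)\) for \(Z \sim N(0, 1)\).

#exact probability that a Standard Normal lands in (-1, 1); about .68
pnorm(1) - pnorm(-1)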




4.3

Imagine that you won in the first round of your diving competition, and now, in the next round, you perform only one dive. The scoring of your dive is as follows: you will be judged by three judges on a scale of 0 to 10 (10 being the best) and the maximum rating given by the three judges will be your score. Unlike the previous problem, ratings are given on a continuous scale; instead of just integers, a judge may give 3.14, for example.

Unfortunately, the judges in this specific competition do not have an eye for the sport at all, and each one independently assigns you a random score between 0 and 10.

  1. What is the probability that the first judge gives you the highest score of the group?

  2. Let \(H\) be your competition score. Find \(E(H)\).



Analytical Solution:

  1. By symmetry, each judge has a \(1/3\) chance of being the maximum.

  2. In this case, we will find the PDF of \(H\) and then employ LoTUS to find \(E(H)\). To find the PDF, we will first find the CDF of \(H\). The CDF is given by \(P(H \leq h)\). If we consider any one value \(h\), we know that for this to be the high score all three judges must score at or below that value; so, we have to find the probability that all three Uniform distributions (the individual judges' scores) are less than \(h\). Since the distributions are independent, we can multiply the probabilities. Because probability is proportional to length in a Uniform distribution, and this interval is 10 units long, the probability of being less than \(h\) is just \(\frac{h}{10}\) (sanity check: if \(h=8\), you should have an 80\(\%\) chance of scoring less than that, since 80\(\%\) of the distribution is lower than 8. That’s exactly what \(\frac{8}{10}\) gives). Multiplying all three, we get \(P(H \leq h) = \frac{h^3}{1000}\).

We can easily differentiate this CDF with respect to \(h\) to find the PDF: we get \(f(h) = \frac{3h^2}{1000}\). Then, we can employ LoTUS: integrate this PDF times \(h\) over the support to find our expectation.

\[\int_{0}^{10} (h)\frac{3h^2}{1000}dh = \int_{0}^{10}\frac{3h^3}{1000}dh = \Big|_{0}^{10} \frac{3h^4}{4000} = 7.5\]

So, on average, the largest score that the judges give out will be 7.5.
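
We can also confirm the LoTUS integral numerically with integrate (a small sketch using the PDF derived above):

#numerically integrate h*f(h) = 3h^3/1000 over (0, 10); should get 7.5
integrate(function(h) 3*h^3/1000, lower = 0, upper = 10)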


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#generate the r.v.'s
U1 = runif(sims, 0, 10)
U2 = runif(sims, 0, 10)
U3 = runif(sims, 0, 10)

#keep track of the max
M = rep(NA, sims)

#run the loop to find the max
for(i in 1:sims){
  M[i] = max(U1[i], U2[i], U3[i])
}

#should get 7.5
mean(M)
## [1] 7.433508




4.4
  1. Let \(X\sim N(0,1)\). What is \(E(X^5)\)?

  2. Let \(Y \sim Pois(\lambda)\). Find \(E(\frac{c}{Y + 1})\) where \(c\) is a constant.

  3. Let \(Q\) be a Uniform Distribution such that \(Q \sim Unif(0,4)\). Find \(Var(\sqrt{Q})\).

Hint: it may be easier to work in terms of the standard uniform, and then convert.



Analytical Solution:

  1. This is 0. If we use LoTUS and set up the integral, we get the PDF of a Standard Normal multiplied by \(z^5\). This is an odd function (remember, the definition of an odd function is \(f(-x) = -f(x)\)), and an odd function integrated from negative infinity to infinity is 0. In general, for \(X \sim N(0, 1)\), \(E(X^c) = 0\) whenever \(c\) is odd.

  2. We will employ LoTUS. Multiplying \(\frac{c}{Y + 1}\) and the PMF of \(Y\), then summing:

\[E(\frac{c}{Y + 1}) = \sum_{k=0}^{\infty}(\frac{c}{k + 1})(\frac{\lambda^k}{k!e^\lambda})\]

Sift out the \(c\) and \(\frac{1}{e^\lambda}\), since they are constants not affected by the sum.

\[ \frac{c}{e^\lambda} \sum_{k=0}^{\infty}(\frac{1}{k + 1})(\frac{\lambda^k}{k!})\]

In the denominator of the sum, we can combine \(k+1\) and \(k!\):

\[\frac{c}{e^\lambda} \sum_{k=0}^{\infty}\frac{\lambda^k}{(k+1)!}\]

We begin to see potential for a Taylor Series. We want to get the exponent of the \(\lambda\) in the sum to match the denominator; that is, we want \(\lambda^{k+1}\) in the numerator, so we multiply by \(\frac{\lambda}{\lambda}\).

\[\frac{c}{\lambda e^\lambda} \sum_{k=0}^{\infty}\frac{\lambda^{k+1}}{(k+1)!}\]

This looks like an expansion we are familiar with. We are off by 1 from the regular Taylor Expansion for \(e^x\).

\[E(\frac{c}{Y+1}) = \frac{c}{\lambda e^\lambda}(e^\lambda - 1)\]

  1. As mentioned in the Hint, it is a lot simpler to work with the Standard Uniform, \(U \sim Unif(0,1)\), instead of \(Q\). Therefore, we can do all of our calculations with \(U\), and then convert to \(Q\). We know that \(Q = 4U\) so, if we just find the Variance of \(\sqrt{U}\), we can easily convert to \(Var(\sqrt{Q})\). To find \(Var(\sqrt{U})\), we employ the equation for variance:

\[Var(\sqrt{U}) = E((\sqrt{U})^2) - (E(\sqrt{U}))^2\]

\[Var(\sqrt{U}) = E(U) - (E(\sqrt{U}))^2\]

We already know that \(E(U)\) is .5. To find \(E(\sqrt{U})\), we employ LoTUS. The PDF of \(U\) is 1, so:

\[E(\sqrt{U}) = \int_{0}^{1} f(u)\sqrt{u} \; du = \int_{0}^{1}\sqrt{u} \; du\]

\[= \Big|_{0}^{1} \frac{2}{3}u^\frac{3}{2} = \frac{2}{3}\]

Returning to the equation for variance:

\[Var(\sqrt{U}) = E(U) - (E(\sqrt{U}))^2 = .5 - \Big(\frac{2}{3}\Big)^2 = \frac{1}{18}\]

So, considering the variance of \(\sqrt{U}\), we can transform back to \(Q\). Since \(\sqrt{Q} = \sqrt{4U} = 2\sqrt{U}\), we find:

\[Var(\sqrt{Q}) = Var(2\sqrt{U}) = 4Var(\sqrt{U}) = \frac{4}{18}\]

So \(Var(\sqrt{Q}) = \frac{2}{9}\).
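
Before simulating, we can verify the closed forms for parts 2 and 3 directly (a small sketch; lambda and c are set to the same simple values used below):

#part b. LoTUS sum against the closed form, truncating the Poisson sum at k = 100
lambda = 1
c = 2
sum((c/(0:100 + 1))*dpois(0:100, lambda)); (c/(lambda*exp(lambda)))*(exp(lambda) - 1)

#part c. E(sqrt(U)) by numerical integration, then Var(sqrt(Q)) = 4*Var(sqrt(U)); should get 2/9
E.sqrtU = integrate(function(u) sqrt(u), lower = 0, upper = 1)$value
4*(1/2 - E.sqrtU^2)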


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#define simple parameters
lambda = 1
c = 2

#generate the r.v.'s
X = rnorm(sims)
Y = rpois(sims, lambda)
Q = runif(sims, 0, 4)

#should get 0
#since we see large values, this might be slightly different from 0
# however, it's certainly centered around 0
mean(X^5)
## [1] -1.454001
#these should match
(c/(lambda*exp(lambda)))*(exp(lambda) - 1); mean(c/(Y + 1))
## [1] 1.264241
## [1] 1.285205
#should get 2/9
var(sqrt(Q))
## [1] 0.2301918




4.5

Let \(H = 1 - U\), where \(U\) is the Standard Uniform, and let \(G\) follow a Gamma Distribution (which we will cover later) with parameters \(\alpha\) and \(\lambda\), PDF \(f(g) = \frac{\lambda^\alpha}{\Gamma(\alpha)}g^{\alpha-1}e^{-g\lambda}\) and CDF \(F(g)\). Find \(E(H - F(G))\).



Analytical Solution:

By symmetry, \(1 - U\) also has a Standard Uniform distribution (every point on \(1-U\) maps to a point on \(U\)). By the Universality of the Uniform, a Random Variable plugged into its own CDF is Standard Uniform (which we can apply even if we don’t know anything about the Gamma). So, we’re asking for the expectation of one Standard Uniform minus another Standard Uniform. This is 0. Specifically, it’s \(.5 - .5 = 0\).


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#define simple parameters
alpha = 3
lambda = 1

#generate the r.v.'s
H = 1 - runif(sims)
G = rgamma(sims, alpha, lambda)

#should get 0
mean(H - pgamma(G, alpha, lambda))
## [1] 0.01665719
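
We can also check the Universality argument directly (a small sketch reusing G, alpha and lambda from above): \(F(G)\) should itself look like a Standard Uniform, with mean near 1/2 and variance near 1/12.

#F(G) should be approximately Unif(0, 1)
mean(pgamma(G, alpha, lambda)); var(pgamma(G, alpha, lambda))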




4.6

Consider a ‘random cube’ generated with side length \(S \sim Unif(0,10)\). Let \(V\) be the volume of the cube. Find \(E(V), \; Var(V)\) as well as the CDF and PDF of \(V\).



Analytical Solution:

Based on the definition in the prompt, \(V = S^3\). First, consider finding \(E(V)\) via LoTUS; we want to find \(E(V) = E(S^3)\), which is just a function of \(S\). The PDF of a Uniform is just a constant, and in this case it’s \(\frac{1}{10}\). So, our LoTUS integral becomes:

\[E(V) = E(S^3) = \frac{1}{10}\int_{0}^{10} s^3 ds = \frac{s^4}{40} \big|_{0}^{10} = \frac{10,000}{40} = 250\]

Now we can find the Variance by finding the second moment of \(V\), or \(E(V^2) = E(S^6)\), with another LoTUS integral:

\[E(V^2) = E(S^6) = \frac{1}{10}\int_{0}^{10} s^6 ds = \frac{s^7}{70} \big|_{0}^{10} = \frac{10,000,000}{70}\]

Using the equation for variance:

\[Var(V) = E(V^2) - E(V)^2 = \frac{10,000,000}{70} - 250^2 \approx 80,000\]

Now consider the CDF and PDF of \(V\). The definition of the CDF for \(V\) is just \(P(V \leq v)\). We don’t know much about \(V\), but we know a lot about \(S\), and we know that \(V = S^3\). We plug \(S^3\) in for \(V\):

\[P(V \leq v) = P(S^3 \leq v) = P(S \leq \sqrt[3]{v})\]

Recall that \(S\) is just a \(Unif(0, 10)\). Since the probability in a Uniform is proportional to length, we can very easily find probabilities of values along its interval.

\[P(S \leq \sqrt[3]{v}) = \frac{\sqrt[3]{v}}{10}\]

This is true when \(\sqrt[3]{v}\) is between 0 and 10. If \(\sqrt[3]{v}\) is less than 0, this CDF is 0, and if \(\sqrt[3]{v}\) is greater than 10, this CDF is 1.

Now that we have the CDF, we just have to differentiate it to obtain the PDF.

\[f(v) = F'(v) =\frac{v^{-\frac{2}{3}}}{30}\]
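
As a check on the derived PDF (a small sketch), integrating \(v f(v)\) over the support should recover \(E(V) = 250\):

#LOTUS with the derived PDF: integrate v*f(v) = v^(1/3)/30 over (0, 1000)
integrate(function(v) v^(1/3)/30, lower = 0, upper = 1000)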


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#generate the r.v.'s
S = runif(sims, 0, 10)
V = S^3

#should get 250
mean(V)
## [1] 234.2388
#should get 80,000
var(V)
## [1] 74437.04
#plot the PDFs; should be the same
#calculate the analytical PDF
v = seq(from = .01, to = 1000, length.out = 100)
PDF = v^(-2/3)/30

#plots should line up
#empirical
plot(density(V), col = "black", 
     main = "PDF of v", type = "h",
     xlab = "v", ylab = "f(v)",lwd = 3)

#analytical
lines(v, PDF, col = "red", pch = 20, type = "p", lwd = 3)


legend("topright", legend = c("Empirical PDF", "Analytical PDF"),
       lty=c(1,1), lwd=c(2.5,2.5),
       col=c("black", "red"))

#plot the CDFs; should be the same
#calculate the analytical CDF
v = seq(from = .01, to = 1000, length.out = 100)
CDF = v^(1/3)/10

#plots should line up
#empirical
plot(ecdf(V), col = "black", 
     main = "CDF of v", 
     xlab = "v", ylab = "f(v)",lwd = 3)

#analytical
lines(v, CDF, col = "red", type = "l", lwd = 3)


legend("topleft", legend = c("Empirical PDF", "Analytical PDF"),
       lty=c(1,1), lwd=c(2.5,2.5),
       col=c("black", "red"))




4.7

Let \(X \sim N(0, 1)\). For what values of \(b\) does \(E(e^{X^b/2})\) diverge?



Analytical Solution:

By LoTUS:

\[E(e^{X^b/2}) = \int_{-\infty}^{\infty} e^{x^b/2} \frac{1}{\sqrt{2\pi}}e^{-x^2/2} \, dx\]

\[= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{x^b/2 - x^2/2} \, dx\]

If \(b = 2\), the integrand is the constant 1, and integrating a positive constant from \(-\infty\) to \(\infty\) diverges to \(\infty\). If \(b > 2\), the integrand grows even faster with \(x\). Therefore, \(E(e^{X^b/2})\) diverges for \(b \geq 2\).


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#generate the r.v.
X = rnorm(sims)

#we live in a finite world, so we won't get infinity for
#   this mean. However, the max is quite large, because
#   the integral diverges analytically 
mean(exp(X^2/2)); max(exp(X^2/2))
## [1] 2.420524
## [1] 114.066




4.8

Let \(X\) be a discrete random variable such that the support of \(X\) is \(1,2,...,n\), where \(n > 2\), and \(P(X = x) = \frac{2x}{n(n + 1)}\).

  1. Verify that \(P(X = x)\) is a valid PMF.

  2. Find \(E(X)\) and \(Var(X)\).

Hint: Feel free to use these facts: \(\sum_{k = 1}^n k = \frac{n(n + 1)}{2}\), \(\sum_{k = 1}^n k^2 = \frac{n(n + 1)(2n + 1)}{6}\), and \(\sum_{k = 1}^n k^3 = \Big(\frac{n(n + 1)}{2}\Big)^2\).



Analytical Solution:

  1. From the hint, we have \(\sum_{k = 1}^n k = \frac{n(n + 1)}{2}\), so

\[\sum_{k = 1}^n \frac{2k}{n(n + 1)} = \frac{2}{n(n + 1)} \sum_{k = 1}^n k\]

\[= \frac{2}{n(n + 1)}\cdot \frac{n(n + 1)}{2} = 1\]

  1. From the hint, and using LoTUS:

\[E(X) = \sum_{x = 1}^n x \cdot \frac{2x}{n(n + 1)} = \frac{2}{n(n + 1)} \sum_{x = 1}^n x^2\]

\[= \frac{2}{n(n + 1)} \cdot \frac{n(n + 1)(2n + 1)}{6} = \frac{2n + 1}{3}\]

This gives us the mean. We can also use LoTUS to find the second moment, which allows us to find Variance:

\[E(X^2) = \sum_{x = 1}^n x^2 \cdot \frac{2x}{n(n + 1)} = \frac{2}{n(n + 1)} \sum_{x = 1}^n x^3\]

\[= \frac{2}{n(n + 1)} \cdot \Big(\frac{n(n + 1)}{2}\Big)^2\]

\[= \frac{n(n + 1)}{2}\]

Putting it together for the Variance:

\[Var(X) = E(X^2) - (E(X))^2 = \frac{n(n + 1)}{2} - \Big(\frac{2n + 1}{3}\Big)^2\]
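
These formulas can also be checked exactly, without simulation, by summing over the PMF (a small sketch using the same n as below):

#check the PMF and moments exactly for n = 10
n = 10
x = 1:n
PMF = 2*x/(n*(n + 1))

#the PMF should sum to 1
sum(PMF)

#E(X) and Var(X) should match the formulas above
sum(x*PMF); (2*n + 1)/3
sum(x^2*PMF) - sum(x*PMF)^2; (n*(n + 1))/2 - ((2*n + 1)/3)^2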


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#define a simple parameter
n = 10

#define the PMF 
x = 1:n
PMF = 2*x/(n*(n + 1))

#generate the r.v.
X = sample(x, sims, replace = TRUE, prob = PMF)

#both rows should match
mean(X); (2*n + 1)/3
## [1] 7.163
## [1] 7
var(X); (n*(n + 1))/2 - ((2*n + 1)/3)^2
## [1] 5.714145
## [1] 6




4.9

Imagine a job that pays 1 dollar after the first day, then 2 dollars after the second day, then 4 dollars after the third day, etc., such that the payments continue to double. However, at the end of each day, your boss flips a fair coin, and if it lands tails, you are fired (before you are paid). Let \(X\) be the total lifetime earnings that you get from this job.

  1. Find \(E(X)\), as well as the PMF of \(X\).

  2. Based on \(E(X)\), is this a job that you want? Does \(E(X)\) tell the whole story?



Analytical Solution:

  1. On the first day we have probability \(\frac{1}{2^2}\) of making 1 dollar, since we need to get a heads on the first day (get paid) and a tails on the second day (get fired), and flips are independent. Then, we have probability \(\frac{1}{2^3}\) of making 2 dollars, etc., for the same reasons. In general, \(P(X = 2^k) = \frac{1}{2^{k + 2}}\) for \(k = 0, 1, 2, ...\), or equivalently \(P(X = x) = \frac{1}{4x}\) for \(x = 1, 2, 4, 8, ...\) (and \(P(X = 0) = \frac{1}{2}\), the case where we are fired before the first payment). So, by LoTUS, we get:

\[E(X) = \frac{1}{2^2} + \frac{2}{2^3} + \frac{4}{2^4} + ... \] \[ = \frac{1}{4} + \frac{1}{4} + \frac{1}{4} + ... = \infty\]

  1. Even though the expectation is infinite, this is not a practical result, since we live in a finite world (the infinite expectation requires the doubling sequence to be able to go on forever). In reality, we can only work a finite number of days, so this sequence will stop eventually.
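
To see the divergence concretely (a small sketch), the truncated LoTUS sums grow without bound, adding \(\frac{1}{4}\) with every term:

#partial sums of the LoTUS series: x = 2^k with probability 1/2^(k + 2)
k = 0:19
cumsum(2^k/2^(k + 2))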


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#keep track of winnings
X = rep(0, sims)

#run the loop
for(i in 1:sims){
  
  #go until we get a tails
  while(TRUE){
    
    #flip to see if we win
    flip = runif(1)
    
    #win
    if(flip <= 1/2){
      
      #if it's the first payment, earn 1 dollar
      if(X[i] == 0){
        X[i] = 1
      } else{
        
        #otherwise, double the previous total (the 'else' keeps the
        #   first dollar from being doubled in the same step)
        X[i] = 2*X[i]
      }
    }
    
    #lose
    if(flip > 1/2){
      break
    }
  }
}

#we live in a finite world, so we won't see an infinite mean;
#   however, note that the max is very large!
mean(X); max(X)
## [1] 6.65
## [1] 1024




4.10

Let \(X \sim Expo(\lambda)\) and \(Y = X + c\) for some constant \(c\). Does \(Y\) have an Exponential distribution? Use intuition about the Exponential distribution to answer this question.



Analytical Solution:

Consider the memoryless property, and imagine if \(c = 5\). In this case, if we have ‘waited’ for 3 minutes (recall that the story of an Exponential random variable is waiting for a bus), we know that we have at least 2 more minutes to wait. If we wait for another minute (so we’ve waited for a total of 4 minutes) then we know that we have at least 1 more minute to wait. This violates the memoryless property, which says that no matter how long we’ve been waiting, we have the same distribution for a wait time going forward. Therefore, \(Y\) is not Exponential.


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#define simple parameters
lambda = 3
c = 5

#generate the r.v.'s
X = rexp(sims, lambda)
Y = X + c

#see if Y is Exponential; we can approximate lambda
#   as 1 over the empirical mean

#set a 1x2 grid for graphics
par(mfrow = c(1,2))

hist(Y, main = "Y", col = rgb(0, 1, 0, 1/4),
     xlab = "y")
hist(rexp(sims, 1/mean(Y)), main = "Expo random variable", col = rgb(0, 0, 1, 1/4),
     xlab = "")

#re-set graphics
par(mfrow = c(1,1))
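
We can also check the failure of the memoryless property directly (a small sketch reusing Y from above): if Y were Exponential, P(Y > 6.5 | Y > 5.5) would equal P(Y > 1), but here the two are very different because Y can never fall below c = 5.

#these would match for a memoryless distribution; they clearly do not
mean(Y[Y > 5.5] > 6.5)
mean(Y > 1)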




4.11

Let \(X,Y\) be i.i.d. \(N(0, 1)\). Find \(E\big((X + Y)^2\big)\) using the fact that the linear combination of Normal random variables is a Normal random variable.



Analytical Solution:

Let \(M = X + Y\). We know that \(M\) is Normal, and specifically \(M \sim N(0, 2)\) (since \(E(X + Y) = E(X) + E(Y) = 0\) by linearity and \(Var(X + Y) = Var(X) + Var(Y) = 2\) by the independence of \(X\) and \(Y\)). We are interested in \(E(M^2)\), and we know:

\[Var(M) = E(M^2) - E(M)^2\]

Plugging in what’s known:

\[E(M^2) = 2\]


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#generate the r.v.'s
X = rnorm(sims)
Y = rnorm(sims)

#should get 2
mean((X + Y)^2)
## [1] 2.040195




4.12
  1. You plan to try to log on to your email at some random (Uniform) time between 4:00 and 5:00. Independently, the internet will crash sometime between 4:00 and 5:00 and will be unavailable from the time that it crashes to 5:00. What is the probability that you are able to log on to your email (i.e., the internet has not crashed when you log on)?

  2. In this part, two ‘break points’ are randomly (Uniformly) and independently selected between 4:00 and 5:00. The internet will not be available between these two points, but will be available for the rest of the hour. What is the probability that the computer is not crashed when you log on?



Analytical Solution:

  1. Let \(U_1\) be your log in time and \(U_2\) be the time the computer crashes. The probability that you can log on is \(P(U_1 < U_2)\), or the probability that you log on before the computer crashes. By symmetry, this is 1/2.

  2. Now let \(U_2\) be your log in time and \(U_1\), \(U_3\) be the two break points. The probability that you can log on is \(1 - P(U_1 < U_2 < U_3) - P(U_3 < U_2 < U_1)\), or 1 minus the probability that we arrive during the ‘crash’ time. Since \(U_1, U_2, U_3\) are i.i.d., each ordering (i.e., \(U_1 < U_3 < U_2\) is a specific ordering) is equally likely. There are 3! orderings, so each ordering has probability \(1/3!\) by the naive definition of probability, and we are left with

\[1 - P(U_1 < U_2 < U_3) - P(U_3 < U_2 < U_1) = 1 - 1/6 - 1/6 = 2/3\]

So, there is a 2/3 probability that we will be able to log on.


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#generate the r.v.'s
U1 = runif(sims)
U2 = runif(sims)
U3 = runif(sims)

#part a. we log on if we arrive before the crash time
#   should get 1/2
length(U2[U2 < U1])/sims
## [1] 0.469
#part b. we log on if we don't arrive during the crash time
#   should get 2/3
(sims - length(U2[U1 < U2 & U2 < U3]) - length(U2[U3 < U2 & U2 < U1]))/sims
## [1] 0.702




4.13

Imagine a single elimination tournament with \(2^n\) teams, where \(n \geq 1\). ‘Single elimination’ means that if you lose a game, you are eliminated. You can envision the tournament set-up in simple cases. When \(n = 1\), we have two teams that play each other for the championship. When \(n = 2\), we have four teams; they are split into pairs, and the winner of each pair meets in the championship. The tournament set-up continues to expand in this way as we add teams (see the March Madness tournament for an example where \(n = 6\)).

Imagine for this problem that every team is equally skilled (in real life, this is clearly not a reasonable assumption) such that any random team has equal probabilities of winning or losing against any other random team. Assume games are independent (also not a reasonable assumption in real life). Let \(X\) be the number of games won by the first team.

  1. Find \(E(X)\) using LoTUS.

  2. Find \(E(X)\) using a symmetry argument.



  1. Consider the PMF of \(X\). We know that there will be \(n\) rounds to this tournament, since each round eliminates half of the teams, and halving the \(2^n\) teams \(n\) times leaves exactly 1 team. So, the first team can win \(0,1,2,...,n\) games (anywhere from losing immediately to winning the championship), and thus the support of \(X\) is \(0,1,2,...,n\). Consider \(P(X = 0)\). This is simply \(1/2\), since any team has an equal chance of winning or losing against any other team. Now consider \(P(X = 1)\). For this team to win exactly 1 game, they must win the first game and lose the second game, which has probability \(\frac{1}{2} \cdot \frac{1}{2}\) (by the fact that games are even and independent). Expanding this, we have:

\[P(X = x) = \frac{1}{2^{x + 1}}\]

However, this only holds for \(x = 0,1,...,n-1\). We can say \(P(X = n)\) is simply \(1/2^n\) (we don’t need to do \(1/2^{n + 1}\), since the team won’t play another game if they win the championship). Now, we can use LoTUS. We can start the sum at 1 because plugging in \(x = 0\) returns 0, and we don’t forget that we have to consider the cases when \(x \leq n - 1\) and when \(x = n\).

\[E(X) = \frac{n}{2^n} + \sum_{x = 1}^{n - 1} \frac{x}{2^{x + 1}}\]

\[= \frac{1}{2^2} + \frac{2}{2^3} + \frac{3}{2^4} + ... + \frac{n - 1}{2^n} + \frac{n}{2^n}\]

  1. By symmetry, since the teams are completely equal, each team should have the same expected value. We know that, in total, there are \(2^n - 1\) games, since each game eliminates 1 team and we must be left with 1 of the \(2^n\) teams as our champion in the end, and thus \(2^n - 1\) potential wins. Therefore, we have:

\[E(X_1) + E(X_2) + ... + E(X_{2^n}) = 2^n - 1\]

Where \(X_i\) is the number of games won by the \(i^{th}\) team. As mentioned above, \(E(X_i) = E(X_j)\) for all \(i,j\) by symmetry, so we have:

\[E(X) = \frac{2^n - 1}{2^n}\]

This looks different from the value we found in the previous part, but they are in fact the same (try some simple examples to prove it to yourself).
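
The suggestion to try some simple examples can be automated (a small sketch): computing both expressions for several values of \(n\) shows that they agree.

#compare the LoTUS sum and the symmetry answer for n = 1, 2, ..., 6
for(n in 1:6){
  x = 0:(n - 1)
  print(c(sum(x/2^(x + 1)) + n/2^n, (2^n - 1)/2^n))
}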


Empirical Solution:

#replicate
set.seed(110)
sims = 1000


#define a simple parameter
n = 8

#keep track of the number of games X wins
X = rep(0, sims)


#run the loop
for(i in 1:sims){
  
  #play up to n games
  for(j in 1:n){
    
    #flip to see if the team wins
    flip = runif(1)
    
    #win 
    if(flip <= 1/2){
      X[i] = X[i] + 1
    }
    
    #loss, break the loop (leave the tournament)
    if(flip > 1/2){
      break
    }
  }
}

#all of these should match
x = 0:(n - 1)
sum(x/(2^(x + 1))) + n/2^n; (2^n - 1)/2^n; mean(X)
## [1] 0.9960938
## [1] 0.9960938
## [1] 0.981




BH Problems



The problems in this section are taken from @BH. The questions are reproduced here, and the analytical solutions are freely available online. Here, we will only consider empirical solutions: answers/approximations to these problems using simulations in R.




BH 4.56

For \(X \sim Pois(\lambda)\), find \(E(X!)\) (the average factorial of \(X\)), if it is finite.

#replicate
set.seed(110)
sims = 1000

#define a simple value of lambda, draw the r.v.
#use a value less than 1, since E(X!) diverges when lambda >= 1
lambda = 1/2

X = rpois(sims, lambda)

#should get exp(-lambda)/(1 - lambda) = 1.21
mean(factorial(X))
## [1] 1.104




BH 4.59

Let \(X \sim {Geom}(p)\) and let \(t\) be a constant. Find \(E(e^{tX})\), as a function of \(t\) (this is known as the moment generating function; we will see in Chapter 6 how this function is useful).

#replicate
set.seed(110)
sims = 1000

#define simple parameters, draw the r.v.
#use p = 9/10 so the series doesn't diverge
p = 9/10
t = 1
X = rgeom(sims, p)

#should get (p/(1 - (1-p)*exp(t)))
#with p = 9/10 the series converges, so the sample mean should be close to this value
mean(exp(t*X))
## [1] 1.165406




BH 4.60

The number of fish in a certain lake is a Pois\((\lambda\)) random variable. Worried that there might be no fish at all, a statistician adds one fish to the lake. Let \(Y\) be the resulting number of fish (so \(Y\) is 1 plus a Pois\((\lambda)\) random variable).

  1. Find \(E(Y^2)\).

  2. Find \(E(1/Y)\).

#replicate
set.seed(110)
sims = 1000

#define a simple parameter
lambda = 1

#generate the r.v.
Y = 1 + rpois(sims, lambda)
 
#should get 3*lambda + lambda^2 + 1
#should get 1/lambda*(1 - exp(-lambda))
mean(Y^2)
## [1] 4.618
mean(1/Y)
## [1] 0.6493




BH 4.61

Let \(X\) be a Pois(\(\lambda\)) random variable, where \(\lambda\) is fixed but unknown. Let \(\theta = e^{-3\lambda}\), and suppose that we are interested in estimating \(\theta\) based on the data. Since \(X\) is what we observe, our estimator is a function of \(X\), call it \(g(X)\). The bias of the estimator \(g(X)\) is defined to be \(E(g(X)) - \theta\), i.e., how far off the estimate is on average; the estimator is unbiased if its bias is 0.

  1. For estimating \(\lambda\), the r.v. \(X\) itself is an unbiased estimator. Compute the bias of the estimator \(T=e^{-3X}\). Is it unbiased for estimating \(\theta\)?

  2. Show that \(g(X) = (-2)^X\) is an unbiased estimator for \(\theta\). (In fact, it turns out to be the only unbiased estimator for \(\theta\).)

  3. Explain intuitively why \(g(X)\) is a silly choice for estimating \(\theta\), despite (b), and show how to improve it by finding an estimator \(h(X)\) for \(\theta\) that is always at least as good as \(g(X)\) and sometimes strictly better than \(g(X)\). That is, \[|h(X) - \theta| \leq |g(X) - \theta|,\] with the inequality sometimes strict.

#replicate
set.seed(110)
sims = 1000

#define a simple value of lambda, which also defines theta
lambda = 1
theta = exp(-3*lambda)

#generate X
X = rpois(sims, lambda)

#calculate the bias for a.
#should get exp(-3*lambda)*(exp((2 + exp(-3))*lambda) - 1) = .336
mean(exp(-3*X) - theta)
## [1] 0.362398
#calculate bias for b.  Should get 0
mean((-2)^X - theta)
## [1] 0.04421293
#find h_x for part c.
g_x = (-2)^X

#define h_x as equal to g_x when g_x > 0 and 0 otherwise
h_x = rep(NA, length(g_x))

for(i in 1:length(g_x)){
  h_x[i] = max(g_x[i], 0)
}

#h(x) should be better
#look at absolute difference from the value (how far off estimator is on average)
#if we just looked at the mean, we know g_x is unbiased.
mean(abs(g_x - theta))
## [1] 2.437437
mean(abs(h_x - theta))
## [1] 1.261437




BH 5.11

Let \(U\) be a Uniform r.v. on the interval \((-1,1)\) (be careful about minus signs).

  1. Compute \(E(U), Var(U),\) and \(E(U^4)\).
#replicate
set.seed(110)
sims = 1000

#generate the r.v.
U = runif(sims, -1, 1)

#find each value.  Should get 0, 1/3 and 1/5
mean(U)
## [1] -0.03493541
var(U)
## [1] 0.3345493
mean(U^4)
## [1] 0.1996812
  1. Find the CDF and PDF of \(U^2\). Is the distribution of \(U^2\) Uniform on \((0,1)\)?
#recycle the vector from a.
U2 = U^2

#clearly not uniform
hist(U2, main = "U Squared", col = "firebrick3")

#plot the CDFs; should be the same
#calculate the analytical CDF
k = seq(from = 0, to = 1, length.out = 100)
CDF = sqrt(k)


#plots should line up
#empirical
plot(ecdf(U2), col = "black", 
     main = "CDF", 
     xlab = "u^2", ylab = "F(U^2)",lwd = 3)

#analytical
lines(k, CDF, col = "red", pch = 20, type = "p", lwd = 1)


legend("topleft", legend = c("Empirical PDF", "Analytical PDF"),
       lty=c(1,1), lwd=c(2.5,2.5),
       col=c("black", "red"))




BH 5.12

A stick is broken into two pieces, at a uniformly random breakpoint. Find the CDF and average of the length of the longer piece.

#replicate
set.seed(110)
sims = 1000

#keep track of longer piece
long = rep(NA, sims)

#run the loop
for(i in 1:sims){
  
  break.point = runif(1)
  long[i] = max(1 - break.point, break.point)
}

#should get 3/4
mean(long)
## [1] 0.7514732
#find analytical CDF
x = seq(from = 1/2, to = 1, by = 1/100)
CDF = 2*x - 1

#show that the CDFs match
plot(ecdf(long), main = "Analytical and Empirical CDF", col = "red", lwd = 3, ylab = "F(x)", xlab = "x")
lines(x, CDF, col = "black", lwd = 4)

legend("topleft", legend = c("Empirical CDF", "Analytical CDF"),
       lty=c(1,1), lwd=c(2.5,2.5),
       col=c("black", "red"))




BH 5.16

Let \(U \sim Unif(0,1)\), and \[X = \log\left(\frac{U}{1-U}\right).\] Then \(X\) has the Logistic distribution, as defined in Example 5.1.6.

  1. Write down (but do not compute) an integral giving \(E(X^2)\).
#replicate
set.seed(110)
sims = 1000

#generate the r.v.
U = runif(sims)

#define X
X = log(U/(1 - U))

#define the LOTUS integrand: log(u/(1 - u))^2 times the Unif(0, 1) PDF (which is 1)
PDF.a <- function(u){
  return(log(u/(1 - u))^2)
}

#these should match
mean(X^2); integrate(PDF.a, 0, 1)
## [1] 3.072777
## 3.289868 with absolute error < 0.00033
  1. Find \(E(X)\) without using calculus.
#recycling vectors; should get 0
mean(X)
## [1] -0.126199




BH 5.19

Let \(F\) be a CDF which is continuous and strictly increasing. Let \(\mu\) be the mean of the distribution. The quantile function, \(F^{-1}\), has many applications in statistics and econometrics. Show that the area under the curve of the quantile function from 0 to 1 is \(\mu\).

#replicate
set.seed(110)
sims = 1000

#define a simple parameter
lambda = 10

#define the inverse CDF
inverse = function(U){
  return((-1/lambda)*log(1 - U))
}

#should get 1/10, which is the mean of an Expo(10)
integrate(inverse, lower = 0, upper = 1)
## 0.1 with absolute error < 3.7e-16




BH 5.32

Let \(Z \sim N(0,1)\) and let \(S\) be a random sign independent of \(Z\), i.e., \(S\) is \(1\) with probability \(1/2\) and \(-1\) with probability \(1/2\). Show that \(SZ \sim N(0,1)\).

#replicate
set.seed(110)
sims = 1000

#generate the r.v.'s
Z = rnorm(sims)

#generate S (a random sign: -1 or 1 with equal probability)
S = rep(1, sims)

#iterate through S
for(i in 1:sims){
  
  #flip the sign to -1 with probability 1/2
  if(runif(1) < 1/2){
    S[i] = -1
  }
}

#show that the distributions are the same
hist(Z, freq = FALSE, main = "Z",
     xlab = "", col = "gray")

hist(S*Z, freq = FALSE, main = "SZ",
     xlab = "", col = "gray")




BH 5.33

Let \(Z \sim N(0,1).\) Find \(E \left(\Phi(Z) \right)\) without using LOTUS, where \(\Phi\) is the CDF of \(Z\).

#replicate
set.seed(110)
sims = 1000

#generate the r.v.
Z = rnorm(sims)

#should get 1/2
mean(pnorm(Z))
## [1] 0.479558




BH 5.34

Let \(Z \sim N(0,1)\) and \(X=Z^2\). Then the distribution of \(X\) is called Chi-Square with 1 degree of freedom. This distribution appears in many statistical methods.

  1. Find a good numerical approximation to \(P(1 \leq X \leq 4)\) using facts about the Normal distribution, without querying a calculator/computer/table about values of the Normal CDF.
#replicate
set.seed(110)
sims = 1000

#generate the r.v.'s
Z = rnorm(sims)
X = Z^2

#find the probability.  Should get 2*(pnorm(2) - pnorm(1)) = .27
length(X[X > 1 & X < 4])/sims
## [1] 0.276
  1. Let \(\Phi\) and \(\varphi\) be the CDF and PDF of \(Z\), respectively. Show that for any \(t>0\), \(I(Z>t) \leq (Z/t)I(Z>t)\). Using this and LOTUS, show that \(\Phi(t) \geq 1 - \varphi(t)/t.\)
#try for different t values
t = seq(from = 0, to = 3, by = 1/10)

#the CDF should always be greater
plot(t, pnorm(t), ylim = c(0, 1), 
     main = "F(x) and 1 - f(t)/t", xlab = "t", 
     ylab = "", type = "l", lwd = 3, col = "red")
lines(t, 1 - dnorm(t)/t, col = "black", lwd = 3)

legend("bottomright", legend = c("1 - f(t)/t", "F(x)"),
       lty=c(1,1), lwd=c(2.5,2.5),
       col=c("black", "red"))




BH 5.36

Let \(Z \sim N(0,1)\). A measuring device is used to observe \(Z\), but the device can only handle positive values, and gives a reading of \(0\) if \(Z \leq 0\); this is an example of censored data. So assume that \(X=Z I_{Z>0}\) is observed rather than \(Z\), where \(I_{Z>0}\) is the indicator of \(Z>0\). Find \(E(X)\) and \(Var(X)\).

#replicate
set.seed(110)
sims = 1000

#draw the r.v.'s
Z = rnorm(sims)
X = rep(0, sims)

#run the loop for X
for(i in 1:sims){
  
  #check if Z > 0
  if(Z[i] > 0){
    X[i] = Z[i]
  }
}

#should get 1/sqrt(2*pi) = .4, 1/2 - 1/(2*pi)  = .34
mean(X)
## [1] 0.3481901
var(X)
## [1] 0.2829415




BH 5.38

A post office has 2 clerks. Alice enters the post office while 2 other customers, Bob and Claire, are being served by the 2 clerks. She is next in line. Assume that the time a clerk spends serving a customer has the Exponential(\(\lambda\)) distribution.

  1. What is the probability that Alice is the last of the 3 customers to be done being served?

Hint: No integrals are needed.

#replicate
set.seed(110)
sims = 1000

#define a simple parameter
lambda = 1

#generate the r.v.'s (service times), using the lambda defined above
A = rexp(sims, lambda)
B = rexp(sims, lambda)
C = rexp(sims, lambda)

#indicator if alice is last
last = rep(0, sims)

#run the loop
for(i in 1:sims){
  
  #Alice's time is A + min(B,C)
  #check if this is greater than max(B,C)
  if(A[i] + min(B[i], C[i]) > max(B[i], C[i])){
    last[i] = 1
  }
}


#should get 1/2
mean(last)
## [1] 0.531
  1. What is the expected total time that Alice needs to spend at the post office?
#recycle vectors

#count total time
time = rep(0, sims)

#run the loop
for(i in 1:sims){
  
  #her time is just A + min(B, C)
  time[i] = A[i] + min(B[i], C[i])
}

#should get 3/(2*lambda) = 1.5
mean(time)
## [1] 1.508915




BH 5.41

Fred wants to sell his car, after moving back to Blissville (where he is happy with the bus system). He decides to sell it to the first person to offer at least $15,000 for it. Assume that the offers are independent Exponential random variables with mean $10,000.

  1. Find the expected number of offers Fred will have.

  2. Find the expected amount of money that Fred will get for the car.

#replicate
set.seed(110)
sims = 1000

#keep track of number of offers
count = rep(0, sims)

#keep track of payment
cash = rep(NA, sims)

#define lambda
lambda = 1/10000

#run the loop
for(i in 1:sims){
  
  #go until we get the right offer; initialize
  offer = 0
  while(offer < 15000){
    
    #generate a new offer
    offer = rexp(1, lambda)
    
    #increment (keep track of offers)
    count[i] = count[i] + 1
  }
  
  #see what was offered
  cash[i] = offer
}

#should get exp(1.5) = 4.48
mean(count)
## [1] 4.541
#should get 25000
mean(cash)
## [1] 25341.46




BH 5.44

Joe is waiting in continuous time for a book called The Winds of Winter to be released. Suppose that the waiting time \(T\) until news of the book’s release is posted, measured in years relative to some starting point, has an Exponential distribution with \(\lambda = 1/5\).

Joe is not so obsessive as to check multiple times a day; instead, he checks the website once at the end of each day. Therefore, he observes the day on which the news was posted, rather than the exact time \(T\). Let \(X\) be this measurement, where \(X=0\) means that the news was posted within the first day (after the starting point), \(X=1\) means it was posted on the second day, etc. (assume that there are 365 days in a year). Find the PMF of \(X\). Is this a named distribution that we have studied?

#replicate
set.seed(110)
sims = 1000

#define a parameter
lambda = 1/5

#generate the r.v.
Time = rexp(sims, lambda)

#X is the day at which T occurs; we take the floor, after converting to days (multiply by 365)
X = floor(365*Time)


#show that the distributions are similar
hist(X, col = "gray", main = "X", xlab = "x",
     breaks = seq(from = 0, to = 16000, by = 1000))

hist(rgeom(sims, 1 - exp(-1/1825)), main = "Geom(1 - exp(-1/1825))",
     xlab = "", col = "gray", breaks = seq(from = 0, to = 16000, by = 1000))




BH 5.50

Find \(E(X^3)\) for \(X \sim Expo(\lambda)\), using LOTUS and the fact that \(E(X) = 1/\lambda\) and \(Var(X) = 1/\lambda^2\), and integration by parts at most once. In the next chapter, we’ll learn how to find \(E(X^n)\) for all \(n\).

#replicate
set.seed(110)
sims = 1000

#define a simple parameter
lambda = 5

#generate the r.v.
X = rexp(sims, lambda)

#find the mean, should get 6/lambda^3 = .048
mean(X^3)
## [1] 0.05326947




BH 5.51

The Gumbel distribution is the distribution of \(-log(X)\) with \(X \sim Expo(1)\).

  1. Find the CDF of the Gumbel distribution.
#replicate
set.seed(110)
sims = 1000

#generate the r.v.'s
X = rexp(sims, 1)
G = -log(X)

#calculate analytical CDF
x = seq(from = -3, to = 10, by = 1/10)
CDF = exp(-exp(-x))

#show that the CDFs match
plot(ecdf(G), lwd = 3, col = "black", 
     main = "CDF", ylab = "F(x)")
lines(x, CDF, type = "p", col = "red", pch = 20)

legend("topleft", legend = c("Empirical PDF", "Analytical PDF"),
       lty=c(1,1), lwd=c(2.5,2.5),
       col=c("black", "red"))

  1. Let \(X_1,X_2,\dots\) be i.i.d. Expo(1) and let \(M_n = \max(X_1,\dots,X_n)\). Show that \(M_n - log(n)\) converges in distribution to the Gumbel distribution, i.e., as \(n \to \infty\) the CDF of \(M_n - \log n\) converges to the Gumbel CDF.
#recycle vectors

#define a simple parameter
n = 100

#keep track of M_n - log(n)
M = rep(NA, sims)

#run the loop
for(i in 1:sims){
  
  #find the max, shifted by log(n)
  M[i] = max(rexp(n, 1)) - log(n)
}


#show that the CDFs match
plot(ecdf(G), lwd = 3, col = "black", main = "CDF of G", ylab = "F(g)", xlab = "g")

plot(ecdf(M), col = "red", lwd = 3, main = "CDF of M - log(n)", ylab = "F(m)", xlab = "m")




BH 5.55

Consider an experiment where we observe the value of a random variable \(X\), and estimate the value of an unknown constant \(\theta\) using some random variable \(T=g(X)\) that is a function of \(X\). The r.v. \(T\) is called an estimator. Think of \(X\) as the data observed in the experiment, and \(\theta\) as an unknown parameter related to the distribution of \(X\).

For example, consider the experiment of flipping a coin \(n\) times, where the coin has an unknown probability \(\theta\) of Heads. After the experiment is performed, we have observed the value of \(X \sim \textrm{Bin}(n,\theta)\). The most natural estimator for \(\theta\) is then \(X/n\).

The bias of an estimator \(T\) for \(\theta\) is defined as \(b(T ) = E(T) - \theta\). The mean squared error is the average squared error when using \(T(X)\) to estimate \(\theta\): \[\textrm{MSE}(T) = E(T - \theta)^2.\] Show that \[\textrm{MSE}(T) = \textrm{Var}(T) + \left(b(T)\right)^2.\] This implies that for fixed MSE, lower bias can only be attained at the cost of higher variance and vice versa; this is a form of the bias-variance tradeoff, a phenomenon which arises throughout statistics.

#replicate
set.seed(110)
sims = 1000

#define parameters
n = 100

#define a true theta
theta = 1/2

#generate the r.v.
X = rbinom(sims, n , theta)

#find the MSE
MSE = mean((X - theta)^2)

#should get the same (Var(X) plus squared bias)
var(X) + (mean(X) - theta)^2
MSE
## [1] 2452.778
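
We can run the same check for the natural estimator \(X/n\) from the problem statement; this is a small additional sketch reusing the simulated \(X\) from above, and both quantities should be close to \(\theta(1 - \theta)/n = .0025\).

#the natural estimator of theta is X/n
T.est = X/n

#MSE of the estimator
MSE.T = mean((T.est - theta)^2)

#should get the same (Var plus squared bias); both near .0025
var(T.est) + (mean(T.est) - theta)^2
MSE.T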




BH 5.57
  1. Let \(X_1,X_2, \dots\) be independent \(N(0,4)\) r.v.s., and let \(J\) be the smallest value of \(j\) such that \(X_j > 4\) (i.e., the index of the first \(X_j\) exceeding 4). In terms of \(\Phi\), find \(E(J)\).
#replicate
set.seed(110)
sims = 1000

#keep track of J 
J = rep(0, sims)

#run the loop
for(i in 1:sims){
  
  #go until we have a value bigger than 4; initialize
  value = 0
  while(value < 4){
    value = rnorm(1, mean = 0, sd = 2)
    J[i] = J[i] + 1
  }
}

#should get 1/(1 - pnorm(2)) = 43.9
mean(J)
## [1] 44.273
  1. Let \(f\) and \(g\) be PDFs with \(f(x) > 0\) and \(g(x) > 0\) for all \(x\). Let \(X\) be a random variable with PDF \(f\). Find the expected value of the ratio \[R=\frac{g(X)}{f(X)}.\] Such ratios come up very often in statistics, when working with a quantity known as a likelihood ratio and when using a computational technique known as importance sampling.
#generate X
X = rnorm(sims, mean = 0, sd  = 1)

#find R; use f from a N(0, 1), g from N(1, 2)
R = dnorm(X, mean = 1, sd = sqrt(2))/dnorm(X, mean = 0, sd = 1)

#should get 1
mean(R)
## [1] 0.9413336
  1. Define \[F(x) = e^{-e^{-x}}.\]This is a CDF and is a continuous, strictly increasing function. Let \(X\) have CDF \(F\), and define \(W = F(X)\). What are the mean and variance of \(W\)?
#generate according to Universality
#this is the inverse of the CDF, with U plugged in
X = -log(-log(runif(sims)))

#generate W
W = exp(-exp(-X))

#should get 1/2 and 1/12, like a standard uniform
mean(W)
## [1] 0.5014141
var(W)
## [1] 0.08718282
#should have a uniform distribution
hist(W, main = "W", xlab = "", col = "gray")




BH 5.59

As in Example 5.7.3, athletes compete one at a time at the high jump. Let \(X_j\) be how high the \(j\)th jumper jumped, with \(X_1,X_2,\dots\) i.i.d. with a continuous distribution. We say that the \(j\)th jumper is “best in recent memory” if he or she jumps higher than the previous 2 jumpers (for \(j \geq 3\); the first 2 jumpers don’t qualify).

  1. Find the expected number of best in recent memory jumpers among the 3rd through \(n\)th jumpers.
#replicate
set.seed(110)
sims = 1000

#keep track of best jumpers in recent memory
best = rep(0, sims)

#define basic parameters
n = 100

#run the loop
for(i in 1:sims){
  
  #create a path for the jumpers
  X = rep(NA, n)
  
  #draw the first two jumpers.  Use Expo (continuous, positive support)
  X[1:2] = rexp(2, 1)
  
  #generate the rest of the jumps
  for(j in 3:n){
    
    #generate another jumper
    X[j] = rexp(1, 1)
    
    #see if we have the best in recent memory
    if(X[j] > max(X[j - 1], X[j - 2])){
      best[i] = best[i] + 1
    }
  }
}

#should get (n - 2)/3 = 32.67
mean(best)
## [1] 32.485
  1. Let \(A_j\) be the event that the \(j\)th jumper is the best in recent memory. Find \(P(A_3 \cap A_4), P(A_3),\) and \(P(A_4)\). Are \(A_3\) and \(A_4\) independent?
#indicators for A3, A4 and A3 intersect A4
A3 = rep(0, sims)
A4 = rep(0, sims)
A3A4 = rep(0, sims)

#define basic parameters
n = 100

#run the loop
for(i in 1:sims){
  
  #create a path for the jumpers
  X = rep(NA, n)
  
  #draw the first four jumpers.  Use Expo (continuous, positive support)
  X[1:4] = rexp(4, 1)
  
  #see if the 3rd is best in recent memory
  if(X[3] > max(X[1], X[2])){
    A3[i] = 1
  }
  
  #see if 4th is best in recent memory
  if(X[4] > max(X[3], X[2])){
    A4[i] = 1
  }
  
  #see if both are best in recent memory
  if(A3[i] == 1 && A4[i] == 1){
    A3A4[i] = 1
  }
}

#should get different values (not independent); 1/9 and 1/12
mean(A3)*mean(A4)
## [1] 0.114244
mean(A3A4)
## [1] 0.095




Moment Generating Functions




5.1

We have discussed how a Binomial random variable can be thought of as a sum of Bernoulli random variables. Using MGFs, prove that if \(X \sim Bin(n, p)\), then \(X\) is the sum of \(n\) independent \(Bern(p)\) random variables.

Hint: The Binomial Theorem states that \(\sum_{k = 0}^n {n \choose k} x^k y^{n - k} = (x + y)^n\).



Analytical Solution:

Let \(X \sim Bin(n, p)\) and \(Y \sim Bern(p)\). We need to show that the MGF of \(X\) is equal to the MGF of \(Y\), raised to \(n\) (since the MGF of a sum of independent random variables is the product of the individual MGFs). First, we can find the MGF of \(X\). Letting \(M_x(t)\) mean ‘the MGF of \(X\)’, and knowing that the PMF of \(X\) is given by \(P(X = k) = {n \choose k} p^k (1 - p)^{n - k}\), we can perform a LoTUS calculation:

\[M_x(t) = E(e^{tX}) = \sum_{k = 0}^n e^{tk} {n \choose k} p^k (1 - p)^{n - k}\]

Combining the terms that are raised to the \(k\):

\[= \sum_{k = 0}^n {n \choose k} (pe^t)^k (1 - p)^{n - k}\]

We can now use the Binomial Theorem stated in the Hint, where \(pe^t\) is \(x\) and \((1 - p)\) is \(y\). This yields:

\[M_x(t) = (pe^t + (1 - p))^n\]

Now, let’s find the MGF of \(Y\). We know the PMF of \(Y\) is \(P(Y = y) = p^y (1 - p)^{1 - y}\), so we can perform a LoTUS calculation:

\[E(e^{ty}) = \sum_{k = 0}^1 e^{tk} p^k (1 - p)^{1 - k}\]

There are only two terms in this sum, \(k = 0\) and \(k = 1\), so we can write these terms out:

\[= 1 - p + pe^t\]

And, clearly, if we raise this to \(n\), we get the MGF of \(X\).

So, we just showed that if we raise the MGF of \(Y\) to the power \(n\), we get the MGF of \(X\). This is the same as saying that the sum of \(n\) i.i.d. \(Bern(p)\) random variables (i.e., \(n\) random variables with the same distribution as \(Y\)) has the same distribution as \(X\).


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#define simple parameters
n = 10
p = 1/2

#generate the r.v.'s
X = rbinom(sims, n, p)
Y = sample(c(0, 1), sims, replace = TRUE, prob = c(1 - p, p))


#define values of t (close to 0)
t = seq(from = -1/10, to = 1/10, length.out = 100)

#plot the MGFs
plot(t, sapply(t, function(t) mean(exp(t*X))),
     main = "MGFs", col = "firebrick3",
     type = "l", lwd = 3,
     xlab = "t", ylab = "")
lines(t, sapply(t, function(t) mean(exp(t*Y))^n),
     main = "MGFs", col = "dodgerblue4",
     type = "l", lwd = 3)


#create a legend
legend("topleft", legend = c("MGF of X", "MGF of Y raised to n"),
       lty=c(1,1), lwd=c(2.5,2.5),
       col=c("firebrick3", "dodgerblue4"))




5.2
  1. Ulysses has been studying the Uniform distribution and he makes the following claim. ‘Let \(X \sim Unif(0, 10)\) and \(Y_1, Y_2, ..., Y_{10}\) be i.i.d. \(Unif(0, 1)\). The sum of the \(Y\) random variables \(\sum_{k = 1}^{10} Y_k\) has the same distribution as \(X\); this is intuitive, adding up 10 random draws from 0 to 1 is the same as generating one random draw from 0 to 10.’ Test Ulysses’ claim using MGFs.

  2. Defend your answer to (a.) using intuition.



Analytical Solution:

  1. We can calculate the MGF of \(X\) and see if it is the same as the MGF of \(Y\) raised to the 10. Letting \(M_x(t)\) be the MGF of \(X\), we write:

\[M_x(t) = E(e^{tX}) = \int_0^{10} \frac{e^{tx}}{10} dx = \frac{e^{tx}}{10t} \Big|_0^{10} = \frac{e^{10 t} - 1}{10t}\]

Now we calculate the MGF of \(Y\).

\[M_y(t) = \int_0^1 e^{ty} dy = \frac{e^{ty}}{t} \Big|_0^1 = \frac{e^t - 1}{t}\]

We can clearly see that, in general:

\[\Big(\frac{e^t - 1}{t}\Big)^{10} \neq \frac{e^{10 t} - 1}{10t}\]

That is, the MGF of \(Y\) raised to the 10th power is not the same as the MGF of \(X\), and thus the sum of the \(Y_i\) random variables does not have the same distribution as \(X\).

  1. This is the same reasoning as why 7 is the most likely sum of two die rolls; no matter what the value of the first die is, there is still a possibility that the sum is 7. In general, values in the middle of the distribution are more likely than extreme values. For the sum to land near an extreme value like 10 in this example, every individual standard uniform would need to take on an extremely large value, which makes such extreme sums very unlikely. Formally, there is not a one-to-one mapping from the sum of standard uniforms to the \(Unif(0, 10)\); there are more ways to produce values in the middle of the \((0, 10)\) interval.


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#generate the r.v.'s
#Unif(0, 10)
X = runif(sims, 0, 10)

#create a matrix of many standard uniforms
data = matrix(runif(sims*10), nrow = sims, ncol = 10)

#the row sums are the random variable we want
Y = rowSums(data)


#find analytical and empirical MGFs
#define t
t = seq(from = -1, to = 1, length.out = 10)

#calculate the analytical MGFs
mgf.a.1 = sapply(t, function(t) (exp(10*t) - 1)/(10*t))
mgf.a.2 = sapply(t, function(t) ((exp(t) - 1)/t)^10)

#calculate empirical MGFs
mgf.e.1 = sapply(t, function(t) mean(exp(t*X)))
mgf.e.2 = sapply(t, function(t) mean(exp(t*Y)))



#compare plots; they diverge as t grows
plot(t, mgf.e.1, main = "MGFs (X ~ Unif(0, 10), sum of Y ~ Unif(0, 1))",
     xlab = "t", ylab = "",
     type = "l", col = "black", lwd = 3)
lines(t, mgf.a.1, type = "p", col = "red", lwd = 1, pch = 16)
lines(t, mgf.e.2, type = "l", col = "darkblue", lwd = 3)
lines(t, mgf.a.2, type = "p", col = "gold4", lwd = 1, pch = 16)


legend("topleft", legend = c("Analytical MGF (X)", "Empirical MGF (X)",
                             "Analytical MGF (sum of Y)",
                             "Empirical MGF (sum of Y)"),
       lty=c(1,1,1,1), lwd=c(2.5,2.5,2.5,2.5),
       col=c("red", "black","gold4","darkblue"))




5.3

Let \(X\) be a degenerate random variable such that \(X = c\) always, where \(c\) is a constant. Find the MGF of \(X\), and use the MGF to find the expectation, variance and all moments of \(X\). Explain why these results make sense.



Analytical Solution:

By definition:

\[M_x(t) = E(e^{tx}) = e^{tc}\]

Since \(X\) is always \(c\). To find expectation, we take the first derivative w.r.t. \(t\) and plug in 0:

\[E(X) = M_x^{\prime}(0) = ce^{0 \cdot c} = c\] To find the variance, we find the second moment and subtract the first moment squared:

\[E(X^2) = M_x^{\prime \prime}(0) = c^2e^{0 \cdot c} = c^2\] \[Var(X) = E(X^2) - E(X)^2 = c^2 - c^2 = 0\]

We can generalize this result to the \(k^{th}\) moment: we will simply take \(k\) derivatives and eventually plug in 0 for \(t\), which will result in:

\[E(X^k) = c^k e^{0 \cdot c} = c^k\]

This all makes sense. Since \(X\) is always \(c\), \(E(X^k) = E(c^k) = c^k\). That is, there is complete certainty as to what \(X^k\) is. Also, it makes sense that the Variance of \(X\) is 0, since \(X\) is a constant and thus does not vary!

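
Empirical Solution:

A minimal simulation sketch, using an arbitrary constant \(c = 3\): a ‘sample’ from this degenerate distribution is just a vector of \(c\)’s, so its empirical MGF should match \(e^{tc}\), its sample moments should equal \(c^k\), and its variance should be 0.

#define an arbitrary constant
c = 3

#a 'sample' from the degenerate distribution is just a vector of c's
X = rep(c, 1000)

#moments; should get c = 3, c^2 = 9, and variance 0
mean(X); mean(X^2); var(X)

#compare the empirical MGF to exp(t*c) at a few values of t; should match
t = seq(from = -1/10, to = 1/10, length.out = 5)
sapply(t, function(t) mean(exp(t*X)))
exp(t*c)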



5.4

Let \(X \sim N(\mu, \sigma^2)\). The MGF of \(X\) is given by \(M_x(t) = e^{\mu t + \frac{1}{2} \sigma^2 t^2}\) (in general, you can find this and other useful facts about distributions on Wikipedia). Using this fact, as well as properties of MGFs, show that the sum of independent Normal random variables has a Normal distribution.



Analytical Solution:

Consider \(X \sim N(\mu_x, \sigma_x^2)\) and \(Y \sim N(\mu_y, \sigma^2_y)\), where \(X\) and \(Y\) are independent. Now consider \(Z = X + Y\). By properties of the MGF, the MGF of \(Z\) is the product of the marginal MGFs of \(X\) and \(Y\). We are given the MGF of a Normal random variable in the prompt, so we write:

\[M_z(t) = M_x(t) M_y(t) = e^{\mu_x t + \frac{1}{2} \sigma_x^2 t^2} e^{\mu_y t + \frac{1}{2} \sigma_y^2 t^2}\]

\[= e^{(\mu_x + \mu_y)t + \frac{1}{2} (\sigma_x^2 + \sigma_y^2) t^2}\] This is clearly the MGF of a random variable with distribution \(N(\mu_x + \mu_y, \sigma_x^2 + \sigma_y^2)\), and since this random variable has this MGF, it must have a \(N(\mu_x + \mu_y, \sigma_x^2 + \sigma_y^2)\) distribution (the MGF determines distributions). We showed this in the general case for mean and variance, meaning that we showed that this result holds for any independent Normal random variables.


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#define simple parameters
mux = 3
muy = 5
sigmax = 2
sigmay = 6

#generate the r.v.'s
X = rnorm(sims, mux, sigmax)
Y = rnorm(sims, muy, sigmay)
Z = X + Y

#find analytical and empirical MGFs
#define t near 0
t = seq(from = -1/100, to = 1/100, length.out = 10)

#calculate the analytical MGF
mgf.a = sapply(t, function(t) 
  exp((mux + muy)*t + 1/2*(sigmax^2 + sigmay^2)*t^2))

#calculate empirical MGF
mgf.e = sapply(t, function(t) mean(exp(t*Z)))




#compare plots; empirical MGF begins to diverge
plot(t, mgf.e, main = "MGF",
     xlab = "t", ylab = "",
     type = "l", col = "black", lwd = 3)
lines(t, mgf.a, type = "p", col = "red", lwd = 1, pch = 16)


legend("topleft", legend = c("Empirical MGF", "Analytical MGF"),
       lty=c(1,1), lwd=c(2.5,2.5),
       col=c("black", "red"))




5.5

Let \(X \sim Expo(\lambda)\) and \(Y = X + c\) for some constant \(c\). Does \(Y\) have an Exponential distribution? Use the MGF of \(Y\) to answer this question.



Analytical Solution:

We can find the MGF of \(Y\) and see if it is an Exponential MGF. We know that \(Y\) is the sum of independent random variables; \(c\) is a degenerate random variable (a constant) but a random variable nonetheless. The MGF of \(Y\), then, is the product of the MGF of \(X\) and the MGF of \(c\). We know from earlier that the MGF of \(X\) will be \(\frac{\lambda}{\lambda - t}\). For the MGF of \(c\), we simply write the definition of the MGF:

\[M_c(t) = E(e^{tc}) = e^{tc}\]

Since \(c\) is a constant. Putting it all together, we get:

\[M_y(t) = M_x(t)M_c(t) = \Big(\frac{\lambda}{\lambda - t}\Big) e^{tc} = \frac{\lambda e^{tc}}{\lambda - t}\]

This is clearly not the MGF of an Exponential, so \(Y\) is not Exponential (recall that MGFs determine distributions).


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#define simple parameters
lambda = 3
c = 5

#generate the r.v.'s
X = rexp(sims, lambda)
Y = X + c


#define t
t = seq(from = -1/100, to = 1/100, length.out = 10)

#calculate empirical MGFs
mgf.x = sapply(t, function(t) mean(exp(t*X)))
mgf.y = sapply(t, function(t) mean(exp(t*Y)))



#compare plots; they don't match
plot(t, mgf.x, main = "MGFs",
     xlab = "t", ylab = "", 
     ylim = c(min(c(mgf.y, mgf.x)), max(c(mgf.y, mgf.x))),
     type = "l", col = "black", lwd = 3)
lines(t, mgf.y, type = "p", col = "red", lwd = 1, pch = 16)

legend("topleft", legend = c("MGF of X", "MGF of Y"),
       lty=c(1,1), lwd=c(2.5,2.5),
       col=c("black", "red"))




BH Problems



The problems in this section are taken from @BH. The questions are reproduced here, and the analytical solutions are freely available online. Here, we will only consider empirical solutions: answers/approximations to these problems using simulations in R.




BH 6.13

A fair die is rolled twice, with outcomes \(X\) for the first roll and \(Y\) for the second roll. Find the moment generating function \(M_{X+Y}(t)\) of \(X+Y\) (your answer should be a function of \(t\) and can contain unsimplified finite sums).

#replicate
set.seed(110)
sims = 1000

#keep track of Z = X + Y
Z = rep(NA, sims)

#run the loop
for(i in 1:sims){
  
  #roll the die twice
  Z[i] = sum(sample(1:6, 2, replace = TRUE))
}

#find the MGF; empirical and analytical
#just show that these are the same for t = 1,2 (gets intractable from there)
t = 1:2
k = 1:6
MGF.e = rep(NA, length(t))
MGF.a = rep(NA, length(t))

#iterate, calculate the analytical and empirical MGFs
for(i in 1:length(t)){
  
  MGF.e[i] = mean(exp(Z*t[i]))
  MGF.a[i] = (1/6*sum(exp(k*t[i])))^2
}

#should be the same
MGF.a
## [1]     11258.38 984156698.54
MGF.e
## [1]     11218.01 952082975.97




BH 6.14

Let \(U_1, U_2, ..., U_{60}\) be i.i.d. \(Unif(0,1)\) and \(X = U_1 + U_2 + ... + U_{60}\). Find the MGF of \(X\).

#replicate
set.seed(110)
sims = 1000

#keep track of Z = sum of the U's
Z = rep(NA, sims)

#run the loop
for(i in 1:sims){
  
  #sum 60 standard uniforms
  Z[i] = sum(runif(60))
}

#find the MGF; empirical and analytical
#just show that these are the same for small t (gets intractable from there)
t = c(1/1000, 2/1000)
MGF.e = rep(NA, length(t))
MGF.a = rep(NA, length(t))

#iterate, calculate the analytical and empirical MGFs
for(i in 1:length(t)){
  
  MGF.e[i] = mean(exp(Z*t[i]))
  MGF.a[i] = (exp(t[i]) - 1)^60/(t[i]^60)
}

#should be the same
MGF.a
## [1] 1.030457 1.061847
MGF.e
## [1] 1.030400 1.061729




BH 6.21

Let \(X_n \sim Bin(n,p_n)\) for all \(n \geq 1\), where \(np_n\) is a constant \(\lambda > 0\) for all \(n\) (so \(p_n = \lambda/n\)). Let \(X \sim Pois(\lambda)\). Show that the MGF of \(X_n\) converges to the MGF of \(X\) (this gives another way to see that the Bin(\(n,p\)) distribution can be well-approximated by the \(Pois(\lambda)\) when \(n\) is large, \(p\) is small, and \(\lambda = np\) is moderate).

### We show that the MGFs (plotted in red and black) converge as n grows ###

#replicate
set.seed(110)
sims = 1000

#define n, cover large values
n = round(seq(from = 2, to = 100, length.out = 4))

#define other parameters
p = 1/n
lambda = n*p

#initialize graphics
par(mfrow = c(2,2))


#iterate over n
for(i in 1:length(n)){

  #generate the r.v.'s
  X = rbinom(sims, n[i], p[i])
  Y = rpois(sims, lambda[i])
  
  #define t
  t = seq(from = -2, to = 2, length.out = 100)
  
  #calculate the MGFs
  MGF.X = sapply(t, function(t) mean(exp(X*t)))
  MGF.Y = sapply(t, function(t) mean(exp(Y*t)))
  
  #define a title
  title = paste0("n = ", n[i])
  #plot
  plot(t, MGF.X, main = title,
       type = "l", col = "black", lwd = 3,
       xlab = "", ylab = "")
  lines(t, MGF.Y, col = "firebrick3", lwd = 3)
}

#re-set graphics
par(mfrow = c(1,1))




Joint Distributions




6.1

Consider a square with area 1 that lies in the first quadrant of the x-y plane, where the bottom left corner starts at the origin (so, \(x\) and \(y\) run from 0 to 1). We pick a random point, \((x,y)\), in this square (the point is chosen uniformly).

  1. Find the joint PDF of \(X\) and \(Y\).

  2. Find the marginal PDF of \(X\).

  3. Find the PDF of \(X\) conditional on \(Y\).

  4. Show that \(X\) and \(Y\) are independent.



Analytical Solution:

  1. Remember that with a uniform distribution, probability must be proportional to length in the 1-D case and to area in the 2-D case. Here, the total area is 1, so the joint PDF is just \(\frac{1}{1} = 1\).

  2. To find the marginal PDF of \(X\), we have to integrate out the variable \(y\) from the joint PDF. We already know the joint PDF, and we know the bounds of \(y\) are from 0 to 1 (often the bounds of the inner variable depend on the outer variable, but here \(y\) runs from 0 to 1 no matter where \(x\) is), so we can just do the calculation:

\[f(x) = \int_{0}^{1} 1 \; dy = y \Big|_{0}^{1} = 1\]

The PDF of \(X\), then, is 1 when \(0 < X < 1\) and 0 otherwise. Since the PDF is constant, \(X\) is uniform on these bounds; specifically, it has a standard uniform distribution.

  1. We need to find \(f(x|y)\). We can rewrite this as:

\[f(x|y) = \frac{f(x,y)}{f(y)}\]

And we already know \(f(x,y) = 1\), and by the same calculation as above, \(f(y) = 1\). So, the conditional PDF of \(X\) given \(Y\) is 1, or again just a standard uniform distribution.

  1. The product of the marginal PDFs of \(X\) and \(Y\), both 1, equals the Joint PDF, which is also 1. This implies that the random variables are independent.


Empirical Solution:

#replicate
set.seed(110)

#increased number of sims so we can see
sims = 10000

#generate the r.v.'s
#we can imagine selecting the x-coordinate uniformly from 0 to 1
#   and the same for the y-coordinate
X = runif(sims)
Y = runif(sims)

#round X and Y so we can view them
X = round(X, 2)
Y = round(Y, 2)

#generate a heat map
#first, empirical
data <- data.frame(X, Y)

data = group_by(data, X, Y)
data = summarize(data, density = n())
data$density = data$density/sims

#change colors so we can see
ggplot(data = data, aes(X, Y)) + 
  geom_tile(aes(fill = density), color = "black") +
  scale_fill_gradient(low = "black", high= "red", name = "Density") + 
  ggtitle("Empirical Density of X and Y") + 
  theme(plot.margin = unit(c(1.8,.5,1.75,1.55), "cm")) +
  theme(plot.title = element_text(family = "Trebuchet MS", 
                                  color="#383838", face="bold", size=25, vjust = 2)) + 
  theme(axis.title.x = element_text(family = "Trebuchet MS", 
                                    color="#383838", face="bold", size=22, vjust = -1.5),
        axis.title.y = element_text(family = "Trebuchet MS", 
                                    color="#383838", face="bold", size=22, vjust = 2)) +
  theme(axis.text.x = element_text(face="bold", color="#383838", 
                                   size=14),
        axis.text.y = element_text(face="bold", color="#383838", 
                                   size=14))

#analytical Joint Distribution
#define the supports/all combinations
X.a = seq(from = 0, to = 1, by = .01)
Y.a = seq(from = 0, to = 1, by = .01)
data = expand.grid(X.a = X.a, Y.a = Y.a)

#calculate density
data$density = apply(data, 1, function(x) 1)

#remove points with 0 density
data = data[data$density != 0, ]

#generate a heatmap
ggplot(data = data, aes(X.a, Y.a)) + 
  geom_tile(aes(fill = density), color = "black") +
  scale_fill_gradient(low = "black", high= "red", name = "Density") + 
  ggtitle("Analytical Density of X and Y") + 
  theme(plot.margin = unit(c(1.8,.5,1.75,1.55), "cm")) +
  theme(plot.title = element_text(family = "Trebuchet MS", 
                                  color="#383838", face="bold", size=25, vjust = 2)) + 
  theme(axis.title.x = element_text(family = "Trebuchet MS", 
                                    color="#383838", face="bold", size=22, vjust = -1.5),
        axis.title.y = element_text(family = "Trebuchet MS", 
                                    color="#383838", face="bold", size=22, vjust = 2)) +
  theme(axis.text.x = element_text(face="bold", color="#383838", 
                                   size=14),
        axis.text.y = element_text(face="bold", color="#383838", 
                                   size=14))

#show that X is uniform
plot(density(X), col = "black", lwd = 3, 
     main = "PDF of X", xlab = "x", ylab = "f(x)")
abline(h = 1, col = "red")

legend("bottomright", legend = c("Empirical PDF", "Analytical PDF"),
       lty=c(1,20), lwd=c(2.5,2.5),
       col=c("black", "red"))

#condition on Y being in a specific interval
y = .9

#title for the graph
title = paste0("Conditional PDF of X|Y = ", y)
y.axis = paste0("f(x|y = ", y, ")")

plot(density(X[Y > y - .05 & Y < y + .05]), col = "black", 
     main = title, type = "l",
     xlab = "x", ylab = y.axis, 
     lwd = 3)

#analytical
abline(h = 1, col = "red")

legend("bottomright", legend = c("Empirical PDF", "Analytical PDF"),
       lty=c(1,20), lwd=c(2.5,2.5),
       col=c("black", "red"))




6.2

Let \(X \sim N(0, 1)\) and \(Y|X \sim N(X, 1)\). Find the joint PDF of \(X\) and \(Y\).



Analytical Solution:

By definition, we know \(f(x, y) = f(x)f(y|x)\). Since we know the distribution of \(X\) and \(Y|X\), we write:

\[f(x,y) = \frac{1}{\sqrt{2 \pi}} e^{-\frac{x^2}{2}} \frac{1}{\sqrt{2 \pi}} e^{-\frac{(y - x)^2}{2}}\]

Note that we can’t factor the PDF into \(x\) terms and \(y\) terms because, of course, they are highly dependent: \(X\) is the mean of \(Y\).


Empirical Solution:

#replicate
set.seed(110)

#increased number of sims for graphics
sims = 10000

#generate the r.v.'s
X = rnorm(sims)
Y = sapply(X, function(x) rnorm(1, x, 1))

#round X and Y so we can see
X = round(X, 1)
Y = round(Y, 1)

#generate a heat map
#first, empirical
data <- data.frame(X, Y)

data = group_by(data, X, Y)
data = summarize(data, density = n())
data$density = data$density/sims

ggplot(data = data, aes(X, Y)) + 
  geom_tile(aes(fill = density), color = "white") +
  scale_fill_gradient(low = "white", high= "red", name = "Density") + 
  ggtitle("Empirical Density of X and Y") + 
  theme(plot.margin = unit(c(1.8,.5,1.75,1.55), "cm")) +
  theme(plot.title = element_text(family = "Trebuchet MS", 
                                  color="#383838", face="bold", size=25, vjust = 2)) + 
  theme(axis.title.x = element_text(family = "Trebuchet MS", 
                                    color="#383838", face="bold", size=22, vjust = -1.5),
        axis.title.y = element_text(family = "Trebuchet MS", 
                                    color="#383838", face="bold", size=22, vjust = 2)) +
  theme(axis.text.x = element_text(face="bold", color="#383838", 
                                   size=14),
        axis.text.y = element_text(face="bold", color="#383838", 
                                   size=14))

#analytical Joint Distribution

#define the supports/all combinations
X.a = seq(from = min(X), to = max(X), length.out = 100)
Y.a = seq(from = min(Y), to = max(Y), length.out = 100)
data = expand.grid(X.a = X.a, Y.a = Y.a)

#calculate density
data$density = apply(data, 1, function(x){
      return(dnorm(x[1])*dnorm(x[2], x[1], 1))})

#remove points with 0 density
data = data[data$density != 0, ]

#generate a heatmap
ggplot(data = data, aes(X.a, Y.a)) + 
  geom_tile(aes(fill = density), color = "white") +
  scale_fill_gradient(low = "white", high= "red", name = "Density") + 
  ggtitle("Analytical Density of X and Y") + 
  theme(plot.margin = unit(c(1.8,.5,1.75,1.55), "cm")) +
  theme(plot.title = element_text(family = "Trebuchet MS", 
                                  color="#383838", face="bold", size=25, vjust = 2)) + 
  theme(axis.title.x = element_text(family = "Trebuchet MS", 
                                    color="#383838", face="bold", size=22, vjust = -1.5),
        axis.title.y = element_text(family = "Trebuchet MS", 
                                    color="#383838", face="bold", size=22, vjust = 2)) +
  theme(axis.text.x = element_text(face="bold", color="#383838", 
                                   size=14),
        axis.text.y = element_text(face="bold", color="#383838", 
                                   size=14))




6.3

You roll a fair, six-sided die \(X\) times, where \(X \sim Pois(\lambda)\). Let \(Y\) be the number of 6’s that you roll. Find \(P(Y > 0)\); you may leave your answer in terms of \(\lambda\).



Analytical Solution:

This paradigm is identical to the Chicken-Egg problem, except instead of having two possible outcomes (hatch or no hatch) we have 6 (the die can be 1, 2, etc.). Therefore, by the story of the Chicken-Egg, we can write \(Y \sim Pois(\lambda/6)\), since \(Y|X \sim Bin(X, 1/6)\). From here, we know \(P(Y > 0) = 1 - P(Y = 0)\), which we can write with the PMF of a Poisson:

\[P(Y > 0) = 1 - e^{-\lambda/6}\]


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#define a simple parameter
lambda = 10

#generate N
N = rpois(sims, lambda)

#keep track of Y
Y = rep(NA, sims)

#run the loop
for(i in 1:sims){
  
  #roll the die N times
  rolls = sample(1:6, N[i], replace = TRUE)
  
  #mark how many 6's we get
  Y[i] = length(rolls[rolls == 6])
}

#should get 1 - exp(-lambda/6) = .811
length(Y[Y > 0])/sims
## [1] 0.807
#these should match
hist(Y, main = "Y", col = "gray", xlab = "y", breaks = 0:10)

hist(rpois(sims, lambda/6), main = "Pois(lambda/6)", 
     col = "gray", xlab = "", breaks = 0:10)




6.4

There are \(n\) quarters. You flip them all (they are fair) and place them on a table. You then select \(m \leq n\) quarters at random. Let \(X\) be the number of quarters that show heads on the table, and \(Y\) be the number of quarters that show heads that you select. Find the Joint PMF of \(X\) and \(Y\).



Analytical Solution:

By definition, \(X \sim Bin(n, 1/2)\). If we condition on the number of heads flipped (condition on \(X = x\)) then \(Y\) is Hypergeometric; we are sampling ‘favorable’ and ‘unfavorable’ objects without replacement. Specifically, \(Y|X = x \sim HGeom(x, n - x, m)\). Putting these together:

\[P(X = x, Y = y) = P(X = x)P(Y = y|X = x)\] \[={n \choose x}\frac{1}{2^n}\cdot\frac{{x \choose y} {n - x \choose m - y}}{{n \choose m}}\]


Empirical Solution:

#replicate
set.seed(110)

#increased number of sims for graphics
sims = 10000

#define simple parameters
n = 10
m = 5

#generate the r.v.'s
X = rbinom(sims, n, 1/2)
Y = sapply(X, function(x) sum(sample(c(rep(1, x), rep(0, n - x)), m, replace = FALSE)))


#generate a heat map
#first, empirical
data <- data.frame(X, Y)

data = group_by(data, X, Y)
data = summarize(data, density = n())
data$density = data$density/sims

ggplot(data = data, aes(X, Y)) + 
  geom_tile(aes(fill = density), color = "white") +
  scale_fill_gradient(low = "white", high= "red", name = "Density") + 
  ggtitle("Empirical Density of X and Y") + 
  theme(plot.margin = unit(c(1.8,.5,1.75,1.55), "cm")) +
  theme(plot.title = element_text(family = "Trebuchet MS", 
                                  color="#383838", face="bold", size=25, vjust = 2)) + 
  theme(axis.title.x = element_text(family = "Trebuchet MS", 
                                    color="#383838", face="bold", size=22, vjust = -1.5),
        axis.title.y = element_text(family = "Trebuchet MS", 
                                    color="#383838", face="bold", size=22, vjust = 2)) +
  theme(axis.text.x = element_text(face="bold", color="#383838", 
                                   size=14),
        axis.text.y = element_text(face="bold", color="#383838", 
                                   size=14))

#analytical Joint Distribution

#define the supports/all combinations
X.a = 0:n
Y.a = 0:m
data = expand.grid(X.a = X.a, Y.a = Y.a)

#calculate density
data$density = apply(data, 1, function(x){
  
  #define the terms: P(X = x) and P(Y = y|X = x)
  c1 = choose(n, x[1])/2^n
  c2 = choose(x[1], x[2])*choose(n - x[1], m - x[2])/choose(n, m)
      return(c1*c2)})

#remove points with 0 density
data = data[data$density != 0, ]

#remove points where Y > X
data = data[data$X >= data$Y, ]

#generate a heatmap
ggplot(data = data, aes(X.a, Y.a)) + 
  geom_tile(aes(fill = density), color = "white") +
  scale_fill_gradient(low = "white", high= "red", name = "Density") + 
  ggtitle("Analytical Density of X and Y") + 
  theme(plot.margin = unit(c(1.8,.5,1.75,1.55), "cm")) +
  theme(plot.title = element_text(family = "Trebuchet MS", 
                                  color="#383838", face="bold", size=25, vjust = 2)) + 
  theme(axis.title.x = element_text(family = "Trebuchet MS", 
                                    color="#383838", face="bold", size=22, vjust = -1.5),
        axis.title.y = element_text(family = "Trebuchet MS", 
                                    color="#383838", face="bold", size=22, vjust = 2)) +
  theme(axis.text.x = element_text(face="bold", color="#383838", 
                                   size=14),
        axis.text.y = element_text(face="bold", color="#383838", 
                                   size=14))




6.5

Imagine a standard clock with the hours 1 to 12. The small hand starts at a random hour and moves counterclockwise (i.e., from 4 to 3) or clockwise (i.e., from 10 to 11) with equal probabilities. Let \(X_t\) be the hour that the small hand is at in round \(t\) (i.e., after the hand has moved \(t\) times). Find the joint PMF of \(X_t\) and \(X_{t + 1}\) for \(t \geq 1\).



Analytical Solution:

We can start with the definition of joint probability:

\[P(X_t = x_t , X_{t + 1} = x_{t + 1}) = P(X_t = x_t) P(X_{t + 1} = x_{t + 1} | X_t = x_t)\]

Since we start in a random location and the movement on the clock is symmetric, \(X_t\) is uniform over the hours 1 to 12 (by symmetry); that is, \(P(X_t = x_t) = 1/12\) for \(x_t = 1,2,...,12\). Conditioned on \(X_t = x_t\), we know that \(X_{t+1}\) is either \(x_t + 1\) or \(x_t - 1\) with equal probabilities, except in the corner cases: when \(x_t = 1\), \(X_{t + 1}\) is 12 or 2, and when \(x_t = 12\), \(X_{t + 1}\) is 11 or 1. Therefore, the joint PMF is \(\frac{1}{12} \cdot \frac{1}{2} = \frac{1}{24}\) when \(x_{t + 1}\) is one step away from \(x_t\) (treating 1 and 12 as adjacent, per the corner cases), and is 0 elsewhere.

As a sanity check, this joint distribution will sum to 1 over the entire support. We have \(\frac{1}{12} \cdot \frac{1}{2} + \frac{1}{12} \cdot \frac{1}{2}\) if we add over \(x_{t + 1}\) for any one value of \(x_t\), and there are 12 potential values of \(x_t\), so we add \(1/12\) twelve times to get 1.


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#define a simple parameter
t = 10

#keep track of X_t and X_{t + 1}
Xt = rep(NA, sims)
Xt1 = rep(NA, sims)

#run the loop
for(j in 1:sims){
  
  #initialize the chain in a random spot
  X = sample(1:12, 1)
  
  #run the chain t + 1 times
  for(i in 2:(t + 2)){
    
    #flip to see if we move up or down
    flip = runif(1)
    
    #move up
    if(flip <= 1/2){
      
      #usual case
      if(X[i - 1] < 12){
        X[i] = X[i - 1] + 1
      }
      
      #corner case is 12
      if(X[i - 1] == 12){
        X[i] = 1
      }
    }
    
    #move down
    if(flip > 1/2){
      
      #usual case
      if(X[i - 1] > 1){
        X[i] = X[i - 1] - 1
      }
      
      #corner case is 1
      if(X[i - 1] == 1){
        X[i] = 12
      }
    }
  }
  
  #mark the values
  Xt[j] = X[t + 1]
  Xt1[j] = X[t + 2]
}



#generate a heat map
#first, empirical
data <- data.frame(Xt, Xt1)

data = group_by(data, Xt, Xt1)
data = summarize(data, density = n())
data$density = data$density/sims

#change colors so we can see
ggplot(data = data, aes(Xt, Xt1)) + 
  geom_tile(aes(fill = density), color = "black") +
  scale_fill_gradient(low = "black", high= "red", name = "Density") + 
  ggtitle("Empirical Joint Density") + 
  theme(plot.margin = unit(c(1.8,.5,1.75,1.55), "cm")) +
  theme(plot.title = element_text(family = "Trebuchet MS", 
                                  color="#383838", face="bold", size=25, vjust = 2)) + 
  theme(axis.title.x = element_text(family = "Trebuchet MS", 
                                    color="#383838", face="bold", size=22, vjust = -1.5),
        axis.title.y = element_text(family = "Trebuchet MS", 
                                    color="#383838", face="bold", size=22, vjust = 2)) +
  theme(axis.text.x = element_text(face="bold", color="#383838", 
                                   size=14),
        axis.text.y = element_text(face="bold", color="#383838", 
                                   size=14))

#analytical Joint Distribution
#define the supports/all combinations
Xt.a = 1:12
Xt1.a = 1:12
data = expand.grid(Xt.a = Xt.a, Xt1.a = Xt1.a)

#calculate density
data$density = apply(data, 1, function(x) 1/24)

#remove points with 0 density
data = data[data$density != 0, ]

#remove points where X_t is more than 1 away from X_{t + 1}
data = data[abs(data$Xt.a - data$Xt1.a) == 1 | abs(data$Xt.a - data$Xt1.a) == 11, ]

#generate a heatmap
ggplot(data = data, aes(Xt.a, Xt1.a)) + 
  geom_tile(aes(fill = density), color = "black") +
  scale_fill_gradient(low = "black", high= "red", name = "Density") + 
  ggtitle("Analytical  Joint Density") + 
  theme(plot.margin = unit(c(1.8,.5,1.75,1.55), "cm")) +
  theme(plot.title = element_text(family = "Trebuchet MS", 
                                  color="#383838", face="bold", size=25, vjust = 2)) + 
  theme(axis.title.x = element_text(family = "Trebuchet MS", 
                                    color="#383838", face="bold", size=22, vjust = -1.5),
        axis.title.y = element_text(family = "Trebuchet MS", 
                                    color="#383838", face="bold", size=22, vjust = 2)) +
  theme(axis.text.x = element_text(face="bold", color="#383838", 
                                   size=14),
        axis.text.y = element_text(face="bold", color="#383838", 
                                   size=14))




6.6

Imagine a standard Rubik’s cube: 6 faces, each with a separate color, each face colored with 9 stickers (for a total of 54 stickers on the cube). You peel off \(n \leq 54\) random stickers. Let \(R\) be the number of red stickers that you get, and \(Y\) be the number of yellow stickers. Find the Joint PMF of \(R\) and \(Y\).



Analytical Solution:

Writing the definition of the Joint PMF:

\[P(R = r, Y = y) = P(R = r | Y = y)P(Y = y)\]

Unconditionally, we know that \(Y \sim HGeom(9, 45, n)\), since we are sampling ‘favorable’ (yellow) and ‘unfavorable’ (not yellow) stickers without replacement. Conditioning on \(Y = y\), the remaining \(n - y\) stickers drawn are a random sample from the 45 non-yellow stickers, of which 9 are red and 36 are neither red nor yellow, so \((R|Y = y) \sim HGeom(9, 36, n - y)\). Since we know the PMF of a Hypergeometric:

\[P(R = r, Y = y) = \frac{{9 \choose y} {45 \choose n - y}}{{54 \choose n}} \cdot \frac{{9 \choose r} {36 \choose n - y - r}}{{45 \choose n - y}}\]


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#define a simple parameter
n = 10

#keep track of R and Y
R = rep(NA, sims)
Y = rep(NA, sims)

#define the stickers; no need to define colors other than r and y
stickers = c(rep("r", 9), rep("y", 9), rep(0, 9*4))

#run the loop
for(i in 1:sims){
  
  #sample n stickers
  samp = sample(stickers, n, replace = FALSE)
  
  #see how many we got of each
  R[i] = length(samp[samp == "r"])
  Y[i] = length(samp[samp == "y"])
}


#generate a heat map
#first, empirical
data <- data.frame(R, Y)

data = group_by(data, R, Y)
data = summarize(data, density = n())
data$density = data$density/sims

#change colors so we can see
ggplot(data = data, aes(R, Y)) + 
  geom_tile(aes(fill = density), color = "black") +
  scale_fill_gradient(low = "black", high= "red", name = "Density") + 
  ggtitle("Empirical Density of R and Y") + 
  theme(plot.margin = unit(c(1.8,.5,1.75,1.55), "cm")) +
  theme(plot.title = element_text(family = "Trebuchet MS", 
                                  color="#383838", face="bold", size=25, vjust = 2)) + 
  theme(axis.title.x = element_text(family = "Trebuchet MS", 
                                    color="#383838", face="bold", size=22, vjust = -1.5),
        axis.title.y = element_text(family = "Trebuchet MS", 
                                    color="#383838", face="bold", size=22, vjust = 2)) +
  theme(axis.text.x = element_text(face="bold", color="#383838", 
                                   size=14),
        axis.text.y = element_text(face="bold", color="#383838", 
                                   size=14))

#analytical Joint Distribution
#define the supports/all combinations
R.a = seq(from = min(R), to = max(R), by = 1)
Y.a = seq(from = min(Y), to = max(Y), by = 1)
data = expand.grid(R.a = R.a, Y.a = Y.a)

#calculate density
data$density = apply(data, 1, function(x) 
  
  choose(9, x[2])*choose(45, n - x[2])*choose(9, x[1])*choose(36, n - x[2] - x[1])/(choose(54, n)*choose(45, n - x[2])))

#remove points with 0 density
data = data[data$density != 0, ]

#generate a heatmap
ggplot(data = data, aes(R.a, Y.a)) + 
  geom_tile(aes(fill = density), color = "black") +
  scale_fill_gradient(low = "black", high= "red", name = "Density") + 
  ggtitle("Analytical Density of R and Y") + 
  theme(plot.margin = unit(c(1.8,.5,1.75,1.55), "cm")) +
  theme(plot.title = element_text(family = "Trebuchet MS", 
                                  color="#383838", face="bold", size=25, vjust = 2)) + 
  theme(axis.title.x = element_text(family = "Trebuchet MS", 
                                    color="#383838", face="bold", size=22, vjust = -1.5),
        axis.title.y = element_text(family = "Trebuchet MS", 
                                    color="#383838", face="bold", size=22, vjust = 2)) +
  theme(axis.text.x = element_text(face="bold", color="#383838", 
                                   size=14),
        axis.text.y = element_text(face="bold", color="#383838", 
                                   size=14))




6.7

Let \(X\), \(Y\) and \(Z\) be continuous random variables with the joint PDF given by \(f(x, y, z) = x^2 + y^2 + z^2\), where the support of each random variable is the interval from 0 to 1. Find \(f(x)\), \(f(y)\) and \(f(z)\), and check that they are all valid PDFs.



Analytical Solution:

We can simply marginalize out the two unwanted variables. We will first try to uncover the PDF of \(X\) by integrating out \(y\) and \(z\).

\[f(x) = \int_{0}^1 \int_{0}^1 f(x, y, z) dz dy\] \[= \int_{0}^1 \int_{0}^1 x^2 + y^2 + z^2 dz dy\] \[= \int_{0}^1 x^2 + y^2 + 1/3 \; dy\] \[= x^2 + 1/3 + 1/3\] \[f(x) = x^2 + 2/3\]

We now check that this is valid. We know the support of \(x\) is the interval 0 to 1, so we integrate over this interval.

\[\int_0^1 x^2 + 2/3 \; dx\] \[= 1/3 + 1/3 + 1/3 = 1\]

This integrates to 1, so it is a valid PDF. We don’t actually have to do the calculations for \(Y\) and \(Z\); by symmetry, their PDFs should have the same form as \(f(x)\) and they should be valid. That is, \(f(y) = y^2 + 2/3\) and \(f(z) = z^2 + 2/3\).


Empirical Solution:

#we will check that the PDF integrates to 1
pdf <- function(x){
  return(x^2 + 2/3)
}

#should get 1
integrate(pdf, 0, 1)
## 1 with absolute error < 1.1e-14
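
As an additional sketch, we can also check that the joint PDF integrates to 1 over the unit cube: since the cube has volume 1, this integral is just the average value of \(x^2 + y^2 + z^2\) at uniformly random points in the cube.

#replicate
set.seed(110)
sims = 1000

#draw uniform points in the unit cube
x = runif(sims); y = runif(sims); z = runif(sims)

#average value of the joint PDF over the cube; should be close to 1
mean(x^2 + y^2 + z^2)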




6.8

Let \(X,Y\) be i.i.d. \(N(0, 1)\). Find \(E\big((X + Y)^2\big)\) using 2-D LoTUS. You can leave your answer as an unsimplified integral.



Analytical Solution:

By 2-D LoTUS:

\[E\big((X + Y)^2\big) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (x + y)^2 e^{-x^2/2}e^{-y^2/2}dx dy\]

That is, we multiply the function of \(x\) and \(y\), which is \((x + y)^2\), by the joint PDF of \(X\) and \(Y\) (which, since \(X\) and \(Y\) are independent, is the product of the marginal PDFs of \(X\) and \(Y\)) and integrate over the bounds of \(X\) and \(Y\). Solving this in a different way (i.e., using a linear combination of Normal random variables) shows that this comes out to 2.
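

Empirical Solution:

A quick simulation sketch: draw \(X\) and \(Y\) independently, average \((X + Y)^2\), and check that the result is close to 2.

#replicate
set.seed(110)
sims = 1000

#generate the r.v.'s
X = rnorm(sims)
Y = rnorm(sims)

#should get Var(X + Y) = 2
mean((X + Y)^2)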




6.9

CJ is waiting for a bus that has wait time \(X \sim Expo(\lambda_1)\) and a car that has wait time \(Y \sim Expo(\lambda_2)\), where \(X\) and \(Y\) are independent.

  1. In the case where \(\lambda_1 = \lambda_2\), what is the probability that the bus arrives first?

  2. In the case where \(\lambda_1\) and \(\lambda_2\) are not necessarily equal, what is the probability that the bus arrives first? Provide some intuition (i.e., discuss how this answer changes with \(\lambda_1\) and \(\lambda_2\) and say why this is the case).

  3. Now imagine that CJ is also waiting for a plane that has wait time \(Z \sim Expo(\lambda_3)\), independently of \(X\) and \(Y\). What is the probability that this plane arrives first (i.e., before the bus and car), in the general case where the parameters of \(X\), \(Y\) and \(Z\) are not necessarily equal?



Analytical Solution:

  1. If \(\lambda_1 = \lambda_2\), then \(X\) and \(Y\) are i.i.d., meaning that, by symmetry, the probability that the bus arrives first is \(1/2\).

  2. We are interested in \(P(X < Y)\). We can find this density by integrating the joint PDF of \(X\) and \(Y\) in the correct places. Since \(X\) and \(Y\) are independent, we can multiply their marginal PDFs to find the joint PDF. We then integrate \(x\) from \(0\) to \(y\) (since we want the probability that \(X\) is less than \(Y\)) and we integrate \(y\) over the full support of \(Y\).

\[P(X < Y) = \int_0^{\infty} \int_0^{y}f(x, y) dxdy = \int_0^{\infty} \int_0^{y}f(x)f(y) dxdy\] \[= \int_0^{\infty} \int_0^{y}\lambda_1 \lambda_2 e^{-\lambda_1 x} e^{-\lambda_2 y} dxdy\] \[= \lambda_1 \lambda_2 \int_0^{\infty} e^{-\lambda_2 y} \int_0^{y} e^{-\lambda_1 x} dxdy\]

\[= \lambda_2 \int_0^{\infty} e^{-\lambda_2 y} \big(1 - e^{-\lambda_1 y}\big) dy\] \[= \lambda_2 \int_0^{\infty} e^{-\lambda_2 y} dy - \lambda_2 \int_0^{\infty} e^{-(\lambda_1 + \lambda_2) y} dy\]

\[= \lambda_2\Big(\frac{1}{\lambda_2} - \frac{1}{\lambda_1 + \lambda_2}\Big)\]

\[= 1 - \frac{\lambda_2}{\lambda_1 + \lambda_2}\]

\[P(X < Y) = \frac{\lambda_1}{\lambda_1 + \lambda_2}\]

We skipped over some of the finer points of the integration (since this is a probability book and not a calculus book) but you should be comfortable with the integration here. Anyways, consider this result. It is bounded between 0 and 1, which is a good sanity check (because it is a probability). As \(\lambda_1\) grows, this probability approaches 1. Does this make sense? Recall that \(E(X) = \frac{1}{\lambda_1}\), so a large value of \(\lambda_1\) means that \(X\) is small, on average, which means the probability that \(X\) is smaller than \(Y\) gets larger (and vice versa for when \(\lambda_1\) drops and lowers the probability). We can also do a quick sanity check: when \(\lambda_1 = \lambda_2\), we get a probability of \(1/2\), as we said in part a.

  1. Let \(M = min(X, Y)\). The probability that the plane arrives before the bus and car is \(P(Z < M)\). Thankfully, by the ‘Minimum of Exponentials’ property, we know that \(M \sim Expo(\lambda_1 + \lambda_2)\), and thus we have the same problem that we already solved in part b. Plugging in \(\lambda_3\), the parameter of \(Z\), and \(\lambda_1 + \lambda_2\), the parameter of \(M\), gives the result:

\[P(Z < M) = \frac{\lambda_3}{\lambda_1 + \lambda_2 + \lambda_3}\]


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#define simple parameters
lambda1 = 3
lambda2 = 5
lambda3 = 4

#generate the r.v.'s
X = rexp(sims, lambda1)
Y = rexp(sims, lambda2)
Z = rexp(sims, lambda3)


#part a. and b. Should get lambda1/(lambda1 + lambda2) = .375
length(X[X < Y])/sims
## [1] 0.371
#part c. Should get lambda3/(lambda1 + lambda2 + lambda3) = 1/3
length(Z[Z < X & Z < Y])/sims
## [1] 0.331




6.10

Let \(X \sim N(0, 1)\) and \(Y|X \sim N(x, 1)\). Find \(f(x,y)\).



Analytical Solution:

We know that \(f(x,y) = f(x)f(y|x)\). We are given the distribution of \(X\) and \(Y|X\), so we write:

\[f(x,y) = f(x)f(y|x) = \frac{1}{2 \pi} e^{-\frac{x^2}{2}} e^{\frac{-(y - x)^2}{2}}\]


Empirical Solution:

#replicate
set.seed(110)

#increased number of sims so we can see
sims = 10000

#generate the r.v.'s
X = rnorm(sims)
Y = sapply(X, function(x) rnorm(1, x, 1))

#round X and Y so we can view them
X = round(X, 1)
Y = round(Y, 1)

#generate a heat map
#first, empirical
data <- data.frame(X, Y)

data = group_by(data, X, Y)
data = summarize(data, density = n())
data$density = data$density/sims

#change colors so we can see
ggplot(data = data, aes(X, Y)) + 
  geom_tile(aes(fill = density), color = "black") +
  scale_fill_gradient(low = "black", high= "red", name = "Density") + 
  ggtitle("Empirical Density of X and Y") + 
  theme(plot.margin = unit(c(1.8,.5,1.75,1.55), "cm")) +
  theme(plot.title = element_text(family = "Trebuchet MS", 
                                  color="#383838", face="bold", size=25, vjust = 2)) + 
  theme(axis.title.x = element_text(family = "Trebuchet MS", 
                                    color="#383838", face="bold", size=22, vjust = -1.5),
        axis.title.y = element_text(family = "Trebuchet MS", 
                                    color="#383838", face="bold", size=22, vjust = 2)) +
  theme(axis.text.x = element_text(face="bold", color="#383838", 
                                   size=14),
        axis.text.y = element_text(face="bold", color="#383838", 
                                   size=14))

#analytical Joint Distribution
#define the supports/all combinations
X.a = seq(from = min(X), to = max(X), length.out = 100)
Y.a = seq(from = min(Y), to = max(Y), length.out = 100)
data = expand.grid(X.a = X.a, Y.a = Y.a)

#calculate density
data$density = apply(data, 1, function(x) 
  (1/(2*pi))*exp(-x[1]^2/2)*exp(-(x[2] - x[1])^2/2))

#remove points with 0 density
data = data[data$density != 0, ]

#generate a heatmap
ggplot(data = data, aes(X.a, Y.a)) + 
  geom_tile(aes(fill = density), color = "black") +
  scale_fill_gradient(low = "black", high= "red", name = "Density") + 
  ggtitle("Analytical Density of X and Y") + 
  theme(plot.margin = unit(c(1.8,.5,1.75,1.55), "cm")) +
  theme(plot.title = element_text(family = "Trebuchet MS", 
                                  color="#383838", face="bold", size=25, vjust = 2)) + 
  theme(axis.title.x = element_text(family = "Trebuchet MS", 
                                    color="#383838", face="bold", size=22, vjust = -1.5),
        axis.title.y = element_text(family = "Trebuchet MS", 
                                    color="#383838", face="bold", size=22, vjust = 2)) +
  theme(axis.text.x = element_text(face="bold", color="#383838", 
                                   size=14),
        axis.text.y = element_text(face="bold", color="#383838", 
                                   size=14))




BH Problems



The problems in this section are taken from @BH. The questions are reproduced here, and the analytical solutions are freely available online. Here, we will only consider empirical solutions: answers/approximations to these problems using simulations in R.




BH 7.14
  1. A stick is broken into three pieces by picking two points independently and uniformly along the stick, and breaking the stick at those two points. What is the probability that the three pieces can be assembled into a triangle?

Hint: A triangle can be formed from 3 line segments of lengths \(a,b,c\) if and only if \(a,b,c \in (0,1/2)\). The probability can be interpreted geometrically as proportional to an area in the plane, avoiding all calculus, but make sure for that approach that the distribution of the random point in the plane is Uniform over some region.

#replicate
set.seed(110)
sims = 1000

#employ the hint!
#set paths for a, b and c
a = rep(NA, sims)
b = rep(NA, sims)
c = rep(NA, sims)

#indicator if we get a triangle or not
triangle = rep(0, sims)

#run the loop
for(i in 1:sims){
  
  #break the stick at two points
  first.break = runif(1)
  second.break = runif(1)
  
  #define a as the lowest segment
  a[i] = min(first.break, second.break)
  
  #define b as between the breaks
  b[i] = abs(first.break - second.break)
  
  #define c as the highest segment
  c[i] = 1 - max(first.break, second.break)
  
  #see if we got the triangle
  if(a[i] < 1/2 && b[i] < 1/2 && c[i] < 1/2){
    triangle[i] = 1
  }
}

#should get 1/4
mean(triangle)
## [1] 0.255


  1. Three legs are positioned uniformly and independently on the perimeter of a round table. What is the probability that the table will stand?
#the table stands if and only if the center of the table is inside the triangle
#   formed by the legs, which happens exactly when the 3 legs do NOT all lie in
#   one semi-circle; equivalently, all 3 arc lengths a, b, c between consecutive
#   legs (on a circle with circumference 1) must be less than 1/2

#so the calculation is the same as in part a., and the answer is again 1/4
#   (a short simulation sketch is below)
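
A small simulation sketch of part (b.): place 3 legs uniformly on a circle of circumference 1, compute the three arc lengths between consecutive legs, and check whether all are less than 1/2 (so that the table stands); the estimate should be near 1/4.

#replicate
set.seed(110)
sims = 1000

#indicator that the table stands
stand = rep(0, sims)

#run the loop
for(i in 1:sims){
  
  #drop 3 legs uniformly on a circle of circumference 1
  legs = sort(runif(3))
  
  #the three arc lengths between consecutive legs (including the wrap-around arc)
  arcs = c(diff(legs), 1 - (legs[3] - legs[1]))
  
  #the table stands if no arc is longer than 1/2
  if(max(arcs) < 1/2){
    stand[i] = 1
  }
}

#should get 1/4
mean(stand)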




BH 7.18

Let \((X,Y)\) be a uniformly random point in the triangle in the plane with vertices \((0,0), (0,1), (1,0)\). Find the joint PDF of \(X\) and \(Y\), the marginal PDF of \(X\), and the conditional PDF of \(X\) given \(Y\).

#replicate
set.seed(110)

#increased sims for visuals
sims = 1000*10

#set paths for X and Y
X = rep(NA, sims)
Y = rep(NA, sims)

#run the loop
for(i in 1:sims){
  
  #draw until we get something that's in the triangle
  while(TRUE){
    
    #draw X and Y
    X[i] = runif(1)
    Y[i] = runif(1)
    
    #see if it's in the triangle
    if(X[i] + Y[i] < 1){
      break
    }
  }
}

#round X and Y so we can see
X = round(X, 2)
Y = round(Y, 2)

#generate a heat map
#first, empirical
data <- data.frame(X, Y)

data = group_by(data, X, Y)
data = summarize(data, density = n())
data$density = data$density/sims

#change the color to black so we can see!
ggplot(data = data, aes(X, Y)) + 
  geom_tile(aes(fill = density), color = "black") +
  scale_fill_gradient(low = "black", high= "red", name = "Density") + 
  ggtitle("Empirical Density of X and Y") + 
  theme(plot.margin = unit(c(1.8,.5,1.75,1.55), "cm")) +
  theme(plot.title = element_text(family = "Trebuchet MS", 
            color="#383838", face="bold", size=25, vjust = 2)) + 
  theme(axis.title.x = element_text(family = "Trebuchet MS", 
            color="#383838", face="bold", size=22, vjust = -1.5),
        axis.title.y = element_text(family = "Trebuchet MS", 
            color="#383838", face="bold", size=22, vjust = 2)) +
  theme(axis.text.x = element_text(face="bold", color="#383838", 
                           size=14),
          axis.text.y = element_text(face="bold", color="#383838", 
                           size=14))

#analytical Joint Distribution

#define the supports/all combinations
X.a = seq(from = 0, to = 1, length.out = 100)
Y.a = seq(from = 0, to = 1, length.out = 100)
data = expand.grid(X.a = X.a, Y.a = Y.a)

#calculate density
data$density = apply(data, 1, function(x) 2)

#remove points with 0 density
data = data[data$density != 0, ]

#remove points outside of the triangle
data = data[data$X.a + data$Y.a < 1, ]

#generate a heatmap
ggplot(data = data, aes(X.a, Y.a)) + 
  geom_tile(aes(fill = density), color = "black") +
  scale_fill_gradient(low = "black", high= "red", name = "Density") + 
  ggtitle("Analytical Density of X and Y") + 
  theme(plot.margin = unit(c(1.8,.5,1.75,1.55), "cm")) +
  theme(plot.title = element_text(family = "Trebuchet MS", 
            color="#383838", face="bold", size=25, vjust = 2)) + 
  theme(axis.title.x = element_text(family = "Trebuchet MS", 
            color="#383838", face="bold", size=22, vjust = -1.5),
        axis.title.y = element_text(family = "Trebuchet MS", 
            color="#383838", face="bold", size=22, vjust = 2)) +
  theme(axis.text.x = element_text(face="bold", color="#383838", 
                           size=14),
          axis.text.y = element_text(face="bold", color="#383838", 
                           size=14)) +
  scale_x_continuous(limits = c(0, 1)) +
  scale_y_continuous(limits = c(0, 1))

#calculate the analytical PDF
k = seq(from = 0, to = 1, length.out = 100)
PDF = 2*(1 - k)

#plots should line up
#empirical
plot(density(X), col = "black", 
     main = "PDF of X", type = "l",
     xlab = "x", ylab = "f(y)",lwd = 3)

lines(k, PDF, col = "red", lwd = 3)

legend("topright", legend = c("Empirical PDF", "Analytical PDF"),
       lty=c(1,1), lwd=c(2.5,2.5),
       col=c("black", "red"))

#condition on Y being around a specific value
y = 1/2

#find the analytical PDF
k = seq(from = 0, to = 1 - y, length.out = 100)
PDF = rep(1/(1 - y), length(k))


#title for the graph
title = paste0("Conditional PDF of X|Y = ", y)
y.axis = paste0("f(x|y = ", y, ")")


#plots should line up
#empirical
plot(density(X[Y < y + .05 & Y > y - .05]), col = "black", 
     main = title, xlim = c(min(k), max(k)),
     xlab = "x", ylab = y.axis, 
     lwd = 3)

lines(k, PDF, col = "red", lwd = 3)

legend("bottomright", legend = c("Empirical PDF", "Analytical PDF"),
       lty=c(1,1), lwd=c(2.5,2.5),
       col=c("black", "red"))




BH 7.19

A random point \((X,Y,Z)\) is chosen uniformly in the ball \(B = \{(x,y,z): x^2 + y^2 + z^2 \leq 1 \}\).

  1. Find the joint PDF of \(X,Y,Z\).

  2. Find the joint PDF of \(X,Y\).

#replicate
set.seed(110)

#takes a lot of sims to get quality visuals
sims = 10000

#set paths for the r.v.'s
X = rep(NA, sims)
Y = rep(NA, sims)
Z = rep(NA, sims)

#run the loop
for(i in 1:sims){
  
  #draw until we are in the ball
  while(TRUE){
    
    #draw the r.v.'s
    X[i] = runif(1, -1, 1)
    Y[i] = runif(1, -1, 1)
    Z[i] = runif(1, -1, 1)
    
    #see if we are in the ball, and break the loop if we are
    if(X[i]^2 + Y[i]^2 + Z[i]^2 <= 1){
      break
    }
  }
}

#round so we can see the graphic
X = round(X, 1)
Y = round(Y, 1)

#joint PDF of X and Y is symmetric to the joint PDF of X and Z
data <- data.frame(X, Y)

data = group_by(data, X, Y)
data = summarize(data, density = n())
data$density = data$density/sims

#change the color so we can see
ggplot(data = data, aes(X, Y)) + 
  geom_tile(aes(fill = density), color = "black") +
  scale_fill_gradient(low = "black", high= "red", name = "Density") + 
  ggtitle("Empirical Density of X and Y") + 
  theme(plot.margin = unit(c(1.8,.5,1.75,1.55), "cm")) +
  theme(plot.title = element_text(family = "Trebuchet MS", 
            color="#383838", face="bold", size=25, vjust = 2)) + 
  theme(axis.title.x = element_text(family = "Trebuchet MS", 
            color="#383838", face="bold", size=22, vjust = -1.5),
        axis.title.y = element_text(family = "Trebuchet MS", 
            color="#383838", face="bold", size=22, vjust = 2)) +
  theme(axis.text.x = element_text(face="bold", color="#383838", 
                           size=14),
          axis.text.y = element_text(face="bold", color="#383838", 
                           size=14))

#analytical Joint Distribution

#define the supports/all combinations
X.a = seq(from = -1, to = 1, length.out = 100)
Y.a = seq(from = -1, to = 1, length.out = 100)
data = expand.grid(X.a = X.a, Y.a = Y.a)

#only keep when we satisfy this condition
data = data[data$X.a^2 + data$Y.a^2 <= 1, ]

#calculate density
data$density = apply(data, 1, function(x) 
    (3/(2*pi))*sqrt(1 - x[1]^2 - x[2]^2))

#remove points with 0 density
data = data[data$density != 0, ]

#generate a heatmap
ggplot(data = data, aes(X.a, Y.a)) + 
  geom_tile(aes(fill = density), color = "white") +
  scale_fill_gradient(low = "white", high= "red", name = "Density") + 
  ggtitle("Analytical Density of X and Y") + 
  theme(plot.margin = unit(c(1.8,.5,1.75,1.55), "cm")) +
  theme(plot.title = element_text(family = "Trebuchet MS", 
            color="#383838", face="bold", size=25, vjust = 2)) + 
  theme(axis.title.x = element_text(family = "Trebuchet MS", 
            color="#383838", face="bold", size=22, vjust = -1.5),
        axis.title.y = element_text(family = "Trebuchet MS", 
            color="#383838", face="bold", size=22, vjust = 2)) +
  theme(axis.text.x = element_text(face="bold", color="#383838", 
                           size=14),
          axis.text.y = element_text(face="bold", color="#383838", 
                           size=14))

  3. Find an expression for the marginal PDF of \(X\), as an integral.
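
The integral is \(f_X(x) = \int\int_{y^2 + z^2 \leq 1 - x^2} \frac{3}{4\pi} \, dy \, dz\), which evaluates to \(\frac{3}{4}(1 - x^2)\) for \(-1 \leq x \leq 1\). As a hedged check (not part of the original solution), we can compare this to the simulated \(X\) from above (recall \(X\) was rounded to one decimal for the heat map, so the match is only approximate).

#the marginal PDF of X integrates the joint PDF over the disk y^2 + z^2 <= 1 - x^2,
#   which gives (3/4)*(1 - x^2)
k = seq(from = -1, to = 1, length.out = 100)
PDF = (3/4)*(1 - k^2)

#plots should roughly line up
#empirical
plot(density(X), col = "black", 
     main = "PDF of X", type = "l",
     xlab = "x", ylab = "f(x)", lwd = 3,
     ylim = c(0, 1))

#analytical
lines(k, PDF, col = "red", lwd = 3)

legend("topright", legend = c("Empirical PDF", "Analytical PDF"),
       lty=c(1,1), lwd=c(2.5,2.5),
       col=c("black", "red"))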




BH 7.20

Let \(U_1,U_2,U_3\) be i.i.d. Unif(\(0,1\)), and let \(L = \min(U_1,U_2,U_3), M = \max(U_1,U_2,U_3)\).

  1. Find the marginal CDF and marginal PDF of \(M,\) and the joint CDF and joint PDF of \(L,M\).
#replicate
set.seed(110)

#increased sims for graphics
sims = 1000*10

#generate the r.v.'s
U1 = runif(sims)
U2 = runif(sims)
U3 = runif(sims)

#set paths for M and L
M = rep(NA, sims)
L = rep(NA, sims)

#run the loop
for(i in 1:sims){
  
  #find the max and min
  M[i] = max(U1[i], U2[i], U3[i])
  L[i] = min(U1[i], U2[i], U3[i])
}

#calculate the analytical PDF
k = seq(from = 0, to = 1, length.out = 100)
PDF = 3*k^2

#plots should line up
#empirical
plot(density(M), col = "black", 
     main = "PDF of M", type = "l",
     xlab = "m", ylab = "f(m)",lwd = 3)

lines(k, PDF, col = "red", lwd = 3)

legend("topleft", legend = c("Empirical PDF", "Analytical PDF"),
       lty=c(1,1), lwd=c(2.5,2.5),
       col=c("black", "red"))

#show that the analytical and empirical CDFs match
k = seq(from = 0, to = 1, length.out = 100)
CDF = k^3

#plots should match
plot(ecdf(M), col = "black", 
     main = "CDF of M", xlim = c(min(k), max(k)),
     xlab = "m", ylab = "P(M < m)", 
     ylim = c(0, 1), lwd = 3)
lines(k, CDF, col = "red", lwd = 1, type = "p", pch = 20)

legend("topleft", legend = c("Empirical CDF", "Analytical CDF"),
       lty=c(1,1), lwd=c(2.5,2.5),
       col=c("black", "red"))

#generate a heat map
#first, empirical

#round so we can see
L = round(L, 2)
M = round(M, 2)

data <- data.frame(L, M)

data = group_by(data, L, M)
data = summarize(data, density = n())
data$density = data$density/sims

#change the color to black so we can see it
ggplot(data = data, aes(L, M)) + 
  geom_tile(aes(fill = density), color = "black") +
  scale_fill_gradient(low = "black", high= "red", name = "Density") + 
  ggtitle("Empirical Density of M and L") + 
  theme(plot.margin = unit(c(1.8,.5,1.75,1.55), "cm")) +
  theme(plot.title = element_text(family = "Trebuchet MS", 
            color="#383838", face="bold", size=25, vjust = 2)) + 
  theme(axis.title.x = element_text(family = "Trebuchet MS", 
            color="#383838", face="bold", size=22, vjust = -1.5),
        axis.title.y = element_text(family = "Trebuchet MS", 
            color="#383838", face="bold", size=22, vjust = 2)) +
  theme(axis.text.x = element_text(face="bold", color="#383838", 
                           size=14),
          axis.text.y = element_text(face="bold", color="#383838", 
                           size=14))

#analytical Joint Distribution

#define the supports/all combinations
M.a = seq(from = 0, to = 1, length.out = 100)
L.a = seq(from = 0, to = 1, length.out = 100)
data = expand.grid(M.a = M.a, L.a = L.a)

#calculate density
data$density = apply(data, 1, 
              function(x) 6*(x[1] - x[2]))

#remove points with 0 density
data = data[data$density != 0, ]

#take out when L > M
data = data[data$M.a > data$L.a, ]

#generate a heatmap
ggplot(data = data, aes(L.a, M.a)) + 
  geom_tile(aes(fill = density), color = "black") +
  scale_fill_gradient(low = "black", high= "red", name = "Density") + 
  ggtitle("Analytical Density of M and L") + 
  theme(plot.margin = unit(c(1.8,.5,1.75,1.55), "cm")) +
  theme(plot.title = element_text(family = "Trebuchet MS", 
            color="#383838", face="bold", size=25, vjust = 2)) + 
  theme(axis.title.x = element_text(family = "Trebuchet MS", 
            color="#383838", face="bold", size=22, vjust = -1.5),
        axis.title.y = element_text(family = "Trebuchet MS", 
            color="#383838", face="bold", size=22, vjust = 2)) +
  theme(axis.text.x = element_text(face="bold", color="#383838", 
                           size=14),
          axis.text.y = element_text(face="bold", color="#383838", 
                           size=14))

  2. Find the conditional PDF of \(M\) given \(L\).
#condition on L equaling a specific value (L was rounded to 2 decimals above)
l = .25

#calculate the analytical PDF
k = seq(from = l, to = 1, length.out = 100)
PDF = 2*(k - l)/((1 - l)^2)

#title for the graph
title = paste0("Conditional PDF of M|L = ", l)
y.axis = paste0("f(m|L = ", l,")")


#plots should line up
#empirical
plot(density(M[L == l]), col = "black", 
     main = title, xlim = c(min(k), max(k)),
     xlab = "m", ylab = y.axis, 
     lwd = 3)

lines(k, PDF, col = "red", lwd = 2)

legend("topleft", legend = c("Empirical PDF", "Analytical PDF"),
       lty=c(1,1), lwd=c(2.5,2.5),
       col=c("black", "red"))




BH 7.24

Two students, \(A\) and \(B\), are working independently on homework (not necessarily for the same class). Student \(A\) takes \(Y_1 \sim Expo(\lambda_1)\) hours to finish his or her homework, while \(B\) takes \(Y_2 \sim Expo(\lambda_2)\) hours.

  1. Find the CDF and PDF of \(Y_1/Y_2\), the ratio of their problem-solving times.
#replicate
set.seed(110)
sims = 1000

#define simple parameters
lambda1 = 1/10
lambda2 = 1/20

#generate the r.v.'s
Y1 = rexp(sims, lambda1)
Y2 = rexp(sims, lambda2)

#define R as the ratio
R = Y1/Y2


#show that the analytical and empirical CDFs match
#calculate the analytical CDF
k = seq(from = 0, to = 1000, length.out = 100)
CDF = k*lambda1/(k*lambda1 + lambda2)

#plots should match
#empirical
plot(ecdf(R), col = "black", 
     main = "CDF of R", xlim = c(min(k), max(k)),
     xlab = "r", ylab = "P(R < r)", 
     ylim = c(0, 1), lwd = 3)

#analytical
lines(k, CDF, col = "red", lwd = 3)


legend("bottomright", legend = c("Empirical CDF", "Analytical CDF"),
       lty=c(1,1), lwd=c(2.5,2.5),
       col=c("black", "red"))

#show that the analytical and empirical PDFs match
#calculate the analytical PDF
k = seq(from = 0, to = 1000, length.out = 100)
PDF = lambda1*lambda2/(k*lambda1 + lambda2)^2

plot(density(R), col = "black", 
     main = "PDF of R", xlim = c(min(k), max(k)),
     xlab = "r", ylab = "f(r)", 
     ylim = c(0, 1), lwd = 3)

lines(k, PDF, col = "red", lwd = 3)


legend("topright", legend = c("Empirical PDF", "Analytical PDF"),
       lty=c(1,1), lwd=c(2.5,2.5),
       col=c("black", "red"))

  2. Find the probability that \(A\) finishes his or her homework before \(B\) does.
#should match lambda1/(lambda1 + lambda2) = 2/3
length(Y1[Y1 < Y2])/sims
## [1] 0.645




BH 7.26

The bus company from Blissville decides to start service in Blotchville, sensing a promising business opportunity. Meanwhile, Fred has moved back to Blotchville. Now when Fred arrives at the bus stop, either of two independent bus lines may come by (both of which take him home). The Blissville company’s bus arrival times are exactly \(10\) minutes apart, whereas the time from one Blotchville company bus to the next is Expo(\(\frac{1}{10}\)). Fred arrives at a uniformly random time on a certain day.

  1. What is the probability that the Blotchville company bus arrives first? Hint: One good way is to use the continuous law of total probability.
#replicate
set.seed(110)
sims = 1000

#generate the r.v.'s
#from the moment Fred arrives, the wait time until the Bliss bus is Unif(0, 10)
#   The wait time until the Blotch bus is Expo(1/10)
Bliss = runif(sims, 0, 10)
Blotch = rexp(sims, 1/10)

#count how many times the Blotchville bus arrived first
#should get 1/exp(1) = .368
length(Bliss[Bliss > Blotch])/sims
## [1] 0.374
  2. What is the CDF of Fred’s waiting time for a bus?
#calculate the wait times
wait = rep(NA, sims)
for(i in 1:sims){
  wait[i] = min(Bliss[i], Blotch[i])
}


#show that the analytical and empirical CDFs match
#calculate the analytical CDF
k = seq(from = 0, to = 10, length.out = 100)
CDF = 1 - exp(-k/10)*(1 - k/10)

#plots should match
plot(ecdf(wait), col = "black", 
     main = "CDF", xlim = c(min(k), max(k)),
     xlab = "m", ylab = "P(M < m)", 
     ylim = c(0, 1), lwd = 3)
lines(k, CDF, col = "red", lwd = 2)

legend("topleft", legend = c("Empirical CDF", "Analytical CDF"),
       lty=c(1,1), lwd=c(2.5,2.5),
       col=c("black", "red"))




BH 7.31

Let \(X\) and \(Y\) be i.i.d. \(Unif(0,1)\). Find the standard deviation of the distance between \(X\) and \(Y\).

#replicate
set.seed(110)
sims = 1000

#generate the r.v.'s
X = runif(sims)
Y = runif(sims)
Z = abs(X - Y)

#should get 1/sqrt(18) = .236
sd(Z)
## [1] 0.2345789




BH 7.32

Let \(X,Y\) be i.i.d. Expo(\(\lambda\)). Find \(E|X-Y|\) in two different ways: (a) using 2D LOTUS and (b) using the memoryless property without any calculus.

#replicate
set.seed(110)
sims = 1000

#define a simple parameter
lambda = 1

#generate the r.v.'s
X = rexp(sims, lambda)
Y = rexp(sims, lambda)
Z = abs(X - Y)

#should get 1/lambda = 1
mean(Z)
## [1] 1.042689




BH 7.59

A \(Pois(\lambda)\) number of people vote in a certain election. Each voter votes for candidate \(A\) with probability \(p\) and for candidate \(B\) with probability \(q=1-p\), independently of all the other voters. Let \(V\) be the difference in votes, defined as the number of votes for \(A\) minus the number for \(B\).

  1. Find \(E(V)\).
#replicate
set.seed(110)
sims = 1000

#define simple parameters
lambda = 100
p = 2/3

#keep track of votes for A and B
A = rep(NA, sims)
B = rep(NA, sims)

#run the loop
for(i in 1:sims){
  
  #generate the number of voters
  voters = rpois(1, lambda)
  
  #generate votes for A
  A[i] = sum(rbinom(1, voters, p))
  
  #everyone else votes for B
  B[i] = voters - A[i]
}

#define V
V = A - B

#should get lambda*(p - q) = lambda*(2*p - 1) = 33.3
mean(V)
## [1] 33.563
  2. Find \(Var(V)\).
#recycle vectors
#should get lambda
var(V)
## [1] 97.93997




BH 7.64

Let \((X_1,\dots,X_k)\) be Multinomial with parameters \(n\) and \((p_1,\dots,p_k)\). Use indicator r.v.s to show that \(Cov(X_i,X_j) = -np_i p_j\) for \(i \neq j\).

#replicate
set.seed(110)
sims = 1000

#define simple parameters
n = 3
p = c(1/2, 1/3, 1/6)

#generate the r.v.
X = rmultinom(sims, n, p)

#should be -n*p[1]*p[2] = -1/2
cov(X[1, ], X[2, ])
## [1] -0.503004




BH 7.65

Consider the birthdays of 100 people. Assume people’s birthdays are independent, and the 365 days of the year (exclude the possibility of February 29) are equally likely. Find the covariance and correlation between how many of the people were born on January 1 and how many were born on January 2.

#replicate
set.seed(110)
sims = 1000

#count number of births on Jan. 1 and Jan. 2
jan1 = rep(NA, sims)
jan2 = rep(NA, sims)

#run the loop
for(i in 1:sims){
  
  #sample the birthdays
  bdays = sample(1:365, 100, replace = TRUE)
  
  #count how many we have on each day
  jan1[i] = length(bdays[bdays == 1])
  jan2[i] = length(bdays[bdays == 2])
}


#should get -100/365^2, which is near 0
cov(jan1, jan2)
## [1] 0.006886887
#should get -1/364 = -.003
cor(jan1, jan2)
## [1] 0.02428755




BH 7.67

A group of \(n \geq 2\) people decide to play an exciting game of Rock-Paper-Scissors. As you may recall, Rock smashes Scissors, Scissors cuts Paper, and Paper covers Rock (despite Bart Simpson saying “Good old rock, nothing beats that!”).

Usually this game is played with 2 players, but it can be extended to more players as follows. If exactly 2 of the 3 choices appear when everyone reveals their choice, say \(a,b \in \{\textrm{Rock}, \textrm{Paper}, \textrm{Scissors}\}\) where \(a\) beats \(b\), the game is decisive: the players who chose \(a\) win, and the players who chose \(b\) lose. Otherwise, the game is indecisive and the players play again.

For example, with 5 players, if one player picks Rock, two pick Scissors, and two pick Paper, the round is indecisive and they play again. But if 3 pick Rock and 2 pick Scissors, then the Rock players win and the Scissors players lose the game.

Assume that the \(n\) players independently and randomly choose between Rock, Scissors, and Paper, with equal probabilities. Let \(X, Y , Z\) be the number of players who pick Rock, Scissors, Paper, respectively in one game.

  1. Find the joint PMF of \(X,Y,Z.\)

  2. Find the probability that the game is decisive. Simplify your answer.

#replicate
set.seed(110)
sims = 1000

#define a simple parameter
n = 5

#indicator if the game is decisive
decisive = rep(0, sims)

#vector for the game options
choices = c("Rock", "Scissor", "Paper")

#run the loop
for(i in 1:sims){
  
  #sample choices for each player
  players = sample(choices, n, replace = TRUE)
  
  #see if we get exactly two choices
  if(length(unique(players)) == 2){
    decisive[i] = 1
  }
}

#should get (2^n - 2)/(3^(n - 1)) = .370
mean(decisive)
## [1] 0.348
  3. What is the probability that the game is decisive for \(n = 5\)? What is the limiting probability that a game is decisive as \(n \to \infty\)? Explain briefly why your answer makes sense.
#we solved the n = 5 case in part b. above

#solve for increasing n
n = round(seq(from = 2, to = 20, length.out = 10))

#keep track of decisive probabilities
probs = rep(NA, length(n))

#iterate over n
for(j in 1:length(n)){
  
  #indicator if the game is decisive
  decisive = rep(0, sims)
  
  #vector for the game options
  choices = c("Rock", "Scissor", "Paper")
  
  #run the loop
  for(i in 1:sims){
    
    #sample choices for each player
    players = sample(choices, n[j], replace = TRUE)
    
    #see if we get exactly two choices
    if(length(unique(players)) == 2){
      decisive[i] = 1
    }
  }
  
  #mark down the probability
  probs[j] = mean(decisive)
}

plot(n, probs, 
     main = "P(Game is decisive) for different n",
     xlab = "n", ylab = "P(Game is decisive)",
     col = "black", lwd = 3, pch = 16)




BH 7.68

Emails arrive in an inbox according to a Poisson process with rate \(\lambda\) (so the number of emails in a time interval of length \(t\) is distributed as \(Pois(\lambda t)\), and the numbers of emails arriving in disjoint time intervals are independent). Let \(X,Y,Z\) be the numbers of emails that arrive from 9 am to noon, noon to 6 pm, and 6 pm to midnight (respectively) on a certain day.

  1. Find the joint PMF of \(X,Y,Z\).

  2. Find the conditional joint PMF of \(X,Y,Z\) given that \(X+Y+Z = 36\).

  3. Find the conditional PMF of \(X+Y\) given that \(X+Y+Z=36\), and find \(E(X+Y|X+Y+Z=36)\) and \(Var(X+Y|X+Y+Z=36)\) (conditional expectation and conditional variance given an event are defined in the same way as expectation and variance, using the conditional distribution given the event in place of the unconditional distribution).

#replicate
set.seed(110)

#increased sims (rare event)
sims = 1000*10

#define a simple parameter (one that will often get us a sum of 36!)
lambda = 2.4

#generate the r.v.'s; work in terms of hours
X = rpois(sims, 3*lambda)
Y = rpois(sims, 6*lambda)
Z = rpois(sims, 6*lambda)

#should get 21.6
mean((X + Y)[X + Y + Z == 36])
## [1] 21.41829
#should get 8.64
var((X + Y)[X + Y + Z == 36])
## [1] 8.009455
#show that this is the same as Bin(36, 9/15)
hist((X + Y)[X + Y + Z == 36], 
     col = "gray", xlab = "x + y",
     main = "X + Y | X + Y + Z = 36")

hist(rbinom(sims, 36, 9/15), col = "gray", xlab = "",
     main = "Bin(36, 9/15)")






Covariance and Correlation




7.1

Let \(X \sim Bern(p)\). Let \(I_1\) be the indicator that \(X = 1\) and \(I_0\) be the indicator that \(X = 0\). Find \(Cov(I_1, I_0)\).



Analytical Solution:

We simply use the definition of Covariance:

\[Cov(I_1, I_0) = E(I_1I_0) - E(I_1) E(I_0)\]

We know that \(I_0I_1 = 0\) always, since one of the indicators is 0 regardless of what value \(X\) takes on. We find the latter term by using the fundamental bridge (probability \(p\) we get \(X = 1\), probability \(1 - p\) we get \(X = 0\)).

\[Cov(I_1, I_0) = -p(1 - p)\]


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#define a simple parameter
p = 1/2

#generate the r.v's
X = rbinom(sims, 1, p)
I1 = X
I0 = 1 - X

#should get -p*(1 - p) = -1/4
cov(I1, I0)
## [1] -0.2500811




7.2

Let \(U \sim Unif(0, 1)\). Find \(Cov(U, 1 - U)\).



Analytical Solution:

It is easiest to expand the Covariance term here. We get:

\[Cov(U, 1 - U) = Cov(U, 1) - Cov(U, U)\]

The Covariance of a random variable and a constant is 0, so we are left with:

\[-Var(U)\]

Which we know is:

\[-1/12\]


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#generate the r.v.
U = runif(sims)

#should get -1/12
cov(U, 1 - U)
## [1] -0.08363732




7.3

Let \(Z \sim N(0, 1)\). Find \(Cov(Z^2, Z^3)\).



Analytical Solution:

Again, we use the definition of Covariance.

\[Cov(Z^2, Z^3) = E(Z^2Z^3) - E(Z^2)E(Z^3)\]

Recall that every odd moment of the standard Normal is 0 (since \(z\) to an odd power times the PDF of a standard Normal is an odd function, and we integrate over all real numbers in the LOTUS calculation). Therefore, \(E(Z^3) = 0\), so the second term is 0. We are left with:

\[E(Z^5)\]

Which is also an odd moment, meaning it is also 0. So, the Covariance is 0. Clearly, \(Z^2\) and \(Z^3\) are not independent, but they have a Covariance of 0.


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#generate the r.v.'s
Z = rnorm(sims)

#should get 0; large values may skew a simulation result
cov(Z^2, Z^3)
## [1] -1.181303




7.4

Let \(X \sim Unif(-3, -1)\), \(Y \sim Unif(-1, 1)\) and \(Z \sim Unif(1, 3)\). All of these random variables are independent. Let \(D_1 = Z - Y\) and \(D_2 = Y - X\). Find \(Cov(D_1, D_2)\). Provide some intuition for this result.



Analytical Solution:

By the rules of Covariance:

\[Cov(D_1, D_2) = Cov(Z - Y, Y - X) = Cov(Z, Y) - Cov(Z, X) - Cov(Y, Y) + Cov(Y, X)\] \[= -Var(Y)\]

Since \(X,Y,Z\) are all independent, their covariances with one another are 0, and \(Cov(Y, Y) = Var(Y)\). We know the variance of a \(Unif(a, b)\) distribution is \((b - a)^2/12\), so this comes out to:

\[= -\frac{(1 - (-1))^2}{12} = -1/3\]

It is intuitive that this value is negative; if \(D_1\) is large, for example, then it is likely that \(Y\) is small, which means it is more likely that \(D_2\) is small.


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#generate the r.v.'s
X = runif(sims, -3, -1)
Y = runif(sims, -1, 1)
Z = runif(sims, 1, 3)
D1 = Z - Y
D2 = Y - X


#should get -1/3
cov(D1, D2)
## [1] -0.3956148




7.5
  1. Is it possible to construct random variables \(X\) and \(Y\) such that \(X\) and \(Y\) are not marginally Normal but the vector \(\{X, Y\}\) is MVN?

  2. Can we consider the set of integers \(\{0, 1, ..., 100\}\) to be MVN?



Analytical Solution:

  1. No. If \(\{X, Y\}\) were MVN, then every linear combination of \(X\) and \(Y\) would be Normal; but the combination \(1 \cdot X + 0\cdot Y = X\) is not Normal, since we are given that \(X\) is not marginally Normal.

  2. Yes. Any linear combination of these integers will result in a constant \(c\), and we can view a constant \(c\) as \(N(c, 0)\): a Normal with mean \(c\) and variance 0. It is a degenerate Normal, but it is a Normal!
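
As a quick empirical illustration of the fact used in part 1 (every linear combination of an MVN vector, including each component alone, must be Normal), here is a sketch, not part of the original solution, that draws from a bivariate Normal with rmvnorm and inspects a marginal and a linear combination with QQ-plots.

#a sketch illustrating part 1: every component and every linear combination
#   of an MVN vector is marginally Normal
library(mvtnorm)

#replicate
set.seed(110)
sims = 1000

#draw from a bivariate Normal with correlation 1/2
data = rmvnorm(sims, mean = c(0, 0), 
               sigma = matrix(c(1, 1/2, 1/2, 1), nrow = 2, ncol = 2))
X = data[, 1]
Y = data[, 2]

#both X alone and X + 2Y should look Normal (points near a straight line)
qqnorm(X, main = "QQ-plot of X")
qqnorm(X + 2*Y, main = "QQ-plot of X + 2Y")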




7.6

Let \(U \sim Unif(0, 1)\) and \(A\) be the area of the random, 2-D disk with radius \(U\).

  1. Intuitively, does \(A\) have a Uniform distribution?
  2. Find the PDF of \(A\).
  3. Verify that the PDF of \(A\) is a valid PDF.



Analytical Solution:

  1. We know that \(A = \pi U^2\), and that \(U^2\) is not uniform (the values should clump up around 0). Therefore, we don’t expect \(A\) to be uniform.

  2. Using the transformation theorem:

\[f(a) = f(u)|\frac{\partial u}{\partial a}|\]

The PDF of \(U\) is just 1. We know that \(A = \pi U^2 \rightarrow U = \sqrt{\frac{A}{\pi}}\), and taking the derivative w.r.t. \(A\) yields \(\frac{1}{2 \sqrt{\pi A}}\). We plug in:

\[f(a) = \frac{1}{2 \sqrt{\pi a}}\]

This PDF changes with \(a\), so it is not uniform by definition as we guessed in the previous part.

  3. The support of \(A\) is 0 to \(\pi\), since at minimum \(U = 0\) and at maximum \(U = 1\). We integrate the PDF over these bounds:

\[\frac{1}{2\sqrt{\pi}} \int_0^\pi a^{-1/2} da = \frac{1}{2\sqrt{\pi}} \cdot 2\sqrt{a}\Big|_0^\pi = \frac{2\sqrt{\pi} - 0}{2\sqrt{\pi}} = 1\]

The PDF integrates to 1; it is valid!


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#generate the r.v.'s
U = runif(sims)
A = pi*U^2



#plot the PDFs; should be the same
#calculate the analytical PDF
a = seq(from = 0, to = pi, length.out = 100)
PDF = 1/(2*sqrt(pi*a))

#plots should line up
#empirical
plot(density(A), col = "black", 
     main = "PDF", type = "l",
     xlab = "v", ylab = "f(a)",lwd = 3)

#analytical
lines(a, PDF, col = "red", pch = 20, type = "p", lwd = 3)


legend("topright", legend = c("Empirical PDF", "Analytical PDF"),
       lty=c(1,1), lwd=c(2.5,2.5),
       col=c("black", "red"))




7.7

Let \(X \sim Expo(\lambda)\) and \(Y = X + c\) for some constant \(c\). Does \(Y\) have an Exponential distribution? Use the PDF of \(Y\) to answer this question (and verify that the PDF you find is a valid PDF).



Analytical Solution:

We can find \(f(y)\) with the transformation theorem.

\[f(y) = f(x)|\frac{dx}{dy}|\] \[= \lambda e^{-\lambda(y - c)}\]

Since \(Y = X + c\), we have \(X = Y - c\), so the derivative of \(X\) with respect to \(Y\) is just 1. This is clearly not the PDF of an Exponential random variable, so \(Y\) is not Exponential. We can check that this is a valid PDF by integrating over the support. Since \(X\) has support 0 to \(\infty\), we know \(Y\) has support \(c\) to \(\infty\). Taking an integral:

\[\int_{c}^{\infty} \lambda e^{-\lambda(y - c)} dy = \lambda e^{\lambda c} \int_{c}^{\infty} e^{-\lambda y} dy\]

\[= (\lambda e^{\lambda c}) \Big(\frac{-e^{-\lambda y}}{\lambda} \Big|_c^{\infty}\Big)\]

\[= (\lambda e^{\lambda c})\frac{e^{-\lambda c}}{\lambda} = 1\]

The PDF is valid!


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#define simple parameters
lambda = 3
c = 5

#generate the r.v.'s
X = rexp(sims, lambda)
Y = X + c



#plot the PDFs; should be the same
#calculate the analytical PDF
y = seq(from = min(Y), to = max(Y), length.out = 100)
PDF = (lambda*exp(lambda*c))*exp(-lambda*y)

#plots should line up
#empirical
plot(density(Y), col = "black", 
     main = "PDF", type = "l",
     xlab = "y", ylab = "f(y)",lwd = 3)

#analytical
lines(y, PDF, col = "red", pch = 20, type = "p", lwd = 3)


legend("topright", legend = c("Empirical PDF", "Analytical PDF"),
       lty=c(1,1), lwd=c(2.5,2.5),
       col=c("black", "red"))




7.8

Let \(X,Y\) be i.i.d. \(N(0, 1)\), and let \(Z = min(X, Y)\) and \(W = max(X, Y)\). Nick says that, since \(\{Z, W\}\) is the same vector as \(\{X, Y\}\) and \(\{X, Y\}\) is BVN, we know that \(\{Z, W\}\) is also BVN. Is Nick correct?

Hint: the maximum of two Normal distributions is not Normal.



Analytical Solution:

Nick is not correct. Even though \(\{Z, W\}\) contains the same values as \(\{X, Y\}\), it is not the same random vector: the coordinates may be swapped (in fact, they will be swapped half of the time, by symmetry). Consider the linear combination \(0\cdot Z + W = W\) of this vector; by the hint, \(W = \max(X, Y)\) is not Normal, so \(\{Z, W\}\) cannot be BVN (see the sketch below).
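
A small sketch (not part of the original solution) backing up the hint: the maximum of two i.i.d. standard Normals has mean \(1/\sqrt{\pi} \approx 0.56\), not 0, and its QQ-plot bends away from a straight line.

#replicate
set.seed(110)
sims = 1000

#generate the r.v.'s and take the max
X = rnorm(sims)
Y = rnorm(sims)
W = pmax(X, Y)

#the mean should be near 1/sqrt(pi) = .56, not 0
mean(W)

#the QQ-plot should bend away from a straight line
qqnorm(W, main = "QQ-plot of W = max(X, Y)")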




7.9

Let \(X,Y\) be i.i.d. \(N(0, 1)\). Find \(E\big((X + Y)^2\big)\) using algebraic expansion and linearity of expectation.



Analytical Solution:

We can expand the square:

\[E\big((X + Y)^2\big) = E(X^2 + 2XY + Y^2)\]

By linearity:

\[= E(X^2) + 2E(XY) + E(Y^2)\]

We know that \(E(X^2) = Var(X) + (E(X))^2 = 1\), and then \(E(Y^2) = 1\) by symmetry. Since \(X\) and \(Y\) are independent, we know \(E(XY) = E(X)E(Y) = 0\), so the middle term drops and we are left with:

\[1 + 1 = 2\]

Which matches what we saw earlier.


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#generate the r.v.'s
X = rnorm(sims)
Y = rnorm(sims)

#should get 2
mean((X + Y)^2)
## [1] 2.040195




7.10

Nick argues that \(U^k \sim Unif(0, 1)\), where \(U \sim Unif(0, 1)\) and \(k\) is a known integer. He argues that each point in \(U\), raised to \(k\), maps to a point in the interval 0 to 1. Use the transformation theorem to adjudicate his claim, and verify that the PDF that you find is a valid PDF.



Analytical Solution:

Let \(X = U^k\), which implies \(X^{1/k} = U\). By the transformation theorem:

\[f(x) = f(u) |\frac{du}{dx}|\]

We know that \(f(u) = 1\), and the derivative of \(U\) in terms of \(X\) is \(\frac{1}{k} x^{\frac{1}{k} - 1}\), so we get:

\[f(x) = \frac{1}{k} x^{\frac{1}{k} - 1}\]

This PDF changes with \(x\), so \(X\) is not Uniform. We can verify that it is a valid PDF by integrating over 0 to 1 (since 0 to 1 is the support of \(U\), it must be the support of \(X\)).

\[\int_0^1 \frac{1}{k} x^{\frac{1}{k} - 1} dx = \frac{1}{k} \int_0^1 x^{\frac{1}{k} - 1} dx\]

\[ = \frac{k}{k} x^{\frac{1}{k}} \Big|_0^1 = x^{\frac{1}{k}} \Big|_0^1 = 1\]

The PDF integrates to 1; it is valid!


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#define a simple parameter
k = 5

#generate the r.v.
X = runif(sims)^k

#should not be Uniform
hist(X, main = "U^5", col = rgb(0, 1, 0, 1/4),
     xlab = "x")

#check that the PDF integrates to 1
pdf <- function(x){
  return((1/k)*x^(1/k - 1))
}

#should get 1
integrate(pdf, 0, 1)
## 1 with absolute error < 2.4e-14




7.11

Let \(U \sim Unif(0, 1)\) and \(c\) be a constant. Let \(X = cU\).

  1. Find the distribution of \(X\) by finding the PDF \(f(x)\).

  2. Provide some intuition for the result in part a.



Analytical Solution:

By the transformation theorem:

\[f(x) = f(u) |\frac{du}{dx}|\]

We know that \(f(u) = 1\), so this term drops out. We know that \(X/c = U\), so the derivative of \(U\) in terms of \(X\) is \(1/c\). Plugging in:

\[f(x) = 1/c\]

Since \(f(x)\) does not change with \(x\), it is Uniform. The PDF of a Uniform is the reciprocal of the length of its support, so a constant density of \(1/c\) means the support of \(X\) has length \(c\). If we consider the endpoints of \(U\), 0 and 1, we see that the endpoints of \(X\) are \(0 \cdot c = 0\) and \(1 \cdot c = c\). Therefore, \(X \sim Unif(0, c)\).

  2. This is intuitive, because each point in the interval \((0,1)\) maps to a point in the interval \((0,c)\) when we multiply it by the scalar \(c\). We are essentially scaling the Uniform distribution up.


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#define a simple parameter
c = 3

#generate the r.v.'s
U = runif(sims)
X = c*U

#should be Unif(0, 3)
hist(X, main = "X", col = rgb(0,1,0,1/4), xlab = "")




7.12

Let \(U_1\) and \(U_2\) be i.i.d. \(Unif(0, 1)\) (that is, Standard Uniform). Nick says that if \(X = U_1 \cdot U_2\), then \(X \sim Unif(0, 1)\), since the support of \(X\) is any value in the interval \((0, 1)\) and multiplying two Standard Uniforms results in a Standard Uniform. Challenge his claim.

Hint: Find \(Var(X)\) and compare it to the variance of a Standard Uniform.

  2. Provide intuition about the variance you found in part a. and how it compares to the variance of a Standard Uniform.

Analytical Solution:

We start by writing the definition of Variance in terms of Expectation:

\[Var(X) = E(X^2) - E(X)^2\] \[= E(U_1^2 U_2^2) - (E(U_1 U_2))^2\]

We know that \(U_1\) and \(U_2\) are independent, so \(E(U_1 U_2) = E(U_1)E(U_2)\). Likewise, since \(U_1\) and \(U_2\) are independent, we know that \(U_1^2\) and \(U_2^2\) are independent; squaring the random variables separately does not add any level of knowledge! So, we have \(E(U_1^2 U_2^2) = E(U_1^2)E(U_2^2)\). We get:

\[= E(U_1^2)E(U_2^2) - \big(E(U_1)E(U_2)\big)^2\]

We know that the mean of a Standard Uniform is 1/2, and…

\[E(U_1^2) = E(U_2^2) = Var(U_2) + (E(U_2))^2 = 1/12 + 1/2^2 = 1/3\]

Putting it all together:

\[Var(X) = (1/3)^2 - (1/4)^2 = 1/9 - 1/16 = 7/144 \approx 0.049\]

This is less than 1/12, the correct variance for a Standard Uniform, so \(X\) cannot be Standard Uniform!

  2. We found that \(Var(X) < 1/12\), the variance of a Standard Uniform. This makes sense: for \(X\) to take on an extremely large value, we need both \(U_1\) and \(U_2\) to take on extremely large values (i.e., both have to be large for \(X\) to be large). That is, it is harder for \(X\) to take on values close to 1 than \(U_1\) marginally. It’s intuitive, then, that the variance of \(X\) is smaller (it is near 1 less often).


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#generate the r.v.'s
U1 = runif(sims)
U2 = runif(sims)
X = U1*U2

#should get 1/9 - 1/16 = .048
var(X)
## [1] 0.04587414
#plotting shows that X clumps around its mean
hist(X, main = "X", col = rgb(1, 0, 0, 1/4), xlab = "")




7.13

Let \(c\) be a constant and \(X\) be a random variable. For the following distributions of \(X\), see if \(Y = cX\) has the same distribution as \(X\) (not the same parameters, but the same distribution).

  1. \(X \sim Expo(\lambda)\)
  2. \(X \sim Bin(n, p)\)
  3. \(X \sim Pois(\lambda)\)



Analytical Solution:

  1. We can find the PDF of \(Y\) via the transformation theorem:

\[f(y) = f(x) | \frac{dx}{dy} |\]

We know the PDF of \(X\), and we know that \(X = Y/c\), so the derivative of \(X\) in terms of \(Y\) is \(1/c\). Putting it together:

\[f(y) = \frac{\lambda}{c}e^{-\frac{\lambda}{c}y}\]

This is indeed the PDF of an Exponential; specifically, we see that \(Y \sim Expo(\frac{\lambda}{c})\). This agrees with results we have seen from earlier parts of the book (i.e., \(\lambda X \sim Expo(1)\)).

  2. We could find the PDF of \(Y\) if we wanted, or we could just realize that the support of \(Y\) will just be multiples of \(c\) (i.e., if \(c = 2\), the support of \(X\) is \(0,1,2,...,n\) so the support of \(Y\) is \(0, 2, 4, ...,2n\)). This is not the support of a Binomial (must be \(0,1,2,...,n\)) so \(Y\) cannot be Binomial.

  3. This is not Poisson for similar reasons as we found in the last part: the support of \(Y\) is \(0, c, 2c,...\) instead of \(0,1,2,...\).


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#define simple parameters
lambda = 5
n = 10
p = 1/2
c = 2

#part a.
#generate r.v.'s
X = rexp(sims, lambda)
Y = c*X

#show that Y ~ Expo(lambda/c)
#1x2 graph grid
par(mfrow = c(1,2))

#graphics
hist(Y, main = "Y", col = rgb(0,0,1,1/4), xlab = "")
hist(rexp(sims, lambda/c), main = "Expo(lambda/c)", col = rgb(0,1,0,1/4), xlab = "")

#re-set graphics
par(mfrow = c(1,1))


#part b.
#generate r.v.'s
X = rbinom(sims, n, p)
Y = c*X

#the support of Y is not 0,1,2,...
table(Y)
## Y
##   0   2   4   6   8  10  12  14  16  18 
##   1   6  49 116 186 273 207 114  39   9
#part c.
#generate r.v.'s
X = rpois(sims, lambda)
Y = c*X

#the support of Y is not 0,1,2,...
table(Y)
## Y
##   0   2   4   6   8  10  12  14  16  18  20  22  24  26 
##   9  31  76 121 150 179 178 110  73  39  19   7   6   2




7.14
  1. Let \(X\) be a random variable and let \(c\) be a constant and let \(Y = cX\). Show that \(Corr(X, Y) = 1\).

  2. Let \(X\) be a random variable and let \(c\) be a constant and let \(Y = X + c\). Show that \(Corr(X, Y) = 1\).



Analytical Solution:

  1. Using the properties of Correlation and Covariance, where \(\sigma_x^2 = Var(X)\) etc., we write:

\[Corr(X, Y) = \frac{Cov(X, Y)}{\sigma_x \sigma_y}\]

We know that \(Var(Y) = Var(cX) = c^2\sigma_x^2\), so we write (taking \(c > 0\); if \(c < 0\), the same steps give a Correlation of \(-1\)):

\[= \frac{Cov(X, cX)}{\sigma_x \sqrt{c^2 \sigma_x^2}}\]

By the properties of Covariance, we can factor out the \(c\) in the numerator:

\[= \frac{cCov(X, X)}{c\sigma_x^2}\]

And since \(Cov(X, X) = Var(X)\):

\[\frac{c\sigma_x^2}{c\sigma_x^2} = 1\]

  2. Writing out the definition of Correlation:

\[Corr(X, Y) = \frac{Cov(X, X + c)}{\sigma_x\sigma_y}\]

We know that \(Var(Y) = Var(X + c) = Var(X)\). We can also expand the Covariance in the numerator:

\[=\frac{Cov(X, X) + Cov(X, c)}{\sigma_x^2}\]

The covariance of a random variable with a constant is 0, and \(Cov(X, X) = Var(X)\), so we get:

\[= \frac{\sigma_x^2}{\sigma_x^2} = 1\]


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#define a simple parameter
c = 3

#for simplicity, use standard normal random variables
#part a.
X = rnorm(sims)
Y = c*X

#should get 1
cor(X, Y)
## [1] 1
#part b.
X = rnorm(sims)
Y = X + c

#should get 1
cor(X, Y)
## [1] 1




BH Problems



The problems in this section are taken from @BH. The questions are reproduced here, and the analytical solutions are freely available online. Here, we will only consider empirical solutions: answers/approximations to these problems using simulations in R.




BH 7.38

Let \(X\) and \(Y\) be r.v.s. Is it correct to say “\(\max(X,Y) + \min(X,Y) = X+Y\)”? Is it correct to say “\(Cov( \max(X,Y), \min(X,Y)) = Cov(X,Y)\) since either the max is \(X\) and the min is \(Y\) or vice versa, and Covariance is symmetric”? Explain.

#replicate
set.seed(110)
sims = 1000

#generate standard normals with rho = 1/2 (Correlation) 
data = rmvnorm(sims, mean = c(0, 0), 
               sigma = matrix(c(1, 1/2, 1/2, 1), nrow = 2, ncol = 2))
X = data[, 1]
Y = data[, 2]

#keep track of the mins and maxes
mins = rep(NA, sims)
maxes = rep(NA, sims)

for(i in 1:sims){
  
  #calculate the mins and maxes
  mins[i] = min(X[i], Y[i])
  maxes[i] = max(X[i], Y[i])
}

#these should be equal (the difference is always 0)
table((maxes + mins) - (X + Y))
## 
##    0 
## 1000
#these should not match
cov(mins, maxes); cov(X, Y)
## [1] 0.6938947
## [1] 0.5384937




BH 7.39

Two fair six-sided dice are rolled (one green and one orange), with outcomes \(X\) and \(Y\) respectively for the green and the orange.

  1. Compute the Covariance of \(X+Y\) and \(X-Y\).
#replicate
set.seed(110)
sims = 1000

#generate the r.v.'s
X = sample(1:6, sims, replace = TRUE)
Y = sample(1:6, sims, replace = TRUE)

#calculate the Sum and the Difference
S = X + Y
D = X - Y

#should get 0
cov(S, D)
## [1] -0.04585686


  2. Are \(X+Y\) and \(X-Y\) independent?
#conditioning changes the distribution of D: given S = 2, both dice show 1, so D must be 0. They are not independent!
mean(D)
## [1] -0.149
mean(D[S == 2])
## [1] 0




BH 7.41

Let \(X\) and \(Y\) be standardized r.v.s (i.e., marginally they each have mean 0 and variance 1) with Correlation \(\rho \in (-1,1)\). Find \(a,b,c,d\) (in terms of \(\rho\)) such that \(Z=aX + bY\) and \(W = cX + dY\) are uncorrelated but still standardized.

#replicate
set.seed(110)
sims = 1000

#define a simple parameter
rho = 1/2

#generate X and Y
data = rmvnorm(sims, mean = c(0, 0), 
               sigma = matrix(c(1, rho, rho, 1), nrow = 2, ncol = 2))
X = data[, 1]
Y = data[, 2]


#define the constants
a = 1
b = 0
c = -rho/(sqrt(1 - rho^2))
d = 1/(sqrt(1 - rho^2))

#generate Z and W
Z = a*X + b*Y
W = c*X + d*Y

#show that Z and W are still standardized
mean(Z); var(Z)
## [1] -0.07388428
## [1] 1.041435
mean(W); var(W)
## [1] -0.01456604
## [1] 0.9601092
#show that Z, W are still uncorrelated (should get 0)
cor(Z, W)
## [1] 0.02052713




BH 7.42

Let \(X\) be the number of distinct birthdays in a group of 110 people (i.e., the number of days in a year such that at least one person in the group has that birthday). Under the usual assumptions (no February 29, all the other 365 days of the year are equally likely, and the day when one person is born is independent of the days when the other people are born), find the mean and variance of \(X\).

#replicate
set.seed(110)
sims = 1000

#set a path for X
X = rep(NA, sims)

#run the loop
for(i in 1:sims){
  
  #sample birthdays
  bdays = sample(1:365, 110, replace = TRUE)
  
  #count how many unique birthdays we have
  X[i] = length(unique(bdays))
}

#should get 95.803
mean(X)
## [1] 95.149
#should get 10.019
var(X)
## [1] 10.26106




BH 7.47

Athletes compete one at a time at the high jump. Let \(X_j\) be how high the \(j\)th jumper jumped, with \(X_1,X_2,\dots\) i.i.d. with a continuous distribution. We say that the \(j\)th jumper sets a record if \(X_j\) is greater than all of \(X_{j-1},\dots,X_1\).

Find the variance of the number of records among the first \(n\) jumpers (as a sum). What happens to the variance as \(n \to \infty\)?

#replicate
set.seed(110)
sims = 1000

#define a simple parameter
n = 10

#keep track of the number of records
records = rep(0, sims)

#run the loop
for(i in 1:sims){
  
  #initialize the jumper scores
  scores = integer(0)
  
  #iterate over each jumper
  for(j in 1:n){
    
    #draw a new jump value; for simplicity, use a normal
    scores = c(scores, rnorm(1, 10, 1))
    
    #see if we have a record
    if(scores[j] == max(scores[1:j])){
      
      #if we have a new record, mark it down
      records[i] = records[i] + 1
    }
  }
}

#compare to the analytical result; they should match
j = 1:n
sum(1/j - 1/j^2)
## [1] 1.379201
var(records)
## [1] 1.385786
#now let n go to infinity
#define a sequence for n
n = seq(from = 10, to = 100, length.out = 10)

#keep track of the variances
vars = rep(0, length(n))

#iterate over n
for(k in 1:length(n)){
  
  #keep track of the number of records
  records = rep(0, sims)
  
  #run the loop
  for(i in 1:sims){
    
    #initialize the jumper scores
    scores = integer(0)
    
    #iterate over each jumper
    for(j in 1:n[k]){
      
      #draw a new jump value; for simplicity, use a normal
      scores = c(scores, rnorm(1, 10, 1))
      
      #see if we have a record
      if(scores[j] == max(scores[1:j])){
        
        #if we have a new record, mark it down
        records[i] = records[i] + 1
      }
    }
  }
  
  #mark the variance
  vars[k] = var(records)
}

#the variance keeps growing (like the harmonic sum), so it goes to infinity as n grows
plot(n, vars, ylab = "Variance of the number of records",
     main = "Variance of # of Records for different n",
     pch = 16)




BH 7.48

A chicken lays a \(Pois(\lambda)\) number \(N\) of eggs. Each egg hatches a chick with probability \(p\), independently. Let \(X\) be the number which hatch, so \(X|N=n \sim Bin(n,p)\).

Find the Correlation between \(N\) (the number of eggs) and \(X\) (the number of eggs which hatch). Simplify; your final answer should work out to a simple function of \(p\) (the \(\lambda\) should cancel out).

#replicate
set.seed(110)
sims = 1000

#define simple parameters
lambda = 10
p = 1/2

#generate the r.v.'s
N = rpois(sims, lambda)

#iterate over the number of eggs and draw Binomials for each
X = sapply(N, function(x) sum(rbinom(1, x, p)))

#should get sqrt(p) = .707
cor(N, X)
## [1] 0.6990949




BH 7.52

A drunken man wanders around randomly in a large space. At each step, he moves one unit of distance North, South, East, or West, with equal probabilities. Choose coordinates such that his initial position is \((0,0)\) and if he is at \((x,y)\) at some time, then one step later he is at \((x,y+1), (x,y-1),(x+1,y)\), or \((x-1,y)\). Let \((X_n,Y_n)\) and \(R_n\) be his position and distance from the origin after \(n\) steps, respectively.

  1. Determine whether or not \(X_n\) is independent of \(Y_n\).
#replicate
set.seed(110)

#extra number of sims; rare events
sims = 10000

#define a simple value for n
n = 10

#keep track of Xn, Yn and Rn
Xn = rep(NA, sims)
Yn = rep(NA, sims)
Rn = rep(NA, sims)


#run the loop
for(i in 1:sims){
  
  #set paths for the steps X and Y
  X = rep(0, n)
  Y = rep(0, n)
  
  #each step, we decrease x by 1 with probability .25, increase x by 1 
  #   with probability .25, decrease y by 1 with probability .25 and
  #   increase y by 1 with probability .25.
  #   Let's recreate this!
  draws = runif(n)
  
  #25% of the time, increase X by 1, 25% of the time decrease X by 1, etc.
  X[draws <= .25] = -1
  X[.25 < draws & draws <= .5] = 1
  Y[.5 < draws & draws <= .75] = -1
  Y[.75 < draws] = 1
  
  #find Xn, Yn and Rn
  Xn[i] = sum(X)
  Yn[i] = sum(Y)
  Rn[i] = sqrt(Xn[i]^2 + Yn[i]^2) 
}

#the conditional distribution of Yn changes with Xn (with n = 10 total 
#   steps, Yn must have the same parity as Xn), so they are not independent
mean(Yn[Xn == 6])
## [1] 0.5454545
mean(Yn[Xn == 5])
## [1] -0.06293706


  2. Find \(Cov(X_n,Y_n)\).
#should get 0
cov(Xn, Yn)
## [1] 0.0964788


  3. Find \(E(R_n^2)\).
#should get n = 10
mean(Rn^2)
## [1] 9.9966




BH 7.53

A scientist makes two measurements, considered to be independent standard Normal r.v.s. Find the Correlation between the larger and smaller of the values.

#replicate
set.seed(110)
sims = 1000

#generate the r.v.'s
X = rnorm(sims)
Y = rnorm(sims)

#keep track of the min and max
M = rep(NA, sims)
L = rep(NA, sims)

#calculate the min and max
for(i in 1:sims){
  
  #calculate the mins and maxes
  M[i] = max(X[i], Y[i])
  L[i] = min(X[i], Y[i])
}

#should get 1/(pi - 1) = .466
cor(M, L)
## [1] 0.4759846




BH 7.55

Consider the following method for creating a bivariate Poisson (a joint distribution for two r.v.s such that both marginals are Poissons). Let \(X=V+W, Y = V+Z\) where \(V,W,Z\) are i.i.d. Pois(\(\lambda\)) (the idea is to have something borrowed and something new but not something old or something blue).

  1. Find \(Cov(X,Y)\).
#replicate
set.seed(110)
sims = 1000

#define a simple parameter
lambda = 1

#generate the r.v.'s
V = rpois(sims, lambda)
W = rpois(sims, lambda)
Z = rpois(sims, lambda)
X = V + W
Y = V + Z

#should get lambda = 1
cov(X,Y)
## [1] 0.8665666


  2. Are \(X\) and \(Y\) independent? Are they conditionally independent given \(V\)?
#not independent; X and Y have the same marginal mean, but conditioning 
#   on X > 5 raises the mean of Y (they share V)
mean(X)
## [1] 1.94
mean(Y[X > 5])
## [1] 3.666667
#conditionally independent given V, since we have 
#   transformations of W and Z, which are independent!
mean(W)
## [1] 1.008
mean(W[Z > 3])
## [1] 1


  3. Find the joint PMF of \(X,Y\) (as a sum).
#group the data
data <- data.frame(X, Y)

#generate a heatmap; should only have 3 values
#be careful about interpreting here; look at the center of each square
group_by(data,X,Y) %>% summarize(realizations=n()) %>% 
  ggplot(aes(X, Y,fill=realizations/sims)) + 
  geom_tile(color = "white") +
  scale_fill_gradient(low = "white", high= "red", name = "Density") + 
  ggtitle("Empirical Density of X and Y") + 
  theme(plot.margin = unit(c(1.8,.5,1.75,1.55), "cm")) +
  theme(plot.title = element_text(family = "Trebuchet MS", 
            color="#383838", face="bold", size=25, vjust = 2)) + 
  theme(axis.title.x = element_text(family = "Trebuchet MS", 
            color="#383838", face="bold", size=22, vjust = -1.5),
        axis.title.y = element_text(family = "Trebuchet MS", 
            color="#383838", face="bold", size=22, vjust = 2)) +
  theme(axis.text.x = element_text(face="bold", color="#383838", 
                           size=14),
          axis.text.y = element_text(face="bold", color="#383838", 
                           size=14))

#analytical Joint Distribution

#define the supports/all combinations
X.a = 0:max(X)
Y.a = 0:max(Y)
data = expand.grid(X.a = X.a, Y.a = Y.a)

#calculate density
data$density = apply(data, 1, function(x){
  c = exp(-3*lambda)*lambda^(x[1] + x[2])
  v = 0:(min(x[1], x[2]))
  d = lambda^(-v)/(factorial(x[1] - v)*factorial(x[2] - v)*factorial(v))
  
  return(c*sum(d))
})

#remove points with 0 density
data = data[data$density != 0, ]

#generate a heatmap
ggplot(data = data, aes(X.a, Y.a)) + 
  geom_tile(aes(fill = density), color = "white") +
  scale_fill_gradient(low = "white", high= "red", name = "Density") + 
  ggtitle("Analytical Density of X and Y") + 
  theme(plot.margin = unit(c(1.8,.5,1.75,1.55), "cm")) +
  theme(plot.title = element_text(family = "Trebuchet MS", 
            color="#383838", face="bold", size=25, vjust = 2)) + 
  theme(axis.title.x = element_text(family = "Trebuchet MS", 
            color="#383838", face="bold", size=22, vjust = -1.5),
        axis.title.y = element_text(family = "Trebuchet MS", 
            color="#383838", face="bold", size=22, vjust = 2)) +
  theme(axis.text.x = element_text(face="bold", color="#383838", 
                           size=14),
          axis.text.y = element_text(face="bold", color="#383838", 
                           size=14)) +
  labs(x = "X = # of Women Selected", y = "Y = # of Men Selected")




BH 7.71

Let \((X,Y)\) be Bivariate Normal, with \(X\) and \(Y\) marginally \(N(0,1)\) and with correlation \(\rho\) between \(X\) and \(Y\).

  1. Show that \((X+Y,X-Y)\) is also Bivariate Normal.

  2. Find the joint PDF of \(X+Y\) and \(X-Y\) (without using calculus), assuming \(-1 < \rho < 1\).

#replicate
set.seed(110)

#increased number of sims so we can see better
sims = 1000*10

#define a simple parameter
rho = 1/2

#generate the r.v.'s
data = rmvnorm(sims, mean = c(0, 0), 
        sigma = matrix(c(1, rho, rho, 1), nrow = 2, ncol = 2))
W = data[, 1]
Z = data[, 2]

#define X as W + Z and Y as W - Z
X = W + Z
Y = W - Z

#round the r.v.'s so we can look at them discretely
X = round(X, 1)
Y = round(Y, 1)

#Empirical density; group the data
data <- data.frame(X, Y)

#generate a heatmap
#change the color so we can see!
#we are treating this as discrete, so the density key may be different
# from the analytical version
group_by(data,X,Y) %>% summarize(realizations=n()) %>% 
  ggplot(aes(X, Y,fill=realizations/sims)) + geom_tile(color = "black") +
  scale_fill_gradient(low = "black", high= "red", name = "Density") + 
  ggtitle("Empirical Density of X and Y") + 
  theme(plot.margin = unit(c(1.8,.5,1.75,1.55), "cm")) +
  theme(plot.title = element_text(family = "Trebuchet MS", 
            color="#383838", face="bold", size=25, vjust = 2)) + 
  theme(axis.title.x = element_text(family = "Trebuchet MS", 
            color="#383838", face="bold", size=22, vjust = -1.5),
        axis.title.y = element_text(family = "Trebuchet MS", 
            color="#383838", face="bold", size=22, vjust = 2)) +
  theme(axis.text.x = element_text(face="bold", color="#383838", 
                           size=14),
          axis.text.y = element_text(face="bold", color="#383838", 
                           size=14))

#analytical Joint Distribution

#define the supports/all combinations
X.a = seq(from = min(X), to = max(X), by = .1)
Y.a = seq(from = min(Y), to = max(Y), by = .1)
data = expand.grid(X.a = X.a, Y.a = Y.a)

#calculate density
data$density = apply(data, 1, function(x) 
  (1/(4*pi*sqrt(1 - rho^2)))*exp(-.25*(x[1]^2/(1 + rho) + x[2]^2/(1 - rho))))

#remove points with 0 density
data = data[data$density != 0, ]


#generate a heatmap
#make it black so we can see!
ggplot(data = data, aes(X.a, Y.a)) + 
  geom_tile(aes(fill = density), col = "white") +
  scale_fill_gradient(low = "black", high= "red", name = "Density") + 
  ggtitle("Analytical Density of X and Y") + 
  theme(plot.margin = unit(c(1.8,.5,1.75,1.55), "cm")) +
  theme(plot.title = element_text(family = "Trebuchet MS", 
            color="#383838", face="bold", size=25, vjust = 2)) + 
  theme(axis.title.x = element_text(family = "Trebuchet MS", 
            color="#383838", face="bold", size=22, vjust = -1.5),
        axis.title.y = element_text(family = "Trebuchet MS", 
            color="#383838", face="bold", size=22, vjust = 2)) +
  theme(axis.text.x = element_text(face="bold", color="#383838", 
                           size=14),
          axis.text.y = element_text(face="bold", color="#383838", 
                           size=14))




BH 7.84

A network consists of \(n\) nodes, each pair of which may or may not have an edge joining them. For example, a social network can be modeled as a group of \(n\) nodes (representing people), where an edge between \(i\) and \(j\) means they know each other. Assume the network is undirected and does not have edges from a node to itself (for a social network, this says that if \(i\) knows \(j\), then \(j\) knows \(i\) and that, contrary to Socrates’ advice, a person does not know himself or herself). A clique of size \(k\) is a set of \(k\) nodes where every node has an edge to every other node (i.e., within the clique, everyone knows everyone). An anticlique of size \(k\) is a set of \(k\) nodes where there are no edges between them (i.e., within the anticlique, no one knows anyone else).

  1. Form a random network with \(n\) nodes by independently flipping fair coins to decide for each pair \(\{x,y\}\) whether there is an edge joining them. Find the expected number of cliques of size \(k\) (in terms of \(n\) and \(k\)).
#replicate
set.seed(110)
sims = 1000

#define simple (small) parameters
n = 5
k = 3

#generate all of the potential cliques
pot.cliques = combn(n, k)

#generate the potential edges in a clique of size k
edges = combn(k, 2)
  
#keep track of the number of cliques
cliques = rep(0, sims)

#run the loop
for(i in 1:sims){
  
  #generate the edge matrix (keeps track of edges)
  E = matrix(rbinom(n*n, 1, 1/2), nrow = n, ncol = n)
  
  #fold the matrix over itself
  for(j in 1:n){
    for(l in 1:n){
      E[l, j] = E[j, l]
    }
  }
  
  #get rid of self-referring edges
  diag(E) = 0
  
  #iterate through each potential clique
  for(j in 1:dim(pot.cliques)[2]){
    
    #initialize a marker that tells is if we have no clique
    no.clique = 0
    
    #check all of the edges
    for(l in 1:dim(edges)[2]){
      
      #see if there is no edge here; if not, set no.clique to 1
      if(E[pot.cliques[edges[1, l], j], pot.cliques[edges[2, l], j]] == 0){
        no.clique = 1
      }
    }
    
    #if this is not a clique, go to the next pass
    if(no.clique == 1){
      next
    }
    
    #if it is a clique, increment
    cliques[i] = cliques[i] + 1
  }
}

#should get choose(n, k)/(2^(choose(k, 2))) = 1.25
mean(cliques)
## [1] 1.262


  2. A triangle is a clique of size 3. For a random network as in (a), find the variance of the number of triangles (in terms of \(n\)).
#we used k = 3 in part a, so find the variance
#should get (7/64)*choose(n, 3) + (3/16)*choose(n, 4) = 2.03
var(cliques)
## [1] 2.043399
  3. Suppose that \({n \choose k} < 2^{{k \choose 2} -1}\). Show that there is a network with \(n\) nodes containing no cliques of size \(k\) or anticliques of size \(k\).
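
Part c. calls for a probabilistic-method argument rather than a simulation, but a small sketch (not from the original solution) can illustrate it. For \(n = 6\) and \(k = 4\) we have \({6 \choose 4} = 15 < 2^{{4 \choose 2} - 1} = 32\), so the expected number of size-4 cliques plus anticliques in a random network is below 1; a random network is therefore good with positive probability, and a short random search finds one.

#replicate
set.seed(110)

#pick n and k with choose(n, k) < 2^(choose(k, 2) - 1): here 15 < 32
n = 6
k = 4

#all vertex sets of size k, and all pairs within such a set
sets = combn(n, k)
pairs = combn(k, 2)

#count the cliques and anticliques of size k in an edge matrix E
count.bad <- function(E){
  
  bad = 0
  
  #iterate over the candidate vertex sets
  for(j in 1:ncol(sets)){
    
    #collect the edge indicators within this vertex set
    vals = rep(NA, ncol(pairs))
    for(l in 1:ncol(pairs)){
      vals[l] = E[sets[pairs[1, l], j], sets[pairs[2, l], j]]
    }
    
    #all 1's is a clique, all 0's is an anticlique
    if(all(vals == 1) || all(vals == 0)){
      bad = bad + 1
    }
  }
  return(bad)
}

#search random networks until one has no cliques or anticliques of size k
repeat{
  
  #generate a random symmetric edge matrix with no self-edges
  E = matrix(rbinom(n*n, 1, 1/2), nrow = n, ncol = n)
  E[lower.tri(E)] = t(E)[lower.tri(E)]
  diag(E) = 0
  
  #stop once we find a good network
  if(count.bad(E) == 0){
    break
  }
}

#a network with no cliques or anticliques of size 4, as part c. guarantees
E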




BH 7.85

Shakespeare wrote a total of 884647 words in his known works. Of course, many words are used more than once, and the number of distinct words in Shakespeare’s known writings is 31534 (according to one computation). This puts a lower bound on the size of Shakespeare’s vocabulary, but it is likely that Shakespeare knew words which he did not use in these known writings.

More specifically, suppose that a new poem of Shakespeare were uncovered, and consider the following (seemingly impossible) problem: give a good prediction of the number of words in the new poem that do not appear anywhere in Shakespeare’s previously known works.

Ronald Thisted and Bradley Efron studied this problem in the papers 9 and 10, developing theory and methods and then applying the methods to try to determine whether Shakespeare was the author of a poem discovered by a Shakespearean scholar in 1985. A simplified version of their method is developed in the problem below. The method was originally invented by Alan Turing (the founder of computer science) and I.J. Good as part of the effort to break the German Enigma code during World War II.

Let \(N\) be the number of distinct words that Shakespeare knew, and assume these words are numbered from 1 to \(N\). Suppose for simplicity that Shakespeare wrote only two plays, \(A\) and \(B\). The plays are reasonably long and they are of the same length. Let \(X_j\) be the number of times that word \(j\) appears in play \(A\), and \(Y_j\) be the number of times it appears in play \(B\), for \(1 \leq j \leq N\).

  1. Explain why it is reasonable to model \(X_j\) as being Poisson, and \(Y_j\) as being Poisson with the same parameter as \(X_j\).

  2. Let the numbers of occurrences of the word “eyeball” (which was coined by Shakespeare) in the two plays be independent Pois(\(\lambda\)) r.v.s. Show that the probability that “eyeball” is used in play \(B\) but not in play \(A\) is \[e^{-\lambda}(\lambda - \lambda^2/2! + \lambda^3/3! - \lambda^4/4! + \dots).\]

#replicate
set.seed(110)
sims = 1000

#define a simple, reasonable parameter
lambda = 1

#generate the r.v.'s
X = rpois(sims, lambda)
Y = rpois(sims, lambda)

#probabilities should match
length(X[X > 0 & Y == 0])/sims
## [1] 0.226
exp(-lambda)*(lambda - lambda^2/2 + lambda^3/factorial(3) - lambda^4/factorial(4))
## [1] 0.2299247


  3. Now assume that \(\lambda\) from (b) is unknown and is itself taken to be a random variable to reflect this uncertainty. So let \(\lambda\) have a PDF \(f_0\). Let \(X\) be the number of times the word “eyeball” appears in play \(A\) and \(Y\) be the corresponding value for play \(B\). Assume that the conditional distribution of \(X,Y\) given \(\lambda\) is that they are independent \(Pois(\lambda)\) r.v.s. Show that the probability that “eyeball” is used in play \(B\) but not in play \(A\) is the alternating series \[P(X=1) - P(X=2) + P(X=3) - P(X=4) + \dots.\]

  4. Assume that every word’s numbers of occurrences in \(A\) and \(B\) are distributed as in (c), where \(\lambda\) may be different for different words but \(f_0\) is fixed. Let \(W_j\) be the number of words that appear exactly \(j\) times in play \(A\). Show that the expected number of distinct words appearing in play \(B\) but not in play \(A\) is \[E(W_1) - E(W_2) + E(W_3) - E(W_4) + \dots.\]
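For the alternating-series result in part (c), we can run a quick simulation once a prior for \(\lambda\) is chosen; the choice \(f_0 = Expo(1)\) below is only an illustrative assumption. Both quantities estimate the probability that the word appears in play \(B\) but not in play \(A\), so they should be close.

#replicate
set.seed(110)
sims = 1000

#draw lambda from the assumed prior, then X and Y given lambda
lambda = rexp(sims, 1)
X = rpois(sims, lambda)
Y = rpois(sims, lambda)

#probability that the word is used in play B but not in play A
mean(X == 0 & Y > 0)

#alternating series P(X = 1) - P(X = 2) + ..., truncated at k = 10
k = 1:10
sum((-1)^(k + 1)*sapply(k, function(j) mean(X == j)))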




BH 8.4

Find the PDF of \(Z^4\) for \(Z \sim \mathcal{N}(0,1)\).

#replicate
set.seed(110)
sims = 1000

#generate the r.v.
Z = rnorm(sims)

#transform
X = Z^4


#calculate analytical PDF
x = seq(from = min(X), to = max(X), length.out = 100)
PDF = (1/(2*sqrt(2*pi)))*exp(-(1/2)*x^(1/2))*x^(-(3/4))

#plot the PDFs, they should match
plot(density(X), main = "Density of X", xlab = "X", col = "black", lwd = 6)
lines(x, PDF, lwd = 3, type = "h", col = "red")


legend("topright", legend = c("Empirical PDF", "Analytical PDF"),
       lty=c(1,1), lwd=c(2.5,2.5),
       col=c("black", "red"))




BH 8.6

Let \(U \sim Unif(0,1)\). Find the PDFs of \(U^2\) and \(\sqrt{U}\).

#replicate
set.seed(110)
sims = 1000

#draw the r.v.'s
U = runif(sims)
X = U^2
Y = sqrt(U)

#should match a Beta(1/2, 1)
hist(X, freq = FALSE, 
     main = "Standard Uniform Squared", 
     col = "gray", xlab = "x")

hist(rbeta(sims, 1/2, 1), 
     main = "Beta(1/2, 1)",
     col = "gray", freq = FALSE, xlab = "x")

#should match a Beta(2, 1)
hist(Y, freq = FALSE, 
     main = "Standard Uniform Squared", 
     col = "gray", xlab = "y")

hist(rbeta(sims, 2, 1), 
     main = "Beta(1/2, 1)",
     col = "gray", freq = FALSE, xlab = "y")






Beta and Gamma




8.1

Let \(Z \sim Beta(a, b)\).

  1. Find \(E(\frac{1}{Z})\) using pattern integration.

  2. Find \(Var(\frac{1}{Z})\) using pattern integration.



Analytical Solution:

  1. First, we’ll use LOTUS:

\[E(\frac{1}{Z}) = \int_{0}^1 \frac{\Gamma(a + b)}{\Gamma(a)\Gamma(b)} \frac{1}{z} z^{a - 1}(1 - z)^{b - 1}dz\] \[ = \frac{\Gamma(a + b)}{\Gamma(a)\Gamma(b)} \int_{0}^1 z^{a - 2}(1 - z)^{b - 1}dz\]

We should recognize that the terms that are a function of \(z\) in the integrand resemble the ‘meaty’ part of the PDF of a \(Beta(a - 1, b)\). We introduce the normalizing constant (multiply and divide by it):

\[ = \frac{\Gamma(a + b)}{\Gamma(a)\Gamma(b)} \frac{\Gamma(a - 1)\Gamma(b)}{\Gamma(a - 1 + b)} \int_{0}^1 \frac{\Gamma(a - 1 + b)}{\Gamma(a - 1)\Gamma(b)}z^{a - 2}(1 - z)^{b - 1}dz\]

We are now integrating a full PDF over its support, which, by definition, gives 1. We are left with:

\[E(\frac{1}{Z}) = \frac{\Gamma(a + b)}{\Gamma(a)\Gamma(b)} \frac{\Gamma(a - 1)\Gamma(b)}{\Gamma(a - 1 + b)}\]

  2. \(Var(\frac{1}{Z}) = E(\frac{1}{Z^2}) - E(\frac{1}{Z})^2\). We know the second term, and can solve \(E(\frac{1}{Z^2})\) via LoTUS.

\[E(\frac{1}{Z^2}) = \int_0^1 \frac{1}{z^2} \frac{\Gamma(a + b)}{\Gamma(a)\Gamma(b)}z^{a - 1}(1 - z)^{b - 1}dz\]

\[ = \frac{\Gamma(a + b)}{\Gamma(a)\Gamma(b)} \int_0^1 z^{a - 3}(1 - z)^{b - 1}dz\]

We complete the PDF of a \(Beta(a - 2, b)\):

\[ = \frac{\Gamma(a + b)}{\Gamma(a)\Gamma(b)} \frac{\Gamma(a - 2)\Gamma(b)}{\Gamma(a - 2 + b)} \int_{0}^1 \frac{\Gamma(a - 2 + b)}{\Gamma(a - 2)\Gamma(b)}z^{a - 3}(1 - z)^{b - 1}dz\]

The integrand now integrates to 1, and we are left with

\[E(1/Z^2) = \frac{\Gamma(a + b)}{\Gamma(a)\Gamma(b)} \frac{\Gamma(a - 2)\Gamma(b)}{\Gamma(a - 2 + b)}\]

So, we can write:

\[Var(\frac{1}{Z}) = E(\frac{1}{Z^2}) - E(\frac{1}{Z})^2\]

Where \(E(\frac{1}{Z^2})\) is solved above and \(E(\frac{1}{Z})\) is solved in part (a).


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#define simple parameters
a = 4
b = 3

#generate the r.v.
Z = rbeta(sims, a, b)

#analytical and empirical results should be close (the variance estimate is noisy, since 1/Z has heavy tails here)
mean(1/Z)
## [1] 1.924767
gamma(a + b)*gamma(a - 1)/(gamma(a)*gamma(a - 1 + b))
## [1] 2
var(1/Z)
## [1] 0.73599
(gamma(a + b)*gamma(a - 2))/(gamma(a)*gamma(a - 2 + b)) - 
((gamma(a + b)/(gamma(a)*gamma(b)))*(gamma(a - 1)*gamma(b))/gamma(a + b - 1))^2
## [1] 1




8.2
  1. Let \(U \sim Unif(0,1)\), \(B \sim Beta(1,1)\), \(E \sim Expo(10)\) and \(G \sim Gamma(1,10)\) (all are independent). Find \(E(U - B + E - G)\) without using any calculations (just facts about each distribution)

  2. For what value of \(n\) is \(\frac{ \Big(\frac{\Gamma(n+1)}{\Gamma(n)}\Big)^2}{4}\) the PDF of a Standard Uniform?



Analytical Solution:

  1. \(B\) has the same distribution as \(U\), and \(G\) has the same distribution as \(E\) (recall the story of a \(Gamma(a, \lambda)\): the sum of \(a\) independent \(Expo(\lambda)\) random variables). Therefore, the expectations will cancel out to 0.

  2. The PDF of a Standard Uniform is just 1. After simplifying: \(\frac{\Gamma(n+1)}{\Gamma(n)} = \frac{n\Gamma(n)}{\Gamma(n)} = n\). So we need \(\frac{n^2}{4}\) to be 1, and thus \(n = 2\).


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#generate the r.v.'s
U = runif(sims)
B = rbeta(sims, 1, 1)
E = rexp(sims, 10)
G = rgamma(sims, 1, 10)

#should get 0
mean(U - B + E - G)
## [1] -0.03986786
#set a 2x2 graphing grid
par(mfrow = c(2,2))

#show that the histograms match
hist(U, main = "U", xlab = "u", col = rgb(0, 1, 0, 1/4))
hist(B, main = "B", xlab = "b", col = rgb(0, 1, 0, 1/4))
hist(E, main = "E", xlab = "e", col = rgb(1, 0, 0, 1/4))
hist(G, main = "G", xlab = "g", col = rgb(1, 0, 0, 1/4))

#re-set grid
par(mfrow = c(1,1))
#part b., define n
n = 2

#should get 1
(gamma(n + 1)/gamma(n))^2/4
## [1] 1




8.3

The New England Patriots, considered by many as the greatest dynasty in modern sports, won 5 Super Bowls (NFL Championships) and played in 2 more from 2001 to 2017. The previous two coaches for the Patriots have been Pete Carroll (1997 - 1999) and Bill Belichick (2000 -). Carroll’s overall regular season record with the Patriots was 27-21 (27 wins and 21 losses), and Belichick’s current regular season record (at the time of this publication) is 201-71. Since both coached the same team in a similar time period, it may be reasonable to isolate and compare their coaching abilities based on their results.

Define \(p_{Carroll}\) as the true probability that Pete Carroll will win a game that he coaches. Consider a Bayesian approach where we assign a random distribution to this parameter: a reasonable (uninformative) distribution would be \(p_{Carroll} \sim Beta(1, 1)\). Based on Beta-Binomial conjugacy (where each game is treated as a Bernoulli trial with \(p_{Carroll}\) probability of success), the posterior distribution of \(p_{Carroll}\) after observing his regular season record is \(p_{Carroll} \sim Beta(1 + 27, 1 + 21)\).

  1. Consider \(p_{Belichick}\), the true probability that Bill Belichick will win a game that he coaches. First, if we assume that \(p_{Belichick}\) follows the same distribution as \(p_{Carroll}\), find the probability of observing Belichick’s 201-71 record, or better.

  2. By employing a similar approach to assign a distribution to \(p_{Belichick}\) as we did with \(p_{Carroll}\), compare \(E(p_{Belichick})\) and \(Var(p_{Belichick})\) with \(E(p_{Carroll})\) and \(Var(p_{Carroll})\).

  3. If we assume that \(p_{Carroll}\) follows the same distribution as \(p_{Belichick}\) that we solved for in (b), find the probability of observing Carroll’s record of 27-21, or worse. Compare this to your answer to (a).

Hint: In R, the command \(pbeta(x, alpha, beta)\) returns \(P(X \leq x)\) for \(X \sim Beta(alpha, beta)\).



Analytical Solution:

  1. If \(p_{Belichick} \sim Beta(1 + 27, 1 + 21)\), the same distribution as \(p_{Carroll}\), then the probability of Belichick’s record (or better) is \(P(X \geq 201/(201 + 71))\) for \(X \sim Beta(28, 22)\). The command in R returns .0034.

  2. By employing an analogous approach to \(p_{Belichick}\), we find \(p_{Belichick} \sim Beta(1 + 201, 1 + 71)\), since Belichick has won 201 games and lost 71. By using facts about the Beta, we find \(E(p_{Belichick}) = 202/(202 + 72) = .737\) and \(E(p_{Carroll}) = 28/(28 + 22) = .56\), then \(Var(p_{Belichick}) = .0007\) and \(Var(p_{Carroll}) = .004\).

  3. If \(p_{Carroll} \sim Beta(1 + 201, 1 + 71)\), the same distribution as \(p_{Belichick}\) from (b), then the probability of Carroll’s record (or worse) is \(P(X \leq 27/(21 + 27))\) for \(X \sim Beta(202, 72)\). The command in R returns \(9.56 \cdot 10^{-10}\). This answer is much closer to 0 than the answer to (a). This is because \(Var(p_{Belichick}) < Var(p_{Carroll})\) (there is more certainty about Belichick’s true winning probability because he has a bigger sample size).
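For reference, the pbeta commands described in the hint can be used directly to produce the values quoted above:

#part a.: P(X >= 201/272) for X ~ Beta(28, 22); should return about .0034
1 - pbeta(201/(201 + 71), 28, 22)

#part c.: P(X <= 27/48) for X ~ Beta(202, 72); should return about 9.56*10^(-10)
pbeta(27/(27 + 21), 202, 72)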


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#generate the r.v.'s for Belichick and Carroll
B = rbeta(sims, 202, 72)
C = rbeta(sims, 28, 22)

#probability of Belichick's record or better with Carroll's distribution
#should get .003
length(C[C > 201/(201 + 71)])/sims
## [1] 0.005
#probability of Carroll's record or worse with Belichick's distribution
#should get close to 0
length(B[B < 27/(27 + 21)])/sims
## [1] 0
#Belichick should have a higher mean and lower variance
mean(B); mean(C)
## [1] 0.7389501
## [1] 0.5613179
var(B); var(C)
## [1] 0.0006432912
## [1] 0.004954525




8.4

Solve the integral:

\[\int_{0}^1 \sqrt{x - x^2} dx\]

You may leave your answer in terms of the \(\Gamma\) function.

Hint: Try factoring the integrand into two different terms (both square roots) and then using Pattern Integration.



Analytical Solution:

As suggested in the hint, we first factor the integrand to \(\sqrt{x}\sqrt{1 - x}\). Now, the integrand resembles the PDF of a \(Beta(3/2, 3/2)\). We complete the PDF by multiplying and dividing by the normalizing constant:

\[\frac{\Gamma(3/2) \Gamma(3/2)}{\Gamma(3)} \int_{0}^1 \frac{\Gamma(3)}{\Gamma(3/2) \Gamma(3/2)} \cdot \sqrt{x}\sqrt{1 - x}dx\]

The integrand is now the full PDF of a \(Beta(3/2, 3/2)\) integrated over the support, so it goes to 1. We are left with:

\[\frac{\Gamma(3/2) \Gamma(3/2)}{\Gamma(3)} \]


Empirical Solution:

#define the function
fun <- function(x){
  return(sqrt(x - x^2))
}

#solve the integral; these should match
integrate(f = fun, 0, 1)
## 0.3926992 with absolute error < 0.00011
(gamma(3/2)*gamma(3/2))/gamma(3)
## [1] 0.3926991




8.5

Let \(X \sim Gamma(a,\lambda)\). Find an expression for the \(k^{th}\) moment of \(X\) using pattern recognition integration, and use this expression to find \(E(X)\) and \(Var(X)\).



Analytical Solution:

Start with LoTUS: just multiply \(x^k\) by the PDF of \(X\) and integrate (the support of \(X\) is all positive numbers).

\[E(X^k) = \int_{0}^{\infty} \frac{\lambda^a}{\Gamma(a)} x^k x^{a-1} e^{-\lambda x} dx= \frac{\lambda^a}{\Gamma(a)} \int_{0}^{\infty} x^{a+k-1} e^{-\lambda x} dx\]

This is a prime opportunity to use Pattern Integration. This integrand looks like the PDF of a \(Gamma(a+k,\lambda)\), but lacks the normalizing constant \(\frac{\lambda^{a+k}}{\Gamma(a+k)}\). If we multiply and divide by the normalizing constant, we complete the PDF (which integrates to 1) and are left with the reciprocal of the normalizing constant.

\[E(X^k) = \frac{\lambda^a\Gamma(a+k)}{\lambda^{a+k}\Gamma(a)} = \frac{\Gamma(a+k)}{\lambda^{k}\Gamma(a)} \]

Find \(E(X)\) by plugging in \(k = 1\) (recall that \(\Gamma(a+1) = a\Gamma(a)\)).

\[\frac{\Gamma(a+1)}{\lambda\Gamma(a)} = \frac{a\Gamma(a)}{\lambda\Gamma(a)} = \frac{a}{\lambda}\]

This matches the result for the mean of a Gamma. To find the variance, we first find the second moment (plug in \(k = 2\)):

\[E(X^2) = \frac{\Gamma(a+2)}{\lambda^{2}\Gamma(a)} = \frac{(a+1)\Gamma(a+1)}{\lambda^{2}\Gamma(a)} = \frac{(a+1)(a)\Gamma(a)}{\lambda^{2}\Gamma(a)} = \frac{a^2 + a}{\lambda^2}\]

Now we can find variance:

\[Var(X) = E(X^2) - E(X)^2 = \frac{a^2 + a}{\lambda^2} - \frac{a^2}{\lambda^2} = \frac{a^2}{\lambda^2} + \frac{a}{\lambda^2} - \frac{a^2}{\lambda^2} = \frac{a}{\lambda^2}\] Which is the variance of a Gamma!


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#define simple parameters
a = 3
lambda = 5

#generate the r.v.
X = rgamma(sims, a, lambda)

#compare the analytical and empirical moments
k = 1:5
moments.e = sapply(k, function(x) mean(X^x))
moments.a = gamma(a + k)/(lambda^k*gamma(a))

#plots should match, although high moments may start to diverge
plot(k, moments.e, main = "E(X^k) for X ~ Gamma(a, lambda)",
     xlab = "k", ylab = "E(X^k)", col = "red", pch = 16)
lines(k, moments.a, lwd = 3)




8.6

Let \(X \sim Pois(\lambda)\), where \(\lambda\) is unknown. We put a Gamma prior on \(\lambda\) such that \(\lambda \sim Gamma(a, b)\). Find \(\lambda | X\), the posterior distribution of \(\lambda\).



Analytical Solution:

We start with Bayes’ Rule:

\[f(\lambda | X = x) \propto P(X = x|\lambda) f(\lambda)\]

Where, recall, \(\propto\) means proportional to. We use this because we only care about terms that are functions of \(\lambda\).

We know that \(X | \lambda \sim Pois(\lambda)\) and \(\lambda \sim Gamma(a, b)\), so we can plug in the density functions:

\[= \frac{e^{-\lambda} \lambda^x}{x!} \cdot \frac{1}{\Gamma(a)} b^a \lambda^{a - 1} e^{-\lambda b}\]

Dropping terms that don’t change with \(\lambda\):

\[\propto e^{-\lambda(1 + b)} \lambda^{a + x - 1}\]

This looks like the ‘meaty’ part of a Gamma distribution with parameters \(x + a\) and \(1 + b\). So, we found the posterior distribution:

\[(\lambda | X = x) \sim Gamma(x + a, 1 + b)\]

Notice that we used a Gamma prior and got a Gamma posterior; this is another example of conjugacy (using a distribution as a prior and getting the same distribution as a posterior).


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#define simple parameters
a = 5
b = 2

#generate the r.v.'s
lambda = rgamma(sims, a, b)
X = sapply(lambda, function(x) rpois(1, x))

#condition on a specific value of X
x = 1

#define titles
title1 = paste0("lambda | X = ", x)
title2 = paste0("Gamma(", a + x, ", ", b + 1, ")")

#plot lambda conditioned on X = x; should match
#   the Gamma distribution

#2x2 plot grid
par(mfrow = c(1,2))

hist(lambda[X == x], main = title1, col = rgb(1, 0, 0, 1/4), 
     xlab = "lambda", ylab = "")
hist(rgamma(sims, x + a, 1 + b), main = title2, col = rgb(0, 1, 0, 1/4),
     xlab = "", ylab = "")

#re-set graphics
par(mfrow = c(1,1))




8.7

Let \(X \sim Beta(a, b)\) and \(Y = cX\) for some constant \(c\).

  1. Find \(f(y)\).
  2. Does \(Y\) have a Beta distribution?



Analytical Solution:

By the transformation theorem:

\[f(y) = f(x) |\frac{dx}{dy}|\]

\[= \frac{\Gamma(a + b)}{\Gamma(a)\Gamma(b)}(\frac{y}{c})^{a - 1}(1 - \frac{y}{c})^{b - 1} \frac{1}{c}\]

  2. The PDF does not look to be Beta. More rigorously, though, we can immediately say that \(Y\) has support 0 to \(c\), since the support of \(X\) is 0 to 1. This is not the support of a Beta (must be 0 to 1) so \(Y\) is not Beta.


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#define simple parameters
a = 3
b = 5
c = 2

#generate the random variables
X = rbeta(sims, a, b)
Y = c*X

#plot the PDFs; should be the same
#calculate the analytical PDF
y = seq(from = 0, to = c, length.out = 100)
PDF = (gamma(a + b)/(gamma(a)*gamma(b)))*(y/c)^(a - 1)*(1 - y/c)^(b - 1)*(1/c)

#plots should line up
#empirical
plot(density(Y), col = "black", 
     main = "PDF", type = "l",
     xlab = "y", ylab = "f(y)",lwd = 3)

#analytical
lines(y, PDF, col = "red", pch = 20, type = "p", lwd = 3)


legend("topright", legend = c("Empirical PDF", "Analytical PDF"),
       lty=c(1,1), lwd=c(2.5,2.5),
       col=c("black", "red"))

#Y can't be Beta, the max is over 1
max(Y)
## [1] 1.58276




8.8

Let \(a, b\) and \(m\) be positive integers such that \(a + b > m\). Find:

\[\sum_{x = 0}^m {a \choose x}{b \choose m - x}\]



Analytical Solution:

We should pattern-match this to the ‘meaty’ part of the PMF of a \(HGeom(a, b, m)\); we are only missing the constant \({a + b \choose m}\). We complete the PMF:

\[{a + b \choose m} \sum_{x = 0}^m \frac{{a \choose x}{b \choose m - x}}{{a + b \choose m}}\] And now we have the sum of a valid PMF over its support, which must be 1. We are left with:

\[\sum_{x = 0}^m {a \choose x}{b \choose m - x} = {a + b \choose m}\]

This is also called Vandermonde’s Identity.


Empirical Solution:

#define simple parameters
a = 10
b = 5
m = 7

#define x 
x = 0:m

#these should match
sum(choose(a, x)*choose(b, m - x)); choose(a + b, m)
## [1] 6435
## [1] 6435




8.9

Solve this integral:

\[\int_{0}^1 \int_{0}^1 (xy)^{a - 1} \big((1 - x)(1 - y)\big)^{b - 1} dx dy\]



Analytical Solution:

We can pattern match and realize that this is the ‘meaty part’ joint PDF of \(X\) and \(Y\), where \(X\) and \(Y\) are i.i.d. \(Beta(a, b)\); that is, it is the product of the ‘meaty parts’ of the PDF of \(X\) and \(Y\) (we can take the product because these random variables are independent). Completing the joint PDF:

\[\Big(\frac{\Gamma(a)\Gamma(b)}{\Gamma(a + b)}\Big)^2\int_{0}^1 \int_{0}^1 \Big(\frac{\Gamma(a + b)}{\Gamma(a)\Gamma(b)}\Big)^2(xy)^{a - 1} \big((1 - x)(1 - y)\big)^{b - 1} dx dy\]

\[\Big(\frac{\Gamma(a)\Gamma(b)}{\Gamma(a + b)}\Big)^2\int_{0}^1 \int_{0}^1 \frac{\Gamma(a + b)}{\Gamma(a)\Gamma(b)}x^{a - 1} (1 - x)^{b - 1} \frac{\Gamma(a + b)}{\Gamma(a)\Gamma(b)}y^{a - 1} (1 - y)^{b - 1}dx dy\] And now we have a joint PDF integrated over its support (integrating out both \(x\) and \(y\) over their supports) which means the integral goes to 1 and we are left with:

\[\Big(\frac{\Gamma(a)\Gamma(b)}{\Gamma(a + b)}\Big)^2\]
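As a quick numerical check of this result (with illustrative values \(a = 3\) and \(b = 5\)), note that the double integral is just the expectation of the integrand evaluated at two independent Standard Uniforms, so a Monte Carlo estimate should be close to the closed form:

#replicate
set.seed(110)
sims = 10000

#illustrative parameters
a = 3
b = 5

#Monte Carlo estimate of the double integral over the unit square
x = runif(sims)
y = runif(sims)
mean((x*y)^(a - 1)*((1 - x)*(1 - y))^(b - 1))

#closed form from the pattern integration; should be close to the estimate above
(gamma(a)*gamma(b)/gamma(a + b))^2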




8.10

Let \(c\) be a constant and \(X\) be a random variable. For the following distributions of \(X\), see if \(Y = X + c\) has the same distribution as \(X\) (not the same parameters, but the same distribution).

  1. \(X \sim Unif(0, 1)\)
  2. \(X \sim Beta(a, b)\)
  3. \(X \sim Expo(\lambda)\)
  4. \(X \sim Bin(n, p)\)
  5. \(X \sim Pois(\lambda)\)



Analytical Solution:

  1. We can find \(f(y)\) via the transformation theorem:

\[f(y) = f(x)|\frac{dx}{dy}|\]

We know that \(f(x) = 1\), and \(Y = X + c \rightarrow X = Y - c\), so the derivative of \(X\) in terms of \(Y\) is just 1. Putting it together:

\[f(y) = 1\]

Since the PDF is constant, we know that \(Y\) is Uniform, and since the PDF of a Uniform is proportional to length, we know that \(Y\) has length 1. We know that the support of \(X\) is 0 to 1, and adding \(c\) to these endpoints allows us to see that the support of \(Y\) is \(c\) to \(c + 1\), so \(Y \sim Unif(c, c + 1)\). This is intuitive; each point in (0,1) maps to a point in \((c, c + 1)\) when we add \(c\) to it.

  2. We could find the PDF of \(Y\) via the transformation theorem, or we could see that the support of \(Y\) could include values above 1 (imagine if \(c = 100\), for example). The support of a Beta random variable is 0 to 1, so \(Y\) cannot be Beta.

  3. We could find the PDF of \(Y\) via the transformation theorem, or we could think about memorylessness. Imagine if \(c = 4\), for example. If we think about the random variable \(Y\) and we have been waiting for 2 minutes (recall that the story of an Exponential is that we are waiting for a bus), we know that we have to wait at least 2 more minutes (since \(Y = X + 4\) and the smallest \(X\) can be is 0). If we wait for another minute (so we have been waiting for 3 minutes total), then we know we have to wait at least 1 more minute (i.e., our distribution of wait time going forward has changed). This clearly violates the memoryless property, and if \(Y\) is not memoryless, it does not have an Exponential distribution.

  4. Again, we merely have to think about the support of \(Y\); we don’t even need to find the PMF. Since the support of \(X\) is \(0,1,2,...,n\), the support of \(Y\) is \(c,c + 1,..., c + n\). Since this is not the support of a Binomial, \(Y\) is not Binomial.

  5. \(Y\) is not Poisson, for the same reason in part d. that \(Y\) is not Binomial (the support is \(c, c + 1,...\) instead of \(0, 1,...\)).


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#define simple parameters
lambda = 5
a = 3
b = 5
n = 10
p = 1/2
c = 2

#part a.
#generate r.v.'s
X = runif(sims)
Y = X + c

#show that Y ~ Unif(c, c + 1)
#1x2 graph grid
par(mfrow = c(1,2))

#graphics
hist(Y, main = "Y", col = rgb(0,0,1,1/4), xlab = "")
hist(runif(sims, c, c + 1), main = "Unif(c, c + 1)", col = rgb(0,1,0,1/4), xlab = "")

#re-set graphics
par(mfrow = c(1,1))


#part b.
#generate r.v.'s
X = rbeta(sims, a, b)
Y = X + c

#the support of Y is not contained in (0,1)
max(Y)
## [1] 2.813904
#part c.
#generate r.v.'s
X = rexp(sims, lambda)
Y = X + c

#these probabilities are different (not memoryless, so not exponential)
length(Y[Y >= 1 + c])/length(Y[Y >= 1]); length(Y[Y >= c])/sims
## [1] 0.006
## [1] 1
#part d.
#generate r.v.'s
X = rbinom(sims, n, p)
Y = X + c


#the support of Y is not 0,1,2,...
table(Y)
## Y
##   2   3   4   5   6   7   8   9  10  11  12 
##   2  16  44 107 216 236 196 129  44   8   2
#part e.
#generate r.v.'s
X = rpois(sims, lambda)
Y = X + c


#the support of Y is not 0,1,2,...
table(Y)
## Y
##   2   3   4   5   6   7   8   9  10  11  12  13  14  15 
##  11  35  98 115 176 185 146 105  68  33  17   7   3   1




8.11

Brandon is doing his homework, but he is notorious for taking frequent breaks. The breaks he takes over the next hour follow a Poisson process with rate \(\lambda\). Given that he takes less than 3 breaks overall, what is the probability that he takes a break in the first half hour? Your answer should only include \(\lambda\) (and constants).

Hint: conditioned on the number of arrivals, the arrival times of a Poisson process are uniformly distributed.



Analytical Solution:

Define \(X\) as the number of arrivals. We know that we either have 0, 1 or 2 arrivals. We can find the probability of these cases using the PMF of a Poisson and renormalizing:

\[P(X = 0) = \frac{e^{-\lambda}}{e^{-\lambda} + \lambda e^{-\lambda} + \lambda^2 e^{-\lambda}/2}\] \[P(X = 1) = \frac{\lambda e^{-\lambda}}{e^{-\lambda} + \lambda e^{-\lambda} + \lambda^2 e^{-\lambda}/2}\] \[P(X = 2) = \frac{\lambda^2 e^{-\lambda}/2}{e^{-\lambda} + \lambda e^{-\lambda} + \lambda^2 e^{-\lambda}/2}\]

Now define \(A\) as the event that there is an arrival in the first half hour. By LOTP:

\[P(A) = P(A|X = 0)P(X = 0) + P(A|X = 1)P(X = 1) + P(A|X = 2)P(X = 2)\]

\(P(A|X = 0) = 0\), since if we have no arrivals, we can’t have an arrival in the first half hour. \(P(A|X = 1) = 1/2\), since there is a 1/2 probability that this first arrival is in the first half hour (because arrivals are uniform in this hour if we condition on the number of arrivals). \(P(A|X = 2) = 3/4\), since there is a \(1/2^2\) probability that neither of the two arrivals are in the first half hour (again, arrivals are uniform if we condition on the number of arrivals). Putting it all together:

\[P(A) = P(X = 1)/2 + 3P(X = 2)/4\]

Where \(P(X = 1)\) and \(P(X = 2)\) are defined above.


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#define a simple parameter
lambda = 2

#keep track of the number of arrivals
arrivals = rep(NA, sims)

#indicator if we have an arrival in the first half hour
first.half = rep(0, sims)

#run the loop
for(i in 1:sims){
  
  #generate the number of arrivals
  arrivals[i] = rpois(1, lambda)
  
  #if we didn't get any arrivals, just go to the next iteration
  if(arrivals[i] == 0){
    next
  }
  
  #generate arrival times, using the hint that they are uniform
  times = runif(arrivals[i], 0, 60)
  
  #see if we got anything in the first half hour
  if(min(times) <= 30){
    first.half[i] = 1
  }
}

#find the analytical solution
(1/2)*dpois(1, lambda)/(dpois(0, lambda) + dpois(1, lambda) + dpois(2, lambda)) + 
(3/4)*dpois(2, lambda)/(dpois(0, lambda) + dpois(1, lambda) + dpois(2, lambda))
## [1] 0.5
#find the mean when we get less than three arrivals; should match analytical
mean(first.half[arrivals < 3])
## [1] 0.522694
#alternative solution, not using the result that the arrival times
#   are uniform conditioned on the number of arrivals

#define a simple parameter
lambda = 2

#keep track of number of arrivals
arrivals = rep(0, sims)

#indicator if we have an arrival in the first half hour
first.half = rep(0, sims)

#run the loop
for(i in 1:sims){
  
  #initialize the current time at 0
  current.time = 0
  
  #go until we exceed 1 hour, then break
  while(TRUE){
    
    #add on another random time
    current.time = current.time + rexp(1, lambda)
    
    #see if we breached 1 hour, get out if we did
    if(current.time > 1){
      break
    }
    
    #if we didn't breach an hour, increment the arrivals
    arrivals[i] = arrivals[i] + 1
    
    #if this arrival was in the first half hour, mark it
    if(current.time < 1/2){
      first.half[i] = 1
    }
  }
}

#again, should match the analytical result
mean(first.half[arrivals < 3])
## [1] 0.5038052




8.12

You again find yourself in a diving competition; in this competition, you take 3 dives, and each is scored by a judge from 0 to 1 (1 being the best, 0 the worst). The scores are continuous (i.e., you could score a .314, etc.). The judge is completely hapless, meaning that the scores are completely random and independent. On average, what will your highest score be? On average, what will your lowest score be?



Analytical Solution:

We saw this problem earlier, and used a ‘brute force’ approach to solve it. Here, letting \(X_1, X_2, X_3\) represent the 3 dive scores such that they are i.i.d. \(Unif(0,1)\), we know that \(X_{(1)}\) and \(X_{(3)}\), or the first and third Order Statistics, represent the lowest and highest scores. We know that the Order Statistics of a Uniform are Beta such that \(X_{(1)} \sim Beta(1, 3 - 1 + 1)\) and \(X_{(3)} \sim Beta(3, 3 - 3 + 1)\). Using facts about the Beta, we have that \(E(X_{(1)}) = 1/4\) and \(E(X_{(3)}) = 3/4\), or the lowest score on average is \(1/4\) and the highest score on average is \(3/4\). This agrees with the result we saw earlier.


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#generate the r.v.'s (i.i.d. Unif(0, 1) scores)
X1 = runif(sims)
X2 = runif(sims)
X3 = runif(sims)


#keep track of the high and low scores
high = rep(NA, sims)
low = rep(NA, sims)


#calculate the low and high scores
for(i in 1:sims){
  
  high[i] = max(X1[i], X2[i], X3[i])
  low[i] = min(X1[i], X2[i], X3[i])
}

#should get 1/4 and 3/4
mean(low); mean(high)




8.13

Imagine a flat, 2-D circle on which \(n > 1\) blobs are randomly (uniformly) placed along the outside of the circle. All of the blobs travel in the clockwise direction, and each blob is independently assigned a speed drawn from a \(Unif(0, 1)\) distribution (the higher the draw, the faster the speed).

If a blob ‘catches up’ to a slower blob in front of it, the fast blob ‘eats’ the slow blob. The fast blob does not change size (size is irrelevant in this structure anyway) but it now travels slower: specifically, it takes the speed of the slower blob that it just ate.

Imagine letting this system run forever. Let \(X\) be the random variable that represents the average speed of all surviving blobs (note that this random variable is an average, not necessarily a single point). Find \(E(X)\).



Analytical Solution:

At any point in time, one of the blobs in the system is the ‘fastest blob’: recall that, for continuous random variables, the probability of the variable taking on any one specific value is 0, so blobs cannot have the exact same speed (even after ‘fast’ blobs eat ‘slow’ blobs). It is a certainty that, eventually, this ‘fastest blob’ will catch the blob in front of it (since the blob in front of the ‘fastest blob’ must be slower than the ‘fastest blob’), therefore reducing the number of blobs in the system by 1. Once the fastest blob has eaten the blob in front of it, the system is smaller, but the structure is the same: there is still a ‘fastest blob’ (perhaps a different blob from the original ‘fastest blob’) and thus the system must continue to decrease by 1 until there is one blob left in the system. Of course, there will be times when other blobs than just the ‘fastest blob’ catch the blobs in front of them, but this just also serves to reduce the size of the system.

The final remaining blob must have the speed of the slowest blob from the original layout: the slowest blob can never catch another blob and thus can never adopt another speed (and ‘shed’ the slowest speed), and we know that there must be one blob remaining, so it must have the speed of the slowest blob. Therefore, since the average of a single number is just that number, the random variable \(X\) is always equivalent to the speed of the slowest blob. We know, by the definition of the Uniform order statistic, that \(U_{(1)} \sim Beta(1, n)\), where \(U_{(1)}\) is the first order statistic (or the speed of the slowest blob). Therefore, \(E(X) = \frac{1}{1 + n}\).
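We can at least check this conclusion numerically: the surviving speed is the minimum of the \(n\) original speeds, so its mean should be \(\frac{1}{1 + n}\). The sketch below (with an illustrative \(n\)) checks the order-statistic conclusion only, not the full blob dynamics.

#replicate
set.seed(110)
sims = 1000

#define a simple parameter
n = 5

#the surviving speed is the minimum of the n original Unif(0, 1) speeds
X = replicate(sims, min(runif(n)))

#should both be 1/(n + 1) = 1/6
mean(X); 1/(n + 1)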




8.14

Imagine a subway station where trains arrive according to a Poisson Process with rate parameter \(\lambda_1\). Independently, customers arrive to the station according to a Poisson Process with rate parameter \(\lambda_2\). Each time that a train arrives, it picks up all of the customers at the station and departs.

If you arrive at the station and see 5 customers there (i.e., 5 customers have arrived since the last train departed) how long should you expect to wait for the next train?



Analytical Solution: The times between trains are i.i.d. \(Expo(\lambda_1)\) random variables by the story of a Poisson Process. It’s intuitive to think that the number of customers at the station when you arrive gives some indication as to how long ago the most recent train arrived (i.e., if there are many customers instead of a few, the most recent train probably departed further back in time), which then seems to give information about how much longer you should expect to wait for the next train; however, by the memoryless property, it doesn’t matter how long it has been since the last train. Regardless of the time that has passed since the last train, the waiting time going forward for the next train is still distributed \(Expo(\lambda_1)\). Therefore, we expect to wait \(1/\lambda_1\) units of time regardless of how many customers are on the platform.
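A quick simulation of the memoryless property that this argument relies on: no matter how long we have already waited since the last train (here, an illustrative half a time unit), the expected remaining wait is still \(1/\lambda_1\).

#replicate
set.seed(110)
sims = 10000

#illustrative train rate
lambda1 = 2

#times between trains
waits = rexp(sims, lambda1)

#mean wait from scratch, and mean additional wait given we have already waited 1/2
#both should be close to 1/lambda1 = 0.5
mean(waits)
mean(waits[waits > 1/2] - 1/2)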




BH Problems



The problems in this section are taken from @BH. The questions are reproduced here, and the analytical solutions are freely available online. Here, we will only consider empirical solutions: answers/approximations to these problems using simulations in R.




BH 8.29

Let \(B \sim Beta(a,b)\). Find the distribution of \(1-B\) in two ways: (a) using a change of variables and (b) using a story proof. Also explain why the result makes sense in terms of Beta being the conjugate prior for the Binomial.

#replicate
set.seed(110)
sims = 1000

#define simple parameters
a = 5
b = 10

#generate the r.v.
B = rbeta(sims, a, b)

#1 - B should be Beta(b, a)
hist(1 - B, col = "gray", main = "1 - B",
     xlab = "1 - b")

hist(rbeta(sims, b, a), col = "gray", 
     main = "Beta(b, a)", xlab = "")




BH 8.30

Let \(X \sim Gamma(a,\lambda)\) and \(Y \sim Gamma(b,\lambda)\) be independent, with \(a\) and \(b\) integers. Show that \(X+Y \sim Gamma(a+b,\lambda)\) in three ways: (a) with a convolution integral; (b) with MGFs; (c) with a story proof.

#replicate
set.seed(110)
sims = 1000

#define simple parameters
a = 5
b = 10
lambda = 1

#generate the r.v.'s
X = rgamma(sims, a, lambda)
Y = rgamma(sims, b, lambda)

#histograms should match
hist(X + Y, col = "gray", xlab = "x + y",
     main = "X + Y")

hist(rgamma(sims, a + b, lambda), 
     col = "gray", xlab = "",
     main = "Gamma(a + b, lambda)")




BH 8.32

Fred waits \(X \sim Gamma(a,\lambda)\) minutes for the bus to work, and then waits \(Y \sim Gamma(b,\lambda)\) minutes for the bus going home, with \(X\) and \(Y\) independent. Is the ratio \(X/Y\) independent of the total wait time \(X+Y\)?

#replicate
set.seed(110)
sims = 1000

#define simple parameters
a = 5
b = 10
lambda = 1

#generate the r.v.'s
X = rgamma(sims, a, lambda)
Y = rgamma(sims, b, lambda)
ratio = X/Y
total = X + Y

#the ratio should have the same distribution whether the total wait time is high or low (above/below its mean)
#this does not prove independence rigorously; it just gives a specific example
hist(ratio[total < mean(total)], 
     xlim = c(0, max(ratio)), 
     main = "X/Y when X + Y is small",
     xlab = "Ratio", col = "gray")

hist(ratio[total > mean(total)], 
     xlim = c(0, max(ratio)), 
     main = "X/Y when X + Y is large",
     xlab = "Ratio", col = "gray")




BH 8.33

The \(F\)-test is a very widely used statistical test based on the \(F(m,n)\) distribution, which is the distribution of \(\frac{X/m}{Y/n}\) with \(X \sim Gamma(\frac{m}{2},\frac{1}{2}), Y \sim Gamma(\frac{n}{2},\frac{1}{2}).\) Find the distribution of \(mV/(n+mV)\) for \(V \sim F(m,n)\).

#replicate
set.seed(110)
sims = 1000

#define simple parameters
m = 3
n = 5

#generate the r.v.'s (use Z instead of F, since F is saved in R as "FALSE")
X = rgamma(sims, m/2, 1/2)
Y = rgamma(sims, n/2, 1/2)
Z = (X/m)/(Y/n)

#show that the distributions are the same
hist(m*Z/(n + m*Z), 
     main = "(mZ)/(n + mZ), where Z ~ F(m, n)", col = "gray", 
     xlab = "")

hist(rbeta(sims, m/2, n/2), 
     main = "Beta(m/2, n/2)", 
     col = "gray", xlab = "")




BH 8.34

Customers arrive at the Leftorium store according to a Poisson process with rate \(\lambda\) customers per hour. The true value of \(\lambda\) is unknown, so we treat it as a random variable. Suppose that our prior beliefs about \(\lambda\) can be expressed as \(\lambda \sim Expo(3).\) Let \(X\) be the number of customers who arrive at the Leftorium between 1 pm and 3 pm tomorrow. Given that \(X=2\) is observed, find the posterior PDF of \(\lambda\).

#replicate
set.seed(110)

#increased number of sims; rare event
sims = 10000

#generate the r.v.'s
lambda = rexp(sims, 3)

#generate the counts using the lambdas; X counts arrivals over the 2 hours from 1 pm to 3 pm, so the rate is 2*lambda
X = rep(NA, sims)
for(i in 1:sims){
  X[i] = rpois(1, 2*lambda[i])
}

#find the posterior lambda
lambda.post = lambda[X == 2]


#calculate the analytical PDF
t = seq(from = min(lambda.post), to = max(lambda.post), length.out = 100)
PDF = (125/2)*t^2*exp(-5*t)


#plot the PDFs, they should match
plot(density(lambda.post), 
     main = "Posterior Density of Lambda", 
     xlab = "Y", col = "red", lwd = 6, ylim = c(0,2))
lines(t, PDF, lwd = 1, type = "h")


legend("topright", legend = c("Empirical PDF", "Analytical PDF"),
       lty=c(1,1), lwd=c(2.5,2.5),
       col=c("red", "black"))




BH 8.35

Let \(X\) and \(Y\) be independent, positive r.v.s. with finite expected values.

  1. Give an example where \(E(\frac{X}{X+Y}) \neq \frac{E(X)}{E(X+Y)}\), computing both sides exactly.

Hint: Start by thinking about the simplest examples you can think of!

#replicate
set.seed(110)
sims = 1000

#simple example, similar to the given solution
X = sample(1:2, sims, replace = TRUE)
Y = sample(2:3, sims, replace = TRUE)

#the first should be slightly smaller (small difference)
mean(X/(X + Y))
## [1] 0.36685
mean(X)/mean(X + Y)
## [1] 0.3708229
  2. If \(X\) and \(Y\) are i.i.d., then is it necessarily true that \(E(\frac{X}{X+Y}) = \frac{E(X)}{E(X+Y)}\)?
#show in the case of rolling two 6-sided die
X = sample(1:6, sims, replace = TRUE)
Y = sample(1:6, sims, replace = TRUE)

#should be equal; this doesn't prove all cases, of course, just one
mean(X/(X + Y))
## [1] 0.4898773
mean(X)/mean(X + Y)
## [1] 0.4897491
  3. Now let \(X \sim Gamma(a,\lambda)\) and \(Y \sim Gamma(b,\lambda)\). Show without using calculus that \[E\left(\frac{X^c}{(X+Y)^c}\right) = \frac{E(X^c)}{E((X+Y)^c)}\] for every real \(c>0\).
#define simple parameters
a = 5
b = 10
lambda = 3

#generate the Gamma r.v.'s for this part
X = rgamma(sims, a, lambda)
Y = rgamma(sims, b, lambda)

#try for different values of c
c = 1:10

#keep track of the results
path1 = rep(NA, length(c))
path2 = rep(NA, length(c))

#run the loop
for(i in 1:length(c)){
  
  #calculate the values
  path1[i] = mean(X^c[i]/((X + Y)^c[i]))
  path2[i] = mean(X^c[i])/(mean((X + Y)^c[i]))
}

#show that the paths are the same
plot(c, path1, 
     main = "E(X^c/((X + Y)^c)) and E(X^c)/E((X + Y)^c)", 
     col = "black", type = "l", lwd = 3, ylab = "")
lines(c, path2, type = "p", col = "red", pch = 20, lwd = 3)


legend("topright", legend = c("E(X^c/((X + Y)^c))", "E(X^c)/E((X + Y)^c)"),
       lty=c(1,1), lwd=c(2.5,2.5),
       col=c("black", "red"))




BH 8.41

Let \(X \sim Bin(n,p)\) and \(B \sim Beta(j,n-j+1)\), where \(n\) is a positive integer and \(j\) is a positive integer with \(j \leq n\). Show using a story about order statistics that \[ P(X \geq j) = P(B \leq p).\] This shows that the CDF of the continuous r.v. \(B\) is closely related to the CDF of the discrete r.v. \(X\), and is another connection between the Beta and Binomial.

#replicate
set.seed(110)
sims = 1000

#define simple parameters
n = 10
p = 1/2
j = 5

#draw the r.v.'s
X = rbinom(sims, n, p)
B = rbeta(sims, j, n - j + 1)

#probabilities should be equal
length(X[X >= j])/sims
## [1] 0.6
length(B[B <= p])/sims
## [1] 0.641
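The identity can also be verified exactly with the built-in CDFs (using the same n, p, and j as above):

#P(X >= j) and P(B <= p); these should be exactly equal (about 0.623 here)
1 - pbinom(j - 1, n, p)
pbeta(p, j, n - j + 1)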




BH 8.45

Let \(X\) and \(Y\) be independent \(Expo(\lambda)\) r.v.s and \(M = \max(X,Y)\). Show that \(M\) has the same distribution as \(X + \frac{1}{2}Y\), in two ways: (a) using calculus and (b) by remembering the memoryless property and other properties of the Exponential.

#replicate
set.seed(110)
sims = 1000

#define simple parameters
lambda = 1

#generate the r.v.'s
X = rexp(sims, lambda)
Y = rexp(sims, lambda)

#find M
M = rep(NA, sims)
for(i in 1:sims){
  M[i] = max(X[i], Y[i])
}

#show that the distributions are the same
hist(M, main = "M", col = "gray", xlab = "")

hist(X + Y/2, main = "X + Y/2", col = "gray", xlab = "" )




BH 8.46
  1. If \(X\) and \(Y\) are i.i.d. continuous r.v.s with CDF \(F(x)\) and PDF \(f(x)\), then \(M = \max(X,Y)\) has PDF \(2F(x)f(x)\). Now let \(X\) and \(Y\) be discrete and i.i.d., with CDF \(F(x)\) and PMF \(f(x)\). Explain in words why the PMF of \(M\) is not \(2F(x)f(x)\).

  2. Let \(X\) and \(Y\) be independent \(Bern(1/2)\) r.v.s, and let \(M = \max(X,Y)\), \(L= \min(X,Y)\). Find the joint PMF of \(M\) and \(L\), i.e., \(P(M=a,L=b)\), and the marginal PMFs of \(M\) and \(L\).
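For part (a), a tiny numerical example using the Bern(1/2) case from part (b) shows why the continuous formula fails for discrete r.v.s: ties happen with positive probability, so \(2F(x)f(x)\) overcounts.

#true PMF of M at 1 is P(M = 1) = 3/4, but the continuous formula gives
#2*F(1)*f(1) = 2*1*(1/2) = 1; the overcount is exactly P(X = Y = 1) = 1/4
2*pbinom(1, 1, 1/2)*dbinom(1, 1, 1/2)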

#replicate
set.seed(110)
sims = 1000

#generate the r.v.'s
X = rbinom(sims, 1, 1/2)
Y = rbinom(sims, 1, 1/2)

#set paths for M and L
M = rep(NA, sims)
L = rep(NA, sims)

#run the loop
for(i in 1:sims){
  
  #calculate the mins and maxes
  M[i] = max(X[i], Y[i])
  L[i] = min(X[i], Y[i])
}


#create heatmaps
data <- data.frame(M, L)

data = group_by(data, M, L)
data = summarize(data, density = n())
data$density = data$density/sims

ggplot(data = data, aes(L, M)) + 
  geom_tile(aes(fill = density), color = "white") +
  scale_fill_gradient(low = "white", high= "red", name = "Density") + 
  ggtitle("Empirical Density of M and L") + 
  theme(plot.margin = unit(c(1.8,.5,1.75,1.55), "cm")) +
  theme(plot.title = element_text(family = "Trebuchet MS", 
                                  color="#383838", face="bold", size=25, vjust = 2)) + 
  theme(axis.title.x = element_text(family = "Trebuchet MS", 
                                    color="#383838", face="bold", size=22, vjust = -1.5),
        axis.title.y = element_text(family = "Trebuchet MS", 
                                    color="#383838", face="bold", size=22, vjust = 2)) +
  theme(axis.text.x = element_text(face="bold", color="#383838", 
                                   size=14),
        axis.text.y = element_text(face="bold", color="#383838", 
                                   size=14))

#analytical Joint Distribution

#define the supports/all combinations
M.a = 0:1
L.a = 0:1
data = expand.grid(M.a = M.a, L.a = L.a)

#calculate density
data$density = c(1/4, 1/2, 0, 1/4)

#remove points with 0 density
data = data[data$density != 0, ]

#generate a heatmap
ggplot(data = data, aes(L.a, M.a)) + 
  geom_tile(aes(fill = density), color = "white") +
  scale_fill_gradient(low = "white", high= "red", name = "Density") + 
  ggtitle("Analytical Density of M and L") + 
  theme(plot.margin = unit(c(1.8,.5,1.75,1.55), "cm")) +
  theme(plot.title = element_text(family = "Trebuchet MS", 
                                  color="#383838", face="bold", size=25, vjust = 2)) + 
  theme(axis.title.x = element_text(family = "Trebuchet MS", 
                                    color="#383838", face="bold", size=22, vjust = -1.5),
        axis.title.y = element_text(family = "Trebuchet MS", 
                                    color="#383838", face="bold", size=22, vjust = 2)) +
  theme(axis.text.x = element_text(face="bold", color="#383838", 
                                   size=14),
        axis.text.y = element_text(face="bold", color="#383838", 
                                   size=14))

#show that M is Bern(3/4)
hist(M, col = "gray", main = "M", xlab = "")

hist(rbinom(sims, 1, 3/4), main = "Bern(3/4)", col = "gray")




BH 8.48

Let \(X_1, X_2, ..., X_n\) be i.i.d. r.v.s with CDF \(F\) and PDF \(f\). Find the joint PDF of the order statistics \(X_{(i)}\) and \(X_{(j)}\) for \(1 \leq i < j \leq n\), by drawing and thinking about a picture.

#replicate
set.seed(110)
sims = 1000

#define simple parameters
n = 5
j = 4
i = 2

#keep track of the order statistics
Xi = rep(NA, sims)
Xj = rep(NA, sims)

#run the loop
for(k in 1:sims){
  
  #draw and sort the r.v. (use a standard normal for simplicity)
  X = rnorm(n)
  X = sort(X)
  
  #find the order statistics
  Xi[k] = X[i]
  Xj[k] = X[j]
}


#round so we can see
Xi = round(Xi, 1)
Xj = round(Xj, 1)

#generate a heat map
#first, empirical
data <- data.frame(Xi, Xj)

data = group_by(data, Xi, Xj)
data = summarize(data, density = n())
data$density = data$density/sims

ggplot(data = data, aes(Xi, Xj)) + 
  geom_tile(aes(fill = density), color = "white") +
  scale_fill_gradient(low = "white", high= "red", name = "Density") + 
  ggtitle("Empirical Density of Xi and Xj") + 
  theme(plot.margin = unit(c(1.8,.5,1.75,1.55), "cm")) +
  theme(plot.title = element_text(family = "Trebuchet MS", 
                                  color="#383838", face="bold", size=25, vjust = 2)) + 
  theme(axis.title.x = element_text(family = "Trebuchet MS", 
                                    color="#383838", face="bold", size=22, vjust = -1.5),
        axis.title.y = element_text(family = "Trebuchet MS", 
                                    color="#383838", face="bold", size=22, vjust = 2)) +
  theme(axis.text.x = element_text(face="bold", color="#383838", 
                                   size=14),
        axis.text.y = element_text(face="bold", color="#383838", 
                                   size=14))

#analytical Joint Distribution

#define the supports/all combinations
Xi.a = seq(from = min(Xi), to = max(Xi), length.out = 100)
Xj.a = seq(from = min(Xj), to = max(Xj), length.out = 100)
data = expand.grid(Xi.a = Xi.a, Xj.a = Xj.a)

#calculate density
data$density = apply(data, 1, function(x){
  
  #define the constant
  c = factorial(n)/(factorial(i - 1)*factorial(j - i - 1)*factorial(n - j))
  
  return(c*pnorm(x[1])^(i - 1)*dnorm(x[1])*(pnorm(x[2]) - pnorm(x[1]))^(j - i - 1)*
           dnorm(x[2])*(1 - pnorm(x[2]))^(n - j))})

#remove points with 0 density
data = data[data$density != 0, ]

#can never have Xi > Xj
data = data[data$Xj.a > data$Xi.a, ]

#generate a heatmap
ggplot(data = data, aes(Xi.a, Xj.a)) + 
  geom_tile(aes(fill = density), color = "white") +
  scale_fill_gradient(low = "white", high= "red", name = "Density") + 
  ggtitle("Analytical Density of X and Y") + 
  theme(plot.margin = unit(c(1.8,.5,1.75,1.55), "cm")) +
  theme(plot.title = element_text(family = "Trebuchet MS", 
                                  color="#383838", face="bold", size=25, vjust = 2)) + 
  theme(axis.title.x = element_text(family = "Trebuchet MS", 
                                    color="#383838", face="bold", size=22, vjust = -1.5),
        axis.title.y = element_text(family = "Trebuchet MS", 
                                    color="#383838", face="bold", size=22, vjust = 2)) +
  theme(axis.text.x = element_text(face="bold", color="#383838", 
                                   size=14),
        axis.text.y = element_text(face="bold", color="#383838", 
                                   size=14))




BH 8.49

Two women are pregnant, both with the same due date. On a timeline, define time 0 to be the instant when the due date begins. Suppose that the time when each woman gives birth has a Normal distribution, centered at 0 and with standard deviation 8 days. Assume that the two birth times are i.i.d. Let \(T\) be the time of the first of the two births (in days).

  1. Show that \[E(T) = \frac{-8}{\sqrt{\pi}}.\]

Hint: For any two random variables \(X\) and \(Y\), we have \(\max(X,Y)+\min(X,Y)=X+Y\) and \(\max(X,Y)-\min(X,Y)=|X-Y|\). Example 7.2.3 derives the expected distance between two i.i.d. \(N(0,1)\) r.v.s.

#replicate
set.seed(110)
sims = 1000

#keep track of first birth
time = rep(NA, sims)

#run the loop
for(i in 1:sims){
  
  #draw the two births, mark the first
  time[i] = min(rnorm(2, 0, 8))
}

#should get -8/sqrt(pi) = -4.51
mean(time)
## [1] -4.86302
  2. Find \(Var(T)\), in terms of integrals. You can leave your answer unsimplified for this part, but it can be shown that the answer works out to \[ Var(T) = 64\left(1-\frac{1}{\pi}\right).\]
#recycle vector. Should get 64*(1 - 1/pi) = 43.63
var(time)
## [1] 42.91849




BH 8.53

A DNA sequence can be represented as a sequence of letters, where the ``alphabet’’ has 4 letters: A,C,T,G. Suppose such a sequence is generated randomly, where the letters are independent and the probabilities of A,C,T,G are \(p_1, p_2, p_3, p_4\) respectively.

  1. In a DNA sequence of length \(115\), what is the expected number of occurrences of the expression “CATCAT” (in terms of the \(p_j\))? (Note that, for example, the expression ``CATCATCAT’’ counts as 2 occurrences.)
#replicate
set.seed(110)
sims = 1000

#define simple parameters
p1 = 1/4
p2 = 1/4
p3 = 1/4
p4 = 1/4

#count how many CATs we see
cat = rep(0, sims)

#run the loop
for(i in 1:sims){
  
  #generate the string
  sequence = sample(c("A", "C", "T", "G"), 115, prob = c(p1, p2, p3, p4), replace = TRUE)
  
  #iterate through the sequence; count the CAT's
  for(j in 6:115){
    
    #see if we got a CAT
    if(sequence[j - 5] == "C" && sequence[j - 4] == "A" && sequence[j - 3] == "T"
       && sequence[j - 2] == "C" && sequence[j - 1] == "A" && sequence[j] == "T"){
      cat[i] = cat[i] + 1
    }
  }
}

#should get 110*(p1*p2*p3)^2 = .026
mean(cat)
## [1] 0.028
  2. What is the probability that the first A appears earlier than the first C appears, as letters are generated one by one (in terms of the \(p_j\))?
#update probabilities to make this part more interesting
p1 = 1/3
p2 = 1/2
p3 = 1/12
p4 = 1/12

#indicator if A appears first
success = rep(0, sims)

#run the loop
for(i in 1:sims){
  
  #loop until we get an A or C
  while(TRUE){
    
    #draw a letter
    letter = sample(c("A", "C", "T", "G"), 1, prob = c(p1, p2, p3, p4), replace = TRUE)
    
    #mark if we get an A and stop looping
    if(letter == "A"){
      success[i] = 1
      break
    }
    
    #stop looping if we get a c
    if(letter == "C"){
      break
    }
  }
}

#should get p1/(p1 + p2) = .4
mean(success)
## [1] 0.406
  3. For this part, assume that the \(p_j\) are unknown. Suppose we treat \(p_2\) as a \(Unif(0,1)\) r.v. before observing any data, and that then the first 3 letters observed are “CAT”. Given this information, what is the probability that the next letter is C?
#since we are only interested in p2, we only care if we have C or not C; CAT is the same as CGT, for example
#order also doesn't matter, so we consider cases when we have one success in first three trials
#indicators for getting one C in first three trials, and then a C next
one.C = rep(0, sims)
C.next = rep(0, sims)

#run the loop
for(i in 1:sims){
  
  #draw p2 from the prior
  p2 = runif(1)
  
  #draw the data; code 1 as C (drawn with probability p2) and 0 as any other letter
  data = sample(c(0, 1), 4, prob = c(1 - p2, p2), replace = TRUE)
  
  #see if we satisfy the indicators
  if(sum(data[1:3]) == 1){
    one.C[i] = 1
  }
  if(data[4] == 1){
    C.next[i] = 1
  }
}

#should get 2/5 = .4
mean(C.next[one.C == 1])
## [1] 0.3877551




BH 8.54

Consider independent Bernoulli trials with probability \(p\) of success for each. Let \(X\) be the number of failures incurred before getting a total of \(r\) successes.

  1. Determine what happens to the distribution of \(\frac{p}{1-p}X\) as \(p \to 0\), using MGFs; what is the PDF of the limiting distribution, and its name and parameters if it is one we have studied?
#replicate
set.seed(110)
sims = 1000

#define simple parameters
p = 1/2
r = 10

#create a path for X
X = rep(0, sims)

#run the loop; fill in X
for(i in 1:sims){
  
  #stop when we have enough successes
  successes = 0
  
  #go until we hit r successes
  while(successes < r){
    
    #draw the r.v.
    flip = rbinom(1, 1, p)
    
    #iterate if we get a success
    if(flip == 1){
      successes = successes + 1
    }
    
    #iterate X if we get a failure
    if(flip == 0){
      X[i] = X[i] + 1
    }
  }
}

#histograms should be roughly similar (the approximation improves as p gets closer to 0)
hist((p*X)/(1 - p), main = "Scaled Negative Binomial",
     col = "gray", xlab = "")

hist(rgamma(sims, r, 1), 
     main = "Gamma(r, 1)", col = "gray", xlab = "")






Limit Theorems and Conditional Expectation




9.1

You observe a sequence of \(n\) normal random variables. The first is a standard normal: \(X_1 \sim N(0, 1)\). Then, the second random variable in the sequence has variance 1 and mean of the first random variable, so \(X_2|X_1 \sim N(X_1, 1)\). In general, \(X_j|X_{j - 1} \sim N(X_{j - 1}, 1)\).

  1. Find \(E(X_n)\) for \(n \geq 2\).

  2. Find \(Var(X_n)\).



Analytical Solution:

  1. Let’s consider \(E(X_n)\). Imagine conditioning on the entire past: \(X_1, X_2, ..., X_{n - 1}\). We can quickly see that the only relevant information is the immediate past, \(X_{n - 1}\) (this is the mean of \(X_n\), it doesn’t necessarily matter how we got to this mean). Later, we will see that this characteristic makes it a Markov process. In this case, we can easily apply Adam’s Law, conditioning on \(X_{n - 1}\).

\[E(X_n) = E(E(X_n | X_{n - 1}))\] \[= E(X_{n - 1})\]

Since \(X_n | X_{n - 1} \sim N(X_{n - 1}, 1)\). Since we showed that this relation holds in the generic case of \(n\) and \(n - 1\), we know that this iteration (induction) continues all the way back to \(E(X_1)\), which we know is 0. So, \(E(X_n) = 0\).

  2. We will employ Eve’s Law.

\[Var(X_n) = E(Var(X_n | X_{n - 1})) + Var(E(X_n | X_{n - 1}))\]

\[ = 1 + Var(X_{n - 1})\]

Again, because \(X_n | X_{n - 1} \sim N(X_{n - 1}, 1)\). Since we proved this in the generic case of \(n\), we can see that this holds for any \(X_j\): the variance is 1 plus the variance of \(X_{j - 1}\). We iterate this \(n\) times (since \(Var(X_1) = 1\)), which means we add 1, \(n\) times. So, we get \(Var(X_n) = n\).


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#define a simple, small parameter (so the simulation is quick)
n = 10

#keep track of Xn
Xn = rep(NA, sims)

#run the loop
for(i in 1:sims){
  
  #initialize the X vector; the first value is N(0, 1)
  X = rep(NA, n)
  X[1] = rnorm(1)
  
  #iterate for X
  for(j in 2:n){
    
    #generate the next X based on the previous X
    X[j] = rnorm(1, X[j - 1], 1)
  }
  
  #mark the last value
  Xn[i] = X[n]
}

#should get 0 and 10
mean(Xn)
## [1] -0.04936691
var(Xn)
## [1] 10.20469




9.2

Let \(X \sim N(0, 1)\) and \(Y|X \sim N(X, 1)\). Find \(Cov(X, Y)\).



Analytical Solution:

By the definition of Covariance:

\[Cov(X, Y) = E(XY) - E(X)E(Y)\]

\(E(X) = 0\), so the second term is 0. To find the first term, we apply Adam’s Law and condition on \(X\). Recall that \(E(X|X) = X\).

\[E(E(XY|X)) = E(XE(Y|X)) = E(X^2)\]

This is because \(Y|X \sim N(X, 1)\), so \(E(Y|X) = X\). Then \(E(X^2) = Var(X) + E(X)^2 = 1\), so \(Cov(X, Y) = 1\).


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#generate the r.v.'s
X = rnorm(sims)
Y = sapply(X, function(x) rnorm(1, x, 1))

#should get 1
cov(X, Y)
## [1] 0.959541




9.3

(With help from Matt Goldberg)

You need to design a way to split $100 randomly among three people. ‘Random’ here means symmetrical: if \(X_i\) is the amount that the \(i^{th}\) person receives, then \(X_1,X_2\) and \(X_3\) must be i.i.d. Also, the support of \(X_1\) must be from 0 to 100; otherwise, we could simply assign each person a constant $33.33.

  1. Consider the following scheme to randomly split the money: you draw a random value from 0 to 100 and give that amount to the first person, then you draw a random amount from 0 to the amount of money left ($100 minus what you gave to the first person) and give that to the second person, etc. Show why this scheme violates the ‘symmetrical’ property that we want.

  2. Consider the following scheme: generate three values, one for each person, from a \(Unif(0, 1)\) r.v. Then, normalize the values (divide each value by the sum of the three values) and assign each person the corresponding proportional value out of $100 (i.e., if the first person has a normalized value of .4, give him $40). Show why this scheme results in the correct expectation for each person (i.e., the expectation satisfies the property of symmetry, and each person expects $33.33).



Analytical Solution:

  1. By construction, \(X_1 \sim Unif(0, 100)\), \(X_2|X_1 \sim Unif(0, X_1)\), etc. We can quickly see that \(E(X_1) = 50\). By Adam’s Law, we can write \(E(X_2) = E(E(X_2|X_1)) = E(\frac{X_1 - 0}{2}) = 25\). Already, then, person 1 has a higher expected value than person 2, which means \(X_1\) and \(X_2\) are not i.i.d. (they are also highly dependent).

  2. We need to show \(E(\frac{100X_1}{X_1 + X_2 + X_3}) = 33.33\), which is equivalent to showing \(E(\frac{X_1}{X_1 + X_2 + X_3}) = 1/3\). By employing linearity (in the opposite direction compared to what we are used to) we can write:

\[E(\frac{X_1}{X_1 + X_2 + X_3}) + E(\frac{X_2}{X_1 + X_2 + X_3}) + E(\frac{X_3}{X_1 + X_2 + X_3}) = E(\frac{X_1 + X_2 + X_3}{X_1 + X_2 + X_3}) = 1\]

By symmetry, we know \(E(\frac{X_1}{X_1 + X_2 + X_3}) = E(\frac{X_2}{X_1 + X_2 + X_3}) = E(\frac{X_3}{X_1 + X_2 + X_3})\), and since these add to 1, we know that each ‘piece’ must be 1/3. Therefore, \(E(\frac{X_1}{X_1 + X_2 + X_3})\), the expected proportion for the first person, is 1/3, which satisfies the ‘symmetry’ property.


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#consider the first scheme; generate the dollar amounts
X1 = runif(sims, 0, 100)
X2 = sapply(X1, function(x) runif(1, 0, x))
X3 = sapply(X2, function(x) runif(1, 0, x))

#the means are different (violation of the symmetry property)
mean(X1); mean(X2); mean(X3)
## [1] 48.25323
## [1] 23.96835
## [1] 11.73888
#consider the second scheme; generate the weights and total them
w1 = runif(sims)
w2 = runif(sims)
w3 = runif(sims)
N = w1 + w2 + w3

#generate the dollar amounts
X1 = 100*w1/N
X2 = 100*w2/N
X3 = 100*w3/N

#the means should all be $33.33 (satisfies the symmetry property)
mean(X1); mean(X2); mean(X3)
## [1] 33.15754
## [1] 34.29418
## [1] 32.54829




9.4

Let \(X \sim N(0, 1)\) and \(Y = |X|\). Find \(Corr(X, Y)\).



Analytical Solution:

It may be easier to find Covariance first. Using the expectation expansion:

\[Cov(X, Y) = E(XY) - E(X)E(Y) = E(XY)\]

since \(E(X) = 0\). We can now employ Adam’s Law and condition on \(Y\):

\[E(XY) = E\big(E(X Y | Y)\big)\]

\[= E\big(Y E(X | Y)\big)\] Consider the distribution of \(X\) conditional on \(Y\), which is just \(|X|\). If we know \(|X| = 1\), for example, then \(X\) is 1 or -1 with equal probabilities (by the symmetry of the Normal). If we average these two values, of course, we get 0. Therefore, \(E(X|Y) = 0\), which means the above comes out to 0, and thus:

\[Cov(X, Y) = Corr(X, Y) = 0\]

This is an excellent example of random variables that are not independent (knowing \(X\) completely determines \(Y\)) but are uncorrelated.


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#generate the r.v.'s
X = rnorm(sims)
Y = abs(X)

#should get 0
cor(X, Y)
## [1] -0.1405503




9.5

CJ has \(X \sim Pois(\lambda)\) chores to run, and will spend \(M \sim Pois(\lambda)\) time at each chore. Time spent at each chore is independent of the number of chores and the time spent at other chores. Let \(Y\) be the total amount of time spent doing chores. Find \(E(Y)\).



Analytical Solution:

By Adam’s Law (conditioning on \(X\)):

\[E(Y) = E\big(E(Y|X)\big)\] \[= E(XM)\]

Since, if we condition on \(X\), the number of chores we have to run, our expected total time is just \(MX\), or the time spent at each chore times the number of chores we have to do. Then, since \(X\) and \(M\) are independent:

\[E(XM) = E(X)E(M) = \lambda^2\]


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#define a simple parameter
lambda = 5

#set a path for Y
Y = rep(NA, sims)

#run the loop
for(i in 1:sims){
  
  #generate X
  X = rpois(1, lambda)
  
  #generate X waiting times
  Y[i] = sum(rpois(X, lambda))
}

#should get lambda^2 = 25
mean(Y)
## [1] 25.171




9.6

Brandon is a cell. During his life cycle, he has \(Pois(\lambda)\) offspring before he dies.

  1. Each of his descendants has an i.i.d. \(Pois(\lambda)\) offspring distribution (i.e., like Brandon they independently have \(Pois(\lambda)\) children before they die). Let \(X_n\) be the size of the \(n^{th}\) generation, and let Brandon be the \(0^{th}\) generation (so \(X_0 = 1\), since Brandon is just 1 cell, and \(X_1\) is the number of children that Brandon has, or the first generation). Find \(E(X_n)\).

  2. Discuss for what values of \(\lambda\) the generation mean goes to infinity as \(n\) grows, and for what values the generation mean goes to 0.



Analytical Solution:

By Adam’s Law:

\[E(X_n) = E\big(E(X_n | X_{n - 1})\big)\]

\[= E(\lambda X_{n - 1})\]

Since, if we condition on \(X_{n - 1}\) cells in the previous generation, and each has \(Pois(\lambda)\) offspring, the expected value of the next generation is just \(\lambda X_{n -1}\) because each of the \(X_{n - 1}\) cells expects to have \(\lambda\) offspring. So:

\[E(X_n) = \lambda E(X_{n -1})\]

Since this relation holds for every \(n\), we can continue to iterate:

\[E(X_n) = \lambda^2 E(X_{n - 2})\] \[E(X_n) = \lambda^3 E(X_{n - 3})\] \[...\]

Until we get to \(E(X_0)\), which we know is 1 (Brandon). So, we get:

\[E(X_n) = \lambda^n\]

  1. The expected value of the \(n^{th}\) generation, \(\lambda^n\), goes to \(\infty\) if \(\lambda > 1\) and goes to \(0\) if \(\lambda < 1\) as \(n\) grows. In general, an interesting result (that we will not prove here) is that, given the set-up of this problem, if the mean of the offspring distribution is \(\leq 1\), the population will eventually go extinct with certainty, and if the mean of the offspring distribution is \(>1\), the population may grow without bound (@stochasticmodeling).


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#define simple parameters
lambda = 2
n = 4

#keep track of Xn
Xn = rep(NA, sims)

#run the loop
for(i in 1:sims){
  
  #initialize the population (1 in the 0th generation)
  population = 1
  
  #do n generations
  for(j in 2:(n + 1)){
    
    #add offspring
    population = c(population, sum(rpois(population[j - 1], lambda)))
  }
  
  #mark Xn
  Xn[i] = population[n + 1]
}

#should get lambda^n = 16
mean(Xn)
## [1] 15.436




9.7

Brandon is a cell. During his life cycle, he has \(Pois(\lambda)\) offspring before he dies. Each of his descendants has an i.i.d. \(Pois(\lambda)\) offspring distribution (i.e., like Brandon they independently have \(Pois(\lambda)\) children before they die). Let \(X_n\) be the size of the \(n^{th}\) generation, and let Brandon be the \(0^{th}\) generation (so \(X_0 = 1\), since Brandon is just 1 cell, and \(X_1\) is the number of children that Brandon has, or the first generation).

Find \(Var(X_n)\). You can do this by finding a general form for \(Var(X_n)\) in terms of \(Var(X_{n - 1})\), and then using this equation to write out some of the first variances in the sequence (i.e., \(Var(X_1)\), \(Var(X_2)\), etc.). From here, you can guess at the general pattern of the sequence and see what \(Var(X_n)\) will be. Of course, this ‘guess’ is not a formal proof; you could prove this rigorously using induction, but this book is not focused on induction, and the ‘guess’ will suffice!



Analytical Solution:

We can employ Eve’s Law, conditioning on \(X_{n - 1}\):

\[Var(X_n) = E\big(Var(X_n | X_{n - 1})\big) + Var\big(E(X_n | X_{n - 1})\big)\]

If we condition on \(X_{n - 1}\), we are essentially saying that we have \(X_{n - 1}\) cells in the \((n - 1)^{st}\) generation, and each of these has an independent \(Pois(\lambda)\) offspring distribution, each with variance and expectation \(\lambda\). Plugging in:

\[= E(\lambda X_{n - 1}) + Var(\lambda X_{n - 1})\] \[= \lambda E(X_{n - 1}) + \lambda^2 Var(X_{n - 1})\]

We found \(E(X_{n}) = \lambda^n\) in the previous problem, so plugging in \(\lambda^{n - 1}\) for \(E(X_{n - 1})\) gives:

\[Var(X_{n}) = \lambda^n + \lambda^2 Var(X_{n - 1})\]

Now, as the hint suggests, we should start writing out the first few terms. We know that \(X_0 = 1\) is a constant, so \(Var(X_0) = 0\), and we write:

\[Var(X_1) = \lambda\]

Continuing:

\[Var(X_2) = \lambda^2 + \lambda^3 = \lambda^2(1 + \lambda)\]

\[Var(X_3) = \lambda^3 + \lambda^4 + \lambda^5 = \lambda^3(1 + \lambda + \lambda^2)\]

We can see where this pattern is going, and we make our ‘guess’:

\[Var(X_n) = \lambda^n(1 + \lambda + \lambda^2 + ... + \lambda^{n - 1})\]
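
As a quick numerical sanity check on this guess (a sketch; \(\lambda = 2\) is an arbitrary choice), we can compare the closed form to the recursion \(Var(X_n) = \lambda^n + \lambda^2 Var(X_{n - 1})\) directly:

#arbitrary rate for the check
lambda = 2

#iterate the recursion Var(X_n) = lambda^n + lambda^2*Var(X_{n - 1}), starting at Var(X_0) = 0
v.recursion = 0

for(n in 1:6){
  
  #one step of the recursion
  v.recursion = lambda^n + lambda^2*v.recursion
  
  #the closed form guess: lambda^n*(1 + lambda + ... + lambda^(n - 1))
  v.closed = lambda^n*sum(lambda^(0:(n - 1)))
  
  #the two values should agree for every n
  print(c(n, v.recursion, v.closed))
}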


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#define simple parameters
lambda = 2
n = 4

#keep track of Xn
Xn = rep(NA, sims)

#run the loop
for(i in 1:sims){
  
  #initialize the population (1 in the 0th generation)
  population = 1
  
  #do n generations
  for(j in 2:(n + 1)){
    
    #add offspring
    population = c(population, sum(rpois(population[j - 1], lambda)))
  }
  
  #mark Xn
  Xn[i] = population[n + 1]
}

#should get lambda^n*(1 + lambda + lambda^2 + lambda^3) = 240
var(Xn)
## [1] 238.5665




9.8

Each year, the top (Men and Women) collegiate basketball teams in the NCAA square off in a massive, single-elimination tournament. The tournament is colloquially known as “March Madness”, and, despite recent expansion to include ‘play-in’ games, it can be thought of as a 64-team tournament. Teams are ‘seeded’ (i.e., assigned a seed, 1 to 16) based on their performance during the season. Lower seed values are better (i.e., 1 is the best, 16 is the worst) and the teams are paired in the first round based on seeds (i.e., they are paired such that the seeds total to 17: each 1 seed plays a 16 seed, each 10 seed plays a 7 seed, etc.).

Of late, the UConn Huskies have had highly successful tournaments. The Men’s Team has won championships in 2011 and 2014 (as well as previously in 1999 and 2004) and the Women’s Team, considered by many the greatest collegiate team in the nation (across all sports), won 4 straight championships from 2013 to 2016, as well as 7 championships from 1995 to 2010.

Of course, in this tournament, it does not make sense to assume that teams are equal; in fact, they are seeded based on their ability. Consider this common model to assess the win probability for a random matchup. Let \(a\) be the seed of the first team, and \(b\) be the seed of the second team, and let \(p\) be the probability that the first team wins. We can model \(p\) with a Beta prior such that \(p \sim Beta(b, a)\). Based on this prior, find the probability that the first team wins, and, based on this probability, explain why this is a reasonable choice for a prior (i.e., consider how the probability changes as \(a\) changes relative to \(b\)).



Analytical Solution:

Let \(I_1\) be the indicator that the first team wins. We can write the probability that the first team wins as \(P(I_1 = 1)\), which, by the fundamental bridge, is \(E(I_1)\). Using Adam’s Law to condition on \(p\), we get:

\[E\big(E(I_1|p)\big) = E(p)\]

Since, if we know \(p\), the mean of \(I_1\) is just \(p\). We know that the distribution of \(p\) is \(Beta(b, a)\), so we have:

\[E(p) = \frac{b}{a + b}\]

This is intuitive (it’s also bounded between 0 and 1, which is a good sanity check). Consider when \(a = b\): we get a probability of \(1/2\) that the first team wins, which makes sense because the seeds of the teams (the only information we currently have about them) are equal. As \(a\) gets smaller (i.e., the first team gets better; remember, lower values for seeds are better), the probability that the first team wins increases (similarly if \(b\) increases, or the second team has a higher and thus worse seed).
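
A quick simulation check of this result (a sketch; the seeds \(a = 2\) and \(b = 7\) below are arbitrary choices for illustration, so we should see a win probability of about \(7/9 \approx .78\) for the first, better-seeded team):

#replicate
set.seed(110)
sims = 1000

#arbitrary seeds for the two teams (lower is better)
a = 2
b = 7

#draw p from the Beta(b, a) prior, then flip for the outcome of each matchup
p = rbeta(sims, b, a)
wins = rbinom(sims, 1, p)

#should get b/(a + b) = 7/9
mean(wins)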




9.9

Let \(X\) and \(Y\) be i.i.d. \(N(0, 1)\) and \(Z\) be a random variable such that it takes on the value \(X\) (the value that \(X\) crystallizes to) or the value \(Y\) (the value that \(Y\) crystallizes to) with equal probabilities (recall we saw a similar structure in Chapter @ref(covariance-and-correlation), where we showed that the vector \((X, Y, Z)\) is not Multivariate Normal). Find \(Cov(X, Z)\).



Analytical Solution:

Since \(Z\) is always taking on the value from a Standard Normal random variable, it is marginally Standard Normal. We begin by writing the expectation expansion of Covariance:

\[Cov(X, Z) = E(XZ) - E(X)E(Z)\]

The second term is 0, since both \(X\) and \(Z\) are Standard Normal random variables. For the first term, we can use Adam’s Law, conditioning on \(X\).

\[ = E\big(E(XZ | X)\big)\]

We know that \(E(X|X) = X\), so this factors out. Now consider \(Z\) conditioned on \(X\). There is now a 1/2 probability that \(Z\) takes on the value that \(X\) took on, and a 1/2 probability that \(Z\) takes on the value \(Y\) (which is an unknown, independent random variable). The 1/2 is a constant and factors out of the expectation, and we are left with:

\[(1/2)E\big( X\cdot X + X \cdot Y\big)\]

Expanding by linearity:

\[(1/2)E\big( X\cdot X\big) + (1/2)E\big(X \cdot Y\big)\]

Since \(X\) and \(Y\) are independent, \(E(XY) = E(X)E(Y) = 0\). We are left with:

\[(1/2)E(X^2)\]

It is easy to find the second moment of a Standard Normal, since \(E(X^2) = Var(X) + E(X)^2 = 1\). So, we are left with: \[Cov(X, Z) = 1/2\]


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#generate the r.v.'s
X = rnorm(sims)
Y = rnorm(sims)

#set a path for Z
Z = rep(NA, sims)

#run a loop to calculate Z
for(i in 1:sims){
  
  #flip to see if Z takes on X or Y
  flip = runif(1)
  
  #Z takes on X
  if(flip <= 1/2){
    Z[i] = X[i]
  }
  
  #Z takes on Y
  if(flip > 1/2){
    Z[i] = Y[i]
  }
}

#should get 1/2
cov(X, Z)
## [1] 0.4624799




9.10

A stoplight in town toggles from red to green (no yellow). The times for the ‘toggles’ (switching from the current color to the other color) are distributed according to a Poisson process with rate parameter \(\lambda\). If you drive through the stoplight at a random time during the day, what is your expected wait time at the light?



Analytical Solution:

Let \(X\) be the wait time, and let \(I\) be the indicator that the stoplight is on red when you arrive at the light. By law of total expectation:

\[E(X) = E(X|I = 1)P(I = 1) + E(X|I = 0)P(I = 0)\]

By the symmetry of the problem, \(P(I = 1) = P(I = 0) = 1/2\). If the stoplight is green when we arrive, then our wait time is 0. Therefore, we are left with:

\[E(X) = E(X|I = 1)/2\]

The times between toggles in the Poisson process are distributed \(Expo(\lambda)\) and, by the memoryless property of the Exponential, if you arrive during the red light then your remaining wait time until the next toggle is still \(Expo(\lambda)\), which has mean \(1/\lambda\). Therefore, \(E(X) = \frac{1}{2\lambda}\).


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#define a simple parameter
lambda = 1

#keep track of the wait time
wait = rep(NA, sims)

#run the loop
for(i in 1:sims){
  
  #keep track of the times that the light turns green and red;
  #   initialize here
  green = integer(0)
  red = integer(0)
  
  #start the light at green (arbitrary)
  green = c(green, 0)
  
  #generate the toggle times for a single day (time between toggles is Expo(lambda))
  #go until we hit 24 hours
  while(TRUE){
    
    #generate a red toggle time after the latest green toggle
    red = c(red, green[length(green)] + rexp(1, lambda))
    
    #see if we are past 24 hours; break if so
    if(red[length(red)] > 24){
      break
    }
    
    #generate a green toggle time after the latest red toggle
    green = c(green, red[length(red)] + rexp(1, lambda))
    
    #see if we are past 24 hours; break if so
    if(green[length(green)] > 24){
      break
    }
  }
    
  #draw a random arrival time for you
  arrival = runif(1, 0, 24)
  
  #see if we arrived during a green light, in the case where there was
  #   a red change before the arrival
  if(min(red) < arrival){
    if(max(green[green < arrival]) > max(red[red < arrival])){
      #no wait time
      wait[i] = 0
      next
    }
  
  
    #see if we arrived during a red light
    if(max(green[green < arrival]) < max(red[red < arrival])){
      #wait until next green light
      wait[i] = min(green[green > arrival]) - arrival
      next
    }
  }
  
  #if there was no red before the arrival, then we have no wait time (arrived)
  #   at green
  if(min(red) > arrival){
    wait[i] = 0
  }
}

#should get 1/(2*lambda) = .5
mean(wait)
## [1] 0.479029
#this lambda is unrealistic - an average of 1 light change
#   per hour - but can be scaled. Also, ignore warnings;
#   they simply indicate an early arrival time




9.11

Dollar bills are the base currency in the United States. Bills are used widely in 6 denominations: $1, $5, $10, $20, $50, $100 (the $2 still exists, but is not widely used). Imagine that you randomly select one of these denominational values from $5 to $100 (e.g., $10) and withdraw it from your bank. On average, how many withdrawals must you make to withdraw at least $15?



Analytical Solution:

Let \(X\) be the number of withdrawals needed to withdraw at least $15. Let \(V_i\) be the value withdrawn at the \(i^{th}\) withdrawal. Employing the law of total expectation:

\[E(X) = E(X|V_1 < 15)P(V_1 < 15) + E(X|V_1 > 15)P(V_1 > 15)\]

We can ignore the \(P(V_1 = 15)\) case, since this has 0 probability. Because we select denominations randomly and 3 of the 5 possible denominations are greater than 15, \(P(V_i > 15) = 3/5\) and likewise \(P(V_1 < 15) = 2/5\). Further, if we select a value greater than 15 on the first draw, we have achieved the goal (to withdraw at least $15) and thus \(X = 1\).

Now consider \(E(X|V_1 < 15)\). Conditioning on our first selection being under $15, there is a .5 probability that we selected $10 (and thus the next draw is guaranteed to put us at or above $15) and a .5 probability of selecting $5 (and thus a 4/5 probability that we hit $15 on the next draw, and a 1/5 probability that it takes two more draws: another $5 and then any denomination). Putting it all together:

\[E(X) = 2/5(1/2(1 + 1) + 1/2(1 + 4/5 + 1/5(1 + 1))) + 3/5\] \[E(X) = 1.44\]
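
As a quick exact check of this value (a sketch that enumerates how the first few draws can play out):

#the usable denominations
money = c(5, 10, 20, 50, 100)

#P(X = 1): the first draw alone reaches $15
p1 = mean(money >= 15)

#P(X = 3): the only way to need three draws is $5 then $5 (after that, any draw reaches $15)
p3 = (1/5)*(1/5)

#P(X = 2) is whatever probability remains
p2 = 1 - p1 - p3

#should get 36/25 = 1.44
1*p1 + 2*p2 + 3*p3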


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#define the denominations
money = c(5, 10, 20, 50, 100)

#keep track of the number of withdrawals needed
withdrawals = rep(0, sims)

#run the loop
for(i in 1:sims){
  
  #keep track of amount withdrawn; initialize here
  amount = 0
  
  #go until we hit 15
  while(amount < 15){
    
    #draw a new amount
    amount = amount + sample(money, 1)
    
    #increment
    withdrawals[i] = withdrawals[i] + 1
  }
}

#should get 1.44
mean(withdrawals)
## [1] 1.461




BH Problems



The problems in this section are taken from @BH. The questions are reproduced here, and the analytical solutions are freely available online. Here, we will only consider empirical solutions: answers/approximations to these problems using simulations in R.




BH 9.10

A coin with probability \(p\) of Heads is flipped repeatedly. For (a) and (b), suppose that \(p\) is a known constant, with \(0<p<1\).

  1. What is the expected number of flips until the pattern HT is observed?
#replicate
set.seed(110)
sims = 1000

#define a simple parameter
p = 1/2

#create a coin
coin = c("H","T")

#count how long until we get HT
HT = rep(0, sims)

#run the loop
for(i in 1:sims){
  
  #keep track of current and previous flips; initialize
  flip.current = 0
  flip.previous = 0
  
  #go until we get the pattern
  while(flip.previous != "H" || flip.current != "T"){
    
    #iterate the previous flip
    flip.previous = flip.current
    
    #flip again
    flip.current = sample(coin, 1, prob = c(p, 1 - p))
    
    #increment
    HT[i] = HT[i] + 1
  }
}

#should get 1/(1 - p) + 1/p = 4
mean(HT)
## [1] 3.937
  1. What is the expected number of flips until the pattern HH is observed?
#define a simple parameter
p = 1/2

#create a coin
coin = c("H","T")

#count how long until we get HH
HH = rep(0, sims)

#run the loop
for(i in 1:sims){
  
  #keep track of current and previous flips; initialize
  flip.current = 0
  flip.previous = 0
  
  #go until we get the pattern
  while(flip.previous != "H" || flip.current != "H"){
    
    #iterate the previous flip
    flip.previous = flip.current
    
    #flip again
    flip.current = sample(coin, 1, prob = c(p, 1 - p))
    
    #increment
    HH[i] = HH[i] + 1
  }
}

#should get 1/p + 1/p^2 = 6
mean(HH)
## [1] 5.757
  1. Now suppose that \(p\) is unknown, and that we use a Beta(\(a,b\)) prior to reflect our uncertainty about \(p\) (where \(a\) and \(b\) are known constants and are greater than 2). In terms of \(a\) and \(b\), find the corresponding answers to (a) and (b) in this setting.
#same thing as before, but we introduce randomness for p

#define simple parameters
a = 3
b = 3

#create a coin
coin = c("H","T")

#count how long until we get HT
HT = rep(0, sims)

#run the loop
for(i in 1:sims){
  
  #generate p
  p = rbeta(1, a, b)
  
  #keep track of current and previous flips; initialize
  flip.current = 0
  flip.previous = 0
  
  #go until we get the pattern
  while(flip.previous != "H" || flip.current != "T"){
    
    #iterate the previous flip
    flip.previous = flip.current
    
    #flip again
    flip.current = sample(coin, 1, prob = c(p, 1 - p))
    
    #increment
    HT[i] = HT[i] + 1
  }
}

#should get (a + b - 1)/(a - 1) + (a + b - 1)/(b - 1) = 5
mean(HT)
## [1] 4.897
#count how long until we get HH
HH = rep(0, sims)

#run the loop
for(i in 1:sims){
  
  #generate p
  p = rbeta(1, a, b)
  
  #keep track of current and previous flips; initialize
  flip.current = 0
  flip.previous = 0
  
  #go until we get the pattern
  while(flip.previous != "H" || flip.current != "H"){
    
    #iterate the previous flip
    flip.previous = flip.current
    
    #flip again
    flip.current = sample(coin, 1, prob = c(p, 1 - p))
    
    #increment
    HH[i] = HH[i] + 1
  }
}

#should get (a + b - 1)/(a - 1) + ((a + b - 1)*(a + b - 2))/((a - 1)*(a - 2)) = 12.5
mean(HH)
## [1] 10.518




BH 9.13

Let \(X_1,X_2\) be i.i.d., and let \(\bar{X}= \frac{1}{2}(X_1+X_2)\) be the sample mean. In many statistics problems, it is useful or important to obtain a conditional expectation given \(\bar{X}\). As an example of this, find \(E(w_1X_1+w_2X_2 | \bar{X})\), where \(w_1,w_2\) are constants with \(w_1+w_2=1\).

#replicate
set.seed(110)
sims = 1000

#define simple parameters
w1 = 1/2
w2 = 1/2

#keep track of Xbar and the transformation
Xbar = rep(NA, sims)
total = rep(NA, sims)


#run the loop
for(i in 1:sims){
  
  #generate the r.v.'s 
  X1 = rbinom(1, 1, 1/2)
  X2 = rbinom(1, 1, 1/2)
  
  #find the mean and the transformation
  Xbar[i] = mean(c(X1, X2))
  total[i] = w1*X1 + w2*X2
}

#should just get the mean back (so 0, 1/2, 1)
mean(total[Xbar == 0])
## [1] 0
mean(total[Xbar == 1/2])
## [1] 0.5
mean(total[Xbar == 1])
## [1] 1




BH 9.15

Consider a group of \(n\) roommate pairs at a college (so there are \(2n\) students). Each of these \(2n\) students independently decides randomly whether to take a certain course, with probability \(p\) of success (where “success” is defined as taking the course).

Let \(N\) be the number of students among these \(2n\) who take the course, and let \(X\) be the number of roommate pairs where both roommates in the pair take the course. Find \(E(X)\) and \(E(X|N)\).

#replicate
set.seed(110)
sims = 1000

#define simple parameters
n = 50
p = 1/2

#keep track of X and N
X = rep(NA, sims)
N = rep(NA, sims)


#run the loop
for(i in 1:sims){
  
  #generate the first and second roommates (indicator if they have taken the course or not)
  student.1 = rbinom(n, 1, p)
  student.2 = rbinom(n, 1, p)
  
  #create an indicator if both roommates took the class
  both = rep(0, n)
  
  #set the indicator equal to 1 where both students took it
  both[student.1 == 1 & student.2 == 1] = 1
  
  #calculate N and X
  N[i] = sum(student.1, student.2)
  X[i] = sum(both)
}

#should get n*p^2 = 12.5
mean(X)
## [1] 12.465
#find the conditional expectation for a common N (50)
#should get (50*49/2)*(1/(2*n - 1)) = 12.37
mean(X[N == 50])
## [1] 12.18462




BH 9.16

Show that \(E( (Y - E(Y|X))^2|X) = E(Y^2|X) - (E(Y|X))^2,\) so these two expressions for \(Var(Y|X)\) agree.

#replicate
set.seed(110)
sims = 1000

#generate some simple, dependent r.v.'s for this example
X = rbinom(sims, 1, 1/2)

#Y is just X plus a Bern(1/2)
Y = X + rbinom(sims, 1, 1/2)

#show that the two are equal. Be careful about conditioning!
#first, when X = 0
Y.given.X = mean(Y[X == 0])
mean(((Y - Y.given.X)^2)[X == 0])
## [1] 0.2480763
mean((Y^2)[X == 0]) - mean(Y[X == 0])^2
## [1] 0.2480763
#next, when X = 1
Y.given.X = mean(Y[X == 1])
mean(((Y - Y.given.X)^2)[X == 1])
## [1] 0.2499989
mean((Y^2)[X == 1]) - mean(Y[X == 1])^2
## [1] 0.2499989




BH 9.22

Let \(X\) and \(Y\) be random variables with finite variances, and let \(W=Y - E(Y|X)\). This is a residual: the difference between the true value of \(Y\) and the predicted value of \(Y\) based on \(X\).

  1. Compute \(E(W)\) and \(E(W|X)\).
#replicate
set.seed(110)
sims = 1000


#generate the r.v.'s; make them dependent (X is Bern(1/2), Y = X + N(0, 1))
X = rbinom(sims, 1, 1/2)
Y = X + rnorm(sims)

#find E(Y|X) using LOTP
Y.given.X = mean(Y[X == 0])*length(X[X == 0])/sims + mean(Y[X == 1])*length(X[X == 1])/sims

#calculate W
W = Y - Y.given.X

#should get 0 for both
mean(W)
## [1] -7.469204e-18
mean(W[X == 0])*length(X[X == 0])/sims + mean(W[X == 1])*length(X[X == 1])/sims
## [1] 0
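
The check above verifies \(E(W) = 0\); to look at \(E(W|X)\) directly, here is a sketch that estimates \(E(Y|X)\) separately within each value of \(X\) (reusing the \(X\) and \(Y\) generated above):

#estimate E(Y|X) within each value of X (here X is 0 or 1)
Y.given.X = ifelse(X == 0, mean(Y[X == 0]), mean(Y[X == 1]))

#recompute the residual using the conditional mean
W = Y - Y.given.X

#E(W|X = 0) and E(W|X = 1) should both be near 0
mean(W[X == 0]); mean(W[X == 1])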
  1. Compute \(Var(W)\), for the case that \(W|X \sim N(0,X^2)\) with \(X \sim N(0,1)\).
#generate new r.v.'s
X = rnorm(sims)
W = sapply(X, function(x) rnorm(1, mean = 0, sd = sqrt(x^2)))

#should get 1
var(W)
## [1] 1.173705




BH 9.23

One of two identical-looking coins is picked from a hat randomly, where one coin has probability \(p_1\) of Heads and the other has probability \(p_2\) of Heads. Let \(X\) be the number of Heads after flipping the chosen coin \(n\) times. Find the mean and variance of \(X\).

#replicate
set.seed(110)
sims = 1000

#define simple parameters
p1 = 1/4
p2 = 2/3
n = 10

#set a path for X
X = rep(NA, sims)

#run the loop
for(i in 1:sims){
  
  #draw X, sample one of the probabilities
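  #   (sample() shuffles c(p1, p2), and rbinom() then uses only the first element,
  #   so this picks one of the two probabilities at random)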
  X[i] = rbinom(1, n, sample(c(p1, p2)))
}

#should get (n/2)*(p1 + p2) = 4.583
mean(X)
## [1] 4.547
#should get (1/2)*(n*p1*(1 - p1) + n*p2*(1 - p2)) + (1/4)*n^2*(p1 - p2)^2 = 6.39
var(X)
## [1] 6.756548




BH 9.30

Emails arrive one at a time in an inbox. Let \(T_n\) be the time at which the \(n^{th}\) email arrives (measured on a continuous scale from some starting point in time). Suppose that the waiting times between emails are i.i.d. Expo(\(\lambda\)), i.e., \(T_1, T_2 - T_1, T_3 - T_2,...\) are i.i.d. Expo(\(\lambda\)).

Each email is non-spam with probability \(p\), and spam with probability \(q=1-p\) (independently of the other emails and of the waiting times). Let \(X\) be the time at which the first non-spam email arrives (so \(X\) is a continuous r.v., with \(X = T_1\) if the 1st email is non-spam, \(X = T_2\) if the 1st email is spam but the 2nd one isn’t, etc.).

  1. Find the mean and variance of \(X\).
#replicate
set.seed(110)
sims = 1000

#define simple parameters
lambda = 1
p = 1/2

#set a path for X
X = rep(0, sims)

#run the loop
for(i in 1:sims){
  
  #run the loop until we get a non-spam email
  while(TRUE){
    
    #wait for another email
    X[i] = X[i] + rexp(1, lambda)
    
    #see if it is spam
    test = runif(1)
    
    #if it is not spam, get out of the loop
    if(test < p){
      break
    }
  }
}

#should get 1/(p*lambda) = 2
mean(X)
## [1] 1.937334
#should get 1/((p*lambda)^2) = 4
var(X)
## [1] 4.214597
  1. Find the MGF of \(X\). What famous distribution does this imply that \(X\) has (be sure to state its parameter values)?
#show that the MGFs match
#use a small interval for t
t = seq(from = -.01, to = .01, length.out = 100)

#analytical MGF
MGF.a = (p*lambda)/(p*lambda - t)

#empirical MGF
MGF.e = rep(NA, length(t))

for(i in 1:length(t)){
  MGF.e[i] = mean(exp(X*t[i]))
}

plot(t, MGF.e, main = "MGF of X", xlab = "t", 
     ylab = "E(e^(tX))", type = "h", col = "black", lwd = 4)
lines(t, MGF.a, type = "p", pch = 20, col = "red", lwd = 4)

legend("topleft", legend = c("Empirical MGF", "Analytical MGF"),
       lty=c(1,1), lwd=c(2.5,2.5),
       col=c("black", "red"))

#show that the histograms are the same
hist(X, col = "gray", main = "X", 
     ylim = c(0, sims), xlim = c(0, 25),
     breaks = 0:25)

hist(rexp(sims, p*lambda), col = "gray", main = "Expo(p*lambda)", 
     ylim = c(0, sims), xlim = c(0, 25), breaks = 0:25, xlab = "")




BH 9.33

Judit plays in a total of \(N \sim Geom(s)\) chess tournaments in her career. Suppose that in each tournament she has probability \(p\) of winning the tournament, independently. Let \(T\) be the number of tournaments she wins in her career.

  1. Find the mean and variance of \(T\).
#replicate
set.seed(110)
sims = 1000

#define simple, reasonable parameters
s = .1
p = .2

#keep track of tournaments won
wins = rep(0, sims)

#run the loop
for(i in 1:sims){
  
  #generate the tournaments played
  tourney = rgeom(1, s)
  
  #iterate over each tournament (seq_len() correctly gives an empty loop if she plays 0 tournaments)
  for(j in seq_len(tourney)){
    
    #see if we won the tournament
    play = runif(1)
    
    #increment if she wins
    if(play < p){
      wins[i] = wins[i] + 1
    }
  }
}

#should get p*(1 - s)/s = 1.8
mean(wins)
## [1] 1.905
#should get p*(1 - s)*(s + (1 - s)*p)/s^2 = 5.04
var(wins)
## [1] 5.30528
  1. Find the MGF of \(T\). What is the name of this distribution (with its parameters)?
#show that the MGFs match
#use a small interval for t
t = seq(from = -.1, to = .1, length.out = 100)

#calculate the analytical MGF
MGF.a = s/(1 - (1 - s)*(p*exp(t) + 1 - p))

#calculate the empirical MGF
MGF.e = rep(NA, length(t))
for(i in 1:length(t)){
  MGF.e[i] = mean(exp(wins*t[i]))
}

#plots should match
plot(t, MGF.e, main = "MGF of X", xlab = "t", 
     ylab = "E(e^(tX))", type = "h", col = "black", lwd = 4)
lines(t, MGF.a, type = "p", pch = 20, col = "red", lwd = 4)

legend("topleft", legend = c("Empirical MGF", "Analytical MGF"),
       lty=c(1,1), lwd=c(2.5,2.5),
       col=c("black", "red"))

#show that the histograms are the same
hist(wins, col = "gray", main = "T", xlim = c(0, 20), 
     ylim = c(0, sims), breaks = 0:20)

hist(rgeom(sims, s/(s + (1 - s)*p)), col = "gray", 
     main = "Geom(s/(s + (1 - s)p))", xlab = "", 
     ylim = c(0, sims), breaks = 0:20)




BH 9.36

A certain stock has low volatility on some days and high volatility on other days. Suppose that the probability of a low volatility day is \(p\) and of a high volatility day is \(q=1-p\), and that on low volatility days the percent change in the stock price is \(N(0,\sigma^2_1)\), while on high volatility days the percent change is \(N(0,\sigma^2_2)\), with \(\sigma_1 < \sigma_2\).

Let \(X\) be the percent change of the stock on a certain day. The distribution is said to be a mixture of two Normal distributions, and a convenient way to represent \(X\) is as \(X=I_1X_1 + I_2X_2\) where \(I_1\) is the indicator r.v. of having a low volatility day, \(I_2=1-I_1\), \(X_j \sim N(0,\sigma^2_j)\), and \(I_1,X_1,X_2\) are independent.

  1. Find \(Var(X)\) in two ways: using Eve’s law, and by calculating \(Cov(I_1X_1 + I_2X_2, I_1X_1 + I_2X_2)\) directly.
#replicate
set.seed(110)
sims = 1000

#define simple parameters (low volatility days are more common here, since p = 3/4)
p = 3/4
sigma1 = 1
sigma2 = 2

#generate I1 and I2
I1 = rbinom(sims, 1, p)
I2 = 1 - I1

#set a path for X
X = rep(NA, sims)

#fill in the values for X based on the indicators
X[I1 == 1] = rnorm(length(I1[I1 == 1]), 0, sigma1)
X[I2 == 1] = rnorm(length(I2[I2 == 1]), 0, sigma2)

#should get p*sigma1^2 + (1 - p)*sigma2^2 = 1.75
var(X)
## [1] 1.689135
  1. Recall from Chapter 6 that the kurtosis of an r.v. \(Y\) with mean \(\mu\) and standard deviation \(\sigma\) is defined by \[Kurt(Y) = \frac{E(Y-\mu)^4}{\sigma^4}-3.\] Find the kurtosis of \(X\) (in terms of \(p,q,\sigma^2_1,\sigma^2_2\), fully simplified). The result will show that even though the kurtosis of any Normal distribution is 0, the kurtosis of \(X\) is positive and in fact can be very large depending on the parameter values.
#calculate the kurtosis empirically
#should get (3*p*sigma1^4 + 3*(1 - p)*sigma2^4)/((p*sigma1^2 + (1 - p)*sigma2^2)^2) - 3 = 1.65
mean((X - 0)^4)/sd(X)^4 - 3
## [1] 1.707038




BH 9.43

Empirically, it is known that 49% of children born in the U.S. are girls (and 51% are boys). Let \(N\) be the number of children who will be born in the U.S. in March of next year, and assume that \(N\) is a Pois(\(\lambda)\) random variable, where \(\lambda\) is known. Assume that births are independent (e.g., don’t worry about identical twins).

Let \(X\) be the number of girls who will be born in the U.S. in March of next year, and let \(Y\) be the number of boys who will be born then.

  1. Find the joint distribution of \(X\) and \(Y\). (Give the joint PMF.)
#replicate
set.seed(110)
sims = 1000

#define a simple parameter (small value for computational ease)
lambda = 10

#keep track of the r.v.'s
N = rep(NA, sims)
X = rep(NA, sims)
Y = rep(NA, sims)

#run the loop
for(i in 1:sims){
  
  #generate the r.v.'s
  N[i] = rpois(1, lambda)
  X[i] = rbinom(1, N[i], .49)
  Y[i] = N[i] - X[i]
}


#generate a heat map
#first, empirical
data <- data.frame(X, Y)

data = group_by(data, X, Y)
data = summarize(data, density = n())
data$density = data$density/sims

ggplot(data = data, aes(X, Y)) + 
  geom_tile(aes(fill = density), color = "white") +
  scale_fill_gradient(low = "white", high= "red", name = "Density") + 
  ggtitle("Empirical Density of X and Y") + 
  theme(plot.margin = unit(c(1.8,.5,1.75,1.55), "cm")) +
  theme(plot.title = element_text(family = "Trebuchet MS", 
                                  color="#383838", face="bold", size=25, vjust = 2)) + 
  theme(axis.title.x = element_text(family = "Trebuchet MS", 
                                    color="#383838", face="bold", size=22, vjust = -1.5),
        axis.title.y = element_text(family = "Trebuchet MS", 
                                    color="#383838", face="bold", size=22, vjust = 2)) +
  theme(axis.text.x = element_text(face="bold", color="#383838", 
                                   size=14),
        axis.text.y = element_text(face="bold", color="#383838", 
                                   size=14))

#analytical Joint Distribution
#define the supports/all combinations
X.a = seq(from = min(X), to = max(X), length.out = 100)
Y.a = seq(from = min(Y), to = max(Y), length.out = 100)
data = expand.grid(X.a = X.a, Y.a = Y.a)

#calculate density
data$density = apply(data, 1, function(x){
  
  return(exp(-.49*lambda)*(.49*lambda)^(x[1])*exp(-.51*lambda)*(.51*lambda)^(x[2])/
           (factorial(x[1])*factorial(x[2])))})

#remove points with 0 density
data = data[data$density != 0, ]

#generate a heatmap
ggplot(data = data, aes(X.a, Y.a)) + 
  geom_tile(aes(fill = density), color = "white") +
  scale_fill_gradient(low = "white", high= "red", name = "Density") + 
  ggtitle("Analytical Density of X and Y") + 
  theme(plot.margin = unit(c(1.8,.5,1.75,1.55), "cm")) +
  theme(plot.title = element_text(family = "Trebuchet MS", 
                                  color="#383838", face="bold", size=25, vjust = 2)) + 
  theme(axis.title.x = element_text(family = "Trebuchet MS", 
                                    color="#383838", face="bold", size=22, vjust = -1.5),
        axis.title.y = element_text(family = "Trebuchet MS", 
                                    color="#383838", face="bold", size=22, vjust = 2)) +
  theme(axis.text.x = element_text(face="bold", color="#383838", 
                                   size=14),
        axis.text.y = element_text(face="bold", color="#383838", 
                                   size=14))

  1. Find \(E(N|X)\) and \(E(N^2|X)\).
#condition on a specific value of X
x = 5

#should get x + .51*lambda = 10.1
mean(N[X == x])
## [1] 9.869822
#should get (x + .51*lambda)^2 + .51*lambda = 107.11
mean((N^2)[X == x])
## [1] 101.6213




BH 9.44

Let \(X_1,X_2,X_3\) be independent with \(X_i \sim Expo(\lambda_i)\) (so with possibly different rates). Recall from Chapter 7 that \[P(X_1 < X_2) = \frac{\lambda_1}{\lambda_1 + \lambda_2}.\]

  1. Find \(E(X_1 + X_2 + X_3 | X_1 > 1, X_2 > 2, X_3 > 3)\) in terms of \(\lambda_1,\lambda_2,\lambda_3\).
#replicate
set.seed(110)
sims = 1000

#define simple parameters
lambda1 = 1
lambda2 = 1/2
lambda3 = 1/3

#generate the r.v.'s
X1 = rexp(sims, lambda1)
X2 = rexp(sims, lambda2)
X3 = rexp(sims, lambda3)

#calculate the sum
S = X1 + X2 + X3

#find the mean, should get 1/lambda1 + 1/lambda2 + 1/lambda3 + 6 = 12
mean(S[X1 > 1 & X2 > 2 & X3 > 3])
## [1] 11.72603
  1. Find \(P\left(X_1 = \min(X_1,X_2,X_3)\right)\), the probability that the first of the three Exponentials is the smallest.
#indicator for the first being the smallest
indicator = rep(0, sims)

#run the loop
for(i in 1:sims){
  
  #see if the first is the smallest
  if(X1[i] == min(X1[i], X2[i], X3[i])){
    indicator[i] = 1
  }
}

#should get lambda1/(lambda1 + lambda2 + lambda3) = .545 
mean(indicator)
## [1] 0.518
  1. For the case \(\lambda_1 = \lambda_2 = \lambda_3 = 1\), find the PDF of \(\max(X_1,X_2,X_3)\). Is this one of the important distributions we have studied?
#define simple parameters
lambda1 = 1
lambda2 = 1
lambda3 = 1

#generate the r.v.'s
X1 = rexp(sims, lambda1)
X2 = rexp(sims, lambda2)
X3 = rexp(sims, lambda3)

#find the max
M = rep(NA, sims)

#run the loop
for(i in 1:sims){
  M[i] = max(X1[i], X2[i], X3[i])
}

#find analytical PDF
m = seq(from = min(M), to = max(M), length.out = 100)
PDF = 3*(1 - exp(-m))^2*exp(-m)

#plot the PDFs, they should match
#(note: the kernel density estimate is only approximate near the boundary at 0)
plot(density(M), main = "Density of M", xlab = "M", col = "black", lwd = 3, ylim = c(0, max(density(M)$y, PDF)))
lines(m, PDF, lwd = 3, col = "red")

legend("topright", legend = c("Empirical PDF", "Analytical PDF"),
       lty=c(1,1), lwd=c(2.5,2.5),
       col=c("black", "red"))




BH 9.45

A task is randomly assigned to one of two people (with probability 1/2 for each person). If assigned to the first person, the task takes an Expo(\(\lambda_1\)) length of time to complete (measured in hours), while if assigned to the second person it takes an Expo(\(\lambda_2\)) length of time to complete (independent of how long the first person would have taken). Let \(T\) be the time taken to complete the task.

  1. Find the mean and variance of \(T\).
#replicate
set.seed(110)
sims = 1000

#define simple, reasonable parameters (should have a mean around 24 for part b.)
lambda1 = 1/20
lambda2 = 1/24

#keep track of the total time
time = rep(NA, sims)

#run the loop
for(i in 1:sims){
  
  #flip for if we want the first or second person to complete the task
  flip = runif(1)
  
  #first person
  if(flip < 1/2){
    time[i] = rexp(1, lambda1)
  }
  
  #second person
  if(flip > 1/2){
    time[i] = rexp(1, lambda2)
  }
}

#should get .5*(1/lambda1 + 1/lambda2) = 22
mean(time)
## [1] 20.79702
#should get .5*(1/lambda1^2 + 1/lambda2^2) + 1/4*(1/lambda1 - 1/lambda2)^2 = 492
var(time)
## [1] 438.491
  1. Suppose instead that the task is assigned to both people, and let \(X\) be the time taken to complete it (by whoever completes it first, with the two people working independently). It is observed that after \(24\) hours, the task has not yet been completed. Conditional on this information, what is the expected value of \(X\)?
#define simple, reasonable parameters (should have a mean around 24 for part b.)
lambda1 = 1/20
lambda2 = 1/24

#keep track of the total time
time = rep(NA, sims)

#run the loop
for(i in 1:sims){
  time[i] = min(rexp(1, lambda1), rexp(1, lambda2))
}

#should get 24 + 1/(lambda1 + lambda2) = 34.9
mean(time[time > 24])
## [1] 35.82184




BH 9.47

A certain genetic characteristic is of interest. It can be measured numerically. Let \(X_1\) and \(X_2\) be the values of the genetic characteristic for two twin boys. If they are identical twins, then \(X_1=X_2\) and \(X_1\) has mean \(0\) and variance \(\sigma^2\); if they are fraternal twins, then \(X_1\) and \(X_2\) have mean \(0\), variance \(\sigma^2\), and correlation \(\rho\). The probability that the twins are identical is \(1/2\). Find Cov(\(X_1,X_2\)) in terms of \(\rho,\sigma^2.\)

#replicate
set.seed(110)
sims = 1000

#define simple parameters
sigma = 1
rho = .75

#keep track of X1 and X2
X1 = rep(NA, sims)
X2 = rep(NA, sims)

#run the loop
for(i in 1:sims){
  
  #see if they are identical or fraternal
  twins = runif(1)
  
  #case that they are identical; draw from a normal for simplicity
  if(twins <= 1/2){
    
    #generate the r.v.
    X1[i] = rnorm(1, 0, sigma)
    X2[i] = X1[i]
  }
  
  #fraternal; draw from MVN
  if(twins > 1/2){
    scores = rmvnorm(1, mean = c(0,0), 
                     sigma = matrix(c(sigma, rho, rho, sigma), nrow = 2, ncol = 2))
    X1[i] = scores[1]
    X2[i] = scores[2]
  }
}

#should get sigma^2/2*(1 + rho) = .875
cov(X1, X2)
## [1] 0.8736153




BH 9.48

The Mass Cash lottery randomly chooses 5 of the numbers from \(1,2,...,35\) each day (without repetitions within the choice of 5 numbers). Suppose that we want to know how long it will take until all numbers have been chosen. Let \(a_j\) be the average number of additional days needed if we are missing \(j\) numbers (so \(a_{0}=0\) and \(a_{35}\) is the average number of days needed to collect all 35 numbers). Find a recursive formula for the \(a_j\).

#replicate
set.seed(110)
sims = 1000

#find for a specific j; use 1 so we can compare to the analytical result
j = 1

#count how many days we need to finish
days = rep(0, sims)

#run the loop
for(i in 1:sims){
  
  #keep track of the winners; initialize as an empty vector
  winners = integer(0)
  
  #stop when we get the j numbers we are missing
  #assume we are missing numbers 1 through j
  #stop when we get these numbers
  while(length(intersect(winners, 1:j)) < j){
  
    #sample from the 35 values
    winners = c(winners, sample(1:35, 5, replace = FALSE))
  
    }
  
  #mark how many days it took
  days[i] = length(winners)/5
}

#these should match
1/(1 - choose(j, 0)*choose(35 - j, 5)/choose(35, 5))
## [1] 7
mean(days)
## [1] 7.115
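
For the recursive formula itself, one way to write it (a sketch) is to condition on how many of the \(j\) missing numbers appear in the next day's draw, a Hypergeometric count, giving \(a_j = 1 + \sum_{k=0}^{\min(j, 5)} \frac{\binom{j}{k}\binom{35-j}{5-k}}{\binom{35}{5}} a_{j-k}\). We can iterate this numerically and compare to the simulation above:

#a[j + 1] stores a_j; a_0 = 0 since no numbers are missing
a = rep(0, 36)

#iterate the recursion a_j = 1 + sum_k P(K = k)*a_{j - k}
for(j in 1:35){
  
  #Hypergeometric probabilities for getting k of the j missing numbers in one draw
  k = 0:min(j, 5)
  p.k = choose(j, k)*choose(35 - j, 5 - k)/choose(35, 5)
  
  #move the k = 0 term to the left side and solve for a_j
  a[j + 1] = (1 + sum(p.k[-1]*a[j + 1 - k[-1]]))/(1 - p.k[1])
}

#a_1 should be 7, matching the check above
a[2]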




BH 10.17

Let \(X_1, X_2, ...\) be i.i.d. positive random variables with mean 2. Let \(Y_1, Y_2, ...\) be i.i.d. positive random variables with mean 3. Show that \[\frac{X_1+X_2+ \dots + X_n}{Y_1+Y_2 + \dots +Y_n} \to \frac{2}{3}\] with probability 1. Does it matter whether the \(X_i\) are independent of the \(Y_j\)?

#replicate
set.seed(110)

#increase number of sims to try to get convergence
sims = 5000

#first, do the case with independence
#let X ~ Expo(1/2) (mean 2) and Y ~ Expo(1/3) (mean 3) for simplicity
X = rexp(sims, 1/2)
Y = rexp(sims, 1/3)

#keep track of the ratio
ratio = rep(NA, sims)


#run the loop
for(i in 1:sims){
  
  #calculate the ratio
  ratio[i] = sum(X[1:i])/sum(Y[1:i])
  
}

#plot after the first 30 trials (allow some convergence)
#should approach 2/3
plot(30:sims, ratio[30:sims], type = "l", col = "red",
     main = "Convergence of (X1 + ... + Xn)/(Y1 + ... + Yn), Independent case",
     xlab = "n", ylab = "(X1 + ... + Xn)/(Y1 + ... + Yn)")
abline(h = 2/3)

#construct the dependent case
#let X ~ Expo(1/2) (mean 2) and Y = X + Expo(1) (mean 3) for simplicity
X = rexp(sims, 1/2)
Y = X + rexp(sims, 1)

#keep track of the ratio
ratio = rep(NA, sims)


#run the loop
for(i in 1:sims){
  
  #calculate the ratio
  ratio[i] = sum(X[1:i])/sum(Y[1:i])
  
}

#plot after the first 30 trials (allow some convergence)
#should approach 2/3 (dependence is irrelevant)
plot(30:sims, ratio[30:sims], type = "l", col = "red",
     main = "Convergence of (X1 + ... + Xn)/(Y1 + ... + Yn), Dependent case",
     xlab = "n", ylab = "(X1 + ... + Xn)/(Y1 + ... + Yn)")
abline(h = 2/3)




BH 10.18

Let \(U_1, U_2, \dots, U_{60}\) be i.i.d. Unif(0,1) and \(X = U_1 + U_2 + \dots + U_{60}\).

  1. Which important distribution is the distribution of \(X\) very close to? Specify what the parameters are, and state which theorem justifies your choice.
#replicate
set.seed(110)
sims = 1000

#generate the r.v.'s
X = rep(NA, sims)

for(i in 1:sims){
  X[i] = sum(runif(60))
}

#show that the histograms match
hist(X, xlim = c(15, 45), breaks = 0:45, main = "X",
     col = "gray", xlab = "x", ylim = c(0, sims/5))

hist(rnorm(sims, 30, sqrt(5)), main = "N(30, 5)", ylim = c(0, sims/5),
     xlim = c(15, 45), breaks = 0:45, col = "gray", xlab = "")

  1. Give a simple but accurate approximation for \(P(X >17)\). Justify briefly.
#show that the approximations match
length(X[X > 17])/sims; pnorm(13/sqrt(5))
## [1] 1
## [1] 1




BH 10.19

Let \(V_n \sim \chi^2_n\) and \(T_n \sim t_n\) for all positive integers \(n\).

  1. Find numbers \(a_n\) and \(b_n\) such that \(a_n(V_n - b_n)\) converges in distribution to \(N(0,1)\).
#replicate
set.seed(110)
sims = 1000


#show convergence for different n
#round so we get integers
n = round(seq(from = 1, to = 100, length.out = 8))

#set up the graphics state
par(mfrow = c(3,3))

#run the loop, iterate over n
for(i in 1:length(n)){
  
  #define a_n and b_n
  an = 1/sqrt(2*n[i])
  bn = n[i]
  
  #generate the title for the histogram
  title = paste0("an(Vn - bn), n = ", n[i])
  
  #generate the r.v.
  X = an*(rchisq(sims, n[i]) - bn)
  
  #plot the histogram, should observe convergence
  hist(X, main = title, col = "gray", xlab = "")
}

#plot asymptotic distribution
hist(rnorm(sims), col = "red", main = "N(0, 1)", xlab = "")

#restore graphics
par(mfrow = c(1,1))
  1. Show that \(T^2_n/(n+T^2_n)\) has a Beta distribution (without using calculus).
#define a simple parameter
n = 10

#generate the r.v.
Tn = rt(sims, n)
X = Tn^2/(n + Tn^2)

#show that the histograms are the same
hist(X, ylim = c(0, sims), xlim = c(0, 1), col = "gray",
     breaks = seq(from = 0, to = 1, length.out = 20), main = "X",
     xlab = "x")

hist(rbeta(sims, 1/2, n/2), ylim = c(0, sims), xlim = c(0, 1), col = "gray",
     breaks = seq(from = 0, to = 1, length.out = 20),
     main = "Beta(1/2, n/2)", xlab = "")




BH 10.20

Let \(T_1, T_2, ...\) be i.i.d. Student-\(t\) r.v.s with \(m \geq 3\) degrees of freedom. Find constants \(a_n\) and \(b_n\) (in terms of \(m\) and \(n\)) such that \(a_n(T_1 + T_2 + \dots + T_n - b_n)\) converges to \(N(0,1)\) in distribution as \(n \to \infty\).

#replicate
set.seed(110)
sims = 1000

#define a simple parameter
m = 3

#show convergence for different n
#round so we get integers
n = round(seq(from = 1, to = 100, length.out = 8))

#set up the graphics state
par(mfrow = c(3,3))

#run the loop, iterate over n
for(i in 1:length(n)){
  
  #define a_n and b_n
  an = sqrt((m - 2)/(m*n[i]))
  bn = 0
  
  #generate the title for the histogram
  title = paste0("an(T1 + ... + Tn - bn), n=", n[i])
  
  #find the summation of the T r.v.'s
  summation = 0
  for(j in 1:n[i]){
    summation = summation + rt(sims, m)
  }
  
  #generate the r.v.
  X = an*(summation - bn)
  
  #plot the histogram, should observe convergence
  hist(X, main = title, col = "gray", xlab = "")
}

#plot asymptotic distribution
hist(rnorm(sims), col = "red", main = "N(0, 1)", xlab = "")

#restore graphics
par(mfrow = c(1,1))




BH 10.21
  1. Let \(Y = e^X\), with \(X \sim Expo(3)\). Find the mean and variance of \(Y\).
#replicate
set.seed(110)
sims = 1000

#generate the r.v.'s
X = rexp(sims, 3)
Y = exp(X)

#should get 3/2 and 3/4
mean(Y); var(Y)
## [1] 1.526426
## [1] 0.7918376
  1. For \(Y_1,\dots,Y_n\) i.i.d. with the same distribution as \(Y\) from (a), what is the approximate distribution of the sample mean \(\bar{Y}_n = \frac{1}{n} \sum_{j=1}^n Y_j\) when \(n\) is large?
#show convergence for different n
#round so we get integers
n = round(seq(from = 1, to = 1000, length.out = 8))

#set up the graphics state
par(mfrow = c(3,3))

#run the loop, iterate over n
for(i in 1:length(n)){
  
  #generate the title for the histogram
  title = paste0("Mean of (Y1,...,Yn), n = ", n[i])
  
  #find the summation (calculate many sample means)
  summation = 0
  for(j in 1:n[i]){
    summation = summation + exp(rexp(sims, 3))
  }
  
  #generate the r.v. (vector of sample means)
  X = summation/n[i]
  
  #plot the histogram, should observe convergence
  hist(X, main = title, col = "gray", xlab = "")
}


#plot the asymptotic distribution
hist(rnorm(sims, 3/2, sqrt(3/(4*n[i]))), col = "red", main = "N(3/2, 3/(4*n)), n = 1000", xlab = "")

#restore graphics
par(mfrow = c(1,1))




BH 10.22
  1. Explain why the \(Pois(n)\) distribution is approximately Normal if \(n\) is a large positive integer (specifying what the parameters of the Normal are).
#replicate
set.seed(110)
sims = 1000

#show convergence for different n
#round so we get integers; don't need high n for quick convergence
n = round(seq(from = 1, to = 100, length.out = 8))

#set up the graphics state
par(mfrow = c(3,3))

#run the loop, iterate over n
for(i in 1:length(n)){
  
  #generate the title for the histogram
  title = paste0("Pois(n), n = ", n[i])
  
  #generate the r.v. 
  X = rpois(sims, n[i])
  
  #plot the histogram, should observe convergence
  hist(X, main = title, col = "gray", xlab = "")
}


#plot the asymptotic distribution
hist(rnorm(sims, n[i], sqrt(n[i])), col = "red", main = "N(n, n), n = 100", xlab = "")

#restore graphics
par(mfrow = c(1,1))
  1. Stirling’s formula is an amazingly accurate approximation for factorials:

\[n! \approx \sqrt{2\pi n} \left(\frac{n}{e}\right)^n,\]

where in fact the ratio of the two sides goes to 1 as \(n \to \infty\). Use (a) to give a quick heuristic derivation of Stirling’s formula by using a Normal approximation to the probability that a Pois(\(n\)) r.v. is \(n\), with the continuity correction: first write \(P(N=n) = P(n-\frac{1}{2} < N < n + \frac{1}{2})\), where \(N \sim Pois(n)\).
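
As a numerical illustration of this heuristic (a sketch), the exact \(Pois(n)\) probability at \(n\), its Normal approximation with the continuity correction, and \(\frac{1}{\sqrt{2\pi n}}\) all agree closely:

#a few values of n to check
n = c(5, 10, 50, 100)

#exact probability P(N = n) for N ~ Pois(n)
exact = dpois(n, n)

#Normal approximation with continuity correction: P(n - 1/2 < X < n + 1/2), X ~ N(n, n)
approx = pnorm(n + 1/2, n, sqrt(n)) - pnorm(n - 1/2, n, sqrt(n))

#all three rows should be close
rbind(exact, approx, 1/sqrt(2*pi*n))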

#show that Stirling's formula is reasonable
n = seq(from = 1, to = 5, by = 1/10)

#calculate the two sides
LHS = factorial(n)
RHS = sqrt(2*pi*n)*(n/exp(1))^n

plot(n, LHS, main = "Stirling's Formula", xlab = "n", ylab = "Value",
     col = "black", type = "l", lwd = 3)
lines(n, RHS, col = "red", type = "p", pch = 20, lwd = 3)

legend("topleft", legend = c("LHS", "RHS"),
       lty=c(1,1), lwd=c(2.5,2.5),
       col=c("black", "red"))




BH 10.23
  1. Consider i.i.d. Pois(\(\lambda\)) r.v.s \(X_1,X_2,\dots\). The MGF of \(X_j\) is \(M(t) = e^{\lambda(e^t-1)}\). Find the MGF \(M_n(t)\) of the sample mean \(\bar{X}_n= \frac{1}{n} \sum_{j=1}^n X_j\).
#replicate
set.seed(110)
sims = 1000

#define a simple parameter
lambda = 1

#show convergence for different n
#round so we get integers; don't need high n for quick convergence
n = round(seq(from = 1, to = 100, length.out = 8))

#set up the graphics state
par(mfrow = c(3,3))

#run the loop, iterate over n
for(i in 1:length(n)){
  
  #generate the title for the histogram
  title = paste0("MGF of samp. mean, n = ", n[i])
  
  #find the summation (calculate many sample means)
  summation = 0
  for(j in 1:n[i]){
    summation = summation + rpois(sims, lambda)
  }
  
  #generate the r.v., vector of sample means
  X = summation/n[i]
  
  #find the MGF, for different t
  t = seq(from = 0, to = 10, length.out = 100)
  MGF.e = rep(NA, length(t))
  
  #find the empirical MGF
  for(j in 1:length(t)){
    MGF.e[j] = mean(exp(t[j]*X))
  }
  
  #plot the histogram, should observe convergence
  plot(t, MGF.e, main = title, col = "black", xlab = "t", type = "l", lwd = 3
       , ylab = "")
}


#plot the limiting MGF, exp(t*lambda); note the y-axes
plot(t, exp(t*lambda), type = "l", col = "red", lwd = 3,
     xlab = "t", main = "exp(t*lambda)")

#restore graphics
par(mfrow = c(1,1))
  1. Find the limit of \(M_n(t)\) as \(n \to \infty\). (You can do this with almost no calculation using a relevant theorem; or you can use (a) and the fact that \(e^x \approx 1 + x\) if \(x\) is very small.)
#shown in part (a) code




BH 10.31

Let \(X\) and \(Y\) be independent standard Normal r.v.s and let \(R^2=X^2 + Y^2\) (where \(R>0\) is the distance from \((X,Y)\) to the origin).

  1. The distribution of \(R^2\) is an example of three of the important distributions we have seen (in ‘Probability!’, we have only learned about two of these distributions, so you only need to mention two). State which three of these distributions \(R^2\) is an instance of, specifying the parameter values.
#replicate
set.seed(110)
sims = 1000

#generate X, Y and R
X = rnorm(sims)
Y = rnorm(sims)
R = sqrt(X^2 + Y^2)

#show that the histograms are the same
par(mfrow = c(2,2))
hist(R^2, main = "R^2", col = "gray",
     xlim = c(0, 20), ylim = c(0, sims/2), breaks = 0:20,
     xlab = "")
hist(rchisq(sims, 2), main = "Chi-Squared, df = 2", col = "gray",
     xlim = c(0, 20), ylim = c(0, sims/2), breaks = 0:20,
     xlab = "")
hist(rexp(sims, 1/2), main = "Expo(1/2)", col = "gray",
     xlim = c(0, 20), ylim = c(0, sims/2), breaks = 0:20,
     xlab = "")
hist(rgamma(sims, 1, 1/2), main = "Gamma(1, 1/2)",col= "gray",
     xlim = c(0, 20), ylim = c(0, sims/2), breaks = 0:20,
     xlab = "")

par(mfrow = c(1,1))
  1. Find the PDF of \(R\).
#calculate analytical PDF
r = seq(from = min(R), to = max(R), length.out = 100)
PDF = r*exp(-r^2/2)

#plot the PDFs, they should match
plot(density(R), main = "Density of R", xlab = "r", col = "black", lwd = 3, ylim = c(0, 1))
lines(r, PDF, lwd = 3, col = "red")

legend("topright", legend = c("Empirical PDF", "Analytical PDF"),
       lty=c(1,1), lwd=c(2.5,2.5),
       col=c("black", "red"))

  1. Find \(P(X>2Y+3)\) in terms of the standard Normal CDF \(\Phi\).
#these should match
Z = X - 2*Y
length(Z[Z > 3])/sims; 1 - pnorm(3/sqrt(5))
## [1] 0.086
## [1] 0.08985625
  1. Compute \(\textrm{Cov}(R^2,X)\). Are \(R^2\) and \(X\) independent?
#should get 0
cov(R^2, X)
## [1] -0.1863286




BH 10.32

Let \(Z_1,...,Z_n \sim N(0,1)\) be i.i.d.

  1. As a function of \(Z_1\), create an Expo(\(1\)) r.v. \(X\) (your answer can also involve the standard Normal CDF \(\Phi\)).
#replicate
set.seed(110)
sims = 1000

#define a simple parameter
n = 3

#draw Z1
Z1 = rnorm(sims)

#transform to find X
X = -log(1 - pnorm(Z1))

#show that the histograms are the same
hist(X, main = "X", col = "gray",
     xlim = c(0, 10), ylim = c(0, sims/2), breaks = 0:10,
     xlab = "x")

hist(rexp(sims, 1), main = "Expo(1)", col = "gray",
     xlim = c(0, 10), ylim = c(0, sims/2), breaks = 0:10,
     xlab = "")

  1. We haven’t learned the relevant information for this part.

  2. Let \(X_1 = 3 Z_1 - 2 Z_2\) and \(X_2 = 4Z_1 + 6Z_2\). Determine whether \(X_1\) and \(X_2\) are independent (be sure to mention which results you’re using).

#generate r.v.'s
Z1 = rnorm(sims)
Z2 = rnorm(sims)

#find X1 and X2
X1 = 3*Z1 - 2*Z2
X2 = 4*Z1 + 6*Z2

#should have correlation 0
#since they are MVN, this implies independence
cor(X1, X2)
## [1] -0.0714936




BH 10.33

Let \(X_1, X_2, \dots\) be i.i.d. positive r.v.s. with mean \(\mu\), and let \(W_n = \frac{X_1}{X_1+\dots + X_n}.\)

  1. Find \(E(W_n).\)
#replicate
set.seed(110)
sims = 1000

#define a simple parameter
n = 10

#set a path for W
W = rep(NA, sims)

#run the loop
for(i in 1:sims){
  
  #draw the X's
  #use Expo(1); this will be useful in part c.
  X = rexp(n, 1)
  
  #calculate W
  W[i] = X[1]/sum(X)
}

#should get 1/n = .1
mean(W)
## [1] 0.1002529
  1. What random variable does \(nW_n\) converge to (with probability \(1\)) as \(n \to \infty\)?
#iterate over different n
n = seq(from = 10, to = 100, length.out = 8)

#initialize graphics
par(mfrow = c(3,3))

#run the loop
for(i in 1:length(n)){
  
  #keep track of W
  W = rep(NA, sims)
  
  #sample many times
  for(j in 1:sims){
    
    #draw the X's
    X = rexp(n[i], 1)
    
    #calculate W
    W[j] = n[i]*X[1]/sum(X)
    
  }
  
  #generate the title for the histogram
  title = paste0("n*Wn, n = ", round(n[i]))
  
  #plot the histogram, should observe convergence
  hist(W, main = title, col = "gray", xlab = "")
}

#plot the asymptotic distribution; mu = 1
#be sure to consider the x and y axes
hist(rexp(sims, 1), col = "red", xlab = "", 
     main = "X1/mu ~ Expo(1)")

#restore the graphics state
par(mfrow = c(1,1))
  1. For the case that \(X_j \sim Expo(\lambda)\), find the distribution of \(W_n\), preferably without using calculus. (If it is one of the named distributions, state its name and specify the parameters; otherwise, give the PDF.)
#recycle vectors
#we currently have n*W, where n = 100; transform back to W
W = W/n[i]

#show that the histograms are the same
hist(W, col = "gray", main = "W", xlab = "w",
     xlim = c(0, 1/10), breaks = seq(from = 0, to = 1/10, length.out = 20))

hist(rbeta(sims, 1, n[i] - 1), col = "gray", main = "Beta(1, 100 - 1)",
     xlim = c(0, 1/10), xlab = "",
     breaks = seq(from = 0, to = 1/10, length.out = 20))




Markov Chains




10.1

You flip a fair, two-sided coin 100 times. Define a ‘run’ as a sequence of coin flips with the same value (heads or tails). For example, the sequence \(HTTHH\) has 3 runs: the first \(H\), then the \(TT\) block, then the \(HH\) block.

Let \(X\) be the number of runs we observe in \(n\) flips. We’ve discussed how to find \(E(X)\) using indicators. Now, approximate \(E(X)\) for large \(n\) by imagining this process as a Markov Chain.



Analytical Solution:

Let us create a Markov Chain with two states: ‘Start a New Run’ (1) and ‘Do not start a New Run’ (2). We oscillate between the two states by flipping the coin. For example, we start in the first (new run) state, since no matter what our first flip is, we have started a run. If we flip an \(H\) and then an \(H\), we transfer from state 1 to state 2, since our second flip did not start a run.

\(E(X)\) in this case is the expected amount of time spent in state 1, since each visit to state 1 starts a new run. The long-run fraction of time spent in state 1 is given by the stationary distribution, which for this chain is uniform over the states because the transition matrix is symmetric (the transition matrix is a 2x2 matrix filled with 1/2). So, we expect to start a new run half of the time, meaning that we approximate \(E(X) \approx n/2\) for large \(n\). This is consistent with the result we’ve seen that \(E(X) = 50.5\) when \(n = 100\). The slight difference (50.5 instead of 50) arises because we start in state 1, so the first flip is guaranteed to start a run; as \(n\) becomes large, this first flip becomes more and more negligible.
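
As a small check of the stationary distribution claim (a sketch), we can extract the stationary distribution directly as the normalized eigenvector of the transposed transition matrix with eigenvalue 1:

#the two-state transition matrix described above (rows sum to 1)
P = matrix(1/2, nrow = 2, ncol = 2)

#the stationary distribution solves sP = s; it is the eigenvector of t(P) with eigenvalue 1
s = eigen(t(P))$vectors[, 1]
s = s/sum(s)

#should get (1/2, 1/2)
s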


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#define a large (but not too large!) n
n = 100

#count how many runs we get 
#we know we must start with one run
runs = rep(1, sims)

#run the loop
for(i in 1:sims){
  
  #keep track of the chain of flips; start with one flip
  flips = sample(c("H", "T"), 1)
  
  #iterate through n
  for(j in 2:n){
    
    #flip again
    flips = c(flips, sample(c("H", "T"), 1))
    
    #see if we started a run; iterate if we did
    if(flips[j] != flips[j - 1]){
      runs[i] = runs[i] + 1
    }
  }
}

#should get 50.5 (which is close to 50)
mean(runs)
## [1] 50.75




10.2

You are repeatedly rolling a fair, six-sided die. Let \(X_n\) be the cumulative sum of the rolls after \(n\) rolls (so, if we roll a 2 and then a 5, \(X_1 = 2\) and \(X_2 = 7\)). Find the probability that \(X_n\) ever hits 7.

Hint: You may leave your answer as a system of equations. You can quickly solve systems of equations in R with the command \(solve(a, b)\), where \(a\) is the matrix containing the coefficients of the system, and \(b\) is the vector giving the RHS of the system.



Analytical Solution:

Define \(Y_j\) as the probability of hitting \(7\), if we start from \(j\). By construction, \(Y_j = 0\) when \(j > 7\) (once we go above 7, we can’t come back). We are interested in \(Y_0\).

First, we can write that \(Y_6 = 1/6\). If we are currently at a cumulative sum of 6, we either roll a 1 and hit 7 (with probability 1/6) or roll something larger and go above 7 forever. Then, we can write \(Y_5 = (1/6)Y_6 + 1/6\). From 5, we either go straight to 7 (roll a 2) with probability 1/6, or we go to 6 with probability 1/6 and then have a \(Y_6\) probability of getting to 7. If we continue in this way, we get a system of equations:

\[Y_6 = 1/6\] \[Y_5 = 1/6(1 + Y_6)\] \[Y_4 = 1/6(1 + Y_6 + Y_5)\] \[Y_3 = 1/6(1 + Y_6 + Y_5 + Y_4)\] \[Y_2 = 1/6(1 + Y_6 + Y_5 + Y_4 + Y_3)\] \[Y_1 = 1/6(1 + Y_6 + Y_5 + Y_4 + Y_3 + Y_2)\] \[Y_0 = 1/6(Y_6 + Y_5 + Y_4 + Y_3 + Y_2 + Y_1)\]

This is a system of equations that we can solve, as suggested in the hint. We can solve this system in R with the following code.

A <- matrix(c(0, 0, 0, 0, 0, 0, 1,
              0, 0, 0, 0, 0, 1, -1/6,
              0, 0, 0, 0, 1, -1/6, -1/6,
              0, 0, 0, 1, -1/6, -1/6, -1/6,
              0, 0, 1, -1/6, -1/6, -1/6, -1/6,
              0, 1, -1/6, -1/6, -1/6, -1/6, -1/6,
              1, -1/6, -1/6, -1/6, -1/6, -1/6, -1/6),
              nrow = 7, ncol = 7, byrow = TRUE)
B <- c(rep(1/6, 6), 0)

#we want the first entry
solve(A, B)[1]
## [1] 0.2536044

The first entry is what we are solving for, \(Y_0\). We get \(Y_0 \approx .25\).


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#keep track of successes (hitting 7)
success = rep(0, sims)

#run the loop
for(i in 1:sims){
  
  
  #keep track of the cumulative sum; initialize here
  cum.sum = 0
  
  #go until we get at or above 7
  while(cum.sum < 7){
    
    #roll again
    cum.sum = cum.sum + sample(1:6, 1)
  }
  
  #see if we got 7
  if(cum.sum == 7){
    success[i] = 1
  }
}

#should get .25
mean(success)
## [1] 0.254




10.3

Let \(X\) be the Markov Chain with transition matrix given by:

\[\left(\begin{array}{cccccc} q & p & 0 & 0 & 0 & 0\\ 0 & q & p & 0 & 0 & 0\\ 0 & 0 & q & p & 0 & 0\\ 0 & 0 & 0 & q & p & 0\\ 0 & 0 & 0 & 0 & q & p\\ 0 & 0 & 0 & 0 & 0 & 1\\ \end{array}\right)\]

Let this chain start at State 0 (the first state), and let \(T\) be the time it takes, in discrete steps, until absorption (i.e., get to State 5). Find the PMF of \(T\).



Analytical Solution:

By the story of the Negative Binomial, we say \((T - 5) \sim NBinom(5, p)\), since \(T\) counts all of the ‘failures’ (where failure means not moving ‘up’ one state) in addition to 5 ‘successes’ (where success means moving ‘up’ one state, like from State 3 to State 4). Therefore, using the PMF of a Negative Binomial:

\[P(T - 5 = x) = {5 + x - 1 \choose 5 - 1} p^5 q^x\]

Adding 5 to both sides in \(P(T - 5 = x)\) yields:

\[P(T = x + 5) = {5 + x - 1 \choose 5 - 1} p^5 q^x\]

Letting \(t = x + 5\), we can plug in \(t - 5\) for \(x\):

\[P(T = t) = {t - 1 \choose 5 - 1} p^5 q^{t - 5}\]
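As a quick sanity check (a minimal sketch), this formula should agree with R's built-in Negative Binomial PMF, dnbinom, which counts the failures before the 5th success:

#a simple value for p and a few values of t to check
p = 1/2
t = 5:15

#the PMF derived above
PMF = choose(t - 1, 5 - 1)*p^5*(1 - p)^(t - 5)

#dnbinom is evaluated at the number of failures, t - 5; the two columns should match
cbind(PMF, dnbinom(t - 5, size = 5, prob = p))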


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#keep track of time until absorption (use Y instead of T)
Y = rep(NA, sims)

#define a simple parameter
p = 1/2
q = 1 - p

#run the loop
for(i in 1:sims){
  
  #initialize the chain at 0
  X = 0
  
  #go until we hit state 5
  while(X[length(X)] < 5){
    
    #flip to see if we move up
    flip = runif(1)
    
    #case where we move up
    if(flip <= p){
      
      #move up one step
      X = c(X, X[length(X)] + 1)
    }
    
    #case where we stay put
    if(flip > p){
      
      #stay
      X = c(X, X[length(X)])
    }
  }
  
  #count how long we ran the chain (don't count where we started as a step)
  Y[i] = length(X) - 1
}


#show that the PMFs line up
#calculate the analytical PMF
k = as.numeric(rownames(table(Y)))
PMF = choose(k - 1, 5 - 1)*p^5*q^(k - 5)


#plots should line up
#empirical
plot(k, table(Y)/sims, col = "black", main = "Empirical and Analytical PMF", type = "h",
     xlab = "x", ylab = "P(X = x)", xlim = c(min(k), max(k)), ylim = c(0, 1), lwd = 3)

#analytical
lines(k, PMF, col = "red", pch = 20, type = "p", lwd = 3)


legend("topright", legend = c("Empirical PMF", "Analytical PMF"),
       lty=c(1,20), lwd=c(2.5,2.5),
       col=c("black", "red"))




10.4

Consider this Markov Chain:

The arrows are not labeled because each arrow has probability \(1/2\); that is, there is an equal probability of going clockwise or counterclockwise at every step. Let \(T\) be the number of steps it takes to return to State 1. Find \(E(T)\) without using facts about the stationary distribution.

Hint: You may leave your answer as a system of equations. You can quickly solve systems of equations in R with the command \(solve(a, b)\), where \(a\) is the matrix containing the coefficients of the system, and \(b\) is the vector giving the RHS of the system.



Analytical Solution:

Let \(T_i\) be the expected number of steps needed to return to State 1 if we start at State \(i\). We are interested in \(T_1\). We can write:

\[T_1 = 1 + (T_2 + T_5)/2\]

This holds because, from State 1, we take one step and then land at State 2 or State 5 with equal probabilities. Continuing in this way, we write:

\[T_2 = 1 + T_3/2\] \[T_3 = 1 + (T_4 + T_2)/2\] \[T_4 = 1 + (T_5 + T_3)/2\] \[T_5 = 1 + T_4/2\]

This is a system of equations that we can solve, as suggested in the hint. We can solve this system in R with the following code.

A <- matrix(c(1, -1/2, 0, 0, -1/2,
              0, 1, -1/2, 0, 0,
              0, -1/2, 1, -1/2, 0,
              0, 0, -1/2, 1, -1/2,
              0, 0, 0, -1/2, 1),
              nrow = 5, ncol = 5, byrow = TRUE)
B <- rep(1, 5)

#we want the first entry
solve(A, B)[1]
## [1] 5


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#keep track of the return time
Y = rep(NA, sims)

#run the loop
for(i in 1:sims){
  
  #initialize X
  X = sample(c(2, 5), 1)
  
  #go until we get back to 1
  while(TRUE){
    
    #flip to see if we go up or down
    flip = runif(1)
    
    #go up
    if(flip <= 1/2){
      
      #corner case is 5
      if(X[length(X)] == 5){
        X = c(X, 1)
        
        #break if we get to 1
        break
      }
      
      #usual case
      if(X[length(X)] < 5){
        X = c(X, X[length(X)] + 1)
      }
    }
    
    #go down
    if(flip > 1/2){
      
      #usual case
      if(X[length(X)] > 1){
        X = c(X, X[length(X)] - 1)
      }
      
    
    }
    
    #see if we're at 1
    if(X[length(X)] == 1){
      break
    }
  }

  #count how long the return took 
  Y[i] = length(X) 
}

#should get 5
mean(Y)
## [1] 4.656




10.5

Consider this Markov Chain from the previous problem:

Again, the arrows are not labeled because each arrow has probability \(1/2\); that is, there is an equal probability of going clockwise or counterclockwise at every step. Let \(T\) be the number of steps it takes to return to State 1. Find \(E(T)\) by using facts about the stationary distribution.



Analytical Solution:

The transition matrix is symmetric (think about how walking along this path is symmetric), so the stationary distribution is uniform. Since the return time to State \(i\) is the reciprocal of the \(i^{th}\) entry in the stationary distribution, and the stationary distribution is uniform over the 5 states (i.e., \(s = (1/5, 1/5, 1/5, 1/5, 1/5)\)), we can say that \(E(T) = \frac{1}{1/5} = 5\), as we saw earlier.
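We can also verify this with the eigenvector approach used elsewhere in this manual; the transition matrix below encodes the same 5-state cycle that is simulated in the empirical solution (a minimal sketch):

#transition matrix for the 5-state cycle: move clockwise or counterclockwise with probability 1/2
Q = matrix(c(0, 1/2, 0, 0, 1/2,
             1/2, 0, 1/2, 0, 0,
             0, 1/2, 0, 1/2, 0,
             0, 0, 1/2, 0, 1/2,
             1/2, 0, 0, 1/2, 0),
             nrow = 5, ncol = 5, byrow = TRUE)

#take the eigenvectors of the transpose of Q
eigenvectors = eigen(t(Q))

#select the vector that corresponds to eigenvalue 1 and normalize it
s = abs(eigenvectors$vectors[, round(eigenvectors$values, 3) == 1])
s = s/sum(s)

#should get a uniform distribution, so E(T) = 1/s[1] = 5
s; 1/s[1]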


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#keep track of the return time
Y = rep(NA, sims)

#run the loop
for(i in 1:sims){
  
  #initialize X
  X = sample(c(2, 5), 1)
  
  #go until we get back to 1
  while(TRUE){
    
    #flip to see if we go up or down
    flip = runif(1)
    
    #go up
    if(flip <= 1/2){
      
      #corner case is 5
      if(X[length(X)] == 5){
        X = c(X, 1)
        
        #break if we get to 1
        break
      }
      
      #usual case
      if(X[length(X)] < 5){
        X = c(X, X[length(X)] + 1)
      }
    }
    
    #go down
    if(flip > 1/2){
      
      #usual case
      if(X[length(X)] > 1){
        X = c(X, X[length(X)] - 1)
      }
      
    
    }
    
    #see if we're at 1
    if(X[length(X)] == 1){
      break
    }
  }

  #count how long the return took 
  Y[i] = length(X) 
}

#should get 5
mean(Y)
## [1] 4.656




10.6

Let \(X \sim Pois(\lambda)\). Imagine a Markov Chain with \(X + 1\) states and transition matrix \(Q\) such that \(q_{i, i + 1} = p\) and \(q_{i, i} = 1 - p\) for \(i = 0, 1, ..., x\) and \(q_{x + 1, x + 1} = 1\). Let \(M_t\) be the value of the chain (the current state) at time \(t\), and let \(M_0 = 0\) (start at 0). Find the PMF of \(M_X\).



Analytical Solution:

Consider moving through the chain in a simple case with 3 steps. We start at 0, and then we have 3 ‘trials’: we run the chain for 3 steps, and each step we go up a state with probability \(p\). This is the same story as the Chicken-Egg result. We have \(X \sim Pois(\lambda)\) trials, and each independently has probability \(p\) of success (moving up a state). So, \(M_X \sim Pois(\lambda p)\). We can write the PMF:

\[P(M_X = m) = \frac{e^{-\lambda p}(\lambda p)^{m}}{m!}\]
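Before simulating the chain step by step, we can check the Chicken-Egg identity directly (a minimal sketch, using the same \(\lambda\) and \(p\) as the simulation below): thinning a \(Pois(\lambda)\) number of trials, keeping each with probability \(p\), should look like a single \(Pois(\lambda p)\) draw.

#replicate
set.seed(110)
sims = 1000

#simple parameters (matching the empirical solution below)
lambda = 5
p = 1/2

#thin a Poisson number of trials with probability p
thinned = rbinom(sims, rpois(sims, lambda), p)

#both means and variances should be near lambda*p = 2.5
mean(thinned); var(thinned)
mean(rpois(sims, lambda*p)); var(rpois(sims, lambda*p))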


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#define simple parameters
lambda = 5
p = 1/2

#keep track of M_X 
MX = rep(0, sims)


#run the loop
for(i in 1:sims){
  
  #generate the number of states
  X = rpois(1, lambda)
  
  #run the chain for X steps
  for(j in 1:X){
    
    #flip to see if we move up
    flip = runif(1)
    
    #move up
    if(flip <= p){
      MX[i] = MX[i] + 1
    }
  }
}



#show that the PMFs line up
#calculate the analytical PMF
k = as.numeric(rownames(table(MX)))
PMF = exp(-lambda*p)*(lambda*p)^k/factorial(k)


#plots should line up
#empirical
plot(k, table(MX)/sims, col = "black", main = "Empirical and Analytical PMF", type = "h",
     xlab = "m", ylab = "P(M_X = m)", xlim = c(min(k), max(k)), ylim = c(0, 1), lwd = 3)

#analytical
lines(k, PMF, col = "red", pch = 20, type = "p", lwd = 3)


legend("topright", legend = c("Empirical PMF", "Analytical PMF"),
       lty=c(1,20), lwd=c(2.5,2.5),
       col=c("black", "red"))




10.7

Consider this Markov Chain, where \(a\) and \(b\) are valid probabilities.

Let \(T\) be the time of absorption (time of arrival at State 3). Imagine that we start at State 1.

  1. If \(a = b = 1/2\), find \(E(T)\).
  2. If \(a \neq b\), find \(E(T)\).



Analytical Solution:

  1. If \(a = b = 1/2\), then \(T \sim FS(1/2)\), since each time we have a 1/2 probability of success, where success is defined as arriving at State 3. So, \(E(T) = 2\) in this case.

  2. We know that the chain will start at 1. However, let’s consider the expected time until absorption from States 1 and 2. Let \(T_i\) be the expected time until absorption given that we start at State \(i\). We write:

\[E(T_1) = 1 - a + a(1 + E(T_2))\]

\[E(T_2) = 1 - b + b(1 + E(T_1))\]

These hold because, from both State 1 and State 2, we either go to State 3 (and are absorbed) or move to the other non-absorbing state. To clean up notation, let \(q_a = 1 - a\) and \(q_b = 1 - b\). We are interested in \(E(T_1)\), since we start in State 1, so we plug the second equation into the first:

\[E(T_1) = q_a + a\big(1 + q_b + b(1 + E(T_1))\big)\] \[= q_a + a\big(1 + q_b + b + bE(T_1)\big)\] \[= q_a + a + aq_b + ab + abE(T_1)\]

\[= 1 + a + abE(T_1)\]

where the last step uses \(a + q_a = 1\) and \(a(q_b + b) = a\). Solving for \(E(T_1) = E(T)\):

\[E(T) = \frac{1 + a}{1 - ab}\]

As a sanity check, when \(a = b = 1/2\), this comes out to 2, which confirms what we saw in part a.


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#keep track of time until absorption; call it Y
Y = rep(NA, sims)

#define simple parameters
a = 1/4
b = 3/4

#define the transition matrix
Q = matrix(c(0, a, 1 - a,
             b, 0, 1 - b,
             0, 0, 1), nrow = 3, ncol = 3, byrow = TRUE)
#run the loop
for(i in 1:sims){
  
  #start in State 1
  X = 1
  
  #go until we hit 3
  while(X[length(X)] < 3){
    
    #sample a new X from the MC
    new.X = sample(1:3, 1, prob = Q[X[length(X)], ])
    
    #combine 
    X = c(X, new.X)
  }
  
  #see how long it took to get to 3
  Y[i] = length(X) - 1
}


#these should match
(1 + a)/(1 - a*b); mean(Y)
## [1] 1.538462
## [1] 1.504




10.8

Are the following processes, \(X_t\), Markovian? That is, do they satisfy the Markov property?

  1. Imagine a “branching process” with offspring distribution \(Pois(\lambda)\), where \(\lambda\) is known. That is, at generation 0, there is 1 individual who has \(Pois(\lambda)\) offspring in his lifetime, and each of his descendants independently has \(Pois(\lambda)\) offspring. Let \(X_t\) be the size of the \(t^{th}\) generation.

  2. You repeatedly roll a fair die. Let \(X_t\) be the value of the \(t^{th}\) roll.

  3. You repeatedly roll a fair die. Let \(X_t\) be the mean of roll 1, roll 2, up to roll \(t - 1\).

  4. Let \(X_t = X_{t - 1} + \epsilon_t\) where \(X_0 = 0\) and \(\epsilon_i \sim N(0, \sigma^2)\), where \(\sigma^2\) is known.

  5. Same as part d., but \(\sigma^2\) is unknown.



Analytical Solution:

  1. Yes, this is Markovian. If we know \(X_{t - 1}\), or the size of the \((t - 1)^{st}\) generation, we have all of the relevant information to find the distribution of \(X_t\), since each of the \(X_{t - 1}\) individuals in that generation independently has \(Pois(\lambda)\) offspring (the past gives no further information).

  2. Yes. Rolls are independent, so looking at the past gives no information. Independence is actually a stronger condition than the Markov property, which only requires that states before \(X_{t - 1}\) add nothing beyond \(X_{t - 1}\); this holds here, since neither \(X_{t - 1}\) nor the earlier rolls help predict \(X_t\).

  3. Yes. We can think of \(X_t\) as the sum of the first \(t - 1\) rolls, divided by \(t - 1\). If we know \(X_{t - 1}\), then we know the sum of the first \(t - 2\) rolls. It does not matter how we got to this sum (i.e., whether we rolled all 3’s, or a mixture of 1’s and 6’s); it only matters that we arrived at it (there is no more useful information further in the past).

  4. Yes. If we know \(X_{t - 1}\), the only random part of \(X_t\) is \(\epsilon\), which is marginally Normal and independent of the past.

  5. No. Imagine if \(X_{t - 1} = 0\). Looking further back in the past gives information on how we got to \(X_{t - 1} = 0\), which gives information about \(\sigma^2\). For example, if \(X_1, X_2, ..., X_{t - 1}\) are all 0, then \(\sigma^2\) is probably very small. If \(X_1, X_2, ..., X_{t - 1}\) are all very far from 0 and we simply happened to bounce back to 0 at time \((t - 1)\), then \(\sigma^2\) is likely large. This influences our prediction on \(X_t\) (if the variance is small, we will predict a value close to \(X_{t - 1}\); if the variance is large, we likely won’t).




10.9

Cameron is performing a simple, symmetric random walk on the integers; that is, at every integer, he either goes up 1 integer (i.e., 5 to 6) or down 1 integer (i.e., 1 to 0). Let \(Y_t\) be his location at time \(t\). He starts at 0 at time 1, so \(Y_1 = 0\).

We can plot a sample path to show what this process looks like. The x-axis is \(t\), or the number of rounds, and the y-axis is \(Y_t\), or Cameron’s location at time \(t\).

  1. Is \(Y_t\) a Markovian process? That is, does it have the Markov property?

  2. Find \(P(Y_{2t + 1} > 0)\). You may leave your answer as a sum.



Analytical Solution:

  1. Yes, this is Markovian. If we know \(Y_{t - 1}\), then \(Y_{t}\) is either \(Y_{t - 1} + 1\) or \(Y_{t - 1} - 1\), with equal probabilities, independently of the past before \(t - 1\).

  2. When \(t = 1\), we start at 0. Let \(X\) be the number of up moves that we make in the \(2t\) steps from time 1 to time \(2t + 1\). Since every move is independently an up move with probability 1/2, we can say that \(X \sim Bin(2t, 1/2)\). We can then say that \(P(Y_{2t + 1} > 0) = P(X > t)\), since if we have more than \(t\) up moves, then we have more up moves than down moves and thus we will end above 0 at time \(2t + 1\). We can then simply sum the PMF of a Binomial to find this answer:

\[P(X > t) = \sum_{x = t + 1}^{2t} {2t \choose x} \frac{1}{2^{2t}}\]
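This sum is exactly what the pbinom call in the empirical solution below computes; a minimal sketch confirming the two agree for one value of \(t\):

#a simple value of t
t = 10

#the sum from the formula above
sum(choose(2*t, (t + 1):(2*t))/2^(2*t))

#the equivalent expression using the Binomial CDF; should match
1 - pbinom(t, 2*t, 1/2)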

Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#define a simple parameter
t = 10

#keep track of Y_{2t + 1}
Y2t = rep(NA, sims)

#run the loop
for(i in 1:sims){
  
  
  #start at 0
  Y = 0
  
  #take 2t steps
  for(j in 1:(2*t)){
    
    #move up or down with equal probabilities
    flip = runif(1)
    
    #move up
    if(flip <= 1/2){
      Y = c(Y, Y[length(Y)] + 1)
    }
    
    #move down
    if(flip > 1/2){
      Y = c(Y, Y[length(Y)] - 1)
    }
  }
  
  #mark the last value
  Y2t[i] = Y[2*t + 1]
}

#these should match
1 - pbinom(t, 2*t, 1/2); length(Y2t[Y2t > 0])/sims
## [1] 0.4119015
## [1] 0.408




10.10

Cameron is performing a simple, symmetric random walk on the integers; that is, at every integer, he either goes up 1 integer (i.e., 5 to 6) or down 1 integer (i.e., 1 to 0). Let \(Y_t\) be his location at time \(t\). He starts at 0 at time 0, so \(Y_0 = 0\).

Find \(P(Y_{2t} < 0 \cap max(Y_1, Y_2, ..., Y_{2t}) > 3)\), i.e. the probability that the path hits 3 and ends below 0. You may leave your answer as a sum.

Hint: the ‘reflection principle’, which applies to this random walk, states that reflecting a path about a horizontal line (i.e., mimicking a path up until some time \(t\), and then reflecting a path for the rest of the time interval) creates a path that has the same probability as the original path. Consider the visual below: the original path is in black and the reflected path is in red. The reflected path mimics the original path up until the time that the original path hits a value of 2, and then for the rest of the interval, the reflected path ‘reflects’ the original path across the horizontal line \(y = 2\). The reflection principle states that the probability of these two paths are the same.
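The figure itself is not reproduced here, but a minimal sketch in R can recreate the idea, reflecting about the level 2 mentioned above (the walk length of 20 steps is an arbitrary choice for the visual):

#generate a sample path of the walk (20 steps, starting at 0)
set.seed(110)
steps = sample(c(-1, 1), 20, replace = TRUE)
path = c(0, cumsum(steps))

#find the first time the path hits 2 (NA if it never does)
hit = which(path == 2)[1]

#reflect the rest of the path about the horizontal line y = 2
reflected = path
if(!is.na(hit)){
  reflected[hit:length(path)] = 4 - path[hit:length(path)]
}

#plot the original path (black) and the reflected path (red)
plot(0:20, path, type = "l", lwd = 2, xlab = "t", ylab = "Location",
     ylim = range(c(path, reflected)),
     main = "Reflection about y = 2")
lines(0:20, reflected, col = "red", lwd = 2)
abline(h = 2, lty = 3)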



Analytical Solution:

By the reflection principle, a path that hits 3 and ends below 0 has the same probability as a path that ends above 6 (consider reflecting the original path from the moment that it hits 3). An example is below, for visual intuition:

So, we simply have to find the probability that this random walk ends above 6. We can consider \(X\) as the number of ‘up moves’ in the \(2t\) steps. We can then say that \(P(Y_{2t} > 6) = P(X > t + 3)\), since if we see more than \(t + 3\) up moves, the up moves outnumber the down moves by more than 6, which means we end above 6. We know that \(X \sim Bin(2t, 1/2)\), so we can find this value by summing the PMF of a Binomial:

\[P(X > t + 3) = \sum_{x = t + 4}^{2t} {2t \choose x} \frac{1}{2^{2t}}\]


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#define a simple parameter
t = 10

#indicator if we hit 3 and end below 0
success = rep(0, sims)

#run the loop
for(i in 1:sims){
  
  #initialize Y
  Y = integer(0)
  
  #take 2t steps
  for(j in 1:(2*t)){
    
    #move up or down with equal probabilities
    flip = runif(1)
    
    #if it's the first time, go to -1 or 1
    if(j == 1){
      #go to 1
      if(flip <= 1/2){
        Y = 1
      }
      
      #go to -1
      if(flip > 1/2){
        Y = -1
      }
    }
    
    #move up or down if it's not the first time
    if(j > 1){
      
      #move up
      if(flip <= 1/2){
        Y = c(Y, Y[length(Y)] + 1)
      }
      
      #move down
      if(flip > 1/2){
        Y = c(Y, Y[length(Y)] - 1)
      }
    }
  }
  
  #see if we hit 3
  if(max(Y) >= 3){
    
    #see if we ended below 0
    if(Y[2*t] < 0){
      
      #mark a success
      success[i] = 1
    }
  }
}

#these should match
1 - pbinom(t + 3, 2*t, 1/2); mean(success)
## [1] 0.05765915
## [1] 0.063




10.11

A particle is traveling randomly around the vertices of a square. It can only move across vertices connected by edges in one step (i.e., from the bottom left, it can travel to the top left or bottom right, but not the top right). If the particle starts in the top left vertex, what is the probability that it visits the bottom right vertex before it visits the bottom left vertex?



Analytical Solution:

Define the vertices top left, top right, bottom left, bottom right as states (1,2,3,4), respectively. Now define \(X_i\) as the probability that we visit state 4 before we visit state 3 given that we start in state \(i\), where \(i = 1,2,3,4\). It’s clear that \(X_3 = 0\) and \(X_4 = 1\) (if we start in state 3, the condition cannot hold; if we start in state 4, it must!). Consider, now, \(X_1\) and \(X_2\):

\[X_1 = X_2/2\] \[X_2 = X_1/2 + 1/2\]

The first equation holds because, from state 1, there is a \(1/2\) probability of traveling to state 3 (and thus not getting to state 4 before state 3) and a \(1/2\) probability of going to state 2, from which we have probability \(X_2\) of getting to state 4 before state 3 (similar reasoning justifies the second equation). We are interested in \(X_1\), which we can quickly solve by plugging \(X_2 = 2X_1\) into the second equation:

\[3X_1/2 = 1/2\] \[X_1 = 1/3\]

So we have a 1/3 chance of visiting state 4 before state 3. This is intuitive; we are right next to state 3 and two steps from state 4, so it should be more likely to visit state 3 first. It’s also intuitive that \(X_2 = 2/3\) for similar reasons.
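As in earlier problems, we can also check this small system with solve() (a quick sketch), writing the equations as \(X_1 - X_2/2 = 0\) and \(-X_1/2 + X_2 = 1/2\):

#coefficients of the system
A = matrix(c(1, -1/2,
             -1/2, 1), nrow = 2, ncol = 2, byrow = TRUE)
B = c(0, 1/2)

#should get X_1 = 1/3 (and X_2 = 2/3)
solve(A, B)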


Empirical Solution:

#replicate
set.seed(110)
sims = 1000

#create a matrix of edges
#a 1 means that you can travel from the i^th to j^th state
E = matrix(c(0, 1, 1, 0,
             1, 0, 0, 1,
             1, 0, 0, 1,
             0, 1, 1, 0), 
             nrow = 4, ncol = 4, byrow = TRUE)

#indicator of successes (getting to state 4 before state 3)
success = rep(0, sims)

#run the loop
for(i in 1:sims){
  
  #keep track of the current state; initialize here
  state = 1
  
  #go until we get to state 3 or 4, then break
  while(TRUE){
    
    #sample a new state, using the edges as proportional to probabilities
    state = sample(1:4, 1, prob = E[state, ])
    
    #see if we got state 3 or 4
    if(state == 3 || state == 4){
      break
    }
  }
  
  #mark if we got to state 4
  if(state == 4){
    success[i] = 1
  }
}

#should get 1/3
mean(success)
## [1] 0.35




10.12

Consider these two Markov Chains (an \(\alpha\) chain and a \(\beta\) chain).



Count the number of recurrent states and the number of aperiodic states in each chain, then indicate if the chains are irreducible.



Analytical Solution:

In the \(\alpha\) chain, States \(\alpha_4, \alpha_5, \alpha_6\) are recurrent (the rest are transient). All of the states in the \(\beta\) chain are recurrent (if we start in any of these states, we are guaranteed to return to it). States \(\alpha_1\) and \(\alpha_2\) are aperiodic, while the rest of the \(\alpha\) states are periodic with period 3. All \(\beta\) states are aperiodic. Neither chain is irreducible (the \(\alpha\) chain eventually leaves \(\alpha_1\) and \(\alpha_2\) forever, and the \(\beta_4\) state in the \(\beta\) chain is an island).


BH Problems



The problems in this section are taken from @BH. The questions are reproduced here, and the analytical solutions are freely available online. Here, we will only consider empirical solutions: answers/approximations to these problems using simulations in R.




We will be using the function heat.110, which is a personalized, slightly modified version of the heatmap.2 function in the gplots package. The important parts of the function (i.e., the actual plotting) are nearly identical; we just make minor aesthetic alterations (i.e., axis labeling, etc.).



BH 11.2

Let \(X_0,X_1,X_2, ...\) be an irreducible Markov chain with state space \(\{1,2,..., M\}\), \(M \geq 3\), transition matrix \(Q=(q_{ij})\), and stationary distribution \(\mathbf{s}=(s_1,...,s_M)\). Let the initial state \(X_0\) follow the stationary distribution, i.e., \(P(X_0 = i) = s_i\).

  1. On average, how many of \(X_0,X_1, \dots, X_9\) equal \(3\)? (In terms of \(\mathbf{s}\); simplify.)
#replicate
set.seed(110)
sims = 1000

#define a simple value for M
M = 5

#define a simple stationary distribution; uniform!
s = rep(1/M, M)

#for simplicity, choose a transition matrix with stationary distribution s:
#  making every row of Q equal to s works
Q = matrix(1/M, nrow = M, ncol = M)

#count how many times we visit 3
state3 = rep(0, sims)

#run the loop
for(i in 1:sims){
  
  #generate the first state from s
  X = sample(1:M, 1, prob = s)
  
  #see if we got state 3
  if(X == 3){
    state3[i] = state3[i] + 1
  }
  
  #run the chain 9 more times
  for(j in 1:9){
    
    #transition to the next state
    X = sample(1:M, 1, prob = Q[X, ])
    
    #see if we got state 3
    if(X == 3){
      state3[i] = state3[i] + 1
    }
  }
}

#should get 10/M = 2
mean(state3)
## [1] 1.948
  2. Let \(Y_n = (X_n-1)(X_n-2)\). For \(M=3\), find an example of \(Q\) (the transition matrix for the original chain \(X_0,X_1, ...\)) where \(Y_0,Y_1, ...\) is Markov, and another example of \(Q\) where \(Y_0,Y_1, ...\) is not Markov. In your examples, make \(q_{ii}>0\) for at least one \(i\) and make sure it is possible to get from any state to any other state eventually.




BH 11.4

Consider the Markov chain shown below, where \(0 < p < 1\) and the labels on the arrows indicate transition probabilities.

  1. Write down the transition matrix \(Q\) for this chain.
#define a simple parameter 
p = 1/4

#define the transition matrix
Q = matrix(c(1/4, 3/4, 3/4, 1/4), nrow = 2, ncol = 2)
  2. Find the stationary distribution of the chain.
#take the eigenvectors of the transpose of Q
eigenvectors = eigen(t(Q))

#select the vector that corresponds to eigenvalue 1
#this is the stationary distribution, up to normalization
s = eigenvectors$vectors[, round(eigenvectors$values, 3) == 1]

#normalize s
s = s/sum(s)

#should get (1/2, 1/2)
s
## [1] 0.5 0.5
  3. What is the limit of \(Q^n\) as \(n \to \infty\)?
#should approach a matrix full of 1/2
gradient = colorpanel(1000, low = "gray78", high = "red")

heat.110(Q%^%100, main = "100-step Transition Matrix", xlab = "State j", 
    ylab = "State i", dendrogram = 'none', col = gradient, 
    trace = 'none', Rowv = FALSE, Colv = FALSE, key = FALSE,
    lwid=c(.65,3), lhei=c(1,3.25), margins = c(3.5, 5))




BH 11.5

Consider the Markov chain shown below, with state space \(\{1,2,3,4\}\) and the labels on the arrows indicate transition probabilities.

  1. Write down the transition matrix \(Q\) for this chain.
#define Q
Q = matrix(c(1/2, 1/2, 0, 0,
             1/4, 3/4, 0, 0,
             0, 0, 1/4, 3/4, 
             0, 0, 3/4, 1/4), 
             nrow = 4, ncol = 4, byrow = TRUE)

gradient = colorpanel(1000, low = "gray78", high = "red")

heat.110(Q, main = "Transition Matrix", xlab = "State j", 
    ylab = "State i", dendrogram = 'none', col = gradient, 
    trace = 'none', Rowv = FALSE, Colv = FALSE, key = FALSE,
    lwid=c(.65,3), lhei=c(1,3.25), margins = c(3.5, 5))

  2. Which states (if any) are recurrent? Which states (if any) are transient?
#see part c.
  3. Find two different stationary distributions for the chain.
#raise Q to a high power to see the long-run behavior from each starting state
gradient = colorpanel(1000, low = "gray78", high = "red")

#observe two stationary distributions: for states 1 and 2, then states 3 and 4
#this also shows that the states are recurrent
heat.110(Q%^%100, main = "100-step matrix", xlab = "State j", 
    ylab = "State i", dendrogram = 'none', col = gradient, 
    trace = 'none', Rowv = FALSE, Colv = FALSE, key = FALSE,
    lwid=c(.65,3), lhei=c(1,3.25), margins = c(3.5, 5))

#find stationary distribution of states 1 and 2
Q12 = Q[1:2, 1:2]

#run the chain 100 times, should get s
s = (Q12%^%100)[1, ]

#should get 1/3, 2/3
s
## [1] 0.3333333 0.6666667
#find stationary distribution of states 3 and 4
Q34 = Q[3:4, 3:4]

#run the chain 100 times, should get s
s = (Q34%^%100)[1, ]

#should get 1/2, 1/2
s
## [1] 0.5 0.5




BH 11.6

Daenerys has three dragons: Drogon, Rhaegal, and Viserion. Each dragon independently explores the world in search of tasty morsels. Let \(X_n, Y_n, Z_n\) be the locations at time \(n\) of Drogon, Rhaegal, Viserion respectively, where time is assumed to be discrete and the number of possible locations is a finite number \(M\). Their paths \(X_0,X_1,X_2,...\); \(Y_0,Y_1,Y_2,...\); and \(Z_0,Z_1,Z_2,...\) are independent Markov chains with the same stationary distribution \(\mathbf{s}\). Each dragon starts out at a random location generated according to the stationary distribution.

  1. Let state \(0\) be home (so \(s_0\) is the stationary probability of the home state). Find the expected number of times that Drogon is at home, up to time \(24\), i.e., the expected number of how many of \(X_0,X_1,...,X_{24}\) are state 0 (in terms of \(s_0\)).
#replicate
set.seed(110)
sims = 1000

#define a simple value for M
M = 3

#define a simple stationary distribution: uniform!
s = rep(1/M, M)

#define Q
Q = matrix(1/M, nrow = M, ncol = M)

#count how many times he visits state 1 (home)
home = rep(0, sims)

#run the loop
for(i in 1:sims){
  
  #generate the first state
  X = sample(1:M, 1, prob = s)
  
  #see if we visited home (state 1)
  if(X == 1){
    home[i] = home[i] + 1
  }
  
  #run the chain 24 more times
  for(j in 2:25){
    
    #draw a new state
    X = sample(1:M, 1, prob = Q[X, ])
    
    #see if we visited home (state 1)
    if(X == 1){
      home[i] = home[i] + 1
    }
  }
}

#should get 25*s[1] = 8.3333
mean(home)
## [1] 8.36
  2. If we want to track all 3 dragons simultaneously, we need to consider the vector of positions, \((X_n,Y_n,Z_n)\). There are \(M^3\) possible values for this vector; assume that each is assigned a number from \(1\) to \(M^3\), e.g., if \(M=2\) we could encode the states \((0,0,0), (0,0,1), (0,1,0), ..., (1,1,1)\) as \(1,2,3, ..., 8\) respectively. Let \(W_n\) be the number between \(1\) and \(M^3\) representing \((X_n,Y_n,Z_n)\). Determine whether \(W_0,W_1, ...\) is a Markov chain.

  3. Given that all 3 dragons start at home at time \(0\), find the expected time it will take for all 3 to be at home again at the same time.

#keep track of how long it takes for all to return home
reunion = rep(NA, sims)

#run the loop
for(i in 1:sims){
  
  #keep track of the paths of the three dragons
  #initialize the three
  D = sample(1:M, 1, prob = s)
  R = sample(1:M, 1, prob = s)
  V = sample(1:M, 1, prob = s)
  
  #go until all return home (check the most recent visit)
  while(D[length(D)] != 1 || R[length(R)] != 1 || V[length(V)] != 1){
    
    D = c(D, sample(1:M, 1, prob = s))
    R = c(R, sample(1:M, 1, prob = s))
    V = c(V, sample(1:M, 1, prob = s))
  
  }
  
  #mark how long it took
  reunion[i] = length(D)
}

#should get 1/s[1]^3 = 27
mean(reunion)
## [1] 27.374




BH 11.7

A Markov chain \(X_0,X_1, ...\) with state space \(\{-3,-2,-1,0,1,2,3\}\) proceeds as follows. The chain starts at \(X_0=0\). If \(X_n\) is not an endpoint (\(-3\) or \(3\)), then \(X_{n+1}\) is \(X_n-1\) or \(X_n+1\), each with probability \(1/2\). Otherwise, the chain gets reflected off the endpoint, i.e., from \(3\) it always goes to \(2\) and from \(-3\) it always goes to \(-2\). A diagram of the chain is shown below.

  1. Is \(|X_0|,|X_1|,|X_2|, ...\) also a Markov chain? Explain.

  2. Let sgn be the sign function: \(sgn(x)= 1\) if \(x>0\), \(sgn(x) = -1\) if \(x<0\), and \(sgn(0)=0\). Is \(sgn(X_0),sgn(X_1),sgn(X_2),\dots\) a Markov chain? Explain.

  3. Find the stationary distribution of the chain \(X_0,X_1,X_2,\dots\).

#replicate
set.seed(110)
sims = 1000

#define the transition matrix
Q = matrix(c(0, 1, 0, 0, 0, 0,  0,
             1/2, 0, 1/2, 0, 0, 0, 0,
             0, 1/2, 0, 1/2, 0, 0, 0,
             0, 0, 1/2, 0, 1/2, 0, 0,
             0, 0, 0, 1/2, 0, 1/2, 0,
             0, 0, 0, 0, 1/2, 0, 1/2,
             0, 0, 0, 0, 0, 1, 0), 
             nrow = 7, ncol = 7, byrow = TRUE)

#plot Q
gradient = colorpanel(1000, low = "gray", high = "red")

heat.110(Q, main = "Transition Matrix", xlab = "State j", 
    ylab = "State i", dendrogram = 'none', col = gradient, 
    trace = 'none', Rowv = FALSE, Colv = FALSE, key = FALSE,
    lwid=c(.65,3), lhei=c(1,3.25), margins = c(3.5, 5))

#run the chain, start at 0 (states -3 to 3 are coded as 1 to 7, so 0 is state 4)
X = 4

#run the chain
for(i in 2:sims){
  
  #draw a new state
  X = c(X, sample(1:nrow(Q), 1, prob = Q[X[i - 1], ]))
}

#transform X to match the problem
X = X - 4

#plot the proportion of time spent in each state
plot(table(X)/sims, ylab = "Proportion of time in state", xlab = "State",
     main = "Convergence of X to s (start at 0)",
     lwd = 3, col = "black")

#plot 1/12 and 2/12, the two values in the stationary distribution
abline(h = 1/12)
abline(h = 2/12)

#eigenvalue method for s
#take the eigenvectors of the transpose of Q
eigenvectors = eigen(t(Q))

#select the vector that corresponds to eigenvalue 1
#this is the stationary distribution; take the absolute value
s = abs(eigenvectors$vectors[, round(eigenvectors$values, 3) == 1.00])

#normalize s
s = s/sum(s)

#should get 1/12(1, 2, 2, 2, 2, 2, 1)
round(s, 2)
## [1] 0.08 0.17 0.17 0.17 0.17 0.17 0.08
  4. Find a simple way to modify some of the transition probabilities \(q_{ij}\) for \(i \in \{-3,3\}\) to make the stationary distribution of the modified chain uniform over the states.
#re-define the matrix to make it symmetric
Q = matrix(c(1/2, 1/2, 0, 0, 0, 0,  0,
             1/2, 0, 1/2, 0, 0, 0, 0,
             0, 1/2, 0, 1/2, 0, 0, 0,
             0, 0, 1/2, 0, 1/2, 0, 0,
             0, 0, 0, 1/2, 0, 1/2, 0,
             0, 0, 0, 0, 1/2, 0, 1/2,
             0, 0, 0, 0, 0, 1/2,  1/2), 
             nrow = 7, ncol = 7, byrow = TRUE)

#plot the symmetric Q
gradient = colorpanel(1000, low = "gray78", high = "red")

heat.110(Q, main = "Transition Matrix", xlab = "State j", 
    ylab = "State i", dendrogram = 'none', col = gradient, 
    trace = 'none', Rowv = FALSE, Colv = FALSE, key = FALSE,
    lwid=c(.65,3), lhei=c(1,3.25), margins = c(3.5, 5))

#plot the symmetric Q
#reverse colors this time so we can see (all values will be the same)
gradient = colorpanel(10, low = "red", high = "gray78")

#need to round so the heatmap is reasonable
heat.110(round(Q%^%1000, 3), main = "Transition Matrix", xlab = "State j", 
    ylab = "State i", dendrogram = 'none', col = gradient, 
    trace = 'none', Rowv = FALSE, Colv = FALSE, key = FALSE)

#eigenvalue method
#take the eigenvectors of the transpose of Q
eigenvectors = eigen(t(Q))

#select the vector that corresponds to eigenvalue 1
#this is the stationary distribution; take the absolute value
s = abs(eigenvectors$vectors[, round(eigenvectors$values, 3) == 1.00])

#normalize s
s = s/sum(s)

#should get a uniform distribution
round(s, 2)
## [1] 0.14 0.14 0.14 0.14 0.14 0.14 0.14




BH 11.8

(BH 11.8) Let \(G\) be an undirected network with nodes labeled \(1,2,...,M\) (edges from a node to itself are not allowed), where \(M \geq 2\) and random walk on this network is irreducible. Let \(d_j\) be the degree of node \(j\) for each \(j\). Create a Markov chain on the state space \(1,2,...,M\), with transitions as follows. From state \(i\), generate a proposal \(j\) by choosing a uniformly random \(j\) such that there is an edge between \(i\) and \(j\) in \(G\); then go to \(j\) with probability \(\min(d_i/d_j,1)\), and stay at \(i\) otherwise.

  1. Find the transition probability \(q_{ij}\) from \(i\) to \(j\) for this chain, for all states \(i,j\).
#replicate
set.seed(110)
sims = 1000

#define a simple value for the size of the state space
M = 5

#randomly connect nodes (draw an edge)
#choose a simple probability to connect nodes
p = 1/2

#set up a matrix to keep track of the edges
E = matrix(0, nrow = M, ncol = M)

#fill in the matrix
for(i in 1:(M - 1)){
  
  #randomly select if two nodes are connected
  E[i, (i + 1):M] = rbinom(M - i, 1, p)
  
  #can't have a row with 0's, so add a 1 if necessary
  if(sum(E[i, (i + 1):M]) == 0){
    E[i, i + 1] = 1
  }
}

#fold the matrix over the diagonal
for(i in 1:M){
  for(j in 1:M){
    E[j, i] = E[i, j]
  }
}

#look at the edges we have
gradient = colorpanel(1000, low = "gray78", high = "red")

heat.110(E, main = "Edges", xlab = "State j", 
    ylab = "State i", dendrogram = 'none', col = gradient, 
    trace = 'none', Rowv = FALSE, Colv = FALSE, key = FALSE,
    lwid=c(.65,3), lhei=c(1,3.25), margins = c(3.5, 5))

#calculate the degree matrix
d = rowSums(E)

#find the transition matrix; first set a path
Q = matrix(0, nrow = M, ncol = M)

#fill in the transition matrix for i != j
for(i in 1:M){
  
  for(j in 1:M){
    
    if(i != j){
      #find the probability of going from i to j
      #first, see if i and j are connected
      if(E[i, j] == 1){
        #generate the probability
        Q[i, j] = (1/d[i])*min(d[i]/d[j], 1)
      }
    }
  }
}

#round Q so it is easier to look at 
Q = round(Q, 2)

#fill in the probability of going from i to i
for(i in 1:M){
  Q[i, i] = 1 - sum(Q[i, ])
}

#plot Q, should match the analytical solutions
#part b. offers better proof that this is the correct Q
gradient = colorpanel(1000, low = "gray78", high = "red")

heat.110(Q, main = "Transition Matrix", xlab = "State j", 
    ylab = "State i", dendrogram = 'none', col = gradient, 
    trace = 'none', Rowv = FALSE, Colv = FALSE, key = FALSE,
    lwid=c(.65,3), lhei=c(1,3.25), margins = c(3.5, 5))

  2. Find the stationary distribution of this chain.
#take the eigenvectors of the transpose of Q
eigenvectors = eigen(t(Q))

#select the vector that corresponds to eigenvalue 1
#this is the stationary distribution; take the absolute value
s = eigenvectors$vectors[, round(eigenvectors$values, 3) == 1.00]

#normalize s
s = s/sum(s)

#should get a uniform distribution
s
## [1] 0.2 0.2 0.2 0.2 0.2
#reverse the colors so we can see (should get uniformity)
gradient = colorpanel(1000, low = "red", high = "gray78")

#round so we can see (we don't care about small differences)
heat.110(round(Q%^%100, 2), main = "Stationary Distribution", xlab = "State j", 
    ylab = "State i", dendrogram = 'none', col = gradient, 
    trace = 'none', Rowv = FALSE, Colv = FALSE, key = FALSE, 
    lwid=c(.65,3), lhei=c(1,3.25), margins = c(3.5, 5))




BH 11.9

Consider a Markov chain on the state space \(\{1,2,...,7\}\) with the states arranged in a “circle” as shown below, and transitions given by moving one step clockwise or counterclockwise with equal probabilities. For example, from state 6, the chain moves to state 7 or state 5 with probability \(1/2\) each; from state 7, the chain moves to state 1 or state 6 with probability \(1/2\) each. The chain starts at state 1.

  1. Find the stationary distribution of this chain.

#replicate
set.seed(110)
sims = 1000

#define Q
Q = matrix(c(0, 1/2, 0, 0, 0, 0, 1/2,
             1/2, 0, 1/2, 0, 0, 0, 0,
             0, 1/2, 0, 1/2, 0, 0, 0,
             0, 0, 1/2, 0, 1/2, 0, 0,
             0, 0, 0, 1/2, 0, 1/2, 0,
             0, 0, 0, 0, 1/2, 0, 1/2,
             1/2, 0, 0, 0, 0, 1/2, 0), 
           nrow = 7, ncol = 7)

gradient = colorpanel(1000, low = "gray78", high = "red")

#view Q
heat.110(Q, main = "Transition Matrix", xlab = "State j", 
    ylab = "State i", dendrogram = 'none', col = gradient, 
    trace = 'none', Rowv = FALSE, Colv = FALSE, key = FALSE,
    lwid=c(.65,3), lhei=c(1,3.25), margins = c(3.5, 5))

#take the eigenvectors of the transpose of Q
eigenvectors = eigen(t(Q))

#select the vector that corresponds to eigenvalue 1
#this is the stationary distribution; take the absolute value
s = eigenvectors$vectors[, round(eigenvectors$values, 3) == 1.00]

#normalize s
s = s/sum(s)

#should get a uniform distribution
s
## [1] 0.1428571 0.1428571 0.1428571 0.1428571 0.1428571 0.1428571 0.1428571
#plot the stationary distribution
#reverse the colors so we can see (should get uniformity)
gradient = colorpanel(1000, low = "red", high = "gray78")

#view Q
heat.110(round(Q%^%100, 2), main = "Stationary Distribution", xlab = "State j", 
    ylab = "State i", dendrogram = 'none', col = gradient, 
    trace = 'none', Rowv = FALSE, Colv = FALSE, key = FALSE,
    lwid=c(.65,3), lhei=c(1,3.25), margins = c(3.5, 5))

  2. Consider a new chain obtained by “unfolding the circle”. Now the states are arranged as shown below. From state 1 the chain always goes to state 2, and from state 7 the chain always goes to state 6. Find the new stationary distribution.
#define Q
Q = matrix(c(0, 1, 0, 0, 0, 0, 0,
             1/2, 0, 1/2, 0, 0, 0, 0,
             0, 1/2, 0, 1/2, 0, 0, 0,
             0, 0, 1/2, 0, 1/2, 0, 0,
             0, 0, 0, 1/2, 0, 1/2, 0,
             0, 0, 0, 0, 1/2, 0, 1/2,
             0, 0, 0, 0, 0, 1, 0),
             nrow = 7, ncol = 7, byrow = TRUE)

gradient = colorpanel(1000, low = "gray78", high = "red")

#view Q
heat.110(Q, main = "Transition Matrix", xlab = "State j", 
    ylab = "State i", dendrogram = 'none', col = gradient, 
    trace = 'none', Rowv = FALSE, Colv = FALSE, key = FALSE,
    lwid=c(.65,3), lhei=c(1,3.25), margins = c(3.5, 5))

#take the eigenvectors of the transpose of Q
eigenvectors = eigen(t(Q))

#select the vector that corresponds to eigenvalue 1
#this is the stationary distribution; take the absolute value
s = eigenvectors$vectors[, round(eigenvectors$values, 3) == 1.00]

#normalize s
s = s/sum(s)

#should get 1/12(1, 2, 2, 2, 2, 2, 1)
round(s, 2)
## [1] 0.08 0.17 0.17 0.17 0.17 0.17 0.08
#run the chain, start at a random state
X = sample(1:7, 1)

#run the loop
for(i in 2:sims){
  
  #draw a new state
  X = c(X, sample(1:nrow(Q), 1, prob = Q[X[i - 1], ]))
}

#plot the proportion of time spent in each state
plot(table(X)/sims, 
     ylab = "Proportion of time in state", xlab = "State",
     main = "Convergence of X to s (start at 0)",
     lwd = 3, col = "black")

#plot 1/12 and 2/12, the two values in the stationary distribution
abline(h = 1/12)
abline(h = 2/12)




BH 11.10

(BH 11.10) Let \(X_n\) be the price of a certain stock at the start of the \(n\)th day, and assume that \(X_0, X_1, X_2, ...\) follows a Markov chain with transition matrix \(Q\). (Assume for simplicity that the stock price can never go below 0 or above a certain upper bound, and that it is always rounded to the nearest dollar.)

  1. A lazy investor only looks at the stock once a year, observing the values on days \(0, 365, 2 \cdot 365, 3 \cdot 365, \dots\). So the investor observes \(Y_0, Y_1, \dots,\) where \(Y_n\) is the price after \(n\) years (which is \(365n\) days; you can ignore leap years). Is \(Y_0,Y_1, \dots\) also a Markov chain? Explain why or why not; if so, what is its transition matrix?

  2. The stock price is always an integer between $0 and $28. From each day to the next, the stock goes up or down by $1 or $2, all with equal probabilities (except for days when the stock is at or near a boundary, i.e., at $0, $1, $27, or $28).

If the stock is at $0, it goes up to $1 or $2 on the next day (after receiving government bailout money). If the stock is at $28, it goes down to $27 or $26 the next day. If the stock is at $1, it either goes up to $2 or $3, or down to $0 (with equal probabilities); similarly, if the stock is at $27 it either goes up to $28, or down to $26 or $25. Find the stationary distribution of the chain.

#replicate
set.seed(110)
sims = 1000

#define the transition matrix
#define state 1 as $0, etc.
Q = matrix(0, nrow = 29, ncol = 29)

#fill in the matrix
Q[1, ] = c(0, 1/2, 1/2, rep(0, 26))
Q[2, ] = c(1/3, 0, 1/3, 1/3, rep(0, 25))
Q[28, ] = rev(Q[2, ])
Q[29, ] = rev(Q[1, ])

#run the loop
for(i in 3:27){
  
  #can go up 1 or 2, down 1 or 2 with equal probabilities
  Q[i, (i - 2):(i - 1)] = c(1/4, 1/4)
  Q[i, (i + 1):(i + 2)] = c(1/4, 1/4)
}

gradient = colorpanel(1000, low = "gray78", high = "red")

#view Q
heat.110(Q, main = "Transition Matrix", xlab = "State j", 
    ylab = "State i", dendrogram = 'none', col = gradient, 
    trace = 'none', Rowv = FALSE, Colv = FALSE, key = FALSE,
    lwid=c(.65,3), lhei=c(1,3.25), margins = c(3.5, 5))

#find s
#take the eigenvectors of the transpose of Q
eigenvectors = eigen(t(Q))

#select the vector that corresponds to eigenvalue 1
#this is the stationary distribution; take the absolute value
s = eigenvectors$vectors[, round(eigenvectors$values, 3) == 1.00]

#normalize s
s = s/sum(s)

#should get 1/110*c(2, 3, rep(4, 25), 3, 2)
round(s, 2)
##  [1] 0.02 0.03 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04
## [15] 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.03
## [29] 0.02
gradient = colorpanel(1000, low = "gray73", high = "red")

#plot s
heat.110(round(Q%^%1000, 2), main = "Stationary Distribution", xlab = "State j", 
    ylab = "State i", dendrogram = 'none', col = gradient, 
    trace = 'none', Rowv = FALSE, Colv = FALSE, key = FALSE,
    lwid=c(.65,3), lhei=c(1,3.25), margins = c(3.5, 5))

#run the chain, start at a random state
X = sample(1:nrow(Q), 1)

#increase number of sims
for(i in 2:(sims*5)){
  
  #draw a new state
  X = c(X, sample(1:nrow(Q), 1, prob = Q[X[i - 1], ]))
}

#plot the proportion of time spent in each state
plot(table(X)/length(X), ylab = "Proportion of time in state", xlab = "State",
     main = "Convergence of X to s (random start)",
     lwd = 3, col = "black")

#plot 2/110, 3/110 and 4/110, the three values in the stationary distribution
abline(h = 2/110)
abline(h = 3/110)
abline(h = 4/110)




BH 11.11

In chess, the king can move one square at a time in any direction (horizontally, vertically, or diagonally).

For example, in the diagram, from the current position the king can move to any of 8 possible squares. A king is wandering around on an otherwise empty \(8 \times 8\) chessboard, where for each move all possibilities are equally likely. Find the stationary distribution of this chain (of course, don’t list out a vector of length 64 explicitly! Classify the 64 squares into types and say what the stationary probability is for a square of each type).

#replicate
set.seed(110)
sims = 1000

#for simplicity, work with a 3x3 board
#   this board still has 3 'types' of squares: corner, edge, and regular
#   number the squares as if we were reading left to right and went down a row after each line
Q = matrix(c(0, 1/3, 0, 1/3, 1/3, 0, 0, 0, 0,
             1/5, 0, 1/5, 1/5, 1/5, 1/5, 0, 0, 0,
             0, 1/3, 0, 0, 1/3, 1/3, 0, 0, 0,
             1/5, 1/5, 0, 0, 1/5, 0, 1/5, 1/5, 0,
             1/8, 1/8, 1/8, 1/8, 0, 1/8, 1/8, 1/8, 1/8,
             0, 1/5, 1/5, 0, 1/5, 0, 0, 1/5, 1/5,
             0, 0, 0, 1/3, 1/3, 0, 0, 1/3, 0,
             0, 0, 0, 1/5, 1/5, 1/5, 1/5, 0, 1/5,
             0, 0, 0, 0, 1/3, 1/3, 0, 1/3, 0), 
             nrow = 9, ncol = 9, byrow = TRUE)

gradient = colorpanel(1000, low = "gray73", high = "red")

#view Q
heat.110(Q, main = "Transition Matrix", xlab = "State j", 
    ylab = "State i", dendrogram = 'none', col = gradient, 
    trace = 'none', Rowv = FALSE, Colv = FALSE, key = FALSE,
    lwid=c(.65,3), lhei=c(1,3.25), margins = c(3.5, 5))

#find s by raising Q to a power
s = round(Q%^%1000, 3)[1, ]

#transform s so we can interpret it as the board
s = matrix(s, nrow = 3, ncol = 3, byrow = TRUE)

#view s, on the board
heat.110(s, main = "Stationary Distribution (on the board)", xlab = "Bottom of Board", 
    ylab = "Side of Board", dendrogram = 'none', col = gradient, 
    trace = 'none', Rowv = FALSE, Colv = FALSE, key = FALSE,
    lwid=c(.65,3), lhei=c(1,3.25), margins = c(3.5, 5))

#unravel s
#there are length(Q[Q > 0]) = 40 total degrees, so should get 3/40, 5/40 and 8/40 in s
s = as.vector(s)
s
## [1] 0.075 0.125 0.075 0.125 0.200 0.125 0.075 0.125 0.075




BH 11.13

(BH 11.13) Find the stationary distribution of the Markov chain shown below, without using matrices. The number above each arrow is the corresponding transition probability.

#replicate
set.seed(110)
sims = 1000

#define the transition matrix
Q = matrix(c(0, 1, 0, 0, 0,
             1/2, 0, 1/2, 0, 0,
             0, 1/4, 5/12, 1/3, 0,
             0, 0, 1/6, 7/12, 1/4,
             0, 0, 0, 1/8, 7/8), nrow = 5, ncol = 5, byrow = TRUE)


gradient = colorpanel(1000, low = "gray73", high = "red")

#view Q
heat.110(Q, main = "Transition Matrix", xlab = "State j", 
    ylab = "State i", dendrogram = 'none', col = gradient, 
    trace = 'none', Rowv = FALSE, Colv = FALSE, key = FALSE,
    lwid=c(.65,3), lhei=c(1,3.25), margins = c(3.5, 5))

#find s
#take the eigenvectors of the transpose of Q
eigenvectors = eigen(t(Q))

#select the vector that corresponds to eigenvalue 1
#this is the stationary distribution; take the absolute value
s = eigenvectors$vectors[, round(eigenvectors$values, 3) == 1.00]

#normalize s
s = s/sum(s)

#should get 1/31*c(1, 2, 4, 8, 16)
round(s, 2)
## [1] 0.03 0.06 0.13 0.26 0.52
#plot s
heat.110(Q%^%100, main = "Stationary Distribution", xlab = "State j", 
    ylab = "State i", dendrogram = 'none', col = gradient, 
    trace = 'none', Rowv = FALSE, Colv = FALSE, key = FALSE,
    lwid=c(.65,3), lhei=c(1,3.25), margins = c(3.5, 5))




BH 11.17

A cat and a mouse move independently back and forth between two rooms. At each time step, the cat moves from the current room to the other room with probability 0.8. Starting from room 1, the mouse moves to room 2 with probability 0.3 (and remains otherwise). Starting from room 2, the mouse moves to room 1 with probability 0.6 (and remains otherwise).

  1. Find the stationary distributions of the cat chain and of the mouse chain.
#replicate
set.seed(110)
sims = 1000

#write the transition matrices
Q.cat = matrix(c(.2, .8,
                 .8, .2), nrow = 2, ncol = 2, byrow = TRUE)
Q.mouse = matrix(c(.7, .3,
                   .6, .4), nrow = 2, ncol = 2, byrow = TRUE)

#find the stationary distributions
#first the cat
eigenvectors = eigen(t(Q.cat))

#select the vector that corresponds to eigenvalue 1
#this is the stationary distribution; take the absolute value
s.cat = abs(eigenvectors$vectors[, round(eigenvectors$values, 3) == 1.00])

#normalize s, should get (1/2, 1/2)
s.cat = s.cat/sum(s.cat)

#now the mouse
eigenvectors = eigen(t(Q.mouse))

#select the vector that corresponds to eigenvalue 1
#this is the stationary distribution; take the absolute value
s.mouse = abs(eigenvectors$vectors[, round(eigenvectors$values, 3) == 1.00])

#normalize s, should get (2/3, 1/3)
s.mouse = s.mouse/sum(s.mouse)

#plot the cat chain and mouse chain
gradient = colorpanel(1000, low = "gray78", high = "red")

heat.110(Q.cat, main = "Cat", xlab = "Room at time t + 1", 
    ylab = "Room at time t", dendrogram = 'none', col = gradient, 
    trace = 'none', Rowv = FALSE, Colv = FALSE, key = FALSE,
    lwid=c(.65,3), lhei=c(1,3.25), margins = c(3.5, 5))

heat.110(Q.mouse, main = "Mouse", xlab = "Room at time t + 1", 
    ylab = "Room at time t", dendrogram = 'none', col = gradient, 
    trace = 'none', Rowv = FALSE, Colv = FALSE, key = FALSE,
    lwid=c(.65,3), lhei=c(1,3.25), margins = c(3.5, 5))

  1. Note that there are 4 possible (cat, mouse) states: both in room 1, cat in room 1 and mouse in room 2, cat in room 2 and mouse in room 1, and both in room 2. Number these cases \(1,2,3,4\), respectively, and let \(Z_n\) be the number representing the (cat, mouse) state at time \(n\). Is \(Z_0,Z_1,Z_2,...\) a Markov chain?

  2. Now suppose that the cat will eat the mouse if they are in the same room. We wish to know the expected time (number of steps taken) until the cat eats the mouse for two initial configurations: when the cat starts in room 1 and the mouse starts in room 2, and vice versa. Set up a system of two linear equations in two unknowns whose solution is the desired values.

#first, in the case where the cat starts in 1 and the mouse in 2

#count how many steps it takes to get in the same room
steps = rep(0, sims)

#run the loop
for(i in 1:sims){
  
  #initialize cat and mouse
  cat = 1
  mouse = 2
  
  #go until they are in the same room; then break the loop
  while(TRUE){
    
    #draw new values for cat and mouse
    cat = sample(1:2, 1, prob = Q.cat[cat, ])
    mouse = sample(1:2, 1, prob = Q.mouse[mouse, ])
    
    #increment our steps
    steps[i] = steps[i] + 1
    
    #see if they are in the same room
    if(cat == mouse){
      break
    }
  }
}

#should get 335/169 = 1.98
mean(steps)
## [1] 1.971
#now, the case where the cat starts in 2 and the mouse in 1

#count how many steps it takes to get in the same room
steps = rep(0, sims)

#run the loop
for(i in 1:sims){
  
  #initialize cat and mouse
  cat = 2
  mouse = 1
  
  #go until they are in the same room; then break the loop
  while(TRUE){
    
    #draw new values for cat and mouse
    cat = sample(1:2, 1, prob = Q.cat[cat, ])
    mouse = sample(1:2, 1, prob = Q.mouse[mouse, ])
    
    #increment our steps
    steps[i] = steps[i] + 1
    
    #see if they are in the same room; break if so
    if(cat == mouse){
      break
    }
  }
}

#should get 290/169 = 1.71
mean(steps)
## [1] 1.714