2  Working with Probabilities

Example 2.1 The probability that a randomly selected U.S. household has a pet dog is 0.47. The probability that a randomly selected U.S. household has a pet cat is 0.25. (These values are based on the 2018 General Social Survey (GSS).)

  1. Represent the information provided using proper symbols.




  2. Donny Don’t says: “the probability that a randomly selected U.S. household has a pet dog OR a pet cat is \(0.47 + 0.25=0.72\).” Do you agree? What must be true for Donny to be correct? Explain. (Hint: for the remaining parts it helps to consider two-way tables.)





  3. What is the smallest possible value of the probability that a randomly selected U.S. household has a pet dog AND a pet cat? Describe the (unrealistic) situation in which this extreme case would occur.




  4. What is the largest possible value of the probability that a randomly selected U.S. household has a pet dog AND a pet cat? Describe the (unrealistic) situation in which this extreme case would occur. What would be the probability that a randomly selected U.S. household has a pet dog OR a pet cat in this scenario?




  5. Donny Don’t says: “I remember hearing once that in probability OR means add and AND means multiply. So the probability that a randomly selected U.S. household has a pet dog AND a pet cat is \(0.47 \times 0.25=0.1175\).” Do you agree? Explain.




  6. According to the GSS, the probability that a randomly selected U.S. household has a pet dog AND a pet cat is \(0.15\). Compute the probability that a randomly selected U.S. household has a pet dog OR a pet cat.




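  7. Compute and interpret \(\text{P}(C \cap D^c)\), where \(C\) is the event that the household has a pet cat and \(D\) is the event that it has a pet dog.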
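The arithmetic in these parts can be organized as a two-way table. The following R sketch (the variable names are illustrative, not from the original) computes the range of values for \(\text{P}(C \cap D)\) allowed by the marginal probabilities alone. It then fills in the table using the GSS value \(\text{P}(C \cap D) = 0.15\) from part 6, applies the addition rule, and compares the joint probability with the product that Donny Don't proposes.

```r
# Probabilities from Example 2.1 (2018 GSS)
p_dog = 0.47          # P(D): household has a pet dog
p_cat = 0.25          # P(C): household has a pet cat
p_dog_and_cat = 0.15  # P(C and D), the value given in part 6

# Range of P(C and D) consistent with the marginals alone (parts 3 and 4)
lower = max(0, p_dog + p_cat - 1)
upper = min(p_dog, p_cat)

# Addition rule: P(C or D) = P(C) + P(D) - P(C and D)
p_dog_or_cat = p_dog + p_cat - p_dog_and_cat

# Cat but no dog: P(C and D^c) = P(C) - P(C and D)
p_cat_not_dog = p_cat - p_dog_and_cat

# Two-way table of joint probabilities (rows: cat status, columns: dog status)
two_way = matrix(
  c(p_dog_and_cat,         p_cat_not_dog,
    p_dog - p_dog_and_cat, 1 - p_dog_or_cat),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("Cat", "No cat"), c("Dog", "No dog"))
)
two_way
rowSums(two_way)  # recovers P(C) and P(C^c)
colSums(two_way)  # recovers P(D) and P(D^c)

# "AND means multiply" would give P(C) * P(D), which is not 0.15
p_dog * p_cat
```

Filling in the table cell by cell makes it clear that each interior entry is determined once the marginals and one joint probability are known.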




2.1 Equally Likely Outcomes and Uniform Probability Measures

  • For a sample space \(\Omega\) with finitely many possible outcomes, assuming equally likely outcomes corresponds to a probability measure \(\text{P}\) which satisfies \[ \text{P}(A) = \frac{|A|}{|\Omega|} = \frac{\text{number of outcomes in $A$}}{\text{number of outcomes in $\Omega$}} \qquad{\text{when outcomes are equally likely}} \]
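In R, the counting formula can be sketched by listing the outcomes of \(\Omega\) in a vector and dividing the number of outcomes in the event by the total number of outcomes. The helper name `p_equally_likely` is hypothetical. The example event (an even result on one roll of a fair four-sided die) is chosen only for illustration.

```r
# Hypothetical helper: P(A) = |A| / |Omega| when outcomes are equally likely.
# `omega` lists every outcome once; `in_event` is a function that returns
# TRUE for the outcomes that belong to A.
p_equally_likely = function(omega, in_event) {
  sum(in_event(omega)) / length(omega)
}

# One roll of a fair four-sided die
omega = 1:4
p_even = p_equally_likely(omega, function(x) x %% 2 == 0)
p_even  # 2 outcomes (2 and 4) out of 4
```

The formula only applies when every listed outcome is equally likely. Listing an outcome twice, or listing outcomes with unequal chances, would make the ratio meaningless.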

Example 2.2 Roll a fair four-sided die twice, and record the result of each roll in sequence.

  1. How many possible outcomes are there? Are they equally likely?




  2. Compute \(\text{P}(A)\), where \(A\) is the event that the sum of the two dice is 4.



  3. Compute \(\text{P}(B)\), where \(B\) is the event that the sum of the two dice is at most 3.




  4. Compute \(\text{P}(C)\), where \(C\) is the event that the larger of the two rolls (or the common roll if there is a tie) is 3.




  5. Compute and interpret \(\text{P}(A\cap C)\).




Table 2.1: Table representing the sample space of two rolls of a four-sided die. The outcomes marked "yes" comprise the event \(A\) that the sum is equal to 4.
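| First roll | Second roll | Sum is 4? |
|---|---|---|
| 1 | 1 | no |
| 1 | 2 | no |
| 1 | 3 | yes |
| 1 | 4 | no |
| 2 | 1 | no |
| 2 | 2 | yes |
| 2 | 3 | no |
| 2 | 4 | no |
| 3 | 1 | yes |
| 3 | 2 | no |
| 3 | 3 | no |
| 3 | 4 | no |
| 4 | 1 | no |
| 4 | 2 | no |
| 4 | 3 | no |
| 4 | 4 | no |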
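The sample space in Table 2.1 can also be enumerated in R with `expand.grid()`, which lists all 16 ordered pairs. Because the outcomes are equally likely, each probability in Example 2.2 is the proportion of rows that satisfy the event. This is a sketch; the column names `roll1` and `roll2` are our own.

```r
# All 16 equally likely (first roll, second roll) pairs for a fair four-sided die
rolls = expand.grid(roll1 = 1:4, roll2 = 1:4)
nrow(rolls)

sum_rolls = rolls$roll1 + rolls$roll2
max_roll  = pmax(rolls$roll1, rolls$roll2)  # larger roll (common roll if a tie)

# Proportion of outcomes in each event = |event| / |Omega|
p_A  = mean(sum_rolls == 4)                  # A: sum is 4
p_B  = mean(sum_rolls <= 3)                  # B: sum is at most 3
p_C  = mean(max_roll == 3)                   # C: larger roll is 3
p_AC = mean(sum_rolls == 4 & max_roll == 3)  # both A and C

# Multiply by 16 to see the number of outcomes in each event
c(p_A = p_A, p_B = p_B, p_C = p_C, p_AC = p_AC) * 16
```

Taking `mean()` of a logical vector counts the `TRUE` values and divides by the length, which is exactly the ratio \(|A|/|\Omega|\).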
  • The continuous analog of equally likely outcomes is a uniform probability measure. When the sample space is uncountable, size is measured continuously (length, area, volume) rather than discretely (counting). \[ \text{P}(A) = \frac{|A|}{|\Omega|} = \frac{\text{size of } A}{\text{size of } \Omega} \qquad \text{if $\text{P}$ is a uniform probability measure} \]
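As a one-dimensional illustration, suppose (as in Example 2.3) that an arrival time is uniform between 0 and 60 minutes after noon. The event of arriving before 12:15 is an interval of length 15, so under a uniform probability measure its probability is 15/60. The R sketch below compares this length ratio with a simulated relative frequency; the seed and the number of repetitions are arbitrary choices.

```r
# Uniform probability measure on [0, 60]: P(A) = length(A) / length(Omega)
omega = c(0, 60)  # endpoints of the sample space (minutes after noon)
A     = c(0, 15)  # event: arrive before 12:15

p_exact = diff(A) / diff(omega)
p_exact

# Check by simulation: proportion of uniform draws that land in A
set.seed(2024)
u = runif(100000, min = 0, max = 60)
p_sim = mean(u < 15)
p_sim
```

Here "size" is length; in two dimensions, as in Example 2.3, the same ratio uses area.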

Example 2.3 Regina and Cady are meeting for lunch. Suppose they each arrive uniformly at random at a time between noon and 1:00, independently of each other. Record their arrival times as minutes after noon, so noon corresponds to 0 and 1:00 to 60.

  1. Draw a picture representing the sample space.



  2. Compute the probability that the first person to arrive has to wait at most 15 minutes for the other person to arrive. In other words, compute the probability that they arrive within 15 minutes of each other.




  3. Compute the probability that the first person to arrive arrives before 12:15.




library(kableExtra)  # provides kbl() and kable_styling()

N_rep = 1000

# Simulate values uniformly between 0 and 60, independently
u1 = runif(N_rep, 0, 60)  # Regina's arrival time
u2 = runif(N_rep, 0, 60)  # Cady's arrival time

# waiting time: how long the first to arrive waits for the other
waiting_time = abs(u1 - u2)

# arrival time of whoever arrives first
first_arrival_time = pmin(u1, u2)

# put the variables together in a data frame
meeting_sim = data.frame(u1, u2, waiting_time, first_arrival_time)

# first few rows (with kable formatting)
head(meeting_sim) |>
  kbl(digits = 3) |>
  kable_styling()
u1 u2 waiting_time first_arrival_time
59.440 8.202 51.239 8.202
13.179 37.741 24.562 13.179
11.262 14.005 2.743 11.262
2.024 34.330 32.307 2.024
55.781 36.682 19.098 36.682
48.931 57.138 8.207 48.931
# Approximate probability that waiting time is less than 15
sum(waiting_time < 15) / N_rep
[1] 0.435
# Approximate probability that first arrival time is less than 15
sum(first_arrival_time < 15) / N_rep
[1] 0.438
# "Base R" plots
plot(u1, u2,
     col = ifelse(waiting_time < 15, "orange", "black"),
     xlab = "Regina's arrival time",
     ylab = "Cady's arrival time")
abline(a = 15, b = 1)
abline(a = -15, b = 1)
plot(u1, u2,
     col = ifelse(first_arrival_time < 15, "orange", "black"),
     xlab = "Regina's arrival time",
     ylab = "Cady's arrival time")

Simulated event that waiting time is less than 15

Simulated event that first arrival time is less than 15
library(ggplot2)

# ggplots
ggplot(meeting_sim,
       aes(x = u1, y = u2, col = (waiting_time < 15))) +
  geom_point() +
  geom_abline(slope = c(1, 1), intercept = c(15, -15)) +
  labs(x = "Regina's arrival time",
       y = "Cady's arrival time")
ggplot(meeting_sim,
       aes(x = u1, y = u2, col = (first_arrival_time < 15))) +
  geom_point() +
  labs(x = "Regina's arrival time",
       y = "Cady's arrival time")

Simulated event that waiting time is less than 15

Simulated event that first arrival time is less than 15
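Because the pair of arrival times is uniform over the 60-by-60 square, the exact answers to parts 2 and 3 of Example 2.3 are ratios of areas. The complement of "within 15 minutes of each other" consists of two right triangles with legs of length 45. The complement of "first arrival before 12:15" is the 45-by-45 square where both arrive at 12:15 or later. The R sketch below computes both ratios; each equals \(1 - (45/60)^2 = 0.4375\), consistent with the simulated values 0.435 and 0.438 shown above.

```r
# Sample space: the square [0, 60] x [0, 60], area 3600
area_omega = 60^2

# Part 2: |u1 - u2| <= 15.
# Complement = two right triangles (above y = x + 15 and below y = x - 15),
# each with legs of length 60 - 15 = 45.
area_far_apart = 2 * (1/2) * 45^2
p_within_15 = (area_omega - area_far_apart) / area_omega

# Part 3: min(u1, u2) < 15.
# Complement = both arrive at 15 or later: a 45-by-45 square.
area_both_late = 45^2
p_first_before_15 = (area_omega - area_both_late) / area_omega

c(p_within_15 = p_within_15, p_first_before_15 = p_first_before_15)
```

The boundary lines themselves have zero area, so "at most 15" and "less than 15" give the same probability under a uniform measure.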