8 Discrete distributions
This chapter presents some of the most useful parameterized discrete probability distributions.
8.1 Bernoulli distribution
The Bernoulli distribution is a two-point distribution in which the random variable can take only two values: \(0\) with probability \(1-p\) and \(1\) with probability \(p\). Thus, the distribution has just one parameter, \(p\). Determining the expected value and variance of the Bernoulli distribution is left as an exercise (see exercise 7.8).
8.2 Binomial distribution
The binomial distribution has two parameters: \(n\) and \(p\). A binomial random variable \(X\) can take integer values from \(0\) to \(n\). The binomial distribution describes the probability of achieving \(x\) successes in \(n\) independent and identical trials. The parameter \(p\) is the probability of success in a single trial, known as a Bernoulli trial, and it remains the same in each trial. For brevity, it is common to write \(q = 1 - p\) (\(q\) is the probability of failure).
Let us assume that \(X\) is a random variable following a binomial distribution with parameters \(n\) and \(p\). Then the probability that the number of successes in \(n\) trials equals \(x\) is given by the following formula:
\[\begin{equation} \mathbb{P}(X=x) = \textbf{p}(x)={n\choose x}p^x (1-p)^{n-x} \text{, for } x \in \{ 0, 1, 2, ..., n \} \tag{8.1} \end{equation}\]
The Bernoulli distribution (Section 8.1) is a binomial distribution with \(n=1\).
Example of a binomial distribution: we roll eight dice together; the variable \(X\) is the number of sixes and follows a binomial distribution with parameters \(n=8\) and \(p=1/6\).
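Formula (8.1) can be evaluated directly for the dice example. The following is a minimal sketch that uses only the Python standard library; the function name `binomial_pmf` is an illustrative choice and is not part of the templates in this chapter.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n trials, formula (8.1)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 8, 1 / 6  # eight dice, "success" means rolling a six

# Full probability distribution of the number of sixes
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]
for x, prob in enumerate(pmf):
    print(f"P(X = {x}) = {prob:.6f}")

# The probabilities over all possible values of X add up to 1
print(sum(pmf))
```

Calling `binomial_pmf(x, 1, p)` gives the Bernoulli case: it returns \(1-p\) for \(x=0\) and \(p\) for \(x=1\).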
A binomial random variable has the following properties:
An "experiment" consists of \(n\) trials (attempts).
There are only two possible outcomes for a single trial: "success" (S) or "failure" (F).
The probability of "success" \(p\) does not change from trial to trial. The probability of "failure", which we often denote by the letter \(q\) (\(q=1-p\)), of course does not change either.
Trials (attempts) are independent.
The binomial random variable \(X\) is the number of successes in \(n\) trials.
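The properties listed above can be illustrated by simulation. The sketch below repeats the dice experiment many times and compares the observed frequencies with formula (8.1). The number of repetitions, the seed, and the helper name `run_experiment` are assumptions made for illustration only.

```python
import random
from math import comb

def run_experiment(n, p, rng):
    """One experiment: n independent trials, each a success with probability p."""
    return sum(1 for _ in range(n) if rng.random() < p)

n, p = 8, 1 / 6          # the dice example: number of sixes in eight rolls
repetitions = 100_000    # assumed number of repeated experiments
rng = random.Random(42)  # fixed seed so that the run is reproducible

# counts[x] = how many experiments ended with exactly x successes
counts = [0] * (n + 1)
for _ in range(repetitions):
    counts[run_experiment(n, p, rng)] += 1

for x in range(n + 1):
    empirical = counts[x] / repetitions
    exact = comb(n, x) * p**x * (1 - p)**(n - x)
    print(f"x = {x}: simulated {empirical:.4f}, formula (8.1) {exact:.4f}")
```

The simulated frequencies should be close to, but not exactly equal to, the values from the formula; the agreement improves as the number of repetitions grows.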
Expected value in a binomial distribution:
\[\begin{equation} \mu=np \tag{8.2} \end{equation}\]
Variance and standard deviation:
\[\begin{equation} \sigma^2= npq \tag{8.3} \end{equation}\]
\[\begin{equation} \sigma=\sqrt{npq} \tag{8.4} \end{equation}\]
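Formulas (8.2) to (8.4) can be checked numerically by computing the mean and variance from their definitions, that is, by summing over all possible values of \(X\). A minimal sketch for the dice example (\(n=8\), \(p=1/6\)), standard library only:

```python
from math import comb, sqrt, isclose

n, p = 8, 1 / 6
q = 1 - p

# Probability distribution from formula (8.1)
pmf = [comb(n, x) * p**x * q**(n - x) for x in range(n + 1)]

# Mean and variance from the definitions
mean = sum(x * prob for x, prob in enumerate(pmf))
variance = sum((x - mean) ** 2 * prob for x, prob in enumerate(pmf))

print(mean, n * p)                      # both approximately 1.3333
print(variance, n * p * q)              # both approximately 1.1111
print(sqrt(variance), sqrt(n * p * q))  # standard deviation, formula (8.4)
```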
8.3 Poisson distribution
The Poisson distribution is named after the French mathematician and physicist Siméon Denis Poisson (1781–1840).
It is used to describe the number of rare events that will occur in a given period (or a given length, area, volume, etc.). For the number of rare events to be Poisson distributed, the events must occur independently of each other, with a constant expected frequency in each equal interval.
The Poisson distribution is often called the distribution of rare events. The situations in which it is used are best explained with examples:
Number of computer system failures on a given day
Number of spare parts orders in a given month
Number of ships arriving at the quay within 12 hours
Number of delivery vehicles arriving at the warehouse per hour
Number of defects in a large roll of sheet metal 20 m long
Number of passengers arriving at the station within a 10-minute period on a weekday afternoon
Number of customers arriving at the checkout of a local grocery store during a given time period
If the random variable \(X\) follows a Poisson distribution with expected value \(\lambda\), then the probability that \(X=x\) can be obtained from the following formula:
\[\begin{equation} \mathbb{P}(X=x) = \textbf{p}(x)=\frac{\lambda^x e^{-\lambda}}{x!} \text{, for } x \in \{ 0, 1, 2, ... \} \tag{8.5} \end{equation}\]
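Formula (8.5) can be evaluated directly. The sketch below uses only the Python standard library and takes \(\lambda = 5/3\), the value used in the templates of Section 8.4; the function name `poisson_pmf` is an illustrative choice.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """Probability of exactly x events when the expected number is lam, formula (8.5)."""
    return lam**x * exp(-lam) / factorial(x)

lam = 5 / 3  # expected number of events; any positive value would do

for x in range(6):
    print(f"P(X = {x}) = {poisson_pmf(x, lam):.6f}")

# X has no upper bound, so P(X >= 2) is computed through the complement
p_at_least_2 = 1 - poisson_pmf(0, lam) - poisson_pmf(1, lam)
print(p_at_least_2)  # approximately 0.4963, as in the templates of Section 8.4
```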
The expected value and variance in a Poisson distribution are both equal to \(\lambda\).
\[\begin{equation} \mu= \sigma^2 = \lambda \tag{8.6} \end{equation}\]
The standard deviation is, as always, the square root of the variance:
\[\begin{equation} \sigma=\sqrt{\lambda} \tag{8.7} \end{equation}\]
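That the mean and the variance are both equal to \(\lambda\) can be checked numerically. Because \(X\) has infinitely many possible values, the sums below are truncated; the truncation point of 100 terms is an assumption that is more than sufficient for a small \(\lambda\) such as \(5/3\).

```python
from math import exp, sqrt

lam = 5 / 3
terms = 100  # truncation point (assumption: the tail beyond it is negligible here)

# Build P(X = x) iteratively: p(x) = p(x - 1) * lam / x, starting from p(0) = exp(-lam)
pmf = [exp(-lam)]
for x in range(1, terms):
    pmf.append(pmf[-1] * lam / x)

# Mean and variance from the definitions
mean = sum(x * prob for x, prob in enumerate(pmf))
variance = sum((x - mean) ** 2 * prob for x, prob in enumerate(pmf))

print(mean, variance, lam)        # all three approximately 1.6667
print(sqrt(variance), sqrt(lam))  # standard deviation, formula (8.7)
```

The recurrence \(p(x) = p(x-1)\,\lambda/x\) follows from formula (8.5) and avoids computing large factorials.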
8.4 Templates
Spreadsheets
Discrete distributions calculator — Google spreadsheet
Discrete distributions calculator — Excel template
R code
# Binomial distribution
n <- 18
p <- 0.6
from <- 12
to <- 14
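if (from > to) {
  # 'from' must not exceed 'to'
  print("!!! 'From' cannot be greater than 'to' !!!")
} else {
  # P(from <= X <= to) = F(to) - F(from - 1)
  result <- pbinom(to, n, p) - pbinom(from - 1, n, p)
  # use a separate name so that the parameter p is not overwritten
  label <- paste0("P(", from, " <= X <= ", to, ")")
  print(label)
  print(result)
}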
## [1] "P(12 <= X <= 14)"
## [1] 0.3414956
# Poisson
lambda <- 5/3
from <- 2
to <- Inf
if (from > to) {
  # 'from' must not exceed 'to'
  print("!!! 'From' cannot be greater than 'to' !!!")
} else {
  # P(from <= X <= to) = F(to) - F(from - 1); ppois(Inf, lambda) equals 1
  result <- ppois(to, lambda) - ppois(from - 1, lambda)
  label <- paste0("P(", from, " <= X <= ", to, ")")
  print(label)
  print(result)
}
## [1] "P(2 <= X <= Inf)"
## [1] 0.4963317
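# Hypergeometric distribution
N <- 49   # population size
r <- 6    # number of "success" items in the population
n <- 6    # sample size (drawn without replacement)
from <- 3
to <- 6
if (from > to) {
  # 'from' must not exceed 'to'
  print("!!! 'From' cannot be greater than 'to' !!!")
} else {
  # phyper(q, m, n, k): m = successes in population, n = failures in population, k = draws
  result <- phyper(to, r, N - r, n) - phyper(from - 1, r, N - r, n)
  label <- paste0("P(", from, " <= X <= ", to, ")")
  print(label)
  print(result)
}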
## [1] "P(3 <= X <= 6)"
## [1] 0.01863755
Python code
from scipy.stats import binom, poisson, hypergeom
# Binomial distribution
n = 18
p = 0.6
_from = 12
_to = 14
if _from > _to:
    print("!!! 'From' cannot be greater than 'to' !!!")
else:
    # P(from <= X <= to) = F(to) - F(from - 1)
    result = binom.cdf(_to, n, p) - binom.cdf(_from - 1, n, p)
    # use a separate name so that the parameter p is not overwritten
    label = f"P({_from} <= X <= {_to})"
    print(label)
    print(result)
## P(12 <= X <= 14)
## 0.34149556326865305
# Poisson distribution
lambda_val = 5/3
from_val = 2
to_val = float('inf')
if from_val > to_val:
    print("!!! 'From' cannot be greater than 'to' !!!")
else:
    # P(from <= X <= to) = F(to) - F(from - 1); the cdf at infinity equals 1
    result = poisson.cdf(to_val, lambda_val) - poisson.cdf(from_val - 1, lambda_val)
    label = f"P({from_val} <= X <= {to_val})"
    print(label)
    print(result)
## P(2 <= X <= inf)
## 0.49633172576650164
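# Hypergeometric distribution
N = 49  # population size
r = 6   # number of "success" items in the population
n = 6   # sample size (drawn without replacement)
_from = 3
_to = 6
if _from > _to:
    print("!!! 'From' cannot be greater than 'to' !!!")
else:
    # SciPy argument order: (k, population size, successes in population, sample size)
    result = hypergeom.cdf(_to, N, r, n) - hypergeom.cdf(_from - 1, N, r, n)
    label = f"P({_from} <= X <= {_to})"
    print(label)
    print(result)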
## P(3 <= X <= 6)
## 0.018637545002022304
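All of the snippets above follow the same pattern: \(\mathbb{P}(a \le X \le b) = F(b) - F(a-1)\), where \(F\) is the cumulative distribution function of an integer-valued variable. As a sketch only, this pattern can be wrapped in a small helper. The names `interval_probability` and `binomial_cdf` are hypothetical, and the binomial cdf is written with the standard library so that the example runs without SciPy.

```python
from math import comb

def interval_probability(cdf, lower, upper):
    """Return P(lower <= X <= upper) for an integer-valued X with the given cdf."""
    if lower > upper:
        raise ValueError("'lower' cannot be greater than 'upper'")
    return cdf(upper) - cdf(lower - 1)

def binomial_cdf(n, p):
    """Build a function k -> P(X <= k) for a binomial variable with parameters n and p."""
    def cdf(k):
        if k < 0:
            return 0.0
        last = int(min(k, n))  # values above n (including infinity) add no probability
        return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(last + 1))
    return cdf

# Same case as the binomial template: n = 18, p = 0.6, P(12 <= X <= 14)
result = interval_probability(binomial_cdf(18, 0.6), 12, 14)
print(result)  # approximately 0.3415, matching the template output above
```

With SciPy available, any cdf can be passed in the same way, for example `interval_probability(lambda k: binom.cdf(k, 18, 0.6), 12, 14)`. Raising an exception instead of printing a message is a design choice made here so that an invalid range cannot go unnoticed.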
8.5 Questions
Question 8.1 Among the eleven cars in the parking lot, four are defective. We randomly select five cars. Let F be the number of defective cars chosen. Is F a binomial random variable?
Question 8.2 A salesperson has observed that, in the long run, one out of three phone sales offers is successful (ends in a transaction). She plans to make ten sales calls. Let X be the number of successful transactions. Is X a binomial random variable?
8.6 Exercises
Exercise 8.1 A salesman goes from house to house in a residential area to demonstrate new home appliances to potential customers. The probability that a potential customer will place an order for the product after the demonstration is a constant 0.2. To perform the job satisfactorily, the salesman needs at least four orders. If the salesman gives 12 demonstrations, what is the probability that he will receive exactly 4 orders? At least 4 orders?
Exercise 8.2 (Based on McClave and Sincich 2012) The Healthy Water Foundation found that 25% of bottled water sold in stores is actually tap water poured into bottles. Let's say 5 bottles from different stores and brands are drawn. Let X be the number of bottles containing tap water.
Explain why X can be approximated by a binomially distributed variable.
Give the probability distribution in the form of a formula for this case.
Find \(\mathbb{P}(X = 2)\) and \(\mathbb{P}(X \leq 1)\).
Exercise 8.3 Let's assume that the email server at Gdańsk University of Technology crashes an average of 0.91 times in a given semester. We assume that the number of crashes is Poisson distributed (crashes are independent, intensity remains constant).
What is the probability that there will be no crashes in a given semester?
What is the probability that the server crashes at least twice in a semester?
Exercise 8.4 Suppose we are interested in the occurrence of serious defects in a road one month after the laying of asphalt. We will assume that the probability of a serious defect is the same for any two sections of the road of equal length and that the occurrence (or non-occurrence) of a defect on any section is independent of the occurrence (or non-occurrence) of a defect on any other section. We learn that serious defects appear one month after asphalting at a rate of 2 defects/kilometer. What is the probability that there will be no defects on a three-kilometer section?
Exercise 8.5 Customers come to the store randomly, but with constant intensity throughout the working day (8:00-20:00). During the 12 hours of the store's operation, about 240 customers usually come. What is the probability that on the next working Tuesday between 10:00 and 10:05 no one comes?
Exercise 8.6 We toss a balanced coin 100 times.
What is the probability of getting 47 to 53 heads?
What is the probability of getting fewer than 40 heads or more than 60 heads?
Exercise 8.7 Ten times, we randomly select a spot on a planet that is 71% covered in water.
What is the probability that we will select water 7 out of 10 times?
What is the probability that we will select water 10 out of 10 times?
Exercise 8.8 A lady claimed during an afternoon meeting with friends that she could tell with 90% accuracy whether milk was poured into a cup before or after tea was poured. The participants decided to conduct an experiment: hidden from the lady, they would toss a coin and, depending on the result, pour the milk first or the tea first, and then give it to the lady, whose task was to recognize the method of preparation.
If the accuracy was indeed 90% in each trial, what was the probability that the lady would correctly guess the method of preparation at least 17 times in 20 trials?
What is the probability of at least 17 successes in 20 trials if the accuracy was completely random (50%)?
Exercise 8.9 During World War II, the German air force bombed London. British authorities collected reports on the locations of individual bomb hits. For this purpose, London was divided into 576 square areas of 0.25 km² each. Let's assume that the bombing was random (individual hits were not in any way dependent on each other), was not concentrated on any one point (the probability was the same everywhere), and the intensity of the bombing throughout the period was 3.73 bombs per km².
Given the above assumptions, what was the probability that more than 5 bombs would hit a given square (with an area of 0.25 km²)?
What was the probability that at least one of the 576 squares would be hit by more than 5 bombs?
Exercise 8.10 In a certain computer system, break-in attempts occur quite often: from Monday to Friday with an average frequency of 3 times per hour, on weekends (Saturday and Sunday) with an average intensity of 8 times per hour.
We randomly select one hour in a certain week. What is the probability that during this hour there will be no attempt to break into the system?
On a random day, an employee notices that for half an hour from the moment he showed up at work, there has been no break-in attempt. What is the probability that it is the weekend?
We assume that each hour and each day is equally likely to be selected, that break-in attempts are independent of each other (the occurrence of one attempt does not change the probability of an attempt at any other time), and that the intensity is constant within a given 24-hour period.