1.3 Families of Distributions

Above, we introduced the basic concepts of random variables and probability distributions. Now, let’s explore some common distributions.

1.3.1 Discrete probability distributions

Bernoulli distribution

Definition 1.1 A Bernoulli trial is a random experiment with exactly two possible outcomes: “success” and “failure.” These outcomes are typically denoted as 1 for success and 0 for failure. The probability of success is often denoted by \(p\), and the probability of failure is then given by \(q = 1 - p\).

The Bernoulli distribution is the discrete probability distribution of the outcome of a single Bernoulli trial with probability of success \(p\in(0,1)\).

We write \(X \sim \text{Ber}(p)\) if a random variable \(X\) follows a Bernoulli distribution. The probability mass function of the Bernoulli distribution is given by: \[ \begin{split} p_X(x) &= \begin{cases} p, & x = 1 \\ 1-p, & x = 0 \\ 0, & \text{otherwise} \end{cases} \\ &= p^x(1-p)^{1-x}, \quad x=0,1 \end{split} \] and the range is \(R_X=\{0,1\}\).

Proposition 1.1 \(\mathbb{E}(X)=p, \quad \mathrm{Var}(X)=p(1-p)\)
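Proposition 1.1 follows directly from the probability mass function. A short derivation, using only the definitions of expectation and variance, is:

\[ \begin{split} \mathbb{E}(X) &= 0\cdot(1-p) + 1\cdot p = p, \\ \mathbb{E}(X^2) &= 0^2\cdot(1-p) + 1^2\cdot p = p, \\ \mathrm{Var}(X) &= \mathbb{E}(X^2) - [\mathbb{E}(X)]^2 = p - p^2 = p(1-p). \end{split} \]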

Binomial distribution

The Binomial distribution describes the probability of getting exactly \(x\) successes in \(n\) independent Bernoulli trials, each with the same probability of success \(p\in(0,1)\).

We write \(X \sim \text{Bin}(n,p)\) if a random variable \(X\) follows a Binomial distribution. The probability mass function is given by: \[ p_X(x) = \binom{n}{x} p^x(1-p)^{n-x}, \quad x=0,1,\cdots,n, \] and the range is \(R_X=\{0,1,\cdots,n\}\).

Proposition 1.2

\(\mathbb{E}(X)=np, \quad \mathrm{Var}(X)=np(1-p)\)
\(\text{Ber}(p) = \text{Bin}(1,p)\), i.e., the Bernoulli distribution is the Binomial distribution with \(n=1\).
(additive property) Let \(X \sim \text{Bin}(n_1,p)\), \(Y \sim \text{Bin}(n_2,p)\) and \(X \perp\!\!\!\perp Y\). Then \(X+Y \sim \text{Bin}(n_1+n_2,p)\).
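The statements in Proposition 1.2 can be checked numerically. The following is a minimal sketch, not a proof: the helper names `binom_pmf` and `convolve` and the values of `n1`, `n2`, `p` are illustrative choices, not part of the text.

```python
from math import comb, isclose

def binom_pmf(x, n, p):
    """PMF of Bin(n, p) at x; zero outside {0, ..., n}."""
    if x < 0 or x > n:
        return 0.0
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def convolve(n1, n2, p):
    """PMF of X + Y for independent X ~ Bin(n1, p) and Y ~ Bin(n2, p)."""
    return [
        sum(binom_pmf(i, n1, p) * binom_pmf(s - i, n2, p) for i in range(s + 1))
        for s in range(n1 + n2 + 1)
    ]

n1, n2, p = 4, 6, 0.3  # illustrative values

# Mean and variance computed directly from the PMF.
pmf = [binom_pmf(x, n1, p) for x in range(n1 + 1)]
mean = sum(x * q for x, q in enumerate(pmf))
var = sum((x - mean) ** 2 * q for x, q in enumerate(pmf))
print(isclose(mean, n1 * p), isclose(var, n1 * p * (1 - p)))

# Additive property: the convolution should match Bin(n1 + n2, p).
sum_pmf = convolve(n1, n2, p)
direct = [binom_pmf(s, n1 + n2, p) for s in range(n1 + n2 + 1)]
print(all(isclose(a, b) for a, b in zip(sum_pmf, direct)))
```

Setting `n1 = 1` in the same code reproduces the Bernoulli mean and variance.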

Poisson distribution

Before introducing the Poisson distribution, we need to define the Poisson process.

Definition:

A Poisson process with rate \(\lambda > 0\) is a stochastic process that models a sequence of events occurring randomly over time or space. It is characterized by the following properties:

The probability that exactly 1 event occurs in a given interval of length \(h\) is equal to \(\lambda h + o(h)\).
The probability that 2 or more events occur in an interval of length \(h\) is equal to \(o(h)\).
For any integers \(n\), \(j_1, j_2, \cdots, j_n\) and any set of \(n\) non-overlapping intervals, if we define \(E_i\) to be the event that exactly \(j_i\) of the events under consideration occur in the \(i\)-th of these intervals, then events \(E_1,E_2,\cdots,E_n\) are independent.

Little \(o\) notation: \(o(h)\) stands for any function \(f(h)\) for which \(\displaystyle \lim_{h \to 0} \frac{f(h)}{h} = 0\).

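Under these properties, the number of events \(X\) that occur in an interval of unit length follows a Poisson distribution with parameter \(\lambda \in (0,\infty)\), whose probability mass function is \[ p_X(k) = \frac{\lambda^{k} e^{-\lambda}}{k!}, \quad k = 0,1,2,\cdots \]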
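One heuristic way to connect the three properties of the Poisson process to the probability mass function above: split a unit interval into \(n\) subintervals of length \(h = 1/n\). By the properties, each subinterval independently contains one event with probability approximately \(\lambda/n\) and more than one with negligible probability, so the total count is approximately \(\text{Bin}(n, \lambda/n)\). The sketch below, with the illustrative helper `max_gap` and the assumed value \(\lambda = 2\), shows the gap between the two mass functions shrinking as \(n\) grows.

```python
from math import comb, exp, factorial

def poisson_pmf(k, lam):
    """Poisson PMF with parameter lam at k = 0, 1, 2, ..."""
    return lam**k * exp(-lam) / factorial(k)

def binom_pmf(k, n, p):
    """Binomial PMF of Bin(n, p) at k."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def max_gap(n, lam, kmax=10):
    """Largest absolute difference between the two PMFs over k = 0, ..., kmax."""
    return max(
        abs(binom_pmf(k, n, lam / n) - poisson_pmf(k, lam)) for k in range(kmax + 1)
    )

lam = 2.0  # illustrative rate
for n in (10, 100, 10_000):
    print(n, max_gap(n, lam))
```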

Related distributions include the zero-inflated Poisson, zero-truncated Poisson, Conway–Maxwell–Poisson, and Skellam distributions.

Geometric distribution

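\(p_X(k) = (1-p)^{k-1}p, \quad k = 1,2,\cdots, \quad 0 < p \leq 1\), where \(X\) is the number of Bernoulli trials up to and including the first success.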
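As a quick numerical illustration of the geometric mass function above, the sketch below checks the tail probability \(P(X > k) = (1-p)^k\) and the resulting memoryless property \(P(X > m+n \mid X > m) = P(X > n)\). The helper names and the values of `p`, `m`, `n` are illustrative, and the infinite tail sum is truncated.

```python
from math import isclose

def geom_pmf(k, p):
    """Geometric PMF at k = 1, 2, ... (number of trials until the first success)."""
    return (1 - p) ** (k - 1) * p

def geom_sf(k, p, terms=2000):
    """P(X > k), approximated by summing the PMF over k+1, ..., k+terms."""
    return sum(geom_pmf(j, p) for j in range(k + 1, k + 1 + terms))

p, m, n = 0.25, 3, 5  # illustrative values
lhs = geom_sf(m + n, p) / geom_sf(m, p)  # P(X > m + n | X > m)
rhs = geom_sf(n, p)                      # P(X > n)
print(isclose(lhs, rhs), isclose(rhs, (1 - p) ** n))
```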

Negative binomial distribution

\(f(k;r,p)\equiv p_X(k) = \binom{k+r-1}{k} (1-p)^k p^r, \quad k = 0,1,2,\cdots\), where \(X\) is the number of failures before the \(r\)-th success.
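The negative binomial mass function can be sanity-checked in a few lines. This sketch assumes an integer \(r\) (so that `math.comb` applies) and uses illustrative parameter values. It confirms that the probabilities sum to one, that the mean is \(r(1-p)/p\), and that \(r = 1\) gives the geometric distribution counted in failures, \((1-p)^k p\).

```python
from math import comb, isclose

def nbinom_pmf(k, r, p):
    """P(k failures before the r-th success), for integer r >= 1."""
    return comb(k + r - 1, k) * (1 - p) ** k * p**r

r, p = 3, 0.4          # illustrative values
ks = range(500)        # truncation of the infinite support
total = sum(nbinom_pmf(k, r, p) for k in ks)
mean = sum(k * nbinom_pmf(k, r, p) for k in ks)
print(isclose(total, 1.0), isclose(mean, r * (1 - p) / p))

# r = 1: geometric distribution on the number of failures.
print(all(isclose(nbinom_pmf(k, 1, p), (1 - p) ** k * p) for k in range(20)))
```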

Hypergeometric distribution

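\(p_{X}(k) = \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}\), where \(N\) is the population size, \(K\) is the number of successes in the population, \(n\) is the number of draws without replacement, and \(k\) is the number of successes drawn.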
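A small worked example may help to read the hypergeometric formula. The scenario below (number of hearts in a 5-card hand from a 52-card deck) is an illustrative assumption, with \(N\) the population size, \(K\) the number of successes in the population and \(n\) the number of draws.

```python
from math import comb, isclose

def hypergeom_pmf(k, N, K, n):
    """P(k successes in n draws without replacement from N items, K of them successes)."""
    if k < max(0, n - (N - K)) or k > min(n, K):
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

N, K, n = 52, 13, 5  # deck size, hearts in the deck, hand size (illustrative)
pmf = [hypergeom_pmf(k, N, K, n) for k in range(n + 1)]
mean = sum(k * q for k, q in enumerate(pmf))
print(isclose(sum(pmf), 1.0), isclose(mean, n * K / N))
```

The mean \(nK/N\) has the same form as the Binomial mean \(np\) with \(p = K/N\).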

Zeta distribution

\(p_{X}(k) = \frac{1}{\zeta(s)}k^{-s}, \quad k = 1,2,\cdots\)

where \(\zeta(s)\) is the Riemann zeta function and \(s>1\).

1.3.2 Continuous probability distributions

Uniform distribution (discrete and continuous)

For the discrete uniform distribution on the integers \(a, a+1, \cdots, b\): \[ f(k) = \begin{cases} \frac{1}{b-a+1}, & \text{for } a \leq k \leq b, \quad k \in \mathbb{Z} \\ 0, & \text{otherwise} \end{cases} \] where \(a \leq b\) are integers.

For the continuous uniform distribution on \([a,b]\): \[ f(x) = \begin{cases} \frac{1}{b-a}, & \text{for } a \leq x \leq b \\ 0, & \text{otherwise} \end{cases} \] where \(a < b\) are real numbers.

(log) Normal distribution

\[ f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \]

where \(\mu\) is the mean and \(\sigma^2\) is the variance.

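If \(\ln X\) follows a Normal distribution with mean \(\mu\) and variance \(\sigma^2\), then \(X\) follows a log-normal distribution with density \[ f(x) = \frac{1}{x\sigma\sqrt{2\pi}} \exp\left(-\frac{(\ln x - \mu)^2}{2\sigma^2}\right), \quad x > 0. \]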
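The second density above is that of the log-normal distribution, i.e. of \(X = e^{Y}\) where \(Y\) is Normal with mean \(\mu\) and variance \(\sigma^2\). It can be obtained from the Normal density by a change of variables. Writing \(F_X, F_Y\) for the distribution functions and \(f_X, f_Y\) for the densities (notation introduced here for the derivation), for \(x > 0\):

\[ \begin{split} F_X(x) &= P(e^{Y} \leq x) = P(Y \leq \ln x) = F_Y(\ln x), \\ f_X(x) &= \frac{d}{dx} F_Y(\ln x) = f_Y(\ln x)\cdot\frac{1}{x} = \frac{1}{x\sigma\sqrt{2\pi}} \exp\left(-\frac{(\ln x - \mu)^2}{2\sigma^2}\right). \end{split} \]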

(shifted/double) Exponential distribution

\[ f(x) = \begin{cases} \lambda e^{-\lambda x} & \text{for } x \geq 0 \\ 0 & \text{otherwise} \end{cases} \]

where \(\lambda\) is the rate parameter.

\[ f(x) = \begin{cases} \lambda e^{-\lambda (x-\mu)} & \text{for } x \geq \mu \\ 0 & \text{otherwise} \end{cases} \]

where \(\lambda\) is the rate parameter and \(\mu\) is the shift parameter.
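The shifted exponential density above has distribution function \(F(x) = 1 - e^{-\lambda(x-\mu)}\) for \(x \geq \mu\), which can be inverted in closed form. As an illustration, the sketch below draws samples by inverse-transform sampling and compares the sample mean with \(\mu + 1/\lambda\). The function name, seed, and parameter values are assumptions made for this example only.

```python
import random
from math import log

def sample_shifted_exponential(lam, mu, rng):
    """Draw from the density lam * exp(-lam * (x - mu)), x >= mu."""
    u = rng.random()                 # uniform on [0, 1)
    return mu - log(1.0 - u) / lam   # 1 - u lies in (0, 1], so the log is defined

rng = random.Random(0)               # fixed seed for reproducibility
lam, mu = 2.0, 1.0                   # illustrative rate and shift
draws = [sample_shifted_exponential(lam, mu, rng) for _ in range(100_000)]
sample_mean = sum(draws) / len(draws)
print(sample_mean, mu + 1 / lam)
```

Setting `mu = 0.0` gives the ordinary exponential distribution.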

Double Exponential Distribution (Laplace Distribution)

\[ f(x) = \frac{\lambda}{2} e^{-\lambda |x-\mu|} \]

where \(\lambda\) is the rate parameter and \(\mu\) is the location parameter.

Gamma distribution (Erlang distribution)

\[ f(x) = \frac{\lambda^k}{\Gamma(k)} x^{k-1} e^{-\lambda x} \] where \(k\) is the shape parameter and \(\lambda\) is the rate parameter.

Erlang Distribution

\[ f(x) = \frac{\lambda^k}{(k-1)!} x^{k-1} e^{-\lambda x} \]

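where \(k\) is a positive integer (the number of events) and \(\lambda\) is the rate parameter. The Erlang distribution is the Gamma distribution with integer shape parameter, since \(\Gamma(k) = (k-1)!\) for positive integers \(k\).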
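Since \(\Gamma(k) = (k-1)!\) for a positive integer \(k\), the Erlang density coincides with the Gamma density for integer shape, and for \(k = 1\) both reduce to the exponential density \(\lambda e^{-\lambda x}\). The sketch below checks this at a few illustrative points; the function names are chosen for this example.

```python
from math import exp, factorial, gamma, isclose

def gamma_pdf(x, k, lam):
    """Gamma density with shape k and rate lam, for x > 0."""
    return lam**k / gamma(k) * x ** (k - 1) * exp(-lam * x)

def erlang_pdf(x, k, lam):
    """Erlang density with integer shape k and rate lam, for x > 0."""
    return lam**k / factorial(k - 1) * x ** (k - 1) * exp(-lam * x)

def exponential_pdf(x, lam):
    """Exponential density with rate lam, for x >= 0."""
    return lam * exp(-lam * x)

lam = 1.5                # illustrative rate
xs = [0.2, 1.0, 3.0]     # illustrative evaluation points
same_as_gamma = all(
    isclose(gamma_pdf(x, k, lam), erlang_pdf(x, k, lam))
    for k in range(1, 7)
    for x in xs
)
k1_is_exponential = all(
    isclose(erlang_pdf(x, 1, lam), exponential_pdf(x, lam)) for x in xs
)
print(same_as_gamma, k1_is_exponential)
```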

Beta distribution

\[ f(x) = \frac{1}{B(\alpha, \beta)} x^{\alpha-1} (1-x)^{\beta-1} \]

where \(0 < x < 1\), \(\alpha > 0\) and \(\beta > 0\) are shape parameters, and \(B(\alpha,\beta)\) is the beta function.
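The beta function acts as the normalizing constant of the density and satisfies \(B(\alpha,\beta) = \Gamma(\alpha)\Gamma(\beta)/\Gamma(\alpha+\beta)\). As a rough numerical check (a sketch using a simple midpoint rule, with illustrative values \(\alpha = 2\), \(\beta = 3\)), the density integrates to one over \((0,1)\) and has mean \(\alpha/(\alpha+\beta)\).

```python
from math import gamma, isclose

def beta_fn(a, b):
    """Beta function via the Gamma function."""
    return gamma(a) * gamma(b) / gamma(a + b)

def beta_pdf(x, a, b):
    """Beta density on (0, 1) with shape parameters a and b."""
    return x ** (a - 1) * (1 - x) ** (b - 1) / beta_fn(a, b)

def midpoint_integral(f, lo, hi, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))

a, b = 2.0, 3.0  # illustrative shape parameters
total = midpoint_integral(lambda x: beta_pdf(x, a, b), 0.0, 1.0)
mean = midpoint_integral(lambda x: x * beta_pdf(x, a, b), 0.0, 1.0)
print(total, mean, a / (a + b))
```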

Weibull distribution (Rayleigh distribution)

\[ f(x) = \frac{k}{\lambda} \left(\frac{x}{\lambda}\right)^{k-1} \exp\left(-\left(\frac{x}{\lambda}\right)^k\right) \]

where \(k\) is the shape parameter and \(\lambda\) is the scale parameter.

Rayleigh Distribution

\[ f(x) = \frac{x}{\sigma^2} \exp\left(-\frac{x^2}{2\sigma^2}\right) \]

where \(\sigma\) is the scale parameter.
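The Rayleigh distribution is the special case of the Weibull distribution with shape \(k = 2\) and scale \(\lambda = \sigma\sqrt{2}\): substituting these values into the Weibull density gives \(\frac{x}{\sigma^2}\exp\left(-\frac{x^2}{2\sigma^2}\right)\). The following sketch confirms the identity numerically at a few illustrative points.

```python
from math import exp, isclose, sqrt

def weibull_pdf(x, k, lam):
    """Weibull density with shape k and scale lam, for x >= 0."""
    return (k / lam) * (x / lam) ** (k - 1) * exp(-((x / lam) ** k))

def rayleigh_pdf(x, sigma):
    """Rayleigh density with scale sigma, for x >= 0."""
    return (x / sigma**2) * exp(-(x**2) / (2 * sigma**2))

sigma = 1.7                        # illustrative scale
xs = [0.1, 0.5, 1.0, 2.5, 6.0]     # illustrative evaluation points
matches = all(
    isclose(weibull_pdf(x, 2, sigma * sqrt(2)), rayleigh_pdf(x, sigma)) for x in xs
)
print(matches)
```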

Pareto distribution

\[ f(x) = \begin{cases} \frac{\alpha x_m^\alpha}{x^{\alpha+1}} & \text{for } x \geq x_m \\ 0 & \text{otherwise} \end{cases} \]

where \(\alpha\) is the shape parameter and \(x_m\) is the scale parameter.

Logistic distribution

\[ f(x) = \frac{e^{-(x-\mu)/s}}{s\left(1+e^{-(x-\mu)/s}\right)^2} \]

where \(\mu\) is the location parameter and \(s\) is the scale parameter.

Cauchy distribution

\[ f(x) = \frac{1}{\pi \gamma} \left[ \frac{1}{1 + \left( \frac{x - x_0}{\gamma} \right)^2} \right] = \frac{1}{\pi} \left[ \frac{\gamma}{(x - x_0)^2 + \gamma^2} \right] \] where \(x_0\) is the location parameter (also the median and mode), \(\gamma\) is the scale parameter.

1.3.3 Distributions derived from the Normal distribution

Student’s \(t\)-distribution

\[ f(t) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi} \, \Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}} \]

where \(t\) is the value of the test statistic and \(\nu\) is the number of degrees of freedom (df).
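Two limiting cases help to place the \(t\) density: for \(\nu = 1\) it equals the standard Cauchy density \(\frac{1}{\pi(1+t^2)}\) (because \(\Gamma(1/2) = \sqrt{\pi}\)), and for large \(\nu\) it approaches the standard Normal density. The sketch below evaluates the formula above, using `lgamma` rather than `gamma` to avoid overflow for large \(\nu\); the function names and test points are illustrative.

```python
from math import exp, isclose, lgamma, pi, sqrt

def t_pdf(t, nu):
    """Student's t density with nu degrees of freedom."""
    log_c = lgamma((nu + 1) / 2) - lgamma(nu / 2)   # log of the Gamma ratio
    return exp(log_c) / sqrt(nu * pi) * (1 + t * t / nu) ** (-(nu + 1) / 2)

def cauchy_pdf(x, x0=0.0, g=1.0):
    """Cauchy density with location x0 and scale g."""
    return g / (pi * ((x - x0) ** 2 + g * g))

def normal_pdf(x):
    """Standard Normal density."""
    return exp(-x * x / 2) / sqrt(2 * pi)

ts = [-3.0, -0.5, 0.0, 1.0, 4.0]   # illustrative evaluation points
nu1_is_cauchy = all(isclose(t_pdf(t, 1), cauchy_pdf(t)) for t in ts)
near_normal = max(abs(t_pdf(t, 1000) - normal_pdf(t)) for t in ts)
print(nu1_is_cauchy, near_normal)
```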

(Doubly) non-central \(t\)-distribution

The PDF of the non-central \(t\)-distribution is more complex and is generally expressed in terms of hypergeometric functions.

\(\chi^2\) distribution

\[ f(x) = \frac{1}{2^{\nu/2} \Gamma(\nu/2)} x^{\nu/2 - 1} e^{-x/2} \]

where \(x\) is the chi-squared variable, \(\nu\) is the degrees of freedom.
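The link to the Normal distribution is the standard representation of a \(\chi^2\) variable with \(\nu\) degrees of freedom as a sum of squares of \(\nu\) independent standard Normal variables, which has mean \(\nu\) and variance \(2\nu\). This representation is not stated in the text above and is used here as an assumption for a small simulation; the seed, sample size and \(\nu = 5\) are illustrative.

```python
import random

def chi2_draw(nu, rng):
    """One chi-squared draw: sum of squares of nu independent N(0, 1) variables."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(nu))

rng = random.Random(1)   # fixed seed for reproducibility
nu = 5                   # illustrative degrees of freedom
draws = [chi2_draw(nu, rng) for _ in range(20_000)]

sample_mean = sum(draws) / len(draws)
sample_var = sum((d - sample_mean) ** 2 for d in draws) / (len(draws) - 1)
print(sample_mean, sample_var)   # should be close to nu and 2 * nu
```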

Non-central \(\chi^2\) distribution

\[ f(x) = \frac{1}{2} \left( \frac{x}{\lambda} \right)^{\frac{\nu}{4} - \frac{1}{2}} e^{-\frac{1}{2}(x+\lambda)} I_{\frac{\nu}{2}-1}(\sqrt{\lambda x}) \]

where \(x\) is the value of the non-central chi-squared variable, \(\nu\) is the degrees of freedom, \(\lambda\) is the non-centrality parameter, and \(I_{\nu/2-1}(\cdot)\) is the modified Bessel function of the first kind of order \(\nu/2-1\).

\(F\) distribution

\[ f(x) = \frac{\Gamma\left(\frac{\nu_1 + \nu_2}{2}\right)}{\Gamma\left(\frac{\nu_1}{2}\right) \Gamma\left(\frac{\nu_2}{2}\right)} \left(\frac{\nu_1}{\nu_2}\right)^{\frac{\nu_1}{2}} x^{\frac{\nu_1}{2} - 1} \left(1 + \frac{\nu_1}{\nu_2} x\right)^{-\frac{\nu_1 + \nu_2}{2}} \]

where \(x\) is the F statistic, \(\nu_1\) is the degrees of freedom for the numerator, \(\nu_2\) is the degrees of freedom for the denominator.
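The names "numerator" and "denominator" degrees of freedom come from the standard construction of an \(F\) variable as the ratio \(\frac{U/\nu_1}{V/\nu_2}\) of two independent \(\chi^2\) variables \(U\) and \(V\), each divided by its degrees of freedom. For \(\nu_2 > 2\) the mean is \(\nu_2/(\nu_2-2)\). The simulation below is a sketch under that assumption (the construction is not spelled out in the text above), with illustrative values \(\nu_1 = 5\), \(\nu_2 = 10\).

```python
import random

def chi2_draw(nu, rng):
    """One chi-squared draw: sum of squares of nu independent N(0, 1) variables."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(nu))

def f_draw(nu1, nu2, rng):
    """One F draw: ratio of two independent scaled chi-squared variables."""
    return (chi2_draw(nu1, rng) / nu1) / (chi2_draw(nu2, rng) / nu2)

rng = random.Random(2)    # fixed seed for reproducibility
nu1, nu2 = 5, 10          # illustrative degrees of freedom
f_draws = [f_draw(nu1, nu2, rng) for _ in range(50_000)]
f_mean = sum(f_draws) / len(f_draws)
print(f_mean, nu2 / (nu2 - 2))
```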

(Doubly) non-central \(F\)-distribution

Non-central F-distribution

The PDF of the non-central F-distribution is also quite complex and involves hypergeometric functions.

Doubly Non-central F-distribution

The doubly non-central F-distribution has two non-centrality parameters and its PDF is even more intricate.