Chapter 2 Multivariate Normal Distributions

To date, we have studied a number of standard probability distributions: uniform, Bernoulli, binomial, geometric, negative binomial, Poisson, exponential, gamma, normal. These are all random variables that govern exactly one quantity. This raises the question: are there standard probability distributions in higher dimensions?

2.1 Definition of Multivariate Normal Distribution

The normal distribution generalises to higher dimensions. Let \(n \in \mathbb{Z}_{\geq 1}\) represent the dimension we are interested in.

Informally, a multivariate normal distribution can be defined as follows. Consider \(n\) independent standard normal random variables \(Z_1, Z_2, \ldots, Z_n\), that is, \(Z_i \sim N(0,1)\) for all \(i\) with \(1 \leq i \leq n\). From these, form a vector of random variables \(\mathbf{Z} = \begin{pmatrix} Z_1 & Z_2 & \cdots & Z_n \end{pmatrix}^{T}\). Then for any \(n \times n\) non-singular matrix \(\mathbf{A}\) and any vector \(\mathbf{\mu} = \begin{pmatrix} \mu_1 & \mu_2 & \cdots & \mu_n \end{pmatrix}^T\), define \(\mathbf{X} = \mathbf{AZ}+\mathbf{\mu}\). The resulting vector of random variables \(\mathbf{X} = \begin{pmatrix} X_1 & X_2 & \cdots & X_n \end{pmatrix}^T\) follows a multivariate normal distribution.

This says that every component \(X_i\) is a linear combination of the same fixed collection of independent standard normal random variables, plus a translation.

Formally we define the multivariate normal distribution by the joint probability density function of \(X_1, X_2, \ldots, X_n\).

Definition 2.1.1. Let \(\mathbf{\mu} = \begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_n \end{pmatrix}\) be a vector in \(\mathbb{R}^n\) and let \(\mathbf{\Sigma} = (\sigma_{ij})\) be an \(n \times n\) real, symmetric, positive definite matrix; equivalently, a real symmetric matrix all of whose eigenvalues are positive.

A vector of random variables \(\mathbf{X} = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{pmatrix}\) is said to have an \(\mathit{n}\)-dimensional normal distribution with parameters \(\mathbf{\mu}\) and \(\mathbf{\Sigma}\), denoted \(N_n(\mathbf{\mu},\mathbf{\Sigma})\), if the joint PDF of \(\mathbf{X}\) is given by \[ f_{\mathbf{X}}(\mathbf{x}) = \frac{1}{\sqrt{(2\pi)^n\cdot \det(\mathbf{\Sigma})}} \exp \left\{ -\frac{1}{2} (\mathbf{x}-\mathbf{\mu})^T \mathbf{\Sigma}^{-1} (\mathbf{x}-\mathbf{\mu}) \right\}.\]

The PDF outlined in the formal Definition 2.1.1 can be derived from the informal construction of \(\mathbf{X}\) at the beginning of the section by taking \(\mathbf{\Sigma} = \mathbf{AA}^T\).
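To make this concrete, the following sketch (illustrative only; the matrix A, the vector mu and the sample size are arbitrary choices, not part of the text) simulates the informal construction \(\mathbf{X} = \mathbf{AZ} + \mathbf{\mu}\) in base R and checks that the sample covariance matrix of the simulated \(\mathbf{X}\) is close to \(\mathbf{AA}^T\).

set.seed(1)                                  # for reproducibility
n_sim <- 100000                              # number of simulated vectors
A     <- matrix(c(1, 0.5, 0, 2), nrow = 2)   # an arbitrary non-singular matrix
mu    <- c(1, -2)                            # an arbitrary translation vector

Z <- matrix(rnorm(2 * n_sim), nrow = 2)      # each column is a vector of iid N(0,1) draws
X <- A %*% Z + mu                            # columns of X are realisations of AZ + mu

rowMeans(X)                                  # approximately mu
cov(t(X))                                    # approximately A %*% t(A)
A %*% t(A)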

Note for \(n=1\), Definition 2.1.1 simplifies to give the PDF of the one-dimensional normal distribution.
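Explicitly, taking \(n=1\), \(\mathbf{\mu} = (\mu)\) and \(\mathbf{\Sigma} = (\sigma^2)\) in Definition 2.1.1 gives \[ f_X(x) = \frac{1}{\sqrt{2\pi \sigma^2}} \exp \left\{ -\frac{1}{2} (x-\mu)(\sigma^2)^{-1}(x-\mu) \right\} = \frac{1}{\sigma \sqrt{2\pi}} \exp \left\{ -\frac{(x-\mu)^2}{2\sigma^2} \right\}, \] which is the familiar PDF of the \(N(\mu,\sigma^2)\) distribution.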

The multivariate normal distribution has the following important properties:

  • The expectation of \(X_i\) is given by the \(i^{th}\) entry of \(\mathbf{\mu}\), that is, \(E[X_i]=\mu_i\) for all \(i\). Succinctly, the vector \(\mathbf{\mu}\) is the expectation vector of \(\mathbf{X}\);

  • The covariance \(\text{Cov}(X_i,X_j)\), reducing to \(\text{Var}(X_i)\) when \(i=j\), is given by the element in the \(i^{th}\) row and \(j^{th}\) column of \(\mathbf{\Sigma}\), that is, \(\sigma_{ij} = \text{Cov}(X_i,X_j)\) for all \(i\) and \(j\). Succinctly, the matrix \(\mathbf{\Sigma}\) is the variance-covariance matrix of \(\mathbf{X}\). It follows that if \(X_1,X_2,\dots,X_n\) are independent then \(\sigma_{ij}=0\) for all \(i \neq j\).

These two points indicate why \(\mathbf{\mu}\) and \(\mathbf{\Sigma}\) are the two input parameters for the multivariate normal distribution.

Proposition 2.1.2. Let \(\mathbf{x} \sim N_p \left( \mathbf{\mu}_{\mathbf{x}}, \mathbf{\Sigma}_{\mathbf{x}, \mathbf{x}} \right)\) and \(\mathbf{y} \sim N_q \left( \mathbf{\mu}_{\mathbf{y}}, \mathbf{\Sigma}_{\mathbf{y}, \mathbf{y}} \right)\) be jointly normal, that is, the stacked vector \(\begin{pmatrix} \mathbf{x} \\ \mathbf{y} \end{pmatrix}\) follows a \((p+q)\)-dimensional normal distribution. Then \(\mathbf{x}\) and \(\mathbf{y}\) are independent if and only if \(\text{Cov}(\mathbf{x}, \mathbf{y}) = \mathbf{0}_{p,q}\).

Clearly if \(\mathbf{x}\) and \(\mathbf{y}\) are independent then \(\text{Cov}(\mathbf{x}, \mathbf{y}) = \mathbf{0}_{p,q}\), since the cross-covariance of independent random vectors is always zero. It remains to show the other direction of the implication.

Assume \(\text{Cov}(\mathbf{x}, \mathbf{y}) = \mathbf{0}_{p,q}\). Then \[\mathbf{z} = \begin{pmatrix} \mathbf{x} \\ \mathbf{y} \end{pmatrix} \sim N_{p+q} (\mathbf{\mu}, \mathbf{\Sigma}), \qquad \text{where } \mathbf{\mu} = \begin{pmatrix} \mathbf{\mu}_{\mathbf{x}} \\ \mathbf{\mu}_{\mathbf{y}} \end{pmatrix} \text{ and } \mathbf{\Sigma} = \begin{pmatrix} \mathbf{\Sigma}_{\mathbf{x},\mathbf{x}} & \mathbf{0}_{p,q} \\ \mathbf{0}_{q,p} & \mathbf{\Sigma}_{\mathbf{y},\mathbf{y}} \end{pmatrix}.\] Since \(\mathbf{\Sigma}\) is block diagonal, \(\det(\mathbf{\Sigma}) = \det(\mathbf{\Sigma}_{\mathbf{x},\mathbf{x}}) \det(\mathbf{\Sigma}_{\mathbf{y},\mathbf{y}})\), the inverse \(\mathbf{\Sigma}^{-1}\) is block diagonal with blocks \(\mathbf{\Sigma}_{\mathbf{x},\mathbf{x}}^{-1}\) and \(\mathbf{\Sigma}_{\mathbf{y},\mathbf{y}}^{-1}\), and the quadratic form in the joint PDF splits into the sum of the two marginal quadratic forms. Now calculate that \[\begin{align*} f_{\mathbf{Y} \mid \mathbf{X}}(\mathbf{y} \mid \mathbf{x}) &= \frac{f_{\mathbf{X}, \mathbf{Y}}(\mathbf{x}, \mathbf{y})} {f_{\mathbf{X}} (\mathbf{x})} \\[9pt] &= \frac{\frac{1}{\sqrt{(2\pi)^{p+q} \det(\mathbf{\Sigma})}} \cdot \exp \left\{ -\frac{1}{2} (\mathbf{z}-\mathbf{\mu})^T \mathbf{\Sigma}^{-1} (\mathbf{z}-\mathbf{\mu}) \right\}}{\frac{1}{\sqrt{(2\pi)^p \det(\mathbf{\Sigma}_{\mathbf{x},\mathbf{x}})}} \cdot \exp \left\{ -\frac{1}{2} (\mathbf{x}-\mathbf{\mu}_{\mathbf{x}})^T \mathbf{\Sigma}_{\mathbf{x},\mathbf{x}}^{-1} (\mathbf{x}-\mathbf{\mu}_{\mathbf{x}}) \right\}} \\[12pt] &= \frac{\frac{1}{\sqrt{(2\pi)^{p+q} \det(\mathbf{\Sigma}_{\mathbf{x},\mathbf{x}}) \det(\mathbf{\Sigma}_{\mathbf{y},\mathbf{y}})}} \cdot \exp \left\{ -\frac{1}{2} (\mathbf{x}-\mathbf{\mu}_{\mathbf{x}})^T \mathbf{\Sigma}_{\mathbf{x},\mathbf{x}}^{-1} (\mathbf{x}-\mathbf{\mu}_{\mathbf{x}}) -\frac{1}{2} (\mathbf{y}-\mathbf{\mu}_{\mathbf{y}})^T \mathbf{\Sigma}_{\mathbf{y},\mathbf{y}}^{-1} (\mathbf{y}-\mathbf{\mu}_{\mathbf{y}}) \right\}}{\frac{1}{\sqrt{(2\pi)^p \det(\mathbf{\Sigma}_{\mathbf{x},\mathbf{x}})}} \cdot \exp \left\{ -\frac{1}{2} (\mathbf{x}-\mathbf{\mu}_{\mathbf{x}})^T \mathbf{\Sigma}_{\mathbf{x},\mathbf{x}}^{-1} (\mathbf{x}-\mathbf{\mu}_{\mathbf{x}}) \right\}} \\[12pt] &= \frac{1}{\sqrt{(2\pi)^{q} \det(\mathbf{\Sigma}_{\mathbf{y},\mathbf{y}})}} \cdot \exp \left\{ -\frac{1}{2} (\mathbf{y}-\mathbf{\mu}_{\mathbf{y}})^T \mathbf{\Sigma}_{\mathbf{y},\mathbf{y}}^{-1} (\mathbf{y}-\mathbf{\mu}_{\mathbf{y}}) \right\} \\[9pt] &= f_{\mathbf{Y}} (\mathbf{y}). \end{align*}\] Since the conditional PDF of \(\mathbf{Y}\) given \(\mathbf{X}\) equals the marginal PDF of \(\mathbf{Y}\), the vectors \(\mathbf{x}\) and \(\mathbf{y}\) are independent.

In practice, one often has observations that are assumed to come from a multivariate normal distribution whose parameters are not specified. The mean vector and the variance-covariance matrix can then be approximated using the usual sample statistics.

Exercise 2.1. Consider an independent and identically distributed sample \(\mathbf{x}_1, \ldots, \mathbf{x}_n \sim N_p (\mathbf{\mu}, \mathbf{\Sigma})\). Show that \(\bar{\mathbf{x}} \sim N_p \left( \mathbf{\mu}, \frac{1}{n} \mathbf{\Sigma} \right)\).

If \(\mathbf{x}_1, \ldots, \mathbf{x}_n\) is an independent and identically distributed random sample from \(N_p (\mathbf{\mu}, \mathbf{\Sigma})\), then the sample mean \(\bar{\mathbf{x}} = \frac{1}{n} \sum\limits_{i=1}^n \mathbf{x}_i\) and sample variance matrix \(\mathbf{S} = \frac{1}{n} \sum\limits_{i=1}^n \left( \mathbf{x}_i - \bar{\mathbf{x}} \right) \left( \mathbf{x}_i - \bar{\mathbf{x}} \right)^T\) are independent.

Since \(\mathbf{x}_1, \ldots, \mathbf{x}_n \sim N_p (\mathbf{\mu}, \mathbf{\Sigma})\) from Exercise 2.1, we know \(\bar{\mathbf{x}} \sim N_p \left( \mathbf{\mu}, \frac{1}{n} \mathbf{\Sigma} \right)\). Set \(\mathbf{y}_i = \mathbf{x}_i - \bar{\mathbf{x}}\). Then \[\begin{align*} \text{Cov} \left( \bar{\mathbf{x}}, \mathbf{y}_i \right) &= \text{Cov} \left( \bar{\mathbf{x}}, \mathbf{x}_i - \bar{\mathbf{x}} \right) \\[3pt] &= \text{Cov} \left( \bar{\mathbf{x}}, \mathbf{x}_i \right) - \text{Cov} \left( \bar{\mathbf{x}}, \bar{\mathbf{x}} \right) \\[3pt] &= \text{Cov} \left( \frac{1}{n} \sum\limits_{j=1}^n \mathbf{x}_j, \mathbf{x}_i \right) - \text{Cov} \left( \bar{\mathbf{x}}, \bar{\mathbf{x}} \right) \\[3pt] &= \frac{1}{n} \sum\limits_{j=1}^n \mathbb{E} \left[ (\mathbf{x}_j - \mathbf{\mu})(\mathbf{x}_i - \mathbf{\mu})^T \right] - \mathbb{E} \left[ \left(\bar{\mathbf{x}} - \mathbf{\mu} \right) \left( \bar{\mathbf{x}} - \mathbf{\mu} \right)^T \right] \\[3pt] &= \frac{1}{n} \mathbf{\Sigma} - \frac{1}{n} \mathbf{\Sigma} \\[3pt] &= \mathbf{0}_{p \times p}. \end{align*}\]

The penultimate equality above holds because the \(\mathbf{x}_j\) are independent, so \(\mathbb{E}\left[ (\mathbf{x}_j - \mathbf{\mu})(\mathbf{x}_i - \mathbf{\mu})^T \right]\) equals \(\mathbf{\Sigma}\) when \(j=i\) and \(\mathbf{0}_{p \times p}\) otherwise, while \(\mathbb{E}\left[ \left(\bar{\mathbf{x}} - \mathbf{\mu} \right) \left( \bar{\mathbf{x}} - \mathbf{\mu} \right)^T \right] = \frac{1}{n} \mathbf{\Sigma}\) by Exercise 2.1. Since \(\bar{\mathbf{x}}\) and \(\mathbf{y}_i\) are jointly normal (both are linear transformations of the same normal sample) and uncorrelated, they are independent by Proposition 2.1.2. Since \[\mathbf{S} = \frac{1}{n} \sum\limits_{i=1}^n \left( \mathbf{x}_i - \bar{\mathbf{x}} \right) \left( \mathbf{x}_i - \bar{\mathbf{x}} \right)^T = \frac{1}{n} \sum\limits_{i=1}^n \mathbf{y}_i \mathbf{y}_i^T,\] it follows that \(\bar{\mathbf{x}}\) and \(\mathbf{S}\) are independent.

2.2 Two-Dimensional Normal Distribution

In this section we study the \(2\)-dimensional normal distribution, often referred to as the bivariate normal distribution.

When \(n=2\), or another suitably small value, we can explicitly label the elements of \(\mathbf{\mu}\) and \(\mathbf{\Sigma}\) as \(\begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}\) and \(\begin{pmatrix} \sigma_1^2 & \sigma_1 \sigma_2 \rho \\ \sigma_1 \sigma_2 \rho & \sigma_2^2 \end{pmatrix}\) respectively. Here \(\sigma_1\) is the standard deviation of \(X_1\), \(\sigma_2\) is the standard deviation of \(X_2\) and \(\rho\) is the correlation coefficient \(\rho(X_1,X_2)\).

Why are the upper right and lower left entries of \(\mathbf{\Sigma}\) equal to \(\sigma_1 \sigma_2 \rho\)?

Show that \(\Sigma\) is non-singular if and only if \(\lvert \rho \rvert <1\) and \(\sigma_1\sigma_2 \neq 0\).

Substitution of these values into Definition 2.1.1 gives \[\begin{align*} f_{X_1,X_2}(x_1,x_2) = \frac{1}{2 \pi \sigma_1 \sigma_2 \sqrt{1-\rho^2}} \exp \Bigg\{ - \frac{1}{2(1-\rho^2)} \Bigg[ &\left( \frac{x_1-\mu_1}{\sigma_1} \right)^2 -2\rho \left( \frac{x_1-\mu_1}{\sigma_1} \right)\left( \frac{x_2-\mu_2}{\sigma_2} \right) \\ &+ \left( \frac{x_2-\mu_2}{\sigma_2} \right)^2 \Bigg] \Bigg\}. \end{align*}\]

The following code creates a contour plot of the PDF above with \(\mathbf{\mu} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}\) and \(\mathbf{\Sigma} = \begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix}\).

library(mnormt)

#create bivariate normal distribution
x     <- seq(-3, 3, 0.1) 
y     <- seq(-3, 3, 0.1)
mu    <- c(0, 0)
sigma <- matrix(c(2, -1, -1, 2), nrow=2)
f     <- function(x, y) dmnorm(cbind(x, y), mu, sigma)
z     <- outer(x, y, f)

#create contour plot
contour(x, y, z)

A sample of size \(200\) can be simulated from a bivariate normal distribution using the following R code.

library(plotly)
library(mixtools)

# Set up means, standard deviations, correlations
mu1 <- 1;  mu2 <- 2
sig1 <- 1;  sig2 <- 1;  rho <- 0;

# Construct mean vector and (co)variance matrix
mu <- c(mu1,mu2)
Sigma <- matrix(c(sig1^2,rho*sig1*sig2,rho*sig1*sig2,sig2^2), ncol=2)

#plot all realisations
X=rmvnorm(n=200, mu, Sigma)
plot(X)

In the above code, setting mu as a vector of length \(n\) and Sigma as an \(n \times n\) matrix before passing them to rmvnorm allows one to simulate from an \(n\)-dimensional normal distribution, as in the sketch below.
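For instance, the following sketch (the particular mean vector and covariance matrix are arbitrary illustrative choices) simulates \(500\) observations from a \(3\)-dimensional normal distribution with the same rmvnorm function and compares the sample statistics with the parameters.

library(mixtools)

# illustrative 3-dimensional parameters
mu3    <- c(0, 1, -1)
Sigma3 <- matrix(c(2, 1, 0,
                   1, 4, 0,
                   0, 0, 5), nrow = 3)

X3 <- rmvnorm(n = 500, mu3, Sigma3)   # 500 x 3 matrix of realisations
colMeans(X3)                          # approximately mu3
cov(X3)                               # approximately Sigma3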

2.3 Properties of Multivariate Normal Distribution

The following theorem states that a linear transformation of a multivariate normal distribution is itself a multivariate normal distribution.

Theorem 2.3.1. Let \(\mathbf{X} \sim N_n(\mathbf{\mu},\mathbf{\Sigma})\) and consider some \(p \times n\) matrix \(\mathbf{D}\) of full row rank \(p\), so that \(\mathbf{D} \mathbf{\Sigma} \mathbf{D}^T\) is positive definite. Define a new random vector \(\mathbf{Z} = \mathbf{D}\mathbf{X}\). Then \(\mathbf{Z} \sim N_p (\mathbf{D} \mathbf{\mu}, \mathbf{D} \mathbf{\Sigma} \mathbf{D}^T)\).

As a corollary of this theorem, we can easily calculate the marginal distributions of a multivariate normal distribution.

Corollary 2.3.2. The marginal distribution of each component \(X_i\) is a one-dimensional normal distribution; specifically, \(X_i \sim N(\mu_i, \sigma_{ii})\).

This is a direct consequence of Theorem 2.3.1 by setting \(\mathbf{D}=(0,\dots,0,1,0,\dots,0)\) where the \(1\) is in the \(i^{th}\) position.

Suppose \(\mathbf{X}=(X_1,X_2,X_3)^T \sim N_3(\mathbf{0},\mathbf{\Sigma})\), where \[\mathbf{\Sigma} = \begin{pmatrix} 2 & 1 & 0 \\ 1 & 4 & 0 \\ 0 & 0 & 5 \end{pmatrix}.\] Find the distribution of \(Z=X_1+X_2\).



Writing \(Z = X_1+X_2\) in the form \(\mathbf{DX}\) requires \(\mathbf{D} = \begin{pmatrix} 1 & 1 & 0 \end{pmatrix}\). By the properties of a multivariate normal distribution \[Z \sim N(\mathbf{D0}, \mathbf{D \Sigma D}^T).\] Calculate \(\mathbf{D0}=\mathbf{0}\) and \[\begin{align*} \mathbf{D} \mathbf{\Sigma} \mathbf{D}^T &= \begin{pmatrix} 1 & 1 & 0 \end{pmatrix} \begin{pmatrix} 2 & 1 & 0 \\ 1 & 4 & 0 \\ 0 & 0 & 5 \end{pmatrix} \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} \\[5pt] &= \begin{pmatrix} 3 & 5 & 0 \end{pmatrix} \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} \\[5pt] &= 8. \end{align*}\] Therefore, \(Z \sim N(0,8)\).
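As a quick sanity check, this result can be verified by simulation. The sketch below (illustrative only, reusing the rmvnorm function from Section 2.2) checks that \(Z = X_1 + X_2\) has mean approximately \(0\) and variance approximately \(8\).

library(mixtools)

Sigma <- matrix(c(2, 1, 0,
                  1, 4, 0,
                  0, 0, 5), nrow = 3)
X <- rmvnorm(n = 100000, c(0, 0, 0), Sigma)  # draws of (X1, X2, X3)
Z <- X[, 1] + X[, 2]                         # realisations of Z = X1 + X2
mean(Z)                                      # approximately 0
var(Z)                                       # approximately 8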
Suppose \(\mathbf{X}=(X_1,X_2,X_3)^T \sim N_3(\mathbf{0},\mathbf{\Sigma})\), where \[\mathbf{\Sigma} = \begin{pmatrix} 2 & 1 & 0 \\ 1 & 4 & 0 \\ 0 & 0 & 5 \end{pmatrix}.\] Determine the constant \(c\) such that \(Z_1 = 2X_1 + cX_2\) and \(Z_2 = 2X_1 + cX_3\) are independent.



Writing \(\mathbf{Z} = \begin{pmatrix} Z_1 \\ Z_2 \end{pmatrix} = \begin{pmatrix} 2X_1 + cX_2 \\ 2X_1 + cX_3 \end{pmatrix}\) in the form \(\mathbf{DX}\) requires

\[\mathbf{D} = \begin{pmatrix} 2 & c & 0 \\ 2 & 0 & c \end{pmatrix}.\]

By the properties of a multivariate normal distribution,

\[\mathbf{Z} \sim N_2(\mathbf{D0}, \mathbf{D \Sigma D}^T).\]

Calculate \(\mathbf{D0} = \mathbf{0}\) and

\[\begin{align*} \mathbf{D \Sigma D}^T &= \begin{pmatrix} 2 & c & 0 \\ 2 & 0 & c \end{pmatrix} \begin{pmatrix} 2 & 1 & 0 \\ 1 & 4 & 0 \\ 0 & 0 & 5 \end{pmatrix} \begin{pmatrix} 2 & 2 \\ c & 0 \\ 0 & c \end{pmatrix} \\[5pt] &= \begin{pmatrix} 4+c & 2+4c & 0 \\ 4 & 2 & 5c \end{pmatrix} \begin{pmatrix} 2 & 2 \\ c & 0 \\ 0 & c \end{pmatrix} \\[5pt] &= \begin{pmatrix} 8+4c+4c^2 & 8+2c \\ 8+2c & 8+5c^2 \end{pmatrix}. \end{align*}\]

The independence of \(Z_1\) and \(Z_2\) is equivalent to \(\text{Cov}(Z_1,Z_2) = 8+2c = 0\) by Proposition 2.1.2. Therefore \(c=-4\).
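Again this can be checked by simulation (a sketch only, reusing rmvnorm): with \(c=-4\) the sample covariance of \(Z_1\) and \(Z_2\) should be close to zero. The variable is named cc below to avoid masking the built-in c() function.

library(mixtools)

Sigma <- matrix(c(2, 1, 0,
                  1, 4, 0,
                  0, 0, 5), nrow = 3)
X  <- rmvnorm(n = 100000, c(0, 0, 0), Sigma)
cc <- -4                                     # the value of c found above
Z1 <- 2 * X[, 1] + cc * X[, 2]
Z2 <- 2 * X[, 1] + cc * X[, 3]
cov(Z1, Z2)                                  # approximately 0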

2.4 Inference Testing

Let \(\mathbf{x}_1, \ldots, \mathbf{x}_n\) be a random sample from \(N_p \left( \mathbf{\mu}, \mathbf{\Sigma} \right)\) where \(\mathbf{\mu}\) is unknown but the covariance matrix \(\mathbf{\Sigma}\) is known. Suppose one wishes to test hypotheses about the population mean on the basis of this sample. In particular, how can one conduct the hypothesis test \[H_0 : \mathbf{\mu} = \mathbf{a} \qquad \text{versus} \qquad H_1: \mathbf{\mu} \neq \mathbf{a},\] where \(\mathbf{a}\) is some specified fixed vector, in order to draw inferences about the population mean \(\mathbf{\mu}\)?

Proposition 2.4.1. Let \(\mathbf{x} \sim N_p \left( \mathbf{\mu}, \mathbf{\Sigma} \right)\), where \(\mathbf{\Sigma}\) is positive definite. Then \[\left( \mathbf{x} - \mathbf{\mu} \right)^T \mathbf{\Sigma}^{-1} \left( \mathbf{x} - \mathbf{\mu} \right) \sim \chi_p^2.\]

Define \(\mathbf{y} = \mathbf{\Sigma}^{-\frac{1}{2}} (\mathbf{x}-\mathbf{\mu})\). Then by Theorem 2.3.1, \(\mathbf{y} \sim N_p \left( \mathbf{0}, \mathbf{I}_p \right)\). By Corollary 2.3.2 each component of \(\mathbf{y}\) follows a univariate normal distribution with mean \(0\) and variance \(1\), and by Proposition 2.1.2 the components are independent since their covariances are all zero.

Then \[(\mathbf{x}-\mathbf{\mu})^T \mathbf{\Sigma}^{-1} (\mathbf{x}-\mathbf{\mu}) = \left( \mathbf{\Sigma}^{-\frac{1}{2}} (\mathbf{x} - \mathbf{\mu}) \right)^T \left( \mathbf{\Sigma}^{-\frac{1}{2}} (\mathbf{x} - \mathbf{\mu}) \right) = \mathbf{y}^T \mathbf{y} = \sum\limits_{i=1}^p y_i^2.\] Since the \(y_i \sim N(0,1)\) are independent, it follows from the definition of the chi-squared distribution that \[(\mathbf{x}-\mathbf{\mu})^T \mathbf{\Sigma}^{-1} (\mathbf{x}-\mathbf{\mu}) \sim \chi_p^2.\]
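The result can also be illustrated empirically. The sketch below (illustrative only; the parameters are arbitrary choices, and rmvnorm from mixtools is reused) evaluates the quadratic form at a large number of simulated observations and compares the resulting sample with the \(\chi^2_p\) distribution.

library(mixtools)

p     <- 3
mu    <- c(1, 2, 3)                          # arbitrary illustrative parameters
Sigma <- matrix(c(2, 1, 0,
                  1, 4, 0,
                  0, 0, 5), nrow = 3)

X  <- rmvnorm(n = 100000, mu, Sigma)
Xc <- sweep(X, 2, mu)                        # subtract mu from each row
q  <- rowSums((Xc %*% solve(Sigma)) * Xc)    # quadratic form for each observation

mean(q)                                      # approximately p = 3, the chi-squared mean
quantile(q, 0.95)                            # approximately equal to qchisq(0.95, p)
qchisq(0.95, p)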

By Exercise 2.1 \[\begin{align*} \bar{\mathbf{x}} \sim N_p \left( \mathbf{\mu}, \frac{1}{n} \mathbf{\Sigma} \right) \\[3pt] \implies \qquad n^{\frac{1}{2}} \bar{\mathbf{x}}\sim N_p \left( n^{\frac{1}{2}} \mathbf{\mu}, \mathbf{\Sigma} \right). \end{align*}\] Define the test statistic \[\zeta^2 = n \left( \bar{\mathbf{x}}-\mathbf{a} \right)^T \mathbf{\Sigma}^{-1} \left( \bar{\mathbf{x}} - \mathbf{a} \right).\] Assuming the null hypothesis \(H_0\) is true, the test statistic can be written \[\begin{align*} \zeta^2 &= n \left( \bar{\mathbf{x}}-\mathbf{\mu} \right)^T \mathbf{\Sigma}^{-1} \left( \bar{\mathbf{x}} - \mathbf{\mu} \right) \\[3pt] &= \left( n^{\frac{1}{2}} \bar{\mathbf{x}} - n^{\frac{1}{2}} \mathbf{\mu} \right)^T \mathbf{\Sigma}^{-1} \left( n^{\frac{1}{2}} \bar{\mathbf{x}} - n^{\frac{1}{2}} \mathbf{\mu} \right) \end{align*}\] and so by Proposition 2.4.1, the test statistic \(\zeta^2\) follows a \(\chi^2_p\) distribution. Therefore at significance level \(\alpha\), one should reject the null hypothesis \(H_0\) if \(\zeta^2 > \chi^2_{p,\alpha}\), where \(\chi^2_{p,\alpha}\) is the upper \(\alpha\) quantile of the \(\chi^2_p\) distribution. The \(p\)-value is given by \(\mathbb{P}\left( \chi^2_p > \zeta^2 \right)\). Furthermore, the \(100(1-\alpha) \%\) confidence region for \(\mathbf{\mu}\) is \(\{ \mathbf{a} : \zeta^2 \leq \chi^2_{p,\alpha} \}\), which will be an ellipsoid.
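For convenience, this procedure can be packaged as a short R function. This is only a sketch under the assumptions above (known \(\mathbf{\Sigma}\), data supplied as an \(n \times p\) matrix); the name mvn_mean_test and its arguments are illustrative rather than part of any standard library.

# Sketch of the zeta^2 test of H0: mu = a against H1: mu != a, with Sigma known.
mvn_mean_test <- function(X, a, Sigma, alpha = 0.05) {
  n     <- nrow(X)                            # sample size
  p     <- ncol(X)                            # dimension
  xbar  <- colMeans(X)                        # sample mean vector
  zeta2 <- drop(n * t(xbar - a) %*% solve(Sigma) %*% (xbar - a))
  crit  <- qchisq(1 - alpha, df = p)          # upper alpha quantile of chi^2_p
  list(statistic = zeta2,
       critical  = crit,
       p.value   = 1 - pchisq(zeta2, df = p),
       reject    = zeta2 > crit)
}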

Download the file weight-height_data.csv from the Moodle page. The data details the weights and heights of \(5000\) adult males in pounds and inches respectively. Assume that the data follows a \(N_2 \left( \mathbf{\mu}, \mathbf{\Sigma} \right)\) distribution where \[\mathbf{\Sigma} = \begin{pmatrix} 8.2 & 48.9 \\ 48.9 & 391.3 \end{pmatrix}.\] A recent article in a men’s health magazine claims that men have an average height of \(68.97\) inches and an average weight of \(186.93\) pounds. Use R to conduct a hypothesis test at the \(5 \%\) significance level to determine whether this claim is consistent with the data.



Begin by loading the data into R:

df <- read.csv("weight-height_data.csv")
print(head(df))
##     Height   Weight
## 1 73.84702 241.8936
## 2 68.78190 162.3105
## 3 74.11011 212.7409
## 4 71.73098 220.0425
## 5 69.88180 206.3498
## 6 67.25302 152.2122

The entirety of the data can be visualised using the plot() function. Note that the points form an elliptical cloud, which is typical of a multivariate normal distribution.

plot(df[,1],df[,2],
     main="Scatter Plot of Height and Weight of Males",
     xlab="Height",
     ylab="Weight",
     cex=0.3,
     pch=19
     )

The question is equivalent to conducting the hypothesis test \[H_0 : \mathbf{\mu} = \begin{pmatrix} 68.97 \\ 186.93 \end{pmatrix} \qquad \text{versus} \qquad H_1: \mathbf{\mu} \neq \begin{pmatrix} 68.97 \\ 186.93 \end{pmatrix}\] at the \(5 \%\) significance level. By the above theory, one should calculate the test statistic \(\zeta^2\). This can be done with the following code. Note that the sample mean \(\bar{\mathbf{x}}\) is calculated using the base mean() function and stored in a variable sample_mean; the matlib library is loaded for its inv() function, which is used to invert the covariance matrix. The covariance matrix \(\mathbf{\Sigma}\) given in the question is stored in a variable cov_mat, and the proposed mean is stored in a variable a.

library(matlib)
sample_mean = c(mean(df[,1]), mean(df[,2]))
a = c(68.97,186.93)
cov_mat = matrix(c(8.2, 48.9, 48.9, 391.3), nrow=2, ncol=2)
zeta2 = 5000 * t(sample_mean - a) %*% inv(cov_mat) %*%  (sample_mean - a)
zeta2
##          [,1]
## [1,] 4.956178

The test statistic should finally be compared to the critical value \(c=\chi^2_{2,0.05}\), that is, the value such that \(\mathbb{P} \left( \chi^2_2 \leq c \right) = 0.95\). This can be computed using the built-in quantile function for the chi-squared distribution in R called qchisq().

zeta2 < qchisq(0.95, 2)
##      [,1]
## [1,] TRUE

The output indicates that the test statistic does not exceed the critical value, so we do not reject the null hypothesis at the \(5 \%\) significance level. Indeed, the \(p\)-value of the observed test statistic can be calculated using the distribution function of the chi-squared distribution. In R, this is given by pchisq().

1-pchisq(zeta2, 2)
##            [,1]
## [1,] 0.08390342

One might think that conducting \(p\) separate hypothesis tests, one z-test for the mean of the univariate normal distribution governing each variable, would be equivalent to the hypothesis test above. However, this is not the case: since the correlations between the variables are not taken into consideration, some of the \(p\) separate univariate z-tests may indicate a rejection of the null hypothesis even when the multivariate test described above does not, and vice versa.
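For comparison, the two separate univariate z-tests for this data set can be computed as follows. This is a sketch only, reusing df, a and cov_mat from the code above and taking the known variances to be the diagonal entries of \(\mathbf{\Sigma}\).

# separate univariate z-tests for the height and weight means
n  <- nrow(df)
z1 <- (mean(df[,1]) - a[1]) / sqrt(cov_mat[1,1] / n)  # z-statistic for the mean height
z2 <- (mean(df[,2]) - a[2]) / sqrt(cov_mat[2,2] / n)  # z-statistic for the mean weight
2 * (1 - pnorm(abs(z1)))                              # two-sided p-value for height
2 * (1 - pnorm(abs(z2)))                              # two-sided p-value for weight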