11  Matrices in Forecasting

In forecasting, matrices are powerful tools used for organizing and analyzing data. They allow the representation of multiple relationships and variables compactly, making it easier to perform computations and apply statistical or machine learning techniques. Here’s an overview of how matrices are commonly applied in forecasting:

11.1 Linear Regression

Linear regression aims to model the relationship between input features and a target variable. In this explanation, we will explore how to express and solve linear regression problems using matrices.

11.1.1 General Form

In linear regression, the relationship between the input features \(X\) and the target variable \(y\) is assumed to be linear. The linear regression equation is:

\[ y = X\beta + \epsilon \]

Where:

  • \(y\) is an \(n \times 1\) vector of observed target values (response variable).
  • \(X\) is an \(n \times p\) design matrix (features matrix), where each row represents an observation and each column represents a feature.
  • \(\beta\) is a \(p \times 1\) vector of coefficients (parameters).
  • \(\epsilon\) is an \(n \times 1\) vector of error terms, the part of \(y\) that the linear model does not explain.

11.1.2 Matrix Representation

The matrix \(X\) contains the input features. The first column of \(X\) is filled with 1’s to represent the intercept \(\beta_0\). For example, for a dataset with three data points and two features:

\[ X = \begin{bmatrix} 1 & x_{11} & x_{12} \\ 1 & x_{21} & x_{22} \\ 1 & x_{31} & x_{32} \end{bmatrix} \]

Where \(x_{ij}\) is the value of feature \(j\) for observation \(i\); for example, \(x_{12}\) is the second feature of the first observation.
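As a minimal sketch of how such a design matrix can be assembled in practice, the snippet below prepends a column of ones to a raw feature array. The feature values and the variable name `features` are made up for illustration.

```python
import numpy as np

# Hypothetical raw feature values: 3 observations, 2 features.
features = np.array([[1.0, 4.0],
                     [2.0, 5.5],
                     [3.0, 6.0]])

# Prepend a column of ones so that beta_0 acts as the intercept.
X = np.column_stack([np.ones(features.shape[0]), features])

print(X.shape)  # (3, 3): n = 3 rows, p = 3 columns (intercept + 2 features)
```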

11.1.3 Vector of Coefficients \(\beta\)

The vector \(\beta\) represents the coefficients (weights) of the model, including the intercept:

\[ \beta = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{bmatrix} \]

Where \(\beta_0\) is the intercept, and \(\beta_1, \beta_2\) are the coefficients for the input features.

11.1.4 Target Vector \(y\)

The target vector \(y\) contains the observed values of the dependent variable.

\[ y = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} \]

11.1.5 Objective: Minimizing the Cost Function

To find the optimal coefficients \(\beta\), we minimize the error between the predicted values \(\hat{y} = X\beta\) and the actual values \(y\). The error is measured by the cost function \(J(\beta)\), defined as half the sum of squared residuals:

\[ J(\beta) = \frac{1}{2} \sum_{i=1}^n (y_i - \hat{y}_i)^2 = \frac{1}{2} (y - X\beta)^T (y - X\beta) \]

Where:

  • \((y - X\beta)\) is the residual vector.
  • The factor \(1/2\) is to simplify the differentiation step.
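The equality of the summation form and the matrix form of \(J(\beta)\) can be checked numerically. The following is a small sketch with made-up data and an arbitrary candidate \(\beta\); none of the numbers come from the text.

```python
import numpy as np

# Hypothetical data: intercept column plus one feature.
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([2.0, 4.0, 7.0])
beta = np.array([0.5, 2.0])  # arbitrary candidate, not the optimum

residual = y - X @ beta               # residual vector (y - X beta)

J_sum = 0.5 * np.sum(residual ** 2)   # summation form
J_mat = 0.5 * residual @ residual     # matrix form: 1/2 r^T r

print(J_sum, J_mat)  # both forms give 0.375
```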

11.1.6 Minimizing the Cost Function

To minimize the cost function, we take the derivative with respect to \(\beta\), set it to zero, and solve for \(\beta\):

\[ \frac{\partial J(\beta)}{\partial \beta} = - X^T (y - X\beta) \]
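The gradient above follows from expanding the quadratic form. A short sketch of the intermediate steps, using that \(y^T X \beta\) is a scalar (so it equals \(\beta^T X^T y\)) and that \(X^T X\) is symmetric:

\[ J(\beta) = \frac{1}{2} \left( y^T y - 2 \beta^T X^T y + \beta^T X^T X \beta \right) \]

\[ \frac{\partial J(\beta)}{\partial \beta} = \frac{1}{2} \left( -2 X^T y + 2 X^T X \beta \right) = X^T X \beta - X^T y = - X^T (y - X\beta) \]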

Set the derivative equal to zero:

\[ X^T (y - X\beta) = 0 \]

Simplifying:

\[ X^T y = X^T X \beta \]

Now, solve for \(\beta\) by left-multiplying both sides by \((X^T X)^{-1}\) (assuming \(X^T X\) is invertible, which holds when the columns of \(X\) are linearly independent):

\[ \beta = (X^T X)^{-1} X^T y \]

This is the closed-form (ordinary least squares) solution for linear regression. The linear system \(X^T y = X^T X \beta\) that it solves is known as the normal equations.
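When \(X^T X\) is invertible, the normal equations can be solved as a linear system rather than by forming the inverse explicitly, which is usually the more numerically stable choice. The sketch below uses made-up data and checks that the gradient vanishes at the solution.

```python
import numpy as np

# Hypothetical data: intercept column plus one feature.
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
y = np.array([3.1, 4.9, 7.2, 8.8])

# Solve (X^T X) beta = X^T y without computing the inverse.
beta = np.linalg.solve(X.T @ X, X.T @ y)

# At the minimizer, the gradient -X^T (y - X beta) is zero.
gradient = -X.T @ (y - X @ beta)

print("beta:", beta)
print("gradient is zero:", np.allclose(gradient, 0.0))
```

For this data the result is an intercept of about 1.15 and a slope of about 1.94.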

11.1.7 Making Predictions

Once \(\beta\) is computed, predictions can be made for the target variable \(y\) using:

\[ \hat{y} = X\beta \]

11.1.8 Assumptions of Linear Regression

For the linear regression model to be meaningful, certain assumptions are typically made:

  1. Linearity: The relationship between the input features and the target variable is linear.
  2. Independence: The residuals (errors) are independent.
  3. Homoscedasticity: The variance of residuals is constant across all observations.
  4. Normality of Errors: The residuals follow a normal distribution (important for hypothesis testing and confidence intervals).

11.2 Example in Python

Here’s an example in Python (using NumPy) of computing \(\beta\) using the closed-form solution and making predictions:

import numpy as np

# Sample Data (X and y)
X = np.array([[1, 1, 4],   # Design matrix (including intercept column of 1's)
              [1, 2, 5],
              [1, 3, 6]])

y = np.array([5, 7, 9])  # Actual target values

# Compute the coefficients from the normal equations
X_transpose = X.T  # Transpose of X
X_transpose_X = X_transpose.dot(X)  # X^T X
X_transpose_y = X_transpose.dot(y)  # X^T y

# X^T X is singular here (third column of X = 3 * first + second),
# so use the pseudo-inverse, which returns the minimum-norm solution
beta = np.linalg.pinv(X_transpose_X).dot(X_transpose_y)

# Display the coefficients (beta values)
print("Coefficients (beta):", beta)

# Make predictions
y_hat = X.dot(beta)  # Predicted target values
print("Predicted values (y_hat):", y_hat)

Output:

Coefficients (beta): [3.55271368e-15 1.00000000e+00 1.00000000e+00]
Predicted values (y_hat): [5. 7. 9.]
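In this dataset the third column of \(X\) equals three times the first column plus the second, so \(X\) has rank 2 and \(X^T X\) cannot be inverted. As an alternative sketch (not the method used above), `np.linalg.lstsq` works on \(X\) directly, reports the rank, and also returns the minimum-norm least-squares solution:

```python
import numpy as np

X = np.array([[1, 1, 4],
              [1, 2, 5],
              [1, 3, 6]], dtype=float)
y = np.array([5, 7, 9], dtype=float)

# lstsq returns (solution, residual sums, rank of X, singular values of X).
beta, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)

print("Rank of X:", rank)  # 2, i.e. fewer than the 3 columns
print("Predictions match y:", np.allclose(X @ beta, y))
```

Because the columns are linearly dependent, infinitely many coefficient vectors reproduce \(y\) exactly; the pseudo-inverse and `lstsq` both select the one with the smallest norm.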

11.3 Markov Chains

A Markov Chain is a mathematical model that describes a system undergoing transitions from one state to another, where the probability of moving to the next state depends only on the current state (not past states). This property is called the Markov property.

In the context of Linear Algebra, Markov Chains can be analyzed using matrices, particularly the transition matrix, to understand how the system evolves over time.

11.3.1 State Vectors

In a Markov Chain, the system’s state at any given time is represented by a state vector. This vector consists of probabilities of being in each possible state.

For example, if a system has two weather states, Hujan (H, Indonesian for rain) and Cerah (C, clear), the state vector \(\mathbf{x}\) could be:

\[ \mathbf{x} = \begin{pmatrix} p(H) \\ p(C) \end{pmatrix} \]

Where \(p(H)\) is the probability of the system being in state H, and \(p(C)\) is the probability of the system being in state C.

11.3.2 Transition Matrix

The transition matrix \(P\) describes the probabilities of transitioning between states in the system. It is a square matrix where the element \(P_{ij}\) represents the probability of transitioning from state \(i\) to state \(j\). Since the system must move to some state at every step, each row of \(P\) sums to 1.

For a two-state system with Hujan and Cerah, the transition matrix might look like:

\[ P = \begin{pmatrix} 0.7 & 0.3 \\ 0.4 & 0.6 \end{pmatrix} \]

In this example:

  • The probability of staying in state H (Hujan to Hujan) is 0.7.
  • The probability of transitioning from Hujan to Cerah is 0.3.
  • The probability of transitioning from Cerah to Hujan is 0.4.
  • The probability of staying in state C (Cerah to Cerah) is 0.6.
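A quick sanity check on any transition matrix written in this convention is that it is square, has no negative entries, and has rows that sum to 1. A minimal sketch for the matrix above (the variable names are illustrative):

```python
import numpy as np

states = ["Hujan", "Cerah"]
P = np.array([[0.7, 0.3],   # from Hujan: to Hujan, to Cerah
              [0.4, 0.6]])  # from Cerah: to Hujan, to Cerah

is_square = P.shape[0] == P.shape[1] == len(states)
is_nonnegative = bool(np.all(P >= 0))
rows_sum_to_one = bool(np.allclose(P.sum(axis=1), 1.0))

print(is_square, is_nonnegative, rows_sum_to_one)  # True True True
```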

11.3.3 Matrix Multiplication

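To compute the state of the system at the next time step, you combine the current state vector with the transition matrix. Because \(P_{ij}\) is the probability of moving from state \(i\) to state \(j\) (so the rows of \(P\) sum to 1), a column state vector must be multiplied by the transpose of \(P\).

If the current state vector is \(\mathbf{x}_t\), the state vector at the next time step, \(\mathbf{x}_{t+1}\), is given by:

\[ \mathbf{x}_{t+1} = P^T \mathbf{x}_t \]

This is equivalent to the row-vector form \(\mathbf{x}_{t+1}^T = \mathbf{x}_t^T P\).

For example, if \(\mathbf{x}_t = \begin{pmatrix} 0.5 \\ 0.5 \end{pmatrix}\), then:

\[ \mathbf{x}_{t+1} = \begin{pmatrix} 0.7 & 0.4 \\ 0.3 & 0.6 \end{pmatrix} \begin{pmatrix} 0.5 \\ 0.5 \end{pmatrix} \]

This results in:

\[ \mathbf{x}_{t+1} = \begin{pmatrix} 0.7(0.5) + 0.4(0.5) \\ 0.3(0.5) + 0.6(0.5) \end{pmatrix} = \begin{pmatrix} 0.55 \\ 0.45 \end{pmatrix} \]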
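The update can be sketched in NumPy as follows. With \(P_{ij}\) defined as the probability of moving from state \(i\) to state \(j\), the rows of \(P\) sum to 1, so a column state vector is propagated with `P.T` (equivalently, a row vector is multiplied by `P` from the right). The three-step forecast is an added illustration, not part of the example above.

```python
import numpy as np

P = np.array([[0.7, 0.3],
              [0.4, 0.6]])
x_t = np.array([0.5, 0.5])  # p(H), p(C) at time t

# One step ahead: x_{t+1} = P^T x_t (same as x_t @ P for 1-D arrays).
x_next = P.T @ x_t

# Three steps ahead: apply the transposed matrix three times.
x_three = np.linalg.matrix_power(P.T, 3) @ x_t

print(x_next)   # approximately [0.55, 0.45]
print(x_three)  # approximately [0.5695, 0.4305]
```

Each result is still a probability vector: its entries are non-negative and sum to 1.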

11.3.4 Steady State

A crucial concept in Markov Chains is the steady state, or stationary distribution: a vector of state probabilities that no longer changes from one time step to the next.

Mathematically, with \(P_{ij}\) defined as the probability of moving from state \(i\) to state \(j\), the steady state vector \(\mathbf{x}\) satisfies the equation:

\[ \mathbf{x} = P^T \mathbf{x} \]

together with the condition that the entries of \(\mathbf{x}\) sum to 1. To find the steady state, you need to solve for the eigenvector of \(P^T\) corresponding to eigenvalue \(\lambda = 1\) (equivalently, a left eigenvector of \(P\)) and scale it so that its entries sum to 1. The steady-state vector is the distribution that remains unchanged after one transition step.

11.3.5 Eigenvectors and Eigenvalues

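The steady state of a Markov Chain can be determined by finding the eigenvector corresponding to the eigenvalue 1 of the transposed transition matrix \(P^T\), since at steady state the state vector doesn’t change after a transition step. A matrix whose rows sum to 1 always has 1 as an eigenvalue, so such a vector exists; it is then scaled so that its entries sum to 1.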
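A minimal sketch of this eigenvector approach in NumPy is shown below. Since \(P_{ij}\) is the probability of moving from state \(i\) to state \(j\), the decomposition is applied to `P.T`; the eigenvector returned by `np.linalg.eig` has arbitrary scale and sign, so it is rescaled to sum to 1.

```python
import numpy as np

P = np.array([[0.7, 0.3],
              [0.4, 0.6]])

# Eigen-decomposition of the transposed transition matrix.
eigenvalues, eigenvectors = np.linalg.eig(P.T)

# Pick the eigenvector whose eigenvalue is closest to 1.
idx = np.argmin(np.abs(eigenvalues - 1.0))
v = np.real(eigenvectors[:, idx])

# Rescale so the entries form a probability distribution.
steady = v / v.sum()

print(steady)  # approximately [0.5714, 0.4286], i.e. [4/7, 3/7]
```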

To summarize, in a Markov Chain:

  • The transition matrix \(P\) describes the system’s transition probabilities.
  • The state vector \(\mathbf{x}\) updates over time by multiplying it by the transposed transition matrix \(P^T\) (or, as a row vector, by \(P\) from the right).
  • The steady state vector \(\mathbf{x}\) is the eigenvector of \(P^T\) associated with eigenvalue 1, scaled to sum to 1, representing the system’s long-term probabilities of being in each state.

11.3.6 Example Problem: Finding Steady State

Let’s take the transition matrix:

\[ P = \begin{pmatrix} 0.7 & 0.3 \\ 0.4 & 0.6 \end{pmatrix} \]

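To find the steady-state vector, recall that \(P_{ij}\) is the probability of moving from state \(i\) to state \(j\), and solve:

\[ \mathbf{x} = P^T \mathbf{x}, \qquad p(H) + p(C) = 1 \]

which translates to finding the probability vector \(\mathbf{x}\) that does not change after one transition step. Written out, the first row gives \(p(H) = 0.7\,p(H) + 0.4\,p(C)\), so \(0.3\,p(H) = 0.4\,p(C)\). Combined with the normalization condition, this yields:

\[ \mathbf{x} = \begin{pmatrix} 4/7 \\ 3/7 \end{pmatrix} \approx \begin{pmatrix} 0.571 \\ 0.429 \end{pmatrix} \]

In the long run, the system is in state H about 57% of the time and in state C about 43% of the time, regardless of the starting state.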
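The same system can be solved numerically. The sketch below is one possible approach, assuming the chain has a unique steady state: it replaces the last (redundant) equation of \((P^T - I)\mathbf{x} = 0\) with the normalization condition, then cross-checks the result by repeatedly applying the transition step to an arbitrary starting vector.

```python
import numpy as np

P = np.array([[0.7, 0.3],
              [0.4, 0.6]])
n = P.shape[0]

# Build (P^T - I) x = 0, then overwrite the last row with sum(x) = 1.
A = P.T - np.eye(n)
A[-1, :] = 1.0
b = np.zeros(n)
b[-1] = 1.0

steady = np.linalg.solve(A, b)

# Cross-check: iterate the chain from an arbitrary start (all mass on H).
x = np.array([1.0, 0.0])
for _ in range(50):
    x = P.T @ x

print(steady)  # approximately [0.5714, 0.4286]
print(np.allclose(x, steady))  # True
```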

Markov Chains in Linear Algebra make use of key concepts such as matrices, vectors, and eigenvalues to model systems that evolve probabilistically. By applying matrix operations and finding eigenvectors corresponding to eigenvalue 1, we can describe long-term behavior and steady states in such systems.

11.4 SVD Applications

11.5 Eigenvalues in Systems

11.6 Matrix Factorization

11.7 Neural Network Weights

11.8 Simulation with Matrices