1  Introduction


“Linear algebra is the key to understanding higher mathematics, as it provides a unified way of handling systems of equations, transformations, and more.” (Lay, 2012)

Linear algebra is a core mathematical discipline that serves as a critical foundation in data science. Numerous techniques in data analysis and machine learning rely on linear algebra concepts such as matrices, vectors, and linear transformations. In data science, linear algebra enables us to efficiently handle large datasets and apply various algorithms used for modeling and analysis.

1.1 Key Concepts in Linear Algebra

1.1.1 Vectors and Vector Spaces

  • Vectors are frequently used to represent features or attributes in datasets. For instance, each data point in a dataset can be viewed as a vector in an n-dimensional space, where each dimension corresponds to a specific feature.
  • Vector spaces allow us to work with sets of vectors in a more abstract manner, facilitating methods like Principal Component Analysis (PCA) and dimensionality reduction.

Below, we illustrate vectors and vector spaces with a scatter plot of points in a 2D space.
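A minimal sketch of such a plot is given below; the data points are made up purely for illustration.

library(plotly)

# A small set of assumed data points; each row is a vector (feature1, feature2) in 2D space
points_2d <- data.frame(feature1 = c(1, 2, 3, 4, 5),
                        feature2 = c(2.1, 3.9, 6.2, 8.1, 9.8))

fig1 <- plot_ly(points_2d, x = ~feature1, y = ~feature2,
                type = 'scatter', mode = 'markers', name = 'Data points as vectors') %>%
  layout(title = 'Vectors in a 2D Vector Space',
         xaxis = list(title = 'Feature 1'),
         yaxis = list(title = 'Feature 2'))
fig1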

1.1.2 Matrices and Matrix Operations

  • Matrices are used to organize data in a two-dimensional structure, with rows representing individual data points and columns representing their corresponding features.
  • Matrix operations, such as multiplication and inversion, are crucial for solving systems of linear equations, performing linear regression, and tackling optimization problems in machine learning; a short sketch of these operations in R follows this list.
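As a minimal sketch (the matrices and vector are assumed for illustration), base R provides these operations directly:

# Two small matrices and a right-hand-side vector, chosen only for illustration
A <- matrix(c(2, 1,
              1, 3), nrow = 2, byrow = TRUE)
B <- matrix(c(1, 0,
              4, 2), nrow = 2, byrow = TRUE)
b <- c(5, 10)

A %*% B        # matrix multiplication
t(A)           # transpose
solve(A)       # inverse (A is nonsingular: det(A) = 5)
solve(A, b)    # solve the system A x = b; here the solution is (1, 3)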

1.1.3 Systems of Linear Equations

Many algorithms in data science, such as linear regression, require solving systems of linear equations to find optimal solutions that minimize prediction errors.

Geometrically, each linear equation in three unknowns can be interpreted as a plane in three-dimensional space, and a solution of the system corresponds to the point (or line) where those planes intersect; a 3D visualization makes this intersection easy to see.
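The sketch below solves an assumed 3 × 3 system with base R and plots two of its planes with plotly; the particular equations are illustrative, not taken from the text.

library(plotly)

# An assumed system of three equations in three unknowns:
#   x + y + z  = 6
#   2x - y + z = 3
#   x + 2y - z = 2
A <- matrix(c(1,  1,  1,
              2, -1,  1,
              1,  2, -1), nrow = 3, byrow = TRUE)
b <- c(6, 3, 2)
solve(A, b)   # unique solution (1, 2, 3): the point where the three planes meet

# Plot two of the planes; in a surface trace, rows of z map to y and columns to x
grid <- seq(-5, 5, length.out = 30)
z1 <- outer(grid, grid, function(y, x) 6 - x - y)       # z from  x + y + z = 6
z2 <- outer(grid, grid, function(y, x) 3 - 2 * x + y)   # z from  2x - y + z = 3
fig2 <- plot_ly(x = grid, y = grid) %>%
  add_surface(z = z1, opacity = 0.6, showscale = FALSE) %>%
  add_surface(z = z2, opacity = 0.6, showscale = FALSE) %>%
  layout(title = 'Linear Equations as Planes in 3D')
fig2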

1.1.4 Linear Transformations

Linear transformations, including rotations and scalings, are instrumental in data preprocessing, normalization, and in applying techniques such as PCA for dimensionality reduction.

Linear algebra provides a strong conceptual framework for understanding the structure of data and the advanced algorithms used in data science. As such, it forms an essential part of the curriculum in this program.

The code below illustrates a simple linear transformation: it scales a square by a factor of 2 and plots the original and transformed shapes.

library(plotly)

# Define the vertices of a square (the first vertex repeats to close the shape)
square <- data.frame(x = c(-1, 1, 1, -1, -1), y = c(-1, -1, 1, 1, -1))

# Define a transformation matrix (e.g., scaling by 2)
transformation_matrix <- matrix(c(2, 0, 0, 2), nrow = 2)

# Apply the transformation
transformed_square <- as.data.frame(as.matrix(square[, 1:2]) %*% transformation_matrix)

# Create a plot for the transformation
fig4 <- plot_ly() %>%
  add_lines(data = square, x = ~x, y = ~y, name = 'Original Shape', line = list(color = 'blue')) %>%
  add_lines(data = transformed_square, x = ~V1, y = ~V2, name = 'Transformed Shape', line = list(color = 'red')) %>%
  layout(title = 'Linear Transformations',
         xaxis = list(title = 'X-axis'),
         yaxis = list(title = 'Y-axis'),
         showlegend = TRUE)
fig4

1.2 Applications of Linear Algebra

1.2.1 Finance: Portfolio Optimization

In portfolio optimization, we calculate the expected returns, variance, and covariance of assets using matrix operations.

Example: Suppose we have two assets with expected returns \(R = \begin{pmatrix} 0.10 \\ 0.15 \end{pmatrix}\) and a covariance matrix:

\[ \Sigma = \begin{pmatrix} 0.04 & 0.01 \\ 0.01 & 0.09 \end{pmatrix} \]

We can calculate the portfolio variance for equal weights \(w = \begin{pmatrix} 0.5 \\ 0.5 \end{pmatrix}\):

\[ \text{Portfolio Variance} = w^T \Sigma w = \begin{pmatrix} 0.5 & 0.5 \end{pmatrix} \begin{pmatrix} 0.04 & 0.01 \\ 0.01 & 0.09 \end{pmatrix} \begin{pmatrix} 0.5 \\ 0.5 \end{pmatrix} \]

The result would be:

\[ = 0.5(0.5 \times 0.04 + 0.5 \times 0.01) + 0.5(0.5 \times 0.01 + 0.5 \times 0.09) = 0.5(0.025) + 0.5(0.05) = 0.0375 \]
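A minimal check of this calculation in R, reusing the numbers above:

w     <- c(0.5, 0.5)                                    # equal portfolio weights
R     <- c(0.10, 0.15)                                  # expected returns
Sigma <- matrix(c(0.04, 0.01,
                  0.01, 0.09), nrow = 2, byrow = TRUE)  # covariance matrix

sum(w * R)            # expected portfolio return: 0.125
t(w) %*% Sigma %*% w  # portfolio variance: 0.0375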

1.2.2 Business: Linear Regression

Linear regression predicts outcomes (e.g., sales) based on features like marketing spend. Using matrix notation, the model is:

\[ Y = X\beta + \epsilon \]

Where \(Y\) is the sales vector, \(X\) is the feature matrix, and \(\beta\) is the coefficients vector. For a small dataset:

\[ X = \begin{pmatrix} 1 & 2 \\ 1 & 3 \\ 1 & 4 \end{pmatrix}, \quad Y = \begin{pmatrix} 5 \\ 6 \\ 7 \end{pmatrix} \]

We can calculate the least-squares estimate of \(\beta\) as:

\[ \beta = (X^TX)^{-1}X^TY \]
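For this dataset the estimate works out to \(\beta = (3, 1)^T\), i.e. the fitted line \(Y = 3 + 1 \cdot x\). A minimal sketch of the computation in R:

X <- matrix(c(1, 2,
              1, 3,
              1, 4), nrow = 3, byrow = TRUE)
Y <- c(5, 6, 7)

beta <- solve(t(X) %*% X) %*% t(X) %*% Y  # normal equations: (X'X)^{-1} X'Y
beta                                      # intercept 3, slope 1
lm.fit(X, Y)$coefficients                 # same estimate via R's built-in least squares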

1.2.3 Machine Learning: Matrix Multiplication in Neural Networks

In neural networks, inputs are multiplied by weight matrices. For example, given a weight matrix \(W\) and input vector \(X\):

\[ W = \begin{pmatrix} 0.2 & 0.8 \\ 0.6 & 0.4 \end{pmatrix}, \quad X = \begin{pmatrix} 0.5 \\ 0.3 \end{pmatrix} \]

The output is:

\[ WX = \begin{pmatrix} 0.2 & 0.8 \\ 0.6 & 0.4 \end{pmatrix} \begin{pmatrix} 0.5 \\ 0.3 \end{pmatrix} = \begin{pmatrix} 0.34 \\ 0.42 \end{pmatrix} \]
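A minimal sketch of this forward pass in R; the sigmoid at the end is a common follow-up step in neural networks and is not part of the example above.

W <- matrix(c(0.2, 0.8,
              0.6, 0.4), nrow = 2, byrow = TRUE)  # weight matrix
X <- c(0.5, 0.3)                                  # input vector

z <- W %*% X        # linear output: (0.34, 0.42)
1 / (1 + exp(-z))   # activation, e.g. a sigmoid applied elementwise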

1.2.4 Physics and Engineering: Stress and Strain

In structural analysis, stress \(\sigma\) is calculated using a stress-strain matrix \(E\) and the strain vector \(\epsilon\):

\[ \sigma = E\epsilon \]

For example, if:

\[ E = \begin{pmatrix} 200 & 50 \\ 50 & 100 \end{pmatrix}, \quad \epsilon = \begin{pmatrix} 0.01 \\ 0.02 \end{pmatrix} \]

Then:

\[ \sigma = \begin{pmatrix} 200 & 50 \\ 50 & 100 \end{pmatrix} \begin{pmatrix} 0.01 \\ 0.02 \end{pmatrix} = \begin{pmatrix} 3 \\ 2.5 \end{pmatrix} \]
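The same calculation in R, with the values above:

E       <- matrix(c(200,  50,
                     50, 100), nrow = 2, byrow = TRUE)  # stress-strain matrix
epsilon <- c(0.01, 0.02)                                # strain vector

E %*% epsilon   # stress vector: (3, 2.5)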

1.2.5 Computer Graphics: 3D Rotation

To rotate a 3D point by an angle \(\theta\) around the z-axis, the rotation matrix is:

\[ R_z(\theta) = \begin{pmatrix} \cos \theta & -\sin \theta & 0 \\ \sin \theta & \cos \theta & 0 \\ 0 & 0 & 1 \end{pmatrix} \]

For \(\theta = 90^\circ\) and point \(P = (1, 0, 0)\):

\[ R_z(90^\circ)P = \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} \]
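A minimal sketch of this rotation in R; rounding removes the floating-point noise from cos(pi/2):

theta <- pi / 2
Rz <- matrix(c(cos(theta), -sin(theta), 0,
               sin(theta),  cos(theta), 0,
                        0,           0, 1), nrow = 3, byrow = TRUE)
P <- c(1, 0, 0)

round(Rz %*% P, 10)   # rotated point: (0, 1, 0)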

1.2.6 Natural Language Processing: Word Embeddings

Word vectors can be represented in matrix form. For example, if \(v(\text{word1}) = [1, 0, 0]\) and \(v(\text{word2}) = [0, 1, 0]\), their similarity can be calculated using the dot product:

\[ v(\text{word1}) \cdot v(\text{word2}) = 1(0) + 0(1) + 0(0) = 0 \]
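A minimal sketch in R; the cosine similarity shown alongside is the length-normalized version commonly used in practice and is not part of the example above.

v1 <- c(1, 0, 0)   # v(word1)
v2 <- c(0, 1, 0)   # v(word2)

sum(v1 * v2)                                         # dot product: 0
sum(v1 * v2) / (sqrt(sum(v1^2)) * sqrt(sum(v2^2)))   # cosine similarity: 0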

1.2.7 Image Processing: Image Compression

Image compression can use Singular Value Decomposition (SVD). For an image matrix \(A\), SVD decomposes it into \(A = U \Sigma V^T\). By retaining only the largest singular values in \(\Sigma\), we can approximate the image with less data.
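A minimal sketch of a rank-k approximation via SVD; the “image” here is a random matrix standing in for real pixel data, and k = 10 is an assumed truncation level.

set.seed(1)
A <- matrix(runif(100 * 100), nrow = 100)   # stand-in for a 100 x 100 grayscale image
k <- 10

s   <- svd(A)
A_k <- s$u[, 1:k] %*% diag(s$d[1:k]) %*% t(s$v[, 1:k])   # keep only the top k singular values

# Storage drops from 100 * 100 values to k * (100 + 100 + 1)
norm(A - A_k, type = "F") / norm(A, type = "F")          # relative approximation error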

1.2.8 Economics: Input-Output Model

An input-output model uses matrices to represent relationships between industries. If \(A\) is the input matrix and \(x\) is the output vector, the equilibrium output can be found as:

\[ x = (I - A)^{-1}d \]

Where \(d\) is the demand vector, and \(I\) is the identity matrix.
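A minimal sketch with an assumed input matrix and demand vector for two industries:

A <- matrix(c(0.2, 0.3,
              0.1, 0.4), nrow = 2, byrow = TRUE)  # input coefficients between two industries
d <- c(100, 50)                                   # final demand vector

solve(diag(2) - A) %*% d   # equilibrium output x = (I - A)^{-1} d, with I = diag(2)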

1.2.9 E-commerce: Recommendation System

Matrix factorization is used in recommendation systems. For a user-item matrix \(R\):

\[ R = U \Sigma V^T \]

Where \(U\) and \(V\) represent latent factors for users and items. We can approximate \(R\) by keeping only the top singular values in \(\Sigma\).
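A minimal sketch with an assumed 4 × 3 user-item rating matrix; real recommendation systems also have to handle missing ratings, which this sketch ignores.

R <- matrix(c(5, 4, 1,
              4, 5, 2,
              1, 2, 5,
              2, 1, 4), nrow = 4, byrow = TRUE)   # rows = users, columns = items
k <- 2

s     <- svd(R)
R_hat <- s$u[, 1:k] %*% diag(s$d[1:k]) %*% t(s$v[, 1:k])   # rank-2 approximation
round(R_hat, 2)                                            # scores used to rank items per user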

1.2.10 Health: Medical Imaging

In MRI, image reconstruction relies on the Fourier transform, which is a linear transformation. The transform of a signal \(f(t)\) into the frequency domain \(F(s)\) is:

\[ F(s) = \int_{-\infty}^{\infty} f(t) e^{-2\pi ist} \, dt \]
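In practice the reconstruction is computed with the discrete Fourier transform. A minimal sketch of the discrete analogue in R, with an assumed test signal:

time_s <- seq(0, 1, length.out = 64)
sig    <- sin(2 * pi * 5 * time_s) + 0.5 * sin(2 * pi * 12 * time_s)   # assumed test signal

spec    <- fft(sig)                                       # forward DFT: time to frequency domain
sig_rec <- Re(fft(spec, inverse = TRUE)) / length(spec)   # inverse DFT recovers the signal
max(abs(sig - sig_rec))                                   # reconstruction error, numerically ~ 0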