Data as Structure: Vectors, Matrices, and Tensors
In practice, mathematical objects are ways of organizing data for computation. A vector is an ordered list of numbers, often representing a single data point in a feature space. A matrix is a rectangular grid of numbers, which can be read as a stack of vectors (for example, one row per data point), while a tensor generalizes these structures to arrays with any number of axes. The norm of a vector measures its magnitude (or length). The dot product measures how strongly two vectors align; dividing it by both norms gives cosine similarity, a standard similarity measure in recommendation systems and semantic search. A projection maps data onto a lower-dimensional subspace, which is the basis of dimensionality reduction techniques.
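The operations in this section can be sketched in a few lines of plain Python. This is a minimal illustration, not a production implementation: the helper names (dot, norm, cosine_similarity, project) and the example vectors are assumptions chosen for clarity, and real systems would typically use an array library instead of lists.

```python
import math

def dot(u, v):
    """Sum of element-wise products of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def norm(v):
    """Euclidean (L2) length of a vector."""
    return math.sqrt(dot(v, v))

def cosine_similarity(u, v):
    """Dot product rescaled by both lengths; ranges from -1 to 1."""
    return dot(u, v) / (norm(u) * norm(v))

def project(u, v):
    """Orthogonal projection of u onto the line spanned by v."""
    scale = dot(u, v) / dot(v, v)
    return [scale * x for x in v]

# Hypothetical example vectors.
u = [3.0, 4.0]
v = [1.0, 0.0]

print(norm(u))                  # 5.0
print(dot(u, v))                # 3.0
print(cosine_similarity(u, v))  # 0.6
print(project(u, v))            # [3.0, 0.0]
```

Projecting u onto v keeps only the component of u that lies along v, which is the one-dimensional case of the projections used in dimensionality reduction.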
Measuring Change: Gradients and Optimization
Machine learning models "learn" by minimizing error, which requires calculating how the error (the loss) changes as the model's parameters change. A derivative measures the rate of change of a function with respect to a single variable, while the gradient extends this to multiple variables and points in the direction of steepest ascent; to reduce the loss, gradient descent steps in the opposite direction. The Jacobian describes how a vector-valued function changes, and the Hessian (a matrix of second-order derivatives) describes the curvature of the loss function. These tools allow us to navigate the loss landscape, using optimization algorithms to find the parameters that minimize error.
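As a rough sketch of how a gradient drives optimization, the example below minimizes a toy loss by repeatedly stepping against a numerically estimated gradient. The loss function, the finite-difference step eps, the learning rate, and the step count are all illustrative assumptions; practical frameworks compute gradients by automatic differentiation rather than finite differences.

```python
def loss(w):
    """Toy quadratic loss with its minimum at w = (3, -1)."""
    return (w[0] - 3.0) ** 2 + 2.0 * (w[1] + 1.0) ** 2

def numerical_gradient(f, w, eps=1e-6):
    """Central-difference estimate of the gradient of f at w."""
    grad = []
    for i in range(len(w)):
        up = list(w)
        up[i] += eps
        down = list(w)
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad

def gradient_descent(f, w, learning_rate=0.1, steps=200):
    """Repeatedly step against the gradient to reduce f."""
    for _ in range(steps):
        g = numerical_gradient(f, w)
        # The gradient points uphill, so subtract it to go downhill.
        w = [wi - learning_rate * gi for wi, gi in zip(w, g)]
    return w

w_min = gradient_descent(loss, [0.0, 0.0])
print(w_min)  # close to [3.0, -1.0]
```

For this particular loss the Hessian is constant, with diagonal entries 2 and 4. The larger curvature caps the usable step size: plain gradient descent on this function diverges once the learning rate reaches 0.5.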
Decision Making Under Uncertainty: Probability and Eigen-Decomposition
Learning is ultimately about making decisions when data is incomplete. Probability distributions quantify the likelihood of outcomes. Expectation describes central tendency, variance describes spread, and covariance describes how two variables vary together. When analyzing the structure of data itself, eigenvectors and eigenvalues provide a way to decompose a square matrix into its core components. Applied to a dataset's covariance matrix, the eigenvectors give the "principal directions" of the data and the eigenvalues give the variance along each one. This is the mathematical foundation behind techniques like Principal Component Analysis (PCA) and the spectral methods used in modern deep learning architectures.
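The link between covariance and principal directions can be sketched as follows. The example builds a 2x2 sample covariance matrix and approximates its dominant eigenvector with power iteration, which is one simple method among several and is swapped in here for brevity. The data, the helper names, and the iteration count are assumptions; the data is deliberately placed on the line y = 2x so the expected direction is easy to check by hand.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def covariance(xs, ys):
    """Sample covariance (divides by n - 1)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def covariance_matrix(xs, ys):
    """2x2 covariance matrix; variances sit on the diagonal."""
    cxy = covariance(xs, ys)
    return [[covariance(xs, xs), cxy], [cxy, covariance(ys, ys)]]

def mat_vec(matrix, v):
    return [sum(m * x for m, x in zip(row, v)) for row in matrix]

def power_iteration(matrix, steps=100):
    """Approximate the dominant eigenpair of a symmetric matrix.

    Assumes the start vector is not orthogonal to the dominant
    eigenvector and that the matrix is not all zeros.
    """
    v = [1.0] * len(matrix)
    for _ in range(steps):
        w = mat_vec(matrix, v)
        length = math.sqrt(sum(x * x for x in w))
        v = [x / length for x in w]
    # Rayleigh quotient v^T M v gives the eigenvalue for unit-length v.
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(matrix, v)))
    return eigenvalue, v

# Hypothetical data lying exactly on the line y = 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0]

cov = covariance_matrix(xs, ys)
eigenvalue, direction = power_iteration(cov)

print(cov)         # [[2.5, 5.0], [5.0, 10.0]]
print(eigenvalue)  # approximately 12.5
print(direction)   # approximately proportional to [1, 2]
```

Here the first principal direction follows the line the data lies on, and its eigenvalue (12.5) equals the total variance (2.5 + 10.0), because no variance is left in the perpendicular direction.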