# Definition of Vector

Vectors are special objects that can be added together and multiplied by scalars to produce another object of the same kind. From an abstract mathematical viewpoint, any object that satisfies these two properties can be considered a vector.

Under this definition, geometric vectors, polynomials, audio signals, elements of $\mathbb{R}^n$, and so on are all vectors.

# 2.1 Systems of Linear Equations

General form of a system of linear equations:

$$\begin{gathered} a_{11}x_1 + \cdots + a_{1n}x_n = b_1 \\ \vdots \\ a_{m1}x_1 + \cdots + a_{mn}x_n = b_m \end{gathered}$$

To give a systematic approach to solving linear equations, we introduce a compact notation:

$$\begin{gathered} \begin{bmatrix} a_{11} \\ \vdots \\ a_{m1} \end{bmatrix}x_1 + \begin{bmatrix} a_{12} \\ \vdots \\ a_{m2} \end{bmatrix}x_2 + \cdots + \begin{bmatrix} a_{1n} \\ \vdots \\ a_{mn} \end{bmatrix}x_n = \begin{bmatrix} b_{1} \\ \vdots \\ b_{m} \end{bmatrix} \Leftrightarrow\\[2em] \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} x_{1} \\ \vdots \\ x_{n} \end{bmatrix} = \begin{bmatrix} b_{1} \\ \vdots \\ b_{m} \end{bmatrix} \end{gathered}$$
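As a quick illustration of this compact notation, here is a minimal NumPy sketch (the example values are ours) that builds $\boldsymbol{A}$ and $\boldsymbol{b}$ and solves $\boldsymbol{Ax} = \boldsymbol{b}$ for a small invertible matrix:

```python
import numpy as np

# A small, invertible example system A x = b (illustrative values, not from the text).
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

x = np.linalg.solve(A, b)     # solves A x = b for square, invertible A
print(x)                      # [0.8 1.4]
print(np.allclose(A @ x, b))  # True
```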

# 2.2 Matrices

Hadamard Product
An element-wise operation on matrix elements, i.e., $c_{ij} = a_{ij}b_{ij}$.
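A minimal NumPy illustration (example matrices are ours); note that the Hadamard product `A * B` is element-wise, unlike the matrix product `A @ B`:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[10, 20],
              [30, 40]])

# The Hadamard product multiplies entries at the same position.
C = A * B
print(C)  # [[ 10  40]
          #  [ 90 160]]
```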

# 2.3 Solving Systems of Linear Equations

# 2.3.1 Particular and General Solution

Consider the system of equations:

$$\begin{bmatrix} 1 & 0 & 8 & -4\\ 0 & 1 & 2 & 12 \end{bmatrix} \begin{bmatrix} x_1\\ x_2\\ x_3\\ x_4 \end{bmatrix} = \begin{bmatrix} 42\\ 8 \end{bmatrix}$$

We want to find scalars $x_1, \dots, x_4$ such that $\sum_{i = 1}^4 x_i\boldsymbol{c}_i = \boldsymbol{b}$, where $\boldsymbol{c}_i$ is the $i$th column of the matrix.

We find that

$$\boldsymbol{b} = \begin{bmatrix} 42 \\ 8 \end{bmatrix} = 42\begin{bmatrix} 1 \\ 0 \end{bmatrix} + 8\begin{bmatrix} 0 \\ 1 \end{bmatrix}.$$

Thus one solution is $[42, 8, 0, 0]^T$, i.e., a particular solution (also called a special solution).

We express the third column using the first two columns:

$$\begin{bmatrix} 8 \\ 2 \end{bmatrix} = 8\begin{bmatrix} 1 \\ 0 \end{bmatrix} + 2\begin{bmatrix} 0 \\ 1 \end{bmatrix}$$

so that $8\boldsymbol{c}_1 + 2\boldsymbol{c}_2 - \boldsymbol{c}_3 + 0\boldsymbol{c}_4 = \boldsymbol{0}$. For any $\lambda_1\in \mathbb{R}$, we have

$$\begin{bmatrix} 1 & 0 & 8 & -4\\ 0 & 1 & 2 & 12 \end{bmatrix}\left(\lambda_1 \begin{bmatrix} 8\\ 2\\ -1\\ 0 \end{bmatrix}\right) = \lambda_1(8\boldsymbol{c}_1 + 2\boldsymbol{c}_2 - \boldsymbol{c}_3) = \boldsymbol{0}$$

Following the same line of reasoning, we express the fourth column of the matrix using the first two columns and get:

$$\begin{bmatrix} 1 & 0 & 8 & -4\\ 0 & 1 & 2 & 12 \end{bmatrix}\left(\lambda_2 \begin{bmatrix} -4\\ 12\\ 0\\ -1 \end{bmatrix}\right) = \lambda_2(-4\boldsymbol{c}_1 + 12\boldsymbol{c}_2 - \boldsymbol{c}_4) = \boldsymbol{0}$$

Then we get the general solution:

$$\boldsymbol{x} = \begin{bmatrix} 42 \\ 8 \\ 0 \\ 0 \end{bmatrix} + \lambda_1\begin{bmatrix} 8 \\ 2 \\ -1 \\ 0 \end{bmatrix} + \lambda_2\begin{bmatrix} -4 \\ 12 \\ 0 \\ -1 \end{bmatrix}, \quad \lambda_1, \lambda_2 \in \mathbb{R}$$
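A quick numerical check of this general solution (a sketch with arbitrary choices of $\lambda_1, \lambda_2$):

```python
import numpy as np

A = np.array([[1., 0., 8., -4.],
              [0., 1., 2., 12.]])
b = np.array([42., 8.])

x_p = np.array([42., 8., 0., 0.])   # particular solution
n1  = np.array([8., 2., -1., 0.])   # solutions of A x = 0
n2  = np.array([-4., 12., 0., -1.])

# Any choice of lambda1, lambda2 should still satisfy A x = b.
for lam1, lam2 in [(0., 0.), (1., -2.), (3.7, 0.5)]:
    x = x_p + lam1 * n1 + lam2 * n2
    assert np.allclose(A @ x, b)
print("general solution verified")
```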

General Approach

  1. Find a particular solution to $\boldsymbol{Ax}=\boldsymbol{b}$.
  2. Find all solutions to $\boldsymbol{Ax} = \boldsymbol{0}$.
  3. Combine the solutions from steps 1 and 2 into the general solution (a numerical sketch of this recipe follows the list).
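Below is a minimal NumPy sketch of this recipe, using least squares for step 1 and the SVD for step 2; the helper name `particular_and_null` is ours:

```python
import numpy as np

def particular_and_null(A, b):
    """Sketch of the three-step recipe using standard NumPy routines."""
    # 1. A particular solution (least squares returns one even for a wide A).
    x_p, *_ = np.linalg.lstsq(A, b, rcond=None)
    # 2. A basis of the solutions of A x = 0, read off from the SVD:
    #    right-singular vectors belonging to zero singular values span the null space.
    _, s, Vt = np.linalg.svd(A)
    rank = int(np.sum(s > 1e-10))
    N = Vt[rank:].T                      # columns span {x : A x = 0}
    return x_p, N

A = np.array([[1., 0., 8., -4.],
              [0., 1., 2., 12.]])
b = np.array([42., 8.])
x_p, N = particular_and_null(A, b)
# 3. General solution: x_p + N @ lambda for any lambda.
lam = np.array([1.5, -0.3])
x = x_p + N @ lam
print(np.allclose(A @ x, b))  # True
```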

# 2.3.2 Elementary Transformations

  • Exchange of two equations (rows in the matrix representing the system of equations)
  • Multiplication of an equation (row) with a constant $\lambda\in\mathbb{R}\setminus\{0\}$
  • Addition of two equations (rows); see the sketch below
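A small NumPy sketch of these row operations, applied to the augmented matrix of the example from Section 2.3.1:

```python
import numpy as np

# Augmented matrix [A | b] from the example in Section 2.3.1.
Ab = np.array([[1., 0., 8., -4., 42.],
               [0., 1., 2., 12., 8.]])

Ab[[0, 1]] = Ab[[1, 0]]   # exchange two rows
Ab[0] = 2.0 * Ab[0]       # multiply a row by a nonzero constant
Ab[1] = Ab[1] + Ab[0]     # add one row to another

# None of these operations changes the solution set of the system.
print(Ab)
```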

Augmented Matrix
For the system $\boldsymbol{Ax}=\boldsymbol{b}$, we define the augmented matrix $[\boldsymbol{A} \,|\, \boldsymbol{b}]$.

Row-Echelon Form

  • All rows that contain only zeros are at the bottom of the matrix; correspondingly, all rows that contain at least one nonzero element are on top of rows that contain only zeros.
  • Looking at nonzero rows only, the first nonzero number from the left
    (also called the pivot or the leading coefficient) is always strictly to the right of the pivot of the row above it.

Basic and Free Variables
The variables corresponding to the pivots in the row-echelon form are called basic variables, and the other variables are free variables.

Obtaining a Particular Solution
Assume we have the following augmented matrix, already in row-echelon form:

$$\left[\begin{array}{ccccc|c} 1 & -2 & 1 & -1 & 1 & 0 \\ 0 & 0 & 1 & -1 & 3 & -2 \\ 0 & 0 & 0 & 1 & -2 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{array}\right]$$

We have $\boldsymbol{b} = \sum_{i = 1}^{P} \lambda_i \boldsymbol{p}_i$, where the $\boldsymbol{p}_i$ are the pivot columns. The $\lambda_i$ are determined most easily if we start with the rightmost pivot column and work our way to the left.

$$\lambda_1\boldsymbol{p}_1 + \lambda_2\boldsymbol{p}_3 + \lambda_3\boldsymbol{p}_4 = \boldsymbol{b} \Leftrightarrow \lambda_1\begin{bmatrix} 1\\ 0\\ 0\\ 0 \end{bmatrix}+\lambda_2\begin{bmatrix} 1\\ 1\\ 0\\ 0 \end{bmatrix}+\lambda_3\begin{bmatrix} -1 \\ -1 \\ 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ -2 \\ 1\\ 0 \end{bmatrix}$$

We find $\lambda_3 = 1$, $\lambda_2 = -1$, $\lambda_1 = 2$. Therefore we get the particular solution $\boldsymbol{x} = [2, 0, -1, 1, 0]^T$.
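A small sketch of this computation; it uses `np.linalg.lstsq` on the pivot columns instead of manual back substitution, which gives the same coefficients here:

```python
import numpy as np

# Pivot columns of the row-echelon form above (columns 1, 3 and 4) and the right-hand side.
P = np.array([[1., 1., -1.],
              [0., 1., -1.],
              [0., 0., 1.],
              [0., 0., 0.]])
b = np.array([0., -2., 1., 0.])

lam, *_ = np.linalg.lstsq(P, b, rcond=None)  # back substitution would give the same result
print(np.round(lam, 6))                      # [ 2. -1.  1.]

# Scatter the coefficients into the positions of the basic variables x1, x3, x4.
x = np.zeros(5)
x[[0, 2, 3]] = lam
print(x)                                     # [ 2.  0. -1.  1.  0.]
```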

Reduced Row Echelon Form
An equation system is in reduced row-echelon form (also: row-reduced echelon form or row canonical form) if

  • It is in row-echelon form.
  • Every pivot is 1.
  • The pivot is the only nonzero entry in its column.

Gaussian Elimination
Gaussian elimination is an algorithm that performs elementary transformations to bring a system of linear equations into reduced row-echelon form.

Obtaining Solutions to $\boldsymbol{Ax} = \boldsymbol{0}$
Assume we have the following matrix in reduced row-echelon form:

$$\boldsymbol{A} = \begin{bmatrix} \boldsymbol{1} & 3 & 0 & 0 & 3 \\ 0 & 0 & \boldsymbol{1} & 0 & 9 \\ 0 & 0 & 0 & \boldsymbol{1} & -4 \end{bmatrix}$$

The key idea for finding the solutions of $\boldsymbol{Ax} = \boldsymbol{0}$ is to look at the non-pivot columns, which we will need to express as a (linear) combination of the pivot columns. In the following, $\boldsymbol{c}_i$ denotes the $i$th column of $\boldsymbol{A}$.
Considering the second column, we have

$$3\boldsymbol{c}_1 - \boldsymbol{c}_2 = \boldsymbol{0}$$

Considering the fifth column, we have

$$3\boldsymbol{c}_1 + 9\boldsymbol{c}_3 - 4\boldsymbol{c}_4 - \boldsymbol{c}_5 = \boldsymbol{0}$$

Therefore, all solutions of $\boldsymbol{Ax} = \boldsymbol{0}$ are given by

$$\boldsymbol{x} = \lambda_1\begin{bmatrix} 3 \\ -1 \\ 0 \\ 0\\ 0 \end{bmatrix} + \lambda_2\begin{bmatrix} 3 \\ 0 \\ 9 \\ -4 \\ -1 \end{bmatrix}, \quad \lambda_1, \lambda_2 \in \mathbb{R}$$

# 2.3.3 The Minus-1 Trick

Minus-1 Trick

$$\boldsymbol{A} = \begin{bmatrix} \boldsymbol{1} & 3 & 0 & 0 & 3 \\ 0 & 0 & \boldsymbol{1} & 0 & 9 \\ 0 & 0 & 0 & \boldsymbol{1} & -4 \end{bmatrix}$$

We now augment this matrix to a $5\times 5$ matrix by adding rows of the form $[0\;\cdots\;0\;-1\;0\;\cdots\;0]$ at the places where the pivots on the diagonal are missing and obtain

$$\tilde{\boldsymbol{A}} = \begin{bmatrix} 1 & 3 & 0 & 0 & 3 \\ \boldsymbol{0} & \boldsymbol{-1} & \boldsymbol{0} & \boldsymbol{0} & \boldsymbol{0}\\ 0 & 0 & 1 & 0 & 9 \\ 0 & 0 & 0 & 1 & -4 \\ \boldsymbol{0} & \boldsymbol{0} & \boldsymbol{0} & \boldsymbol{0} & \boldsymbol{-1} \end{bmatrix}$$

From this form, we can immediately read out the solutions of $\boldsymbol{Ax} = \boldsymbol{0}$ by taking the columns of $\tilde{\boldsymbol{A}}$ that contain $-1$ on the diagonal:

$$\boldsymbol{x} = \lambda_1\begin{bmatrix} 3 \\ -1 \\ 0 \\ 0\\ 0 \end{bmatrix} + \lambda_2\begin{bmatrix} 3 \\ 0 \\ 9 \\ -4 \\ -1 \end{bmatrix}, \quad \lambda_1, \lambda_2 \in \mathbb{R}$$

These columns form a basis (Section 2.6.1) of the solution space of $\boldsymbol{Ax} = \boldsymbol{0}$, which we will later call the kernel or null space (see Section 2.7.3).
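A minimal NumPy sketch of the trick; the helper name `minus_one_trick` is ours, and it assumes the input is already in reduced row-echelon form with zero rows removed:

```python
import numpy as np

def minus_one_trick(R):
    """Augment R with -1 rows on the missing diagonal positions and return the
    columns that carry -1 on the diagonal; they span the null space of R."""
    n = R.shape[1]
    pivots = [int(np.argmax(row != 0)) for row in R]  # pivot index of each row
    A_tilde = np.zeros((n, n))
    A_tilde[pivots, :] = R                 # place the original rows on their pivot rows
    free = [j for j in range(n) if j not in pivots]
    A_tilde[free, free] = -1.0             # fill the missing diagonal entries with -1
    return A_tilde[:, free]                # columns with -1 on the diagonal

R = np.array([[1., 3., 0., 0., 3.],
              [0., 0., 1., 0., 9.],
              [0., 0., 0., 1., -4.]])
N = minus_one_trick(R)
print(N.T)                    # rows: [ 3 -1  0  0  0] and [ 3  0  9 -4 -1]
print(np.allclose(R @ N, 0))  # True
```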

Calculating the Inverse
Using elementary transformations, we can obtain:

$$\left[\boldsymbol{A} \,|\, \boldsymbol{I}_n\right]\leadsto \cdots \leadsto \left[\boldsymbol{I}_n \,|\, \boldsymbol{A}^{-1}\right]$$
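A small Gauss-Jordan sketch of this procedure, assuming $\boldsymbol{A}$ is invertible; the helper name `inverse_via_gauss_jordan` is ours:

```python
import numpy as np

def inverse_via_gauss_jordan(A):
    """Sketch: row-reduce the augmented matrix [A | I_n] until the left block is I_n."""
    n = A.shape[0]
    M = np.hstack([A.astype(float), np.eye(n)])
    for col in range(n):
        # Partial pivoting: swap the row with the largest pivot into place.
        piv = col + np.argmax(np.abs(M[col:, col]))
        M[[col, piv]] = M[[piv, col]]
        M[col] /= M[col, col]                   # make the pivot equal to 1
        for row in range(n):
            if row != col:
                M[row] -= M[row, col] * M[col]  # eliminate the rest of the column
    return M[:, n:]                             # right block is now A^{-1}

A = np.array([[2., 1.],
              [1., 3.]])
print(np.allclose(inverse_via_gauss_jordan(A), np.linalg.inv(A)))  # True
```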

# 2.3.4 Algorithms for Solving a System of Linear Equations

When a system has no solution, we have to resort to approximate solutions. One way to find such an approximate solution is linear regression (least squares).

Moore-Penrose pseudo-inverse

$$\boldsymbol{Ax} = \boldsymbol{b} \Leftrightarrow \boldsymbol{A}^T\boldsymbol{Ax} = \boldsymbol{A}^T\boldsymbol{b} \Leftrightarrow \boldsymbol{x} = (\boldsymbol{A}^T\boldsymbol{A})^{-1}\boldsymbol{A}^T\boldsymbol{b}$$

The Moore-Penrose pseudo-inverse $(\boldsymbol{A}^T\boldsymbol{A})^{-1}\boldsymbol{A}^T$ determines the solution, which also corresponds to the minimum norm least-squares solution.
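A small NumPy sketch comparing the normal-equation formula above with `np.linalg.pinv` and `np.linalg.lstsq` (example values are ours; for a full-column-rank $\boldsymbol{A}$ all three agree):

```python
import numpy as np

# Overdetermined system (more equations than unknowns): no exact solution.
A = np.array([[1., 1.],
              [1., 2.],
              [1., 3.]])
b = np.array([1., 2., 2.])

x_normal = np.linalg.inv(A.T @ A) @ A.T @ b      # normal equations, as in the formula above
x_pinv   = np.linalg.pinv(A) @ b                 # Moore-Penrose pseudo-inverse
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)  # numerically preferred route

print(np.allclose(x_normal, x_pinv), np.allclose(x_pinv, x_lstsq))  # True True
```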

Disadvantage
It requires many computations for the matrix-matrix product and for computing the inverse of $\boldsymbol{A}^T\boldsymbol{A}$. Moreover, for reasons of numerical precision it is generally not recommended to compute the inverse or pseudo-inverse.

In practice, systems of many linear equations are solved indirectly, by either stationary iterative methods, such as the Richardson method, the Jacobi method, the Gauss-Seidel method, and the successive over-relaxation method, or Krylov subspace methods, such as conjugate gradients, generalized minimal residual, or biconjugate gradients.
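As a sketch of one such stationary iterative method, here is a minimal Jacobi iteration; the example matrix is ours and is chosen to be strictly diagonally dominant so that the iteration converges:

```python
import numpy as np

def jacobi(A, b, num_iters=100):
    """Sketch of the stationary Jacobi iteration; assumes A is, e.g., strictly
    diagonally dominant so that the iteration converges."""
    D = np.diag(A)                  # diagonal part of A
    R = A - np.diag(D)              # off-diagonal part
    x = np.zeros_like(b)
    for _ in range(num_iters):
        x = (b - R @ x) / D         # x_{k+1} = D^{-1} (b - R x_k)
    return x

A = np.array([[4., 1., 0.],
              [1., 5., 2.],
              [0., 2., 6.]])
b = np.array([1., 2., 3.])
x = jacobi(A, b)
print(np.allclose(A @ x, b, atol=1e-8))  # True
```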

# 2.4 Vector Spaces

# 2.4.1 Groups

Definition (Group)
Consider a set $\mathcal{G}$ and an operation $\otimes: \mathcal{G} \times \mathcal{G} \to \mathcal{G}$ defined on $\mathcal{G}$. Then $G:=(\mathcal{G}, \otimes)$ is called a group if the following hold:

  1. Closure of $\mathcal{G}$ under $\otimes$: $\forall x, y \in \mathcal{G}: x\otimes y \in \mathcal{G}$.
  2. Associativity: $\forall x, y, z \in \mathcal{G}: (x \otimes y) \otimes z = x \otimes (y \otimes z)$.
  3. Neutral element: $\exists e \in \mathcal{G}\;\forall x \in \mathcal{G}: x\otimes e = x \text{ and } e \otimes x = x$.
  4. Inverse element: $\forall x \in \mathcal{G}\;\exists y \in \mathcal{G}: x\otimes y = e \text{ and } y \otimes x = e$, where $e$ is the neutral element and $y$ is usually denoted as $x^{-1}$.

Abelian group
If additionally $\forall x,y\in \mathcal{G}: x \otimes y = y \otimes x$, then $G = (\mathcal{G}, \otimes)$ is an Abelian group.

Definition (General Linear Group)
The set of regular (invertible) matrices $\boldsymbol{A} \in \mathbb{R}^{n \times n}$ is a group with respect to matrix multiplication and is called the general linear group $GL(n, \mathbb{R})$. However, since matrix multiplication is not commutative, the group is not Abelian.
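A quick NumPy illustration of this non-commutativity (the example matrices are ours):

```python
import numpy as np

# Two invertible 2x2 matrices: each has an inverse and multiplication is associative,
# but the products A @ B and B @ A differ, so GL(2, R) is not Abelian.
A = np.array([[1., 1.],
              [0., 1.]])
B = np.array([[1., 0.],
              [1., 1.]])

print(A @ B)                      # [[2. 1.] [1. 1.]]
print(B @ A)                      # [[1. 1.] [1. 2.]]
print(np.allclose(A @ B, B @ A))  # False
```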

# 2.4.2 Vector Spaces

Definition (Vector Space)
A real-valued vector space $V = (\mathcal{V}, +, \cdot)$ is a set $\mathcal{V}$ with an inner operation $+$ and an outer operation $\cdot$

$$\begin{aligned} +&:\mathcal{V} \times \mathcal{V} \rightarrow \mathcal{V}\\ \cdot&: \mathbb{R} \times \mathcal{V} \to \mathcal{V} \end{aligned}$$

where

  1. $(\mathcal{V}, +)$ is an Abelian group.
  2. Distributivity:
    1. $\forall \lambda \in \mathbb{R}, \boldsymbol{x}, \boldsymbol{y} \in \mathcal{V}:\lambda\cdot(\boldsymbol{x}+\boldsymbol{y}) = \lambda\cdot\boldsymbol{x} + \lambda\cdot\boldsymbol{y}$
    2. $\forall \lambda, \psi \in \mathbb{R}, \boldsymbol{x} \in \mathcal{V}: (\lambda + \psi)\cdot \boldsymbol{x} = \lambda\cdot \boldsymbol{x} + \psi\cdot\boldsymbol{x}$
  3. Associativity (outer operation): $\forall\lambda,\psi\in\mathbb{R}, \boldsymbol{x}\in \mathcal{V}: \lambda\cdot(\psi\cdot\boldsymbol{x}) = (\lambda\psi)\cdot\boldsymbol{x}$
  4. Neutral element with respect to the outer operation: $\forall\boldsymbol{x} \in \mathcal{V}: 1\cdot \boldsymbol{x} = \boldsymbol{x}$

Multiplication by scalars and the scalar product are different! (e.g., for $\boldsymbol{x}, \boldsymbol{y} \in \mathcal{V}$ and $\lambda\in\mathbb{R}$, $\lambda\boldsymbol{x}\in\mathcal{V}$ is a multiplication by a scalar, while $\boldsymbol{x}^T\boldsymbol{y}\in\mathbb{R}$ is a scalar product.)

# 2.4.3 Vector Subspaces

Definition (Vector Subspace)
Let $V = (\mathcal{V}, +, \cdot)$ be a vector space and $\mathcal{U} \subseteq \mathcal{V}$, $\mathcal{U} \ne \emptyset$. Then $U = (\mathcal{U}, +, \cdot)$ is called a vector subspace of $V$ (or linear subspace) if $U$ is a vector space with the vector space operations $+$ and $\cdot$ restricted to $\mathcal{U}\times\mathcal{U}$ and $\mathbb{R}\times\mathcal{U}$. We write $U \subseteq V$ to denote a subspace $U$ of $V$. To determine whether $(\mathcal{U}, +, \cdot)$ is a subspace of $V$, we still need to show:

  1. $\mathcal{U}\ne\emptyset$, in particular: $\boldsymbol{0}\in\mathcal{U}$.
  2. Closure of $\mathcal{U}$:
    a. With respect to the outer operation: $\forall\lambda\in\mathbb{R}\;\forall\boldsymbol{x}\in\mathcal{U}:\lambda\boldsymbol{x}\in\mathcal{U}$.
    b. With respect to the inner operation: $\forall\boldsymbol{x},\boldsymbol{y}\in\mathcal{U}:\boldsymbol{x}+\boldsymbol{y}\in\mathcal{U}$.

Some Conclusions

  • For every vector space $V$, the trivial subspaces are $V$ itself and $\{\boldsymbol{0}\}$.
  • The solution set of a homogeneous system of linear equations $\boldsymbol{Ax} = \boldsymbol{0}$ with $n$ unknowns $\boldsymbol{x} = [x_1, x_2, \dots, x_n]^T$ is a subspace of $\mathbb{R}^n$.
  • The solution set of an inhomogeneous system of linear equations $\boldsymbol{Ax} = \boldsymbol{b}$, $\boldsymbol{b}\ne\boldsymbol{0}$, is not a subspace of $\mathbb{R}^n$.
  • The intersection of arbitrarily many subspaces is a subspace itself.

# 2.5 Linear Independence

Definition (Linear Combination)
Consider a vector space $V$ and a finite number of vectors $\boldsymbol{x}_1,\dots,\boldsymbol{x}_k\in V$. Then, every $\boldsymbol{v}\in V$ of the form

$$\boldsymbol{v} = \lambda_1\boldsymbol{x}_1+\cdots+\lambda_k\boldsymbol{x}_k=\sum_{i = 1}^{k}\lambda_i\boldsymbol{x}_i\in V$$

with $\lambda_1,\dots,\lambda_k\in\mathbb{R}$ is a linear combination of the vectors $\boldsymbol{x}_1,\dots,\boldsymbol{x}_k$.

Definition (Linear (In)dependence)
Consider a vector space $V$ with $k\in\mathbb{N}$ and $\boldsymbol{x}_1,\dots,\boldsymbol{x}_k\in V$. If there is a non-trivial linear combination such that $\boldsymbol{0} = \sum_{i = 1}^k\lambda_i\boldsymbol{x}_i$ with at least one $\lambda_i\ne0$, the vectors $\boldsymbol{x}_1,\dots,\boldsymbol{x}_k$ are linearly dependent. If only the trivial solution exists, i.e., $\lambda_1 = \cdots = \lambda_k = 0$, the vectors $\boldsymbol{x}_1,\dots,\boldsymbol{x}_k$ are linearly independent.

Important properties

  • If at least one of the vectors $\boldsymbol{x}_1,\dots,\boldsymbol{x}_k$ is $\boldsymbol{0}$, then they are linearly dependent. The same holds if two vectors are identical.
  • The vectors $\{\boldsymbol{x}_1,\dots,\boldsymbol{x}_k:\boldsymbol{x}_i\ne\boldsymbol{0}, i = 1,\dots, k\}$, $k \ge 2$, are linearly dependent if and only if (at least) one of them is a linear combination of the others. In particular, if one vector is a multiple of another vector, i.e., $\boldsymbol{x}_i = \lambda\boldsymbol{x}_j$ for some $\lambda\in\mathbb{R}$, then the set $\{\boldsymbol{x}_1,\dots,\boldsymbol{x}_k:\boldsymbol{x}_i\ne\boldsymbol{0}, i = 1,\dots, k\}$ is linearly dependent.
  • A practical way of checking whether the vectors $\boldsymbol{x}_1,\dots,\boldsymbol{x}_k\in V$ are linearly independent is to use Gaussian elimination: write all vectors as columns of a matrix $\boldsymbol{A}$ and perform Gaussian elimination until the matrix is in row-echelon form (the reduced row-echelon form is unnecessary here); see the sketch after this list.
    • The pivot columns indicate the vectors that are linearly independent of the vectors to their left. Note that there is an ordering of vectors when the matrix is built.
    • The non-pivot columns can be expressed as linear combinations of the pivot columns to their left.
      All column vectors are linearly independent if and only if all columns are pivot columns. If there is at least one non-pivot column, the columns (and, therefore, the corresponding vectors) are linearly dependent.
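Here is the sketch mentioned above; it uses the matrix rank rather than an explicit row-echelon computation, which is equivalent for deciding independence:

```python
import numpy as np

def linearly_independent(vectors):
    """Sketch: the vectors are independent iff the matrix built from them has full column rank."""
    A = np.column_stack(vectors)
    return np.linalg.matrix_rank(A) == A.shape[1]

x1 = np.array([1., 2., 3.])
x2 = np.array([0., 1., 1.])
x3 = x1 + 2 * x2                             # deliberately a linear combination of x1 and x2

print(linearly_independent([x1, x2]))        # True
print(linearly_independent([x1, x2, x3]))    # False
```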

Consider a vector space $V$ with $k$ linearly independent vectors $\boldsymbol{b}_1, \dots, \boldsymbol{b}_k$ and $m$ linear combinations

$$\begin{gathered} \boldsymbol{x}_1 = \sum_{i = 1}^k\lambda_{i1}\boldsymbol{b}_i,\\ \vdots\\ \boldsymbol{x}_m = \sum_{i = 1}^k\lambda_{im}\boldsymbol{b}_i. \end{gathered}$$

Defining $\boldsymbol{B} = [\boldsymbol{b}_1, \dots, \boldsymbol{b}_k]$ as the matrix whose columns are the linearly independent vectors $\boldsymbol{b}_1, \dots, \boldsymbol{b}_k$, we have

$$\boldsymbol{x}_j = \boldsymbol{B}\boldsymbol{\lambda}_j,\quad\boldsymbol{\lambda}_j = \begin{bmatrix} \lambda_{1j}\\ \vdots\\ \lambda_{kj} \end{bmatrix},\quad j = 1, \dots, m.$$

To test whether $\boldsymbol{x}_1,\dots,\boldsymbol{x}_m$ are linearly independent, we follow the general approach of testing when $\sum_{j = 1}^m \psi_j\boldsymbol{x}_j = \boldsymbol{0}$, i.e., we examine

$$\sum_{j = 1}^m\psi_j\boldsymbol{x}_j = \sum_{j = 1}^m\psi_j\boldsymbol{B}\boldsymbol{\lambda}_j = \boldsymbol{B}\sum_{j = 1}^m\psi_j\boldsymbol{\lambda}_j = \boldsymbol{0}.$$

This means that $\{\boldsymbol{x}_1,\dots,\boldsymbol{x}_m\}$ are linearly independent if and only if the column vectors $\{\boldsymbol{\lambda}_1, \dots, \boldsymbol{\lambda}_m\}$ are linearly independent.

In a vector space $V$, $m$ linear combinations of $k$ vectors $\boldsymbol{x}_1,\dots,\boldsymbol{x}_k$ are linearly dependent if $m > k$.
Why? Writing each combination as $\boldsymbol{B}\boldsymbol{\lambda}_j$ as above, the coefficient vectors $\boldsymbol{\lambda}_1,\dots,\boldsymbol{\lambda}_m$ are $m > k$ vectors in $\mathbb{R}^k$ and hence linearly dependent, so there is a non-trivial choice of $\psi_j$ with $\sum_{j = 1}^m\psi_j\boldsymbol{\lambda}_j = \boldsymbol{0}$, which gives $\sum_{j = 1}^m\psi_j\boldsymbol{x}_j = \boldsymbol{0}$.

# 2.6 Basis and Rank

# 2.6.1 Generating Set and Basis

Definition (Generating Set and Span)
Consider a vector space $V = (\mathcal{V}, +, \cdot)$ and a set of vectors $\mathcal{A} = \{\boldsymbol{x}_1, \dots, \boldsymbol{x}_k\} \subseteq \mathcal{V}$. If every vector $\boldsymbol{v} \in \mathcal{V}$ can be expressed as a linear combination of $\boldsymbol{x}_1, \dots, \boldsymbol{x}_k$, $\mathcal{A}$ is called a generating set of $V$. The set of all linear combinations of vectors in $\mathcal{A}$ is called the span of $\mathcal{A}$. If $\mathcal{A}$ spans the vector space $V$, we write $V = \text{span}[\mathcal{A}]$ or $V = \text{span}[\boldsymbol{x}_1, \dots, \boldsymbol{x}_k]$.

Definition (Basis)
Consider a vector space $V = (\mathcal{V}, +, \cdot)$ and $\mathcal{A} \subseteq \mathcal{V}$. A generating set $\mathcal{A}$ of $V$ is called minimal if there exists no smaller set $\tilde{\mathcal{A}} \subsetneq \mathcal{A} \subseteq \mathcal{V}$ that spans $V$. Every linearly independent generating set of $V$ is minimal and is called a basis of $V$.

Properties of Basis
Let $V = (\mathcal{V}, +, \cdot)$ be a vector space and $\mathcal{B} \subseteq \mathcal{V}$, $\mathcal{B} \neq \emptyset$. Then, the following statements are equivalent:

  • $\mathcal{B}$ is a basis of $V$.
  • $\mathcal{B}$ is a minimal generating set.
  • $\mathcal{B}$ is a maximal linearly independent set of vectors in $V$, i.e., adding any other vector to this set will make it linearly dependent.
  • Every vector $\boldsymbol{x} \in V$ is a linear combination of vectors from $\mathcal{B}$, and every linear combination is unique, i.e., with

$$\boldsymbol{x} = \sum_{i = 1}^k \lambda_i \boldsymbol{b}_i = \sum_{i = 1}^k \psi_i \boldsymbol{b}_i$$

and $\lambda_i, \psi_i \in \mathbb{R}$, $\boldsymbol{b}_i \in \mathcal{B}$, it follows that $\lambda_i = \psi_i$, $i = 1, \dots, k$.

Definition (Basis Vector)
Every vector space $V$ possesses many different bases. However, all bases possess the same number of elements, the basis vectors.

Definition (Dimension)
The dimension of $V$ is the number of basis vectors of $V$, denoted $\text{dim}(V)$. If $U \subseteq V$ is a subspace of $V$, then $\text{dim}(U)\le \text{dim}(V)$, and $\text{dim}(U) = \text{dim}(V)$ if and only if $U = V$. Intuitively, the dimension of a vector space can be thought of as the number of independent directions in this vector space. The dimension of a vector space is not necessarily the number of elements in a vector. For instance, the vector space $V = \text{span}\left[\begin{bmatrix} 0\\ 1 \end{bmatrix}\right]$ is one-dimensional, although the basis vector possesses two elements.

Get a basis of a subspace
A basis of a subspace $U = \text{span}[\boldsymbol{x}_1, \dots, \boldsymbol{x}_m] \subseteq \mathbb{R}^n$ can be found by executing the following steps (a small numerical sketch follows the list):

  1. Write the spanning vectors as columns of a matrix $\boldsymbol{A}$.
  2. Determine the row-echelon form of $\boldsymbol{A}$.
  3. The spanning vectors associated with the pivot columns form a basis of $U$.
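Here is the promised sketch; instead of an explicit row-echelon form, it identifies the pivot columns by checking whether each column increases the rank of the columns before it, which selects the same spanning vectors (the helper name `basis_of_span` is ours):

```python
import numpy as np

def basis_of_span(vectors):
    """Sketch: keep the spanning vectors that correspond to pivot columns."""
    A = np.column_stack(vectors)
    basis, rank = [], 0
    for j in range(A.shape[1]):
        if np.linalg.matrix_rank(A[:, :j + 1]) > rank:  # column j is a pivot column
            basis.append(vectors[j])
            rank += 1
    return basis

x1 = np.array([1., 0., 1.])
x2 = np.array([2., 0., 2.])   # = 2 * x1, redundant
x3 = np.array([0., 1., 0.])

for v in basis_of_span([x1, x2, x3]):
    print(v)                  # prints x1 and x3
```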

# 2.6.2 Rank

Definition (Rank)
The number of linearly independent columns of a matrix $\boldsymbol{A} \in \mathbb{R}^{m\times n}$ equals the number of linearly independent rows and is called the rank of $\boldsymbol{A}$, denoted by $\text{rk}(\boldsymbol{A})$.

Important properties

  • $\text{rk}(\boldsymbol{A}) = \text{rk}(\boldsymbol{A}^T)$, i.e., the column rank equals the row rank.
  • The columns of $\boldsymbol{A} \in \mathbb{R}^{m \times n}$ span a subspace $U \subseteq \mathbb{R}^m$ with $\text{dim}(U) = \text{rk}(\boldsymbol{A})$. This subspace is also called the image or range. A basis of $U$ can be found by applying Gaussian elimination to $\boldsymbol{A}$ to identify the pivot columns.
  • The rows of $\boldsymbol{A} \in \mathbb{R}^{m \times n}$ span a subspace $W \subseteq \mathbb{R}^n$ with $\text{dim}(W) = \text{rk}(\boldsymbol{A})$. A basis of $W$ can be found by applying Gaussian elimination to $\boldsymbol{A}^T$.
  • For all $\boldsymbol{A} \in \mathbb{R}^{n \times n}$, $\text{rk}(\boldsymbol{A}) = n$ if and only if $\boldsymbol{A}$ is regular (invertible).
  • For all $\boldsymbol{A} \in \mathbb{R}^{m \times n}$ and all $\boldsymbol{b} \in \mathbb{R}^m$, the linear equation system $\boldsymbol{A}\boldsymbol{x} = \boldsymbol{b}$ can be solved if and only if $\text{rk}(\boldsymbol{A}) = \text{rk}(\boldsymbol{A} \,|\, \boldsymbol{b})$, where $\boldsymbol{A} \,|\, \boldsymbol{b}$ denotes the augmented matrix.
  • For all $\boldsymbol{A} \in \mathbb{R}^{m \times n}$, the subspace of solutions of $\boldsymbol{Ax} = \boldsymbol{0}$ possesses dimension $n - \text{rk}(\boldsymbol{A})$. This subspace is called the kernel or the null space.
  • A matrix $\boldsymbol{A} \in \mathbb{R}^{m \times n}$ has full rank if $\text{rk}(\boldsymbol{A}) = \min(m, n)$. A matrix is said to be rank deficient if it does not have full rank. (A small NumPy sketch of these properties follows.)
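The promised sketch, checking a few of these properties numerically on the matrix from Section 2.3.1:

```python
import numpy as np

A = np.array([[1., 0., 8., -4.],
              [0., 1., 2., 12.]])
m, n = A.shape

rank = np.linalg.matrix_rank(A)
print(rank == np.linalg.matrix_rank(A.T))  # True: column rank equals row rank

# Dimension of the null space {x : A x = 0} is n - rk(A).
_, s, Vt = np.linalg.svd(A)
print(n - rank)                            # 2
print(np.allclose(A @ Vt[rank:].T, 0))     # the last n - rk(A) right-singular vectors solve A x = 0

print(rank == min(m, n))                   # True: this A has full (row) rank
```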