Every matrix represents a linear transformation.

A special kind of matrix, called a diagonal matrix, is very easy to interpret:

A = \begin{bmatrix} 2 & 0 \\ 0 & 1/2 \end{bmatrix}

It should be obvious why we call this a diagonal matrix. It has non-zero entries along the main diagonal and zeros everywhere else.

It should also not take much to imagine what the linear transformation represented by this matrix would look like. A vector that was transformed by this matrix would have its x component multiplied by 2 and its y component multiplied by \dfrac{1}{2}. It is a simple stretching/scaling.

Consider the example x = \begin{bmatrix} 1 \\ 2 \end{bmatrix}:

Ax = \begin{bmatrix} 2 & 0 \\ 0 & 1/2 \end{bmatrix} \begin{bmatrix} 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 2 \\ 1\end{bmatrix}
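If you want to check this on a computer, here is a minimal NumPy sketch of the same computation (the variable names are mine):

import numpy as np

A = np.array([[2.0, 0.0],
              [0.0, 0.5]])
x = np.array([1.0, 2.0])

# The x component is doubled and the y component is halved.
print(A @ x)  # [2. 1.]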

Here’s another matrix:

A = \begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix}

It’s much harder to intuitively understand what linear transformation this matrix represents. It is still a simple 2×2 matrix, but because it is not diagonal, it is much more difficult to see what it is doing.
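One way to feel the difference is to apply this A to a few vectors and watch what happens to their directions (a small NumPy sketch; the test vectors are ones I picked):

import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0]])

# Unlike the diagonal case, the standard basis vectors are not simply
# scaled in place: their directions change, so it is much harder to
# see what A is "doing" just by looking at its entries.
print(A @ np.array([1.0, 0.0]))  # [3. 1.]
print(A @ np.array([0.0, 1.0]))  # [1. 3.]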

As the matrix gets larger, it becomes practically impossible to interpret what linear transformation it represents. That is not true of diagonal matrices: they are always a simple scaling/stretching, no matter how large they are.

Okay, diagonal matrices are great. So what? Our A is not a diagonal matrix, so what can we do?

As long as the matrix is square we can perform a factorization called eigendecomposition.

Eigenvalues and Eigenvectors

A matrix’s eigenvectors are the non-zero vectors that satisfy this equation:

Ax = \lambda x

An eigenvector is a vector that, when multiplied by A, does not change direction. It is only scaled by some value \lambda. That value \lambda is called an eigenvalue. A square n \times n matrix can have up to n linearly independent eigenvectors (and up to n distinct eigenvalues).

How do we find these eigenvalues and eigenvectors?

The equation above can be rewritten like this:

Ax - \lambda Ix = 0

We basically just subtracted \lambda x from both sides of the equation, rewriting \lambda x as \lambda I x so that both terms are a matrix multiplying x.

Factoring out the x we are left with:

(A - \lambda I) x = 0

This equation is telling us that the matrix (A - \lambda I) multiplied by some vector x gives us the zero vector.

Obviously, the zero vector itself multiplied by any matrix gives us the zero vector. We don’t care about that.

We want some other, non-zero vector that, when multiplied by (A - \lambda I), gives us the zero vector. For that to be true, (A - \lambda I) must be singular, which means its determinant must be 0.

det(A - \lambda I) = 0

We know A and we know I so we can solve for \lambda:

det(A - \lambda I) = \begin{vmatrix} 3 - \lambda & 1 \\ 1 & 3 - \lambda \end{vmatrix} = 0

(3 - \lambda)^2 - 1 = 0

\lambda^2 -6\lambda + 8 = 0

\lambda_1 = 4 \;\;\; \lambda_2 = 2
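As a quick sanity check (not part of the derivation), NumPy finds the same roots of the characteristic polynomial:

import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0]])

# Roots of lambda^2 - 6*lambda + 8 = 0:
print(np.roots([1.0, -6.0, 8.0]))  # [4. 2.]

# NumPy's eigenvalue routine agrees (it may list them in another order):
print(np.linalg.eigvals(A))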

Great! Now we have our eigenvalues. We can use them to compute our eigenvectors:

\begin{bmatrix} 3 - 4& 1 \\ 1 & 3 - 4 \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = 0

To solve this system of linear equations, we can put the matrix into reduced row echelon form and then read off the solution:

\begin{bmatrix} 1 & -1 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = 0

v_2 is our free variable, so we choose v_2 = 1. Plugging that into the first equation (v_1 - v_2 = 0) gives v_1 = 1, so:

v = \begin{bmatrix} 1 \\ 1 \end{bmatrix}

The second eigenvector is computed in the same way, using \lambda_2 = 2. We get:

w = \begin{bmatrix} -1\\ 1 \end{bmatrix}
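We can confirm both vectors really are eigenvectors by checking that multiplying by A only scales them (a short NumPy check using the v and w found above):

import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0]])

v = np.array([1.0, 1.0])    # eigenvector for lambda_1 = 4
w = np.array([-1.0, 1.0])   # eigenvector for lambda_2 = 2

# A v = 4 v and A w = 2 w: neither vector changes direction.
print(np.allclose(A @ v, 4 * v))  # True
print(np.allclose(A @ w, 2 * w))  # True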

Eigendecomposition

Eigendecomposition is a way of factoring a square matrix (provided it has a full set of linearly independent eigenvectors) into the product of three matrices:

A = Q\Lambda Q^{-1}

  • A is the square matrix being factored
  • \Lambda is a diagonal matrix whose diagonal entries are the eigenvalues of A
  • Q is a matrix whose columns are the eigenvectors of A (it is orthogonal when A is symmetric, as it is here)

We did most of the work needed for this factorization in the previous section about finding eigenvalues and eigenvectors.

The diagonal matrix \Lambda will have our eigenvalues along its diagonal in descending order:

\Lambda = \begin{bmatrix} 4 & 0 \\ 0 & 2 \end{bmatrix}

The fact that they are in descending order is very important: the magnitude of each eigenvalue tells us how much of the linear transformation represented by A is “explained” by its corresponding eigenvector.

We already have our eigenvectors, which will make up the columns of Q. The eigenvectors must go in the same order as their associated eigenvalues along the diagonal of \Lambda. It also just so happens that these eigenvectors are orthogonal (their dot product is 0). All that’s left to do is to normalize our eigenvectors and make them the columns of Q:

Q = \begin{bmatrix} 1 / \sqrt{2} & -1 / \sqrt{2} \\ 1 / \sqrt{2} & 1 / \sqrt{2} \end{bmatrix}

It will not always be the case that the eigenvectors we find this way are orthogonal. That is a special property of symmetric matrices (which is why I chose this A as an example). For a non-symmetric matrix, Q is generally not an orthogonal matrix, and the decomposition uses Q^{-1} rather than Q^T. (Running the Gram-Schmidt process on the eigenvectors would make them orthonormal, but the resulting vectors would no longer be eigenvectors of A.)

So all together our final matrix factorization for A looks like this:

A = \begin{bmatrix} 1 / \sqrt{2} & -1 / \sqrt{2} \\ 1 / \sqrt{2} & 1 / \sqrt{2} \end{bmatrix} \begin{bmatrix} 4 & 0 \\ 0 & 2 \end{bmatrix} \begin{bmatrix} 1 / \sqrt{2} & 1 / \sqrt{2} \\ -1 / \sqrt{2} & 1 / \sqrt{2} \end{bmatrix}
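It is worth verifying numerically that multiplying these three matrices back together recovers A (a minimal NumPy check; Q and \Lambda are the matrices written above):

import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0]])

Q = np.array([[1.0, -1.0],
              [1.0,  1.0]]) / np.sqrt(2.0)
Lam = np.diag([4.0, 2.0])

# Because Q is orthogonal, Q^{-1} = Q^T, so Q @ Lam @ Q.T rebuilds A.
print(np.allclose(Q @ Lam @ Q.T, A))  # True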

How do we interpret the Eigendecomposition?

Now that we have our factorization it’s worth going over how we interpret it.

Q is orthogonal, meaning that its columns are unit length and orthogonal to each other. The linear transformation represented by Q (or its transpose) therefore does not change the magnitude of the vector(s) or the angles between them. It is just a rotation (possibly combined with a reflection).

This is essentially a change of basis. Multiplying by Q^T expresses the input in the basis of eigenvectors of A. Multiplying by the diagonal matrix \Lambda then scales each of those coordinates by its eigenvalue. Finally, multiplying by Q rotates the result back into the original basis.
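To make the change-of-basis reading concrete, here is the same pipeline applied step by step to a sample vector (a sketch; the vector x is one I picked for illustration):

import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0]])

Q = np.array([[1.0, -1.0],
              [1.0,  1.0]]) / np.sqrt(2.0)
Lam = np.diag([4.0, 2.0])

x = np.array([1.0, 2.0])

step1 = Q.T @ x      # express x in the eigenvector basis
step2 = Lam @ step1  # scale each coordinate by its eigenvalue
step3 = Q @ step2    # rotate back to the original basis

# Same result as applying A directly:
print(step3)   # [5. 7.]
print(A @ x)   # [5. 7.]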

This factorization is powerful because it allows us to understand A on a much more intuitive level. The magnitudes of the eigenvalues tell us how important their corresponding eigenvectors are to the linear transformation that A represents. We can break a linear transformation down into pieces and look at an easily understood diagonal matrix to understand what the important features of it are.

There are many variations on this motif of “diagonalizing” matrices — factoring them such that one of the factors is a diagonal matrix that is easy to interpret.

Eigendecomposition is great, but it has one unfortunate requirement: it requires a square matrix. In reality we are not always lucky enough to have a square matrix.

In the next blog post I will talk about another, closely related matrix factorization: Singular Value Decomposition. SVD can be performed on any matrix, and its applications are very powerful.
