“[…] SVD, which may be thought of as providing a basis that is tailored to the specific data, as opposed to the FFT, which provides a generic basis.”
— Data-Driven Science and Engineering, Chapter 1
Introduction
Main Question: Why do we bother with all the different decompositions, like SVD, FT, and DMD?
Compressibility of Natural Signals
One of the reasons we even care about these decompositions is the idea that many high-dimensional natural signals, such as any image we might encounter, can be represented with fewer bits of information than they arrive in. If that is the case, we should be able to compress things! Right, so what are ways to go about that compression?
It turns out there are many ways.
But First, Basis
The key to downsizing is throwing away things that don’t spark (as much) joy. By decomposing an object we get a view of which things we can get away with throwing out: we want to find and get rid of components of an object which don’t contribute much to its overall essence.
In linear algebra, we typically try to gain a new perspective on an object of interest by looking for a different way to express it. Here, I talk about the basis as the “way to express” things. Then, changing your basis gives you that new perspective. While changing basis might make an object look different, it doesn’t change the underlying object itself.
On a two-dimensional plane, we are familiar with the canonical basis \(\begin{bmatrix}1 & 0 \\ 0 & 1\end{bmatrix}\). Any two-dimensional vector can be expressed as a linear combination of these two vectors and it always works! That is the case because the canonical basis vectors are independent and they span the two-dimensional plane.
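For example, the vector \(\begin{bmatrix}3 & 7\end{bmatrix}^T\) (which will show up again below) is just a weighted sum of the two canonical basis vectors: \[
\begin{bmatrix}3 \\ 7\end{bmatrix} = 3\begin{bmatrix}1 \\ 0\end{bmatrix} + 7\begin{bmatrix}0 \\ 1\end{bmatrix}.
\]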
Let’s get the definition out of the way: the SVD, or the singular value decomposition, is a decomposition of any matrix \(A \in \mathbb{R}^{m \times n}\), such that you can express it as a product of three matrices: an orthonormal matrix \(U \in \mathbb{R}^{m \times m}\), followed by a “diagonal” matrix \(\Sigma \in \mathbb{R}^{m \times n}\), followed by the transpose (more generally, the conjugate transpose) of another orthonormal matrix \(V \in \mathbb{R}^{n \times n}\), \[
A = U\Sigma V^*.
\]
Note the quotation marks around diagonal.
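One way to see why “diagonal” earns its quotation marks is to take the SVD of a non-square matrix and rebuild \(\Sigma\) at its full \(m \times n\) size. A minimal sketch; the \(3 \times 2\) matrix below is an arbitrary example:

import numpy as np

# An arbitrary non-square example matrix (3 x 2)
M = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

U, S, Vh = np.linalg.svd(M, full_matrices=True)

# np.linalg.svd returns the singular values as a 1D array;
# place them on the diagonal of an m x n matrix to get the "diagonal" Sigma
Sigma = np.zeros(M.shape)
Sigma[:len(S), :len(S)] = np.diag(S)

print(U.shape, Sigma.shape, Vh.shape)   # (3, 3) (3, 2) (2, 2)
print(np.allclose(M, U @ Sigma @ Vh))   # True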
When a matrix is “orthonormal”, it means that all rows are orthogonal to each other and the norm of each row is 1. You can say the same about the columns too. A little more formally, orthonormality for a matrix \(U\) is defined as \(U_i \cdot U_j = \delta_{i,j}\), where \(\delta_{i,j}\) is the Kronecker delta. These matrices have the nice property that they only rotate and/or flip the object they act on; they never stretch it.
The Kronecker delta \(\delta_{i,j}\) resolves to one when \(i=j\) and zero when \(i\neq j\).
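As a quick sanity check (a minimal sketch using NumPy; the \(2 \times 2\) matrix is the same \(A\) used later in this article), the \(U\) and \(V\) returned by the SVD satisfy this definition and preserve lengths:

import numpy as np

A = np.array([[1, 2], [3, 4]])
U, S, Vh = np.linalg.svd(A)

# U^T U = I and Vh Vh^T = I: the rows/columns are orthogonal with unit norm
print(np.allclose(U.T @ U, np.eye(2)))   # True
print(np.allclose(Vh @ Vh.T, np.eye(2))) # True

# Orthonormal matrices preserve length: no stretching
x = np.array([1.0, 1.0])
print(np.isclose(np.linalg.norm(Vh @ x), np.linalg.norm(x)))  # True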
When we apply the transformation \(A\) to any vector \(x \in \mathbb{R}^{n}\), it is identical to multiplying it by these three matrices in sequence. With the SVD on hand, we can take advantage of the situation by breaking the transformation down into multiple steps. To demonstrate this, let’s assume we have the transformation \(A = \begin{bmatrix}1 & 2 \\ 3 & 4\end{bmatrix}\).
If \(A\) acts on a vector \(x = \begin{bmatrix}1 & 1\end{bmatrix}^T\), it should result in the vector \(b = \begin{bmatrix}3 & 7\end{bmatrix}^T\). But with the SVD, we get three successive transformations, where the final transformation \(U \Sigma V^* x\) returns the expected vector \(b\). Notice that in the first and last transformations there is no “stretching”: the length of the vector remains the same, so each is just a rotation (or flip).
Code
import numpy as np
from rich.pretty import pprint

# Set options for printing
np.set_printoptions(precision=3, suppress=True)

# Define A and x
A = np.array([[1, 2], [3, 4]])
x = np.array([[1], [1]])

# Perform SVD
U, S, Vh = np.linalg.svd(A, full_matrices=True)

# The transformations
first = Vh @ x
second = np.diag(S) @ first
third = U @ second

rank1_approx = S[0] * U[:, :1] @ Vh[:1, :]
rank1_approx_transform = rank1_approx @ x
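Continuing from the block above, a quick follow-up check confirms that the staged product lands on the same \(b\) as applying \(A\) directly, and that the first step preserves length:

# The three-step product matches A acting on x directly
print(np.allclose(third, A @ x))   # True: both give b = [3, 7]^T

# The first step is only a change of basis, so the length of x is unchanged
print(np.isclose(np.linalg.norm(first), np.linalg.norm(x)))  # True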
Why the SVD is a “Tailored” Basis
The first and last transformations are changes of basis. In particular, in the first transformation, we are changing from our canonical basis to a new basis formed by the columns of \(V\) (the rows of \(V^T\)), via \(x' = V^T x\).
The vectors that make up this basis, while orthogonal to each other, are not obvious or codified anywhere. Nevertheless, they are derived from \(A\), which is why they are “tailored.” Technically, in a two-dimensional plane, the columns of any 2D rotation matrix form such a basis for some linear transformation out there!
Why are these two bases, \(U\) and \(V\), special?
They are special because each triplet \(\{u_k, v_k, \sigma_k\}\), also called a singular system, adheres to the following relationship: \[
\begin{aligned}
\sigma_k u_k &= A v_k \\
u_k^T A &= \sigma_k v_k^T \\
\end{aligned}
\]
To elaborate, \(u_k, v_k\) are a pair of vectors such that if you push \(v_k\) through \(A\), you land on \(u_k\) scaled by \(\sigma_k\); and if you push \(u_k\) through \(A\) from the left (i.e. \(u_k^T A\)), you land on \(v_k^T\) scaled by the same \(\sigma_k\). The relation holds in both directions with the same scale factor.
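A minimal numerical check of this relationship, reusing the same \(A = \begin{bmatrix}1 & 2 \\ 3 & 4\end{bmatrix}\) from the code above (the index k = 0 just picks the first singular triplet):

import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
U, S, Vh = np.linalg.svd(A)

k = 0
u_k, v_k, sigma_k = U[:, k], Vh[k, :], S[k]

# Forward direction: A v_k = sigma_k u_k
print(np.allclose(A @ v_k, sigma_k * u_k))  # True

# Backward direction: u_k^T A = sigma_k v_k^T
print(np.allclose(u_k @ A, sigma_k * v_k))  # True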
Another View of the SVD
There is another way to think about the SVD, which is as “a sum of rank-1 matrices”, a common framing in the decomposition literature. To elaborate, when a matrix is rank-1, then regardless of the matrix dimensions, every column (or row) is a scalar multiple of a single vector. Metaphorically, the matrix “boils down” to just one vector. The benefit of this framing is that we can approximate our original matrix simply by selecting how many rank-1 matrices we want to include!
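Written out, this framing says we can rebuild \(A\) term by term from the singular system: \[
A = \sum_{k=1}^{r} \sigma_k u_k v_k^T,
\] where \(r\) is the rank of \(A\). Truncating the sum after the first \(k\) terms (the singular values come sorted largest-first) gives a rank-\(k\) approximation; the rank1_approx line in the code above is the \(k=1\) case.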
But why does that allow for compression? Well, if the original matrix is \(m \times n\), then we have \(mn\) elements to store. If we instead manage to approximate it with \(k\) rank-1 terms, we only need to store \(k(m + n + 1)\) numbers (each term costs one \(u_k\), one \(v_k\), and one \(\sigma_k\)). If \(k \ll m, n\), then we have done well, thanks to the SVD!
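Here is a small sketch of that bookkeeping in NumPy; the matrix below is an arbitrary low-rank-plus-noise example, chosen only so that a small \(k\) captures most of it:

import numpy as np

rng = np.random.default_rng(0)
m, n, k = 200, 100, 5

# An arbitrary "compressible" matrix: low-rank structure plus a little noise
A = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))
A += 0.01 * rng.standard_normal((m, n))

U, S, Vh = np.linalg.svd(A, full_matrices=False)

# Keep only the top-k singular triplets: sum of k rank-1 matrices
A_k = (U[:, :k] * S[:k]) @ Vh[:k, :]

original_storage = m * n              # 20,000 numbers for the full matrix
compressed_storage = k * (m + n + 1)  # 1,505 numbers for k singular triplets

rel_error = np.linalg.norm(A - A_k) / np.linalg.norm(A)
print(original_storage, compressed_storage, rel_error)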
The Hilbert Space
Sometimes, it is helpful to think of functions as infinite (or very large) vectors. Like vectors, you can request a retrieval at a particular “index” for a function (e.g. \(f(5)\) vs. my_array[5]). Of course, for functions you can request a retrieval at “fractions of an index” (e.g. \(f(\frac{4}{13})\)), which is not a thing for arrays. Still, it’s not a terrible analogy!
Typically, you can compare two vectors by taking their dot product. And, as we did earlier, you can determine whether they are orthogonal to each other by checking if the dot product evaluates to zero. If we discretize functions so that they look like (massive) vectors, we can define a similar operation!
Assuming two discretized functions \(f, g\), we can write their dot product over the interval \([a,b]\) as
\[
f \cdot g = \sum\limits_{x=a}^{b} f(x) g(x).
\]
But we actually want to scale this expression. If we change how finely we discretize — say, double the number of sample points over the same interval — the sum above changes even though the functions themselves haven’t. Our operation should be consistent under refinement. So, we write
\[
f \cdot g = \sum\limits_{x=a}^{b} f(x) g(x) \Delta x,
\]
such that \(\Delta x = \frac{b-a}{n-1}\), where \(n\) refers to the number of points after discretization. If we take the limit \(n \rightarrow \infty\), then \(\Delta x \rightarrow 0\), which converts the formulation above into
\[
f \cdot g = \int\limits_{a}^{b} f(x) g(x) dx.
\]
There you have it: we have generalized the dot product to functions, and we will call it the inner product.
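To see the limit at work, here is a small sketch of the discretized inner product; \(\sin\) and \(\cos\) on \([0, 2\pi]\) are just an example pair, chosen because their inner product should go to zero (they are orthogonal on that interval):

import numpy as np

def inner_product(f, g, a, b, n):
    # Discretized inner product: sum of f(x) g(x) * dx over n sample points
    x = np.linspace(a, b, n)
    dx = (b - a) / (n - 1)
    return np.sum(f(x) * g(x)) * dx

a, b = 0.0, 2 * np.pi
for n in [10, 100, 1000, 100000]:
    print(n, inner_product(np.sin, np.cos, a, b, n))

# As n grows, the value tends to the integral of sin(x)cos(x) over [0, 2*pi],
# which is exactly 0: sin and cos are orthogonal on this interval.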
The Fourier Transform
Citation
BibTeX citation:
@online{aswani2025,
author = {Aswani, Nishant},
title = {SVD, {FT,} and {DMD}},
date = {2025-06-27},
url = {https://nishantaswani.com/articles/dmd/},
langid = {en}
}