Multi-Dimensional Scaling (MDS) with Eigen-Decomposition

Succintly from Wikipedia:

MDS is used to translate “information about the pairwise ‘distances’ among a set of $n$ objects or individuals” into a configuration of $n$ points mapped into an abstract Cartesian space.

Simply put, we have pairs of distances between arbitrary points and we want to roughly recover the original locations of these points, up to rotations, reflections, and translations.

We are going to see the intuition, build the algorithm with $fancy~math$, and then put that into code. We may also have maps, if I can figure out how to make that happen in kramdown. No promises though 😮‍💨.

Problem Setup

Let’s say we were given the following table of distances (km) between Chinese cities (Yeah, I know, lots of nasty floating points). I generated this table using this distance calculator.

  Beijing Shanghai Xiangyang Wuhan Chongqing Guangzhou Shenyang Chengdu
Beijing 0 1067 958 1053 1459 1884 628 1518
Shanghai 1067 0 892 693 1445 1211 1185 1663
Xiangyang 958 892 0 258 598 990 1479 781
Wuhan 1053 693 258 0 752 832 1490 977
Chongqing 1459 1445 598 752 0 977 2037 269
Guangzhou 1884 1211 990 832 977 0 2276 1237
Shenyang 628 1185 1479 1490 2037 2276 0 2126
Chengdu 1518 1663 781 977 269 1237 2126 0

Let’s give the matrix (gasp, it was a matrix all along) above a name: $\textbf{D}$, for distances.

$\textbf{D}$ is a $n \times n$ matrix where each entry $\textbf{D}_{i,j} = ||x_i - x_j||_2$.

Another way to say that: it’s a square matrix of 2-norms between $n$ points.

Here’s a rough outline of what we are going to do:

  1. Show why and how the algorithm works with math
  2. Apply the algorithm to a real dataset
  3. Plot it 🗺️


Could I Interest You with Some $Fancy ~Math$?

Okay, let’s build the algorithm.

⏹️ Square Elements in the Matrix ⏹️

We can actually do something kind of cool if we square each element in matrix $\textbf{D}$. Let’s start with that.

After doing so, we have a matrix where each element is $\textbf{D}_{i,j} = ||x_i - x_j||_2^2$.

Well, $\|x_i - x_j\|_2^2$ is really just $(x_i - x_j)^T(x_i - x_j)$.

Why? See here ➡️ ➡️ ➡️
2-norm = $\sqrt{\sum{x_i^2}}$

2-norm squared = ${(\sqrt{\sum{x_i^2}})}^2$ But, in our specific case, we have nothing to sum. Each $x$ is a single element, not a vector.

2-norm squared = $(\sqrt{x^2})^2 = x^2$

🪗 Expanding the Equation 🪗

If we expand $(x_i - x_j)^T(x_i - x_j)$, we get:

\[(x_i - x_j)^T(x_i - x_j) = x_i^2 + x_j^2 -2x_ix_j = \|x_i\|_2^2 + \|x_j\|_2^2 - 2x_ix_j\]

That’s some good news, because we have the term $x_ix_j$ in there. That’s pretty close to the information we want to recover! We are trying to recover the $x_i$ and $x_j$ terms. If we find a way to remove the squared 2-norms, (the pesky $|| x_i ||^2_2$ and $|| x_j ||^2_2$) we would be getting closer.

Let’s assume, for a moment, that our data is mean-centered. This means that the sum of each data point is 0 (i.e. $\sum_{i=1}^n x_i = 0$). This will come in handy below and is an operation we should carry out on our dataset when using this algorithm.

Then, if we carry out the sum of a single row in matrix $\textbf{D}$:

\[\begin{aligned} \sum_{k=1}^{n} \textbf{D}_{1,k} &= \| x_1 - x_1 \|^2_2 + \| x_1 - x_2 \|^2_2 + ... + \| x_1 - x_n \|^2_2 && \text{Summing over each pairwise distance} \\ \\ &= (\| x_1 \|^2_2 + \| x_1 \|^2_2 - 2x_1x_1) + ... + (\| x_1 \|^2_2 + \| x_n \|^2_2 - 2x_1x_n) && \text{Expanding} \\ \\ &= n\| x_1 \|^2_2 + \sum_{k=1}^{n}\| x_k \|^2_2 -2x_1\sum_{k=1}^{n} x_k && \text{Regrouping terms} \\ \\ &= n\| x_1 \|^2_2 + \sum_{k=1}^{n}\| x_k \|^2_2 -2x_1(0) && \text{Last term is zeroed due to mean-centering} \\ \\ &= n\| x_1 \|^2_2 + \sum_{k=1}^{n}\| x_k \|^2_2 && \text{Simplifying} \end{aligned}\]

Generalizing the result above from row $1$ to any row $i$:

\[\begin{aligned} \sum_{k=1}^{n} \textbf{D}_{i,k} &= n\color{purple}{| x_i \|^2_2} + \sum_{k=1}^{n}\| x_k \|^2_2 \end{aligned}\]

We want that highlighted term above! With that, we can get rid of it from our expanded equation!

Now, what happens if we sum every element in the matrix? This is equivalent to summing each row. And, we just figured out what a single row gives us. So, we can just sum that sum.

\[\begin{aligned} \sum_{l=1}^{n}\sum_{k=1}^{n} \textbf{D}_{l,k} &= n\| x_1 \|^2_2 + \sum_{k=1}^{n}\| x_k \|^2_2 + ... + n\| x_n \|^2_2 + \sum_{k=1}^{n}\| x_k \|^2_2 && \text{Sum of each row} \\ \\ &= n\| x_1 \|^2_2 + ... + n\| x_n \|^2_2 + n\sum_{k=1}^{n}\| x_k \|^2_2 && \text{Grouping the sum term} \\ \\ &= n\sum_{k=1}^{n}\| x_k \|^2_2 + n\sum_{k=1}^{n}\| x_k \|^2_2 && \text{Grouping the 2-norm terms} \\ \\ &= 2n\sum_{k=1}^{n}\| x_k \|^2_2 && \text{Simplifying} \\ \\ \end{aligned}\]

If you see the two results, you’ll notice that some terms can cancel out to extract the term we want!

🚮 Removing Unwanted Terms 🚮

Our original goal was to extract $x_ix_j$ from the expansion $\textbf{D}_{i,j} = ||x_i||_2^2 + ||x_j||_2^2 - 2x_ix_j$

If we want $|| x_i ||^2_2$, we can do:

\[\begin{align*} \| x_i \|^2_2 = \frac{\sum_{k=1}^{n} \textbf{D}_{i,k} - \frac{\sum_{l=1}^{n}\sum_{k=1}^{n} \textbf{D}_{l,k} }{2n}}{n} \end{align*}\]

or more cleanly:

\[\begin{align*} \| x_i \|^2_2 = \frac{1}{n}\sum_{k=1}^{n} \textbf{D}_{i,k} - \frac{1}{2n^2}\sum_{l=1}^{n}\sum_{k=1}^{n} \textbf{D}_{l,k} \end{align*}\]

Very similarly, if we want $|| x_j ||^2_2$ (only the subscript changed)

\[\begin{align*} \| x_j \|^2_2 = \frac{1}{n}\sum_{k=1}^{n} \textbf{D}_{j,k} - \frac{1}{2n^2}\sum_{l=1}^{n}\sum_{k=1}^{n} \textbf{D}_{l,k} \end{align*}\]

Putting this all together:

\[\begin{align*} x_ix_j &= -\frac{1}{2} [ \textbf{D}_{i,j} - \frac{1}{n}\sum_{k=1}^{n} \textbf{D}_{i,k} + \frac{1}{2n^2}\sum_{l=1}^{n}\sum_{k=1}^{n} \textbf{D}_{l,k} - \frac{1}{n}\sum_{k=1}^{n} \textbf{D}_{j,k} + \frac{1}{2n^2}\sum_{l=1}^{n}\sum_{k=1}^{n} \textbf{D}_{l,k} ] \\ \\ &= -\frac{1}{2} [ \textbf{D}_{i,j} - \frac{1}{n}\sum_{k=1}^{n} \textbf{D}_{i,k} - \frac{1}{n}\sum_{k=1}^{n} \textbf{D}_{j,k} + \frac{1}{n^2}\sum_{l=1}^{n}\sum_{k=1}^{n} \textbf{D}_{l,k} ] \end{align*}\]

Doing this for every element in the $\textbf{D}$ matrix, we come up with a new matrix $\textbf{D’}$:

\[\mathbf{D'} = \left\lceil \begin{matrix} x_{i}x_{1} & ... & x_{i}x_{n}\\ x_{j}x_{1} & ... & x_{j}x_{n}\\ ... & ... & ... \\ x_{n}x_{1} & ... & x_{n}x_{n} \end{matrix} \right\rceil\]

⬅️ Where are We Going with This? ➡️

Let’s take a step back and think about the ideal world where we had access to the original matrix of coordinates. Let’s call it matrix $\textbf{C}$. Its entries would look something like this:

\[\mathbf{C} = \left\lceil \begin{matrix} x_{i1} & x_{i2} & x_{i3} & ...\\ x_{j1} & x_{j2} & x_{j3} & ...\\ ... & ... & ... & ...\\ x_{n1} & x_{n2} & x_{n3} & ...\\ \end{matrix} \right\rceil\]

If we squared the matrix, we would have a matrix $\textbf{CC}^T$

\[\mathbf{CC}^T = \left\lceil \begin{matrix} x_{i}x_{1} & ... & x_{i}x_{n}\\ x_{j}x_{1} & ... & x_{j}x_{n}\\ ... & ... & ... \\ x_{n}x_{1} & ... & x_{n}x_{n} \end{matrix} \right\rceil\]

That looks pretty familiar, huh? It is exactly the matrix $\textbf{D’}$ we were able to generate from the original $\textbf{D}$ matrix.

So, how do we recover $\mathbf{C}$ from $\mathbf{D’}$?

🥁 Eigen-Decomposition 🥁

Eigendecomposition is a method to factorize a matrix so that it is represented by its eigenvalues and eigenvectors. Hence, we can rewrite a matrix like this:

\[A = \textbf{Q}\Lambda \textbf{Q}^{-1}\]

But, why does this matter?

Because, it is a way to recover $\mathbf{C}$!

An eigendecomposition works for diagonizable matrices. Given that $\mathbf{D’}$, which is equivalent to $\textbf{CC}^T$, is a real, symmetric matrix, we can observe the following:

\[\textbf{D'} = \textbf{CC}^T = \textbf{Q}\Lambda \textbf{Q}^{-1}\]

We couldn’t calculate $\textbf{CC}^T$, because we never had $\textbf{C}$. But we could calculate $\textbf{D’}$, which we were able to show is exactly the same! So, we can just take the eigendecomposition of that!

Thus, we can recover C, by calculating:

\[\textbf{C} = \textbf{Q}\Lambda^{1/2}\]

🌯 Summary of MDS 🌯

  1. Square all elements in the distance matrix
  2. Algebraically remove unwanted terms from each element (a.k.a mean-center your data)
  3. Carry out an eigen-decomposition
  4. Recover the locations from the decomposition


🤖 Alright, let's code it 🤖


# Import relevant packages
import numpy as np
import pandas as pd
import seaborn as sns
import math
import matplotlib.pyplot as plt

# Read data as pandas df
raw_data = pd.read_csv("china.txt", sep=" ", header=None, index_col=0)

# Square distances
raw_data = raw_data.pow(2)

# Initialize the gram matrix
gram = np.empty(raw_data.shape)

# Obtain the number of cities
n = raw_data.shape[0]

# Convert pandas df to np 2d array
data = raw_data.to_numpy()

The cell below directly corresponds to: \(-\frac{1}{2} [ \textbf{D}_{i,j} - \frac{1}{n}\sum_{k=1}^{n} \textbf{D}_{i,k} - \frac{1}{n}\sum_{k=1}^{n} \textbf{D}_{j,k} + \frac{1}{n^2}\sum_{l=1}^{n}\sum_{k=1}^{n} \textbf{D}_{l,k} ]\)

# Iterate through each distance and fill gram matrix
for i in range(data.shape[0]):
    for j in range(data.shape[1]):
        gram[i][j] = -0.5 * (
            data[i][j]
            - (data[i].sum() / n)
            - (data[j].sum() / n)
            + (data.sum() / (n ** 2))
        )

We take the eigendecomposition and sort by the top two eigenvalues and their corresponding eigenvectors. We select the top two because our coordinates lie on a 2D map.

# Take eigendecomposition
eig_vals, eig_vecs =  np.linalg.eig(gram)
eig_vals_sorted = np.sort(eig_vals)[::-1] 
eig_vecs_sorted = eig_vecs[:, eig_vals.argsort()[::-1]]


eig_vals_sorted = np.sqrt(eig_vals_sorted[:2]) * np.identity(2)
eig_vecs_sorted = eig_vecs_sorted[:, [0, 1]]

coords = np.matmul(eig_vecs_sorted, eig_vals_sorted)
coords = pd.DataFrame(coords, index=raw_data.index, columns=["x", "y"])

# plot
plt.figure(figsize=(15,8))
scatter = sns.scatterplot(x="x", y="y", data=coords)

a = pd.concat({"x": coords.x, "y": coords.y, "lbl": coords.index.to_series()}, axis=1)
sns.set(font_scale=1.0)
for i, point in a.iterrows():
    scatter.text(point["x"] + 50, point["y"], str(point["lbl"]))

scatter.set_title("Chinese Cities Mapped by Eigendecomposition")
plt.show()

png

The map above is… off. We said that eigendecomposition would provide us the answer up to some rotation and reflections. Here the map looks rotate and reflected across a single axis. We fix this below.

# Rotation matrix function
def rotate_matrix (x, y, angle, x_shift=0, y_shift=0, units="DEGREES"):
    # Shift to origin (0,0)
    x = x - x_shift
    y = y - y_shift

    # Convert degrees to radians
    if units == "DEGREES":
        angle = math.radians(angle)

    # Rotation matrix multiplication to get rotated x & y
    xr = (x * math.cos(angle)) - (y * math.sin(angle)) + x_shift
    yr = (x * math.sin(angle)) + (y * math.cos(angle)) + y_shift

    return xr, yr
coords = np.matmul(eig_vecs_sorted, eig_vals_sorted)
coords = pd.DataFrame(coords, index=raw_data.index, columns=["x", "y"])

# Flip y axis and rotate matrix by 130 degrees
# Done experimentally
coords.x, coords.y = rotate_matrix(coords.x,coords.y, 130)
coords.y = -coords.y

# plot
plt.figure(figsize=(15,8))
plt.axis('equal')
scatter = sns.scatterplot(x="x", y="y", data=coords)

a = pd.concat({"x": coords.x, "y": coords.y, "lbl": coords.index.to_series()}, axis=1)
sns.set(font_scale=1.0)
for i, point in a.iterrows():
    scatter.text(point["x"] + 50, point["y"], str(point["lbl"]))

scatter.set_title("Chinese Cities Mapped by Eigendecomposition")
plt.show()

Now, that looks like China!

png