Multi-Dimensional Scaling with Eigen-Decomposition

Nishant Aswani

December 6, 2021

Succinctly, from Wikipedia:

MDS is used to translate “information about the pairwise ‘distances’ among a set of \(n\) objects or individuals” into a configuration of \(n\) points mapped into an abstract Cartesian space.

Simply put, we have pairs of distances between arbitrary points and we want to roughly recover the original locations of these points, up to rotations, reflections, and translations.

We are going to see the intuition, build the algorithm with \(fancy~math\), and then put that into code. We may also have maps, if I can figure out how to make that happen in kramdown. No promises though 😮‍💨.

## Problem Setup

Let’s say we were given the following table of distances (km) between Chinese cities (Yeah, I know, lots of nasty floating points). I generated this table using this distance calculator.

|           | Beijing | Shanghai | Xiangyang | Wuhan | Chongqing | Guangzhou | Shenyang | Chengdu |
|-----------|---------|----------|-----------|-------|-----------|-----------|----------|---------|
| Beijing   | 0       | 1067     | 958       | 1053  | 1459      | 1884      | 628      | 1518    |
| Shanghai  | 1067    | 0        | 892       | 693   | 1445      | 1211      | 1185     | 1663    |
| Xiangyang | 958     | 892      | 0         | 258   | 598       | 990       | 1479     | 781     |
| Wuhan     | 1053    | 693      | 258       | 0     | 752       | 832       | 1490     | 977     |
| Chongqing | 1459    | 1445     | 598       | 752   | 0         | 977       | 2037     | 269     |
| Guangzhou | 1884    | 1211     | 990       | 832   | 977       | 0         | 2276     | 1237    |
| Shenyang  | 628     | 1185     | 1479      | 1490  | 2037      | 2276      | 0        | 2126    |
| Chengdu   | 1518    | 1663     | 781       | 977   | 269       | 1237      | 2126     | 0       |

Let’s give the matrix (gasp, it was a matrix all along) above a name: \(\textbf{D}\), for distances.

\(\textbf{D}\) is an \(n \times n\) matrix where each entry \(\textbf{D}_{i,j} = \| x_i - x_j \|_2\).

Another way to say that: it’s a square matrix of 2-norms between \(n\) points.
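
To make that concrete, here is a minimal numpy sketch (my own toy example with made-up points, not the post's data) that builds such a matrix:

```python
import numpy as np

# A few made-up 2D points (placeholder values)
points = np.array([[0.0, 0.0],
                   [3.0, 4.0],
                   [6.0, 8.0]])

# D[i, j] = ||x_i - x_j||_2 for every pair of points
diffs = points[:, None, :] - points[None, :, :]
D = np.linalg.norm(diffs, axis=-1)
print(D)  # symmetric, with zeros on the diagonal
```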

Here’s a rough outline of what we are going to do:

1. Show why and how the algorithm works with math
2. Apply the algorithm to a real dataset
3. Plot it 🗺️

## Could I Interest You with Some \(Fancy~Math\)?

Okay, let’s build the algorithm.

### ⏹️ Square Elements in the Matrix ⏹️

We can actually do something kind of cool if we square each element in matrix \(\textbf{D}\). Let’s start with that.

After doing so, we have a matrix where each element is \(\textbf{D}_{i,j} = \| x_i - x_j \|_2^2\).
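
Continuing the earlier toy sketch, the elementwise square is a one-liner in numpy (reusing the hypothetical `D` from that sketch):

```python
# D2[i, j] = ||x_i - x_j||_2^2
D2 = D ** 2
```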

Well, \(\| x_i - x_j \|_2^2\) is really just \((x_i - x_j)^T(x_i - x_j)\).

Why? Recall that the 2-norm of a vector \(v\) is \(\| v \|_2 = \sqrt{\sum_k v_k^2} = \sqrt{v^Tv}\). Squaring it cancels the square root, so \(\| v \|_2^2 = v^Tv\). Plugging in \(v = x_i - x_j\) gives the identity above.
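
A quick numerical check of that identity, with an arbitrary vector (the values are placeholders):

```python
import numpy as np

v = np.array([1.0, -2.0, 3.0])       # arbitrary vector
lhs = np.linalg.norm(v) ** 2         # ||v||_2^2
rhs = v @ v                          # v^T v
print(np.isclose(lhs, rhs))          # True
```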

### 🪗 Expanding the Equation 🪗

If we expand \((x_i - x_j)^T(x_i - x_j)\), we get:

\[ (x_i - x_j)^T(x_i - x_j) = x_i^Tx_i + x_j^Tx_j - 2x_i^Tx_j = \|x_i\|_2^2 + \|x_j\|_2^2 - 2x_ix_j \]

That’s some good news, because we have the term \(x_ix_j\) in there (we’ll use \(x_ix_j\) as shorthand for the inner product \(x_i^Tx_j\)). That’s pretty close to the information we want to recover! We are trying to recover the \(x_i\) and \(x_j\) terms. If we find a way to remove the squared 2-norms (the pesky \(\| x_i \|^2_2\) and \(\| x_j \|^2_2\)), we would be getting closer.

Let’s assume, for a moment, that our data is mean-centered. This means that the data points sum to zero (i.e. \(\sum_{i=1}^n x_i = 0\)). This will come in handy below, and it is an operation we should carry out on our dataset when using this algorithm.

Then, if we carry out the sum of a single row in matrix \(\textbf{D}\):

\[ \begin{aligned} \sum_{k=1}^{n} \textbf{D}_{1,k} &= \| x_1 - x_1 \|^2_2 + \| x_1 - x_2 \|^2_2 + ... + \| x_1 - x_n \|^2_2 && \text{Summing over each pairwise distance} \\ \\ &= (\| x_1 \|^2_2 + \| x_1 \|^2_2 - 2x_1x_1) + ... + (\| x_1 \|^2_2 + \| x_n \|^2_2 - 2x_1x_n) && \text{Expanding} \\ \\ &= n\| x_1 \|^2_2 + \sum_{k=1}^{n}\| x_k \|^2_2 -2x_1\sum_{k=1}^{n} x_k && \text{Regrouping terms} \\ \\ &= n\| x_1 \|^2_2 + \sum_{k=1}^{n}\| x_k \|^2_2 -2x_1(0) && \text{Last term is zeroed due to mean-centering} \\ \\ &= n\| x_1 \|^2_2 + \sum_{k=1}^{n}\| x_k \|^2_2 && \text{Simplifying} \end{aligned} \]

Generalizing the result above from row \(1\) to any row \(i\):

\[ \begin{aligned} \sum_{k=1}^{n} \textbf{D}_{i,k} &= n\color{purple}{\| x_i \|^2_2} + \sum_{k=1}^{n}\| x_k \|^2_2 \end{aligned} \]

We want that highlighted term! Once we can compute it from a row sum, we can eliminate it from our expanded equation.

Now, what happens if we sum every element in the matrix? This is equivalent to summing the row sums, and we just figured out what a single row sum gives us. So we can simply sum that result over all rows.

\[ \begin{aligned} \sum_{l=1}^{n}\sum_{k=1}^{n} \textbf{D}_{l,k} &= n\| x_1 \|^2_2 + \sum_{k=1}^{n}\| x_k \|^2_2 + ... + n\| x_n \|^2_2 + \sum_{k=1}^{n}\| x_k \|^2_2 && \text{Sum of each row} \\ \\ &= n\| x_1 \|^2_2 + ... + n\| x_n \|^2_2 + n\sum_{k=1}^{n}\| x_k \|^2_2 && \text{Grouping the sum term} \\ \\ &= n\sum_{k=1}^{n}\| x_k \|^2_2 + n\sum_{k=1}^{n}\| x_k \|^2_2 && \text{Grouping the 2-norm terms} \\ \\ &= 2n\sum_{k=1}^{n}\| x_k \|^2_2 && \text{Simplifying} \\ \\ \end{aligned} \]

If you compare the two results, you’ll notice that combining them cancels out terms and lets us extract the term we want!
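
Both identities are easy to verify numerically. Here is a small sketch with made-up, mean-centered points (all names and values are mine, not the post's):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))
X -= X.mean(axis=0)                  # mean-center so that sum_i x_i = 0

# Squared-distance matrix: D[i, j] = ||x_i - x_j||_2^2
D = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)

n = X.shape[0]
sq_norms = (X ** 2).sum(axis=1)      # ||x_k||_2^2 for every k

# Row-sum identity: sum_k D[i, k] = n * ||x_i||^2 + sum_k ||x_k||^2
print(np.allclose(D.sum(axis=1), n * sq_norms + sq_norms.sum()))  # True

# Total-sum identity: sum_{l, k} D[l, k] = 2 * n * sum_k ||x_k||^2
print(np.isclose(D.sum(), 2 * n * sq_norms.sum()))                # True
```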

### 🚮 Removing Unwanted Terms 🚮

Our original goal was to extract \(x_ix_j\) from the expansion \(\textbf{D}_{i,j} = \|x_i\|_2^2 + \|x_j\|_2^2 - 2x_ix_j\).

If we want \(\| x_i \|^2_2\), we can do:

\[ \begin{align*} \| x_i \|^2_2 = \frac{\sum_{k=1}^{n} \textbf{D}_{i,k} - \frac{\sum_{l=1}^{n}\sum_{k=1}^{n} \textbf{D}_{l,k} }{2n}}{n} \end{align*} \]

or more cleanly:

\[ \begin{align*} \| x_i \|^2_2 = \frac{1}{n}\sum_{k=1}^{n} \textbf{D}_{i,k} - \frac{1}{2n^2}\sum_{l=1}^{n}\sum_{k=1}^{n} \textbf{D}_{l,k} \end{align*} \]

Very similarly, if we want \(\| x_j \|^2_2\) (only the subscript changes):

\[ \begin{align*} \| x_j \|^2_2 = \frac{1}{n}\sum_{k=1}^{n} \textbf{D}_{j,k} - \frac{1}{2n^2}\sum_{l=1}^{n}\sum_{k=1}^{n} \textbf{D}_{l,k} \end{align*} \]

Putting this all together: rearranging \(\textbf{D}_{i,j} = \| x_i \|_2^2 + \| x_j \|_2^2 - 2x_ix_j\) gives \(x_ix_j = -\frac{1}{2}\left(\textbf{D}_{i,j} - \| x_i \|^2_2 - \| x_j \|^2_2\right)\), and substituting the two expressions above:

\[ \begin{align*} x_ix_j &= -\frac{1}{2} [ \textbf{D}_{i,j} - \frac{1}{n}\sum_{k=1}^{n} \textbf{D}_{i,k} + \frac{1}{2n^2}\sum_{l=1}^{n}\sum_{k=1}^{n} \textbf{D}_{l,k} - \frac{1}{n}\sum_{k=1}^{n} \textbf{D}_{j,k} + \frac{1}{2n^2}\sum_{l=1}^{n}\sum_{k=1}^{n} \textbf{D}_{l,k} ] \\ \\ &= -\frac{1}{2} [ \textbf{D}_{i,j} - \frac{1}{n}\sum_{k=1}^{n} \textbf{D}_{i,k} - \frac{1}{n}\sum_{k=1}^{n} \textbf{D}_{j,k} + \frac{1}{n^2}\sum_{l=1}^{n}\sum_{k=1}^{n} \textbf{D}_{l,k} ] \end{align*} \]

Doing this for every element in the \(\textbf{D}\) matrix, we come up with a new matrix \(\textbf{D'}\):

\[ \mathbf{D'} = \begin{bmatrix} x_{1}x_{1} & \dots & x_{1}x_{n}\\ x_{2}x_{1} & \dots & x_{2}x_{n}\\ \vdots & \ddots & \vdots \\ x_{n}x_{1} & \dots & x_{n}x_{n} \end{bmatrix} \]
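
In matrix form, this elementwise recipe is the classic double-centering trick: with the centering matrix \(J = I - \frac{1}{n}\mathbf{1}\mathbf{1}^T\), the whole computation is \(\mathbf{D'} = -\frac{1}{2} J \mathbf{D} J\), where \(\mathbf{D}\) holds the squared distances. A minimal sketch of that equivalence (the helper name and test data are mine):

```python
import numpy as np

def double_center(D_sq):
    """Matrix form of the elementwise formula: -1/2 * J @ D_sq @ J."""
    n = D_sq.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    return -0.5 * J @ D_sq @ J

# Compare against the elementwise formula on a small squared-distance matrix
rng = np.random.default_rng(1)
X = rng.normal(size=(5, 2))
D_sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)

n = D_sq.shape[0]
elementwise = -0.5 * (D_sq
                      - D_sq.sum(axis=1, keepdims=True) / n
                      - D_sq.sum(axis=0, keepdims=True) / n
                      + D_sq.sum() / n ** 2)
print(np.allclose(double_center(D_sq), elementwise))   # True
```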

### ⬅️ Where are We Going with This? ➡️

Let’s take a step back and think about the ideal world where we had access to the original matrix of coordinates. Let’s call it matrix \(\textbf{C}\). Its entries would look something like this:

\[ \mathbf{C} = \begin{bmatrix} x_{11} & x_{12} & x_{13} & \dots\\ x_{21} & x_{22} & x_{23} & \dots\\ \vdots & \vdots & \vdots & \ddots\\ x_{n1} & x_{n2} & x_{n3} & \dots \end{bmatrix} \]

If we multiplied this matrix by its transpose, we would have the matrix \(\textbf{CC}^T\):

\[ \mathbf{CC}^T = \begin{bmatrix} x_{1}x_{1} & \dots & x_{1}x_{n}\\ x_{2}x_{1} & \dots & x_{2}x_{n}\\ \vdots & \ddots & \vdots \\ x_{n}x_{1} & \dots & x_{n}x_{n} \end{bmatrix} \]

That looks pretty familiar, huh? It is exactly the matrix \(\textbf{D'}\) we were able to generate from the original \(\textbf{D}\) matrix.

So, how do we recover \(\mathbf{C}\) from \(\mathbf{D'}\)?

### 🥁 Eigen-Decomposition 🥁

Eigendecomposition is a method to factorize a matrix so that it is represented by its eigenvalues and eigenvectors. Hence, we can rewrite a matrix \(A\) like this:

\[ A = \textbf{Q}\Lambda \textbf{Q}^{-1} \]
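
As a tiny numerical illustration (an arbitrary symmetric matrix of my choosing, not data from the post), numpy reproduces exactly this factorization:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])            # arbitrary symmetric matrix

eig_vals, Q = np.linalg.eig(A)        # eigenvalues and eigenvectors
Lambda = np.diag(eig_vals)

# A is reproduced by Q @ Lambda @ Q^{-1}
print(np.allclose(A, Q @ Lambda @ np.linalg.inv(Q)))   # True
```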

But, why does this matter?

Because, it is a way to recover \(\mathbf{C}\)!

An eigendecomposition works for diagonalizable matrices. \(\mathbf{D'}\), which is equivalent to \(\textbf{CC}^T\), is a real, symmetric matrix, so it is diagonalizable and its eigenvectors can be chosen to be orthonormal (meaning \(\textbf{Q}^{-1} = \textbf{Q}^T\)). We can therefore write:

\[ \textbf{D'} = \textbf{CC}^T = \textbf{Q}\Lambda \textbf{Q}^{-1} = \textbf{Q}\Lambda \textbf{Q}^{T} \]

We couldn’t calculate \(\textbf{CC}^T\), because we never had \(\textbf{C}\). But we could calculate \(\textbf{D'}\), which we were able to show is exactly the same! So, we can just take the eigendecomposition of that!

Thus, we can recover \(\textbf{C}\) by calculating:

\[ \textbf{C} = \textbf{Q}\Lambda^{1/2} \]

This works because \((\textbf{Q}\Lambda^{1/2})(\textbf{Q}\Lambda^{1/2})^T = \textbf{Q}\Lambda^{1/2}\Lambda^{1/2}\textbf{Q}^T = \textbf{Q}\Lambda\textbf{Q}^T = \textbf{D'}\), which is exactly what we need. The recovered \(\textbf{C}\) matches the original coordinates only up to rotation and reflection (translation was already removed by mean-centering).
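
Here is a short sketch of this recovery step on made-up, mean-centered coordinates (variable names are mine; the recovered points match the originals only up to rotation and reflection, but their Gram matrix is reproduced exactly):

```python
import numpy as np

rng = np.random.default_rng(2)
C_true = rng.normal(size=(5, 2))
C_true -= C_true.mean(axis=0)           # mean-center

gram = C_true @ C_true.T                # the matrix we called D'

# eigh handles the real symmetric case and keeps everything real
eig_vals, Q = np.linalg.eigh(gram)
eig_vals = np.clip(eig_vals, 0, None)   # clamp tiny negatives from round-off

C_hat = Q @ np.diag(np.sqrt(eig_vals))  # C = Q Lambda^{1/2}

print(np.allclose(C_hat @ C_hat.T, gram))   # True
```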

### 🌯 Summary of MDS 🌯

  1. Square all elements in the distance matrix
  2. Algebraically remove unwanted terms from each element (this relies on the data being mean-centered)
  3. Carry out an eigen-decomposition
  4. Recover the locations from the decomposition
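
Condensing those four steps, a compact sketch of the whole procedure could look like the function below (my own helper, written with `np.linalg.eigh` rather than the post's `np.linalg.eig`; the step-by-step version follows in the next section):

```python
import numpy as np

def classical_mds(D, k=2):
    """Recover k-dimensional coordinates from an n x n matrix of pairwise distances D."""
    n = D.shape[0]
    D_sq = np.asarray(D, dtype=float) ** 2          # 1. square all elements
    J = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * J @ D_sq @ J                      # 2. double-center (remove unwanted terms)
    eig_vals, eig_vecs = np.linalg.eigh(gram)       # 3. eigen-decomposition
    order = np.argsort(eig_vals)[::-1][:k]          # keep the k largest eigenvalues
    vals = np.clip(eig_vals[order], 0, None)
    return eig_vecs[:, order] * np.sqrt(vals)       # 4. coordinates: Q Lambda^{1/2}
```

Fed the distance table above, this should land on essentially the same embedding as the cell-by-cell code that follows, up to sign flips of the axes.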


### 🤖 Alright, let's code it 🤖

```python
# Import relevant packages
import numpy as np
import pandas as pd
import seaborn as sns
import math
import matplotlib.pyplot as plt

# Read data as pandas df
raw_data = pd.read_csv("china.txt", sep=" ", header=None, index_col=0)

# Square distances
raw_data = raw_data.pow(2)

# Initialize the gram matrix
gram = np.empty(raw_data.shape)

# Obtain the number of cities
n = raw_data.shape[0]

# Convert pandas df to np 2d array
data = raw_data.to_numpy()
```

The cell below directly corresponds to: \[ -\frac{1}{2} [ \textbf{D}_{i,j} - \frac{1}{n}\sum_{k=1}^{n} \textbf{D}_{i,k} - \frac{1}{n}\sum_{k=1}^{n} \textbf{D}_{j,k} + \frac{1}{n^2}\sum_{l=1}^{n}\sum_{k=1}^{n} \textbf{D}_{l,k} ] \]

```python
# Iterate through each distance and fill gram matrix
for i in range(data.shape[0]):
    for j in range(data.shape[1]):
        gram[i][j] = -0.5 * (
            data[i][j]
            - (data[i].sum() / n)
            - (data[j].sum() / n)
            + (data.sum() / (n ** 2))
        )
```

We take the eigendecomposition, sort the eigenvalues in descending order, and keep the top two along with their corresponding eigenvectors. We select the top two because our coordinates lie on a 2D map.

```python
# Take eigendecomposition
eig_vals, eig_vecs = np.linalg.eig(gram)
eig_vals_sorted = np.sort(eig_vals)[::-1]
eig_vecs_sorted = eig_vecs[:, eig_vals.argsort()[::-1]]

# Build Lambda^{1/2} from the top two eigenvalues and compute C = Q @ Lambda^{1/2}
eig_vals_sorted = np.sqrt(eig_vals_sorted[:2]) * np.identity(2)
eig_vecs_sorted = eig_vecs_sorted[:, [0, 1]]
coords = np.matmul(eig_vecs_sorted, eig_vals_sorted)
coords = pd.DataFrame(coords, index=raw_data.index, columns=["x", "y"])

# plot
plt.figure(figsize=(15, 8))
scatter = sns.scatterplot(x="x", y="y", data=coords)

# Label each point with its city name
a = pd.concat({"x": coords.x, "y": coords.y, "lbl": coords.index.to_series()}, axis=1)
sns.set(font_scale=1.0)
for i, point in a.iterrows():
    scatter.text(point["x"] + 50, point["y"], str(point["lbl"]))

scatter.set_title("Chinese Cities Mapped by Eigendecomposition")
plt.show()
```

*(Figure: "Chinese Cities Mapped by Eigendecomposition", the recovered coordinates before any rotation or reflection.)*

The map above is… off. We said that eigendecomposition would give us the answer only up to rotations and reflections. Here, the map looks rotated and reflected across an axis. We fix this below.

```python
# Rotation matrix function
def rotate_matrix(x, y, angle, x_shift=0, y_shift=0, units="DEGREES"):
    # Shift to origin (0,0)
    x = x - x_shift
    y = y - y_shift

    # Convert degrees to radians
    if units == "DEGREES":
        angle = math.radians(angle)

    # Rotation matrix multiplication to get rotated x & y
    xr = (x * math.cos(angle)) - (y * math.sin(angle)) + x_shift
    yr = (x * math.sin(angle)) + (y * math.cos(angle)) + y_shift

    return xr, yr


coords = np.matmul(eig_vecs_sorted, eig_vals_sorted)
coords = pd.DataFrame(coords, index=raw_data.index, columns=["x", "y"])

# Flip y axis and rotate matrix by 130 degrees
# Done experimentally
coords.x, coords.y = rotate_matrix(coords.x, coords.y, 130)
coords.y = -coords.y

# plot
plt.figure(figsize=(15, 8))
plt.axis('equal')
scatter = sns.scatterplot(x="x", y="y", data=coords)

a = pd.concat({"x": coords.x, "y": coords.y, "lbl": coords.index.to_series()}, axis=1)
sns.set(font_scale=1.0)
for i, point in a.iterrows():
    scatter.text(point["x"] + 50, point["y"], str(point["lbl"]))

scatter.set_title("Chinese Cities Mapped by Eigendecomposition")
plt.show()
```

Now, that looks like China!

*(Figure: the rotated and reflected embedding, which now resembles a map of China.)*

## Citation

BibTeX citation:

```bibtex
@online{aswani2021,
  author = {Aswani, Nishant},
  title = {Multi-Dimensional {Scaling} with {Eigen-Decomposition}},
  date = {2021-12-06},
  url = {https://nishantaswani.com/articles/mds.html},
  langid = {en}
}
```
For attribution, please cite this work as:
Aswani, Nishant. 2021. “Multi-Dimensional Scaling with Eigen-Decomposition.” December 6, 2021. https://nishantaswani.com/articles/mds.html.