The Full Derivation
We derive the 2D diffusion equation from a discrete 2D random walk. To model our discrete random walk, we work with the Cartesian grid shown in Figure 1.
In the simplest case of a 2D random walk, initialized at the origin (white), we are allowed to take one step in the \(x\) or \(y\) direction for each unit of time, resulting in four possible locations (blue) after \(t=1\). At \(t=2\), we can take yet another step in any of the four directions. Assuming we took the first step upwards, the red circles \(and\) the origin are all possible locations we can arrive at after \(t=2\).
We leave out the other three red arrows to avoid visual clutter.
Pondering on \(t=2\), we can ask, what is the probability we arrived at the top most red location? Looking at the picture, it depends on the probability of picking that red spot (amongst all four red spots) when on the preceding blue spot \(and\) the probability we were at the blue spot. Symbolically,
\[
P(\text{top red after two steps}) = P(\text{selecting top red}) \cdot P(\text{neighbor blue after one step}).
\]
With only two time steps, there is just one way that we could have arrived at the top red spot after \(t=2\). However, if we played this out for a few more steps, it is not hard to imagine that you could arrive at the top red spot from any of its four neighbors. In fact, we can already visualize that scenario by looking at the origin: a return to the origin could be from any of the four blue locations.
Since we are on the 2D coordinate system, an arbitrary location on our grid can be described by a coordinate pair \((x,y)\). And, we can refer to any of its four neighbors with the set \(\{(x-1, y), (x+1, y), (x, y-1), (x, y+1)\}\). So, extending the equation above, if we wanted to describe the probability of arriving at \((x,y)\), we would write
\[\begin{aligned}
P(x,y,t) = \frac{1}{4} \bigl( P \left(x-1,y,t-1\right) + P \left(x+1,y,t-1\right) + \\ P \left(x,y-1,t-1\right)+ P \left(x,y+1,t-1\right) \bigr).
\end{aligned}\]
Instead of accounting for the path from just one neighbor, we additively account for all four neighbors. In the equation above, we have just fixed the probability of selecting \((x,y)\) as \(\frac{1}{4}\) and factored it out.
While we have modeled discrete movement, we have not really considered a continuous movement. The key to making that transition is to ask the same question with a fraction of a step in a fraction of a time unit. So, if \(\frac{1}{2}t\) elapsed, we might ask about the probability we have taken a \(\frac{1}{2}x\) or \(\frac{1}{2}y\) step. Using \(\Delta\) to represent any fraction, we can say
\[\begin{aligned}
P(x,y,t) = \frac{1}{4} \bigl( P \left(x+\Delta x,y,t-\Delta t \right) + P \left(x+\Delta x,y,t-\Delta t\right) + \\ P \left(x,y-\Delta y,t-\Delta t\right)+ P \left(x,y+\Delta y,t-\Delta t \right) \bigr).
\end{aligned}\]
Recalling the general form of a second-order Taylor series expansion with two variables,
\[\begin{aligned}
f(x, y) &= f(a, b) + (x-a) \frac{\partial f}{\partial x} \bigg|_{x=a, y=b} +
(y-b) \frac{\partial f}{\partial y}\bigg|_{x=a, y=b} + \\ &
(x-a)^{2} \frac{\partial^2 f}{\partial x^2}\bigg|_{x=a, y=b} +
(x-a)(y-b) \frac{\partial^2 f}{\partial x \partial y}\bigg|_{x=a, y=b} +
(y-b)^{2} \frac{\partial^2 f}{\partial y^2}\bigg|_{x=a, y=b},
\end{aligned}\]
we proceed with approximating the four terms:
\[\begin{aligned}
P(x+\Delta x, y, t-\Delta t) &\approx {\color{blue} P \left(x,y,t\right)} + \Delta x \frac{\partial P}{\partial x} {\color{blue}- \Delta t \frac{\partial P}{\partial t}} + \\
& {\color{blue} \frac{(\Delta x)^2}{2} \frac{\partial^2 P}{\partial x^2}} - (\Delta x \Delta t) \frac{\partial^2 P}{\partial x \partial t} + {\color{blue}(-\Delta t)^2 \frac{\partial^2 P}{\partial t^2}}
\end{aligned}\]
\[\begin{aligned}
P(x-\Delta x, y, t-\Delta t) &\approx {\color{blue} P \left(x,y,t\right)} - \Delta x \frac{\partial P}{\partial x} {\color{blue}- \Delta t \frac{\partial P}{\partial t}} + \\
& {\color{blue} \frac{(-\Delta x)^2}{2} \frac{\partial^2 P}{\partial x^2}} + (\Delta x \Delta t) \frac{\partial^2 P}{\partial x \partial t} + {\color{blue}(-\Delta t)^2 \frac{\partial^2 P}{\partial t^2}}
\end{aligned}\]
\[\begin{aligned}
P(x, y + \Delta y, t-\Delta t) &\approx {\color{blue} P \left(x,y,t\right)} + \Delta y \frac{\partial P}{\partial y} {\color{blue}- \Delta t \frac{\partial P}{\partial t}} + \\
& {\color{blue} \frac{(\Delta y)^2}{2} \frac{\partial^2 P}{\partial y^2}} - (\Delta y \Delta t) \frac{\partial^2 P}{\partial y \partial t} + {\color{blue}(-\Delta t)^2 \frac{\partial^2 P}{\partial t^2}}
\end{aligned}\]
\[\begin{aligned}
P(x, y-\Delta y, t-\Delta t) &\approx {\color{blue} P \left(x,y,t\right)} - \Delta y \frac{\partial P}{\partial y} {\color{blue}- \Delta t \frac{\partial P}{\partial t}} + \\
& {\color{blue} \frac{(-\Delta y)^2}{2} \frac{\partial^2 P}{\partial y^2}} + (\Delta y \Delta t) \frac{\partial^2 P}{\partial y \partial t} + {\color{blue}(-\Delta t)^2 \frac{\partial^2 P}{\partial t^2}}
\end{aligned}\]
When summing the four terms above, only the blue terms remain and the rest cancel out, resulting in
\[\begin{aligned}
P(x,y,t) &\approx \frac{1}{4} \left(
4P(x,y,t) - 4\Delta t \frac{\partial P}{\partial t} + 4\Delta t^2 \frac{\partial^2 P}{\partial t^2} + \Delta x^2 \frac{\partial^2 P}{\partial x^2} + \Delta y^2 \frac{\partial^2 P}{\partial y^2}
\right) \\
&\approx
P(x,y,t) - \Delta t \frac{\partial P}{\partial t} + \Delta t^2 \frac{\partial^2 P}{\partial t^2} + \frac{\Delta x^2}{4} \frac{\partial^2 P}{\partial x^2} + \frac{\Delta y^2}{4} \frac{\partial^2 P}{\partial y^2}
\\
\Delta t \frac{\partial P}{\partial t} - \Delta t^2 \frac{\partial^2 P}{\partial t^2} &\approx
\frac{\Delta x^2}{4} \frac{\partial^2 P}{\partial x^2} + \frac{\Delta y^2}{4} \frac{\partial^2 P}{\partial y^2}
\\
\frac{\partial P}{\partial t} - \Delta t \frac{\partial^2 P}{\partial t^2} &\approx
\frac{\Delta x^2}{4\Delta t} \frac{\partial^2 P}{\partial x^2} + \frac{\Delta y^2}{4\Delta t} \frac{\partial^2 P}{\partial y^2}
\\
\end{aligned}\]
At this point, we take the limits of \(\Delta x, \Delta y, \Delta t \rightarrow 0\), which allows us to simplify the equation above and define a diffusion coefficient, \(D\). Concretely, \(\frac{\Delta x^2}{4\Delta t} = \frac{\Delta y^2}{4\Delta t} = D\) with the assumption that \(\Delta x = \Delta y\), results in
\[
\frac{\partial P}{\partial t} - \Delta t \frac{\partial^2 P}{\partial t^2} \approx
D \left( \frac{\partial^2 P}{\partial x^2} + \frac{\partial^2 P}{\partial y^2} \right).
\]
Additionally, because we have taken the limit \(\Delta t \rightarrow 0\), it causes the \(\Delta t \frac{\partial^2 P}{\partial t^2}\) term to vanish relative to the other terms, returning the canonical form of the diffusion equation
\[
\frac{\partial P}{\partial t} \approx
D \left( \frac{\partial^2 P}{\partial x^2} + \frac{\partial^2 P}{\partial y^2} \right).
\]
The solution to the 2D diffusion equation is given by the 2D Gaussian distribution
\[
P(x,y,t) = \frac{1}{4\pi Dt} \exp{\left(- \frac{x^2+y^2}{4Dt} \right)}
\]
Ultimately, our approach models the probability that a particle is at a given location and time. If we have a solution of particles in a container, we can also use the model to estimate the density in any area of the container at a given time since the diffusion began.