Polynomial Approximations

Author: Nishant Aswani
Affiliation: NYU Abu Dhabi, NYU Tandon
Published: February 16, 2026

From these blog posts: Lowry-Duda, 3B1B and these notes: 1, 2.

check out: https://lorene.obspm.fr/school/polynom.pdf

This article started off with a desire to build an intuition for Taylor approximations. I quickly found out that there are other polynomial approximations out there, which led me down this rabbit hole. I was surprised to encounter Vandermonde matrices and their ill-conditioning, which I had heard of but never really understood.

Most tutorials introduce the computational method. I don't care so much about how to compute these interpolations; I am interested in seeing what kinds of constraints lead to these methods. For example, Lagrange polynomials want to avoid solving a system of linear equations. Newton polynomials also want to avoid that, but additionally aim to alleviate issues with Lagrange polynomials. The linear systems framing really clarifies this.

Introduction

We often rely on polynomial functions to approximate a given function; this is called polynomial interpolation. The common answer to “why polynomials?” seems to be that calculus with them is simpler. From undergraduate experience, that feels true. I believe another answer is that the interpolating polynomial through a given set of points is always unique.

But there are a few ways to go about this sort of approximation. The ones I have come across are Lagrange polynomials, Newton polynomials, and Taylor polynomials. For me, what seems to separate Lagrange/Newton interpolation from Taylor approximation is that we either use “anchors” or derivative information from the original function. Either way, we are using information from the original function to build our approximation.

Let’s say we are interested in approximating \(\sin(x)\), plotted below.

Code
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-np.pi, np.pi, 100)
plt.plot(x, np.sin(x))
plt.axhline(0, color='grey', lw=0.5)
plt.axvline(0, color='grey', lw=0.5)
plt.xlim(-np.pi, np.pi)
plt.ylim(-1.1, 1.1)
plt.show()
Figure 1: sin(x)

Round One

Before we use the clever and well-established approaches, it’s worth making a crude attempt. Let’s constrain ourselves to using monomials, i.e. \(\{x^0 = 1, x^1, x^2, \dots, x^i\}\). We are asking ourselves: can we sum together differently weighted monomials to get to \(\sin(x)\)? Just messing about somewhere like Desmos, you can see that it might be possible. Below is a terrible, hand-crafted attempt using four monomials.

Code
x = np.linspace(-np.pi, np.pi, 100)
a, b, c, d = -0.5, 0, 4.3, -2.9
plt.plot(x, np.sin(x), label=r'$\sin(x)$')
plt.plot(x, a + b*x + c*x**2 + d*x**3, '--', label=rf'${a} + {b}x + {c}x^2 + {d}x^3$')
plt.axhline(0, color='grey', lw=0.5)
plt.axvline(0, color='grey', lw=0.5)
plt.xlim(-np.pi, np.pi)
plt.ylim(-1.1, 1.1)
plt.legend()
plt.show()
Figure 2: Playing with monomials

Linear System

Some constraints would probably help! We could force our polynomial to pass through certain points (what I’m calling ‘anchors’) on the \(\sin(x)\) curve. Since we have four parameters to play with, we can pick four anchors, that is, four \((x_i,y_i)\) pairs from \(\sin(x)\).

Setting this up with linear algebra machinery, we have the familiar linear system \(Ax=b\), where \(A\) is a matrix of monomial evaluations at the anchor points, \(x\) is the vector of parameters we want to solve for, and \(b\) is the vector of \(\sin(x)\) evaluations at the anchor points:

\[ \begin{bmatrix}1 & x_0 & x_0^2 & x_0^3 \\ 1 & x_1 & x_1^2 & x_1^3 \\ 1 & x_2 & x_2^2 & x_2^3 \\ 1 & x_3 & x_3^2 & x_3^3 \end{bmatrix} \begin{bmatrix}a \\ b \\ c \\ d \end{bmatrix} = \begin{bmatrix} \sin(x_0) \\ \sin(x_1) \\ \sin(x_2) \\ \sin(x_3) \end{bmatrix} \]

Taking a pause here, we have set up a problem where we have a fixed structure, i.e. a weighted combination of monomials up to degree three. Then, we are looking for a set of shared parameters \(\{a, b, c, d\}\) that work for all four anchors. To drive the point home,

\[ a + b x_i + c x_i^2 + d x_i^3 = \sin(x_i), \quad \text{for } i=0,1,2,3. \]

For our anchors, let’s pick \(x=\{-\pi, -\pi/2, \pi/2, \pi\}\). Plugging these values in gives us

\[ \begin{bmatrix}1 & -\pi & (-\pi)^2 & (-\pi)^3 \\ 1 & -\pi/2 & (-\pi/2)^2 & (-\pi/2)^3 \\ 1 & \pi/2 & (\pi/2)^2 & (\pi/2)^3 \\ 1 & \pi & \pi^2 & \pi^3 \end{bmatrix} \begin{bmatrix}a \\ b \\ c \\ d \end{bmatrix} = \begin{bmatrix} \sin(-\pi) \\ \sin(-\pi/2) \\ \sin(\pi/2) \\ \sin(\pi) \end{bmatrix}. \]

Calling numpy.linalg.solve and plotting the results gives us:

Code
A = np.array([[1, -np.pi, (-np.pi)**2, (-np.pi)**3],
              [1, -np.pi/2, (-np.pi/2)**2, (-np.pi/2)**3],
              [1, np.pi/2, (np.pi/2)**2, (np.pi/2)**3],
              [1, np.pi, (np.pi)**2, (np.pi)**3]])
b = np.array([np.sin(-np.pi), np.sin(-np.pi/2), np.sin(np.pi/2), np.sin(np.pi)])
params = np.linalg.solve(A, b)
print("Parameters:", params.round(4))

# Plot sin(x) and the fitted cubic (coefficients rounded in the label)
plt.plot(x, np.sin(x), label='sin(x)')
plt.plot(x, params[0] + params[1]*x + params[2]*x**2 + params[3]*x**3, '--', label=f'{params[0]:.2f} + {params[1]:.2f}x + {params[2]:.2f}x^2 + {params[3]:.2f}x^3')
plt.axhline(0, color='grey', lw=0.5)
plt.axvline(0, color='grey', lw=0.5)
plt.xlim(-np.pi, np.pi)
plt.ylim(-1.1, 1.1)
plt.legend()
plt.show()
Parameters: [ 0.      0.8488 -0.     -0.086 ]
Figure 3: Cubic interpolant through the four anchors

Obviously, way better! It turns out the matrix \(A\) we constructed is called a Vandermonde matrix, and if our anchors are distinct, the solution is unique. But Vandermonde matrices are infamously ill-conditioned; see the callout below for a discussion.
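As a quick aside, numpy can build this matrix for us. Here is a minimal sketch checking that np.vander (with increasing=True) reproduces the \(A\) we wrote out by hand:

Code
import numpy as np

anchors = np.array([-np.pi, -np.pi/2, np.pi/2, np.pi])

# Hand-built matrix, same as the A above: columns are 1, x, x^2, x^3.
A_manual = np.column_stack([anchors**0, anchors**1, anchors**2, anchors**3])

# np.vander with increasing=True orders the columns the same way.
A_vander = np.vander(anchors, N=4, increasing=True)

print(np.allclose(A_manual, A_vander))  # True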

The linked note discusses a subtlety about ‘ill-conditioning in Vandermonde matrices’. The Vandermonde matrix can be ill-conditioned, as measured by its condition number. Here, ill-conditioning means that a small change in the inputs (the entries of \(A\) or \(b\)) can produce a very different solution. And, quantifying this via the condition number requires picking a matrix norm.

But the note argues that there is another, more pressing, issue with Vandermonde matrices. It has to do with the monomial basis itself being badly conditioned. If we are approximating a function on an interval far from the origin (\(x\) is large), but the function values are small (\(f(x)\) is small), then we have:

\[ \begin{bmatrix}1 & \text{big}_1 & \text{big}_1^2 & \text{big}_1^3 \\ 1 & \text{big}_2 & \text{big}_2^2 & \text{big}_2^3 \\ 1 & \text{big}_3 & \text{big}_3^2 & \text{big}_3^3 \\ 1 & \text{big}_4 & \text{big}_4^2 & \text{big}_4^3 \end{bmatrix} \begin{bmatrix}a \\ b \\ c \\ d \end{bmatrix} = \begin{bmatrix} \text{small}_1 \\ \text{small}_2 \\ \text{small}_3 \\ \text{small}_4 \end{bmatrix}. \]

Let’s say we solve this system and find \(a, b, c, d\), but they are huge values. Then, evaluating the polynomial looks like: \[ \text{huge}_a + \text{huge}_b \cdot \text{big}_i + \text{huge}_c \cdot \text{big}_i^2 + \text{huge}_d \cdot \text{big}_i^3 = \text{small}_i. \]

Just evaluating the polynomial requires adding and subtracting huge numbers to get a small number, which is problematic because of floating point precision: the cancellation of huge terms wipes out the accuracy of the small result. The core of the issue lies in the monomial basis itself, because it grows so quickly.
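To make this concrete, here is a small sketch comparing condition numbers for anchors near the origin and anchors on an interval far from it (the interval \([100, 103]\) is an arbitrary choice for illustration):

Code
import numpy as np

# Four anchors near the origin vs. four anchors far from it.
near = np.linspace(-1.5, 1.5, 4)
far = np.linspace(100, 103, 4)

A_near = np.vander(near, N=4, increasing=True)
A_far = np.vander(far, N=4, increasing=True)

# The condition number (2-norm by default) is far larger for the distant
# interval, because the columns 1, x, x^2, x^3 look nearly alike there
# (up to scaling), i.e. the monomial basis is badly conditioned.
print(f"cond(A_near) = {np.linalg.cond(A_near):.2e}")
print(f"cond(A_far)  = {np.linalg.cond(A_far):.2e}")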

Lagrange Polynomials

Lagrange polynomials are really not all that different! In the previous case, we had fixed our basis to be monomials and picked where to truncate that basis. Then, the rows of \(A\) were just the monomials evaluated at the various anchor points:

\[ A = \begin{bmatrix}1 & x_0 & x_0^2 & x_0^3 \\ 1 & x_1 & x_1^2 & x_1^3 \\ 1 & x_2 & x_2^2 & x_2^3 \\ 1 & x_3 & x_3^2 & x_3^3 \end{bmatrix} \]

Lagrange polynomials are the answer to: what if we could set \(A\) to be \(I\), the identity matrix? Which polynomials would satisfy this, i.e. what’s the basis? So, we want to replace the problematic \(\{1, x, x^2, \dots, x^{n-1}\}\) with some polynomials \(\{L_0(x), L_1(x), L_2(x), \dots, L_{n-1}(x)\}\).

To me, this feels a bit uncomfortable. Originally, we were working with monomials, which feel familiar because we spend a lot of time looking at lines, parabolas, and so on. And, the idea was that to approximate any function (not just \(\sin(x)\)), we could use monomials as a basis. However, now we are constructing polynomials that are specific to the anchor points we picked. So, our basis is no longer ‘universal.’ But we are doing this to avoid Vandermonde matrices and the computational cost that comes with solving linear systems.

To construct a new \(A\), we require that for every anchor \(x_i\) we get \(L_i(x_i) = 1\) and \(L_i(x_j) = 0\) for \(j \neq i\). In words, each polynomial \(L_i\) must evaluate to 1 at anchor \(i\) and to 0 at every other anchor \(j \neq i\). These two conditions ensure that \(A=I\), so there wouldn’t be a linear system to solve! Of course, the burden is now on deriving each of the \(L_i(x)\). Thankfully, as we’ll see, there is a neat formula.

Compactly, \[ L_i(x_j) = \delta_{ij}, \] where \(\delta_{ij}\) is the Kronecker delta.

Say we have four anchors again. To construct the first Lagrange polynomial, \(L_0(x)\), we know that it must have three roots. These roots will occur at \(x_1, x_2, x_3\), such that

\[ L_0(x) = c{(x - x_1)(x - x_2)(x - x_3)}. \]

By construction, if we plug in any of \(x_1, x_2, x_3\), the polynomial will evaluate to zero. Additionally, we allow ourselves a factor \(c\) to satisfy the second constraint, \(L_0(x_0) = 1\):

\[ \begin{aligned} L_0(x_0) = 1 &= c{(x_0 - x_1)(x_0 - x_2)(x_0 - x_3)} \\ c &= \frac{1}{(x_0 - x_1)(x_0 - x_2)(x_0 - x_3)}. \end{aligned} \]

Without \(c\), we wouldn’t be able to control what \(L_0(x_0)\) evaluates to. Plugging \(c\) back into \(L_0(x)\) gives us

\[ L_0(x) = \frac{(x - x_1)(x - x_2)(x - x_3)}{(x_0 - x_1)(x_0 - x_2)(x_0 - x_3)}. \]

We follow this recipe for the remaining Lagrange polynomials. As before, if we have \(n\) anchors, we can construct \(n\) Lagrange polynomials, and the expression above generalizes to any number of anchors. Hence, the Lagrange polynomials are often written in the general form:

\[ L_i(x) = \prod_{j=0, j \neq i}^{n-1} \frac{(x - x_j)}{(x_i - x_j)}. \]
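As a sanity check, we can evaluate each \(L_i\) at every anchor and confirm that the matrix of evaluations is the identity, which is exactly the \(A=I\) condition we asked for. A minimal sketch using the same four anchors:

Code
import numpy as np

anchors = [-np.pi, -np.pi/2, np.pi/2, np.pi]
n = len(anchors)

def L(i, x):
  # i-th Lagrange basis polynomial for the anchors above.
  term = 1.0
  for j in range(n):
    if j != i:
      term *= (x - anchors[j]) / (anchors[i] - anchors[j])
  return term

# Rows are anchors, columns are basis polynomials: entry (j, i) is L_i(x_j).
A_check = np.array([[L(i, xj) for i in range(n)] for xj in anchors])
print(A_check.round(10))  # prints the 4x4 identity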

Clearly, we can quite easily construct Lagrange polynomials for any set of anchors. But, they also have a flaw, which is that they are not a universal basis. Each set of anchors gives us a different set of Lagrange polynomials. So, if we change or add anchors, we have to re-derive the entire basis.

Recall that we started off with the form \(Ax=b\). But now, since \(A=I\) and our basis is \(\{L_0(x), L_1(x), \dots, L_{n-1}(x)\}\), the system reduces to \(Ix=b\): the coefficient vector is simply \(b\) itself, and in our case \(b_i = \sin(x_i)\). Therefore, our approximation of \(\sin(x)\) looks like: \[\sin(x) \approx \sum_{i=0}^{n-1} \sin(x_i) L_i(x).\]

Using the same four anchors \(-\pi, -\pi/2, \pi/2, \pi\), we can construct the Lagrange polynomials and plot their weighted sum.

Code
def lagrange_polynomial(anchor_x, fn):
  # Build the Lagrange interpolant of fn through the given anchors.
  anchor_y = [fn(val) for val in anchor_x]

  def p(x):
    n = len(anchor_x)
    result = 0
    for i in range(n):
      # Construct L_i(x) as a product over all other anchors.
      term = 1
      for j in range(n):
        if i == j: continue
        term *= (x - anchor_x[j]) / (anchor_x[i] - anchor_x[j])
      # Weight L_i(x) by the function value at anchor i.
      result += anchor_y[i] * term
    return result

  return p

anchor_x = [-np.pi, -np.pi/2, np.pi/2, np.pi]
simulator = lagrange_polynomial(anchor_x, np.sin)
x = np.linspace(-np.pi, np.pi, 100)
y_approx = [simulator(val) for val in x]

plt.plot(x, np.sin(x), label='sin(x)')
plt.plot(x, y_approx, '--', label='Lagrange')
plt.axhline(0, color='grey', lw=0.5)
plt.axvline(0, color='grey', lw=0.5)
plt.xlim(-np.pi, np.pi)
plt.ylim(-1.1, 1.1)
plt.legend()
plt.show()
Figure 4: Lagrange interpolation of \(\sin(x)\)

If this looks familiar, well, it should! This is exactly the polynomial we arrived at before, but without having to solve a linear system. The polynomial is the same because polynomial interpolation is unique. We simply took a different route to get there.
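We can check this numerically by reusing `params` (from the Vandermonde solve) and `simulator` (the Lagrange construction) from the cells above; the two curves agree to machine precision:

Code
# Reuses `params` and `simulator` from the earlier cells.
x_check = np.linspace(-np.pi, np.pi, 1000)
y_vander = params[0] + params[1]*x_check + params[2]*x_check**2 + params[3]*x_check**3
y_lagrange = np.array([simulator(v) for v in x_check])

# Maximum pointwise difference: on the order of machine epsilon.
print(np.max(np.abs(y_vander - y_lagrange)))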

Newton Polynomials

Newton polynomials (\(N_i(x)\)), like Lagrange polynomials (\(L_i(x)\)), avoid Vandermonde matrices. Instead of \(A=I\), Newton polynomials construct \(A\) to be a lower triangular matrix, which we denote as \(A_L\). Given our framing, a question we must return to is: why would a lower triangular matrix be better than the identity matrix? By now, we know that this implies a different basis.

The structure of our problem is now: \[ \begin{bmatrix}1 & 0 & 0 & 0 \\ 1 & (x_1 - x_0) & 0 & 0 \\ 1 & (x_2 - x_0) & (x_2 - x_1)(x_2 - x_0) & 0 \\ 1 & (x_3 - x_0) & (x_3 - x_1)(x_3 - x_0) & (x_3 - x_2)(x_3 - x_1)(x_3 - x_0) \end{bmatrix} \begin{bmatrix} a \\ b \\ c \\ d \end{bmatrix} = \begin{bmatrix} \sin(x_0) \\ \sin(x_1) \\ \sin(x_2) \\ \sin(x_3) \end{bmatrix}. \]

Not only is \(A_L\) lower triangular, but it also has a specific pattern, which we’re not quite sure of yet. Still, it feels reasonable that demanding \(A\) be lower triangular would impose a particular structure on the basis polynomials. So, let’s try to derive that structure. The first Newton polynomial is obvious. In a lower triangular matrix, the first column is allowed to be completely filled in, and we can make our lives easier by making the first column of \(A_L\) all ones. Then, simply, \(N_0(x) = 1\), the constant function.

This might very well be super trivial, but it is interesting to me! The first column is all ones; so, we are saying that \(N_0(x) = 1, \forall x\). That’s a constant function, and it shows up as an explicit basis function in both the Newton basis and the monomial basis solution. In the monomial basis, it is the \(x^0\) term.

Keep in mind that the constant term gets multiplied by a parameter \(a\). Hence, in the final sum we have an offset term controlled by \(a\). In our approximation, then, we sort of naturally find a term that neatly decouples the vertical offset from everything else. Interestingly, we don’t have a term like this in Lagrange polynomials!

For \(N_1(x)\), we need a function such that \(N_1(x_0) = 0\), since the entry in the first row, second column of \(A_L\) must be zero. At all other anchors, we don’t care what \(N_1(x)\) evaluates to. Looking at the matrix above, the simplest choice is \(N_1(x) = x - x_0\).
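To preview why a lower triangular \(A_L\) pays off: the system can be solved by forward substitution, one coefficient at a time, with no general linear solve. Below is a minimal sketch that builds \(A_L\) for our four anchors directly from the pattern in the matrix above (taking that pattern as given, since we haven’t finished deriving the basis yet):

Code
import numpy as np

anchors = [-np.pi, -np.pi/2, np.pi/2, np.pi]
n = len(anchors)
b = np.array([np.sin(xi) for xi in anchors])

# Entry (i, j) of A_L is the product of (x_i - x_k) for k < j, matching the
# pattern in the matrix above; entries above the diagonal stay zero.
A_L = np.zeros((n, n))
for i in range(n):
  for j in range(i + 1):
    A_L[i, j] = np.prod([anchors[i] - anchors[k] for k in range(j)])

# Forward substitution: each coefficient depends only on the ones before it.
coeffs = np.zeros(n)
for i in range(n):
  coeffs[i] = (b[i] - A_L[i, :i] @ coeffs[:i]) / A_L[i, i]

print(coeffs.round(4))

Note also that adding a fifth anchor would only append one new row and column to \(A_L\), so the first four coefficients would stay the same; that is one practical advantage over re-deriving an entire Lagrange basis.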

Citation

BibTeX citation:
@online{aswani2026,
  author = {Aswani, Nishant},
  title = {Polynomial {Approximations}},
  date = {2026-02-16},
  url = {https://nishantaswani.com/articles/poly_approx/},
  langid = {en}
}
For attribution, please cite this work as:
Aswani, Nishant. 2026. “Polynomial Approximations.” February 16, 2026. https://nishantaswani.com/articles/poly_approx/.