Interpolation may sound like a fancy mathematical exercise, but in many ways, it is much like what machine learning does.
- Start with a limited set of data points relating multiple variables
- Interpolate (basically, create a model)
- Construct a new function that can be used to predict any future or new point from the interpolation
So, the idea is: ingest, interpolate, predict.
Concretely, suppose we have a limited number of data points for a pair of variables (x, y) with an unknown (and nonlinear) relationship between them, i.e., y = f(x). From this limited data, we want to construct a prediction function that can generate y values for any given x values (within the same range that was used for the interpolation).
There is a lot of mathematical theory and prior work on this subject. You can certainly write your own algorithm to implement those interpolation methods. But why not take advantage of the open-source (and optimized) options?
SciPy interpolate
We start with a quadratic function where we have only 11 data points. The code to interpolate is basically a one-liner:
f1 = interp1d(x, y, kind='linear')
Note that interp1d is a SciPy class, not a plain function. Calling it builds an object that has a __call__ method, so the object itself can be called like a function. This is the f1 we get back as our prediction model.
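To make the callable behaviour concrete, here is a minimal sketch that evaluates the prediction model at a point that is not in the data. It reuses the quadratic from this section; the query point 2.5 is an arbitrary choice for illustration.

```python
import numpy as np
from scipy.interpolate import interp1d

# 11 samples of the quadratic used in this section
x = np.linspace(0, 10, num=11, endpoint=True)
y = x**2 + 7*x - 28

f1 = interp1d(x, y, kind='linear')  # f1 is an interp1d instance

is_callable = callable(f1)   # the instance defines __call__
y_pred = float(f1(2.5))      # evaluate between the samples at x=2 and x=3
y_true = 2.5**2 + 7*2.5 - 28 # value of the generating function at the same x

print(is_callable)           # True
print(y_pred, y_true)        # -4.0 (interpolated) vs -4.25 (exact)
```

The small gap between the two numbers is the price of approximating a curve by a straight segment between neighbouring samples.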
Here is the code we used:
from scipy.interpolate import interp1d
import numpy as np
import matplotlib.pyplot as plt
from scipy import interpolate
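NUM_DATA = 11          # number of known data points
NUM_INTERPOLATE = 41   # number of points at which to evaluate the interpolant
x = np.linspace(0, 10, num=NUM_DATA, endpoint=True)
y = x**2+7*x-28        # quadratic generating function
f1 = interp1d(x, y, kind='linear')  # piecewise-linear prediction function
xnew = np.linspace(0, 10, num=NUM_INTERPOLATE, endpoint=True)  # denser grid over the same range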
fig, ax = plt.subplots(1,2,figsize=(6,3),dpi=120)
ax[0].scatter(x, y)
ax[0].set_title("Original data")
ax[1].scatter(x, y)
ax[1].plot(xnew, f1(xnew), color='orange',linestyle='--')
ax[1].set_title("Interpolation")
plt.show()
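The prediction function is only meant for x values inside the range of the original data. As a hedged sketch of what happens outside that range: with its default settings, interp1d raises a ValueError for out-of-range inputs, and the bounds_error and fill_value parameters let you return a placeholder instead. The query point 11.0 and the NaN placeholder are illustrative choices, not part of the original example.

```python
import numpy as np
from scipy.interpolate import interp1d

x = np.linspace(0, 10, num=11, endpoint=True)
y = x**2 + 7*x - 28

# Default behaviour: asking for a point outside [0, 10] is an error
f_strict = interp1d(x, y, kind='linear')
try:
    f_strict(11.0)
    raised = False
except ValueError:
    raised = True
print(raised)

# Alternative: return NaN for out-of-range queries instead of raising
f_nan = interp1d(x, y, kind='linear', bounds_error=False, fill_value=np.nan)
out = f_nan([5.5, 11.0])
print(out)  # first entry is interpolated, second is NaN
```

Returning NaN keeps a batch prediction from failing on a single bad input, at the cost of having to check for NaN afterwards.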
Let us go one degree higher to a cubic generating function. The interpolation result looks as smooth as ever.
x = np.linspace(0, 10, num=NUM_DATA, endpoint=True)
y = 0.15*x**3+0.23*x**2-7*x+18
f1 = interp1d(x, y, kind='linear')
xnew = np.linspace(0, 10, num=NUM_INTERPOLATE, endpoint=True)
fig, ax = plt.subplots(1,2,figsize=(6,3),dpi=120)
ax[0].scatter(x, y)
ax[0].set_title("Original data")
ax[1].scatter(x, y)
ax[1].plot(xnew, f1(xnew), color='red',linestyle='--')
ax[1].set_title("Interpolation")
plt.show()
Note that the original data may come from a cubic function, but this is still a ‘linear’ interpolation, which is set by the kind parameter as shown above. This means that the intermediate points between the original data points lie on straight line segments.
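The line-segment behaviour can be checked by hand. The sketch below computes one interpolated value from its two neighbouring samples and compares it with interp1d and with NumPy's np.interp; the query point 4.3 is an arbitrary choice for illustration.

```python
import numpy as np
from scipy.interpolate import interp1d

x = np.linspace(0, 10, num=11, endpoint=True)
y = 0.15*x**3 + 0.23*x**2 - 7*x + 18
f1 = interp1d(x, y, kind='linear')

xq = 4.3                             # lies between the samples at x=4 and x=5
i = np.searchsorted(x, xq) - 1       # index of the left neighbour
t = (xq - x[i]) / (x[i+1] - x[i])    # fractional position inside the segment
y_manual = (1 - t)*y[i] + t*y[i+1]   # point on the straight line joining the neighbours

same_as_scipy = bool(np.isclose(y_manual, f1(xq)))
same_as_numpy = bool(np.isclose(y_manual, np.interp(xq, x, y)))
print(same_as_scipy, same_as_numpy)  # True True
```

The interpolated value depends only on the two nearest samples, which is why the curve has visible corners at the data points.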
But SciPy offers quadratic and cubic splines too. Let’s see them in action.
Linear interpolation can be tricky to work with when the original data is not polynomial in nature or has inherent noise (which is natural for any scientific measurement).
Here is a demo that compares linear and cubic interpolation on the cubic data from above:
x = np.linspace(0, 10, num=NUM_DATA, endpoint=True)
y = 0.15*x**3+0.23*x**2-7*x+18
f1 = interp1d(x, y, kind='linear')
f3 = interp1d(x,y, kind='cubic')
xnew = np.linspace(0, 10, num=NUM_INTERPOLATE, endpoint=True)
fig, ax = plt.subplots(1,3,figsize=(6,3),dpi=120)
ax[0].scatter(x, y)
ax[0].set_title("Original data")
ax[1].scatter(x, y)
ax[1].plot(xnew, f1(xnew), color='red',linestyle='--')
ax[1].set_title("Linear")
ax[2].scatter(x, y)
ax[2].plot(xnew, f3(xnew), color='orange',linestyle='--')
ax[2].set_title("Cubic")
plt.show()
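The plots compare the methods visually. As a rough numerical check, the sketch below measures the largest deviation from the generating function for each kind of interpolation. The helper true_f and the max_err dictionary are names introduced here for illustration. On this noiseless cubic, the cubic spline should track the generating function far more closely than the linear one, but the exact numbers are best obtained by running the code.

```python
import numpy as np
from scipy.interpolate import interp1d

def true_f(x):
    """Cubic generating function used in this section."""
    return 0.15*x**3 + 0.23*x**2 - 7*x + 18

x = np.linspace(0, 10, num=11, endpoint=True)
y = true_f(x)
xnew = np.linspace(0, 10, num=41, endpoint=True)

# Largest absolute error on the dense grid for each interpolation kind
max_err = {}
for kind in ('linear', 'quadratic', 'cubic'):
    f = interp1d(x, y, kind=kind)
    max_err[kind] = float(np.max(np.abs(f(xnew) - true_f(xnew))))
    print(kind, max_err[kind])
```

This comparison is only possible because the generating function is known. With real measurements, a held-out subset of the data can play the same role.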