GitHub Repository: explore-for-students/python-tutorials
Path: blob/main/Lesson 7 - Interpolation.ipynb
⁹⁶⁸ views

Kernel: Python 3 (ipykernel)

Lesson 7 - Interpolation

Authors:

Yilber Fabian Bautista
Sean Tulin

By the end of this lecture you will be able to:

Perform 1-Dimensional interpolating functions using interp1d and InterpolatedUnivariateSpline inside the scipy library
Use the different interpolation kinds available for the interp1d method
Change the polynomial degree of an InterpolatedUnivariateSpline function.
Use the numeric differentiation and integration methods available from InterpolatedUnivariateSpline
Perform 2-Dimensional interpolations using RectBivariateSpline function from the scipy library
Practice your classes and pandas skills

Interpolations

In Lectures 3 we have briefly introduced interpolations as an optional section to access information in a given set of data where no data is available. In this tutorial we will expand on this. We will cover 1-Dimensional and 2-Dimensional interpolations.

1-Dimensional Interpolations

Given a set of data points $(x_i,f_i)$ , with $x_{min} \le x_i\le x_{max}$ , an interpolation function $f(x)$ is a curve that passes through all of the data points available in the data set. Once such a curve is found, it can be evaluated at different locations $f(x')$ , with $x_{min} \le x'\le x_{max}$ .

In this tutorial we will cover two functions in the scipy library to do 1-Dimensional interpolations; they are the interp1d and InterpolatedUnivariateSpline functions.

`interp1d`

See the documentation here. The basics syntax for using interp1d is the following: Given a set of x and y points, and interpolating function is created by simply typing

interp = interp1d(x, y)

Let us see an explicit example of an interpolation for a 1-Dimensional gaussian function

In [1]:

import numpy as np # For maths
from scipy.interpolate import interp1d # For 1-D interpolations
import matplotlib.pyplot as plt # For ploting

# Data points
x = np.linspace(0,10,15)
y = np.exp(-x**2/2)

# Perform the interpolation
f = interp1d(x,y)

# Finner x data set 
xp = np.linspace(0,10,30)

# Evaluate the interpolation 
fp = f(xp)

# Plot
plt.plot(x,y,'o',label = "data")
plt.plot(xp,fp,'r--',label = "interp")
plt.legend(loc = "upper right")
plt.show()

Out[1]:

As mentioned above, the interpolation function passes through all of the available data points originally given. However, interpolating functions created from interp1d only provide information inside the interval the original x-data set is defined in. If we wanted to evaluate the resulting function f, created from interp1d, outside the x-data interval (in our example, outside the interval $[0,10]$ ), we will get an error message. For instance if running

f(11)

the output will be

ValueError: A value in x_new is above the interpolation range.

which indicates we are trying to evaluate the interpolation outside the scope it was create from. Inferring information about the function outside the x-interval is known as an extrapolation. We will comment on this bellow.

In [2]:

# try it yourself

`interp1d` kinds

interp1d has different ways (kind) of doing the interpolation. The default value is kind = linear, where polynomials of degree 1 are used to approximate the curve inside every two-points interval where no data is available. That is, inside the intervals $[x_0,x_1], [x_1,x_2],\cdots , [x_{n-1},x_n]$ .

Linear polynomials

Linear polynomials used in interp1d with type = linear, to approximate the function inside every interval $[x_0,x_y]$ , are constructed from the equation for a straight line (see also here):

p_1(x)=f(x_{0})+{\frac {f(x_{1})-f(x_{0})}{x_{1}-x_{0}}}(x-x_{0}).

where $f(x_{0})$ and $f(x_{1})$ are the two-true values for the function, evaluated at the extremal points of the interval. Note that there will be a different equation for each interval in the data set, depending on the values $[x_0,x_y]$ .

Quadratic polynomials

The quadratic polynomials used in interp1d for type = quadratic are constructed from three-points intervals $[x_0,x_1,x_2]$ , where the functions has the true values $f(x_0),f(x_1)$ and $f(x_2)$ . These polynomials are given by the quadratic equation

p_{2}(x)=f(x_{0})\frac{(x-x_{1})(x-x_{2})}{(x_{0}-x_{1})(x_{0}-x_{2})}+f(x_{1})\frac{(x-x_{0})(x-x_{2})}{(x_{1}-x_{0})(x_{1}-x_{2})}+f(x_{2})\frac{(x-x_{0})(x-x_{1})}{(x_{2}-x_{0})(x_{2}-x_{1})},

and similar to the linear interpolation, the polynomial will be different for each three-points interval of the given data points.

Run the command

interp1d?

to see the different kind available to use.

Exercise 1

Run the following lines to do a comparison of the different interpolations kind available in the interp1d method.

from scipy.interpolate import interp1d # For 1-D interpolations
import matplotlib.pyplot as plt # For ploting

#%matplotlib notebook #uncomment this line for plot manipulation in jupyter notebook

#Data points
x = np.linspace(0,10,11)
y = np.sin(x) 

#finer  data to evaluate interpolation
xp = np.linspace(0,10,100) 

#plot initial data points
plt.plot(x,y,"o",label = "data")

# Do the interpolation for all the different kinds 
kind = ['linear', 'nearest', 'zero', 'slinear', 'quadratic', 'cubic']


for k in kind:
    # create the interpolation
    f = interp1d(x,y,kind = k)
    #evaluate the interpolation in the finer data points
    yp = f(xp)
    #plot interpolation
    plt.plot(xp,yp,label = k)
plt.legend(loc = 'lower right')  
#plt.show()

What can you conclude from the different ways of doing the interpolations?

In [3]:

# Try it yourself

`InterpolatedUnivariateSpline`

Let us now see a different function inside the scipy library to create interpolation. A spline interpolation is a way of interpolating a set of data, where the interpolant is a piecewise polynomial called a spline.

Figure taken from here. The vertexes of the spline are called knots. Notice that the set of all the polynomials used in interp1d effectively form a spline.

InterpolatedUnivariateSpline fits a spline y = spl(x) of degree $k$ with $k\le5$ , to the provided x, y data. Here $k$ is the polynomial degree to approximate the curve.

The syntax for interpolating a set of points using InterpolatedUnivariateSpline is very similar to the one for interp1d

spl = InterpolatedUnivariateSpline(x, y, k = 3)

The default value for the polynomial degree used in InterpolatedUnivariateSpline is $k=3$ , i.e. cubic polynomials are used in the approximate the curve in regions where no data points are available. See the InterpolatedUnivariateSpline documentation

Let us provide an specific example for our favorite 1-Dimensional gaussian function

In [4]:

from scipy.interpolate import InterpolatedUnivariateSpline

# Data points
x = np.linspace(0,10,15)
y = np.exp(-x**2/2)

# Interpolation
spl = InterpolatedUnivariateSpline(x,y)

# Finer data points
xp = np.linspace(0,10,30)
splp = spl(xp)

# Plotting
plt.plot(x,y,'o',label = "data")
plt.plot(xp,splp,'r--',label = "interp")
plt.legend(loc = "upper right")
plt.show()

Out[4]:

As we can see, the curve is smoother as compared to the interp1d for kind= linear.

Exercise 2

Compare the result from the InterpolatedUnivariateSpline from the previous example with the corresponding interp1d result for interpolation kind, kind='cubic', by computing the error $Error = ||f(x_p)- f_{\text{true}}(x_p)||$ where $f_{\text{true}}(x_p)$ is the true value of the function at the data points $x=x_p$ (recall the true function is $e^{-x^2/2}$ ). Hint: Use linalg.norm function from numpy library to compute the norm of the vector $f(x_p)- f_{\text{true}}(x_p)$ .

What do you conclude from the comparison?

In [5]:

# write your solution here

Why using InterpolatedUnivariateSpline?

Additional methods available

Unlike for interp1d, interpolating functions created from InterpolatedUnivariateSpline have additional methods that can be applied to the resulting functions. Among these methods we have numeric differentiation and integration.

Let us for instance compute the first antiderivative of our Spline for the previous example

In [6]:

#compute the antiderivative
integral_spl = spl.antiderivative(1)

#evaluate the function in the fine grid defined above
int_splp = integral_spl(xp)

#Plot the interpolation and its first antiderivative
plt.plot(x,y,'o',label = "data")
plt.plot(xp,splp,'r--',label = "interp")
plt.plot(xp,int_splp,label = "antiderivative")

plt.legend(loc = "upper right")
plt.show()

Out[6]:

Exercise 3

Do the analog plot for the first and second derivatives of the spline of our example. Hint: The derivative method has the analog syntax .derivative(n).
Compute the integral of the spline in the interval [1,4]. Hint: See .integral() method in the documentation
Explore the additional methods applied to the interpolating function resulting from using InterpolatedUnivariateSpline, mentioned in the documentation

In [7]:

# Write your solution here

Extrapolation

InterpolatedUnivariateSpline offers the possibility of extrapolate the resulting function to points outside the interval for which there is data. This is done through the keyword argument ext.

ext=0 is the default value in InterpolatedUnivariateSpline. It fits the function for values outside the data interval. If ext=1 or ‘zeros’, the returned value for the function outside the data interval will be `0.

Let us see an specific example for a polynomial function

In [8]:

x2 = np.linspace(0,10,15)
y2 = -x2 **2/2+3*x2 -5

spl2 = InterpolatedUnivariateSpline(x2,y2)
spl3 = InterpolatedUnivariateSpline(x2,y2,ext=1)

#finner grid outside the data interval
x3 = np.linspace(-5,40,50)
#true values finner grid
y3 = -x3 **2/2+3*x3 -5

We can check that outside the original x-interval, spl3 is set to zero, whereas spl2 evaluates to non zero values.

In [9]:

#try it yourself

We can also compare to the true value solution by computing the error function.

In [11]:

# Compararison to the true values
print(np.linalg.norm(y3 - spl2(x3)))
print(np.linalg.norm(y3 - spl3(x3)))

Out[11]:

1.1909122045998598e-09
2006.1017555327646

That is, the extrapolated function provides a good fit for the keyword value ext=0, whereas for ext=1 the fit is very different to the true value function, since it was fixed to zero outside the interpolation x-interval.

Fits for extrapolations obtained from InterpolatedUnivariateSpline, are accurate for simple functions like polynomial functions in our previous example. For more complicated functions, the extrapolation will not be as accurate.

Exercise 4

Convince yourself that given the interpolating function in our one dimensional gaussian function spl defined above, the extrapolation to greater or smaller values of $x$ , for $x$ outside the data interval $[0,10]$ does not agree with the true solution $e^{-x^2/2}$ . Hint: Compute the error function

In [ ]:

# Write your solution here

2-Dimensional interpolations

We will use RectBivariateSpline. Analog to the 1-Dimensional case, 2-dimensional interpolations can be made using the RectBivariateSpline method in the scipy library. RectBivariateSpline describes a spline s(x, y) of degrees $k_x$ and $k_y$ on the rectangle $[x_b, x_e] \times[y_b, y_e]$ calculated from a given set of data points (x, y, z). Likewise for the 1-Dimensional case, the default values for the polynomial degrees is $k_x=k_y=3$ .

Let us see an specific implementation for the interpolation of a 2-dimensional gaussian function

In [ ]:

import numpy as np
from scipy.interpolate import RectBivariateSpline
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D


# Coarse grid
dx, dy = 0.4, 0.4
xmax, ymax = 2, 4
x = np.arange(-xmax, xmax, dx)
y = np.arange(-ymax, ymax, dy)
X, Y = np.meshgrid(x, y)
Z = np.exp(-(X)**2 - (Y)**2)

# Do the interpolation
interp_spline = RectBivariateSpline(y, x, Z)

# Fine grid
dxp, dyp = 0.16, 0.16
xp = np.arange(-xmax, xmax, dxp)
yp = np.arange(-ymax, ymax, dyp)
Xp, Yp = np.meshgrid(xp,yp)
Zp = interp_spline(yp, xp)

#Plot the results
fig, ax = plt.subplots(nrows=1, ncols=2, subplot_kw={'projection': '3d'})
ax[0].plot_wireframe(X, Y, Z,label = 'data')

ax[1].plot_wireframe(Xp, Yp, Zp, color='g', label = 'interpolation')
for axes in ax:
    axes.set_zlim(-0.2,1)

fig.tight_layout()
fig.legend()
plt.show()

Exercise 5

Compute the integral of interp_spline defined in the 2-Dimensional example, in the rectangle $[1,2]\times[0,2]$ using the integral() method available for RectBivariateSpline. Hint: See the documentation

Exercise 6

In this exercise we will resume Exercise 3 in Tutorials 5 and 6. We will now have into account the effects of baryons in the galaxies rotation curves. Recall the rotation curves information is given here.

Hopefully after finishing Exercise 3 in Tutorial 6, your rotation_curve class looks similar to the following:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

class rotation_curve:
    def __init__(self, Galaxy_ID):
        self.Galaxy_ID = Galaxy_ID
        
        folder_path = 'Rotation curves/'
        self.Galaxy_circular = folder_path+ 'RotationCurve_'+ self.Galaxy_ID + '.csv'
        # create DataFrame
        self.df_circular = pd.read_csv(self.Galaxy_circular)
     
    def update_df_circular(self,key,data):
        self.df_circular[key] = data
   
    def add_M_DM(self):
        v_c = self.df_circular['circ velocity (km/s)']**2
        r_c = self.df_circular["radius (kpc)"]
        # Newton's constant
        GN = 4.302e-6 # km^2/s^2*kpc/Msol
        #Add DM mass column
        M_DM =  r_c * v_c/GN
        self.update_df_circular('DM Mass (M_sol)',M_DM)

In this exercise will modify this class to include baryon effects. Let us as usual split the exercise in several steps:

Remove the add_M_DM class function since this does not have baryon effects into account (or rather comment it out, it could be useful in step 4 below).
In the __init__ method, define the additional instance variable df_baryon, which creates the baryon DataFrame for the given Galaxy_ID. Create a test instance of rotation_curve, for a given galaxy ID. Check that when running the following lines

test = rotation_curve('IC2574')
test.df_circular.head()
test.df_baryon.head()

you get as output

Define the class method add_baryon. This will add two columns, stars circ velocity square (km/s) and gas circ velocity square (km/s), to to df_circular, from the square of the respective columns in df_baryon. There is however several subtleties we have to take into account when doing this:

Some values of gas circ velocity (km/s) and stars circ velocity square (km/s) in df_baryon are negative. As indicated here, the negative sign means that the square of the corresponding velocity is to be chosen negative (the negative sign is just the direction of the gravitational force at a given position). Then, in reality the columns to be added have to have the form: $v_{gas}^2\times \text{sign}(v_{gas})\\ v_{stars}^2\times \text{sign}(v_{stars})\\$
The radial data points used in the two DataFrames are different and therefore we want to have the values of stars circ velocity square (km/s) and gas circ velocity square (km/s) evaluated at the same radial positions as circ velocity in df_circular. For that we need to create interpolating functions for $v_{gas}^2\times \text{sign}(v_{gas})$ and $v_{stars}^2\times \text{sign}(v_{stars})$ as function of radial positions in df_baryon. Use InterpolatedUnivariateSpline to create such interpolating functions (hint: The np.sign() function could be useful). After we we have our two interpolating functions, we can evaluate them at the radial points given in df_circular, and further add the two new columns to df_circular containing the square of the stars and gas circular velocities evaluated these positions. Check that when running the following lines

test = rotation_curve('IC2574')
test.add_baryon()
test.df_circular.head()

you get as output

Define the class method add_DM_M, which will add the column DM mass DM Mass (M_sol) to the DataFrame df_circular. Recall the square of the DM velocity is given by $v^2_{DM}(r) = \sqrt{v_c^2(r)-\Upsilon^*v_{star}^2(r)-v_{gas}^2(r)} \, ,$ whereas the DM mass distribution is $M_{DM}(r) = r v^2_{DM}(r)/G_N \, .$

Use $\Upsilon^* = 0.6$ . Check that when running the lines

test = rotation_curve('UGC4325')
test.add_baryon()
test.add_DM_M()
test.df_circular.head()

you get as output

Use your plotting function defined by the end of exercise 3 Tutorial 6 to plot the DM mass distribution as a function of radial distance for all galaxies data in the Rotation Curve directory. This plot now has now removed the baryon contribution to the DM mass distribution curve.

In [ ]:

Lesson 7 - Interpolation

Interpolations

1-Dimensional Interpolations

`interp1d`

`interp1d` kinds

Exercise 1

`InterpolatedUnivariateSpline`

Exercise 2

Why using InterpolatedUnivariateSpline?

Additional methods available

Exercise 3

Extrapolation

Exercise 4

2-Dimensional interpolations

Exercise 5

Exercise 6

Product

Resources

Company

Lesson 7 - Interpolation

Interpolations

1-Dimensional Interpolations

interp1d

interp1d kinds

Exercise 1

InterpolatedUnivariateSpline

Exercise 2

Why using InterpolatedUnivariateSpline?

Additional methods available

Exercise 3

Extrapolation

Exercise 4

2-Dimensional interpolations

Exercise 5

Exercise 6

`interp1d`

`interp1d` kinds

`InterpolatedUnivariateSpline`