Book a Demo!
CoCalc Logo Icon
StoreFeaturesDocsShareSupportNewsAboutPoliciesSign UpSign In
explore-for-students
GitHub Repository: explore-for-students/python-tutorials
Path: blob/main/Lesson 7 - Interpolation.ipynb
968 views
Kernel: Python 3 (ipykernel)

Lesson 7 - Interpolation

Authors:

  • Yilber Fabian Bautista

  • Sean Tulin

By the end of this lecture you will be able to:

  • Perform 1-Dimensional interpolating functions using interp1d and InterpolatedUnivariateSpline inside the scipy library

  • Use the different interpolation kinds available for the interp1d method

  • Change the polynomial degree of an InterpolatedUnivariateSpline function.

  • Use the numeric differentiation and integration methods available from InterpolatedUnivariateSpline

  • Perform 2-Dimensional interpolations using RectBivariateSpline function from the scipy library

  • Practice your classes and pandas skills

Interpolations

In Lectures 3 we have briefly introduced interpolations as an optional section to access information in a given set of data where no data is available. In this tutorial we will expand on this. We will cover 1-Dimensional and 2-Dimensional interpolations.

1-Dimensional Interpolations

Given a set of data points (xi,fi)(x_i,f_i), with xminxixmaxx_{min} \le x_i\le x_{max}, an interpolation function f(x) f(x) is a curve that passes through all of the data points available in the data set. Once such a curve is found, it can be evaluated at different locations f(x)f(x'), with xminxxmaxx_{min} \le x'\le x_{max}.

In this tutorial we will cover two functions in the scipy library to do 1-Dimensional interpolations; they are the interp1d and InterpolatedUnivariateSpline functions.

interp1d

See the documentation here. The basics syntax for using interp1d is the following: Given a set of x and y points, and interpolating function is created by simply typing

interp = interp1d(x, y)

Let us see an explicit example of an interpolation for a 1-Dimensional gaussian function

import numpy as np # For maths from scipy.interpolate import interp1d # For 1-D interpolations import matplotlib.pyplot as plt # For ploting # Data points x = np.linspace(0,10,15) y = np.exp(-x**2/2) # Perform the interpolation f = interp1d(x,y) # Finner x data set xp = np.linspace(0,10,30) # Evaluate the interpolation fp = f(xp) # Plot plt.plot(x,y,'o',label = "data") plt.plot(xp,fp,'r--',label = "interp") plt.legend(loc = "upper right") plt.show()
Image in a Jupyter notebook

As mentioned above, the interpolation function passes through all of the available data points originally given. However, interpolating functions created from interp1d only provide information inside the interval the original x-data set is defined in. If we wanted to evaluate the resulting function f, created from interp1d, outside the x-data interval (in our example, outside the interval [0,10][0,10]), we will get an error message. For instance if running

f(11)

the output will be

ValueError: A value in x_new is above the interpolation range.

which indicates we are trying to evaluate the interpolation outside the scope it was create from. Inferring information about the function outside the x-interval is known as an extrapolation. We will comment on this bellow.

# try it yourself

interp1d kinds

interp1d has different ways (kind) of doing the interpolation. The default value is kind = linear, where polynomials of degree 1 are used to approximate the curve inside every two-points interval where no data is available. That is, inside the intervals [x0,x1],[x1,x2],,[xn1,xn][x_0,x_1], [x_1,x_2],\cdots , [x_{n-1},x_n].

Linear polynomials

Linear polynomials used in interp1d with type = linear, to approximate the function inside every interval [x0,xy][x_0,x_y], are constructed from the equation for a straight line (see also here):

p1(x)=f(x0)+f(x1)f(x0)x1x0(xx0).{\displaystyle p_1(x)=f(x_{0})+{\frac {f(x_{1})-f(x_{0})}{x_{1}-x_{0}}}(x-x_{0}).}

where f(x0)f(x_{0}) and f(x1)f(x_{1}) are the two-true values for the function, evaluated at the extremal points of the interval. Note that there will be a different equation for each interval in the data set, depending on the values [x0,xy][x_0,x_y].

Quadratic polynomials

The quadratic polynomials used in interp1d for type = quadratic are constructed from three-points intervals [x0,x1,x2][x_0,x_1,x_2], where the functions has the true values f(x0),f(x1)f(x_0),f(x_1) and f(x2)f(x_2). These polynomials are given by the quadratic equation

p2(x)=f(x0)(xx1)(xx2)(x0x1)(x0x2)+f(x1)(xx0)(xx2)(x1x0)(x1x2)+f(x2)(xx0)(xx1)(x2x0)(x2x1),p_{2}(x)=f(x_{0})\frac{(x-x_{1})(x-x_{2})}{(x_{0}-x_{1})(x_{0}-x_{2})}+f(x_{1})\frac{(x-x_{0})(x-x_{2})}{(x_{1}-x_{0})(x_{1}-x_{2})}+f(x_{2})\frac{(x-x_{0})(x-x_{1})}{(x_{2}-x_{0})(x_{2}-x_{1})},

and similar to the linear interpolation, the polynomial will be different for each three-points interval of the given data points.

Run the command

interp1d?

to see the different kind available to use.

Exercise 1

Run the following lines to do a comparison of the different interpolations kind available in the interp1d method.

from scipy.interpolate import interp1d # For 1-D interpolations import matplotlib.pyplot as plt # For ploting #%matplotlib notebook #uncomment this line for plot manipulation in jupyter notebook #Data points x = np.linspace(0,10,11) y = np.sin(x) #finer data to evaluate interpolation xp = np.linspace(0,10,100) #plot initial data points plt.plot(x,y,"o",label = "data") # Do the interpolation for all the different kinds kind = ['linear', 'nearest', 'zero', 'slinear', 'quadratic', 'cubic'] for k in kind: # create the interpolation f = interp1d(x,y,kind = k) #evaluate the interpolation in the finer data points yp = f(xp) #plot interpolation plt.plot(xp,yp,label = k) plt.legend(loc = 'lower right') #plt.show()

What can you conclude from the different ways of doing the interpolations?

# Try it yourself

InterpolatedUnivariateSpline

Let us now see a different function inside the scipy library to create interpolation. A spline interpolation is a way of interpolating a set of data, where the interpolant is a piecewise polynomial called a spline.

Figure taken from here. The vertexes of the spline are called knots. Notice that the set of all the polynomials used in interp1d effectively form a spline.

InterpolatedUnivariateSpline fits a spline y = spl(x) of degree kk with k5k\le5, to the provided x, y data. Here kk is the polynomial degree to approximate the curve.

The syntax for interpolating a set of points using InterpolatedUnivariateSpline is very similar to the one for interp1d

spl = InterpolatedUnivariateSpline(x, y, k = 3)

The default value for the polynomial degree used in InterpolatedUnivariateSpline is k=3k=3, i.e. cubic polynomials are used in the approximate the curve in regions where no data points are available. See the InterpolatedUnivariateSpline documentation

Let us provide an specific example for our favorite 1-Dimensional gaussian function

from scipy.interpolate import InterpolatedUnivariateSpline # Data points x = np.linspace(0,10,15) y = np.exp(-x**2/2) # Interpolation spl = InterpolatedUnivariateSpline(x,y) # Finer data points xp = np.linspace(0,10,30) splp = spl(xp) # Plotting plt.plot(x,y,'o',label = "data") plt.plot(xp,splp,'r--',label = "interp") plt.legend(loc = "upper right") plt.show()
Image in a Jupyter notebook

As we can see, the curve is smoother as compared to the interp1d for kind= linear.

Exercise 2

Compare the result from the InterpolatedUnivariateSpline from the previous example with the corresponding interp1d result for interpolation kind, kind='cubic', by computing the error Error=f(xp)ftrue(xp) Error = ||f(x_p)- f_{\text{true}}(x_p)|| where ftrue(xp)f_{\text{true}}(x_p) is the true value of the function at the data points x=xpx=x_p (recall the true function is ex2/2e^{-x^2/2}). Hint: Use linalg.norm function from numpy library to compute the norm of the vector f(xp)ftrue(xp)f(x_p)- f_{\text{true}}(x_p).

What do you conclude from the comparison?

# write your solution here

Why using InterpolatedUnivariateSpline?

Additional methods available

Unlike for interp1d, interpolating functions created from InterpolatedUnivariateSpline have additional methods that can be applied to the resulting functions. Among these methods we have numeric differentiation and integration.

Let us for instance compute the first antiderivative of our Spline for the previous example

#compute the antiderivative integral_spl = spl.antiderivative(1) #evaluate the function in the fine grid defined above int_splp = integral_spl(xp) #Plot the interpolation and its first antiderivative plt.plot(x,y,'o',label = "data") plt.plot(xp,splp,'r--',label = "interp") plt.plot(xp,int_splp,label = "antiderivative") plt.legend(loc = "upper right") plt.show()
Image in a Jupyter notebook

Exercise 3

  • Do the analog plot for the first and second derivatives of the spline of our example. Hint: The derivative method has the analog syntax .derivative(n).

  • Compute the integral of the spline in the interval [1,4]. Hint: See .integral() method in the documentation

  • Explore the additional methods applied to the interpolating function resulting from using InterpolatedUnivariateSpline, mentioned in the documentation

# Write your solution here

Extrapolation

InterpolatedUnivariateSpline offers the possibility of extrapolate the resulting function to points outside the interval for which there is data. This is done through the keyword argument ext.

ext=0 is the default value in InterpolatedUnivariateSpline. It fits the function for values outside the data interval. If ext=1 or ‘zeros’, the returned value for the function outside the data interval will be `0.

Let us see an specific example for a polynomial function

x2 = np.linspace(0,10,15) y2 = -x2 **2/2+3*x2 -5 spl2 = InterpolatedUnivariateSpline(x2,y2) spl3 = InterpolatedUnivariateSpline(x2,y2,ext=1) #finner grid outside the data interval x3 = np.linspace(-5,40,50) #true values finner grid y3 = -x3 **2/2+3*x3 -5

We can check that outside the original x-interval, spl3 is set to zero, whereas spl2 evaluates to non zero values.

#try it yourself

We can also compare to the true value solution by computing the error function.

# Compararison to the true values print(np.linalg.norm(y3 - spl2(x3))) print(np.linalg.norm(y3 - spl3(x3)))
1.1909122045998598e-09 2006.1017555327646

That is, the extrapolated function provides a good fit for the keyword value ext=0, whereas for ext=1 the fit is very different to the true value function, since it was fixed to zero outside the interpolation x-interval.

Fits for extrapolations obtained from InterpolatedUnivariateSpline, are accurate for simple functions like polynomial functions in our previous example. For more complicated functions, the extrapolation will not be as accurate.

Exercise 4

Convince yourself that given the interpolating function in our one dimensional gaussian function spl defined above, the extrapolation to greater or smaller values of xx, for xx outside the data interval [0,10][0,10] does not agree with the true solution ex2/2e^{-x^2/2}. Hint: Compute the error function

# Write your solution here

2-Dimensional interpolations

We will use RectBivariateSpline. Analog to the 1-Dimensional case, 2-dimensional interpolations can be made using the RectBivariateSpline method in the scipy library. RectBivariateSpline describes a spline s(x, y) of degrees kxk_x and kyk_y on the rectangle [xb,xe]×[yb,ye][x_b, x_e] \times[y_b, y_e] calculated from a given set of data points (x, y, z). Likewise for the 1-Dimensional case, the default values for the polynomial degrees is kx=ky=3k_x=k_y=3.

Let us see an specific implementation for the interpolation of a 2-dimensional gaussian function

import numpy as np from scipy.interpolate import RectBivariateSpline import matplotlib.pyplot as plt from mpl_toolkits.mplot3d import Axes3D # Coarse grid dx, dy = 0.4, 0.4 xmax, ymax = 2, 4 x = np.arange(-xmax, xmax, dx) y = np.arange(-ymax, ymax, dy) X, Y = np.meshgrid(x, y) Z = np.exp(-(X)**2 - (Y)**2) # Do the interpolation interp_spline = RectBivariateSpline(y, x, Z) # Fine grid dxp, dyp = 0.16, 0.16 xp = np.arange(-xmax, xmax, dxp) yp = np.arange(-ymax, ymax, dyp) Xp, Yp = np.meshgrid(xp,yp) Zp = interp_spline(yp, xp) #Plot the results fig, ax = plt.subplots(nrows=1, ncols=2, subplot_kw={'projection': '3d'}) ax[0].plot_wireframe(X, Y, Z,label = 'data') ax[1].plot_wireframe(Xp, Yp, Zp, color='g', label = 'interpolation') for axes in ax: axes.set_zlim(-0.2,1) fig.tight_layout() fig.legend() plt.show()

Exercise 5

Compute the integral of interp_spline defined in the 2-Dimensional example, in the rectangle [1,2]×[0,2][1,2]\times[0,2] using the integral() method available for RectBivariateSpline. Hint: See the documentation

Exercise 6

In this exercise we will resume Exercise 3 in Tutorials 5 and 6. We will now have into account the effects of baryons in the galaxies rotation curves. Recall the rotation curves information is given here.

Hopefully after finishing Exercise 3 in Tutorial 6, your rotation_curve class looks similar to the following:

import pandas as pd import numpy as np import matplotlib.pyplot as plt class rotation_curve: def __init__(self, Galaxy_ID): self.Galaxy_ID = Galaxy_ID folder_path = 'Rotation curves/' self.Galaxy_circular = folder_path+ 'RotationCurve_'+ self.Galaxy_ID + '.csv' # create DataFrame self.df_circular = pd.read_csv(self.Galaxy_circular) def update_df_circular(self,key,data): self.df_circular[key] = data def add_M_DM(self): v_c = self.df_circular['circ velocity (km/s)']**2 r_c = self.df_circular["radius (kpc)"] # Newton's constant GN = 4.302e-6 # km^2/s^2*kpc/Msol #Add DM mass column M_DM = r_c * v_c/GN self.update_df_circular('DM Mass (M_sol)',M_DM)

In this exercise will modify this class to include baryon effects. Let us as usual split the exercise in several steps:

  1. Remove the add_M_DM class function since this does not have baryon effects into account (or rather comment it out, it could be useful in step 4 below).

  2. In the __init__ method, define the additional instance variable df_baryon, which creates the baryon DataFrame for the given Galaxy_ID. Create a test instance of rotation_curve, for a given galaxy ID. Check that when running the following lines

test = rotation_curve('IC2574') test.df_circular.head() test.df_baryon.head()

you get as output

  1. Define the class method add_baryon. This will add two columns, stars circ velocity square (km/s) and gas circ velocity square (km/s), to to df_circular, from the square of the respective columns in df_baryon. There is however several subtleties we have to take into account when doing this:

  • Some values of gas circ velocity (km/s) and stars circ velocity square (km/s) in df_baryon are negative. As indicated here, the negative sign means that the square of the corresponding velocity is to be chosen negative (the negative sign is just the direction of the gravitational force at a given position). Then, in reality the columns to be added have to have the form: vgas2×sign(vgas)vstars2×sign(vstars) v_{gas}^2\times \text{sign}(v_{gas})\\ v_{stars}^2\times \text{sign}(v_{stars})\\

  • The radial data points used in the two DataFrames are different and therefore we want to have the values of stars circ velocity square (km/s) and gas circ velocity square (km/s) evaluated at the same radial positions as circ velocity in df_circular. For that we need to create interpolating functions for vgas2×sign(vgas)v_{gas}^2\times \text{sign}(v_{gas}) and vstars2×sign(vstars)v_{stars}^2\times \text{sign}(v_{stars}) as function of radial positions in df_baryon. Use InterpolatedUnivariateSpline to create such interpolating functions (hint: The np.sign() function could be useful). After we we have our two interpolating functions, we can evaluate them at the radial points given in df_circular, and further add the two new columns to df_circular containing the square of the stars and gas circular velocities evaluated these positions. Check that when running the following lines

test = rotation_curve('IC2574') test.add_baryon() test.df_circular.head()

you get as output

  1. Define the class method add_DM_M, which will add the column DM mass DM Mass (M_sol) to the DataFrame df_circular. Recall the square of the DM velocity is given by vDM2(r)=vc2(r)Υvstar2(r)vgas2(r),v^2_{DM}(r) = \sqrt{v_c^2(r)-\Upsilon^*v_{star}^2(r)-v_{gas}^2(r)} \, , whereas the DM mass distribution is MDM(r)=rvDM2(r)/GN.M_{DM}(r) = r v^2_{DM}(r)/G_N \, .

Use Υ=0.6\Upsilon^* = 0.6. Check that when running the lines

test = rotation_curve('UGC4325') test.add_baryon() test.add_DM_M() test.df_circular.head()

you get as output

  1. Use your plotting function defined by the end of exercise 3 Tutorial 6 to plot the DM mass distribution as a function of radial distance for all galaxies data in the Rotation Curve directory. This plot now has now removed the baryon contribution to the DM mass distribution curve.