Path: blob/main/Lesson 7 - Interpolation.ipynb
968 views
Lesson 7 - Interpolation
Authors:
Yilber Fabian Bautista
Sean Tulin
By the end of this lecture you will be able to:
Perform 1-Dimensional interpolating functions using interp1d and InterpolatedUnivariateSpline inside the scipy library
Use the different interpolation kinds available for the interp1d method
Change the polynomial degree of an InterpolatedUnivariateSpline function.
Use the numeric differentiation and integration methods available from InterpolatedUnivariateSpline
Perform 2-Dimensional interpolations using RectBivariateSpline function from the scipy library
Practice your classes and pandas skills
Interpolations
In Lectures 3 we have briefly introduced interpolations as an optional section to access information in a given set of data where no data is available. In this tutorial we will expand on this. We will cover 1-Dimensional and 2-Dimensional interpolations.
1-Dimensional Interpolations
Given a set of data points , with , an interpolation function is a curve that passes through all of the data points available in the data set. Once such a curve is found, it can be evaluated at different locations , with .
In this tutorial we will cover two functions in the scipy library to do 1-Dimensional interpolations; they are the interp1d and InterpolatedUnivariateSpline functions.
interp1d
See the documentation here. The basics syntax for using interp1d is the following: Given a set of x and y points, and interpolating function is created by simply typing
Let us see an explicit example of an interpolation for a 1-Dimensional gaussian function
As mentioned above, the interpolation function passes through all of the available data points originally given. However, interpolating functions created from interp1d only provide information inside the interval the original x-data set is defined in. If we wanted to evaluate the resulting function f, created from interp1d, outside the x-data interval (in our example, outside the interval ), we will get an error message. For instance if running
the output will be
which indicates we are trying to evaluate the interpolation outside the scope it was create from. Inferring information about the function outside the x-interval is known as an extrapolation. We will comment on this bellow.
interp1d kinds
interp1d has different ways (kind) of doing the interpolation. The default value is kind = linear, where polynomials of degree 1 are used to approximate the curve inside every two-points interval where no data is available. That is, inside the intervals .
Linear polynomials
Linear polynomials used in interp1d with type = linear, to approximate the function inside every interval , are constructed from the equation for a straight line (see also here):
where and are the two-true values for the function, evaluated at the extremal points of the interval. Note that there will be a different equation for each interval in the data set, depending on the values .
Quadratic polynomials
The quadratic polynomials used in interp1d for type = quadratic are constructed from three-points intervals , where the functions has the true values and . These polynomials are given by the quadratic equation
and similar to the linear interpolation, the polynomial will be different for each three-points interval of the given data points.
Run the command
to see the different kind available to use.
Exercise 1
Run the following lines to do a comparison of the different interpolations kind available in the interp1d method.
What can you conclude from the different ways of doing the interpolations?
InterpolatedUnivariateSpline
Let us now see a different function inside the scipy library to create interpolation. A spline interpolation is a way of interpolating a set of data, where the interpolant is a piecewise polynomial called a spline.

Figure taken from here. The vertexes of the spline are called knots. Notice that the set of all the polynomials used in interp1d effectively form a spline.
InterpolatedUnivariateSpline fits a spline y = spl(x) of degree with , to the provided x, y data. Here is the polynomial degree to approximate the curve.
The syntax for interpolating a set of points using InterpolatedUnivariateSpline is very similar to the one for interp1d
The default value for the polynomial degree used in InterpolatedUnivariateSpline is , i.e. cubic polynomials are used in the approximate the curve in regions where no data points are available. See the InterpolatedUnivariateSpline documentation
Let us provide an specific example for our favorite 1-Dimensional gaussian function
As we can see, the curve is smoother as compared to the interp1d for kind= linear.
Exercise 2
Compare the result from the InterpolatedUnivariateSpline from the previous example with the corresponding interp1d result for interpolation kind, kind='cubic', by computing the error where is the true value of the function at the data points (recall the true function is ). Hint: Use linalg.norm function from numpy library to compute the norm of the vector .
What do you conclude from the comparison?
Why using InterpolatedUnivariateSpline?
Additional methods available
Unlike for interp1d, interpolating functions created from InterpolatedUnivariateSpline have additional methods that can be applied to the resulting functions. Among these methods we have numeric differentiation and integration.
Let us for instance compute the first antiderivative of our Spline for the previous example
Exercise 3
Do the analog plot for the first and second derivatives of the spline of our example. Hint: The derivative method has the analog syntax
.derivative(n).Compute the integral of the spline in the interval
[1,4]. Hint: See.integral()method in the documentationExplore the additional methods applied to the interpolating function resulting from using
InterpolatedUnivariateSpline, mentioned in the documentation
Extrapolation
InterpolatedUnivariateSpline offers the possibility of extrapolate the resulting function to points outside the interval for which there is data. This is done through the keyword argument ext.
ext=0 is the default value in InterpolatedUnivariateSpline. It fits the function for values outside the data interval. If ext=1 or ‘zeros’, the returned value for the function outside the data interval will be `0.
Let us see an specific example for a polynomial function
We can check that outside the original x-interval, spl3 is set to zero, whereas spl2 evaluates to non zero values.
We can also compare to the true value solution by computing the error function.
That is, the extrapolated function provides a good fit for the keyword value ext=0, whereas for ext=1 the fit is very different to the true value function, since it was fixed to zero outside the interpolation x-interval.
Fits for extrapolations obtained from InterpolatedUnivariateSpline, are accurate for simple functions like polynomial functions in our previous example. For more complicated functions, the extrapolation will not be as accurate.
Exercise 4
Convince yourself that given the interpolating function in our one dimensional gaussian function spl defined above, the extrapolation to greater or smaller values of , for outside the data interval does not agree with the true solution . Hint: Compute the error function
2-Dimensional interpolations
We will use RectBivariateSpline. Analog to the 1-Dimensional case, 2-dimensional interpolations can be made using the RectBivariateSpline method in the scipy library. RectBivariateSpline describes a spline s(x, y) of degrees and on the rectangle calculated from a given set of data points (x, y, z). Likewise for the 1-Dimensional case, the default values for the polynomial degrees is .
Let us see an specific implementation for the interpolation of a 2-dimensional gaussian function
Exercise 5
Compute the integral of interp_spline defined in the 2-Dimensional example, in the rectangle using the integral() method available for RectBivariateSpline. Hint: See the documentation
Exercise 6
In this exercise we will resume Exercise 3 in Tutorials 5 and 6. We will now have into account the effects of baryons in the galaxies rotation curves. Recall the rotation curves information is given here.
Hopefully after finishing Exercise 3 in Tutorial 6, your rotation_curve class looks similar to the following:
In this exercise will modify this class to include baryon effects. Let us as usual split the exercise in several steps:
Remove the
add_M_DMclass function since this does not have baryon effects into account (or rather comment it out, it could be useful in step 4 below).In the
__init__method, define the additional instance variabledf_baryon, which creates the baryon DataFrame for the givenGalaxy_ID. Create a test instance ofrotation_curve, for a given galaxy ID. Check that when running the following lines
you get as output


Define the class method
add_baryon. This will add two columns,stars circ velocity square (km/s)andgas circ velocity square (km/s), to todf_circular, from the square of the respective columns indf_baryon. There is however several subtleties we have to take into account when doing this:
Some values of
gas circ velocity (km/s)andstars circ velocity square (km/s)indf_baryonare negative. As indicated here, the negative sign means that the square of the corresponding velocity is to be chosen negative (the negative sign is just the direction of the gravitational force at a given position). Then, in reality the columns to be added have to have the form:The radial data points used in the two DataFrames are different and therefore we want to have the values of
stars circ velocity square (km/s)andgas circ velocity square (km/s)evaluated at the same radial positions ascirc velocityindf_circular. For that we need to create interpolating functions for and as function of radial positions indf_baryon. UseInterpolatedUnivariateSplineto create such interpolating functions (hint: Thenp.sign()function could be useful). After we we have our two interpolating functions, we can evaluate them at the radial points given indf_circular, and further add the two new columns todf_circularcontaining the square of the stars and gas circular velocities evaluated these positions. Check that when running the following lines
you get as output

Define the class method
add_DM_M, which will add the column DM massDM Mass (M_sol)to the DataFramedf_circular. Recall the square of the DM velocity is given by whereas the DM mass distribution is
Use . Check that when running the lines
you get as output

Use your plotting function defined by the end of exercise 3 Tutorial 6 to plot the DM mass distribution as a function of radial distance for all galaxies data in the
Rotation Curvedirectory. This plot now has now removed the baryon contribution to the DM mass distribution curve.