GitHub Repository: explore-for-students/python-tutorials
Path: blob/main/Lesson 3 - Numerical calculus.ipynb
Kernel: Python 3 (ipykernel)

Lesson 3 - Numerical calculus

Authors:

  • Yilber Fabian Bautista

  • Keiwan Jamaly

  • Julia Lienert

  • Sean Tulin

In this lecture, we will cover methods for calculating derivatives and integrals numerically. While we will cover some of the theory behind these methods, the main practical takeaway is to gain familiarity with existing functions within the numpy and scipy libraries that implement these tasks. In Python, most basic tasks have been coded up already in a very efficient way, so knowing your way around Python's libraries saves a lot of work. We have already introduced numpy. scipy is another important and vast library with many algorithms for scientific computing.

By the end of this lecture you will be able to:

  • Do numerical differentiation and understand the principles behind it.

  • Perform basic numerical integration using Python's integration libraries.

  • Use several integration techniques, including the rectangle, trapezoidal, and Simpson's rules.

  • Do numerical integration in one dimension using quadratures in scipy (quad).

  • Perform higher-dimensional integrals using scipy's dblquad and tplquad.

Numerical Differentiation

Differentiation is a fundamental tool in science. Naively, one would expect the usual definition of the derivative to be used for numerics, that is

\begin{equation*} \frac{f(x+h) - f(x)}{h} \end{equation*}

Expanding $f(x+h)$ around $h=0$, we verify that it indeed approximates the derivative of $f(x)$:

\begin{equation*} \frac{f(x+h) - f(x)}{h} = \frac{f(x) + hf'(x) + \mathcal{O}(h^2) - f(x)}{h} = f'(x) + \mathcal{O}(h) \end{equation*}

As we can see, the numerical error propagates as $\mathcal{O}(h)$. This is a good approximation to start with; however, we can do better by simply changing our definition of the difference quotient and again expanding around $h=0$:

\begin{equation*} \frac{f(x+h) - f(x-h)}{2h} = \frac{f(x) + hf'(x) + \frac{h^2}{2} f''(x) - \left(f(x) - hf'(x) + \frac{h^2}{2} f''(x)\right) + \mathcal{O}(h^3)}{2h} = f'(x) + \mathcal{O}(h^2) \end{equation*}

This is called the central difference formula and is the primary choice for numerical differentiation; the numerical error now propagates as $\mathcal{O}(h^2)$.
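We can check these error scalings empirically. The following sketch (our own illustration, using $f(x)=\sin x$ at $x=1$, where the exact derivative is $\cos(1)$) compares the forward and central differences:

```python
import numpy as np

# Compare forward- and central-difference errors for f(x) = sin(x) at x = 1.
f = np.sin
x0 = 1.0
exact = np.cos(x0)

for h in [1e-1, 1e-2, 1e-3]:
    forward = (f(x0 + h) - f(x0)) / h            # error ~ O(h)
    central = (f(x0 + h) - f(x0 - h)) / (2 * h)  # error ~ O(h^2)
    print(f"h={h:.0e}  forward error={abs(forward - exact):.2e}  "
          f"central error={abs(central - exact):.2e}")
```

Shrinking $h$ by a factor of 10 should shrink the forward-difference error by roughly 10, but the central-difference error by roughly 100.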

Exercise 1

Using the techniques from above, prove that

\begin{equation*} \frac{f(x+h) - 2f(x) + f(x-h)}{h^2} = f''(x) + \mathcal{O}(h^2) \end{equation*}
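A quick numerical sanity check (not a substitute for the proof!) is to evaluate the left-hand side for a function with a known second derivative, e.g. $f(x) = \sin x$ with $f''(x) = -\sin x$, and watch the error shrink like $h^2$:

```python
import numpy as np

# Check the second-derivative formula on f(x) = sin(x), where f''(x) = -sin(x).
def f(x):
    return np.sin(x)

x0 = 1.0
exact = -np.sin(x0)

for h in [1e-1, 1e-2, 1e-3]:
    approx = (f(x0 + h) - 2 * f(x0) + f(x0 - h)) / h**2
    # the error should drop by roughly a factor of 100 per step, i.e. O(h^2)
    print(f"h={h:.0e}  error={abs(approx - exact):.2e}")
```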

Differentiation with scipy and numpy

The above numerical differentiation techniques are already implemented in the scipy and numpy libraries. The two relevant functions have different applications.

  • scipy.misc.derivative for calculating derivatives of known functions. That is, you know what $f(x)$ is and can evaluate it at any point $x$, and you want to calculate its derivative at a specific point. (Note that scipy.misc.derivative has been deprecated in recent SciPy versions; it still serves to illustrate the method here.)

  • numpy.gradient for calculating derivatives of unknown functions, where you know $f(x)$ only on a grid of specific points $x$.

Both functions use the central difference formula.

Example with scipy.misc.derivative

SciPy has the function scipy.misc.derivative, which allows us to directly compute the derivative of an analytic function. The syntax for using the scipy.misc.derivative function is the following:

scipy.misc.derivative(func, x0, dx=1.0, n=1, args=(), order=3)

where func is the given function, x0 is the point at which the derivative is calculated, and dx is a float giving the spacing; in our previous formulas, dx = h.

Let us see a specific example:

# include the function 'derivative' from SciPy
from scipy.misc import derivative

# define the function of which the derivative should be taken
def test(x):
    return x**3 + x**2

# calculate the derivative, store the result of the calculation in the variable 'result' and print it
result = derivative(test, 1.0, dx = 1e-6)
print(result)
4.999999999921734

From analytic differentiation, we would have expected $test'(x) = 3x^2 + 2x$, and evaluating at $x=1$ gives $test'(1) = 5$. The difference from our numerical differentiation is then

result-5
-7.826628234397504e-11

We can see this is a number approximately of order $\mathcal{O}(dx^2) \approx 10^{-12}$, as expected from the central difference formula.

Example with numpy.gradient

Now suppose we do not have a known function $f(x)$, but only data points on a grid of points in $x$. The derivative at each point is calculated using the central difference formula. Since the leftmost and rightmost points do not have two neighbouring points each, their derivatives are calculated using the forward (for the leftmost point) and backward (for the rightmost point) differences.

numpy includes the function np.gradient that does the job for us, provided an array with data points is given. The syntax is

np.gradient(f,dx)

where f is the array with the data points and dx is the spacing between the data points.

Notice that the numpy functions for taking derivatives only work for data points given as arrays but not for specific functions. For the latter case, one has to use scipy.misc.derivative.

Also note that f need not be a one-dimensional array or list. For an N-dimensional array, np.gradient returns a list of N arrays of the same shape as f, one for each axis, containing the partial derivative along that axis; the axis keyword selects a single direction.
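For instance (a minimal sketch with a function $g(x, y) = x^2 + 3y$ of our own choosing), np.gradient on a 2D array returns one array per axis:

```python
import numpy as np

# Sample g(x, y) = x**2 + 3*y on an 11x11 grid with spacing 0.1.
x = np.linspace(0.0, 1.0, 11)
y = np.linspace(0.0, 1.0, 11)
X, Y = np.meshgrid(x, y, indexing='ij')  # X varies along axis 0
G = X**2 + 3 * Y

# np.gradient returns one array per axis: the partial derivatives dG/dx and dG/dy.
dG_dx, dG_dy = np.gradient(G, 0.1, 0.1)
print(dG_dx.shape, dG_dy.shape)  # both (11, 11), same shape as G
print(dG_dy[5, 5])               # approximately 3.0, since dg/dy = 3 everywhere
```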

Let us see a specific example of how to use the np.gradient function in practice.

import numpy as np
import matplotlib.pyplot as plt
# note: numpy and matplotlib are already included

# create list of evenly spaced numbers
x = np.linspace(0.0, 2.0*np.pi, 100)

# save the spacing in the variable 'h'; since the numbers in 'x' are evenly spaced
# it doesn't matter which difference we use
h = x[10] - x[9]

# create array with the sine of those numbers
f = np.sin(x)

# calculate the derivatives of the points in 'f' using the spacing 'h' and store the results in 'deriv_f'
deriv_f = np.gradient(f, h)

# plot the array 'f' and its derivative 'deriv_f'
plt.plot(x/np.pi, f, marker = 'o', color = 'red', label = 'f (sin(x))')
plt.plot(x/np.pi, deriv_f, marker = 'o', color = 'blue', label = 'deriv_f (cos(x))')
plt.xlabel(r'x [$\pi$]')
plt.ylabel('f / deriv_f')
plt.legend(loc = 'lower left', fontsize = 8)
plt.show()

# Note: As expected, 'f' shows the sine function and its derivative 'deriv_f' is the cosine!
Image in a Jupyter notebook

Exercise 2

Compute analytically the derivative of $\sin{x}$ and evaluate it on the given x-array. Convince yourself that the difference between this result and the numerical derivative deriv_f is approximately of order $\mathcal{O}(h^2)$. In addition, evaluate the derivative using scipy.misc.derivative, and compute the difference between this result and the previous two. Plot the analytic and the two numerical derivatives on a scale that makes their differences visible.

Exercise 3 (optional)

To get more practice at the computation of numerical derivatives with Python:

  1. Calculate the derivative of the function $f(x) = x^4 - 4x^2 + 3x - 1$ at the point $x = 1.5$ both analytically (by hand + using your calculator) and numerically (using $h = dx = 0.1$ and $h = dx = 0.01$). Compare the results. What do you expect to change when using different values for $dx$?

  2. Calculate the derivative of the function $g(x) = 1/x$ at the point $x = 0.1$ both analytically (by hand + using your calculator) and numerically (using $h = dx = 0.1$). What happens?

# solution to task 1
# solution to task 2

Numerical Integration

Given a 1-dimensional integral of the form $\int_{a}^{b} f(x)\, dx$, exploiting the linearity of the integral, it can be rewritten as

\begin{equation*} \int_{a}^{b} f(x) dx = \sum_{i=0}^{N-1} \int_{x_i}^{x_{i+1}} f(x) dx \end{equation*}

where $x_i = a + i h$ and $h = (b-a)/N$, i.e. the $x_i$ points are equally spaced.

If $x_i$ and $x_{i+1}$ are close to each other, the integral $\int_{x_i}^{x_{i+1}} f(x)\, dx$ can be approximated in different ways.

Rectangle rule

The rectangle rule is the simplest approximation of the integral. Each integral in the previous sum is approximated by the area of a rectangle of base $(x_{i+1} - x_i)$ and height $f\left(\frac{x_i + x_{i+1}}{2}\right)$, i.e. the curve passes through the centre of one of the rectangle's edges.

\begin{equation*} \int_{x_i}^{x_{i+1}} f(x) dx = (x_{i+1} - x_i) f\left(\frac{x_i + x_{i+1}}{2}\right) + \mathcal{O}(1/N^3) \end{equation*}

[Figure: illustration of the rectangle rule]

Trapezoidal rule

A little more complex is the trapezoidal rule. Each integral in the sum is approximated by the area of a trapezoid. Notice that the trapezoid has two different heights, so to compute its area we have to evaluate the function at both endpoints of the interval. The area of the trapezoid is then computed by

\begin{equation*} \int_{x_i}^{x_{i+1}} f(x) dx = (x_{i+1} - x_i) \frac{f(x_i) + f(x_{i+1})}{2} + \mathcal{O}(1/N^3) \end{equation*}

[Figure: illustration of the trapezoidal rule]

Simpson's rule

The most complicated rule of the three is Simpson's rule. The function on each interval is approximated by a polynomial of order 2, and that polynomial is integrated instead of the original function. Its error propagation is of order $\mathcal{O}(1/N^5)$.

[Figure: illustration of Simpson's rule]
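On one subinterval, Simpson's rule integrates the parabola through the two endpoints and the midpoint, which gives $\frac{h}{6}\left[f(x_i) + 4 f\!\left(\frac{x_i+x_{i+1}}{2}\right) + f(x_{i+1})\right]$ with $h = x_{i+1}-x_i$. Here is a minimal sketch (the helper names simpson_interval and simpson are our own, not library functions):

```python
import numpy as np

# Simpson's rule on one subinterval [xi, xi1]: integrate the parabola
# through the endpoints and the midpoint.
def simpson_interval(f, xi, xi1):
    h = xi1 - xi
    xm = 0.5 * (xi + xi1)
    return (h / 6.0) * (f(xi) + 4.0 * f(xm) + f(xi1))

# Summing over N subintervals gives the composite rule.
def simpson(f, a, b, N):
    edges = np.linspace(a, b, N + 1)
    return sum(simpson_interval(f, edges[i], edges[i + 1]) for i in range(N))

# Example: integrate sin(x) on [0, pi]; the exact value is 2.
print(simpson(np.sin, 0.0, np.pi, 10))
```

Even with only 10 subintervals, the result agrees with the exact value to several decimal places.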

As you can see, the errors of these methods evolve differently with $N$. It is important to keep in mind that different methods also require different numbers of function calls. So if f(x) is a very expensive function without a lot of fluctuations, it can be better to use the rectangle rule than the trapezoid rule.

There are also methods that vary the size of the intervals depending on the function (adaptive quadrature), but these are more advanced. We just mention them here so you know they exist.

Also keep in mind that the methods presented here are just the three most common out of many, each optimized for different problems.

Source of images

Exercise 4

Using numpy, program the rectangle and trapezoid rules in the next code block to integrate the function $f(x) = e^{-x}$ from $x=0$ to $x=1$. Compare with the analytic result, varying the number of intervals, and see what the difference is for the two methods. Plot all your findings.

# Write your solutions here

Simpson's rule is quite complicated to implement, and other integration methods can be even more so. For that reason, clever people have already done this for us and bundled the methods into a library. The scipy library contains many methods for numerical integration. We will now take a look at some of them; a comprehensive list is here.

scipy.integrate.quad

Scipy's quad function is the main workhorse that you should use for numerical one-dimensional integrals. It is based on the FORTRAN library QUADPACK. Scipy generally has really good documentation that we encourage you to read.

Given a 1-dimensional analytic (or interpolated) integrand function func, 1-dimensional quad integration has the following syntax:

result, error = integrate.quad(func, x_min, x_max, args=())

where args holds possible parameters entering your function. Run the commands import scipy.integrate as integrate and integrate.quad? for additional documentation.

Let us see how to use quad integration with a specific example.

import numpy as np
import scipy.integrate as integrate

def f(x):
    return np.exp(-x)

# calculate the integral of 'f' from 0 to 10
result, error = integrate.quad(f, 0, 10)
print(result, error)
0.9999546000702375 2.8326146575791917e-14

The first number gives you the result, the second one the numerical error of the integration.
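The args keyword mentioned above passes extra parameters to the integrand. A small sketch (the integrand $f(x,a) = e^{-ax}$ and the value a = 2 are our own illustration):

```python
import numpy as np
import scipy.integrate as integrate

# integrand with an extra parameter 'a'
def f(x, a):
    return np.exp(-a * x)

# integrate f(x, 2) from 0 to 1; args supplies the parameter a = 2
result, error = integrate.quad(f, 0, 1, args=(2,))
print(result)                  # should be close to the analytic value below
print((1 - np.exp(-2)) / 2)    # analytic result: (1 - e^{-2}) / 2
```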

While quad is useful for known integrand functions, other methods in the scipy library are useful when the integrand is given as np.arrays. One such method is the already mentioned trapezoid rule. The syntax for this integration method is the following:

scipy.integrate.trapezoid(y, x=None, dx=1.0, axis=-1)

or

scipy.integrate.trapz(y, x=None, dx=1.0, axis=-1)

Let us see a specific example of this.

x = np.linspace(0, 100, 1000)
y = f(x)
integrate.trapz(y, x)
1.000834863090761

As you can see, the result is fairly similar to the quad method, but you don't have an error estimate.

The last method we present here is Simpson's method, which works similarly to the trapezoid method. The syntax is:

scipy.integrate.simpson(y, x=None, dx=1.0, axis=-1)

or

scipy.integrate.simps(y, x=None, dx=1.0, axis=-1)

x = np.linspace(0, 100, 1000)
y = f(x)
integrate.simps(y, x)
1.0000402922553666

Depending on the problem we are trying to solve, we can use any of the different integration methods in the scipy library. While quad works with functions and the step size is estimated on the fly, the trapezoid and Simpson implementations work on arrays.
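The different error scalings can also be seen directly. The following sketch (our own comparison, reusing $f(x)=e^{-x}$ on $[0,1]$, whose exact integral is $1 - e^{-1}$) shows how the trapezoid and Simpson errors shrink as the grid is refined:

```python
import numpy as np
import scipy.integrate as integrate

exact = 1 - np.exp(-1)  # analytic value of the integral of exp(-x) on [0, 1]

for n in [11, 101, 1001]:
    x = np.linspace(0, 1, n)
    y = np.exp(-x)
    err_trap = abs(integrate.trapezoid(y, x=x) - exact)
    err_simp = abs(integrate.simpson(y, x=x) - exact)
    # trapezoid error drops ~100x per refinement (O(h^2)); Simpson drops much faster
    print(f"n={n:5d}  trapezoid error={err_trap:.1e}  Simpson error={err_simp:.1e}")
```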

Keep in mind that one can convert an array into a function and a function into an array; for that, we have prepared an optional section on interpolating functions below, which we will extend in Lecture 7.

Exercise 5

In this exercise we will compute numerically the integral of the function

\begin{equation*} y(x,a) = \int_0^{x} e^{-a\phi^2} d\phi, \end{equation*}

using the quad integration method described above. Here $a$ is a positive real number. For $x \to \infty$, this integral has the analytic result

\begin{equation*} \int_0^{\infty} e^{-a \phi^2} d\phi = \sqrt{\frac{\pi}{4a}} \end{equation*}

which we will use to compare to our numeric integration.

  1. Write a function which represents the integrand, and a function y(x, a) which calculates the integral. Recall that the quad function allows for parameters inside the integrand, as well as an infinite integration interval.

  2. Check that your function is well defined by calculating y(np.inf, 10) and comparing it to the analytic result with a = 10.

  3. Use the interval x = np.linspace(0, 100, 100) to evaluate y(x, 10).

  4. Plot your result.

# Write your results here

Two-dimensional integrals (scipy.integrate.dblquad)

Higher-dimensional integration is also possible. In particular, for two dimensions we have scipy.integrate.dblquad.

For instance, the area of a circle in Cartesian coordinates can be computed from the double integral.

\begin{equation*} A = \int_{x=-1}^{1} \int_{y=-\sqrt{1-x^2}}^{\sqrt{1-x^2}} dy\, dx \end{equation*}

To understand this further, let's rewrite the integral to make the variables match the documentation:

\begin{equation*} \int_{x=a}^{b} \int_{y=\text{gfun}(x)}^{\text{hfun}(x)} \text{fun}(y,x)\, dy\, dx \end{equation*}

where:

\begin{equation*} a = -1, \quad b = 1, \quad \text{gfun}(x) = -\sqrt{1-x^2}, \quad \text{hfun}(x) = \sqrt{1-x^2}, \quad \text{fun}(y,x) = 1 \end{equation*}

Let's program this integral following the dblquad documentation

a = -1
b = 1

def gfun(x):
    return -np.sqrt(1-x**2)

def hfun(x):
    return np.sqrt(1-x**2)

def fun(y, x):  # the order of the arguments is important
    return 1

integrate.dblquad(fun, a, b, gfun, hfun)
(3.141592653589797, 2.000471344132393e-09)

And it is as easy as that!

Exercise 6

Using the same approach, calculate the volume of a sphere using the function scipy.integrate.tplquad, which is documented here.

# Write your code here

Interpolation (optional)

Given two arrays of data points (y, x), we have precise values where the data points were recorded, but sometimes one wants an estimate of the values in between. A process called interpolation helps us do this. We will roughly outline interpolation here and discuss it in detail in Lecture 7.

Let's say that we have the following data points:

x = np.linspace(0.0, 5, 5)
y = x**2*np.exp(x)
plt.scatter(x, y)
<matplotlib.collections.PathCollection at 0x1ce3844a520>
Image in a Jupyter notebook

Suppose we want to estimate y(3), but as you can see, a value of y at x=3 doesn't exist. The simplest estimate is linear interpolation, which means drawing a straight line between neighbouring points. This can be done with the scipy.interpolate.interp1d function.

import scipy.interpolate as interpolate

y_linear = interpolate.interp1d(x, y)
y_linear(3)
array(284.8654386)

scipy.interpolate.interp1d generates a function on the domain $(\min(x), \max(x))$. This can easily be seen by plotting it with a high point density, to verify that it is truly continuous.

x_high_density = np.linspace(0.0, 5, 50)
plt.scatter(x_high_density, y_linear(x_high_density))
<matplotlib.collections.PathCollection at 0x1ce39499130>
Image in a Jupyter notebook

There is also the possibility of using quadratic interpolation, in which case the interpolant and its first derivative are continuous at each point. The keyword kind=2 specifies that we want a polynomial of second order.

y_quadric = interpolate.interp1d(x, y, kind=2)
x_high_density = np.linspace(0.0, 5, 50)
plt.scatter(x_high_density, y_quadric(x_high_density))
<matplotlib.collections.PathCollection at 0x1ce39561f40>
Image in a Jupyter notebook

A nice little hack to know: if your function is bijective, you can calculate its inverse with inverse interpolation.

y_inverse_linear = interpolate.interp1d(y, x, kind=1)  # note the swapped arguments: normally it is (x, y)
x_inverse_high_density = np.linspace(y.min(), y.max(), 1000)
plt.scatter(x_inverse_high_density, y_inverse_linear(x_inverse_high_density))
<matplotlib.collections.PathCollection at 0x1ce395cdc10>
Image in a Jupyter notebook

There also exist interpolations of kind 'cubic', 'nearest', etc., depending on the problem you want to solve. Normally you need to experiment a little to find the one that fits best.