A.2 Optimize and speed up the code (SOR algorithm, Cython and C)
How to optimize the code?
In this notebook I want to show how to write efficient code, and how Cython and C code can help improve its speed. As an example, I decided to consider the SOR algorithm: here we can see how the algorithm presented in the notebook A.1 Solution of linear equations can be adapted to our specific needs (i.e. solving PDEs).
Again, if you are curious about SOR and want to know more, have a look at the wiki page link.
Contents
Here we use a tridiagonal matrix $A$ with equal elements in each of the three diagonals:

$$
A = \begin{pmatrix}
\beta  & \gamma &        &        &        \\
\alpha & \beta  & \gamma &        &        \\
       & \alpha & \beta  & \ddots &        \\
       &        & \ddots & \ddots & \gamma \\
       &        &        & \alpha & \beta
\end{pmatrix}
$$

where $\alpha$, $\beta$ and $\gamma$ denote the constant values on the lower, main and upper diagonal respectively. This is the case of the Black-Scholes equation (in log-variables). The matrix $A$ is quite big, because we want to test the performance of the algorithms.

The linear system is always the same:

$$ A \mathbf{x} = \mathbf{b}. $$
For simplicity I chose fixed numerical values for $\alpha$, $\beta$ and $\gamma$.
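To make the setup concrete, here is a minimal sketch of how such a system can be built with scipy.sparse (the size and the diagonal values below are placeholders, not necessarily the ones used in the notebook):

```python
import numpy as np
from scipy.sparse import diags

N = 3000                                 # placeholder size: "quite big" for the benchmarks
alpha, beta, gamma = 1.0, -2.5, 1.0      # placeholder values for the lower, main and upper diagonal

# Tridiagonal matrix with equal elements along each of its three diagonals
A = diags([alpha * np.ones(N - 1), beta * np.ones(N), gamma * np.ones(N - 1)],
          offsets=[-1, 0, 1], format="csc")

b = np.ones(N)                           # placeholder right-hand side of A x = b
```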
Python implementation
I wrote two functions to implement the SOR algorithm with the aim of solving PDEs.
- `SOR` uses matrix multiplications. The code is the same as presented in the notebook A.1: first it creates the matrices $D$, $U$, $L$ (if $A$ is sparse, it is converted into a numpy.array), then it iterates the solution until convergence.
- `SOR2` iterates over all the components of $\mathbf{x}$. It does not perform matrix multiplications, but works on each component of $\mathbf{x}$ separately. The algorithm is the following:

$$ x_i^{(k+1)} = (1-\omega)\, x_i^{(k)} + \frac{\omega}{a_{ii}} \biggl( b_i - \sum_{j<i} a_{ij}\, x_j^{(k+1)} - \sum_{j>i} a_{ij}\, x_j^{(k)} \biggr), \qquad i = 1, \dots, N. $$
This algorithm, where $\omega$ is the relaxation parameter, is taken from the SOR wiki page and is equivalent to the algorithm presented in the notebook A.1.
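For reference, a minimal pure-Python sketch of this component-wise update on a dense matrix could look like this (the function name and signature are mine, not the notebook's `SOR2`):

```python
import numpy as np

def sor_componentwise(A, b, w=1.0, eps=1e-10, N_max=100):
    """Component-wise SOR for A x = b, with A a dense (N, N) array."""
    N = len(b)
    x0 = np.ones(N)                      # solution at the previous sweep
    x_new = np.ones(N)                   # solution being updated in the current sweep
    for k in range(N_max):
        for i in range(N):
            S = 0.0
            for j in range(N):           # scans the WHOLE row i, zeros included
                if j < i:
                    S += A[i, j] * x_new[j]   # components already updated in this sweep
                elif j > i:
                    S += A[i, j] * x0[j]      # components from the previous sweep
            x_new[i] = (1 - w) * x0[i] + (w / A[i, i]) * (b[i] - S)
        if np.linalg.norm(x_new - x0) < eps:
            return x_new
        x0 = x_new.copy()
    raise RuntimeError("SOR did not converge within N_max iterations")
```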
Let us see how fast they are: (well... how slow they are... be ready to wait about 6 minutes)
TOO BAD!
The second algorithm is very bad. There is an immediate improvement to make: we are working with a tridiagonal matrix, which means that all the elements not on the three diagonals are zero. The first piece of code to modify is OBVIOUSLY this:
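In the sketch above, that piece is the inner sum over $j$, isolated here as a small helper to make its cost explicit (a hypothetical refactoring, not the notebook's cell):

```python
def row_sum_full(A, x_new, x0, i):
    """O(N) inner sum of the component-wise update: visits every column of row i."""
    S = 0.0
    N = len(x0)
    for j in range(N):
        if j < i:
            S += A[i, j] * x_new[j]   # for a tridiagonal A, almost all of these terms are zero
        elif j > i:
            S += A[i, j] * x0[j]
    return S
```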
There is no need to sum zero elements. Let us consider the new function:
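A sketch of such a function, under a hypothetical name: only the entries $A_{i,i-1}$, $A_{i,i}$ and $A_{i,i+1}$ are ever read, so the inner loop disappears.

```python
import numpy as np

def sor_tridiag(A, b, w=1.0, eps=1e-10, N_max=100):
    """Component-wise SOR that only reads the three diagonals of A."""
    N = len(b)
    x0 = np.ones(N)
    x_new = np.ones(N)
    for k in range(N_max):
        for i in range(N):
            S = 0.0
            if i > 0:
                S += A[i, i - 1] * x_new[i - 1]   # lower diagonal, already updated
            if i < N - 1:
                S += A[i, i + 1] * x0[i + 1]      # upper diagonal, previous sweep
            x_new[i] = (1 - w) * x0[i] + (w / A[i, i]) * (b[i] - S)
        if np.linalg.norm(x_new - x0) < eps:
            return x_new
        x0 = x_new.copy()
    raise RuntimeError("SOR did not converge within N_max iterations")
```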
OK ... it was easy!
But wait a second... if all the elements in the three diagonals are equal, do we really need a matrix? Of course, we can use sparse matrices to save space in memory. But do we really need any kind of matrix at all? The same algorithm can be written considering just the three values $\alpha$, $\beta$ and $\gamma$.
In the following algorithm the gain in speed is not huge, but we save a lot of space in memory!!
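A sketch of this matrix-free variant: it receives only the three constant values $\alpha$, $\beta$, $\gamma$ and never touches a matrix (again, names and signature are assumptions, not the notebook's exact code):

```python
import numpy as np

def sor_three_values(alpha, beta, gamma, b, w=1.0, eps=1e-10, N_max=100):
    """SOR for a tridiagonal matrix with constant diagonals alpha (lower), beta (main), gamma (upper)."""
    N = len(b)
    x0 = np.ones(N)
    x_new = np.ones(N)
    for k in range(N_max):
        for i in range(N):
            S = 0.0
            if i > 0:
                S += alpha * x_new[i - 1]
            if i < N - 1:
                S += gamma * x0[i + 1]
            x_new[i] = (1 - w) * x0[i] + (w / beta) * (b[i] - S)
        if np.linalg.norm(x_new - x0) < eps:
            return x_new
        x0 = x_new.copy()
    raise RuntimeError("SOR did not converge within N_max iterations")
```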
Cython
For those who are not familiar with Cython, I suggest reading this introduction link.
Cython, basically, consists of adding C types to Python variables.
Let's see what happens to the speed when we add types to the previous pure Python function (`SOR4`).
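A sketch of what the typed version can look like in a Cython cell or a .pyx module (the names are assumptions; the body is the same algorithm as above, just with C types):

```cython
import numpy as np
cimport numpy as np

def sor_typed(double alpha, double beta, double gamma,
              np.ndarray[np.float64_t, ndim=1] b,
              double w=1.0, double eps=1e-10, int N_max=100):
    # same algorithm as the pure-Python version, now with typed variables
    cdef int N = b.shape[0]
    cdef np.ndarray[np.float64_t, ndim=1] x0 = np.ones(N, dtype=np.float64)
    cdef np.ndarray[np.float64_t, ndim=1] x_new = np.ones(N, dtype=np.float64)
    cdef int i, k
    cdef double S
    for k in range(N_max):
        for i in range(N):
            S = 0.0
            if i > 0:
                S += alpha * x_new[i - 1]
            if i < N - 1:
                S += gamma * x0[i + 1]
            x_new[i] = (1 - w) * x0[i] + (w / beta) * (b[i] - S)
        if np.linalg.norm(x_new - x0) < eps:
            return x_new
        x0 = x_new.copy()
    raise RuntimeError("SOR did not converge within N_max iterations")
```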
About 100 times faster!!!
That's good.
So... those who are not familiar with Cython may be confused by the new type `np.float64_t`. We wrote:
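```cython
import numpy as np     # Python-level import: NumPy's pure-Python API, resolved at runtime
cimport numpy as np    # Cython-level cimport: NumPy's C API from __init__.pxd, resolved at compile time
```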
The first line imports the numpy module into the Python space. It only gives access to NumPy's pure-Python API and it happens at runtime.
The second line gives access to NumPy's C API, defined in the `__init__.pxd` file (link to the file), at compile time.
Even if they are both named `np`, they are automatically recognized. In `__init__.pxd` the following is defined:
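```cython
ctypedef double npy_double          # npy_double is a C double
ctypedef npy_double float64_t       # hence np.float64_t is a C double as well
```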
The `np.float64_t` type represents the C type `double`.
Memoryviews
Let us re-write the previous code using the faster memoryviews, which are another tool that helps speed up the algorithm. I suggest the reader have a quick look at the memoryviews manual in the link: there are no difficult concepts, and the notation is not very different from the one used in the previous function.
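A sketch of the same function with the typed buffers replaced by memoryviews (names are assumptions; note the convergence check, which is discussed right below):

```cython
import numpy as np

def sor_memview(double alpha, double beta, double gamma, double[:] b,
                double w=1.0, double eps=1e-10, int N_max=100):
    cdef int N = b.shape[0]
    cdef double[:] x0 = np.ones(N, dtype=np.float64)
    cdef double[:] x_new = np.ones(N, dtype=np.float64)
    cdef int i, k
    cdef double S
    for k in range(N_max):
        for i in range(N):
            S = 0.0
            if i > 0:
                S += alpha * x_new[i - 1]
            if i < N - 1:
                S += gamma * x0[i + 1]
            x_new[i] = (1 - w) * x0[i] + (w / beta) * (b[i] - S)
        # memoryviews are not ndarrays: they must be converted before calling np.linalg.norm
        if np.linalg.norm(np.asarray(x_new) - np.asarray(x0)) < eps:
            return np.asarray(x_new)
        x0[:] = x_new                    # element-wise copy between the two buffers
    raise RuntimeError("SOR did not converge within N_max iterations")
```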
I have to admit that while writing the new code I realized that using the function `norm` is not optimal (I got an error because `norm` only accepts ndarrays... so, thanks memoryviews 😃).
Moreover, the `norm` function computes a square root, which requires extra computation. We can instead define our own function `distance2` (the square of the distance) and compare it with the square of the tolerance parameter, `eps * eps`. This is another improvement of the algorithm.
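A sketch of such a helper (the name `distance2` comes from the text above; making it a `cdef` function is my assumption):

```cython
cdef double distance2(double[:] a, double[:] b):
    # squared Euclidean distance: no square root and no ndarray conversion needed
    cdef double d = 0.0
    cdef int i
    for i in range(a.shape[0]):
        d += (a[i] - b[i]) * (a[i] - b[i])
    return d

# inside the SOR loop, the convergence test then becomes:
#     if distance2(x_new, x0) < eps * eps:
#         return np.asarray(x_new)
```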
Good job!! Another improvement!
C code
The last improvement is to write the function in C and call it from Python.
Inside the folder `src/C` you can find the header file `SOR.h` and the implementation file `SOR.c` (you will also find `mainSOR.c`, in case you want to test the SOR algorithm directly in C).
I will call the function `SOR_abc` declared in the header `SOR.h`. First it is declared as extern, and then it is called inside `SOR_c` with a cast to `<double[:arr_memview.shape[0]]>`.
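Schematically, the wrapping pattern described above looks like this (the prototype of `SOR_abc` shown here is an assumption made for illustration; only the names `SOR_abc`, `SOR_c`, `arr_memview` and the cast itself come from the text and the files in `src/C`):

```cython
import numpy as np

cdef extern from "SOR.h":
    # assumed prototype; the real one is in src/C/SOR.h
    double* SOR_abc(double aa, double bb, double cc,
                    double* B, int N, double w, double eps, int N_max)

def SOR_c(double aa, double bb, double cc, double[:] arr_memview,
          double w, double eps, int N_max):
    # pass a pointer to the memoryview's underlying buffer to the C routine
    cdef double* sol = SOR_abc(aa, bb, cc, &arr_memview[0],
                               arr_memview.shape[0], w, eps, N_max)
    # view the returned C array through a typed memoryview of the right length
    # (ownership/freeing of the C array is ignored in this sketch)
    return np.asarray(<double[:arr_memview.shape[0]]> sol)
```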
Well... it looks like using Cython with memoryviews gives the same performance as wrapping a C function.
For this reason, I used the Cython version as the solver in the class `BS_pricer`.
We already compared performance in the notebook 1.2 - BS PDE, where we saw that the SOR algorithm is slow compared to the LU or Thomas algorithms.
Just out of curiosity, let us compare the speed of the Python `PDE_price` method that uses the Cython SOR solver against a pricer with the same SOR algorithm fully implemented in C.
Run the command `make` to compile the C code:
Python program with Cython SOR method:
Pure C program: