Contact
CoCalc Logo Icon
StoreFeaturesDocsShareSupport News AboutSign UpSign In
| Download
Project: test
Views: 88694
Kernel: Python 3

Designing Kalman Filters

#format the book %matplotlib inline %load_ext autoreload %autoreload 2 from __future__ import division, print_function from book_format import load_style, figsize load_style()

Introduction

Author's note: clean up ball example, add control input example.

In the last chapter we worked with 'textbook' problems. These are problems that are easy to state, program in a few lines of code, and teach. Real world problems are rarely 'textbook'. In this chapter, and the remainder of the book, we will work with more realistic examples.

We will begin by tracking a robot in a 2D space, such as a field or warehouse. We will start with a simple noisy sensor that outputs noisy (x,y)(x,y) coordinates which we will need to filter to generate a 2D track. Once we have mastered this concept, we will extend the problem significantly with more sensors and then adding control inputs.

We will then move to a nonlinear problem. The world is nonlinear, but the Kalman filter is linear. Sometimes you can get away with using it for mildly nonlinear problems, sometimes you can't. I'll show you examples of both. This will set the stage for the remainder of the book, where we learn techniques for nonlinear problems.

Tracking a Robot

This first attempt at tracking a robot will closely resemble the 1D dog tracking problem of previous chapters. This will allow us to 'get our feet wet' with Kalman filtering. Instead of a sensor that outputs position in a hallway, we now have a sensor that supplies a noisy measurement of position in a 2D space. At each time tt it will provide an (x,y)(x,y) coordinate pair of the noisy measurement of the sensor's position in the field.

Implementation of code to interact with real sensors is beyond the scope of this book, so as before we will program simple simulations of the sensors. We will develop several of these sensors as we go, each with more complications, so as I program them I will just append a number to the function name.

So let's start with a very simple sensor, one that travels in a straight line. It is initialized with the initial position, velocity, and noise standard deviation. Each call to read() updates the position by one time step and returns the new measurement.

from numpy.random import randn import copy class PosSensor1(object): def __init__(self, pos=(0, 0), vel=(0, 0), noise_std=1.): self.vel = vel self.noise_std = noise_std self.pos = [pos[0], pos[1]] def read(self): self.pos[0] += self.vel[0] self.pos[1] += self.vel[1] return [self.pos[0] + randn() * self.noise_std, self.pos[1] + randn() * self.noise_std]

A quick test to verify that it works as we expect.

import matplotlib.pyplot as plt import numpy as np import book_plots as bp pos, vel = (4, 3), (2, 1) sensor = PosSensor1(pos, vel, noise_std=1) ps = np.array([sensor.read() for _ in range(50)]) bp.plot_measurements(ps[:, 0], ps[:, 1]);
Image in a Jupyter notebook

That looks correct. The slope is 1/2, as we would expect with a velocity of (2, 1), and the data seems to start at near (6, 4). It doesn't look realistic. This is still a 'textbook' representation. As we continue we will add complications that adds real world behavior.

Step 1: Choose the State Variables

As always, the first step is to choose our state variables. We are tracking in two dimensions and have a sensor that gives us a reading in each of those two dimensions, so we know that we have the two observed variables xx and yy. If we created our Kalman filter using only those two variables the performance would not be very good because we would be ignoring the information velocity can provide to us. We will want to incorporate velocity into our equations as well. I will represent this as

x=[xx˙yy˙]T\mathbf{x} = \begin{bmatrix}x & \dot{x} & y & \dot{y}\end{bmatrix}^\mathsf{T}

There is nothing special about this organization. I could have used [xyx˙y˙]T\begin{bmatrix}x & y & \dot{x} & \dot{y}\end{bmatrix}^\mathsf{T} or something less logical. I just need to be consistent in the rest of the matrices. I like keeping positions and locations next to each other because it keeps the covariances between positions and velocities in the same sub block of the covariance matrix. In my formulation P[1,0] contains the covariance of of xx and x˙\dot{x}. In the alternative formulation that covariance is at P[2, 0]. This gets worse as the number of dimension increases (e.g. 3D space, accelerations).

Let's pause and address how you identify the hidden variables. This example is somewhat obvious because we've already worked through the 1D case, but other problems won't be obvious There is no easy answer to this question. The first thing to ask yourself is what is the interpretation of the first and second derivatives of the data from the sensors. We do that because obtaining the first and second derivatives is mathematically trivial if you are reading from the sensors using a fixed time step. The first derivative is just the difference between two successive readings. In our tracking case the first derivative has an obvious physical interpretation: the difference between two successive positions is velocity.

Beyond this you can start looking at how you might combine the data from two or more different sensors to produce more information. This opens up the field of sensor fusion, and we will be covering examples of this in later sections. For now, recognize that choosing the appropriate state variables is paramount to getting the best possible performance from your filter. Once you have chosen hidden variables, you must run many tests to ensure that you are generating real results for them. The Kalman filter runs whatever model you give it; if your model cannot generate good information for the hidden variables the Kalman filter output will be nonsense.

Step 2: Design State Transition Function

Our next step is to design the state transition function. Recall that the state transition function is implemented as a matrix F\mathbf{F} that we multiply with the previous state of our system to get the next state, like so.

x=Fx\mathbf{x}^- = \mathbf{Fx}

I will not belabor this as it is very similar to the 1-D case we did in the previous chapter. The state equations are

x=1x+Δtx˙+y+0y˙vx=0x+1x˙+1y+0y˙y=0x+0x˙+1y+Δty˙vy=0x+0x˙+0y+1y˙\begin{aligned} x^- &= 1x + \Delta t \dot{x} + y + 0 \dot{y} \\ v_x &= 0x + 1\dot{x} + 1y + 0 \dot{y} \\ y^- &= 0x + 0\dot{x} + 1y + \Delta t \dot{y} \\ v_y &= 0x + 0\dot{x} + 0y + 1 \dot{y} \end{aligned}

Laying it out that way shows us both the values and row-column organization required for F\small\mathbf{F}. In linear algebra, we would write this as:

[xx˙yy˙]=[1Δt000100001Δt0001][xx˙yy˙]\begin{bmatrix}x \\ \dot{x} \\ y \\ \dot{y}\end{bmatrix}^- = \begin{bmatrix}1& \Delta t& 0& 0\\0& 1& 0& 0\\0& 0& 1& \Delta t\\ 0& 0& 0& 1\end{bmatrix}\begin{bmatrix}x \\ \dot{x} \\ y \\ \dot{y}\end{bmatrix}

So, let's do this in Python. It is very simple; the only thing new here is setting dim_z to 2. We will see why it is set to 2 in step 4.

from filterpy.kalman import KalmanFilter tracker = KalmanFilter(dim_x=4, dim_z=2) dt = 1. # time step 1 second tracker.F = np.array([[1, dt, 0, 0], [0, 1, 0, 0], [0, 0, 1, dt], [0, 0, 0, 1]])

Step 3: Design the Process Noise Matrix

We will use FilterPy's functions to compute the Q\mathbf{Q} matrix for us. For simplicity I will assume the noise is a discrete time Wiener process - that it is constant for each time period. This assumption allows me to use a variance to specify how much I think the model changes between steps. Revisit the Kalman Filter Math chapter if this is not clear.

from scipy.linalg import block_diag from filterpy.common import Q_discrete_white_noise q = Q_discrete_white_noise(dim=2, dt=dt, var=0.05) tracker.Q = block_diag(q, q) print(tracker.Q)
[[ 0.0125 0.025 0. 0. ] [ 0.025 0.05 0. 0. ] [ 0. 0. 0.0125 0.025 ] [ 0. 0. 0.025 0.05 ]]

Here I assume the noise in x and y are independent, so the covariances between any x and y variable should be zero. This allows me to compute Q\mathbf{Q} for one dimension, and then use block_diag to copy it for the x and y axis. The printed result shows how much noise we add during each evolution (prediction).

Step 4: Design the Control Function

We haven't yet added controls to our robot, so there is nothing to be done for this step. The KalmanFilter class initializes B to zero under the assumption that there is no control input, so there is no code to write. If you like, you can be explicit and set tracker.B to 0, but as you can see it already has that value.

print(tracker.B)
0

Step 5: Design the Measurement Function

The measurement function H\mathbf{H} defines how we go from the state variables to the measurements using the equation z=Hx\mathbf{z} = \mathbf{Hx}. In this case we have measurements for (x,y), so we will design z\mathbf{z} as [xy]T\begin{bmatrix}x & y\end{bmatrix}^\mathsf{T} which is dimension 2x1. Our state variable is size 4x1. We can deduce the required size for H\textbf{H} by recalling that multiplying a matrix of size MxN by NxP yields a matrix of size MxP. Thus,

(2×1)=(a×b)(4×1)=(2×4)(4×1)(2\times 1) = (a\times b)(4 \times 1) = (2\times 4)(4\times 1)

So, H\textbf{H} is 2x4.

Filling in the values for H\textbf{H} is easy because the measurement is the position of the robot, which is the xx and yy variables of the state x\textbf{x}. Let's make this slightly more interesting by deciding we want to change units. The measurements are returned in feet, and that we desire to work in meters. H\textbf{H} changes from state to measurement, so the conversion is feet=meters/0.3048\mathsf{feet} = \mathsf{meters} / 0.3048. This yields

H=[10.30480000010.30480]\mathbf{H} = \begin{bmatrix} \frac{1}{0.3048} & 0 & 0 & 0 \\ 0 & 0 & \frac{1}{0.3048} & 0 \end{bmatrix}

which corresponds to these linear equations

zx=(x0.3048)+(0vx)+(0y)+(0vy)=x0.3048zy=(0x)+(0vx)+(y0.3048)+(0vy)=y0.3048\begin{aligned} z_x &= (\frac{x}{0.3048}) + (0* v_x) + (0*y) + (0 * v_y) = \frac{x}{0.3048}\\ z_y &= (0*x) + (0* v_x) + (\frac{y}{0.3048}) + (0 * v_y) = \frac{y}{0.3048} \end{aligned}

This is a simple problem, and we could have found the equations directly without going through the dimensional analysis that I did above. But it is useful to remember that the equations of the Kalman filter imply a specific dimensionality for all of the matrices, and when I start to get lost as to how to design something it is useful to look at the matrix dimensions.

Here is my implementation:

tracker.H = np.array([[1/0.3048, 0, 0, 0], [0, 0, 1/0.3048, 0]])

Step 6: Design the Measurement Noise Matrix

In this step we model the noise in our sensor. For now we will make the simple assumption that the xx and yy variables are independent white Gaussian processes. That is, the noise in x is not in any way dependent on the noise in y, and the noise is normally distributed about the mean 0. For now let's set the variance for xx and yy to be 5 meters2^2. They are independent, so there is no covariance, and our off diagonals will be 0. This gives us:

R=[σx2σyσxσxσyσy2]=[5005]\mathbf{R} = \begin{bmatrix}\sigma_x^2 & \sigma_y\sigma_x \\ \sigma_x\sigma_y & \sigma_{y}^2\end{bmatrix} = \begin{bmatrix}5&0\\0&5\end{bmatrix}

It is a 2×22{\times}2 matrix because we have 2 sensor inputs, and covariance matrices are always of size n×nn{\times}n for nn variables. In Python we write:

tracker.R = np.array([[5, 0], [0, 5]]) print(tracker.R)
[[5 0] [0 5]]

Initial Conditions

For our simple problem we will set the initial position at (0,0) with a velocity of (0,0). Since that is a pure guess, we will set the covariance matrix P\small\mathbf{P} to a large value.

x=[0000],P=[500000050000005000000500]\mathbf{x} = \begin{bmatrix}0\\0\\0\\0\end{bmatrix}, \, \mathbf{P} = \begin{bmatrix}500&0&0&0\\0&500&0&0\\0&0&500&0\\0&0&0&500\end{bmatrix}

The Python implementation is

tracker.x = np.array([[0, 0, 0, 0]]).T tracker.P = np.eye(4) * 500.

Implement the Filter

Design is complete, now we just have to write the code to run the filter and output the data in the format of our choice. We will run the code for 30 iterations.

def tracker1(): tracker = KalmanFilter(dim_x=4, dim_z=2) dt = 1.0 # time step tracker.F = np.array([[1, dt, 0, 0], [0, 1, 0, 0], [0, 0, 1, dt], [0, 0, 0, 1]]) tracker.u = 0. tracker.H = np.array([[1/0.3048, 0, 0, 0], [0, 0, 1/0.3048, 0]]) tracker.R = np.eye(2) * 5 q = Q_discrete_white_noise(dim=2, dt=dt, var=0.05) tracker.Q = block_diag(q, q) tracker.x = np.array([[0, 0, 0, 0]]).T tracker.P = np.eye(4) * 500. return tracker # simulate robot movement N = 30 sensor = PosSensor1 ([0, 0], (2, 1), 1.) zs = np.array([np.array([sensor.read()]).T for _ in range(N)]) # run filter robot_tracker = tracker1() mu, cov, _, _ = tracker.batch_filter(zs) # plot results zs *= .3048 # convert to meters bp.plot_filter(mu[:, 0], mu[:, 2]) bp.plot_measurements(zs[:, 0], zs[:, 1]) plt.legend(loc=2) plt.gca().set_aspect('equal') plt.xlim((0, 20));
Image in a Jupyter notebook

I encourage you to play with this, setting Q\mathbf{Q} and R\mathbf{R} to various values. However, we did a fair amount of that sort of thing in the last chapters, and we have a lot of material to cover, so I will move on to more complicated cases where we will also have a chance to experience changing these values.

Now I will run the same Kalman filter with the same settings, but also plot the covariance ellipse for xx and yy. First, the code without explanation, so we can see the output. I print the last covariance to see what it looks like. But before you scroll down to look at the results, what do you think it will look like? You have enough information to figure this out but this is still new to you, so don't be discouraged if you get it wrong.

from filterpy.stats import plot_covariance_ellipse # simulate robot movement N = 30 sensor = PosSensor1([0, 0], (2, 1), 1.) zs = np.array([np.array([sensor.read()]).T for _ in range(N)]) # run filter robot_tracker = tracker1() mu, cov, _, _ = robot_tracker.batch_filter(zs) for x, P in zip(mu, cov): # covariance of x and y cov = np.array([[P[0, 0], P[2, 0]], [P[0, 2], P[2, 2]]]) mean = (x[0, 0], x[2, 0]) plot_covariance_ellipse(mean, cov=cov, fc='g', alpha=0.15) # plot results zs *= .3048 # convert to meters bp.plot_filter(mu[:, 0], mu[:, 2]) bp.plot_measurements(zs[:, 0], zs[:, 1]) plt.legend(loc=2) plt.gca().set_aspect('equal') plt.xlim((0, 20));
Image in a Jupyter notebook

Did you correctly predict what the covariance matrix and plots would look like? Perhaps you were expecting a tilted ellipse, as in the last chapters. If so, recall that in those chapters we were not plotting xx against yy, but xx against x˙\dot{x}. xx is correlated to x˙\dot{x}, but xx is not correlated or dependent on yy. Therefore our ellipses are not tilted. Furthermore, the noise for both xx and yy are modeled to have the same value, 5, in R\mathbf{R}. If we were to set R to, for example,

R=[10005]\mathbf{R} = \begin{bmatrix}10&0\\0&5\end{bmatrix}

we would be telling the Kalman filter that there is more noise in xx than yy, and our ellipses would be longer than they are tall.

The final P tells us everything we need to know about the correlation between the state variables. If we look at the diagonal alone we see the variance for each variable. In other words P0,0\mathbf{P}_{0,0} is the variance for x, P1,1\mathbf{P}_{1,1} is the variance for x˙\dot{x}, P2,2\mathbf{P}_{2,2} is the variance for y, and P3,3\mathbf{P}_{3,3} is the variance for y˙\dot{y}. We can extract the diagonal of a matrix using numpy.diag().

np.diag(tracker.P)
array([ 0.25675401, 0.10095634, 0.25675401, 0.10095634])

The covariance matrix contains four 2×22{\times}2 matrices that you should be able to easily pick out. This is due to the correlation of xx to x˙\dot{x}, and of yy to y˙\dot{y}. The upper left hand side shows the covariance of xx to x˙\dot{x}.

c = tracker.P[0:2, 0:2] print(c) plot_covariance_ellipse((0, 0), cov=c, fc='g', alpha=0.2)
[[ 0.25675401 0.10192183] [ 0.10192183 0.10095634]]
Image in a Jupyter notebook

The covariance contains the data for xx and x˙\dot{x} in the upper left because of how it is organized. Recall that entries Pi,j\mathbf{P}_{i,j} and Pj,i\mathbf{P}_{j,i} contain σ1σ2\sigma_1\sigma_2.

Finally, let's look at the lower left side of P\mathbf{P}, which is all 0s. Why 0s? Consider P3,0\mathbf{P}_{3,0}. That stores the term pσ3σ0p\sigma_3\sigma_0, which is the covariance between y˙\dot{y} and xx. These are independent, so the term will be 0. The rest of the terms are for similarly independent variables.

tracker.P[2:4, 0:2]
array([[ 0., 0.], [ 0., 0.]])

The Effect of Filter Order

We have only studied tracking position and velocity. It has worked well, but only because I have been selecting problems for which this is an appropriate choice. You know have enough experience with the Kalman filter to consider this in more general terms.

What do I mean by order? In the context of these system models it is the number of derivatives required to accurately model a system. Consider a system that does not change, such as the height of a building. There is no change, so there is no need for a derivative, and the order of the system is zero. We could express this in an equation as x=312.5x = 312.5.

A first order system has a first derivative. For example, change of position is velocity, and we can write this as

v=dxdtv = \frac{dx}{dt}

which we integrate into the Newtonian equation

x=vt+x0.x = vt + x_0.

This is also called a constant velocity model, because of the assumption of a constant velocity.

A second order has a second derivative. The second derivative of position is acceleration, with the equation

a=d2xdt2a = \frac{d^2x}{dt^2}

which we integrate into

x=12at2+v0t+x0.x = \frac{1}{2}at^2 +v_0t + x_0.

This is also known as a constant acceleration model.

Another, equivalent way of looking at this is to consider the order of the polynomial. The constant acceleration model has a second derivative, so it is second order. Likewise, the polynomial x=12at2+v0t+x0x = \frac{1}{2}at^2 +v_0t + x_0 is second order.

When we design the state variables and process model we must choose the order of the system we want to model. Let's say we are tracking something with a constant velocity. No real world process is perfect, and so there will be slight variations in the velocity over short time period. You might reason that the best approach is to use a second order filter, allowing the acceleration term to deal with the slight variations in velocity.

In practice that doesn't work well. To thoroughly understand this issue lets see the effects of using a process model that does not match the order of the system being filtered.

First we need a system to filter. I'll write a class to simulate an object with constant velocity. Essentially no physical system has a truly constant velocity, so on each update we alter the velocity by a small amount. I also write a sensor to simulate Gaussian noise in a sensor. The code is below, and a plot an example run to verify that it is working correctly.

class ConstantVelocityObject(object): def __init__(self, x0=0, vel=1., noise_scale=0.06): self.x = x0 self.vel = vel self.noise_scale = noise_scale def update(self): self.vel += randn() * self.noise_scale self.x += self.vel return (self.x, self.vel) def sense(x, noise_scale=1.): return x[0] + randn()*noise_scale np.random.seed(124) obj = ConstantVelocityObject() xs, zs = [], [] for i in range(50): x = obj.update() z = sense(x) xs.append(x) zs.append(z) xs = np.asarray(xs) bp.plot_track(xs[:, 0]) bp.plot_measurements(range(len(zs)), zs) plt.legend(loc='best');
Image in a Jupyter notebook

I am satisfied with this plot. The track is not perfectly straight due to the noise that we added to the system - this could be the track of a person walking down the street, or perhaps of an aircraft being buffeted by variable winds. There is no intentional acceleration here, so we call it a constant velocity system. Again, you may be asking yourself that since there is in fact a tiny bit of acceleration going on why would we not use a second order Kalman filter to account for those changes? Let's find out.

How does one design a zero order, first order, or second order Kalman filter? We have been doing it all along, but just not using those terms. It might be slightly tedious, but I will elaborate fully on each - if the concept is clear to you feel free to skim a bit.

Zero Order Kalman Filter

A zero order Kalman filter is just a filter that tracks with no derivatives. We are tracking position, so that means we only have a state variable for position (no velocity or acceleration), and the state transition function also only accounts for position. Using the matrix formulation we would say that the state variable is

x=[x]\mathbf{x} = \begin{bmatrix}x\end{bmatrix}

The state transition function is very simple. There is no change in position, so we need to model x=xx=x; in other words, x at time t+1 is the same as it was at time t. In matrix form, our state transition function is

F=[1]\mathbf{F} = \begin{bmatrix}1\end{bmatrix}

The measurement function is very easy. Recall that we need to define how to convert the state variable x\mathbf{x} into a measurement. We will assume that our measurements are positions. The state variable only contains a position, so we get

H=[1]\mathbf{H} = \begin{bmatrix}1\end{bmatrix}

Let's write a function that constructs and returns a zero order Kalman filter.

def ZeroOrderKF(R, Q, P=20): """ Create zero order Kalman filter. Specify R and Q as floats.""" kf = KalmanFilter(dim_x=1, dim_z=1) kf.x = np.array([0.]) kf.R *= R kf.Q *= Q kf.P *= P kf.F = np.eye(1) kf.H = np.eye(1) return kf

First Order Kalman Filter

A first order Kalman filter tracks a first order system, such as position and velocity. We already did this for the dog tracking problem above, so this should be very clear. But let's do it again.

A first order system has position and velocity, so the state variable needs both of these. The matrix formulation could be

x=[xx˙]\mathbf{x} = \begin{bmatrix}x\\\dot{x}\end{bmatrix}

So now we have to design our state transition. The Newtonian equations for a time step are:

xt=xt1+vΔtvt=vt1\begin{aligned} x_t &= x_{t-1} + v\Delta t \\ v_t &= v_{t-1}\end{aligned}

Recall that we need to convert this into the linear equation

[xx˙]=F[xx˙]\begin{bmatrix}x\\\dot{x}\end{bmatrix} = \mathbf{F}\begin{bmatrix}x\\\dot{x}\end{bmatrix}

Setting

F=[1Δt01]\mathbf{F} = \begin{bmatrix}1 &\Delta t\\ 0 & 1\end{bmatrix}

gives us the equations above.

Finally, we design the measurement function. The measurement function needs to implement

z=Hxz = \mathbf{Hx}

Our sensor still only reads position, so it should take the position from the state, and 0 out the velocity, like so:

H=[10]\mathbf{H} = \begin{bmatrix}1 & 0 \end{bmatrix}

This function constructs and returns a first order Kalman filter.

def FirstOrderKF(R, Q, dt): """ Create zero order Kalman filter. Specify R and Q as floats.""" kf = KalmanFilter(dim_x=2, dim_z=1) kf.x = np.zeros(2) kf.P *= np.array([[100, 0], [0, 1]]) kf.R *= R kf.Q = Q_discrete_white_noise(2, dt, Q) kf.F = np.array([[1., dt], [0., 1]]) kf.H = np.array([[1., 0]]) return kf

Second Order Kalman Filter

A second order Kalman filter tracks a second order system, such as position, velocity and acceleration. The state variable will be

x=[xx˙x¨]\mathbf{x} = \begin{bmatrix}x\\\dot{x}\\\ddot{x}\end{bmatrix}

So now we have to design our state transition. The Newtonian equations for a time step are:

xt=xt1+vt1Δt+0.5at1Δt2vt=vt1Δt+at1at=at1\begin{aligned} x_t &= x_{t-1} + v_{t-1}\Delta t + 0.5a_{t-1} \Delta t^2 \\ v_t &= v_{t-1} \Delta t + a_{t-1} \\ a_t &= a_{t-1}\end{aligned}

Recall that we need to convert this into the linear equation

[xx˙x¨]=F[xx˙x¨]\begin{bmatrix}x\\\dot{x}\\\ddot{x}\end{bmatrix} = \mathbf{F}\begin{bmatrix}x\\\dot{x}\\\ddot{x}\end{bmatrix}

Setting

F=[1Δt.5Δt201Δt001]\mathbf{F} = \begin{bmatrix}1 & \Delta t &.5\Delta t^2\\ 0 & 1 & \Delta t \\ 0 & 0 & 1\end{bmatrix}

gives us the equations above.

Finally, we design the measurement function. The measurement function needs to implement

z=Hxz = \mathbf{Hx}

Our sensor still only reads position, so it should take the position from the state, and 0 out the velocity, like so:

H=[100]\mathbf{H} = \begin{bmatrix}1 & 0 & 0\end{bmatrix}

This function constructs and returns a second order Kalman filter.

def SecondOrderKF(R_std, Q, dt): """ Create zero order Kalman filter. Specify R and Q as floats.""" kf = KalmanFilter(dim_x=3, dim_z=1) kf.x = np.zeros(3) kf.P[0, 0] = 100 kf.P[1, 1] = 1 kf.P[2, 2] = 1 kf.R *= R_std**2 kf.Q = Q_discrete_white_noise(3, dt, Q) kf.F = np.array([[1., dt, .5*dt*dt], [0., 1., dt], [0., 0., 1.]]) kf.H = np.array([[1., 0., 0.]]) return kf

Evaluating the Performance

Now we can run each Kalman filter against the simulation and evaluate the results.

How do we evaluate the results? We can do this qualitatively by plotting the track and the Kalman filter output and eyeballing the results. However, a rigorous approach uses mathematics. Recall that system covariance matrix P\mathbf{P} contains the computed variance and covariances for each of the state variables. The diagonal contains the variance. Remember that roughly 99% of all measurements fall within 3σ3\sigma if the noise is Gaussian. If this is not clear please review the Gaussian chapter before continuing, as this is an important point.

So we can evaluate the filter by looking at the residuals between the estimated state and actual state and comparing them to the standard deviations which we derive from P\mathbf{P}. If the filter is performing correctly 99% of the residuals will fall within 3σ3\sigma. This is true for all the state variables, not just for the position.

I must mention that this is only true for simulated systems. Real sensors are not perfectly Gaussian, and you may need to expand your criteria to, say, 5σ5\sigma with real sensor data.

So let's run the first order Kalman filter against our first order system and access it's performance. You can probably guess that it will do well, but let's look at it using the standard deviations.

First, let's write a routine to generate the noisy measurements for us.

def simulate_system(Q, count): obj = ConstantVelocityObject(x0=.0, vel=0.5, noise_scale=Q) xs, zs = [], [] for i in range(count): x = obj.update() z = sense(x) xs.append(x) zs.append(z) return np.asarray(xs), np.asarray(zs)

And now a routine to perform the filtering.

def filter_data(kf, zs): xs, ps = [], [] for z in zs: kf.predict() kf.update(z) xs.append(kf.x) ps.append(kf.P.diagonal()) # just save variances return np.asarray(xs), np.asarray(ps)

And to plot the track results.

def plot_kf_output(xs, filter_xs, zs, title=None, aspect_equal=True): bp.plot_filter(filter_xs[:, 0]) bp.plot_track(xs[:, 0]) if zs is not None: bp.plot_measurements(zs) bp.show_legend() bp.set_labels(title=title, x='meters', y='time (sec)') if aspect_equal: plt.gca().set_aspect('equal') plt.xlim((-1, len(xs))) plt.show()

Now we are prepared to run the filter and look at the results.

R, Q = 1, 0.03 xs, zs = simulate_system(Q=Q, count=50) kf = FirstOrderKF(R, Q, dt=1) filter_xs1, ps1 = filter_data(kf, zs) plt.figure() plot_kf_output(xs, filter_xs1, zs)
Image in a Jupyter notebook

It looks like the filter is performing well, but it is hard to tell exactly how well. Let's look at the residuals and see if they help. You may have noticed that in the code above I saved the covariance at each step. I did that to use in the following plot. The ConstantVelocityObject class returns a tuple of (position, velocity) for the real object, and this is stored in the array xs, and the filter's estimates are in filter_xs1.

def plot_residuals(xs, filter_xs, Ps, title, y_label, stds=1): res = xs - filter_xs plt.plot(res) bp.plot_residual_limits(Ps, stds) bp.set_labels(title, 'time (sec)', y_label) plt.show()
plot_residuals(xs[:, 0], filter_xs1[:, 0], ps1[:, 0], title='First Order Position Residuals(1$\sigma$)', y_label='meters')
Image in a Jupyter notebook

How do we interpret this plot? The residual is drawn as the jagged line - the difference between the measurement and the predicted position. If there was no measurement noise and the Kalman filter prediction was always perfect the residual would always be zero. So the ideal output would be a horizontal line at 0. We can see that the residual is centered around 0, so this gives us confidence that the noise is Gaussian (because the errors fall equally above and below 0). The yellow area between dotted lines show the theoretical performance of the filter for 1 standard deviations. In other words, approximately 68% of the errors should fall within the dotted lines. The residual falls within this range, so we see that the filter is performing well, and that it is not diverging.

Let's look at the residuals for velocity.

plot_residuals(xs[:, 1], filter_xs1[:, 1], ps1[:, 1], title='First Order Velocity Residuals(1$\sigma$)', y_label='meters/sec')
Image in a Jupyter notebook

Again, as expected, the residual falls within the theoretical performance of the filter, so we feel confident that the filter is well designed for this system. Actually, almost all

Now let's do the same thing using the zero order Kalman filter. All of the code and math is largely the same, so let's just look at the results without discussing the implementation much.

kf0 = ZeroOrderKF(R, Q) filter_xs0, ps0 = filter_data(kf0, zs) plot_kf_output(xs, filter_xs0, zs)
Image in a Jupyter notebook

As we would expect, the filter has problems. Think back to the g-h filter, where we incorporated acceleration into the system. The g-h filter always lagged the input because there were not enough terms to allow the filter to adjust quickly enough to the changes in velocity. The same thing is happening here, just one order lower. On every predict() step the Kalman filter assumes that there is no change in position - if the current position is 4.3 it will predict that the position at the next time period is 4.3. Of course, the actual position is closer to 5.3. The measurement, with noise, might be 5.4, so the filter chooses an estimate part way between 4.3 and 5.4, causing it to lag the actual value of 5.3 by a significant amount. This same thing happens in the next step, the next one, and so on. The filter never catches up.

This raises a very important point. The assumption of 'constant' is an assumption of constant-ness between discrete samples only. The filter's output can still change over time.

Now let's look at the residuals. We are not tracking velocity, so we can only look at the residual for position.

plot_residuals(xs[:, 0], filter_xs0[:, 0], ps0[:, 0], title='Zero Order Position Residuals(3$\sigma$)', y_label='meters', stds=3)
Image in a Jupyter notebook

We can see that the filter diverges almost immediately. After a few seconds the residual exceeds the bounds of three standard deviations. It is important to understand that the covariance matrix P\mathbf{P} is only reporting the theoretical performance of the filter assuming all of the inputs are correct. In other words, this Kalman filter is diverging, but P\mathbf{P} implies that the Kalman filter's estimates are getting better and better with time because the variance is getting smaller. The filter has no way to know that you are lying to it about the system. This is sometimes referred to a smug filter - it is overconfident in its performance.

In this system the divergence is immediate and striking. In many systems it will only be gradual, and/or slight. It is important to look at charts like these for your systems to ensure that the performance of the filter is within the bounds of its theoretical performance.

Now let's try a second order system. This might strike you as a good thing to do. After all, we know there is a bit of noise in the movement of the simulated object, which implies there is some acceleration. Why not model the acceleration with a second order model? If there is no acceleration, the acceleration should just be estimated to be 0, right?. But is that what happens? Think about it before going on.

kf2 = SecondOrderKF(R, Q, dt=1) filter_xs2, ps2 = filter_data(kf2, zs) plot_kf_output(xs, filter_xs2, zs)
Image in a Jupyter notebook

Did this perform as you expected?

We can see that second order filter performs poorly compared to the first order filter. Why? This filter models acceleration, and so the large changes in the measurement gets interpreted as acceleration instead of noise. Thus the filter close tracks the noise. Not only that, but it overshoots the noise in places if the noise is consistently above or below the track because the filter incorrectly assumes an acceleration that does not exist, and so it's prediction goes further and further away from the track on each measurement. This is not a good state of affairs.

Still, the track doesn't look horrible. Let's see the story that the residuals tell. I will add a wrinkle here. The residuals for the second order system do not look terrible in that they do not diverge or exceed three standard deviations. However, it is very telling to look at the residuals for the first order vs the second order filter, so I have plotted both on the same graph.

res = xs[:, 0] - filter_xs2[:, 0] res1 = xs[:, 0] - filter_xs1[:, 0] plt.plot(res1, ls="--", label='order 1') plt.plot(res, label='order 2') bp.plot_residual_limits(ps2[:, 0]) bp.set_labels('Second Order Position Residuals', 'meters', 'time (sec)') plt.legend();
Image in a Jupyter notebook

The second order position residuals are slightly worse than the residuals of the first order filter, but they still fall within the theoretical limits of the filter. There is nothing very alarming here.

Now let's look at the residuals for the velocity.

res = xs[:, 1] - filter_xs2[:, 1] res1 = xs[:, 1] - filter_xs1[:, 1] plt.plot(res, label='order 2') plt.plot(res1, ls='--', label='order 1') bp.plot_residual_limits(ps2[:, 1]) bp.set_labels('Second Order Velocity Residuals', 'meters/sec', 'time (sec)') plt.legend();
Image in a Jupyter notebook

Here the story is very different. While the residuals of the second order system fall within the theoretical bounds of the filter's performance, we can see that the residuals are far worse than for the first order filter. This is the usual result for this scenario. The filter is assuming that there is acceleration that does not exist. It mistakes noise in the measurement as acceleration and this gets added into the velocity estimate on every predict cycle. Of course the acceleration is not actually there and so the residual for the velocity is much larger than is optimum.

I have one more trick up my sleeve. We have a first order system; i.e. the velocity is more-or-less constant. Real world systems are never perfect, so of course the velocity is never exactly the same between time periods. When we use a first order filter we account for that slight variation in velocity with the process noise. The matrix Q\mathbf{Q} is computed to account for this slight variation. If we move to a second order filter we are now accounting for the changes in velocity. Perhaps now we have no process noise, and we can set Q\mathbf{Q} to zero!

kf2 = SecondOrderKF(R, 0, dt=1) filter_xs2, ps2 = filter_data(kf2, zs) plot_kf_output(xs, filter_xs2, zs)
Image in a Jupyter notebook

To my eye that looks quite good! The filter quickly converges to the actual track. Success!

Or, maybe not. Setting the process noise to 0 tells the filter that the process model is perfect. I've yet to hear of a perfect physical system. Let's look at the performance of the filter over a longer period of time.

np.random.seed(25944) xs500, zs500 = simulate_system(Q=Q, count=500) kf2 = SecondOrderKF(R, 0, dt=1) filter_xs2, ps2 = filter_data(kf2, zs500) plot_kf_output(xs500, filter_xs2, zs500) plot_residuals(xs500[:, 0], filter_xs2[:, 0], ps2[:, 0], 'Zero Order Position Residuals', 'meters')
Image in a Jupyter notebookImage in a Jupyter notebook

We can see that the performance of the filter is abysmal. We can see that in the track plot where the filter diverges from the track for an extended period of time.The residual plot makes the problem more apparent. Just before the 100th update the filter diverges sharply from the theoretical performance. It might be converging at the end, but I doubt it. The entire time, the filter is reporting smaller and smaller variances. Do not trust the filter's covariance matrix to tell you if the filter is performing well!

Why is this happening? Recall that if we set the process noise to zero we are telling the filter to use only the process model. The measurements end up getting ignored. The physical system is not perfect, and so the filter is unable to adapt to this imperfect behavior.

Maybe just a really low process noise? Let's try that.

np.random.seed(32594) xs2000, zs2000 = simulate_system(Q=0.0001, count=2000) kf2 = SecondOrderKF(R, 0, dt=1) filter_xs2, ps2 = filter_data(kf2, zs2000) plot_kf_output(xs2000, filter_xs2, zs2000) plot_residuals(xs2000[:, 0], filter_xs2[:, 0], ps2[:, 0], 'Seceond Order Position Residuals', 'meters')
Image in a Jupyter notebookImage in a Jupyter notebook

Again, the residual plot tells the story. The track looks very good, but the residual plot shows that the filter is diverging for significant periods of time.

How should you think about all of this? You might argue that the last plot is 'good enough' for your application, and perhaps it is. I warn you however that a diverging filter doesn't always converge. With a different data set, or a physical system that performs differently you can end up with a filter that veers further and further away from the measurements.

Also, let's think about this in a data fitting sense. Suppose I give you two points, and tell you to fit a straight line to the points.

plt.scatter([1, 2], [1, 1], s=100, c='r') plt.plot([0, 3], [1, 1]);
Image in a Jupyter notebook

A straight line is the only possible answer. Furthermore, the answer is optimal. If I gave you more points you could use a least squares fit to find the best line, and the answer would still be optimal (in a least squares sense).

But suppose I told you to fit a higher order polynomial to those two points. There is now an infinite number of answers to the problem. For example, infinite number of parabolas (which are second order) pass through those points. When the Kalman filter is of higher order than your physical process it also has an infinite number of solutions to choose from. The answer is not just non-optimal, it often diverges and never recovers.

For best performance you need a filter whose order matches the system's order.. In many cases that will be easy to do - if you are designing a Kalman filter to read the thermometer of a freezer it seems clear that a zero order filter is the right choice. But what order should we use if we are tracking a car? Order one will work well while the car is moving in a straight line at a constant speed, but cars turn, speed up, and slow down, in which case a second order filter will perform better. That is the problem addressed in the Adaptive Filters chapter. There we will learn how to design a filter that adapts to changing order in the tracked object's behavior.

With that said, a lower order filter can track a higher order process so long as you add enough process noise and you keep the discretization period small (100 samples a second are usually locally linear). The results will not be optimal, but they can still be very good, and I always reach for this tool first before trying an adaptive filter. Let's look at an example with acceleration. First, the simulation.

class ConstantAccelerationObject(object): def __init__(self, x0=0, vel=1., acc=0.1, acc_noise_scale=.1): self.x = x0 self.vel = vel self.acc = acc self.acc_noise_scale = acc_noise_scale def update(self): self.acc += randn() * self.acc_noise_scale self.vel += self.acc self.x += self.vel return (self.x, self.vel, self.acc) R, Q = 6., 0.02 def simulate_acc_system(R, Q, count): obj = ConstantAccelerationObject(acc_noise_scale=Q) zs = [] xs = [] for i in range(count): x = obj.update() z = sense(x, R) xs.append(x) zs.append(z) return np.asarray(xs), zs np.random.seed(124) xs, zs = simulate_acc_system(R=R, Q=Q, count=80) plt.plot(xs[:, 0]);
Image in a Jupyter notebook

Now we will filter the data using a second order filter.

np.random.seed(124) xs, zs = simulate_acc_system(R=R, Q=Q, count=80) kf2 = SecondOrderKF(R, Q, dt=1) fxs2, ps2 = filter_data(kf2, zs) plot_kf_output(xs, fxs2, zs, aspect_equal=False) plot_residuals(xs[:, 0], fxs2[:, 0], ps2[:, 0], 'Second Order Position Residuals', 'meters')
Image in a Jupyter notebookImage in a Jupyter notebook

We can see that the filter is performing within the theoretical limits of the filter.

Now let's use a lower order filter. As already demonstrated the lower order filter will lag the signal because it is not modeling the acceleration. However, we can account for that (to an extent) by increasing the size of the process noise. The filter will treat the acceleration as noise in the process model. The result will be suboptimal, but if designed well it will not diverge. Choosing the amount of extra process noise is not an exact science. You will have to experiment with representative data. Here, I've multiplied it by 10, and am getting good results.

kf2 = FirstOrderKF(R, Q * 10, dt=1) fxs2, ps2 = filter_data(kf2, zs) plot_kf_output(xs, fxs2, zs, aspect_equal=False) plot_residuals(xs[:, 0], fxs2[:, 0], ps2[:, 0], 'Second Order Position Residuals', 'meters')
Image in a Jupyter notebookImage in a Jupyter notebook

Think about what will happen if you make the process noise many times larger than it needs to be. A large process noise tells the filter to favor the measurements, so we would expect the filter to closely mimic the noise in the measurements. Let's find out.

kf1 = FirstOrderKF(R, Q * 10000, dt=1) fxs1, ps1 = filter_data(kf2, zs) plot_kf_output(xs, fxs1, zs, aspect_equal=False) plot_residuals(xs[:, 0], fxs1[:, 0], ps2[:, 0], 'Second Order Position Residuals', 'meters')
Image in a Jupyter notebookImage in a Jupyter notebook

Sensor Fusion

Early in the g-h filter chapter we discussed designing a filter for two scales, one accurate and one inaccurate. We determined that we should always include the information from the inaccurate filter - we should never discard any information. If you do not recall this discussion, consider this edge case. You have two scales, one accurate to 1 lb, the second to 10 lbs. You weigh yourself, and the first reads 170 lbs, and the second reads 181 lbs. If we only used the data from the first scale we would only know that our weight is in the range of 169 to 171 lbs. However, when we incorporate the measurement from the second scale we can decide that the correct weight can only be 171 lbs, since 171 lbs is exactly at the outside range of the second measurement, which would be 171 to 191 lbs. Of course most measurements don't offer such a exact answer, but the principle applies to any measurement.

So consider a situation where we have two sensors measuring the system. How shall we incorporate that into our Kalman filter?

Suppose we have a train or cart on a railway. It has a sensor attached to the wheels counting revolutions, which can be converted to a distance along the track. Then, suppose we have a GPS-like sensor which I'll call a 'position sensor' mounted to the train which reports position. I'll explain why I don't just use a GPS in the next section. Thus, we have two measurements, both reporting position along the track. Suppose further that the accuracy of the wheel sensor is 1m, and the accuracy of the position sensor is 10m. How do we combine these two measurements into one filter? This may seem quite contrived, but aircraft use sensor fusion to fuse the measurements from sensors such as a GPS, INS, Doppler radar, VOR, the airspeed indicator, and more.

Kalman filters for inertial filters are very difficult, but fusing data from two or more sensors providing measurements of the same state variable (such as position) is quite easy. The relevant matrix is the measurement matrix H\mathbf{H}. Recall that this matrix tells us how to convert from the Kalman filter's state x\mathbf{x} to a measurement z\mathbf{z}. Suppose that we decide that our Kalman filter state should contain the position and velocity of the train, so that

x=[xx˙]\mathbf{x} = \begin{bmatrix}x \\ \dot{x}\end{bmatrix}

We have two measurements for position, so we will define the measurement vector to be a vector of the measurements from the wheel and the position sensor.

z=[xwheelxps]\mathbf{z} = \begin{bmatrix}x_{wheel} \\ x_{ps}\end{bmatrix}

So we have to design the matrix H\mathbf{H} to convert x\mathbf{x} to z\mathbf{z} . They are both positions, so the conversion is nothing more than multiplying by one:

[xwheelxps]=[1010][xx˙]\begin{bmatrix}x_{wheel} \\ x_{ps}\end{bmatrix} = \begin{bmatrix}1 &0 \\ 1& 0\end{bmatrix} \begin{bmatrix}x \\ \dot{x}\end{bmatrix}

To make it clearer, suppose that the wheel reports not position but the number of rotations of the wheels, where 1 revolution yields 2 meters of travel. In that case we would write

[xwheelxps]=[0.5010][xx˙]\begin{bmatrix}x_{wheel} \\ x_{ps}\end{bmatrix} = \begin{bmatrix}0.5 &0 \\ 1& 0\end{bmatrix} \begin{bmatrix}x \\ \dot{x}\end{bmatrix}

Now we have to design the measurement noise matrix R\mathbf{R}. Suppose that the measurement variance for the position is twice the variance of the wheel, and the standard deviation of the wheel is 1.5 meters. That gives us

σwheel=1.5σwheel2=2.25σps=1.52=3σps2=9.\begin{aligned} \sigma_{wheel} &= 1.5\\ \sigma^2_{wheel} &= 2.25 \\ \sigma_{ps} &= 1.5*2 = 3 \\ \sigma^2_{ps} &= 9. \end{aligned}

That is pretty much our Kalman filter design. We need to design for Q\mathbf{Q}, but that is invariant to whether we are doing sensor fusion or not, so I will just choose some arbitrary value.

So let's run a simulation of this design. I will assume a velocity of 10 m/s with an update rate of 0.1 seconds.

from numpy import array, asarray import numpy.random as random def fusion_test(wheel_sigma, ps_sigma, do_plot=True): dt = 0.1 kf = KalmanFilter(dim_x=2, dim_z=2) kf.F = array([[1., dt], [0., 1.]]) kf.H = array([[1., 0.], [1., 0.]]) kf.x = array([[0.], [1.]]) kf.Q *= array([[(dt**3)/3, (dt**2)/2], [(dt**2)/2, dt ]]) * 0.02 kf.P *= 100 kf.R[0, 0] = wheel_sigma**2 kf.R[1, 1] = ps_sigma**2 random.seed(1123) xs, zs, nom = [], [], [] for i in range(1, 100): m0 = i + randn()*wheel_sigma m1 = i + randn()*ps_sigma z = array([[m0], [m1]]) kf.predict() kf.update(z) xs.append(kf.x.T[0]) zs.append(z.T[0]) nom.append(i) xs = asarray(xs) zs = asarray(zs) nom = asarray(nom) res = nom - xs[:, 0] print('fusion std: {:.3f}'.format(np.std(res))) if do_plot: bp.plot_measurements(zs[:, 0], label='Wheel') plt.plot(zs[:, 1], linestyle='--', label='Pos Sensor') bp.plot_filter(xs[:, 0], label='Kalman filter') plt.legend(loc=4) plt.ylim(0, 100) fusion_test(1.5, 3.0)
fusion std: 0.391
Image in a Jupyter notebook

We can see the result of the Kalman filter in blue.

It may be somewhat difficult to understand the previous example at an intuitive level. Let's look at a different problem. Suppose we are tracking an object in 2D space, and have two radar systems at different positions. Each radar system gives us a range and bearing to the target. How do the readings from each data affect the results?

This is a nonlinear problem because we need to use a trigonometry to compute coordinates from a range and bearing, and we have not yet learned how to solve nonlinear problems with Kalman filters. So for this problem ignore the code that I use and just concentrate on the charts that the code outputs. We will revisit this problem in subsequent chapters and learn how to write this code.

I will position the target at (100, 100). The first radar will be at (50, 50), and the second radar at (150, 50). This will cause the first radar to measure a bearing of 45 degrees, and the second will report 135 degrees.

I will create the Kalman filter first, and then plot its initial covariance matrix. I am using a unscented Kalman filter.

from kf_design_internal import sensor_fusion_kf, set_radar_pos kf = sensor_fusion_kf() x0, p0 = kf.x.copy(), kf.P.copy() plot_covariance_ellipse(x0, cov=p0, fc='y', ec=None, alpha=0.6)
Image in a Jupyter notebook

We are equally uncertain about the position in x and y, so the covariance is circular.

Now we will update the Kalman filter with a reading from the first radar. I will set the standard deviation of the bearing error at 0.5$^\circ$, and the standard deviation of the distance error at 3.

from math import radians # set the error of the radar's bearing and distance kf.R[0, 0] = radians (.5)**2 kf.R[1, 1] = 3.**2 # compute position and covariance from first radar station set_radar_pos((50, 50)) dist = (50**2 + 50**2) ** 0.5 kf.predict() kf.update([radians(45), dist]) # plot the results x1, p1 = kf.x.copy(), kf.P.copy() plot_covariance_ellipse(x0, p0, fc='y', ec=None, alpha=0.6) plot_covariance_ellipse(x1, p1, fc='g', ec='k', alpha=0.6) plt.scatter([100], [100], c='y', label='Initial') plt.scatter([100], [100], c='g', label='1st station') plt.legend(scatterpoints=1, markerscale=3) plt.plot([92, 100], [92, 100], c='g', lw=2, ls='--');
Image in a Jupyter notebook

We can see the effect of the errors on the geometry of the problem. The radar station is to the lower left of the target. The bearing measurement is extremely accurate at σ=0.5\sigma=0.5^\circ, but the distance error is inaccurate at σ=3\sigma=3. I've shown the radar reading with the dotted green line. We can easily see the effect of the accurate bearing and inaccurate distance in the shape of the covariance ellipse.

Now we can incorporate the second radar station's measurement. The second radar is at (150,50), which is below and to the right of the target. Before you go on, think about how you think the covariance will change when we incorporate this new reading.

# compute position and covariance from first radar station set_radar_pos((150, 50)) kf.predict() kf.update([radians(135), dist]) plot_covariance_ellipse(x0, p0, fc='y', ec='k', alpha=0.6) plot_covariance_ellipse(x1, p1, fc='g', ec='k', alpha=0.6) plot_covariance_ellipse(kf.x, kf.P, fc='b', ec='k', alpha=0.6) plt.scatter([100], [100], c='y', label='Initial') plt.scatter([100], [100], c='g', label='1st station') plt.scatter([100], [100], c='b', label='2nd station') plt.legend(scatterpoints=1, markerscale=3) plt.plot([92, 100], [92, 100], c='g', lw=2, ls='--') plt.plot([108, 100], [92, 100], c='b', lw=2, ls='--');
Image in a Jupyter notebook

We can see how the second radar measurement altered the covariance. The angle to the target is orthogonal to the first radar station, so the effects of the error in the bearing and range are swapped. So the angle of the covariance matrix switches to match the direction to the second station. It is important to note that the direction did not merely change; the size of the covariance matrix became much smaller as well.

The covariance will always incorporate all of the information available, including the effects of the geometry of the problem. This formulation makes it particularly easy to see what is happening, but the same thing occurs if one sensor gives you position and a second sensor gives you velocity, or if two sensors provide measurements of position.

One final thing before we move on: sensor fusion is a vast topic, and my coverage is simplistic to the point of being misleading. For example, the Kalman Filter Math chapter talks about using iterated least squares to determine the position from a set of pseudorange readings from the satellites without using a Kalman filter. That is the usual but not exclusive way this computation is done in GPS receivers. If you are a hobbiest my coverage may get you started. A commercial grade filter requires very careful design of the fusion process. That is the topic of several books, and you will have to further your education by finding one that covers your domain.

Exercise: Can you Kalman Filter GPS outputs?

In the section above I have you apply a Kalman filter to 'GPS-like' sensor. Can you apply a Kalman filter to the output of a commercial Kalman filter? In other words, will the output of your filter be better than, worse than, or equal to the GPS's output?

Solution

Commercial GPS's have a Kalman filter built into them, and their output is the filtered estimate created by that filter. So, suppose you have a steady stream of output from the GPS consisting of a position and position error. Can you not pass those two pieces of data into your own filter?

Well, what are the characteristics of that data stream, and more importantly, what are the fundamental requirements of the input to the Kalman filter?

Inputs to the Kalman filter must be Gaussian and time independent. The output of the GPS is time dependent because the filter bases its current estimate on the recursive estimates of all previous measurements. Hence, the signal is not white, it is not time independent, and if you pass that data into a Kalman filter you have violated the mathematical requirements of the filter. So, the answer is no, you cannot get better estimates by running a KF on the output of a commercial GPS.

Another way to think of it is that Kalman filters are optimal in a least squares sense. There is no way to take an optimal solution, pass it through a filter, any filter, and get a 'more optimal' answer because it is a logical impossibility. At best the signal will be unchanged, in which case it will still be optimal, or it will be changed, and hence no longer optimal.

This is a difficult problem that hobbyists face when trying to integrate GPS, IMU's and other off the shelf sensors.

Let's look at the effect. A commercial GPS reports position, and an estimated error range. The estimated error just comes from the Kalman filter's P\mathbf{P} matrix. So let's filter some noisy data, take the filtered output as the new noisy input to the filter, and see what the result is. In other words, x\mathbf{x} will supply the z\mathbf{z} input, and P\mathbf{P} will supply the measurement covariance R\mathbf{R}. To exaggerate the effects somewhat to make them more obvious I will plot the effects of doing this one, and then a second time. The second iteration doesn't make any 'sense' (no one would try that), it just helps me illustrate a point. First, the code and plots.

np.random.seed(124) xs, zs = simulate_acc_system(R=R, Q=Q, count=30) kf0 = SecondOrderKF(R, Q, dt=1) kf1 = SecondOrderKF(R, Q, dt=1) kf2 = SecondOrderKF(R, Q, dt=1) # Filter measurements fxs0, ps0 = filter_data(kf0, zs) # filter twice more, using P as error fxs1, ps1, _, _ = kf1.batch_filter(fxs0[:, 0], ps[:, 0]) fxs2, _, _, _ = kf2.batch_filter(fxs1[:, 0], ps1[:, 0, 0]) plot_kf_output(xs, fxs0, zs, title='KF', aspect_equal=False) plot_kf_output(xs, fxs1, zs, title='1 iteration', aspect_equal=False) plot_kf_output(xs, fxs2, zs, title='2 iterations', aspect_equal=False)
Image in a Jupyter notebookImage in a Jupyter notebookImage in a Jupyter notebook

We see that the filtered output of the reprocessed signal is smoother, but it also diverges from the track. What is happening? Recall that the Kalman filter requires that the signal not be time correlated. However the output of the Kalman filter is time correlated because it incorporates all previous measurements into its estimate for this time period. So look at the last graph, for 2 iterations. The measurements start with several peaks that are larger than the track. This is 'remembered' (that is vague terminology, but I am trying to avoid the math) by the filter, and it has started to compute that the object is above the track. Later, at around 13 seconds we have a period where the measurements all happen to be below the track. This also gets incorporated into the memory of the filter, and the iterated output diverges far below the track.

Now let's look at this in a different way. The iterated output is not using z\mathbf{z} as the measurement, but the output of the previous Kalman filter estimate. So I will plot the output of the filter against the previous filter's output.

plot_kf_output(xs, fxs0, zs, title='KF', aspect_equal=False) plot_kf_output(xs, fxs1, fxs0[:, 0], title='1 iteration', aspect_equal=False) plot_kf_output(xs, fxs2, fxs1[:, 0], title='2 iterations', aspect_equal=False)
Image in a Jupyter notebookImage in a Jupyter notebookImage in a Jupyter notebook

I hope the problem with this approach is now apparent. In the bottom graph we can see that the KF is tracking the imperfect estimates of the previous filter, and incorporating delay into the signal as well due to the memory of the previous measurements being incorporated into the signal.

Exercise: Prove that the position sensor improves the filter

Devise a way to prove that fusing the position sensor and wheel measurements yields a better result than using the wheel alone.

Solution 1

Force the Kalman filter to disregard the position sensor measurement by setting the measurement noise for the position sensor to a near infinite value. Re-run the filter and observe the standard deviation of the residual.

fusion_test(1.5, 3.0, do_plot=False) fusion_test(1.5, 1e80, do_plot=False)
fusion std: 0.391 fusion std: 0.438

Here we can see the error in the filter where the position sensor measurement is almost entirely ignored is greater than that where it is used.

Solution 2

This is more work, but we can write a Kalman filter that only takes one measurement.

dt = 0.1 wheel_sigma = 1.5 kf = KalmanFilter(dim_x=2, dim_z=1) kf.F = array([[1., dt], [0., 1.]]) kf.H = array([[1., 0.]]) kf.x = array([[0.], [1.]]) kf.Q *= 0.01 kf.P *= 100 kf.R[0, 0] = wheel_sigma**2 random.seed(1123) nom = range(1, 100) zs = np.array([i + randn()*wheel_sigma for i in nom]) xs, _, _, _ = kf.batch_filter(zs) res = nom - xs[:, 0] print('std: {:.3f}'.format(np.std(res))) bp.plot_filter(xs[:, 0], label='Kalman filter') bp.plot_measurements(zs, label='Wheel') plt.legend(loc=4) plt.xlim((-1, 100));
std: 40.386
Image in a Jupyter notebook

On this run I got a standard deviation of 0.523 vs the value of 0.391 for the fused measurements.

Exercise: Different Data Rates

It is rare that two different sensor classes output data at the same rate. Assume that the position sensor produces an update at 1 Hz, and the wheel updates at 4 Hz. Design a filter that incorporates all of these measurements.

hint: This is a difficult problem in that I have not explained how to do this. Think about which matrices incorporate time, and which incorporate knowledge about the number and kind of measurements. All of these will have to be designed to work with this problem. If you can correctly enumerate those matrices you are most of the way to a solution.

#your solution here!

Solution

We can do this by setting the data rate to 0.25 seconds, which is 4 Hz. As we loop, on every iteration we call update() with only the wheel measurement. Then, every fourth time we will call update() with both the wheel and PS measurements.

This means that we vary the amount of data in the z parameter. The matrices associated with the measurement are H\mathbf{H} and R\mathbf{R}. In the code above we designed H to be

H=[1010]\mathbf{H} = \begin{bmatrix}1 &0 \\ 1& 0\end{bmatrix}

to account for the two measurements of position. When only the wheel reading is available, we must set

H=[10].\mathbf{H} = \begin{bmatrix}1 &0\end{bmatrix}.

The matrix R\mathbf{R} specifies the noise in each measurement. In the code above we set

R=[σwheel200σps2]\mathbf{R} = \begin{bmatrix}\sigma_{wheel}^2 &0 \\ 0 & \sigma_{ps}^2\end{bmatrix}

When only the wheel measurement is available, we must set

R=[σwheel2]\mathbf{R} = \begin{bmatrix}\sigma_{wheel}^2\end{bmatrix}

The two matrices that incorporate time are F\mathbf{F} and Q\mathbf{Q}. For example,

F=[1Δt01].\mathbf{F} = \begin{bmatrix}1 & \Delta t \\ 0& 1\end{bmatrix}.

Since the wheel and position sensor reading coincide once every 4 readings we can just set δt=0.25\delta t =0.25 and not modify it while filtering. If the readings did not coincide in each iteration you would have to calculate how much time has passed since the last predict, compute a new F\mathbf{F} and Q\mathbf{Q}, and then call predict() so the filter could make a correct prediction based on the time step required.

So here is the code.

def fusion_test(wheel_sigma, ps_sigma, do_plot=True): dt = 0.25 kf = KalmanFilter(dim_x=2, dim_z=2) kf.F = array([[1., dt], [0., 1.]]) kf.H = array([[1., 0.], [1., 0.]]) kf.x = array([[0.], [1.]]) kf.Q *= array([[(dt**3)/3, (dt**2)/2], [(dt**2)/2, dt ]]) * 0.02 kf.P *= 100 kf.R[0, 0] = wheel_sigma**2 kf.R[1, 1] = ps_sigma**2 random.seed(1123) xs = [] zs_wheel = [] zs_ps = [] nom = [] for i in range(1, 101): if i % 4 == 0: m0 = i + randn()*wheel_sigma m1 = i + randn()*ps_sigma z = array([[m0], [m1]]) kf.H = array([[1., 0.], [1., 0.]]) R = np.eye(2) R[0, 0] = wheel_sigma**2 R[1, 1] = ps_sigma**2 zs_wheel.append(m0) zs_ps.append((i, m1)) else: m0 = i + randn()*wheel_sigma z = array([m0]) kf.H = array([[1., 0.]]) R = np.eye(1) * wheel_sigma**2 zs_wheel.append(m0) kf.predict() kf.update(z, R) xs.append(kf.x.T[0]) nom.append(i) xs = asarray(xs) nom = asarray(nom) res = nom - xs[:, 0] print('fusion std: {:.3f}'.format(np.std(res))) if do_plot: bp.plot_measurements(zs_wheel, label='Wheel') plt.plot(*zip(*zs_ps), linestyle='--', label='Pos Sensor') bp.plot_filter(xs[:, 0], label='Kalman filter') plt.legend(loc=4) plt.ylim(0, 100) fusion_test(1.5, 3.0);
fusion std: 0.452
Image in a Jupyter notebook

We can see from the standard deviation that the performance is a bit worse than when the PS and wheel were measured in every update, but better than the wheel alone.

The code is fairly straightforward. The update() function optionally takes R as an argument, and I chose to do that rather than alter KalmanFilter.R, mostly to show that it is possible. Either way is fine. I modified KalmanFilter.H on each update depending on whether there are 1 or 2 measurements available. The only other difficulty was storing the wheel and PS measurements in two different arrays because there are a different number of measurements for each.

Tracking a Ball

Now let's turn our attention to a situation where the physics of the object that we are tracking is constrained. A ball thrown in a vacuum must obey Newtonian laws. In a constant gravitational field it will travel in a parabola. I will assume you are familiar with the derivation of the formula:

y=g2t2+vy0t+y0x=vx0t+x0\begin{aligned} y &= \frac{g}{2}t^2 + v_{y0} t + y_0 \\ x &= v_{x0} t + x_0 \end{aligned}

where gg is the gravitational constant, tt is time, vx0v_{x0} and vy0v_{y0} are the initial velocities in the x and y plane. If the ball is thrown with an initial velocity of vv at angle θ\theta above the horizon, we can compute vx0v_{x0} and vy0v_{y0} as

vx0=vcosθvy0=vsinθ\begin{aligned} v_{x0} = v \cos{\theta} \\ v_{y0} = v \sin{\theta} \end{aligned}

Because we don't have real data we will start by writing a simulator for a ball. As always, we add a noise term independent of time so we can simulate noise sensors.

from math import radians, sin, cos import math def rk4(y, x, dx, f): """computes 4th order Runge-Kutta for dy/dx. y is the initial value for y x is the initial value for x dx is the difference in x (e.g. the time step) f is a callable function (y, x) that you supply to compute dy/dx for the specified values. """ k1 = dx * f(y, x) k2 = dx * f(y + 0.5*k1, x + 0.5*dx) k3 = dx * f(y + 0.5*k2, x + 0.5*dx) k4 = dx * f(y + k3, x + dx) return y + (k1 + 2*k2 + 2*k3 + k4) / 6. def fx(x,t): return fx.vel def fy(y,t): return fy.vel - 9.8*t class BallTrajectory2D(object): def __init__(self, x0, y0, velocity, theta_deg=0., g=9.8, noise=[0.0, 0.0]): self.x = x0 self.y = y0 self.t = 0 theta = math.radians(theta_deg) fx.vel = math.cos(theta) * velocity fy.vel = math.sin(theta) * velocity self.g = g self.noise = noise def step(self, dt): self.x = rk4(self.x, self.t, dt, fx) self.y = rk4(self.y, self.t, dt, fy) self.t += dt return (self.x + randn()*self.noise[0], self.y + randn()*self.noise[1])

So to create a trajectory starting at (0, 15) with a velocity of 100ms100 \frac{m}{s} and an angle of 6060^\circ we would write:

traj = BallTrajectory2D(x0=0, y0=15, velocity=100, theta_deg=60)

and then call traj.position(t) for each time step. Let's test this

def test_ball_vacuum(noise): y = 15 x = 0 ball = BallTrajectory2D(x0=x, y0=y, theta_deg=60., velocity=100., noise=noise) t = 0 dt = 0.25 while y >= 0: x, y = ball.step(dt) t += dt if y >= 0: plt.scatter(x, y) plt.axis('equal'); #test_ball_vacuum([0, 0]) # plot ideal ball position test_ball_vacuum([1, 1]) # plot with noise
Image in a Jupyter notebook

This looks reasonable, so let's continue (exercise for the reader: validate this simulation more robustly).

Step 1: Choose the State Variables

We might think to use the same state variables as used for tracking the dog. However, this will not work. Recall that the Kalman filter state transition must be written as x=Fx+Bu\mathbf{x}^- = \mathbf{Fx} + \mathbf{Bu}, which means we must calculate the current state from the previous state. Our assumption is that the ball is traveling in a vacuum, so the velocity in x is a constant, and the acceleration in y is solely due to the gravitational constant gg. We can discretize the Newtonian equations using the well known Euler method in terms of Δt\Delta t are:

xt=xt1+vx(t1)Δtvxt=vxt1yt=yt1+vy(t1)Δtvyt=gΔt+vy(t1)\begin{aligned} x_t &= x_{t-1} + v_{x(t-1)} {\Delta t} \\ v_{xt} &= vx_{t-1} \\ y_t &= y_{t-1} + v_{y(t-1)} {\Delta t}\\ v_{yt} &= -g {\Delta t} + v_{y(t-1)} \\ \end{aligned}

sidebar: Euler's method integrates a differential equation stepwise by assuming the slope (derivative) is constant at time tt. In this case the derivative of the position is velocity. At each time step Δt\Delta t we assume a constant velocity, compute the new position, and then update the velocity for the next time step. There are more accurate methods, such as Runge-Kutta available to us, but because we are updating the state with a measurement in each step Euler's method is very accurate. If you need to use Runge-Kutta you will have to write your own predict() function which computes the state transition for x\mathbf{x}, and then uses the normal Kalman filter equation P=FPFT+Q\mathbf{P}=\mathbf{FPF}^\mathsf{T} + \mathbf{Q} to update the covariance matrix.

This implies that we need to incorporate acceleration for yy into the Kalman filter, but not for xx. This suggests the following state variables.

x=[xx˙yy˙y¨]\mathbf{x} = \begin{bmatrix} x \\ \dot{x} \\ y \\ \dot{y} \\ \ddot{y} \end{bmatrix}

However, the acceleration is due to gravity, which is a constant. Instead of asking the Kalman filter to track a constant we can treat gravity as what it really is - a control input. In other words, gravity is a force that alters the behavior of the system in a known way, and it is applied throughout the flight of the ball.

The equation for the state prediction is x=Fx+Bu\mathbf{x^-} = \mathbf{Fx} + \mathbf{Bu}. Fx\mathbf{Fx} is the familiar state transition function which we will use to model the position and velocity of the ball. The vector u\mathbf{u} lets you specify a control input into the filter. For a car the control input will be things such as the amount the accelerator and brake are pressed, the position of the steering wheel, and so on. For our ball the control input will be gravity. The matrix B\mathbf{B} models how the control inputs affect the behavior of the system. Again, for a car B\mathbf{B} will convert the inputs of the brake and accelerator into changes of velocity, and the input of the steering wheel into a different position and heading. For our ball tracking problem it will compute the velocity change due to gravity. We will go into the details of that in step 3. For now, we design the state variable to be

x=[xx˙yy˙]\mathbf{x} = \begin{bmatrix} x \\ \dot{x} \\ y \\ \dot{y} \end{bmatrix}

Step 2: Design State Transition Function

Our next step is to design the state transition function. Recall that the state transistion function is implemented as a matrix F\mathbf{F} that we multiply with the previous state of our system to get the next statex=Fx\mathbf{x}' = \mathbf{Fx}.

I will not belabor this as it is very similar to the 1-D case we did in the previous chapter. Our state equations for position and velocity would be:

x=(1x)+(Δtvx)+(0y)+(0vy)vx=(0x)+(1vx)+(0y)+(0vy)y=(0x)+(0vx)+(1y)+(Δtvy)vy=(0x)+(0vx)+(0y)+(1vy)\begin{aligned} x' &= (1*x) + (\Delta t * v_x) + (0*y) + (0 * v_y) \\ v_x &= (0*x) + (1*v_x) + (0*y) + (0 * v_y) \\ y' &= (0*x) + (0* v_x) + (1*y) + (\Delta t * v_y) \\ v_y &= (0*x) + (0*v_x) + (0*y) + (1*v_y) \end{aligned}

Note that none of the terms include gg, the gravitational constant. As I explained in the previous function we will account for gravity using the control input of the Kalman filter. In matrix form we write this as:

F=[1Δt000100001Δt0001]\mathbf{F} = \begin{bmatrix} 1 & \Delta t & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & \Delta t \\ 0 & 0 & 0 & 1 \end{bmatrix}

Step 3: Design the Control Input Function

We will use the control input to account for the force of gravity. The term Bu\mathbf{Bu} is added to x\mathbf{x^-} to account for how much x\mathbf{x^-} changes due to gravity. We can say that Bu\mathbf{Bu} contains [ΔxgΔxg˙ΔygΔyg˙]T\begin{bmatrix}\Delta x_g & \Delta \dot{x_g} & \Delta y_g & \Delta \dot{y_g}\end{bmatrix}^\mathsf{T}.

If we look at the discretized equations we see that gravity only affect the velocity for yy.

xt=xt1+vx(t1)Δtvxt=vxt1yt=yt1+vy(t1)Δtvyt=gΔt+vy(t1)\begin{aligned} x_t &= x_{t-1} + v_{x(t-1)} {\Delta t} \\ v_{xt} &= vx_{t-1} \\ y_t &= y_{t-1} + v_{y(t-1)} {\Delta t}\\ v_{yt} &= -g {\Delta t} + v_{y(t-1)} \\ \end{aligned}

Therefore we want the product Bu\mathbf{Bu} to equal [000gΔt]T\begin{bmatrix}0 & 0 & 0 & -g \Delta t \end{bmatrix}^\mathsf{T}. In some sense it is arbitrary how we define B\mathbf{B} and u\mathbf{u} so long as multiplying them together yields this result. For example, we could define B=1\mathbf{B}=1 and u=[000gΔt]T\mathbf{u} = \begin{bmatrix}0 & 0 & 0 & -g \Delta t \end{bmatrix}^\mathsf{T}. But this doesn't really fit with our definitions for B\mathbf{B} and u\mathbf{u}, where u\mathbf{u} is the control input, and B\mathbf{B} is the control function. The control input is g-g for the velocity of y. So this is one possible definition.

B=[000000000000000Δt],u=[000g]\mathbf{B} = \begin{bmatrix}0&0&0&0 \\ 0&0&0&0 \\0&0&0&0 \\0&0&0&\Delta t\end{bmatrix}, \mathbf{u} = \begin{bmatrix}0\\0\\0\\-g\end{bmatrix}

To me this seems a bit excessive; I would suggest we might want u\mathbf{u} to contain the control input for the two dimensions xx and yy, which suggests

B=[0000000Δt],u=[0g]\mathbf{B} = \begin{bmatrix}0&0 \\ 0&0 \\0&0 \\0&\Delta t\end{bmatrix}, \mathbf{u} = \begin{bmatrix}0\\-g\end{bmatrix}

You might prefer to only provide control inputs that actually exist, and there is no control input for xx, so we arrive at

B=[000Δt],u=[g]\mathbf{B} = \begin{bmatrix}0 \\ 0 \\0\\ \Delta t\end{bmatrix}, \mathbf{u} = \begin{bmatrix}-g\end{bmatrix}

I've seen people use B=[0000000000000001],u=[000gΔt]\mathbf{B} = \begin{bmatrix}0&0&0&0 \\ 0&0&0&0 \\0&0&0&0 \\0&0&0&1\end{bmatrix}, \mathbf{u} = \begin{bmatrix}0\\0\\0\\-g \Delta t\end{bmatrix}

While this does produce the correct result, I am resistant to putting time into u\mathbf{u} as time is not a control input, it is what we use to convert the control input into a change in state, which is the job of B\mathbf{B}.

Step 4: Design the Measurement Function

The measurement function defines how we go from the state variables to the measurements using the equation z=Hx\mathbf{z} = \mathbf{Hx}. We will assume that we have a sensor that provides us with the position of the ball in (x,y), but cannot measure velocities or accelerations. Therefore our function must be:

[zxzy]=[10000010][xx˙yy˙]\begin{bmatrix}z_x \\ z_y \end{bmatrix}= \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} * \begin{bmatrix} x \\ \dot{x} \\ y \\ \dot{y} \end{bmatrix}

where

H=[10000010]\mathbf{H} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}

Step 5: Design the Measurement Noise Matrix

As with the robot, we will assume that the error is independent in xx and yy. In this case we will start by assuming that the measurement error in x and y are 0.5 meters. Hence,

R=[0.5000.5]\mathbf{R} = \begin{bmatrix}0.5&0\\0&0.5\end{bmatrix}

Step 6: Design the Process Noise Matrix

We are assuming a ball moving in a vacuum, so there should be no process noise. We have 4 state variables, so we need a 4×44{\times}4 covariance matrix:

Q=[0000000000000000]\mathbf{Q} = \begin{bmatrix}0&0&0&0\\0&0&0&0\\0&0&0&0\\0&0&0&0\end{bmatrix}

Step 7: Design the Initial Conditions

We already performed this step when we tested the state transition function. Recall that we computed the initial velocity for xx and yy using trigonometry, and set the value of x\mathbf{x} with:

omega = radians(omega) vx = cos(omega) * v0 vy = sin(omega) * v0 f1.x = np.array([[x, vx, y, vy]]).T

With all the steps done we are ready to implement our filter and test it. First, the implementation:

from math import sin,cos,radians def ball_kf(x, y, omega, v0, dt, r=0.5, q=0.): kf = KalmanFilter(dim_x=4, dim_z=2, dim_u=1) ay = .5*dt**2 kf.F = np.array([[1, dt, 0, 0], # x = x0 + dx*dt [0, 1, 0, 0], # dx = dx0 [0, 0, 1, dt], # y = y0 + dy*dt [0, 0, 0, 1]]) # dy = dy0 kf.H = np.array([[1, 0, 0, 0], [0, 0, 1, 0]]) kf.B = np.array([[0,0,0,.1]]).T kf.R *= r kf.Q *= q omega = radians(omega) vx = cos(omega) * v0 vy = sin(omega) * v0 kf.x = np.array([[x, vx, y, vy]]).T return kf

Now we will test the filter by generating measurements for the ball using the ball simulation class.

y = 1. x = 0. theta = 35. # launch angle v0 = 80. dt = 1/10. # time step g = 9.8 # gravitational constant ball = BallTrajectory2D(x0=x, y0=y, theta_deg=theta, velocity=v0, noise=[.2, .2]) kf = ball_kf(x, y, theta, v0, dt) t = 0 xs = [] ys = [] u = -g while kf.x[2, 0] > 0: t += dt x, y = ball.step(dt) z = np.array([[x, y]]).T kf.update(z) xs.append(kf.x[0, 0]) ys.append(kf.x[2, 0]) kf.predict(u=u) p1 = plt.scatter(x, y, color='r', marker='.', s=75, alpha=0.5) p2, = plt.plot(xs, ys, lw=2) plt.legend([p2, p1], ['Kalman filter', 'Measurements'], scatterpoints=1);
Image in a Jupyter notebook

We see that the Kalman filter reasonably tracks the ball. However, as already explained, this is a trivial example because we have no process noise. We can predict trajectories in a vacuum with arbitrary precision; using a Kalman filter in this example is a needless complication. A least squares curve fit would give identical results.

Tracking a Ball in Air

author's note - I originally had ball tracking code in 2 different places in the book. One has been copied here, so now we have 2 sections on ball tracking. I need to edit this into one section, obviously. Sorry for the duplication.

We are now ready to design a practical Kalman filter application. For this problem we assume that we are tracking a ball traveling through the Earth's atmosphere. The path of the ball is influenced by wind, drag, and the rotation of the ball. We will assume that our sensor is a camera; code that we will not implement will perform some type of image processing to detect the position of the ball. This is typically called blob detection in computer vision. However, image processing code is not perfect; in any given frame it is possible to either detect no blob or to detect spurious blobs that do not correspond to the ball. Finally, we will not assume that we know the starting position, angle, or rotation of the ball; the tracking code will have to initiate tracking based on the measurements that are provided. The main simplification that we are making here is a 2D world; we assume that the ball is always traveling orthogonal to the plane of the camera's sensor. We have to make that simplification at this point because we have not yet discussed how we might extract 3D information from a camera, which necessarily provides only 2D data.

Implementing Air Drag

Our first step is to implement the math for a ball moving through air. There are several treatments available. A robust solution takes into account issues such as ball roughness (which affects drag non-linearly depending on velocity), the Magnus effect (spin causes one side of the ball to have higher velocity relative to the air vs the opposite side, so the coefficient of drag differs on opposite sides), the effect of lift, humidity, air density, and so on. I assume the reader is not interested in the details of ball physics, and so will restrict this treatment to the effect of air drag on a non-spinning baseball. I will use the math developed by Nicholas Giordano and Hisao Nakanishi in Computational Physics [1997].

Important: Before I continue, let me point out that you will not have to understand this next piece of physics to proceed with the Kalman filter. My goal is to create a reasonably accurate behavior of a baseball in the real world, so that we can test how our Kalman filter performs with real-world behavior. In real world applications it is usually impossible to completely model the physics of a real world system, and we make do with a process model that incorporates the large scale behaviors. We then tune the measurement noise and process noise until the filter works well with our data. There is a real risk to this; it is easy to finely tune a Kalman filter so it works perfectly with your test data, but performs badly when presented with slightly different data. This is perhaps the hardest part of designing a Kalman filter, and why it gets referred to with terms such as 'black art'.

I dislike books that implement things without explanation, so I will now develop the physics for a ball moving through air. Move on past the implementation of the simulation if you are not interested.

A ball moving through air encounters wind resistance. This imparts a force on the wall, called drag, which alters the flight of the ball. In Giordano this is denoted as

Fdrag=B2v2F_{drag} = -B_2v^2

where B2B_2 is a coefficient derived experimentally, and vv is the velocity of the object. FdragF_{drag} can be factored into xx and yy components with

Fdrag,x=B2vvxFdrag,y=B2vvyF_{drag,x} = -B_2v v_x\\ F_{drag,y} = -B_2v v_y

If mm is the mass of the ball, we can use F=maF=ma to compute the acceleration as

ax=B2mvvxay=B2mvvya_x = -\frac{B_2}{m}v v_x\\ a_y = -\frac{B_2}{m}v v_y

Giordano provides the following function for B2m\frac{B_2}{m}, which takes air density, the cross section of a baseball, and its roughness into account. Understand that this is an approximation based on wind tunnel tests and several simplifying assumptions. It is in SI units: velocity is in meters/sec and time is in seconds.

B2m=0.0039+0.00581+exp[(v35)/5]\frac{B_2}{m} = 0.0039 + \frac{0.0058}{1+\exp{[(v-35)/5]}}

Starting with this Euler discretization of the ball path in a vacuum: x=vxΔty=vyΔtvx=vxvy=vy9.8Δt\begin{aligned} x &= v_x \Delta t \\ y &= v_y \Delta t \\ v_x &= v_x \\ v_y &= v_y - 9.8 \Delta t \end{aligned}

We can incorporate this force (acceleration) into our equations by incorporating accelΔtaccel * \Delta t into the velocity update equations. We should subtract this component because drag will reduce the velocity. The code to do this is quite straightforward, we just need to break out the Force into xx and yy components.

I will not belabor this issue further because the computational physics is beyond the scope of this book. Recognize that a higher fidelity simulation would require incorporating things like altitude, temperature, ball spin, and several other factors. My intent here is to impart some real-world behavior into our simulation to test how our simpler prediction model used by the Kalman filter reacts to this behavior. Your process model will never exactly model what happens in the world, and a large factor in designing a good Kalman filter is carefully testing how it performs against real world data.

The code below computes the behavior of a baseball in air, at sea level, in the presence of wind. I plot the same initial hit with no wind, and then with a tail wind at 10 mph. Baseball statistics are universally done in US units, and we will follow suit here (http://en.wikipedia.org/wiki/United_States_customary_units). Note that the velocity of 110 mph is a typical exit speed for a baseball for a home run hit.

from math import sqrt, exp, cos, sin, radians def mph_to_mps(x): return x * .447 def drag_force(velocity): """ Returns the force on a baseball due to air drag at the specified velocity. Units are SI""" return (0.0039 + 0.0058 / (1. + exp((velocity-35.)/5.))) * velocity v = mph_to_mps(110.) y = 1 x = 0 dt = .1 theta = radians(35) def solve(x, y, vel, v_wind, launch_angle): xs = [] ys = [] v_x = vel*cos(launch_angle) v_y = vel*sin(launch_angle) while y >= 0: # Euler equations for x and y x += v_x*dt y += v_y*dt # force due to air drag velocity = sqrt ((v_x-v_wind)**2 + v_y**2) F = drag_force(velocity) # euler's equations for vx and vy v_x = v_x - F*(v_x-v_wind)*dt v_y = v_y - 9.8*dt - F*v_y*dt xs.append(x) ys.append(y) return xs, ys x, y = solve(x=0, y=1, vel=v, v_wind=0, launch_angle=theta) p1 = plt.scatter(x, y, color='blue', label='no wind') x, y = solve(x=0, y=1, vel=v, v_wind=mph_to_mps(10), launch_angle=theta) p2 = plt.scatter(x, y, color='green', marker="v", label='10mph wind') plt.legend(scatterpoints=1);
Image in a Jupyter notebook

We can easily see the difference between the trajectory in a vacuum and in the air. I used the same initial velocity and launch angle in the ball in a vacuum section above. We computed that the ball in a vacuum would travel over 240 meters (nearly 800 ft). In the air, the distance is just over 120 meters, or roughly 400 ft. 400ft is a realistic distance for a well hit home run ball, so we can be confident that our simulation is reasonably accurate.

Without further ado we will create a ball simulation that uses the math above to create a more realistic ball trajectory. I will note that the nonlinear behavior of drag means that there is no analytic solution to the ball position at any point in time, so we need to compute the position step-wise. I use Euler's method to propagate the solution; use of a more accurate technique such as Runge-Kutta is left as an exercise for the reader. That modest complication is unnecessary for what we are doing because the accuracy difference between the techniques will be small for the time steps we will be using.

from math import radians, sin, cos, sqrt, exp class BaseballPath(object): def __init__(self, x0, y0, launch_angle_deg, velocity_ms, noise=(1.0, 1.0)): """ Create 2D baseball path object (x = distance from start point in ground plane, y=height above ground) x0,y0 initial position launch_angle_deg angle ball is travelling respective to ground plane velocity_ms speeed of ball in meters/second noise amount of noise to add to each position in (x, y) """ omega = radians(launch_angle_deg) self.v_x = velocity_ms * cos(omega) self.v_y = velocity_ms * sin(omega) self.x = x0 self.y = y0 self.noise = noise def drag_force(self, velocity): """ Returns the force on a baseball due to air drag at the specified velocity. Units are SI """ B_m = 0.0039 + 0.0058 / (1. + exp((velocity-35.)/5.)) return B_m * velocity def update(self, dt, vel_wind=0.): """ compute the ball position based on the specified time step and wind velocity. Returns (x, y) position tuple. """ # Euler equations for x and y self.x += self.v_x*dt self.y += self.v_y*dt # force due to air drag v_x_wind = self.v_x - vel_wind v = sqrt(v_x_wind**2 + self.v_y**2) F = self.drag_force(v) # Euler's equations for velocity self.v_x = self.v_x - F*v_x_wind*dt self.v_y = self.v_y - 9.81*dt - F*self.v_y*dt return (self.x + randn()*self.noise[0], self.y + randn()*self.noise[1])

Now we can test the Kalman filter against measurements created by this model.

y = 1. x = 0. theta = 35. # launch angle v0 = 50. dt = 1/10. # time step ball = BaseballPath(x0=x, y0=y, launch_angle_deg=theta, velocity_ms=v0, noise=[.3,.3]) f1 = ball_kf(x, y, theta, v0, dt, r=1.) f2 = ball_kf(x, y, theta, v0, dt, r=10.) t = 0 xs = [] ys = [] xs2 = [] ys2 = [] while f1.x[2, 0] > 0: t += dt x, y = ball.update(dt) z = np.array([[x, y]]).T f1.update(z) f2.update(z) xs.append(f1.x[0, 0]) ys.append(f1.x[2, 0]) xs2.append(f2.x[0, 0]) ys2.append(f2.x[2, 0]) f1.predict() f2.predict() p1 = plt.scatter(x, y, color='r', marker='o', s=75, alpha=0.5) p2, = plt.plot(xs, ys, lw=2) p3, = plt.plot(xs2, ys2, lw=4) plt.legend([p1, p2, p3], ['Measurements', 'Kalman filter(R=0.5)', 'Kalman filter(R=10)'], loc='best', scatterpoints=1);
Image in a Jupyter notebook

I have plotted the output of two different Kalman filter settings. The measurements are depicted as green circles, a Kalman filter with R=0.5 as a thin blue line, and a Kalman filter with R=10 as a thick red line. These R values are chosen merely to show the effect of measurement noise on the output, they are not intended to imply a correct design.

We can see that neither filter does very well. At first both track the measurements well, but as time continues they both diverge. This happens because the state model for air drag is nonlinear and the Kalman filter assumes that it is linear. If you recall our discussion about nonlinearity in the g-h filter chapter we showed why a g-h filter will always lag behind the acceleration of the system. We see the same thing here - the acceleration is negative, so the Kalman filter consistently overshoots the ball position. There is no way for the filter to catch up so long as the acceleration continues, so the filter will continue to diverge.

What can we do to improve this? The best approach is to perform the filtering with a nonlinear Kalman filter, and we will do this in subsequent chapters. However, there is also what I will call an 'engineering' solution to this problem as well. Our Kalman filter assumes that the ball is in a vacuum, and thus that there is no process noise. However, since the ball is in air the atmosphere imparts a force on the ball. We can think of this force as process noise. This is not a particularly rigorous thought; for one thing, this force is anything but Gaussian. Secondly, we can compute this force, so throwing our hands up and saying 'it's random' will not lead to an optimal solution. But let's see what happens if we follow this line of thought.

The following code implements the same Kalman filter as before, but with a non-zero process noise. I plot two examples, one with Q=.1, and one with Q=0.01.

def plot_ball_with_q(q, r=1., noise=0.3): y = 1. x = 0. theta = 35. # launch angle v0 = 50. dt = 1/10. # time step ball = BaseballPath(x0=x, y0=y, launch_angle_deg=theta, velocity_ms=v0, noise=[noise,noise]) f1 = ball_kf(x, y, theta, v0, dt, r=r, q=q) t = 0 xs = [] ys = [] while f1.x[2, 0] > 0: t += dt x, y = ball.update(dt) z = np.array([[x, y]]).T f1.update(z) xs.append(f1.x[0, 0]) ys.append(f1.x[2, 0]) f1.predict() p1 = plt.scatter(x, y, color='r', marker='o', s=75, alpha=0.5) p2, = plt.plot(xs, ys, lw=2) plt.legend([p1, p2], ['Measurements', 'Kalman filter'], scatterpoints=1) plt.show() plot_ball_with_q(0.01) plot_ball_with_q(0.1)
Image in a Jupyter notebookImage in a Jupyter notebook

The second filter tracks the measurements fairly well. There appears to be a bit of lag, but very little.

Is this a good technique? Usually not, but it depends. Here the nonlinearity of the force on the ball is fairly constant and regular. Assume we are trying to track an automobile - the accelerations will vary as the car changes speeds and turns. When we make the process noise higher than the actual noise in the system the filter will opt to weigh the measurements higher. If you don't have a lot of noise in your measurements this might work for you. However, consider this next plot where I have increased the noise in the measurements.

plot_ball_with_q(0.01, r=3, noise=3.) plot_ball_with_q(0.1, r=3, noise=3.)
Image in a Jupyter notebookImage in a Jupyter notebook

This output is terrible. The filter has no choice but to give more weight to the measurements than the process (prediction step), but when the measurements are noisy the filter output will just track the noise. This inherent limitation of the linear Kalman filter is what lead to the development of nonlinear versions of the filter.

With that said, it is certainly possible to use the process noise to deal with small nonlinearities in your system. This is part of the 'black art' of Kalman filters. Our model of the sensors and of the system are never perfect. Sensors are non-Gaussian and our process model is never perfect. You can mask some of this by setting the measurement errors and process errors higher than their theoretically correct values, but the trade off is a non-optimal solution. Certainly it is better to be non-optimal than to have your Kalman filter diverge. However, as we can see in the graphs above, it is easy for the output of the filter to be very bad. It is also very common to run many simulations and tests and to end up with a filter that performs very well under those conditions. Then, when you use the filter on real data the conditions are slightly different and the filter ends up performing terribly.

For now we will set this problem aside, as we are clearly misapplying the Kalman filter in this example. We will revisit this problem in subsequent chapters to see the effect of using various nonlinear techniques. In some domains you will be able to get away with using a linear Kalman filter for a nonlinear problem, but usually you will have to use one or more of the techniques you will learn in the rest of this book.

Tracking Noisy Data

If we are applying a Kalman filter to a thermometer in an oven in a factory then our task is done once the Kalman filter is designed. The data from the thermometer may be noisy, but there is never doubt that the thermometer is reading the temperature of some other oven. Contrast this to our current situation, where we are using computer vision to detect ball blobs from a video camera. For any frame we may detect or may not detect the ball, and we may have one or more spurious blobs - blobs not associated with the ball at all. This can occur because of limitations of the computer vision code, or due to foreign objects in the scene, such as a bird flying through the frame. Also, in the general case we may have no idea where the ball starts from. A ball may be picked up, carried, and thrown from any position, for example. A ball may be launched within view of the camera, or the initial launch might be off screen and the ball merely travels through the scene. There is the possibility of bounces and deflections - the ball can hit the ground and bounce, it can bounce off a wall, a person, or any other object in the scene.

Consider some of the problems that can occur. We could be waiting for a ball to appear, and a blob is detected. We initialize our Kalman filter with that blob, and look at the next frame to detect where the ball is going. Maybe there is no blob in the next frame. Can we conclude that the blob in the previous frame was noise? Or perhaps the blob was valid, but we did not detect the blob in this frame.

author's note: not sure if I want to cover this. If I do, not sure I want to cover this here.