One Dimensional Kalman Filters
Now that we understand the discrete Bayes filter and Gaussians we are prepared to implement a 1D Kalman filter. We will do this exactly as we did the discrete Bayes filter - rather than starting with equations we will develop the code step by step based on reasoning about the problem.
Tracking A Dog
As in the Discrete Bayes Filter chapter we will be tracking a moving object in a long hallway at work, such as a dog or robot. However, assume that in our latest hackathon someone created an RFID tracker that provides a reasonably accurate position for our dog. Suppose the hallway is 100 meters long. The sensor returns the distance of the dog from the left end of the hallway in meters. So, 23.4 would mean the dog is 23.4 meters from the left end of the hallway.
Naturally, the sensor is not perfect. A reading of 23.4 could correspond to a real position of 23.7, or 23.0. However, it is very unlikely to correspond to a real position of, say, 47.6. Testing during the hackathon confirmed this result - the sensor is 'reasonably' accurate, and while it had errors, the errors are small. Furthermore, the errors seemed to be evenly distributed on both sides of the measurement; a true position of 23 m would be equally likely to be measured as 22.9 or as 23.1.
Before we try to implement the filter, let's start by implementing a simulation of the dog's movement and of the sensor. After all, the filter needs to filter data collected from the dog, so we need to understand this movement and sensor behavior to have any hope of filtering it.
Simulating the Sensor Measurement
Simulating an RFID system is beyond the scope of this book, so we will use a simple model. We said that the sensor is equally likely to give a measurement that is too large as too small. Furthermore, it is more likely to be a small error than a large one. In the Discrete Bayes chapter we modelled this error with a probability. But now we know about Gaussians, so let's use them instead. We can model the error in a system with the Gaussian $\mathcal{N}(\mu, \sigma^2)$, where $\mu$ is the state (such as position or velocity) and $\sigma^2$ is the variance in that state. So instead of saying our measurement is 22.9, we would say our measurement is 22.9 with a variance of 0.35, or $\mathcal{N}(22.9, 0.35)$. The variance accounts for the error in the sensor.
This is a simple model, but it works very well in practice. Many sensors have nearly Gaussian errors, and as you will see the Kalman filter is based on this assumption. Of course, if the sensor errors are not nearly Gaussian your simulations of the filter performance are likely to mislead you.
In the literature you will often see this characterized with an equation like:

$$z = h(x) + v$$

Here $h(x)$ is the measurement model - a function which describes how measurements are made from the current system state. In other words, it is the measurement we would get for the state $x$ if the sensor was perfect. $v$ is the error that is in the measurement, and hence $z$ is the actual measurement that results. This is not meant to imply that we can somehow measure or know the exact value of $v$; it is a mathematical definition that can be restated as

$$v = z - h(x)$$
The utility of this equation will become apparent later in the book.
Simulating the Dog's Movement
Let's assume that the dog moves at a constant velocity. We can use Newton's equation to compute the new position based on the old position and velocity like so:

$$x = x_0 + v \Delta t$$
This is a model of the dog's behavior, and we call it the system model. However, clearly it is not perfect. The dog might speed up or slow down due to hills, winds, or whimsy. We could add those things to the model if we knew about them, but there will always be more things that we don't know. This is not due to the tracked object being a living creature. The same holds if we are tracking a missile, aircraft, or paint nozzle on a factory robotic painter. There will always be unpredictable errors in physical systems. We can model this mathematically by saying the dog's actual position is the prediction that comes from our system model plus what we call the process noise. In the literature this is usually written something like this:

$$x = f(x) + w$$
Here $f(x)$ is our system model and $w$ is some unknown error between the system model's prediction of position and the actual position. Clearly $w$ will change over time, and as with the measurement equation in the last section we are not implying that $w$ can be derived analytically.
Implementation in Python
We will want our implementation to correctly model the noise both in the measurements and in the process model. You may recall from the Gaussians chapter that we can use numpy.random.randn() to generate a random number with a mean of zero and a standard deviation of one. Thus, if we want a random number with a standard deviation of 0.5 we'd multiply the value returned by randn() by 0.5.
Our model assumes that the velocity of our dog is constant, but we acknowledged that things like hills and obstacles make that very unlikely to be true. We can simulate this by adding a randomly generated value to the velocity each time we compute the dog's position.
To model the sensor we need to take the current position of the dog and then add some noise to it, scaled by the standard deviation of the measurement noise. This is simply `x + randn() * measurement_std`.
I don't like the bookkeeping required to keep track of the various values for position, velocity, and standard deviations so I will implement the simulation as a class.
The constructor __init__() initializes the DogSimulation class with an initial position x0, velocity vel, and the variance in the measurement and process noise. The move() function has the dog move based on its velocity and the process noise. The sense_position() function computes and returns a measurement of the dog's position based on the current position and the sensor noise.
We need to convert the variances that are passed into __init__() into standard deviations because randn() is scaled by the standard deviation. Variance is the standard deviation squared, so we take the square root of the variance in __init__().
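The notebook cell with the class itself is not reproduced in this copy, so here is a sketch of what DogSimulation might look like, following the description above (the method and attribute names come from the text; the exact implementation details are an assumption). The usage at the bottom sets both variances to 0, which is the check described next:

```python
import math
from numpy.random import randn

class DogSimulation(object):
    def __init__(self, x0=0, velocity=1,
                 measurement_var=0.0, process_var=0.0):
        """x0: initial position
        velocity: (+ = right, - = left)
        measurement_var: variance of the sensor
        process_var: variance of the dog's movement"""
        self.x = x0
        self.velocity = velocity
        # randn() is scaled by the standard deviation, so convert
        # the variances to standard deviations here
        self.measurement_std = math.sqrt(measurement_var)
        self.process_std = math.sqrt(process_var)

    def move(self, dt=1.0):
        """Move the dog for dt seconds, perturbing the velocity
        with process noise."""
        velocity = self.velocity + randn() * self.process_std
        self.x += velocity * dt

    def sense_position(self):
        """Return a measurement of the current position,
        corrupted by sensor noise."""
        return self.x + randn() * self.measurement_std

# with zero noise the dog moves in a perfect straight line
dog = DogSimulation(x0=0, velocity=1)
xs = []
for _ in range(10):
    dog.move()
    xs.append(dog.sense_position())
print(xs)   # with zero noise: exactly [1.0, 2.0, ..., 10.0]
```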
Okay, so let's look at the output of the DogSimulation class. We will start by setting the variances to 0 to check that the class does what we think it does. Zero variance means there is no noise in the signal or in the system model, so the results should be a straight line.

The constructor initialized the dog at position 0 with a velocity of 1 (move 1.0 to the right). So we would expect to see an output of 1..10, and indeed that is what we see.
Now let's inject some noise in the signal.
NumPy uses a random number generator to generate the normal distribution samples. The numbers I see as I write this are unlikely to be the ones that you see. If you run the cell above multiple times, you should get a slightly different result each time. I could use numpy.random.seed(some_value)
to force the results to be the same each time. This would simplify my explanations in some cases, but would ruin the interactive nature of this chapter. To get a real feel for how normal distributions and Kalman filters work you will probably want to run cells several times, observing what changes, and what stays roughly the same.
So the output of the sensor should be a wavering dotted red line drawn over a straight black line. The black line shows the actual position of the dog, and the dotted red line is the noisy signal produced by the simulated RFID sensor. Please note that the red dotted line was manually plotted - we do not yet have a filter that recovers that information!
If you are running this in an interactive IPython Notebook, I strongly urge you to run the script several times in a row. You can do this by putting the cursor in the cell containing the Python code and pressing CTRL+Enter. Each time it runs you should see a different sensor output.
I also urge you to adjust the noise setting to see the result of various values. However, since you may be reading this in a read only notebook, I will show several examples. The first plot shows the noise set to 100.0, and the second shows noise set to 0.5.
Now let's see the effect of the process noise. My simulation is not meant to exactly model any given physical process. On each call to move() I modify the current velocity by randomly generated process noise. However, I strive to keep the velocity close to the initial value, as I am assuming that there is a control mechanism in place trying to maintain the same speed. In this case, the control mechanism would be the dog's brain! For an automobile it could be a cruise control or the human driver.
So let's first look at the plot with some process noise but no measurement noise to obscure the results.
We can see that the position wanders slightly from the ideal track. You may have thought that the track would have wandered back and forth around the ideal path like the measurement noise did, but recall that we are modifying the velocity on each step, not the position. So once the track has deviated, it will remain deviated until the random changes in velocity happen to bring it back to the original track.
Finally, let's look at the combination of measurement noise and process noise.
Math with Gaussians
Let's say we believe that our dog is at 23 meters, and the variance is 5, or $\mathcal{N}(23, 5)$. We can represent that in a plot:
Notice how this relates to the bar charts from the Discrete Bayes chapter. In that chapter we drew bars of various heights to depict the probability that the dog was at any given position. In many cases those bars took on a shape very similar to this chart. The differences are that the bars depict the probability of a discrete position, whereas this chart depicts the probability distribution of a continuous range of positions.
This graph corresponds to a fairly inexact belief. While we believe that the dog is at 23, note that roughly speaking positions 21 to 25 are quite likely as well. Let's assume for the moment our dog is standing still, and we query the sensor again. This time it returns 23.2 as the position. Can we use this additional information to improve our estimate of the dog's position?
Intuition suggests 'yes'. Consider: if we read the sensor 100 times and each time it returned a value between 21 and 25, all centered around 23, we should be very confident that the dog is somewhere very near 23. Of course, a different physical interpretation is possible. Perhaps our dog was randomly wandering back and forth in a way that exactly emulated a normal distribution. But that seems extremely unlikely - I certainly have never seen a dog do that. So the only reasonable assumption is that the dog was mostly standing still at 23.0.
Let's look at 100 sensor readings in a plot:
Eyeballing this confirms our intuition - no dog moves like this. However, noisy sensor data certainly looks like this. So let's proceed and try to solve this mathematically. But how?
Recall the discrete Bayes code for adding a measurement to a preexisting belief:
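The cell is not reproduced in this copy; a minimal sketch of that update, consistent with the Discrete Bayes chapter (the function name is my choice), is:

```python
import numpy as np

def discrete_update(likelihood, prior):
    """Multiply the prior by the measurement likelihood elementwise,
    then normalize so the posterior sums to 1."""
    posterior = likelihood * prior
    return posterior / np.sum(posterior)

# a prior over 3 positions, and a likelihood favoring the middle one
prior = np.array([0.25, 0.5, 0.25])
likelihood = np.array([1.0, 2.0, 1.0])
print(discrete_update(likelihood, prior))
```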
Note that the algorithm is essentially computing:

$$\text{posterior} = \frac{\text{measurement} \times \text{prior}}{\text{normalization}}$$
The measurement term might not be obvious, but recall that the measurement in this case was always 1 or 0, and so it was left out for convenience. Here prior carries both the colloquial sense of 'previous' and the Bayesian meaning. In Bayesian terms the prior is the probability after the prediction and before the measurement has been incorporated.
If we are implementing this with Gaussians, we might expect it to be implemented as `posterior = measurement * prior`
where measurement is a Gaussian returned from the sensor. But does that make sense? Can we multiply Gaussians? If we multiply a Gaussian with a Gaussian is the result another Gaussian, or something else?
The algebra for this is not extremely hard, but it is messy. It uses Bayes theorem to compute the posterior (new Gaussian) given the prior (old Gaussian) and evidence (the measurement). I've placed the math at the bottom of the chapter if you are interested in the details. Here I will present the results. The subscript $z$ stands for the measurement, and the bar denotes the prior:

$$\mu = \frac{\sigma_z^2 \bar\mu + \bar\sigma^2 \mu_z}{\bar\sigma^2 + \sigma_z^2}, \qquad \sigma^2 = \frac{\bar\sigma^2 \sigma_z^2}{\bar\sigma^2 + \sigma_z^2}$$

In other words the result of multiplying two Gaussians is a Gaussian with these parameters.
Without doing a deep analysis we can immediately infer some things. First and most importantly, the result of multiplying two Gaussians is another Gaussian. The mean is a scaled sum of the prior and measurement means. The variance is a combination of the variances of the inputs. We conclude from this that the variances are completely unaffected by the values of the means!
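These equations translate directly into code. A sketch (the Gaussian container type and function name are my own choices):

```python
from collections import namedtuple

Gaussian = namedtuple('Gaussian', ['mean', 'var'])

def gaussian_multiply(g1, g2):
    """Product of two Gaussians, renormalized to a Gaussian."""
    mean = (g1.var * g2.mean + g2.var * g1.mean) / (g1.var + g2.var)
    variance = (g1.var * g2.var) / (g1.var + g2.var)
    return Gaussian(mean, variance)

# multiplying N(23, 5) by itself: same mean, smaller variance
print(gaussian_multiply(Gaussian(23., 5.), Gaussian(23., 5.)))
# Gaussian(mean=23.0, var=2.5)
```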
Let's examine the result of multiplying $\mathcal{N}(23, 5)$ with itself. This corresponds to getting 23.0 as the sensor value twice in a row. But before you look at the result, what do you think it will look like? What should the new mean be? Will the variance be wider, narrower, or the same?
The result of the multiplication is taller and narrower than the original Gaussian but the mean is the same. Does this match your intuition of what the result should have been?
If we think of the Gaussians as two measurements, this makes sense. If I measure twice and get 23 meters each time, I should conclude that the length is close to 23 meters. Thus the mean should be 23. I am more confident with two measurements than with one, so the variance of the result should be smaller.
"Measure twice, cut once" is a useful saying and practice due to this fact! The Gaussian is a mathematical model of this physical fact, so we should expect the math to follow our physical process.
Now let's multiply two Gaussians (or equivalently, two measurements) that are partially separated. In other words, their means will be different, but their variances will be the same. What do you think the result will be? Think about it, and then look at the graph.
Another beautiful result! If I handed you a measuring tape and asked you to measure the distance from a table to a wall, and you got 23 meters, and then a friend made the same measurement and got 25 meters, your best guess must be 24 meters if you trust both of your skills equally.
That is fairly counter-intuitive in more complicated situations, so let's consider it further. Perhaps a more reasonable assumption would be that either you or your friend made a mistake, and the true distance is either 23 or 25, but certainly not 24. Surely that is possible. However, suppose the two measurements were 24.01 and 23.99. In that case you would agree that the best guess for the correct value is 24. Which interpretation we choose depends on the properties of the sensors we are using.
This topic is fairly deep, and I will explore it once we have completed our Kalman filter. For now I will merely say that the Kalman filter requires the interpretation that measurements are accurate, with Gaussian noise, and that a large error caused by misreading a measuring tape is not Gaussian noise.
Dealing with incorrect measurements, such as measuring the distance to the wrong object, is a topic we will need to treat separately. The filter assumes that the data you feed it is correct, just noisy. Engineers that design radar systems probably spend most of their effort on this problem.
For now I ask that you trust me. The math is correct, so we have no choice but to accept it and use it. We will see how the Kalman filter deals with movements vs error very soon. In the meantime, accept that 24 is the correct answer to this problem.
One final test of your intuition. What if the two measurements are widely separated?
This result bothered me quite a bit when I first learned it. If my first measurement was 10, and the next one was 50, why would I choose 30 as a result? And why would I be more confident? Doesn't it make sense that either one of the measurements is wrong, or that I am measuring a moving object? Shouldn't the result be nearer 50? And, shouldn't the variance be larger, not smaller?
Well, no. Recall the g-h filter chapter. In that chapter we agreed that if I weighed myself on two scales, and the first read 160 lbs while the second read 170 lbs, and both were equally accurate, the best estimate was 165 lbs. Furthermore I should be a bit more confident about 165 lbs vs 160 lbs or 170 lbs because I now have two readings, both near this estimate, increasing my confidence that neither is wildly wrong.
Of course, this example is quite exaggerated. The width of the Gaussians is fairly narrow, so this combination of measurements is extremely unlikely. It is hard to eyeball this, but the measurements are well over $3\sigma$ apart, so the probability of this happening is less than 1%. Still, it can happen, and the math is correct. In practice this is the sort of evidence we use to decide whether a measurement should be discarded as an outlier.
Let's look at the math again to convince ourselves that the physical interpretation of the Gaussian equations makes sense. I'm going to switch back to using priors and measurements. The math and reasoning is the same whether you are using a prior and incorporating a measurement, or just trying to compute the mean and variance of two measurements. For Kalman filters we will be doing a lot more of the former than the latter, so let's get used to it.
If both have the same accuracy, then $\bar\sigma^2 = \sigma_z^2$, and the resulting equation is

$$\mu = \frac{\bar\mu + \mu_z}{2}$$

which is the average of the two means. If we look at the extreme cases, assume the first scale is very much more accurate than the second one. At the limit, we can set $\bar\sigma^2 = 0$, yielding

$$\mu = \bar\mu$$

Finally, if we set $\sigma_z^2 = 0$, then the resulting equation is

$$\mu = \mu_z$$

This again fits our physical intuition of favoring the accurate scale over the inaccurate scale.
Interactive Example
Rather than going through a dozen examples, let's use an interactive example. Here you can use sliders to alter the mean and variance of two Gaussians. As you change the values a plot of the two Gaussians along with the product of the two is displayed.
Implementing the Update Step
Recall the discrete Bayes filter uses a NumPy array to encode our belief about the position of our dog at any time. That array stored our belief of our dog's position in the hallway using 10 discrete positions. This was very crude, because with a 100 meter hallway that corresponded to positions 10 meters apart. It would have been trivial to expand the number of positions to say 1,000, and that is what we would do if using it for a real problem. But the problem remains that the distribution is discrete and multimodal - it can express strong belief that the dog is in two positions at the same time.
Therefore, we will use a single Gaussian to reflect our current belief of the dog's position. In other words, we will use $\mathcal{N}(\mu, \sigma^2)$. Gaussians extend to infinity on both sides of the mean, so the single Gaussian will cover the entire hallway. They are unimodal, and seem to reflect the behavior of real-world sensors - most errors are small and clustered around the mean. Here is the entire implementation of the update function for a Kalman filter:
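The cell is missing from this copy, but the update is only a thin wrapper around the Gaussian product. A sketch (the (mean, variance) tuple representation is an assumption):

```python
def update(prior, measurement):
    """Kalman filter update step for 1D Gaussians,
    each represented as a (mean, variance) tuple."""
    mean = (prior[1] * measurement[0] + measurement[1] * prior[0]) \
           / (prior[1] + measurement[1])
    variance = prior[1] * measurement[1] / (prior[1] + measurement[1])
    return (mean, variance)

print(update((23., 5.), (25., 5.)))  # (24.0, 2.5)
```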
Kalman filters are supposed to be hard! But this is very short and straightforward. All we are doing is multiplying the Gaussian that reflects our belief of where the dog is with the new measurement.
Perhaps this would be clearer if we used more specific names:
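Something like this (a sketch; the text only names the function update_dog(), so the parameter names are my own):

```python
def update_dog(dog_pos, dog_var, measurement, measurement_var):
    """Same math as the generic update(), with dog-specific names."""
    estimated_pos = (dog_var * measurement + measurement_var * dog_pos) \
                    / (dog_var + measurement_var)
    estimated_var = dog_var * measurement_var / (dog_var + measurement_var)
    return (estimated_pos, estimated_var)
```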
That is less abstract, which perhaps helps with comprehension, but it is poor coding practice. We are writing a Kalman filter that works for any problem, not just tracking dogs in a hallway, so we don't use variable names with 'dog' in them. Still, the update_dog()
function should make what we are doing very clear.
Let's look at an example. We will suppose that our current belief for the dog's position is $\mathcal{N}(2, 5)$. Don't worry about where that number came from. It may appear that we have a chicken and egg problem, in that how do we know the position before we sense it, but we will resolve that shortly. We will create a DogSimulation object initialized to be at position 0.0, with no velocity, and modest noise. This corresponds to the dog standing still at the far left side of the hallway. Note that we mistakenly believe the dog is at position 2.0, not 0.0.
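The cell that runs this experiment is not shown in this copy; here is a self-contained sketch. The 20 iterations, the sensor variance of 5, and the initial belief of $\mathcal{N}(2, 5)$ come from the surrounding text; the random seed is my own choice, added only for repeatability:

```python
import numpy as np

def update(prior, measurement):
    """1D Kalman update; Gaussians as (mean, variance) tuples."""
    mean = (prior[1] * measurement[0] + measurement[1] * prior[0]) \
           / (prior[1] + measurement[1])
    variance = prior[1] * measurement[1] / (prior[1] + measurement[1])
    return (mean, variance)

np.random.seed(13)          # my choice, only for repeatability
sensor_var = 5.0
pos = (2.0, 5.0)            # mistaken belief: N(2, 5); dog is really at 0

for _ in range(20):
    # dog standing still at 0; each reading is corrupted by sensor noise
    z = 0.0 + np.random.randn() * np.sqrt(sensor_var)
    pos = update(pos, (z, sensor_var))

print(pos)  # mean near 0.0, variance exactly 5/21, about 0.238
```

Note that the final variance does not depend on the random draws at all; only the mean does.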
Because of the random numbers I do not know the exact values that you see, but the position should have converged very quickly to almost 0 despite the initial error of believing that the position was 2.0. Furthermore, the variance should have quickly converged from the initial value of 5.0 to 0.238.
By now the fact that we converged to a position of 0.0 should not be terribly surprising. All we are doing is computing new_pos = old_pos * measurement
and the measurement is a normal distribution around 0, so we should get very close to 0 after 20 iterations. But the truly amazing part of this code is how the variance became 0.238 despite every measurement having a variance of 5.0.
If we think about the physical interpretation of this it should be clear that this is what should happen. If you sent 20 people into the hall with a tape measure to physically measure the position of the dog you would be very confident in the result after 20 measurements - more confident than after 1 or 2 measurements. So it makes sense that as we make more measurements the variance gets smaller.
Mathematically it makes sense as well. Recall the computation for the variance after the multiplication:

$$\sigma^2 = \frac{1}{\frac{1}{\bar\sigma^2} + \frac{1}{\sigma_z^2}}$$

We take the reciprocals of the variances from the measurement and prior belief, add them, and take the reciprocal of the result. Think about that for a moment, and you will see that this will always result in smaller numbers as we proceed.
Implementing Predictions
That is a beautiful result, but it is not yet a filter. We assumed that the dog was sitting still, an extremely dubious assumption. Certainly it is a fairly useless one - who needs a filter to track stationary objects? The discrete Bayes filter used a loop of predict and update functions, and we must do the same to accommodate movement.
How do we perform the predict function with Gaussians? Recall the discrete Bayes method:
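The cell is not reproduced here; in sketch form, the discrete predict shifted the belief array and spread it with the movement-error kernel (the function name and circular-hallway assumption are my own):

```python
import numpy as np

def discrete_predict(belief, offset, kernel):
    """Shift the belief array by `offset` positions (wrapping around)
    and spread it according to the movement-error kernel."""
    n, k = len(belief), len(kernel)
    width = k // 2
    result = np.zeros(n)
    for i in range(n):
        for j in range(k):
            result[i] += belief[(i - offset + j - width) % n] * kernel[j]
    return result

belief = np.array([0., 1., 0., 0.])
# a perfect move of 1 step just shifts the belief right by one cell
print(discrete_predict(belief, 1, np.array([1.0])))  # [0. 0. 1. 0.]
```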
In a nutshell, we shift the probability vector by the amount we believe the animal moved, and adjust the probability. How do we do that with Gaussians?
It turns out that we add them. Think of the case without Gaussians. I think my dog is at 7.3 meters, and he moves 2.6 meters to right, where is he now? Obviously, 7.3 + 2.6 = 9.9. He is at 9.9 meters. Abstractly, the algorithm is new_pos = old_pos + dist_moved
. It does not matter if we use floating point numbers or Gaussians for these values, the algorithm must be the same.
How is addition for Gaussians performed? It turns out to be very simple:

$$\mathcal{N}(\mu_1, \sigma_1^2) + \mathcal{N}(\mu_2, \sigma_2^2) = \mathcal{N}(\mu_1 + \mu_2,\; \sigma_1^2 + \sigma_2^2)$$

In other words

$$\mu = \mu_1 + \mu_2, \qquad \sigma^2 = \sigma_1^2 + \sigma_2^2$$

We can make this more readable by changing the subscripts:

$$\mu = \mu_{pos} + \mu_{move}, \qquad \sigma^2 = \sigma_{pos}^2 + \sigma_{move}^2$$

All we do is add the means and the variances separately! Does that make sense? Think of the physical representation of this abstract equation. $\mu_{pos}$ is the old position, and $\mu_{move}$ is how far we moved. Surely it makes sense that our new position is $\mu_{pos} + \mu_{move}$.
What about the variance? It is perhaps harder to form an intuition about this. However, recall that with the predict() function for the discrete Bayes filter we always lost information - our confidence after the prediction was lower than our confidence before it. We don't really know where the dog is moving, so the confidence should get smaller (variance gets larger). $\sigma_{move}^2$ is the amount of uncertainty added to the system due to the imperfect prediction about the movement, and so we add that to the existing uncertainty.
I assure you that the equation for Gaussian addition is correct, and derived by basic algebra. Therefore it is reasonable to expect that if we are using Gaussians to model physical events, the results must correctly describe those events.
Now is a good time to either work through the algebra to convince yourself of the mathematical correctness of the algorithm, or to work through some examples and see that it behaves reasonably. This book will do the latter.
So, here is our implementation of the predict function:
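(The cell is missing from this copy; this sketch follows the addition rule directly, with Gaussians represented as (mean, variance) tuples.)

```python
def predict(pos, movement):
    """Predict the next position: add the means and add the variances."""
    return (pos[0] + movement[0], pos[1] + movement[1])

print(predict((10., 4.), (15., 7.)))  # (25.0, 11.0)
```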
What is left? Just calling these functions. The discrete Bayes filter did nothing more than loop over the update() and predict() functions, so let's do the same.
Before we discuss the code and results, let's try to understand this better with this chart showing one predict/update cycle.
This chart is very similar to one we already saw in the g-h Filter chapter. At each predict step we take our existing estimate and update it based on our system model. We then make a measurement. We know neither the measurement nor the prediction is perfect, so we choose a value in between the two, scaled so that the new estimate is closer to the one we trust more. We then step time forward, and what was the new estimate becomes the prior estimate, and the process repeats.
Now let's walk through the code and output.
Here we set the initial position at 0 m. We set the variance to 400 m², which is a standard deviation of 20 meters. You can think of this as saying "I believe with 99% accuracy the position is 0 plus or minus 60 meters". This is because with Gaussians ~99% of values fall within $3\sigma$ of the mean.
So how do I know the velocity? Magic? Consider it a prediction, or perhaps we have a secondary velocity sensor. If this is a robot then this would be a control input to the robot. In the next chapter we will learn how to handle situations where you don't have a velocity sensor or input, so please accept this simplification for now.
Here the variance for the predict step is process_variance and the variance for the sensor is sensor_variance. The meaning of sensor variance should be clear - it is how much variance there is in each sensor reading. The process variance is how much error there is in the prediction model. We are predicting that at each time step the dog moves forward one meter. Dogs rarely do what we expect, and of course things like hills or the whiff of a squirrel will change his progress. If this was a robot responding to digital commands the performance would be better, but not perfect. Perhaps a hand made robot would have a much larger process variance than an industrial robot. These are not 'magic' numbers; the square root of the process variance expresses the distance error in meters. It is easy to get a Kalman filter working by just plugging in numbers, but if the numbers do not reflect reality the performance of the filter will be poor.
Next we initialize the simulator and run it to generate the measurements.
It may seem very 'convenient' to set the simulator to the same position as our guess, and it is. Do not fret. In the next example we will see the effect of a wildly inaccurate guess for the dog's initial position. We move the dog for 10 steps forward and store the measurements in zs
via a list comprehension.
It is traditional to call the measurement $z$, and so I follow that convention here. When I was learning this topic I found the standard names very obscure. However, if you wish to read the literature you will have to become used to them, so I will not use a more readable variable name such as m or measure. It is something you have to memorize if you want to work with Kalman filters. As we continue I will introduce more of the standard naming into the code. The s at the end of zs is a plural - I'm saying there are many z values in the list.
The next code allocates an array to store the output of the filtered positions.
Now we enter our predict() ... update()
loop.
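The original cell isn't shown in this copy, so below is a self-contained sketch of the whole loop. The initial variance of 400 and the sensor variance of 2.0 m² come from the text; the process variance of 1.0 is implied by the prediction variance of 401 (400 + 1) quoted below; the simulation stand-in and random seed are my own choices:

```python
import numpy as np

def predict(pos, movement):
    """Add the means and add the variances."""
    return (pos[0] + movement[0], pos[1] + movement[1])

def update(prior, measurement):
    """Multiply the prior and measurement Gaussians."""
    mean = (prior[1] * measurement[0] + measurement[1] * prior[0]) \
           / (prior[1] + measurement[1])
    variance = prior[1] * measurement[1] / (prior[1] + measurement[1])
    return (mean, variance)

pos = (0., 400.)            # initial belief: N(0, 400)
vel = 1.0
process_variance = 1.0      # implied by the 400 -> 401 prediction in the text
sensor_variance = 2.0

# crude stand-in for DogSimulation: constant velocity plus noise
np.random.seed(3)
actual, zs = 0.0, []
for _ in range(10):
    actual += vel + np.random.randn() * np.sqrt(process_variance)
    zs.append(actual + np.random.randn() * np.sqrt(sensor_variance))

positions = np.zeros(len(zs))
for i, z in enumerate(zs):
    pos = predict(pos, (vel, process_variance))   # variance 400 -> 401, etc.
    pos = update(pos, (z, sensor_variance))
    positions[i] = pos[0]

print(pos[1])   # the variance converges toward 1.0
```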
We call the predict() function with the Gaussian for the dog's position, and another Gaussian for its movement. pos[0], pos[1] is the mean and variance for the position, and vel, process_variance is the mean and variance for the movement. The first time through the loop we get a predicted mean of 1.0 and a variance of 401.

What is this saying? After the prediction, we believe that we are at 1.0, and the variance is now 401. Recall we started at 400. The variance got worse, which is always what happens during the prediction step because it involves a loss of information.
Then we call the update function of our filter and save the result in our positions
array.
For this I get an estimated position of 1.3518 with a variance of 1.9901.

What is happening? The dog is actually at 1.0 but the measured position is 1.3536 due to the sensor noise. That is pretty far off from the predicted value of 1. The filter incorporates that measurement into its estimate. This is the first time through the loop, so the prior variance is 401 m². A large variance implies that the current prediction is very poor, so the filter estimates the position to be 1.3518 - very close to the measurement. Intuition tells us that the results should get better as we make more measurements, so let's hope that this is true for our filter as well.

Now look at the variance: 1.9901 m². It has dropped tremendously from 401 m². Why? Well, the RFID has a reasonably small variance of 2.0 m², so we trust it far more than our previous belief.
Now the software loops, calling predict() and update() in turn. By the end the final estimated position is 15.0529 vs the actual position of 14.8383. The variance has converged to 1.0 m².
Now look at the plot. The noisy measurements are plotted in with a dotted red line, and the filter results are in the solid blue line. Both are quite noisy, but notice how much noisier the measurements (red line) are. This is your first Kalman filter and it seems to work!
Before we continue I want to point out how few lines of code actually perform the filtering. Most of the code handles initialization, storing of data, simulating the dog's movement, and printing results. The code that performs the filtering is just the succinct predict() and update() calls inside the loop.
In this example I only plotted 10 data points so the output from the print statements would not overwhelm us. Now let's look at the filter's performance with more data. This time we will plot both the output of the filter and the variance. The variance is plotted as a lightly shaded yellow area between dotted lines. I've increased the size of the process and sensor variance so they are easier to see on the chart - for a real Kalman filter of course you will not be randomly changing these values.
Here we can see that the variance converges very quickly to roughly 2.1623 in 10 steps. We interpret this as meaning that we become very confident in our position estimate very quickly. The first few measurements are unsure due to our uncertainty in our guess at the initial position, but the filter is able to quickly determine an accurate estimate.
Before I go on, I want to emphasize that this code fully implements a 1D Kalman filter. If you have tried to read the literature, you are perhaps surprised, because this looks nothing like the complex, endless pages of math in those books. To be fair, the math gets more complicated in multiple dimensions, but not by much. So long as we worry about using the equations rather than deriving them we can create Kalman filters without a lot of effort. Moreover, I hope you'll agree that you have a decent intuitive grasp of what is happening. We represent our beliefs with Gaussians, and our beliefs get better over time because more measurements means more data to work with.
Animating the Tracking
If you are reading this in IPython Notebook you will be able to see an animation of the filter tracking the dog directly below this sentence.
The top plot shows the output of the filter in green, and the measurements with a dashed red line. The bottom plot shows the Gaussian at each step.
When the track first starts you can see that the measurements varies quite a bit from the initial prediction. At this point the Gaussian probability is small (the curve is low and wide) so the filter does not trust its prediction. As a result, the filter adjusts its estimate a large amount. As the filter innovates you can see that as the Gaussian becomes taller, indicating greater certainty in the estimate, the filter's output becomes very close to a straight line. At x=15
and greater you can see that there is a large amount of noise in the measurement, but the filter does not react much to it compared to how much it changed for the first noisy measurement.
The Kalman Filter is a g-h Filter
Earlier we set $\mu = \frac{\bar\sigma^2 z + \sigma_z^2 \bar\mu}{\bar\sigma^2 + \sigma_z^2}$, which we can rewrite as $\mu = \left(\frac{\bar\sigma^2}{\bar\sigma^2 + \sigma_z^2}\right)z + \left(\frac{\sigma_z^2}{\bar\sigma^2 + \sigma_z^2}\right)\bar\mu$ to get
We can see that we are scaling the prior and the measurement by some factor. Think back to the g-h Filter chapter and this plot:
The Kalman filter equation is performing this computation! We have two measurements, or a measurement and a prediction, and the new estimate is chosen as some value between the two. So, as promised, the Kalman filter is a g-h filter.
I have not made it explicit in this chapter, but the scaling factor is called the Kalman gain. In this equation
$$\mu = \left(\frac{\bar\sigma^2}{\bar\sigma^2 + \sigma_z^2}\right)z + \left(\frac{\sigma_z^2}{\bar\sigma^2 + \sigma_z^2}\right)\bar\mu$$
we can set the Kalman gain to $K = \frac{\bar\sigma^2}{\bar\sigma^2 + \sigma_z^2}$, and rewrite the equation as
$$\mu = Kz + (1 - K)\bar\mu$$
That is a general equation for the Kalman filter in the one dimensional case.
How does this differ from the g-h filter? In the g-h filter we chose the constants $g$ and $h$ and didn't vary them. Here the scaling factors are chosen dynamically based on the current variance of the filter and the variance of the measurement. In other words, the Kalman gain is determined by a ratio of the variances of the prior and measurement. As the Kalman filter innovates the scaling factors change as we become more (or less) certain about our answer.
Some algebra will reveal the relationship between the Kalman filter and the g-h filter. If you are curious, I present it in the Kalman Filter Math chapter.
At each time step we can show that $g$ equals
$$g = \frac{\bar\sigma^2}{\bar\sigma^2 + \sigma_z^2}$$
This makes intuitive sense. When we perform an update we choose a value somewhere between the measurement and prediction. If the measurement is more accurate we will choose a value nearer it, and vice versa. This computation for $g$ is a ratio of the errors in the prediction and the measurement.
The equation for $h$ is similar:
$$h = \frac{\mathrm{COV}(x, \dot x)}{\bar\sigma^2 + \sigma_z^2}$$
We haven't discussed the covariance (denoted by $\mathrm{COV}(x, \dot x)$), so you may not fully understand this equation. Think of the numerator as the covariance between the state and its velocity. Recall that $g$ is used to scale the state, and $h$ is used to scale the velocity of the state, and this equation should seem reasonable. In the next chapter it will become very clear.
The takeaway point is that $g$ and $h$ are specified fully by the variances and covariances of the measurement and predictions at time $n$. That is all the Kalman filter is. It is easier to think in terms of the noise in the measurement and the process model, so we don't normally use the g-h formulation, but we could. It's the same thing written differently. The Kalman filter is a g-h filter.
Fixed Gain Filters
Embedded applications usually have extremely limited processors; many do not have floating point circuitry. These simple equations can impose a heavy burden on the chip. This is less true as technology advances, but do not underestimate the value of spending 10 cents less on a processor for devices that will be manufactured in the millions.
In the example I ran above the variance of the filter converged to a fixed value. This will always happen if the variance of the measurement is a constant. You can take advantage of this fact by running simulations to determine what the variance converges to. Then you can hard code this value into your filter. So long as you initialize the filter to a good starting guess (I recommend using the first measurement as your initial value), the filter will perform very well.
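A sketch of that idea, with assumed variances: iterate the variance recursion offline until it converges, then derive a fixed gain the embedded code can hard code.

```python
# Find the steady-state variance and gain offline (all values assumed).
sensor_var = 2.0      # R: measurement variance
process_var = 0.1     # Q: process variance

P = 500.0             # start with a large uncertainty
for _ in range(100):
    P += process_var                          # predict step inflates variance
    P = P * sensor_var / (P + sensor_var)     # update step shrinks it

# Steady-state Kalman gain computed from the converged variance.
K = (P + process_var) / (P + process_var + sensor_var)

def fixed_gain_step(x, z):
    """One filter step on the device: a single multiply-accumulate."""
    return x + K * (z - x)
```

For these assumed values P converges to 0.4 and K to 0.2; only fixed_gain_step needs to run on the embedded processor.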
So for embedded applications it is common to perform many simulations to determine a fixed Kalman gain, and then hard code this value into the application. Besides reducing chip cost this also makes verifying correctness easier. You may scoff at such a tiny difference, but if you are making a medical device, a device that is being sent into space, or a device that will have one million copies made, then an error can cost lives, end a scientific mission, and/or cost millions of dollars. A professional engineer does everything she can, no matter how small, to reduce and eliminate risks.
Implementation in a Class
For many purposes the code above suffices. However, if you write enough of these filters the functions will become a bit annoying. For example, having to pass the state into every call and then remember to assign the function's output back to that state is cumbersome and error prone. Let's investigate how we might implement this in a form that makes our lives easier.
First, values for the movement error and the measurement errors are typically constant for a given problem, so we only want to specify them once. We can store them in instance variables in the class. Second, it is annoying to have to pass in the state (pos in the code snippet above) and then remember to assign the output of the function back to that state, so the state should also be an instance variable. Our first attempt might look like:
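A sketch of such a first attempt; the class and method names are assumptions, but it shows the variances and the state stored as instance variables as described above.

```python
# Illustrative first attempt: state and variances become instance variables.
class DogFilter:
    def __init__(self, initial_pos, measurement_variance, movement_variance):
        self.pos = (initial_pos, 500.0)   # belief as (mean, variance)
        self.measurement_variance = measurement_variance
        self.movement_variance = movement_variance

    def sense(self, measurement):
        """Incorporate a measurement (Gaussian multiplication)."""
        mean, var = self.pos
        z_var = self.measurement_variance
        new_mean = (var * measurement + z_var * mean) / (var + z_var)
        self.pos = (new_mean, var * z_var / (var + z_var))

    def move(self, distance):
        """Shift the belief by the movement (Gaussian addition)."""
        mean, var = self.pos
        self.pos = (mean + distance, var + self.movement_variance)
```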
That works, but I am going to use different naming. The Kalman filter literature is written as math, not code, and it uses single letter variable names, so I will start exposing you to them now. At first it seems impossibly terse, but as you become familiar with the nomenclature you'll see that the math formulas in the textbooks have an exact one-to-one correspondence with the code. Unfortunately there is not a lot of meaning behind the names unless you have training in control theory; you will have to memorize them. If you do not make this effort you will never be able to read the literature.
So, we use x for the state (the estimated value of the filter) and P for the variance of the state. R is the measurement error, and Q is the process (prediction) error. This gives us:
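Using those names, the constructor might look like this sketch (the class name is an assumption):

```python
class KalmanFilter1D:
    def __init__(self, x0, P, R, Q):
        self.x = x0   # state estimate
        self.P = P    # variance of the state estimate
        self.R = R    # measurement variance
        self.Q = Q    # process (prediction) variance
```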
Now we can implement the update() and predict() functions. In the literature the measurement is usually named either z or y; I find y is too easy to confuse with the y axis of a plot, so I like z. I like to think I can hear a z in "measurement", which helps me remember what z stands for. So for the update method we might write:
Finally, the movement is usually called u, and so we will use that. So for the predict function we might write:
That gives us the following code. Production code would require significant comments and error checking. However, in the next chapter we will develop Kalman filter code that works for any dimension, so this class will never be more than a stepping stone for us.
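Putting it together, a minimal version of the class might read as follows (a sketch, not production code):

```python
class KalmanFilter1D:
    def __init__(self, x0, P, R, Q):
        self.x = x0   # state estimate
        self.P = P    # variance of the state estimate
        self.R = R    # measurement variance
        self.Q = Q    # process variance

    def update(self, z):
        """Incorporate measurement z (Gaussian multiplication)."""
        self.x = (self.P * z + self.R * self.x) / (self.P + self.R)
        self.P = (self.P * self.R) / (self.P + self.R)

    def predict(self, u=0.0):
        """Project the state forward by movement u (Gaussian addition)."""
        self.x += u
        self.P += self.Q
```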
Full Description of the Algorithm
In the first chapter I stated that the Kalman filter is a form of g-h filter. Recall the diagram we used:
This is what we have been doing in this chapter. But also recall this chart from the g-h Filter chapter, which displayed how we created a new weight estimate from the prediction of the previous day along with the current day's measurement.
With the exception of the axis labels of 'day' and 'weight' this is the same as the residual chart I made earlier in this chapter.
This is extremely important to understand: Every filter in this book implements the same algorithm, just with different mathematical details. We make a prediction, make a measurement, and then choose the new estimate to be somewhere between those two. The math can become challenging in later chapters, but the idea is easy to understand.
It is important to see past the details of the equations of a specific filter and understand what the equations are calculating and why. This is important not only so you can understand the filters in this book, but so that you can understand filters that you will see in other situations. There are a tremendous number of filters that we do not describe - the SVD filter, the unscented particle filter, the H filter, the ensemble filter, and so many more. They all use different math to implement the same algorithm. The choice of math affects the quality of results, not the underlying ideas.
Here it is:
Initialization

1. Initialize the state of the filter
2. Initialize our belief in the state

Predict

1. Use the process model to predict the state at the next time step
2. Adjust our belief to account for the uncertainty in the prediction

Update

1. Get a measurement and our belief about its accuracy
2. Compute the residual between the predicted state and the measurement
3. Set the new state somewhere between the prediction and the measurement, based on their relative accuracy
You will be hard pressed to find a Bayesian filter algorithm that does not fit into this form. Some filters will not include some aspect, such as error in the prediction, and others will have very complicated methods of computation, but this is what they all do. If you go back to the g-h filter chapter and reread it you'll recognize that the very simple thought experiments we did there result in this algorithm.
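The whole algorithm can be sketched as one loop. This uses the 1D Gaussian equations from this chapter; the function and parameter names are illustrative.

```python
def run_filter(measurements, x0, P0, R, Q, u=1.0):
    """Generic predict/update cycle of a Bayesian filter (1D Gaussian form)."""
    x, P = x0, P0
    estimates = []
    for z in measurements:
        # Predict: project the state forward with the process model.
        x, P = x + u, P + Q
        # Update: choose an estimate between the prediction and measurement.
        x = (P * z + R * x) / (P + R)
        P = P * R / (P + R)
        estimates.append(x)
    return estimates
```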
Exercise: Modify Variance Values
Modify the values of movement_variance and sensor_variance and note the effect on the filter and on the variance. Which has a larger effect on the variance convergence? For example, which results in a smaller variance:
or:
Introduction to Designing a Filter
So far we have developed our filter based on the dog sensors introduced in the Discrete Bayesian filter chapter. We are used to this problem by now, and may feel ill-equipped to implement a Kalman filter for a different problem. To be honest, there is still quite a bit of information missing from this presentation. The next chapter will fill in the gaps. Still, let's get a feel for it by designing and implementing a Kalman filter for a thermometer. The sensor for the thermometer outputs a voltage that corresponds to the temperature that is being measured. We have read the manufacturer's specifications for the sensor, and it tells us that the sensor exhibits white noise with a standard deviation of 0.13.
We do not have a real sensor to read, so we will simulate the sensor with the following function.
We generate white noise with a given standard deviation using the expression randn() * std. The specification gives us the standard deviation of the noise directly, but the filter is parameterized by variance. Recall that variance is the square of the standard deviation; hence we raise 0.13 to the second power when we configure the filter.
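A sketch of the simulated sensor. The true voltage is not stated above, so 16.3 volts here is an assumption (chosen to be consistent with the animation description later in the chapter); the standard library's random.gauss stands in for randn():

```python
import random

def volt(voltage=16.3, std=0.13):
    """Return one noisy reading of a constant voltage.

    voltage: the true voltage (16.3 is an assumed example value).
    std: the sensor's noise standard deviation from the spec sheet.
    """
    return voltage + random.gauss(0.0, std)
```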
Now we need to write the Kalman filter processing loop. As with our previous problem, we need to perform a cycle of predicting and updating. The sensing step probably seems clear - call volt() to get the measurement and pass the result into the update() function - but what about the predict step? We do not have a sensor to detect 'movement' in the voltage, and for any small duration we expect the voltage to remain constant. How shall we handle this?
As always, we will trust in the math. We have no known movement, so we will set that to zero. However, that means we are predicting that the temperature will never change. If that were true, then over time we would become extremely confident in our results. Once the filter has enough measurements it will become very confident that it can predict the subsequent temperatures, and this will lead it to ignore measurements that result from an actual temperature change. This is called a smug filter, and is something you want to avoid. So we will add a bit of error to our prediction step to tell the filter not to discount changes in voltage over time. In the code below I set process_variance = .05**2. This is the expected variance in the change of voltage over each time step. I chose this value merely to be able to show how the variance changes through the update and predict steps. For a real sensor you would set this value to the actual amount of change you expect. For example, this would be an extremely small number if it is a thermometer for ambient air temperature in a house, and a large number if it is a thermocouple in a chemical reaction chamber. We will say more about selecting the actual value in the next chapter.
Let's see what happens.
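Here is a sketch of the processing loop. The true voltage (16.3), the initial guess of 25, and the initial uncertainty are assumptions consistent with the surrounding description:

```python
import random
random.seed(1)

def volt(voltage=16.3, std=0.13):
    """Simulated noisy voltage sensor (16.3 V is an assumed true value)."""
    return voltage + random.gauss(0.0, std)

x, P = 25.0, 1000.0   # poor initial guess with large uncertainty
R = 0.13 ** 2         # measurement variance: the spec's std, squared
Q = 0.05 ** 2         # process variance for the predict step

estimates = []
for _ in range(50):
    P += Q                            # predict: no movement (u = 0)
    z = volt()                        # measure
    x = (P * z + R * x) / (P + R)     # update
    P = P * R / (P + R)
    estimates.append(x)
```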
The first plot shows the individual sensor measurements vs the filter output. Despite a lot of noise in the sensor we quickly discover the approximate voltage of the sensor. In the run I just completed at the time of authorship, the last voltage output from the filter was quite close to the true value used by the volt() function. On other runs I have gotten larger and smaller results.
Spec sheets are what they sound like - specifications. Any individual sensor will exhibit different performance based on normal manufacturing variations. Values are often maximums - the spec is a guarantee that the performance will be at least that good. So, our sensor's actual standard deviation might be somewhat smaller than the specified value. If you buy an expensive piece of equipment it often comes with a sheet of paper displaying the test results of your specific item; this is usually very trustworthy. On the other hand, if this is a cheap sensor it is likely it received little to no testing prior to being sold. Manufacturers typically test a small subset of their output to verify that a sample falls within the desired performance range. If you have a critical application you will need to read the specification sheet carefully to figure out exactly what they mean by their ranges. Do they guarantee their number is a maximum, or is it, say, the mean error rate? Is every item tested? Is the noise normal, or some other distribution? Finally, manufacturing is not perfect. Your part might be defective and not match the performance on the sheet.
For example, I am looking at a data sheet for an airflow sensor. There is a field for repeatability, given as a percentage. Is this a Gaussian? Is there a bias? For example, perhaps the repeatability is nearly 0.0% at low temperatures, and always nearly +0.50% at high temperatures. Data sheets for electrical components often contain a section of "Typical Performance Characteristics". These are used to capture information that cannot be easily conveyed in a table. For example, I am looking at a chart showing output voltage vs current for an LM555 timer. There are three curves showing the performance at different temperatures. The response is ideally linear, but all three lines are curved. This clarifies that errors in voltage output are probably not Gaussian - in this chip's case higher temperatures lead to lower voltage output, and the voltage output is quite nonlinear if the input current is very high.
As you might guess, modeling the performance of your sensors is one of the harder parts of creating a Kalman filter that performs well.
Animation
For those reading this in IPython Notebook, here is an animation showing the filter working. The top plot in the animation draws a green line for the predicted next voltage, then a red '+' for the actual measurement, draws a light red line to show the residual, and then draws a blue line to the filter's output. You can see that when the filter starts the corrections made are quite large, but after only a few updates the filter only adjusts its output by a small amount even when the measurement is far from it.
The lower plot shows the Gaussian belief as the filter innovates. When the filter starts the Gaussian curve is centered over 25, our initial guess for the voltage, and is very wide and short due to our initial uncertainty. But as the filter innovates, the Gaussian quickly moves to about 16.0 and becomes taller, reflecting the growing confidence that the filter has in its estimate of the voltage. You will also note that the Gaussian's height bounces up and down a little bit. If you watch closely you will see that the Gaussian becomes a bit shorter and more spread out during the prediction step, and becomes taller and narrower as the filter incorporates another measurement.
Think of this animation in terms of the g-h filter. At each step the g-h filter makes a prediction, takes a measurement, computes the residual (the difference between the prediction and the measurement), and then selects a point on the residual line based on the scaling factor g. The Kalman filter is doing exactly the same thing, except that the scaling factor g varies with time. As the filter becomes more confident in its state the scaling factor favors the filter's prediction over the measurement.
If this is not clear, I urge you to go back and review the g-h chapter. This is the crux of the algorithms in this book.
Exercise (optional):
Write a function that runs the Kalman filter many times and record what value the voltage converges to each time. Plot this as a histogram. After 10,000 runs do the results look normally distributed? Does this match your intuition of what should happen?
Use plt.hist(data, bins=100) to plot the histogram.
Solution
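One possible solution, sketched without the plotting so it can run headless. The true voltage (16.3) is an assumption, and this sketch uses 1,000 runs rather than the 10,000 the exercise asks for; the larger count works the same way but takes longer.

```python
import random
random.seed(17)

def run_filter(n=50, true_volts=16.3, std=0.13):
    """Run one filter to convergence and return its final voltage estimate."""
    x, P = 25.0, 1000.0
    R, Q = std ** 2, 0.05 ** 2
    for _ in range(n):
        P += Q
        z = true_volts + random.gauss(0.0, std)
        x = (P * z + R * x) / (P + R)
        P = P * R / (P + R)
    return x

results = [run_filter() for _ in range(1000)]
# To plot: import matplotlib.pyplot as plt; plt.hist(results, bins=100)
```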
Discussion
The results do in fact look like a normal distribution. Each converged estimate is a weighted combination of Gaussian random variables, and any linear combination of Gaussians is itself Gaussian, so the histogram of results is normally distributed.
Example: Extreme Amounts of Noise
I didn't put a lot of noise in the signal, and I also 'correctly guessed' that the dog was at position 0. How does the filter perform in real world conditions? Let's explore and find out. I will start by injecting a lot of noise in the RFID sensor - an extreme amount of noise that apparently swamps the actual measurement. What does your intuition tell you about how the filter will perform if the noise is allowed to be anywhere from -300 to 300? In other words, an actual position of 1.0 might be reported as 287.9, or -189.6, or any other number in that range. Think about it before you scroll down.
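A sketch of this experiment with assumed values: the dog walks 1 m per step, and the sensor variance is an absurd 30,000 (a standard deviation of about 173 m, so readings can easily land hundreds of meters from the truth):

```python
import random
random.seed(3)

R = 30000.0   # sensor variance: absurdly noisy (assumed value)
Q = 2.0       # process variance (assumed value)

x, P = 0.0, 500.0
estimates = []
true_pos = 0.0
for _ in range(100):
    true_pos += 1.0
    z = true_pos + random.gauss(0.0, R ** 0.5)   # wildly noisy reading
    x, P = x + 1.0, P + Q                        # predict
    x = (P * z + R * x) / (P + R)                # update
    P = P * R / (P + R)
    estimates.append(x)
```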
In this example the noise is extreme yet the filter still outputs a nearly straight line! This is an astonishing result! What do you think might be the cause of this performance?
We get a nearly straight line because our process error is small. A small process error tells the filter to trust the prediction more than the measurement, and the prediction is a straight line, so the filter outputs a straight line.
Example: Incorrect Process Variance
That last filter looks fantastic! Why wouldn't we set the process variance very low, as it guarantees the result will be straight and smooth?
The process variance tells the filter how much the system is changing over time. If you lie to the filter by setting this number artificially low the filter will not be able to react to the changes.
It is easy to see that the filter is not correctly responding to the measurements, which are clearly telling the filter that the dog is changing speed. I encourage you to adjust the amount of movement in the dog vs the process variance. We will also be studying this topic much more in the following chapter.
Example: Bad Initial Estimate
Now let's look at the results when we make a bad initial estimate of position. To avoid obscuring the results I'll reduce the sensor variance to 30, but set the initial position to 1000 m. Can the filter recover from a 1000 m initial error?
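A sketch of this experiment: sensor variance 30 as stated, initial guess 1000 m, and the other values assumed for illustration:

```python
import random
random.seed(5)

R, Q = 30.0, 2.0
x, P = 1000.0, 500.0    # wildly wrong initial position guess
estimates = []
true_pos = 0.0
for _ in range(100):
    true_pos += 1.0
    z = true_pos + random.gauss(0.0, R ** 0.5)
    x, P = x + 1.0, P + Q                 # predict
    x = (P * z + R * x) / (P + R)         # update
    P = P * R / (P + R)
    estimates.append(x)
```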
Again the answer is yes! Because we are relatively sure about our belief in the sensor (a variance of 30), even after the first step we have changed our belief in the first position from 1000 to somewhere around 60.0 or so. After another 5-10 measurements we have converged to the correct value! This is how we get around the chicken and egg problem of initial guesses. In practice we would likely assign the first measurement from the sensor as the initial value, but you can see it doesn't matter much if we wildly guess at the initial conditions - the Kalman filter still converges so long as the filter variances are chosen to match the actual process and measurement variances.
Example: Large Noise and Bad Initial Estimate
What about the worst of both worlds, large noise and a bad initial estimate?
This time the filter does struggle. Notice that the previous example only computed 100 updates, whereas this example uses 1000. By my eye it takes the filter 400 or so iterations to become reasonably accurate, but maybe over 600 before the results are good. Kalman filters are good, but we cannot expect miracles. If we have extremely noisy data and extremely bad initial conditions, this is as good as it gets.
Finally, let's make the suggested change of using the first sensor measurement as our initial position guess.
This simple change significantly improves the results. On some runs it takes 200 iterations or so to settle to a good solution, but other runs it converges very rapidly. This all depends on whether the initial measurement had a small amount or large amount of noise. A large amount of noise causes the initial estimate to be far from the dog's position.
200 iterations may seem like a lot, but the amount of noise we are injecting is truly huge. In the real world we use sensors like thermometers, laser range finders, GPS satellites, computer vision, and so on. None of them has errors as enormous as these. A reasonable variance for a cheap thermometer might be 4 C², and our code is using 30,000.
Exercise: Interactive Plots
Implement the Kalman filter using IPython Notebook's animation features to allow you to modify the various constants in real time using sliders. Refer to the section Interactive Gaussians in the Gaussian chapter to see how to do this. You will use the interact() function to call a calculation and plotting function. Each parameter passed into interact() automatically gets a slider created for it. I have built the boilerplate for this; you fill in the required code.
Solution
One possible solution follows. We have sliders for the start position, the amount of noise in the sensor, the amount we move in each time step, and how much movement error there is. Movement error is perhaps the least clear - it models how much the dog wanders off course at each time step, so we add that into the dog's position at each step. I set the random number generator seed so that each redraw uses the same random numbers, allowing us to compare the graphs as we move the sliders.
Exercise - Nonlinear Systems
Our equations are linear:
$$\bar\mu = \mu + \mu_{\text{move}}$$
$$\mu = \frac{\bar\sigma^2 z + \sigma_z^2 \bar\mu}{\bar\sigma^2 + \sigma_z^2}$$
Do you suppose that this filter works well or poorly with nonlinear systems?
Implement a Kalman filter that uses a sinusoid to generate the measurement value at each step i in range(100).
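Here is a sketch of one way to set up the experiment. The exact measurement equation was elided above, so the amplitude-2 sine below is an assumption, as are the variances and the deliberately bad initial guess of 100:

```python
import math

x, P = 100.0, 500.0   # deliberately bad initial guess of 100
R, Q = 1.0, 0.05      # assumed measurement and process variances

estimates = []
for i in range(100):
    z = math.sin(i / 3.0) * 2.0       # nonlinear, noise-free "measurement"
    P += Q                            # predict: no known movement (u = 0)
    x = (P * z + R * x) / (P + R)     # update
    P = P * R / (P + R)
    estimates.append(x)
```

Even though the bad guess is corrected almost immediately (the enormous initial variance makes the filter trust the first measurement), the steady-state output lags the sine and never reaches its peaks of plus or minus 2.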
Adjust the variance and initial positions to see the effect. What is, for example, the result of a very bad initial guess?
Solution
Discussion
Here we set a bad initial guess of 100. We can see that the filter never fully 'acquires' the signal. Note how the peak of the filter output always lags the peak of the signal by a small amount, and how the filtered signal does not come very close to capturing the high and low peaks of the input signal.
If we recall the g-h filter chapter we can understand what is happening here. The structure of the g-h filter requires that the filter output chooses a value part way between the prediction and measurement. A varying signal like this one is always accelerating, whereas our process model assumes constant velocity, so the filter is mathematically guaranteed to always lag the input signal.
Maybe we didn't adjust things 'quite right'. After all, the output looks like an offset sin wave. Let's test this assumption.
Exercise - Noisy Nonlinear Systems
Implement the same system, but add noise to the measurement.
Solution
Discussion
This is terrible! The output is not at all like a sin wave, except in the grossest way. With linear systems we could add extreme amounts of noise to our signal and still extract a very accurate result, but here even modest noise creates a very bad result.
Very shortly after practitioners began implementing Kalman filters they recognized the poor performance of them for nonlinear systems and began devising ways of dealing with it. Much of the remainder of this book is devoted to this problem and its various solutions.
Derivation From Bayes Theorem (Optional)
I gave you the equations for the product of two Gaussians but did not derive them for you. They are a direct result of applying Bayes rules to Gaussians, however. You don't need to understand this section to read the rest of the book or to use Kalman filters. You may need to know this if you read the literature, much of which is written in terms of Bayes theorem. It's a bit of neat algebra, so why not read on?
We can state the problem as: let the prior be $N(\bar\mu, \bar\sigma^2)$, and the measurement be $N(z, \sigma_z^2)$. What is the posterior $x$ given the measurement $z$?
Write the posterior as $p(x \mid z)$. Now we can use Bayes Theorem to state
$$p(x \mid z) = \frac{p(z \mid x)\,p(x)}{p(z)}$$
$p(z)$ is a normalizing constant, so we can create a proportionality
$$p(x \mid z) \propto p(z \mid x)\,p(x)$$
Now we substitute in the equations for the Gaussians, which are
$$p(z \mid x) = \frac{1}{\sqrt{2\pi\sigma_z^2}}\exp\left[-\frac{(z-x)^2}{2\sigma_z^2}\right], \qquad p(x) = \frac{1}{\sqrt{2\pi\bar\sigma^2}}\exp\left[-\frac{(x-\bar\mu)^2}{2\bar\sigma^2}\right]$$
We can drop the leading terms, as they are constants, giving us
$$p(x \mid z) \propto \exp\left[-\frac{(z-x)^2}{2\sigma_z^2}-\frac{(x-\bar\mu)^2}{2\bar\sigma^2}\right]$$
Now we multiply out the squared terms and group in terms of the posterior $x$.
$$p(x \mid z) \propto \exp\left[-\frac{(\bar\sigma^2+\sigma_z^2)x^2 - 2(\bar\sigma^2 z + \sigma_z^2\bar\mu)x + (\bar\sigma^2 z^2 + \sigma_z^2\bar\mu^2)}{2\bar\sigma^2\sigma_z^2}\right]$$
The last term in parentheses does not contain the posterior $x$, so it can be treated as a constant and discarded.
$$p(x \mid z) \propto \exp\left[-\frac{(\bar\sigma^2+\sigma_z^2)x^2 - 2(\bar\sigma^2 z + \sigma_z^2\bar\mu)x}{2\bar\sigma^2\sigma_z^2}\right]$$
Divide numerator and denominator by $\bar\sigma^2+\sigma_z^2$ to get
$$p(x \mid z) \propto \exp\left[-\frac{x^2 - 2\left(\frac{\bar\sigma^2 z + \sigma_z^2\bar\mu}{\bar\sigma^2+\sigma_z^2}\right)x}{2\frac{\bar\sigma^2\sigma_z^2}{\bar\sigma^2+\sigma_z^2}}\right]$$
Proportionality lets us create or delete constants at will, so we can complete the square and factor this into
$$p(x \mid z) \propto \exp\left[-\frac{\left(x - \frac{\bar\sigma^2 z + \sigma_z^2\bar\mu}{\bar\sigma^2+\sigma_z^2}\right)^2}{2\frac{\bar\sigma^2\sigma_z^2}{\bar\sigma^2+\sigma_z^2}}\right]$$
A Gaussian is
$$N(\mu, \sigma^2) \propto \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right]$$
So we can see that $p(x \mid z)$ has a mean of
$$\mu = \frac{\bar\sigma^2 z + \sigma_z^2\bar\mu}{\bar\sigma^2+\sigma_z^2}$$
and a variance of
$$\sigma^2 = \frac{\bar\sigma^2\sigma_z^2}{\bar\sigma^2+\sigma_z^2}$$
which are the equations that we have been using.
Summary
The one dimensional Kalman filter that we describe in this chapter is a special, restricted case of the more general filter. Most texts do not discuss this form of the Kalman filter at all. However, I think it is a vital stepping stone. We started the book with the g-h filter, then implemented the discrete Bayes filter, and now the one dimensional Kalman filter. I have tried to show you that each of these filters uses the same algorithm and reasoning. The mathematics of the Kalman filter that we will learn shortly is fairly sophisticated, and it can be difficult to understand the underlying simplicity of the filter. That sophistication comes with significant benefits: the filters we design in the next chapter will markedly outperform the filters in this chapter.
This chapter takes time to assimilate. To truly understand it you will probably have to work through this chapter several times. I encourage you to change the various constants in the code and observe the results. Convince yourself that Gaussians are a good representation of a unimodal belief of the position of a dog in a hallway, the position of an aircraft in the sky, or the temperature of a chemical reaction chamber. Then convince yourself that multiplying Gaussians truly does compute a new belief from your prior belief and the new measurement. Finally, convince yourself that if you are measuring movement, that adding the Gaussians together updates your belief.
Most of all, spend enough time with the Full Description of the Algorithm section to ensure you understand the algorithm and how it relates to the g-h filter and discrete Bayes filter. There is just one 'trick' here - selecting a value somewhere between a prediction and a measurement. Each algorithm performs that 'trick' with different math, but all use the same logic.
If you understand this, you will be able to understand the generalized Kalman filter in the next chapter and the various extensions that have been made to it. If you do not fully understand this, reread this chapter. Try implementing the filter from scratch, just by looking at the equations and reading the text. Change the constants. Maybe try to implement a different tracking problem, like tracking stock prices. Experimentation will build your intuition and understanding of how these marvelous filters work.