Over the past decade or so, there have been significant advancements in the field of robotics, e.g., Boston Dynamics’ Spot, Da Vinci surgical robot, Tesla’s Autopilot and more. I had some experience working around robots but I’ve never really built my own from scratch, components, software, and all. To remedy this, I decided to take up building a robot as a “little” side project. Arguably, one of the most important aspects is state estimation, the problem of accurately determining variables about your robot, e.g., its position with respect to some global frame, velocity, acceleration, IMU biases, and other dynamical variables, given the robot’s sensors and physics kinematics of the robot. Previously, I didn’t have to worry about this since there was always some other team responsible for state estimation whose output I would use. But since I was constructing my own robot, I had to do this leg work myself. (Of course, I could have used an off-the-shelf product, but where’s the fun in that?) To that goal, this post aims to describe the underpinnings of a very common approach to state estimation: the extended kalman filter (EKF).

Kalman Filters

You probably read the title and thought, “wait, what’s a Kalman Filter in the first place? Shouldn’t we discuss that before extending it?” You’re absolutely right! I was planning on doing a writeup of Kalman Filters, but then I found this fantastic post: How a Kalman filter works, in pictures. (I have no affiliation with the author; I just thought the post was written really well, and I wouldn’t have many additional things to say about the topic.) Skim through that first before continuing so you get the gist of Kalman Filters.

I have just a few additions to make on top of that post:

I think the motion model is pretty intuitive, but the measurement model confused me the very first time I encountered it so I’ll provide a slightly different interpretation with some concrete examples. Suppose our state is like that in the post:

\[\hat{x}_k = \begin{bmatrix}p_k\\ v_k\end{bmatrix}\]

The measurement/observation model is a matrix $H_k$ that maps the state space into the sensor space. Imagine driving our robot along and taking a sensor measurement; where that sensor measurement lies is dependent on where we are and how fast we’re going, i.e., our state, when we multiply it by $\hat{x}_k$, we’re computing the estimated sensor reading. This is why $H_k$ acts on the state vector. Think of this as a way to map our state space to our observation space.

To give a more concrete example, suppose our robot had GPS, which we could use to sense the position component of our state vector but not the velocity component. Furthermore, the mapping is direct: the whole purpose of the GPS sensor is to tell you the position (plus a noise error). Then our $H_k$ matrix would look like the following:

\[\begin{aligned} \begin{bmatrix} 1 & 0 \end{bmatrix}\begin{bmatrix} p_k\\v_k \end{bmatrix} &= \begin{bmatrix} p_k \end{bmatrix}\\ H_k\hat{x}_k &= \begin{bmatrix} p_k \end{bmatrix}\\ \end{aligned}\]

In a slightly more complicated (and realistic) example in 2D space, we can still compute the $H_k$ matrix:

\[\begin{aligned} \begin{bmatrix} 1 & 0 & 0 & 0\\0 & 1 & 0 & 0 \end{bmatrix}\begin{bmatrix} p^{(1)}_k\\p^{(2)}_k\\v^{(1)}_k\\v^{(2)}_k \end{bmatrix} &= \begin{bmatrix} p^{(1)}_k\\p^{(2)}_k \end{bmatrix}\\ H_k\hat{x}_k &= \begin{bmatrix} p^{(1)}_k\\p^{(2)}_k \end{bmatrix} \end{aligned}\]

Note the superscripts represent spatial dimensions, not exponents!

The final thing I’ll say is about the intuitive meaning of the Kalman Gain $P_k H^T_k(H_k P_k H^T_k + R_k)^{-1}$. This is probably the most complicated-looking quantity in the Kalman Filter equations, but it has a very intuitive explanation. To see this, let’s rewrite it a little:

\[K = \displaystyle\frac{P_k H^T_k}{H_k P_k H^T_k + R_k}\]

It has two “inputs”: $P_k$, the covariance around the state, and $R_k$, the covariance around the sensor measurement.

Let’s consider the case where $P_k$ has small values and $R_k$ has large values. This means we’re very certain about our state but more uncertain about our sensors. Making these adjustments, we’ll see that $K$ will have very small values, and that means we won’t use our sensor measurements as much:

\[\hat{x}_k' = \hat{x}_k + K(z_k-H_k\hat{x}_k)\]

If $K$ has small values, then $\hat{x}_k’$ is dominated by $\hat{x}_k$, our forward-projected state.

Now let’s consider the opposite case: $R_k$ has small values and $P_k$ has large values. This means that there’s less uncertainty around our sensor measurements, i.e., our sensors are very accurate, and more uncertainty around our state. In that case, we definitely want to use our sensors to update our state since it will give us a better estimate than if we had very inaccurate sensors. Notice when $R_k\rightarrow 0$, $K\rightarrow H_k^{-1}$ (I’m being a little sloppy with the notation), which is just the matrix we use to map back to the state space from the observation space.

\[\hat{x}_k' = \hat{x}_k + K(z_k-H_k\hat{x}_k)\]

If $K$ has larger values, then $\hat{x}_k’$ will use both $\hat{x}_k$ and the sensor measurement $z_k-H_k\hat{x}_k$.

Extended Kalman Filters (EKFs)

Now that we have an understanding of the basics of Kalman Filters, we can extend them to work well for a wider range of problems. In the previous section, we saw how the Kalman Filter can be used to estimate our robot’s state using just a few different matrices. One significant caveat about the Kalman Filter is that it’s a linear filter!

Take another look at our predict and update steps: everything in them is linear. This is a problem because we’re relying on Gaussians: applying linear functions to a Gaussian will produce a Gaussian but applying a nonlinear function to a Gaussian might not!

This is the folly of Kalman Filters: they’re really good at modeling linear systems, but the world has many nonlinear systems. A more powerful and accurate, representation would be a nonlinear one.

So how can we construct a nonlinear version of Kalman Filters? Well we know that Kalman Filters work well for linear systems; if we can come up with a way to linearize our nonlinear system at the current estimate, then we can simply use the exact same Kalman Filter mechanics to solve our problem!

Armed with a little calculus knowledge, we can come up with a way to create a linear approximation of a system at a particular value. To see this, let’s use a simple example.

Suppose we have a parabola, e.g., the function $f(x) = x^2$. We want to create a linear approximation of our quadratic function at the point $x=2$. In other words, we want the tangent line at $x=2$. Using a little calculus, we could just compute the derivative $\displaystyle\frac{\mathrm{d}f}{\mathrm{d}x}$ and evaluate it at $x=2$. This would give us the slope of the tangent line at that point, and we can use that line to estimate our function around $x=2$. Notice that the farther away we move from $x=2$, the worse our estimate gets so it’s only good local to $x=2$, which is what makes this a local approximation.

Tangent line

Plot of $x^2$ in blue while the tangent line at $x=2$ is shown in orange.

Mathematically, we can represent our new approximation using the following function, where $a=2$.

\[\begin{aligned} \tilde{f}(x) &= f(a) + f'(a)\cdot(x-a)\\ \tilde{f}(x) &= f(2) + f'(2)\cdot(x-2)\\ &= 4 + 2(2)\cdot(x-2)\\ &= 4 + 4(x-2)\\ &= 4x - 4 \end{aligned}\]

What we’re actually doing is creating the Taylor series of $f$ at $x=2$, particularly just the first-order Taylor series, i.e., a line.

Notice the slope of this line is correctly $f’(a)$. The y-intercept is a bit more complicated, but can be worked out using some algebra.

So where does this apply to Kalman Filters? For EKFs, we replace the motion and sensor models with nonlinear functions $f(\hat{x}_{k-1}, u_k)$ and $h(\hat{x}_k)$, respectively. One important thing to note is that these functions accept vectors as inputs and produce vector outputs! This means we can’t directly apply the Taylor series equation we’ve seen to linearize our input. In other words, writing $\displaystyle\frac{\mathrm{d}h}{\mathrm{d}\hat{x}_k}$ is ambiguous; which output of $h$ are we referring to? Instead, I want to know how changing a particular input will affect a particular output. For this case, we need to compute all of the partial derivatives of our each output with respect to each input.

Let’s consider with an example we’ve seen before:

\[\begin{aligned} p_k &= p_{k-1} + \Delta t v_{k-1}\\ v_k &= \hphantom{p_{k-1}} \hphantom{+} \hphantom{\Delta t~~} v_{k-1} \end{aligned}\]

Recall this is just the Kalman Filter’s motion model. Let me re-write this in a slightly different way.

\[\begin{aligned} f^{(1)} &= x^{(1)} + \Delta t x^{(2)}\\ f^{(2)} &= \hphantom{x^{(1)} + \Delta t } x^{(2)} \end{aligned}\]

where the superscripts are actually vector components. So the above has $\mathbf{f}(\mathbf{x})$ where all of the $1$ components are $p$ and the $2$ components are $v$. I’ve also dropping the subscripts for now.

Now we have to be more specific when we compute partial derivatives. Consider the partial derivative $\displaystyle\frac{\partial f^{(1)}}{\partial x^{(1)}}$. This tells us how much the first component of $f$ changes with the first component of the input vector. In other words, this tells us how much the position at $k$ changes when the position at $k-1$ changes, holding the velocity constant.

$\displaystyle\frac{\partial f^{(2)}}{\partial x^{(1)}}$ tells us how much the second component of $f$ changes with the first component of the input vector. $\displaystyle\frac{\partial f^{(1)}}{\partial x^{(2)}}$ tells us how much the first component of $f$ changes with the second component of the input vector.

Let’s take all 4 partial derivatives:

\[\begin{aligned} \displaystyle\frac{\partial f^{(1)}}{\partial x^{(1)}} &= 1\\ \displaystyle\frac{\partial f^{(1)}}{\partial x^{(2)}} &= \Delta t\\ \displaystyle\frac{\partial f^{(2)}}{\partial x^{(1)}} &= 0\\ \displaystyle\frac{\partial f^{(2)}}{\partial x^{(2)}} &= 1\\ \end{aligned}\]

A better format would be to arrange them into a matrix:

\[\begin{aligned} J &= \begin{bmatrix} \displaystyle\frac{\partial f^{(1)}}{\partial x^{(1)}} & \displaystyle\frac{\partial f^{(1)}}{\partial x^{(2)}}\\ \displaystyle\frac{\partial f^{(2)}}{\partial x^{(1)}} & \displaystyle\frac{\partial f^{(2)}}{\partial x^{(2)}} \end{bmatrix}\\ &= \begin{bmatrix} 1 & \Delta t\\ 0 & 1 \end{bmatrix} \end{aligned}\]

This is called the Jacobian Matrix $J$. It’s a generalization of first derivatives for functions that accept vector inputs and produce vector outputs. When using these kinds of functions, anywhere we have first derivatives, we can put the Jacobian $J$ in its place.

For example, let’s take a look at our Taylor Series expansion again:

\[\tilde{f}(x) = f(a) + f'(a)\cdot(x-a)\\\]

But since $f$ is now a vector-valued function, we replace the derivative with the Jacobian and the equation has the identical form:

\[\tilde{\mathbf{f}}(\mathbf{x}) = \mathbf{f}(\mathbf{a}) + J(\mathbf{a})\cdot(\mathbf{x}-\mathbf{a})\\\]

So the Jacobian here is being used to create linear approximations at a vector $\mathbf{a}$, just like derivatives could with the scalar $a$!

Also, notice that we’ve shown $J=F$, the motion model. This shows EKFs are more general that just Kalman Filters: using the EKF formulation on a function that’s already linear just gives us the Kalman Filter! (This notion is actually far broader concept: linear functions of any variables can always be written in matrix form.)

Now that we know about the Jacobian, let’s see what our Extended Kalman Filter equations look like:

\[\begin{align} \hat{x}_k &= f(\hat{x}_{k-1}, u_k)\\ P_k &= F_k \hat{x}_{k-1} F^T_k + Q_k\\ \hat{x}_k' &= \hat{x}_k + K(z_k-h(\hat{x}_k))\\ P_k' &= P_k - KH_k P_k\\ K &= P_k H^T_k(H_k P_k H^T_k + R_k)^{-1} \end{align}\]

They look almost identical! First, we have a motion (and control) model function $f$ and a sensor model function $h$, both of which need not be linear! (But as we’ve shown, it’s OK if they are because then the EKF reduces to just the Kalman Filter!)

Also, the definitions of the $F$ and $H$ matrices have to change to be the Jacobians of $f$ and $h$, respectively.

\[\begin{aligned} F_k &= \displaystyle\displaystyle\frac{\partial f}{\partial \mathbf{x}}\Bigr|_{\mathbf{x}=x_{k-1},u_k}\\ H_k &= \displaystyle\displaystyle\frac{\partial h}{\partial \mathbf{x}}\Bigr|_{\mathbf{x}=x_{k-1}} \end{aligned}\]

Keep in mind that $f$ and $h$ accept vectors as inputs and return vectors as outputs. Besides those two changes, the EKF equations look identical to the Kalman Filter ones!

Let’s finish our discussion on the EKF with an example of computing the Jacobian of the sensor model function $h$. For this example, suppose our state vector contains four quantities: 2D position and 2D velocity. In other words, we have a robot in a plane.

\[x_k = \begin{bmatrix} p^{(1)}_k \\ p^{(2)}_k \\ v^{(1)}_k \\ v^{(2)}_k \end{bmatrix}\]

Our robot is equipped with a range-bearing sensor that can give us an angle $\theta$ and radius $r$.

Bearing sensor

Example of a bearing sensor detecting an object with respect to the robot frame.

Since our sensor model maps our state space into our sensor space, $h$ is a function that computes $\theta$ and $r$ given our state information, such as the object’s location relative to our robot $(b^{(x)}, b^{(y)})$.

With some geometry, we can figure out what $h$ should be.

\[h(\hat{x}_k)=\begin{bmatrix}r\\\theta\end{bmatrix} =\begin{bmatrix}\sqrt{b^{(x)}\cdot b^{(x)} + b^{(y)}\cdot b^{(y)}}\\\arctan{\displaystyle\frac{b^{(y)}}{b^{(x)}}}\end{bmatrix}\]

Now we can compute the Jacobian $H$ by looking at all of the partial derivatives of $h$.

\[H=\begin{bmatrix} \displaystyle\frac{\partial r}{\partial b^{(x)}} & \displaystyle\frac{\partial r}{\partial b^{(y)}} \\ \displaystyle\frac{\partial \theta}{\partial b^{(x)}} & \displaystyle\frac{\partial \theta}{\partial b^{(y)}} \\ \end{bmatrix} =\begin{bmatrix} \displaystyle\frac{b^{(x)}}{\sqrt{b^{(x)}\cdot b^{(x)} + b^{(y)}\cdot b^{(y)}}} & \displaystyle\frac{b^{(y)}}{\sqrt{b^{(x)}\cdot b^{(x)} + b^{(y)}\cdot b^{(y)}}}\\ -\displaystyle\frac{b^{(y)}}{b^{(x)}\cdot b^{(x)} + b^{(y)}\cdot b^{(y)}} & \displaystyle\frac{b^{(x)}}{b^{(x)}\cdot b^{(x)} + b^{(y)}\cdot b^{(y)}}\\ \end{bmatrix}\]

(I’ll leave the derivations as an exercise.)

With this Jacobian computed, we can simply use the equations of Kalman Filters.


A fundamental problem in robotics is state estimation, and one of the most common ways to solve this is with Kalman Filters. But Kalman Filters have the fundamental limitation that they only work for linear systems, and many systems we’re interested in and want to model are nonlinear. We can still use the Kalman Filter machinery if we take a linear approximation of our nonlinear system at its current state. The Jacobian is a general way to find a linear approximation locally; think of it like finding the slope of a tangent line at a point, but for functions with more than one inputs and outputs. Using the Jacobian, we have a linear approximation and can simply reuse most of the equations of Kalman Filters.

Writing this post reminded me how interesting and multidisciplinary of a field robotics is; expect more articles on this topic in the future 🙂