Kalman filter from scratch
The Bayes filter under a Gaussian assumption. Fifty lines of Python, worked numerically, with the intuition for why every step is what it is.
The Kalman filter is the simplest specialization of the Bayes filter: assume everything is Gaussian and linear, and the two-step recursion becomes matrix algebra. Fifty lines of code, runs at kilohertz. Understand it once and every advanced estimator is a recognizable variation.
The assumptions
- State evolves linearly: x_t = F·x_{t-1} + B·u_t + w, with process noise w ~ N(0, Q).
- Measurement is linear in state: z_t = H·x_t + v, with measurement noise v ~ N(0, R).
- Initial belief is Gaussian: x_0 ~ N(μ_0, P_0).
Under these assumptions, the belief stays Gaussian forever — the posterior is still a Gaussian, just with different mean and covariance. We track two things: mean μ and covariance matrix P.
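If you want to see that closure numerically, here's a quick Monte Carlo sanity check (my addition, with made-up F and Q): push Gaussian samples through a linear map plus Gaussian noise and compare the empirical moments against the closed-form ones.

import numpy as np

rng = np.random.default_rng(0)
F = np.array([[1.0, 0.1], [0.0, 1.0]])   # illustrative linear map
Q = np.diag([1e-3, 1e-2])                # illustrative process noise
mu0, P0 = np.array([10.0, 0.0]), np.diag([1.0, 1.0])

x = rng.multivariate_normal(mu0, P0, size=100_000)
w = rng.multivariate_normal(np.zeros(2), Q, size=100_000)
x1 = x @ F.T + w                         # x' = F·x + w, sample by sample

print(x1.mean(axis=0))   # ≈ F·μ0
print(np.cov(x1.T))      # ≈ F·P0·Fᵀ + Q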
The five equations
That's all a Kalman filter is. Five equations. Predict (2), then update (3).
Predict
μ' = F·μ + B·u # mean under motion
P' = F·P·Fᵀ + Q # covariance grows by process noise
Update
y = z − H·μ' # innovation: measurement minus prediction
S = H·P'·Hᵀ + R # innovation covariance
K = P'·Hᵀ·S⁻¹ # Kalman gain
μ = μ' + K·y # mean after measurement
P = (I − K·H)·P' # covariance shrinks
That's everything. The Kalman gain K is the key magic: it tells you how much to trust the new measurement vs. the prediction, balanced by their respective uncertainties. High sensor noise → small K (stick with the prediction). Small sensor noise → large K (follow the measurement).
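Transcribed into numpy, the five equations are two short functions (this packaging is mine; the symbols match the equations above):

import numpy as np

def kf_predict(mu, P, F, B, u, Q):
    mu = F @ mu + B @ u                   # μ' = F·μ + B·u
    P = F @ P @ F.T + Q                   # P' = F·P·Fᵀ + Q
    return mu, P

def kf_update(mu, P, z, H, R):
    y = z - H @ mu                        # innovation
    S = H @ P @ H.T + R                   # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)        # Kalman gain
    mu = mu + K @ y                       # corrected mean
    P = (np.eye(P.shape[0]) - K @ H) @ P  # corrected covariance
    return mu, P

In the scalar case with H = 1 the gain collapses to K = P'/(P' + R), which makes the trust trade-off explicit: with P' = 1, R = 99 gives K = 0.01 (stick with the prediction) and R ≈ 0.01 gives K ≈ 0.99 (follow the measurement).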
Worked example: tracking a falling ball
State: position and velocity, x = [p; v]. We see the position with noise; we know gravity. Dynamics (in 1D, pointing up):
F = | 1  dt |    B = |  0 |    H = [1  0]
    | 0   1 |        | dt |

Q = | dt²·q  0 |    R = [σ_z²]
    |     0  q |
We measure position only (H = [1 0]). We assume gravity is known: u = -9.81, so the state evolves as x' = F·x + B·u.
import numpy as np
dt = 0.01
F = np.array([[1, dt], [0, 1]])
B = np.array([[0], [dt]])
H = np.array([[1, 0]])
Q = np.diag([1e-4, 1e-3])
R = np.array([[0.1]]) # measurement noise variance (≈ 0.3 m std, squared)
mu = np.array([[10.0], [0.0]]) # start at 10m, zero velocity
P = np.diag([1.0, 1.0])
true_p = 10.0
true_v = 0.0
g = -9.81
for t in range(200):
    # Simulate the world
    true_v = true_v + g * dt
    true_p = true_p + true_v * dt
    z = np.array([[true_p + np.random.randn() * 0.3]])
    # Predict
    u = np.array([[g]])
    mu = F @ mu + B @ u
    P = F @ P @ F.T + Q
    # Update
    y = z - H @ mu
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    mu = mu + K @ y
    P = (np.eye(2) - K @ H) @ P
    if t % 20 == 0:
        print(f't={t*dt:.2f} truth=({true_p:.2f}, {true_v:.2f}) '
              f'est=({mu[0,0]:.2f}, {mu[1,0]:.2f})')
The estimate tracks the truth. The filter recovers velocity even though we only measure position — by knowing the dynamics (position changes come from velocity), it infers the unobserved state.
That's the Kalman filter's party trick: it estimates states you never directly sensed.
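One way to convince yourself the trick is legitimate: check that the pair (F, H) is observable. A quick rank check on the matrices above (my addition):

import numpy as np

dt = 0.01
F = np.array([[1, dt], [0, 1]])
H = np.array([[1, 0]])

# Observability matrix O = [H; H·F]. Full rank means position
# measurements pin down the whole state, velocity included.
O = np.vstack([H, H @ F])
print(np.linalg.matrix_rank(O))  # 2 → fully observable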
Why the covariance matters
The diagonal of P is the variance of each state — in our example, the uncertainty about position and about velocity. Plot them over time:
- Right after a measurement, position uncertainty drops sharply (we just measured it).
- Velocity uncertainty drops slowly (inferred, not measured directly).
- Between measurements, both grow (process noise accumulating).
The filter gives you calibrated uncertainty for every estimate. That's what downstream code (controllers, planners) uses to make decisions under uncertainty. A PID that ignores it gets whipsawed by noise; an MPC that uses it degrades gracefully.
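A compact rerun of the example above that prints the 1σ values from P's diagonal (my addition; same matrices as before):

import numpy as np

dt = 0.01
F = np.array([[1, dt], [0, 1]]); B = np.array([[0], [dt]])
H = np.array([[1, 0]]); Q = np.diag([1e-4, 1e-3]); R = np.array([[0.1]])
mu = np.array([[10.0], [0.0]]); P = np.diag([1.0, 1.0])
true_p, true_v, g = 10.0, 0.0, -9.81

for t in range(200):
    true_v += g * dt; true_p += true_v * dt
    z = np.array([[true_p + np.random.randn() * 0.3]])
    mu = F @ mu + B @ np.array([[g]]); P = F @ P @ F.T + Q   # predict
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    mu = mu + K @ (z - H @ mu); P = (np.eye(2) - K @ H) @ P  # update
    if t % 40 == 0:
        print(f'step {t}: pos σ = {np.sqrt(P[0,0]):.3f}, '
              f'vel σ = {np.sqrt(P[1,1]):.3f}')

You should see position σ collapse within the first few updates while velocity σ falls more slowly, exactly the pattern described above.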
Tuning Q and R
The two matrices every Kalman engineer spends time on:
- R — measurement noise. Usually measurable: hold the sensor still and compute the variance of its readings.
- Q — process noise. Harder to measure. It represents unmodeled dynamics, disturbances, anything your state equation doesn't capture. Usually tuned by hand.
If the filter diverges or lags behind reality: Q is too small (you're over-trusting the model). If the estimate is jittery: Q is too big (you're ignoring the model and chasing measurement noise). The fast way to tune: start with something reasonable, run on real data, and check that the innovations y are approximately zero-mean with variance matching S. If the innovations are consistently large relative to S, your noise model (R or Q) understates reality.
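One concrete form of that check (my framing; commonly called the normalized innovation squared, or NIS): yᵀ·S⁻¹·y, averaged over the run, should land near the measurement dimension.

import numpy as np

def nis(y, S):
    # Normalized innovation squared for one update step.
    return float(y.T @ np.linalg.inv(S) @ y)

# Log nis(y, S) at every update. For a consistent filter the average is
# close to the measurement dimension (1 in the falling-ball example):
# much larger → overconfident (Q or R too small),
# much smaller → underconfident (Q or R too large).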
Where the Kalman filter stops working
- Nonlinear dynamics. Dynamics like θ̇ = v·cos(θ) aren't linear. Fix: the extended Kalman filter (linearize around the current mean) or the UKF (propagate sigma points); a linearization sketch follows this list.
- Non-Gaussian noise. A lidar that sometimes gives outliers. Fix: robust estimation, or move to particle filters.
- Multimodal beliefs. "I'm either on aisle 2 or aisle 5." A single Gaussian can't represent this. Fix: particle filter, Gaussian mixture filter, factor graphs.
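To make "linearize around the current mean" concrete, here's a sketch of the EKF's core move using a numerical Jacobian (the dynamics function f is a hypothetical example, not from this chapter):

import numpy as np

def f(x, v=1.0, dt=0.01):
    # Hypothetical nonlinear dynamics: state = [position, heading],
    # position advances by v·cos(heading) per step.
    return np.array([x[0] + v * np.cos(x[1]) * dt, x[1]])

def jacobian(f, x, eps=1e-6):
    # Forward-difference Jacobian of f at x. The EKF uses this matrix
    # in place of F for the covariance step, P' = F·P·Fᵀ + Q,
    # re-linearizing around the current mean at every iteration.
    n = len(x)
    fx = f(x)
    J = np.zeros((n, n))
    for i in range(n):
        dx = np.zeros(n); dx[i] = eps
        J[:, i] = (f(x + dx) - fx) / eps
    return J

mu = np.array([0.0, 0.3])
print(jacobian(f, mu))  # plays the role of F around the current mean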
Every advanced estimator is "a Kalman filter with one of the assumptions relaxed." That's why this chapter is the foundation.
Next
Extended and unscented Kalman filters — the two ways to handle nonlinearity while staying in the Gaussian world. Then particle filters, factor graphs, and modern SLAM.