Kalman filter from scratch
The Bayes filter under a Gaussian assumption. Fifty lines of Python, worked numerically, with the intuition for why every step is what it is.
The Kalman filter is the simplest specialization of the Bayes filter: assume everything is Gaussian and linear, and the two-step recursion becomes matrix algebra. Fifty lines of code, runs at kilohertz. Understand it once and every advanced estimator is a recognizable variation.
The assumptions
- State evolves linearly: x_t = F·x_{t-1} + B·u_t + w, with process noise w ~ N(0, Q).
- Measurement is linear in state: z_t = H·x_t + v, with measurement noise v ~ N(0, R).
- Initial belief is Gaussian: x_0 ~ N(μ_0, P_0).
Under these assumptions, the belief stays Gaussian forever — the posterior is still a Gaussian, just with different mean and covariance. We track two things: mean μ and covariance matrix P.
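If you want to see that closure numerically, here's a quick Monte Carlo sanity check (my addition, with made-up F and Q): push Gaussian samples through a linear map plus Gaussian noise and compare the empirical moments against the closed-form ones.

import numpy as np

rng = np.random.default_rng(0)
F = np.array([[1.0, 0.1], [0.0, 1.0]])   # illustrative linear map
Q = np.diag([1e-3, 1e-2])                # illustrative process noise
mu0, P0 = np.array([10.0, 0.0]), np.diag([1.0, 1.0])

x = rng.multivariate_normal(mu0, P0, size=100_000)
w = rng.multivariate_normal(np.zeros(2), Q, size=100_000)
x1 = x @ F.T + w                         # x' = F·x + w, sample by sample

print(x1.mean(axis=0))   # ≈ F·μ0
print(np.cov(x1.T))      # ≈ F·P0·Fᵀ + Q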
The five equations
That's all a Kalman filter is. Five equations. Predict (2), then update (3).
Predict
μ' = F·μ + B·u # mean under motion
P' = F·P·Fᵀ + Q # covariance grows by process noise
Update
y = z − H·μ' # innovation: measurement minus prediction
S = H·P'·Hᵀ + R # innovation covariance
K = P'·Hᵀ·S⁻¹ # Kalman gain
μ = μ' + K·y # mean after measurement
P = (I − K·H)·P' # covariance shrinks
That's everything. The Kalman gain K is the key magic: it tells you how much to trust the new measurement vs. the prediction, balanced by their respective uncertainties. High sensor noise → small K (stick with the prediction). Small sensor noise → large K (follow the measurement).
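Transcribed into numpy, the five equations are two short functions (this packaging is mine; the symbols match the equations above):

import numpy as np

def kf_predict(mu, P, F, B, u, Q):
    mu = F @ mu + B @ u                   # μ' = F·μ + B·u
    P = F @ P @ F.T + Q                   # P' = F·P·Fᵀ + Q
    return mu, P

def kf_update(mu, P, z, H, R):
    y = z - H @ mu                        # innovation
    S = H @ P @ H.T + R                   # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)        # Kalman gain
    mu = mu + K @ y                       # corrected mean
    P = (np.eye(P.shape[0]) - K @ H) @ P  # corrected covariance
    return mu, P

In the scalar case with H = 1 the gain collapses to K = P'/(P' + R), which makes the trust trade-off explicit: with P' = 1, R = 99 gives K = 0.01 (stick with the prediction) and R ≈ 0.01 gives K ≈ 0.99 (follow the measurement).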
Worked example: tracking a falling ball
State: position and velocity, x = [p; v]. We see the position with noise; we know gravity. Dynamics (in 1D, pointing up):
F = | 1  dt |    B = |  0 |    H = [1  0]
    | 0   1 |        | dt |

Q = | dt²·q  0 |    R = [σ_z²]
    |     0  q |
We measure position only (H = [1 0]). We assume gravity is known: u = -9.81, so the state evolves as x' = F·x + B·u.
import numpy as np
dt = 0.01
F = np.array([[1, dt], [0, 1]])
B = np.array([[0], [dt]])
H = np.array([[1, 0]])
Q = np.diag([1e-4, 1e-3])
R = np.array([[0.1]]) # measurement noise variance (≈ 0.3 m std, squared)
mu = np.array([[10.0], [0.0]]) # start at 10m, zero velocity
P = np.diag([1.0, 1.0])
true_p = 10.0
true_v = 0.0
g = -9.81
for t in range(200):
    # Simulate the world
    true_v = true_v + g * dt
    true_p = true_p + true_v * dt
    z = np.array([[true_p + np.random.randn() * 0.3]])
    # Predict
    u = np.array([[g]])
    mu = F @ mu + B @ u
    P = F @ P @ F.T + Q
    # Update
    y = z - H @ mu
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    mu = mu + K @ y
    P = (np.eye(2) - K @ H) @ P
    if t % 20 == 0:
        print(f't={t*dt:.2f} truth=({true_p:.2f}, {true_v:.2f}) '
              f'est=({mu[0,0]:.2f}, {mu[1,0]:.2f})')
The estimate tracks the truth. The filter recovers velocity even though we only measure position — by knowing the dynamics (position changes come from velocity), it infers the unobserved state.
That's the Kalman filter's party trick: it estimates states you never directly sensed.
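One way to convince yourself the trick is legitimate: check that the pair (F, H) is observable. A quick rank check on the matrices above (my addition):

import numpy as np

dt = 0.01
F = np.array([[1, dt], [0, 1]])
H = np.array([[1, 0]])

# Observability matrix O = [H; H·F]. Full rank means position
# measurements pin down the whole state, velocity included.
O = np.vstack([H, H @ F])
print(np.linalg.matrix_rank(O))  # 2 → fully observable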
Why the covariance matters
The diagonal of P is the variance of each state — in our example, the uncertainty about position and about velocity. Plot them over time:
- Right after a measurement, position uncertainty drops sharply (we just measured it).
- Velocity uncertainty drops slowly (inferred, not measured directly).
- Between measurements, both grow (process noise accumulating).
The filter gives you calibrated uncertainty for every estimate. That's what downstream code (controllers, planners) uses to make decisions under uncertainty. A PID that ignores it gets whipsawed by noise; an MPC that uses it degrades gracefully.
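A compact rerun of the example above that prints the 1σ values from P's diagonal (my addition; same matrices as before):

import numpy as np

dt = 0.01
F = np.array([[1, dt], [0, 1]]); B = np.array([[0], [dt]])
H = np.array([[1, 0]]); Q = np.diag([1e-4, 1e-3]); R = np.array([[0.1]])
mu = np.array([[10.0], [0.0]]); P = np.diag([1.0, 1.0])
true_p, true_v, g = 10.0, 0.0, -9.81

for t in range(200):
    true_v += g * dt; true_p += true_v * dt
    z = np.array([[true_p + np.random.randn() * 0.3]])
    mu = F @ mu + B @ np.array([[g]]); P = F @ P @ F.T + Q   # predict
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    mu = mu + K @ (z - H @ mu); P = (np.eye(2) - K @ H) @ P  # update
    if t % 40 == 0:
        print(f'step {t}: pos σ = {np.sqrt(P[0,0]):.3f}, '
              f'vel σ = {np.sqrt(P[1,1]):.3f}')

You should see position σ collapse within the first few updates while velocity σ falls more slowly, exactly the pattern described above.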
Tuning Q and R
The two matrices every Kalman engineer spends time on:
- R — measurement noise. Usually measurable: hold the sensor still and compute the variance of its readings.
- Q — process noise. Harder to measure. It represents unmodeled dynamics, disturbances, anything your state equation doesn't capture. Usually tuned by hand.
If the filter diverges or lags behind reality: Q is too small (you're over-trusting the model). If the estimate is jittery: Q is too big (you're ignoring the model and chasing measurement noise). The fast way to tune: start with something reasonable, run on real data, and check that the innovations y are approximately zero-mean with variance matching S. If the innovations are consistently large relative to S, your noise model (R or Q) understates reality.
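One concrete form of that check (my framing; commonly called the normalized innovation squared, or NIS): yᵀ·S⁻¹·y, averaged over the run, should land near the measurement dimension.

import numpy as np

def nis(y, S):
    # Normalized innovation squared for one update step.
    return float(y.T @ np.linalg.inv(S) @ y)

# Log nis(y, S) at every update. For a consistent filter the average is
# close to the measurement dimension (1 in the falling-ball example):
# much larger → overconfident (Q or R too small),
# much smaller → underconfident (Q or R too large).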
Where the Kalman filter stops working
- Nonlinear dynamics. Dynamics like θ̇ = v·cos(θ) aren't linear. Fix: the extended Kalman filter (linearize around the current mean) or the UKF (propagate sigma points); a linearization sketch follows this list.
- Non-Gaussian noise. A lidar that sometimes gives outliers. Fix: robust estimation, or move to particle filters.
- Multimodal beliefs. "I'm either on aisle 2 or aisle 5." A single Gaussian can't represent this. Fix: particle filter, Gaussian mixture filter, factor graphs.
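To make "linearize around the current mean" concrete, here's a sketch of the EKF's core move using a numerical Jacobian (the dynamics function f is a hypothetical example, not from this chapter):

import numpy as np

def f(x, v=1.0, dt=0.01):
    # Hypothetical nonlinear dynamics: state = [position, heading],
    # position advances by v·cos(heading) per step.
    return np.array([x[0] + v * np.cos(x[1]) * dt, x[1]])

def jacobian(f, x, eps=1e-6):
    # Forward-difference Jacobian of f at x. The EKF uses this matrix
    # in place of F for the covariance step, P' = F·P·Fᵀ + Q,
    # re-linearizing around the current mean at every iteration.
    n = len(x)
    fx = f(x)
    J = np.zeros((n, n))
    for i in range(n):
        dx = np.zeros(n); dx[i] = eps
        J[:, i] = (f(x + dx) - fx) / eps
    return J

mu = np.array([0.0, 0.3])
print(jacobian(f, mu))  # plays the role of F around the current mean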
Every advanced estimator is "a Kalman filter with one of the assumptions relaxed." That's why this chapter is the foundation.
Next
Extended and unscented Kalman filters — the two ways to handle nonlinearity while staying in the Gaussian world. Then particle filters, factor graphs, and modern SLAM.