The Bayes filter: the one equation behind everything
Predict, then update. Every state estimator — Kalman, EKF, particle filter, SLAM — is a special case of this two-step pattern. Learn it once and SLAM becomes readable.
Every state estimator a robot uses — Kalman filter, EKF, particle filter, GraphSLAM — is a specialization of the same two-step recursion. Learn the generic form once and every filter paper becomes an exercise in "what's their approximation?"
The scene
At time t, the robot has:
- A state x_t — pose, velocity, whatever you want to estimate
- A control command u_t it just issued
- A sensor measurement z_t it just received
What it really has is a belief about its state — a probability distribution, not a single value. The Bayes filter tracks how that belief evolves.
The two steps
Define bel(xt) as the current belief: the probability of each possible state given everything observed so far. The Bayes filter updates it in two steps.
Step 1: Prediction
Apply the motion model. "If I was here and I commanded this motion, where am I now?"
bel'(x_t) = ∫ p(x_t | x_{t-1}, u_t) · bel(x_{t-1}) d x_{t-1}
Read right to left: take the old belief, weight each possible old state by how likely the commanded motion is to carry it to the current state, and sum over all old states.
Effect: the belief gets more spread out, because the motion model is noisy. You became less certain by moving.
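On a small discrete state space the prediction integral becomes a sum over cells, which makes the spreading effect easy to see. A minimal sketch — the grid size, the prior, and the motion-noise probabilities (80% nominal, 10% stay, 10% overshoot) are all made up for illustration:

```python
import numpy as np

# Belief over 5 discrete positions (illustrative prior)
bel = np.array([0.1, 0.6, 0.2, 0.1, 0.0])

def predict(bel):
    """Prediction step: the integral becomes a sum over old cells.
    Commanded motion is 'move right one cell', with assumed noise."""
    n = len(bel)
    bel_bar = np.zeros_like(bel)
    for i, p in enumerate(bel):
        bel_bar[i] += 0.1 * p                  # slipped: stayed put
        bel_bar[min(i + 1, n - 1)] += 0.8 * p  # nominal: moved one cell
        bel_bar[min(i + 2, n - 1)] += 0.1 * p  # overshot by one cell
    return bel_bar

bel_bar = predict(bel)
```

The predicted belief still sums to 1, but its peak is lower than the prior's — exactly the "less certain by moving" effect.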
Step 2: Update
Incorporate the measurement. "Given what I just saw, which of my predicted states is most likely?"
bel(x_t) = η · p(z_t | x_t) · bel'(x_t)
Where η is a normalization constant that makes the distribution sum to 1. Multiply the predicted belief by the measurement likelihood.
Effect: the belief gets sharper, because you received information. You became more certain by measuring.
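In the same discrete setting, the update step is a pointwise multiply followed by a normalization (that's all η does). A minimal sketch — the predicted belief and the sensor likelihoods are invented numbers, with the sensor voting for cell 3:

```python
import numpy as np

# Predicted belief over 5 cells (illustrative numbers)
bel_bar = np.array([0.1, 0.2, 0.4, 0.2, 0.1])

# Measurement likelihood p(z_t | x_t): sensor says "probably cell 3"
likelihood = np.array([0.05, 0.05, 0.2, 0.6, 0.1])

# Update: multiply, then normalize (eta = 1 / the sum)
posterior = likelihood * bel_bar
posterior /= posterior.sum()
```

The peak both moves toward the measurement and gets taller than the predicted belief's peak — more certain by measuring.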
That's the whole thing
Every estimation algorithm is "make these two integrals tractable." The choice of approximation gives you a different filter:
- Kalman filter: assume everything is Gaussian, so the belief is a Gaussian, so the integrals become matrix algebra. 50 lines of code, runs at kilohertz.
- Extended Kalman filter (EKF): same but linearize the motion/measurement models around the current mean. Good when nonlinearity is mild.
- Unscented Kalman filter (UKF): propagate sigma points through the real nonlinear models instead of linearizing.
- Particle filter: represent the belief as a cloud of samples. Any distribution, not just Gaussian. Expensive but flexible — ROS's amcl is one.
- Histogram filter: discretize the state space into bins. Good for 1D and 2D problems, blows up in higher dimensions.
The algorithms disagree on the approximation. They agree on the recursion.
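Stitched together, the recursion is one predict/update cycle per timestep. Here is a histogram-filter sketch of that generic loop — the grid, the motion matrix, and the sensor likelihood are all assumed values, not any particular robot's models:

```python
import numpy as np

def bayes_filter_step(bel, motion_matrix, likelihood):
    """One Bayes filter cycle on a discrete grid.
    motion_matrix[i, j] = p(x_t = i | x_{t-1} = j, u_t)."""
    bel_bar = motion_matrix @ bel   # prediction: the integral as a sum
    bel = likelihood * bel_bar      # update: weight by p(z_t | x_t)
    return bel / bel.sum()          # eta: renormalize

n = 5
# Motion model: move right one cell, 10% chance of slipping in place
M = np.zeros((n, n))
for j in range(n):
    M[j, j] += 0.1
    M[min(j + 1, n - 1), j] += 0.9

bel = np.full(n, 1 / n)                        # uniform prior
like = np.array([0.05, 0.05, 0.1, 0.7, 0.1])   # sensor votes for cell 3
bel = bayes_filter_step(bel, M, like)
```

Every filter in the list above replaces the grid with a different belief representation, but calls this same pair of operations.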
A 1D worked example
A robot on a number line. State is position. Motion is "I moved 1 meter." Measurement is "I see a landmark at x = 5." Let's run one iteration.
Prior belief (before anything happened): Gaussian at x = 0 with standard deviation 1.0.
Motion model: when I command "move 1," the true motion is Gaussian with mean 1 and standard deviation 0.3.
Measurement model: landmark sensed at 5 with Gaussian noise of standard deviation 0.5.
import numpy as np
# Prior
mu, sigma = 0.0, 1.0
# Prediction step: add motion mean, variances add
mu = mu + 1.0
sigma = np.sqrt(sigma**2 + 0.3**2)
print(f'After prediction: mean={mu:.3f} std={sigma:.3f}')
# Update step: measurement at z=5, sensor std=0.5
# For two Gaussians, the posterior mean is a variance-weighted average
z, r = 5.0, 0.5
mu_post = (sigma**2 * z + r**2 * mu) / (sigma**2 + r**2)
sigma_post = np.sqrt((sigma**2 * r**2) / (sigma**2 + r**2))
print(f'After update: mean={mu_post:.3f} std={sigma_post:.3f}')
After prediction: mean ~1.0, std ~1.04 — the belief spread out. After update: mean ~4.25, std ~0.45 — pulled toward the measurement (5) and sharpened. That's a Kalman filter in ten lines.
Why this matters for SLAM
SLAM is "estimate my pose and the map simultaneously." The state is pose + landmark positions. Everything else — motion model, measurement model, prediction, update — is the same Bayes filter. EKF-SLAM does it with one big Gaussian; GraphSLAM does it as an optimization over all past poses; FastSLAM factors the problem across landmarks. All Bayes filter variants.
Once you see this, every SLAM paper becomes: "here's our state representation, here's our motion model, here's our measurement model, here's our trick to make it scale." The recursion is the same.
Assumptions that are always wrong (slightly)
The Bayes filter assumes the Markov property: the current state contains all the information needed to predict the future. In real robots, the state you can actually track never captures everything — unmodeled dynamics, environmental effects, sensor biases. The result: all real filters drift. SLAM's job is to detect drift (via loop closure) and correct it.
What to learn next
Next lesson: the Kalman filter from scratch — the simplest Bayes filter specialization, worked numerically. After that, EKF (when the world is non-Gaussian), particle filters (when the world is non-Gaussian in a worse way), and factor graphs (the modern formulation).