Probability and statistics for state estimation
Bayes, Gaussians, and the covariance matrix. The math behind every Kalman filter, particle filter, and SLAM paper — distilled to the parts a roboticist actually uses.
Half of robotics math is probability. The good news: state estimation needs a thin slice of probability theory — Gaussians, Bayes, and what a covariance matrix represents. Master those and you can read every Kalman/SLAM paper without the math fogging things up.
Random variables and distributions
A random variable X is a quantity whose value depends on chance. Its distribution describes which values are likely. Two flavors:
- Discrete: X takes values from a finite list. Distribution is a probability mass function p(x) = P(X = x), with Σₓ p(x) = 1.
- Continuous: X takes values from a real range. Distribution is a probability density function p(x), where ∫ p(x) dx = 1.
Robotics uses continuous variables almost exclusively (position, velocity, orientation are all continuous).
Mean and variance — the two-number summary
Mean is the "expected value" — where the distribution is centered. Variance is how spread out it is. Standard deviation σ is just √(variance) and has the same units as X.
For a Gaussian (the default in robotics), the entire distribution is described by μ and σ². No more parameters needed. That's why Kalman filters track only mean + covariance.
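As a quick sanity check on the two-number summary, here is a small NumPy sketch (the μ = 2.0 m, σ = 0.5 m values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
# Draw many samples from a Gaussian with mu = 2.0 m, sigma = 0.5 m.
samples = rng.normal(loc=2.0, scale=0.5, size=100_000)

mean = samples.mean()   # estimates mu
std = samples.std()     # estimates sigma, same units as the samples (m)
var = samples.var()     # estimates sigma^2, units m^2

print(mean, std, var)   # close to 2.0, 0.5, 0.25
```

With enough samples, mean and variance recover μ and σ² — and for a Gaussian, that really is the whole distribution.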
The Gaussian (a.k.a. normal)
Bell-shaped curve, centered at μ, with width σ. Three reasons it dominates robotics:
- Central limit theorem: sums of many independent random variables tend to Gaussian. Sensor noise from many small sources ≈ Gaussian.
- Closed under linear ops: applying y = Ax + b to a Gaussian gives a Gaussian. Multiplying two Gaussians (e.g. combining a prior with a likelihood) gives a Gaussian.
- Computational simplicity: track just μ and σ² (or in higher dimensions, μ and Σ). Everything reduces to matrix algebra.
Multivariate Gaussian — the workhorse
For an n-dimensional state X, the multivariate Gaussian is parameterized by mean vector μ ∈ ℝⁿ and covariance matrix Σ ∈ ℝⁿˣⁿ:

p(x) = 1 / √((2π)ⁿ |Σ|) · exp(−½ (x − μ)ᵀ Σ⁻¹ (x − μ))
The covariance matrix Σ tells you:
- Diagonal entries Σᵢᵢ: variance of each state dimension.
- Off-diagonal entries Σᵢⱼ (i ≠ j): how dimensions co-vary. Positive = move together; negative = opposite; zero = uncorrelated.
Geometrically, the covariance matrix's eigenvectors are the principal axes of the uncertainty ellipsoid; the eigenvalues are its squared semi-axes. A "narrow ellipsoid" means tight estimate; a "wide ellipsoid" means uncertain.
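The eigen-decomposition view can be computed directly; a sketch with an arbitrary 2×2 covariance (x and y positively correlated):

```python
import numpy as np

# Assumed 2x2 covariance for illustration: x and y positively correlated.
Sigma = np.array([[4.0, 1.5],
                  [1.5, 1.0]])

# Eigenvectors = principal axes of the 1-sigma ellipse,
# eigenvalues = squared semi-axis lengths.
eigvals, eigvecs = np.linalg.eigh(Sigma)   # eigh: Sigma is symmetric
semi_axes = np.sqrt(eigvals)               # semi-axis lengths

print(semi_axes)      # short and long semi-axes of the uncertainty ellipse
print(eigvecs[:, 1])  # direction of the largest uncertainty
```

A long, thin ellipse means the estimate is tight in one direction and sloppy in another — exactly the situation covariance off-diagonals encode.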
Bayes' rule — the engine of inference
p(A | B) = p(B | A) p(A) / p(B)

Read out loud: "the probability of A given B equals the probability of B given A, times the prior probability of A, divided by a normalization." In estimation we use Bayes for: given my observation z, what's the posterior over my state x?
- p(x) — the prior: belief before the measurement
- p(z | x) — the likelihood: how well does state x explain measurement z?
- p(x | z) — the posterior: updated belief
Every state estimator (Kalman, EKF, particle filter) is a tractable approximation of this update.
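A minimal discrete version of the update, using a made-up door sensor on a five-cell corridor (cell layout and likelihood values are invented for illustration):

```python
import numpy as np

# Hypothetical 1D grid localization: 5 cells, uniform prior.
prior = np.full(5, 0.2)

# Likelihood p(z | x) for measurement z = "I see a door";
# doors are at cells 1 and 3 in this made-up map.
likelihood = np.array([0.1, 0.8, 0.1, 0.8, 0.1])

# Bayes' rule: multiply, then normalize (the division by p(z)).
unnormalized = likelihood * prior
posterior = unnormalized / unnormalized.sum()
print(posterior)   # mass concentrates on the door cells
```

This is a particle/histogram filter in embryo: prior × likelihood, renormalize, repeat.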
Independence and conditional independence
Two random variables are independent if p(x, y) = p(x) p(y). They're conditionally independent given Z if p(x, y | z) = p(x | z) p(y | z). The second is what makes graph-based SLAM tractable: given the robot's pose, two landmark observations are conditionally independent, so the joint factors into a graph of small terms.
The four operations you'll do constantly
1. Multiply two Gaussians (sensor fusion)
Two independent Gaussian beliefs over the same state combine into one Gaussian. In 1D, N(μ₁, σ₁²) · N(μ₂, σ₂²) ∝ N(μ, σ²) with:

μ = (μ₁σ₂² + μ₂σ₁²) / (σ₁² + σ₂²),  σ² = σ₁²σ₂² / (σ₁² + σ₂²)
The posterior is sharper than either input (smaller variance) and weighted toward the more confident measurement.
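The 1D fusion formulas fit in a few lines; a sketch with arbitrary example values:

```python
def fuse(mu1, var1, mu2, var2):
    """Product of two 1D Gaussian beliefs over the same state."""
    var = (var1 * var2) / (var1 + var2)
    mu = (mu1 * var2 + mu2 * var1) / (var1 + var2)
    return mu, var

# Arbitrary example: belief N(0, 4) fused with belief N(2, 1).
mu, var = fuse(0.0, 4.0, 2.0, 1.0)
print(mu, var)   # mean pulled toward the tighter belief; var < min(4, 1)
```

Note the posterior variance (0.8) is smaller than either input's — fusing two beliefs always sharpens the estimate.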
2. Linearize and propagate
Pass a Gaussian through a linear function y = Ax + b. The result is Gaussian with:

μ_y = Aμ + b,  Σ_y = AΣAᵀ
This is what the prediction step of every Kalman filter does. For nonlinear functions, EKF linearizes (uses Jacobians) and applies the same formula.
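A sketch of the prediction step for a made-up constant-velocity model (state = [position, velocity], numbers arbitrary):

```python
import numpy as np

# Constant-velocity dynamics over dt: position += velocity * dt.
dt = 0.1
A = np.array([[1.0, dt],
              [0.0, 1.0]])
b = np.zeros(2)

mu = np.array([0.0, 1.0])        # at origin, moving at 1 m/s
Sigma = np.diag([0.01, 0.04])    # position and velocity variances

mu_pred = A @ mu + b             # mean goes through the dynamics
Sigma_pred = A @ Sigma @ A.T     # covariance picks up velocity uncertainty

print(mu_pred, Sigma_pred)
```

Notice the predicted position variance grows (velocity uncertainty leaks into position) and position and velocity become correlated — the off-diagonal terms are no longer zero.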
3. Marginalize
If you have a joint Gaussian over (x, y) and want just x, drop y by reading off the corresponding mean and covariance blocks. Easy. Marginalizing in non-Gaussian distributions is generally hard.
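"Read off the blocks" is literal in code; a sketch with an invented 3D joint where x is dimensions 0–1 and y is dimension 2:

```python
import numpy as np

# Joint Gaussian over (x, y): x = dims 0-1, y = dim 2 (assumed layout).
mu = np.array([1.0, 2.0, 3.0])
Sigma = np.array([[1.0, 0.2, 0.1],
                  [0.2, 2.0, 0.3],
                  [0.1, 0.3, 4.0]])

# Marginal over x: slice out the corresponding mean and covariance blocks.
mu_x = mu[:2]
Sigma_x = Sigma[:2, :2]
print(mu_x, Sigma_x)
```

No computation at all — which is exactly why Gaussian marginalization is "easy" and the general case is not.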
4. Condition
If you have a joint over (x, y) and observe y = y₀, the conditional distribution of x given y₀ is also Gaussian, with closed-form formulas. This is the update step of Kalman filters in disguise.
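The closed-form conditioning formulas, sketched for scalar x and y (block values are arbitrary):

```python
# Joint over (x, y) with covariance blocks Sxx, Sxy, Syy (scalars here).
mu_x, mu_y = 0.0, 0.0
Sxx, Sxy, Syy = 4.0, 1.5, 1.0

y0 = 2.0   # observed value of y

# Conditional Gaussian:
#   mu_{x|y}  = mu_x + Sxy Syy^{-1} (y0 - mu_y)
#   var_{x|y} = Sxx - Sxy Syy^{-1} Syx
mu_cond = mu_x + Sxy / Syy * (y0 - mu_y)
var_cond = Sxx - Sxy * Sxy / Syy

print(mu_cond, var_cond)
```

Observing y shifts the belief in x (through the correlation Sxy) and shrinks its variance — the Kalman gain is exactly the Sxy Syy⁻¹ factor in matrix form.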
Common probability mistakes in robotics code
- Treating 1/σ as the right "weight." Use 1/σ² (the precision) when combining; 1/σ appears nowhere naturally.
- Mixing standard deviation and variance. A "1 cm noise" sensor has σ = 0.01 m, σ² = 0.0001 m². Many bugs come from squaring, or failing to square, in the wrong place.
- Adding standard deviations. For independent variables, variance adds: σ² = σ₁² + σ₂², not σ. The standard deviation of the sum is √(σ₁² + σ₂²).
- Trusting Gaussianity. Real sensor noise has heavy tails (outliers). Robust estimators (Huber loss, RANSAC) exist for a reason.
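The variance-adds rule is easy to verify empirically; a sketch with σ₁ = 3 and σ₂ = 4 chosen for a clean 3-4-5 check:

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.normal(0.0, 3.0, size=200_000)   # sigma_1 = 3
b = rng.normal(0.0, 4.0, size=200_000)   # sigma_2 = 4

# Variances add; standard deviations do not (3 + 4 = 7 is wrong).
std_sum = (a + b).std()
print(std_sum)   # ~ sqrt(3**2 + 4**2) = 5, not 7
```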
What to read next
- Probabilistic Robotics, Thrun et al. — Chapter 2 is the cleanest derivation of motion and measurement models you'll find in a textbook.
- 3Blue1Brown's "Bayes Theorem" video — visual intuition.
- Roger Labbe's Kalman and Bayesian Filters in Python — free, code-first; works through the math one operation at a time.
Exercise
Two GPS sensors give location estimates: sensor A says you're at x = 10 ± 2 m, sensor B says x = 8 ± 1 m. Fuse them. (Use the multiplication formula above.) Then simulate 1000 noisy GPS readings and verify your formula matches the empirical mean and variance. This 30-minute Python exercise is the smallest Kalman filter in the world.
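One way to set up the simulation half, as a starting sketch (the true position of 9 m is an arbitrary choice, and the precision weights echo the 1/σ² rule above):

```python
import numpy as np

rng = np.random.default_rng(42)
true_x = 9.0   # hypothetical true position (m)

# 1000 paired readings: sensor A (sigma = 2 m) and sensor B (sigma = 1 m).
a = rng.normal(true_x, 2.0, size=1000)
b = rng.normal(true_x, 1.0, size=1000)

# Precision-weighted fusion of each pair (weights = 1/sigma^2).
w_a, w_b = 1 / 4.0, 1 / 1.0
fused = (w_a * a + w_b * b) / (w_a + w_b)

predicted_var = 1 / (w_a + w_b)   # what the formula says: 0.8 m^2
print(fused.mean(), fused.var(), predicted_var)
```

If the empirical variance of the fused estimates lands near the predicted 0.8 m², your formula (and your understanding) is correct.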
Next
Calculus for robots — derivatives, gradients, and the Jacobian (the matrix-valued partial derivative that connects joint velocities to end-effector velocities).