RobotForge
Published · ~16 min read

Probability and statistics for state estimation

Bayes, Gaussians, and the covariance matrix. The math behind every Kalman filter, particle filter, and SLAM paper — distilled to the parts a roboticist actually uses.

by RobotForge
#foundations #probability #estimation

Half of robotics math is probability. The good news: state estimation needs a thin slice of probability theory — Gaussians, Bayes, and what a covariance matrix represents. Master those and you can read every Kalman/SLAM paper without the math fogging up.

Random variables and distributions

A random variable X is a quantity whose value depends on chance. Its distribution describes which values are likely. Two flavors:

  • Discrete: X takes values from a finite list. Distribution is a probability mass function p(x) = P(X = x), with ∑ₓ p(x) = 1.
  • Continuous: X takes values from a real range. Distribution is a probability density function p(x), where P(a ≤ X ≤ b) = ∫ₐᵇ p(x) dx.

Robotics uses continuous variables almost exclusively (position, velocity, orientation are all continuous).

Mean and variance — the two-number summary

Mean μ = E[X] is the "expected value" — where the distribution is centered. Variance σ² = E[(X − μ)²] is how spread out it is. Standard deviation σ = √(variance) has the same units as X.

For a Gaussian (the default in robotics), the entire distribution is described by μ and σ². No more parameters needed. That's why Kalman filters track only mean + covariance.

The Gaussian (a.k.a. normal)

Bell-shaped curve, centered at μ, with width σ. Three reasons it dominates robotics:

  1. Central limit theorem: sums of many independent random variables tend toward a Gaussian. Sensor noise from many small sources ≈ Gaussian (a quick numeric check follows this list).
  2. Closed under linear ops: applying a linear map y = Ax + b to a Gaussian gives a Gaussian. Multiplying two Gaussians (e.g. combining a prior with a likelihood) gives a Gaussian.
  3. Computational simplicity: track just μ and σ² (or, in higher dimensions, μ and Σ). Everything reduces to matrix algebra.
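Here is that check as a few lines of Python. The source count and sample size are arbitrary, illustrative choices, not from any real sensor.

```python
import numpy as np

# CLT sanity check: a sum of many small independent noise terms looks Gaussian.
rng = np.random.default_rng(0)
n_sources, n_samples = 30, 100_000

# Each sample is one "sensor reading" built from 30 independent uniform terms.
sums = rng.uniform(-0.5, 0.5, size=(n_samples, n_sources)).sum(axis=1)

# Theory: each uniform(-0.5, 0.5) term has mean 0 and variance 1/12,
# so the sum should be approximately N(0, n_sources / 12).
print(f"empirical mean     {sums.mean():+.4f}  (theory 0)")
print(f"empirical variance {sums.var():.4f}  (theory {n_sources / 12:.4f})")
```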

Multivariate Gaussian — the workhorse

For an n-dimensional state X, the multivariate Gaussian is parameterized by mean vector μ ∈ ℝⁿ and covariance matrix Σ ∈ ℝⁿˣⁿ:

p(x) = exp(−½ (x − μ)ᵀ Σ⁻¹ (x − μ)) / √((2π)ⁿ |Σ|)

The covariance matrix Σ tells you:

  • Diagonal entries Σᵢᵢ: the variance of each state dimension.
  • Off-diagonal entries Σᵢⱼ: how dimensions co-vary. Positive = move together; negative = opposite; zero = uncorrelated.

Geometrically, the covariance matrix's eigenvectors are the principal axes of the uncertainty ellipsoid, and the eigenvalues are the squared lengths of its semi-axes. A narrow ellipsoid means a tight estimate; a wide one means an uncertain estimate.
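A minimal numpy sketch of reading those axes out of a covariance matrix; the 2×2 Σ below is a made-up position covariance, not from any real system.

```python
import numpy as np

# A hypothetical 2D position covariance: x and y positively correlated.
Sigma = np.array([[4.0, 1.5],
                  [1.5, 1.0]])

# For a symmetric matrix, eigh returns eigenvalues in ascending order
# and the matching eigenvectors as columns.
eigvals, eigvecs = np.linalg.eigh(Sigma)

# Eigenvectors: principal axes of the 1-sigma uncertainty ellipse.
# sqrt(eigenvalue): the semi-axis length along that direction.
for lam, v in zip(eigvals, eigvecs.T):
    print(f"axis direction {v}, semi-axis length {np.sqrt(lam):.3f}")
```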

Bayes' rule — the engine of inference

P(A | B) = P(B | A) P(A) / P(B)

Read out loud: "the probability of A given B equals the probability of B given A, times the prior probability of A, divided by a normalization." In estimation we use Bayes for exactly this: given my measurement z, what's the posterior over my state x?

  • p(x) — the prior: belief before the measurement
  • p(z | x) — the likelihood: how well does state x explain measurement z?
  • p(x | z) — the posterior: updated belief, proportional to likelihood × prior

Every state estimator (Kalman, EKF, particle filter) is a tractable approximation of this update.
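To make the update concrete, here's a one-step histogram filter in Python: represent the belief on a grid, multiply prior by likelihood, normalize. The prior, sensor σ, and measurement are all invented for illustration.

```python
import numpy as np

x = np.linspace(0.0, 10.0, 201)                  # candidate robot positions (m)

# p(x): a broad Gaussian prior belief centered at 5 m.
prior = np.exp(-0.5 * ((x - 5.0) / 2.0) ** 2)
prior /= prior.sum()

# p(z | x): likelihood of a range measurement z = 6.2 m with sigma = 0.5 m.
z = 6.2
likelihood = np.exp(-0.5 * ((z - x) / 0.5) ** 2)

# Bayes: posterior ∝ likelihood × prior; normalizing plays the role of p(z).
posterior = likelihood * prior
posterior /= posterior.sum()

print(f"prior mean     {np.sum(x * prior):.3f}")
print(f"posterior mean {np.sum(x * posterior):.3f}  (pulled toward z = {z})")
```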

Independence and conditional independence

Two random variables are independent if p(x, y) = p(x) p(y). They're conditionally independent given Z if p(x, y | z) = p(x | z) p(y | z). The second is what makes graph-based SLAM tractable: given the robot's pose, two landmark observations are conditionally independent, so the joint factors into a graph of small terms.

The four operations you'll do constantly

1. Multiply two Gaussians (sensor fusion)

Two independent Gaussian beliefs over the same state combine into one Gaussian. In 1D:

μ = (μ₁σ₂² + μ₂σ₁²) / (σ₁² + σ₂²),  σ² = σ₁²σ₂² / (σ₁² + σ₂²)

The posterior is sharper than either input (smaller variance) and weighted toward the more confident measurement.
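As code, the whole operation is a few lines. `fuse` is a hypothetical helper name; the example numbers are the two-sensor setup from the exercise at the end of this post.

```python
def fuse(mu1, var1, mu2, var2):
    """Product of two independent 1D Gaussian beliefs over the same state."""
    var = var1 * var2 / (var1 + var2)               # smaller than either input
    mu = (mu1 * var2 + mu2 * var1) / (var1 + var2)  # precision-weighted average
    return mu, var

# Sensor A: 10 +/- 2 m (variance 4). Sensor B: 8 +/- 1 m (variance 1).
mu, var = fuse(10.0, 4.0, 8.0, 1.0)
print(mu, var)  # 8.4 0.8 -- closer to the tighter sensor, sharper than both
```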

2. Linearize and propagate

Pass a Gaussian through a linear function y = Ax + b. The result is Gaussian with:

mean Aμ + b,  covariance AΣAᵀ

This is what the prediction step of every Kalman filter does (a real filter also adds process noise Q to the propagated covariance). For nonlinear functions, the EKF linearizes (uses Jacobians) and applies the same formula.
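Here's the propagation as numpy, using a made-up constant-velocity model over dt = 1 s; the Q term is omitted to keep the formula above visible.

```python
import numpy as np

# State: [position, velocity]. Dynamics: position += velocity * dt, dt = 1 s.
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])
b = np.zeros(2)

mu = np.array([0.0, 1.0])              # at the origin, moving at 1 m/s
Sigma = np.diag([0.5 ** 2, 0.1 ** 2])  # hypothetical initial uncertainty

mu_new = A @ mu + b                    # mean moves through the dynamics
Sigma_new = A @ Sigma @ A.T            # covariance picks up A on both sides

print(mu_new)     # [1. 1.]
print(Sigma_new)  # position variance grows: velocity uncertainty leaks in
```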

3. Marginalize

If you have a joint Gaussian over (x, y) and want just x, drop y by reading off the corresponding mean and covariance blocks. Easy (the sketch after the next operation shows it in code). Marginalizing non-Gaussian distributions is generally hard.

4. Condition

If you have a joint over (x, y) and observe y = y₀, the conditional distribution of x given y₀ is also Gaussian, with closed-form formulas. This is the update step of Kalman filters in disguise.
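Both operations in one sketch, using scalar blocks so the standard Gaussian identities stay readable; all numbers are hypothetical.

```python
# Joint Gaussian over (x, y): block-partitioned mean and covariance.
mu_x, mu_y = 2.0, 5.0
Sxx, Sxy, Syy = 1.0, 0.6, 0.5   # covariance blocks (scalars here)

# 3. Marginalize: p(x) is just the x-block. No computation at all.
print(f"marginal of x:  N({mu_x}, {Sxx})")

# 4. Condition on an observation y = y0:
#      mu_{x|y}    = mu_x + Sxy Syy^-1 (y0 - mu_y)
#      Sigma_{x|y} = Sxx - Sxy Syy^-1 Syx
y0 = 5.8
mu_cond = mu_x + Sxy / Syy * (y0 - mu_y)
S_cond = Sxx - Sxy / Syy * Sxy

print(f"conditional:    N({mu_cond:.2f}, {S_cond:.2f})")  # shifted and sharper
```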

Common probability mistakes in robotics code

  • Treating 1/σ as the right "weight." Use 1/σ² (the precision) when combining; 1/σ appears nowhere naturally.
  • Mixing standard deviation and variance. A "1 cm noise" sensor has σ = 0.01 m and σ² = 0.0001 m². Many bugs come from squaring, or failing to square, in the wrong place.
  • Adding standard deviations. Variances add for independent variables: σ² = σ₁² + σ₂², not σ = σ₁ + σ₂. The standard deviation of the sum is √(σ₁² + σ₂²). (A quick check follows this list.)
  • Trusting Gaussianity. Real sensor noise has heavy tails (outliers). Robust estimators (Huber loss, RANSAC) exist for a reason.
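A thirty-second empirical check of the "adding standard deviations" bullet; the σ values are arbitrary.

```python
import numpy as np

# Variances add for independent noise; standard deviations do not.
rng = np.random.default_rng(1)
a = rng.normal(0.0, 3.0, 1_000_000)  # sigma = 3
b = rng.normal(0.0, 4.0, 1_000_000)  # sigma = 4

print(np.std(a + b))   # ~5.0 = sqrt(3^2 + 4^2)
print(3.0 + 4.0)       # 7.0 -- the wrong answer you get by adding sigmas
```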

What to read next

  • Probabilistic Robotics, Thrun et al. — Chapter 2 is the cleanest motion- and measurement-model derivation you'll find in any textbook.
  • 3Blue1Brown's "Bayes Theorem" video — visual intuition.
  • Roger Labbe's Kalman and Bayesian Filters in Python — free, code-first; works through the math one operation at a time.

Exercise

Two GPS sensors give location estimates: sensor A says you're at x = 10 ± 2 m, sensor B says x = 8 ± 1 m. Fuse them. (Use the multiplication formula above.) Then simulate 1000 noisy GPS readings and verify your formula matches the empirical mean and variance. This 30-minute Python exercise is the smallest Kalman filter in the world.
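If you want a starting point, here's one possible scaffold for the simulation half. The ground truth and seed are made up; the fused spread should land near the 0.8 m² the product formula predicts.

```python
import numpy as np

rng = np.random.default_rng(42)
true_x = 9.0                              # made-up ground truth position (m)

za = rng.normal(true_x, 2.0, 1000)        # sensor A readings, sigma = 2 m
zb = rng.normal(true_x, 1.0, 1000)        # sensor B readings, sigma = 1 m

# Fuse each pair with the product formula (variances 4 and 1):
fused = (za * 1.0 + zb * 4.0) / (4.0 + 1.0)   # precision-weighted average

print(fused.mean())   # should hover near true_x
print(fused.var())    # should be close to 4*1/(4+1) = 0.8
```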

Next

Calculus for robots — derivatives, gradients, and the Jacobian (the matrix-valued partial derivative that connects joint velocities to end-effector velocities).
