Probability and statistics for state estimation
Bayes, Gaussians, and the covariance matrix. The math behind every Kalman filter, particle filter, and SLAM paper — distilled to the parts a roboticist actually uses.
Half of robotics math is probability. The good news: state estimation needs a thin slice of probability theory — Gaussians, Bayes, and what a covariance matrix represents. Master those and you can read every Kalman/SLAM paper without the math fogging things up.
Random variables and distributions
A random variable X is a quantity whose value depends on chance. Its distribution describes which values are likely. Two flavors:
- Discrete: X takes values from a finite list. Distribution is a probability mass function p(x) = P(X = x), with Σₓ p(x) = 1.
- Continuous: X takes values from a real range. Distribution is a probability density function p(x), where ∫ p(x) dx = 1.
Robotics uses continuous variables almost exclusively (position, velocity, orientation are all continuous).
Mean and variance — the two-number summary
Mean is the "expected value" — where the distribution is centered. Variance is how spread out it is. Standard deviation σ is just √(variance) and has the same units as X.
For a Gaussian (the default in robotics), the entire distribution is described by μ and σ². No more parameters needed. That's why Kalman filters track only mean + covariance.
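As a quick sanity check on the two-number summary, here is a small NumPy sketch (the μ = 2.0 m, σ = 0.5 m values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
# Draw many samples from a Gaussian with mu = 2.0 m, sigma = 0.5 m.
samples = rng.normal(loc=2.0, scale=0.5, size=100_000)

mean = samples.mean()   # estimates mu
std = samples.std()     # estimates sigma, same units as the samples (m)
var = samples.var()     # estimates sigma^2, units m^2

print(mean, std, var)   # close to 2.0, 0.5, 0.25
```

With enough samples, mean and variance recover μ and σ² — and for a Gaussian, that really is the whole distribution.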
The Gaussian (a.k.a. normal)
Bell-shaped curve, centered at μ, with width σ. Three reasons it dominates robotics:
- Central limit theorem: sums of many independent random variables tend to Gaussian. Sensor noise from many small sources ≈ Gaussian.
- Closed under linear ops: applying y = Ax + b to a Gaussian gives a Gaussian. Multiplying two Gaussians (e.g. combining a prior with a likelihood) gives a Gaussian.
- Computational simplicity: track just μ and σ² (or in higher dimensions, μ and Σ). Everything reduces to matrix algebra.
Multivariate Gaussian — the workhorse
For an n-dimensional state X, the multivariate Gaussian is parameterized by mean vector μ ∈ ℝⁿ and covariance matrix Σ ∈ ℝⁿˣⁿ:

p(x) = 1 / √((2π)ⁿ |Σ|) · exp(−½ (x − μ)ᵀ Σ⁻¹ (x − μ))
The covariance matrix Σ tells you:
- Diagonal entries Σᵢᵢ: variance of each state dimension.
- Off-diagonal entries Σᵢⱼ (i ≠ j): how dimensions co-vary. Positive = move together; negative = opposite; zero = uncorrelated.
Geometrically, the covariance matrix's eigenvectors are the principal axes of the uncertainty ellipsoid; the eigenvalues are its squared semi-axes. A "narrow ellipsoid" means tight estimate; a "wide ellipsoid" means uncertain.
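The eigen-decomposition view can be computed directly; a sketch with an arbitrary 2×2 covariance (x and y positively correlated):

```python
import numpy as np

# Assumed 2x2 covariance for illustration: x and y positively correlated.
Sigma = np.array([[4.0, 1.5],
                  [1.5, 1.0]])

# Eigenvectors = principal axes of the 1-sigma ellipse,
# eigenvalues = squared semi-axis lengths.
eigvals, eigvecs = np.linalg.eigh(Sigma)   # eigh: Sigma is symmetric
semi_axes = np.sqrt(eigvals)               # semi-axis lengths

print(semi_axes)      # short and long semi-axes of the uncertainty ellipse
print(eigvecs[:, 1])  # direction of the largest uncertainty
```

A long, thin ellipse means the estimate is tight in one direction and sloppy in another — exactly the situation covariance off-diagonals encode.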
Bayes' rule — the engine of inference
p(A | B) = p(B | A) p(A) / p(B)

Read out loud: "the probability of A given B equals the probability of B given A, times the prior probability of A, divided by a normalization." In estimation we use Bayes for: given my observation z, what's the posterior over my state x?
- p(x) — the prior: belief before the measurement
- p(z | x) — the likelihood: how well does state x explain measurement z?
- p(x | z) — the posterior: updated belief
Every state estimator (Kalman, EKF, particle filter) is a tractable approximation of this update.
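A minimal discrete version of the update, using a made-up door sensor on a five-cell corridor (cell layout and likelihood values are invented for illustration):

```python
import numpy as np

# Hypothetical 1D grid localization: 5 cells, uniform prior.
prior = np.full(5, 0.2)

# Likelihood p(z | x) for measurement z = "I see a door";
# doors are at cells 1 and 3 in this made-up map.
likelihood = np.array([0.1, 0.8, 0.1, 0.8, 0.1])

# Bayes' rule: multiply, then normalize (the division by p(z)).
unnormalized = likelihood * prior
posterior = unnormalized / unnormalized.sum()
print(posterior)   # mass concentrates on the door cells
```

This is a particle/histogram filter in embryo: prior × likelihood, renormalize, repeat.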
Independence and conditional independence
Two random variables are independent if p(x, y) = p(x) p(y). They're conditionally independent given Z if p(x, y | z) = p(x | z) p(y | z). The second is what makes graph-based SLAM tractable: given the robot's pose, two landmark observations are conditionally independent, so the joint factors into a graph of small terms.
The four operations you'll do constantly
1. Multiply two Gaussians (sensor fusion)
Two independent Gaussian beliefs over the same state combine into one Gaussian. In 1D, N(μ₁, σ₁²) · N(μ₂, σ₂²) ∝ N(μ, σ²) with:

μ = (μ₁σ₂² + μ₂σ₁²) / (σ₁² + σ₂²),  σ² = σ₁²σ₂² / (σ₁² + σ₂²)
The posterior is sharper than either input (smaller variance) and weighted toward the more confident measurement.
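The 1D fusion formulas fit in a few lines; a sketch with arbitrary example values:

```python
def fuse(mu1, var1, mu2, var2):
    """Product of two 1D Gaussian beliefs over the same state."""
    var = (var1 * var2) / (var1 + var2)
    mu = (mu1 * var2 + mu2 * var1) / (var1 + var2)
    return mu, var

# Arbitrary example: belief N(0, 4) fused with belief N(2, 1).
mu, var = fuse(0.0, 4.0, 2.0, 1.0)
print(mu, var)   # mean pulled toward the tighter belief; var < min(4, 1)
```

Note the posterior variance (0.8) is smaller than either input's — fusing two beliefs always sharpens the estimate.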
2. Linearize and propagate
Pass a Gaussian through a linear function y = Ax + b. The result is Gaussian with:

μ_y = Aμ + b,  Σ_y = AΣAᵀ
This is what the prediction step of every Kalman filter does. For nonlinear functions, EKF linearizes (uses Jacobians) and applies the same formula.
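A sketch of the prediction step for a made-up constant-velocity model (state = [position, velocity], numbers arbitrary):

```python
import numpy as np

# Constant-velocity dynamics over dt: position += velocity * dt.
dt = 0.1
A = np.array([[1.0, dt],
              [0.0, 1.0]])
b = np.zeros(2)

mu = np.array([0.0, 1.0])        # at origin, moving at 1 m/s
Sigma = np.diag([0.01, 0.04])    # position and velocity variances

mu_pred = A @ mu + b             # mean goes through the dynamics
Sigma_pred = A @ Sigma @ A.T     # covariance picks up velocity uncertainty

print(mu_pred, Sigma_pred)
```

Notice the predicted position variance grows (velocity uncertainty leaks into position) and position and velocity become correlated — the off-diagonal terms are no longer zero.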
3. Marginalize
If you have a joint Gaussian over (x, y) and want just x, drop y by reading off the corresponding mean and covariance blocks. Easy. Marginalizing in non-Gaussian distributions is generally hard.
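"Read off the blocks" is literal in code; a sketch with an invented 3D joint where x is dimensions 0–1 and y is dimension 2:

```python
import numpy as np

# Joint Gaussian over (x, y): x = dims 0-1, y = dim 2 (assumed layout).
mu = np.array([1.0, 2.0, 3.0])
Sigma = np.array([[1.0, 0.2, 0.1],
                  [0.2, 2.0, 0.3],
                  [0.1, 0.3, 4.0]])

# Marginal over x: slice out the corresponding mean and covariance blocks.
mu_x = mu[:2]
Sigma_x = Sigma[:2, :2]
print(mu_x, Sigma_x)
```

No computation at all — which is exactly why Gaussian marginalization is "easy" and the general case is not.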
4. Condition
If you have a joint over (x, y) and observe y = y₀, the conditional distribution of x given y₀ is also Gaussian, with closed-form formulas. This is the update step of Kalman filters in disguise.
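The closed-form conditioning formulas, sketched for scalar x and y (block values are arbitrary):

```python
# Joint over (x, y) with covariance blocks Sxx, Sxy, Syy (scalars here).
mu_x, mu_y = 0.0, 0.0
Sxx, Sxy, Syy = 4.0, 1.5, 1.0

y0 = 2.0   # observed value of y

# Conditional Gaussian:
#   mu_{x|y}  = mu_x + Sxy Syy^{-1} (y0 - mu_y)
#   var_{x|y} = Sxx - Sxy Syy^{-1} Syx
mu_cond = mu_x + Sxy / Syy * (y0 - mu_y)
var_cond = Sxx - Sxy * Sxy / Syy

print(mu_cond, var_cond)
```

Observing y shifts the belief in x (through the correlation Sxy) and shrinks its variance — the Kalman gain is exactly the Sxy Syy⁻¹ factor in matrix form.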
Common probability mistakes in robotics code
- Treating 1/σ as the right "weight." Use 1/σ² (the precision) when combining; 1/σ appears nowhere naturally.
- Mixing standard deviation and variance. A "1 cm noise" sensor has σ = 0.01 m, σ² = 0.0001 m². Many bugs come from squaring, or failing to square, in the wrong place.
- Adding standard deviations. For independent variables, variance adds: σ² = σ₁² + σ₂², not σ. The standard deviation of the sum is √(σ₁² + σ₂²).
- Trusting Gaussianity. Real sensor noise has heavy tails (outliers). Robust estimators (Huber loss, RANSAC) exist for a reason.
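The variance-adds rule is easy to verify empirically; a sketch with σ₁ = 3 and σ₂ = 4 chosen for a clean 3-4-5 check:

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.normal(0.0, 3.0, size=200_000)   # sigma_1 = 3
b = rng.normal(0.0, 4.0, size=200_000)   # sigma_2 = 4

# Variances add; standard deviations do not (3 + 4 = 7 is wrong).
std_sum = (a + b).std()
print(std_sum)   # ~ sqrt(3**2 + 4**2) = 5, not 7
```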
What to read next
- Probabilistic Robotics, Thrun et al. — Chapter 2 is the cleanest derivation of motion and measurement models you'll find in a textbook.
- 3Blue1Brown's "Bayes Theorem" video — visual intuition.
- Roger Labbe's Kalman and Bayesian Filters in Python — free, code-first; works through the math one operation at a time.
Exercise
Two GPS sensors give location estimates: sensor A says you're at x = 10 ± 2 m, sensor B says x = 8 ± 1 m. Fuse them. (Use the multiplication formula above.) Then simulate 1000 noisy GPS readings and verify your formula matches the empirical mean and variance. This 30-minute Python exercise is the smallest Kalman filter in the world.
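One way to set up the simulation half, as a starting sketch (the true position of 9 m is an arbitrary choice, and the precision weights echo the 1/σ² rule above):

```python
import numpy as np

rng = np.random.default_rng(42)
true_x = 9.0   # hypothetical true position (m)

# 1000 paired readings: sensor A (sigma = 2 m) and sensor B (sigma = 1 m).
a = rng.normal(true_x, 2.0, size=1000)
b = rng.normal(true_x, 1.0, size=1000)

# Precision-weighted fusion of each pair (weights = 1/sigma^2).
w_a, w_b = 1 / 4.0, 1 / 1.0
fused = (w_a * a + w_b * b) / (w_a + w_b)

predicted_var = 1 / (w_a + w_b)   # what the formula says: 0.8 m^2
print(fused.mean(), fused.var(), predicted_var)
```

If the empirical variance of the fused estimates lands near the predicted 0.8 m², your formula (and your understanding) is correct.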
Next
Calculus for robots — derivatives, gradients, and the Jacobian (the matrix-valued partial derivative that connects joint velocities to end-effector velocities).