Planning under uncertainty
POMDPs, belief-space planning, and the algorithms for robots that can't see everything. Why most robot tasks are partially observable, and the practical methods that handle it.
Most planning theory assumes the robot knows its state perfectly. Real robots don't — sensors are noisy, objects move when you're not looking, GPS is approximate. Planning under uncertainty is the field that handles this honestly. The math is heavy, the tractable cases are narrow, and the production tools are still maturing — but the underlying ideas show up everywhere from autonomous driving safety to manipulation under occlusion.
The framework: POMDP
A Partially Observable Markov Decision Process. Like an MDP (covered in the RL lesson), except the agent never observes the state directly; it receives an observation drawn from an observation model conditioned on the state.
Tuple (S, A, T, R, Ω, O):
- S: states (truth).
- A: actions.
- T(s' | s, a): transition probabilities.
- R(s, a): reward.
- Ω: observations.
- O(o | s', a): observation likelihood, the probability of observing o after taking action a and landing in state s'.
The agent never sees s directly; it sees o, infers a belief b(s) = posterior distribution over states.
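As a concrete sketch: for a discrete POMDP with the models stored as NumPy arrays, the belief update is one predict-correct step of a Bayes filter. The array layout and function name below are illustrative, not from any particular library.

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """Bayes-filter update of a discrete belief.

    b : (|S|,) current belief over states
    a : action index
    o : observation index
    T : T[a][s, s'] = P(s' | s, a)
    O : O[a][s', o] = P(o | s', a)
    """
    # Predict: push the belief through the transition model.
    b_pred = b @ T[a]                # shape (|S|,)
    # Correct: weight each predicted state by the observation likelihood.
    b_new = b_pred * O[a][:, o]
    # Normalize; assumes the observation has nonzero probability under b_pred.
    return b_new / b_new.sum()
```

The normalizer that drops out is P(o | b, a), which is exactly the branch probability a belief-tree search needs.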
Belief-space planning
Plan in the space of beliefs, not states. Each "node" in the search is a belief distribution; transitions update the belief via Bayes rule.
Exact POMDP planning is intractable: finite-horizon planning is PSPACE-complete, and optimal infinite-horizon planning is undecidable; only narrow cases are tractable. Key insight: the optimal finite-horizon value function over beliefs is piecewise-linear and convex (Sondik, 1971), so it can be represented by a finite set of α-vectors and approximated by keeping only a bounded set of them.
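A minimal sketch of what that representation buys you, assuming each α-vector is stored with the action that generated it (as point-based solvers typically do); names are illustrative:

```python
import numpy as np

def value(b, alphas):
    """V(b) = max over α of α·b, where alphas is a list of
    (vector over states, action) pairs."""
    return max(alpha @ b for alpha, _ in alphas)

def policy(b, alphas):
    """The greedy policy: the action attached to the maximizing α-vector."""
    return max(alphas, key=lambda pair: pair[0] @ b)[1]
```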
The algorithms
Exact methods (small problems only)
- Value iteration over α-vectors: tractable for problems with ~10–20 states.
- Witness algorithm, Incremental Pruning: classical exact methods. Don't scale.
Approximate methods (production)
- Point-based methods (PBVI, HSVI, SARSOP): sample a set of reachable beliefs; back up α-vectors only at those samples. Scale to problems with thousands of states.
- Monte Carlo Tree Search (POMCP, DESPOT): sample-based planning over belief trees using a black-box simulator; the practical choice for large or continuous state spaces.
- QMDP: assumes the state becomes fully observable after one step, then acts greedily on the belief-weighted MDP Q-values (sketched below). Fast, but it never takes purely information-gathering actions.
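A hedged sketch of QMDP, assuming the underlying MDP's Q-values come from ordinary value iteration over the true states (array shapes and names are illustrative):

```python
import numpy as np

def mdp_value_iteration(T, R, gamma=0.95, iters=200):
    """Standard value iteration on the fully observable MDP.
    T[a][s, s'] = P(s' | s, a); R[s, a] = reward. Returns Q[s, a]."""
    n_states, n_actions = R.shape
    Q = np.zeros((n_states, n_actions))
    for _ in range(iters):
        V = Q.max(axis=1)
        for a in range(n_actions):
            Q[:, a] = R[:, a] + gamma * T[a] @ V
    return Q

def qmdp_action(b, Q):
    """QMDP: maximize the belief-weighted MDP Q-values."""
    return int(np.argmax(b @ Q))
```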
Continuous-state belief-space planning
- iLQG over beliefs: extend iLQR (covered briefly in the trajectory-optimization lesson) to belief states. Gaussian beliefs keep it tractable.
- Belief-space RRT variants (e.g., the Rapidly-exploring Random Belief Tree, RRBT): extend RRT to plan over Gaussian beliefs. Used in research; rare in production.
Why POMDPs are hard
- Belief space is continuous: even a discrete-state POMDP has a continuous (simplex) belief space.
- Planning horizon matters: the agent might choose to take an action just to gain information (active perception), not to make progress on the goal directly. Encoding "I should look around" requires multi-step planning.
- Branching factor explodes: each action-observation pair leads to a different successor belief, so the search tree branches by |A| × |Ω| per step.
- Value functions are awkward objects: piecewise-linear and convex over the belief simplex, but the number of α-vectors can grow exponentially with the horizon; the simplicity of tabular MDP value iteration is gone.
Information-gathering actions
The most distinctive feature of POMDP solutions: the optimal policy often includes actions that gain information rather than directly reducing distance to the goal.
Examples:
- Active SLAM: drive to a vantage point that disambiguates between two map hypotheses.
- Object pose disambiguation: rotate the object in the gripper to confirm orientation before placing.
- Grasp verification: lift slightly and check the tactile signal before committing to the transfer move.
Classical motion planners can't choose "look first" actions — they don't model uncertainty. POMDPs can.
Production patterns
Full POMDP planning is rare in production. Approximations:
Most-likely-state assumption
Plan as if the most likely state were the truth. Fast, but blind to how uncertain the estimate actually is. Most production robotics does this implicitly.
Bounded uncertainty
Maintain a covariance ellipsoid; plan to be safe under any state inside it. Used in autonomous driving for "is the pedestrian going to step into the road?"
Information-aware MPC
Add an information-gain term to the MPC cost. The optimizer naturally prefers actions that reduce uncertainty when uncertainty is high.
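One hedged way to write this down: add a penalty on the predicted posterior covariance (here its trace) to the usual tracking cost. The weights and names below are illustrative choices, not a standard API.

```python
import numpy as np

def mpc_stage_cost(x, u, x_goal, Sigma_pred, w_task=1.0, w_info=0.1):
    """Tracking cost plus an uncertainty penalty. Sigma_pred is the
    covariance the estimator is predicted to have after applying u
    (e.g. from an EKF rolled out along the candidate trajectory)."""
    task = w_task * np.sum((x - x_goal) ** 2) + 0.01 * np.sum(u ** 2)
    info = w_info * np.trace(Sigma_pred)   # smaller trace = better informed
    return task + info
```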
Chance constraints
Replace hard constraints with probabilistic ones: "P(collision) ≤ 0.01." This makes the tolerated risk explicit instead of burying it in hand-tuned margins.
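For the common case of Gaussian state uncertainty and a linear constraint aᵀx ≤ b, the chance constraint has a standard deterministic tightening; a small sketch (function name is illustrative):

```python
import numpy as np
from scipy.stats import norm

def tightened_constraint(a, b, mu, Sigma, eps=0.01):
    """For x ~ N(mu, Sigma), P(a·x <= b) >= 1 - eps holds iff
    a·mu + z * sqrt(aᵀ Sigma a) <= b, with z = Phi^{-1}(1 - eps).
    Returns the margin; the constraint is satisfied if the margin >= 0."""
    z = norm.ppf(1.0 - eps)
    return b - (a @ mu + z * np.sqrt(a @ Sigma @ a))
```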
Real-world examples
- Autonomous driving: tracking pedestrians (uncertain trajectories); planning conservative paths. Recent papers use chance-constrained MPC and POMDP-inspired planners.
- Mobile robot localization: AMCL is a particle filter; planning over the particle distribution is implicitly POMDP.
- Object search: "find my keys in the kitchen." A canonical POMDP problem; classical planners don't naturally encode the multi-step search behavior.
- Surgical robotics: needle insertion under tissue uncertainty. POMDP formulations and chance-constrained planning both show up here.
- Underwater autonomy: localization is poor, so planners must deliberately seek out information.
The 2026 toolkit
- POMDPs.jl (Julia): the most polished open-source POMDP framework. Solvers, environments, examples.
- POMDPy (Python): smaller; for prototyping.
- OPSPlan: optimization over chance-constrained MPC. Used in robotics research.
- Custom-coded: most production POMDP-style planning is custom code that exploits the structure of the specific problem.
The deep-RL angle
Deep RL handles partial observability implicitly via recurrent networks (LSTM, GRU). Train a policy that takes the observation history as input; the network's hidden state plays the role of a belief.
For many practical problems (manipulation under occlusion, social navigation), recurrent deep RL outperforms classical POMDP solvers — at the cost of provability and sample efficiency.
Modern hybrid: a classical state estimator (Kalman filter, particle filter) for perception; a deep policy for action selection. The policy receives the estimator's belief and outputs actions.
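A minimal sketch of that hybrid pattern; the estimator and policy objects here are placeholders for whatever filter and trained network you actually have, not a specific library's API.

```python
import numpy as np

class HybridAgent:
    """Classical estimator for the belief, learned policy for the action."""

    def __init__(self, estimator, policy_net):
        self.estimator = estimator      # e.g. a Kalman or particle filter (placeholder)
        self.policy_net = policy_net    # e.g. a trained neural policy (placeholder)

    def act(self, observation, last_action):
        # Belief update with the previous action and the new observation.
        self.estimator.update(last_action, observation)
        mean, cov = self.estimator.belief()               # summarize the belief
        features = np.concatenate([mean, np.diag(cov)])   # mean + marginal variances
        return self.policy_net(features)                  # action from the policy
```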
What you'll actually do
For 95% of robotics work, you won't solve POMDPs. You'll:
- Design state estimators that quantify uncertainty.
- Use those uncertainty estimates as inputs to controllers and planners.
- Add safety margins / chance constraints / information-gain terms when uncertainty matters.
- Train deep RL policies that handle partial observability implicitly.
Full POMDP solvers stay in research, niche industrial applications, and surgical robotics.
Exercise
Set up a 4×4 grid-world POMDP: robot doesn't know its location; can move and observe nearby walls. Implement a particle-filter belief tracker + greedy QMDP planner. Watch the robot localize itself by deliberately moving to disambiguating cells. The "goal-directed information seeking" behavior is what POMDPs uniquely encode.
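If you want a starting point for the belief tracker, a hedged sketch is below; the wall-sensor model, slip probability, and grid layout are illustrative assumptions you can change.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 500                                        # number of particles
particles = rng.integers(0, 4, size=(N, 2))    # (row, col) hypotheses on the 4x4 grid

def move(cells, action, p_slip=0.1):
    """Apply the commanded move; with probability p_slip the robot stays put."""
    deltas = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
    stay = rng.random(len(cells)) < p_slip
    moved = np.clip(cells + np.array(deltas[action]), 0, 3)
    return np.where(stay[:, None], cells, moved)

def weight(cells, observed_walls, p_correct=0.9):
    """Likelihood of the 4-bit wall observation (N, S, W, E) at each cell."""
    walls = np.stack([cells[:, 0] == 0, cells[:, 0] == 3,
                      cells[:, 1] == 0, cells[:, 1] == 3], axis=1)
    match = walls == np.array(observed_walls, dtype=bool)
    return np.prod(np.where(match, p_correct, 1 - p_correct), axis=1)

def pf_step(particles, action, observed_walls):
    """One predict-weight-resample cycle of the particle filter."""
    particles = move(particles, action)
    w = weight(particles, observed_walls)
    w = w / w.sum()
    idx = rng.choice(len(particles), size=len(particles), p=w)
    return particles[idx]
```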
That's the Motion Planning track done
You've covered the full progression: configuration space → graph search → PRM → RRT/RRT* → trajectory optimization → CHOMP/STOMP → MPC-as-planner → TAMP → behavior trees → planning under uncertainty. With Foundations, Kinematics, Control, ROS 2, Manipulation, Learning, SLAM, Perception, and now Motion Planning, you have nine complete tracks covering the entirety of canonical robotics knowledge. The remaining tracks (Mobile/Legged, Simulators, Embedded, Frontiers) are platform and application breadth.