Task and motion planning (TAMP)
Combining discrete logic (what to do) with continuous motion (how to move). The frontier of long-horizon autonomy — and the architecture behind 'put the dish in the dishwasher.'
RRT plans how to move an arm. PDDL plans which actions to take. Real autonomy needs both: "to put the dish in the dishwasher, you must first open the dishwasher, which requires an empty hand while reaching the handle, which requires standing in a pose from which the handle is reachable..." Task and motion planning (TAMP) is the field that joins them. Hard problem; rapid progress; production-ready for narrow domains in 2026.
The two halves
Task planning (the what)
Discrete decisions: which actions in which order. Classically encoded as PDDL (Planning Domain Definition Language):
```pddl
(:action pick-up
  :parameters (?obj ?from)
  :precondition (and (at robot ?from) (at ?obj ?from)
                     (clear ?obj) (handempty))
  :effect (and (holding ?obj) (not (at ?obj ?from))
               (not (clear ?obj)) (not (handempty))))
```
Solvers (Fast Downward, OPTIC) find action sequences satisfying preconditions and reaching goals. Classical AI; 50 years of research.
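Under the hood, a task planner is graph search over symbolic states. A minimal STRIPS-style sketch in Python (toy dishwasher facts and action names, invented for illustration; real solvers like Fast Downward use far better heuristics than breadth-first search):

```python
from collections import deque

# Toy STRIPS-style domain: a state is a frozenset of facts; each action is
# (name, preconditions, add-list, delete-list). Facts are invented.
ACTIONS = [
    ("open-dishwasher", {"hand-empty", "at-dishwasher"}, {"door-open"}, set()),
    ("goto-counter",    {"at-dishwasher"}, {"at-counter"}, {"at-dishwasher"}),
    ("goto-dishwasher", {"at-counter"}, {"at-dishwasher"}, {"at-counter"}),
    ("pick-dish",       {"hand-empty", "at-counter"}, {"holding-dish"}, {"hand-empty"}),
    ("place-dish",      {"holding-dish", "door-open", "at-dishwasher"},
                        {"dish-in-rack", "hand-empty"}, {"holding-dish"}),
]

def plan(start, goal):
    """Breadth-first search over symbolic states: shortest action sequence."""
    frontier = deque([(frozenset(start), [])])
    seen = {frozenset(start)}
    while frontier:
        state, steps = frontier.popleft()
        if goal <= state:
            return steps
        for name, pre, add, dele in ACTIONS:
            if pre <= state:
                nxt = frozenset((state - dele) | add)
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, steps + [name]))
    return None

print(plan({"hand-empty", "at-dishwasher"}, {"dish-in-rack"}))
```

Note that the planner discovers on its own that the dishwasher must be opened before picking up the dish, because `open-dishwasher` requires `hand-empty`.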
Motion planning (the how)
Continuous trajectories: collision-free, dynamically feasible. RRT, trajectory optimization, MPC. Covered in earlier lessons.
The marriage problem: PDDL preconditions like "the robot can pick up X from Y" require knowing whether a feasible motion exists from the current state to a grasp pose. That's a motion-planning query inside the task plan.
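The embedded query can be as simple as a reachability test. A toy sketch (the reach radius and the stub geometry are made up; a real check runs IK plus collision-checked motion planning):

```python
import math

def motion_feasible(base_xy, target_xy, reach=0.8):
    """Stub for a motion-planning query: a plain reach-radius check here;
    a real system would run IK plus collision-checked planning."""
    return math.dist(base_xy, target_xy) <= reach

# The symbolic precondition "robot can pick up X from base pose Y"
# bottoms out in a geometric query like this:
def can_pick(base_xy, obj_xy):
    return motion_feasible(base_xy, obj_xy)

print(can_pick((0.0, 0.0), (0.5, 0.3)))   # within reach
print(can_pick((0.0, 0.0), (2.0, 0.0)))   # needs a base move first
```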
Why naive interleaving fails
The obvious approach: the task planner picks an action; the motion planner checks whether it's feasible; if not, ask the task planner for a different action.
Problems:
- Combinatorial explosion: failed motion-plan attempts are expensive; checking feasibility of every action × every parameterization blows up fast.
- Geometric reasoning is hidden: "the cup is unreachable from this base position" needs to be discovered by failed plans, not deduced.
- Continuous parameters: where exactly to grasp the cup? Where to place the gripper? PDDL doesn't natively reason about real numbers.
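The blow-up is easy to demonstrate: enumerate task orderings and let the motion planner veto individual steps. A toy sketch (the "pan is only reachable once the plate is cleared" rule is invented):

```python
import itertools

# Toy rule (invented): the pan is only reachable once the plate is cleared.
def motion_check(obj, done, counter):
    counter[0] += 1                      # each call stands in for an expensive plan
    return obj != "pan" or "plate" in done

def naive_tamp(objects):
    counter = [0]
    for order in itertools.permutations(objects):
        done = set()
        for obj in order:
            if not motion_check(obj, done, counter):
                break                    # geometric veto -> try the next ordering
            done.add(obj)
        else:
            return order, counter[0]
    return None, counter[0]

order, checks = naive_tamp(["pan", "cup", "plate", "bowl"])
print(order, checks)   # the geometric fact is rediscovered by repeated failure
```

Fourteen motion-planner calls for four objects; the permutation count, and with it the call count, grows factorially with the task length.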
The TAMP frameworks
Several flavors exist; all aim to interleave task and motion intelligently.
1. Hierarchical (HTN-based)
Decompose tasks into sub-tasks; each sub-task has motion-feasibility constraints. Plan top-down, refine bottom-up. Classical pattern; clear structure but limited generality.
2. Sample-based TAMP (PDDLStream)
Treat continuous parameters as samples. Stream feasible grasp poses, placements, etc. Search over discrete action sequences while sampling continuous parameters lazily.
The most mature of these: PDDLStream (Caelan Garrett) is the de facto open-source standard, used in research and limited industrial deployments.
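The stream idea in miniature (an illustrative generator, not PDDLStream's actual API):

```python
import random

def grasp_stream(obj_xy, rng):
    """Lazily yield candidate grasps: (object position, approach angle)."""
    while True:
        yield (obj_xy, rng.uniform(0.0, 6.283))

rng = random.Random(0)
grasps = grasp_stream((0.4, 0.2), rng)
candidate = next(grasps)   # the discrete search pulls one sample...
fallback = next(grasps)    # ...and another only when the first fails
print(candidate, fallback)
```

The key property is laziness: the continuous space is never enumerated up front; the discrete search requests samples only when it commits to an action that needs them.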
3. Logic-Geometric Programming (LGP, Toussaint)
Encode the entire problem as a single optimization over both discrete decision variables (which action) and continuous variables (poses, trajectories). Solve as a tree-structured optimization.
More monolithic; harder to scale but cleaner mathematical formulation.
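A toy 1-D flavor of the idea: score each discrete action skeleton by solving the continuous placement problem it commits to, then take the cheapest pair (the skeleton names and waypoints are invented; grid search stands in for trajectory optimization):

```python
def continuous_cost(x, waypoints):
    """Inner continuous problem: squared deviation from skeleton waypoints."""
    return sum((x - w) ** 2 for w in waypoints)

def optimize(waypoints):
    """Crude grid search standing in for trajectory optimization."""
    return min((continuous_cost(x / 100, waypoints), x / 100) for x in range(101))

# Two discrete skeletons, each implying different continuous waypoints.
SKELETONS = {
    "pick->place-front": [0.2, 0.3],
    "pick->regrasp->place-back": [0.2, 0.5, 0.9],
}

results = {name: optimize(wps) for name, wps in SKELETONS.items()}
best = min(results.items(), key=lambda kv: kv[1][0])   # cheapest (skeleton, placement)
print(best)
```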
4. Learning-augmented TAMP
Learn heuristics for feasibility checking, action selection, and parameter sampling from past plans. Recent direction; promising in research, not yet production-default.
The "put the dish in the dishwasher" example
Consider this 30-second task. TAMP must reason about:
- Sequence: open dishwasher, lift dish, place dish, close dishwasher. Wrong order → fail.
- Geometry: where to stand to reach the dishwasher. Where to grip the dish. Where in the rack to place it.
- Constraints: a hand holding the dish isn't free for the dishwasher handle. The door must be open before placing.
- Recovery: dish too heavy? Pick from a different angle. Door blocked? Push the base back.
A pure motion planner can't sequence these. A pure PDDL planner can't reason about whether you can physically reach the handle from where you're standing. TAMP is the merge.
Two extremes of approach
Symbolic-first
Task planner produces a candidate sequence. Motion planner verifies each action; reports failures back. Re-plan. Mostly classical AI.
Strengths: scalable to long sequences; clean separation of concerns. Weaknesses: motion-plan failures cause backtracking; geometric reasoning is reactive.
Geometric-first
Sample candidate placements, grasps, and paths densely. Filter to the geometrically feasible ones. Run the task planner over the feasible parameterizations.
Strengths: finds solutions even when symbolic constraints are tricky. Weaknesses: heavy sampling cost; doesn't scale to many objects.
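A toy sketch of the geometric-first pattern (a 1-D "rack" with an invented occupied interval standing in for collision checking):

```python
import random

def feasible(x, occupied=(0.3, 0.6)):
    """Collision stub: a placement collides if it lands in the occupied span."""
    return not (occupied[0] <= x <= occupied[1])

rng = random.Random(1)
samples = [rng.uniform(0.0, 1.0) for _ in range(200)]   # dense sampling first...
placements = [x for x in samples if feasible(x)]        # ...then geometric filtering
print(len(samples), len(placements))   # sampling cost is paid up front
```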
Modern frameworks (PDDLStream) interleave both intelligently.
The 2026 production reality
Industrial applications use simplified TAMP:
- Hand-coded state machines: the most common solution for narrow domains. Each transition is geometrically reasoned about by an engineer; runtime just executes.
- Behavior trees: more flexible than state machines; mixable with motion planners. Powers Nav2's autonomy.
- PDDLStream + MoveIt: research and demo deployments. Solves multi-step pick-and-place reliably.
- VLA-as-planner: vision-language(-action) models generate task plans; the underlying motion planning is delegated to MoveIt or similar. Emerging in 2025–26.
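The behavior-tree pattern from the list above, in miniature (toy nodes, not the BehaviorTree.CPP or Nav2 API):

```python
SUCCESS, FAILURE = "SUCCESS", "FAILURE"

class Sequence:
    """Tick children in order; fail fast on the first failure."""
    def __init__(self, *children):
        self.children = children
    def tick(self, blackboard):
        for child in self.children:
            if child.tick(blackboard) == FAILURE:
                return FAILURE
        return SUCCESS

class Condition:
    """Succeed iff a blackboard flag is set (e.g. a perception result)."""
    def __init__(self, key):
        self.key = key
    def tick(self, blackboard):
        return SUCCESS if blackboard.get(self.key) else FAILURE

class Action:
    """Stub motion skill: pretend the motion planner succeeded."""
    def __init__(self, effect_key):
        self.effect_key = effect_key
    def tick(self, blackboard):
        blackboard[self.effect_key] = True
        return SUCCESS

tree = Sequence(Condition("door_open"), Action("place_dish"))
print(tree.tick({"door_open": True}))    # runs the action
print(tree.tick({"door_open": False}))   # vetoed before any motion
```

Unlike a hand-coded state machine, the tree can be re-ticked every control cycle, so a condition going false mid-task naturally aborts the downstream actions.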
The VLA / LLM angle
Recent work (SayCan, VoxPoser, Code-as-Policies) uses LLMs to generate multi-step plans from natural language. The LLM plays the role of the task planner; classical motion planners execute each step.
Differences from classical TAMP:
- LLM doesn't know about geometric feasibility — has to be told via affordances or grounded skills.
- Fewer guarantees than PDDL; more flexibility for open-ended tasks.
- Quality depends on prompt engineering and grounding.
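A sketch of the grounding step (the model call is stubbed and the skill names are invented; SayCan additionally weights steps by learned affordance values rather than hard-filtering):

```python
# Skills the robot can actually execute (invented names).
KNOWN_SKILLS = {"open_dishwasher", "pick", "place", "close_dishwasher"}

def fake_llm_plan(instruction):
    """Stand-in for a model call; a real system prompts an LLM here."""
    return ["open_dishwasher", "pick dish", "wash hands",
            "place dish in rack", "close_dishwasher"]

def ground(steps):
    """Keep only steps whose leading verb maps to an executable skill."""
    return [s for s in steps if s.split()[0] in KNOWN_SKILLS]

print(ground(fake_llm_plan("put the dish in the dishwasher")))
```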
Currently the easiest path to "household robot does this thing." Not yet as reliable as classical TAMP for known domains.
What still doesn't work
- Long horizons (10+ steps) in unstructured environments: error accumulates; planning takes minutes.
- Tight tolerances combined with reasoning: peg-in-hole assembly with multi-step setup. Each subtask works; orchestrating 8 of them with retry is brittle.
- Safety-critical multi-agent: TAMP for a single robot is hard; for multi-robot fleets in shared workspaces it's an open problem.
- Online replanning: TAMP plans take seconds; a person walks into the kitchen, plan must update. State-of-the-art is improving but not real-time yet.
How to get into TAMP
- Read Caelan Garrett's PDDLStream papers. The clearest expositions of the field.
- Try PDDLStream on a simulated kitchen environment. The repo has examples.
- Build a hand-coded state-machine version of the same task. Note where the boundaries blur.
- Try the VLA-as-planner approach (Code-as-Policies tutorial). Compare planning quality.
Where the field is heading
The 2024+ research trend is hybrid: VLM/LLM at the top of the stack for flexible task understanding; PDDLStream-style symbolic-geometric reasoning underneath; MoveIt-class motion planning at the bottom. Each layer adds different capabilities.
Five years from now, "household robot autonomy" will likely be this stack. Today, it's research-grade.
Exercise
Set up PDDLStream's kitchen example. Define the task: clear all dishes from the counter to the sink. Watch the planner sequence pick-up + transit + place actions. Modify the kitchen layout; re-run. The first time the system replans around a new obstacle, TAMP feels like real intelligence.
Next
Behavior trees — the simpler structure that's eaten state machines for most production autonomy.