RobotForge

Trajectory optimization: direct collocation and shooting

Planning via continuous optimization. The backbone of modern MPC, long-horizon arm motion, and aggressive drone trajectories. Direct collocation, shooting, and the practical patterns that converge.

by RobotForge
#planning #trajectory-optimization #mpc

Sampling-based planners produce a sequence of waypoints. Trajectory optimization produces a smooth, dynamically-feasible motion that minimizes a cost function. The math is heavier; the output is much better. Behind every aggressive drone, every smooth-walking quadruped, and most modern industrial arms, there's a trajectory optimizer.

The setup

Find a state trajectory x(t) and control trajectory u(t) over a time horizon [0, T] that minimize:

min  ∫₀ᵀ L(x(t), u(t)) dt + Φ(x(T))
x,u

subject to:
  ẋ = f(x, u)         # dynamics
  g(x, u) ≤ 0         # state and control limits
  x(0) = x₀          # initial condition
  x(T) ∈ X_goal      # goal set

L is the running cost (e.g., control effort, deviation from a reference). Φ is the terminal cost (penalty for ending far from the goal). f is the system dynamics.

The optimization is infinite-dimensional: trajectories are functions of time. The trick is to discretize and solve a finite-dimensional NLP.

Two main discretization approaches

1. Shooting

Decision variables: only the controls u_0, u_1, ..., u_{N-1} at N time steps. The state trajectory comes from simulating forward through the dynamics.

  • Single shooting: simulate the entire trajectory from u. Sensitivities grow exponentially with the horizon if f is unstable.
  • Multiple shooting: divide the trajectory into segments; introduce intermediate states; add continuity constraints. More robust to instability.

Strengths: small NLP (only controls as decision variables); natural for unstable systems with multiple shooting.

Weaknesses: long horizons get expensive; sensitive to initial guess.
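To make single shooting concrete, here is a minimal sketch for a unit-mass double integrator (the same toy system as the CasADi example later in the post), with SciPy's L-BFGS-B standing in for a proper NLP solver. The `rollout`/`cost` names, the terminal weight of 100, and the goal state are illustrative choices, not any library's API:

```python
import numpy as np
from scipy.optimize import minimize

N, dt = 50, 0.1

def rollout(u):
    """Forward-simulate the unit-mass double integrator under controls u."""
    x = np.zeros((N + 1, 2))                       # state: [position, velocity]
    for k in range(N):
        x[k + 1] = x[k] + dt * np.array([x[k, 1], u[k]])   # explicit Euler
    return x

def cost(u):
    x = rollout(u)
    effort = dt * np.sum(u ** 2)                   # running cost: control effort
    terminal = 100.0 * np.sum((x[-1] - [1.0, 0.0]) ** 2)   # pull endpoint to goal
    return effort + terminal

res = minimize(cost, np.zeros(N), method="L-BFGS-B",
               bounds=[(-2.0, 2.0)] * N)           # control limits
x_final = rollout(res.x)[-1]                       # ends close to [1, 0]
```

Note the single-shooting signature: only the N controls are decision variables; the state trajectory is recomputed inside the cost at every evaluation.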

2. Direct collocation

Decision variables: both states x_0, ..., x_N and controls u_0, ..., u_{N-1}. The dynamics are imposed as constraints, not as forward simulation.

x_{k+1} = x_k + dt * f(x_k, u_k)   # explicit Euler — or RK4, or implicit

Imposed at each timestep as x_{k+1} − x_k − dt · f(x_k, u_k) = 0.

Strengths: the NLP is large, but solvers like IPOPT and SNOPT exploit its sparsity efficiently. Robust to nonlinear dynamics. Better convergence for long horizons.

Weaknesses: more decision variables; implementations are heavier.
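The "dynamics as constraints" idea boils down to defect residuals that the solver drives to zero. A sketch for explicit Euler on a unit-mass double integrator (`euler_defects` is an illustrative name, not a library function):

```python
import numpy as np

def euler_defects(x, u, f, dt):
    """Defect constraints for direct transcription with explicit Euler:
    d_k = x_{k+1} - x_k - dt * f(x_k, u_k). The NLP solver drives every
    d_k to zero, which is exactly the dynamics constraint."""
    return np.array([x[k + 1] - x[k] - dt * f(x[k], u[k])
                     for k in range(len(u))])

# Unit-mass double integrator: xdot = [velocity, force]
f = lambda x, u: np.array([x[1], u])
dt = 0.1

# A trajectory built by Euler integration satisfies the defects exactly:
u = np.full(5, 0.5)
x = np.zeros((6, 2))
for k in range(5):
    x[k + 1] = x[k] + dt * f(x[k], u[k])

d = euler_defects(x, u, f, dt)   # all zeros for this feasible trajectory
```

Each defect only couples (x_k, u_k, x_{k+1}), which is where the sparsity that IPOPT and SNOPT exploit comes from.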

Production default: direct collocation. Used by Drake's trajectory optimizer, MIT's Underactuated Robotics course, and most aggressive drone planners.

The cost function

Standard quadratic running cost:

L(x, u) = (x - x_ref)ᵀ Q (x - x_ref) + uᵀ R u

Q penalizes state deviation from a reference trajectory; R penalizes control effort. Same Q/R weighting as LQR.

Plus terminal cost:

Φ(x_N) = (x_N - x_goal)ᵀ Q_f (x_N - x_goal)

Pulls the trajectory's endpoint to the goal.

For non-quadratic objectives (minimum time, minimum jerk), modify accordingly. The optimizer doesn't care.
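Evaluating the discretized quadratic cost is a few lines. A sketch with illustrative weights (`trajectory_cost` and the Q/R/Q_f values are assumptions for the example, not canonical choices):

```python
import numpy as np

def trajectory_cost(xs, us, x_ref, x_goal, Q, R, Qf, dt):
    """Discretized quadratic cost: running terms over k = 0..N-1
    plus a terminal term on x_N."""
    J = 0.0
    for x, u in zip(xs, us):          # zip stops at N, the number of controls
        dx = x - x_ref
        J += dt * (dx @ Q @ dx + u @ R @ u)     # running cost
    dxN = xs[-1] - x_goal
    return J + dxN @ Qf @ dxN                   # terminal cost

Q  = np.diag([1.0, 0.1])      # penalize position error more than velocity
R  = np.array([[0.01]])       # cheap control
Qf = np.diag([100.0, 100.0])  # strong pull to the goal

xs = np.array([[0.0, 0.0], [0.5, 1.0], [1.0, 0.0]])
us = np.array([[1.0], [-1.0]])
J = trajectory_cost(xs, us, np.zeros(2), np.array([1.0, 0.0]), Q, R, Qf, dt=0.1)
```

The weight ratios are what matter: scaling Q up relative to R buys tracking accuracy at the price of control effort, exactly as in LQR tuning.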

Constraints

  • State limits: x_min ≤ x_k ≤ x_max. Joint angles, velocity bounds.
  • Control limits: u_min ≤ u_k ≤ u_max. Torque, thrust bounds.
  • Obstacle avoidance: d(x_k, obstacle) ≥ 0. The hard one.
  • Boundary conditions: initial / terminal states fixed or constrained to a set.

Obstacle avoidance is non-convex (hence non-trivial). Workarounds:

  • Signed distance field as a constraint.
  • Decomposition into convex regions; force trajectory to stay in one.
  • Penalty in the cost function (soft constraint).
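The soft-constraint version is easy to sketch: a hinge-loss penalty on the signed distance to each obstacle. Here for a single circular obstacle, where `margin` and `weight` are illustrative tuning knobs:

```python
import numpy as np

def obstacle_penalty(positions, center, radius, margin=0.1, weight=100.0):
    """Soft obstacle constraint: signed distance to a circular obstacle is
    ||p - c|| - r. Charge a quadratic penalty whenever the trajectory
    comes within `margin` of the surface (or penetrates it)."""
    d = np.linalg.norm(positions - center, axis=1) - radius   # signed distance
    violation = np.maximum(0.0, margin - d)                   # zero when clear
    return weight * np.sum(violation ** 2)

obs_c, obs_r = np.array([1.0, 0.0]), 0.3

path_clear = np.array([[0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
path_hit   = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])

p_clear = obstacle_penalty(path_clear, obs_c, obs_r)  # far away: zero penalty
p_hit   = obstacle_penalty(path_hit, obs_c, obs_r)    # passes through: positive
```

The penalty is smooth where it is nonzero, so gradient-based solvers handle it, but it trades guaranteed clearance for convergence: the optimizer may accept small violations when the weight is too low.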

Solvers

Solver   License     Best at
IPOPT    EPL         General NLP; the open-source default
SNOPT    Commercial  Sparse SQP; aerospace standard
OSQP     Apache 2.0  Quadratic programs; very fast for linear MPC
CasADi   LGPL        Modeling layer; calls IPOPT/SNOPT under the hood
acados   BSD         Embedded MPC; auto-generates C code

Most academic and hobbyist work uses CasADi + IPOPT. Embedded production uses acados.

The 30-line CasADi example

import casadi as ca

opti = ca.Opti()
N = 50  # horizon

x = opti.variable(2, N+1)  # state: position, velocity
u = opti.variable(1, N)    # control: force
dt = 0.1

# Dynamics constraints (direct collocation)
for k in range(N):
    x_dot = ca.vertcat(x[1, k], u[0, k])  # unit-mass double integrator: [v; F]
    opti.subject_to(x[:, k+1] == x[:, k] + dt * x_dot)

# Initial and goal
opti.subject_to(x[:, 0] == [0, 0])
opti.subject_to(x[:, -1] == [1, 0])

# Control limits
opti.subject_to(opti.bounded(-2, u, 2))

# Cost
opti.minimize(ca.sumsqr(u))

opti.solver('ipopt')
sol = opti.solve()

Thirty lines. Plug in real dynamics; it scales. CasADi is the bridge between mathematical formulation and a working solver.

What can go wrong

  • Non-convergence: bad initial guess for the trajectory. Always warm-start from a previous solution or a feasible heuristic.
  • Local minima: trajectory optimization is non-convex in general. Try multiple initial guesses or use sampling-based planning to seed.
  • Long horizons: NLP size grows linearly with horizon, but solver time grows worse. Hierarchical decomposition (plan coarse first, optimize segments) helps.
  • Hard contact dynamics: legged robots, peg-in-hole. Standard NLP solvers struggle; specialized methods (DDP, iLQR, contact-implicit formulations) help.
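The warm-starting advice has a standard mechanical form in receding-horizon use: shift the previous solution one step forward and reuse it as the initial guess for the next solve (in CasADi, the shifted guess goes to `opti.set_initial`). A sketch with illustrative names:

```python
import numpy as np

def shift_warm_start(x_prev, u_prev):
    """Receding-horizon warm start: drop the first state/control of the last
    solution, shift everything one step earlier, and repeat the final entries
    to fill the gap at the end of the horizon."""
    x0 = np.vstack([x_prev[1:], x_prev[-1]])   # (N+1, nx) -> same shape
    u0 = np.vstack([u_prev[1:], u_prev[-1]])   # (N, nu)   -> same shape
    return x0, u0

# Previous solve: 3-step horizon on a 2-state system
x_prev = np.array([[0.0, 0.0], [0.3, 0.6], [0.7, 0.7], [1.0, 0.0]])
u_prev = np.array([[2.0], [1.0], [-2.0]])

x0, u0 = shift_warm_start(x_prev, u_prev)
# The new guess starts from the state the plant should be near after one step.
```

Because consecutive MPC problems differ only in the initial condition, the shifted guess is nearly feasible, and IPOPT typically converges in a handful of iterations instead of tens.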

Where this lives in production

  • Drone trajectory generation: minimum-snap polynomials (Mellinger 2011) — closed-form trajectory optimization for differentially-flat systems.
  • Quadruped locomotion: trajectory optimization for the center of mass + foot placements; whole-body control fills in the joints.
  • Industrial arms: MoveIt's TrajOpt plugin solves trajectory optimization for arm motion with collision constraints.
  • Autonomous driving: MPC running at 10–50 Hz is trajectory optimization with a receding horizon.
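To get a feel for the receding-horizon loop itself, here is a minimal sketch where an unconstrained finite-horizon LQR, solved by backward Riccati recursion, stands in for the full constrained NLP that a real MPC stack re-solves each cycle. The dynamics, weights, and horizon are illustrative:

```python
import numpy as np

# Unit-mass double integrator, discretized with explicit Euler
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.0], [dt]])
Q, R = np.diag([1.0, 0.1]), np.array([[0.01]])

def plan(x0, N=20):
    """Finite-horizon LQR via backward Riccati recursion: the closed-form
    solution of the unconstrained quadratic trajectory-optimization problem.
    Returns only the first control, MPC-style."""
    P = Q.copy()
    gains = []
    for _ in range(N):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ A - A.T @ P @ B @ K
        gains.append(K)
    return -gains[-1] @ x0    # gain for the first step of the horizon

# Receding horizon: re-plan at every step, apply only the first control.
x = np.array([1.0, 0.0])      # start 1 m from the origin
for _ in range(100):
    u = plan(x)
    x = A @ x + B @ u         # plant propagation (here: the model itself)
```

Swap `plan` for a constrained NLP solve and add model mismatch in the plant propagation, and this is structurally the 10–50 Hz loop running in a driving stack.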

Exercise

In CasADi, set up trajectory optimization for a cart-pole swing-up: pendulum down → up. Use direct collocation, IPOPT solver. Watch the optimizer find a controller from scratch. Now add a torque limit; watch the trajectory change. Trajectory optimization is the kind of thing you do once, by hand, and then never want to write code without it again.
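As a starting point for the exercise, here is one common cart-pole formulation in plain NumPy (half-length l, θ = 0 at the upright, so swing-up goes from θ = π to 0; the parameter values are illustrative). Port it to CasADi symbolics and discretize it for the collocation constraints:

```python
import numpy as np

# Cart-pole parameters (illustrative values)
M, m, l, g = 1.0, 0.1, 0.5, 9.81   # cart mass, pole mass, pole half-length, gravity

def cartpole_dynamics(state, F):
    """Continuous-time cart-pole dynamics; theta = 0 is the upright.
    state = [x, theta, x_dot, theta_dot]; F is the horizontal cart force.
    Returns the time derivative of the state."""
    x, th, xd, thd = state
    s, c = np.sin(th), np.cos(th)
    tmp = (F + m * l * thd ** 2 * s) / (M + m)
    th_dd = (g * s - c * tmp) / (l * (4.0 / 3.0 - m * c ** 2 / (M + m)))
    x_dd = tmp - m * l * th_dd * c / (M + m)
    return np.array([xd, thd, x_dd, th_dd])

# Both the upright and the hanging pose are equilibria with zero force:
upright = np.zeros(4)
hanging = np.array([0.0, np.pi, 0.0, 0.0])
```

The upright is the unstable equilibrium, which is exactly why the swing-up needs an optimizer rather than a local controller.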

Next

CHOMP and STOMP — gradient-based and sampling-based trajectory optimizers tailored for arm motion in cluttered scenes.
