RobotForge
Published · ~14 min

Whole-body control for humanoids

Hierarchical task control, friction cone constraints, the QP-based formulation every humanoid team uses. Turning a high-level plan into 25+ DOF of coordinated joint motion.

by RobotForge
#mobile-robots #humanoid #whole-body-control

A humanoid has 25–60 joints. Every action — walking, reaching, balancing while pushed — uses many of them simultaneously. Whole-body control is the framework that coordinates them: solve a quadratic program at every control tick that satisfies multiple competing tasks (track footstep plan, maintain balance, reach with the arm, stay within joint limits) using all the joints jointly. The pattern under every humanoid team's stack.

The setup

At every tick, the robot has:

  • Current joint positions q, velocities q̇.
  • A list of tasks: each task wants the robot to do something (track CoM trajectory, place left foot here, point gaze there, keep right hand at this pose).
  • Constraints: joint limits, friction cone constraints, contact non-slip, dynamic feasibility.

Solve for joint accelerations q̈ (and thus torques τ) that best satisfy the tasks subject to the constraints.
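Concretely, each task at a given tick reduces to a Jacobian, a drift term, and a desired task-space acceleration. A minimal sketch of that bookkeeping (field names are illustrative, not from any particular library):

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class Task:
    """One tracking objective: drive J @ qdd toward xdd_des - J̇q̇."""
    J: np.ndarray          # task Jacobian (m x nv): joint motion -> task motion
    Jdot_qdot: np.ndarray  # drift term J̇ q̇ (m,)
    xdd_des: np.ndarray    # desired task-space acceleration (m,)
    weight: float          # soft-priority weight w_i

# Per-tick robot state for a ~25-joint humanoid with a floating base:
q = np.zeros(32)   # configuration (the base orientation is a quaternion, so nq = nv + 1)
v = np.zeros(31)   # velocities q̇ (nv,)
```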

The quadratic program

min  Σ_i  w_i ‖J_i q̈ + J̇_i q̇ − ẍ_i,des‖²
q̈, τ, λ

subject to:
  M q̈ + h(q, q̇) = S^T τ + J_c^T λ        # dynamics
  J_c q̈ + J̇_c q̇ = 0                     # contacts: no foot motion
  λ ∈ friction cone                          # contact forces are realistic
  q̈, τ, λ within bounds                      # actuator + safety

Variables: joint accelerations, torques, contact forces.

Constraints encode physics: rigid-body dynamics, sticking contacts, friction at the feet, actuator limits.

Cost: weighted sum of task tracking errors. Each task has a Jacobian J_i mapping joint motion to task-space motion.
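Stacked into standard form, the whole thing is one call to an off-the-shelf QP solver. A minimal sketch using the qpsolvers package and the Task sketch above; M, h, S, Jc, and the drift term are assumed to come from your dynamics library, and G_fric is the linearized friction cone built in the friction-cone section below (actuator bounds omitted for brevity):

```python
import numpy as np
from qpsolvers import solve_qp

def solve_wbc_qp(tasks, M, h, S, Jc, Jcdot_qdot, G_fric):
    """One control tick. Decision variable x = [qdd, tau, lam].

    M, h       : joint-space inertia matrix (nv x nv) and bias forces (nv,)
    S          : actuation selection matrix (na x nv), na = nv - 6
    Jc         : stacked contact Jacobian (k x nv); lam has k components
    Jcdot_qdot : contact drift term J̇_c q̇ (k,)
    G_fric     : linearized friction cone, G_fric @ lam <= 0
    """
    nv, na, k = M.shape[0], S.shape[0], Jc.shape[0]
    n = nv + na + k
    qdd = slice(0, nv)

    # Cost: sum_i w_i ||J_i qdd + J̇_i q̇ - xdd_des||^2, plus a tiny
    # regularizer so H stays positive definite in the tau/lam blocks.
    H = 1e-6 * np.eye(n)
    g = np.zeros(n)
    for t in tasks:
        H[qdd, qdd] += t.weight * t.J.T @ t.J
        g[:nv] += t.weight * t.J.T @ (t.Jdot_qdot - t.xdd_des)

    # Equalities: dynamics M qdd + h = S^T tau + Jc^T lam, contacts Jc qdd = -J̇_c q̇
    A = np.zeros((nv + k, n))
    b = np.zeros(nv + k)
    A[:nv, :nv], A[:nv, nv:nv + na], A[:nv, nv + na:] = M, -S.T, -Jc.T
    b[:nv] = -h
    A[nv:, :nv] = Jc
    b[nv:] = -Jcdot_qdot

    # Inequality: the friction pyramid acts only on the lam block.
    G = np.zeros((G_fric.shape[0], n))
    G[:, nv + na:] = G_fric

    return solve_qp(H, g, G, np.zeros(len(G)), A, b, solver="osqp")
```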

Hierarchical (priority-based) tasks

Some tasks are non-negotiable (don't fall over); some are nice-to-have (keep gaze on the target). Use task priorities:

  • Strict priority: solve top-priority tasks first; lower-priority tasks fill the null space without disturbing higher ones.
  • Soft priority: weighted sum with very different weights. Approximates strict priority when weight ratios are large.

Production humanoids run strict priorities for the safety and contact tasks, with soft weights inside the comfort layer.
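For intuition, here is the classic null-space recursion behind strict priority, stripped down to unconstrained least squares (a sketch only: drift terms and inequality constraints are omitted, and real stacks use hierarchical QPs to keep them):

```python
import numpy as np

def prioritized_accel(tasks):
    """Strict priority via null-space projection (equality tasks only).

    tasks: list of (J, xdd_des), ordered from highest to lowest priority.
    Each task is solved in the null space of all tasks above it, so it
    can never disturb them.
    """
    nv = tasks[0][0].shape[1]
    qdd = np.zeros(nv)
    N = np.eye(nv)                             # null-space projector of the stack so far
    for J, xdd_des in tasks:
        JN = J @ N
        JN_pinv = np.linalg.pinv(JN)
        qdd += JN_pinv @ (xdd_des - J @ qdd)   # correct only what is still free
        N = N @ (np.eye(nv) - JN_pinv @ JN)    # shrink the remaining freedom
    return qdd
```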

Friction cones (the contact gotcha)

For each foot in contact with the ground, the contact force λ must be:

  • Pushing into the ground (normal component positive).
  • Within the friction cone (tangential component bounded by μ · λ_normal).

The friction cone is a second-order (quadratic) cone, which standard QP solvers (OSQP) can't represent directly. The usual fix is a linear approximation: a polyhedral pyramid with 4 or 8 facets.
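For one contact point on flat ground with the normal along +z, the 4-facet pyramid is four linear inequalities. A minimal sketch (in practice an explicit minimum-normal-force row is often added too):

```python
import numpy as np
from scipy.linalg import block_diag

def friction_pyramid(mu):
    """4-facet pyramid for one contact point, as G @ lam <= 0.

    Rows encode |lam_x| <= mu * lam_z and |lam_y| <= mu * lam_z; together
    they also imply lam_z >= 0 (pushing, never pulling). Assumes the
    contact normal is +z, i.e. flat ground.
    """
    return np.array([
        [ 1.0,  0.0, -mu],
        [-1.0,  0.0, -mu],
        [ 0.0,  1.0, -mu],
        [ 0.0, -1.0, -mu],
    ])

# Two feet with 4 corner points each -> 8 blocks, matching lam's 24 components:
G_fric = block_diag(*[friction_pyramid(0.7)] * 8)
```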

If the QP can't satisfy all constraints at once (typical when a task demands contact forces outside the friction cone), the lowest-priority tasks get sacrificed. Properly tuned, this means "comfort goals get violated; safety doesn't."

The complete frameworks: TSID, Pinocchio, OCS2

Open-source frameworks that implement whole-body QP:

Framework            Strengths
TSID (LAAS)          Production-tested; used by HRP-4 and Pyrene
Pinocchio + ProxQP   Build your own; flexible
OCS2 (ETH)           Whole-body MPC; used by ANYmal and Digit research
Crocoddyl            DDP-based; multi-contact friendly

For ROS-based humanoid work, TSID + Pinocchio is the practical default. Build setup is a half-day; the QP solves at 1 kHz on commodity CPUs.
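Pulling the dynamics quantities the QP needs out of Pinocchio takes a handful of calls. A rough sketch; the URDF path and frame name are placeholders for your own robot:

```python
import numpy as np
import pinocchio as pin

# Placeholder URDF path; point this at your robot's model.
model = pin.buildModelFromUrdf("talos.urdf", pin.JointModelFreeFlyer())
data = model.createData()

q = pin.neutral(model)
v = np.zeros(model.nv)

M = pin.crba(model, data, q)                  # joint-space inertia (upper triangle only)
M = np.triu(M) + np.triu(M, 1).T              # symmetrize it
h = pin.nonLinearEffects(model, data, q, v)   # Coriolis + centrifugal + gravity

# 6D Jacobian of a foot frame in the world-aligned frame; the name is robot-specific.
fid = model.getFrameId("left_sole_link")
Jc = pin.computeFrameJacobian(model, data, q, fid, pin.LOCAL_WORLD_ALIGNED)
```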

Solver speed

A typical humanoid whole-body QP has:

  • ~30 joint accelerations.
  • ~30 joint torques.
  • ~24 contact forces (4 corners per foot, 3 components).
  • ~50 inequality constraints.

OSQP solves it in 1–2 ms. Real-time-feasible at 500 Hz. Most humanoid teams run whole-body QP at 250–1000 Hz.
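You can sanity-check that budget on your own hardware with a synthetic problem of roughly the same dimensions. A rough sketch (random, feasible-by-construction data, not a real robot model; real loops also warm-start OSQP, which helps further):

```python
import time
import numpy as np
from qpsolvers import solve_qp

rng = np.random.default_rng(0)
n, n_eq, n_in = 84, 42, 50            # ~30 qdd + ~30 tau + 24 lam; dynamics + contact rows

A_ = rng.standard_normal((n, n))
P = A_ @ A_.T + 1e-3 * np.eye(n)      # symmetric positive definite cost
qvec = rng.standard_normal(n)
A = rng.standard_normal((n_eq, n))
G = rng.standard_normal((n_in, n))
x0 = rng.standard_normal(n)           # build constraints so x0 is feasible
b, hvec = A @ x0, G @ x0 + 1.0

t0 = time.perf_counter()
for _ in range(100):
    x = solve_qp(P, qvec, G, hvec, A, b, solver="osqp")
print(f"{(time.perf_counter() - t0) / 100 * 1e3:.2f} ms per solve")
```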

The MPC variant

Plain whole-body QP optimizes over the next instant only. Whole-body MPC plans over a horizon (10–50 steps). More foresight; more compute. iLQR-based formulations (OCS2, Crocoddyl) handle this efficiently.

Production teams typically run:

  • High-level footstep planner at 5–10 Hz.
  • Whole-body MPC over ~1 second horizon at 50–100 Hz.
  • Inverse-dynamics QP at 500–1000 Hz, tracking the MPC's plan.

Three layers, three timescales: each layer below runs faster and looks less far ahead than the one above it.
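Structurally, the three timescales fall out of one loop with tick-count gating. A sketch where every callback is a placeholder for the corresponding layer of your stack:

```python
import time

def control_loop(read_sensors, plan_footsteps, solve_mpc, solve_qp_tick, send_torques):
    """Three timescales, one loop. Every callback is a stub for a real layer."""
    QP_DT, MPC_EVERY, PLAN_EVERY = 1e-3, 10, 200   # 1 kHz / 100 Hz / 5 Hz
    footsteps, mpc_traj, tick = None, None, 0
    while True:
        t0 = time.perf_counter()
        state = read_sensors()
        if tick % PLAN_EVERY == 0:
            footsteps = plan_footsteps(state)          # slowest, longest horizon
        if tick % MPC_EVERY == 0:
            mpc_traj = solve_mpc(state, footsteps)     # ~1 s horizon
        send_torques(solve_qp_tick(state, mpc_traj))   # this instant only
        tick += 1
        time.sleep(max(0.0, QP_DT - (time.perf_counter() - t0)))
```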

What RL changes

Reinforcement learning replaces the lower layers with a learned policy. Boston Dynamics' recent Atlas work leans on RL, Optimus walking uses learned policies, and the 2024 humanoid wave is mostly RL.

But: the high-level planner (footstep, hand placement) often remains classical. The frontier in 2026 is hybrid:

  • Classical footstep + ZMP planner at 5 Hz.
  • RL policy trained against the classical plan, taking over the 500 Hz layer.

Best of both: RL handles complex contact dynamics; classical math gives the high-level structure.

Common gotchas

  • Contact model mismatch: simulator and real-robot contact differ. Train with domain randomization.
  • Compliance unmodeled: real harmonic-drive joints flex. Add a flexible-joint model or use joint torque sensors.
  • Solver failures: when the QP is infeasible (impossible task set), fall back to a safer reduced task set instead of crashing; see the sketch after this list.
  • Numerical conditioning: large priority differences can make the QP ill-conditioned. Scale tasks; use double precision.
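The solver-failure bullet deserves a concrete pattern: try the full task stack, and on infeasibility retry with only the safety-critical subset. A sketch assuming the solve_wbc_qp helper from earlier (qpsolvers returns None when the solver fails):

```python
def safe_tick(all_tasks, critical_tasks, *qp_args):
    """Degrade gracefully instead of crashing on an infeasible QP."""
    x = solve_wbc_qp(all_tasks, *qp_args)           # full stack first
    if x is None:                                   # solver failed: shed comfort tasks
        x = solve_wbc_qp(critical_tasks, *qp_args)  # balance + contacts only
    return x   # still None? caller holds the last command or engages damping
```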

What humanoid teams actually do

Tesla Optimus (per public papers/talks):

  • End-to-end neural policy for low-level control.
  • High-level planner generates footstep + arm trajectories from VLA outputs.
  • Whole-body QP-style safety layer monitors and modifies in real time.

Boston Dynamics Atlas:

  • Whole-body MPC for trajectory optimization.
  • Reinforcement learning for the lowest-level joint tracking.
  • Same QP-style hierarchy underneath.

The vocabulary is shared even when the implementations differ.

Exercise

In Crocoddyl or Drake, set up a humanoid (Talos or Atlas URDF). Define three tasks: track CoM position, place each foot, and reach with the right hand. Run the QP at 100 Hz; visualize. Add a friction-cone constraint and test by setting the floor friction very low: watch the low-priority tasks get sacrificed when the forces they demand would require slipping. The first time the robot bowls itself over because it ignored a friction cone, the framework's value clicks.

Next

Reinforcement-learned gaits — the 2024–26 revolution that replaced hand-tuned controllers across every major quadruped.
