Humanoid whole-body control and retargeting
The frontier of bipedal motion: balancing, retargeting human motion-capture data, and the production stacks running on Optimus, NEO, G1, and Atlas in 2026.
In 2024 the humanoid robotics field exploded. Tesla Optimus, 1X NEO, Unitree G1, Figure 02, Boston Dynamics electric Atlas — five serious humanoids in or near production. The control architecture is converging on three pillars: whole-body control, human-motion retargeting, and RL fine-tuning. Here's how the stack works in 2026.
The three-pillar architecture
Pillar 1: Whole-body controller
Covered in the Mobile & Legged track. A QP-based controller that takes high-level task goals (foot placement, hand position, base orientation) and outputs joint torques across 30+ DOF.
Production teams (Boston Dynamics, Apptronik, Sanctuary) have collectively spent decades on this. Open-source equivalents: TSID, Pinocchio + ProxQP, OCS2. The math is well understood; the engineering is hard.
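The QP structure can be illustrated with a toy bounded least-squares problem — a stand-in for the real thing, which adds contact forces, torque limits, and task hierarchies (via ProxQP or similar). Everything below is a random placeholder, not a robot model:

```python
import numpy as np
from scipy.optimize import lsq_linear

rng = np.random.default_rng(0)
n = 6  # toy 6-DOF chain standing in for a 30+ DOF humanoid

J = rng.normal(size=(3, n))            # task Jacobian (e.g. hand position)
a_task = np.array([0.5, 0.0, -9.81])   # desired task-space acceleration
M = np.diag(rng.uniform(1.0, 3.0, n))  # toy diagonal inertia matrix
h = rng.normal(size=n)                 # bias forces (gravity, Coriolis)

# Objective: track the task acceleration, with a small regularizer on qdd.
lam = 1e-2
A = np.vstack([J, np.sqrt(lam) * np.eye(n)])
b = np.concatenate([a_task, np.zeros(n)])

# Joint-acceleration bounds stand in for the torque/contact constraints
# a production whole-body QP would carry.
res = lsq_linear(A, b, bounds=(-50.0, 50.0))
qdd = res.x
tau = M @ qdd + h  # inverse dynamics: torques that realize qdd
print(tau.round(2))
```

The real controllers solve this at 500 Hz–1 kHz with the full rigid-body dynamics in the constraints; the shape of the problem — quadratic tracking objective, linear dynamics and limit constraints — is the same.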
Pillar 2: Human-motion retargeting
Take motion capture data of a human doing a task; convert to humanoid joint trajectories that respect the robot's kinematic limits and dynamics.
Why: humans are the source of training data. Demonstrating in MoCap suits + retargeting is far cheaper than teleoperating each humanoid through every task.
The standard pipeline:
- Capture human motion (Vicon mocap, Xsens IMU suits, or smartphone-based pose estimation).
- Extract human skeleton joint angles + 3D positions.
- Solve an optimization that matches the human pose on the robot's skeleton, subject to robot joint limits and balance constraints.
- Smooth the trajectory; check feasibility on the robot.
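The optimization step above can be sketched on a toy 2-link planar arm — a minimal stand-in for a full-body solver. The link lengths, joint limits, and smoothness weight are all made up for illustration:

```python
import numpy as np
from scipy.optimize import minimize

# Retarget a human wrist position onto a "robot" arm with different
# link lengths and its own joint limits (toy planar 2-link chain).
HUMAN_LINKS = np.array([0.30, 0.28])   # upper arm, forearm (m)
ROBOT_LINKS = np.array([0.25, 0.22])
LIMITS = [(-2.0, 2.0), (0.0, 2.5)]     # robot joint limits (rad)

def fk(q, links):
    """Planar forward kinematics: wrist position for joint angles q."""
    x = links[0] * np.cos(q[0]) + links[1] * np.cos(q[0] + q[1])
    y = links[0] * np.sin(q[0]) + links[1] * np.sin(q[0] + q[1])
    return np.array([x, y])

def retarget(human_q, q_prev):
    # Scale the human wrist target by the limb-length ratio so the
    # pose, not the absolute reach, is what gets matched.
    scale = ROBOT_LINKS.sum() / HUMAN_LINKS.sum()
    target = scale * fk(human_q, HUMAN_LINKS)
    cost = lambda q: (np.sum((fk(q, ROBOT_LINKS) - target) ** 2)
                      + 1e-3 * np.sum((q - q_prev) ** 2))  # smoothness
    res = minimize(cost, q_prev, bounds=LIMITS)
    return res.x

q = retarget(human_q=np.array([0.4, 1.1]), q_prev=np.zeros(2))
print(q)
```

A full-body retargeter solves the same kind of problem over 30+ joints per frame, with balance constraints (center of mass over the support polygon) added on top.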
Retargeting is much easier than designing humanoid motion from scratch. The hard part is the optimization step — different proportions, different joint limits, different balance dynamics. Modern approaches use neural networks (PHC, AMP) to learn this mapping from data.
Pillar 3: RL fine-tuning
Retargeted motions are kinematically correct but not dynamically optimal. RL fine-tunes them: takes the retargeted trajectory as a reference; trains a policy to track it while maintaining balance and robustness to disturbances.
Recipe (per recent papers):
- Reference motion (retargeted from human MoCap).
- PPO with reward = imitation + balance + smoothness penalties.
- Domain randomization (mass, friction, motor dynamics).
- 1–10 billion sim steps; 1–7 days on GPU cluster.
- Deploy zero-shot to hardware.
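The reward and randomization ingredients of this recipe can be sketched as plain functions. The weights and ranges below are illustrative, not taken from any specific paper:

```python
import numpy as np

def tracking_reward(q, q_ref, base_tilt, qdd, prev_qdd):
    """Per-step reward in the motion-imitation style: exponential
    tracking terms plus balance and smoothness penalties."""
    imitation = np.exp(-2.0 * np.sum((q - q_ref) ** 2))  # track reference joints
    balance = np.exp(-5.0 * base_tilt ** 2)              # keep the base upright
    smooth = -0.01 * np.sum((qdd - prev_qdd) ** 2)       # penalize jerk
    return 1.0 * imitation + 0.5 * balance + smooth

def randomize_dynamics(rng):
    """Domain randomization: sample a physics variant per episode."""
    return {
        "mass_scale": rng.uniform(0.8, 1.2),
        "friction": rng.uniform(0.4, 1.0),
        "motor_strength": rng.uniform(0.9, 1.1),
        "action_delay": rng.integers(0, 3),  # control steps
    }

rng = np.random.default_rng(0)
r = tracking_reward(np.zeros(12), np.zeros(12), 0.0, np.zeros(12), np.zeros(12))
print(r)  # perfect tracking, upright, no jerk → 1.5
print(randomize_dynamics(rng))
```

PPO then maximizes the discounted sum of this reward over episodes, each rolled out under a freshly sampled dynamics variant — which is what makes zero-shot hardware transfer plausible.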
The result: humanoids that walk, dance, do parkour-lite, recover from pushes — all from a single training framework.
What's actually running in production
| Robot | Approach |
|---|---|
| Boston Dynamics electric Atlas | Whole-body MPC + RL fine-tuning per skill. Hybrid stack mature. |
| Tesla Optimus | End-to-end neural policies on top of FSD-style perception. RL-heavy. |
| Figure 02 / 03 | Helix VLA on top of whole-body controller; OpenAI partnership for high-level reasoning. |
| 1X NEO | Compliant tendon-driven; soft-control RL. |
| Unitree G1, H1 | RL gaits (HumanoidGym-style); manipulation under development. |
The technology is converging; differences are mostly hardware (rigid vs compliant, electric vs hydraulic) and training-data sources.
The 2024–26 wave of papers
- HumanPlus (Stanford 2024): human → humanoid teleoperation pipeline; retargets human motion in real time.
- OmniH2O (CMU 2024): robust whole-body teleoperation across many humanoids.
- ASAP (CMU / NVIDIA 2025): dynamic athletic skills on a humanoid via RL + a learned delta-action correction.
- HumanoidGym (Tsinghua 2024): open-source training framework for humanoid RL. Used widely.
- Helix (Figure 2025): dual-system VLA with both fast and slow networks.
The pace is staggering. New papers monthly; production deployments quarterly.
The teleoperation interface
Production humanoid data collection in 2026:
- Apple Vision Pro: hands + head tracking. The recent darling.
- Custom MoCap suits: Xsens, OptiTrack. Higher fidelity, more expensive.
- Phone teleop: rough but accessible.
- Bimanual GELLO / ALOHA: arms only; common for upper-body tasks.
Vision Pro + dedicated hand-tracking gloves is the 2026 consensus for full-body humanoid teleop. The cost has dropped from $50k MoCap rigs to $4k consumer hardware.
What's hard
- Hand dexterity: humanoid hands are getting better but still well below humans. Tying shoelaces, threading needles — not yet.
- Continuous operation: 24/7 deployment is nascent. Battery life, joint wear, and software reliability all need years more engineering.
- Heavy manipulation: lifting more than 5–10 kg with humanoid arms is risky. Joint heating, structural compliance.
- Recovery from falls: a hard push still topples a humanoid roughly 30% of the time. Get-up policies exist, but they're slow.
- Multi-task generalization: a humanoid trained to fold laundry doesn't automatically wash dishes. Per-task fine-tuning is the norm.
The career angle
Humanoid robotics in 2026 is hiring at boom levels. Compensation is competitive with AI labs; the impact is more tangible than abstract ML. Roles split:
- Whole-body control engineers: classical robotics math.
- RL researchers: training infrastructure and policy design.
- Hardware engineers: actuators, joints, structures.
- Data / teleop engineers: collection rigs, retargeting, scaling demonstration data.
- Safety / systems engineers: deploying humanoids without injuring humans.
If you're entering robotics in 2026, humanoid is one of the highest-leverage subfields by hiring intensity.
Open-source paths in
- Unitree G1 ($16k): cheapest production humanoid platform; RL recipes work; community growing.
- HumanoidGym: train policies in simulation; deploy to G1 / H1 zero-shot.
- K-Scale Bolt: open quadruped + manipulator combos; lower bar than full humanoid.
- DIY humanoids: $500–$5000 print-and-build kits (Poppy, InMoov). Limited capability but real learning.
Exercise
Set up HumanoidGym. Train PPO on the G1 walking task overnight on a GPU. Watch the policy emerge. Then try retargeting a snippet of human MoCap data and see if the RL policy can imitate it. The first time the simulated G1 imitates your human walk with reasonable fidelity, the field's progress feels intimate.
Next
Teleoperation rigs — the hardware behind the data that feeds these systems.