Published 2026-04-26 · ~14 min

Mobile manipulation: arm + base coordination

An arm on wheels. Whole-body planning, the redundancy that buys you reach, the pitfalls of moving while carrying. The architecture behind every household robot in 2026.

by RobotForge

#manipulation #mobile-manipulation #whole-body

A 6-DOF arm bolted to a stationary table can reach a sphere of roughly 1 m radius. Mount it on a mobile base and the workspace becomes the entire building. That's mobile manipulation, and it's the architecture behind almost every 2026 "household robot" demo, from Stretch to HSR to the humanoid Atlas. The math is straightforward: the arm and base together just become a higher-DOF arm. The engineering is everywhere else.

Why combine arm + base

Workspace expansion: an arm with 1 m reach in a 10 m × 10 m room covers the full 100 m² of floor area once it is mobile, vs. about 3 m² (π × 1² ≈ 3.1 m²) when fixed in place.
Redundancy: more DOF means the arm can avoid obstacles, joint limits, and singularities by re-positioning the base.
Task hierarchy: the base handles "go to the kitchen"; the arm handles "open the drawer." Each at its natural scale.

The cost: coordination. The arm and base operate at different precision scales (sub-millimeter vs. centimeter), use different actuators, and run very different control loops.

Three control paradigms

1. Sequential (the simplest)

Move the base first, stop, then move the arm. Each is independently controlled. Common in early commercial robots.

Strengths: simple to implement; bugs in one don't affect the other.

Weaknesses: slow; awkward "stop and re-orient" motions; can't smoothly stretch beyond the arm's reach.
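The drive-then-reach pattern can be sketched as a tiny state machine. This is only an illustration: the `base_done` and `arm_done` flags are hypothetical signals that a real navigation stack and arm controller would have to supply.

```python
from enum import Enum, auto


class Phase(Enum):
    DRIVE = auto()   # base moves, arm held still
    REACH = auto()   # base stopped, arm moves
    DONE = auto()


def next_phase(phase: Phase, base_done: bool, arm_done: bool) -> Phase:
    """Advance the sequential controller by one tick.

    base_done / arm_done are hypothetical flags reported by the
    navigation and arm controllers; they are not from a real API.
    """
    if phase is Phase.DRIVE and base_done:
        return Phase.REACH
    if phase is Phase.REACH and arm_done:
        return Phase.DONE
    return phase


def active_subsystem(phase: Phase) -> str:
    """Only one subsystem is ever commanded at a time."""
    return {Phase.DRIVE: "base", Phase.REACH: "arm", Phase.DONE: "none"}[phase]
```

Because `active_subsystem` never returns both, the arm cannot start reaching until the base has fully stopped, which is exactly where the lost time comes from.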

2. Joint planning (whole-body)

Treat the base as adding 3 DOF (x, y, θ for a diff-drive base) on top of the arm's 6–7 DOF, then plan a trajectory in the combined 9–10D space.

Strengths: smooth coordinated motion; full redundancy exploited.

Weaknesses: planning is slower (higher dimension); the base moves in a non-holonomic way that planners must respect.
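To make the "higher-DOF arm" idea and the non-holonomic constraint concrete, here is a minimal sketch for a deliberately simplified case: a planar 2-link arm mounted at the center of a diff-drive base. The link lengths and the state layout `(x, y, theta, q1, q2)` are assumptions chosen for illustration, not taken from any specific robot.

```python
import numpy as np

L1, L2 = 0.4, 0.3  # link lengths in metres (illustrative values)


def ee_position(state):
    """World-frame end-effector position for state = (x, y, theta, q1, q2)."""
    x, y, th, q1, q2 = state
    a1, a12 = th + q1, th + q1 + q2
    return np.array([x + L1 * np.cos(a1) + L2 * np.cos(a12),
                     y + L1 * np.sin(a1) + L2 * np.sin(a12)])


def config_jacobian(state):
    """2x5 Jacobian of ee_position w.r.t. (x, y, theta, q1, q2)."""
    _, _, th, q1, q2 = state
    a1, a12 = th + q1, th + q1 + q2
    d1 = np.array([-L1 * np.sin(a1) - L2 * np.sin(a12),
                   L1 * np.cos(a1) + L2 * np.cos(a12)])
    d2 = np.array([-L2 * np.sin(a12), L2 * np.cos(a12)])
    # theta and q1 rotate the whole arm identically, so they share a column.
    return np.column_stack([[1.0, 0.0], [0.0, 1.0], d1, d1, d2])


def input_map(state):
    """5x4 map from inputs (v, omega, q1_dot, q2_dot) to state rates."""
    th = state[2]
    S = np.zeros((5, 4))
    S[0, 0], S[1, 0] = np.cos(th), np.sin(th)  # forward speed moves x and y
    S[2, 1] = 1.0                              # yaw rate
    S[3, 2] = S[4, 3] = 1.0                    # arm joints pass through
    return S


def whole_body_jacobian(state):
    """2x4 Jacobian from (v, omega, q1_dot, q2_dot) to end-effector velocity."""
    return config_jacobian(state) @ input_map(state)
```

The state has five coordinates but only four commandable inputs: `input_map` has no column that moves the base sideways, which is the non-holonomic constraint a whole-body planner must respect.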

3. Stack-of-tasks

Hierarchical: high-priority tasks (collision avoidance) absorb DOF first; lower-priority (reach the goal) use the remaining null space. Used in industrial mobile manipulators.

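Mathematically: at priority level $k$, solve $\min_{\dot q} \|J_k \dot q - \dot x_k\|^2$ subject to $J_i \dot q = J_i \dot q_i^*$ for every higher-priority task $i < k$, where $\dot q_i^*$ is the solution already found at level $i$. Solve the tasks in order of importance.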

Strengths: principled handling of multiple objectives; production-grade.

Weaknesses: harder to debug; gets gnarly when tasks conflict.
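A two-level version of this hierarchy can be written with pseudo-inverses and a null-space projector. The sketch below takes task 1 as the higher priority and is a textbook construction, not the method of any particular framework; it has no damping, so it is only trustworthy when the tasks are compatible and the Jacobians are well conditioned.

```python
import numpy as np


def two_level_sot(J1, dx1, J2, dx2):
    """Joint velocities for two prioritized tasks via null-space projection.

    Task 1 (J1, dx1) has priority; task 2 (J2, dx2) only uses motion
    that leaves task 1 unchanged.
    """
    J1_pinv = np.linalg.pinv(J1)
    dq1 = J1_pinv @ dx1                        # least-squares solution of task 1
    N1 = np.eye(J1.shape[1]) - J1_pinv @ J1    # projector onto null space of J1
    # Solve task 2 with what is left, correcting for what dq1 already does.
    dq2 = np.linalg.pinv(J2 @ N1) @ (dx2 - J2 @ dq1)
    return dq1 + N1 @ dq2
```

When the two tasks compete for the same directions, `J2 @ N1` becomes nearly singular and the undamped pseudo-inverse can produce very large velocities. That is one concrete form of the "gnarly when tasks conflict" weakness, and a common remedy is a damped pseudo-inverse or a QP with explicit bounds.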

The redundancy bonus

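An arm + diff-drive base has 9 configuration DOF (6 arm + 3 base) for a 6-DOF end-effector task. That's 3 redundant DOF at the configuration level. (At the velocity level a diff-drive base only commands forward speed and yaw rate, so the instantaneous surplus is 2.) Use them for: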

Joint-limit avoidance: keep arm joints in the middle of their range by re-positioning the base.
Manipulability maximization: keep the arm in dexterous configurations.
Visual-occlusion avoidance: position the base so the arm doesn't block the camera's view of the target.
User comfort: orient the base so a human collaborator sees the arm clearly.

The null-space framework (covered in the iterative-IK lesson) implements all of these.
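As a reminder of how that looks in code, here is a minimal sketch of joint-limit avoidance by gradient projection. The cost (squared distance from mid-range `q_mid`) and the `gain` parameter are illustrative assumptions; other secondary objectives such as manipulability would swap in a different gradient.

```python
import numpy as np


def redundancy_resolution(J, dx, q, q_mid, gain=1.0):
    """Track dx while nudging joints toward q_mid in the null space of J."""
    J_pinv = np.linalg.pinv(J)
    N = np.eye(J.shape[1]) - J_pinv @ J   # null-space projector
    dq_secondary = -gain * (q - q_mid)    # gradient step on 0.5*||q - q_mid||^2
    return J_pinv @ dx + N @ dq_secondary
```

The projector `N` guarantees that the centering motion does not disturb the end-effector task, so the secondary objective only ever gets the leftover DOF.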

Common pitfalls

The base-stops-arm-stops dance

This is sequential control at work: the robot drives, stops, swings the arm, stops, and drives again. It looks robotic, slow, and frustrating. Joint planning fixes this.

The slip surprise

The arm places its gripper far more precisely than the base knows its own position, because the base's odometry drifts. After the base moves 5 m, its position estimate may be off by 5 cm, so the arm reaches for a doorknob 5 cm from where it actually is. Solution: localize against the environment continuously, not just at the start.
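The effect is easy to reproduce in 2D. The sketch below uses made-up poses (a doorknob at an arbitrary world position, a base whose odometry is 5 cm off along x) to show how a base pose error turns directly into a grasp error in the base frame.

```python
import numpy as np


def world_to_base(point_w, base_pose):
    """Express a world-frame 2D point in the base frame.

    base_pose = (x, y, theta) of the base in the world frame.
    """
    x, y, th = base_pose
    c, s = np.cos(th), np.sin(th)
    R_T = np.array([[c, s], [-s, c]])  # transpose of the base rotation
    return R_T @ (np.asarray(point_w) - np.array([x, y]))


doorknob_w = np.array([5.6, 0.3])   # illustrative world position
true_pose = (5.00, 0.00, 0.0)       # where the base really is
odom_pose = (5.05, 0.00, 0.0)       # odometry estimate, 5 cm off

target_from_odom = world_to_base(doorknob_w, odom_pose)
target_true = world_to_base(doorknob_w, true_pose)
grasp_error = np.linalg.norm(target_from_odom - target_true)
```

Here `grasp_error` equals the 5 cm pose error. A heading error is worse, because it is multiplied by the distance to the target. Re-localizing just before the reach replaces `odom_pose` with a fresh estimate and removes the accumulated drift.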

Manipulating while moving

Consider pushing a heavy cart while the base is moving. The contact forces at the gripper now couple into the base and affect its wheels. Whole-body force control is the correct framework, but few platforms support it. Most workarounds are scripted state machines.

Carrying heavy objects

An arm holding a 5 kg load shifts the robot's center of mass, and the base's tipping risk goes up. In production: limit base velocity when carrying, lower the arm, and include stability constraints in motion planning.
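A rough feel for the numbers comes from a side-view, cart-table style estimate. Everything in the sketch below is an assumption for illustration: the masses, the center-of-mass positions, and the location of the front support edge are invented, and the model ignores suspension, wheel compliance, and arm motion.

```python
G = 9.81  # m/s^2


def combined_com(m_robot, com_robot, m_load, com_load):
    """Mass-weighted centre of mass; positions are (x, z) tuples in metres."""
    m = m_robot + m_load
    return tuple((m_robot * r + m_load * l) / m
                 for r, l in zip(com_robot, com_load))


def max_braking_decel(com, front_edge_x):
    """Largest deceleration before the ZMP reaches the front support edge.

    Uses the cart-table model: x_zmp = x_com + z_com * decel / g.
    """
    x, z = com
    return max(0.0, (front_edge_x - x) * G / z)


# Illustrative numbers only: 25 kg robot, 5 kg load held out at 0.6 m.
unloaded = (0.0, 0.25)
loaded = combined_com(25.0, unloaded, 5.0, (0.6, 0.9))
limit_unloaded = max_braking_decel(unloaded, front_edge_x=0.17)
limit_loaded = max_braking_decel(loaded, front_edge_x=0.17)
```

With these made-up values the load moves the center of mass forward by about 10 cm and raises it, so the tolerable braking deceleration drops to well under half of the unloaded value. Lowering the arm reduces `z` and pulling the load in reduces `x`, which is why both appear in the production checklist.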

The 2026 platforms

Hello Robot Stretch 3: telescoping arm + diff-drive base. Optimized for in-home tasks. Open-source-friendly.
Toyota HSR: research platform, omnidirectional base, single arm. Common at conferences.
Boston Dynamics Spot Arm: Spot quadruped + 6-DOF arm. Inspection and door-opening.
Fetch Mobile Manipulator: legacy commercial; still in research labs.
Humanoid platforms: Unitree G1, 1X NEO, Tesla Optimus. All are mobile manipulators with two arms on a humanoid lower body.

Each has its own challenges; the math above applies to all.

Software stacks

Stack-of-tasks (LAAS-CNRS): research framework for hierarchical control. Used by HSR, PR2.
OCS2: optimal control solver for whole-body MPC. Used by ETH ANYmal + arm research.
MoveIt 2 + Nav2: combine the two ROS stacks. Sequential by default; use whole-body extensions for smoother motion.
Pinocchio + ProxQP: build your own QP-based whole-body controller. Fast and flexible.

The VLA angle

2024+ VLAs (π0, Gemini Robotics) are increasingly trained on mobile-manipulator data. They take pixels plus proprioception (arm and base joint states) and output combined motor commands. This sidesteps explicit hierarchical control altogether: the network learns the right combinations of arm and base motion from data.

Strengths: no hand-tuning of priorities. Weaknesses: hard to certify; needs lots of demonstration data; doesn't generalize to base types not in training.

The hybrid (classical hierarchy + learned residual) is the production sweet spot in 2026.
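The text does not say how the hybrid is wired together; one common pattern is to let the learned policy add a small, clipped correction on top of the classical command. The sketch below assumes that pattern, and the `residual` input stands in for the output of any learned model.

```python
import numpy as np


def hybrid_command(classical_cmd, residual, max_residual=0.05):
    """Add a bounded learned correction to a classical whole-body command.

    classical_cmd: velocities from the hierarchical controller.
    residual: output of a learned policy (hypothetical; any model would do).
    max_residual: per-channel clip so the network can only nudge the command.
    """
    residual = np.clip(np.asarray(residual, dtype=float),
                       -max_residual, max_residual)
    return np.asarray(classical_cmd, dtype=float) + residual
```

The clip is the design choice that matters: the classical controller keeps its guarantees up to a known bound, which makes the combined system easier to reason about than an end-to-end policy.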

The "navigation while carrying" canonical task

Move from kitchen to dining room while holding a tray of glasses. What you need:

Stable carrying pose (arm + tray) that doesn't tip the base.
Speed limit calibrated to glass mass + base inertia.
Smooth base motion (no abrupt velocity changes).
Local replanning if a person crosses the path — without sloshing.
Posture transitions (set down, drive, pick up) at the endpoints.

This task touches every layer: planning, control, perception, safety. It's the test that separates "lab demo" from "deployed robot."
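The speed-limit and smooth-motion items can be approximated with a rigid-body bound and a rate limiter. This is a simplified sketch: the glass is treated as a rigid cylinder on a level tray, liquid sloshing is not modeled, and the dimensions and friction coefficient in any call are values you would have to measure.

```python
G = 9.81  # m/s^2


def max_tray_accel(base_radius, com_height, friction_coeff):
    """Horizontal acceleration a glass tolerates on a level tray.

    It tips when a > g * base_radius / com_height and slides when
    a > friction_coeff * g; the smaller bound governs.
    """
    return G * min(base_radius / com_height, friction_coeff)


def rate_limit(v_prev, v_target, a_max, dt):
    """Move v_prev toward v_target without exceeding a_max."""
    step = a_max * dt
    return v_prev + max(-step, min(step, v_target - v_prev))
```

Calling `rate_limit` once per control tick with `a_max` set from `max_tray_accel` (minus a safety margin) turns any abrupt velocity request from the local replanner into a ramp the glasses can survive.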

Exercise

In MuJoCo, set up a 6-DOF arm on a diff-drive base. Drive to a target 5 m away while keeping the end-effector level (constant z, level orientation). First implement sequential: drive, stop, level the arm. Then implement joint planning over the combined 9-DOF state. Compare execution time and end-effector smoothness. The smooth version is what production mobile manipulators look like.
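For the smoothness comparison, one reasonable metric is the RMS jerk of the logged end-effector positions. The helper below is a sketch that assumes you have recorded positions at a fixed time step `dt`; it is independent of MuJoCo and works on any array of samples.

```python
import numpy as np


def rms_jerk(positions, dt):
    """RMS jerk of a sampled trajectory (rows = time steps, cols = axes)."""
    p = np.asarray(positions, dtype=float)
    jerk = np.diff(p, n=3, axis=0) / dt**3   # third finite difference
    return float(np.sqrt(np.mean(np.sum(jerk**2, axis=-1))))
```

Run it on the end-effector log from each controller and compare the two numbers alongside total execution time. Finite differences amplify sensor noise, so filter or downsample noisy logs before comparing.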
