RobotForge
Published · ~14 min read

Dexterous manipulation: the hard problem

Multi-finger hands, in-hand reorientation, and why this is still mostly a research frontier in 2026. The classical theory, the 2018–24 deep-learning revolution, and the open questions.

by RobotForge
#manipulation #dexterity #hands #research

A two-finger gripper picks objects; a four-finger hand starts to manipulate them. Re-orient a Rubik's cube without releasing it. Twirl a pen. Thread a needle. These tasks define dexterous manipulation — and they're where robotics still falls a long way short of a five-year-old human. Here's the state of play in 2026.

Why dexterous is harder than grasping

A grasp is one event: contact made, object lifted. Dexterous manipulation is many events in a row, with the contacts changing as you go.

  • Many simultaneous contacts: three to five fingertips plus the palm, all touching the object at once.
  • Contacts that come and go: rolling, sliding, breaking + remaking.
  • Friction-mediated everything: the difference between a pen rolling smoothly and slipping is in the friction cone.
  • Partial state observability: the object's pose changes inside the hand, and cameras can't always see it.
  • Tactile sensing required: vision alone misses the contact information.

Each one of these is a hard problem; in dexterous manipulation, they all hit at once.
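The friction-cone condition in the list above is concrete enough to write down: a contact force with normal component f_n and tangential component f_t sticks only if |f_t| ≤ μ·f_n. A minimal check (the numbers in the example are illustrative, not from any real hand):

```python
import numpy as np

def in_friction_cone(f, normal, mu):
    """Check whether contact force f (3-vector) lies inside the Coulomb
    friction cone with coefficient mu about the unit contact normal."""
    normal = normal / np.linalg.norm(normal)
    f_n = np.dot(f, normal)                  # normal component: must push, not pull
    f_t = np.linalg.norm(f - f_n * normal)   # tangential magnitude
    return f_n > 0 and f_t <= mu * f_n

# Load mostly along the normal: the contact sticks.
print(in_friction_cone(np.array([0.1, 0.0, 1.0]), np.array([0.0, 0.0, 1.0]), mu=0.5))  # True
# Load mostly tangential: the contact slips.
print(in_friction_cone(np.array([1.0, 0.0, 0.3]), np.array([0.0, 0.0, 1.0]), mu=0.5))  # False
```

The difference between the pen rolling and slipping is exactly which side of this inequality each fingertip contact sits on, at every instant.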

The hardware

Mainstream multi-fingered hands in 2026:

  • Allegro Hand (Wonik Robotics): 4 fingers × 4 DOF = 16 DOF. The research workhorse.
  • Shadow Hand (Shadow Robot): 24 DOF, anthropomorphic. Expensive ($150k+).
  • Inspire Hand: cheap (~$5k), 6 DOF, common on humanoids in 2026.
  • Robotiq 3-Finger: industrial, 3 fingers, simpler.
  • D'Manus / Faive: open-source designs gaining traction.

Adding tactile fingertips (GelSight, DIGIT, BioTac) doubles the hardware cost but adds the sensing dexterous manipulation arguably requires.

The classical approaches

Decades of research before deep learning:

Form and force closure (Reuleaux, 1875)

Same theory as grasping (covered in grasp-analysis lesson). Determines whether multi-finger contacts hold the object against any disturbance. Gives a yes/no answer for any given configuration.
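For intuition, here is a numeric sketch of a planar force-closure test under frictionless point contacts: build each contact's wrench, require the wrenches to span wrench space, and check positive spanning by sampling directions. The sampling step makes this an approximation, not a certificate:

```python
import numpy as np

def wrench_2d(p, n):
    """Planar wrench (fx, fy, tau) of a unit frictionless normal force n
    applied at contact point p on the object (both 2-vectors)."""
    return np.array([n[0], n[1], p[0] * n[1] - p[1] * n[0]])

def force_closure(wrenches, samples=2000, seed=0):
    """Approximate force-closure test: the wrenches must span wrench space
    (rank check) and positively span it (no direction escapes every contact).
    Positive spanning is checked by random sampling -- a sketch, not a proof."""
    W = np.array(wrenches)
    if np.linalg.matrix_rank(W) < W.shape[1]:
        return False
    rng = np.random.default_rng(seed)
    for _ in range(samples):
        d = rng.normal(size=W.shape[1])
        d /= np.linalg.norm(d)
        if np.max(W @ d) <= 1e-9:   # no contact can push along direction d
            return False
    return True

# A unit square pinched on left (twice, offset so torque is resisted),
# right, top, and bottom faces: force closure.
closed = [wrench_2d((1, 0.5), (-1, 0)), wrench_2d((1, -0.5), (-1, 0)),
          wrench_2d((-1, 0), (1, 0)), wrench_2d((0, 1), (0, -1)),
          wrench_2d((0, -1), (0, 1))]
print(force_closure(closed))    # True

# All four contacts at face centers: every wrench passes through the
# center, no torque can be resisted, and the square spins free.
centered = [wrench_2d((1, 0), (-1, 0)), wrench_2d((-1, 0), (1, 0)),
            wrench_2d((0, 1), (0, -1)), wrench_2d((0, -1), (0, 1))]
print(force_closure(centered))  # False
```

This is the "yes/no answer for a given configuration" in eight lines; the hard part in practice is knowing the contact points and normals accurately enough for the answer to mean anything.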

Manipulability analysis

For each configuration of the hand-object system, compute Jacobians. Pick fingertip motions that move the object in the desired direction. Solves the "what should each finger do?" problem given a goal motion.
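The core computation is small. Given a hand-object Jacobian J mapping finger joint velocities to object (or contact) velocity, the minimum-norm joint motion for a desired object motion comes from the pseudoinverse. The Jacobian entries below are made up; in practice they come from the kinematics of your hand model:

```python
import numpy as np

# Hypothetical Jacobian for one 4-DOF finger: maps joint velocities (4,)
# to contact-point velocity (3,). Illustrative numbers only.
J = np.array([
    [0.10, 0.05, 0.02, 0.00],
    [0.00, 0.08, 0.04, 0.01],
    [0.00, 0.00, 0.06, 0.03],
])

v_desired = np.array([0.01, 0.0, 0.0])  # move the contact 1 cm/s along x

# Minimum-norm joint velocities achieving the desired contact motion.
qdot = np.linalg.pinv(J) @ v_desired
print(qdot)
```

Repeat per finger, subject to each contact staying inside its friction cone, and you have the classical answer to "what should each finger do?".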

Rolling and sliding contact mechanics

Mathematical theory (Murray, Li, Sastry, 1994 textbook) for how rolling contacts change configuration. Beautiful math, hard to apply.

Sampling-based planning

RRT-style planning over the contact-state graph. Computationally expensive — many DOF, many contact modes.
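To make the cost concrete, here is a toy RRT in a 2-D configuration space. The real contact-state space has dozens of dimensions and discrete contact modes, which is why this idea gets expensive fast; the obstacle and parameters below are invented for illustration:

```python
import math
import random

def rrt(start, goal, collision_free, steps=2000, eps=0.1, seed=0):
    """Minimal RRT in the unit square: grow a tree toward random samples
    until a node lands in the goal region. A toy stand-in for planning
    over the far larger hand-object contact-state space."""
    rng = random.Random(seed)
    nodes = [start]
    parent = {start: None}   # parents kept so a path could be extracted
    for _ in range(steps):
        sample = (rng.random(), rng.random())
        near = min(nodes, key=lambda n: math.dist(n, sample))
        d = math.dist(near, sample)
        t = min(1.0, eps / d) if d > 0 else 0.0
        new = (near[0] + t * (sample[0] - near[0]),
               near[1] + t * (sample[1] - near[1]))
        if collision_free(new):
            nodes.append(new)
            parent[new] = near
            if math.dist(new, goal) < eps:
                return nodes    # reached the goal region
    return None

# A wall with a gap at the top; the planner must route around it.
free = lambda q: not (0.4 < q[0] < 0.6 and q[1] < 0.8)
tree = rrt((0.1, 0.1), (0.9, 0.1), free)
print("reached" if tree else "failed")
```

In 2-D this finds the gap in a few hundred samples. Each added dimension multiplies the volume to cover, and each contact mode change adds a discrete branch, which is the curse the bullet above is describing.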

The classical methods all work in principle. In practice they require accurate models, perfect sensing, and substantial computation. Real-world dexterous manipulation with classical methods has been demonstrated for narrow tasks but doesn't generalize.

The 2018+ deep-learning revolution

OpenAI's Rubik's cube hand (2019) was the watershed: a Shadow Hand, controlled by a policy trained with reinforcement learning entirely in simulation, manipulated a Rubik's cube on real hardware, with sim-to-real transfer achieved through massive domain randomization. It showed that learned policies could do dexterous tasks that classical methods couldn't.
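The domain-randomization idea is simple to state in code: every training episode gets a freshly sampled set of physics parameters, so the policy must work across all of them — and therefore also in the one un-modeled real world. The parameter names and ranges below are illustrative, not OpenAI's actual values:

```python
import random

def randomized_physics(rng):
    """Sample one episode's simulator parameters, in the spirit of domain
    randomization. Ranges are made up for illustration; real setups
    randomize dozens of parameters, often with adaptive ranges."""
    return {
        "cube_mass": rng.uniform(0.05, 0.15),        # kg
        "finger_friction": rng.uniform(0.5, 1.5),
        "joint_damping": rng.uniform(0.5, 2.0),
        "actuation_delay": rng.uniform(0.0, 0.04),   # seconds
        "cube_size_scale": rng.uniform(0.95, 1.05),
    }

rng = random.Random(42)
# Each training episode would load one of these draws into the simulator.
for params in (randomized_physics(rng) for _ in range(3)):
    print(params)
```

OpenAI's later refinement, automatic domain randomization, widened each range automatically as the policy mastered the current distribution.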

Since then:

  • In-hand reorientation (multiple labs): rotate a complex object 360° using only the fingers.
  • Object insertion with multiple fingers: combine grasping + manipulation in one policy.
  • DeepMind's hand work: SAC + curriculum learning solving in-hand block reorientation.
  • Tactile-conditioned policies: combining GelSight readings with vision; dramatically improves robustness.

The 2024+ trend: VLAs for dexterous tasks. π0 and Gemini Robotics demonstrate single policies that do grasping AND in-hand manipulation across many tasks. Still narrower and slower than humans, but the gap is closing.

Where dexterous still struggles

  • Speed: a Shadow Hand control loop around 50 Hz is sluggish next to human tactile feedback, which operates at an effective rate of roughly 1000 Hz.
  • Force precision: humans modulate fingertip force across 4 orders of magnitude; robots cover 1–2.
  • Compliance: human fingers are soft; most robot fingers are rigid. The compliance changes everything about contact.
  • Generalization across objects: a policy trained on cubes doesn't trivially transfer to spheres.
  • Tool use: holding a tool while manipulating it (key in lock, screwdriver in screw) remains an open problem.

The hard problem under it all

The fundamental gap may be a sensing problem more than a control problem. Humans have roughly 17,000 mechanoreceptors in the glabrous skin of each hand, fingertip sensitivity below the mass of a feather, and a brain that processes all of it in real time. Robots have, at best, a camera image from a GelSight fingertip, and we're still figuring out what to do with it.

The 2024+ research direction: dense tactile sensing + tactile foundation models that learn to read fingertip data the way LLMs learned to read text.

Where dexterous works in 2026

  • Tightly-scoped industrial tasks: rolling a part 90° between two fingers; picking up flat objects from a flush surface. Classical methods + good fingertips solve these.
  • Reset-friendly research demos: Rubik's cube solving and block reorientation, where you can simply retry when a run fails.
  • Human-teleoperated: glove-style teleop maps human dexterity directly. Used in surgery, hazardous remote work.

Where it still doesn't

  • Tying shoelaces: deformable cords, multi-stage, requires fingertip-level sensing.
  • Buttoning a shirt: similar problem class, similar gap.
  • Fine surgical motion: high-bandwidth force control with sub-millimeter accuracy. Da Vinci systems are teleop, not autonomous.
  • Manipulating cards / paper: thin objects need multi-finger pinches and friction-aware sliding.

How to enter the field

In approximate order of difficulty:

  1. Read OpenAI's Rubik's cube paper (2019) and the in-hand manipulation literature that followed (2022+).
  2. Run an Allegro or Inspire hand simulation in MuJoCo. Verify you can do single-finger contacts.
  3. Train a PPO policy for a simple in-hand task (rotate a cube 90°). Watch it succeed in sim, fail on hardware.
  4. Add domain randomization. Try sim-to-real. Probably still fails.
  5. Add tactile sensing. Re-train with tactile observations. Probably works for simple tasks.
  6. Adapt a VLA (π0, OpenVLA) to your hand. Best 2026 results.
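Much of the craft in step 3 is the reward function. A common recipe for in-hand reorientation is dense orientation-error shaping plus a drop penalty and a success bonus; the weights below are made up and need tuning for your simulator:

```python
import numpy as np

def rotation_reward(cube_quat, target_quat, dropped, angvel):
    """Shaped reward for in-hand cube rotation: reward progress toward a
    target orientation, discourage wild spinning, punish drops hard.
    Weights are illustrative, not from any published system."""
    # Angle between current and target orientation via the quaternion dot
    # product (abs handles the double-cover q ~ -q).
    dot = abs(float(np.dot(cube_quat, target_quat)))
    angle = 2.0 * np.arccos(np.clip(dot, -1.0, 1.0))
    reward = -angle                                   # dense shaping term
    reward -= 0.01 * float(np.linalg.norm(angvel))    # discourage flailing
    if dropped:
        reward -= 20.0                                # drop ends the episode
    if angle < 0.1:
        reward += 5.0                                 # success bonus
    return reward

q = np.array([1.0, 0.0, 0.0, 0.0])
print(rotation_reward(q, q, dropped=False, angvel=np.zeros(3)))  # 5.0 (at target)
```

The dense term is what makes PPO learn at all; with a sparse success-only reward, the policy rarely stumbles onto the goal orientation by chance.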

This is genuinely a research path more than an engineering one. Industrial dexterous-manipulation pipelines exist for narrow tasks; the open frontier is in academia.

The open questions

  • How much can VLAs scale to general dexterity? Current data suggests positive scaling but unclear ceiling.
  • Is dense tactile sensing enough, or do we need bio-inspired sensors (afferent receptor mimics)?
  • Are 16-DOF hands sufficient, or do we need 24+?
  • Does compliance hardware (soft / cable-driven) beat stiff hands for dexterity?
  • How important is the cerebellum-like fast control loop humans have, vs slow policy inference?

Exercise

In MuJoCo, simulate an Allegro Hand holding a cube. Implement a simple controller that rotates the cube 90° around its vertical axis using fingertip motions. Use the manipulability framework: compute the Jacobian from finger angles to cube angle; invert it to get the required finger motion. Watch how brittle classical control is. Then try a learned policy (PPO from the mujoco_playground examples). The difference in robustness between the two is the field's progress captured in one comparison.
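The classical half of the exercise can be sketched without MuJoCo: finite-difference the cube angle with respect to the finger joints, then take resolved-rate steps with the pseudoinverse. The toy `f` below is a hypothetical stand-in for the simulator; in the real exercise, `f` would set `qpos`, call `mj_forward`, and read back the cube's orientation:

```python
import numpy as np

def numeric_jacobian(f, q, eps=1e-5):
    """Central-difference Jacobian of scalar f (cube yaw angle) with
    respect to the finger joint angles q."""
    q = np.asarray(q, dtype=float)
    J = np.zeros(q.size)
    for i in range(q.size):
        dq = np.zeros(q.size)
        dq[i] = eps
        J[i] = (f(q + dq) - f(q - dq)) / (2 * eps)
    return J

def step_toward(f, q, target, gain=0.5):
    """One resolved-rate step: the minimum-norm joint motion dq with
    J @ dq = gain * error (pseudoinverse of a 1xN Jacobian)."""
    J = numeric_jacobian(f, q)
    err = target - f(q)
    return q + gain * err * J / (J @ J)

# Hypothetical hand-cube map: cube yaw as a weighted sum of three joints.
f = lambda q: 0.3 * q[0] + 0.2 * q[1] + 0.1 * q[2]
q = np.zeros(3)
for _ in range(20):
    q = step_toward(f, q, target=np.pi / 2)
print(f(q))  # converges to pi/2
```

On the linear toy this converges geometrically; on the real hand-cube system the Jacobian changes as contacts roll and slip, and the same loop diverges the moment a contact breaks — which is exactly the brittleness the exercise wants you to see.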

Next

Mobile manipulation — coordinating an arm on a moving base for whole-body tasks like pushing carts and reaching beyond a fixed workspace.
