RobotForge
Published·~12 min

Collecting demonstrations: teleop rigs that work

ALOHA, GELLO, phone-teleop, VR, leader-follower puppets. What each gives you, what each costs, and how to build one this weekend. The hardware behind every working VLA in 2026.

by RobotForge
#learning #teleoperation #demos

No teleoperation rig, no demonstrations. No demonstrations, no imitation learning. No imitation learning, no VLA fine-tunes — and no real-world RL bootstrap. Teleop is the unsexy hardware on which all of modern robot learning rests. The good news: in 2026 you can build one in a weekend for under $500.

What "good teleop" means for ML

Three properties matter for ML usefulness:

  • Fidelity: the operator's intent should reach the robot accurately. Sloppy teleop produces sloppy demos.
  • Frequency: demonstrations should be recorded at the policy's control rate (typically 30–60 Hz). Recording at a lower rate yields fewer training samples per minute of operator time.
  • Reproducibility: the same operator should produce similar demos session-to-session. High variance in demonstrations limits learning.

Add to this: low operator fatigue (so you can collect 200+ demos in a session) and easy synchronization with cameras.
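
One way to check the frequency property is to inspect the timestamps a recording actually produced. The sketch below assumes every recorded sample carries a timestamp in seconds; the function name `rate_report` and the 20% tolerance are illustrative choices, not part of any standard tool.

```python
from statistics import mean

def rate_report(timestamps, target_hz, tolerance=0.2):
    """Summarize the effective sample rate of one recorded episode.

    timestamps: increasing sample times in seconds.
    tolerance:  assumed acceptable deviation from the ideal period (20%).
    """
    if len(timestamps) < 2:
        raise ValueError("need at least two samples")
    periods = [b - a for a, b in zip(timestamps, timestamps[1:])]
    ideal = 1.0 / target_hz
    # Count gaps that deviate from the ideal period by more than the tolerance.
    off_rate = sum(1 for p in periods if abs(p - ideal) > tolerance * ideal)
    return {
        "effective_hz": 1.0 / mean(periods),
        "worst_gap_s": max(periods),
        "off_rate_fraction": off_rate / len(periods),
    }

# Example: a nominal 50 Hz recording with one dropped sample.
ts = [i * 0.02 for i in range(10)]
del ts[5]
report = rate_report(ts, target_hz=50)
print(report)
```

A single dropped sample already pulls the effective rate below the nominal one, which is why it helps to run a check like this on every episode before training.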

The five rig types

1. Phone teleop

The phone is the input. Drag on the touchscreen → end-effector moves. Pinch → gripper. Tap to record/stop episode.

  • Strengths: zero hardware. Works for proof-of-concept on any arm.
  • Weaknesses: low fidelity. 2D screen → 3D motion. No haptic feedback. 10–20 demos/hour at best.
  • Best for: quick demos, hackathons, tasks that don't need precision.
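
As a rough illustration of the drag-to-motion mapping, the sketch below converts a touchscreen drag into a small planar end-effector step. The axis convention (robot x forward, y left), the scale factor, and the step clamp are all assumptions made for the example.

```python
def drag_to_ee_delta(dx_px, dy_px, metres_per_px=0.0005, max_step_m=0.01):
    """Map a touchscreen drag (pixels) to a planar end-effector step (metres).

    Assumed convention: dragging up moves the end-effector forward (+x),
    dragging right moves it to the operator's right (-y). Height is not
    controlled at all, which is the 2D-to-3D limitation of phone teleop.
    """
    def clamp(v):
        return max(-max_step_m, min(max_step_m, v))

    step_x = clamp(-dy_px * metres_per_px)  # screen y grows downward
    step_y = clamp(-dx_px * metres_per_px)
    return step_x, step_y, 0.0

# A short drag up and to the right.
print(drag_to_ee_delta(dx_px=10, dy_px=-20))
```

The clamp keeps a fast swipe from commanding a large jump, at the cost of making long motions slower to demonstrate.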

2. VR controllers

Quest, Vive, Index. The headset gives you scene visualization; controllers map to end-effector pose. Trigger = gripper.

  • Strengths: 6-DOF input, high-frequency tracking, scene visualization. Free if you have a Quest.
  • Weaknesses: setup overhead. Calibration. Operator fatigue from VR weight. Hand-tracking less precise than puppet rigs.
  • Best for: when you have a Quest and want quick reasonable-quality demos.
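
A common way to map a controller to the end-effector is relative ("clutched") control: motion is only forwarded while a button is held, measured from where the controller was when the button went down. The sketch below handles position only and assumes the controller and robot frames are already aligned; orientation mapping and the class name are left out or invented for illustration.

```python
class ClutchedPositionMapper:
    """Forward controller displacement to the end-effector while a clutch is held."""

    def __init__(self, scale=1.0):
        self.scale = scale
        self._ctrl_anchor = None
        self._ee_anchor = None

    def update(self, clutch_pressed, ctrl_pos, ee_pos):
        """Return the target end-effector position, or None while the clutch is released."""
        if not clutch_pressed:
            self._ctrl_anchor = None
            return None
        if self._ctrl_anchor is None:
            # First frame of a clutch press: remember both positions.
            self._ctrl_anchor = tuple(ctrl_pos)
            self._ee_anchor = tuple(ee_pos)
        return tuple(
            e + self.scale * (c - a)
            for e, c, a in zip(self._ee_anchor, ctrl_pos, self._ctrl_anchor)
        )

mapper = ClutchedPositionMapper(scale=0.5)
mapper.update(True, (0.0, 0.0, 0.0), (0.4, 0.0, 0.3))          # clutch engaged
target = mapper.update(True, (0.2, 0.0, 0.0), (0.4, 0.0, 0.3))  # hand moved 20 cm
print(target)
```

Releasing the clutch lets the operator reposition their hand without moving the robot, which reduces fatigue on long reaches.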

3. Leader-follower puppet (GELLO and friends)

A small kinematic copy of the robot arm with rotary encoders at every joint. The operator manipulates the puppet; the encoder values are sent in real time to the actual robot, which mirrors the motion.

  • Strengths: extremely high fidelity. Joint-space teleop bypasses IK uncertainty. Operator can collect 50+ demos/hour. Cheap to build (roughly $200–250 in parts for a 6-DOF puppet).
  • Weaknesses: arm-specific (you need a puppet matched to your robot's geometry). No haptic feedback (operator can't feel the robot's contacts).
  • Best for: most production-grade demo collection. The default for serious imitation learning in 2026.
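
The core of joint-space mirroring is converting raw encoder counts into joint angles for the robot. A minimal sketch, assuming 12-bit encoders (4096 counts per revolution, as on Dynamixel XL330-class servos) and per-joint zero offsets and direction signs obtained from your own calibration:

```python
import math

TICKS_PER_REV = 4096  # 12-bit encoder resolution (assumed)

def ticks_to_joint_angles(ticks, zero_ticks, signs):
    """Convert raw puppet encoder ticks to robot joint targets in radians.

    zero_ticks: encoder reading of each joint at the robot's zero pose.
    signs:      +1 or -1 per joint, for axes mounted in the opposite direction.
    """
    angles = []
    for t, z, s in zip(ticks, zero_ticks, signs):
        delta = (t - z) % TICKS_PER_REV
        if delta >= TICKS_PER_REV // 2:
            delta -= TICKS_PER_REV  # wrap into [-half turn, +half turn)
        angles.append(s * delta * 2.0 * math.pi / TICKS_PER_REV)
    return angles

# A quarter turn on joint 1, and a wrap-around on an inverted joint 2.
print(ticks_to_joint_angles([3072, 100], [2048, 4000], [1, -1]))
```

The zero offsets are the values you would refresh when re-zeroing the puppet at the start of a session.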

4. ALOHA-style bimanual puppet

Same idea as GELLO but with two puppet arms for two robot arms. Specifically designed for bimanual tasks. The original ALOHA hardware is open-source ($20k); cheaper clones exist.

  • Strengths: the most established way to teleoperate two arms simultaneously in joint space. The hardware behind many bimanual imitation-learning and VLA papers.
  • Weaknesses: expensive. Bench setup needed. Fatigue worse than single-arm.
  • Best for: bimanual manipulation research and any 2-arm robot deployments.

5. Master-slave with haptic feedback

Like GELLO but with motors in the puppet that resist motion based on the robot's contact forces. The operator feels when the robot touches things.

  • Strengths: highest-fidelity teleop available. Critical for delicate or contact-rich tasks (surgery, precise assembly).
  • Weaknesses: expensive. Geomagic Touch ($3k+) is the consumer-grade option; commercial systems run to $50k+.
  • Best for: surgical and assembly research where contact awareness is the bottleneck.
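
The simplest form of force feedback scales the contact torques estimated on the robot and replays them on the puppet's motors. The sketch below assumes the robot reports external joint torques (as applied by the environment) in N·m; the gain and the torque limit are placeholder values, and real systems need additional work to stay stable under latency.

```python
def reflect_torques(robot_ext_torques, gain=0.3, max_torque=0.5):
    """Scale the robot's estimated external joint torques into puppet motor torques.

    gain:       fraction of the contact torque the operator should feel (assumed).
    max_torque: safety clamp on what the puppet motors may apply (assumed, N·m).
    """
    return [
        max(-max_torque, min(max_torque, gain * tau))
        for tau in robot_ext_torques
    ]

# Light contact on joint 1, a hard collision on joint 2, free motion on joint 3.
print(reflect_torques([1.0, -4.0, 0.0]))
```

The clamp matters more than the gain: it bounds what the operator's hand can be subjected to if the torque estimate spikes.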

Comparison table

Rig                   Cost       Fidelity   Demos/hour
Phone                 $0         low        10
VR                    $300       ★★★        20
GELLO                 $200       ★★★★       50
ALOHA bimanual        $2k–20k    ★★★★       30
Haptic master-slave   $3k–50k    ★★★★★      40
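
The demos-per-hour column translates directly into operator time. As a quick worked example, the snippet below uses the table's throughput figures to estimate the hours needed for a 100-demo dataset; it ignores setup, resets, and breaks, so treat the results as lower bounds.

```python
DEMOS_PER_HOUR = {
    "Phone": 10,
    "VR": 20,
    "GELLO": 50,
    "ALOHA bimanual": 30,
    "Haptic master-slave": 40,
}

def hours_needed(target_demos, demos_per_hour):
    """Operator hours to reach a demo count at a given throughput."""
    return target_demos / demos_per_hour

for rig, rate in DEMOS_PER_HOUR.items():
    print(f"{rig:<20} {hours_needed(100, rate):.1f} h")
```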

Building a GELLO clone in a weekend

Materials:

  • Six small servos with magnetic encoders (Dynamixel XL330 or similar). $30 × 6 = $180.
  • 3D-printed arm matching your robot's joint geometry. ~$10 in PLA.
  • Microcontroller (Teensy 4.1 recommended for high-rate encoder reading). $30.
  • USB cable, mounting bracket, basic wiring. $30.

Total: ~$250 for a working puppet.

Software:

  • Read encoder values at 100+ Hz over USB.
  • Apply gravity compensation in the puppet (so the operator doesn't fight gravity).
  • Stream joint targets to the real robot's controller.
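
The read-and-stream part of that list can be sketched as a fixed-rate loop. Here `read_puppet` and `send_targets` are hypothetical callables standing in for your encoder driver and robot controller interface, and gravity compensation is left out because it depends on the puppet's mass model.

```python
import time

def run_teleop_loop(read_puppet, send_targets, rate_hz=100, steps=None,
                    clock=time.monotonic, sleep=time.sleep):
    """Poll the puppet and forward joint targets at a fixed rate.

    Returns the number of iterations that overran their time budget.
    steps=None runs until interrupted.
    """
    period = 1.0 / rate_hz
    overruns = 0
    done = 0
    next_tick = clock()
    while steps is None or done < steps:
        send_targets(read_puppet())
        done += 1
        next_tick += period
        remaining = next_tick - clock()
        if remaining > 0:
            sleep(remaining)
        else:
            overruns += 1
            next_tick = clock()  # resynchronize instead of bursting to catch up
    return overruns

# Dry run with stand-in callables: three iterations at 100 Hz.
sent = []
run_teleop_loop(lambda: [0.0] * 6, sent.append, rate_hz=100, steps=3)
print(len(sent), "targets sent")
```

Scheduling against a running deadline rather than sleeping a fixed period keeps the average rate steady even when individual reads take a variable amount of time.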

The original GELLO repo (UC Berkeley, 2024) provides STL files and firmware. Fork it and adapt the parts to your specific arm.

The data-collection workflow

Once the rig works:

  1. Set up scene cameras (one looking at the workspace, one wrist-mounted).
  2. Define the task. Tag every demo with a natural-language label.
  3. Vary the conditions: lighting, object positions, distractors.
  4. Record demos in LeRobotDataset format.
  5. Push to Hugging Face for portability.

Aim for: 100 demos per task, 5–10 conditions of variation per task.
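
Whatever storage format you end up with, each demo needs the task label, the conditions it was recorded under, and the time-stamped frames. The sketch below shows one possible in-memory layout serialized as JSON; the field names are illustrative and this is not the LeRobotDataset schema.

```python
import json

def make_episode(task, conditions, frames):
    """Bundle one demo. frames: iterable of (timestamp_s, joint_state, action)."""
    return {
        "task": task,
        "conditions": conditions,
        "frames": [
            {"t": t, "state": list(q), "action": list(a)} for t, q, a in frames
        ],
    }

episode = make_episode(
    task="pick up the block and place it in the target zone",
    conditions={"lighting": "dim", "distractors": 2},
    frames=[(0.00, [0.0] * 6, [0.01] * 6), (0.02, [0.01] * 6, [0.02] * 6)],
)
line = json.dumps(episode)  # one JSON object per line is easy to append to a log
print(len(line), "bytes")
```

Recording the conditions alongside each episode makes it possible to verify later that the planned variation was actually covered.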

Common gotchas

  • Joint-space mismatch: GELLO's puppet must have the same joint axes as the robot. Even a 5° offset in joint orientation causes consistent misalignment in demos.
  • Latency: anything over 50 ms feels rubbery. Optimize every hop in the puppet → MCU → USB → host → robot pipeline.
  • Encoder drift: re-zero the puppet at the start of each session.
  • Gripper synchronization: the gripper trigger should be on the puppet, not on a separate controller. Operator's hand should never leave the puppet.
  • Camera lag: if your scene camera lags behind the action, you're producing misaligned (image, action) pairs. Synchronize timestamps carefully.
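
For the camera-lag problem, one simple approach is to pair each camera frame with the nearest action sample by timestamp and discard frames whose nearest action is too far away. The sketch assumes both streams are stamped on the same clock; the 20 ms skew limit is an arbitrary example value.

```python
from bisect import bisect_left

def align_frames_to_actions(frame_ts, action_ts, max_skew_s=0.02):
    """For each camera frame, return the index of the nearest action sample.

    Returns None for a frame whose nearest action is more than max_skew_s away.
    Both lists must be sorted and expressed in seconds on the same clock.
    """
    pairs = []
    for ft in frame_ts:
        i = bisect_left(action_ts, ft)
        # The nearest sample is either just before or just after the frame.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(action_ts)]
        best = min(candidates, key=lambda j: abs(action_ts[j] - ft), default=None)
        if best is not None and abs(action_ts[best] - ft) > max_skew_s:
            best = None
        pairs.append(best)
    return pairs

actions = [0.00, 0.01, 0.02, 0.03]
frames = [0.004, 0.021, 0.2]
print(align_frames_to_actions(frames, actions))
```

If the camera has a known constant delay, subtracting it from the frame timestamps before alignment is usually enough; a varying delay calls for hardware timestamps.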

Beyond GELLO

  • UMI / Universal Manipulation Interface (Stanford, 2024): hand-held grippers used by the demonstrator without a robot — record from human hands, train policies to mimic. Faster collection, transfers well.
  • OpenTeach: VR teleop driven by the headset's hand tracking, usable for bimanual setups.
  • Dataset-from-video: use videos of humans doing tasks; learn to imitate. Less precise but enormous data potential.

The 2026 best practice

For most projects: build a GELLO clone, collect 100–500 demos, fine-tune a VLA. The combination has demonstrated end-to-end success on dozens of contact-rich tasks at hobby budget. Five years ago this would have been a research project; today it's a weekend.

Exercise

Build (or buy) a GELLO clone for any 6-DOF arm you have. Collect 50 demos of a single task: pick up a block, place it in a target zone. Fine-tune ACT or OpenVLA on the demos. Deploy. Measure success rate. The first time the policy works end-to-end on hardware, the path from data to deployment becomes real.
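
When measuring the success rate, a handful of trials gives a noisy estimate, so it helps to report an interval alongside the raw percentage. The sketch below uses the Wilson score interval, which is a standard choice for small binomial samples and an addition of this example rather than something the exercise prescribes.

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """Approximate 95% confidence interval for a success rate (Wilson score)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1.0 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return centre - half, centre + half

# Example: the policy succeeds in 14 of 20 rollouts.
low, high = wilson_interval(14, 20)
print(f"success rate 70%, 95% interval roughly {low:.0%} to {high:.0%}")
```

With 20 rollouts the interval is wide, so differences of a few percentage points between two policies are not meaningful at that sample size.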

Next

Fine-tuning a VLA — the conceptual side of what you do with those demos.
