Collecting demonstrations: teleop rigs that work
ALOHA, GELLO, phone-teleop, VR, leader-follower puppets. What each gives you, what each costs, and how to build one this weekend. The hardware behind every working VLA in 2026.
No teleoperation rig, no demonstrations. No demonstrations, no imitation learning. No imitation learning, no VLA fine-tunes — and no real-world RL bootstrap. Teleop is the unsexy hardware on which all of modern robot learning rests. The good news: in 2026 you can build one in a weekend for under $500.
What "good teleop" means for ML
Three properties matter for ML usefulness:
- Fidelity: the operator's intent should reach the robot accurately. Sloppy teleop produces sloppy demos.
- Frequency: demonstrations should be recorded at the policy's control rate (typically 30–60 Hz). A lower recording rate means less training data per minute of operator time.
- Reproducibility: the same operator should produce similar demos session-to-session. High variance in demonstrations limits learning.
Add to this: low operator fatigue (so you can collect 200+ demos in a session) and easy synchronization with cameras.
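To make the frequency point concrete, here's a minimal sketch (plain NumPy; the function name and test data are hypothetical, not from any particular stack) of resampling a ragged demo stream onto a fixed 30 Hz control grid with a zero-order hold:

```python
import numpy as np

def resample_to_rate(timestamps, values, target_hz=30.0):
    """Resample an irregularly-timed demo stream onto a fixed control-rate
    grid by nearest-previous-sample lookup (zero-order hold)."""
    timestamps = np.asarray(timestamps)
    values = np.asarray(values)
    t_grid = np.arange(timestamps[0], timestamps[-1], 1.0 / target_hz)
    # For each grid tick, pick the most recent recorded sample.
    idx = np.searchsorted(timestamps, t_grid, side="right") - 1
    return t_grid, values[idx]

# 10 seconds of data at ~55 Hz becomes a clean 30 Hz stream.
t = np.linspace(0.0, 10.0, 550)
q = np.sin(t)[:, None]  # fake 1-DOF joint trajectory
t30, q30 = resample_to_rate(t, q, target_hz=30.0)
```

A zero-order hold is usually preferable to interpolation here: it only ever emits joint values the puppet actually produced.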
The five rig types
1. Phone teleop
The phone is the input. Drag on the touchscreen → end-effector moves. Pinch → gripper. Tap to record/stop episode.
- Strengths: zero hardware. Works for proof-of-concept on any arm.
- Weaknesses: low fidelity. 2D screen → 3D motion. No haptic feedback. 10–20 demos/hour at best.
- Best for: quick demos, hackathons, tasks that don't need precision.
2. VR controllers
Quest, Vive, Index. The headset gives you scene visualization; controllers map to end-effector pose. Trigger = gripper.
- Strengths: 6-DOF input, high-frequency tracking, scene visualization. Free if you have a Quest.
- Weaknesses: setup overhead. Calibration. Operator fatigue from VR weight. Hand-tracking less precise than puppet rigs.
- Best for: when you have a Quest and want quick reasonable-quality demos.
3. Leader-follower puppet (GELLO and friends)
A small kinematic copy of the robot arm with rotary encoders at every joint. The operator manipulates the puppet; the encoder values are sent in real time to the actual robot, which mirrors the motion.
- Strengths: extremely high fidelity. Joint-space teleop bypasses IK uncertainty. Operator can collect 50+ demos/hour. Cheap to build (~$200 for a 6-DOF puppet).
- Weaknesses: arm-specific (you need a puppet matched to your robot's geometry). No haptic feedback (operator can't feel the robot's contacts).
- Best for: most production-grade demo collection. The default for serious imitation learning in 2026.
4. ALOHA-style bimanual puppet
Same idea as GELLO but with two puppet arms for two robot arms. Specifically designed for bimanual tasks. The original ALOHA hardware is open-source ($20k); cheaper clones exist.
- Strengths: only practical way to teleoperate two arms simultaneously. The hardware behind every bimanual VLA paper.
- Weaknesses: expensive. Bench setup needed. Fatigue worse than single-arm.
- Best for: bimanual manipulation research and any 2-arm robot deployments.
5. Master-slave with haptic feedback
Like GELLO but with motors in the puppet that resist motion based on the robot's contact forces. The operator feels when the robot touches things.
- Strengths: highest-fidelity teleop available. Critical for delicate or contact-rich tasks (surgery, precise assembly).
- Weaknesses: expensive. Geomagic Touch ($3k+) is the consumer-grade option; commercial systems run to $50k+.
- Best for: surgical and assembly research where contact awareness is the bottleneck.
Comparison table
| Rig | Cost | Fidelity | Demos/hour |
|---|---|---|---|
| Phone | $0 | ★ | 10 |
| VR | $300 | ★★★ | 20 |
| GELLO | $200 | ★★★★ | 50 |
| ALOHA bimanual | $2k–20k | ★★★★ | 30 |
| Haptic master-slave | $3k–50k | ★★★★★ | 40 |
Building a GELLO clone in a weekend
Materials:
- Six small servos with magnetic encoders (Dynamixel XL330 or similar). $30 × 6 = $180.
- 3D-printed arm matching your robot's joint geometry. ~$10 in PLA.
- Microcontroller (Teensy 4.1 recommended for high-rate encoder reading). $30.
- USB cable, mounting bracket, basic wiring. $30.
Total: ~$250 for a working puppet.
Software:
- Read encoder values at 100+ Hz over USB.
- Apply gravity compensation in the puppet (so the operator doesn't fight gravity).
- Stream joint targets to the real robot's controller.
The original GELLO repo (UC Berkeley, 2024) provides STL files and firmware. Fork; modify for your specific arm.
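The three software steps above boil down to one fixed-rate loop. A hedged sketch: `read_encoders` and `send_joint_targets` are stand-ins for whatever your encoder driver (e.g. the Dynamixel SDK) and robot controller actually expose, and gravity compensation is omitted for brevity:

```python
import time

PUPPET_OFFSETS = [0.0] * 6  # per-joint zeros, set when you re-zero each session

def read_encoders():
    """Stand-in for your encoder driver (e.g. Dynamixel SDK reads over USB).
    Returns six joint angles in radians."""
    return [0.1, -0.4, 0.8, 0.0, 1.2, 0.0]

def send_joint_targets(q):
    """Stand-in for your robot controller's joint-position streaming call."""
    pass

def teleop_step():
    raw = read_encoders()
    # Subtract the session zero so puppet and robot share a reference pose.
    q = [r - o for r, o in zip(raw, PUPPET_OFFSETS)]
    send_joint_targets(q)
    return q

def teleop_loop(rate_hz=100.0, steps=10_000):
    period = 1.0 / rate_hz
    for _ in range(steps):
        start = time.monotonic()
        teleop_step()
        # Sleep off the remainder of the period to hold a steady rate.
        time.sleep(max(0.0, period - (time.monotonic() - start)))
```

Keeping the loop rate steady matters as much as making it fast: a jittery 100 Hz feels worse to the operator than a solid 60 Hz.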
The data-collection workflow
Once the rig works:
- Set up scene cameras (one looking at the workspace, one wrist-mounted).
- Define the task. Tag every demo with a natural-language label.
- Vary the conditions: lighting, object positions, distractors.
- Record demos in LeRobotDataset format.
- Push to Hugging Face for portability.
Aim for: 100 demos per task, 5–10 conditions of variation per task.
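As a sketch of the recording side: the LeRobotDataset API changes between releases, so this logs episodes to a plain in-memory structure you would convert afterwards. All names here are hypothetical:

```python
import time
import numpy as np

class EpisodeRecorder:
    """Minimal episode logger: one (image, state, action, t) tuple per tick,
    plus a natural-language task label per episode. Convert the result to
    your dataset format (e.g. LeRobotDataset) after the session."""

    def __init__(self):
        self.episodes = []
        self._frames = None
        self._label = None

    def start(self, label):
        self._frames, self._label = [], label

    def add_frame(self, image, state, action):
        self._frames.append({
            "t": time.monotonic(),
            "image": image, "state": state, "action": action,
        })

    def stop(self):
        self.episodes.append({"task": self._label, "frames": self._frames})
        self._frames = None

rec = EpisodeRecorder()
rec.start("pick up the red block and place it in the bin")
for _ in range(3):
    rec.add_frame(np.zeros((96, 96, 3), np.uint8), np.zeros(6), np.zeros(7))
rec.stop()
```

Tagging the label at `start` time, not afterwards, is what makes the per-demo language annotation painless at 50 demos/hour.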
Common gotchas
- Joint-space mismatch: GELLO's puppet must have the same joint axes as the robot. Even a 5° offset in joint orientation causes consistent misalignment in demos.
- Latency: anything over 50 ms feels rubbery. Optimize the USB → MCU → robot pipeline.
- Encoder drift: re-zero the puppet at the start of each session.
- Gripper synchronization: the gripper trigger should be on the puppet, not on a separate controller. The operator's hand should never have to leave the puppet.
- Camera lag: if your scene camera lags behind the action, you're producing misaligned (image, action) pairs. Synchronize timestamps carefully.
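For the camera-lag gotcha, nearest-timestamp pairing with a skew cutoff is a simple post-hoc check. A sketch in NumPy (the function name and the 20 ms threshold are assumptions, not from any particular library):

```python
import numpy as np

def pair_frames_to_actions(frame_ts, action_ts, max_skew=0.02):
    """For each camera frame, find the action with the closest timestamp;
    drop pairs whose skew exceeds max_skew seconds. Misaligned
    (image, action) pairs are worse than fewer pairs."""
    frame_ts = np.asarray(frame_ts)
    action_ts = np.asarray(action_ts)
    idx = np.searchsorted(action_ts, frame_ts)
    idx = np.clip(idx, 1, len(action_ts) - 1)
    # Choose the nearer of the two neighbouring actions.
    left_closer = (frame_ts - action_ts[idx - 1]) < (action_ts[idx] - frame_ts)
    idx = idx - left_closer.astype(int)
    skew = np.abs(action_ts[idx] - frame_ts)
    keep = skew <= max_skew
    return idx[keep], np.flatnonzero(keep)

# 100 Hz action log vs. ~30 Hz camera frames: every frame finds a partner.
action_ts = np.arange(0.0, 1.0, 0.01)
frame_ts = np.arange(0.005, 1.0, 1.0 / 30.0)
a_idx, f_idx = pair_frames_to_actions(frame_ts, action_ts)
```

If many frames fail the skew cutoff, that's a symptom of the latency problem above, not something to paper over in software.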
Beyond GELLO
- UMI / Universal Manipulation Interface (Stanford, 2024): hand-held grippers used by the demonstrator without a robot — record from human hands, train policies to mimic. Faster collection, transfers well.
- OpenTeach: bimanual VR teleop with haptic gloves.
- Dataset-from-video: use videos of humans doing tasks; learn to imitate. Less precise but enormous data potential.
The 2026 best practice
For most projects: build a GELLO clone, collect 100–500 demos, fine-tune a VLA. The combination has demonstrated end-to-end success on dozens of contact-rich tasks at hobby budget. Five years ago this would have been a research project; today it's a weekend.
Exercise
Build (or buy) a GELLO clone for any 6-DOF arm you have. Collect 50 demos of a single task: pick up a block, place it in a target zone. Fine-tune ACT or OpenVLA on the demos. Deploy. Measure success rate. The first time the policy works end-to-end on hardware, the path from data to deployment becomes real.
Next
Fine-tuning a VLA — the conceptual side of what you do with those demos.