Track 10
Learning for Robotics
Imitation, reinforcement, and the new wave of foundation models. The field that's redefining what robots can do.
12 published · 0 planned · 12 lessons total
- 01→
What are VLAs (Vision-Language-Action models) and why they matter
Published · π0, RT-2, OpenVLA, Gemini Robotics. A new kind of model that takes a camera image and a natural-language instruction and outputs robot motor commands. Here's what that unlocks — and what it doesn't. (Sketch of the I/O contract below.)
~16 min
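A minimal sketch of the contract this lesson describes: image plus instruction in, a short chunk of actions out. The class, shapes, and `predict` method here are illustrative stand-ins, not any specific model's API.

```python
import numpy as np

class DummyVLA:
    """Stand-in for a real VLA checkpoint; the I/O shape is the point."""

    def __init__(self, action_dim: int = 7, chunk: int = 16):
        self.action_dim = action_dim  # e.g. 6-DoF arm + gripper
        self.chunk = chunk            # actions predicted per call

    def predict(self, image: np.ndarray, instruction: str) -> np.ndarray:
        # A real model tokenizes both inputs and decodes actions;
        # zeros here just make the sketch runnable.
        assert image.ndim == 3        # (H, W, 3) RGB camera frame
        return np.zeros((self.chunk, self.action_dim))

policy = DummyVLA()
actions = policy.predict(np.zeros((224, 224, 3), dtype=np.uint8),
                         "pick up the red block")
print(actions.shape)                  # (16, 7): a short trajectory
```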
- 02→
Imitation learning 101: behavior cloning and DAgger
Published · The simplest way to train a robot policy: show it what to do, then make it copy. Here's how behavior cloning works, why it fails on its own, and the fix (DAgger) that made modern VLAs possible. (Sketch of the DAgger loop below.)
~15 min
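The core loop, as a hedged sketch: `fit` is plain supervised behavior cloning (regress states to actions), and the `env`/`expert` interfaces are simplified stand-ins, with `env.step` returning `(state, done)`.

```python
def dagger(expert, fit, env, demo_states, demo_actions,
           rounds=5, horizon=200):
    """DAgger: the learner drives, the expert labels what it sees."""
    states, actions = list(demo_states), list(demo_actions)
    policy = fit(states, actions)          # round 0 is pure BC
    for _ in range(rounds):
        s = env.reset()
        for _ in range(horizon):
            states.append(s)               # states the *learner* reaches,
            actions.append(expert(s))      # labeled by the expert, i.e.
            s, done = env.step(policy(s))  # exactly the states BC alone
            if done:                       # never gets to see
                break
        policy = fit(states, actions)      # retrain on the aggregate
    return policy
```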
- 03→
Diffusion policy explained
Published · The 2023 breakthrough that brought generative modeling to robot control. Why predicting actions via denoising beats predicting them directly — and why multimodal action distributions matter. (Sketch of the sampling loop below.)
~14 min
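A generic DDPM-style sampler over an action chunk, assuming a trained noise predictor `eps_model`. The actual Diffusion Policy work uses a learned conditional U-Net or transformer and a tuned noise schedule, so treat this as the shape of the idea, not the paper's implementation.

```python
import numpy as np

def sample_actions(eps_model, obs, steps=50, chunk=16, dim=7):
    """Reverse diffusion: start from noise, denoise into actions."""
    betas = np.linspace(1e-4, 0.02, steps)
    alphas = 1.0 - betas
    abar = np.cumprod(alphas)
    x = np.random.randn(chunk, dim)        # begin with pure noise
    for t in reversed(range(steps)):
        eps = eps_model(x, t, obs)         # predicted noise, conditioned on obs
        x = (x - betas[t] / np.sqrt(1 - abar[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:                          # re-inject noise on every step
            x += np.sqrt(betas[t]) * np.random.randn(chunk, dim)
    return x                               # except the last

# Stub denoiser so the sketch runs end to end:
acts = sample_actions(lambda x, t, obs: np.zeros_like(x), obs=None)
print(acts.shape)  # (16, 7) action chunk
```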
- 04→
ACT and action chunking
Published · The architecture that made bimanual teleop policies actually work. Predict 16 actions at once, execute open-loop, repeat (sketch below). Behind ALOHA, Mobile ALOHA, and many of the 2024+ VLAs.
~12 min
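The execution pattern in a few lines, with hypothetical `policy`/`env` interfaces. ACT itself adds temporal ensembling over overlapping chunks; this sketch shows only the plain chunk-and-replan loop.

```python
def run_episode(policy, env, max_steps=400):
    """One policy call per chunk of low-level steps, not per step."""
    obs = env.reset()
    steps = 0
    while steps < max_steps:
        chunk = policy(obs)              # shape (16, action_dim)
        for a in chunk:                  # execute the chunk open-loop,
            obs, done = env.step(a)      # no replanning mid-chunk
            steps += 1
            if done or steps >= max_steps:
                return obs
    return obs                           # loop: query the policy again
```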
- 05→
Reinforcement learning primer for roboticists
Published · MDPs, value functions, policy gradients — the RL minimum for a robotics audience. Skip the chess-playing examples; learn the math you'll use to train a quadruped. (The policy gradient is written out below.)
~16 min
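For reference, the identity the lesson builds toward: the REINFORCE-style policy gradient with a baseline $b$ (often the value function $V$), written here in standard form.

```latex
\nabla_\theta J(\theta)
  = \mathbb{E}_{\tau \sim \pi_\theta}\!\left[
      \sum_{t=0}^{T} \nabla_\theta \log \pi_\theta(a_t \mid s_t)
      \left( \sum_{t'=t}^{T} \gamma^{\,t'-t}\, r_{t'} - b(s_t) \right)
    \right]
```

Subtracting the baseline leaves the expectation unchanged but reduces the variance of the estimate, which is the seed of the actor-critic methods in the next two lessons.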
- 06→
PPO in practice: the hyperparameters that actually matter
Published · The algorithm that dominates robotics RL. Honest about what's art and what's science. The 8 hyperparameters that determine whether your training succeeds, fails, or wastes a week. (A typical starting config below.)
~14 min
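The lesson's own list of eight may differ; as a hedged reference point, these are the knobs most implementations expose, with common starting values in the style of popular libraries such as Stable-Baselines3.

```python
# Commonly tuned PPO knobs and typical defaults (assumptions, not
# the lesson's list; locomotion setups often deviate substantially).
ppo_config = {
    "learning_rate": 3e-4,   # often annealed toward 0 over training
    "rollout_steps": 2048,   # env steps collected per update
    "minibatch_size": 64,
    "update_epochs": 10,     # passes over each rollout
    "gamma": 0.99,           # discount factor
    "gae_lambda": 0.95,      # bias/variance tradeoff in the advantage
    "clip_range": 0.2,       # the PPO ratio clip itself
    "entropy_coef": 0.0,     # exploration bonus; often >0 for locomotion
}
```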
- 07→
SAC and off-policy methods
Published · When to reach beyond PPO. SAC's maximum-entropy framework, replay buffers, and the algorithms that win on real-world hardware where every minute of robot time matters. (Sketch of a replay buffer below.)
~13 min
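The data structure that makes off-policy reuse possible, as a minimal numpy sketch: a ring buffer that SAC samples from many times per collected transition.

```python
import numpy as np

class ReplayBuffer:
    """Ring-buffer transition storage for off-policy learners."""

    def __init__(self, capacity, obs_dim, act_dim):
        self.obs = np.zeros((capacity, obs_dim), dtype=np.float32)
        self.act = np.zeros((capacity, act_dim), dtype=np.float32)
        self.rew = np.zeros(capacity, dtype=np.float32)
        self.next_obs = np.zeros((capacity, obs_dim), dtype=np.float32)
        self.done = np.zeros(capacity, dtype=np.float32)
        self.capacity, self.ptr, self.size = capacity, 0, 0

    def add(self, o, a, r, o2, d):
        i = self.ptr                       # overwrite oldest when full
        self.obs[i], self.act[i], self.rew[i] = o, a, r
        self.next_obs[i], self.done[i] = o2, d
        self.ptr = (self.ptr + 1) % self.capacity
        self.size = min(self.size + 1, self.capacity)

    def sample(self, batch_size):
        idx = np.random.randint(0, self.size, size=batch_size)
        return (self.obs[idx], self.act[idx], self.rew[idx],
                self.next_obs[idx], self.done[idx])
```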
- 08→
Sim-to-real: domain randomization playbook
Published · A policy trained in sim that works on real hardware was science fiction in 2018. In 2026 it's a weekend project — if you get the randomization right. Here's the playbook. (Sketch below.)
~16 min
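The shape of a randomization hook, with made-up simulator attribute names and purely illustrative ranges; real ranges come from system identification on your hardware.

```python
import numpy as np

def randomize(sim, rng: np.random.Generator):
    """Resample physics and sensing each episode so the policy
    can't overfit a single simulator instance."""
    sim.friction   = rng.uniform(0.5, 1.25)
    sim.mass_scale = rng.uniform(0.8, 1.2)   # per-link mass multiplier
    sim.motor_gain = rng.uniform(0.9, 1.1)   # actuator strength
    sim.latency_s  = rng.uniform(0.0, 0.04)  # control delay, seconds
    sim.obs_noise  = rng.uniform(0.0, 0.02)  # sensor noise std
```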
- 09→
Real-world RL: HIL-SERL and friends
Published · Skip the simulator entirely. Train on hardware, with humans in the loop, demonstrations seeding the buffer, safety wrappers everywhere. The methods letting hobbyists train policies on actual robots in 2026. (Sketch of the loop below.)
~13 min
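A hedged sketch of the pattern HIL-SERL popularized: the policy proposes, the human can take over, and every transition lands in one buffer (the same shape as the SAC sketch above) that was pre-seeded with demonstrations. All interfaces here are hypothetical.

```python
def collect_step(env, policy, buffer, obs, human):
    """One hardware step with a human in the loop."""
    a = policy(obs)
    if human.intervening():            # deadman switch / teleop takeover
        a = human.action()             # overrides are stored like demos
    next_obs, r, done = env.step(a)    # env assumed safety-wrapped:
    buffer.add(obs, a, r, next_obs, done)  # force/velocity/workspace limits
    return next_obs, done

# Before training, seed the same buffer with teleop demonstrations so
# the first gradient steps aren't driven by random flailing on hardware:
# for (o, a, r, o2, d) in demo_transitions:
#     buffer.add(o, a, r, o2, d)
```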
- 10→
Collecting demonstrations: teleop rigs that work
Published · ALOHA, GELLO, phone teleop, VR, leader-follower puppets. What each gives you, what each costs, and how to build one this weekend. The hardware behind every working VLA in 2026. (Leader-follower sketch below.)
~12 min
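The leader-follower idea fits in a few lines; `leader`, `follower`, and `logger` stand in for whatever drivers your rig exposes (GELLO- or ALOHA-style arms), so every method name here is an assumption.

```python
import time

def teleop_loop(leader, follower, logger, hz=50):
    """Mirror a passive leader arm onto the follower at a fixed rate."""
    dt = 1.0 / hz
    while True:
        q = leader.read_joint_positions()  # the human moves this arm
        follower.set_joint_targets(q)      # the robot copies it
        logger.record(q)                   # these logs become your demos
        time.sleep(dt)
```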
- 11→
Fine-tuning a VLA on your own data
Published · The conceptual side of VLA adaptation. LoRA vs. full fine-tune, action representations, loss functions, and the tradeoffs that determine whether a 200-demo run produces a working policy. (LoRA sketch below.)
~14 min
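The LoRA side of the tradeoff, as a standard PyTorch sketch: freeze the base weight, learn a low-rank residual on top of it. Rank and alpha values are illustrative.

```python
import torch.nn as nn

class LoRALinear(nn.Module):
    """y = Wx + (alpha/r) * B(Ax), with W frozen and only A, B trained."""

    def __init__(self, base: nn.Linear, r: int = 16, alpha: int = 32):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False         # freeze the pretrained weights
        self.A = nn.Linear(base.in_features, r, bias=False)
        self.B = nn.Linear(r, base.out_features, bias=False)
        nn.init.zeros_(self.B.weight)       # adapter starts as a no-op
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * self.B(self.A(x))
```

Swapping this in for the attention and MLP projections of a VLA backbone cuts trainable parameters by orders of magnitude, which is what makes a 200-demo run feasible on one GPU.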
- 12→
Open X-Embodiment and the dataset landscape
Published · What datasets exist, what they contain, and how to actually use them. The base data behind every modern VLA, the benchmarks worth trusting, and how to contribute your own demos. (Loading sketch below.)
~12 min
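A loading sketch assuming the RLDS-on-GCS pattern Open X-Embodiment publishes; the dataset path and version string are placeholders to check against the project's docs before relying on them.

```python
import tensorflow_datasets as tfds

# Assumption: an OXE sub-dataset hosted at a GCS path like this one.
builder = tfds.builder_from_directory(
    builder_dir="gs://gresearch/robotics/fractal20220817_data/0.1.0")
ds = builder.as_dataset(split="train")

for episode in ds.take(1):
    for step in episode["steps"]:      # RLDS layout: episodes of steps
        obs = step["observation"]      # images, proprio, language, ...
        act = step["action"]
        break
```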