Track 05
Perception & Computer Vision
How the robot sees. Cameras, lidar, classical CV, and the deep models that power modern robot perception.
10 published · 0 planned · 10 lessons total
- 01 → Camera models, calibration, and distortion
Published · The pinhole model, intrinsics, extrinsics, and distortion — and the 30-minute OpenCV workflow that turns a raw USB camera into a calibrated robotics sensor.
~15 min
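The pinhole model the lesson opens with fits in a few lines: a 3×3 intrinsic matrix maps camera-frame points to pixels via a perspective divide. A minimal sketch with illustrative intrinsic values (the focal length and principal point below are made up, not from the lesson):

```python
import numpy as np

def project(points_3d, K):
    """Project Nx3 camera-frame points to pixel coordinates with intrinsics K."""
    pts = np.asarray(points_3d, dtype=float)
    uvw = (K @ pts.T).T             # apply the intrinsic matrix
    return uvw[:, :2] / uvw[:, 2:]  # perspective divide by depth

# Illustrative intrinsics: fx = fy = 500 px, principal point at (320, 240)
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])

# A point 2 m in front of the camera, 0.1 m to the right
uv = project([[0.1, 0.0, 2.0]], K)
print(uv)  # [[345. 240.]]
```

Calibration is the inverse problem: OpenCV's `cv2.calibrateCamera` recovers `K` (plus distortion coefficients) from chessboard views, and the result plugs straight into a projection like this.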
- 02 → Classical CV: features, descriptors, matching
Published · Before deep learning, computer vision rested on three primitives — corner detectors, descriptors, and matchers. They still power half of modern SLAM and visual odometry. Here's why and how.
~14 min
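The third primitive, matching, reduces to nearest-neighbour search over descriptors plus Lowe's ratio test to reject ambiguous matches. A toy sketch with 8-bit binary "descriptors" standing in for real ORB/BRIEF outputs (the descriptors and ratio below are illustrative):

```python
def hamming(a, b):
    """Hamming distance between two equal-length bit strings."""
    return sum(x != y for x, y in zip(a, b))

def match(desc_a, desc_b, ratio=0.8):
    """Brute-force matcher with Lowe's ratio test: keep a match only when
    the best candidate is clearly better than the second best."""
    matches = []
    for i, d in enumerate(desc_a):
        order = sorted(range(len(desc_b)), key=lambda j: hamming(d, desc_b[j]))
        best, second = order[0], order[1]
        if hamming(d, desc_b[best]) < ratio * hamming(d, desc_b[second]):
            matches.append((i, best))
    return matches

# Toy descriptors: a0 matches b1 exactly; a1 is equally close to b1 and b2,
# so the ratio test throws it out
A = ["10110010", "11110000"]
B = ["00001111", "10110010", "11110011"]
print(match(A, B))  # [(0, 1)]
```

OpenCV's `BFMatcher.knnMatch` does the same thing over real descriptors, which is exactly how feature tracks are seeded in most SLAM front ends.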
- 03 → Optical flow and structure from motion
Published · How pixels move when the camera moves — and how to recover 3D from a moving monocular camera. The geometry that powers visual odometry, SLAM, and every drone that reconstructs a map from a single camera.
~15 min
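The "recover 3D from two views" step comes down to linear triangulation: given a point's projections in two cameras with known poses, stack the projection constraints and take the SVD null vector. A sketch with made-up camera poses (identity intrinsics, second camera shifted one unit along x):

```python
import numpy as np

def triangulate(P1, P2, uv1, uv2):
    """Linear (DLT) triangulation: recover a 3D point from its projections
    in two cameras with 3x4 projection matrices P1 and P2."""
    u1, v1 = uv1
    u2, v2 = uv2
    A = np.array([u1 * P1[2] - P1[0],
                  v1 * P1[2] - P1[1],
                  u2 * P2[2] - P2[0],
                  v2 * P2[2] - P2[1]])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]            # null vector of A = homogeneous 3D point
    return X[:3] / X[3]   # dehomogenize

# Two identity-intrinsics cameras; the second is translated along x
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

X_true = np.array([0.5, 0.2, 4.0])
uv1 = X_true[:2] / X_true[2]                    # projection in camera 1
uv2 = (X_true - [1.0, 0, 0])[:2] / X_true[2]    # projection in camera 2
print(triangulate(P1, P2, uv1, uv2))
```

In a real monocular pipeline the relative pose in `P2` itself comes from the essential matrix, which is why translation (and therefore depth) is only recovered up to scale.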
- 04 → Depth: stereo and RGB-D sensors
Published · Three ways to get depth: stereo triangulation, structured light, time-of-flight. What each measures, what each fails on, and when to pick which for your robot.
~12 min
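Stereo triangulation on a rectified rig reduces to one formula, Z = f·B/d, which also explains its failure mode: depth error grows quadratically as disparity shrinks, so far-away points are the weak spot. A minimal sketch with an illustrative rig (the focal length and baseline below are made up):

```python
def stereo_depth(disparity_px, focal_px, baseline_m):
    """Depth from stereo triangulation on a rectified pair: Z = f * B / d.
    disparity_px: horizontal pixel shift of a feature between the views."""
    if disparity_px <= 0:
        raise ValueError("zero disparity means the point is at infinity")
    return focal_px * baseline_m / disparity_px

# Illustrative rig: 700 px focal length, 12 cm baseline
print(stereo_depth(disparity_px=21.0, focal_px=700.0, baseline_m=0.12))  # 4.0
```

The same formula inverted gives the disparity you must resolve for a target depth accuracy, which is how you size a baseline for a given robot.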
- 05 → LiDAR: point clouds, filtering, ground segmentation
Published · The sensor that took over autonomous driving. Mechanical, solid-state, and FMCW; how to filter raw clouds; and ground segmentation, the first step before any object detection.
~13 min
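Ground segmentation in its simplest form is a height threshold around an estimated ground level. This is the flat-world special case of what the lesson covers; production pipelines fit a plane (e.g. with RANSAC) to handle slopes. A sketch on a synthetic cloud:

```python
import numpy as np

def split_ground(points, z_tol=0.15):
    """Naive ground segmentation: estimate the ground height from the
    lowest points, then label everything within z_tol metres of it as
    ground. Assumes a roughly flat, level ground plane."""
    z = points[:, 2]
    ground_z = np.percentile(z, 5)        # robust estimate of ground height
    is_ground = np.abs(z - ground_z) < z_tol
    return points[is_ground], points[~is_ground]

rng = np.random.default_rng(0)
ground = np.column_stack([rng.uniform(-10, 10, 500),
                          rng.uniform(-10, 10, 500),
                          rng.normal(0.0, 0.02, 500)])    # flat ground near z=0
obstacle = np.column_stack([rng.uniform(2, 3, 50),
                            rng.uniform(2, 3, 50),
                            rng.uniform(0.5, 1.5, 50)])   # a box above the ground
cloud = np.vstack([ground, obstacle])
g, o = split_ground(cloud)
print(len(g), len(o))  # 500 50
```

Everything in `o` is what the downstream object detector actually gets to see, which is why this split comes first.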
- 06 → Object detection for robots (YOLO, RT-DETR)
Published · The 2D detection model ecosystem, picked with a robotics lens — latency, small-object accuracy, edge deployment. The runtime constraints that separate paper benchmarks from production.
~13 min
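One post-processing step shared by most detectors in this family is non-maximum suppression, and it matters for the latency budget too: naive NMS is quadratic in the number of candidate boxes. A minimal greedy sketch with made-up boxes and scores:

```python
def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def nms(boxes, scores, thresh=0.5):
    """Greedy non-maximum suppression: keep the highest-scoring box,
    drop every remaining box overlapping it above thresh, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < thresh]
    return keep

boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]: box 1 overlaps box 0 too much
```

DETR-family models like RT-DETR are attractive on robots partly because their set-based output skips this step entirely.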
- 07 → Semantic and instance segmentation
Published · Pixel-precise masks instead of bounding boxes. SAM, Mask2Former, YOLOv8-Seg — what each gives you, and how to pick the right tool for grasping, mapping, or scene understanding.
~13 min
- 08 → Sensor fusion: visual-inertial odometry
Published · Combining a camera with an IMU for pose estimation that works where GPS fails. Why VIO is its own subfield, and the production-grade systems that fly drones indoors.
~13 min
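The core intuition behind camera+IMU fusion can be shown in one dimension with a complementary filter: integrate the gyro for smooth high-rate attitude, then pull toward a noisy but drift-free reference. Here a simulated accelerometer plays the drift-free role that camera measurements play in real VIO; all values are synthetic and the filter is a deliberate simplification of the EKF/optimization machinery real systems use:

```python
import random

def complementary_filter(gyro_rates, ref_angles, dt=0.01, alpha=0.98):
    """1D complementary filter: trust the integrated gyro short-term,
    the drift-free (but noisy) reference long-term."""
    angle = ref_angles[0]
    for rate, ref in zip(gyro_rates, ref_angles):
        angle = alpha * (angle + rate * dt) + (1 - alpha) * ref
    return angle

random.seed(0)
true_angle = 0.5   # rad, constant attitude
n = 2000
gyro = [0.02 + random.gauss(0, 0.01) for _ in range(n)]           # biased gyro
ref = [true_angle + random.gauss(0, 0.05) for _ in range(n)]      # noisy, unbiased

fused = complementary_filter(gyro, ref)
drifted = ref[0] + sum(r * 0.01 for r in gyro)  # gyro-only dead reckoning
print(abs(fused - true_angle) < abs(drifted - true_angle))  # True
```

The gyro-only estimate walks off by the integrated bias (about 0.4 rad here), while the fused estimate stays pinned near the truth; scaling that idea to 6-DoF poses with a camera as the correction signal is, loosely, the VIO problem.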
- 09 → Deep learning for robot perception: what's different
Published · Latency budgets, domain shift, and the ways robotics CV diverges from ImageNet practice. The reasons your favorite paper's model breaks on a Jetson, and what to do about it.
~12 min
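A latency budget is only meaningful if you measure it the way the robot experiences it: warm up first, then report tail percentiles rather than the mean, because the control loop stalls on the slowest frame. A sketch of that measurement harness (the workload is a stand-in for a model's forward pass):

```python
import time
import statistics

def latency_profile(fn, warmup=10, iters=100):
    """Per-call latency with warmup (JIT, caches, allocator) excluded,
    reporting median and tail rather than the mean."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - t0) * 1000.0)  # milliseconds
    samples.sort()
    return {"p50_ms": statistics.median(samples),
            "p99_ms": samples[int(0.99 * len(samples)) - 1]}

# Stand-in workload for a model inference call
profile = latency_profile(lambda: sum(i * i for i in range(20000)))
print(profile["p50_ms"] <= profile["p99_ms"])  # True
```

On an embedded target like a Jetson, the p50/p99 gap is often the number that kills a model choice, not the benchmark FPS from the paper.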
- 10 → Vision-language models for embodied tasks
Published · CLIP, GroundingDINO, SAM, and how modern pipelines glue them together for open-vocabulary robot perception. Detect 'the red mug' with no fine-tuning, segment it, grasp it.
~13 min