Track 05
Perception & Computer Vision
How the robot sees. Cameras, lidar, classical CV, and the deep models that power modern robot perception.
10 published · 0 planned · 10 lessons total
- 01 → Camera models, calibration, and distortion
Published · The pinhole model, intrinsics, extrinsics, and distortion — and the 30-minute OpenCV workflow that turns a raw USB camera into a calibrated robotics sensor.
~15 min
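The pinhole model the lesson opens with fits in a few lines: a 3×3 intrinsic matrix maps camera-frame points to pixels via a perspective divide. A minimal sketch with illustrative intrinsic values (the focal length and principal point below are made up, not from the lesson):

```python
import numpy as np

def project(points_3d, K):
    """Project Nx3 camera-frame points to pixel coordinates with intrinsics K."""
    pts = np.asarray(points_3d, dtype=float)
    uvw = (K @ pts.T).T             # apply the intrinsic matrix
    return uvw[:, :2] / uvw[:, 2:]  # perspective divide by depth

# Illustrative intrinsics: fx = fy = 500 px, principal point at (320, 240)
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])

# A point 2 m in front of the camera, 0.1 m to the right
uv = project([[0.1, 0.0, 2.0]], K)
print(uv)  # [[345. 240.]]
```

Calibration is the inverse problem: OpenCV's `cv2.calibrateCamera` recovers `K` (plus distortion coefficients) from chessboard views, and the result plugs straight into a projection like this.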
- 02 → Classical CV: features, descriptors, matching
Published · Before deep learning, computer vision rested on three primitives — corner detectors, descriptors, and matchers. They still power half of modern SLAM and visual odometry. Here's why and how.
~14 min
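The third primitive, matching, reduces to nearest-neighbour search over descriptors plus Lowe's ratio test to reject ambiguous matches. A toy sketch with 8-bit binary "descriptors" standing in for real ORB/BRIEF outputs (the descriptors and ratio below are illustrative):

```python
def hamming(a, b):
    """Hamming distance between two equal-length bit strings."""
    return sum(x != y for x, y in zip(a, b))

def match(desc_a, desc_b, ratio=0.8):
    """Brute-force matcher with Lowe's ratio test: keep a match only when
    the best candidate is clearly better than the second best."""
    matches = []
    for i, d in enumerate(desc_a):
        order = sorted(range(len(desc_b)), key=lambda j: hamming(d, desc_b[j]))
        best, second = order[0], order[1]
        if hamming(d, desc_b[best]) < ratio * hamming(d, desc_b[second]):
            matches.append((i, best))
    return matches

# Toy descriptors: a0 matches b1 exactly; a1 is equally close to b1 and b2,
# so the ratio test throws it out
A = ["10110010", "11110000"]
B = ["00001111", "10110010", "11110011"]
print(match(A, B))  # [(0, 1)]
```

OpenCV's `BFMatcher.knnMatch` does the same thing over real descriptors, which is exactly how feature tracks are seeded in most SLAM front ends.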
- 03 → Optical flow and structure from motion
Published · How pixels move when the camera moves — and how to recover 3D from a moving monocular camera. The geometry that powers visual odometry, SLAM, and every drone that reconstructs a map from a single camera.
~15 min
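The "recover 3D from two views" step comes down to linear triangulation: given a point's projections in two cameras with known poses, stack the projection constraints and take the SVD null vector. A sketch with made-up camera poses (identity intrinsics, second camera shifted one unit along x):

```python
import numpy as np

def triangulate(P1, P2, uv1, uv2):
    """Linear (DLT) triangulation: recover a 3D point from its projections
    in two cameras with 3x4 projection matrices P1 and P2."""
    u1, v1 = uv1
    u2, v2 = uv2
    A = np.array([u1 * P1[2] - P1[0],
                  v1 * P1[2] - P1[1],
                  u2 * P2[2] - P2[0],
                  v2 * P2[2] - P2[1]])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]            # null vector of A = homogeneous 3D point
    return X[:3] / X[3]   # dehomogenize

# Two identity-intrinsics cameras; the second is translated along x
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

X_true = np.array([0.5, 0.2, 4.0])
uv1 = X_true[:2] / X_true[2]                    # projection in camera 1
uv2 = (X_true - [1.0, 0, 0])[:2] / X_true[2]    # projection in camera 2
print(triangulate(P1, P2, uv1, uv2))
```

In a real monocular pipeline the relative pose in `P2` itself comes from the essential matrix, which is why translation (and therefore depth) is only recovered up to scale.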
- 04 → Depth: stereo and RGB-D sensors
Published · Three ways to get depth: stereo triangulation, structured light, time-of-flight. What each measures, what each fails on, and when to pick which for your robot.
~12 min
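Stereo triangulation on a rectified rig reduces to one formula, Z = f·B/d, which also explains its failure mode: depth error grows quadratically as disparity shrinks, so far-away points are the weak spot. A minimal sketch with an illustrative rig (the focal length and baseline below are made up):

```python
def stereo_depth(disparity_px, focal_px, baseline_m):
    """Depth from stereo triangulation on a rectified pair: Z = f * B / d.
    disparity_px: horizontal pixel shift of a feature between the views."""
    if disparity_px <= 0:
        raise ValueError("zero disparity means the point is at infinity")
    return focal_px * baseline_m / disparity_px

# Illustrative rig: 700 px focal length, 12 cm baseline
print(stereo_depth(disparity_px=21.0, focal_px=700.0, baseline_m=0.12))  # 4.0
```

The same formula inverted gives the disparity you must resolve for a target depth accuracy, which is how you size a baseline for a given robot.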
- 05 → LiDAR: point clouds, filtering, ground segmentation
Published · The sensor that took over autonomous driving. Mechanical, solid-state, and FMCW; how to filter raw clouds; and ground segmentation, the first step before any object detection.
~13 min
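Ground segmentation in its simplest form is a height threshold around an estimated ground level. This is the flat-world special case of what the lesson covers; production pipelines fit a plane (e.g. with RANSAC) to handle slopes. A sketch on a synthetic cloud:

```python
import numpy as np

def split_ground(points, z_tol=0.15):
    """Naive ground segmentation: estimate the ground height from the
    lowest points, then label everything within z_tol metres of it as
    ground. Assumes a roughly flat, level ground plane."""
    z = points[:, 2]
    ground_z = np.percentile(z, 5)        # robust estimate of ground height
    is_ground = np.abs(z - ground_z) < z_tol
    return points[is_ground], points[~is_ground]

rng = np.random.default_rng(0)
ground = np.column_stack([rng.uniform(-10, 10, 500),
                          rng.uniform(-10, 10, 500),
                          rng.normal(0.0, 0.02, 500)])    # flat ground near z=0
obstacle = np.column_stack([rng.uniform(2, 3, 50),
                            rng.uniform(2, 3, 50),
                            rng.uniform(0.5, 1.5, 50)])   # a box above the ground
cloud = np.vstack([ground, obstacle])
g, o = split_ground(cloud)
print(len(g), len(o))  # 500 50
```

Everything in `o` is what the downstream object detector actually gets to see, which is why this split comes first.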
- 06 → Object detection for robots (YOLO, RT-DETR)
Published · The 2D detection model ecosystem, picked with a robotics lens — latency, small-object accuracy, edge deployment. The runtime constraints that separate paper benchmarks from production.
~13 min
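One post-processing step shared by most detectors in this family is non-maximum suppression, and it matters for the latency budget too: naive NMS is quadratic in the number of candidate boxes. A minimal greedy sketch with made-up boxes and scores:

```python
def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def nms(boxes, scores, thresh=0.5):
    """Greedy non-maximum suppression: keep the highest-scoring box,
    drop every remaining box overlapping it above thresh, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < thresh]
    return keep

boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]: box 1 overlaps box 0 too much
```

DETR-family models like RT-DETR are attractive on robots partly because their set-based output skips this step entirely.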
- 07 → Semantic and instance segmentation
Published · Pixel-precise masks instead of bounding boxes. SAM, Mask2Former, YOLOv8-Seg — what each gives you, and how to pick the right tool for grasping, mapping, or scene understanding.
~13 min
- 08 → Sensor fusion: visual-inertial odometry
Published · Combining a camera with an IMU for pose estimation that works where GPS fails. Why VIO is its own subfield, and the production-grade systems that fly drones indoors.
~13 min
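The core intuition behind camera+IMU fusion can be shown in one dimension with a complementary filter: integrate the gyro for smooth high-rate attitude, then pull toward a noisy but drift-free reference. Here a simulated accelerometer plays the drift-free role that camera measurements play in real VIO; all values are synthetic and the filter is a deliberate simplification of the EKF/optimization machinery real systems use:

```python
import random

def complementary_filter(gyro_rates, ref_angles, dt=0.01, alpha=0.98):
    """1D complementary filter: trust the integrated gyro short-term,
    the drift-free (but noisy) reference long-term."""
    angle = ref_angles[0]
    for rate, ref in zip(gyro_rates, ref_angles):
        angle = alpha * (angle + rate * dt) + (1 - alpha) * ref
    return angle

random.seed(0)
true_angle = 0.5   # rad, constant attitude
n = 2000
gyro = [0.02 + random.gauss(0, 0.01) for _ in range(n)]           # biased gyro
ref = [true_angle + random.gauss(0, 0.05) for _ in range(n)]      # noisy, unbiased

fused = complementary_filter(gyro, ref)
drifted = ref[0] + sum(r * 0.01 for r in gyro)  # gyro-only dead reckoning
print(abs(fused - true_angle) < abs(drifted - true_angle))  # True
```

The gyro-only estimate walks off by the integrated bias (about 0.4 rad here), while the fused estimate stays pinned near the truth; scaling that idea to 6-DoF poses with a camera as the correction signal is, loosely, the VIO problem.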
- 09 → Deep learning for robot perception: what's different
Published · Latency budgets, domain shift, and the ways robotics CV diverges from ImageNet practice. The reasons your favorite paper's model breaks on a Jetson, and what to do about it.
~12 min
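A latency budget is only meaningful if you measure it the way the robot experiences it: warm up first, then report tail percentiles rather than the mean, because the control loop stalls on the slowest frame. A sketch of that measurement harness (the workload is a stand-in for a model's forward pass):

```python
import time
import statistics

def latency_profile(fn, warmup=10, iters=100):
    """Per-call latency with warmup (JIT, caches, allocator) excluded,
    reporting median and tail rather than the mean."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - t0) * 1000.0)  # milliseconds
    samples.sort()
    return {"p50_ms": statistics.median(samples),
            "p99_ms": samples[int(0.99 * len(samples)) - 1]}

# Stand-in workload for a model inference call
profile = latency_profile(lambda: sum(i * i for i in range(20000)))
print(profile["p50_ms"] <= profile["p99_ms"])  # True
```

On an embedded target like a Jetson, the p50/p99 gap is often the number that kills a model choice, not the benchmark FPS from the paper.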
- 10 → Vision-language models for embodied tasks
Published · CLIP, GroundingDINO, SAM, and how modern pipelines glue them together for open-vocabulary robot perception. Detect 'the red mug' with no fine-tuning, segment it, grasp it.
~13 min