RobotForge

Track 05

Perception & Computer Vision

How the robot sees. Cameras, lidar, classical CV, and the deep models that power modern robot perception.

10 published · 0 planned · 10 lessons total

  1. 01

    Camera models, calibration, and distortion

    Published

    The pinhole model, intrinsics, extrinsics, and distortion, plus the 30-minute OpenCV workflow that turns a raw USB camera into a calibrated robotics sensor; see the sketch below.

    ~15 min
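
    A minimal sketch of that workflow, assuming a printed 9×6 checkerboard with 24 mm squares and a folder of board photos at calib/*.png (both placeholders):

    ```python
    import glob
    import cv2
    import numpy as np

    # Checkerboard geometry: inner corners and square size are assumptions;
    # match them to your printed target.
    PATTERN = (9, 6)          # inner corners (cols, rows)
    SQUARE_SIZE = 0.024       # meters per square

    # 3D board corners in the board's own frame (z = 0 plane)
    objp = np.zeros((PATTERN[0] * PATTERN[1], 3), np.float32)
    objp[:, :2] = np.mgrid[0:PATTERN[0], 0:PATTERN[1]].T.reshape(-1, 2) * SQUARE_SIZE

    obj_points, img_points = [], []
    for path in glob.glob("calib/*.png"):   # placeholder folder of board photos
        gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        found, corners = cv2.findChessboardCorners(gray, PATTERN)
        if not found:
            continue
        # Refine corner locations to sub-pixel accuracy
        corners = cv2.cornerSubPix(
            gray, corners, (11, 11), (-1, -1),
            (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
        obj_points.append(objp)
        img_points.append(corners)

    # Solve for the intrinsics K and the distortion coefficients
    rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
        obj_points, img_points, gray.shape[::-1], None, None)
    print(f"RMS reprojection error: {rms:.3f} px")
    print("K =\n", K)

    # Undistort a new frame with the recovered model
    frame = cv2.imread("test.png")          # placeholder test image
    undistorted = cv2.undistort(frame, K, dist)
    ```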

  2. 02

    Classical CV: features, descriptors, matching

    Published

    Before deep learning, computer vision rested on three primitives: corner detectors, descriptors, and matchers. They still power the front ends of most modern SLAM and visual odometry systems. Here's why and how, with a matching sketch below.

    ~14 min
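
    A sketch of the classic pipeline with ORB features, brute-force Hamming matching, and Lowe's ratio test; the two frame filenames are placeholders:

    ```python
    import cv2

    img1 = cv2.imread("frame_a.png", cv2.IMREAD_GRAYSCALE)  # placeholder frames
    img2 = cv2.imread("frame_b.png", cv2.IMREAD_GRAYSCALE)

    # Detect corners and compute binary descriptors in one call
    orb = cv2.ORB_create(nfeatures=1000)
    kp1, des1 = orb.detectAndCompute(img1, None)
    kp2, des2 = orb.detectAndCompute(img2, None)

    # Hamming distance for binary descriptors; k=2 enables the ratio test
    bf = cv2.BFMatcher(cv2.NORM_HAMMING)
    matches = bf.knnMatch(des1, des2, k=2)

    # Lowe's ratio test: keep a match only if it clearly beats the runner-up
    good = []
    for pair in matches:
        if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance:
            good.append(pair[0])

    print(f"{len(good)} matches survive the ratio test")
    out = cv2.drawMatches(img1, kp1, img2, kp2, good, None)
    cv2.imwrite("matches.png", out)
    ```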

  3. 03

    Optical flow and structure from motion

    Published

    How pixels move when the camera moves, and how to recover 3D from a moving monocular camera. The geometry that powers visual odometry, SLAM, and every drone that reconstructs a map from a single camera; a tracking-to-pose sketch follows below.

    ~15 min
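
    A sketch of sparse Lucas-Kanade tracking feeding a two-view pose estimate, assuming placeholder frame files and a placeholder intrinsics matrix K (substitute your calibrated one):

    ```python
    import cv2
    import numpy as np

    prev = cv2.imread("frame_000.png", cv2.IMREAD_GRAYSCALE)  # placeholder frames
    curr = cv2.imread("frame_001.png", cv2.IMREAD_GRAYSCALE)

    # Pick corners worth tracking in the first frame
    p0 = cv2.goodFeaturesToTrack(prev, maxCorners=300,
                                 qualityLevel=0.01, minDistance=8)

    # Pyramidal Lucas-Kanade: track each corner into the next frame
    p1, status, err = cv2.calcOpticalFlowPyrLK(prev, curr, p0, None,
                                               winSize=(21, 21), maxLevel=3)
    good_old = p0[status.flatten() == 1].reshape(-1, 2)
    good_new = p1[status.flatten() == 1].reshape(-1, 2)

    # With intrinsics K, the same correspondences give relative pose up to
    # scale: the core step of monocular visual odometry. K is a placeholder.
    K = np.array([[700.0, 0, 320], [0, 700.0, 240], [0, 0, 1]])
    E, inliers = cv2.findEssentialMat(good_old, good_new, K,
                                      method=cv2.RANSAC, prob=0.999, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, good_old, good_new, K)
    print("rotation:\n", R, "\ntranslation direction:", t.ravel())
    ```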

  4. 04

    Depth: stereo and RGB-D sensors

    Published

    Three ways to get depth: stereo triangulation, structured light, and time-of-flight. What each measures, what each fails on, and when to pick which for your robot. A stereo matching sketch follows below.

    ~12 min
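
    A stereo matching sketch with OpenCV's semi-global block matcher, assuming an already-rectified pair and placeholder focal length and baseline:

    ```python
    import cv2
    import numpy as np

    # Rectified left/right pair (placeholder filenames); SGBM assumes
    # epipolar lines are horizontal, i.e. the pair is already rectified.
    left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
    right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

    stereo = cv2.StereoSGBM_create(
        minDisparity=0,
        numDisparities=128,      # must be divisible by 16
        blockSize=5,
        P1=8 * 5 * 5,            # smoothness penalty for small disparity jumps
        P2=32 * 5 * 5,           # heavier penalty for large jumps
        uniquenessRatio=10,
        speckleWindowSize=100,
        speckleRange=2,
    )
    # SGBM returns fixed-point disparity scaled by 16
    disp = stereo.compute(left, right).astype(np.float32) / 16.0

    # depth = f * B / disparity; focal length and baseline are placeholders
    f_px, baseline_m = 700.0, 0.12
    valid = disp > 0
    depth = np.zeros_like(disp)
    depth[valid] = f_px * baseline_m / disp[valid]
    print("median depth (m):", np.median(depth[valid]))
    ```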

  5. 05

    LiDAR: point clouds, filtering, ground segmentation

    Published

    The sensor that took over autonomous driving. Mechanical, solid-state, and FMCW designs; how to filter raw clouds; and ground segmentation, the first step before any object detection (a RANSAC sketch follows below).

    ~13 min
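
    A self-contained ground-segmentation sketch: vanilla RANSAC plane fitting in NumPy, run here on a synthetic cloud so it executes as-is; swap in your own N×3 points:

    ```python
    import numpy as np

    def ransac_ground(points, n_iters=200, dist_thresh=0.15, rng=None):
        """Fit a ground plane to an N x 3 cloud with vanilla RANSAC.

        Returns the plane (a, b, c, d) with ax + by + cz + d = 0
        and a boolean inlier mask.
        """
        if rng is None:
            rng = np.random.default_rng(0)
        best_mask, best_plane = None, None
        for _ in range(n_iters):
            # Candidate plane through 3 random points
            p = points[rng.choice(len(points), 3, replace=False)]
            normal = np.cross(p[1] - p[0], p[2] - p[0])
            norm = np.linalg.norm(normal)
            if norm < 1e-9:          # degenerate (collinear) sample
                continue
            normal /= norm
            d = -normal @ p[0]
            # Distance of every point to the candidate plane
            dist = np.abs(points @ normal + d)
            mask = dist < dist_thresh
            if best_mask is None or mask.sum() > best_mask.sum():
                best_mask, best_plane = mask, (*normal, d)
        return best_plane, best_mask

    # Synthetic cloud: flat ground plus a box-shaped obstacle
    rng = np.random.default_rng(1)
    ground = np.column_stack([rng.uniform(-10, 10, 5000),
                              rng.uniform(-10, 10, 5000),
                              rng.normal(0, 0.03, 5000)])
    box = rng.uniform([1, 1, 0], [2, 2, 1.5], (500, 3))
    cloud = np.vstack([ground, box])

    plane, mask = ransac_ground(cloud)
    print(f"ground inliers: {mask.sum()} / {len(cloud)}")
    obstacles = cloud[~mask]   # everything off the plane goes to detection
    ```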

  6. 06

    Object detection for robots (YOLO, RT-DETR)

    Published

    The 2D detection model ecosystem, surveyed through a robotics lens: latency, small-object accuracy, edge deployment. The runtime constraints that separate paper benchmarks from production; an inference sketch follows below.

    ~13 min
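
    A minimal inference sketch using the Ultralytics API, one of the options the lesson compares; the weights file and image path are placeholders:

    ```python
    from ultralytics import YOLO

    # Nano variant: the edge-friendly end of the family
    model = YOLO("yolov8n.pt")
    results = model("scene.jpg", imgsz=640, conf=0.25)   # placeholder image

    for r in results:
        for box in r.boxes:
            cls = model.names[int(box.cls)]
            x1, y1, x2, y2 = box.xyxy[0].tolist()
            print(f"{cls}: conf={float(box.conf):.2f} "
                  f"box=({x1:.0f}, {y1:.0f}, {x2:.0f}, {y2:.0f})")
    ```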

  7. 07

    Semantic and instance segmentation

    Published

    Pixel-precise masks instead of bounding boxes. SAM, Mask2Former, YOLOv8-Seg: what each gives you, and how to pick the right tool for grasping, mapping, or scene understanding. A mask-extraction sketch follows below.

    ~13 min
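
    A mask-extraction sketch with YOLOv8-Seg via the same Ultralytics API; the image path is a placeholder, and the masks come back at network resolution:

    ```python
    from ultralytics import YOLO

    model = YOLO("yolov8n-seg.pt")    # segmentation variant of the nano model
    results = model("tabletop.jpg")   # placeholder image

    r = results[0]
    if r.masks is not None:
        # masks.data holds one per-instance mask tensor, stacked
        masks = r.masks.data.cpu().numpy()
        for mask, box in zip(masks, r.boxes):
            label = model.names[int(box.cls)]
            print(f"{label}: {int(mask.sum())} mask pixels")
    ```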

  8. 08

    Sensor fusion: visual-inertial odometry

    Published

    Combining a camera with an IMU for pose estimation that works where GPS fails. Why VIO is its own subfield, and the production-grade systems that fly drones indoors. A drift demo below shows why the camera is needed at all.

    ~13 min
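
    Not VIO itself, but a ten-line demonstration of the problem it solves: dead-reckoning a stationary IMU with a small accelerometer bias, where position error grows roughly quadratically. The bias and noise figures are illustrative assumptions:

    ```python
    import numpy as np

    # Why fuse at all: double-integrating a noisy, slightly biased
    # accelerometer produces position drift that a camera's landmark
    # observations must correct.
    rng = np.random.default_rng(0)
    dt, T = 0.005, 10.0                      # 200 Hz IMU, 10 seconds
    n = int(T / dt)
    bias = 0.02                              # m/s^2, a modest accel bias
    noise = rng.normal(0, 0.05, n)           # white accelerometer noise

    v = p = 0.0
    for a in bias + noise:                   # true acceleration is zero
        v += a * dt                          # integrate to velocity
        p += v * dt                          # integrate again to position
    print(f"position drift after {T:.0f} s: {p:.2f} m")   # meters, not mm
    ```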

  9. 09

    Deep learning for robot perception: what's different

    Published

    Latency budgets, domain shift, and the ways robotics CV diverges from ImageNet practice. Why your favorite paper's model breaks on a Jetson, and what to do about it; a latency-measurement sketch follows below.

    ~12 min
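
    A bare-bones latency probe, with warmup and CUDA synchronization, meant to run on the target device; resnet18 is a stand-in for whatever model you deploy:

    ```python
    import time
    import torch
    import torchvision

    # The number that matters is tail latency on the robot's compute at the
    # deployment resolution, not a paper's FPS on a datacenter GPU.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = torchvision.models.resnet18(weights=None).eval().to(device)
    x = torch.randn(1, 3, 224, 224, device=device)

    with torch.no_grad():
        for _ in range(20):                  # warmup: allocator, kernel autotune
            model(x)
        if device == "cuda":
            torch.cuda.synchronize()
        times = []
        for _ in range(200):
            t0 = time.perf_counter()
            model(x)
            if device == "cuda":             # GPU kernels are async; must sync
                torch.cuda.synchronize()
            times.append(time.perf_counter() - t0)

    times_ms = sorted(t * 1e3 for t in times)
    print(f"median {times_ms[len(times_ms) // 2]:.1f} ms, "
          f"p99 {times_ms[int(len(times_ms) * 0.99)]:.1f} ms")
    ```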

  10. 10

    Vision-language models for embodied tasks

    Published

    CLIP, GroundingDINO, SAM, and how modern pipelines glue them together for open-vocabulary robot perception. Detect 'the red mug' with no fine-tuning, segment it, grasp it. A CLIP scoring sketch follows below.

    ~13 min
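
    A sketch of the open-vocabulary scoring stage with CLIP via Hugging Face transformers; the checkpoint name is the standard public one, while the image path and candidate phrases are placeholders:

    ```python
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    image = Image.open("table.jpg")          # placeholder image
    prompts = ["the red mug", "a blue bowl", "a cardboard box"]

    # Embed image and text into the shared space, then score by similarity
    inputs = processor(text=prompts, images=image,
                       return_tensors="pt", padding=True)
    outputs = model(**inputs)
    probs = outputs.logits_per_image.softmax(dim=-1)[0]

    for phrase, p in zip(prompts, probs.tolist()):
        print(f"{p:.2f}  {phrase}")
    ```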