LiDAR: point clouds, filtering, ground segmentation
The sensor that took over autonomous driving. Mechanical, solid-state, and FMCW; how to filter raw clouds; ground segmentation, the first step before any object detection.
LiDAR (Light Detection and Ranging) measures distance by timing reflected laser pulses. The result is a point cloud — millions of (x, y, z) points per second describing the world's geometry. For autonomous vehicles, drones over featureless terrain, and any robot operating outdoors, LiDAR is the perception backbone. Here's the working knowledge.
Three sensor types in 2026
Mechanical (rotating)
The classic Velodyne / Ouster design. A spinning head fires multiple laser beams (16, 32, 64, 128) and rotates at ~10 Hz. The beams stack vertically; rotation gives 360° coverage.
- Strengths: 360° field of view, dense vertical sampling, mature software ecosystem.
- Weaknesses: moving parts wear out; physical size; vertical "rings" leave horizontal gaps; expensive ($5k–$80k).
Solid-state (e.g., Luminar Iris, Innoviz, Hesai)
No moving parts. MEMS mirrors, optical phased arrays, or scanning photonic elements steer the beam.
- Strengths: durable; smaller; cheaper at scale ($500–$3k for automotive); higher resolution within the FOV.
- Weaknesses: limited FOV (typically 60–120°, not 360°); newer ecosystem; vendor-specific.
Most 2026 production self-driving systems use solid-state. Most research and ROS-based robots still use mechanical.
FMCW (Frequency-Modulated Continuous Wave)
Modulates the laser frequency continuously; the receiver mixes the return with the outgoing chirp, and the beat frequency encodes both range AND velocity per pixel.
- Strengths: per-pixel velocity (Doppler effect); immune to interference from other LiDARs (each sensor uses a unique waveform).
- Weaknesses: more expensive; lower resolution; emerging tech.
Used by some autonomous-vehicle vendors in their second-generation sensor rollouts.
The point-cloud data structure
Each LiDAR scan is a list of points, each with at least:
- x, y, z: 3D position in the sensor frame.
- intensity: how strongly the point reflected (related to material).
- ring: which laser beam captured this point (0–127 typically).
- timestamp: nanosecond-precision when the point was captured.
Stored as binary structures (PCL .pcd format, ROS sensor_msgs/PointCloud2). Open with Open3D, PCL, or NumPy.
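A minimal loading sketch in Python (file names are placeholders): Open3D reads .pcd directly, and a raw KITTI Velodyne .bin is just packed float32 x, y, z, intensity.

```python
# Minimal sketch: two common ways to load a scan. Paths are placeholders.
import numpy as np
import open3d as o3d

# (a) A .pcd file via Open3D: returns an o3d.geometry.PointCloud.
cloud = o3d.io.read_point_cloud("scan.pcd")
xyz = np.asarray(cloud.points)          # (N, 3) array of x, y, z

# (b) A raw KITTI Velodyne .bin: packed float32 [x, y, z, intensity] per point.
raw = np.fromfile("000000.bin", dtype=np.float32).reshape(-1, 4)
xyz, intensity = raw[:, :3], raw[:, 3]
print(xyz.shape, intensity.min(), intensity.max())
```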
De-skewing
One LiDAR scan takes ~100 ms (the full rotation). The robot moves during this time. Without correction, the cloud is "smeared" along the motion direction.
Fix: combine the LiDAR with an IMU. Predict the robot's pose at each point's timestamp. Transform every point to a common reference time. Smearing eliminated.
This is why LiDAR + IMU fusion (covered in the LOAM lesson) is the production standard. Pure LiDAR without de-skewing has noticeable artifacts at city-driving speeds.
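A rough sketch of the idea under a constant-velocity assumption; a production pipeline integrates the IMU to get a pose per point instead. The deskew function and its arguments below are illustrative, not a library API.

```python
# De-skew sketch assuming constant linear and angular velocity over the scan.
import numpy as np
from scipy.spatial.transform import Rotation

def deskew(points, t, t_ref, v, omega):
    """Re-express every point in the sensor pose at time t_ref.

    points: (N, 3) xyz in the sensor frame at each point's capture time t (N,) [s].
    t_ref:  common reference time, e.g. the scan start.
    v:      (3,) linear velocity [m/s]; omega: (3,) angular velocity [rad/s],
            both assumed constant and expressed in the reference frame.
    """
    dt = (t - t_ref)[:, None]                         # time elapsed since t_ref
    rot = Rotation.from_rotvec(omega[None, :] * dt)   # one small rotation per point
    return rot.apply(points) + v[None, :] * dt        # rotate, then translate
```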
Filtering raw clouds
Most processing pipelines start by filtering:
- Range filter: drop points beyond a max range (e.g., 50 m). Long-range returns are noisy and rarely useful.
- Statistical outlier removal: drop points whose neighbors are far away. Removes spurious returns from rain, dust, sensor noise.
- Voxel downsample: grid the cloud at 5–10 cm resolution; keep one point per cell. Reduces compute load by 10–100×.
- ROI crop: keep only the relevant volume (e.g., front 30 m for driving). Speeds up downstream processing.
Open3D and PCL provide one-liners for each. Standard preprocessing.
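Roughly what those one-liners look like with Open3D (API names as in recent Open3D releases); the thresholds are illustrative and need tuning per sensor.

```python
# Sketch of the four standard filters with Open3D.
import numpy as np
import open3d as o3d

pcd = o3d.io.read_point_cloud("scan.pcd")

# 1. Range filter: keep points within 50 m of the sensor.
d = np.linalg.norm(np.asarray(pcd.points), axis=1)
pcd = pcd.select_by_index(np.where(d < 50.0)[0])

# 2. Statistical outlier removal: drop points whose neighbors are unusually far.
pcd, _ = pcd.remove_statistical_outlier(nb_neighbors=20, std_ratio=2.0)

# 3. Voxel downsample at 10 cm: one representative point per occupied cell.
pcd = pcd.voxel_down_sample(voxel_size=0.10)

# 4. ROI crop: keep the front 30 m, 10 m to each side, -2..3 m vertically.
roi = o3d.geometry.AxisAlignedBoundingBox(np.array([0.0, -10.0, -2.0]),
                                          np.array([30.0, 10.0, 3.0]))
pcd = pcd.crop(roi)
```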
Ground segmentation
The first task in any LiDAR pipeline: which points are ground, which are obstacles? Without this split, every grass blade is a navigation obstacle.
Classical approaches:
- RANSAC plane fit: fit a plane to the cloud; points within 10 cm of the plane are ground. Works on flat terrain; fails on slopes.
- Patchwork / Patchwork++ (KAIST 2022): segment-then-fit, handles slopes and curbs robustly.
- Cone-based: assume ground rises slowly with distance from sensor; reject points above the cone.
Modern alternatives:
- Learned ground segmentation: PointNet / RandLA-Net trained to label every point ground vs not. Handles complex terrain better than classical fits.
For autonomous driving, ground segmentation is the foundation of every detection / mapping pipeline. Get it wrong and everything downstream fails.
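As a concrete baseline, here is the RANSAC plane fit from the list above using Open3D's segment_plane, with the 10 cm threshold mentioned earlier; a sketch of the classical approach, not a production ground segmenter.

```python
# RANSAC plane fit: works on flat ground, degrades on slopes and curbs.
import open3d as o3d

pcd = o3d.io.read_point_cloud("scan.pcd")
plane, inliers = pcd.segment_plane(distance_threshold=0.10,
                                   ransac_n=3,
                                   num_iterations=1000)
ground    = pcd.select_by_index(inliers)               # points within 10 cm of the plane
obstacles = pcd.select_by_index(inliers, invert=True)  # everything else
a, b, c, d = plane                                     # plane model: ax + by + cz + d = 0
```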
Object detection on point clouds
After ground removal, what's left is "things." Detection methods:
- Clustering: DBSCAN or Euclidean clustering on the non-ground points. Each cluster is a potential object. Fast; doesn't classify.
- Classical (template matching): match a known shape (car, pedestrian, cone) to clusters. Outdated.
- Deep learning (PointPillars, CenterPoint, BEVFusion): state-of-the-art. Outputs 3D bounding boxes + class labels in one forward pass.
For autonomous driving in 2026, BEVFusion and similar bird's-eye-view methods are the production standard. They fuse LiDAR + camera into a unified BEV representation; detection runs on top.
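A minimal clustering sketch, continuing from the ground-segmentation example above (obstacles is the non-ground cloud); eps and min_points are illustrative and depend on point density.

```python
# DBSCAN on the non-ground points: each cluster is a candidate object (no class label).
import numpy as np

labels = np.array(obstacles.cluster_dbscan(eps=0.5, min_points=10))
for k in range(labels.max() + 1):                     # label -1 means noise, skipped here
    cluster = obstacles.select_by_index(np.where(labels == k)[0])
    box = cluster.get_axis_aligned_bounding_box()     # crude 3D box per object
    print(k, len(cluster.points), box.get_extent())
```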
Compute realities
LiDAR data is massive:
- Velodyne 64-beam at 10 Hz: ~1.3 million points/sec.
- Ouster OS1-128: ~2.6M points/sec.
- Solid-state automotive: ~10M points/sec.
Naive Python processing chokes. For real-time work, use C++ (PCL), GPU pipelines (CUDA point operations), or sparse-3D-conv frameworks (MinkowskiEngine, OpenPCDet).
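Where the per-second figures come from: simple arithmetic over nominal specs. The azimuth step count below is an assumed round number, not a measured value.

```python
# Back-of-the-envelope data rate for a 64-beam spinner at 10 Hz.
beams, azimuth_steps, rate_hz = 64, 2000, 10
print(beams * azimuth_steps * rate_hz)   # 1,280,000 points/sec, i.e. ~1.3 M
```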
Frame conventions
LiDAR points come in the sensor frame. Standard conventions:
- x: forward.
- y: left.
- z: up.
(Right-handed, REP-103 compliant.) The transform from sensor to robot base is in your URDF; tf2 handles the conversion.
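A minimal sketch of that conversion without ROS, using an assumed 4×4 extrinsic; in a ROS system, tf2 and the URDF provide this matrix, and the numbers here are placeholders.

```python
# Re-express sensor-frame points in the robot base frame with a fixed extrinsic.
import numpy as np

T_base_lidar = np.eye(4)
T_base_lidar[:3, 3] = [1.2, 0.0, 1.8]    # e.g. LiDAR mounted 1.2 m forward, 1.8 m up

pts = np.random.rand(1000, 3)                        # stand-in for sensor-frame xyz
pts_h = np.hstack([pts, np.ones((len(pts), 1))])     # homogeneous coordinates
pts_base = (T_base_lidar @ pts_h.T).T[:, :3]         # now in the base frame
```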
The 2026 workflow
For an autonomous outdoor robot:
1. LiDAR + IMU streaming at 10 Hz / 200 Hz.
2. De-skew per scan via IMU integration.
3. Voxel downsample + ROI crop for efficiency.
4. Ground segmentation (Patchwork++ classical, or learned).
5. BEV detection (BEVFusion, CenterPoint).
6. Tracking (Kalman filter on detections frame-to-frame).
7. Output to motion planning.
Each step is open-source. Steps 4–6 still benefit from custom tuning per platform.
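As an illustration of step 6, a minimal constant-velocity Kalman filter on a single detection centroid; a real tracker adds data association across multiple objects, and the noise values here are placeholders to tune per setup.

```python
# Constant-velocity Kalman filter on one tracked centroid, updated every scan.
import numpy as np

dt = 0.1                                       # 10 Hz scans
F = np.eye(6); F[:3, 3:] = dt * np.eye(3)      # state = [x, y, z, vx, vy, vz]
H = np.hstack([np.eye(3), np.zeros((3, 3))])   # we only measure position
Q, R = 0.01 * np.eye(6), 0.1 * np.eye(3)       # process / measurement noise

def kf_step(x, P, z):
    x, P = F @ x, F @ P @ F.T + Q              # predict to the current scan time
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)             # Kalman gain
    x = x + K @ (z - H @ x)                    # update with the detection centroid z
    P = (np.eye(6) - K @ H) @ P
    return x, P

x, P = np.zeros(6), np.eye(6)                  # initial state and covariance
x, P = kf_step(x, P, np.array([12.0, -1.5, 0.3]))
```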
Common gotchas
- Reflective surfaces (metallic cars): ghost reflections. Aggressive filtering needed.
- Rain: water droplets reflect; the cloud fills with noise. Most production systems run rain detection plus intensity-based filtering when wet.
- Multi-LiDAR interference: two LiDARs in the same scene see each other's pulses. FMCW eliminates this; mechanical doesn't.
- Calibration drift: vibration shifts the sensor; mounting matters more than people expect.
Exercise
Download the KITTI Velodyne data. Use Open3D to load and visualize a single scan. Apply RANSAC ground segmentation; observe how it fails on slopes. Try Patchwork. The visual difference shows the field's progress over five years of papers.
Next
Object detection for robots — what 2D cameras give you when LiDAR is too expensive or too heavy.