Camera models, calibration, and distortion
The pinhole model, intrinsics, extrinsics, and distortion — and the 30-minute OpenCV workflow that turns a raw USB camera into a calibrated robotics sensor.
A camera that isn't calibrated is a camera that lies. Every robotics perception pipeline — SLAM, grasping, object pose estimation — starts from calibrated camera parameters. Here's the model, the math, and the 30-minute workflow to get a real camera calibrated.
The pinhole model
A point in 3D at (X, Y, Z) in camera coordinates projects to pixel (u, v) via:
u = fx · (X / Z) + cx
v = fy · (Y / Z) + cy
Four parameters: fx, fy (focal length in pixels) and cx, cy (optical center). Packed into a 3×3 intrinsics matrix K:
K = | fx   0  cx |
    |  0  fy  cy |
    |  0   0   1 |
For a typical USB webcam at 640×480: fx ≈ fy ≈ 600, (cx, cy) near the image center. For a phone camera: 1200–1800. For a wide-angle lens: 400 or less. (Focal length in pixels scales with resolution, so these numbers shift with the capture mode.)
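The projection is a couple of lines of NumPy. A quick sanity check, using placeholder intrinsics for a 640×480 webcam (substitute your own calibrated values):

import numpy as np

# Placeholder intrinsics for a 640x480 webcam -- not a real calibration
K = np.array([[600.0,   0.0, 320.0],
              [  0.0, 600.0, 240.0],
              [  0.0,   0.0,   1.0]])

def project(X, Y, Z, K):
    # u = fx * X/Z + cx,  v = fy * Y/Z + cy
    u = K[0, 0] * (X / Z) + K[0, 2]
    v = K[1, 1] * (Y / Z) + K[1, 2]
    return u, v

print(project(0.1, -0.05, 1.0, K))  # -> (380.0, 210.0)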
Extrinsics: where the camera is in the world
Intrinsics describe how a camera projects. Extrinsics describe where it is. Given a 4×4 transform T_wc (world → camera), any world point goes to pixels via:
[x_cam; y_cam; z_cam; 1] = T_wc · [X_world; Y_world; Z_world; 1]
[u; v; 1] = K · [x_cam/z_cam; y_cam/z_cam; 1]
Intrinsics are a property of the camera hardware and the lens — they don't change while you use the camera, as long as focus and zoom stay fixed. Extrinsics change every time you move the camera (or the robot).
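In code the chain is one matrix multiply plus the projection above. A minimal sketch with a made-up extrinsic (identity rotation, camera 2 m behind the world origin), reusing the placeholder K and import from the previous snippet:

# Made-up T_wc: identity rotation, translation of 2 m along the camera z axis
T_wc = np.eye(4)
T_wc[:3, 3] = [0.0, 0.0, 2.0]

p_world = np.array([0.1, -0.05, 1.0, 1.0])   # homogeneous world point
p_cam = T_wc @ p_world                       # now in camera coordinates
u, v, _ = K @ (p_cam[:3] / p_cam[2])         # pinhole projection to pixels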
Distortion: lenses lie
Real lenses bend light. The classical model has two types of distortion:
- Radial — straight lines curve, more at the edges. Three coefficients: k1, k2, k3.
- Tangential — slight skew from lens/sensor misalignment. Two coefficients: p1, p2.
Full OpenCV distortion model: (k1, k2, p1, p2, k3). Cheap wide-angle lenses have big k1. Phone cameras have small distortion. Fisheyes need a different model (cv2.fisheye).
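Written out, the model maps ideal normalized coordinates (x, y) = (X/Z, Y/Z) to distorted ones before K is applied. A direct transcription of the five-coefficient model (inverting it has no closed form, which is why OpenCV's undistortion routines iterate for you):

def distort(x, y, k1, k2, p1, p2, k3):
    # Brown-Conrady model on normalized coordinates, as OpenCV defines it
    r2 = x * x + y * y
    radial = 1 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
    x_d = x * radial + 2 * p1 * x * y + p2 * (r2 + 2 * x * x)
    y_d = y * radial + p1 * (r2 + 2 * y * y) + 2 * p2 * x * y
    return x_d, y_d   # apply K to these to get distorted pixel coordinates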
The 30-minute OpenCV calibration
You need: your camera, a printed checkerboard pattern (9×6 inner corners is standard), and Python with OpenCV.
Print the checkerboard from OpenCV's pattern, tape it to a rigid board (important — it must be flat), and measure the square size in meters.
import cv2
import numpy as np

CHECKERBOARD = (9, 6)   # inner corners per row and column
SQUARE_M = 0.024        # your measurement, in meters

# 3D corner positions in the board's own frame (the z = 0 plane)
objp = np.zeros((CHECKERBOARD[0] * CHECKERBOARD[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:CHECKERBOARD[0],
                       0:CHECKERBOARD[1]].T.reshape(-1, 2) * SQUARE_M

obj_points, img_points = [], []
cap = cv2.VideoCapture(0)
n_captured = 0

while n_captured < 20:
    ok, frame = cap.read()
    if not ok:
        continue
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    found, corners = cv2.findChessboardCorners(gray, CHECKERBOARD)
    if found:
        cv2.drawChessboardCorners(frame, CHECKERBOARD, corners, found)
        cv2.imshow('cal', frame)
        if cv2.waitKey(500) & 0xFF == ord(' '):   # space keeps this view
            obj_points.append(objp)
            img_points.append(corners)
            n_captured += 1
            print(f'captured {n_captured}/20')
    else:
        cv2.imshow('cal', frame)
        cv2.waitKey(30)

cap.release()
cv2.destroyAllWindows()

# rms is the overall reprojection error in pixels; rvecs and tvecs hold the
# estimated board pose for each captured view
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
print(f'RMS reprojection error: {rms:.3f} px')
print('K =', K)
print('distortion =', dist.flatten())
np.savez('calibration.npz', K=K, dist=dist)
Move the board through 20 different poses — tilted left, right, up, down, close, far. Press space on each. The more diverse your views, the more accurate the calibration. Aim for an RMS reprojection error (the rms value printed above) below 0.5 pixels for a usable calibration.
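If the overall RMS comes out high, per-view errors tell you which captures to drop and redo. A short check that reuses rvecs, tvecs, K, and dist from the listing above:

# Reproject the board corners using each view's estimated pose and compare
# against the corners actually detected in that view
for i, (op, ip) in enumerate(zip(obj_points, img_points)):
    proj, _ = cv2.projectPoints(op, rvecs[i], tvecs[i], K, dist)
    err = np.sqrt(np.mean(np.sum((ip - proj) ** 2, axis=2)))
    print(f'view {i}: {err:.3f} px RMS')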
Using the calibration
Two common things you'll do with K and dist:
1. Undistort an image
frame_undistorted = cv2.undistort(frame, K, dist)
Now straight lines in the world are straight in the image. Every downstream algorithm that assumes a pinhole camera — stereo, SLAM, pose estimation — wants undistorted images.
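One knob worth knowing: undistorting with the original K can leave black regions where pixels were pulled inward. cv2.getOptimalNewCameraMatrix trades cropping against field of view; a sketch reusing frame, K, and dist from above:

h, w = frame.shape[:2]
# alpha=0 crops to only valid pixels; alpha=1 keeps every source pixel
new_K, roi = cv2.getOptimalNewCameraMatrix(K, dist, (w, h), alpha=0)
undistorted = cv2.undistort(frame, K, dist, None, new_K)
x, y, rw, rh = roi
undistorted = undistorted[y:y + rh, x:x + rw]   # crop to the valid region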
2. Convert pixels to 3D rays
To reconstruct 3D from pixels, convert each pixel to a normalized ray in camera coordinates:
def pixel_to_ray(u, v, K):
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])
    return ray / np.linalg.norm(ray)   # normalize to a unit vector
This is the unit direction from the camera center through that pixel (note the explicit normalization: K⁻¹ · [u, v, 1] is not unit length on its own). Depth is the remaining unknown — a single monocular camera can't recover it. Stereo, RGB-D, or motion solves that.
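Once a depth is available from one of those sources, scaling the ray recovers the point. A hypothetical example at pixel (400, 300) and 1.5 m depth:

ray = pixel_to_ray(400, 300, K)
point_cam = ray * (1.5 / ray[2])   # scale so the z component equals the depth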
Common calibration mistakes
- Flimsy checkerboard. Paper curls. Tape it to a flat board or glue it to a plate. 5mm of warping ruins your calibration.
- Too few views. Minimum 15; 25+ is better. Include tilts, not just translations.
- Calibrating at one focal setting, using at another. If your camera has autofocus, lock it before calibrating. Different focus = different focal length.
- Tiny checkerboard. The corners need to be detectable. For a webcam, a printed A4 is fine; for a high-res camera on a meter-scale scene, use a bigger board.
When the pinhole model isn't enough
- Fisheye / very wide-angle lenses — use cv2.fisheye.calibrate, which fits the Kannala-Brandt model (the five-coefficient model above is Brown-Conrady).
- Depth cameras (RealSense, Azure Kinect) — factory-calibrated, but re-calibrate if accuracy matters. Each sensor in the rig (IR, RGB, projector) has its own intrinsics.
- Stereo rigs — add extrinsics between cameras and use cv2.stereoCalibrate.
What this unlocks
A calibrated camera is the precondition for: PnP (recovering pose from 2D-3D correspondences), triangulation (3D from two cameras), visual odometry, SLAM, aruco/apriltag localization, model-based object pose estimation, and essentially every downstream 3D perception pipeline. Thirty minutes of calibration gives you weeks of downstream value.
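As a taste of the first item: once calibration.npz exists, PnP is a handful of lines. The 2D-3D correspondences below are made up for illustration; in practice they come from a tag or feature detector.

import cv2
import numpy as np

data = np.load('calibration.npz')
K, dist = data['K'], data['dist']

# Four known 3D points on an object (meters) and their detected pixels
object_pts = np.array([[0, 0, 0], [0.1, 0, 0],
                       [0.1, 0.1, 0], [0, 0.1, 0]], np.float32)
image_pts = np.array([[320, 240], [420, 242],
                      [418, 340], [318, 338]], np.float32)

ok, rvec, tvec = cv2.solvePnP(object_pts, image_pts, K, dist)
R, _ = cv2.Rodrigues(rvec)   # rotation matrix: object frame -> camera frame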