RobotForge
Published · ~15 min read

Camera models, calibration, and distortion

The pinhole model, intrinsics, extrinsics, and distortion — and the 30-minute OpenCV workflow that turns a raw USB camera into a calibrated robotics sensor.

by RobotForge
#perception #camera #calibration #opencv

A camera that isn't calibrated is a camera that lies. Every robotics perception pipeline — SLAM, grasping, object pose estimation — starts from calibrated camera parameters. Here's the model, the math, and the 30-minute workflow to get a real camera calibrated.

The pinhole model

A point in 3D at (X, Y, Z) in camera coordinates projects to pixel (u, v) via:

u = fx · (X / Z) + cx
v = fy · (Y / Z) + cy

Four parameters: fx, fy (focal length in pixels) and cx, cy (optical center). Packed into a 3×3 intrinsics matrix K:

K = | fx   0   cx |
    |  0  fy   cy |
    |  0   0    1 |

For a typical USB webcam: fx ≈ fy ≈ 600, (cx, cy) near the image center. For a phone camera: 1200–1800. For a wide-angle: 400 or less.
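The projection above is two lines of arithmetic. A minimal sketch, using assumed webcam-like intrinsics (fx = fy = 600, optical center at 320, 240 — hypothetical values, not from any real calibration):

```python
import numpy as np

# Assumed intrinsics for a hypothetical 640x480 webcam.
K = np.array([[600.0,   0.0, 320.0],
              [  0.0, 600.0, 240.0],
              [  0.0,   0.0,   1.0]])

def project(point_cam, K):
    """Project a 3D point in camera coordinates to pixel coordinates."""
    X, Y, Z = point_cam
    u = K[0, 0] * (X / Z) + K[0, 2]
    v = K[1, 1] * (Y / Z) + K[1, 2]
    return u, v

u, v = project((0.1, -0.05, 2.0), K)   # a point 2 m away, slightly off-axis
print(u, v)   # -> 350.0 225.0
```

Note the division by Z: a point twice as far projects half as far from the optical center, which is exactly the perspective effect the pinhole model captures.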

Extrinsics: where the camera is in the world

Intrinsics describe how a camera projects. Extrinsics describe where it is. Given a 4×4 transform T_wc (world → camera), any world point goes to pixels via:

[x_cam; y_cam; z_cam; 1] = T_wc · [X_world; Y_world; Z_world; 1]
[u; v; 1] = K · [x_cam/z_cam; y_cam/z_cam; 1]

Intrinsics are a property of the camera body and lens: they don't change while you use the camera (unless you zoom or refocus). Extrinsics change every time you move the camera (or the robot).
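Chaining the two steps (extrinsics, then intrinsics) can be sketched like this; the transform and the world point are made-up values for illustration:

```python
import numpy as np

# Assumed intrinsics for a hypothetical 640x480 webcam.
K = np.array([[600.0,   0.0, 320.0],
              [  0.0, 600.0, 240.0],
              [  0.0,   0.0,   1.0]])

# Hypothetical world -> camera transform: identity rotation,
# points appear shifted 0.2 m along the camera's +X axis.
T_wc = np.eye(4)
T_wc[:3, 3] = [0.2, 0.0, 0.0]

def world_to_pixel(p_world, T_wc, K):
    p_cam = T_wc @ np.append(p_world, 1.0)   # homogeneous world -> camera
    uvw = K @ (p_cam[:3] / p_cam[2])         # normalize by depth, apply K
    return uvw[0], uvw[1]

u, v = world_to_pixel(np.array([0.0, 0.0, 2.0]), T_wc, K)
print(u, v)   # -> 380.0 240.0
```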

Distortion: lenses lie

Real lenses bend light. The classical model has two types of distortion:

  • Radial — straight lines curve, more at the edges. Three coefficients: k1, k2, k3.
  • Tangential — slight skew from lens/sensor misalignment. Two coefficients: p1, p2.

Full OpenCV distortion model: (k1, k2, p1, p2, k3). Cheap wide-angle lenses have a large (usually negative) k1; phone cameras have small distortion. Fisheyes need a different model (cv2.fisheye).
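For intuition, here is the forward (k1, k2, p1, p2, k3) model applied to a normalized image coordinate, i.e. (x, y) = (X/Z, Y/Z) before multiplying by K. The coefficient values are illustrative, not from any real lens:

```python
def distort(x, y, k1, k2, p1, p2, k3):
    """Apply the OpenCV (k1, k2, p1, p2, k3) distortion model
    to a normalized image coordinate (x, y)."""
    r2 = x * x + y * y                              # squared radius
    radial = 1 + k1 * r2 + k2 * r2**2 + k3 * r2**3  # radial scaling
    x_d = x * radial + 2 * p1 * x * y + p2 * (r2 + 2 * x * x)
    y_d = y * radial + p1 * (r2 + 2 * y * y) + 2 * p2 * x * y
    return x_d, y_d

# A hypothetical cheap wide-angle lens: strong negative k1 (barrel distortion).
x_d, y_d = distort(0.5, 0.0, -0.3, 0.0, 0.0, 0.0, 0.0)
print(x_d, y_d)   # -> 0.4625 0.0 — the point is pulled toward the center
```

Note the r² dependence: points near the center barely move, points near the edges move a lot, which is why straight lines curve most at the image border.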

The 30-minute OpenCV calibration

You need: your camera, a printed checkerboard pattern (9×6 inner corners is standard), and Python with OpenCV.

Print the checkerboard from OpenCV's pattern, tape it to a rigid board (important — it must be flat), and measure the square size in meters.

import cv2
import numpy as np

CHECKERBOARD = (9, 6)   # inner corners
SQUARE_M = 0.024        # your measurement

# 3D coordinates of the corners in the board's own frame (Z = 0 plane).
objp = np.zeros((CHECKERBOARD[0] * CHECKERBOARD[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:CHECKERBOARD[0],
                       0:CHECKERBOARD[1]].T.reshape(-1, 2) * SQUARE_M

obj_points, img_points = [], []
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001)

cap = cv2.VideoCapture(0)
n_captured = 0
while n_captured < 20:
    ok, frame = cap.read()
    if not ok:
        continue
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    found, corners = cv2.findChessboardCorners(gray, CHECKERBOARD)
    if found:
        # Refine corner locations to sub-pixel accuracy.
        corners = cv2.cornerSubPix(gray, corners, (11, 11), (-1, -1), criteria)
        cv2.drawChessboardCorners(frame, CHECKERBOARD, corners, found)
        cv2.imshow('cal', frame)
        if cv2.waitKey(500) & 0xFF == ord(' '):
            obj_points.append(objp)
            img_points.append(corners)
            n_captured += 1
            print(f'captured {n_captured}/20')
    else:
        cv2.imshow('cal', frame)
        cv2.waitKey(30)

cap.release()
cv2.destroyAllWindows()

rms, K, dist, _, _ = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)

print('RMS reprojection error:', rms)
print('K =', K)
print('distortion =', dist.flatten())
np.savez('calibration.npz', K=K, dist=dist)

Move the board through 20 different poses — tilted left, right, up, down, close, far. Press space on each. The more diverse your views, the more accurate the calibration. Aim for an RMS reprojection error (the first value cv2.calibrateCamera returns) below 0.5 pixels for a usable calibration.
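If you want to verify that error yourself (or break it down per view using cv2.projectPoints), the metric is just the root-mean-square pixel distance between projected model corners and detected corners. A pure-NumPy sketch with synthetic corner sets standing in for real detections:

```python
import numpy as np

def reprojection_rms(projected, detected):
    """RMS pixel error between projected and detected corners,
    both given as (N, 2) arrays."""
    err = np.linalg.norm(projected - detected, axis=1)
    return float(np.sqrt(np.mean(err ** 2)))

# Synthetic check: detections offset from projections by 0.3 px in u.
proj = np.zeros((10, 2))
det = proj + np.array([0.3, 0.0])
print(reprojection_rms(proj, det))   # -> 0.3
```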

Using the calibration

Two common things you'll do with K and dist:

1. Undistort an image

frame_undistorted = cv2.undistort(frame, K, dist)

Now straight lines in the world are straight in the image. Every downstream algorithm that assumes a pinhole camera — stereo, SLAM, pose estimation — wants undistorted images.

2. Convert pixels to 3D rays

To reconstruct 3D from pixels, convert each pixel to a normalized ray in camera coordinates:

def pixel_to_ray(u, v, K):
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])
    return ray / np.linalg.norm(ray)   # normalize to a unit direction

This is the unit-vector direction from the camera through that pixel. Depth is the remaining unknown — a single monocular camera can't recover it. Stereo, RGB-D, or motion solves that.
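Once depth does arrive (from a depth camera, stereo, or a known plane), back-projection is the same ray scaled so its Z component equals that depth. A sketch with assumed webcam-like intrinsics:

```python
import numpy as np

# Assumed intrinsics for a hypothetical 640x480 webcam.
K = np.array([[600.0,   0.0, 320.0],
              [  0.0, 600.0, 240.0],
              [  0.0,   0.0,   1.0]])

def pixel_depth_to_point(u, v, depth, K):
    """Back-project a pixel to a 3D point in camera coordinates,
    given its depth along the optical axis."""
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])  # Z component is 1
    return ray * depth                               # scale so Z == depth

p = pixel_depth_to_point(380.0, 240.0, 2.0, K)
print(p)   # -> [0.2 0.  2. ]
```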

Common calibration mistakes

  • Flimsy checkerboard. Paper curls. Tape it to a flat board or glue it to a plate. 5mm of warping ruins your calibration.
  • Too few views. Minimum 15; 25+ is better. Include tilts, not just translations.
  • Calibrating at one focal setting, using at another. If your camera has autofocus, lock it before calibrating. Different focus = different focal length.
  • Tiny checkerboard. The corners need to be detectable. For a webcam, a printed A4 is fine; for a high-res camera on a meter-scale scene, use a bigger board.

When the pinhole model isn't enough

  • Fisheye / very wide-angle lenses — use cv2.fisheye.calibrate, which implements the Kannala-Brandt model (the five-coefficient model above is the Brown-Conrady model).
  • Depth cameras (RealSense, Azure Kinect) — factory-calibrated, but re-calibrate if accuracy matters. Each sensor in the rig (IR, RGB, projector) has its own intrinsics.
  • Stereo rigs — add extrinsics between cameras and use cv2.stereoCalibrate.

What this unlocks

A calibrated camera is the precondition for: PnP (recovering pose from 2D-3D correspondences), triangulation (3D from two cameras), visual odometry, SLAM, aruco/apriltag localization, model-based object pose estimation, and essentially every downstream 3D perception pipeline. Thirty minutes of calibration gives you weeks of downstream value.
