Rigid-body transforms: SO(3) and SE(3) demystified
Rotations and poses in 3D, without trig soup. Learn SO(3), SE(3), homogeneous matrices, and quaternions — and when to use each.
The 2-DOF arm lesson got away with cos/sin because the world was flat. In 3D you need real tools. Two objects, four representations, and one rule for picking the right one — that's the whole chapter.
What you're actually representing
A rigid body in 3D has two pieces of information:
- Position — a 3-vector, no controversy
- Orientation — a rotation, which has four common representations and one real definition
Orientation is the hard part. Let's fix the vocabulary first.
SO(3): the set of all 3D rotations
SO(3) is the mathematician's name for "the set of all rigid rotations of 3D space." An SO(3) element is a 3×3 matrix with two properties:
R · RT = I(orthogonal)det(R) = +1(proper, not a mirror flip)
Any such matrix rotates a vector: v' = R · v. Compose rotations by multiplying matrices: R_total = R2 · R1 means "do R1, then R2." Order matters — SO(3) is non-commutative.
SE(3): position + orientation together
SE(3) is the set of rigid-body transforms — rotation and translation. The standard representation is a 4×4 homogeneous transformation matrix:
T = | R t |
| 0 1 |
where R is 3×3 and t is 3×1
Apply a transform to a point p by promoting to 4D with a 1 on the end:
[p_new; 1] = T · [p; 1]
And compose transforms by matrix multiplication: T_total = T2 · T1.
Why the 4×4 rigmarole? Because now composition, inversion, and application all use the same operator (@ in NumPy). No special cases. This is the form used everywhere in robotics code.
Four ways to write a rotation
1. Rotation matrix — 9 numbers
What SO(3) literally is. Easy to apply (v' = R @ v), easy to compose (matrix multiply), but wasteful in storage and prone to numerical drift. Renormalize after 1000 multiplications.
2. Euler angles — 3 numbers (roll, pitch, yaw)
The one that feels intuitive. Use for input/output only. They have 24 conventions (ZYX, XYZ, etc.), and gimbal lock at the poles will embarrass you on demo day. Never interpolate Euler angles directly.
3. Axis-angle — 4 numbers
A rotation is "by angle θ around unit axis k." The Rodrigues formula gives the rotation matrix:
R = I + sin(θ)·K + (1 - cos(θ))·K² where K is the skew-symmetric matrix of k
Good for thinking. Rare in code.
4. Quaternion — 4 numbers (x, y, z, w)
The representation most robotics code uses internally. Four numbers, unit-norm constraint, no gimbal lock, smooth interpolation (slerp). Composition is quaternion multiplication, not particularly intuitive but fast. ROS's geometry_msgs/Quaternion stores exactly this.
When to use which
| Use case | Best representation |
|---|---|
| Applying a rotation to a point | Rotation matrix |
| Storing & transmitting | Quaternion |
| Interpolating between two rotations | Quaternion (slerp) |
| Reading or writing a config file | Euler angles (humans understand them) |
| Analytical work, proofs | Axis-angle |
A clean NumPy toolkit
Don't write quaternion or rotation math from scratch. Use scipy.spatial.transform.Rotation:
import numpy as np
from scipy.spatial.transform import Rotation as R
# Rotation by 90° around z
r = R.from_euler('z', 90, degrees=True)
print(r.as_matrix()) # 3×3 rotation matrix
print(r.as_quat()) # (x, y, z, w)
print(r.as_rotvec()) # axis × angle
# Apply to a vector
v = np.array([1, 0, 0])
print(r.apply(v)) # [0, 1, 0]
# Compose: first r1, then r2
r1 = R.from_euler('x', 45, degrees=True)
r2 = R.from_euler('y', 45, degrees=True)
r_total = r2 * r1 # NOTE: reversed order vs matrices
# Interpolate 50/50 between two orientations
from scipy.spatial.transform import Slerp
times = [0, 1]
rots = R.from_quat([r1.as_quat(), r2.as_quat()])
slerp = Slerp(times, rots)
print(slerp(0.5).as_quat())
Homogeneous transforms in practice
def make_T(R_mat, t):
T = np.eye(4)
T[:3, :3] = R_mat
T[:3, 3] = t
return T
# World → camera frame
T_wc = make_T(R.from_euler('z', 90, degrees=True).as_matrix(),
np.array([1.0, 2.0, 0.5]))
# Inverse: camera → world
def invert_T(T):
R_mat = T[:3, :3]
t = T[:3, 3]
T_inv = np.eye(4)
T_inv[:3, :3] = R_mat.T
T_inv[:3, 3] = -R_mat.T @ t
return T_inv
# Chain: world → camera → object
T_co = make_T(np.eye(3), np.array([0, 0, 1.5]))
T_wo = T_wc @ T_co
That's 90% of the SE(3) code you will ever write. Keep it in a utility module and move on.
The frame-naming habit that saves hours
Name every transform by its frames: T_wc = world-from-camera = "the pose of the camera expressed in the world frame." When you compose, the inner letters must match: T_wo = T_wc @ T_co. If they don't, you've made the classic inversion bug. This convention is worth its weight in gold.
What to know before moving on
Before the next kinematics lesson, make sure you can:
- Explain in one sentence what SO(3) and SE(3) are.
- Say when to use a rotation matrix, a quaternion, and Euler angles — without looking it up.
- Compose two transforms correctly and invert one.
If yes: on to Denavit-Hartenberg (legacy but still alive) and Product of Exponentials (the modern replacement).
Comments
Sign in to post a comment.