RobotForge
Published·~18 min

Rigid-body transforms: SO(3) and SE(3) demystified

Rotations and poses in 3D, without trig soup. Learn SO(3), SE(3), homogeneous matrices, and quaternions — and when to use each.

by RobotForge
#kinematics#math#transforms

The 2-DOF arm lesson got away with cos/sin because the world was flat. In 3D you need real tools. Two objects, four representations, and one rule for picking the right one — that's the whole chapter.

What you're actually representing

A rigid body in 3D has two pieces of information:

  • Position — a 3-vector, no controversy
  • Orientation — a rotation, which has four common representations and one real definition

Orientation is the hard part. Let's fix the vocabulary first.

SO(3): the set of all 3D rotations

SO(3) is the mathematician's name for "the set of all rigid rotations of 3D space." An SO(3) element is a 3×3 matrix with two properties:

  • R · RT = I (orthogonal)
  • det(R) = +1 (proper, not a mirror flip)

Any such matrix rotates a vector: v' = R · v. Compose rotations by multiplying matrices: R_total = R2 · R1 means "do R1, then R2." Order matters — SO(3) is non-commutative.

SE(3): position + orientation together

SE(3) is the set of rigid-body transforms — rotation and translation. The standard representation is a 4×4 homogeneous transformation matrix:

T = | R  t |
    | 0  1 |

where R is 3×3 and t is 3×1

Apply a transform to a point p by promoting to 4D with a 1 on the end:

[p_new; 1] = T · [p; 1]

And compose transforms by matrix multiplication: T_total = T2 · T1.

Why the 4×4 rigmarole? Because now composition, inversion, and application all use the same operator (@ in NumPy). No special cases. This is the form used everywhere in robotics code.

Four ways to write a rotation

1. Rotation matrix — 9 numbers

What SO(3) literally is. Easy to apply (v' = R @ v), easy to compose (matrix multiply), but wasteful in storage and prone to numerical drift. Renormalize after 1000 multiplications.

2. Euler angles — 3 numbers (roll, pitch, yaw)

The one that feels intuitive. Use for input/output only. They have 24 conventions (ZYX, XYZ, etc.), and gimbal lock at the poles will embarrass you on demo day. Never interpolate Euler angles directly.

3. Axis-angle — 4 numbers

A rotation is "by angle θ around unit axis k." The Rodrigues formula gives the rotation matrix:

R = I + sin(θ)·K + (1 - cos(θ))·K²  where K is the skew-symmetric matrix of k

Good for thinking. Rare in code.

4. Quaternion — 4 numbers (x, y, z, w)

The representation most robotics code uses internally. Four numbers, unit-norm constraint, no gimbal lock, smooth interpolation (slerp). Composition is quaternion multiplication, not particularly intuitive but fast. ROS's geometry_msgs/Quaternion stores exactly this.

When to use which

Use case Best representation
Applying a rotation to a pointRotation matrix
Storing & transmittingQuaternion
Interpolating between two rotationsQuaternion (slerp)
Reading or writing a config fileEuler angles (humans understand them)
Analytical work, proofsAxis-angle

A clean NumPy toolkit

Don't write quaternion or rotation math from scratch. Use scipy.spatial.transform.Rotation:

import numpy as np
from scipy.spatial.transform import Rotation as R

# Rotation by 90° around z
r = R.from_euler('z', 90, degrees=True)
print(r.as_matrix())      # 3×3 rotation matrix
print(r.as_quat())        # (x, y, z, w)
print(r.as_rotvec())      # axis × angle

# Apply to a vector
v = np.array([1, 0, 0])
print(r.apply(v))         # [0, 1, 0]

# Compose: first r1, then r2
r1 = R.from_euler('x', 45, degrees=True)
r2 = R.from_euler('y', 45, degrees=True)
r_total = r2 * r1         # NOTE: reversed order vs matrices

# Interpolate 50/50 between two orientations
from scipy.spatial.transform import Slerp
times = [0, 1]
rots = R.from_quat([r1.as_quat(), r2.as_quat()])
slerp = Slerp(times, rots)
print(slerp(0.5).as_quat())

Homogeneous transforms in practice

def make_T(R_mat, t):
    T = np.eye(4)
    T[:3, :3] = R_mat
    T[:3, 3] = t
    return T

# World → camera frame
T_wc = make_T(R.from_euler('z', 90, degrees=True).as_matrix(),
              np.array([1.0, 2.0, 0.5]))

# Inverse: camera → world
def invert_T(T):
    R_mat = T[:3, :3]
    t = T[:3, 3]
    T_inv = np.eye(4)
    T_inv[:3, :3] = R_mat.T
    T_inv[:3, 3] = -R_mat.T @ t
    return T_inv

# Chain: world → camera → object
T_co = make_T(np.eye(3), np.array([0, 0, 1.5]))
T_wo = T_wc @ T_co

That's 90% of the SE(3) code you will ever write. Keep it in a utility module and move on.

The frame-naming habit that saves hours

Name every transform by its frames: T_wc = world-from-camera = "the pose of the camera expressed in the world frame." When you compose, the inner letters must match: T_wo = T_wc @ T_co. If they don't, you've made the classic inversion bug. This convention is worth its weight in gold.

What to know before moving on

Before the next kinematics lesson, make sure you can:

  • Explain in one sentence what SO(3) and SE(3) are.
  • Say when to use a rotation matrix, a quaternion, and Euler angles — without looking it up.
  • Compose two transforms correctly and invert one.

If yes: on to Denavit-Hartenberg (legacy but still alive) and Product of Exponentials (the modern replacement).

Comments

    Sign in to post a comment.