Underactuated swing-up with energy shaping
Acrobots, cart-poles, walking robots — all have more degrees of freedom than actuators. You can't dictate every joint. Instead you exploit the dynamics. Energy shaping is the trick that does it.
A cart-pole has two degrees of freedom (cart position, pole angle) and one actuator (cart force). You can't independently command both. Welcome to underactuated control — where you have to exploit the dynamics instead of fighting them. The classical recipe: shape the system's energy until it has just enough to swing up to the unstable equilibrium, then catch it with LQR.
What "underactuated" means
If a system's mass matrix is full-rank but you have fewer actuators than degrees of freedom, you're underactuated. Examples:
- Cart-pole: cart force, but no torque on the pole.
- Acrobot: torque at one joint of a 2-DOF arm; the other joint is passive.
- Pendubot: same as acrobot but the actuated joint is the other one.
- Quadrotor: 4 propellers, 6 DOF (3 translation + 3 rotation). Can't translate sideways without first tilting.
- Walking robots: ankle has limited torque; you can't directly accelerate the CoM horizontally.
Full feedback linearization (cancel the dynamics, then command whatever motion you like) doesn't work: you don't have enough actuators to cancel everything. So the strategy changes.
Energy shaping: the key idea
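The total mechanical energy of the pole is
$$E = \tfrac{1}{2} m l^2 \dot\theta^2 - m g l \cos\theta$$
(kinetic + potential), with $\theta = 0$ hanging down and $\theta = \pi$ upright. For a cart-pole with the pole down and at rest, $E = -mgl$. For the pole upright and at rest, $E = +mgl$. To swing up, you need to add $2mgl$ of energy.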
You can't push the pole directly. But you can pump energy by moving the cart. The trick: at each moment, push the cart in a direction that does positive work against the swinging pole.
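Practical control law:
$$u = k_{\mathrm{swing}}\,(E_{\mathrm{des}} - E)\,\operatorname{sign}(\dot\theta \cos\theta)$$
where $E_{\mathrm{des}} = mgl$ is the energy of the upright pole at rest. The overall sign depends on your angle and force conventions.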
Translated: "if the system has too little energy, accelerate the cart in the direction that adds energy to the pendulum's swing." Run this at every timestep. The pole's swing amplitude grows. Eventually it gets close to the upright.
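To watch the pump at work, here is a minimal sketch that simulates only the pole and treats the cart acceleration `a` as the direct input, ignoring the cart's own dynamics. The parameter values, the acceleration limit `a_max`, and the sign convention for how `a` enters the pole equation are assumptions made for illustration; with the opposite convention the sign of the law flips.

```python
import numpy as np

# Hypothetical parameters, chosen only for illustration.
m, l, g = 1.0, 1.0, 9.81      # pole mass [kg], length [m], gravity [m/s^2]
k_swing, a_max = 1.0, 5.0     # pump gain and cart-acceleration limit [m/s^2]
dt, t_end = 1e-3, 30.0        # integration step and duration [s]

def pole_energy(theta, theta_dot):
    """Pole energy with theta = 0 hanging down and theta = pi upright."""
    return 0.5 * m * l**2 * theta_dot**2 - m * g * l * np.cos(theta)

E_des = m * g * l             # energy of the upright pole at rest
theta, theta_dot = 0.1, 0.0   # start slightly off the bottom so the pump can begin
E_start = pole_energy(theta, theta_dot)

for _ in range(int(t_end / dt)):
    E = pole_energy(theta, theta_dot)
    # Energy-shaping law, saturated at the assumed acceleration limit
    a = k_swing * (E_des - E) * np.sign(theta_dot * np.cos(theta))
    a = np.clip(a, -a_max, a_max)
    # Assumed convention: cart acceleration enters as +(a / l) * cos(theta)
    theta_ddot = -(g / l) * np.sin(theta) + (a / l) * np.cos(theta)
    # Semi-implicit Euler step (velocity first, then position)
    theta_dot += dt * theta_ddot
    theta += dt * theta_dot

E_end = pole_energy(theta, theta_dot)
print(f"energy: start {E_start:.2f} J, end {E_end:.2f} J, target {E_des:.2f} J")
```

Starting exactly at rest at the bottom would leave the sign term at zero and the pump would never start, which is why the sketch begins with a small initial angle.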
The catch step: switch to LQR
Energy shaping pumps energy but doesn't stabilize anything — the pole reaches the top with non-zero velocity and falls past. The classical fix:
- Linearize the system around the upright equilibrium.
- Design an LQR controller for the linearized system.
- Switch from energy shaping to LQR when the state enters a small neighborhood of the upright that lies inside the LQR controller's basin of attraction.
The LQR catches the pole and stabilizes it inverted. Done — that's classical underactuated swing-up in one paragraph.
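The linearize-and-design steps can be sketched as follows. This assumes a frictionless cart-pole with a point-mass pole, the state ordering `[x, theta - pi, x_dot, theta_dot]`, and a horizontal force on the cart as input. The masses, length, and the `Q` and `R` weights are hypothetical values, and the signs in `A` and `B` depend on the angle and force conventions you pick.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Hypothetical parameters, chosen only for illustration.
M, m, l, g = 1.0, 0.1, 0.5, 9.81  # cart mass, pole mass, pole length, gravity

# Linearization about the upright equilibrium.
# State: [x, theta - pi, x_dot, theta_dot]; input: force on the cart.
A = np.array([
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, m * g / M, 0.0, 0.0],
    [0.0, (M + m) * g / (M * l), 0.0, 0.0],
])
B = np.array([[0.0], [0.0], [1.0 / M], [1.0 / (M * l)]])

# Assumed cost weights: penalize angle error most, keep the input cheap.
Q = np.diag([1.0, 10.0, 1.0, 1.0])
R = np.array([[0.1]])

# Solve the continuous-time algebraic Riccati equation, then K = R^-1 B^T P.
P = solve_continuous_are(A, B, Q, R)
K_lqr = np.linalg.solve(R, B.T @ P)

# The closed loop A - B K should have all eigenvalues in the left half-plane.
closed_loop_eigs = np.linalg.eigvals(A - B @ K_lqr)
print("K_lqr =", K_lqr)
print("closed-loop eigenvalues:", closed_loop_eigs)
```

The resulting `K_lqr` has shape `(1, 4)` and is applied as `u = -K_lqr @ delta`, where `delta` is the deviation from the upright state.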
Why this works (the Lyapunov story)
Åström and Furuta (1996) showed that the energy-shaping law is Lyapunov-stable: the candidate $V = \tfrac{1}{2}(E - E_{\mathrm{des}})^2$ has $\dot V \le 0$. The energy converges to $E_{\mathrm{des}}$.
It doesn't converge to a specific configuration — only to the energy level. That's fine: among states with the right energy, the upright is a small subset, and LQR finishes the job.
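A short derivation sketch shows why the candidate cannot increase. It assumes a point-mass pole, the cart acceleration $a$ treated as the input, and a sign convention in which $a$ enters the pole equation with a positive $\cos\theta$ term; with the opposite convention the sign of the control law flips.

```latex
\begin{aligned}
m l^2 \ddot\theta + m g l \sin\theta &= m l\, a \cos\theta \\
\dot E = \dot\theta \left( m l^2 \ddot\theta + m g l \sin\theta \right) &= m l\, a\, \dot\theta \cos\theta \\
a &= k\,(E_{\mathrm{des}} - E)\,\operatorname{sign}(\dot\theta \cos\theta) \\
V = \tfrac{1}{2}(E - E_{\mathrm{des}})^2 \;\Rightarrow\; \dot V = (E - E_{\mathrm{des}})\,\dot E &= -k\, m l\,(E - E_{\mathrm{des}})^2\,\lvert \dot\theta \cos\theta \rvert \le 0
\end{aligned}
```

$\dot V$ vanishes whenever $\dot\theta \cos\theta = 0$, so the pump does nothing at those instants. In particular, a pole sitting exactly at rest at the bottom stays there until something nudges it.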
Implementation in MuJoCo
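import numpy as np

# Assumed to be defined elsewhere: m, l, g (pole mass, length, gravity),
# k_swing, u_max, and K_lqr (a 1x4 gain for the upright linearization).
def controller(state, time):
    x, theta, x_dot, theta_dot = state  # theta = 0 hanging down, pi upright
    # Compute current pole energy
    E = 0.5 * m * l**2 * theta_dot**2 - m * g * l * np.cos(theta)
    E_des = m * g * l  # upright, at rest
    # Angle error from upright, wrapped to [-pi, pi)
    theta_err = theta % (2 * np.pi) - np.pi
    # Distance from upright (the linearized state)
    delta = np.array([x, theta_err, x_dot, theta_dot])
    if abs(theta_err) < 0.4 and abs(theta_dot) < 4.0:
        # In LQR basin: stabilize
        u = -K_lqr @ delta
    else:
        # Energy-shaping pump; the sign depends on your angle/force conventions
        u = k_swing * (E_des - E) * np.sign(theta_dot * np.cos(theta))
    return np.clip(u, -u_max, u_max)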
About twenty lines for a working swing-up controller. Tune k_swing so the swing builds at a reasonable rate, set u_max to match your actuator, and compute K_lqr with scipy.linalg.solve_continuous_are.
Beyond the cart-pole — what underactuation buys you
- Walking — bipedal walking is fundamentally underactuated. You can't push the floor laterally; you ride your falling momentum, catching it with the next foot. Modern controllers shape the CoM trajectory in the same energetic way.
- Quadrotor agility — fast quadrotor maneuvers exploit the coupling between attitude and translation. You can't translate without tilting first.
- Soft robots — many soft actuators are underactuated: a compliant hand grasps by exploiting the geometry, not by independently positioning every finger.
- Energy-efficient locomotion — passive-dynamic walkers (Mark Spong, Tad McGeer) walk down slopes with no actuation at all, using gravity. Adding minimal torque turns them into efficient bipeds.
The MIT 6.832 underactuated robotics course (Tedrake)
Russ Tedrake's course is the canonical text. The book / video lectures (free) cover:
- Pendulum, cart-pole, acrobot — analytical solutions and intuition.
- Trajectory optimization — when you can't analyze, optimize.
- Lyapunov analysis with sum-of-squares — computational stability proofs.
- Walking and running models.
- Modern reinforcement learning for underactuated systems.
If you're going deep into nonlinear control, this is the place.
Modern RL angle
Reinforcement learning, in 2026, dominates underactuated swing-up benchmarks. PPO with a sparse reward (given only for "pole upright") solves cart-pole in a few minutes on a laptop. SAC handles harder cases like the cart-pole's longer cousins.
RL doesn't give you stability proofs, doesn't give you tunable basins of attraction, and is brittle to perturbations. But it does work, and for many tasks where you don't need analytical guarantees, it's the path of least resistance.
The serious teams use both: RL discovers controllers, classical Lyapunov analysis verifies them post-hoc.
Common implementation gotchas
- Wrong basin size: switching from energy shaping to LQR too early — pole isn't yet near upright, LQR diverges. Tune the switching threshold experimentally.
- Saturation: real actuators have force limits. Energy shaping commands can saturate at the wrong moment, killing the pump. Reduce k_swing or use clipping with anti-windup.
- Friction: dissipates energy continuously. The energy you pump must exceed friction losses or the swing won't grow.
- Limit cycles: if the quantity driving the pump (the energy error or the sign term) oscillates near zero, the controller chatters. Add a deadband.
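One way to add a deadband is to replace the raw sign function with a version that returns zero inside a small band around zero. The helper name `deadband_sign` and the threshold `eps` are hypothetical; this is a minimal sketch, not the only way to suppress chatter.

```python
import numpy as np

def deadband_sign(s, eps):
    """Return sign(s), but 0.0 while |s| <= eps, to avoid chattering near zero."""
    return 0.0 if abs(s) <= eps else float(np.sign(s))

# Hypothetical threshold; tune it against your sensor noise.
eps = 0.01
print(deadband_sign(0.005, eps), deadband_sign(-0.5, eps), deadband_sign(2.0, eps))
```

In the swing-up controller this would stand in for np.sign, for example `u = k_swing * (E_des - E) * deadband_sign(theta_dot * np.cos(theta), eps)`. A band that is too wide stalls the pump near the turning points of the swing, so keep `eps` small.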
Exercise
In MuJoCo or PyBullet, simulate a cart-pole with reasonable mass/length values. Implement the swing-up controller above. Tune k_swing and the switching threshold. Then add 5% friction to the cart and re-tune. Then add a 1 N random disturbance every 0.5 seconds and re-tune. Each addition forces the controller to be more robust; the energy-shaping framework still applies.
Next
Real-time control — what changes when your control loop must hit 1 kHz with no jitter. RTOS, lock-free queues, and the kernel tricks that separate hobby control from production.