
MuJoCo and MJX: physics for learning

Free since 2021, Apache-2.0 open source since 2022, batched on GPU since 2023: MuJoCo is now the RL default. Here's the XML format, the Python API, and how MJX runs thousands of environments in parallel on one GPU.

by RobotForge
#simulators #mujoco #mjx #rl

MuJoCo used to be the $3,000/year research sim. DeepMind acquired it in 2021 and made it free, then open-sourced it under Apache 2.0 in 2022. Since MuJoCo 3.0 (2023), its JAX port (MJX) runs batched on GPU at thousands of parallel environments. For RL on robots in 2026, this is the default. Here's everything you need to start using it tomorrow.

Why MuJoCo specifically

  • Contacts that don't explode. MuJoCo's soft-constraint formulation makes multi-contact simulation (quadruped feet, manipulators grasping) stable where PyBullet and Gazebo often drift or penetrate.
  • Accurate enough for sim-to-real. DeepMind and many commercial labs deploy policies trained in MuJoCo onto real hardware with mild randomization.
  • Fast. CPU MuJoCo is already 5–10× faster than Gazebo for the same scene. MJX is another 100–1000× faster on a GPU.
  • Minimal install. pip install mujoco and you're done; a quick smoke test follows this list.
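
A quick way to confirm the install works, sketched with an inline MJCF string (just a free-falling box, not a model from this post):

import mujoco

# a free-falling box, defined inline
xml = "<mujoco><worldbody><body><freejoint/><geom type='box' size='.1 .1 .1'/></body></worldbody></mujoco>"
model = mujoco.MjModel.from_xml_string(xml)
data = mujoco.MjData(model)
mujoco.mj_step(model, data)
print(mujoco.__version__, data.qpos[2])   # version, and z after one step of gravity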

The MJCF file: describing a robot to MuJoCo

MuJoCo uses MJCF, its own XML format. Example: a 2-link planar arm.

<mujoco>
  <option timestep="0.01" integrator="RK4"/>
  <worldbody>
    <body name="link1" pos="0 0 1">
      <joint name="shoulder" type="hinge" axis="0 0 1"/>
      <geom type="capsule" size="0.05" fromto="0 0 0  0.5 0 0" rgba=".2 .7 1 1"/>
      <body name="link2" pos="0.5 0 0">
        <joint name="elbow" type="hinge" axis="0 0 1"/>
        <geom type="capsule" size="0.04" fromto="0 0 0  0.4 0 0" rgba=".2 1 .5 1"/>
      </body>
    </body>
  </worldbody>
  <actuator>
    <motor joint="shoulder" ctrlrange="-1 1"/>
    <motor joint="elbow" ctrlrange="-1 1"/>
  </actuator>
</mujoco>

Three sections: option (sim settings), worldbody (the physics objects in a tree of bodies, joints, and geoms), and actuator (motors that can command joints). That's the skeleton. Real robots add sensor, tendon, and contact blocks.
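
As a sketch of what the sensor block could look like for this arm (the tip site is an illustrative addition, not part of the model above):

<!-- inside link2: a named site marking the fingertip -->
<site name="tip" pos="0.4 0 0" size="0.01"/>

<!-- at the top level, alongside <actuator> -->
<sensor>
  <jointpos joint="shoulder"/>
  <jointvel joint="elbow"/>
  <framepos objtype="site" objname="tip"/>
</sensor>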

MJCF is more expressive than URDF. It supports proper actuator models, compliance, and contact parameters URDF can't express. Converting URDF → MJCF is common; the reverse is lossy.
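
One common route, sketched here with a placeholder 'robot.urdf': MuJoCo's own loader reads URDF directly, and you can dump the result as MJCF to hand-edit what URDF couldn't express.

import mujoco

model = mujoco.MjModel.from_xml_path('robot.urdf')   # the loader accepts .urdf
mujoco.mj_saveLastXML('robot_converted.xml', model)  # write MJCF for hand-editing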

The Python API

import mujoco

model = mujoco.MjModel.from_xml_path('arm.xml')
data = mujoco.MjData(model)

# Step the simulation
for _ in range(1000):
    data.ctrl[0] = 0.5       # torque on shoulder
    data.ctrl[1] = -0.3      # torque on elbow
    mujoco.mj_step(model, data)

# Read state
print(data.qpos)             # joint positions
print(data.qvel)             # joint velocities
print(data.site_xpos)        # site world positions (empty unless the MJCF defines sites)

That's the entire core loop of a MuJoCo simulation. Everything else (visualization, sensors, randomization) is on top of this.
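
For example, episode resets and domain randomization are a few lines on top of that loop. A sketch:

import numpy as np

mujoco.mj_resetData(model, data)                            # zero the state
data.qpos[:] = np.random.uniform(-0.1, 0.1, size=model.nq)  # perturb joint angles
mujoco.mj_forward(model, data)   # recompute derived quantities without advancing time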

Visualizing the sim

import time
import mujoco.viewer

with mujoco.viewer.launch_passive(model, data) as viewer:
    while viewer.is_running():
        mujoco.mj_step(model, data)
        viewer.sync()
        time.sleep(model.opt.timestep)   # crude real-time pacing

A window pops up; drag to orbit, scroll to zoom. The viewer runs in a separate thread so your control code isn't blocked, and the sleep keeps the simulation near real time instead of racing ahead of the clock.
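
If you need frames as arrays (camera observations, videos) rather than a window, the bindings also include an offscreen renderer. A minimal sketch:

renderer = mujoco.Renderer(model, height=240, width=320)
mujoco.mj_forward(model, data)    # make sure derived quantities are current
renderer.update_scene(data)
pixels = renderer.render()        # (240, 320, 3) uint8 RGB array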

MJX: the batched GPU version

CPU MuJoCo runs one simulation at a time. MJX runs thousands in parallel by implementing the same physics in JAX — a pure-functional array library that JIT-compiles to GPU (or TPU, or CPU vectorized).

import mujoco
from mujoco import mjx
import jax
import jax.numpy as jnp

model = mujoco.MjModel.from_xml_path('arm.xml')
mjx_model = mjx.put_model(model)

# 1024 parallel environments
BATCH = 1024
mjx_data = jax.vmap(lambda k: mjx.make_data(mjx_model))(jnp.arange(BATCH))

# vmap over the batch first, then JIT the whole batched step once;
# building jax.vmap(step) inside the loop would retrace every iteration
def step_one(data, ctrl):
    data = data.replace(ctrl=ctrl)
    return mjx.step(mjx_model, data)

step = jax.jit(jax.vmap(step_one))

ctrl = jnp.zeros((BATCH, model.nu))
for _ in range(1000):
    mjx_data = step(mjx_data, ctrl)

1024 simulations, one GPU, one JIT-compiled step function. On an RTX 4090, a model this tiny runs on the order of a hundred million physics steps per second across the batch; bigger robots are slower, but still orders of magnitude beyond CPU. That throughput is why RL on MuJoCo became practical for individuals from 2023 on.
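
Don't take the number on faith; here's a rough way to time it yourself, assuming the step function, ctrl, and BATCH from the snippet above:

import time

mjx_data = step(mjx_data, ctrl)        # first call pays the JIT compilation cost
jax.block_until_ready(mjx_data.qpos)

N = 100
t0 = time.perf_counter()
for _ in range(N):
    mjx_data = step(mjx_data, ctrl)
jax.block_until_ready(mjx_data.qpos)   # wait for async GPU work to finish
print(f"{BATCH * N / (time.perf_counter() - t0):,.0f} steps/sec")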

Training an RL policy

You don't train MJX from scratch. You use Brax or mujoco_playground — both wrap MJX into Gym-style environments and ship PPO/SAC implementations out of the box.

mujoco_playground is the DeepMind-maintained collection and currently the best starting point:

pip install playground brax

from mujoco_playground import registry, wrapper
env = registry.load('CartpoleBalance')
env = wrapper.wrap_for_brax_training(env)

# Now it's a standard Brax env, train with PPO as usual

Ten lines, thousands of parallel cartpoles, 30 minutes to a working PPO policy on a mid-range GPU. This is the stack that changed what hobbyists can do in 2023–26.
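
"As usual" means Brax's PPO trainer. A sketch with illustrative hyperparameters (the numbers are placeholders, not tuned values):

from brax.training.agents.ppo import train as ppo

make_inference_fn, params, metrics = ppo.train(
    environment=env,
    num_timesteps=10_000_000,   # placeholder training budget
    num_envs=2048,              # parallel MJX environments
    episode_length=1000,
    batch_size=512,
)
policy = make_inference_fn(params)   # jittable (obs, rng) -> (action, extras)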

The gotchas

  • MJX doesn't support everything. Tendons, soft-body contact, and some sensor types are CPU-only. Check the compatibility matrix before committing.
  • JIT compilation is slow the first time. Expect 30–90 seconds before your first step runs. Subsequent steps are fast; enable JAX's persistent compilation cache to skip recompiles across runs.
  • JAX is functional. No mutating state in place. If you're coming from PyTorch, budget a week for the mental adjustment; see the sketch after this list.
  • URDF import is incomplete. MuJoCo's loader reads URDF directly (MjModel.from_xml_path accepts .urdf files), but URDF can't express actuators or contact parameters, so it gets you maybe 80% there; review the generated MJCF before training a policy on it.
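
What the functional adjustment looks like in practice, using the batched mjx_data from the MJX section:

# the PyTorch habit -- an in-place write -- raises on immutable JAX arrays:
# mjx_data.ctrl[:, 0] = 0.5

# the JAX way: .at[].set() returns a new array, .replace() a new struct
new_ctrl = mjx_data.ctrl.at[:, 0].set(0.5)
mjx_data = mjx_data.replace(ctrl=new_ctrl)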

When MuJoCo is not the right sim

  • You need photorealistic camera renders → use Isaac Sim or MuJoCo with the Madrona renderer.
  • You need a ROS-native workflow → use Gazebo.
  • You need soft bodies, fluids, deformables → MuJoCo is limited here; look at Drake or specialty sims.

A weekend project

Clone mujoco_playground. Pick the humanoid walking env. Train PPO for an hour. Watch the learning curve. Then open the MJCF, change link masses by 20%, retrain. Did the policy generalize? That one experiment teaches you more about sim-to-real than a semester of papers.
