Jetson Orin for on-board AI
The embedded-AI platform that runs most robotics inference today — with honest latency numbers. What each variant does, the SDK landscape, and the gotchas that bite first-time users.
Jetson is NVIDIA's edge-AI line. For any robot running deep models on-board — perception, VLAs, motion planning — Jetson is overwhelmingly the platform of choice in 2026. Cost vs Raspberry Pi 5 is 3–10× higher; capability for AI workloads is 10–100× higher. Here's the lineup, the SDK story, and the practical reality of deploying.
The 2026 Jetson Orin lineup
| Model | AI TOPS | RAM | Power | Price (DevKit) |
|---|---|---|---|---|
| Orin Nano 4GB | 20 | 4 GB | 5–10 W | $249 |
| Orin Nano 8GB | 40 | 8 GB | 7–15 W | $499 |
| Orin NX 8GB | 70 | 8 GB | 10–20 W | $700 |
| Orin NX 16GB | 100 | 16 GB | 10–25 W | $900 |
| AGX Orin 64GB | 275 | 64 GB | 15–60 W | $2000 |
At the AGX Orin level, pricing is roughly that of an RTX 3060 desktop GPU, but in a 15–60 W envelope and a form factor small enough to mount on a robot.
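One way to read the lineup table is efficiency per dollar and per watt. A quick sketch using the table's own figures (taking each board's upper power number as its sustained budget):

```python
# Compare Orin variants on TOPS per dollar and TOPS per watt,
# using (AI TOPS, DevKit price USD, max power W) from the table above.
boards = {
    "Orin Nano 4GB": (20, 249, 10),
    "Orin Nano 8GB": (40, 499, 15),
    "Orin NX 8GB":   (70, 700, 20),
    "Orin NX 16GB":  (100, 900, 25),
    "AGX Orin 64GB": (275, 2000, 60),
}

for name, (tops, price, watts) in boards.items():
    print(f"{name:14s}  {tops/price:.3f} TOPS/$   {tops/watts:.1f} TOPS/W")
```

The AGX Orin wins on TOPS per watt, while the mid-range boards cluster together on TOPS per dollar, which is part of why the Orin Nano 8GB is the default hobbyist pick.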
Picking the right Jetson
- Orin Nano 8GB: hobby robotics default. Runs YOLO + SLAM + light VLA inference at usable rates.
- Orin NX 16GB: serious mobile manipulator. Runs OpenVLA fine-tuned at ~5 fps; multiple perception models simultaneously.
- AGX Orin: autonomous-driving research, multi-camera fusion, large VLAs, full SLAM stacks.
For RobotForge-grade hobbyist work, Orin Nano 8GB is the right starting point. Upgrade only when you hit specific limits.
JetPack: the SDK
JetPack is NVIDIA's bundled SDK: Ubuntu OS + CUDA + cuDNN + TensorRT + ROS support. Flash it onto an SD card or NVMe; boot up; everything works.
Key components:
- CUDA: GPU programming. Most ML libraries use it transparently.
- TensorRT: inference optimization. 3–10× speedup vs raw PyTorch on the same hardware.
- DeepStream: video analytics framework. For multi-camera pipelines.
- Isaac ROS: NVIDIA's ROS 2 packages with GPU acceleration. SLAM, perception, VPI image processing.
Honest latency numbers
Practical inference latencies on Orin Nano 8GB (TensorRT, INT8 where possible):
| Model | Input | Latency |
|---|---|---|
| YOLOv8n | 640×640 | ~10 ms |
| YOLOv8m | 640×640 | ~50 ms |
| SAM-ViT-B (encoder) | 1024×1024 | ~500 ms |
| Depth Anything V2 small | 518×518 | ~80 ms |
| SmolVLA (450M) | 2 cameras + state | ~100 ms |
| OpenVLA (7B) | 2 cameras + state | ~600 ms (action chunked) |
For real-time control loops, run lightweight models (YOLO-n, SmolVLA). With action chunking (predict 16 actions, execute them open-loop while the next inference runs), the bigger models are usable.
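The action-chunking arithmetic is worth making explicit. A sketch, assuming a 600 ms OpenVLA inference, 16-action chunks, and a 20 Hz execution rate (all illustrative numbers, not benchmarks):

```python
# Action chunking: one slow inference yields a chunk of actions that
# the controller executes open-loop while the next inference runs.
inference_ms = 600   # per-chunk model latency (illustrative)
chunk_len = 16       # actions predicted per inference
control_hz = 20      # rate at which the robot consumes actions

chunk_duration_ms = chunk_len * 1000 / control_hz   # 800 ms of actions
# If the chunk outlasts the inference, the pipeline keeps up:
keeps_up = chunk_duration_ms >= inference_ms
effective_hz = control_hz if keeps_up else chunk_len / (inference_ms / 1000)
print(f"chunk covers {chunk_duration_ms:.0f} ms, inference takes {inference_ms} ms")
print(f"pipeline keeps up: {keeps_up}, effective rate: {effective_hz:.0f} Hz")
```

With these numbers, a 600 ms model sustains 20 Hz control; the cost is that each chunk is executed without feedback, so the robot acts on observations up to 800 ms old.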
Power modes and thermal throttling
Jetson's TDP is configurable via `nvpmodel`:
- 5W mode: low-power; significantly slower.
- 15W mode: standard for Orin Nano.
- MAX-N: full performance; needs active cooling.
If the GPU hits thermal limits (TJ ~ 90°C), it throttles. Active cooling (fan + heatsink) is mandatory for sustained workloads.
For mobile robots: factor cooling into the chassis. Sealed enclosures cook the Jetson.
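In practice, power modes are queried and switched with `nvpmodel`. A minimal session (mode IDs vary by board and JetPack release, so query before setting):

```shell
# List the current power mode and the modes this board supports
sudo nvpmodel -q --verbose

# Switch to a mode by ID (IDs differ per board; 0 is often the max mode)
sudo nvpmodel -m 0

# Lock clocks at the mode's maximum, removing DVFS ramp-up delays
sudo jetson_clocks

# Watch power, temperature, and GPU load while a workload runs
tegrastats
```

Watching `tegrastats` under sustained load is the quickest way to see thermal throttling happen: GPU frequency drops as the reported temperature approaches the limit.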
The deployment workflow
- Train on a desktop GPU: PyTorch / JAX with full toolkit.
- Export to ONNX: standard intermediate format.
- Optimize with TensorRT on the Jetson: profile, quantize to INT8 if accuracy allows.
- Wrap in a ROS 2 node (or Isaac ROS pipeline): inference latency becomes a topic-publish rate.
- Profile: `tegrastats`, `jtop`, and NVIDIA Nsight Systems (`nvidia-smi` is not shipped on Jetson's L4T; `tegrastats` is the equivalent).
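Steps 2–3 can be sketched with `trtexec`, the builder/profiler bundled with TensorRT on JetPack (the model path here is a placeholder for your own ONNX export):

```shell
# Build an FP16 engine from an ONNX export and report timing stats
/usr/src/tensorrt/bin/trtexec \
    --onnx=model.onnx \
    --saveEngine=model_fp16.engine \
    --fp16

# INT8 build: without a calibration cache, trtexec uses dummy scales,
# which is fine for measuring speed but not for accuracy
/usr/src/tensorrt/bin/trtexec \
    --onnx=model.onnx \
    --saveEngine=model_int8.engine \
    --int8
```

Comparing the reported throughput of the two engines against a plain PyTorch run is where the 3–10× figure quoted above comes from.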
Common gotchas
- Driver version mismatch: the Jetson runs L4T (NVIDIA's Linux for Tegra). Don't `apt upgrade` blindly; you can break CUDA. Pin packages or use NVIDIA's update system.
- Insufficient power: Orin Nano needs 5 V at 4–5 A peak. USB-C from a phone charger is borderline; use the barrel jack with a beefy supply.
- Storage I/O bottleneck: SD card is slow; use NVMe for any serious workload.
- Model fits but is slow: many 7B-parameter VLMs barely fit in 8 GB; might run at 1 fps. Use INT8 or smaller models for real-time work.
- Cross-compile traps: building heavy C++ on the Jetson is slow. Cross-compile from a desktop or use Docker buildx with arm64 emulation.
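The cross-compile workaround in the last bullet, sketched with Docker buildx (the image name is a placeholder):

```shell
# One-time setup: register QEMU handlers so the x86 desktop can
# run arm64 binaries inside the build
docker run --privileged --rm tonistiigi/binfmt --install arm64

# Build an arm64 image on the desktop, then load it for transfer
# to the Jetson (push to a registry instead for fleet deployment)
docker buildx build --platform linux/arm64 \
    -t myrobot/perception:arm64 \
    --load .
```

Emulated builds are slower than native x86 builds but usually still much faster than compiling heavy C++ directly on the Jetson.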
The Raspberry Pi alternative
For non-AI tasks (motor control, sensors, light vision):
- Raspberry Pi 5: $80, 5 W, 1–2 TOPS at FP32. Great for ROS nodes, Nav2, sensor fusion.
- Hailo-8 / -8L on a Pi: a PCIe accelerator that adds 13–26 TOPS for ~$80. The combo runs OpenVLA at roughly 1–2 fps.
For pure cost-efficiency on light AI: Pi 5 + Hailo. For peak AI performance per dollar at scale: Jetson Orin Nano. For top-end: AGX Orin.
Companion architecture
Most production robots use a two-brain architecture:
- Jetson: perception, planning, high-level control.
- STM32 / ESP32: real-time motor control, sensor sampling, safety.
The MCU runs at predictable timing; the Jetson delivers high-level commands. They communicate via UART, CAN, or USB.
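The Jetson-to-MCU link typically carries small fixed-format frames. A minimal sketch of one such frame; the header byte, field layout, and XOR checksum are illustrative choices, not a standard protocol:

```python
import struct

HEADER = 0xAA  # illustrative start-of-frame marker

def pack_velocity_cmd(vx: float, wz: float) -> bytes:
    """Frame = header (1 B) + two little-endian float32s (8 B) + XOR checksum (1 B)."""
    payload = struct.pack("<Bff", HEADER, vx, wz)
    checksum = 0
    for b in payload:
        checksum ^= b
    return payload + bytes([checksum])

def unpack_velocity_cmd(frame: bytes):
    """Validate header and checksum; return (vx, wz), or None on a corrupt frame."""
    checksum = 0
    for b in frame[:-1]:
        checksum ^= b
    if frame[0] != HEADER or checksum != frame[-1]:
        return None
    _, vx, wz = struct.unpack("<Bff", frame[:-1])
    return vx, wz

frame = pack_velocity_cmd(0.5, -0.25)
print(unpack_velocity_cmd(frame))
```

On the Jetson side such a frame would go out over a serial port (e.g. with pyserial on one of the board's UARTs); the MCU validates the checksum and falls back to a safe stop if frames stop arriving, which is the safety role the two-brain split gives it.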
Exercise
Get a Jetson Orin Nano. Flash JetPack. Run YOLOv8n inference on a USB camera at full frame rate. Then export to TensorRT INT8 and observe the speedup. The before/after demonstrates why Jetson exists: the same model, roughly 3× faster after optimization, all on a 15 W board.
Next
TinyML — the other end of the AI spectrum, where neural nets fit into kilobyte-class microcontrollers.