RobotForge
Published · ~12 min read

Jetson Orin for on-board AI

The embedded-AI platform that runs most robotics inference today — with honest latency numbers. What each variant does, the SDK landscape, and the gotchas that bite first-time users.

by RobotForge
#embedded #jetson #edge-ai

Jetson is NVIDIA's edge-AI line. For any robot running deep models on-board — perception, VLAs, motion planning — Jetson is overwhelmingly the platform of choice in 2026. Cost vs Raspberry Pi 5 is 3–10× higher; capability for AI workloads is 10–100× higher. Here's the lineup, the SDK story, and the practical reality of deploying.

The 2026 Jetson Orin lineup

Model            AI TOPS   RAM     Power      Price (DevKit)
Orin Nano 4GB    20        4 GB    5–10 W     $249
Orin Nano 8GB    40        8 GB    7–15 W     $499
Orin NX 8GB      70        8 GB    10–20 W    $700
Orin NX 16GB     100       16 GB   10–25 W    $900
AGX Orin 64GB    275       64 GB   15–60 W    $2000

At the AGX Orin level, pricing is roughly that of an RTX 3060 desktop GPU, but in a 15–60 W power envelope and a package small enough to mount on a robot.

Picking the right Jetson

  • Orin Nano 8GB: hobby robotics default. Runs YOLO + SLAM + light VLA inference at usable rates.
  • Orin NX 16GB: serious mobile manipulator. Runs OpenVLA fine-tuned at ~5 fps; multiple perception models simultaneously.
  • AGX Orin: autonomous-driving research, multi-camera fusion, large VLAs, full SLAM stacks.

For RobotForge-grade hobbyist work, Orin Nano 8GB is the right starting point. Upgrade only when you hit specific limits.

JetPack: the SDK

JetPack is NVIDIA's bundled SDK: Ubuntu OS + CUDA + cuDNN + TensorRT + ROS support. Flash it onto an SD card or NVMe; boot up; everything works.

Key components:

  • CUDA: GPU programming. Most ML libraries use it transparently.
  • TensorRT: inference optimization. 3–10× speedup vs raw PyTorch on the same hardware.
  • DeepStream: video analytics framework. For multi-camera pipelines.
  • Isaac ROS: NVIDIA's ROS 2 packages with GPU acceleration. SLAM, perception, VPI image processing.

Honest latency numbers

Practical inference latencies on Orin Nano 8GB (TensorRT, INT8 where possible):

Model                      Input               Latency
YOLOv8n                    640×640             ~10 ms
YOLOv8m                    640×640             ~50 ms
SAM-ViT-B (encoder)        1024×1024           ~500 ms
Depth Anything V2 small    518×518             ~80 ms
SmolVLA (450M)             2 cameras + state   ~100 ms
OpenVLA (7B)               2 cameras + state   ~600 ms (action chunked)

For real-time control loops, run lightweight models (YOLO-n, SmolVLA). With action chunking (predict 16 actions, execute them open-loop while the next inference runs), the bigger models are usable.
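The action-chunking pattern can be sketched as a buffer that inference refills while the control loop drains it. This is a minimal illustration, not any particular VLA's API: `predict_chunk` is a hypothetical stand-in for the model's forward pass, and a real system would run it in a background thread so its ~600 ms latency never stalls the control loop.

```python
from collections import deque

CHUNK_SIZE = 16  # actions predicted per inference call

def predict_chunk(observation):
    """Hypothetical placeholder for the VLA forward pass (~600 ms on-device).

    Returns CHUNK_SIZE actions; a real model would condition on `observation`.
    """
    return [f"action_{i}" for i in range(CHUNK_SIZE)]

def control_loop(observation, steps=32):
    """Drain actions from a buffer, refilling before it runs dry."""
    buffer = deque(predict_chunk(observation))
    executed = []
    for _ in range(steps):
        if len(buffer) <= CHUNK_SIZE // 2:
            # In a real system this call runs asynchronously, so the control
            # loop keeps executing buffered actions during inference.
            buffer.extend(predict_chunk(observation))
        executed.append(buffer.popleft())
    return executed
```

The key design point: the control loop's rate is decoupled from inference latency, which is what makes a ~600 ms model usable on a robot that needs commands every 50 ms.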

Power modes and thermal throttling

Jetson's TDP is configurable via nvpmodel:

  • 5W mode: low-power; significantly slower.
  • 15W mode: standard for Orin Nano.
  • MAX-N: full performance; needs active cooling.
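On-robot software often wants to know which power mode is active before committing to a heavy workload. A best-effort sketch of parsing `nvpmodel -q` output from Python follows; the exact output format varies across JetPack releases, so the regexes are an assumption, and the `raw` parameter exists so the parser can be exercised off-device.

```python
import re
import subprocess

def query_power_mode(raw=None):
    """Return (mode_name, mode_id) parsed from `nvpmodel -q` output.

    Pass captured text via `raw` for testing; on a Jetson, leave it None to
    invoke the tool. Best-effort parse: output format differs slightly
    between JetPack releases.
    """
    if raw is None:
        raw = subprocess.run(["nvpmodel", "-q"],
                             capture_output=True, text=True).stdout
    match = re.search(r"NV Power Mode:\s*(\S+)", raw)
    mode_name = match.group(1) if match else None
    # The numeric mode ID is typically printed on its own line.
    ids = re.findall(r"^\s*(\d+)\s*$", raw, flags=re.MULTILINE)
    mode_id = int(ids[0]) if ids else None
    return mode_name, mode_id
```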

If the GPU hits thermal limits (TJ ~ 90°C), it throttles. Active cooling (fan + heatsink) is mandatory for sustained workloads.

For mobile robots: factor cooling into the chassis. Sealed enclosures cook the Jetson.
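A simple software safeguard is to watch the kernel's thermal zones and back off before the ~90 °C throttle point. This sketch uses the standard Linux sysfs interface (`/sys/class/thermal/thermal_zone*/temp`, in millidegrees); the 85 °C margin is an arbitrary choice for illustration.

```python
from pathlib import Path

THERMAL_ROOT = Path("/sys/class/thermal")
THROTTLE_C = 85.0  # back off before the ~90 °C hardware throttle point

def read_temps_c(root=THERMAL_ROOT):
    """Read all thermal zones into {zone_type: degrees_C}."""
    temps = {}
    for zone in sorted(root.glob("thermal_zone*")):
        try:
            name = (zone / "type").read_text().strip()
            millideg = int((zone / "temp").read_text().strip())
        except (OSError, ValueError):
            continue  # zone unreadable or malformed; skip it
        temps[name] = millideg / 1000.0
    return temps

def should_back_off(temps, limit=THROTTLE_C):
    """True if any zone is near the throttle point."""
    return any(t >= limit for t in temps.values())
```

A control node could poll this at 1 Hz and reduce inference rate, or switch to a smaller model, when `should_back_off` fires.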

The deployment workflow

  1. Train on a desktop GPU: PyTorch / JAX with full toolkit.
  2. Export to ONNX: standard intermediate format.
  3. Optimize with TensorRT on the Jetson: profile, quantize to INT8 if accuracy allows.
  4. Wrap in a ROS 2 node (or Isaac ROS pipeline): inference latency becomes a topic-publish rate.
  5. Profile: nvidia-smi, tegrastats, NVIDIA Nsight Systems.
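Step 2 of the workflow can be sketched with PyTorch's standard ONNX exporter. This is a generic template, not a model-specific recipe: the helper name, the input/output names, and the dynamic batch axis are all illustrative choices, and `torch` is imported lazily so the file parses even where PyTorch isn't installed.

```python
def export_to_onnx(model, dummy_input, path="model.onnx", opset=17):
    """Export a trained PyTorch model to ONNX (step 2 of the workflow).

    Assumes `model` is an nn.Module and `dummy_input` is a tensor with the
    deployment input shape, e.g. torch.randn(1, 3, 640, 640) for a 640×640
    detector.
    """
    import torch  # lazy import: only needed when the export actually runs

    model.eval()  # export inference-mode graph (fixes batchnorm, dropout)
    torch.onnx.export(
        model,
        dummy_input,
        path,
        opset_version=opset,
        input_names=["input"],
        output_names=["output"],
        dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},
    )
    return path
```

Step 3 then happens on the Jetson itself, e.g. building an engine from the exported file with TensorRT's `trtexec` tool and comparing FP16 vs INT8 builds.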

Common gotchas

  • Driver version mismatch: the Jetson runs L4T (NVIDIA's Linux for Tegra). Don't apt upgrade blindly; you can break CUDA. Pin packages or use NVIDIA's update system.
  • Insufficient power: Orin Nano needs 5V at 4–5 A peak. USB-C from a phone charger is borderline; use the barrel jack with a beefy supply.
  • Storage I/O bottleneck: SD card is slow; use NVMe for any serious workload.
  • Model fits but is slow: many 7B-parameter VLMs barely fit in 8 GB; might run at 1 fps. Use INT8 or smaller models for real-time work.
  • Cross-compile traps: building heavy C++ on the Jetson is slow. Cross-compile from a desktop or use Docker buildx with arm64 emulation.

The Raspberry Pi alternative

For non-AI tasks (motor control, sensors, light vision):

  • Raspberry Pi 5: $80, 5 W, 1–2 TOPS at FP32. Great for ROS nodes, Nav2, sensor fusion.
  • Hailo-8 / -8L on Pi: PCIe accelerator adds 13–26 TOPS for ~$80. Together: 1–2 fps OpenVLA.

For pure cost-efficiency on light AI: Pi 5 + Hailo. For peak AI performance per dollar at scale: Jetson Orin Nano. For top-end: AGX Orin.

Companion architecture

Most production robots use a two-brain architecture:

  • Jetson: perception, planning, high-level control.
  • STM32 / ESP32: real-time motor control, sensor sampling, safety.

The MCU runs at predictable timing; the Jetson delivers high-level commands. They communicate via UART, CAN, or USB.
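On the wire, the Jetson-to-MCU link needs a framing convention. The packet layout below is a hypothetical example, not a standard: one sync byte, a length byte, two little-endian float32 fields (linear and angular velocity), and an additive checksum that a real system would likely replace with CRC-16.

```python
import struct

SYNC = 0xA5  # hypothetical start-of-frame marker

def checksum(payload: bytes) -> int:
    """Additive 8-bit checksum; real links often use CRC-16 instead."""
    return sum(payload) & 0xFF

def pack_velocity_cmd(linear_mps: float, angular_rps: float) -> bytes:
    """Frame: sync byte, payload length, two little-endian floats, checksum."""
    payload = struct.pack("<ff", linear_mps, angular_rps)
    return bytes([SYNC, len(payload)]) + payload + bytes([checksum(payload)])

def unpack_velocity_cmd(frame: bytes):
    """Validate and decode a frame produced by pack_velocity_cmd."""
    if frame[0] != SYNC:
        raise ValueError("bad sync byte")
    length = frame[1]
    payload = frame[2:2 + length]
    if checksum(payload) != frame[2 + length]:
        raise ValueError("checksum mismatch")
    return struct.unpack("<ff", payload)
```

The same bytes can be sent over UART (e.g. with pyserial), CAN, or USB; the sync byte and checksum let the MCU resynchronize and reject corrupted frames without any handshake.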

Exercise

Get a Jetson Orin Nano. Flash JetPack. Run a YOLOv8n inference at full FPS on a USB camera. Then export to TensorRT INT8; observe the speedup. The before/after demonstrates why Jetson exists: same model, 3× faster after optimization, all on a 15W board.

Next

TinyML — the other end of the AI spectrum, where neural nets fit into kilobyte-class microcontrollers.
