Jetson Orin for on-board AI
The embedded-AI platform that runs most robotics inference today — with honest latency numbers. What each variant does, the SDK landscape, and the gotchas that bite first-time users.
Jetson is NVIDIA's edge-AI line. For any robot running deep models on-board — perception, VLAs, motion planning — Jetson is overwhelmingly the platform of choice in 2026. Cost vs Raspberry Pi 5 is 3–10× higher; capability for AI workloads is 10–100× higher. Here's the lineup, the SDK story, and the practical reality of deploying.
The 2026 Jetson Orin lineup
| Model | AI TOPS | RAM | Power | Price (DevKit) |
|---|---|---|---|---|
| Orin Nano 4GB | 20 | 4 GB | 5–10 W | $249 |
| Orin Nano 8GB | 40 | 8 GB | 7–15 W | $499 |
| Orin NX 8GB | 70 | 8 GB | 10–20 W | $700 |
| Orin NX 16GB | 100 | 16 GB | 10–25 W | $900 |
| AGX Orin 64GB | 275 | 64 GB | 15–60 W | $2000 |
At the AGX Orin level, pricing is roughly that of an RTX 3060 desktop GPU, but in a 15–60 W envelope and a form factor small enough to mount on a robot.
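One way to read the lineup table is efficiency per dollar and per watt. A quick sketch using the table's own figures (taking each board's upper power number as its sustained budget):

```python
# Compare Orin variants on TOPS per dollar and TOPS per watt,
# using (AI TOPS, DevKit price USD, max power W) from the table above.
boards = {
    "Orin Nano 4GB": (20, 249, 10),
    "Orin Nano 8GB": (40, 499, 15),
    "Orin NX 8GB":   (70, 700, 20),
    "Orin NX 16GB":  (100, 900, 25),
    "AGX Orin 64GB": (275, 2000, 60),
}

for name, (tops, price, watts) in boards.items():
    print(f"{name:14s}  {tops/price:.3f} TOPS/$   {tops/watts:.1f} TOPS/W")
```

The AGX Orin wins on TOPS per watt, while the mid-range boards cluster together on TOPS per dollar, which is part of why the Orin Nano 8GB is the default hobbyist pick.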
Picking the right Jetson
- Orin Nano 8GB: hobby robotics default. Runs YOLO + SLAM + light VLA inference at usable rates.
- Orin NX 16GB: serious mobile manipulator. Runs OpenVLA fine-tuned at ~5 fps; multiple perception models simultaneously.
- AGX Orin: autonomous-driving research, multi-camera fusion, large VLAs, full SLAM stacks.
For RobotForge-grade hobbyist work, Orin Nano 8GB is the right starting point. Upgrade only when you hit specific limits.
JetPack: the SDK
JetPack is NVIDIA's bundled SDK: Ubuntu OS + CUDA + cuDNN + TensorRT + ROS support. Flash it onto an SD card or NVMe; boot up; everything works.
Key components:
- CUDA: GPU programming. Most ML libraries use it transparently.
- TensorRT: inference optimization. 3–10× speedup vs raw PyTorch on the same hardware.
- DeepStream: video analytics framework. For multi-camera pipelines.
- Isaac ROS: NVIDIA's ROS 2 packages with GPU acceleration. SLAM, perception, VPI image processing.
Honest latency numbers
Practical inference latencies on Orin Nano 8GB (TensorRT, INT8 where possible):
| Model | Input | Latency |
|---|---|---|
| YOLOv8n | 640×640 | ~10 ms |
| YOLOv8m | 640×640 | ~50 ms |
| SAM-ViT-B (encoder) | 1024×1024 | ~500 ms |
| Depth Anything V2 small | 518×518 | ~80 ms |
| SmolVLA (450M) | 2 cameras + state | ~100 ms |
| OpenVLA (7B) | 2 cameras + state | ~600 ms (action chunked) |
For real-time control loops, run lightweight models (YOLO-n, SmolVLA). With action chunking (predict 16 actions, execute them open-loop while the next inference runs), the bigger models are usable.
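The action-chunking arithmetic is worth making explicit. A sketch, assuming a 600 ms OpenVLA inference, 16-action chunks, and a 20 Hz execution rate (all illustrative numbers, not benchmarks):

```python
# Action chunking: one slow inference yields a chunk of actions that
# the controller executes open-loop while the next inference runs.
inference_ms = 600   # per-chunk model latency (illustrative)
chunk_len = 16       # actions predicted per inference
control_hz = 20      # rate at which the robot consumes actions

chunk_duration_ms = chunk_len * 1000 / control_hz   # 800 ms of actions
# If the chunk outlasts the inference, the pipeline keeps up:
keeps_up = chunk_duration_ms >= inference_ms
effective_hz = control_hz if keeps_up else chunk_len / (inference_ms / 1000)
print(f"chunk covers {chunk_duration_ms:.0f} ms, inference takes {inference_ms} ms")
print(f"pipeline keeps up: {keeps_up}, effective rate: {effective_hz:.0f} Hz")
```

With these numbers, a 600 ms model sustains 20 Hz control; the cost is that each chunk is executed without feedback, so the robot acts on observations up to 800 ms old.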
Power modes and thermal throttling
Jetson's TDP is configurable via `nvpmodel`:
- 5W mode: low-power; significantly slower.
- 15W mode: standard for Orin Nano.
- MAX-N: full performance; needs active cooling.
If the GPU hits thermal limits (TJ ~ 90°C), it throttles. Active cooling (fan + heatsink) is mandatory for sustained workloads.
For mobile robots: factor cooling into the chassis. Sealed enclosures cook the Jetson.
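In practice, power modes are queried and switched with `nvpmodel`. A minimal session (mode IDs vary by board and JetPack release, so query before setting):

```shell
# List the current power mode and the modes this board supports
sudo nvpmodel -q --verbose

# Switch to a mode by ID (IDs differ per board; 0 is often the max mode)
sudo nvpmodel -m 0

# Lock clocks at the mode's maximum, removing DVFS ramp-up delays
sudo jetson_clocks

# Watch power, temperature, and GPU load while a workload runs
tegrastats
```

Watching `tegrastats` under sustained load is the quickest way to see thermal throttling happen: GPU frequency drops as the reported temperature approaches the limit.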
The deployment workflow
- Train on a desktop GPU: PyTorch / JAX with full toolkit.
- Export to ONNX: standard intermediate format.
- Optimize with TensorRT on the Jetson: profile, quantize to INT8 if accuracy allows.
- Wrap in a ROS 2 node (or Isaac ROS pipeline): inference latency becomes a topic-publish rate.
- Profile: `tegrastats`, `jtop`, and NVIDIA Nsight Systems (`nvidia-smi` is not shipped on Jetson's L4T; `tegrastats` is the equivalent).
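Steps 2–3 can be sketched with `trtexec`, the builder/profiler bundled with TensorRT on JetPack (the model path here is a placeholder for your own ONNX export):

```shell
# Build an FP16 engine from an ONNX export and report timing stats
/usr/src/tensorrt/bin/trtexec \
    --onnx=model.onnx \
    --saveEngine=model_fp16.engine \
    --fp16

# INT8 build: without a calibration cache, trtexec uses dummy scales,
# which is fine for measuring speed but not for accuracy
/usr/src/tensorrt/bin/trtexec \
    --onnx=model.onnx \
    --saveEngine=model_int8.engine \
    --int8
```

Comparing the reported throughput of the two engines against a plain PyTorch run is where the 3–10× figure quoted above comes from.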
Common gotchas
- Driver version mismatch: the Jetson runs L4T (NVIDIA's Linux for Tegra). Don't `apt upgrade` blindly; you can break CUDA. Pin packages or use NVIDIA's update system.
- Insufficient power: Orin Nano needs 5 V at 4–5 A peak. USB-C from a phone charger is borderline; use the barrel jack with a beefy supply.
- Storage I/O bottleneck: SD card is slow; use NVMe for any serious workload.
- Model fits but is slow: many 7B-parameter VLMs barely fit in 8 GB; might run at 1 fps. Use INT8 or smaller models for real-time work.
- Cross-compile traps: building heavy C++ on the Jetson is slow. Cross-compile from a desktop or use Docker buildx with arm64 emulation.
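The cross-compile workaround in the last bullet, sketched with Docker buildx (the image name is a placeholder):

```shell
# One-time setup: register QEMU handlers so the x86 desktop can
# run arm64 binaries inside the build
docker run --privileged --rm tonistiigi/binfmt --install arm64

# Build an arm64 image on the desktop, then load it for transfer
# to the Jetson (push to a registry instead for fleet deployment)
docker buildx build --platform linux/arm64 \
    -t myrobot/perception:arm64 \
    --load .
```

Emulated builds are slower than native x86 builds but usually still much faster than compiling heavy C++ directly on the Jetson.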
The Raspberry Pi alternative
For non-AI tasks (motor control, sensors, light vision):
- Raspberry Pi 5: $80, 5 W, 1–2 TOPS at FP32. Great for ROS nodes, Nav2, sensor fusion.
- Hailo-8 / -8L on a Pi: a PCIe accelerator that adds 13–26 TOPS for ~$80. The combo runs OpenVLA at roughly 1–2 fps.
For pure cost-efficiency on light AI: Pi 5 + Hailo. For peak AI performance per dollar at scale: Jetson Orin Nano. For top-end: AGX Orin.
Companion architecture
Most production robots use a two-brain architecture:
- Jetson: perception, planning, high-level control.
- STM32 / ESP32: real-time motor control, sensor sampling, safety.
The MCU runs at predictable timing; the Jetson delivers high-level commands. They communicate via UART, CAN, or USB.
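The Jetson-to-MCU link typically carries small fixed-format frames. A minimal sketch of one such frame; the header byte, field layout, and XOR checksum are illustrative choices, not a standard protocol:

```python
import struct

HEADER = 0xAA  # illustrative start-of-frame marker

def pack_velocity_cmd(vx: float, wz: float) -> bytes:
    """Frame = header (1 B) + two little-endian float32s (8 B) + XOR checksum (1 B)."""
    payload = struct.pack("<Bff", HEADER, vx, wz)
    checksum = 0
    for b in payload:
        checksum ^= b
    return payload + bytes([checksum])

def unpack_velocity_cmd(frame: bytes):
    """Validate header and checksum; return (vx, wz), or None on a corrupt frame."""
    checksum = 0
    for b in frame[:-1]:
        checksum ^= b
    if frame[0] != HEADER or checksum != frame[-1]:
        return None
    _, vx, wz = struct.unpack("<Bff", frame[:-1])
    return vx, wz

frame = pack_velocity_cmd(0.5, -0.25)
print(unpack_velocity_cmd(frame))
```

On the Jetson side such a frame would go out over a serial port (e.g. with pyserial on one of the board's UARTs); the MCU validates the checksum and falls back to a safe stop if frames stop arriving, which is the safety role the two-brain split gives it.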
Exercise
Get a Jetson Orin Nano. Flash JetPack. Run YOLOv8n inference on a USB camera at full frame rate. Then export to TensorRT INT8 and observe the speedup. The before/after demonstrates why Jetson exists: the same model, roughly 3× faster after optimization, all on a 15 W board.
Next
TinyML — the other end of the AI spectrum, where neural nets fit into kilobyte-class microcontrollers.