Debugging ROS 2: rqt, ros2 bag, rviz
The diagnostic toolkit that separates amateur ROS users from people who ship robots. Topic introspection, recording, replay, the rqt suite, RViz — the half-dozen tools that solve 80% of bugs.
A working ROS 2 system has 30+ nodes exchanging messages on dozens of topics at different rates. When something breaks, the question isn't "is my code wrong?" — it's "where in this graph did the data stop flowing?" The diagnostic toolkit is what turns that question into an answer in 5 minutes.
The five essentials
- ros2 topic / node / service / action: the CLI introspection commands.
- ros2 bag: record and replay traffic.
- RViz: visualize anything spatial (robot, frames, paths, point clouds).
- rqt: graphical introspection (graph, plot, console, parameters).
- Logging: get the levels right; journalctl for systemd-managed nodes.
1. ros2 topic — your first stop
# What topics exist?
ros2 topic list
# What's the message rate / bandwidth on a topic?
ros2 topic hz /scan
ros2 topic bw /camera/image_raw
# What does the data look like?
ros2 topic echo /odom
# What are the publishers and subscribers?
ros2 topic info /cmd_vel --verbose
# Send a message manually
ros2 topic pub --once /cmd_vel geometry_msgs/Twist \
'{linear: {x: 0.2}, angular: {z: 0.0}}'
If ros2 topic hz stays silent or never reports a rate, either nothing is publishing on that topic or the QoS is mismatched. If ros2 topic info shows a publisher and a subscriber but no data flows, the cause is almost always QoS: incompatible reliability or durability profiles. Run ros2 topic info --verbose and compare both sides.
2. ros2 node — process introspection
ros2 node list
ros2 node info /controller_server
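Tells you which topics a node publishes and subscribes to, and which services and actions it serves or calls. For its parameters, use ros2 param list /controller_server. If a node doesn't show up, it crashed or wasn't started. Check the ros2 launch output or the node's log.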
3. ros2 bag — the time machine
Records every published message for later replay. The first thing to do when "the robot did something weird" is recreate the conditions:
ros2 bag record -a -o bug_session # record everything
# ... reproduce the bug ...
# Ctrl-C to stop
ros2 bag info bug_session
ros2 bag play bug_session # replay at original rate
ros2 bag play bug_session --rate 0.1 # 10× slow motion
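Bags are full-system records. Replay them while running a debugger or RViz to find the exact moment things went wrong. For time-synchronized replay, pass --clock so the bag publishes /clock, and run the nodes under test with use_sim_time set to true so they follow the recorded time instead of wall time.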
For long sessions, record only the topics you need: ros2 bag record /tf /odom /cmd_vel. Camera + lidar at full resolution will fill a hard drive in minutes.
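To put a number on "fill a hard drive in minutes", a back-of-the-envelope estimate is enough. The sketch below assumes uncompressed messages and ignores bag overhead. The camera resolution, frame rate, lidar bandwidth, and free disk space are illustrative assumptions, not figures for any particular sensor. On a live system, the output of ros2 topic bw is the number to plug in instead.

```python
def raw_image_rate(width: int, height: int, bytes_per_pixel: int, fps: float) -> float:
    """Bytes per second for an uncompressed image stream."""
    return width * height * bytes_per_pixel * fps


def seconds_until_full(free_bytes: float, bytes_per_second: float) -> float:
    """How long recording can run before the free space is used up."""
    return free_bytes / bytes_per_second


# Hypothetical sensors (assumed values, for illustration only).
camera = raw_image_rate(1920, 1080, 3, 30)  # 1080p RGB at 30 fps
lidar = 10e6                                # assume ~10 MB/s of point clouds

total = camera + lidar
minutes = seconds_until_full(100e9, total) / 60  # assume 100 GB of free disk

print(f"camera: {camera / 1e6:.1f} MB/s")
print(f"total:  {total / 1e6:.1f} MB/s")
print(f"100 GB lasts about {minutes:.1f} minutes")
```

With these assumed numbers the camera alone produces roughly 187 MB/s, and 100 GB of free space lasts under nine minutes, which is why recording only /tf, /odom, and /cmd_vel is the sensible default for long sessions.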
4. rqt — the graphical toolbox
rqt launches a customizable GUI with dozens of plugins. The ones I use weekly:
- rqt_graph: shows the node-topic graph. Did topic A actually connect to topic B's subscriber?
- rqt_plot: time-series plot of any numeric topic field. Why is the velocity oscillating?
- rqt_console: live log viewer with filters. Find the warning right before the crash.
- rqt_reconfigure: edit parameters at runtime. What if I cut Kp in half?
- rqt_tf_tree: visualize the TF tree. Is the camera frame really where I think?
Run ros2 run rqt_graph rqt_graph for the standalone version, or just start rqt and load whichever plugins you want.
5. RViz — the spatial debugger
For anything geometric — pose, path, point cloud, mesh, marker — RViz is the answer. Out of the box you can show:
- The robot model, animated by the joint state.
- The TF tree, with arrows showing each frame.
- Point clouds, laser scans, depth images.
- Paths, goals, costmaps (Nav2).
- Custom markers from any node.
The single most useful pattern: drop a visualization_msgs/Marker publisher into the node you are debugging and stream "what does my algorithm think?" into RViz. A 30-line marker publisher beats hours of print debugging.
Logging discipline
ROS 2 has the standard levels: DEBUG, INFO, WARN, ERROR, FATAL. By default a node logs at INFO, so DEBUG messages stay hidden until you lower the level. Don't be the engineer who only uses INFO.
self.get_logger().debug("entering callback") # for tracing
self.get_logger().info("Publisher created") # major lifecycle events
self.get_logger().warn("Sensor reading is stale")
self.get_logger().error("Failed to compute IK")
self.get_logger().fatal("Hardware lost") # before exit
Set the level at runtime:
ros2 run my_pkg my_node --ros-args --log-level my_node:=debug
Tail logs from a systemd service:
journalctl -u robotforge.service -f -o short-iso
The "why is nothing happening?" decision tree
- ros2 node list: is the publisher node running?
- ros2 topic list: does the topic exist?
- ros2 topic hz: is it actually publishing?
- ros2 topic info --verbose: does the subscriber's QoS match the publisher's?
- ros2 topic echo: does the data look reasonable?
- RViz: visualize whatever is supposed to happen. Is anything showing up?
- journalctl -u my_node: any warnings or errors?
This sequence solves 80% of "no data flowing" bugs in under 5 minutes.
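The first few checks in that sequence are mechanical enough to script. The following is a minimal sketch, not a complete diagnostic tool: the function names are hypothetical, and it assumes that ros2 topic info prints a line of the form "Publisher count: N". The later steps (rate, data sanity, RViz, logs) still need a human looking at the output.

```python
import subprocess
from typing import Callable, List, Optional


def run_cli(cmd: List[str]) -> str:
    """Run a CLI command and return its stdout (empty string on failure)."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=10)
    except (OSError, subprocess.TimeoutExpired):
        return ""
    return result.stdout


def first_failing_step(node: str, topic: str,
                       run: Callable[[List[str]], str] = run_cli) -> Optional[str]:
    """Walk the first checks of the decision tree.

    Returns a description of the first check that fails, or None if all pass.
    """
    nodes = run(["ros2", "node", "list"]).split()
    if node not in nodes:
        return f"node {node} is not running"

    topics = run(["ros2", "topic", "list"]).split()
    if topic not in topics:
        return f"topic {topic} does not exist"

    info = run(["ros2", "topic", "info", topic, "--verbose"])
    for line in info.splitlines():
        # Assumed output format: "Publisher count: N".
        if line.strip().startswith("Publisher count:"):
            if int(line.split(":", 1)[1]) == 0:
                return f"topic {topic} has no publishers"
            break
    return None
```

On a machine with ROS 2 sourced, calling first_failing_step("/controller_server", "/cmd_vel") would run the real commands. The run parameter exists so the logic can be exercised with canned output, without a robot attached.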
The QoS gotcha (the #1 ROS 2 footgun)
ROS 1 just had topics. ROS 2 has topics with Quality of Service profiles. A publisher with best-effort, volatile QoS won't deliver to a subscriber that requests reliable or transient-local; either mismatch alone is enough. Both endpoints still appear in ros2 topic info, but they never match, so no messages arrive.
Diagnostic:
ros2 topic info /tf --verbose
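Returns the QoS profiles of every publisher and subscriber. If the publisher shows "Reliability: BEST_EFFORT" and the subscriber shows "Reliability: RELIABLE", that's your bug.
Fix: pick a compatible profile (for example rclpy.qos.qos_profile_sensor_data or qos_profile_system_default) or override the QoS on the subscriber side. A subscriber may ask for less than the publisher offers, never more: a best-effort, volatile subscriber matches any publisher reliability and durability.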
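The compatibility rule for the two policies discussed here can be written down in a few lines. The sketch below is plain Python with hypothetical names, not the rclpy API, and it covers only reliability and durability. Other policies such as deadline and liveliness have their own compatibility rules and are not modeled.

```python
from enum import Enum
from typing import List


class Reliability(Enum):
    BEST_EFFORT = "BEST_EFFORT"
    RELIABLE = "RELIABLE"


class Durability(Enum):
    VOLATILE = "VOLATILE"
    TRANSIENT_LOCAL = "TRANSIENT_LOCAL"


def qos_mismatches(pub_rel: Reliability, pub_dur: Durability,
                   sub_rel: Reliability, sub_dur: Durability) -> List[str]:
    """Return the reasons a publisher/subscriber pair will not match.

    An empty list means the pair is compatible for these two policies.
    """
    problems = []
    # A subscriber may not request more than the publisher offers.
    if pub_rel is Reliability.BEST_EFFORT and sub_rel is Reliability.RELIABLE:
        problems.append("reliability: publisher is BEST_EFFORT, subscriber requires RELIABLE")
    if pub_dur is Durability.VOLATILE and sub_dur is Durability.TRANSIENT_LOCAL:
        problems.append("durability: publisher is VOLATILE, subscriber requires TRANSIENT_LOCAL")
    return problems


# The failing pair from the text: both policies are mismatched.
bad = qos_mismatches(Reliability.BEST_EFFORT, Durability.VOLATILE,
                     Reliability.RELIABLE, Durability.TRANSIENT_LOCAL)
# The reverse direction is fine: the subscriber asks for less than is offered.
ok = qos_mismatches(Reliability.RELIABLE, Durability.TRANSIENT_LOCAL,
                    Reliability.BEST_EFFORT, Durability.VOLATILE)
print(len(bad), len(ok))
```

Reading the values for each endpoint off ros2 topic info --verbose and checking them against this rule is usually faster than guessing which side to change.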
The rosbag → repro pipeline
For irreproducible bugs (happens once a day, hard to catch), record everything continuously, then replay around the failure:
# Continuously record, starting a new file every 100 MiB or 60 s, whichever comes first
ros2 bag record -a --max-bag-size 104857600 --max-bag-duration 60 -o continuous
# When the bug hits, copy the latest ~5 bags
# Terminal 1: replay them, publishing /clock
ros2 bag play recent_bag --clock
# Terminal 2: run the suspect node under gdb (wrap the node itself, not the ros2 launcher)
ros2 run --prefix 'gdb -ex run --args' my_pkg suspect_node
This pattern catches Heisenbugs that no amount of logging would find.
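The "copy the latest ~5 bags" step is easy to automate. This is a rough sketch under stated assumptions: the helper names are hypothetical, the recorder writes its split files into a single directory (continuous, from the command above), and the split files end in .db3 or .mcap depending on the storage plugin. The newest file may still be open for writing, so treat its copy as possibly incomplete.

```python
import shutil
from pathlib import Path
from typing import List


def latest_splits(bag_dir: str, count: int = 5) -> List[Path]:
    """Return the `count` most recently modified split files, oldest first."""
    if count <= 0:
        return []
    files = [p for p in Path(bag_dir).iterdir()
             if p.suffix in {".db3", ".mcap"}]  # assumed storage extensions
    files.sort(key=lambda p: p.stat().st_mtime)
    return files[-count:]


def copy_splits(bag_dir: str, dest_dir: str, count: int = 5) -> List[Path]:
    """Copy the newest split files into dest_dir and return the new paths."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    return [Path(shutil.copy2(src, dest / src.name))
            for src in latest_splits(bag_dir, count)]
```

A call such as copy_splits("continuous", "incident_bags") right after the failure preserves the surrounding minutes of traffic before rotation or cleanup touches them. The copied files do not include bag metadata, so they may need it regenerated (for example with ros2 bag reindex) before ros2 bag play accepts them.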
Tools beyond the standard kit
- Foxglove Studio: a polished alternative to RViz. Better timeline scrubbing, better plot composition, panel save/restore.
- PlotJuggler: time-series visualization on steroids. Drag-and-drop topic fields, derivative/integral overlays, math operations.
- ros2 doctor: built-in health check for your ROS 2 install.
- rosbridge: WebSocket bridge so you can poke topics from a browser. Useful for ad-hoc dashboards.
Habits that prevent half of the bugs
- Log the exact parameter values you read on startup. Confirms the YAML loaded correctly.
- Diagnose with the GUI tools open. rqt_graph + rqt_plot + RViz tabs simultaneously catch issues you'd miss looking at one alone.
- Record a bag of every test. Cheap insurance.
- Standardize launch files: a debug.launch.py that starts the system + rosbag + rviz with the right config.
Exercise
Take any working ROS 2 system. Break it deliberately: change a topic name in one node, mismatch QoS in another, set a wrong parameter. For each, find the bug using only the diagnostic tools above (no looking at the source). Time yourself. Five minutes per bug is the working bar.
That's the ROS 2 track done
You've covered the full ROS 2 stack: install, mental model, publishers/subscribers, services/actions, launch, parameters, TF, URDF, ros2_control, Nav2, MoveIt, debugging. With this, you can read most ROS 2 codebases, set up a new robot from scratch, and debug your own systems efficiently. Move on to a domain track (Kinematics, Manipulation, Mobile, Learning) for what to do with ROS 2.