RobotForge

Social navigation: robots around people

The frontier where metric planning meets human behavior modeling. Personal space, intent inference, and the hard task of being a 'polite' robot in a crowded space.

by RobotForge
#mobile-robots#social#human-robot-interaction

A robot navigating an empty corridor is solved. A robot navigating a corridor with a person walking the other way is open research. The difference is social: humans expect robots to be predictable, polite, and aware of body language. Pure metric planning ignores all this. Social navigation is the field where mobility meets human-robot interaction.

What "social" means

  • Personal space: humans have a buffer zone (~50 cm at rest, larger when moving). Robots that violate it feel threatening.
  • Predictability: humans pre-plan around others. Erratic robot motion forces them into last-moment reactions, which feels unsafe.
  • Yielding: humans expect robots to yield in narrow corridors, hold doors, give way at crossings.
  • Communication: humans signal intent with body posture, gaze, hesitation. Robots need to do the same.
  • Compliance with norms: walk on the right (or left) side, queue properly, don't sneak up from behind.

None of these are explicitly captured by classical motion planning.

The classical approaches (limited)

Social Force Model (Helbing 1995)

Treat humans as pedestrians with attractive forces toward goals + repulsive forces from obstacles + social repulsion from each other. The robot is just another agent. Simple; works for crowd simulation; doesn't capture intent.
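The force balance above can be sketched in a few lines. This is an illustrative, uncalibrated Helbing-style update for a single agent; the parameter values (relaxation time, repulsion gains) are assumptions, not values from the original paper.

```python
import numpy as np

def social_force(pos, vel, goal, others, desired_speed=1.3, tau=0.5,
                 A=2.0, B=0.3):
    """One Helbing-style force for a single agent.
    `others` is a list of other agents' positions. Parameters are illustrative."""
    # Attraction: relax the current velocity toward the desired velocity.
    to_goal = goal - pos
    desired_vel = desired_speed * to_goal / (np.linalg.norm(to_goal) + 1e-9)
    f = (desired_vel - vel) / tau
    # Social repulsion: exponential push away from each nearby agent.
    for p in others:
        diff = pos - p
        dist = np.linalg.norm(diff) + 1e-9
        f += A * np.exp(-dist / B) * diff / dist
    return f

# A pedestrian heading to (5, 0) with another person slightly ahead and to the left.
pos = np.array([0.0, 0.0]); vel = np.array([0.0, 0.0])
goal = np.array([5.0, 0.0])
f = social_force(pos, vel, goal, [np.array([1.0, 0.2])])
```

Integrating this force per agent per timestep is the whole simulator, which is exactly why the model is popular for crowds and weak on intent.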

Pedestrian-aware costmaps

Add a cost layer in Nav2 that inflates pedestrian positions with directional costs (cost in front of moving people is higher, behind is lower). Easy to implement; works for low-density.
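The directional inflation can be written as an asymmetric Gaussian in the pedestrian's frame. A sketch of just the cost function follows (a real Nav2 layer is a C++ plugin; the sigmas and the 254 max-cost convention are assumptions in the spirit of costmap_2d):

```python
import numpy as np

def pedestrian_cost(grid_xy, ped_pos, ped_vel, sigma_front=1.2,
                    sigma_back=0.5, sigma_side=0.6, max_cost=254):
    """Asymmetric Gaussian cost around a moving person: larger inflation
    in the walking direction. grid_xy: (N, 2) cell centers in meters."""
    heading = np.arctan2(ped_vel[1], ped_vel[0])
    c, s = np.cos(heading), np.sin(heading)
    rel = grid_xy - ped_pos
    # Rotate into the pedestrian frame: x points where they are walking.
    x = c * rel[:, 0] + s * rel[:, 1]
    y = -s * rel[:, 0] + c * rel[:, 1]
    sigma_x = np.where(x >= 0, sigma_front, sigma_back)
    return max_cost * np.exp(-(x**2 / (2 * sigma_x**2)
                               + y**2 / (2 * sigma_side**2)))
```

Evaluated over the costmap grid per detected person, this makes the planner route behind people rather than cut across their path.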

Reciprocal Velocity Obstacles (RVO / ORCA)

Each agent picks a velocity that's safe assuming all other agents do the same. Mathematically clean; popular for crowd simulators; humans don't actually behave this way.
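The core primitive is the velocity-obstacle test: does a candidate velocity put us on a collision course? Below is a crude sampling-based approximation of that test; real ORCA instead builds half-plane constraints per neighbor and solves a small linear program, so treat this as a sketch of the geometry only.

```python
import numpy as np

def in_velocity_obstacle(v_candidate, p_rel, v_other, radius_sum, horizon=5.0):
    """True if v_candidate collides with the other agent within `horizon` s.
    p_rel: other agent's position relative to us; radius_sum: combined radii."""
    v_rel = v_candidate - v_other
    # Sweep the relative position forward in time and check for overlap.
    for t in np.linspace(0.0, horizon, 100):
        if np.linalg.norm(p_rel - v_rel * t) < radius_sum:
            return True
    return False
```

A planner would sample candidate velocities near the preferred one and keep the closest candidate that passes this test for every neighbor.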

The modern (learning-based) approaches

Inverse Reinforcement Learning (IRL)

Observe humans navigating; infer the reward function they're optimizing. Train a policy that maximizes the same reward.

Strengths: captures human-defined "social cost" without hand-engineering. Weaknesses: data-hungry; reward inference is ambiguous.
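The simplest IRL loop is feature matching: represent the reward as a weighted sum of features (speed, distance to people, smoothness) and push the weights toward the feature averages that human demonstrations exhibit. One gradient step of that idea, with an invented feature vector for illustration:

```python
import numpy as np

def irl_weight_update(w, mu_demo, mu_policy, lr=0.1):
    """One feature-matching IRL step: increase reward weights on features
    humans exhibit more than the current policy does, decrease the rest.
    Example features: [avg speed, min distance to pedestrians]."""
    return w + lr * (mu_demo - mu_policy)

# Demos keep more distance from people (0.8 m) than the policy does (0.3 m),
# so the weight on the distance feature grows.
w = irl_weight_update(np.zeros(2), np.array([1.2, 0.8]), np.array([1.2, 0.3]))
```

The full algorithm (e.g. maximum-entropy IRL) re-solves the policy between weight updates; the ambiguity weakness above is visible here, since many weight vectors can match the same feature averages.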

Deep RL with social constraints

Train an end-to-end policy from pedestrian observations to robot actions. Reward shaping includes "don't get close to people," "don't suddenly change direction near them," "yield in narrow corridors." Trained in simulation with crowds, deployed on hardware.
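The reward shaping described above might look like the following. All weights and thresholds here are invented for illustration; published systems tune these extensively.

```python
def social_reward(reached_goal, collided, min_ped_dist, heading_change,
                  comfort_dist=0.8, w_prox=0.25, w_jerk=0.1):
    """Per-step shaped reward combining the three social constraints.
    heading_change: change in commanded heading this step, in radians."""
    r = 0.0
    if reached_goal:
        r += 10.0
    if collided:
        r -= 20.0
    # "Don't get close to people": penalize intruding into personal space.
    if min_ped_dist < comfort_dist:
        r -= w_prox * (comfort_dist - min_ped_dist)
    # "Don't suddenly change direction near them."
    if min_ped_dist < 2.0:
        r -= w_jerk * abs(heading_change)
    return r
```

Yielding in narrow corridors usually falls out of the proximity term plus the scenario distribution rather than an explicit reward term.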

Examples: SACSoN, CrowdBot. State-of-the-art in 2026.

Trajectory prediction + planning

Predict each pedestrian's future trajectory (using methods from the AV-stack lesson). Plan the robot's trajectory around the predicted ones. Treat humans as moving obstacles with estimated, not known, intentions.

Strengths: principled; modular. Weaknesses: prediction failures cascade.
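The simplest instance of this pipeline is a constant-velocity predictor plus a clearance check on the robot's time-stamped plan. A minimal sketch, with an assumed 0.2 s timestep and 0.8 m clearance:

```python
import numpy as np

def predict_cv(ped_pos, ped_vel, dt, steps):
    """Constant-velocity pedestrian prediction, the standard baseline."""
    t = dt * np.arange(1, steps + 1)[:, None]
    return ped_pos + ped_vel * t  # (steps, 2)

def plan_is_clear(robot_plan, ped_pos, ped_vel, dt=0.2, min_dist=0.8):
    """Check a time-stamped robot plan (steps, 2) against the prediction,
    comparing positions at matching timesteps."""
    pred = predict_cv(ped_pos, ped_vel, dt, len(robot_plan))
    dists = np.linalg.norm(robot_plan - pred, axis=1)
    return bool(np.all(dists >= min_dist))
```

The cascade failure mode is visible here too: if `ped_vel` is wrong, every downstream clearance check is wrong with it.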

Communicating intent

A robot that's about to turn left should signal it. Modalities:

  • Direction-indicating LEDs: most cobots and delivery robots have these.
  • Slowing before turning: humans interpret hesitation as intent communication.
  • Gaze direction (humanoid only): turn the head before the body. Strong signal.
  • Speech / sound: "excuse me" beeps. Common in airports, hospitals.
  • Body language (humanoid): lean forward when stopping vs side-step.
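Slowing before turning is the cheapest of these to implement: scale the commanded speed by how sharply the path bends a short distance ahead, so the deceleration itself reads as "I'm about to turn." A sketch with invented gains:

```python
import numpy as np

def speed_for_turn(v_nominal, heading_now, heading_ahead, v_min=0.3, k=1.5):
    """Reduce speed ahead of a turn so the slowdown signals intent.
    heading_ahead: path heading ~1 m downstream. Gains are illustrative."""
    # Wrapped absolute heading difference in [0, pi].
    turn = abs(np.arctan2(np.sin(heading_ahead - heading_now),
                          np.cos(heading_ahead - heading_now)))
    return max(v_min, v_nominal * np.exp(-k * turn))
```

Pairing this with turn-direction LEDs covers the two signals pedestrians pick up fastest.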

Most production social robots in 2026 use 1–3 of these. The 2024+ research direction is generating intent communication automatically from the planned trajectory.

The metrics question

How do you evaluate "social navigation"? Standard metrics:

  • Success rate: did the robot reach the goal? Necessary but not sufficient.
  • Time to goal: faster usually better.
  • Minimum distance to humans: too close → impolite.
  • Frequency of close calls: events where humans had to yield to the robot.
  • Subjective ratings: humans rate the robot's behavior. Slow + expensive but the only honest measure.
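The objective metrics above are straightforward to compute from synchronized trajectory logs. A sketch, with assumed thresholds for the goal tolerance and what counts as a close call:

```python
import numpy as np

def nav_metrics(robot_traj, ped_trajs, dt, goal, goal_tol=0.3,
                close_thresh=0.5):
    """robot_traj: (T, 2); ped_trajs: (P, T, 2), same timestamps.
    Thresholds are illustrative, not from any benchmark."""
    dists = np.linalg.norm(ped_trajs - robot_traj[None], axis=2)  # (P, T)
    success = np.linalg.norm(robot_traj[-1] - goal) < goal_tol
    return {
        "success": bool(success),
        "time_to_goal_s": len(robot_traj) * dt,
        "min_human_dist_m": float(dists.min()),
        # Count pedestrians the robot ever forced inside the close threshold.
        "close_calls": int((dists.min(axis=1) < close_thresh).sum()),
    }
```

Subjective ratings have no such shortcut, which is why they remain the expensive part.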

SocNav benchmark (2023+) standardizes simulation environments + evaluation. Used in research; not yet a production standard.

What humans actually expect

Field studies (Mavrogiannis et al. 2023+) found:

  • Robots should yield first in conflicts.
  • Robot speed should match crowd speed (don't blast through walking pedestrians).
  • Stopping is acceptable; sudden direction changes aren't.
  • Eye contact (or "perceived attention") matters even from non-humanoid robots — proxy via cameras / lidar visibly aiming.
  • Failure recovery should be visible: a robot that signals it is stuck and recovering is less unnerving than one that freezes silently and then suddenly resumes.

None of these come from metric path-planning math. They're learned from pilot deployments.

Production deployments

  • Hospital delivery robots (TUG, Moxi): well-mapped corridors, predictable hours. Works.
  • Airport food / luggage robots: high-density, varied populations. Performance degrades; many use leashed teleop fallback.
  • Sidewalk delivery robots (Starship, Serve): outdoor; lower density; mostly successful.
  • Bookstore / cafe robots: novelty deployments; acceptable because speeds are low enough that any contact is harmless.

The 2026 reality: dense human environments are still hard. Robots succeed in low-density / well-mapped spaces; struggle in chaotic ones.

Where the field is going

  • VLA-driven social policies: vision-language models that "understand" social cues from camera input.
  • Multi-robot social: multiple robots in the same human-shared space; cooperative behavior.
  • Cultural adaptation: social norms vary by country (queue spacing, eye-contact expectations). Localization beyond translation.
  • Robots-as-citizens: long-term residency in environments where robots and humans co-exist daily.

What you'll need

  • A pedestrian-prediction module (covered in the AV-stack lesson).
  • A social cost layer on top of Nav2's costmap, or a fully RL-based local planner.
  • Some form of intent communication (LEDs, slowdown, etc.).
  • A safety-stop mechanism (people will surprise the robot).
  • Logged failure cases for continuous improvement.
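The safety-stop layer is worth sketching because it sits under everything else: whatever the social planner commands, a frontal obstacle inside the stop distance zeroes the command. A minimal version over a 2D laser scan, with illustrative cone and distance values:

```python
import numpy as np

def safety_stop(scan_ranges, scan_angles, cmd_v, cmd_w,
                stop_dist=0.4, fov=np.pi / 3):
    """Zero the (v, w) command if any scan return inside a frontal cone
    is closer than stop_dist. Thresholds are illustrative."""
    in_cone = np.abs(scan_angles) < fov / 2
    if np.any(scan_ranges[in_cone] < stop_dist):
        return 0.0, 0.0
    return cmd_v, cmd_w
```

In practice this runs at scan rate in its own node, independent of the planner, so a planner crash cannot disable it.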

Exercise

In a sim with one robot and 5 simulated pedestrians (CrowdNav, SocNav), train a deep-RL policy with a basic social reward (success + distance penalty + collision penalty). Compare against a costmap-based Nav2 baseline. Typically the RL policy yields more naturally while the Nav2 baseline plows through; seeing that difference side by side captures the field's progress in a single comparison.

That's the Mobile & Legged track done

You've covered the full progression: differential drive → Ackermann → omnidirectional → quadruped gaits → ZMP / capture point → whole-body humanoid control → RL gaits → drone control → autonomous driving stack → social navigation. With this and the nine other completed tracks (Foundations, Kinematics, Control, ROS 2, SLAM, Perception, Planning, Manipulation, Learning), you have ten complete tracks covering the canonical robotics-engineering curriculum end-to-end. The remaining tracks (Simulators, Embedded, Frontiers) are tooling and breadth.
