Semantic Object-Goal Navigation on a Quadruped Robot in Known Environments
Built on Boston Dynamics Spot with a simple map-then-act workflow: first record a clean 2D map plus trusted object instances, then come back and navigate to objects by name.
Stage 1: Pre-run mapping
A short teleop pass builds a 2D occupancy map and records only confirmed object instances from RGB-D detections into a small semantic database.
Stage 2: Object-goal navigation
The robot localizes on the saved map and navigates to a selected object with a safe standoff goal using Nav2. Goals can be selected from a CLI or by voice.
Thesis work at the Dynamic Legged Systems lab, Istituto Italiano di Tecnologia. Everything runs onboard on an Intel NUC with no discrete GPU.
System overview
Hardware and sensing
The robot carries a deliberately small sensor set: a 2D LiDAR for mapping, localization, and costmaps; a RealSense T265 for visual-inertial odometry; and a RealSense D435 for RGB-D detections. The stack is designed for repeatable runs, not a one-time demo.
ROS 2 pipeline
In mapping mode, SLAM Toolbox builds the 2D map while the semantic layer runs in parallel. In navigation mode, SLAM is switched off, the saved map is loaded, AMCL localizes, and Nav2 plans and drives Spot through the ROS 2 bridge.
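As a rough illustration of how the two modes can be wired, here is a minimal ROS 2 launch sketch. The SLAM Toolbox, map server, and AMCL package and executable names are the standard ones; the file name, map path, and mode argument are assumptions, not the actual project files.

```python
# mode_launch.py - minimal sketch of a two-mode launch file (assumed names).
# "mapping" runs SLAM Toolbox; "navigation" loads the saved map and runs AMCL.
from launch import LaunchDescription
from launch.actions import DeclareLaunchArgument
from launch.conditions import IfCondition, UnlessCondition
from launch.substitutions import LaunchConfiguration, PythonExpression
from launch_ros.actions import Node


def generate_launch_description():
    mode = LaunchConfiguration('mode')
    is_mapping = PythonExpression(["'", mode, "' == 'mapping'"])

    return LaunchDescription([
        DeclareLaunchArgument('mode', default_value='navigation',
                              description='mapping | navigation'),

        # Mapping mode: online SLAM builds the 2D occupancy grid.
        Node(package='slam_toolbox', executable='async_slam_toolbox_node',
             name='slam_toolbox', condition=IfCondition(is_mapping)),

        # Navigation mode: serve the saved map and localize against it.
        Node(package='nav2_map_server', executable='map_server',
             name='map_server', condition=UnlessCondition(is_mapping),
             parameters=[{'yaml_filename': '/data/maps/lab.yaml'}]),
        Node(package='nav2_amcl', executable='amcl',
             name='amcl', condition=UnlessCondition(is_mapping)),
    ])
```

In the real stack the Nav2 servers, the lifecycle manager, and the Spot bridge are launched alongside; the point here is only that SLAM and localization never run at the same time.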
Why two stages
Two stages keep the runtime stable and lightweight. The robot does not try to rebuild the world during navigation. Instead, it localizes on a fixed map and uses a compact object database for object goals.
What “confirmed” means
Raw detections are noisy. Objects are promoted to the database only after repeated support across time, with gating and de-duplication, so the final list stays small, stable, and usable for navigation. A sketch of this promotion logic follows below.
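The exact gating rules are specific to the thesis; the sketch below is a minimal, assumed version of the idea: detections accumulate support in a proposal buffer and are promoted to the confirmed set, with distance-based de-duplication, only after enough repeated hits. Thresholds and the promotion rule are illustrative, not the project's parameters.

```python
# semantic_memory.py - minimal sketch of confirmed-only object memory.
import math
from dataclasses import dataclass


@dataclass
class Track:
    label: str
    x: float
    y: float
    hits: int = 1


class SemanticMemory:
    def __init__(self, merge_radius=0.5, min_hits=5):
        self.merge_radius = merge_radius  # nearby same-class tracks are merged
        self.min_hits = min_hits          # support needed before promotion
        self.proposals: list[Track] = []  # unstable detections live (and die) here
        self.confirmed: list[Track] = []  # the semantic database

    def _nearest(self, tracks, label, x, y):
        best, best_d = None, self.merge_radius
        for t in tracks:
            if t.label != label:
                continue
            d = math.hypot(t.x - x, t.y - y)
            if d < best_d:
                best, best_d = t, d
        return best

    def observe(self, label: str, x: float, y: float):
        # If a confirmed instance already explains this detection,
        # drop it: that is the de-duplication step.
        if self._nearest(self.confirmed, label, x, y):
            return
        track = self._nearest(self.proposals, label, x, y)
        if track is None:
            self.proposals.append(Track(label, x, y))
            return
        # Running average keeps the estimate stable under noisy detections.
        track.x = (track.x * track.hits + x) / (track.hits + 1)
        track.y = (track.y * track.hits + y) / (track.hits + 1)
        track.hits += 1
        if track.hits >= self.min_hits:
            self.proposals.remove(track)
            self.confirmed.append(track)  # promotion: now part of the database
```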
Demo 1: Semantic mapping
This video shows the full semantic layer in the pre-run phase. The detector runs on the D435 RGB stream, depth is used to back-project detections into 3D, and TF is then used to express those points in the map frame. Instead of writing every frame into the map, the system keeps a small memory:
- Proposal memory: new observations enter here first. This is where unstable detections die out.
- Static memory: only detections that are repeatedly supported get promoted. Nearby duplicates are merged so you do not end up with five entries for the same chair.
- Recorder output: confirmed objects are periodically saved to a human-readable YAML file. That file becomes the “semantic database” used in stage 2 (see the example after this list).
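The recorder schema itself is not reproduced here; a file of this kind might look like the following (field names and values are illustrative, not the project's actual format):

```yaml
# semantic_database.yaml - hypothetical recorder output (assumed schema)
objects:
  - id: chair_0
    class: chair
    frame: map
    position: {x: 3.42, y: -1.07, z: 0.0}
    hits: 17            # how many detections supported this instance
  - id: door_0
    class: door
    frame: map
    position: {x: 6.10, y: 0.85, z: 0.0}
    hits: 9
```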
What I built
Confirmed-only semantic layer
A ROS 2 node that turns RGB-D detections into a compact set of stable object instances in the map frame, with filtering, confirmation, and de-duplication.
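The placement step behind this node can be summarized as: take a detection's pixel and depth, back-project through the camera intrinsics, and let TF2 express the point in the map frame. A minimal sketch, assuming depth aligned to RGB and standard ROS 2 TF2 APIs (the frame names and function signature are placeholders):

```python
# back_project.py - sketch of lifting a 2D detection into the map frame.
import numpy as np
import tf2_geometry_msgs  # registers PointStamped support for tf_buffer.transform
from geometry_msgs.msg import PointStamped


def pixel_to_map(u, v, depth_m, K, tf_buffer, stamp,
                 camera_frame='d435_color_optical_frame'):
    """Back-project pixel (u, v) at depth_m meters into the map frame."""
    fx, fy = K[0, 0], K[1, 1]  # K is the 3x3 camera intrinsics matrix
    cx, cy = K[0, 2], K[1, 2]
    # Pinhole model: ray through the pixel, scaled by metric depth.
    p = PointStamped()
    p.header.frame_id = camera_frame
    p.header.stamp = stamp
    p.point.x = (u - cx) * depth_m / fx
    p.point.y = (v - cy) * depth_m / fy
    p.point.z = depth_m
    # TF2 resolves camera -> map using the live transform tree.
    return tf_buffer.transform(p, 'map')
```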
Object-goal interface for Nav2
A thin interface that reads the recorded objects and converts a chosen instance into a PoseStamped goal with standoff and facing constraints, then sends it to Nav2.
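A sketch of what such an interface might look like: load the recorded objects, place the goal a fixed standoff distance short of the object along the robot-to-object line, face the object, and send the pose through the standard NavigateToPose action. The standoff value, YAML schema, and node wiring are assumptions:

```python
# object_goal.py - sketch of turning a recorded object into a Nav2 goal.
import math
import yaml
import rclpy
from rclpy.node import Node
from rclpy.action import ActionClient
from geometry_msgs.msg import PoseStamped
from nav2_msgs.action import NavigateToPose


def standoff_pose(obj_x, obj_y, robot_x, robot_y, standoff=0.8):
    """Goal a fixed distance short of the object, facing it."""
    yaw = math.atan2(obj_y - robot_y, obj_x - robot_x)  # face the object
    goal = PoseStamped()
    goal.header.frame_id = 'map'
    goal.pose.position.x = obj_x - standoff * math.cos(yaw)
    goal.pose.position.y = obj_y - standoff * math.sin(yaw)
    goal.pose.orientation.z = math.sin(yaw / 2.0)  # planar yaw as quaternion
    goal.pose.orientation.w = math.cos(yaw / 2.0)
    return goal


class ObjectGoalClient(Node):
    def __init__(self):
        super().__init__('object_goal_client')
        self.nav = ActionClient(self, NavigateToPose, 'navigate_to_pose')

    def go_to(self, name, db_path, robot_x, robot_y):
        with open(db_path) as f:
            objects = {o['id']: o for o in yaml.safe_load(f)['objects']}
        obj = objects[name]['position']
        goal = NavigateToPose.Goal()
        goal.pose = standoff_pose(obj['x'], obj['y'], robot_x, robot_y)
        self.nav.wait_for_server()
        return self.nav.send_goal_async(goal)


if __name__ == '__main__':
    rclpy.init()
    node = ObjectGoalClient()
    future = node.go_to('chair_0', 'semantic_database.yaml', 0.0, 0.0)
    rclpy.spin_until_future_complete(node, future)  # wait for goal acceptance
```

A CLI or voice front end then only has to map a spoken or typed name to an object id; everything after that is ordinary Nav2.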
End-to-end Spot integration
Deployment on Spot with onboard compute, launch and configuration files, RViz views, rosbag logging, and a small supervision layer for repeated runs.
Real world evaluation
Tests in multiple indoor environments, including failure-case analysis. The focus is on integration quality and predictable behavior under real constraints.
Experiments and what I learned
The pipeline was evaluated on real runs in three environments: a church, the DLS lab, and a large IIT test room. The mapping runs validate that the same setup can produce usable 2D grids across very different layouts. Semantic mapping focuses on a small closed set of classes to keep the database compact and reliable. Navigation trials in the IIT room cover multiple object layouts and multi-object scenarios.
Two practical limitations show up clearly in real runs: depth alignment errors can slightly bias object placement, and localization inconsistencies can cause early goal acceptance in edge cases. Both are visible in logs and RViz, so they are debuggable and fixable rather than hidden failures.
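For the early-goal-acceptance case, one concrete knob is the Nav2 goal checker tolerance. The excerpt below uses the standard SimpleGoalChecker plugin with illustrative values (not the thesis configuration; parameter names vary slightly across Nav2 releases):

```yaml
# nav2_params.yaml excerpt - illustrative goal checker tolerances (assumed values)
controller_server:
  ros__parameters:
    goal_checker_plugins: ["general_goal_checker"]
    general_goal_checker:
      plugin: "nav2_controller::SimpleGoalChecker"
      xy_goal_tolerance: 0.15   # meters; smaller = stricter goal acceptance
      yaw_goal_tolerance: 0.20  # radians
      stateful: true
```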