OpenIN: Open-Vocabulary Instance-Oriented Navigation in Dynamic Domestic Environments

Yujie Tang, Meiling Wang, Yinan Deng, Zibo Zheng, Jingchuan Deng, Sibo Zuo, Yufeng Yue*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

In everyday domestic settings, frequently used objects such as cups often have no fixed position, occur as multiple instances of the same category, and are frequently moved between carriers. This makes it challenging for a robot to navigate efficiently to a specific instance. To tackle this challenge, the robot must continuously capture scene changes and update its plans accordingly. However, current object-navigation approaches primarily focus on the semantic level and lack the ability to dynamically update the scene representation. In contrast, this paper captures the relationships between frequently used objects and their static carriers. It constructs an open-vocabulary Carrier-Relationship Scene Graph (CRSG) and updates the carrying status during robot navigation to reflect dynamic changes in the scene. Based on the CRSG, we further propose an instance navigation strategy that models the navigation process as a Markov Decision Process. At each step, decisions are informed by a Large Language Model's commonsense knowledge and by visual-language feature similarity. We designed a series of long-horizon navigation tasks for frequently used everyday items in the Habitat simulator. The results demonstrate that, by updating the CRSG, the robot can navigate efficiently to moved targets. Additionally, we conducted extensive experiments on a real robot, demonstrating the effectiveness of our method and exploring its limitations.
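The idea of tracking which static carrier holds which object instance, and refreshing that status as the robot observes the scene, can be illustrated with a small Python sketch. This is not the authors' implementation: the `CarrierGraph` class, the `observe` and `carrier_of` methods, the `rank_carriers` helper, the equal weighting of the two scores, and the object and carrier names are all hypothetical, and the scores stand in for whatever an LLM prior and a visual-language similarity measure would actually return.

```python
from dataclasses import dataclass, field


@dataclass
class CarrierGraph:
    """Toy carrier-relationship graph: static carrier -> object instances on it."""
    carried: dict = field(default_factory=dict)

    def observe(self, carrier, seen_objects):
        """Overwrite a carrier's contents with what was just observed there."""
        seen = set(seen_objects)
        # An instance can rest on only one carrier, so drop stale placements.
        for other, objs in self.carried.items():
            if other != carrier:
                objs -= seen
        self.carried[carrier] = seen

    def carrier_of(self, obj):
        """Return the carrier currently believed to hold obj, or None."""
        for carrier, objs in self.carried.items():
            if obj in objs:
                return carrier
        return None


def rank_carriers(prior, similarity, weight=0.5):
    """Order carriers by a weighted mix of two hypothetical scores in [0, 1]."""
    score = {c: weight * prior[c] + (1 - weight) * similarity.get(c, 0.0)
             for c in prior}
    return sorted(score, key=score.get, reverse=True)


graph = CarrierGraph()
graph.observe("kitchen_table", ["red_cup", "plate"])
graph.observe("desk", ["red_cup"])  # the cup was moved since the last visit
print(graph.carrier_of("red_cup"))
```

In this sketch a new observation at one carrier simply replaces the stored contents for that carrier, so a moved cup is found at its latest observed location; the paper's actual update rule and decision process may differ.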

Original language: English
Pages (from-to): 9256-9263
Number of pages: 8
Journal: IEEE Robotics and Automation Letters
Volume: 10
Issue number: 9
Publication status: Published - 2025
Externally published: Yes

Keywords

  • Carrier-Relationship Scene Graph
  • Dynamic Scenes
  • Instance Navigation
