R2G: Reasoning to ground in 3D scenes

Yixuan Li, Zan Wang, Wei Liang*

*Corresponding author of this work

Research output: Contribution to journal › Article › peer-review

Abstract

We propose Reasoning to Ground (R2G), a neural-symbolic model that grounds target objects in 3D scenes in a reasoning manner. Unlike previous works that rely on end-to-end models for grounding, which often function as black boxes, our approach seeks to provide a more interpretable and reliable solution. R2G explicitly models the 3D scene using a semantic concept-based scene graph, recurrently simulates attention transferring across object entities, and interpretably grounds the target objects with the highest attention score. Specifically, we embed multiple object properties within the graph nodes and spatial relations among entities within the edges through a predefined semantic vocabulary. To guide attention transfer, we employ learning- or prompting-based approaches to interpret the referential utterance into reasoning instructions within the same semantic space. In each reasoning round, we either (1) merge the current attention distribution with the similarity between the instructions and the embedded entity properties, or (2) shift the attention across the scene graph based on the similarity between the instructions and the embedded spatial relations. Experiments on the Sr3D/Nr3D benchmarks show that R2G achieves results comparable to prior works while offering improved interpretability, opening a new path for 3D grounding. The code and dataset for this work are available at: http://sites.google.com/view/reasoning-to-ground.
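The abstract's two per-round updates, merging attention with property similarity and shifting it along relation edges, can be sketched concretely. The snippet below is a minimal illustrative sketch, not the authors' released implementation: the function names, embedding dimensions, softmax-based similarity, and row-stochastic transition are all our assumptions based only on the abstract.

```python
# Illustrative sketch of R2G-style merge/shift attention reasoning
# over a scene graph. All names and update rules are assumptions.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def merge_step(attention, node_embeds, instruction):
    """Rule (1): merge attention with instruction/property similarity."""
    sim = softmax(node_embeds @ instruction)   # similarity to entity properties
    merged = attention * sim                   # element-wise merge
    return merged / merged.sum()               # renormalize to a distribution

def shift_step(attention, edge_embeds, instruction):
    """Rule (2): shift attention along edges by instruction/relation similarity.

    edge_embeds[i, j] embeds the spatial relation from entity i to entity j.
    """
    n = attention.shape[0]
    trans = np.stack([softmax(edge_embeds[i] @ instruction) for i in range(n)])
    return attention @ trans                   # propagate mass across the graph

# Toy usage: 4 entities, 8-dim semantic space, 2 reasoning instructions.
rng = np.random.default_rng(0)
attention = np.full(4, 0.25)                   # uniform initial attention
nodes = rng.normal(size=(4, 8))                # embedded object properties
edges = rng.normal(size=(4, 4, 8))             # embedded pairwise spatial relations
instructions = rng.normal(size=(2, 8))         # parsed from the utterance

attention = merge_step(attention, nodes, instructions[0])
attention = shift_step(attention, edges, instructions[1])
target = int(attention.argmax())               # ground the highest-scoring entity
print(target, attention.round(3))
```

Because each row of the transition matrix in the shift step sums to one, the total attention mass is conserved as it moves across the graph, which is what keeps the final argmax interpretable as a distribution over candidate entities.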

Original language: English
Article number: 111728
Journal: Pattern Recognition
Volume: 168
DOI
Publication status: Published - Dec 2025
