TY - JOUR
T1 - Category-Level 6-D Object Pose Estimation With Learnable Prior Embeddings for Robotic Grasping
AU - Yu, Sheng
AU - Zhai, Di-Hua
AU - Yin, Jian
AU - Xia, Yuanqing
N1 - Publisher Copyright:
© 1982-2012 IEEE.
PY - 2025
Y1 - 2025
N2 - Category-level object pose estimation is crucial for predicting the poses of unknown objects within known categories. Methods that rely on category priors require prior training on datasets to acquire object priors, whereas methods without category priors lack the relevant geometric information. To address these challenges, this article introduces PENet, a category-level object pose estimation method based on learnable priors. The method represents prior features with a learnable category prior embedding and proposes a transformer-based prior embedding deformation module that initially deforms the prior embedding from a global perspective to match the actual target object. Additionally, it introduces a transformer-based correspondence module that establishes correspondence between instances and priors from a global perspective, further aligning the deformed feature embedding with the scene point cloud features. Experimental results demonstrate that the proposed method surpasses existing methods, achieving state-of-the-art performance on the dataset. Furthermore, the generalization ability of the proposed method is evaluated by applying PENet to object pose estimation on the Wild6D dataset, where it outperforms all related methods. Finally, applying PENet to robotic grasping experiments on a real UR3 robot yields a higher success rate than previous methods.
KW - Category-level 6-D pose
KW - grasping detection
KW - object pose estimation
KW - robot
UR - http://www.scopus.com/pages/publications/105004691253
U2 - 10.1109/TIE.2025.3555019
DO - 10.1109/TIE.2025.3555019
M3 - Article
AN - SCOPUS:105004691253
SN - 0278-0046
JO - IEEE Transactions on Industrial Electronics
JF - IEEE Transactions on Industrial Electronics
ER -