TY - JOUR
T1 - Category-Level 6-D Object Pose Estimation With Learnable Prior Embeddings for Robotic Grasping
AU - Yu, Sheng
AU - Zhai, Di-Hua
AU - Yin, Jian
AU - Xia, Yuanqing
N1 - Publisher Copyright:
© 1982-2012 IEEE.
PY - 2025
Y1 - 2025
N2 - Category-level object pose estimation is crucial for predicting the poses of unknown objects within known categories. Methods that rely on category priors require prior training on datasets to acquire object priors, whereas methods without category priors lack the relevant geometric information. To address these challenges, this article introduces PENet, a category-level object pose estimation method based on learnable priors. The method represents prior features with a learnable category prior embedding and proposes a transformer-based prior embedding deformation module that initially deforms the prior embedding from a global perspective to match the actual target object. Additionally, it introduces a transformer-based correspondence module that establishes correspondence between instances and priors from a global perspective, further aligning the deformed feature embedding with the scene point cloud features. Experimental results demonstrate that the proposed method surpasses existing methods, achieving state-of-the-art performance on the dataset. Furthermore, the generalization ability of the proposed method is evaluated by applying PENet to object pose estimation on the Wild6D dataset, where it outperforms all related methods. Finally, applying PENet to robotic grasping experiments on a real UR3 robot yields a higher success rate than previous methods.
KW - Category-level 6-D pose
KW - grasping detection
KW - object pose estimation
KW - robot
UR - http://www.scopus.com/pages/publications/105004691253
U2 - 10.1109/TIE.2025.3555019
DO - 10.1109/TIE.2025.3555019
M3 - Article
AN - SCOPUS:105004691253
SN - 0278-0046
JO - IEEE Transactions on Industrial Electronics
JF - IEEE Transactions on Industrial Electronics
ER -