AITEPose: Learning an End-to-End Monocular 3D Human Pose Estimator via Auxiliary-Information-Driven Training Enhancement

Bowei Xie, Geyuan Liu, Fang Deng, Maobin Lu*

*Corresponding author of this work

Research output: Contribution to journal › Article › peer-review

Abstract

3D human pose estimation (3DHPE) from a single monocular RGB image is fundamental to many image-related fields, such as virtual reality, motion analysis, and human-computer interaction. To improve estimation accuracy, existing works typically integrate complex networks or divide monocular 3DHPE into multiple stages. However, complicating the estimation process in pursuit of accuracy sacrifices inference speed and limits practical application. To alleviate this, we propose AITEPose, an end-to-end model that achieves higher monocular 3DHPE accuracy with a simpler model structure. Specifically, inspired by online knowledge distillation, we design an Auxiliary-Information-Driven Training Enhancement (AITE) framework. In the AITE framework, during training, an adjustment network is introduced between the prediction network and the loss function to incorporate auxiliary information and enhance the training process. Notably, the adjustment network is constructed from a novel cascaded Disturbance-Correction Module (DCM), which refines the predicted poses using ground-truth bone lengths. Both AITE and DCM are employed only during training, thereby improving training outcomes without complicating the inference process. The AITEPose model achieves state-of-the-art performance for single-frame monocular 3DHPE on Human3.6M, the most comprehensive dataset for this task. To further validate the effectiveness of AITE and DCM, we design a monocular 2DHPE model, AITEPose2D, and conduct extensive ablation experiments on the COCO2017 dataset, demonstrating the robustness and generalizability of our proposed AITEPose.
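The abstract states that the DCM adjusts predicted poses using ground-truth bone lengths during training. As a rough intuition for that idea (a minimal sketch, not the paper's actual DCM: the function name, skeleton representation, and root-outward traversal are all assumptions), one can rescale each predicted bone to its ground-truth length while preserving its direction:

```python
import numpy as np

def correct_bone_lengths(joints, parents, gt_lengths):
    """Hypothetical bone-length correction, illustrating the general idea only.

    joints:     (J, 3) predicted 3D joint positions
    parents:    parents[j] is the parent index of joint j (-1 for the root);
                assumed topologically ordered so that parents[j] < j
    gt_lengths: gt_lengths[j] is the ground-truth length of the bone
                parents[j] -> j (ignored for the root)
    """
    corrected = joints.copy()
    for j in range(len(parents)):
        p = parents[j]
        if p < 0:
            continue  # the root joint has no incoming bone
        direction = joints[j] - joints[p]
        norm = np.linalg.norm(direction)
        if norm > 1e-8:
            direction = direction / norm
        # Keep the predicted bone direction, enforce the ground-truth length,
        # and attach to the already-corrected parent position.
        corrected[j] = corrected[p] + gt_lengths[j] * direction
    return corrected
```

Because this step consumes ground-truth information, it can only run at training time; at inference the prediction network alone produces the output, matching the training-only role of AITE and DCM described above.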
