FingerPoseNet: A finger-level multitask learning network with residual feature sharing for 3D hand pose estimation

Tekie Tsegay Tewolde, Ali Asghar Manjotho, Prodip Kumar Sarker, Zhendong Niu*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Hand pose estimation approaches commonly rely on shared hand feature maps to regress the 3D locations of all hand joints. Consequently, they struggle to enhance finger-level features, which are invaluable for capturing joint-to-finger associations and articulations. To address this limitation, we propose a finger-level multitask learning network with residual feature sharing, named FingerPoseNet, for accurate 3D hand pose estimation from a depth image. FingerPoseNet comprises three stages: (a) a shared base feature map extraction backbone based on pre-trained ResNet-50; (b) a finger-level multitask learning stage that extracts and enhances feature maps for each finger and the palm; and (c) a multitask fusion layer for consolidating the estimation results obtained by each subtask. We exploit multitask learning by decoupling the hand pose estimation task into six subtasks dedicated to each finger and the palm. Each subtask is responsible for subtask-specific feature extraction, enhancement, and 3D keypoint regression. To enhance subtask-specific features, we propose a residual feature-sharing approach scaled up to mine supplementary information from all subtasks. Experiments performed on five challenging public hand pose datasets, including ICVL, NYU, MSRA, Hands-2019-Task1, and HO3D-v3, demonstrate significant improvements in accuracy compared with state-of-the-art approaches.
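The abstract outlines a three-stage layout: a shared pre-trained ResNet-50 backbone, six subtask branches (five fingers plus the palm) with residual feature sharing, and a fusion layer over the per-subtask estimates. The following is a minimal PyTorch sketch of that layout, not the authors' implementation: the class names, joint split per subtask, channel widths, 1-channel depth stem, and the exact form of the residual-sharing operation (here, the mean of all subtasks' enhanced features added as a residual) are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torchvision


class SubtaskBranch(nn.Module):
    """One subtask (a finger or the palm): feature enhancement + 3D keypoint regression.
    `enhance` and `regress` are kept separate so enhanced features can be shared
    across subtasks before regression."""

    def __init__(self, in_ch: int, num_joints: int):
        super().__init__()
        self.enhance = nn.Sequential(
            nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(in_ch),
            nn.ReLU(inplace=True),
        )
        self.regress = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(in_ch, num_joints * 3),  # (x, y, z) per joint
        )


class FingerPoseNetSketch(nn.Module):
    """Shared ResNet-50 backbone -> six subtask branches (five fingers + palm)
    with residual feature sharing -> fusion of the per-subtask joint estimates."""

    def __init__(self, joints_per_subtask=(4, 4, 4, 4, 4, 1), feat_ch=2048):
        super().__init__()
        self.stem = nn.Conv2d(1, 3, kernel_size=1)  # map 1-channel depth to 3 channels
        backbone = torchvision.models.resnet50(weights="IMAGENET1K_V1")  # pre-trained
        self.backbone = nn.Sequential(*list(backbone.children())[:-2])   # drop avgpool/fc
        self.branches = nn.ModuleList(
            [SubtaskBranch(feat_ch, j) for j in joints_per_subtask]
        )
        total = sum(joints_per_subtask)
        self.fusion = nn.Linear(total * 3, total * 3)  # consolidate subtask outputs

    def forward(self, depth: torch.Tensor) -> torch.Tensor:
        base = self.backbone(self.stem(depth))                # shared base feature map
        enhanced = [b.enhance(base) for b in self.branches]   # subtask-specific features
        # Residual feature sharing (assumed form): each subtask adds the mean of all
        # subtasks' enhanced features as a residual before regressing its joints.
        shared = torch.stack(enhanced).mean(dim=0)
        preds = [b.regress(e + shared) for b, e in zip(self.branches, enhanced)]
        return self.fusion(torch.cat(preds, dim=1))           # (B, total_joints * 3)
```

Under the assumed split of four joints per finger plus one palm joint (21 joints total), a batch of 1-channel depth crops, e.g. `torch.randn(2, 1, 128, 128)`, yields a `(2, 63)` tensor of 3D coordinates; the paper itself specifies how FingerPoseNet actually partitions the joints and shares residual features.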

Original language: English
Article number: 107315
Journal: Neural Networks
Volume: 187
DOI
Publication status: Published - Jul 2025
