A Survey of Human-Object Interaction Detection With Deep Learning

Geng Han, Jiachen Zhao, Lele Zhang, Fang Deng*

*此作品的通讯作者

科研成果: 期刊稿件文章同行评审

2 引用 (Scopus)

摘要

Human-object interaction (HOI) detection has attracted significant attention due to its wide applications, including human-robot interactions, security monitoring, automatic sports commentary, etc. HOI detection aims to detect humans, objects, and their interactions in a given image or video, so it needs a higher-level semantic understanding of the image than regular object recognition or detection tasks. It is also more challenging technically because of some unique difficulties, such as multi-object interactions, long-tail distribution of interaction categories, etc. Currently, deep learning methods have achieved great performance in HOI detection, but there are few reviews describing the recent advance of deep learning-based HOI detection. Moreover, the current stage-based category of HOI detection methods is causing confusion in community discussion and beginner learning. To fill this gap, this paper summarizes, categorizes, and compares methods using deep learning for HOI detection over the last nine years. Firstly, we summarize the pipeline of HOI detection methods. Then, we divide existing methods into three categories (two-stage, one-stage, and transformer-based), distinguish them in formulas and schematics, and qualitatively compare their advantages and disadvantages. After that, we review each category of methods in detail, focusing on HOI detection methods for images. Moreover, we explore the development process of using foundation models for HOI detection. We also quantitatively compare the performance of existing methods on public HOI datasets. At last, we point out the future research direction of HOI detection.

源语言英语
页(从-至)3-26
页数24
期刊IEEE Transactions on Emerging Topics in Computational Intelligence
9
1
DOI
出版状态已出版 - 2025

指纹

探究 'A Survey of Human-Object Interaction Detection With Deep Learning' 的科研主题。它们共同构成独一无二的指纹。

引用此