UMD-Net: A Unified Multi-Task Assistive Driving Network Based on Multimodal Fusion

Wenzhuo Liu, Yicheng Qiao, Zhiwei Li, Wenshuo Wang*, Wei Zhang, Jiayin Zhu, Yanhuan Jiang*, Li Wang, Hong Wang, Huaping Liu, Kunfeng Wang

*Corresponding authors for this work

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

In recent years, researchers have focused on recognition tasks related to the driver state, the traffic environment, and other factors to enhance the safety of autonomous driving assistance systems. However, current research treats these tasks independently, neglecting the interconnections among the driver, the traffic environment, and the vehicle. In this paper, we propose a Unified Multi-task Assistive Driving Network Based on Multimodal Fusion (UMD-Net), the first unified model capable of recognizing four tasks simultaneously from multimodal data: driver behavior recognition, driver emotion recognition, traffic context recognition, and vehicle behavior recognition. To better exploit the synergy among these tasks, we design a position-sensitive multi-directional attention feature extraction subnetwork and a recursive dynamic feature fusion module. The former captures key features of multi-view images through attention mechanisms applied along different directions, improving the model's generalization across tasks. The latter dynamically adjusts fusion weights according to the multimodal features, strengthening the representation of important features in multi-task learning. Evaluated on the public AIDE dataset, our model achieves the best performance on all four tasks, including 95.31% accuracy on traffic context recognition, demonstrating the superiority of our approach.
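The abstract does not specify how the recursive dynamic fusion is implemented, but a minimal PyTorch sketch of the general idea (a gating network predicts per-modality weights from the current features, and the weighted fusion is refined over a few recursive steps) might look as follows. The module name, gating design, and recursion depth are illustrative assumptions, not the authors' actual architecture.

```python
import torch
import torch.nn as nn

class DynamicFeatureFusion(nn.Module):
    """Toy sketch of recursive dynamic multimodal fusion (an assumption,
    not UMD-Net's published design): a small gating network predicts
    per-modality weights from the current fused state plus all modality
    features, and the fused representation is the weighted sum, refined
    over a fixed number of recursive steps."""

    def __init__(self, dim: int, num_modalities: int, steps: int = 2):
        super().__init__()
        self.steps = steps
        # Gate input: current fused feature concatenated with every modality.
        self.gate = nn.Sequential(
            nn.Linear(dim * (num_modalities + 1), num_modalities),
            nn.Softmax(dim=-1),
        )

    def forward(self, feats: list[torch.Tensor]) -> torch.Tensor:
        # feats: list of (batch, dim) modality features.
        stacked = torch.stack(feats, dim=1)   # (batch, M, dim)
        fused = stacked.mean(dim=1)           # simple initial fusion
        for _ in range(self.steps):
            gate_in = torch.cat([fused] + feats, dim=-1)
            w = self.gate(gate_in)            # (batch, M) modality weights
            fused = (w.unsqueeze(-1) * stacked).sum(dim=1)
        return fused

# Example: fuse three hypothetical 256-d modality features
# (e.g., in-cabin view, scene view, vehicle state) for a batch of 4.
feats = [torch.randn(4, 256) for _ in range(3)]
fusion = DynamicFeatureFusion(dim=256, num_modalities=3)
print(fusion(feats).shape)  # torch.Size([4, 256])
```

Because the gate re-reads the fused state at each step, the fusion weights can shift toward whichever modality is most informative for the current input, which is the behavior the abstract attributes to the recursive dynamic feature fusion module.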
