TY - JOUR
T1 - A novel multimodal computer-aided diagnostic model for pulmonary embolism based on hybrid transformer-CNN and tabular transformer
AU - Zhang, Wei
AU - Gu, Yu
AU - Ma, Hao
AU - Yang, Lidong
AU - Zhang, Baohua
AU - Wang, Jing
AU - Chen, Meng
AU - Lu, Xiaoqi
AU - Li, Jianjun
AU - Liu, Xin
AU - Yu, Dahua
AU - Zhao, Ying
AU - Tang, Siyuan
AU - He, Qun
N1 - Publisher Copyright:
© Australasian College of Physical Scientists and Engineers in Medicine 2025.
PY - 2025
Y1 - 2025
N2 - Pulmonary embolism (PE) is a life-threatening clinical condition in which early diagnosis and prompt treatment are essential to reducing morbidity and mortality. Although combining CT images with electronic health records (EHR) can improve computer-aided diagnosis, doing so raises several challenges. The primary objective of this study is to leverage both 3D CT images and EHR data to improve PE diagnosis. First, for 3D CT images, we propose a network that combines Swin Transformers with 3D CNNs, enhanced by a Multi-Scale Feature Fusion (MSFF) module that addresses the challenge of fusing features from different encoders. Second, we introduce a Polarized Self-Attention (PSA) module to strengthen the attention mechanism within the 3D CNN. Third, for EHR data, we design a Tabular Transformer for effective feature extraction. Finally, we design and evaluate three multimodal attention fusion modules for integrating CT and EHR features and select the most effective one for the final fusion. Experimental results on the RadFusion dataset show that our model significantly outperforms existing state-of-the-art methods, achieving an AUROC of 0.971, an F1 score of 0.926, and an accuracy of 0.920. These results demonstrate the effectiveness of our multimodal approach for PE diagnosis.
KW - 3DCNN
KW - EHR
KW - Multimodal diagnoses
KW - Pulmonary embolism
KW - Swin transformer
UR - http://www.scopus.com/pages/publications/105006811237
U2 - 10.1007/s13246-025-01568-4
DO - 10.1007/s13246-025-01568-4
M3 - Article
C2 - 40411540
AN - SCOPUS:105006811237
SN - 2662-4729
JO - Physical and Engineering Sciences in Medicine
JF - Physical and Engineering Sciences in Medicine
ER -