Fusion Classification Method Based on Audiovisual Information Processing

Peiju Chen, Xuan Zhang, Huijun Zhao, Huiliang Cao, Xuemei Chen*, Xiaochen Liu*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

Multimodal target classification plays a crucial role when external interference is present. Traditional single-modal classification systems rely on a single data representation and are sensitive to environmental conditions, which makes it difficult for them to meet the robustness requirements of target classification under external disturbances. To address these shortcomings, this paper proposes a target classification algorithm based on audiovisual fusion. The contributions of this work are as follows. (1) To resolve the lack of correlation between audio and image signals, we convert audio signals into spectrograms and fuse them with target images. The spectrogram makes full use of the effective information in the audio, which keeps the representation stable, and it also overcomes the difficulty of fusing one-dimensional time-series audio signals with two-dimensional discrete image signals. (2) We propose a network framework for convolutional feature extraction and modal fusion that incorporates an attention mechanism module into the fusion process, ensuring the stability and robustness of the fused data for audiovisual target classification. The method was validated on both a custom dataset and the YouTube-8M dataset. On the custom dataset, the proposed method improves accuracy by 2.9%, 2.4%, 1.2%, and 0.9% compared with four other multimodal fusion target classification methods. These results demonstrate the effectiveness of the proposed multimodal fusion recognition approach and validate the theoretical rationale behind the method.
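The abstract names two technical steps: turning a 1-D audio signal into a 2-D, image-like spectrogram, and fusing per-modality features with an attention module. The Python sketch below illustrates both ideas under stated assumptions; it is not the authors' network, whose exact architecture is not given here. The frame length, hop size, 80 dB dynamic range, the average-pooling function `pool_to_vector` (a hypothetical stand-in for the convolutional branches), and the fixed score `weights` (learned in a real model) are all illustrative choices. The fusion shown is a simple softmax weighting over modality feature vectors, one common form of attention, not necessarily the module used in the paper.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def log_spectrogram(signal, frame_len=64, hop=32, top_db=80.0):
    """STFT log-magnitude spectrogram scaled to [0, 1].

    Returns a 2-D list indexed [frequency_bin][time_frame]: an image-like
    array that a 2-D CNN branch could consume. Needs len(signal) >= frame_len.
    """
    win = hann(frame_len)
    n_bins = frame_len // 2 + 1  # non-negative frequencies of a real signal
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [signal[start + i] * win[i] for i in range(frame_len)]
        # Naive DFT of the windowed frame; an FFT would be used in practice.
        mags = [abs(sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n in range(frame_len)))
                for k in range(n_bins)]
        frames.append([20 * math.log10(m + 1e-10) for m in mags])
    spec = [list(col) for col in zip(*frames)]  # transpose to [freq][time]
    # Keep only the top_db dynamic range below the peak, then scale to [0, 1].
    peak = max(max(row) for row in spec)
    floor = peak - top_db
    return [[(max(v, floor) - floor) / (peak - floor) for v in row]
            for row in spec]


def pool_to_vector(matrix, grid=4):
    """Average-pool a 2-D list into grid x grid cells and flatten.

    Hypothetical stand-in for a convolutional feature extractor: it maps
    inputs of different sizes to vectors of the same length (grid * grid).
    """
    h, w = len(matrix), len(matrix[0])
    feats = []
    for gi in range(grid):
        r0, r1 = gi * h // grid, (gi + 1) * h // grid
        for gj in range(grid):
            c0, c1 = gj * w // grid, (gj + 1) * w // grid
            cell = [matrix[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            feats.append(sum(cell) / len(cell))
    return feats


def attention_fuse(features, score_weights):
    """Softmax attention over modalities: fused = sum_m alpha_m * f_m."""
    scores = [sum(w * x for w, x in zip(score_weights, f)) for f in features]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    fused = [sum(a * f[i] for a, f in zip(alphas, features))
             for i in range(len(features[0]))]
    return fused, alphas


# Synthetic example: a 500 Hz tone sampled at 8 kHz (bin spacing 125 Hz,
# so the tone falls on frequency bin 4) and a 32 x 32 gradient "image".
sr, freq = 8000, 500.0
audio = [math.sin(2 * math.pi * freq * n / sr) for n in range(1024)]
spec = log_spectrogram(audio)
image = [[(r + c) / 62 for c in range(32)] for r in range(32)]

f_audio, f_image = pool_to_vector(spec), pool_to_vector(image)
weights = [1.0 / 16] * 16  # hypothetical fixed weights; learned in practice
fused, alphas = attention_fuse([f_audio, f_image], weights)
print(len(spec), len(spec[0]))  # 33 31
print("modality weights (audio, image):", [round(a, 3) for a in alphas])
```

Mapping both modalities to fixed-length vectors is what allows them to be weighted and summed. Replacing `pool_to_vector` with trained convolutional branches and `weights` with a learned projection would bring the sketch closer to the kind of framework the abstract describes.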

Original language: English
Article number: 4104
Journal: Applied Sciences (Switzerland)
Volume: 15
Issue: 8
DOI
Publication status: Published - Apr 2025
Externally published
