ACNTrack: Agent cross-attention guided Multimodal Multi-Object Tracking with Neural Kalman Filter

Lian Zhang, Lingxue Wang*, Yuzhen Wu, Mingkun Chen, Dezhi Zheng, Yi Cai

*Corresponding author for this work

Research output: Contribution to journal > Article > peer-review

Abstract

Exploring and associating the complementary information in visible, thermal infrared, and low-light images is crucial for advancing Multimodal Multi-Object Tracking (MMOT). Previous studies have shown that efficient feature fusion modules can improve tracking performance in complex environments, but these methods are often limited in global feature interaction and computational efficiency. We present a novel multimodal multi-object tracker based on the tracking-by-detection paradigm, comprising a multimodal detector and a data associator. We introduce a dual cross-attention feature fusion detection framework, built on an agent attention mechanism, to improve the efficiency of feature interaction and effectively capture complementary cross-modal information. To better capture the detailed and complex information inherent in each modality, we propose a Feature Pyramid Shared Convolution (FPS-Conv) operation to replace the Spatial Pyramid Pooling Fast (SPPF) operation in the detector. Additionally, we develop a Neural Kalman Filter (NKF) to improve the data associator; the NKF dynamically adjusts the process and observation noise according to the current motion state. The proposed fusion architecture significantly reduces computational complexity while maintaining high-quality feature interaction, and the NKF handles diverse motion patterns better than traditional fixed-parameter approaches. Experimental results validate these advantages: the proposed method achieves state-of-the-art results on the KAIST, FLIR, and UniRTL test datasets and demonstrates competitive performance on the VT-MOT dataset.
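The abstract describes the NKF only at a high level: a Kalman filter whose process and observation noise follow the current motion state instead of staying fixed. The sketch below, in Python with NumPy, illustrates where such state-dependent noise enters the standard predict/update cycle. It is not the authors' implementation. The paper learns the noise adjustment with a neural network, whereas here a hypothetical hand-written heuristic, `noise_scales`, simply scales both noise covariances with the estimated speed. The constant-velocity model over [x, y, vx, vy], the base covariances, and every name in the code are assumptions made for illustration.

```python
import numpy as np

# Assumed constant-velocity model: state [x, y, vx, vy], measurement [x, y].
DT = 1.0  # frame interval (assumed)
F = np.array([[1.0, 0.0, DT, 0.0],
              [0.0, 1.0, 0.0, DT],
              [0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])
H = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0]])
Q_BASE = np.diag([1.0, 1.0, 0.5, 0.5])  # base process noise (assumed values)
R_BASE = np.diag([4.0, 4.0])            # base observation noise (assumed values)


def noise_scales(x):
    """Hypothetical stand-in for the NKF's learned noise module.

    Maps the current motion state to (q_scale, r_scale). Here both simply
    grow linearly with the estimated speed; the paper uses a neural network.
    """
    speed = float(np.hypot(x[2], x[3]))
    return 1.0 + 0.1 * speed, 1.0 + 0.05 * speed


class AdaptiveKalmanFilter:
    """Kalman filter whose Q and R are rescaled at every step from the state."""

    def __init__(self, z0):
        # Start at the first detection with zero velocity and large velocity uncertainty.
        self.x = np.array([z0[0], z0[1], 0.0, 0.0])
        self.P = np.diag([10.0, 10.0, 100.0, 100.0])

    def predict(self):
        q_scale, _ = noise_scales(self.x)  # process noise follows the motion state
        self.x = F @ self.x
        self.P = F @ self.P @ F.T + q_scale * Q_BASE
        return H @ self.x  # predicted position, used for association

    def update(self, z):
        _, r_scale = noise_scales(self.x)  # observation noise follows the motion state
        innovation = np.asarray(z, dtype=float) - H @ self.x
        S = H @ self.P @ H.T + r_scale * R_BASE
        # K = P H^T S^{-1}, computed as (S^{-1} H P)^T since P and S are symmetric.
        K = np.linalg.solve(S, H @ self.P).T
        self.x = self.x + K @ innovation
        self.P = (np.eye(4) - K @ H) @ self.P
        return H @ self.x


if __name__ == "__main__":
    # Target moving at (2, 1) pixels per frame, observed without noise.
    kf = AdaptiveKalmanFilter([0.0, 0.0])
    for t in range(1, 31):
        kf.predict()
        kf.update([2.0 * t, 1.0 * t])
    print(np.round(kf.x, 2))  # velocity estimate converges toward [2, 1]
```

In a fixed-parameter filter, `q_scale` and `r_scale` would be constants; making them functions of the state is the contrast the abstract draws with "traditional fixed-parameter approaches". In a tracking-by-detection associator, the position returned by `predict()` would typically be matched against new detections before `update()` is called with the matched detection.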

Original language: English
Article number: 130811
Journal: Neurocomputing
Volume: 650
Publication status: Published, 14 Oct 2025

Keywords

  • Feature Pyramid Shared
  • Multi-object tracking
  • Multimodal image
  • Neural Kalman Filter
