Efficient Multispectral Object Detection with attentive feature aggregation leveraging zero-shot implicit illumination guidance

Zhongxia Xiong, Ziying Yao, Xuan Liu, Wenyao Zhao, Jie Cao, Xinkai Wu*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

2 Citations (Scopus)

Abstract

By combining visible imagery with thermal sensing data, multispectral object detection enables around-the-clock perception for applications such as autonomous driving. Infrared input serves as auxiliary data for cross-modality feature aggregation, a common approach that numerous previous studies have shown to be successful. Nevertheless, although many existing methods include complex and time-consuming modules, effective information fusion remains a formidable challenge because of severe spatiotemporal misalignment and modality imbalance between visible and thermal images. This paper therefore aims to improve both the accuracy and the speed of RGB-infrared perception. To this end, an illumination-guided attentive feature aggregation model (EMOD) is introduced to achieve Efficient Multispectral Object Detection. First, EMOD performs feature fusion with a local-to-nonlocal cross-modality attention mechanism, which not only mitigates pixel-wise positional variation but also captures context-level complementary information. Second, to address the modality imbalance issue, a signal indicating illumination conditions is implicitly embedded into the aggregation module to guide the attention computation. Unlike in previous works, this signal is more potent and practical because it denotes regional lighting conditions and requires no additional training labels. Comprehensive experiments are conducted on three widely used datasets: KAIST, CVC-14 and FLIR. Without bells and whistles, EMOD surpasses state-of-the-art approaches in both effectiveness and efficiency. For example, it achieves a miss rate (MR) score of 5.96 on KAIST while maintaining a speed of 28 FPS on a low-cost GPU.
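
The abstract gives no implementation details, so the following is only a minimal sketch of the general idea of label-free, region-wise illumination guidance. It is not the EMOD aggregation module. It assumes that the mean brightness of each region of the visible image can stand in for a lighting signal, and that a logistic curve turns that brightness into a weight balancing visible and thermal features. The function names `region_brightness`, `illumination_weight` and `fuse`, and the parameters `midpoint` and `sharpness`, are hypothetical.

```python
import math

def region_brightness(gray, grid):
    """Mean intensity (0..1) of each cell in a grid x grid partition of a 2D image.

    Assumes the image has at least `grid` rows and `grid` columns.
    """
    h, w = len(gray), len(gray[0])
    means = []
    for gy in range(grid):
        row = []
        for gx in range(grid):
            y0, y1 = gy * h // grid, (gy + 1) * h // grid
            x0, x1 = gx * w // grid, (gx + 1) * w // grid
            cell = [gray[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(cell) / len(cell))
        means.append(row)
    return means

def illumination_weight(brightness, midpoint=0.5, sharpness=10.0):
    """Map brightness to a visible-branch weight in (0, 1) with a logistic curve.

    `midpoint` and `sharpness` are illustrative values, not taken from the paper.
    """
    return 1.0 / (1.0 + math.exp(-sharpness * (brightness - midpoint)))

def fuse(rgb_feat, thermal_feat, weights):
    """Per-region convex combination: w * visible + (1 - w) * thermal."""
    return [
        [w * r + (1.0 - w) * t for r, t, w in zip(r_row, t_row, w_row)]
        for r_row, t_row, w_row in zip(rgb_feat, thermal_feat, weights)
    ]

# Toy 4x4 grayscale image: bright left half, dark right half.
gray = [[0.9, 0.9, 0.1, 0.1] for _ in range(4)]
brightness = region_brightness(gray, grid=2)
weights = [[illumination_weight(b) for b in row] for row in brightness]

# Toy one-channel features per region: visible = 1.0, thermal = 0.0,
# so the fused value equals the weight given to the visible branch.
rgb_feat = [[1.0, 1.0], [1.0, 1.0]]
thermal_feat = [[0.0, 0.0], [0.0, 0.0]]
fused = fuse(rgb_feat, thermal_feat, weights)
print(fused)
```

In this toy case the well-lit regions lean on the visible features and the dark regions lean on the thermal ones, and no illumination labels are involved. A real detector would apply such guidance inside the attention computation on learned feature maps rather than as a fixed blend.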

Original language: English
Article number: 102939
Journal: Information Fusion
Volume: 118
Publication status: Published - Jun 2025

Keywords

  • Feature fusion
  • Light estimation
  • Multispectral
  • Object detection
  • Real time
