MIHNet: Multi-input hierarchical infrared image super-resolution method via collaborative CNN and Transformer

Yang Bai, Meijing Gao*, Huanyu Sun, Sibo Chen, Yunjia Xie, Yonghao Yan, Xiangrui Fan

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Owing to the low spatial resolution of infrared imaging systems, acquired images typically suffer from low contrast, insufficient detail, and blurred edges. To address this issue, this paper proposes a multi-input hierarchical infrared image super-resolution reconstruction method based on a collaborative CNN and Transformer, termed MIHNet. The network adopts a multi-input encoder–decoder structure as its framework. First, a Local–Global Feature Perception Module (LGFPM), consisting of a Local Texture Attention Unit (LTAU) and a Global Transformer Attention Unit (GTAU), is designed to simultaneously enhance the reconstruction of local detail and global structure in infrared images. Second, a Feature Refinement Module (FRM) is constructed to enhance the expressiveness of the encoded features. Then, a Multi-level Feature Fusion (MFF) module is designed to adaptively fuse features from the encoding stage. Finally, a mixed loss function composed of pixel loss, structure loss, and texture loss is constructed to guide network optimization. Experiments on three public datasets demonstrate that the proposed method outperforms thirteen comparison algorithms in both subjective and objective evaluations. Furthermore, the method is verified on the downstream task of infrared and visible image fusion, which further demonstrates that MIHNet achieves strong super-resolution (SR) reconstruction.
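The abstract describes a mixed loss combining pixel, structure, and texture terms. The paper's exact formulation and weights are not given here, so the following is only a minimal NumPy sketch under common assumptions: an L1 pixel term, a simplified global SSIM structure term (windowed SSIM is more usual), a gradient-magnitude texture term, and placeholder weights `w_pix`, `w_struct`, `w_tex` that are not the authors' values.

```python
import numpy as np

def mixed_sr_loss(sr, hr, w_pix=1.0, w_struct=0.5, w_tex=0.1):
    """Illustrative mixed loss: pixel (L1) + structure (1 - SSIM) + texture (gradient L1).

    `sr` and `hr` are float arrays in [0, 1] with identical shape (H, W).
    Weights are hypothetical placeholders, not taken from the paper.
    """
    # Pixel loss: mean absolute error between reconstruction and ground truth.
    l_pix = np.mean(np.abs(sr - hr))

    # Structure loss: 1 - SSIM computed globally from per-image means,
    # variances, and covariance (a simplification of windowed SSIM).
    c1, c2 = 0.01 ** 2, 0.03 ** 2
    mu_x, mu_y = sr.mean(), hr.mean()
    var_x, var_y = sr.var(), hr.var()
    cov = np.mean((sr - mu_x) * (hr - mu_y))
    ssim = ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / \
           ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
    l_struct = 1.0 - ssim

    # Texture loss: L1 distance between horizontal and vertical image gradients,
    # encouraging the reconstruction to match fine edge detail.
    gx = np.abs(np.diff(sr, axis=-1)) - np.abs(np.diff(hr, axis=-1))
    gy = np.abs(np.diff(sr, axis=-2)) - np.abs(np.diff(hr, axis=-2))
    l_tex = np.mean(np.abs(gx)) + np.mean(np.abs(gy))

    return w_pix * l_pix + w_struct * l_struct + w_tex * l_tex
```

With identical inputs all three terms vanish, so the loss is zero; any pixel-level discrepancy raises it, with the weights controlling how strongly structure and texture fidelity are emphasized relative to raw pixel error.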

Original language: English
Article number: 106004
Journal: Infrared Physics and Technology
Volume: 150
Publication status: Published - Nov 2025

Keywords

  • CNN
  • Deep learning
  • Image super-resolution
  • Infrared images
  • Transformer
