TY - JOUR
T1 - MIHNet: Multi-input hierarchical infrared image super-resolution method via collaborative CNN and Transformer
AU - Bai, Yang
AU - Gao, Meijing
AU - Sun, Huanyu
AU - Chen, Sibo
AU - Xie, Yunjia
AU - Yan, Yonghao
AU - Fan, Xiangrui
N1 - Publisher Copyright:
© 2025 Elsevier B.V.
PY - 2025/11
Y1 - 2025/11
N2 - Due to the low spatial resolution of infrared imaging systems, the acquired images typically suffer from low contrast, insufficient detail, and blurred edges. To address this issue, this paper proposes a multi-input hierarchical infrared image super-resolution (SR) reconstruction method based on a collaborative CNN and Transformer, termed MIHNet. The network adopts a multi-input encoder–decoder structure as its framework. First, a Local–Global Feature Perception Module (LGFPM) is designed, consisting of a Local Texture Attention Unit (LTAU) and a Global Transformer Attention Unit (GTAU), to simultaneously enhance the reconstruction of local detail and global structure in infrared images. Second, a Feature Refinement Module (FRM) is constructed to enhance the expressiveness of the encoded features. Then, a Multi-level Feature Fusion (MFF) module is designed to adaptively fuse features from the encoding stage. Finally, a mixed loss function composed of pixel loss, structure loss, and texture loss is constructed to guide network optimization. Experiments on three public datasets demonstrate that the proposed method outperforms thirteen comparison algorithms in both subjective and objective evaluations. Furthermore, the method is verified on the downstream task of infrared and visible image fusion, which further demonstrates that MIHNet achieves strong SR reconstruction performance.
AB - Due to the low spatial resolution of infrared imaging systems, the acquired images typically suffer from low contrast, insufficient detail, and blurred edges. To address this issue, this paper proposes a multi-input hierarchical infrared image super-resolution (SR) reconstruction method based on a collaborative CNN and Transformer, termed MIHNet. The network adopts a multi-input encoder–decoder structure as its framework. First, a Local–Global Feature Perception Module (LGFPM) is designed, consisting of a Local Texture Attention Unit (LTAU) and a Global Transformer Attention Unit (GTAU), to simultaneously enhance the reconstruction of local detail and global structure in infrared images. Second, a Feature Refinement Module (FRM) is constructed to enhance the expressiveness of the encoded features. Then, a Multi-level Feature Fusion (MFF) module is designed to adaptively fuse features from the encoding stage. Finally, a mixed loss function composed of pixel loss, structure loss, and texture loss is constructed to guide network optimization. Experiments on three public datasets demonstrate that the proposed method outperforms thirteen comparison algorithms in both subjective and objective evaluations. Furthermore, the method is verified on the downstream task of infrared and visible image fusion, which further demonstrates that MIHNet achieves strong SR reconstruction performance.
KW - CNN
KW - Deep learning
KW - Image super-resolution
KW - Infrared images
KW - Transformer
UR - http://www.scopus.com/pages/publications/105011490991
U2 - 10.1016/j.infrared.2025.106004
DO - 10.1016/j.infrared.2025.106004
M3 - Article
AN - SCOPUS:105011490991
SN - 1350-4495
VL - 150
JO - Infrared Physics & Technology
JF - Infrared Physics & Technology
M1 - 106004
ER -