TY - JOUR
T1 - MIHNet: Multi-input hierarchical infrared image super-resolution method via collaborative CNN and Transformer
AU - Bai, Yang
AU - Gao, Meijing
AU - Sun, Huanyu
AU - Chen, Sibo
AU - Xie, Yunjia
AU - Yan, Yonghao
AU - Fan, Xiangrui
N1 - Publisher Copyright:
© 2025 Elsevier B.V.
PY - 2025/11
Y1 - 2025/11
N2 - Due to the low spatial resolution of infrared imaging systems, the acquired images typically suffer from low contrast, insufficient detail, and blurred edges. To address this issue, this paper proposes a multi-input hierarchical infrared image super-resolution (SR) reconstruction method based on a collaborative CNN and Transformer, termed MIHNet. The network adopts a multi-input encoder–decoder structure as its framework. First, a Local–Global Feature Perception Module (LGFPM) is designed, consisting of a Local Texture Attention Unit (LTAU) and a Global Transformer Attention Unit (GTAU), to simultaneously enhance the reconstruction of local detail and global structure in infrared images. Second, a Feature Refinement Module (FRM) is constructed to enhance the expressiveness of the encoded features. Then, a Multi-level Feature Fusion (MFF) module is designed to adaptively fuse features from the encoding stage. Finally, a mixed loss function composed of pixel loss, structure loss, and texture loss is constructed to guide network optimization. Experiments on three public datasets demonstrate that the proposed method outperforms thirteen comparison algorithms in both subjective and objective evaluations. Furthermore, the method is verified on the downstream task of infrared and visible image fusion, which further demonstrates that MIHNet achieves strong SR reconstruction performance.
AB - Due to the low spatial resolution of infrared imaging systems, the acquired images typically suffer from low contrast, insufficient detail, and blurred edges. To address this issue, this paper proposes a multi-input hierarchical infrared image super-resolution (SR) reconstruction method based on a collaborative CNN and Transformer, termed MIHNet. The network adopts a multi-input encoder–decoder structure as its framework. First, a Local–Global Feature Perception Module (LGFPM) is designed, consisting of a Local Texture Attention Unit (LTAU) and a Global Transformer Attention Unit (GTAU), to simultaneously enhance the reconstruction of local detail and global structure in infrared images. Second, a Feature Refinement Module (FRM) is constructed to enhance the expressiveness of the encoded features. Then, a Multi-level Feature Fusion (MFF) module is designed to adaptively fuse features from the encoding stage. Finally, a mixed loss function composed of pixel loss, structure loss, and texture loss is constructed to guide network optimization. Experiments on three public datasets demonstrate that the proposed method outperforms thirteen comparison algorithms in both subjective and objective evaluations. Furthermore, the method is verified on the downstream task of infrared and visible image fusion, which further demonstrates that MIHNet achieves strong SR reconstruction performance.
KW - CNN
KW - Deep learning
KW - Image super-resolution
KW - Infrared images
KW - Transformer
UR - http://www.scopus.com/pages/publications/105011490991
U2 - 10.1016/j.infrared.2025.106004
DO - 10.1016/j.infrared.2025.106004
M3 - Article
AN - SCOPUS:105011490991
SN - 1350-4495
VL - 150
JO - Infrared Physics & Technology
JF - Infrared Physics & Technology
M1 - 106004
ER -