TY - JOUR
T1 - Retain and Enhance Modality-Specific Information for Multimodal Remote Sensing Image Land Use/Land Cover Classification
AU - Wei, Tianyu
AU - Chen, He
AU - Liu, Wenchao
AU - Chen, Liang
AU - Wang, Jue
N1 - Publisher Copyright:
© 1980-2012 IEEE.
PY - 2025
Y1 - 2025
N2 - Multimodal remote sensing (RS) image land use/land cover (LULC) classification using optical and synthetic aperture radar (SAR) images has attracted increasing attention in recent studies. Current methods primarily employ multimodal fusion operations to directly explore relationships between multimodal features and obtain fused features, which leads to the loss of beneficial modality-specific information. To solve this problem, this study introduces a multimodal feature decomposition and fusion (MDF) approach combined with a visual state space (VSS) block, termed the MDF-VSS block. The MDF-VSS block emphasizes beneficial modality-specific information and perceives shared land cover information through modality-difference and modality-share features, which are then adaptively integrated to obtain discriminative fused features. Based on the MDF-VSS block, an MDF decoder is designed to retain beneficial multi-scale modality-specific information. A multimodal specific information enhancement (MSIE) decoder is then designed to perform auxiliary classification tasks guided by the modality-difference features, further enhancing the modality-specific information that is most useful for classification. Combining the MDF and MSIE decoders, a novel retain-enhance fusion network (REF-Net) is proposed to retain and enhance modality-specific information that benefits classification, thereby improving the performance of multimodal RS image LULC classification. Extensive experiments on three public datasets demonstrate the effectiveness of the proposed REF-Net.
AB - Multimodal remote sensing (RS) image land use/land cover (LULC) classification using optical and synthetic aperture radar (SAR) images has attracted increasing attention in recent studies. Current methods primarily employ multimodal fusion operations to directly explore relationships between multimodal features and obtain fused features, which leads to the loss of beneficial modality-specific information. To solve this problem, this study introduces a multimodal feature decomposition and fusion (MDF) approach combined with a visual state space (VSS) block, termed the MDF-VSS block. The MDF-VSS block emphasizes beneficial modality-specific information and perceives shared land cover information through modality-difference and modality-share features, which are then adaptively integrated to obtain discriminative fused features. Based on the MDF-VSS block, an MDF decoder is designed to retain beneficial multi-scale modality-specific information. A multimodal specific information enhancement (MSIE) decoder is then designed to perform auxiliary classification tasks guided by the modality-difference features, further enhancing the modality-specific information that is most useful for classification. Combining the MDF and MSIE decoders, a novel retain-enhance fusion network (REF-Net) is proposed to retain and enhance modality-specific information that benefits classification, thereby improving the performance of multimodal RS image LULC classification. Extensive experiments on three public datasets demonstrate the effectiveness of the proposed REF-Net.
KW - land use/land cover classification
KW - modality-specific information
KW - multimodal
KW - optical
KW - remote sensing
KW - synthetic aperture radar (SAR)
UR - http://www.scopus.com/pages/publications/105011978863
U2 - 10.1109/TGRS.2025.3591926
DO - 10.1109/TGRS.2025.3591926
M3 - Article
AN - SCOPUS:105011978863
SN - 0196-2892
JO - IEEE Transactions on Geoscience and Remote Sensing
JF - IEEE Transactions on Geoscience and Remote Sensing
ER -
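
N1 - Illustrative sketch (not from the record): the abstract describes decomposing optical and SAR features into modality-difference and modality-share components and adaptively integrating them. Below is a minimal PyTorch sketch of that decomposition-and-fusion idea under stated assumptions. All names (MDFBlock, the channel-wise gate) are hypothetical; the paper's actual MDF-VSS block also incorporates a visual state space (VSS) block, which is omitted here for brevity.

import torch
import torch.nn as nn

class MDFBlock(nn.Module):
    """Sketch: decompose optical/SAR features into modality-difference and
    modality-share components, then adaptively fuse them."""
    def __init__(self, channels: int):
        super().__init__()
        # Gate predicting per-channel fusion weights from the concatenated
        # difference/share features (an assumption; the paper may realize
        # the adaptive integration differently).
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, f_opt: torch.Tensor, f_sar: torch.Tensor) -> torch.Tensor:
        # Modality-difference feature: emphasizes modality-specific cues.
        f_diff = f_opt - f_sar
        # Modality-share feature: captures land cover information common
        # to both modalities.
        f_share = 0.5 * (f_opt + f_sar)
        # Adaptive integration of the two components.
        w = self.gate(torch.cat([f_diff, f_share], dim=1))
        return w * f_diff + (1.0 - w) * f_share

# Usage: fuse 64-channel optical and SAR feature maps.
block = MDFBlock(channels=64)
f_opt = torch.randn(2, 64, 32, 32)
f_sar = torch.randn(2, 64, 32, 32)
fused = block(f_opt, f_sar)  # shape: (2, 64, 32, 32)

The channel-wise sigmoid gate is one plausible reading of the "adaptively integrated" step in the abstract; the MSIE decoder's modality-difference-guided auxiliary classification would add a separate classification head and loss on f_diff, which is not shown here.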