Retain and Enhance Modality-Specific Information for Multimodal Remote Sensing Image Land Use/Land Cover Classification

Tianyu Wei, He Chen, Wenchao Liu, Liang Chen, Jue Wang*

*Corresponding author of this work

Research output: Contribution to journal › Article › peer-review

Abstract

Multimodal remote sensing (RS) image land use/land cover (LULC) classification using optical and synthetic aperture radar (SAR) images has attracted growing attention in recent studies. Current methods primarily employ multimodal fusion operations that directly explore relationships between multimodal features to obtain fused features, which leads to the loss of beneficial modality-specific information. To solve this problem, this study introduces a multimodal feature decomposition and fusion (MDF) approach combined with a visual state space (VSS) block, termed the MDF-VSS block. The MDF-VSS block emphasizes beneficial modality-specific information and perceives shared land cover information through modality-difference and modality-share features, which are then adaptively integrated into discriminative fused features. Based on the MDF-VSS block, an MDF decoder is designed to retain beneficial multiscale modality-specific information. A multimodal specific information enhancement (MSIE) decoder is then designed to perform auxiliary classification tasks guided by modality-difference features, further enhancing the modality-specific information that is most discriminative for classification. Combining the MDF and MSIE decoders, a novel retain-enhance fusion network (REF-Net) is proposed to retain and enhance classification-relevant modality-specific information, thereby improving the performance of multimodal RS image LULC classification. Extensive experimental results on three public datasets demonstrate the effectiveness of the proposed REF-Net.
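The decompose-then-fuse idea from the abstract can be illustrated with a short sketch. The PyTorch code below is a minimal, hypothetical rendering of the MDF step only: it assumes a simple difference/sum decomposition of the optical and SAR feature maps and a learned sigmoid gate for the adaptive integration. All names (`MDFBlockSketch`, `diff_proj`, `share_proj`, `gate`) are illustrative, and the VSS block used in the paper's actual MDF-VSS block is omitted.

```python
import torch
import torch.nn as nn

class MDFBlockSketch(nn.Module):
    """Hypothetical sketch of multimodal feature decomposition and fusion.

    Assumptions (not from the paper): difference/sum decomposition,
    1x1-conv refinement, and a learned sigmoid gate; the VSS block of
    the real MDF-VSS block is omitted.
    """

    def __init__(self, channels: int):
        super().__init__()
        # 1x1 convs to refine the decomposed features (assumed design).
        self.diff_proj = nn.Conv2d(channels, channels, kernel_size=1)
        self.share_proj = nn.Conv2d(channels, channels, kernel_size=1)
        # Gate that adaptively weights difference vs. share features.
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, f_opt: torch.Tensor, f_sar: torch.Tensor) -> torch.Tensor:
        # Modality-difference features emphasize modality-specific cues.
        f_diff = self.diff_proj(f_opt - f_sar)
        # Modality-share features capture shared land cover information.
        f_share = self.share_proj(f_opt + f_sar)
        # Adaptive integration into a single fused feature map.
        g = self.gate(torch.cat([f_diff, f_share], dim=1))
        return g * f_diff + (1.0 - g) * f_share

if __name__ == "__main__":
    opt = torch.randn(2, 64, 32, 32)   # optical feature map
    sar = torch.randn(2, 64, 32, 32)   # SAR feature map
    fused = MDFBlockSketch(64)(opt, sar)
    print(fused.shape)  # torch.Size([2, 64, 32, 32])
```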

Original language: English
Journal: IEEE Transactions on Geoscience and Remote Sensing
DOI
Publication status: Accepted/In press - 2025
