TY - JOUR
T1 - Enhanced Swin Transformer and Edge Spatial Attention for Remote Sensing Image Semantic Segmentation
AU - Liu, Fuxiang
AU - Hu, Zhiqiang
AU - Li, Lei
AU - Li, Hanlu
AU - Liu, Xinxin
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
AB - Combining convolutional neural networks (CNNs) and Transformers is a key direction in remote sensing image semantic segmentation. However, because the two differ in spatial focus and feature extraction mechanisms, existing feature transfer and fusion strategies do not effectively integrate the strengths of both approaches. To address this, we propose a CNN-Transformer hybrid network for precise remote sensing image semantic segmentation. A novel Swin Transformer block optimizes feature extraction and enables the model to handle remote sensing images of arbitrary size. In addition, an Edge Spatial Attention module focuses attention on local edge structures, effectively integrating global features with local details and enabling efficient information flow between the Transformer encoder and the CNN decoder. Finally, a multi-scale convolutional decoder fully exploits both the global information from the Transformer and the local features from the CNN, producing accurate segmentation results. Our network achieves state-of-the-art performance on the Vaihingen and Potsdam datasets, reaching mIoU/F1 scores of 67.37%/79.82% and 72.39%/83.68%, respectively.
KW - Edge detection
KW - Swin Transformer
KW - remote sensing image
KW - semantic segmentation
UR - http://www.scopus.com/pages/publications/105001800247
U2 - 10.1109/LSP.2025.3550858
DO - 10.1109/LSP.2025.3550858
M3 - Article
AN - SCOPUS:105001800247
SN - 1070-9908
VL - 32
SP - 1296
EP - 1300
JO - IEEE Signal Processing Letters
JF - IEEE Signal Processing Letters
ER -