A Dual-Stream Model Fusing CNN and Transformer for Building Extraction from High-Resolution Remote Sensing Imagery

Liu Yuxin; Meng Yu; Deng Yupeng; Chen Jingbo; Liu Diyou



DOI: 10.11834/jrs.20243307

Received: 2023-07-15

Revised: 2023-11-21


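Liu Yuxin¹, Meng Yu², Deng Yupeng², Chen Jingbo², Liu Diyou²

1. Aerospace Information Research Institute, Chinese Academy of Sciences, and School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences; 2. Aerospace Information Research Institute, Chinese Academy of Sciences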

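Abstract (translated from the Chinese):

Convolutional Neural Networks (CNNs) and Transformers are widely used for building extraction from high-resolution remote sensing imagery. However, CNNs still struggle to model long-range spatial dependencies, which leaves holes inside extracted buildings, while Transformers are weak at capturing local spatial detail, which tends to blur building edges and miss small buildings. To address these problems, this paper proposes a new dual-stream network for building extraction from high-resolution remote sensing imagery, named ILGS-Net (Network for the Integration of Local and Global Features Stream). The model combines a CNN with a Transformer and uses multi-level local-global feature fusion modules to fuse the local detail features and global context features of buildings efficiently. In addition, an edge loss is introduced into the objective function to constrain training, which improves the localization accuracy of building boundaries. Experiments on three high-resolution building datasets show that the Intersection over Union (IoU) of the proposed method is higher than that of the best method compared in this paper on every dataset, by 1% on average.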
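The abstract does not state the form of the edge loss. As a minimal sketch of one way such a term could enter the objective, the pure-Python example below assumes a per-pixel binary cross-entropy segmentation loss plus the same loss averaged only over ground-truth boundary pixels. The names `boundary_mask`, `loss_with_edge_term`, and `edge_weight` are hypothetical and do not come from the paper.

```python
import math
from typing import List

Grid = List[List[float]]


def boundary_mask(mask: List[List[int]]) -> List[List[int]]:
    """Mark building pixels that touch a background pixel (4-neighbourhood).

    Neighbours outside the image are ignored, so a building pixel on the
    image border is an edge only if an in-image neighbour is background.
    """
    h, w = len(mask), len(mask[0])
    edge = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            if not mask[i][j]:
                continue
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < h and 0 <= nj < w and not mask[ni][nj]:
                    edge[i][j] = 1
                    break
    return edge


def bce(p: float, y: int, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one pixel, with clamping for stability."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def loss_with_edge_term(prob: Grid, truth: List[List[int]],
                        edge_weight: float = 1.0) -> float:
    """Assumed objective: mean BCE over all pixels + weighted mean BCE on edges."""
    edge = boundary_mask(truth)
    pixels = [(p, y, e)
              for prob_row, truth_row, edge_row in zip(prob, truth, edge)
              for p, y, e in zip(prob_row, truth_row, edge_row)]
    seg_loss = sum(bce(p, y) for p, y, _ in pixels) / len(pixels)
    edge_terms = [bce(p, y) for p, y, e in pixels if e]
    edge_loss = sum(edge_terms) / len(edge_terms) if edge_terms else 0.0
    return seg_loss + edge_weight * edge_loss


# A 3x3 building inside a 5x5 tile; only its centre pixel is not a boundary.
truth = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
uncertain = [[0.5] * 5 for _ in range(5)]  # a maximally unsure prediction
print(loss_with_edge_term(uncertain, truth))
```

In this sketch, errors on boundary pixels are counted twice (once in each term), so raising `edge_weight` pushes training toward sharper building outlines. The paper's actual formulation may differ.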

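Key Words (translated from the Chinese):

building extraction, deep learning, dual-stream network, edge loss, local and global feature fusion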

Integration of CNN and Transformer for High-Resolution Remote Sensing Image Building Extraction: A Dual-Stream Network

Abstract:

Convolutional Neural Networks (CNNs) and Transformers are widely used for building extraction from high-resolution remote sensing images. CNNs, however, still struggle to model long-range spatial dependencies, which often leaves internal holes in the extracted buildings. Transformers, conversely, are limited in capturing local spatial detail, which can blur building edges and cause small buildings to be missed. To address these problems, this paper presents a dual-stream network for building extraction from high-resolution remote sensing images, named ILGS-Net (Network for the Integration of Local and Global Features Stream). ILGS-Net combines a CNN with a Transformer and uses multi-level local-global feature fusion modules to fuse the fine-grained local detail features and the global context features of buildings. In addition, an edge loss function is added to the objective function to constrain model training and improve the localization accuracy of building boundaries.
Experiments on three high-resolution building datasets show that ILGS-Net outperforms the methods compared in this paper, with Intersection over Union (IoU) improved by 1% on average across the three datasets. These results indicate that integrating CNNs and Transformers through multi-level local-global feature fusion, together with an edge loss, mitigates the problems associated with long-range spatial dependencies and local detail, and improves the accuracy of building extraction. Two directions remain open: applying similar integrative approaches to other remote sensing tasks, and refining and extending the model architecture to handle the varying scales and complexities of urban landscapes.
Future work should also focus on translating these advances into tangible benefits for decision-makers and stakeholders in urban development and disaster response.
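Intersection over Union, the metric reported above, is the number of pixels labeled as building in both the prediction and the ground truth, divided by the number labeled as building in either. The short pure-Python sketch below illustrates the computation on a toy mask. The function name `building_iou` and the convention of returning 1.0 for two empty masks are assumptions made for illustration, not details from the paper.

```python
from typing import Sequence

Mask = Sequence[Sequence[int]]


def building_iou(pred: Mask, truth: Mask) -> float:
    """IoU = |pred AND truth| / |pred OR truth| for binary masks (1 = building)."""
    intersection = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    # Assumed convention: two empty masks agree perfectly.
    return intersection / union if union else 1.0


# The prediction misses one of the four ground-truth building pixels.
pred = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
truth = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 0],
]
print(building_iou(pred, truth))  # 3 shared pixels / 4 in the union = 0.75
```

Because a missed pixel shrinks the intersection while leaving the union unchanged, both internal holes and missed small buildings lower this score directly.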

Key Words:

building extraction, deep learning, dual-stream network, edge loss, local-global feature fusion
