Hybrid feature selection for cropland identification using GF-5 satellite image

Chen Zhulin; Jia Kun; Li Qiangzi; Xiao Chenchao; Wei Dandan; Zhao Xiang; Wei Xiangqin; Yao Yunjun; Li Juan

2022, Vol. 26, Issue (7): 1383-1394

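Cite this article:

Chen Zhulin, Jia Kun, Li Qiangzi, Xiao Chenchao, Wei Dandan, Zhao Xiang, Wei Xiangqin, Yao Yunjun, Li Juan. 2022. Hybrid feature selection for cropland identification using GF-5 satellite image. National Remote Sensing Bulletin (遥感学报), 26(7): 1383-1394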

DOI:

10.11834/jrs.20220458

Received:

2020-10-26

Revised:

Chen Zhulin^1,2, Jia Kun^1,2, Li Qiangzi^3, Xiao Chenchao^4, Wei Dandan^4, Zhao Xiang^1,2, Wei Xiangqin^3, Yao Yunjun^1,2, Li Juan^3

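1. State Key Laboratory of Remote Sensing Science, Faculty of Geographical Science, Beijing Normal University, Beijing 100875; 2. Beijing Engineering Research Center for Global Land Remote Sensing Products, Beijing Normal University, Beijing 100875; 3. Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100101; 4. Land Satellite Remote Sensing Application Center, Ministry of Natural Resources, Beijing 100048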

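Abstract:

Accurate cropland identification is the basis of crop yield estimation and food security assessment. Remote sensing data, an important data source for cropland identification, provide dynamic and rapid monitoring results. Hyperspectral data have great potential for cropland classification, but their redundant bands reduce classification efficiency and accuracy. This study therefore proposes a hybrid feature selection algorithm for cropland classification with hyperspectral data. First, the dimensionality is reduced step by step, with a fixed step length, according to the importance ranking of the variables or the degree to which they are constrained. Second, the turning point at which classification accuracy drops sharply is located, and the corresponding variables are taken as the feature subset. Finally, Sequential Backward Selection (SBS) is used to search for the optimal classification feature subset. Using GF-5 hyperspectral data, the study evaluates combinations of three dimensionality reduction methods (Random Forest (RF), Mutual Information (MI), and L1 regularization) and three classification algorithms (RF, Support Vector Machine (SVM), and K-Nearest Neighbor (KNN)) for cropland classification. The results show that the feature subset obtained with L1 regularization has low autocorrelation, and that the red-edge and near-infrared bands it contains effectively improve the separability of cropland, forest, and bare soil. Among the classification models, SVM shows very good robustness to noise in high-dimensional space and achieves higher classification accuracy than RF and KNN, whereas RF generalizes better than SVM and KNN in low-dimensional space. Compared with the feature subsets obtained in the first dimensionality reduction step, the optimal feature subsets found by SBS improve classification accuracy in every case. The L1-SVM-SBS classification model with a 23-dimensional input achieves the highest overall classification accuracy (94.64%) and cropland recall (95.83%). This study offers a new approach to feature optimization for hyperspectral data: it selects more representative feature bands, improves cropland classification accuracy, and provides a reference for hyperspectral remote sensing classification research.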

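Keywords:

cropland identification; GF-5; feature selection; hyperspectral remote sensing; L1 regularization; sequential backward selection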
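
The first two steps of the hybrid scheme (stepwise reduction along a feature ranking, then locating the turning point) might be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `stepwise_reduction`, `turning_point`, the `max_drop` threshold, and the toy accuracy function are hypothetical names and choices, and the 295-to-5 schedule with a step of 10 is the one given in the English abstract. In practice `evaluate` would train a classifier (RF, SVM, or KNN) on the listed bands and return its overall accuracy.

```python
from typing import Callable, Dict, List, Sequence


def stepwise_reduction(
    ranked: Sequence[int],
    evaluate: Callable[[List[int]], float],
    start: int = 295,
    stop: int = 5,
    step: int = 10,
) -> Dict[int, float]:
    """Score the top-k ranked bands for k = start, start - step, ..., stop."""
    curve = {}
    for k in range(start, stop - 1, -step):
        curve[k] = evaluate(list(ranked[:k]))
    return curve


def turning_point(curve: Dict[int, float], max_drop: float = 0.01) -> int:
    """Return the last dimension reached before the first single-step
    accuracy drop larger than max_drop (an assumed rule, not the paper's)."""
    sizes = sorted(curve, reverse=True)
    for larger, smaller in zip(sizes, sizes[1:]):
        if curve[larger] - curve[smaller] > max_drop:
            return larger
    return sizes[-1]  # no sharp drop observed


# Toy stand-in for a classifier's overall accuracy (hypothetical):
# accuracy saturates once the 25 top-ranked bands are included.
def toy_accuracy(bands: List[int]) -> float:
    return 0.5 + 0.45 * min(len(bands), 25) / 25


ranked_bands = list(range(295))  # band indices, most important first
curve = stepwise_reduction(ranked_bands, toy_accuracy)
k_star = turning_point(curve)
subset = ranked_bands[:k_star]  # feature subset handed to the SBS step
print(k_star, len(subset))
```

The threshold-based rule is only one way to define a "sharp" drop; a relative drop or a slope-based criterion would fit the same interface.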

Hybrid feature selection for cropland identification using GF-5 satellite image

Abstract:

Accurate identification of farmland area is the basis of crop yield estimation and an important indicator in food security assessment. Remote sensing data, an important data source for farmland identification, provide dynamic and rapid observations for classification. GF-5, the only hyperspectral satellite in the China High-resolution Earth Observation System, has great research and application potential in farmland identification. However, the curse of dimensionality caused by redundant bands in hyperspectral data seriously degrades the computation speed and classification accuracy of models. To address this problem, this study proposes a hybrid feature selection algorithm for farmland identification. First, on the basis of the feature importance provided by a feature selection algorithm, the feature dimension is gradually reduced from 295 to 5 with a step length of 10, and the overall accuracy of the classification result at each feature dimension is recorded. Second, the turning point (the number of dimensions below which overall accuracy drops sharply) is determined from the overall accuracy, and the corresponding variables are adopted as the feature subset. Lastly, the Sequential Backward Selection (SBS) method is used to search for the best subset. Three feature selection algorithms (Random Forest (RF), Mutual Information (MI), and L1 regularization (L1)) and three classification algorithms (RF, Support Vector Machine (SVM), and K-Nearest Neighbor (KNN)) are examined.

Results indicate that the autocorrelations of the three subsets differ significantly. Most of the bands selected by the MI method are contiguous and concentrated in the blue and shortwave infrared ranges, so the extremely high autocorrelation in this subset has a negative effect on classification accuracy. By contrast, the correlation between bands in the RF and L1 feature subsets is relatively weak, yet the two subsets still yield different classification accuracies. According to the variable distribution, the L1 feature subset contains many red-edge and near-infrared bands, which distinguish farmland, forest, and soil better than the blue and red bands selected by the RF algorithm. The classification algorithms also differ in capability. In high-dimensional space, the SVM algorithm is highly robust to noise and therefore achieves high accuracy; however, when the dimension decreases to a critical value, its accuracy drops sharply. RF is less robust than SVM in high-dimensional space but generalizes well in low-dimensional space. Compared with the subsets obtained after the first dimensionality reduction step, the optimal feature subsets found by the SBS search improve the classification accuracy of each model. The L1-SVM-SBS model with a 23-dimensional input achieves the highest overall classification accuracy (94.64%) and cropland recall (95.83%).

This study provides a new method of farmland identification using hyperspectral data. By selecting more representative and informative bands, the method not only improves farmland classification accuracy but can also serve as a reference for other classification problems involving hyperspectral remote sensing.

Key Words:

cropland identification; GF-5; feature selection; hyperspectral remote sensing; L1 regularization; sequential backward selection
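
The final SBS step named in the abstract can be illustrated with a short, dependency-free sketch. It assumes a greedy variant that removes one band per round (the one whose removal leaves the highest score) and remembers the best subset seen; the function name, the tie-breaking, and the toy scoring function are hypothetical, and the paper's 23-band result is not reproduced here. In a real run `evaluate` would return the overall accuracy of the chosen classifier on the given bands.

```python
from typing import Callable, List, Sequence, Tuple


def sequential_backward_selection(
    features: Sequence[int],
    evaluate: Callable[[List[int]], float],
    min_features: int = 1,
) -> Tuple[List[int], float]:
    """Greedy SBS: drop one feature per round, keep the best subset seen."""
    current = list(features)
    best_subset, best_score = list(current), evaluate(current)
    while len(current) > min_features:
        # Score every subset obtained by removing exactly one feature.
        candidates = [
            (evaluate([f for f in current if f != drop]), drop)
            for drop in current
        ]
        score, drop = max(candidates)  # ties go to the larger feature index
        current = [f for f in current if f != drop]
        # ">=" prefers the smaller subset when scores are equal (assumption).
        if score >= best_score:
            best_subset, best_score = list(current), score
    return best_subset, best_score


# Toy score (hypothetical): three informative bands, and each
# uninformative band costs a small accuracy penalty.
USEFUL = {2, 5, 7}


def toy_score(bands: List[int]) -> float:
    hits = len(USEFUL.intersection(bands))
    noise = len(bands) - hits
    return hits / 3 - 0.01 * noise


best_subset, best_score = sequential_backward_selection(range(10), toy_score)
print(best_subset, best_score)
```

Because SBS evaluates one candidate subset per remaining feature in every round, running it only on the reduced subset from the earlier steps, rather than on all 295 bands, keeps the search affordable.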
