Abstract
Hyperspectral remote sensing image classification has been widely employed for numerous applications, such as environmental monitoring, agriculture, and mineralogy. During such classification, the number of training samples in each class often varies significantly. This imbalance in the dataset is often not identified because most classifiers are designed under a balanced dataset assumption, which can distort the minority classes or even treat them as noise. This may lead to biased and inaccurate classification results. This issue can be alleviated by applying preprocessing techniques that enable a uniform distribution of the imbalanced data for further classification. However, it is difficult to add new natural features to a training model by artificial combination of samples by using existing preprocessing techniques. For minority classes with sparse samples, the addition of sufficient natural features can effectively alleviate bias and improve the generalization. For such an imbalanced problem, semi-supervised learning is a creative solution that utilizes the rich natural features of unlabeled data, which can be collected at a low cost in the remote sensing classification. In this paper, we propose a novel semi-supervised learning-based preprocessing solution called NearPseudo. In NearPseudo, pseudo-labels are created by the initialization classifier and added to minority classes with the corresponding unlabeled samples. Simultaneously, to increase reliability and reduce the misclassification cost of pseudo-labels, we created a feedback mechanism based on a consistency check to effectively select the unlabeled data and its pseudo-labels. Experiments were conducted on a state-of-the-art representative hyperspectral dataset to verify the proposed method. The experimental results demonstrate that NearPseudo can achieve better classification accuracy than other common processing methods. Furthermore, it can be flexibly applied to most typical classifiers to improve their classification accuracy. With the intervention of NearPseudo, the accuracy of random forest, k-nearest neighbors, logistic regression, and classification and regression tree increased by 1.8%, 4.0%, 6.4%, and 3.7%, respectively. This study addresses research a gap to solve the imbalanced data-based limitations in hyperspectral image classification.
Dokumententyp: | Zeitschriftenartikel |
---|---|
Fakultät: | Geowissenschaften > Department für Geographie |
Themengebiete: | 500 Naturwissenschaften und Mathematik > 550 Geowissenschaften, Geologie |
Sprache: | Englisch |
Dokumenten ID: | 110623 |
Datum der Veröffentlichung auf Open Access LMU: | 02. Apr. 2024, 07:19 |
Letzte Änderungen: | 02. Apr. 2024, 07:19 |