A Wasserstein Distance-Based Cost-Sensitive Framework for Imbalanced Data Classification
Date
2023-09
Authors
Feng, R.
Ji, H.
Zhu, Z.
Wang, L.
Publisher
Společnost pro radioelektronické inženýrství
Abstract
Class imbalance is a prevalent problem in many real-world applications, and an imbalanced data distribution can dramatically skew classifier performance. In general, the higher the imbalance ratio of a dataset, the harder it is to classify. Nevertheless, standard classifiers can still achieve good results on some highly imbalanced datasets. This suggests that class imbalance is only a superficial characteristic of the data, and that the underlying structural information is often the key factor affecting classification performance. As implicit prior knowledge, structural information has been shown to be crucial for designing a good classifier. This paper proposes a Wasserstein-based cost-sensitive support vector machine (CS-WSVM) for class imbalance learning that incorporates prior structural information and a cost-sensitive strategy. The Wasserstein distance is used to model the distributions of the majority and minority samples and so capture the structural information, which is then employed to weight the majority and minority samples. Comprehensive experiments on synthetic and real-world datasets, in particular a radar emitter signal dataset, demonstrate that CS-WSVM achieves outstanding performance in imbalanced scenarios.
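The abstract does not give the CS-WSVM weighting formula, so the following is only an illustrative sketch of the general idea, not the authors' method. It computes the 1-Wasserstein distance between two one-dimensional empirical samples and uses it to damp a minority-class cost when the classes are well separated. The function names `wasserstein_1d` and `class_costs`, the restriction to a single feature, and the damping rule itself are all assumptions made for this example.

```python
from bisect import bisect_right
from statistics import pstdev


def wasserstein_1d(xs, ys):
    """1-Wasserstein distance between two 1-D empirical samples.

    Uses the identity W1 = integral of |F_x(t) - F_y(t)| dt, where F_x and
    F_y are the empirical CDFs (piecewise constant between sample points).
    """
    xs, ys = sorted(xs), sorted(ys)
    points = sorted(set(xs) | set(ys))
    total = 0.0
    for left, right in zip(points, points[1:]):
        # Both empirical CDFs are constant on [left, right).
        cdf_x = bisect_right(xs, left) / len(xs)
        cdf_y = bisect_right(ys, left) / len(ys)
        total += abs(cdf_x - cdf_y) * (right - left)
    return total


def class_costs(majority, minority):
    """Hypothetical cost rule (an assumption, not taken from the paper).

    The majority cost is fixed at 1. The minority cost starts at the
    imbalance ratio and shrinks toward 1 as the Wasserstein distance
    between the classes grows relative to the pooled spread.
    """
    imbalance_ratio = len(majority) / len(minority)
    distance = wasserstein_1d(majority, minority)
    spread = pstdev(list(majority) + list(minority))
    # overlap is 1 for identical distributions and tends to 0 when far apart.
    overlap = 1.0 if spread == 0 else 1.0 / (1.0 + distance / spread)
    minority_cost = 1.0 + (imbalance_ratio - 1.0) * overlap
    return {"majority": 1.0, "minority": minority_cost}
```

Costs of this kind could then be handed to any cost-sensitive classifier, for example through the `class_weight` argument of scikit-learn's `SVC`. The rule reflects the abstract's observation that a high imbalance ratio alone does not make a dataset hard: two well-separated classes receive a minority cost close to 1 even when the ratio is large.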
Citation
Radioengineering. 2023, vol. 32, no. 3, pp. 451-466. ISSN 1210-2512
https://www.radioeng.cz/fulltexts/2023/23_03_0451_0466.pdf
Document type
Peer-reviewed
Document version
Published version
Language of document
en