42. A cluster impurity‑based hybrid resampling for imbalanced classification problems
Invited abstract in session MD-34: Advancements of OR-analytics in statistics, machine learning and data science 1, stream Advancements of OR-analytics in statistics, machine learning and data science.
Monday, 14:30-16:00
Room: Michael Sadler LG10
Authors (first author is the speaker)
| 1. | You-Jin Park | National Taipei University of Technology |
Abstract
When a dataset is class imbalanced, a classifier generally becomes biased towards the majority class, so minority class instances are often misclassified as majority class instances. Class overlap in imbalanced data is also known to be one of the key factors that make the learning task difficult and degrade learning performance. In this research, we therefore develop a cluster impurity-based hybrid resampling technique that improves classification performance on class imbalanced data by considering both the intra-cluster class imbalance and the inter-cluster overlap problems. In particular, various clustering methods are employed to identify clusters of instances, and the cluster impurity of each instance is computed as a measure of the degree of cluster overlap. Synthetic instances are then created and eliminated recursively based on the cluster impurity. To validate the effectiveness of the developed technique, comprehensive experiments were conducted on forty imbalanced datasets, and non-parametric hypothesis tests were carried out to establish the statistical significance of the differences in classification performance between the developed technique and other resampling techniques.
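The abstract does not define cluster impurity, so the following is only a minimal sketch of one plausible reading. It assumes that cluster assignments have already been produced by some clustering method, and that impurity is measured as the Gini impurity of the class labels inside each cluster; both the Gini choice and the function name `cluster_impurity` are illustrative assumptions, not the author's definition.

```python
from collections import Counter, defaultdict


def cluster_impurity(cluster_ids, class_labels):
    """Return, for each instance, the Gini impurity of its cluster.

    Assumption: impurity = 1 - sum(p_k^2) over the class proportions p_k
    inside the cluster. The abstract does not specify the actual measure.
    """
    # Group the class labels by cluster.
    members = defaultdict(list)
    for cid, label in zip(cluster_ids, class_labels):
        members[cid].append(label)

    # Compute one impurity value per cluster.
    gini = {}
    for cid, labels in members.items():
        n = len(labels)
        counts = Counter(labels)
        gini[cid] = 1.0 - sum((c / n) ** 2 for c in counts.values())

    # Every instance inherits the impurity of its own cluster.
    return [gini[cid] for cid in cluster_ids]


# Toy data: cluster 0 holds only majority instances, cluster 1 is mixed.
cluster_ids = [0, 0, 0, 0, 1, 1, 1, 1]
class_labels = ["maj", "maj", "maj", "maj", "maj", "maj", "min", "min"]
impurity = cluster_impurity(cluster_ids, class_labels)
```

Under this reading, instances in a pure cluster get an impurity of zero, while instances in a mixed cluster get a higher value. A resampling step could then use that value to decide where synthetic instances are generated or removed.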
Keywords
- Analytics and Data Science
- Artificial Intelligence
- Machine Learning
Status: accepted