Evaluation of oversampling data balancing techniques in the context of ordinal classification



Data imbalance is characterized by a discrepancy in the number of examples per class of a dataset. This phenomenon is known to deteriorate the performance of classifiers, since they are less able to learn the characteristics of the under-represented classes. For most imbalanced datasets, the application of sampling techniques improves the classifier's performance. For small datasets, oversampling has been shown to be the most appropriate strategy, since it augments the original set of samples. Although several oversampling strategies have been proposed and tested over the years, the work has mostly focused on binary or multi-class tasks. Motivated by medical applications, where there is often an order associated with the classes (increasing likelihood of malignancy, for instance), the present work tests existing oversampling techniques in ordinal contexts. Moreover, four new oversampling techniques are proposed. Experiments were conducted on both private and public datasets. The private datasets concern the assessment of response to treatment of oncologic diseases. The 15 public datasets were chosen because they are widely used in the literature. Results show that data balancing techniques improve classification results on ordinal imbalanced datasets, even when these techniques are not specifically designed for ordinal problems. With our pipeline, results better than or equal to published ones were obtained for 10 of the 15 public datasets, with improvements of up to a 0.43 decrease in MMAE.
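The MMAE (maximum mean absolute error) mentioned above is a common imbalance-aware metric for ordinal classification: the per-class mean absolute error of the predicted ordinal labels is computed for each true class, and the worst (largest) of these is reported, so a rare class that is consistently mispredicted cannot be hidden by good performance on majority classes. A minimal sketch, assuming integer-encoded ordinal labels (the function name `mmae` is ours, not from the paper):

```python
import numpy as np

def mmae(y_true, y_pred):
    """Maximum of the per-class mean absolute errors over all true classes.

    Lower is better; a single poorly handled (often rare) class
    dominates the score, unlike the overall MAE or accuracy.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return max(
        np.abs(y_pred[y_true == k] - k).mean()  # MAE restricted to true class k
        for k in np.unique(y_true)
    )

# Example: the rare class 2 is always mispredicted as 0, so MMAE is 2.0
# even though 6 of 7 samples are classified correctly.
y_true = [0, 0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 0, 0, 1, 1, 0]
print(mmae(y_true, y_pred))  # 2.0
```

Because the per-class errors are aggregated with `max` rather than a sample-weighted mean, oversampling the minority classes directly targets the term that dominates this metric.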



Related Project

Early-stage cancer treatment, driven by context of molecular imaging (ESTIMA)


International Joint Conference on Neural Networks (IJCNN) 2018
