Loss factors for learning Boosting ensembles from imbalanced data

Fumera, Giorgio
2016-01-01

Abstract

Class imbalance is an issue in many real-world applications because classification algorithms tend to misclassify instances of the class of interest when its training samples are outnumbered by those of the other classes. Several variants of the AdaBoost ensemble method based on re-sampling have been proposed in the literature to learn from imbalanced data. However, their loss factor is based on standard accuracy, which still biases performance towards the majority class. Cost-sensitive Boosting algorithms mitigate this problem, but it can be avoided at the outset by modifying the loss factor calculation. In this paper, two loss factors, based on the F-measure and the G-mean, are proposed that are better suited to imbalanced data during the Boosting learning process. The performance of standard AdaBoost and of three specialized versions for class imbalance (SMOTEBoost, RUSBoost, and RB-Boost) is empirically evaluated with the proposed loss factors, both on synthetic data and on a real-world face re-identification task. Experimental results show a significant performance improvement for AdaBoost and RUSBoost with the proposed loss factors.
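To illustrate the core idea of the abstract, the sketch below contrasts the standard accuracy-based Boosting loss with loss factors derived from the F-measure and the G-mean. This is a minimal, hypothetical illustration, not the paper's exact formulation: the function names and the toy data are assumptions, and the losses are simply one minus the respective metric computed on the weighted predictions of a weak learner.

```python
import math

def confusion(y_true, y_pred, w):
    # Weighted confusion counts, treating class 1 as the minority class.
    tp = fp = fn = tn = 0.0
    for t, p, wi in zip(y_true, y_pred, w):
        if t == 1 and p == 1:
            tp += wi
        elif t == 0 and p == 1:
            fp += wi
        elif t == 1 and p == 0:
            fn += wi
        else:
            tn += wi
    return tp, fp, fn, tn

def loss_accuracy(y_true, y_pred, w):
    # Standard AdaBoost-style loss: the weighted error rate (1 - accuracy).
    tp, fp, fn, tn = confusion(y_true, y_pred, w)
    return (fp + fn) / (tp + fp + fn + tn)

def loss_fmeasure(y_true, y_pred, w):
    # Hypothetical F-measure-based loss: 1 - F1 on the minority class.
    tp, fp, fn, _ = confusion(y_true, y_pred, w)
    f1 = 2 * tp / (2 * tp + fp + fn) if tp > 0 else 0.0
    return 1.0 - f1

def loss_gmean(y_true, y_pred, w):
    # Hypothetical G-mean-based loss: 1 - sqrt(TPR * TNR).
    tp, fp, fn, tn = confusion(y_true, y_pred, w)
    tpr = tp / (tp + fn) if tp + fn > 0 else 0.0
    tnr = tn / (tn + fp) if tn + fp > 0 else 0.0
    return 1.0 - math.sqrt(tpr * tnr)

# Imbalanced toy sample: 2 positives, 8 negatives, uniform weights.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # weak learner predicts all-negative
w = [0.1] * 10

print(loss_accuracy(y_true, y_pred, w))  # ~0.2: the all-negative learner looks acceptable
print(loss_fmeasure(y_true, y_pred, w))  # 1.0: maximally penalized
print(loss_gmean(y_true, y_pred, w))     # 1.0: maximally penalized
```

The example shows the bias the abstract describes: a degenerate learner that ignores the minority class achieves a low accuracy-based loss on imbalanced data, while the F-measure and G-mean based losses penalize it maximally, so a Boosting round using them would not reward such a learner.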
ISBN: 9781509048472
Files in This Item:

File: ICPR_cr4.pdf
Description: Main article
Type: post-print version
Size: 435.44 kB
Format: Adobe PDF
Access: archive administrators only (request a copy)

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
