Learning from high-dimensional and class-imbalanced datasets using random forests

Pes B.
2021-01-01

Abstract

Class imbalance and high dimensionality are two major issues in many real-life applications, e.g., in bioinformatics, text mining and image classification. Although both issues have been extensively studied in the machine learning community, they have mostly been treated separately, and little research has so far been conducted on which approaches are best suited to datasets that are both class-imbalanced and high-dimensional (i.e., with a large number of features). This work contributes to this challenging research area by studying the effectiveness of hybrid learning strategies that integrate feature selection techniques, which reduce the data dimensionality, with methods that counter the adverse effects of class imbalance (in particular, data balancing and cost-sensitive methods are considered). Extensive experiments have been carried out across datasets from different domains, leveraging a well-known classifier, the Random Forest, which has proven effective in high-dimensional spaces and has also been successfully applied to imbalanced tasks. Our results give evidence of the benefits of such a hybrid approach compared with using feature selection or imbalance learning methods alone.
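The hybrid idea in the abstract, reducing dimensionality with feature selection and then countering class imbalance by data balancing, can be illustrated with a minimal standard-library sketch. Everything in it is an assumption made for illustration and not the paper's actual procedure: the toy data generator, the mean-difference scoring function, the use of random undersampling as the balancing method, the order of the two steps, and all function names. The Random Forest training step is left out; the reduced, balanced output is what such a classifier would be trained on.

```python
import random
from statistics import mean, pstdev


def make_toy_data(n_major=90, n_minor=10, n_features=50, seed=0):
    """Build a hypothetical imbalanced, high-dimensional binary dataset."""
    rng = random.Random(seed)
    X, y = [], []
    for label, n in ((0, n_major), (1, n_minor)):
        for _ in range(n):
            row = [rng.gauss(0, 1) for _ in range(n_features)]
            row[0] += 3.0 * label  # feature 0 is the only informative one
            X.append(row)
            y.append(label)
    return X, y


def rank_features(X, y):
    """Rank features by |mean(class 1) - mean(class 0)| scaled by overall spread.

    This simple univariate filter is an illustrative stand-in for the
    feature selection techniques the abstract refers to.
    """
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        col = [row[j] for row in X]
        pos = [v for v, label in zip(col, y) if label == 1]
        neg = [v for v, label in zip(col, y) if label == 0]
        spread = pstdev(col) or 1.0  # avoid division by zero on constant columns
        scores.append(abs(mean(pos) - mean(neg)) / spread)
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


def select_features(X, keep):
    """Keep only the columns whose indices are listed in `keep`."""
    return [[row[j] for j in keep] for row in X]


def undersample(X, y, seed=0):
    """Randomly drop majority-class rows until both classes have equal size."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    kept = minority + rng.sample(majority, len(minority))
    rng.shuffle(kept)
    return [X[i] for i in kept], [y[i] for i in kept]


# Hybrid pipeline: feature selection first, then data balancing.
X, y = make_toy_data()
top = rank_features(X, y)[:5]
X_bal, y_bal = undersample(select_features(X, top), y)
```

A cost-sensitive variant would skip `undersample` and instead give minority-class errors a higher weight inside the classifier; which combination works best is an empirical question of the kind the paper investigates.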
Year: 2021
Language: English
Volume: 12
Issue: 8
Article number: 286
First page: 1
Last page: 16
Number of pages: 16
https://www.mdpi.com/2078-2489/12/8/286
Anonymous expert reviewers
international
scientific
class imbalance; feature selection; high-dimensional data; random forest
no
Pes, B.
1.1 Journal article
info:eu-repo/semantics/article
1 Journal contribution::1.1 Journal article
262
1
open
Files in This Item:
File Size Format  
information-12-00286.pdf

open access

Description: Main article
Type: publisher's version
Size: 1.04 MB
Format: Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
