Learning from high-dimensional biomedical datasets: The issue of class imbalance

Pes B.
First
2020-01-01

Abstract

As witnessed by a vast corpus of literature, dimensionality reduction is a fundamental step for biomedical data analysis. Indeed, in this domain, there is often the need for coping with a huge number of data attributes (or features). By removing irrelevant or redundant attributes, feature selection techniques can significantly reduce the complexity of the original problem, with important benefits in terms of domain understanding and knowledge discovery. When learning from biomedical data, however, the dimensionality issue is often addressed without a joint consideration of other critical aspects that may compromise the performance of the induced models. The adverse implications of an imbalanced class distribution, for example, are often neglected in this domain. The aim of this work is to investigate the effectiveness of hybrid learning strategies that incorporate both methods for dimensionality reduction as well as methods for alleviating the issue of class imbalance. Specifically, we combine different feature selection techniques, both univariate and multivariate, with sampling-based class balancing methods and cost-sensitive classification. The performance of the resulting learning schemes is experimentally evaluated on six high-dimensional genomic benchmarks, using different classification algorithms, with interesting insight about the best strategies to use based on the characteristics of the data at hand.
2020
Bioinformatics; Class imbalance; Cost-sensitive classification; Feature selection; High-dimensional data analysis; Random forest; Random under-sampling; Smote over-sampling
Files in This Item:
File Size Format  
ACCESS_published_final.pdf

open access

Description: Articolo principale
Type: versione editoriale
Size 6.38 MB
Format Adobe PDF
6.38 MB Adobe PDF View/Open

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.

Questionnaire and social

Share on:
Impostazioni cookie