Optimization of data resampling through GA for the classification of imbalanced datasets

Galli, Filippo;
2019

Abstract

Classification of imbalanced datasets is a critical problem in numerous contexts. In these applications, standard methods fail to detect rare patterns satisfactorily because multiple factors bias the classifiers toward the frequent class. This paper gives an overview of a novel family of methods that resample an imbalanced dataset in order to maximize the performance of arbitrary data-driven classifiers. The presented approaches exploit genetic algorithms (GA) to optimize the data selection process according to a set of criteria that assess the suitability of each candidate sample. A comparison of the presented techniques on a set of industrial and literature datasets demonstrates the validity of this family of approaches, which not only improves the performance of a standard classifier but also determines the optimal resampling rate automatically. Future work on the proposed approach will include the development of new criteria for assessing sample suitability.
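To make the idea of GA-driven data selection concrete, the following is a minimal sketch and not the method of the paper. The abstract does not specify the encoding, the suitability criteria, or the classifier, so everything here is an assumption chosen for illustration: a binary mask over the frequent-class samples as chromosome, balanced accuracy of a small k-nearest-neighbour classifier on a validation set as fitness, and hypothetical names such as `ga_resample` and `make_data`. The resampling rate is not fixed in advance; it emerges as the fraction of frequent-class samples that the best mask keeps.

```python
import random

def knn_predict(train, x, k=3):
    """Predict the label of x by majority vote among its k nearest training points."""
    nearest = sorted(train, key=lambda s: (s[0][0] - x[0]) ** 2 + (s[0][1] - x[1]) ** 2)[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > len(nearest) else 0

def balanced_accuracy(train, valid):
    """Mean of per-class recall, so the rare class weighs as much as the frequent one."""
    hits, totals = {0: 0, 1: 0}, {0: 0, 1: 0}
    for x, y in valid:
        totals[y] += 1
        hits[y] += int(knn_predict(train, x) == y)
    return sum(hits[c] / totals[c] for c in (0, 1)) / 2

def fitness(mask, majority, minority, valid):
    """Score a candidate selection: masked frequent-class samples plus all rare ones."""
    kept = [s for s, bit in zip(majority, mask) if bit]
    if not kept:
        return 0.0  # an empty frequent class is treated as the worst candidate
    return balanced_accuracy(kept + minority, valid)

def ga_resample(majority, minority, valid, pop_size=20, generations=15, p_mut=0.02, seed=0):
    """Assumed GA: truncation selection, one-point crossover, bit-flip mutation, elitism."""
    rng = random.Random(seed)
    n = len(majority)
    # Include the "keep everything" mask so the result is never worse than no resampling.
    pop = [[1] * n] + [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size - 1)]

    def score(mask):
        return fitness(mask, majority, minority, valid)

    for _ in range(generations):
        ranked = sorted(pop, key=score, reverse=True)
        parents = ranked[: pop_size // 2]
        children = [ranked[0]]  # elitism: the best mask survives unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)
            child = a[:cut] + b[cut:]
            children.append([1 - g if rng.random() < p_mut else g for g in child])
        pop = children
    best = max(pop, key=score)
    return best, score(best)

def make_data(rng, n_major, n_minor):
    """Synthetic 2-D data: two overlapping Gaussian clouds (illustrative only)."""
    major = [((rng.gauss(0, 1), rng.gauss(0, 1)), 0) for _ in range(n_major)]
    minor = [((rng.gauss(1.5, 1), rng.gauss(1.5, 1)), 1) for _ in range(n_minor)]
    return major, minor

rng = random.Random(42)
train_major, train_minor = make_data(rng, 60, 8)
valid_major, valid_minor = make_data(rng, 60, 8)
valid = valid_major + valid_minor

baseline = fitness([1] * len(train_major), train_major, train_minor, valid)
mask, best = ga_resample(train_major, train_minor, valid)
rate = sum(mask) / len(mask)  # fraction of frequent-class samples retained
print(f"baseline={baseline:.3f} optimized={best:.3f} kept fraction={rate:.2f}")
```

Because the initial population contains the unmodified dataset and the best individual is always carried forward, the optimized score in this sketch cannot fall below the baseline. A real study would also score the final selection on a held-out test set, since the fitness here is tuned on the validation data.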
Academic discipline: ING-INF/05 - Information Processing Systems
Classification; Data resampling; Genetic algorithm; Imbalanced datasets
Files in this item:
Optimization_of_data_resamplin.pdf
Access: open access
Type: Published version
License: Creative Commons
Size: 1.14 MB
Format: Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/11384/142467