Data preprocessing in semi-supervised SVM classification

Gorgone, Enrico
2011-01-01

Abstract

The literature on semi-supervised binary classification has shown that useful information can be gathered not only from samples whose class membership is known in advance, but also from unlabelled ones. In semi-supervised support vector machine (SVM) models, both labelled and unlabelled samples contribute to the definition of an optimization model for finding a good-quality separating hyperplane. The optimization approaches devised in this context are basically of two types: a mixed integer linear programming problem, and a continuous optimization problem whose objective function is nonsmooth and nonconvex. Both problems become hard to solve as the number of unlabelled points increases. In this article, we present a data preprocessing technique that aims to reduce the number of unlabelled points entering the computational model without excessively degrading the classification performance of the overall process. The approach is based on the concept of separating sets and can be implemented with reasonable computational effort. The results of numerical experiments on several benchmark datasets are also reported. © 2011 Taylor & Francis.
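
For illustration only, the continuous nonsmooth and nonconvex formulation mentioned in the abstract can be sketched with the objective commonly used for semi-supervised SVMs: a hinge loss on the labelled points plus a term max(0, 1 - |w·x + b|) on the unlabelled points. This is an assumption about the general model class, not necessarily the exact model of the article, and the names s3vm_objective, c_lab and c_unl are hypothetical. The separating-sets preprocessing itself is not sketched, since the abstract does not define it.

```python
def s3vm_objective(w, b, labelled, unlabelled, c_lab=1.0, c_unl=1.0):
    """Evaluate a standard semi-supervised SVM objective (illustrative sketch).

    w          : list of floats, normal vector of the hyperplane
    b          : float, bias term
    labelled   : list of (x, y) pairs with y in {-1, +1}
    unlabelled : list of points x without labels
    c_lab, c_unl : hypothetical trade-off weights (assumed, not from the article)
    """
    def score(x):
        # Signed value w.x + b, proportional to the distance from the hyperplane
        return sum(wi * xi for wi, xi in zip(w, x)) + b

    # Regularisation term 0.5 * ||w||^2 (margin maximisation)
    reg = 0.5 * sum(wi * wi for wi in w)

    # Hinge loss on labelled points: convex but nonsmooth
    lab_loss = sum(max(0.0, 1.0 - y * score(x)) for x, y in labelled)

    # Penalty on unlabelled points lying inside the margin: nonsmooth and nonconvex
    unl_loss = sum(max(0.0, 1.0 - abs(score(x))) for x in unlabelled)

    return reg + c_lab * lab_loss + c_unl * unl_loss


w, b = [1.0, 0.0], 0.0
labelled = [([2.0, 0.0], 1), ([-2.0, 0.0], -1)]
inside_margin = s3vm_objective(w, b, labelled, [[0.5, 0.0]])
outside_margin = s3vm_objective(w, b, labelled, [[3.0, 0.0]])
```

In this sketch, an unlabelled point with |w·x + b| >= 1 contributes nothing to the objective at the given (w, b), whereas a point inside the margin adds a positive penalty. Every unlabelled point adds one such term, so the cost of evaluating and optimizing the objective grows with their number.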
2011
English
60
1-2
143
151
9
Anonymous referees
international
scientific
Data classification; Nonsmooth optimization; Semi-supervised learning; SVM; Control and Optimization; Management Science and Operations Research; Applied Mathematics
Astorino, A.; Gorgone, Enrico; Gaudioso, M.; Pallaschke, D.
1.1 Journal article
info:eu-repo/semantics/article
1 Journal contribution::1.1 Journal article
262
4
none
