On Multilabel Classification Methods of Incompletely Labeled Biomedical Text Data

Computational and Mathematical Methods in Medicine, Volume 2014 (2014), Article ID 781807, 11 pages

Research Article

Center for Pediatric Hematology, Oncology, and Immunology, Moscow 117997, Russia

Moscow Institute of Physics and Technology, Moscow 117303, Russia

The Biogerontology Research Foundation, Reading W1J 5NE, UK

Chemistry Department, Moscow State University, Moscow 119991, Russia

Received 9 September 2013; Revised 8 December 2013; Accepted 12 December 2013; Published 23 January 2014

Academic Editor: Dejing Dou

Copyright © 2014 Anton Kolesov et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Multilabel classification is often hindered by incompletely labeled training datasets: for some items in such a dataset, or even for all of them, some labels may be omitted. In this case, we cannot know whether any given item is labeled fully and correctly. A classifier trained directly on an incompletely labeled dataset performs poorly. To overcome this problem, we added an extra step, training set modification, before training a classifier. In this paper, we try two algorithms for training set modification: weighted k-nearest neighbor (WkNN) and soft supervised learning (SoftSL). Both approaches are based on similarity measurements between data vectors. We performed the experiments on the AgingPortfolio text dataset and then rechecked the results on the Yeast nontext genetic data. We tried SVM and RF classifiers on the original datasets and then on the modified ones. For each dataset, our experiments demonstrated that both classification algorithms performed considerably better when preceded by the training set modification step.
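The training set modification step can be illustrated with a minimal sketch of similarity-weighted nearest-neighbor label completion. This is not the WkNN procedure from the paper, whose details are not given in the abstract. It assumes cosine similarity, a vote averaged over the k most similar items, and a fixed acceptance threshold. The names `cosine`, `wknn_complete_labels`, `k`, and `threshold` are hypothetical and chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def wknn_complete_labels(X, Y, k=3, threshold=0.5):
    """Return a copy of Y with labels added by a similarity-weighted neighbor vote.

    X: list of feature vectors; Y: list of 0/1 label vectors, one per item.
    Existing labels are never removed, since the assumption is that labels
    are missing rather than wrong. k and threshold are illustrative values.
    """
    n = len(X)
    Y_new = [row[:] for row in Y]  # leave the caller's labels untouched
    for i in range(n):
        # Rank every other item by similarity to item i and keep the top k.
        sims = sorted(
            ((cosine(X[i], X[j]), j) for j in range(n) if j != i),
            reverse=True,
        )[:k]
        if not sims:
            continue  # a single-item dataset has no neighbors
        for label in range(len(Y[i])):
            # Mean similarity-weighted vote: dissimilar neighbors contribute little.
            score = sum(s * Y[j][label] for s, j in sims) / len(sims)
            if score >= threshold:
                Y_new[i][label] = 1
    return Y_new


if __name__ == "__main__":
    # Toy data: item 2 resembles items 0 and 1 but lacks their second label.
    X = [[1, 0], [0.9, 0.1], [1, 0.05], [0, 1]]
    Y = [[1, 1], [1, 1], [1, 0], [0, 1]]
    print(wknn_complete_labels(X, Y, k=2, threshold=0.5))
```

In a pipeline of the kind the abstract describes, the completed label matrix would replace the original one when training the SVM or RF classifier. Votes are computed from the original labels only, so the result does not depend on the order in which items are processed.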

Authors: Anton Kolesov, Dmitry Kamyshenkov, Maria Litovchenko, Elena Smekalova, Alexey Golovizin, and Alex Zhavoronkov

Source: https://www.hindawi.com/

