Motif-Based Text Mining of Microbial Metagenome Redundancy Profiling Data for Disease Classification





BioMed Research International, Volume 2016 (2016), Article ID 6598307, 11 pages

Research Article

Shanghai Public Health Clinical Center and Institutes of Biomedical Sciences, Fudan University, Shanghai 200032, China

Shanghai Center for Bioinformation Technology, Shanghai 201203, China

Key Lab of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China

Department of Medical Microbiology and Parasitology, Institutes of Medical Sciences, Shanghai Jiao Tong University School of Medicine, Shanghai 200240, China

Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, Zhejiang 310003, China

Received 28 October 2015; Accepted 12 January 2016

Academic Editor: Zhenguo Zhang

Copyright © 2016 Yin Wang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Background. Text data of 16S rRNA are informative for classifying microbiota-associated diseases. However, the raw text data must be processed systematically before features for classification can be defined and extracted; moreover, the high-dimensional feature spaces generated by the text data pose an additional difficulty. Results. Here we present a Phylogenetic Tree-Based Motif Finding algorithm (PMF) to analyze 16S rRNA text data. By integrating phylogenetic rules with other statistical indexes for classification, we can effectively reduce the dimension of the large feature spaces generated by the text datasets. Using the retrieved motifs in combination with common classification methods, we can discriminate between samples of both pneumonia and dental caries better than other existing methods. Conclusions. We extend phylogenetic approaches to perform supervised learning on microbiota text data and discriminate the pathological states of pneumonia and dental caries. The results show that PMF may enhance efficiency and reliability in analyzing high-dimensional text data.
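To make the dimensionality problem concrete, the following is a minimal sketch and not the authors' PMF algorithm. It assumes that motifs can be approximated by fixed-length substrings (k-mers) of the sequence text, and it replaces the phylogenetic rules and statistical indexes mentioned in the abstract with a simple difference-of-class-means filter. The function names (`kmer_counts`, `feature_matrix`, `select_motifs`), the toy sequences, and the labels are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import product


def kmer_counts(seq, k):
    """Count every overlapping substring of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def feature_matrix(seqs, k):
    """Build one row per sequence and one column per possible DNA k-mer.

    The number of columns is 4**k, which is why the feature space
    grows quickly as the motif length increases.
    """
    motifs = ["".join(p) for p in product("ACGT", repeat=k)]
    rows = []
    for s in seqs:
        counts = kmer_counts(s, k)
        rows.append([counts[m] for m in motifs])
    return motifs, rows


def select_motifs(motifs, rows, labels, top_n):
    """Keep the top_n motifs whose mean count differs most between two classes.

    This filter is an illustrative stand-in (an assumption), not the
    phylogenetic rule set described in the paper.
    """
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("this sketch handles exactly two classes")

    def class_mean(col, cls):
        vals = [r[col] for r, y in zip(rows, labels) if y == cls]
        return sum(vals) / len(vals)

    scores = [
        abs(class_mean(j, classes[0]) - class_mean(j, classes[1]))
        for j in range(len(motifs))
    ]
    ranked = sorted(range(len(motifs)), key=lambda j: scores[j], reverse=True)
    return [motifs[j] for j in ranked[:top_n]]


# Toy, made-up sequences and labels (not data from the study).
seqs = ["ACGTACGTAC", "ACGTACGAAC", "TTGGTTGGTT", "TTGGTTGCTT"]
labels = ["healthy", "healthy", "disease", "disease"]

motifs, rows = feature_matrix(seqs, k=3)
selected = select_motifs(motifs, rows, labels, top_n=5)
print(len(motifs), "candidate motifs reduced to", len(selected))
```

In this sketch the reduced columns could then be passed to any common classifier; the choice of `k` and `top_n` is arbitrary here and would need to be tuned on real data.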





Authors: Yin Wang, Rudong Li, Yuhua Zhou, Zongxin Ling, Xiaokui Guo, Lu Xie, and Lei Liu

Source: https://www.hindawi.com/


