A fast and high performance multiple data integration algorithm for identifying human disease genes





BMC Medical Genomics, 8:S2

First Online: 23 September 2015

Abstract

Background

Integrating multiple data sources is indispensable for improving disease gene identification. This is not only because genes associated with similar genetic diseases tend to lie close to each other in various biological networks, but also because gene-disease associations are complex. Although various algorithms have been proposed to identify disease genes, both their prediction performance and their computational time still need to be improved.

Results

In this study, we propose a fast and high-performance multiple data integration algorithm for identifying human disease genes. The posterior probability that each candidate gene is associated with an individual disease is calculated using a Bayesian analysis method and a binary logistic regression model. Two prior probability estimation strategies and two feature vector construction methods are developed to test the performance of the proposed algorithm.
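The abstract does not give the exact form of the model. As a rough illustration only, the Python sketch below fits a binary logistic regression to hypothetical per-gene feature vectors (label 1 for a known disease gene, 0 otherwise) and then applies Bayes' rule in odds form to re-weight the fitted probability under a different prior. The function names (`fit_logistic`, `posterior`), the gradient-descent fitter, the toy features, and the prior-shift correction are assumptions for illustration, not the authors' implementation. The paper's feature vectors (F2, F3) and its two prior estimation strategies are not reproduced here.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logit_of(w: Sequence[float], x: Sequence[float]) -> float:
    # w[0] is the bias; w[1:] are the feature weights.
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.1, epochs: int = 2000) -> List[float]:
    # Batch gradient descent on the average log-loss (hypothetical fitter).
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(logit_of(w, xi)) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def posterior(w: Sequence[float], x: Sequence[float],
              prior: float, train_prior: float) -> float:
    # The fitted logit = log likelihood ratio + log prior odds of the training
    # labels. Bayes' rule in odds form: swap the training prior odds for `prior`.
    z = logit_of(w, x)
    z += math.log(prior / (1 - prior)) - math.log(train_prior / (1 - train_prior))
    return sigmoid(z)


# Hypothetical toy data: two network-derived scores per gene.
X = [[0.9, 0.8], [0.8, 0.7], [0.7, 0.9], [0.1, 0.2], [0.2, 0.1], [0.3, 0.2]]
y = [1, 1, 1, 0, 0, 0]
train_prior = sum(y) / len(y)  # 0.5 in this toy set
w = fit_logistic(X, y)

for gene in ([0.85, 0.9], [0.15, 0.1]):
    print(gene, round(posterior(w, gene, prior=0.01, train_prior=train_prior), 4))
```

The prior correction only holds if the class-conditional feature distributions are the same in training and in prediction. A small prior (here 0.01, chosen arbitrarily) pushes every posterior down but leaves the ranking of candidate genes unchanged.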

Conclusions

The proposed algorithm not only generates predictions with high AUC scores but also runs very fast. When only a single PPI network is employed, the AUC score is 0.769 using F2 as feature vectors, and the average running time for each leave-one-out experiment is only around 1.5 seconds. When three biological networks are integrated, the AUC score using F3 as feature vectors increases to 0.830, and the average running time for each leave-one-out experiment is only about 12.54 seconds. This performance is better than that of many existing algorithms.
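The AUC values quoted above measure how well known disease genes are ranked above other genes. The abstract does not say how the leave-one-out scores are pooled. The minimal sketch below assumes that each held-out disease gene and the candidate genes it competes against each receive a score and a 0/1 label. Under that assumption, the AUC can be computed with the rank-based (Mann-Whitney) formulation. `roc_auc` and the toy scores are hypothetical.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    # AUC equals the probability that a randomly chosen positive is scored
    # above a randomly chosen negative (Mann-Whitney statistic); ties count
    # as half, matching the trapezoidal area under the ROC curve.
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical pooled scores from leave-one-out runs (1 = held-out disease gene).
scores = [0.9, 0.6, 0.7, 0.2]
labels = [1, 1, 0, 0]
print(roc_auc(scores, labels))  # 0.75
```

The pairwise loop costs O(P·N) comparisons. For genome-scale candidate lists, sorting the scores once and summing the ranks of the positives gives the same value in O(n log n).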

Keywords: disease gene; Bayesian analysis; logistic regression; multiple data integration; feature vector

List of abbreviations

AUC: area under the ROC curve

CGI: combining gene expression and protein interaction

DIR: data integration rank

FPR: false positive rate

MRF: Markov random field

OMIM: Online Mendelian Inheritance in Man

PCC: Pearson correlation coefficient

PPI: protein-protein interaction

ROC: receiver operating characteristic

RWR: random walk with restart

TPR: true positive rate




Authors: Bolin Chen, Min Li, Jianxin Wang, Xuequn Shang, Fang-Xiang Wu

Source: https://link.springer.com/






