Application of Sparse Bayesian Generalized Linear Model to Gene Expression Data for Classification of Prostate Cancer Subtypes





A major limitation of expression profiling is caused by the large number of variables assessed compared to relatively small sample sizes. In this study, we developed a multinomial Probit Bayesian model which utilizes the double exponential prior to induce shrinkage and reduce the number of covariates in the model [1]. A hierarchical Sparse Bayesian Generalized Linear Model (SBGLM) was developed in order to facilitate Gibbs sampling which takes into account the progressive nature of the response variable. The method was evaluated using a published dataset (GSE6099) which contained 99 prostate cancer cell types in four different progressive stages [2]. Initially, 398 genes were selected using ordinal logistic regression with a cutoff value of 0.05 after Benjamini and Hochberg FDR correction. The dataset was randomly divided into training (N = 50) and test (N = 49) groups such that each group contained an equal number of each cancer subtype. In order to obtain more robust results, we performed 50 re-samplings of the training and test groups. Using the top ten genes obtained from SBGLM, we were able to achieve an average classification accuracy of 85% and 80% in training and test groups, respectively. To functionally evaluate the model performance, we used a literature mining approach called Geneset Cohesion Analysis Tool (GCAT) [3]. Examination of the top 100 genes produced an average functional cohesion p-value of 0.007, compared to 0.047 and 0.131 produced by classical multi-category logistic regression and Random Forest approaches, respectively. In addition, 96 percent of the SBGLM runs resulted in a GCAT literature cohesion p-value smaller than 0.047. Taken together, these results suggest that the sparse Bayesian Multinomial Probit model applied to cancer progression data allows for better subclass prediction and produces more functionally relevant gene sets.
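The gene pre-selection and resampling steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names `benjamini_hochberg` and `stratified_split` are hypothetical, the labels in the demo are made up, and the sketch assumes that "equal number of each cancer subtype" means each stage is divided as evenly as possible between the training and test groups.

```python
import numpy as np

def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale the k-th smallest p-value by m / k.
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.clip(ranked, 0.0, 1.0)
    return adjusted

def stratified_split(labels, rng):
    """Split sample indices so every class is halved as evenly as possible.

    When a class has an odd size, the extra sample goes to the training set.
    """
    labels = np.asarray(labels)
    train, test = [], []
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        half = (idx.size + 1) // 2
        train.extend(idx[:half])
        test.extend(idx[half:])
    return np.array(train, dtype=int), np.array(test, dtype=int)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical stage labels (0-3); the real per-stage counts are not
    # given in the abstract, so these are illustrative only.
    labels = rng.integers(0, 4, size=99)
    # Hypothetical per-gene p-values standing in for the regression output.
    pvals = rng.uniform(size=1000)
    selected = np.flatnonzero(benjamini_hochberg(pvals) < 0.05)
    # 50 re-samplings of the training and test groups, as in the abstract.
    splits = [stratified_split(labels, rng) for _ in range(50)]
    print(len(selected), len(splits))
```

In this sketch the FDR cutoff of 0.05 and the 50 re-samplings follow the abstract; everything else, including how odd-sized classes are handled, is an assumption.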

KEYWORDS

LASSO, Robustness, Sparsity, MCMC, Gibbs Sampling

Cite this paper

Madahian, B., Deng, L. and Homayouni, R. (2014) Application of Sparse Bayesian Generalized Linear Model to Gene Expression Data for Classification of Prostate Cancer Subtypes. Open Journal of Statistics, 4, 518-526. doi: 10.4236/ojs.2014.47049.





Authors: Behrouz Madahian, Lih Y. Deng, Ramin Homayouni

Source: http://www.scirp.org/






