Kernelized partial least squares for feature reduction and classification of gene microarray data


BMC Systems Biology, 5:S13

First Online: 23 December 2011


Background

The primary objectives of this paper are (1) to apply Statistical Learning Theory (SLT), specifically Partial Least Squares (PLS) and Kernelized PLS (K-PLS), to the universal "feature-rich/case-poor" (also known as "large p, small n" or "high-dimension, low-sample-size") microarray problem by eliminating those features (or probes) that do not contribute to the "best" chromosome biomarkers for lung cancer, and (2) to quantitatively measure and verify, by independent means, the efficacy of this PLS process. A secondary objective is to integrate these significant improvements in diagnostic and prognostic biomedical applications into the clinical research arena; that is, to devise a framework for converting SLT results into direct, useful clinical information for patient care or pharmaceutical research. We therefore propose, and preliminarily evaluate, a process whereby PLS, K-PLS, and Support Vector Machines (SVM) may be integrated with the accepted and well-understood traditional biostatistical "gold standard" methods: the Cox Proportional Hazard model and Kaplan-Meier survival analysis. Specifically, this new combination is illustrated with PLS and Kaplan-Meier, followed by PLS and Cox Hazard Ratios (CHR), and can easily be extended to both the K-PLS and SVM paradigms. Finally, these processes are contained in the Fine Feature Selection (FFS) component of our overall feature reduction/evaluation process, which consists of the following components: (1) coarse feature reduction, (2) fine feature selection, and (3) classification (as described in this paper) and prediction.
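As a rough illustration of how PLS can drive feature reduction, the sketch below ranks features by the magnitude of the first PLS weight vector, which for a single response is proportional to Xᵀy computed on mean-centered data. This is only a minimal sketch under stated assumptions: the abstract does not specify the authors' actual selection rule, the function names (`first_pls_weights`, `top_features`) are hypothetical, and the tiny data set is invented for illustration.

```python
import math


def center(values):
    """Subtract the mean from a list of numbers."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def first_pls_weights(X, y):
    """Unit-norm first PLS1 weight vector, w proportional to X^T y.

    X is a list of samples (rows), each a list of feature values;
    y is a list of numeric responses. Both are mean-centered first.
    """
    n, p = len(X), len(X[0])
    yc = center(y)
    columns = [center([X[i][j] for i in range(n)]) for j in range(p)]
    w = [sum(col[i] * yc[i] for i in range(n)) for col in columns]
    norm = math.sqrt(sum(v * v for v in w))
    if norm == 0.0:
        raise ValueError("no feature covaries with the response")
    return [v / norm for v in w]


def top_features(X, y, k):
    """Indices of the k features with the largest |weight| (hypothetical rule)."""
    w = first_pls_weights(X, y)
    return sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)[:k]


# Invented toy data: 4 samples, 3 probes; only probe 0 tracks the label.
X_toy = [
    [1.0, 5.0, 0.2],
    [2.0, 5.1, 0.1],
    [3.0, 4.9, 0.3],
    [4.0, 5.0, 0.2],
]
y_toy = [0, 0, 1, 1]

print(top_features(X_toy, y_toy, 1))
```

Keeping only the highest-weight probes is one simple way to discard features that contribute little to the response; a kernelized variant would replace the inner products on X with kernel evaluations.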

Results

Our results for PLS and K-PLS showed that these techniques, as part of our overall feature reduction process, performed well on noisy microarray data. The best performance was a good Area Under the Receiver Operating Characteristic (ROC) Curve (AUC) of 0.794 for classifying recurrence before versus after 36 months, and a strong AUC of 0.869 for classifying recurrence before versus after 60 months. Kaplan-Meier curves for the classification groups were clearly separated, with p-values below 4.5e-12 for both 36 and 60 months. CHRs were also good, with ratios of 2.846341 (36 months) and 3.996732 (60 months).
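For readers unfamiliar with the two evaluation measures reported above, the following minimal sketch shows how an AUC and a Kaplan-Meier survival curve can be computed from scratch. It is not the authors' code: the function names are hypothetical and the inputs are invented toy values, not data from the paper.

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels are 0/1; ties between a positive and a negative score count 0.5.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (time, survival) pairs.

    events[i] is 1 if the event (e.g. recurrence) was observed at times[i]
    and 0 if the case was censored at that time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0
        leaving = 0
        # Group all cases sharing the same time point.
        while i < len(data) and data[i][0] == t:
            observed += data[i][1]
            leaving += 1
            i += 1
        if observed:
            survival *= 1.0 - observed / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Invented toy inputs for illustration only.
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
print(kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1]))
```

In a setting like the one described, a classifier's predicted groups would each get their own Kaplan-Meier curve, and a log-rank test (not shown here) would supply the p-value for their separation.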

Conclusions

SLT techniques such as PLS and K-PLS can effectively address difficult problems in analyzing biomedical data such as microarrays. The combinations with established biostatistical techniques demonstrated in this paper allow these methods to move from academic research into clinical practice.


Author: Walker H Land - Xingye Qiao - Daniel E Margolis - William S Ford - Christopher T Paquette - Joseph F Perez-Rogers - Jef

