TSG: a new algorithm for binary and multi-class cancer classification and informative genes selection


BMC Medical Genomics, 6:S3

First Online: 23 January 2013


Background

One of the challenges in classifying cancer tissue samples from gene expression data is establishing an effective method that selects a parsimonious set of informative genes. The Top Scoring Pair (TSP), k-Top Scoring Pairs (k-TSP), Support Vector Machine (SVM), and Prediction Analysis of Microarrays (PAM) classifiers are four popular methods with comparable performance on multiple cancer datasets. SVM and PAM tend to use a large number of genes, whereas TSP and k-TSP always use an even number of genes. In addition, k-TSP selects distinct gene pairs by simply combining the top-ranking pairs, without considering that the gene set with the best discrimination power may not be a combination of those pairs. The k-TSP algorithm also requires the user to specify an upper bound on the number of gene pairs. Here we introduce a computational algorithm that addresses these problems. It is named the Chi-square-statistic-based Top Scoring Genes (Chi-TSG) classifier, abbreviated as TSG.
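For readers unfamiliar with the TSP family, the following minimal sketch shows the standard binary TSP score from the literature (not the TSG score): for genes i and j, delta_ij = |P(X_i < X_j | class 0) - P(X_i < X_j | class 1)|, and the pair with the largest delta becomes the classifier. Only within-sample comparisons (X_i < X_j) are used, which is why the TSP family is invariant to monotone transformations. The function names and the toy data are hypothetical illustrations; the published TSP and k-TSP methods also use a secondary rank-based score to break ties, which is omitted here.

```python
import numpy as np

def tsp_scores(X, y):
    """Pairwise TSP scores for a binary problem (labels 0 and 1).

    X: (n_samples, n_genes) expression matrix.
    Returns delta[i, j] = |P(X_i < X_j | y=0) - P(X_i < X_j | y=1)|
    together with the two conditional probability matrices.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # less[s, i, j] is True when gene i is below gene j in sample s
    less = X[:, :, None] < X[:, None, :]
    p0 = less[y == 0].mean(axis=0)
    p1 = less[y == 1].mean(axis=0)
    return np.abs(p0 - p1), p0, p1

def top_scoring_pair(X, y):
    """Return (i, j, p0, p1) for the highest-scoring pair (first one on ties)."""
    delta, p0, p1 = tsp_scores(X, y)
    i, j = np.unravel_index(np.argmax(delta), delta.shape)
    return int(i), int(j), float(p0[i, j]), float(p1[i, j])

def predict_tsp(x, pair):
    """If X_i < X_j, predict the class in which that ordering is more frequent."""
    i, j, p0, p1 = pair
    if x[i] < x[j]:
        return 0 if p0 > p1 else 1
    return 1 if p0 > p1 else 0

X = [[1, 5, 3], [2, 6, 1], [7, 2, 4], [8, 1, 5]]
y = [0, 0, 1, 1]
pair = top_scoring_pair(X, y)
print(pair[:2], predict_tsp([1, 9, 0], pair), predict_tsp([9, 1, 0], pair))
```

k-TSP extends this idea by taking the k highest-scoring disjoint pairs and combining their votes, which is why both classifiers always end up with an even number of genes.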

Results

The TSG classifier starts with the top two genes and sequentially adds further genes to the candidate gene set to perform informative gene selection. The algorithm automatically reports the total number of informative genes, selected with cross-validation. We provide the algorithm for both binary and multi-class cancer classification. The algorithm was applied to 9 binary and 10 multi-class gene expression datasets involving human cancers. The TSG classifier outperforms the TSP family classifiers by a large margin on most of the 19 datasets. In addition to improved accuracy, our classifier shares all the advantages of the TSP family classifiers: easy interpretation, invariance to monotone transformations, frequent selection of a small number of informative genes that allows follow-up studies, and resistance to sampling variation owing to within-sample operations.
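The selection procedure described above (start from the top two genes, add genes one at a time, let cross-validation decide how many to keep) can be sketched as greedy forward selection with leave-one-out cross-validation. This is a hedged illustration, not the authors' implementation: the abstract does not give the TSG scoring or decision rule, so a nearest-centroid classifier is used purely as a stand-in, and `start_pair`, `forward_select`, and the stopping rule (stop when no gene strictly improves LOOCV accuracy) are hypothetical choices.

```python
import numpy as np

def nearest_centroid_predict(X_train, y_train, X_test):
    """Stand-in classifier (an assumption, NOT the TSG decision rule)."""
    classes = np.unique(y_train)
    centroids = np.array([X_train[y_train == c].mean(axis=0) for c in classes])
    # Squared Euclidean distance from each test sample to each class centroid
    dist = ((X_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[np.argmin(dist, axis=1)]

def loocv_accuracy(X, y, genes, predict=nearest_centroid_predict):
    """Leave-one-out accuracy of `predict` restricted to the columns in `genes`."""
    n = len(y)
    correct = 0
    for k in range(n):
        train = np.arange(n) != k
        pred = predict(X[train][:, genes], y[train], X[k:k + 1][:, genes])
        correct += int(pred[0] == y[k])
    return correct / n

def forward_select(X, y, start_pair, predict=nearest_centroid_predict):
    """Greedy forward selection starting from a given top pair of genes.

    Stops as soon as no remaining gene strictly improves LOOCV accuracy,
    so the number of selected genes (odd or even) is chosen automatically.
    """
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    selected = list(start_pair)
    best_acc = loocv_accuracy(X, y, selected, predict)
    remaining = [g for g in range(X.shape[1]) if g not in selected]
    while remaining:
        accs = [loocv_accuracy(X, y, selected + [g], predict) for g in remaining]
        k = int(np.argmax(accs))
        if accs[k] <= best_acc:
            break  # no strict improvement: stop adding genes
        best_acc = accs[k]
        selected.append(remaining.pop(k))
    return selected, best_acc
```

Because the stand-in classifier compares distances to class centroids, the same sketch runs unchanged for multi-class labels. Note that reusing the selection LOOCV as the reported accuracy is optimistic; an honest estimate nests the whole selection loop inside an outer cross-validation.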

Conclusions

Redefining the scores for gene sets and the classification rules of the TSP family classifiers to incorporate sample size information can lead to better selection of informative genes and higher classification accuracy. The resulting TSG classifier offers a useful tool for cancer classification based on numerical molecular data.
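To illustrate why incorporating sample size can change a score, the sketch below compares the TSP score (a difference of proportions) with Pearson's chi-square statistic on the 2x2 table of class versus whether X_i < X_j holds. The abstract does not give the exact TSG formula, so using Pearson's chi-square on this particular table is an assumption; `pair_chi_square` and `delta_score` are hypothetical names. The example keeps the proportions fixed (80% versus 20%) while scaling the sample size tenfold.

```python
def pair_chi_square(a, n1, c, n2):
    """Pearson chi-square for the 2x2 table (class x [X_i < X_j]).

    a: class-1 samples with X_i < X_j, out of n1 class-1 samples
    c: class-2 samples with X_i < X_j, out of n2 class-2 samples
    """
    b, d = n1 - a, n2 - c
    n = n1 + n2
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table: no evidence either way
    return n * (a * d - b * c) ** 2 / denom

def delta_score(a, n1, c, n2):
    """TSP-style score: absolute difference of the two proportions."""
    return abs(a / n1 - c / n2)

small = (8, 10, 2, 10)      # 80% vs 20%, 10 samples per class
large = (80, 100, 20, 100)  # same proportions, 100 samples per class
print(delta_score(*small), delta_score(*large))          # identical deltas
print(pair_chi_square(*small), pair_chi_square(*large))  # about 7.2 vs 72.0
```

The proportion-based score cannot tell the two situations apart, while the chi-square statistic grows with the number of samples supporting the same ordering, which is the kind of sample size information the conclusions refer to.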

List of abbreviations used

TSP: top scoring pair

k-TSP: k top scoring pairs

HC-TSP: multi-class extension of TSP with a hierarchical classification scheme

HC-k-TSP: multi-class extension of k-TSP with a hierarchical classification scheme

SVM: Support Vector Machine classification

PAM: Prediction Analysis of Microarrays

LOOCV: leave-one-out cross-validation

Haiyan Wang and Hongyan Zhang contributed equally to this work.


Authors: Haiyan Wang, Hongyan Zhang, Zhijun Dai, Ming-shun Chen, Zheming Yuan

Source: https://link.springer.com/

