An improved machine learning protocol for the identification of correct Sequest search results





BMC Bioinformatics, 11:591

Proteomics

Abstract

Background

Mass spectrometry has become a standard method for characterizing the proteomic profile of cell or tissue samples. To take full advantage of tandem mass spectrometry (MS/MS) techniques in large-scale protein characterization studies, robust and consistent data analysis procedures are crucial. In this work we present a machine-learning-based protocol for identifying correct peptide-spectrum matches in Sequest database search results, improving on previously published protocols.

Results

The developed model improves on published machine learning classification procedures by 6%, as measured by the area under the ROC curve. Further, we show how the developed model can be presented as an interpretable tree of additive rules, thereby effectively removing the "black-box" notion often associated with machine learning classifiers and allowing for comparison with expert rules of thumb. Finally, a method for extending the developed peptide identification protocol to give probabilistic estimates of the presence of a given protein is proposed and tested.
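For readers unfamiliar with the metric, the area under the ROC curve can be read as the probability that a randomly chosen correct peptide-spectrum match receives a higher classifier score than a randomly chosen incorrect one. The following is a minimal sketch of that calculation, not the authors' evaluation code; the function name `roc_auc` and the example scores are hypothetical and chosen only for illustration.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    scores: classifier outputs, higher meaning "more likely correct".
    labels: 1 for a correct peptide-spectrum match, 0 for an incorrect one.
    Illustrative helper only; not taken from the paper.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one correct and one incorrect match")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # correct match ranked above incorrect match
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical scores for four matches: two correct, two incorrect.
example_auc = roc_auc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0])
```

The pairwise form is quadratic in the number of matches, which is acceptable for a small illustration; a rank-based implementation would be preferable for large search result sets.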

Conclusions

We demonstrate the construction of a high-accuracy classification model for Sequest search results from MS/MS spectra obtained using MALDI ionization. The developed model performs well in identifying correct peptide-spectrum matches and is easily extendable to the protein identification problem. The relative ease with which additional experimental parameters can be incorporated into the classification framework to give additional discriminatory power allows for future tailoring of the model to take advantage of information from specific instrument set-ups.
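The abstract does not state how peptide-level results are combined into a protein-level estimate. As a rough illustration of the general idea only, the sketch below uses a common simplification that is an assumption here and not the authors' method: if each peptide-spectrum match assigned to a protein is treated as independent, the protein is taken to be present unless every one of its matches is wrong. The name `protein_probability` is hypothetical.

```python
from math import prod


def protein_probability(peptide_probs):
    """Probability that a protein is present, given per-match probabilities.

    ASSUMPTION (not from the paper): matches are independent, so the protein
    is absent only if all of its peptide-spectrum matches are incorrect:
        P(protein) = 1 - prod(1 - p_i)
    """
    for p in peptide_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
    return 1.0 - prod(1.0 - p for p in peptide_probs)


# Hypothetical example: two matches, each judged 50% likely to be correct.
example_protein_p = protein_probability([0.5, 0.5])
```

Under this independence assumption, several moderately confident matches reinforce each other, which is also why the estimate can be over-optimistic when matches are correlated (for example, repeated spectra of the same peptide).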

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-11-591) contains supplementary material, which is available to authorized users.




Authors: Morten Källberg, Hui Lu

Source: https://link.springer.com/






