A new type of Hidden Markov Models to predict complex domain architecture in protein sequences

* Corresponding author. 1 MAB - Méthodes et Algorithmes pour la Bioinformatique, LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier

Abstract: Profile hidden Markov models (pHMMs) represent sequence regions, called domains or motifs, that are conserved among the proteins of a family. They are routinely used either (i) to recognize the presence of a domain in a protein, and thereby to test its membership in a known family, or (ii) to tag the precise position of a domain in the sequence. However, most proteins are composed of several domains, and during evolution, events such as rearrangements or duplications may create different domain architectures in proteins of the same family. Because of their intrinsically linear structure, pHMMs cannot model several distinct domains whose number and relative order may vary within a family. Efficient tools for recognition and tagging in the case of complex domain architectures are lacking. Here, we propose a generalized HMM to solve exactly this problem. In our solution, called the cyclic profile HMM (CpHMM), specific transitions can model the repetition of units as well as different relative orders of domains. In a CpHMM, complete domains are modeled by nested pHMMs. We provide a program for the construction of CpHMMs that takes pHMMs as input, thereby allowing the user to capitalize on already developed pHMMs (e.g., PFAM). We adapted recognition and tagging algorithms to CpHMMs and tested them on both the family of PentatricoPeptide Repeat (PPR) proteins and the superfamily of saposins. Our results demonstrate that CpHMMs improve on pHMMs for the recognition and tagging of proteins with complex domain architectures, while keeping their efficiency. The architecture of PPR proteins has been manually annotated for a subfamily in Arabidopsis; however, for rice and poplar tree, only recognition with the PFAM PPR motif had previously been performed. Comparing our results with the Arabidopsis PPR annotations, we show that more than 88% of the motifs are precisely recognized by the CpHMM.
Moreover, we completed the recognition of PPR, as well as the determination of their architecture, for both rice and poplar tree proteomes.
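As an illustration only, the idea behind the cyclic transitions can be sketched with a toy Viterbi tagger. This is not the authors' CpHMM construction or program: each domain here is a hypothetical match-only profile built from a consensus string (real pHMMs, such as PFAM models, also have insert and delete states), and the names `make_profile`, `viterbi_tag`, `p_stay` and `p_exit` are assumptions made for this sketch. The transition from the last position of any domain back to the first position of every domain is what lets one model decode repeated units and different relative orders of domains.

```python
import math

# Toy amino-acid alphabet and a uniform background (null) distribution.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = {aa: 1.0 / len(ALPHABET) for aa in ALPHABET}


def log(p):
    """Natural log that maps probability 0 to -inf."""
    return math.log(p) if p > 0 else float("-inf")


def make_profile(consensus, p_match=0.8):
    """Hypothetical match-only profile: one emission dict per consensus position."""
    other = (1 - p_match) / (len(ALPHABET) - 1)
    return [{aa: (p_match if aa == c else other) for aa in ALPHABET} for c in consensus]


def build_states(domains):
    """One background state, then one state per (domain, position)."""
    states = [("bg", None, 0)]
    for name, profile in domains.items():
        states.extend(("dom", name, pos) for pos in range(len(profile)))
    return states


def transition(src, dst, n_dom, lengths, p_stay, p_exit):
    """Transition probability between two states of the toy cyclic model."""
    kind_s, name_s, pos_s = src
    kind_d, name_d, pos_d = dst
    if kind_s == "bg":
        if kind_d == "bg":
            return p_stay
        return (1 - p_stay) / n_dom if pos_d == 0 else 0.0
    if pos_s < lengths[name_s] - 1:
        # Inside a domain: move deterministically to the next position.
        same_next = kind_d == "dom" and name_d == name_s and pos_d == pos_s + 1
        return 1.0 if same_next else 0.0
    # Last position of a domain: either return to background, or take a
    # "cyclic" transition to the start of any domain (repeat or reorder).
    if kind_d == "bg":
        return p_exit
    return (1 - p_exit) / n_dom if pos_d == 0 else 0.0


def emission(state, aa, domains):
    kind, name, pos = state
    return BACKGROUND[aa] if kind == "bg" else domains[name][pos][aa]


def viterbi_tag(seq, domains, p_stay=0.9, p_exit=0.5):
    """Label each residue with its domain name, or '-' for background."""
    states = build_states(domains)
    lengths = {name: len(prof) for name, prof in domains.items()}
    n_dom = len(domains)

    def t(src, dst):
        return log(transition(src, dst, n_dom, lengths, p_stay, p_exit))

    # Initial scores: start as if the previous state were background.
    score = [t(states[0], s) + log(emission(s, seq[0], domains)) for s in states]
    back = []
    for aa in seq[1:]:
        new_score, ptr = [], []
        for dst in states:
            cands = [score[i] + t(src, dst) for i, src in enumerate(states)]
            best_i = max(range(len(states)), key=lambda i: cands[i])
            new_score.append(cands[best_i] + log(emission(dst, aa, domains)))
            ptr.append(best_i)
        score = new_score
        back.append(ptr)
    # Only allow the path to end in background or at the end of a complete domain.
    final = [k for k, s in enumerate(states) if s[0] == "bg" or s[2] == lengths[s[1]] - 1]
    j = max(final, key=lambda k: score[k])
    path = [j]
    for ptr in reversed(back):
        j = ptr[j]
        path.append(j)
    path.reverse()
    return [states[k][1] or "-" for k in path]


# Usage: two hypothetical domains, decoded in two different relative orders.
DOMAINS = {"A": make_profile("WWW"), "B": make_profile("CCC")}
print("".join(viterbi_tag("GGWWWCCCWWWGG", DOMAINS)))  # → --AAABBBAAA--
print("".join(viterbi_tag("CCCWWWCCC", DOMAINS)))      # → BBBAAABBB
```

The same toy model tags both the A-B-A and B-A-B arrangements without being rebuilt, which a single linear profile cannot do. The naive decoding loop costs O(L × S²) for a sequence of length L and S states; a practical implementation would exploit the sparse transition structure.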

Keywords: motif; domain; profile HMM; domain architecture; tagging; recognition; cyclic permutation; duplication

Authors: Raluca Uricaru, Laurent Brehelin, Eric Rivals

Source: https://hal.archives-ouvertes.fr/

