Exhaustive assignment of compositional bias reveals universally prevalent biased regions: analysis of functional associations in human and Drosophila

BMC Bioinformatics, 7:441

First Online: 10 October 2006; Received: 07 July 2006; Accepted: 10 October 2006


Background

Compositionally biased (CB) regions are stretches in protein sequences made mainly from a distinct subset of amino acid residues; such regions are frequently associated with a structural role in the cell, or with protein disorder.

Results

We derived a procedure for the exhaustive assignment and classification of CB regions, and have applied it to thirteen metazoan proteomes. Sequences are initially scanned for the lowest-probability subsequences (LPSs) for single amino-acid types; subsequently, an exhaustive search for LPSs for multiple residue types is performed iteratively until convergence, to define CB region boundaries. We analysed > 40,000 CB regions with > 20 million residues; strikingly, nine single- or double-residue biases are universally abundant, and are consistently highly ranked across both vertebrates and invertebrates. To home in on subpopulations of CB regions of interest in human and D. melanogaster, we analysed CB region lengths, conservation, inferred functional categories and predicted protein disorder, and filtered for coiled coils and protein structures. In particular, we found that some of the universally abundant CB regions have significant associations with transcription and nuclear localization in human and Drosophila, and are also predicted to be moderately or highly disordered. Focussing on Q-based biased regions, we found that these regions are typically well conserved only within mammals, appearing in 60–80% of orthologs, with shorter human transcription-related CB regions being unconserved outside of mammals; they are also preferentially linked to protein domains such as the homeodomain and the glucocorticoid-receptor DNA-binding domain. In general, only ~40–50% of residues in these human and Drosophila CB regions have predicted protein disorder.
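The single-residue scanning step can be illustrated with a minimal sketch. The abstract does not state the probability model, so this sketch assumes a binomial model: the probability of a window is the binomial probability of seeing its count of the chosen residue, given a background frequency. The function names, the `min_len` default and the background frequency of 0.05 are hypothetical choices for illustration, not values from the paper, and the iterative multi-residue search is not attempted here.

```python
import math


def log_binomial_pmf(k, n, f):
    """Natural log of the binomial probability of k hits in n trials at rate f."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(f) + (n - k) * math.log(1.0 - f)


def lowest_probability_subsequence(seq, residue, background_freq,
                                   min_len=5, max_len=None):
    """Return (start, end, log_p) for the window of seq whose count of
    `residue` is least probable under the assumed binomial model.

    `end` is exclusive. Only windows enriched in `residue` relative to
    the background are considered. Returns None if no window qualifies.
    """
    n_seq = len(seq)
    max_len = min(max_len or n_seq, n_seq)

    # prefix[i] = number of occurrences of `residue` in seq[:i]
    prefix = [0]
    for ch in seq:
        prefix.append(prefix[-1] + (ch == residue))

    best = None
    for start in range(n_seq):
        for length in range(min_len, max_len + 1):
            end = start + length
            if end > n_seq:
                break
            k = prefix[end] - prefix[start]
            # Skip windows that are not enriched (assumption: only
            # over-representation counts as bias).
            if k <= length * background_freq:
                continue
            log_p = log_binomial_pmf(k, length, background_freq)
            if best is None or log_p < best[2]:
                best = (start, end, log_p)
    return best


# Toy sequence with a 12-residue glutamine (Q) run; 0.05 is an
# illustrative background frequency, not a measured one.
toy = "MKTAYIAK" + "Q" * 12 + "LSDEVRK"
print(lowest_probability_subsequence(toy, "Q", 0.05))
```

This brute-force scan is quadratic in sequence length, which is acceptable for a single protein; scanning whole proteomes would need the window lengths to be bounded via `max_len` or a more efficient search.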

Conclusion

This data is of use for the further functional characterization of genes, and for structural genomics initiatives.

Abbreviations

LPS: lowest probability subsequence

CB: compositional bias or compositionally biased

GO: Gene Ontology

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-7-441) contains supplementary material, which is available to authorized users.

Author: Paul M Harrison

Source: https://link.springer.com/

