ClustalXeed: a GUI-based grid computation version for high performance and terabyte size multiple sequence alignment





BMC Bioinformatics, 11:467

Sequence analysis applications

Abstract

Background

There is an increasing demand to assemble and align large-scale biological sequence data sets. The commonly used multiple sequence alignment programs are still limited in their ability to handle very large numbers of sequences because they lack a scalable high-performance computing (HPC) environment with greatly extended data storage capacity.

Results

We designed ClustalXeed, a software system for multiple sequence alignment with incremental improvements over previous versions of the ClustalX and ClustalW-MPI software. The primary advantage of ClustalXeed over other multiple sequence alignment software is its ability to align a large family of protein or nucleic acid sequences. To solve the conventional memory-dependency problem, ClustalXeed uses both physical random access memory (RAM) and a distributed file-allocation system for distance matrix construction and pairwise alignment computation. The computational efficiency of the disk-storage system was markedly improved by implementing an efficient load-balancing algorithm, called the "idle node-seeking task algorithm" (INSTA). The new editing option and the graphical user interface (GUI) provide ready access to a parallel-computing environment for users who seek fast and easy alignment of large DNA and protein sequence sets.
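The idea of keeping the distance matrix on disk rather than in RAM can be illustrated with a minimal Python sketch. It is not ClustalXeed's implementation, which the abstract does not describe in detail: the class name `DiskDistanceMatrix`, the file layout (upper triangle stored as 8-byte floats) and the `toy_distance` function are all assumptions made for illustration.

```python
import struct


def toy_distance(a, b):
    """Hypothetical stand-in for a pairwise alignment score:
    mismatches plus length difference, divided by the longer length."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    mismatches = sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))
    return mismatches / longest


class DiskDistanceMatrix:
    """Upper-triangular distance matrix kept in a file instead of RAM
    (assumed layout: one little-endian double per sequence pair)."""

    ITEM = struct.Struct("<d")

    def __init__(self, n, path):
        self.n = n
        n_pairs = n * (n - 1) // 2
        # Pre-size the file; untouched entries read back as 0.0.
        with open(path, "wb") as fh:
            fh.truncate(n_pairs * self.ITEM.size)
        self._fh = open(path, "r+b")

    def _offset(self, i, j):
        if i == j:
            raise ValueError("the diagonal is not stored")
        if i > j:
            i, j = j, i
        # Row-major index of pair (i, j) within the upper triangle.
        index = i * self.n - i * (i + 1) // 2 + (j - i - 1)
        return index * self.ITEM.size

    def set(self, i, j, value):
        self._fh.seek(self._offset(i, j))
        self._fh.write(self.ITEM.pack(value))

    def get(self, i, j):
        self._fh.seek(self._offset(i, j))
        return self.ITEM.unpack(self._fh.read(self.ITEM.size))[0]

    def close(self):
        self._fh.close()


def build_matrix(seqs, path):
    """Fill the on-disk matrix one pair at a time, so only a single
    distance value is held in memory at any moment."""
    matrix = DiskDistanceMatrix(len(seqs), path)
    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            matrix.set(i, j, toy_distance(seqs[i], seqs[j]))
    return matrix
```

In this sketch the memory footprint is independent of the number of sequences, at the price of one file seek per access. A real system would presumably buffer blocks of the matrix in RAM and spread the file across nodes.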

Conclusions

ClustalXeed can now process large volumes of biological sequence data that were not tractable with any other parallel or single-machine MSA program. The main developments include: (1) the ability to tackle larger sequence alignment problems than was possible with previous systems, through markedly improved storage-handling capabilities; (2) an efficient task load-balancing algorithm, INSTA, which improves overall processing times for multiple sequence alignment with input sequences of non-uniform length; and (3) support for both single PC and distributed cluster systems.
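Why load balancing matters for sequences of non-uniform length can be shown with a small simulation. The sketch below is not INSTA itself, whose internals are not given in the abstract. It compares a fixed round-robin assignment with a generic greedy rule that hands each task to whichever node becomes idle first. The function names and the cost model (pairwise cost proportional to the product of the two sequence lengths, as in dynamic-programming alignment) are assumptions.

```python
import heapq
from itertools import combinations


def pair_costs(lengths):
    """Assumed cost of each pairwise alignment: product of the two lengths."""
    return [lengths[i] * lengths[j]
            for i, j in combinations(range(len(lengths)), 2)]


def static_makespan(costs, n_nodes):
    """Round-robin assignment fixed in advance; returns the busiest node's load."""
    loads = [0] * n_nodes
    for k, cost in enumerate(costs):
        loads[k % n_nodes] += cost
    return max(loads)


def idle_first_makespan(costs, n_nodes):
    """Give each task to the node that becomes idle earliest (min-heap of
    finish times); returns the time at which the last node finishes."""
    heap = [(0, node) for node in range(n_nodes)]
    heapq.heapify(heap)
    for cost in costs:
        finish, node = heapq.heappop(heap)
        heapq.heappush(heap, (finish + cost, node))
    return max(finish for finish, _ in heap)


if __name__ == "__main__":
    costs = pair_costs([100, 2000, 100, 2000])
    print("static    :", static_makespan(costs, 2))
    print("idle-first:", idle_first_makespan(costs, 2))
```

For the four lengths in the example, the static assignment finishes after 4,400,000 cost units and the idle-first rule after 4,210,000. The gain is modest here because a single long pair dominates the total work. Greedy assignment is also not guaranteed to beat a static split on every input.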

Abbreviations

HPC: High Performance Computing
GUI: Graphical User Interface
MSA: Multiple Sequence Alignment
MPI: Message Passing Interface
RAM: Random Access Memory
INSTA: Idle Node-Seeking Task Algorithm
CPU: Central Processing Unit
HDD: Hard Disk Drive
I/O: Input/Output
MALIGN: Multiple Alignment
SP: Sum of Pairs

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-11-467) contains supplementary material, which is available to authorized users.




Authors: Taeho Kim, Hyun Joo

Source: https://link.springer.com/






