# Efficient Parallel and Out of Core Algorithms for Constructing Large Bi-directed de Bruijn Graphs - Computer Science > Data Structures and Algorithms

Abstract: Assembling genomic sequences from a set of overlapping reads is one of themost fundamental problems in computational biology. Algorithms addressing theassembly problem fall into two broad categories - based on the data structureswhich they employ. The first class uses an overlap-string graph and the secondtype uses a de Bruijn graph. However with the recent advances in short readsequencing technology, de Bruijn graph based algorithms seem to play a vitalrole in practice.Efficient algorithms for building these massive de Bruijn graphs are veryessential in large sequencing projects based on short reads. In Jackson et. al.ICPP-2008, an $On-p$ time parallel algorithm has been given for this problem.Here $n$ is the size of the input and $p$ is the number of processors. Thisalgorithm enumerates all possible bi-directed edges which can overlap with anode and ends up generating $\Thetan\Sigma$ messages.In this paper we present a $\Thetan-p$ time parallel algorithm with acommunication complexity equal to that of parallel sorting and is not sensitiveto $\Sigma$. The generality of our algorithm makes it very easy to extend iteven to the out-of-core model and in this case it has an optimal I-O complexityof $\Theta\frac{n\logn-B}{B\logM-B}$. We demonstrate the scalability ofour parallel algorithm on a SGI-Altix computer. A comparison of our algorithmwith that of Jackson et. al. ICPP-2008 reveals that our algorithm is faster. Wealso provide efficient algorithms for the bi-directed chain compaction problem.

Author: Vamsi Kundeti, Sanguthevar Rajasekaran, Hieu Dinh

Source: https://arxiv.org/