Distributed Indexing Dispatched Alignment* (DIDA*)

DIDA* performs large-scale alignment tasks by distributing the indexing and alignment stages into smaller subtasks over a cluster of compute nodes.

Performance increase of up to 77 percent¹

Distributed Indexing Dispatched Alignment* (DIDA*) is a novel distributed and parallel indexing and alignment framework that performs the task in five major steps: distribute, index, dispatch, align, and merge. The index and dispatch steps run in parallel. DIDA first partitions the target sequences into smaller parts using a heuristic balanced cut, then builds an index for each partition. The reads are then "flowed" through a Bloom filter to dispatch each alignment task to the relevant node(s). Finally, the reads are aligned against all partitions in parallel, and the partial results are merged into the final output.
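The distribute and dispatch steps can be illustrated with a minimal Python sketch. It is not DIDA's implementation: the greedy longest-first split stands in for the unspecified "heuristic balanced cut", and the k-mer length, the SHA-256-based hashing, the filter size, and every function and class name (`partition_targets`, `BloomFilter`, `build_filters`, `dispatch`) are assumptions made for illustration only.

```python
import hashlib
from typing import Dict, Iterator, List

K = 4  # k-mer length; a tiny, assumed value chosen only for illustration


def kmers(seq: str, k: int = K) -> Iterator[str]:
    """Yield every k-mer of seq (nothing if seq is shorter than k)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


class BloomFilter:
    """Minimal Bloom filter that uses a Python integer as its bit array."""

    def __init__(self, size_bits: int = 1 << 16, num_hashes: int = 3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, item: str) -> Iterator[int]:
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item: str) -> bool:
        # May report false positives, never false negatives.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def partition_targets(targets: Dict[str, str], n_parts: int) -> List[Dict[str, str]]:
    """Greedy balanced split: longest target first, into the lightest partition."""
    parts: List[Dict[str, str]] = [{} for _ in range(n_parts)]
    loads = [0] * n_parts
    for name, seq in sorted(targets.items(), key=lambda kv: -len(kv[1])):
        lightest = loads.index(min(loads))
        parts[lightest][name] = seq
        loads[lightest] += len(seq)
    return parts


def build_filters(parts: List[Dict[str, str]]) -> List[BloomFilter]:
    """Build one Bloom filter per partition from the k-mers of its targets."""
    filters = []
    for part in parts:
        bf = BloomFilter()
        for seq in part.values():
            for km in kmers(seq):
                bf.add(km)
        filters.append(bf)
    return filters


def dispatch(read: str, filters: List[BloomFilter]) -> List[int]:
    """Return indices of partitions whose filter reports any k-mer of the read."""
    return [i for i, bf in enumerate(filters)
            if any(km in bf for km in kmers(read))]
```

In this sketch a read is sent to every partition whose filter reports at least one of its k-mers. Because a Bloom filter can return false positives but never false negatives, a read may occasionally be dispatched to a partition that cannot align it, but it is never withheld from one that shares a k-mer with it.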

DIDA is written in C++ and parallelized with OpenMP for multithreaded execution on a single compute node. For distributed computing, DIDA uses the Message Passing Interface (MPI) for inter-process communication. As input, it takes a set of target sequences and a set of queries in FASTA or FASTQ format; the default output format is SAM.
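For readers unfamiliar with the FASTA input format mentioned above, the following minimal Python sketch parses FASTA records into (name, sequence) pairs. It only illustrates the format and is not DIDA's own parser (DIDA is written in C++); the function name `read_fasta` is hypothetical, and FASTQ is not handled.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from the lines of a FASTA file."""
    name: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A header line starts a new record; emit the previous one first.
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name = header[0] if header else ""  # keep the ID, drop the description
            chunks = []
        elif name is not None:
            chunks.append(line)  # sequence data may span several lines
    if name is not None:
        yield name, "".join(chunks)
```

Because the function accepts any iterable of lines, it can be fed an open file handle directly, for example `for name, seq in read_fasta(open(path))`, where `path` is a placeholder for a FASTA file of targets or queries.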

Performance Results

DIDA was evaluated in combination with four popular aligners, Burrows-Wheeler Aligner* (BWA*), Bowtie2, Novoalign, and ABySS-map, on the C. elegans genome, a human draft genome, the human reference genome, and the P. glauca genome. For a draft human genome assembly, running BWA, Bowtie2, Novoalign, and ABySS-map through the DIDA framework on 12 nodes reduced memory use by 91, 90, 87, and 91 percent, respectively, and made execution 55, 74, 77, and 67 percent faster, respectively, compared with each aligner's baseline performance.¹

Related Codes

Assembly By Short Sequences* (ABySS*)

Publications

Hamid Mohamadi, Benjamin P. Vandervalk, Anthony Raymond, Shaun D. Jackman, Justin Chu, Clay P. Breshears, and Inanc Birol. "DIDA: Distributed Indexing Dispatched Alignment." PLoS ONE 10, no. 4 (2015). doi: 10.1371/journal.pone.0126409.

Configuration Table

System Overview

Nodes: Twelve HPC nodes interconnected by 40 Gbps InfiniBand

Processor: Two Intel® Xeon® X5650 processors (2.67 GHz) per node

RAM: 48 GB per node

Operating System: CentOS 5.4

Software: Intel® Cluster Studio 2013; DIDA v1.0.1; ABySS-map v1.5.2; BWA v0.7.10; Bowtie2 v2.1.0; Novoalign v3.01.02

Product and Performance Information

1

Benchmark results were obtained before the deployment of recent software patches and firmware updates intended to address the "Spectre" and "Meltdown" security vulnerabilities. Installing these updates may make these results inapplicable to your device or system.

Software and workloads used in performance tests may have been optimized only for Intel® microprocessors. Performance tests such as SYSmark* and MobileMark* are measured using specific configurations, components, software, operations, and functions. Results may vary depending on these factors. To evaluate a product fully, consult other tests and other sources of information, in particular to learn how the product performs when combined with other components. For more information, see http://www.intel.fr/benchmarks.