Graph-based combinatorial analysis of data offers valuable insight, but typical graph computations perform poorly on modern computing platforms. Traditional serial algorithms and graph software achieve a very low fraction of peak system performance (measured through processor, memory bandwidth, and cache utilization) for routine graph analytics on laptops and workstations. The situation gets worse for massive graph analysis on emerging high-end multicore platforms, accelerators, supercomputers, and clouds. The common themes in parallel algorithm design are work efficiency, minimizing communication, improving locality of memory references, and ensuring load-balanced computation. With graph analysis, it is challenging to design algorithms for current platforms that simultaneously optimize for all these constraints. Our group uses a combination of theoretical analysis, architecture and data-dependent performance modeling, new data-centric speedup heuristics, and known parallel performance optimization strategies, to design practical parallel algorithms and develop scalable, efficient implementations. We study algorithms for both classical graph-theoretic problems such as breadth-first search, shortest paths, and graph connectivity, as well as emerging application-driven and application-dependent problem formulations such as centrality analysis, identifying communities, dynamic networks, and detecting frequently-occurring patterns. We gratefully acknowledge support for this work through an NSF CAREER grant and an XSEDE startup allocation.
The massive volumes of next-generation sequencing data pose significant computational and data analysis challenges. Several bioinformatics problems have combinatorial problem formulations that utilize string and graph algorithms and discrete data structures. Our group is working on new methods and parallel software for accelerating the genetic variant detection pipeline, and for the problems of de Bruijn graph-based de novo assembly of genomes and microbial metagenomes, and metagenomic sequence clustering. This research is supported in part by an NSF XPS award.