TY - JOUR
T1 - Boosting the FM-index on the GPU: Effective techniques to mitigate random memory access
AU - Chacon, Alejandro
AU - Marco-Sola, Santiago
AU - Espinosa, Antonio
AU - Ribeca, Paolo
AU - Moure Lopez, Juan Carlos
PY - 2015/9/1
Y1 - 2015/9/1
N2 - © 2014 IEEE. The recent advent of high-throughput sequencing machines producing big amounts of short reads has boosted the interest in efficient string searching techniques. As of today, many mainstream sequence alignment software tools rely on a special data structure, called the FM-index, which allows for fast exact searches in large genomic references. However, such searches translate into a pseudo-random memory access pattern, thus making memory access the limiting factor of all computation-efficient implementations, both on CPUs and GPUs. Here, we show that several strategies can be put in place to remove the memory bottleneck on the GPU: more compact indexes can be implemented by having more threads work cooperatively on larger memory blocks, and a k-step FM-index can be used to further reduce the number of memory accesses. The combination of those and other optimisations yields an implementation that is able to process about two Gbases of queries per second on our test platform, being about 8κ faster than a comparable multi-core CPU version, and about 3× to 5× faster than the FM-index implementation on the GPU provided by the recently announced Nvidia NVBIO bioinformatics library.
AB - © 2014 IEEE. The recent advent of high-throughput sequencing machines producing big amounts of short reads has boosted the interest in efficient string searching techniques. As of today, many mainstream sequence alignment software tools rely on a special data structure, called the FM-index, which allows for fast exact searches in large genomic references. However, such searches translate into a pseudo-random memory access pattern, thus making memory access the limiting factor of all computation-efficient implementations, both on CPUs and GPUs. Here, we show that several strategies can be put in place to remove the memory bottleneck on the GPU: more compact indexes can be implemented by having more threads work cooperatively on larger memory blocks, and a k-step FM-index can be used to further reduce the number of memory accesses. The combination of those and other optimisations yields an implementation that is able to process about two Gbases of queries per second on our test platform, being about 8κ faster than a comparable multi-core CPU version, and about 3× to 5× faster than the FM-index implementation on the GPU provided by the recently announced Nvidia NVBIO bioinformatics library.
KW - FM-index
KW - GPGPU
KW - bioinformatics
KW - fine-grain parallelism
KW - memory-level parallelism
KW - short read mapping
U2 - https://doi.org/10.1109/TCBB.2014.2377716
DO - https://doi.org/10.1109/TCBB.2014.2377716
M3 - Article
SN - 1545-5963
VL - 12
SP - 1048
EP - 1059
JO - IEEE/ACM Transactions on Computational Biology and Bioinformatics
JF - IEEE/ACM Transactions on Computational Biology and Bioinformatics
M1 - 6975110
ER -