TY - JOUR
T1 - GPU Implementation of Bitplane Coding with Parallel Coefficient Processing for High Performance Image Compression
AU - Enfedaque, Pablo
AU - Auli-Llinas, Francesc
AU - Moure Lopez, Juan Carlos
PY - 2017/8/1
Y1 - 2017/8/1
N2 - The fast compression of images is a requisite in many applications like TV production, teleconferencing, or digital cinema. Many of the algorithms employed in current image compression standards are inherently sequential. High performance implementations of such algorithms often require specialized hardware like field-programmable gate arrays (FPGAs). Graphics Processing Units (GPUs) do not commonly achieve high performance on these algorithms because the algorithms do not exhibit fine-grain parallelism. Our previous work introduced bitplane coding with parallel coefficient processing (BPC-PaCo), a new core algorithm for wavelet-based image coding systems that is tailored for massively parallel architectures. This paper introduces the first high performance, GPU-based implementation of BPC-PaCo. A detailed analysis of the algorithm aids its implementation in the GPU. The main insights behind the proposed codec are an efficient thread-to-data mapping, smart memory management, and the use of efficient cooperation mechanisms to enable inter-thread communication. Experimental results indicate that the proposed implementation matches the requirements for high resolution (4K) digital cinema in real time, yielding speedups of 30× with respect to the fastest implementations of current compression standards. Also, a power consumption evaluation shows that our implementation consumes 40× less energy for equivalent performance than state-of-the-art methods.
AB - The fast compression of images is a requisite in many applications like TV production, teleconferencing, or digital cinema. Many of the algorithms employed in current image compression standards are inherently sequential. High performance implementations of such algorithms often require specialized hardware like field-programmable gate arrays (FPGAs). Graphics Processing Units (GPUs) do not commonly achieve high performance on these algorithms because the algorithms do not exhibit fine-grain parallelism. Our previous work introduced bitplane coding with parallel coefficient processing (BPC-PaCo), a new core algorithm for wavelet-based image coding systems that is tailored for massively parallel architectures. This paper introduces the first high performance, GPU-based implementation of BPC-PaCo. A detailed analysis of the algorithm aids its implementation in the GPU. The main insights behind the proposed codec are an efficient thread-to-data mapping, smart memory management, and the use of efficient cooperation mechanisms to enable inter-thread communication. Experimental results indicate that the proposed implementation matches the requirements for high resolution (4K) digital cinema in real time, yielding speedups of 30× with respect to the fastest implementations of current compression standards. Also, a power consumption evaluation shows that our implementation consumes 40× less energy for equivalent performance than state-of-the-art methods.
KW - Image coding
KW - SIMD computing
KW - compute unified device architecture (CUDA)
KW - graphics processing unit (GPU)
U2 - https://doi.org/10.1109/TPDS.2017.2657506
DO - https://doi.org/10.1109/TPDS.2017.2657506
M3 - Article
VL - 28
SP - 2272
EP - 2284
JO - IEEE Transactions on Parallel and Distributed Systems
JF - IEEE Transactions on Parallel and Distributed Systems
SN - 1045-9219
M1 - 7833172
ER -