TY - JOUR
T1 - GPU Implementation of Bitplane Coding with Parallel Coefficient Processing for High Performance Image Compression
AU - Enfedaque, Pablo
AU - Auli-Llinas, Francesc
AU - Moure Lopez, Juan Carlos
N1 - Funding Information:
This work has been partially supported by the Universitat Autònoma de Barcelona, by the Spanish Government (MINECO), by FEDER, and by the Catalan Government, under Grants 472-02-2/2012, TIN2015-71126-R, TIN2014-53234-C2-1-R, and 2014SGR-691.
Funding Information:
received the BE (with highest Hons.) and PhD (cum laude) degrees in computer science from the Universitat Autònoma de Barcelona (UAB), in 2002 and 2006, respectively. From 2002 to 2015, he was consecutively funded through competitive fellowships, including a Ramón y Cajal grant that was awarded the intensification young investigator (i3) certificate. During this time, he carried out two postdoctoral research stages with professors David Taubman and Michael Marcellin. Since 2016, he has been an associate professor (with the full professor certificate) in the Department of Information and Communications Engineering, UAB. He developed and maintains BOI codec, a JPEG2000 implementation that is used in research and professional environments. In 2013, he received a distinguished R-Letter given by the IEEE Communications Society for a paper co-authored with Michael Marcellin. He has participated in and supervised various projects funded by the Spanish government and the European Union. Also, he is a reviewer for magazines and symposiums, has (co)authored numerous papers in journals and conferences, and has guided several PhD students. His research interests lie in the area of image and video coding, computing, and transmission. He is a senior member of the IEEE.
Publisher Copyright:
© 1990-2012 IEEE.
Copyright:
Copyright 2017 Elsevier B.V., All rights reserved.
PY - 2017/8/1
Y1 - 2017/8/1
N2 - The fast compression of images is a requisite in many applications like TV production, teleconferencing, or digital cinema. Many of the algorithms employed in current image compression standards are inherently sequential. High performance implementations of such algorithms often require specialized hardware like field-programmable gate arrays. Graphics Processing Units (GPUs) do not commonly achieve high performance on these algorithms because the algorithms do not exhibit fine-grain parallelism. Our previous work introduced a new core algorithm for wavelet-based image coding systems, called bitplane coding with parallel coefficient processing (BPC-PaCo), which is tailored to massively parallel architectures. This paper introduces the first high performance, GPU-based implementation of BPC-PaCo. A detailed analysis of the algorithm aids its implementation in the GPU. The main insights behind the proposed codec are an efficient thread-to-data mapping, smart memory management, and the use of efficient cooperation mechanisms to enable inter-thread communication. Experimental results indicate that the proposed implementation matches the requirements for high resolution (4K) digital cinema in real time, yielding speedups of 30× with respect to the fastest implementations of current compression standards. Also, a power consumption evaluation shows that our implementation consumes 40× less energy for equivalent performance than state-of-the-art methods.
AB - The fast compression of images is a requisite in many applications like TV production, teleconferencing, or digital cinema. Many of the algorithms employed in current image compression standards are inherently sequential. High performance implementations of such algorithms often require specialized hardware like field-programmable gate arrays. Graphics Processing Units (GPUs) do not commonly achieve high performance on these algorithms because the algorithms do not exhibit fine-grain parallelism. Our previous work introduced a new core algorithm for wavelet-based image coding systems, called bitplane coding with parallel coefficient processing (BPC-PaCo), which is tailored to massively parallel architectures. This paper introduces the first high performance, GPU-based implementation of BPC-PaCo. A detailed analysis of the algorithm aids its implementation in the GPU. The main insights behind the proposed codec are an efficient thread-to-data mapping, smart memory management, and the use of efficient cooperation mechanisms to enable inter-thread communication. Experimental results indicate that the proposed implementation matches the requirements for high resolution (4K) digital cinema in real time, yielding speedups of 30× with respect to the fastest implementations of current compression standards. Also, a power consumption evaluation shows that our implementation consumes 40× less energy for equivalent performance than state-of-the-art methods.
KW - Image coding
KW - SIMD computing
KW - compute unified device architecture (CUDA)
KW - graphics processing unit (GPU)
UR - http://www.scopus.com/inward/record.url?scp=85029021111&partnerID=8YFLogxK
U2 - https://doi.org/10.1109/TPDS.2017.2657506
DO - https://doi.org/10.1109/TPDS.2017.2657506
M3 - Article
SN - 1045-9219
VL - 28
SP - 2272
EP - 2284
JO - IEEE Transactions on Parallel and Distributed Systems
JF - IEEE Transactions on Parallel and Distributed Systems
IS - 8
M1 - 7833172
ER -