High-performance Video Decoding using Graphics Processing Units
The increasing demand for decoding high-quality videos can impose challenging computational requirements on conventional Central Processing Unit (CPU) architectures. Graphics Processing Units (GPUs) generally provide higher computational power than CPUs. Efficient GPU execution, however, requires massive parallelism and little execution divergence, two criteria that not all video decoding kernels fully meet. This thesis explores how GPUs can be used effectively in video decoding applications. The challenges include proper workload distribution between the CPU and the GPU, task optimization on two heterogeneous devices, and efficient communication between them.

A complete parallel HEVC decoder was proposed for heterogeneous CPU+GPU systems. We exploited the available decoding parallelism on the CPU, on the GPU, and between the two devices simultaneously. On top of the parallel design, two workload balancing schemes were implemented to adapt to variations in the computational resources available on the CPU and the GPU. In addition, an energy measurement module was developed for energy-efficiency analysis.

Evaluation results showed that suitable decoding kernels can be accelerated substantially (up to 28.2×) on GPUs at the kernel level. At the application level, using a GPU can provide significant acceleration when only a small number (1 to 8) of CPU cores is available. On a system consisting of an NVIDIA Titan X Maxwell GPU and an Intel Xeon E5-2699v3 CPU, with four CPU cores, the proposed HEVC decoder delivers 167 frames per second for 4K videos, a speedup of 2.2× over the state-of-the-art CPU decoder using four CPU cores. When more than 8 CPU cores are employed, the benefit of using the GPU vanishes, and the proposed decoder is eventually outperformed by the CPU decoder due to GPU overloading. With respect to energy, the GPU architecture, because of its high power consumption, is less energy-efficient than the CPU for HEVC decoding.