FFmpeg-2dwave: A parallel H.264 decoder based on FFmpeg
Mauricio Alvarez. alvarez AT ac.upc.edu
Universitat Politècnica de Catalunya (UPC)
Barcelona Supercomputing Center (BSC)
This code implements the so-called 2D-wave parallel H.264 decoder (1). This parallelization approach exploits spatial (intra-frame) macroblock-level parallelism.
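To make the wavefront idea concrete, here is a minimal sketch; it is not code taken from FFmpeg-2dwave. It assumes the usual H.264 intra-frame dependencies, in which a macroblock needs its left, up-left, up and up-right neighbours, and it assumes that every macroblock takes one time step. The function names are hypothetical.

```c
#include <stdbool.h>

/* Earliest wave step at which macroblock (x, y) can be processed,
 * assuming unlimited threads and one macroblock per step per thread.
 * The up-right dependency makes each row lag two columns behind the
 * row above it. */
static int wave_step(int x, int y)
{
    return x + 2 * y;
}

/* True if every existing neighbour that (x, y) depends on belongs to
 * an earlier wave step. */
static bool deps_precede(int x, int y, int mb_width)
{
    /* Offsets of the left, up-left, up and up-right neighbours. */
    static const int dx[4] = { -1, -1,  0,  1 };
    static const int dy[4] = {  0, -1, -1, -1 };

    for (int i = 0; i < 4; i++) {
        int nx = x + dx[i], ny = y + dy[i];
        if (nx < 0 || ny < 0 || nx >= mb_width)
            continue;                   /* neighbour lies outside the frame */
        if (wave_step(nx, ny) >= wave_step(x, y))
            return false;
    }
    return true;
}

/* Number of wave steps needed for a frame of mb_width x mb_height
 * macroblocks under the same assumptions. */
static int wave_steps_per_frame(int mb_width, int mb_height)
{
    return wave_step(mb_width - 1, mb_height - 1) + 1;
}
```

Under these idealized assumptions, a frame of 120x68 macroblocks (1920x1088 pixels) needs 254 wave steps instead of 8160 sequential macroblock steps.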
It runs on parallel machines with a shared-memory architecture. It has been tested on parallel platforms based on PowerPC, x86 (32 and 64 bits) and Itanium processors.
It can compile on the same architectures as the original FFmpeg code. It uses POSIX threads and the semaphores from the POSIX real-time extensions. Optionally, it uses atomic instructions for synchronization operations.
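As an illustration of how POSIX threads and POSIX semaphores can be combined, the sketch below shows a small task queue: a mutex protects the queue indices and a counting semaphore lets workers sleep until a task arrives. This is an assumption about the general pattern, not the queue used in FFmpeg-2dwave; the type name, function names and the fixed capacity are hypothetical.

```c
#include <stddef.h>
#include <pthread.h>
#include <semaphore.h>

#define QUEUE_CAP 64            /* hypothetical fixed capacity */

typedef struct {
    int             items[QUEUE_CAP];
    int             head, tail;
    pthread_mutex_t lock;       /* protects head and tail */
    sem_t           filled;     /* counts queued tasks */
} task_queue;

static void queue_init(task_queue *q)
{
    q->head = q->tail = 0;
    pthread_mutex_init(&q->lock, NULL);
    sem_init(&q->filled, 0, 0); /* shared between threads, initially empty */
}

/* Producer side: store a task id and wake one waiting worker.
 * The sketch does not handle overflow, so callers must keep at most
 * QUEUE_CAP tasks queued at any time. */
static void queue_push(task_queue *q, int task)
{
    pthread_mutex_lock(&q->lock);
    q->items[q->tail] = task;
    q->tail = (q->tail + 1) % QUEUE_CAP;
    pthread_mutex_unlock(&q->lock);
    sem_post(&q->filled);
}

/* Worker side: block until a task is available, then take it. */
static int queue_pop(task_queue *q)
{
    sem_wait(&q->filled);
    pthread_mutex_lock(&q->lock);
    int task = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    pthread_mutex_unlock(&q->lock);
    return task;
}
```

With GCC this kind of code is normally built with the -pthread option.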
The code is released "as is", which means that there is no support from UPC or BSC. The license of the code is the same as that of the original FFmpeg, which means LGPL. For more information see http://ffmpeg.org/legal.html
If you use this code for research purposes, we ask you to cite this paper (2).
1. Download
The code is available as a tarball at http://alvarez.site.ac.upc.edu/hdvideobench/ffmpeg-2dwave.tar.gz
2. Configure
We provide a script (configure_ffmpeg.sh) that sets the configuration of the 2D-wave H.264 decoder and reduces the number of codecs included in the final binary (which reduces compile time).
Edit the script and:
- Define the following configuration variables:
  - CONFIG_DIR: directory in which the ffmpeg source code is installed
  - PREFIX_DIR: directory in which the ffmpeg binaries are going to be installed
- Define the 2D-wave options:
  - tail submit: whether worker threads may bypass the task queue
    - 1: use the tail-submit optimization, i.e. worker threads can process tasks without using the task queue
    - 0: no tail submit; all tasks are processed using the task queue
  - atomic_instructions: method for accessing the table of dependencies
    - 0: use pthread locks (slower but portable)
    - 1: use atomic instructions: fetch_and_add, sub_and_fetch, compare_and_swap (faster but only available on some architectures)
- Execute: configure_ffmpeg.sh
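The two 2D-wave options can be pictured with the following sketch of a table of dependency counters. It is an illustration under stated assumptions, not the code from the tarball: the USE_ATOMIC macro, the function names and the counter layout are hypothetical, and the sketch assumes that a macroblock only needs to wait for its left neighbour and for its up-right neighbour (or its up neighbour at the right edge), because each row is processed from left to right. The atomic path uses the GCC builtin __sync_sub_and_fetch.

```c
#include <pthread.h>

/* Hypothetical compile-time switch standing in for atomic_instructions. */
#ifndef USE_ATOMIC
#define USE_ATOMIC 1
#endif

#if !USE_ATOMIC
static pthread_mutex_t dep_lock = PTHREAD_MUTEX_INITIALIZER;
#endif

/* Decrement one dependency counter and return its new value. */
static int dep_release(int *counter)
{
#if USE_ATOMIC
    return __sync_sub_and_fetch(counter, 1);    /* GCC atomic builtin */
#else
    pthread_mutex_lock(&dep_lock);
    int value = --(*counter);
    pthread_mutex_unlock(&dep_lock);
    return value;
#endif
}

/* Initial counter of macroblock (x, y): one for the left neighbour and
 * one for the row above. */
static int dep_initial(int x, int y)
{
    return (x > 0) + (y > 0);
}

/* Called after macroblock (x, y) of a w x h frame has been decoded.
 * Releases the macroblocks that waited for it and stores the linear
 * indices of those that became ready in ready[]. Returns their count. */
static int finish_mb(int *deps, int w, int h, int x, int y, int ready[3])
{
    int n = 0;

    if (x + 1 < w && dep_release(&deps[y * w + x + 1]) == 0)
        ready[n++] = y * w + x + 1;             /* right neighbour */
    if (y + 1 < h) {
        if (x > 0 && dep_release(&deps[(y + 1) * w + x - 1]) == 0)
            ready[n++] = (y + 1) * w + x - 1;   /* we were its up-right */
        if (x == w - 1 && dep_release(&deps[(y + 1) * w + x]) == 0)
            ready[n++] = (y + 1) * w + x;       /* right edge: we were its up */
    }
    return n;
}
```

In this picture, a worker with tail submit enabled would keep one entry of ready[] and decode it directly, pushing only the remaining entries to the task queue; with tail submit disabled it would push every entry.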
3. Compile and install
FFmpeg-2dwave has been tested with different versions of the GCC compiler. It should compile on the same platforms as the original FFmpeg.
4. Execute the ffmpeg application
You just need to specify the number of threads, the input compressed video and the uncompressed output video.
Some test sequences in H.264 format (and in other formats as well) are available on the HD-VideoBench page.
ffmpeg-2dwave -threads N -i input_file.h264 -y output_file.yuv
5. Limitations and Possible Improvements
- The scalability of 2D-wave depends on the resolution of the input video and on the relative speed of CABAC entropy decoding compared to macroblock reconstruction. The first optimization is to improve the CABAC processing by supporting the decoding of multiple CABAC frames in parallel (or accelerating the CABAC processing in any other way).
- If you want to scale beyond the 2D-wave algorithm, you may consider implementing the dynamic 3D-wave (spatial and temporal) macroblock-level parallelism as described in these papers (3, 4).
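The dependence on resolution in the first limitation can be made concrete with a small calculation. The sketch below is an idealized upper bound, assuming 16x16-pixel macroblocks, unlimited threads, equal decoding time for every macroblock, and a wavefront in which each row lags two macroblocks behind the row above it. The function names are hypothetical, and real speedups will be lower because of entropy decoding and synchronization overhead.

```c
/* Number of 16x16 macroblocks needed to cover a dimension given in
 * pixels; partial macroblocks are rounded up. */
static int mbs(int pixels)
{
    return (pixels + 15) / 16;
}

/* Upper bound on the number of macroblocks that can be processed at the
 * same time in a 2D wavefront over one frame. Rows are staggered by two
 * columns, so at most every second column can be active, and never more
 * than one macroblock per row. */
static int max_parallel_mbs(int width_px, int height_px)
{
    int mb_w = mbs(width_px);
    int mb_h = mbs(height_px);
    int by_width = (mb_w + 1) / 2;

    return by_width < mb_h ? by_width : mb_h;
}
```

Under these assumptions the bound is 23 macroblocks for 720x576, 40 for 1280x720 and 60 for 1920x1088, which suggests why lower resolutions stop scaling at smaller thread counts.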
(1) E. B. van der Tol, E. G. T. Jaspers, and R. H. Gelderblom, Mapping of H.264 Decoding on a Multiprocessor Architecture, in Proceedings of SPIE, 2003.
(2) M. Alvarez, A. Ramirez, A. Azevedo, C.H. Meenderinck, B.H.H. Juurlink, M. Valero. Scalability of Macroblock-level parallelism for H.264 decoding. International Conference on Parallel and Distributed Systems 2009, December 8-11, 2009, Shenzhen, China.
(3) C.H. Meenderinck, A. Azevedo, B.H.H. Juurlink, M. Alvarez, A. Ramirez. Parallel Scalability of Video Decoders. Journal of Signal Processing Systems, pp. 173, August 2009, vol 57, issue 2.
(4) M. Alvarez, A. Ramirez, A. Azevedo, C.H. Meenderinck, B.H.H. Juurlink, M. Valero. Scalability of Macroblock-level parallelism for H.264 decoding. International Conference on Parallel and Distributed Systems 2009, December 8-11, 2009, Shenzhen, China.
Created by Mauricio Alvarez alvarez AT ac.upc.edu
last revision: 27.04.2010