Benchmarking QuickSync on Broadwell and Skylake

We’re in the process of building out some new Linux-based video encoders, and we want to output to a LOT of different destinations: live streams, archived versions on disk, high-quality versions for future editing, JPEG stills, etc.

QuickSync is a great way to get more out of our processors by offloading video encoding to the integrated GPU. To decide which architecture to invest in, we ran tests on a Broadwell processor, the Core i7-5775C (3.3 GHz), and a Skylake processor, the Core i7-6700K (4.0 GHz).

On the surface, the choice seems like a no-brainer: go with the newer Skylake architecture running at a higher clock speed. But the 5775C has Iris Pro Graphics 6200 (GT3e: 48 execution units, 128 MB of eDRAM, a 1150 MHz maximum graphics clock, 883.2 GFLOPS), while the 6700K has HD Graphics 530 (GT2: 24 execution units, the same 1150 MHz maximum graphics clock, 441.6 GFLOPS). With twice the execution units and twice the theoretical throughput on the Broadwell side, we wanted to compare the two directly.
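The GFLOPS figures above follow from the usual peak-throughput calculation for Intel's Gen8 (Broadwell) and Gen9 (Skylake) GPUs, where each execution unit can retire 16 single-precision FLOPs per clock (two SIMD-4 FPUs, each issuing a fused multiply-add). A small Python check, with function and variable names of our own choosing:

```python
# Peak single-precision throughput of an Intel Gen8/Gen9 GPU.
# Each execution unit (EU) has two SIMD-4 FPUs, each able to issue a fused
# multiply-add (2 FLOPs) per lane per clock: 2 * 4 * 2 = 16 FLOPs/clock/EU.
FLOPS_PER_EU_PER_CLOCK = 16

def peak_gflops(execution_units, clock_mhz):
    """Theoretical peak GFLOPS = EUs * FLOPs per EU per clock * clock in GHz."""
    return execution_units * FLOPS_PER_EU_PER_CLOCK * clock_mhz / 1000.0

iris_pro_6200 = peak_gflops(48, 1150)  # Core i7-5775C
hd_530 = peak_gflops(24, 1150)         # Core i7-6700K

print(f"Iris Pro 6200:   {iris_pro_6200:.1f} GFLOPS")  # 883.2
print(f"HD Graphics 530: {hd_530:.1f} GFLOPS")         # 441.6
```

Keep in mind that QuickSync leans heavily on dedicated media hardware rather than only the general-purpose EUs, so peak GFLOPS is at best a rough proxy for encoding throughput.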

We built test systems with ffmpeg and QuickSync support, then ran a series of tests transcoding the Limitless trailer (1920×1080 H.264).

We used ffmpeg to transcode it to a variety of sizes and bitrates simultaneously.
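We haven't reproduced our exact command line here, but the general shape of a one-input, many-output QuickSync job can be sketched in Python. This sketch assumes an ffmpeg build that includes the `h264_qsv` encoder; the function name, source file name, output names, and bitrates are all hypothetical placeholders. Decoding and scaling are left to the CPU for simplicity; a fully hardware pipeline would instead decode with `-hwaccel qsv` and scale with the `scale_qsv` filter.

```python
import shlex

def build_ffmpeg_args(source, renditions, jpeg_interval_s=None):
    """Build a single ffmpeg command that decodes `source` once and writes
    several QuickSync (h264_qsv) H.264 outputs, plus optional JPEG stills."""
    args = ["ffmpeg", "-y", "-i", source]
    for width, height, bitrate, out_path in renditions:
        # Options placed before an output file apply only to that output.
        args += [
            "-map", "0:v:0",                  # video stream of the first input
            "-vf", f"scale={width}:{height}",
            "-c:v", "h264_qsv",               # QuickSync H.264 encoder
            "-b:v", bitrate,
            out_path,
        ]
    if jpeg_interval_s is not None:
        # One still every `jpeg_interval_s` seconds: still_0001.jpg, still_0002.jpg, ...
        args += [
            "-map", "0:v:0",
            "-vf", f"fps=1/{jpeg_interval_s}",
            "-q:v", "2",
            "still_%04d.jpg",
        ]
    return args

# Hypothetical layout resembling test2: four 576x324 outputs, two 960x540
# outputs and JPEG stills. File names and bitrates are placeholders.
renditions = [
    (576, 324, "400k", "low_1.mp4"),
    (576, 324, "600k", "low_2.mp4"),
    (576, 324, "800k", "low_3.mp4"),
    (576, 324, "1000k", "low_4.mp4"),
    (960, 540, "1500k", "mid_1.mp4"),
    (960, 540, "2500k", "mid_2.mp4"),
]
args = build_ffmpeg_args("limitless_trailer.mp4", renditions, jpeg_interval_s=60)
print(shlex.join(args))
# To actually run it: subprocess.run(args, check=True)
```

Putting every output in one ffmpeg process means the source is decoded once and shared, rather than decoded separately for each rendition.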

BROADWELL – Core i7-5775C (3.3 GHz)
test   576×324  960×540  1920×1080  JPEG  fps  speedup
test1  4        2        0          0     126  5.25×
test2  4        2        0          1     100  5.08×
test3  4        2        1          1      55  2.78×

SKYLAKE – Core i7-6700K (4.0 GHz)
test   576×324  960×540  1920×1080  JPEG  fps  speedup  improvement
test1  4        2        0          0     144  6.02×    14.29%
test2  4        2        0          1     125  6.32×    25.00%
test3  4        2        1          1      62  3.14×    12.73%
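The improvement column is Skylake's frame-rate gain relative to Broadwell for the same test, (Skylake fps - Broadwell fps) / Broadwell fps. A short Python check (names are our own) reproduces it from the fps columns:

```python
# Frames per second from the two tables above.
broadwell_fps = {"test1": 126, "test2": 100, "test3": 55}
skylake_fps = {"test1": 144, "test2": 125, "test3": 62}

def improvement_pct(new, old):
    """Relative gain of `new` over `old`, in percent."""
    return (new - old) / old * 100.0

improvements = {
    test: improvement_pct(skylake_fps[test], broadwell_fps[test])
    for test in broadwell_fps
}
for test, pct in improvements.items():
    print(f"{test}: {pct:.2f}%")
# test1: 14.29%
# test2: 25.00%
# test3: 12.73%
```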

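Note: in both tables, the resolution columns give the number of simultaneous outputs at that size, and the JPEG column gives the number of JPEG still outputs (each producing one still every 60 seconds). The “speedup” is ffmpeg’s measure of how much faster than real time the encoding ran, and “improvement” is Skylake’s frame-rate gain over Broadwell for the same test.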
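ffmpeg computes that speed as the amount of media time encoded divided by the elapsed wall-clock time, and prints it alongside fps in its periodic progress line on stderr. If you want to collect these numbers automatically, a minimal Python sketch is to pull them out with a regular expression. The sample line below is illustrative rather than copied from our runs, and the exact fields in that line vary between ffmpeg versions, so the pattern only looks for `fps=` and `speed=...x`.

```python
import re

# ffmpeg's progress line on stderr looks roughly like:
#   frame= 3023 fps=126 q=-0.0 size= 10240kB time=00:02:06.00 bitrate= 665.8kbits/s speed=5.25x
# Only the "fps=" and "speed=...x" fields are extracted here.
STATS_RE = re.compile(r"fps=\s*([\d.]+).*?speed=\s*([\d.]+)x")

def parse_stats(line):
    """Return (fps, speed) parsed from an ffmpeg progress line, or None
    if either field is missing (e.g. while ffmpeg still reports speed=N/A)."""
    match = STATS_RE.search(line)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))

sample = ("frame= 3023 fps=126 q=-0.0 size=   10240kB "
          "time=00:02:06.00 bitrate= 665.8kbits/s speed=5.25x")
print(parse_stats(sample))  # (126.0, 5.25)
```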

The Skylake processor outperformed the Broadwell in every test, with frame rates 13% to 25% higher. One thing I would like to go back and test is whether the Broadwell ultimately has more headroom for simultaneous encodings: with 48 execution units to the Skylake's 24, it's possible that it could sustain more parallel encodes than the Skylake.

Of even more direct interest to us is how each processor performs with live encoding: how many live inputs can it handle, and how many outputs can it generate from those inputs?
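Until we measure it, the speedup figures only give a ceiling. A live input has to be processed at no less than 1× real time, so a job running at, say, 5.25× could at best be replicated about five times, and only if throughput divided evenly and scaled linearly across jobs, which is exactly what the headroom question above is about. A back-of-the-envelope Python sketch of that ceiling, where the function name and the 20% safety margin are our own hypothetical choices:

```python
import math

def naive_realtime_capacity(speed, margin=0.2):
    """Rough ceiling on how many copies of one encode job could keep up with
    live input side by side, ASSUMING the measured throughput divides evenly
    and scales linearly across jobs (real hardware usually won't).
    `margin` reserves headroom so each job runs at >= (1 + margin)x real time."""
    if speed <= 0 or margin < 0:
        raise ValueError("speed must be positive and margin non-negative")
    return math.floor(speed / (1.0 + margin))

# Speedups from the tables above, with the hypothetical 20% margin.
for label, speed in [("Broadwell test1", 5.25), ("Skylake test1", 6.02),
                     ("Broadwell test3", 2.78), ("Skylake test3", 3.14)]:
    print(label, naive_realtime_capacity(speed))
```

Actual capacity has to come from running the parallel encodes and watching for the point where per-job speed drops below 1×.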
