Slow VP8 and VP9 encoding with ffmpeg

Solution 1

The speed/quality options for VP8/VP9 are explained in the documentation. Note that in ffmpeg, you have to specify the parameters differently (see ffmpeg -h encoder=libvpx-vp9):

  • CPU Usage:
    • ffmpeg: -cpu-used (legacy option: -speed)
    • libvpx: --cpu-used
  • Quality / Deadline:
    • ffmpeg: -deadline realtime, -deadline good (legacy option: -quality)
    • libvpx: --rt, --good
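
As a rough illustration of the mapping above, the following shell sketch translates the listed vpxenc-style flags into their ffmpeg spelling. The function name vpx_to_ffmpeg is made up for this example, and it only covers the flags mentioned in this answer.

```shell
#!/bin/sh
# Hypothetical helper (not part of ffmpeg or libvpx): translate the
# vpxenc-style flags listed above into the options ffmpeg expects.
vpx_to_ffmpeg() {
  case "$1" in
    --cpu-used=*) printf '%s %s\n' '-cpu-used' "${1#--cpu-used=}" ;;
    --rt)         printf '%s %s\n' '-deadline' 'realtime' ;;
    --good)       printf '%s %s\n' '-deadline' 'good' ;;
    *)            printf 'unsupported flag: %s\n' "$1" >&2; return 1 ;;
  esac
}

vpx_to_ffmpeg --cpu-used=2   # prints: -cpu-used 2
vpx_to_ffmpeg --rt           # prints: -deadline realtime
```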

The -cpu-used option should be your main control knob. Its default is 0, and the documentation says that:

Setting --cpu-used=1 or --cpu-used=2 will give further significant boosts to encode speed, but will start to have a more noticeable impact on quality and may also start to effect the accuracy of the data rate control.

Setting a value of 4 or 5 will turn off "rate distortion optimisation" which has a big impact on quality, but also greatly speeds up the encoder.
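
A minimal sketch of how such a value might be passed to ffmpeg for a file-based encode. The function name encode_vp9, the DRY_RUN variable, the file names, and the choice of -crf 30 with -b:v 0 (constant-quality mode) and Opus audio are assumptions made for illustration, not recommendations from the documentation.

```shell
#!/bin/sh
# Sketch only: encode_vp9 and DRY_RUN are names invented for this example.
# Usage: encode_vp9 INPUT OUTPUT CPU_USED
encode_vp9() {
  # With DRY_RUN set, print the command instead of running it.
  ${DRY_RUN:+echo} ffmpeg -i "$1" \
    -c:v libvpx-vp9 -deadline good -cpu-used "$3" \
    -crf 30 -b:v 0 \
    -c:a libopus "$2"
}

# Print the command line that would run with -cpu-used 2.
DRY_RUN=1 encode_vp9 input.mp4 output.webm 2
```

Running the same clip with a few different -cpu-used values and comparing encode time and output quality is probably the most reliable way to pick a setting for your own material.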

For live encoding particularly, you want to set -deadline realtime:

--rt Real-time mode allows the encoder to auto adjust the speed vs. quality trade-off in order to try and hit a particular cpu utilisation target. In this mode the --cpu-used parameter controls the %cpu target as follows:

target cpu utilisation = (100*(16-cpu-used)/16)%

Legal values for -cpu-used when combined with --rt mode are (0-15).

It is worth noting that in --rt mode the encode quality will depend on how hard a particular clip or section of a clip is and how fast the encoding machine is. In this mode the results will thus vary from machine to machine and even from run to run depending on what else you are doing.
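
To make the quoted formula concrete, here is a small sketch that evaluates it for a given cpu-used value. The function name rt_cpu_target is made up for this example; the formula and the 0-15 range are taken from the documentation excerpt above.

```shell
#!/bin/sh
# Target CPU utilisation (%) in --rt mode for a given cpu-used value,
# using the formula quoted above: 100 * (16 - cpu_used) / 16.
# rt_cpu_target is a name made up for this sketch.
rt_cpu_target() {
  case "$1" in
    [0-9]|1[0-5]) ;;  # accept integers 0-15 only
    *) echo "cpu-used must be an integer from 0 to 15" >&2; return 1 ;;
  esac
  # Shell arithmetic is integer-only, so fractional results are truncated.
  echo $(( 100 * (16 - $1) / 16 ))
}

for n in 0 4 8 12; do
  echo "cpu-used=$n -> $(rt_cpu_target "$n")%"
done
```

So under this formula, -cpu-used 4 asks the encoder to aim for roughly 75% CPU utilisation and -cpu-used 8 for roughly 50%.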

With an i5 CPU, though, depending on how many parallel transcoding tasks you run, what level of quality you want to reach, and what latency you can tolerate, investing in a more powerful CPU from the latest Intel i7 series may make sense.

Intel's Kaby Lake chips apparently support hardware-assisted encoding through Intel QuickSync, and ffmpeg supports that through VA-API.

Solution 2

Switch to vp9_vaapi if it is available

Using libvpx-vp9, I was getting 3-5 fps at 1080p, which is painfully slow if you're trying to convert an hour of video.

If your GPU supports it, using vp9_vaapi can be much, much faster. On my HTPC with an i7 8650U, VA-API gives about 30x better performance: I can encode 4 videos at once at 130-150 fps each.

Sample ffmpeg line:

 ffmpeg -vaapi_device /dev/dri/renderD128 -i "$infile" -vf 'format=nv12,hwupload' -c:v vp9_vaapi -b:v 0 -c:a libvorbis "$outfile"

There is an option, loop_filter_level, that seems to be equivalent to CRF and ranges from 0 to 63. However, it is poorly documented online, apart from the fact that the default is 16. I tried it at 1 and at 63, and the file size and subjective quality were practically identical, so either I'm using it wrong or the option is ignored by ffmpeg.

Using default settings I could not see any visual difference between my 1080p h264 source video and the vp9 output.

You'll need to check that your GPU supports hardware encoding. Run vainfo and look for:

  VAProfileVP9Profile0            : VAEntrypointEncSlice
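
The check can also be scripted. The sketch below assumes vainfo output in the format shown above; the function name has_vp9_encode is made up for this example.

```shell
#!/bin/sh
# Sketch: succeed if vainfo-style text on stdin lists a VP9 encode entrypoint.
# has_vp9_encode is a name made up for this example.
has_vp9_encode() {
  # Matches VAEntrypointEncSlice and also longer names that start the same
  # way (some drivers report a low-power variant, VAEntrypointEncSliceLP).
  grep -Eq 'VAProfileVP9Profile[0-9]+[[:space:]]*:[[:space:]]*VAEntrypointEncSlice'
}

# Typical use on a real machine:
#   vainfo 2>/dev/null | has_vp9_encode && echo "VP9 hardware encoding available"
# Demonstration with the sample line from above:
printf '  VAProfileVP9Profile0            : VAEntrypointEncSlice\n' | has_vp9_encode \
  && echo "VP9 hardware encoding available"
```

A decode-only entry such as VAEntrypointVLD does not match, so the function fails on hardware that can only decode VP9.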

vp9_vaapi vs libvpx-vp9

I tried encoding the same 50 minute 1080p video with these results:

  • libvpx-vp9 took nearly 8 hours and produced a 568.8 MB file
  • vp9_vaapi -loop_filter_level 1 took just over 7 minutes and produced a 756.1 MB file
  • vp9_vaapi -loop_filter_level 63 took just over 8 minutes and produced a 734.1 MB file

Subjectively all the videos look the same to me and I could not tell one from the other.

Clearly, libvpx-vp9 wins on compression but unless you're very, very starved for disk space (or bandwidth if you're planning to stream the video), it is absolutely not worth the unreasonable encoding time.
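
To put the size difference in numbers, here is a small calculation using only the file sizes reported above. The function name pct_larger is made up for this sketch.

```shell
#!/bin/sh
# Percentage by which size A exceeds size B, to one decimal place.
# pct_larger is a name made up for this sketch; the sizes are the
# ones reported in the test results above (in MB).
pct_larger() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (a - b) / b * 100 }'
}

echo "loop_filter_level 1:  $(pct_larger 756.1 568.8)% larger than libvpx-vp9"
echo "loop_filter_level 63: $(pct_larger 734.1 568.8)% larger than libvpx-vp9"
```

In this single test the hardware-encoded files came out roughly 29-33% larger than the libvpx-vp9 file.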

I don't know why loop_filter_level makes so little difference. I would suggest leaving it at the default (16) until it is better documented.

All the usual caveats apply. libvpx will no doubt mature over time, your hardware may produce different results, and hardware encoders often give worse visual quality than software ones (though I could not tell in my test).

Author: Kirill K

DevOps Engineer from a web development background, implementing and automating end-to-end Continuous Build, Integration, Delivery, Release and Deployment processes, pipelines using best practices and popular patterns.

Updated on June 05, 2022

Comments

  • Kirill K
    Kirill K almost 2 years

    I saw this answer, but it's a little old. Maybe the situation has changed?

    I want to re-encode a stream from an IP camera to WebM (VP8 or VP9) format with ffmpeg. I need real-time speed, but my CPU is a Core i5 (2017) and it is too busy (load average is over 100%).

    • Can I buy hardware that is better suited for such an encoding task?

    • What parameters for ffmpeg are recommended for transcoding in real time?

    At the moment I'm using this command (with overlay chroma key):

    ./ffmpeg \
    -i bg.jpg \
    -thread_queue_size 512 \
    -rtsp_transport tcp -i rtsp://ip_cam:port/stream \
    -codec:v libvpx -quality realtime -r 25 -crf 30 \
    -b:v 2M -qmin 10 -qmax 50 -maxrate 2.5M -bufsize 5M \
    -speed 1 \
    -b:v 2M \
    -cpu-used 0 -threads 4 \
    -auto-alt-ref 0 \
    -c:a libopus -b:a 96k \
    -filter_complex "[1:v]chromakey=0x70de77:0.1:0.0[ckout];[0:v][ckout]overlay[out]" \
    -map "[out]" \
    -f webm udp://ip_destination:1935/name/stream
    
  • Kirill K
    Kirill K almost 7 years
    Could you tell me which server CPU types are the most capable for transcoding, in case I want to move the encoding process to a data center?
  • slhck
    slhck almost 7 years
    Latest Intel i7 models or, if you want to go server-grade, Xeon. I can't name any specific model though.
  • Kirill K
    Kirill K almost 7 years
    `-cpu-used instead of --cpu-used`: the docs use --cpu-used in the sample encode, but ffmpeg needs -cpu-used.
  • slhck
    slhck almost 7 years
    Yes. That's what I said. In ffmpeg you have to specify the parameters differently, e.g. -cpu-used instead of --cpu-used.
  • Kirill K
    Kirill K almost 7 years
    Ahh, sorry ))) I misunderstood you earlier. But all of this is already in my transcoding example, so why write it again?
  • slhck
    slhck almost 7 years
    I know you had it in your question. The reason I explained again is that other visitors might find this post – and then they're not going to read your question, they're most probably only going to read my answer and see the excerpt from the documentation, which could cause confusion when they want to use the parameters given in there.
  • Kirill K
    Kirill K almost 7 years
    Clearly, but exactly the same information is already in a multitude of other answers. I was hoping for very specific advice on my example. You helped me only with the information about VA-API. Thank you.
  • slhck
    slhck almost 7 years
    I didn't know that you were aware of different values for -cpu-used – you might as well have copied your code from somewhere else, since -cpu-used 0 does not give you speed boosts.
  • Kirill K
    Kirill K almost 7 years
    Yes, thank you for trying to help! I will try transcoding with VA-API and test on another CPU.
  • LFMekz
    LFMekz over 3 years
    Nice stuff. Thanks for that info on vainfo