How to use hardware acceleration with ffmpeg

After some investigation I was able to implement the necessary HW accelerated decoding on OS X (VDA) and Linux (VDPAU). I will update the answer when I get my hands on a Windows implementation as well. So let's start with the easiest:

Mac OS X

To get HW acceleration working on Mac OS you should just use the following: avcodec_find_decoder_by_name("h264_vda"); Note, however, that on Mac OS FFmpeg can only accelerate H.264 videos this way.
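
For illustration, here is a minimal sketch of picking that decoder explicitly; the openVdaDecoder name is just an example, and copying the stream's codec parameters into the context is omitted:

extern "C" {
#include <libavcodec/avcodec.h>
}

// Look up the hardware decoder by name instead of the plain "h264" one and open it.
AVCodecContext* openVdaDecoder()
{
    const AVCodec* codec = avcodec_find_decoder_by_name("h264_vda");
    if(!codec)
        return nullptr; // FFmpeg was built without VDA support

    AVCodecContext* context = avcodec_alloc_context3(codec);
    // (filling the context with the stream's codec parameters is omitted here)
    if(!context || avcodec_open2(context, codec, nullptr) < 0)
        return nullptr;
    return context; // decode as usual from here on
}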

Linux VDPAU

On Linux things are much more complicated (who is surprised?). FFmpeg has 2 HW accelerators on Linux: VDPAU (Nvidia) and VAAPI (Intel), and only one HW decoder: for VDPAU. So it may seem perfectly reasonable to use the vdpau decoder like in the Mac OS example above: avcodec_find_decoder_by_name("h264_vdpau");

You might be surprised to find out that it doesn't change anything and you have no acceleration at all. That's because it is only the beginning: you have to write much more code to get the acceleration working. Happily, you don't have to come up with a solution on your own: there are at least 2 good examples of how to achieve that: libavg and FFmpeg itself. libavg has a VDPAUDecoder class which is perfectly clear and on which I've based my implementation. You can also consult ffmpeg_vdpau.c for another implementation to compare against. In my opinion the libavg implementation is easier to grasp, though.
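
To give an idea of what that extra code looks like, here is a rough sketch of the callback wiring those examples do, following the old FFmpeg callback API they use. The VdpauDecoder class, its setupContext method and the createVideoSurface helper are assumed names for illustration, not FFmpeg or VDPAU API:

extern "C" {
#include <libavcodec/avcodec.h>
#include <libavcodec/vdpau.h>
}

// Tell FFmpeg we want the VDPAU pixel format if the decoder offers it.
static AVPixelFormat getVdpauFormat(AVCodecContext*, const AVPixelFormat* fmt)
{
    for(const AVPixelFormat* p = fmt; *p != AV_PIX_FMT_NONE; ++p)
        if(*p == AV_PIX_FMT_VDPAU_H264)
            return *p;
    return fmt[0]; // fall back to the first offered format
}

// Hand the decoder a vdpau_render_state wrapping a VdpVideoSurface instead of a
// plain memory buffer, so the decoded picture stays on the GPU.
static int getVdpauBuffer(AVCodecContext* context, AVFrame* frame)
{
    auto decoder = static_cast<VdpauDecoder*>(context->opaque);
    auto renderState = new vdpau_render_state(); // zero-initialized
    renderState->surface =
        decoder->createVideoSurface(context->width, context->height); // assumed helper
    frame->data[0] = reinterpret_cast<uint8_t*>(renderState);
    return 0;
}

void VdpauDecoder::setupContext(AVCodecContext* context)
{
    context->opaque = this; // lets the callbacks reach the decoder object
    context->get_format = getVdpauFormat;
    context->get_buffer = getVdpauBuffer;
    // release_buffer must destroy the surface and delete the render state;
    // draw_horiz_band is where vdp_decoder_render() is finally called with the
    // picture info and bitstream buffers FFmpeg stored in the render state.
}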

The only thing both aforementioned examples lack is proper copying of the decoded frame to main memory. Both of them use VdpVideoSurfaceGetBitsYCbCr, which killed all the performance I had gained on my machine. That's why you might want to use the following procedure to extract the data from the GPU:

bool VdpauDecoder::fillFrameWithData(AVCodecContext* context,
    AVFrame* frame)
{
    // context->opaque holds the decoder object (set up when the context was configured).
    VdpauDecoder* vdpauDecoder = static_cast<VdpauDecoder*>(context->opaque);
    // frame->data[0] carries the vdpau_render_state supplied in the "get buffer"
    // callback, i.e. the VdpVideoSurface the hardware decoded into.
    auto renderState = reinterpret_cast<vdpau_render_state*>(frame->data[0]);
    VdpVideoSurface videoSurface = renderState->surface;

    // Create an RGBA output surface and let the video mixer convert the decoded
    // surface into it on the GPU (instead of using VdpVideoSurfaceGetBitsYCbCr).
    VdpOutputSurface surface;
    if(vdp_output_surface_create(vdpauDecoder->m_VdpDevice, VDP_RGBA_FORMAT_B8G8R8A8,
           frame->width, frame->height, &surface) != VDP_STATUS_OK)
    {
        return false;
    }

    bool filled = false;
    auto status = vdp_video_mixer_render(vdpauDecoder->m_VdpMixer,
        VDP_INVALID_HANDLE,
        nullptr,
        VDP_VIDEO_MIXER_PICTURE_STRUCTURE_FRAME,
        0, nullptr,
        videoSurface,
        0, nullptr,
        nullptr,
        surface,
        nullptr, nullptr, 0, nullptr);
    if(status == VDP_STATUS_OK)
    {
        // Allocate an ordinary BGRA frame in main memory...
        auto tmframe = av_frame_alloc();
        tmframe->format = AV_PIX_FMT_BGRA;
        tmframe->width = frame->width;
        tmframe->height = frame->height;
        if(av_frame_get_buffer(tmframe, 32) >= 0)
        {
            // ...and read the converted pixels back from the output surface into it.
            status = vdp_output_surface_get_bits_native(surface, nullptr,
                reinterpret_cast<void * const *>(tmframe->data),
                reinterpret_cast<const uint32_t *>(tmframe->linesize));
            if(status == VDP_STATUS_OK && av_frame_copy_props(tmframe, frame) == 0)
            {
                // Replace the GPU-backed frame with the main-memory copy.
                av_frame_unref(frame);
                av_frame_move_ref(frame, tmframe);
                filled = true;
            }
        }
        av_frame_free(&tmframe); // on success its buffers now live in 'frame'
    }
    vdp_output_surface_destroy(surface);
    return filled;
}

While it uses some "external" objects inside, you should be able to understand it once you have implemented the "get buffer" part (for which the aforementioned examples are a great help). Also, I've used the BGRA format because it was more suitable for my needs; maybe you will choose another.
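
If your pipeline needs yet another format, one straightforward option is to convert the BGRA frame with libswscale afterwards. Below is a minimal sketch of that idea; the convertToRgb24 name and the RGB24 target are just an example, not something the procedure above requires:

extern "C" {
#include <libavutil/frame.h>
#include <libavutil/pixfmt.h>
#include <libswscale/swscale.h>
}

// Convert a decoded BGRA frame into a freshly allocated RGB24 frame.
AVFrame* convertToRgb24(const AVFrame* src)
{
    AVFrame* dst = av_frame_alloc();
    dst->format = AV_PIX_FMT_RGB24;
    dst->width = src->width;
    dst->height = src->height;
    if(av_frame_get_buffer(dst, 32) < 0)
    {
        av_frame_free(&dst);
        return nullptr;
    }
    av_frame_copy_props(dst, src); // keep pts and friends

    SwsContext* ctx = sws_getContext(src->width, src->height,
        static_cast<AVPixelFormat>(src->format),
        dst->width, dst->height, AV_PIX_FMT_RGB24,
        SWS_BILINEAR, nullptr, nullptr, nullptr);
    sws_scale(ctx, src->data, src->linesize, 0, src->height,
        dst->data, dst->linesize);
    sws_freeContext(ctx);
    return dst;
}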

The problem with all of this is that you can't just get it working from FFmpeg alone: you need to understand at least the basics of the VDPAU API. I hope that my answer will aid someone in implementing HW acceleration on Linux. I spent a lot of time on it myself before I realized that there is no simple, one-line way of implementing HW accelerated decoding on Linux.

Linux VA-API

Since my original question was regarding VA-API, I can't leave it unanswered. First of all, there is no decoder for VA-API in FFmpeg, so avcodec_find_decoder_by_name("h264_vaapi") doesn't make any sense: it returns nullptr. I don't know how much harder (or maybe simpler?) it is to implement decoding via VA-API, since all the examples I've seen were quite intimidating. So I chose not to use VA-API at all, even though I had to implement the acceleration for an Intel card. Fortunately enough for me, there is a VDPAU library (driver?) which works on top of VA-API. So you can use VDPAU on Intel cards!
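
To make that concrete, the decoder selection then boils down to a small check; pickHwDecoder is just an illustrative name:

extern "C" {
#include <libavcodec/avcodec.h>
}

const AVCodec* pickHwDecoder()
{
    // In this FFmpeg build the VA-API lookup returns nullptr, so the VDPAU
    // decoder is used; with the VDPAU-over-VA-API driver installed it works on
    // Intel cards as well.
    if(const AVCodec* vaapi = avcodec_find_decoder_by_name("h264_vaapi"))
        return vaapi;
    return avcodec_find_decoder_by_name("h264_vdpau");
}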

I used the following link to set it up on my Ubuntu machine.

Also, you might want to check the comments to the original question where @Timothy_G also mentioned some links regarding VA-API.

Author: ixSci

Updated on April 05, 2020

Comments

  • ixSci
    ixSci about 4 years

    I need to have ffmpeg decode my video (e.g. h264) using hardware acceleration. I'm using the usual way of decoding frames: read packet -> decode frame. And I'd like to have ffmpeg speed up decoding. So I've built it with --enable-vaapi and --enable-hwaccel=h264, but I don't really know what I should do next. I've tried to use avcodec_find_decoder_by_name("h264_vaapi") but it returns nullptr. Anyway, I might want to use other APIs, not just VA-API. How is one supposed to speed up ffmpeg decoding?

    P.S. I didn't find any examples on the Internet which use ffmpeg with hwaccel.

  • Mark Essel
    Mark Essel over 9 years
    much appreciate the sample source for Linux and vdpau, in a perfect world the osx solution you listed would work just as smoothly on linux (simply identifying intent for hardware acceleration would be a wonderful interface).
  • Mark Essel
    Mark Essel over 9 years
    any chance you've got an example of grabbing back a bgra pixel buffer after hardware accelerated decoding a frame (pixel format vda)
  • ixSci
    ixSci over 9 years
    @MarkEssel, you need to use sws_scale to convert a frame you get after the VDA to the format you need.
  • Mark Essel
    Mark Essel over 9 years
    thanks, tried that first and bumped into: [swscaler @ 0x1033f3800] vda is not supported as input pixel format. Looked here for the underlying data read stackoverflow.com/a/12238071/51700
  • ixSci
    ixSci over 9 years
    @MarkEssel, unfortunately I can't run the SW I've implemented the HW decoding in, so I can't verify what is the issue. But I looked at the code and I did nothing special to handle the frame after the decoding: here is my sws context creation: sws_getCachedContext(nullptr, frame->width, frame->height, static_cast<AVPixelFormat>(frame->format), frame->width, frame->height, AV_PIX_FMT_RGB24, SWS_BILINEAR | SWS_ACCURATE_RND, nullptr, nullptr, nullptr);
  • ixSci
    ixSci over 9 years
    @MarkEssel, if I remember it right I had the same problem. It was because I had created the sws context once, during initialization. At that point the context has the VDA pixel format, which doesn't make sense to the sws scaler. So to get the correct pixel format you need to use the actual frame's pixel format when creating the context, and you will have it right after decoding. I hope I recall it right.
  • Mark Essel
    Mark Essel over 9 years
    woohoo, got something working using your method but it seems I use it once before initializing... whoops. patching it now
  • Mark Essel
    Mark Essel over 9 years
    seems like first time or so returning from avcodec_decode_video2 the pFrame is coming back with a null ->data member
  • ixSci
    ixSci over 9 years
    @MarkEssel, are you sure you are checking if the frame was fully decoded? It's 3rd parameter in avcodec_decode_video2
  • Mark Essel
    Mark Essel over 9 years
    sure am. avcodec_decode_video2(m_pCodecCtx, m_pFrame, &frameFinished,&packet); if(frameFinished) { do stuff }
  • Hi-Angel
    Hi-Angel about 7 years
    Correction: actually, neither VAAPI nor VDPAU are exclusive (at least for desktop GPUs — I don't know about embedded). Indeed VAAPI was originally developed by Intel, but is supported by all Gallium drivers, i.e. at least Radeon and Nouveau (a non-official NVidia driver, I don't know about the official one). Same for VDPAU, as you can see at the link; Intel is non-gallium though, but from googling around it seems to support it in some way too (can't confirm for I have no Intel GPU).
  • Reda Drissi
    Reda Drissi about 6 years
    @ixSci would you share a link or reference of how to achieve the same with ffmpeg?
  • ixSci
    ixSci about 6 years
    @RedaDrissi, what do you mean? This answer is about ffmpeg.
  • Reda Drissi
    Reda Drissi about 6 years
    @ixSci you have based your implementation on libavg, and you said it was possible to do it without libavg.
  • ixSci
    ixSci about 6 years
    @RedaDrissi, I based my implementation on what I've found in libavg, but the implementation itself doesn't use anything beyond pure ffmpeg. It doesn't use libavg.
  • Fattie
    Fattie over 5 years
    dear @ixSci - you may like to look at this problem which specifically deals with our iPhone (not Mac) problem we have found ... stackoverflow.com/q/54198895/294884 .. thanks for this post