How to use hardware acceleration with ffmpeg

After some investigation I was able to implement the necessary HW accelerated decoding on OS X (VDA) and Linux (VDPAU). I will update the answer when I get my hands on a Windows implementation as well. So let's start with the easiest:

Mac OS X

To get HW acceleration working on Mac OS you should just use the following: avcodec_find_decoder_by_name("h264_vda"); Note, however, that on Mac OS FFmpeg can only accelerate H.264 videos this way.
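
For illustration, here is a minimal sketch of picking that decoder explicitly; the openVdaDecoder name is just an example, and copying the stream's codec parameters into the context is omitted:

extern "C" {
#include <libavcodec/avcodec.h>
}

// Look up the hardware decoder by name instead of the plain "h264" one and open it.
AVCodecContext* openVdaDecoder()
{
    const AVCodec* codec = avcodec_find_decoder_by_name("h264_vda");
    if(!codec)
        return nullptr; // FFmpeg was built without VDA support

    AVCodecContext* context = avcodec_alloc_context3(codec);
    // (filling the context with the stream's codec parameters is omitted here)
    if(!context || avcodec_open2(context, codec, nullptr) < 0)
        return nullptr;
    return context; // decode as usual from here on
}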

Linux VDPAU

On Linux things are much more complicated (who is surprised?). FFmpeg has 2 HW accelerators on Linux: VDPAU (Nvidia) and VAAPI (Intel), and only one HW decoder: for VDPAU. So it may seem perfectly reasonable to use the vdpau decoder like in the Mac OS example above: avcodec_find_decoder_by_name("h264_vdpau");

You might be surprised to find out that it doesn't change anything and you have no acceleration at all. That's because it is only the beginning: you have to write much more code to get the acceleration working. Happily, you don't have to come up with a solution on your own: there are at least 2 good examples of how to achieve that: libavg and FFmpeg itself. libavg has a VDPAUDecoder class which is perfectly clear and on which I've based my implementation. You can also consult ffmpeg_vdpau.c for another implementation to compare against. In my opinion the libavg implementation is easier to grasp, though.
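
To give an idea of what that extra code looks like, here is a rough sketch of the callback wiring those examples do, following the old FFmpeg callback API they use. The VdpauDecoder class, its setupContext method and the createVideoSurface helper are assumed names for illustration, not FFmpeg or VDPAU API:

extern "C" {
#include <libavcodec/avcodec.h>
#include <libavcodec/vdpau.h>
}

// Tell FFmpeg we want the VDPAU pixel format if the decoder offers it.
static AVPixelFormat getVdpauFormat(AVCodecContext*, const AVPixelFormat* fmt)
{
    for(const AVPixelFormat* p = fmt; *p != AV_PIX_FMT_NONE; ++p)
        if(*p == AV_PIX_FMT_VDPAU_H264)
            return *p;
    return fmt[0]; // fall back to the first offered format
}

// Hand the decoder a vdpau_render_state wrapping a VdpVideoSurface instead of a
// plain memory buffer, so the decoded picture stays on the GPU.
static int getVdpauBuffer(AVCodecContext* context, AVFrame* frame)
{
    auto decoder = static_cast<VdpauDecoder*>(context->opaque);
    auto renderState = new vdpau_render_state(); // zero-initialized
    renderState->surface =
        decoder->createVideoSurface(context->width, context->height); // assumed helper
    frame->data[0] = reinterpret_cast<uint8_t*>(renderState);
    return 0;
}

void VdpauDecoder::setupContext(AVCodecContext* context)
{
    context->opaque = this; // lets the callbacks reach the decoder object
    context->get_format = getVdpauFormat;
    context->get_buffer = getVdpauBuffer;
    // release_buffer must destroy the surface and delete the render state;
    // draw_horiz_band is where vdp_decoder_render() is finally called with the
    // picture info and bitstream buffers FFmpeg stored in the render state.
}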

The only thing both aforementioned examples lack is proper copying of the decoded frame to main memory. Both of them use VdpVideoSurfaceGetBitsYCbCr, which killed all the performance I had gained on my machine. That's why you might want to use the following procedure to extract the data from the GPU:

bool VdpauDecoder::fillFrameWithData(AVCodecContext* context,
    AVFrame* frame)
{
    // context->opaque holds the decoder object (set up when the context was configured).
    VdpauDecoder* vdpauDecoder = static_cast<VdpauDecoder*>(context->opaque);
    // frame->data[0] carries the vdpau_render_state supplied in the "get buffer"
    // callback, i.e. the VdpVideoSurface the hardware decoded into.
    auto renderState = reinterpret_cast<vdpau_render_state*>(frame->data[0]);
    VdpVideoSurface videoSurface = renderState->surface;

    // Create an RGBA output surface and let the video mixer convert the decoded
    // surface into it on the GPU (instead of using VdpVideoSurfaceGetBitsYCbCr).
    VdpOutputSurface surface;
    if(vdp_output_surface_create(vdpauDecoder->m_VdpDevice, VDP_RGBA_FORMAT_B8G8R8A8,
           frame->width, frame->height, &surface) != VDP_STATUS_OK)
    {
        return false;
    }

    bool filled = false;
    auto status = vdp_video_mixer_render(vdpauDecoder->m_VdpMixer,
        VDP_INVALID_HANDLE,
        nullptr,
        VDP_VIDEO_MIXER_PICTURE_STRUCTURE_FRAME,
        0, nullptr,
        videoSurface,
        0, nullptr,
        nullptr,
        surface,
        nullptr, nullptr, 0, nullptr);
    if(status == VDP_STATUS_OK)
    {
        // Allocate an ordinary BGRA frame in main memory...
        auto tmframe = av_frame_alloc();
        tmframe->format = AV_PIX_FMT_BGRA;
        tmframe->width = frame->width;
        tmframe->height = frame->height;
        if(av_frame_get_buffer(tmframe, 32) >= 0)
        {
            // ...and read the converted pixels back from the output surface into it.
            status = vdp_output_surface_get_bits_native(surface, nullptr,
                reinterpret_cast<void * const *>(tmframe->data),
                reinterpret_cast<const uint32_t *>(tmframe->linesize));
            if(status == VDP_STATUS_OK && av_frame_copy_props(tmframe, frame) == 0)
            {
                // Replace the GPU-backed frame with the main-memory copy.
                av_frame_unref(frame);
                av_frame_move_ref(frame, tmframe);
                filled = true;
            }
        }
        av_frame_free(&tmframe); // on success its buffers now live in 'frame'
    }
    vdp_output_surface_destroy(surface);
    return filled;
}

While it uses some "external" objects inside, you should be able to understand it once you have implemented the "get buffer" part (for which the aforementioned examples are a great help). Also, I've used the BGRA format because it was more suitable for my needs; maybe you will choose another.
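
If your pipeline needs yet another format, one straightforward option is to convert the BGRA frame with libswscale afterwards. Below is a minimal sketch of that idea; the convertToRgb24 name and the RGB24 target are just an example, not something the procedure above requires:

extern "C" {
#include <libavutil/frame.h>
#include <libavutil/pixfmt.h>
#include <libswscale/swscale.h>
}

// Convert a decoded BGRA frame into a freshly allocated RGB24 frame.
AVFrame* convertToRgb24(const AVFrame* src)
{
    AVFrame* dst = av_frame_alloc();
    dst->format = AV_PIX_FMT_RGB24;
    dst->width = src->width;
    dst->height = src->height;
    if(av_frame_get_buffer(dst, 32) < 0)
    {
        av_frame_free(&dst);
        return nullptr;
    }
    av_frame_copy_props(dst, src); // keep pts and friends

    SwsContext* ctx = sws_getContext(src->width, src->height,
        static_cast<AVPixelFormat>(src->format),
        dst->width, dst->height, AV_PIX_FMT_RGB24,
        SWS_BILINEAR, nullptr, nullptr, nullptr);
    sws_scale(ctx, src->data, src->linesize, 0, src->height,
        dst->data, dst->linesize);
    sws_freeContext(ctx);
    return dst;
}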

The problem with all of this is that you can't just get it working from FFmpeg alone: you need to understand at least the basics of the VDPAU API. I hope that my answer will aid someone in implementing HW acceleration on Linux. I spent a lot of time on it myself before I realized that there is no simple, one-line way of implementing HW accelerated decoding on Linux.

Linux VA-API

Since my original question was regarding VA-API, I can't leave it unanswered. First of all, there is no decoder for VA-API in FFmpeg, so avcodec_find_decoder_by_name("h264_vaapi") doesn't make any sense: it returns nullptr. I don't know how much harder (or maybe simpler?) it is to implement decoding via VA-API, since all the examples I've seen were quite intimidating. So I chose not to use VA-API at all, even though I had to implement the acceleration for an Intel card. Fortunately enough for me, there is a VDPAU library (driver?) which works on top of VA-API. So you can use VDPAU on Intel cards!
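
To make that concrete, the decoder selection then boils down to a small check; pickHwDecoder is just an illustrative name:

extern "C" {
#include <libavcodec/avcodec.h>
}

const AVCodec* pickHwDecoder()
{
    // In this FFmpeg build the VA-API lookup returns nullptr, so the VDPAU
    // decoder is used; with the VDPAU-over-VA-API driver installed it works on
    // Intel cards as well.
    if(const AVCodec* vaapi = avcodec_find_decoder_by_name("h264_vaapi"))
        return vaapi;
    return avcodec_find_decoder_by_name("h264_vdpau");
}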

I used the following link to set it up on my Ubuntu machine.

Also, you might want to check the comments to the original question where @Timothy_G also mentioned some links regarding VA-API.

Author: ixSci

Updated on April 05, 2020

Comments

  • ixSci
    ixSci about 4 years

    I need to have ffmpeg decode my video (e.g. h264) using hardware acceleration. I'm using the usual way of decoding frames: read packet -> decode frame. And I'd like to have ffmpeg speed up decoding. So I've built it with --enable-vaapi and --enable-hwaccel=h264, but I don't really know what I should do next. I've tried to use avcodec_find_decoder_by_name("h264_vaapi") but it returns nullptr. Anyway, I might want to use other APIs, not just VA-API. How is one supposed to speed up ffmpeg decoding?

    P.S. I didn't find any examples on the Internet which use ffmpeg with hwaccel.

  • Mark Essel
    Mark Essel over 9 years
    much appreciate the sample source for Linux and vdpau, in a perfect world the osx solution you listed would work just as smoothly on linux (simply identifying intent for hardware acceleration would be a wonderful interface).
  • Mark Essel
    Mark Essel over 9 years
    any chance you've got an example of grabbing back a bgra pixel buffer after hardware accelerated decoding a frame (pixel format vda)
  • ixSci
    ixSci over 9 years
    @MarkEssel, you need to use sws_scale to convert a frame you get after the VDA to the format you need.
  • Mark Essel
    Mark Essel over 9 years
    thanks, tried that first and bumped into: [swscaler @ 0x1033f3800] vda is not supported as input pixel format. Looked here for the underlying data read stackoverflow.com/a/12238071/51700
  • ixSci
    ixSci over 9 years
    @MarkEssel, unfortunately I can't run the SW I've implemented the HW decoding in, so I can't verify what is the issue. But I looked at the code and I did nothing special to handle the frame after the decoding: here is my sws context creation: sws_getCachedContext(nullptr, frame->width, frame->height, static_cast<AVPixelFormat>(frame->format), frame->width, frame->height, AV_PIX_FMT_RGB24, SWS_BILINEAR | SWS_ACCURATE_RND, nullptr, nullptr, nullptr);
  • ixSci
    ixSci over 9 years
    @MarkEssel, if I remember it right I had the same problem. It was because I had created the sws context once, during initialization. At that point the context has the VDA pixel format, which doesn't make sense to the sws scaler. So to get the correct pixel format you need to use the actual frame's pixel format when creating the context, and you will have it right after decoding. I hope I recall it right.
  • Mark Essel
    Mark Essel over 9 years
    woohoo, got something working using your method but it seems I use it once before initializing... whoops. patching it now
  • Mark Essel
    Mark Essel over 9 years
    seems like first time or so returning from avcodec_decode_video2 the pFrame is coming back with a null ->data member
  • ixSci
    ixSci over 9 years
    @MarkEssel, are you sure you are checking if the frame was fully decoded? It's 3rd parameter in avcodec_decode_video2
  • Mark Essel
    Mark Essel over 9 years
    sure am. avcodec_decode_video2(m_pCodecCtx, m_pFrame, &frameFinished,&packet); if(frameFinished) { do stuff }
  • Hi-Angel
    Hi-Angel about 7 years
    Correction: actually, neither VAAPI nor VDPAU are exclusive (at least for desktop GPUs — I don't know about embedded). Indeed VAAPI was originally developed by Intel, but is supported by all Gallium drivers, i.e. at least Radeon and Nouveau (a non-official NVidia driver, I don't know about the official one). Same for VDPAU, as you can see at the link; Intel is non-gallium though, but from googling around it seems to support it in some way too (can't confirm for I have no Intel GPU).
  • Reda Drissi
    Reda Drissi about 6 years
    @ixSci would you share a link or reference of how to achieve the same with ffmpeg?
  • ixSci
    ixSci about 6 years
    @RedaDrissi, what do you mean? This answer is about ffmpeg.
  • Reda Drissi
    Reda Drissi about 6 years
    @ixSci you have based your implementation on libavg, and you said it was possible to do it without libavg.
  • ixSci
    ixSci about 6 years
    @RedaDrissi, I based my implementation on what I've found in libavg, but the implementation itself doesn't use anything beyond pure ffmpeg. It doesn't use libavg.
  • Fattie
    Fattie over 5 years
    dear @ixSci - you may like to look at this problem which specifically deals with our iPhone (not Mac) problem we have found ... stackoverflow.com/q/54198895/294884 .. thanks for this post