Creating a video from images using ffmpeg libav and libx264?

ffmpeg h.264 video-encoding libav libx264

11,121

Libav is probably delaying the processing of the initial frames. A good practice is to check for any delayed frames after you have finished processing all frames. This is done as follows:

int got_packet_ptr = 1;
int i = NUMBER_OF_FRAMES_PREVIOUSLY_ENCODED;
for (; got_packet_ptr; i++)
{
    ret = avcodec_encode_video2(m_codecContext, &packet, NULL, &got_packet_ptr);
    // If got_packet_ptr is non-zero, write the packet to the container here.
}

The point is to pass a NULL pointer in place of the frame to be encoded and to keep doing so until the encoder stops returning packets, that is, until got_packet_ptr comes back as 0. See this link for the code example - the part under "get the delayed frames".
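The drain pattern can be tried out without libav. Below is a minimal C++ sketch that uses a toy encoder which holds back a fixed number of frames before emitting anything. `ToyEncoder`, `encodeAll`, and the fixed `delay` are hypothetical names invented for this illustration and are not part of the libav API; a real encoder's delay depends on its settings.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Toy stand-in for an encoder that buffers frames; NOT the libav API.
class ToyEncoder
{
public:
    explicit ToyEncoder(std::size_t delay) : m_delay(delay) {}

    // Pass a pointer to a frame index to encode, or nullptr to drain.
    // Returns true and sets `packet` when a packet was produced.
    bool encode(const int *frame, int &packet)
    {
        if (frame != nullptr)
        {
            m_pending.push_back(*frame);
            if (m_pending.size() <= m_delay)
                return false;   // still filling the internal buffer
        }
        if (m_pending.empty())
            return false;       // nothing left to drain
        packet = m_pending.front();
        m_pending.pop_front();
        return true;
    }

private:
    std::size_t m_delay;
    std::deque<int> m_pending;
};

struct EncodeCounts
{
    int duringEncoding;  // packets returned while frames were still being fed
    int duringFlush;     // packets returned while draining with "no frame"
};

inline EncodeCounts encodeAll(int frameCount, std::size_t delay)
{
    assert(frameCount >= 0);
    ToyEncoder encoder(delay);
    EncodeCounts counts = {0, 0};
    int packet = 0;

    for (int i = 0; i < frameCount; ++i)
    {
        if (encoder.encode(&i, packet))
            ++counts.duringEncoding;
    }

    // Drain: keep passing "no frame" until no packet comes back.
    while (encoder.encode(nullptr, packet))
        ++counts.duringFlush;

    return counts;
}
```

Calling `encodeAll(480, 38)` in this toy model yields 442 packets while frames are being fed and 38 more during the drain, so every input frame is accounted for only once the drain loop has run to completion.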

An easier way out would be to set the number of B-frames to 0.

m_codecContext->max_b_frames = 0;

Let me know if this works fine.

Also, you haven't used the libx264 API at all. You can use the libx264 API directly for encoding videos; it has a simpler and cleaner syntax, and it offers you more control over the settings and improved performance.

For writing the video stream to an mkv container, you will still have to use the libav libraries, though.

Author by

marikaner

Updated on June 09, 2022

Comments

marikaner almost 2 years

I am trying to create a video from images using the ffmpeg library. The images have a size of 1920x1080 and are supposed to be encoded with H.264 using a .mkv container. I have come across various problems, thinking I am getting closer to a solution, but this one I am really stuck on. With the settings I use, the first X frames of my video (around 40, depending on which and how many images I use for the video) are not encoded. avcodec_encode_video2 does not return any error (return value is 0), but got_packet_ptr is 0. The result is a video that actually looks as expected, but the first seconds are weirdly jumpy.

So this is how I create the video file:

// m_codecContext is an instance variable of type AVCodecContext *
// m_formatCtx is an instance variable of type AVFormatContext *

// outputFileName is a valid filename ending with .mkv
AVOutputFormat *oformat = av_guess_format(NULL, outputFileName, NULL);
if (oformat == NULL)
{
    oformat = av_guess_format("mpeg", NULL, NULL);
}

// oformat->video_codec is AV_CODEC_ID_H264
AVCodec *codec = avcodec_find_encoder(oformat->video_codec);

m_codecContext = avcodec_alloc_context3(codec);
m_codecContext->codec_id = oformat->video_codec;
m_codecContext->codec_type = AVMEDIA_TYPE_VIDEO;
m_codecContext->gop_size = 30;
m_codecContext->bit_rate = width * height * 4;
m_codecContext->width = width;
m_codecContext->height = height;
m_codecContext->time_base = (AVRational){1,frameRate};
m_codecContext->max_b_frames = 1;
m_codecContext->pix_fmt = AV_PIX_FMT_YUV420P;

m_formatCtx = avformat_alloc_context();
m_formatCtx->oformat = oformat;
m_formatCtx->video_codec_id = oformat->video_codec;

snprintf(m_formatCtx->filename, sizeof(m_formatCtx->filename), "%s", outputFileName);

AVStream *videoStream = avformat_new_stream(m_formatCtx, codec);
if(!videoStream)
{
   printf("Could not allocate stream\n");
}
videoStream->codec = m_codecContext;

if(m_formatCtx->oformat->flags & AVFMT_GLOBALHEADER)
{
   m_codecContext->flags |= CODEC_FLAG_GLOBAL_HEADER;
}

avcodec_open2(m_codecContext, codec, NULL);
avio_open(&m_formatCtx->pb, outputFileName.toStdString().c_str(), AVIO_FLAG_WRITE);
avformat_write_header(m_formatCtx, NULL);

This is how the frames are added:

void VideoCreator::writeImageToVideo(const QSharedPointer<QImage> &img, int frameIndex)
{
    AVFrame *frame = avcodec_alloc_frame();

    /* alloc image and output buffer */

    int size = m_codecContext->width * m_codecContext->height;
    int numBytes = avpicture_get_size(m_codecContext->pix_fmt, m_codecContext->width, m_codecContext->height);

    uint8_t *outbuf = (uint8_t *)malloc(numBytes);
    uint8_t *picture_buf = (uint8_t *)av_malloc(numBytes);

    int ret = av_image_fill_arrays(frame->data, frame->linesize, picture_buf, m_codecContext->pix_fmt, m_codecContext->width, m_codecContext->height, 1);

    frame->data[0] = picture_buf;
    frame->data[1] = frame->data[0] + size;
    frame->data[2] = frame->data[1] + size/4;
    frame->linesize[0] = m_codecContext->width;
    frame->linesize[1] = m_codecContext->width/2;
    frame->linesize[2] = m_codecContext->width/2;

    fflush(stdout);


    for (int y = 0; y < m_codecContext->height; y++)
    {
        for (int x = 0; x < m_codecContext->width; x++)
        {
            unsigned char b = img->bits()[(y * m_codecContext->width + x) * 4 + 0];
            unsigned char g = img->bits()[(y * m_codecContext->width + x) * 4 + 1];
            unsigned char r = img->bits()[(y * m_codecContext->width + x) * 4 + 2];

            unsigned char Y = (0.257 * r) + (0.504 * g) + (0.098 * b) + 16;

            frame->data[0][y * frame->linesize[0] + x] = Y;

            if (y % 2 == 0 && x % 2 == 0)
            {
                unsigned char V = (0.439 * r) - (0.368 * g) - (0.071 * b) + 128;
                unsigned char U = -(0.148 * r) - (0.291 * g) + (0.439 * b) + 128;

                frame->data[1][y/2 * frame->linesize[1] + x/2] = U;
                frame->data[2][y/2 * frame->linesize[2] + x/2] = V;
            }
        }
    }

    int pts = frameIndex;//(1.0 / 30.0) * 90.0 * frameIndex;

    frame->pts = pts;//av_rescale_q(m_codecContext->coded_frame->pts, m_codecContext->time_base, formatCtx->streams[0]->time_base); //(1.0 / 30.0) * 90.0 * frameIndex;

    int got_packet_ptr;
    AVPacket packet;
    av_init_packet(&packet);
    packet.data = outbuf;
    packet.size = numBytes;
    packet.stream_index = formatCtx->streams[0]->index;
    packet.flags |= AV_PKT_FLAG_KEY;
    packet.pts = packet.dts = pts;
    m_codecContext->coded_frame->pts = pts;

    ret = avcodec_encode_video2(m_codecContext, &packet, frame, &got_packet_ptr);
    if (got_packet_ptr != 0)
    {
        m_codecContext->coded_frame->pts = pts;  // Set the time stamp

        if (m_codecContext->coded_frame->pts != (0x8000000000000000LL))
        {
            pts = av_rescale_q(m_codecContext->coded_frame->pts, m_codecContext->time_base, formatCtx->streams[0]->time_base);
        }
        packet.pts = pts;
        if(m_codecContext->coded_frame->key_frame)
        {
           packet.flags |= AV_PKT_FLAG_KEY;
        }

        std::cout << "pts: " << packet.pts << ", dts: "  << packet.dts << std::endl;

        av_interleaved_write_frame(formatCtx, &packet);
        av_free_packet(&packet);
    }

    av_free(picture_buf);  // allocated with av_malloc, so release with av_free
    free(outbuf);
    av_free(frame);
    printf("\n");
}
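The per-pixel colour conversion in the function above can be pulled into a small helper so that it can be checked in isolation. The following is a sketch only: it uses the same coefficients as the loop above, but adds rounding and clamping, which the original code does not do. `Yuv`, `clampToByte`, and `rgbToYuv` are hypothetical names introduced for this illustration.

```cpp
#include <algorithm>
#include <cstdint>

struct Yuv
{
    uint8_t y;
    uint8_t u;
    uint8_t v;
};

// Rounds to the nearest integer and clamps to the 0..255 range.
inline uint8_t clampToByte(double value)
{
    return static_cast<uint8_t>(std::min(255.0, std::max(0.0, value + 0.5)));
}

// Converts one 8-bit RGB pixel to limited-range YUV using the
// coefficients from the loop above.
inline Yuv rgbToYuv(uint8_t r, uint8_t g, uint8_t b)
{
    Yuv out;
    out.y = clampToByte( 0.257 * r + 0.504 * g + 0.098 * b + 16.0);
    out.u = clampToByte(-0.148 * r - 0.291 * g + 0.439 * b + 128.0);
    out.v = clampToByte( 0.439 * r - 0.368 * g - 0.071 * b + 128.0);
    return out;
}
```

With these coefficients, black (0, 0, 0) maps to Y=16, U=128, V=128 and white (255, 255, 255) maps to Y=235, U=128, V=128, which gives two quick sanity checks for the conversion.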

And this is the cleanup:

int numBytes = avpicture_get_size(m_codecContext->pix_fmt, m_codecContext->width, m_codecContext->height);
int got_packet_ptr = 1;

int ret;
//        for(; got_packet_ptr != 0; i++)
while (got_packet_ptr)
{
    uint8_t *outbuf = (uint8_t *)malloc(numBytes);

    AVPacket packet;
    av_init_packet(&packet);
    packet.data = outbuf;
    packet.size = numBytes;

    ret = avcodec_encode_video2(m_codecContext, &packet, NULL, &got_packet_ptr);
    if (got_packet_ptr)
    {
        av_interleaved_write_frame(m_formatCtx, &packet);
    }

    av_free_packet(&packet);
    free(outbuf);
}

av_write_trailer(formatCtx);

avcodec_close(m_codecContext);
av_free(m_codecContext);
printf("\n");

I assume it might be tied to the PTS and DTS values, but I have tried EVERYTHING. The frame index seems to make the most sense. The images are correct, I can save them to files without any problems. I am running out of ideas. I would be incredibly thankful if there was someone out there who knew better than me...

Cheers, marikaner
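For reference, the arithmetic behind rescaling a timestamp from the codec time base to the stream time base can be sketched without libav. This is a simplified illustration of what a call such as av_rescale_q computes, not its actual implementation: `Rational` and `rescaleSketch` are hypothetical names, only non-negative values are handled, and the 1/1000 stream time base used in the note below is an assumption about what the Matroska muxer typically selects.

```cpp
#include <cassert>
#include <cstdint>

struct Rational
{
    int64_t num;
    int64_t den;
};

// Converts `value` ticks of time base `from` into ticks of time base `to`,
// rounding to the nearest integer. Non-negative values only; no overflow
// protection, so this is suitable for small illustrative inputs.
inline int64_t rescaleSketch(int64_t value, Rational from, Rational to)
{
    assert(value >= 0);
    assert(from.num > 0 && from.den > 0 && to.num > 0 && to.den > 0);
    const int64_t numerator = value * from.num * to.den;
    const int64_t denominator = from.den * to.num;
    return (numerator + denominator / 2) / denominator;
}
```

Assuming a codec time base of 1/30 and a stream time base of 1/1000, frame index 30 becomes 1000 (one second) and frame index 1 becomes 33. If the stream time base differs from the codec time base, both pts and dts of each packet would need the same conversion before the packet is written.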

UPDATE:

If this is of any help this is the output at the end of the video encoding:

[libx264 @ 0x7fffc00028a0] frame I:19    Avg QP:14.24  size:312420
[libx264 @ 0x7fffc00028a0] frame P:280   Avg QP:19.16  size:148867
[libx264 @ 0x7fffc00028a0] frame B:181   Avg QP:21.31  size: 40540
[libx264 @ 0x7fffc00028a0] consecutive B-frames: 24.6% 75.4%
[libx264 @ 0x7fffc00028a0] mb I  I16..4: 30.9% 45.5% 23.7%
[libx264 @ 0x7fffc00028a0] mb P  I16..4:  4.7%  9.1%  4.5%  P16..4: 23.5% 16.6% 12.6%  0.0%  0.0%    skip:28.9%
[libx264 @ 0x7fffc00028a0] mb B  I16..4:  0.6%  0.5%  0.3%  B16..8: 26.7% 11.0%  5.5%  direct: 3.9%  skip:51.5%  L0:39.4% L1:45.0% BI:15.6%
[libx264 @ 0x7fffc00028a0] final ratefactor: 19.21
[libx264 @ 0x7fffc00028a0] 8x8 transform intra:48.2% inter:47.3%
[libx264 @ 0x7fffc00028a0] coded y,uvDC,uvAC intra: 54.9% 53.1% 30.4% inter: 25.4% 13.5% 4.2%
[libx264 @ 0x7fffc00028a0] i16 v,h,dc,p: 41% 29% 11% 19%
[libx264 @ 0x7fffc00028a0] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 16% 26% 31%  3%  4%  3%  7%  3%  6%
[libx264 @ 0x7fffc00028a0] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 30% 26% 14%  4%  5%  4%  7%  4%  7%
[libx264 @ 0x7fffc00028a0] i8c dc,h,v,p: 58% 26% 13%  3%
[libx264 @ 0x7fffc00028a0] Weighted P-Frames: Y:17.1% UV:3.6%
[libx264 @ 0x7fffc00028a0] ref P L0: 63.1% 21.4% 11.4%  4.1%  0.1%    
[libx264 @ 0x7fffc00028a0] ref B L0: 85.7% 14.3%
[libx264 @ 0x7fffc00028a0] kb/s:27478.30

marikaner almost 11 years

Thank you very much for taking the time. Unfortunately, neither setting the number of B-frames to 0 nor writing the delayed frames seems to do the trick, although there obviously are delayed frames, as the program enters the loop. The video actually seems less jumpy, but it is still not correct. There seems to be a hole after 2 seconds, where only 2 still images are shown for 2 seconds, as if the frames in between were missing.
Hrishikesh_Pardeshi almost 11 years

Can you specify the number of frames in the resulting video and the total number of images you intend to encode? You can just check the number of calls to av_interleaved_write_frame() (should be 480 as per your update). Also, how is frameIndex calculated?
marikaner almost 11 years

Yes, the number of images I am intending to encode is 480. frameIndex is just an integer that is incremented from 0 to 479 with each frame. av_interleaved_write_frame() is called 442 times after avcodec_encode_video2 using an actual frame and 38 times after avcodec_encode_video2 using NULL.
Hrishikesh_Pardeshi almost 11 years

Seems correct then. If all 480 frames are being encoded, there shouldn't be any problem in the resulting output video. Check the resulting video in a tool like VirtualDub, which lets you step through it frame by frame, and see whether any of the input images are missing. If you can provide the output, it would be easier for me to visualize the problem.
Hrishikesh_Pardeshi almost 11 years

The delayed processing of frames occurs only when you have allowed B-frame encoding. The output you put in the update shows 181 B-frames encoded; is this with max_b_frames set to 0 or to a positive value?