What is the correct way to fix keyframes in FFmpeg for DASH?

video ffmpeg streaming video-conversion

108,451

Solution 1

The answer therefore seems to be:

Method 1 is verified to work, but is libx264-specific, and comes at the cost of eliminating the very useful scenecut option in libx264.
Method 3 works as of the FFMPEG version of April 2015, but you should verify your results with the script included at the bottom of this post, as the FFMPEG documentation is unclear as to the effect of the option. If it works, it is the superior of the two options.
DO NOT USE Method 2, -g appears to be deprecated. It neither appears to work, nor is it explicitly defined in the documentation, nor is found in the help, nor does it appear to be used in the code. Code inspection shows that the -g option is likely meant for MPEG-2 streams (there are even code stanzas referring to PAL and NTSC!).

Also:

Files generated with Method 3 may be slightly larger than Method 1, as interstitial I frames (keyframes) are allowed.
You should explicitly set the "-r" flag in both cases, even though Method 3 places an I frame at the next frameslot on or after the time specified. Failure to set the "-r" flag places you at the mercy of the source file, possibly with a variable frame rate. Incompatible DASH transitions may result.
Despite the warnings in the FFMPEG documentation, method 3 is NOT less efficient than others. In fact, tests show that it might be slightly MORE efficient than method 1.

Script for the `-force_key_frames` option

Here is a short PERL program I used to verify I-frame cadence based on the output of slhck's ffprobe suggestion. It seems to verify that the -force_key_frames method will also work, and has the added benefit of allowing for scenecut frames. I have absolutely no idea how FFMPEG makes this work, or if I just lucked out somehow because my streams happen to be well-conditioned.

In my case, I encoded at 30fps with an expected GOP size of 6 seconds, or 180 frames. With 180 as the gopsize argument, this program verified an I frame at each multiple of 180, but setting it to 181 (or any other number not a multiple of 180) made it complain.

#!/usr/bin/perl
use strict;
my $gopsize = shift(@ARGV);
my $file = shift(@ARGV);
print "GOPSIZE = $gopsize\n";
my $linenum = 0;
my $expected = 0;
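# List-form pipe open: no shell is involved, so file names with spaces or quotes are safe.
open(my $pipe, '-|', 'ffprobe', '-i', $file, '-select_streams', 'v', '-show_frames', '-of', 'csv', '-show_entries', 'frame=pict_type')
        or die "Cannot run ffprobe on $file: $!";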
while (<$pipe>) {
  if ($linenum > $expected) {
    # Won't catch all the misses. But even one is good enough to fail.
    print "Missed IFrame at $expected\n";
    $expected = (int($linenum/$gopsize) + 1)*$gopsize;
  }
  if (m/,I\s*$/) {
    if ($linenum < $expected) {
      # Don't care term, just an extra I frame. Snore.
      #print "Free IFrame at $linenum\n";
    } else {
      #print "IFrame HIT at $expected\n";
      $expected += $gopsize;
    }
  }
  $linenum += 1;
}

Solution 2

TL;DR

I would recommend the following:

libx264: -g X -keyint_min X (and optionally add -force_key_frames "expr:gte(t,n_forced*N)")
libx265: -x265-params "keyint=X:min-keyint=X"
libvpx-vp9: -g X

where X is the interval in frames and N is the interval in seconds. For example, for a 2-second interval with a 30fps video, X = 60 and N = 2.
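
As a rough sketch of how X and N relate, the Python helper below builds the keyframe arguments listed above for each encoder. The function name `keyframe_args` and its parameters are made up for illustration, and it assumes a constant, integer frame rate; only the flags themselves come from the recommendation.

```python
def keyframe_args(encoder: str, fps: int, seconds: int, force: bool = False) -> list[str]:
    """Return the keyframe-related ffmpeg arguments recommended above.

    X = fps * seconds is the interval in frames, N = seconds.
    Hypothetical helper; assumes a constant integer frame rate.
    """
    x = fps * seconds
    if encoder == "libx264":
        args = ["-g", str(x), "-keyint_min", str(x)]
        if force:
            # Optional addition from the recommendation above.
            args += ["-force_key_frames", f"expr:gte(t,n_forced*{seconds})"]
        return args
    if encoder == "libx265":
        return ["-x265-params", f"keyint={x}:min-keyint={x}"]
    if encoder == "libvpx-vp9":
        return ["-g", str(x)]
    raise ValueError(f"no recommendation for encoder {encoder!r}")


# 2-second interval at 30 fps, as in the example above: X = 60, N = 2.
print(keyframe_args("libx264", fps=30, seconds=2, force=True))
```

The returned list can be spliced into an argument list for `subprocess.run`, which avoids shell-quoting the expression.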

A note about different frame types

In order to properly explain this topic, we first have to define the two types of I-frames / keyframes:

Instantaneous Decoder Refresh (IDR) frames: These allow independent decoding of the following frames, without access to frames previous to the IDR frame.
Non-IDR-frames: These require a previous IDR frame for the decoding to work. Non-IDR frames can be used for scene cuts in the middle of a GOP (group of pictures).

What is recommended for streaming?

For the streaming case, you want to:

Ensure that all IDR frames are at regular positions (e.g. at 2, 4, 6, … seconds) so that the video can be split up into segments of equal length.
Enable scene cut detection, so as to improve coding efficiency / quality. This means allowing I-frames to be placed in between IDR frames. You can still work with scene cut detection disabled (and this is part of many guides, still), but it's not necessary.

What do the parameters do?

In order to configure the encoder, we have to understand what the keyframe parameters do. I did some tests and discovered the following, for the three encoders libx264, libx265 and libvpx-vp9 in FFmpeg:

libx264:
- -g sets the keyframe interval.
- -keyint_min sets the minimum keyframe interval.
- -x264-params "keyint=x:min-keyint=y" is the same as -g x -keyint_min y.
- Note: When setting both to the same value, the minimum is internally set to half the maximum interval plus one, as seen in the x264 code:
```
h->param.i_keyint_min = x264_clip3( h->param.i_keyint_min, 1, h->param.i_keyint_max/2+1 );
```
libx265:
- -g is not implemented.
- -x265-params "keyint=x:min-keyint=y" works.
libvpx-vp9:
- -g sets the keyframe interval.
- -keyint_min sets the minimum keyframe interval
- Note: Due to how FFmpeg works, -keyint_min is only forwarded to the encoder when it is the same as -g. In the code from libvpxenc.c in FFmpeg we can find:
```
if (avctx->keyint_min >= 0 && avctx->keyint_min == avctx->gop_size)
    enccfg.kf_min_dist = avctx->keyint_min;
if (avctx->gop_size >= 0)
    enccfg.kf_max_dist = avctx->gop_size;
```
  This might be a bug (or lack of feature?), since libvpx definitely supports setting a different value for kf_min_dist.
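
To make the two quoted C snippets easier to reason about, here is a small Python model of them. It is only a sketch: the function names are invented for illustration, and `None` is used to mean "not set by the quoted snippet" (what the encoder then defaults to is not modelled here).

```python
from typing import Optional


def x264_effective_min_keyint(keyint_max: int, keyint_min: int) -> int:
    """Mirror x264_clip3(i_keyint_min, 1, i_keyint_max/2+1) from the x264 snippet."""
    upper = keyint_max // 2 + 1  # C integer division
    return max(1, min(keyint_min, upper))


def libvpx_kf_distances(gop_size: int, keyint_min: int) -> tuple[Optional[int], Optional[int]]:
    """Mirror the libvpxenc.c snippet; returns (kf_min_dist, kf_max_dist)."""
    # kf_min_dist is only forwarded when -keyint_min equals -g.
    kf_min = keyint_min if keyint_min >= 0 and keyint_min == gop_size else None
    kf_max = gop_size if gop_size >= 0 else None
    return kf_min, kf_max


# keyint=60:min-keyint=60 is clamped to 60 // 2 + 1 = 31 inside x264.
print(x264_effective_min_keyint(60, 60))
# -g 60 -keyint_min 30 leaves kf_min_dist unset in the quoted snippet.
print(libvpx_kf_distances(60, 30))
```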

Should you use `-force_key_frames`?

The -force_key_frames option forcibly inserts keyframes at the given interval (expression). This works for all encoders, but it might mess with the rate control mechanism. Especially for VP9, I've noticed severe quality fluctuations, so I cannot recommend using it in this case.

Solution 3

Here is my fifty cents for the case.

Method 1:

messing with libx264's arguments

-c:v libx264 -x264opts keyint=GOPSIZE:min-keyint=GOPSIZE:scenecut=-1

Generate iframes only at the desired intervals.

Example 1:

ffmpeg -i test.mp4 -codec:v libx264 \
-r 23.976 \
-x264opts "keyint=48:min-keyint=48:no-scenecut" \
-c:a copy \
-y test_keyint_48.mp4

Generate iframes as expected like this:

Iframes     Seconds
1           0
49          2
97          4
145         6
193         8
241         10
289         12
337         14
385         16
433         18
481         20
529         22
577         24
625         26
673         28
721         30
769         32
817         34
865         36
913         38
961         40
1009        42
1057        44
1105        46
1153        48
1201        50
1249        52
1297        54
1345        56
1393        58

Method 2 is deprecated. Omitted.

Method 3:

insert a keyframe every N seconds (MAYBE):

-force_key_frames expr:gte(t,n_forced*GOP_LEN_IN_SECONDS)

Example 2

ffmpeg -i test.mp4 -codec:v libx264 \
-r 23.976 \
-force_key_frames "expr:gte(t,n_forced*2)" \
-c:a copy \
-y test_fkf_2.mp4

Generates iframes in a slightly different way:

Iframes     Seconds
1           0
49          2
97          4
145         6
193         8
241         10
289         12
337         14
385         16
433         18
481         20
519         21.58333333
529         22
577         24
625         26
673         28
721         30
769         32
817         34
865         36
913         38
931         38.75
941         39.16666667
961         40
1008        42
1056        44
1104        46
1152        48
1200        50
1248        52
1296        54
1305        54.375
1344        56
1367        56.95833333
1392        58
1430        59.58333333
1440        60
1475        61.45833333
1488        62
1536        64
1544        64.33333333
1584        66
1591        66.29166667
1632        68
1680        70
1728        72
1765        73.54166667
1776        74
1811        75.45833333
1824        75.95833333
1853        77.16666667
1872        77.95833333
1896        78.95833333
1920        79.95833333
1939        80.75
1968        81.95833333

As you can see it places iframes every 2 seconds AND on scenecut (seconds with floating part) which is important for video stream complexity in my opinion.

Generated file sizes are pretty much the same. Strangely, even with more keyframes, Method 3 sometimes produces a smaller file than the standard x264 algorithm.

For generating multiple-bitrate files for an HLS stream we chose Method 3. The chunks align perfectly at 2 seconds, each has an iframe at the beginning, and there are additional iframes on complex scenes, which gives a better experience for users who have outdated devices and cannot play back x264 high profiles.

Hope it helps someone.

Solution 4

I wanted to add some info here since my googling pulled up this discussion quite a bit in my quest to find info on trying to find a way to segment my DASH encoding the way I wanted, and none of the info I found was totally correct.

First several misconceptions to get rid of:

Not all I-frames are the same. There's big "I" frames and little "i" frames. Or to use correct terminology, IDR I-Frames and non-IDR I-Frames. IDR I-frames (sometimes called "keyframes") will create a new GOP. The non-IDR frames will not. They are handy to have inside of a GOP where there is a scene change.
-x264opts keyint=GOPSIZE:min-keyint=GOPSIZE ← This does not do what you think it does. This took me a little while to figure out. It turns out the min-keyint is limited in the code. It is not allowed to be greater than (keyint / 2) + 1. So assigning the same value to these two variables results in the value for min-keyint getting knocked down by half when encoding.

Here's the thing: scene-cut is really great, especially in video that has fast hard cuts. It keeps it nice and crisp, so I don't want to disable it, but at the same time I couldn't get a fixed GOP size as long as it was enabled. I wanted to enable scene-cut, but to only have it use non-IDR I-frames. But it wasn't working. Until I figured out (from lots of reading) about misconception #2.

It turns out I needed to set keyint to double my desired GOP size. This means that min-keyint can be set to my desired GOP size (without the internal code cutting it in half), which prevents scene-cut detection from using IDR I-frames inside the GOP size because the frame count since the last IDR I-Frame is always less than min-keyint.

And finally setting the force_key_frame option overrides the double size keyint. So here's what works:

I prefer segments in 2 second chunks, so my GOPSIZE = Framerate * 2

ffmpeg <other_options> -force_key_frames "expr:eq(mod(n,<GOPSIZE>),0)" -x264opts rc-lookahead=<GOPSIZE>:keyint=<GOPSIZE * 2>:min-keyint=<GOPSIZE> <other_options>
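
A minimal Python sketch of filling in the placeholders in that command is shown below. The helper name `fixed_gop_x264_args` is hypothetical, and it assumes an integer frame rate; the option strings are the ones from the command above.

```python
def fixed_gop_x264_args(fps: int, segment_seconds: int = 2) -> list[str]:
    """Build the keyframe options from the command above.

    GOPSIZE = fps * segment_seconds; keyint is doubled so that x264 does
    not clamp min-keyint below GOPSIZE. Hypothetical helper.
    """
    gop = fps * segment_seconds
    return [
        "-force_key_frames", f"expr:eq(mod(n,{gop}),0)",
        "-x264opts", f"rc-lookahead={gop}:keyint={gop * 2}:min-keyint={gop}",
    ]


# 30 fps with 2-second segments gives GOPSIZE = 60.
print(fixed_gop_x264_args(30))
```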

You can verify using ffprobe:

ffprobe <SRC_FILE> -select_streams v -show_frames -of csv -show_entries frame=coded_picture_number,key_frame,pict_type > frames.csv

In the generated CSV file each line will tell you: frame, [is_an_IDR_?], [frame_type], [frame_number]:

frame,1,I,60  <-- frame 60, is I frame, 1 means is an IDR I-frame (aka KeyFrame)
frame,0,I,71  <-- frame 71, is I frame, 0 means not an IDR I_frame

The result is that you should only see IDR I-Frames at fixed GOPSIZE intervals, while all other I frames are non-IDR I-frames inserted as needed by scene-cut detection.
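
Checking a long frames.csv by eye gets tedious, so here is a hedged Python sketch of the same check. It assumes the column order shown in the example lines above (`frame`, key_frame flag, picture type, frame number); other ffprobe versions may order the columns differently, so adjust the indices if needed. The function name `check_idr_cadence` is made up for illustration.

```python
import csv
import io


def check_idr_cadence(csv_text: str, gop_size: int) -> list[str]:
    """Return a list of problems; an empty list means the cadence looks right.

    Assumes rows shaped like: frame,<key_frame>,<pict_type>,<frame_number>
    """
    problems = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 4 or row[0] != "frame":
            continue  # skip blank or unrelated lines
        is_idr = row[1] == "1"
        pict_type = row[2]
        number = int(row[3])
        on_boundary = number % gop_size == 0
        if on_boundary and not is_idr:
            problems.append(f"frame {number}: expected IDR, got {pict_type} (key_frame=0)")
        elif is_idr and not on_boundary:
            problems.append(f"frame {number}: unexpected IDR inside the GOP")
    return problems


sample = "frame,1,I,0\nframe,0,P,1\nframe,0,I,30\nframe,1,I,60\n"
print(check_idr_cadence(sample, 60))
```

In practice you would pass the contents of frames.csv, for example `check_idr_cadence(open("frames.csv").read(), 60)`.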

Solution 5

Twitch has a post about this. They explain that they decided to use their own program for several reasons; one of them was that ffmpeg doesn't let you run different x264 instances in different threads, but instead devotes all specified threads to one frame in one output before moving on to the next output.

If you aren't doing real-time streaming, you have more luxury. The 'correct' way is probably to encode at one resolution with just the GOP size specified with -g, and then encode the other resolutions forcing keyframes at the same places.

If you wanted to do that, you might use ffprobe to get the keyframe times and then use a shell script or an actual programming language to convert that into an ffmpeg command.
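
As a sketch of that last conversion step in Python: given a list of keyframe times in seconds (however you extracted them from ffprobe), the helper below turns them into a `-force_key_frames` argument using the option's comma-separated time list form. The helper name is hypothetical.

```python
def force_key_frames_arg(keyframe_times: list[float]) -> list[str]:
    """Turn keyframe times (in seconds) into a -force_key_frames argument pair."""
    # Deduplicate and sort so the list is monotonically increasing.
    times = ",".join(f"{t:.6f}" for t in sorted(set(keyframe_times)))
    return ["-force_key_frames", times]


# Times taken from the reference encode, reused for the other resolutions.
print(force_key_frames_arg([0.0, 4.5, 2.0]))
```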

But for most content, there's very little difference between having one keyframe every 5 seconds and two keyframes every 5 seconds (one forced and one from scenecut). This is about the average I-frame size vs the size of P-frames and B-frames. If you use x264 with typical settings (the only reason I think you should do anything to affect these is if you set -qmin, as a poor way of preventing x264 from using bitrate on easy content; this limits all frame types to the same value, I think) and get a result like I-frame average size of 46 kB, P-frame 24 kB, B-frame 17 kB (half as frequent as P-frames), then an extra I-frame every second at 30 fps is only a 3% increase in file size. The difference between h264 and h263 might be made up of a bunch of 3% decreases, but a single one isn't very important.
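
The 3% figure can be reproduced with a quick back-of-the-envelope calculation. This Python sketch makes two simplifying assumptions that are mine, not part of the numbers above: the extra I-frame replaces one P-frame per second, and the baseline ignores the regular keyframes.

```python
def extra_iframe_overhead(i_kb: float, p_kb: float, b_kb: float, fps: int, p_per_b: int = 2) -> float:
    """Relative size increase from one extra I-frame per second.

    Assumes P-frames are p_per_b times as frequent as B-frames and that the
    extra I-frame takes the place of a P-frame.
    """
    b_count = fps / (p_per_b + 1)
    p_count = fps - b_count
    baseline_kb = p_count * p_kb + b_count * b_kb  # per second, without the extra I-frame
    extra_kb = i_kb - p_kb
    return extra_kb / baseline_kb


overhead = extra_iframe_overhead(i_kb=46, p_kb=24, b_kb=17, fps=30)
print(f"{overhead:.1%}")  # a little over 3%
```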

On other types of content, frame sizes will be different. To be fair, this is about temporal complexity and not spatial complexity, so it isn't just easy content vs hard content. But generally, streaming video sites have a bitrate limit, and content with relatively large I-frames is easy content that will be encoded at high quality no matter how many extra keyframes are added. It's wasteful, but this waste will usually not be noticed. The most wasteful case is probably a video that's just a static image accompanying a song, where each keyframe is exactly the same.

One thing I'm not sure of is how forced keyframes interact with the rate limiter set with -maxrate and -bufsize. I think even YouTube has had recent problems correctly configuring buffer settings to give consistent quality. If you're just using average bitrate settings as can be seen by some sites (since you can inspect x264's options in the header/mov atom? with a hex editor) then the buffer model isn't a problem, but if you're serving user-generated content, average bitrate encourages users to add a black screen at the end of their video.

Ffmpeg's -g option, or any other encoder option that you use, is mapped to the encoder-specific option. So '-x264-params keyint=GOPSIZE' is equivalent to '-g GOPSIZE'.

One problem with using scene detection is if you prefer keyframes near specific numbers for whatever reason. If you specify keyframes every 5 seconds and use scene detection, and there's a scene change at 4.5, then it should be detected, but then the next keyframe will be at 9.5. If the time keeps getting stepped up like this, you could end up with keyframes at 42.5, 47.5, 52.5, etc., instead of 40, 45, 50, 55. Conversely, if there's a scene change at 5.5, then there will be a keyframe at 5 and 5.5 will be too early for another one. Ffmpeg doesn't let you specify "make a keyframe here if there's no scene change within the next 30 frames". Someone who understands C could add that option, though.
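
The drift described above can be illustrated with a toy model in Python. This is not how x264 actually decides; it is a simplified simulation under stated assumptions: a keyframe is placed at the first scene cut that is at least `min_gap` seconds after the previous keyframe, otherwise at `interval` seconds after it. All names are invented for the example.

```python
def simulate_keyframes(duration: float, interval: float,
                       scene_cuts: list[float], min_gap: float) -> list[float]:
    """Toy model: the keyframe timer restarts at every keyframe, scene cut or not."""
    times = [0.0]
    while True:
        last = times[-1]
        deadline = last + interval
        # First scene cut that is late enough to be allowed and early enough to matter.
        cut = next(
            (c for c in sorted(scene_cuts)
             if c > last and last + min_gap <= c < deadline),
            None,
        )
        t = deadline if cut is None else cut
        if t >= duration:
            return times
        times.append(t)


# A scene change at 4.5 s shifts every later keyframe by half a second.
print(simulate_keyframes(20, 5, [4.5], 1))
# A scene change at 5.5 s is too close to the keyframe at 5 s and is ignored.
print(simulate_keyframes(12, 5, [5.5], 1))
```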

For variable-frame-rate video, when you're not live-streaming like Twitch, you should be able to use scene changes without converting permanently to constant frame-rate. If you use the 'select' filter in ffmpeg and use the 'scene' constant in the expression, then the debug output (-v debug or press '+' several times while encoding) shows the scene change number. This is probably different from, and not as useful as, the number used by x264, but it could still be useful.

The procedure, then, would probably be to do a test video that's only for keyframe changes, but maybe could be used for rate control data if using 2-pass. (Not sure if the generated data is at all useful for different resolutions and settings; the macroblock-tree data won't be.) Convert it to constant-framerate video, but see this bug about stuttering output when halving framerate if you ever decide to use the fps filter for other purposes. Run it through x264 with your desired keyframe and GOP settings.

Then just use these keyframe times with the original variable frame-rate video.

If you allow completely crazy user-generated content with a 20-second gap between frames, then for the variable frame-rate encode, you could split the output, use fps filter, somehow use select filter (maybe build a really long expression that has every keyframe time)... or maybe you could use the test video as input and either decode only keyframes, if that ffmpeg option works, or use the select filter to select keyframes. Then scale it to the correct size (there's even a scale2ref filter for this) and overlay the original video on it. Then use the interleave filter to combine these destined-to-be forced keyframes with the original video. If this results in two frames that are 0.001 sec apart that the interleave filter doesn't prevent, then address this problem yourself with another select filter. Dealing with frame buffer limits for the interleave filter could be the main problem here. These could all work: use some kind of filter to buffer the denser stream (fifo filter?); refer to the input file multiple times so it's decoded more than once and frames don't have to be stored; use the 'streamselect' filter, which I have never done, at exactly the times of the keyframes; improve the interleave filter by changing its default behaviour or adding an option to output the oldest frame in a buffer instead of dropping a frame.


Mark Gerolimatos

Currently working on streaming video distribution. Developing on Android systems, both native/C++ and API-based/Java code; on OS-X/iOS in C++ and Tcl-C (strangely called "Objective" C); on Linux in Scala/Java, BASH, PERL and Python.

Updated on September 18, 2022

Comments

Mark Gerolimatos over 1 year
When conditioning a stream for DASH playback, random access points must be at the exact same source stream time in all streams. The usual way to do this is to force a fixed frame rate and fixed GOP length (i.e. a keyframe every N frames).

In FFmpeg, fixed frame rate is easy (-r NUMBER).

But for fixed keyframe locations (GOP length), there are three methods...which one is "correct"? The FFmpeg documentation is frustratingly vague on this.

Method 1: messing with libx264's arguments
```
-c:v libx264 -x264opts keyint=GOPSIZE:min-keyint=GOPSIZE:scenecut=-1
```
There seems to be some debate if scenecut should be turned off or not, as it is unclear if the keyframe "counter" is restarted when a scene cut happens.

Method 2: setting a fixed GOP size:
```
-g GOP_LEN_IN_FRAMES
```
This is unfortunately only documented in passing in the FFMPEG documentation, and thus the effect of this argument is very unclear.

Method 3: insert a keyframe every N seconds (Maybe?):
```
-force_key_frames expr:gte(t,n_forced*GOP_LEN_IN_SECONDS)
```
This is explicitly documented. But it is still not immediately clear if the "time counter" restarts after every key frame. For instance, in an expected 5-second GOP, if there is a scenecut keyframe injected 3 seconds in by libx264, would the next keyframe be 5 seconds later or 2 seconds later?

In fact, the FFmpeg documentation differentiates between this and the -g option, but it doesn't really say how these two options above are the least bit different (obviously, -g is going to require a fixed frame rate).

Which is right?

It would seem that the -force_key_frames would be superior, as it would not require a fixed frame rate. However, this requires that
- it conforms to GOP specifications in H.264 (if any)
- it GUARANTEES that there would be a keyframe in fixed cadence, irrespective of libx264 scenecut keyframes.
It would also seem that -g could not work without forcing a fixed frame rate (-r), as there is no guarantee that multiple runs of ffmpeg with different codec arguments would provide the same instantaneous frame rate in each resolution. Fixed frame rates may reduce compression performance (IMPORTANT in a DASH scenario!).

Finally, the keyint method just seems like a hack. I hope against hope that this isn't the correct answer.

References:

An example using the -force_key_frames method

An example using the keyint method

FFmpeg advanced video options section
Mark Gerolimatos about 9 years

Thank you! This is great feedback. One question I have is how you generated that awesome table. I could totally use something like that.
Mark Gerolimatos about 9 years

(There appears to be no way to write you directly) Can you please point me towards links to any threads in this ITU-T discussion? Thanks!
slhck about 9 years

I just made that in Excel, pasting the output I got from three runs of ffprobe -i input.mp4 -select_streams v -show_frames -of csv -show_entries frame=pict_type, then coloring the cells. I'm afraid there are no public discussions, but I'll see if I can dig up some of the links I found back then.
Mark Gerolimatos about 9 years

Could you please re-try your experiment with the -force_key_frames expr:gte(t,n_forced*GOP_LEN_IN_SECONDS) form? I just tried it and found that while there were extra I frames in the stream, it DID seem to abide by my rule. A PERL program will follow as an "answer", as you cannot apparently use markup in comments.
slhck about 9 years

Interesting. I believe it's worth a separate "real" answer if you found out that it works. (Stack Exchange sites aren't really good for this discussion-style reply.) The last time I checked, -force_key_frames didn't work for me, and so I never tried it again. That was more than a year ago. Perhaps it was a bug. I'll try again soon.
slhck about 9 years

Just a note: Since this is a Q&A site and not really a discussion forum where posts are ordered chronologically, it's best to put all the information into one answer, so that people looking for a solution just have to read one post and not to look at who posted what, when :) I merged your answers and gave you a +1 on this, too. Since cross posting is not allowed, I'd suggest you delete your question on the Video site. People will find the answer(s) here.
schieferstapel over 7 years

@slhck: Could you give more details please? I've looked in the mailing list archives in May 2015 but couldn't find anything. The bottom line would be to forget about "Method 3" and stick to "Method 1".
Gyan about 7 years

@MarkGerolimatos : about -g, you say, "It neither appears to work, ... nor does it appear to be used in the code.". I checked and the input of g is stored in avctx->gop_size and that libx264 makes use of it: x4->params.i_keyint_max = avctx->gop_size;. When I probe this generated test file: ffmpeg -i a-test-file.mp4 -g 37 -t 15 gtest.mp4, I get keyframes at exactly 0,37,74,111,148,185,222,259,296,333,370. A GOP could be cut short if scene change is triggered, and for that -sc_threshold could be set, which is also picked up by x264.
Mark Gerolimatos almost 7 years

That was fantastic! It was also highly counterintuitive, thank you for putting in the effort. And to summarize, I assume your definition of "I-frames" and "i-frames" is conceptual (that is, not explicitly configurable in libx264), and that the "max * 2" was the way you enforced it?
Reuben almost 7 years

Yes that was conceptual, although I've seen people use "I" vs "i" to distinguish between IDR and non-IDR I-frames. And yes, setting keyint to the desired gop size * 2 is a way to force all I frames inside the gop to be non-IDR I-frames. Then the ffmpeg -force_key_frames overrides keyint in the x264opts. Basically it's a really backwards way to get the desired outcome that would be possible if the x264 code allowed you to set min-keyint and keyint to the same value.
Reuben almost 7 years

... while also being able to both keep the scene-cut detection turned on and get fixed GOP size.
Mark Gerolimatos almost 7 years

thanks again for your awesome work! Sounds like we need a less "backwards" way of effecting it
Alexander Svetkin about 6 years

Is rc-lookahead necessary here? It affects mbtree and VBV, but does it affect i-frame generation?
Reuben about 6 years

Setting rc-lookahead explicitly prevents wasting resources. If it's not set, the mbtree and VBV will be analyzing twice as many frames as they need to since they would use the value of keyint if rc-lookahead is not set. (they use the lesser of the two arguments)
mivk almost 5 years

Method 2 -g $gopsize seems to work fine for me, using ffmpeg v. 4 and libx264. It sets a keyframe at least every $gopsize frames, and sometimes at a shorter interval, probably because of a scene change. I also could not find any reference of this switch being deprecated. So at least for x264, as of 2019, I will be using that, which is short, simple and seems to do exactly what I want.
Low power over 4 years

I actually got SIGSEGV in libvpx.so.3 when trying to use -force_key_frames on VP9 encoding; but -g works.
momt99 over 3 years

It seems that x264opts is deprecated and replaced with x264-params.
Ryan H. over 3 years

It's 2021 and -g still works. Where is it documented that it's deprecated?