Generate a single MPEG-DASH segment with ffmpeg


Solution 1

I know it's a relatively old question, but I think I managed to implement the solution you're describing. To summarize, the idea is to provide a DASH manifest to the client up front, but only convert each segment when the client asks for it.
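
For reference, the manifest served up front can be a static MPD using SegmentTemplate, so the player computes segment URLs on its own. A minimal sketch; every duration, codec string, and file name below is a placeholder:

    <?xml version="1.0" encoding="UTF-8"?>
    <MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="static"
         profiles="urn:mpeg:dash:profile:isoff-live:2011"
         mediaPresentationDuration="PT10M" minBufferTime="PT2S">
      <Period>
        <AdaptationSet mimeType="video/mp4" codecs="avc1.640028">
          <SegmentTemplate timescale="1000" duration="10000" startNumber="0"
                           initialization="init.mp4" media="segment_$Number$.mp4"/>
          <Representation id="video" bandwidth="2000000" width="1280" height="720"/>
        </AdaptationSet>
      </Period>
    </MPD>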

The steps to achieve that were:

  • Convert a 10-second section of one stream of the original file using ffmpeg (or extract it directly if it is already H.264)
  • Repackage it using MP4Box so that MSE can consume it on the client side

The command for step 1 would look like this (for the 3rd segment of stream 0):

ffmpeg -y -ss 30 -t 11 -threads 8 -copyts -start_at_zero -i "/path/to/original.mp4" -map 0:1 -c copy /tmp/output_segment.mp4

"-ss 30" tells ffmpeg to start 30 seconds after the start of the file. "-t 11" keeps 11 seconds of the track after that (the overlap avoids gaps in the playback). "-copyts" keeps the timestamps as they are, so the extracted segmented would start at 30s, not 0. "-c copy" copies the original stream and would be replaced by something like "-g 30 -c:v libx264 -crf 22 -profile:v high -level 3.1" if it had to be transcoded.

The second command, to repackage the extracted stream, is:

MP4Box -dash 10000 -frag 500 -rap -single-file -segment-name segment_base_name_ -tfdt $TFDT_OFFSET /tmp/output_segment.mp4 -out /tmp/unused_ouput.mp4

The output file can be discarded, but the command also creates a file named segment_base_name_init.mp4, which is the actual segment you need. The -tfdt argument is the most important one here, as it offsets the segment properly in the timeline. To get the right value, I use the following command (because keyframes are not exactly at the 10 s marks, the start of the segment may not be where we expect it to be):

ffprobe -print_format json -show_streams /tmp/output_segment.mp4

The right value is start_time * 1000 (-tfdt uses milliseconds).
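
If you want to automate this step, here is a minimal Node.js sketch (assuming ffprobe is on the PATH; the helper name tfdtOffsetMs is made up for this example):

    import { execFile } from 'child_process';
    import { promisify } from 'util';

    const execFileAsync = promisify(execFile);

    // Computes the -tfdt value (in milliseconds) for a segment
    // extracted with -copyts, from the stream's start_time.
    async function tfdtOffsetMs(segmentPath: string): Promise<number> {
        const { stdout } = await execFileAsync('ffprobe', [
            '-v', 'quiet',
            '-print_format', 'json',
            '-show_streams',
            segmentPath,
        ]);
        const info = JSON.parse(stdout);
        // ffprobe reports start_time in seconds; -tfdt expects milliseconds.
        return Math.round(parseFloat(info.streams[0].start_time) * 1000);
    }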

I hope this helps; it took me a while to make it work, and I stumbled upon this question after MP4Box suddenly stopped working following its last update. Also note that you can achieve the same with VP9 and Vorbis; in that case you don't need to repackage the streams.

EDIT

For anyone interested in this: there are some issues with the method I described above, since MP4Box no longer properly updates the tfdt records (since version 1.0, I believe).

When creating a segment independently of the others, the segment has to be compliant with the DASH standard (MP4Box took care of this in the previous solution, but FFmpeg can do it too, using -f dash for the output). The options also have to ensure that segment boundaries are aligned with RAPs (or SAPs or I-frames, I think). The command looks like this:

ffmpeg -y -ss 390 -to 400 -threads 6 -copyts -start_at_zero -noaccurate_seek -i input.mkv -map 0:1 -c copy -movflags frag_keyframe -single_file_name segment_39.mp4 -global_sidx 1 -min_frag_duration 500 -f dash unused.mpd

Then the problem is to ensure that each segment is placed properly in the timeline by MSE. In a fragmented MP4 file, there are three locations that influence the position in the timeline (see the inspection command after this list):

  • in the moov box (general information about the video), the elst box (in trak > edts) holds a list of edits. FFmpeg, when using -ss with -copyts, creates an empty edit before the video itself with the duration of -ss (in ms)
  • in the sidx box (an index used to locate segments), the earliest_presentation_time field also defines an offset, in the track timebase
  • in each moof box (the header of a fragment), the tfdt box in traf has a base_media_decode_time field, placing each fragment on the timeline, also in the track timebase
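
If you want to look at these boxes in an actual file, MP4Box can dump the full box structure as XML; for example (assuming a GPAC build, with -std sending the dump to stdout):

MP4Box -diso -std /tmp/output_segment.mp4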

The problem with FFmpeg is that it creates the first two correctly, but the tfdt times start from zero. Since I failed to find a way to change that in FFmpeg itself, I've written the simple functions below to correct it. Note that the code also neutralizes the first edit, since Firefox honors it but Chrome does not; that way the videos are compatible with both.

    import { readFile, writeFile } from 'fs/promises';

    // Rewrites a segment produced by FFmpeg so that the tfdt
    // base_media_decode_time of each fragment is offset by the sidx
    // earliest_presentation_time, and neutralizes the initial empty edit.
    async function adjustSegmentTimestamps(filename: string): Promise<void> {
        const buffer = await readFile(filename);

        // Zero the duration of the first elst entry (the empty edit created
        // by -ss with -copyts); Firefox honors it but Chrome ignores it.
        const moovOffset = seekBoxStart(buffer, 0, buffer.length, 'moov');
        if (moovOffset === -1) {
            throw new Error('Cannot find moov box');
        }
        const moovSize = buffer.readUInt32BE(moovOffset);
        const trakOffset = seekBoxStart(buffer, moovOffset + 8, moovSize - 8, 'trak');
        if (trakOffset === -1) {
            throw new Error('Cannot find trak box');
        }
        const trakSize = buffer.readUInt32BE(trakOffset);
        const edtsOffset = seekBoxStart(buffer, trakOffset + 8, trakSize - 8, 'edts');
        if (edtsOffset === -1) {
            throw new Error('Cannot find edts box');
        }
        const edtsSize = buffer.readUInt32BE(edtsOffset);
        const elstOffset = seekBoxStart(buffer, edtsOffset + 8, edtsSize - 8, 'elst');
        if (elstOffset === -1) {
            throw new Error('Cannot find elst box');
        }
        const numEntries = buffer.readUInt32BE(elstOffset + 12);
        if (numEntries === 2) {
            // The first of the two entries is the empty edit: zero its duration.
            buffer.writeUInt32BE(0, elstOffset + 16);
        }

        // Read earliest_presentation_time from the sidx box.
        let sidxOffset = seekBoxStart(buffer, 0, buffer.length, 'sidx');
        if (sidxOffset === -1) {
            throw new Error('Cannot find sidx box');
        }
        sidxOffset += 8;

        // Normalize to BigInt so it can be added to either tfdt variant.
        const sidxVersion = buffer.readUInt8(sidxOffset);
        const earliestPresentationTime = sidxVersion
            ? buffer.readBigUInt64BE(sidxOffset + 12)
            : BigInt(buffer.readUInt32BE(sidxOffset + 12));

        // Offset base_media_decode_time in the tfdt box of each moof.
        let moofOffset = 0;
        while (moofOffset < buffer.length) {
            moofOffset = seekBoxStart(buffer, moofOffset, buffer.length - moofOffset, 'moof');
            if (moofOffset === -1) {
                break; // no more moofs
            }
            const moofSize = buffer.readUInt32BE(moofOffset);

            const trafOffset = seekBoxStart(buffer, moofOffset + 8, moofSize - 8, 'traf');
            if (trafOffset === -1) {
                throw new Error('Cannot find traf box');
            }
            const trafSize = buffer.readUInt32BE(trafOffset);

            const tfdtOffset = seekBoxStart(buffer, trafOffset + 8, trafSize - 8, 'tfdt');
            if (tfdtOffset === -1) {
                throw new Error('Cannot find tfdt box');
            }

            const tfdtVersion = buffer.readUInt8(tfdtOffset + 8);
            if (tfdtVersion) {
                // Version 1: 64-bit base_media_decode_time.
                const current = buffer.readBigUInt64BE(tfdtOffset + 12);
                buffer.writeBigUInt64BE(current + earliestPresentationTime, tfdtOffset + 12);
            } else {
                // Version 0: 32-bit base_media_decode_time (may overflow for
                // large offsets; real code should promote the box to version 1).
                const current = buffer.readUInt32BE(tfdtOffset + 12);
                buffer.writeUInt32BE(current + Number(earliestPresentationTime), tfdtOffset + 12);
            }

            moofOffset += moofSize;
        }

        await writeFile(filename, buffer);
    }

    // Scans sibling boxes in buffer[start .. start + size) and returns the
    // offset of the first box of the given type, or -1 if none is found.
    function seekBoxStart(buffer: Buffer, start: number, size: number, box: string): number {
        let offset = start;
        while (offset - start < size && offset + 8 <= buffer.length) {
            const boxSize = buffer.readUInt32BE(offset);
            const boxType = buffer.toString('ascii', offset + 4, offset + 8);
            if (boxType === box) {
                return offset;
            }
            if (boxSize < 8) {
                break; // size 0/1 (to-end / 64-bit) not handled; avoid looping
            }
            offset += boxSize;
        }
        return -1;
    }
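
For completeness, a hypothetical call site might look like this (the segment path is a placeholder, and the top-level await assumes an ES module context):

    // Fix up a segment produced by the ffmpeg command above.
    await adjustSegmentTimestamps('/tmp/segment_39.mp4');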

Solution 2

It sounds like what you are describing is live streaming rather than VOD: live streams are continuous, usually real-time video feeds, while VOD is typically a video file served when the user requests it.

The usual way VOD is done in larger solutions is to segment the video first and then package it on demand into the required streaming protocol, usually HLS or DASH at this time. This allows an operator to minimise the number of different formats they need to maintain.

The emerging CMAF standard helps support this by using the same segment format for both HLS and DASH. If you search for 'CMAF' you will find many explanations of its history; the official page is here: https://www.iso.org/standard/71975.html

Open source tools exist to help you convert an MP4 file straight into DASH; MP4Box is one of the most common: https://github.com/gpac/gpac/wiki/DASH-Support-in-MP4Box
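
For instance, a basic conversion with MP4Box might look like this (the segment duration and file names are placeholders):

MP4Box -dash 4000 -rap -out manifest.mpd input.mp4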

The ffmpeg documentation also covers DASH output for VOD: https://www.ffmpeg.org/ffmpeg-formats.html#dash-2 including this example:

ffmpeg -re -i <input> -map 0 -map 0 -c:a libfdk_aac -c:v libx264 \
-b:v:0 800k -b:v:1 300k -s:v:1 320x170 -profile:v:1 baseline \
-profile:v:0 main -bf 1 -keyint_min 120 -g 120 -sc_threshold 0 \
-b_strategy 0 -ar:a:1 22050 -use_timeline 1 -use_template 1 \
-window_size 5 -adaptation_sets "id=0,streams=v id=1,streams=a" \
-f dash /path/to/out.mpd

If it is actually a live stream you are looking at, then the input is typically not an MP4 file but a stream in some format like HLS, RTMP, MPEG-TS etc.

Taking an input in this format and providing a live-profile DASH output is more complicated; generally a dedicated packager is used to do this. The open source Shaka Packager (https://github.com/google/shaka-packager) would be a good place to look, and it includes examples of producing live DASH output.

Assuming you want to allow the user to watch while the video file is being generated, one way to do this is to make the stream look like a live stream, i.e. a 'VOD to live' case.

You can use restreaming in ffmpeg to transcode and stream to UDP, and then feed that into a packager.
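
For example, a minimal restream to UDP might look like this (the address and encoder settings are placeholders):

ffmpeg -re -i input.mp4 -c:v libx264 -c:a aac -f mpegts udp://127.0.0.1:1234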

The ffmpeg documentation includes this note:

-re (input) Read input at native frame rate. Mainly used to simulate a grab device, or live input stream (e.g. when reading from a file). Should not be used with actual grab devices or live input streams (where it can cause packet loss). By default ffmpeg attempts to read the input(s) as fast as possible. This option will slow down the reading of the input(s) to the native frame rate of the input(s). It is useful for real-time output (e.g. live streaming).

This gives you a flow that looks like:

mp4 file -> ffmpeg -> packager -> live DASH stream -> client

Using a packager to do this means you don't have to worry about updating the manifest as new segments become available and older ones expire.

There is an example on the Wowza packager site (at the time of writing) which you could look at and experiment with, substituting your own files or using theirs; the output should work with any packager that can accept a UDP input stream: https://www.wowza.com/docs/how-to-restream-using-ffmpeg-with-wowza-streaming-engine


Comments

  • woubuc, almost 2 years ago:

    I've been trying to implement a Plex-like video player that transcodes an arbitrary video file on demand and plays it with MPEG-DASH on a webpage. I was able to implement the client-side player with the dash.js reference implementation, so it dynamically requests segments from the server (using SegmentTemplate in the mpd file).

    But I'm having some problems generating these chunks in real time. ffmpeg lets me set -ss and -t to define the boundaries of the segment I need, but the results don't play properly in the player because they're "full" video files rather than DASH segments.

    So how do I adjust my ffmpeg command to transcode just the part I need, as a DASH segment, without having to generate the segments for the entire video file in advance?

    The input video file can be in any format, so it cannot be assumed to use an MP4/DASH-compatible codec; transcoding (with ffmpeg or a similar tool) is required.

    My current ffmpeg command looks like this (after lots of trying):

    ffmpeg -ss 10 -t 5 -i video.mkv -f mp4 -c:a aac -c:v h264 -copyts -movflags empty_moov+frag_keyframe temp/segment.mp4
    

    The client-side player should be able to buffer the next X segments, and the user should be able to view the current position on the duration bar and seek to a different position. So treating it as a live stream isn't an option.

  • woubuc, almost 5 years ago:
    I'm doing VOD for sure, not live streaming, but a more "personal" variant: not like Netflix but rather like Plex, where the user provides their own media files in whatever format and codecs they like, to be transcoded and repackaged in real time (rather than in advance) as the user watches the video. So my question is not "how do I make a complete set of files for DASH", because I've gotten that far; I want to know how to transcode and package a single segment with a given start and duration, for an arbitrary file. MP4Box and ffmpeg only do the entire DASH stream, as far as I can tell.
  • Mick, almost 5 years ago:
    Interesting and challenging use case! I've updated the answer with some more notes.
  • woubuc, almost 5 years ago:
    Thanks for the ideas! Unfortunately I can't treat the video as a live stream, since the user must be able to see the current position and seek to a different position within the native <video> element. That isn't possible with a live stream; I had the same idea and tried it, but a live stream doesn't let the client buffer (to ride out periodic network drops) and doesn't let the viewer seek to a different position in the video file. I've added this info to my original question as well.
  • Diericx, over 3 years ago:
    I'm doing something similar. @Mick did you ever figure this out?
  • Mick, over 3 years ago:
    @Diericx, it would probably make sense for you to ask a new question and include as much detail as you can around your use case.