FFmpeg and HLS multiple audio renditions


I've found the solution myself.

The problem is that the segment muxer needs to look at some reference frames to be able to slice the audio correctly, so mapping the streams separately won't work.

What does work is to produce a "beefy" .ts file that includes all the audio and video streams, and then slice it up appropriately. A simple-yet-working example:

ffmpeg-3.1.1 -i dual_short.mp4 -i audio_left_short.mp3 -i audio_right_short.mp3 \
    -threads 0 -muxdelay 0 -y \
    -map 0:v -map 1 -map 2 -pix_fmt yuv420p -movflags +faststart -r 29.97 -g 60 -refs 1 \
    -vcodec libx264 -acodec aac -profile:v baseline -level 30 -ar 44100 -ab 64k -f mpegts out.ts
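
Not part of the original commands, but before slicing it can help to verify with ffprobe that out.ts really ended up with one video stream plus the two audio streams:

ffprobe -v error -show_entries stream=index,codec_type,codec_name -of csv out.ts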

# Perform 3 passes:
# 1. Generate the video.
ffmpeg-3.1.1 -i out.ts -threads 0 -muxdelay 0 -y -map 0:v   -vcodec copy  -f hls -hls_time 10 -hls_list_size 0 video/index.m3u8
# 2. Generate Audio 1.
ffmpeg-3.1.1 -i out.ts -threads 0 -muxdelay 0 -y -map 0:a:0 -codec copy -f segment -segment_time 10 -segment_list_size 0 -segment_list audio1/audio1.m3u8 -segment_format mpegts audio1/audio1_%d.aac
# 3. Generate Audio 2.
ffmpeg-3.1.1 -i out.ts -threads 0 -muxdelay 0 -y -map 0:a:1 -codec copy -f segment -segment_time 10 -segment_list_size 0 -segment_list audio2/audio2.m3u8 -segment_format mpegts audio2/audio2_%d.aac
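
ffmpeg 3.1.1 does not write the master playlist that ties these three media playlists together (at least not with alternate audio groups), so that part is written by hand. A minimal hand-written sketch is below; the GROUP-ID, NAME, BANDWIDTH and CODECS values are illustrative placeholders and should be adapted to the actual output:

#EXTM3U
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="aud",NAME="Left",DEFAULT=YES,URI="audio1/audio1.m3u8"
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="aud",NAME="Right",DEFAULT=NO,URI="audio2/audio2.m3u8"
#EXT-X-STREAM-INF:BANDWIDTH=800000,CODECS="avc1.42001e,mp4a.40.2",AUDIO="aud"
video/index.m3u8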

Comments

  • Alfredo Di Napoli
    Alfredo Di Napoli over 1 year

    I'm trying to use FFmpeg to produce an HLS playlist which contains multiple audio renditions, but I cannot get the audio & video tracks to sync together. Here is the scenario:

    • Suppose I have 2 video files, each with 1 audio track
    • I use FFmpeg to pan the 2 videos together to form a single video (a rough sketch of the commands is shown after this list), for example:

    [screenshot of the resulting combined video]

    • I extract the audio track of each file and transcode it to .mp3

    • I want to produce an HLS playlist where the alternative audio tracks are respectively the left & the right audio:

    [screenshot of a player offering the left and right audio as alternative tracks]
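
    For reference, the combining/extraction step looks roughly like this (the hstack filter and the file names are only an illustration of my setup, not my exact commands):

    # Illustration only: place the two source videos side by side (assumes equal heights)
    ffmpeg -i left.mp4 -i right.mp4 \
    -filter_complex "[0:v][1:v]hstack=inputs=2[v]" \
    -map "[v]" -vcodec libx264 -an dual.mp4
    # ...and extract each source's audio track to .mp3
    ffmpeg -i left.mp4 -vn -acodec libmp3lame audio_left.mp3
    ffmpeg -i right.mp4 -vn -acodec libmp3lame audio_right.mp3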

    The problem I'm having is that I cannot make the audio sync with the video properly. I have tried a couple of ffmpeg commands, each naive in a different way, and in the best case I get a synced stream on the desktop; but on mobile, where playback is handled by the device's native player, the video loses sync with the audio very quickly as soon as I switch to the other audio track.

    I'm using ffmpeg 3.1.1.

    Here are some example commands I have tried, starting from a relatively simple one, where I map the audio tracks to the segment muxer and the video to the hls muxer:

    ffmpeg -i dual.mp4 -i audio_left.mp3 -i audio_right.mp3 \
    -threads 0 -muxdelay 0 -y \
    -map 0 -pix_fmt yuv420p -vsync 1 -async 1 -vcodec libx264 -r 29.97 -g 60 -refs 3 -f hls -hls_time 10 -hls_list_size 0 video/index.m3u8 \
    -map 1 -acodec aac -strict experimental -async 1 -ar 44100 -ab 96k -f segment -segment_time 10 -segment_list_size 0 -segment_list_flags -cache -segment_format aac -segment_list audio1/audio1.m3u8 audio1/audio1%d.aac \
    -map 2 -acodec aac -strict experimental -async 1 -ar 44100 -ab 96k -f segment -segment_time 10 -segment_list_size 0 -segment_list_flags -cache -segment_format aac -segment_list audio2/audio2.m3u8 audio2/audio2%d.aac 
    

    To a more complex one, which outputs the raw mpegts container and then slices the tracks up:

    ffmpeg -i dual_short.mp4 -i audio_left_short.mp3 -i audio_right_short.mp3 \
    -threads 0 -muxdelay 0 -y \
    -map 0:v -map 1 -map 2 -codec copy -pix_fmt yuv420p -vsync 1 -async 1 -shortest -f mpegts pipe:1 | ffmpeg-3.1.1 -i pipe:0 \
    -map 0:0 -vcodec copy -r 29.97 -g 60 -refs 3 -bsf:v h264_mp4toannexb -f hls -hls_time 10 -hls_list_size 0 video/index.m3u8 \
    -map 0:1 -f ssegment -segment_time 10 -segment_list_size 0 -segment_format aac -segment_list audio1/audio1.m3u8 audio1/audio1_%d.aac \
    -map 0:2 -f ssegment -segment_time 10 -segment_list_size 0 -segment_format aac -segment_list audio2/audio2.m3u8 audio2/audio2_%d.aac
    

    I'm no audio/video expert, so I'm pretty sure there is something fundamentally flawed in my reasoning, and I'm asking for your help and guidance. In particular:

    • Is what I'm trying to do here even feasible? To put it another way: given N audio tracks recorded in sync with the original video, can I produce an HLS playlist whose audio always stays lip-synced?
    • Are the video's FPS and the audio bitrates the cause of the A/V sync problem? Is there even a correlation?
    • Does the quality level of the video (e.g. its bitrate) have an effect on sync?
    • Will the target audio format I chose (mp3 vs aac) influence the sync?
    • Shall I use a single command with multiple inputs, or work on each stream separately?

    As you can see, I'm quite lost. I have searched extensively on the internet and watched Apple's "Effective HLS" talk from WWDC 2012, but information on how to produce effective Multiple Audio Rendition playlists seems to be scarce.

    Thanks for any pointers.

  • tslater
    tslater almost 4 years
    I'm really glad I found this. I'm trying to figure out how the index.m3u8 lists the audio tracks in your example. Did you have to edit it manually to add the audio tracks?