FFMPEG audio out of sync when transcoding (demuxing) from DV
Solution 1
I've finally solved the issue. It's overkill, but it works.
I realized that if I copy the .dv to any other container, the audio and video are obviously out of sync. When I then cut that file to a 1-minute segment starting at the 51st minute (-ss 51:00 -t 60), it was obviously still out of sync.
However, when I used the same cut (-ss 51:00 -t 60) on the original .dv, it was in sync! So I ended up writing a script that cuts the .dv file into one 1-second segment per second and saves each segment to a separate file (yes, over 3600 files per .dv). No encoding, just stream copy to a new container (avi). Then I used -f concat to join the tiny files into one avi file, which was now in sync! Any gaps are inaudible! All that was left was encoding to H264 and AAC in MP4.
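The script itself was not posted. A minimal sketch of the approach described above might look like the following; the file names (input.dv, seg_NNNNN.avi, list.txt, joined.avi), the function names, and the RUN dry-run switch are all assumptions for illustration, and the tape length in seconds has to be supplied by hand.

```shell
#!/bin/sh
# Sketch only: names and layout are hypothetical, not the author's script.

# Print a concat-demuxer list for N one-second segments.
make_concat_list() {
    count=$1
    i=0
    while [ "$i" -lt "$count" ]; do
        printf "file 'seg_%05d.avi'\n" "$i"
        i=$((i + 1))
    done
}

# Cut input ($1) into $2 one-second, stream-copied AVI segments, then join them.
# Set RUN=echo to print the ffmpeg commands instead of running them.
split_and_join() {
    input=$1
    count=$2
    i=0
    while [ "$i" -lt "$count" ]; do
        # -ss before -i seeks in the original .dv; -c copy avoids re-encoding
        ${RUN:-} ffmpeg -nostdin -y -ss "$i" -t 1 -i "$input" \
            -c copy "$(printf 'seg_%05d.avi' "$i")"
        i=$((i + 1))
    done
    make_concat_list "$count" > list.txt
    ${RUN:-} ffmpeg -nostdin -y -f concat -i list.txt -c copy joined.avi
}
```

Calling split_and_join input.dv 3646 with RUN=echo set first prints every command without touching the tape file, which is a cheap way to check the plan before a multi-day run.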
I ran the script on my home server that was grinding the 50 .dv files for a couple of days, but now it's done!
THANK YOU ALL FOR YOUR HELP! I've learned a lot about ffmpeg and a/v in general.
Solution 2
Here are three wildcard attempts at solving this issue:
Method 1a Use system time as timestamps
ffmpeg -use_wallclock_as_timestamps 1 -i input.dv \
-c:v libx264 -b:v 4000k -c:a aac -b:a 128k -fflags +genpts method1.ts
Method 1b Use resampler with flag set to inject silence when input audio timestamps have gaps
ffmpeg -i input.dv -c:v libx264 -b:v 4000k \
-af "aresample=async=1:first_pts=0" -c:a aac -b:a 128k -fflags +genpts method1.ts
Method 2 Merge with dummy audio
ffmpeg -i input.dv -f lavfi -i "aevalsrc=0:c=2:s=48000" \
-filter_complex "[0:a][1:a]amerge[a]" -map 0:v -map "[a]" -c:v libx264 -b:v 4000k -c:a aac -b:a 128k -ac 2 -shortest method2.ts
Method 3 Combination of the above
ffmpeg -use_wallclock_as_timestamps 1 -i input.dv -f lavfi -use_wallclock_as_timestamps 1 -i "aevalsrc=0:c=2:s=48000" \
-filter_complex "[0:a][1:a]amerge[a]" -map 0:v -map "[a]" -c:v libx264 -b:v 4000k -c:a aac -b:a 128k -ac 2 -shortest method3.ts
You can test each of them for a short duration by inserting -t N, e.g. -t 20 for a 20-second test.
If any of them work, we can then proceed to wrapping the output as MP4.
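For reference, the MP4 wrapping step could be a plain remux of whichever .ts plays in sync. This is a sketch under assumptions: remux_cmd is a hypothetical helper that only prints the command, and method2.ts stands in for the file that worked.

```shell
# Print an ffmpeg command that remuxes a finished .ts into .mp4 without re-encoding.
# remux_cmd is a hypothetical helper; pass the name of whichever .ts played in sync.
remux_cmd() {
    ts=$1
    mp4="${ts%.ts}.mp4"
    # aac_adtstoasc converts ADTS-framed AAC (as carried in MPEG-TS) to the MP4 form
    echo "ffmpeg -i $ts -c copy -bsf:a aac_adtstoasc $mp4"
}

remux_cmd method2.ts
```

Recent ffmpeg builds may insert the bitstream filter automatically, so the explicit -bsf:a is mainly a safeguard for older versions.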
Wojciech
Updated on September 18, 2022
Comments
-
Wojciech over 1 year
I've been stuck with this problem for months. I have over 50 DV tapes (from an old Sony camcorder) to be converted to a more modern, usable format (most likely H264). I started off by pulling the files to my PC (via firewire) using DVGRAB. There I had two options: pulling RAW data from the dv tape, resulting in a muxed file, OR demuxing it and saving to a DVI file.
That's where the problems started. Saving it to a DVI file resulted in the audio being out of sync. I thought it's a problem with DVGRAB so I saved the RAW files (which are synced correctly) and wanted to process them with ffmpeg.
It turns out that no matter how I demux it the audio is always out of sync. BEFORE you say anything about the sampling frequency - the audio differences are of absolutely random length. An hour long tape can have between 0.1 and 4 seconds of audio lag at the end.
Here's an example file that I've split into separate audio and video files to check the differences.
# ffprobe -i ./video_conversion/13.dv
ffprobe version 2.8.4 Copyright (c) 2007-2015 the FFmpeg developers
  built with gcc 5.3.0 (GCC)
  configuration: --prefix=/usr --disable-debug --disable-static --disable-stripping --enable-avisynth --enable-avresample --enable-fontconfig --enable-gnutls --enable-gpl --enable-ladspa --enable-libass --enable-libbluray --enable-libdcadec --enable-libfreetype --enable-libfribidi --enable-libgsm --enable-libmodplug --enable-libmp3lame --enable-libopencore_amrnb --enable-libopencore_amrwb --enable-libopenjpeg --enable-libopus --enable-libpulse --enable-libschroedinger --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libv4l2 --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxvid --enable-shared --enable-version3 --enable-x11grab
  libavutil      54. 31.100 / 54. 31.100
  libavcodec     56. 60.100 / 56. 60.100
  libavformat    56. 40.101 / 56. 40.101
  libavdevice    56.  4.100 / 56.  4.100
  libavfilter     5. 40.101 /  5. 40.101
  libavresample   2.  1.  0 /  2.  1.  0
  libswscale      3.  1.101 /  3.  1.101
  libswresample   1.  2.101 /  1.  2.101
  libpostproc    53.  3.100 / 53.  3.100
[dv @ 0x864f2a0] Detected timecode is invalid
[dv @ 0x864f2a0] Estimating duration from bitrate, this may be inaccurate
Input #0, dv, from './video_conversion/13.dv':
  Duration: 01:00:45.80, start: 0.000000, bitrate: 28800 kb/s
    Stream #0:0: Video: dvvideo, yuv420p, 720x576 [SAR 16:15 DAR 4:3], 28800 kb/s, 25 fps, 25 tbr, 25 tbn, 25 tbc
    Stream #0:1: Audio: pcm_s16le, 48000 Hz, stereo, s16, 1536 kb/s

# ffprobe -i ./video_conversion/tmp/13.mp4
ffprobe version 2.8.4 Copyright (c) 2007-2015 the FFmpeg developers
  built with gcc 5.3.0 (GCC)
  configuration: --prefix=/usr --disable-debug --disable-static --disable-stripping --enable-avisynth --enable-avresample --enable-fontconfig --enable-gnutls --enable-gpl --enable-ladspa --enable-libass --enable-libbluray --enable-libdcadec --enable-libfreetype --enable-libfribidi --enable-libgsm --enable-libmodplug --enable-libmp3lame --enable-libopencore_amrnb --enable-libopencore_amrwb --enable-libopenjpeg --enable-libopus --enable-libpulse --enable-libschroedinger --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libv4l2 --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxvid --enable-shared --enable-version3 --enable-x11grab
  libavutil      54. 31.100 / 54. 31.100
  libavcodec     56. 60.100 / 56. 60.100
  libavformat    56. 40.101 / 56. 40.101
  libavdevice    56.  4.100 / 56.  4.100
  libavfilter     5. 40.101 /  5. 40.101
  libavresample   2.  1.  0 /  2.  1.  0
  libswscale      3.  1.101 /  3.  1.101
  libswresample   1.  2.101 /  1.  2.101
  libpostproc    53.  3.100 / 53.  3.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from './video_conversion/tmp/13.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf56.40.101
  Duration: 01:00:45.80, start: 0.000000, bitrate: 5685 kb/s
    Stream #0:0(und): Video: h264 (Main) (avc1 / 0x31637661), yuv420p, 720x576 [SAR 16:15 DAR 4:3], 5683 kb/s, 25 fps, 25 tbr, 12800 tbn, 50 tbc (default)
    Metadata:
      handler_name    : VideoHandler

# ffprobe -i ./video_conversion/tmp/13.mp3
ffprobe version 2.8.4 Copyright (c) 2007-2015 the FFmpeg developers
  built with gcc 5.3.0 (GCC)
  configuration: --prefix=/usr --disable-debug --disable-static --disable-stripping --enable-avisynth --enable-avresample --enable-fontconfig --enable-gnutls --enable-gpl --enable-ladspa --enable-libass --enable-libbluray --enable-libdcadec --enable-libfreetype --enable-libfribidi --enable-libgsm --enable-libmodplug --enable-libmp3lame --enable-libopencore_amrnb --enable-libopencore_amrwb --enable-libopenjpeg --enable-libopus --enable-libpulse --enable-libschroedinger --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libv4l2 --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxvid --enable-shared --enable-version3 --enable-x11grab
  libavutil      54. 31.100 / 54. 31.100
  libavcodec     56. 60.100 / 56. 60.100
  libavformat    56. 40.101 / 56. 40.101
  libavdevice    56.  4.100 / 56.  4.100
  libavfilter     5. 40.101 /  5. 40.101
  libavresample   2.  1.  0 /  2.  1.  0
  libswscale      3.  1.101 /  3.  1.101
  libswresample   1.  2.101 /  1.  2.101
  libpostproc    53.  3.100 / 53.  3.100
[mp3 @ 0x954c2a0] Skipping 0 bytes of junk at 237.
Input #0, mp3, from './video_conversion/tmp/13.mp3':
  Metadata:
    encoder         : Lavf56.40.101
  Duration: 01:00:44.35, start: 0.023021, bitrate: 128 kb/s
    Stream #0:0: Audio: mp3, 48000 Hz, stereo, s16p, 128 kb/s
    Metadata:
      encoder         : Lavc56.60
This particular one differs by 1.448 seconds. As I said the differences vary greatly.
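The figure can be cross-checked from the two Duration values reported by ffprobe for 13.mp4 and 13.mp3. A small sketch, where to_seconds is a hypothetical helper name and the durations are copied by hand from the output:

```shell
# Convert an HH:MM:SS.ss duration, as printed by ffprobe, to seconds.
to_seconds() {
    echo "$1" | awk -F: '{ printf "%.2f\n", $1 * 3600 + $2 * 60 + $3 }'
}

video=$(to_seconds 01:00:45.80)   # Duration of 13.mp4
audio=$(to_seconds 01:00:44.35)   # Duration of 13.mp3

# Print both durations and their difference in seconds.
awk -v v="$video" -v a="$audio" \
    'BEGIN { printf "video %.2f s, audio %.2f s, difference %.2f s\n", v, a, v - a }'
```

Because ffprobe rounds the displayed durations to hundredths of a second, this gives roughly 1.45 s, in line with the 1.448 s quoted.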
As for a solution: I could just stretch the audio and combine it with the video (I've tested that), but I can't be certain that the audio will be in sync somewhere in the middle of the recording.
I think I've pinpointed the source of this behaviour. Whenever I turn the camera on or off (to start and stop recording), the video starts just a tiny bit sooner than the audio. So the more "fragments" there are on the tape, the more these differences add up.
How can I fix this? Is there a way to demux the audio and video with timestamps, so that after conversion they will add up correctly? Or is there any way to fill these gaps in the audio, so that both streams are the same length to begin with?
-
Gyan about 8 years
What's the command to demux the raw files?
-
Wojciech about 8 years
The raw .dv file is multiplexed by its nature. FFMPEG demuxes it by default when converting it to any container.
-
Gyan about 8 years
OK, rather, what's your conversion command? I forgot you're transcoding.
-
Wojciech about 8 years
I've tried a dozen combinations. Nothing special though:
avconv -f dv -i ./46raw.dv -f mp4 -acodec libvo_aacenc -b:a 256k -vcodec libx264 -b:v 4000k -y ./46raw.aac.mp4
-
Gyan about 8 years
avconv != ffmpeg. If it's just an offset issue, you can use -af adelay=1000|1000 where 1000 is the delay in ms.
-
Wojciech about 8 years
Typing error. I'm using ffmpeg on one machine and avconv on the other. Either way it doesn't work. If it were an offset delay I wouldn't ask this question -_- It's a difference in length of the audio track of about 0.1-4s on a 3600-3700s long video.
-
Gyan about 8 years
Add -copyts as an output flag and try. Check the sync via playback and not via the duration, as this flag will not pad the audio to equalize the duration. Also, unless you're using a version older than Dec '15, the internal AAC encoder is now stable and better than the VO encoder.
-
Wojciech about 8 years
Made two files, with and without -copyts. No difference. Still lagging. :( Any other ideas?
-
Gyan about 8 years
The raw files have good sync, right? How are you playing those?
-
Wojciech about 8 years
Obviously the raw files are good. If those were bad my question wouldn't make much sense. Played in mplayer they work fine. Any attempt to demux the audio and video streams, even a "copy", and putting them back into any container results in it being out of sync. The error gets bigger along the video length and reaches the 0.1-4s shift near the end.
-
Gyan about 8 years
Wrap the raw to an AVI and check:
ffmpeg -f dv -i ./46raw.dv -c copy -map 0 -y ./46raw.avi
-
Wojciech about 8 years
I wanted to answer right away, but for the sake of integrity I've checked it. Sorry... still out of sync. The problem lies in demuxing the streams.
-
Gyan about 8 years
Can you lop off the first, say, 10 seconds of the raw and share it? You'll have to use dd or something like it.
-
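A sketch of how such a dd cut could be sized, assuming the raw file is a plain sequence of fixed-size frames. The frame size is derived from the ffprobe figures in the question (28800 kb/s at 25 fps); dv_head_cmd, input.dv and sample.dv are hypothetical names, and the helper only prints the command.

```shell
# Print a dd command that copies the first N seconds of a raw PAL DV file.
# Assumes fixed-size frames: 28800 kb/s at 25 fps, per the ffprobe output.
dv_head_cmd() {
    seconds=$1
    fps=25
    frame_bytes=$((28800 * 1000 / 8 / fps))   # bytes per frame
    frames=$((seconds * fps))
    # One dd block per frame, so count is simply the number of frames
    echo "dd if=input.dv of=sample.dv bs=$frame_bytes count=$frames"
}

dv_head_cmd 10
```

Cutting on whole-frame boundaries keeps each frame's interleaved audio and video together, so the sample should still be readable as DV.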
Gyan about 8 years
Also, looks like the DV demuxer does not play well with missing or bad audio. Drop a line to @rhatr on twitter. He's one of the coders of the DV demuxer code.
-
Wojciech about 8 years
Well... Those are family videos belonging to my sister, so I don't really feel good about sharing them. Maybe I'll find a neutral fragment. Thanks for pointing me to one of the devs. I don't use twitter, but I guess I'll have to.
-
Gyan about 8 years
Did you make progress?
-
Wojciech about 8 years
I've contacted Roman (@rhatr) and sent him a sample of the video. He struggled with it for over a week, but to no avail :( I'm really grateful for the time he offered, but this means that the matter is complicated :/ I'll try to check if other video editing software can handle it.
-
Wojciech about 8 years
Option 2:
Simple filtergraph 'amerge' was expected to have exactly 1 input and 1 output. However, it had >1 input(s) and 1 output(s). Please adjust, or use a complex filtergraph (-filter_complex) instead.
Option 1 gives a lot of errors:
[aac @ 0x9160040] Queue input is backward in time
[mp4 @ 0x915e1c0] Non-monotonous DTS in output stream 0:1; previous: 70000289337917, current: 70000289337250; changing to 70000289337918. This may result in incorrect timestamps in the output file.
It stops after about 90MB, leaving an unplayable output file.
-
Gyan about 8 years
Now, try the 3 commands. Also, test playback with ffplay, i.e. ffplay method1.ts
-
Wojciech about 8 years
Options 1a and 3 produce 90MB and 20MB files respectively with little to no video. Options 1b and 2 produce the whole video, but do not help with regards to the delay :(
-
Gyan about 8 years
Doing this blindly is futile. Can you send a bit of the raw file, say, 20 seconds, or enough to observe loss of sync with your original command?
-
Gyan about 8 years
This is a good workaround but doesn't actually solve the sync issue since each DV to AVI wrapping is subject to the same error that you had when copying the whole .dv to .avi. What this workaround does is prevent the tiny discrepancies, if any, in each 1 second segment from cascading and accumulating since each second is a separate file. You'll still have a few of the AVIs where there's noticeable async, but those don't affect the remaining AVI segments. If you can, I'm still open to working on a short segment of the raw .dv to see if this can be accurately solved, and in one step.
-
Wojciech about 8 years
I am aware that the gaps are still there, but stretching the audio would be pretty much the same kind of solution. This is good enough for me. About the sample: there is little sense in sending a small sample, because the error is at most 3s in 1h and that's less than 0.1%. I can't send you a whole file since these are my sister's family videos (she wouldn't approve). If I manage to get a blank tape I could make a fresh sample for you to work with (filming a movie on a TV would give you a good sync reference).
-
Gyan about 8 years
My desired solution won't involve stretching audio. Raw DV doesn't have timestamps, but the audio is interleaved in sync, so my tinkering would be aimed at preserving that chronological relation. If you ever get the time, I'm ready to work with a sample.