How to split video or audio by silent parts

audio video ffmpeg media video-processing

17,445

You could first use ffmpeg to detect intervals of silence, like this:

ffmpeg -i "input.mov" -af silencedetect=noise=-30dB:d=0.5 -f null - 2> vol.txt

Because of the 2> redirect, ffmpeg's log output goes to vol.txt rather than the console. It contains readings that look like this:

[silencedetect @ 00000000004b02c0] silence_start: -0.0306667
[silencedetect @ 00000000004b02c0] silence_end: 1.42767 | silence_duration: 1.45833
[silencedetect @ 00000000004b02c0] silence_start: 2.21583
[silencedetect @ 00000000004b02c0] silence_end: 2.7585 | silence_duration: 0.542667
[silencedetect @ 00000000004b02c0] silence_start: 3.1315
[silencedetect @ 00000000004b02c0] silence_end: 5.21833 | silence_duration: 2.08683
[silencedetect @ 00000000004b02c0] silence_start: 5.3895
[silencedetect @ 00000000004b02c0] silence_end: 7.84883 | silence_duration: 2.45933
[silencedetect @ 00000000004b02c0] silence_start: 8.05117
[silencedetect @ 00000000004b02c0] silence_end: 10.0953 | silence_duration: 2.04417
[silencedetect @ 00000000004b02c0] silence_start: 10.4798
[silencedetect @ 00000000004b02c0] silence_end: 12.4387 | silence_duration: 1.95883
[silencedetect @ 00000000004b02c0] silence_start: 12.6837
[silencedetect @ 00000000004b02c0] silence_end: 14.5572 | silence_duration: 1.8735
[silencedetect @ 00000000004b02c0] silence_start: 14.9843
[silencedetect @ 00000000004b02c0] silence_end: 16.5165 | silence_duration: 1.53217

You then generate commands to split from each silence end to the next silence start. You will probably want to add handles of, say, 250 ms on each side, so each clip ends up 2 * 250 ms longer than the detected speech interval.

ffmpeg -ss <silence_end - 0.25> -t <next_silence_start - silence_end + 2 * 0.25> -i input.mov word-N.mov

(I have skipped specifying audio/video parameters)
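The arithmetic behind the placeholders above can be sketched in Python. This is only an illustration: the function names `segment_args` and `split_command` are made up for this example, and clamping the start at zero is an added assumption so that a handle never produces a negative seek time.

```python
def segment_args(silence_end, next_silence_start, handle=0.25):
    """Return (start, duration) in seconds for ffmpeg's -ss and -t.

    The clip runs from `handle` seconds before the silence ends to
    `handle` seconds after the next silence starts.
    """
    # Assumption: never seek before the start of the file.
    start = max(0.0, silence_end - handle)
    duration = (next_silence_start + handle) - start
    return start, duration


def split_command(src, index, silence_end, next_silence_start, handle=0.25):
    """Build the argument list for one clip (audio/video parameters omitted)."""
    start, duration = segment_args(silence_end, next_silence_start, handle)
    return [
        "ffmpeg",
        "-ss", f"{start:.3f}",
        "-t", f"{duration:.3f}",
        "-i", src,
        f"word-{index}.mov",
    ]
```

For the first speech interval in the log above, `segment_args(1.42767, 2.21583)` gives a start of about 1.178 s and a duration of about 1.288 s.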

You'll want to write a script to scrape the console log and generate a structured (maybe CSV) file with the timecodes - one pair on each line: silence_end and the next silence_start. And then another script to generate the commands with each pair of numbers.
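A minimal sketch of the scraping step in Python, assuming the log lines look like the sample above. The names `speech_intervals` and `write_pairs_csv` and the CSV column headers are hypothetical. Speech before the first silence or after the last one is not covered, because only a silence_end followed by a later silence_start forms a pair.

```python
import csv
import re

# Match the numeric value after each silencedetect keyword.
START_RE = re.compile(r"silence_start:\s*([-+0-9.eE]+)")
END_RE = re.compile(r"silence_end:\s*([-+0-9.eE]+)")


def speech_intervals(log_lines):
    """Return a list of (silence_end, next_silence_start) pairs."""
    pairs = []
    last_end = None  # most recent silence_end not yet paired
    for line in log_lines:
        match = START_RE.search(line)
        if match:
            if last_end is not None:
                pairs.append((last_end, float(match.group(1))))
                last_end = None
            continue
        match = END_RE.search(line)
        if match:
            last_end = float(match.group(1))
    return pairs


def write_pairs_csv(pairs, path):
    """Write one pair per row, with a header row."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["silence_end", "next_silence_start"])
        writer.writerows(pairs)
```

With the log saved as vol.txt, something like `write_pairs_csv(speech_intervals(open("vol.txt", errors="replace")), "pairs.csv")` would produce the structured file, and a second script can read it to generate the split commands.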

Author by

TermiT

Updated on July 13, 2022

Comments

TermiT almost 2 years

I need to automatically split video of a speech by words, so every word is a separate video file. Do you know any ways to do this?

My plan was to detect silent parts and use them as word separators. But I didn't find any tool to do this, and it looks like ffmpeg is not the right tool for that.
Vi. almost 8 years

As a oneliner: ffmpeg -i input.mkv -filter_complex "[0:a]silencedetect=n=-90dB:d=0.3[outa]" -map [outa] -f s16le -y /dev/null |& F='-aq 70 -v warning' perl -ne 'INIT { $ss=0; $se=0; } if (/silence_start: (\S+)/) { $ss=$1; $ctr+=1; printf "ffmpeg -nostdin -i input.mkv -ss %f -t %f $ENV{F} -y %03d.mkv\n", $se, ($ss-$se), $ctr; } if (/silence_end: (\S+)/) { $se=$1; } END { printf "ffmpeg -nostdin -i input.mkv -ss %f $ENV{F} -y %03d.mkv\n", $se, $ctr+1; }' | bash -x
John Smith over 7 years

This one liner doesn't work on mac. -bash: syntax error near unexpected token `&'
Vi. over 7 years

@JohnSmith, Macs ship with an old (pre-4) bash by default. Replace |& with 2>&1 |.
Rajesh Gauswami over 6 years

I am using "com.writingminds:FFmpegAndroid:0.3.2". Can you help me get a list of the silences?
giacecco almost 6 years

@Vi.'s one-liner works perfectly, thanks! I now wonder a) whether there is a way to ensure ffmpeg does not re-encode the pieces produced this way, but just copies the content into them, b) what the best way is to put all the pieces back together, and c) how to automatically add, say, a 0.2-second audio+video cross-dissolve between pieces, to make the result a bit more pleasant to the eye. This would make it the perfect script for editing video interviews!
Vi. almost 6 years

@giacecco To skip re-encoding, add -c copy to the last ffmpeg command line. Other effects require a more complicated script. Maybe I'll implement it and post it as an answer someday...
innuendo over 5 years

How can one adjust the noise parameters, noise=-30dB:d=0.5? I have tried different values, but I am not always getting silence_start and silence_end pairs; sometimes one of them is missing.
Juan Pablo Fernandez about 5 years

@Vi. it seems you can earn 100 points by answering this question stackoverflow.com/questions/55057778/… Please take a look.
Vi. about 5 years

@JuanPabloFernandez, Thanks for the suggestion.
Marek Möhling almost 4 years

@giacecco, @Vi: I added -c copy twice to avoid re-encoding. It takes only seconds and doesn't bloat the sizes of the new files by a factor of ~4:

ffmpeg -i in.m4a -filter_complex "[0:a]silencedetect=n=-90dB:d=0.3[outa]" -map [outa] -f s16le -y /dev/null |& F='-aq 70 -v warning' perl -ne 'INIT { $ss=0; $se=0; } if (/silence_start: (\S+)/) { $ss=$1; $ctr+=1; printf "ffmpeg -nostdin -i in.m4a -c copy -ss %f -t %f $ENV{F} -y %03d.m4a\n", $se, ($ss-$se), $ctr; } if (/silence_end: (\S+)/) { $se=$1; } END { printf "ffmpeg -nostdin -i in.m4a -c copy -ss %f $ENV{F} -y %03d.m4a\n", $se, $ctr+1; }' | bash -x
Marek Möhling almost 4 years

@giacecco, @Vi: PS: I used this with a .m4a audio file (downloaded from youtube+com/watch?v=eMqYq2VMOck). Neither your original script nor the edited one works with the .mov or .mp4 video files I have (e.g. youtube+com/watch?v=6zQP6vgWiek).
John Smith over 2 years

@Vi. LOL, wow, apparently I've been trying to solve this problem for 6 years now. Anyway, thanks for your reply back then, it got me closer, but this one-liner still fails on filenames with " -" in them. As all my video files have the RMS dB (a negative number) listed in the file name, the one-liner doesn't work on any of them. I tried putting the filename in single quotes, and also tried double, and tried escaping the minus signs with backslashes, and none of them solved the problem.
Vi. over 2 years

@JohnSmith I have also published two non-oneliner versions: gist.github.com/vi/2fe3eb63383fcfdad7483ac7c97e9deb and gist.github.com/vi/2af29b9652a813ffe4b7e87c9a895f81. They may be more careful with filenames (not checked).
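
On the filename problem raised above: the one-liner pastes the filename unquoted into generated shell text, so a space followed by a minus sign gets split into a separate word that ffmpeg reads as an option. One way around this, sketched here in Python, is to pass ffmpeg an argument list and skip the shell entirely. The helper names `safe_path`, `build_split` and `run_split` are hypothetical, and this is not taken from the linked gists.

```python
import subprocess


def safe_path(name):
    """Prefix a name that starts with '-' so it cannot be mistaken for an option."""
    return "./" + name if name.startswith("-") else name


def build_split(src, start, duration, dst):
    """Build an ffmpeg argument list; each list item reaches ffmpeg as one argument."""
    return [
        "ffmpeg", "-nostdin",
        "-i", safe_path(src),
        "-ss", f"{start:.3f}",
        "-t", f"{duration:.3f}",
        "-y", safe_path(dst),
    ]


def run_split(src, start, duration, dst):
    """Run ffmpeg without a shell, so spaces and dashes need no quoting."""
    subprocess.run(build_split(src, start, duration, dst), check=True)
```

Because no shell parses the command, a name such as "talk -23.5dB.mov" stays a single argument.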