Is there an elegant way to split a file by chapter using ffmpeg?
Solution 1
(Edit: This tip came from https://github.com/phiresky via this issue: https://github.com/harryjackson/ffmpeg_split/issues/2)
You can get chapters using:
ffprobe -i fname -print_format json -show_chapters -loglevel error
If I were writing this again, I'd use ffprobe's JSON output.
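Here is a minimal Python sketch of that approach. It assumes ffprobe's JSON layout: a top-level "chapters" array whose entries carry id, start_time, end_time and an optional tags.title. The helper names parse_chapters and probe_chapters are hypothetical. parse_chapters works on the JSON text alone, and probe_chapters runs a real ffprobe, which must be on PATH.

```python
import json
import subprocess


def parse_chapters(json_text):
    """Turn `ffprobe -show_chapters -print_format json` output into tuples.

    Each tuple is (id, start_time, end_time, title); times stay as the
    decimal-second strings ffprobe prints, and a missing title becomes "".
    """
    data = json.loads(json_text)
    result = []
    for chap in data.get("chapters", []):
        title = chap.get("tags", {}).get("title", "")
        result.append((chap["id"], chap["start_time"], chap["end_time"], title))
    return result


def probe_chapters(filename):
    """Run ffprobe on a real file and parse its chapter list."""
    out = subprocess.run(
        ["ffprobe", "-i", filename, "-print_format", "json",
         "-show_chapters", "-loglevel", "error"],
        check=True, capture_output=True, text=True,  # Python 3.7+
    ).stdout
    return parse_chapters(out)
```

Because the times come back as strings, they can be passed straight to ffmpeg's -ss and -to without reformatting.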
(Original answer follows)
This is a working Python script. I tested it on several videos and it worked well. Python isn't my first language, but I noticed you use it, so I figured writing it in Python might make more sense. I've added it to GitHub; if you want to improve it, please submit pull requests.
#!/usr/bin/env python3
import os
import re
import subprocess as sp
from optparse import OptionParser
def parseChapters(filename):
    chapters = []
    command = ["ffmpeg", '-i', filename]
    output = ""
    try:
        # ffmpeg requires an output file and exits with an error
        # when it does not get one; the info we want is printed
        # to stderr, so capture stderr rather than stdout.
        output = sp.check_output(command, stderr=sp.STDOUT, universal_newlines=True)
    except sp.CalledProcessError as e:
        output = e.output
    for line in output.splitlines():
        m = re.match(r".*Chapter #(\d+:\d+): start (\d+\.\d+), end (\d+\.\d+).*", line)
        if m is not None:
            chapters.append({"name": m.group(1), "start": m.group(2), "end": m.group(3)})
    return chapters
def getChapters():
    parser = OptionParser(usage="usage: %prog [options] filename", version="%prog 1.0")
    parser.add_option("-f", "--file", dest="infile", help="Input File", metavar="FILE")
    (options, args) = parser.parse_args()
    if not options.infile:
        parser.error('Filename required')
    chapters = parseChapters(options.infile)
    fbase, fext = os.path.splitext(options.infile)
    for chap in chapters:
        print("start:" + chap['start'])
        chap['outfile'] = fbase + "-ch-" + chap['name'] + fext
        chap['origfile'] = options.infile
        print(chap['outfile'])
    return chapters
def convertChapters(chapters):
    for chap in chapters:
        print("start:" + chap['start'])
        print(chap)
        command = [
            "ffmpeg", '-i', chap['origfile'],
            '-vcodec', 'copy',
            '-acodec', 'copy',
            '-ss', chap['start'],
            '-to', chap['end'],
            chap['outfile']]
        try:
            sp.check_output(command, stderr=sp.STDOUT, universal_newlines=True)
        except sp.CalledProcessError as e:
            raise RuntimeError("command '{}' returned with error (code {}): {}".format(e.cmd, e.returncode, e.output))
if __name__ == '__main__':
    chapters = getChapters()
    convertChapters(chapters)
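The try/except around check_output exists only because ffmpeg -i with no output file exits with an error status while still printing the metadata. In Python 3.5+, the same capture can be written without the exception by using subprocess.run with stderr merged into stdout. This is a sketch; read_info is a hypothetical helper name.

```python
import subprocess


def read_info(command):
    """Run a command and return its combined stdout/stderr, ignoring the exit status.

    `ffmpeg -i file` prints stream and chapter info to stderr and then exits
    with a non-zero status because no output file was given, so the status
    is deliberately not checked here.
    """
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into the captured stdout
        universal_newlines=True,
    )
    return result.stdout
```

Usage would be read_info(["ffmpeg", "-i", "input.mkv"]), followed by the same regex loop as in parseChapters.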
Solution 2
ffmpeg -i "$SOURCE.$EXT" 2>&1 |  # get metadata about file
  grep Chapter |  # search for Chapter in metadata and pass the results
  sed -E "s/ *Chapter #([0-9]+.[0-9]+): start ([0-9]+\.[0-9]+), end ([0-9]+\.[0-9]+)/-i \"$SOURCE.$EXT\" -vcodec copy -acodec copy -ss \2 -to \3 \"$SOURCE-\1.$EXT\"/" |  # filter the results, explicitly defining the timecode markers for each chapter
  xargs -n 11 ffmpeg  # construct argument list with maximum of 11 arguments and execute ffmpeg
Your command parses the file's metadata and reads out the timecode markers for each chapter. You could also do this manually for each chapter:
ffmpeg -i ORIGINALFILE.mp4 -acodec copy -vcodec copy -ss 0 -t 00:15:00 OUTFILE-1.mp4
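The sed expression in the pipeline above turns each Chapter line into the eleven arguments that xargs hands to ffmpeg. Below is a rough Python equivalent. chapter_commands is a hypothetical helper, and it uses the same pattern with the timestamp dots escaped and a chapter id that accepts either "." or ":". It builds the argument list directly, so file names never pass through a shell.

```python
import re

# Same pattern as the sed expression; the id separator may be "." or ":".
CHAPTER_RE = re.compile(
    r"Chapter #(\d+[.:]\d+): start (\d+\.\d+), end (\d+\.\d+)"
)


def chapter_commands(info_text, source, ext):
    """Yield one ffmpeg argv (a list of strings) per Chapter line in `ffmpeg -i` output."""
    for line in info_text.splitlines():
        m = CHAPTER_RE.search(line)
        if m:
            name, start, end = m.groups()
            yield ["ffmpeg", "-i", f"{source}.{ext}",
                   "-vcodec", "copy", "-acodec", "copy",
                   "-ss", start, "-to", end,
                   f"{source}-{name}.{ext}"]
```

Each list can then be run with subprocess.run(cmd, check=True), which replaces the xargs -n 11 step.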
Or you can write out the chapter markers and run through them with this bash script, which is a little easier to read:
#!/bin/bash
# Author: http://crunchbang.org/forums/viewtopic.php?id=38748#p414992
# m4bronto
# Chapter #0:0: start 0.000000, end 1290.013333
# first _ _ start _ end
while [ $# -gt 0 ]; do
  ffmpeg -i "$1" 2> tmp.txt
  while read -r first _ _ start _ end; do
    if [[ $first = Chapter ]]; then
      read # discard line with Metadata:
      read _ _ chapter
      ffmpeg -vsync 2 -i "$1" -ss "${start%?}" -to "$end" -vn -ar 44100 -ac 2 -ab 128k -f mp3 "$chapter.mp3" </dev/null
    fi
  done <tmp.txt
  rm tmp.txt
  shift
done
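This script relies on ffmpeg printing a Metadata: line and then a title line right after each Chapter line. The sketch below does the same pairing in Python under that assumption. chapters_with_titles is a hypothetical helper. A chapter with no title line keeps an empty title, and title lines that do not directly follow a chapter, such as the file's or a stream's own title, are ignored.

```python
import re

CHAPTER_RE = re.compile(r"Chapter #\S+: start ([\d.]+), end ([\d.]+)")
TITLE_RE = re.compile(r"^\s*title\s*:\s*(.*)$")


def chapters_with_titles(info_text):
    """Return [start, end, title] lists from `ffmpeg -i` output."""
    chapters = []
    current = None  # the latest Chapter entry still waiting for its title
    for line in info_text.splitlines():
        m = CHAPTER_RE.search(line)
        if m:
            current = [m.group(1), m.group(2), ""]
            chapters.append(current)
            continue
        t = TITLE_RE.match(line)
        if t and current is not None:
            current[2] = t.group(1)
            current = None
        elif line.strip() != "Metadata:":
            current = None  # anything else ends the chapter's metadata block
    return chapters
```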
Or you can use HandBrakeCLI, as originally mentioned in this post. This example extracts chapter 3 to 3.mkv:
HandBrakeCLI -c 3 -i originalfile.mkv -o 3.mkv
Or use mkvmerge (part of MKVToolNix), another tool mentioned in this post:
mkvmerge -o output.mkv --split chapters:all input.mkv
Solution 3
A version of the original shell code with:
- improved efficiency by
  - using ffprobe instead of ffmpeg
  - splitting the input rather than the output
- improved reliability by avoiding xargs and sed
- improved readability by using multiple lines
- carrying over multiple audio or subtitle streams
- removing chapters from the output files (their timecodes would be invalid)
- simplified command-line arguments
#!/bin/sh -efu
input="$1"
ffprobe \
-print_format csv \
-show_chapters \
"$input" |
cut -d ',' -f '5,7,8' |
while IFS=, read start end chapter
do
ffmpeg \
-nostdin \
-ss "$start" -to "$end" \
-i "$input" \
-c copy \
-map 0 \
-map_chapters -1 \
"${input%.*}-$chapter.${input##*.}"
done
To prevent it from interfering with the loop, ffmpeg is instructed not to read from stdin (-nostdin).
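Below is a Python sketch of the same loop. It assumes ffprobe's CSV field order for chapters (chapter, id, time_base, start, start_time, end, end_time, title), which is what cut -f 5,7,8 relies on. Reading rows with the csv module instead of cut copes with titles that contain commas, a case raised in the comments, provided ffprobe quotes such fields. Placing -ss and -to before -i makes them input options, which is what the list above means by splitting the input. split_commands is a hypothetical helper. Falling back to the chapter id when there is no title is an addition of this sketch.

```python
import csv
import io
import os


def split_commands(input_path, ffprobe_csv):
    """Build one ffmpeg argv per row of `ffprobe -print_format csv -show_chapters`."""
    base, ext = os.path.splitext(input_path)  # like ${input%.*} and .${input##*.}
    commands = []
    for row in csv.reader(io.StringIO(ffprobe_csv)):
        if not row or row[0] != "chapter":
            continue  # skip blank or unrelated lines
        start, end = row[4], row[6]  # fields 5 and 7 in cut's 1-based numbering
        title = row[7] if len(row) > 7 else row[1]  # fall back to the chapter id
        commands.append([
            "ffmpeg", "-nostdin",
            "-ss", start, "-to", end,  # before -i, so these are input options
            "-i", input_path,
            "-c", "copy", "-map", "0", "-map_chapters", "-1",
            f"{base}-{title}{ext}",
        ])
    return commands
```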
Solution 4
A little simpler than extracting data with sed is to use ffprobe's JSON output with jq:
#!/usr/bin/env bash
# For systems where "bash" is not in "/bin/"
set -efu
videoFile="$1"
ffprobe -hide_banner \
"$videoFile" \
-print_format json \
-show_chapters \
-loglevel error |
jq -r '.chapters[] | [ .id, .start_time, .end_time | tostring ] | join(" ")' |
while read chapter start end; do
ffmpeg -nostdin \
-ss "$start" -to "$end" \
-i "$videoFile" \
-map 0 \
-map_chapters -1 \
-c copy \
-metadata title="$chapter" \
"${videoFile%.*}-$chapter.${videoFile##*.}";
done
I use jq's tostring function because chapters[].id is an integer.
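The output name ${videoFile%.*}-$chapter.${videoFile##*.} cuts at the last dot anywhere in the path. For chapter 1 of an extensionless file inside a dotted directory, it turns dir.d/video into dir-1.d/video. The sketch below builds the same ffmpeg call in Python with os.path.splitext, which only looks at the final path component. chapter_command is a hypothetical helper, not part of the script above.

```python
import os


def chapter_command(video_file, chapter, start, end):
    """Mirror the script's ffmpeg call for one chapter, as an argv list."""
    base, ext = os.path.splitext(video_file)  # ext includes the dot, or is ""
    return [
        "ffmpeg", "-nostdin",
        "-ss", str(start), "-to", str(end),
        "-i", video_file,
        "-map", "0", "-map_chapters", "-1",
        "-c", "copy",
        "-metadata", f"title={chapter}",
        f"{base}-{chapter}{ext}",
    ]
```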
Solution 5
I modified Harry's script to use the chapter name for the filename. It outputs into a new directory named after the input file (minus extension). It also prefixes each chapter name with "1 - ", "2 - ", etc., in case there are chapters with the same name.
#!/usr/bin/env python3
import os
import re
import subprocess as sp
from optparse import OptionParser
def parseChapters(filename):
    chapters = []
    command = ["ffmpeg", '-i', filename]
    output = ""
    title = None
    chapter_match = None
    try:
        # ffmpeg requires an output file and exits with an error
        # when it does not get one; the info we want is printed
        # to stderr, so capture stderr rather than stdout.
        output = sp.check_output(command, stderr=sp.STDOUT, universal_newlines=True)
    except sp.CalledProcessError as e:
        output = e.output
    num = 1
    for line in output.splitlines():
        x = re.match(r".*title.*: (.*)", line)
        m1 = None
        if x is None:
            m1 = re.match(r".*Chapter #(\d+:\d+): start (\d+\.\d+), end (\d+\.\d+).*", line)
            title = None
        else:
            title = x.group(1)
        if m1 is not None:
            chapter_match = m1
        if title is not None and chapter_match is not None:
            chapters.append({"name": str(num) + " - " + title, "start": chapter_match.group(2), "end": chapter_match.group(3)})
            # Each chapter takes only the first title line after it,
            # so later (e.g. stream) titles don't create duplicate entries.
            chapter_match = None
            num += 1
    return chapters
def getChapters():
    parser = OptionParser(usage="usage: %prog [options] filename", version="%prog 1.0")
    parser.add_option("-f", "--file", dest="infile", help="Input File", metavar="FILE")
    (options, args) = parser.parse_args()
    if not options.infile:
        parser.error('Filename required')
    chapters = parseChapters(options.infile)
    path, file = os.path.split(options.infile)
    newdir, fext = os.path.splitext(os.path.basename(options.infile))
    # os.path.join also works when path is "" (input given as a bare filename)
    outdir = os.path.join(path, newdir)
    os.mkdir(outdir)
    for chap in chapters:
        chap['name'] = chap['name'].replace('/', ':')
        print("start:" + chap['start'])
        chap['outfile'] = os.path.join(outdir, re.sub("[^-a-zA-Z0-9_.():' ]+", '', chap['name']) + fext)
        chap['origfile'] = options.infile
        print(chap['outfile'])
    return chapters
def convertChapters(chapters):
    for chap in chapters:
        print("start:" + chap['start'])
        print(chap)
        command = [
            "ffmpeg", '-i', chap['origfile'],
            '-vcodec', 'copy',
            '-acodec', 'copy',
            '-ss', chap['start'],
            '-to', chap['end'],
            chap['outfile']]
        try:
            sp.check_output(command, stderr=sp.STDOUT, universal_newlines=True)
        except sp.CalledProcessError as e:
            raise RuntimeError("command '{}' returned with error (code {}): {}".format(e.cmd, e.returncode, e.output))
if __name__ == '__main__':
    chapters = getChapters()
    convertChapters(chapters)
This took a good bit to figure out since I'm definitely NOT a Python guy. It's also inelegant, as there were many hoops to jump through because it processes the metadata line by line (i.e., the title and chapter data are found in separate iterations of the loop over the metadata output).
But it works, and it should save you a lot of time. It did for me!
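Building the output paths with os.path.join avoids the empty-path problem described in the comments when the input is given as a bare filename. The sketch below covers only the naming step. It keeps the script's "N - title" pattern and character whitelist, and it zero-pads the number so the files sort in chapter order, similar to the leading zeroes one commenter added. output_paths is a hypothetical helper. It does not create the directory, so call os.makedirs(..., exist_ok=True) on the parent before running ffmpeg.

```python
import os
import re

# Characters removed from file names (same whitelist as the script above).
UNSAFE = re.compile(r"[^-a-zA-Z0-9_.():' ]+")


def output_paths(infile, titles):
    """Return one output path per chapter title, inside a directory named after infile."""
    folder, name = os.path.split(infile)
    stem, ext = os.path.splitext(name)
    outdir = os.path.join(folder, stem)  # folder may be "" for a bare filename
    width = len(str(len(titles)))  # e.g. 2 digits when there are 10-99 chapters
    paths = []
    for num, title in enumerate(titles, start=1):
        clean = UNSAFE.sub("", title.replace("/", ":"))
        paths.append(os.path.join(outdir, f"{num:0{width}d} - {clean}{ext}"))
    return paths
```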
Comments
- Kattern (almost 2 years): On this page, Albert Armea shares code to split videos by chapter using ffmpeg. The code is straightforward, but not quite good-looking:
ffmpeg -i "$SOURCE.$EXT" 2>&1 | grep Chapter | sed -E "s/ *Chapter #([0-9]+\.[0-9]+): start ([0-9]+\.[0-9]+), end ([0-9]+\.[0-9]+)/-i \"$SOURCE.$EXT\" -vcodec copy -acodec copy -ss \2 -to \3 \"$SOURCE-\1.$EXT\"/" | xargs -n 11 ffmpeg
Is there an elegant way to do this job?
- davidcondrey (about 8 years): Here's another similar Python script meant to parse m4b audiobooks by chapter: github.com/valekhz/m4b-converter
- clifgriffin (over 7 years): I posted a modified version below that uses the chapter name as the filename. It's not elegant, but it works :)
- clifgriffin (over 7 years): @JP. Glad to hear it!
- alexw (about 7 years): This worked well once I ran ffmpeg -i independently to determine the format of my file's metadata. I had to tinker with the regex since my chapters weren't of the format Chapter #dd:dd. It would be good to try to make your regex more robust :-)
- epR8GaYuh (over 6 years): Your way of determining the path only works when using an absolute path for the input file. Otherwise the variable path is empty, and therefore the output directory ends up under the filesystem root, for example /test for the input file test.mp4.
- Ondrej Skalicka (over 6 years): And a second one, which I wrote just now for AAX to MP3 chapterized conversion: github.com/OndrejSkalicka/aax-to-mp3-python
- Norsk (over 5 years): Thanks @clifgriffin, I liked your version and modified it to work in Python 3. I also cleaned up the imports and added leading zeroes to the chapter number: gist.github.com/showerbeer/97c1f31770572d05738cd2b74167f8a4
- tonysepia (over 5 years): Confirmed: it does work, and thank you for making it available!
- a coder (over 5 years): I saved this as splitfilebychapter.sh. When I run it from the command line as splitfilebychapter.sh alargeaudiobook.mp3, it returns: splitfilebychapter.sh: error: Filename required. Is it looking for the name of an input file or an output file?
- Business Tomcat (about 5 years): Upvote for mkvmerge. A one-liner to get all chapters that even works on Windows 👍
- Richard Thomas (almost 5 years): Great basis for what I need. I want to edit out some parts by chapter name and then recombine them afterwards, but I can see how to do that easily enough.
- llogan (over 4 years): You can use -nostdin instead of </dev/null, -c copy instead of -vcodec copy -acodec copy -scodec copy, and -map 0 instead of -map 0:a -map 0:v -map 0:s.
- Harry (over 4 years): @Crissov I rejected your edit by mistake; can you please add it back so you get the credit for it, i.e. stackoverflow.com/review/suggested-edits/24746850
- SebMa (about 4 years): You don't need the j variable. You can loop from 0 to $((count-1)) and have n=$i, because jq understands indexes prefixed with zeroes (example: jq -r ".chapters[05]").
- Scott (almost 4 years): I'd also move the line -ss ... before the line -i ...; otherwise ffmpeg builds the output file in order to seek rather than seeking directly in the input. This speeds things up immensely when you're also transcoding. Depending on what you're splitting, you may not want to do this (I'm splitting and transcoding audio, so seeking the input is fine).
- joki (almost 4 years): @llogan @Scott great suggestions, thank you! If you have jq at hand, I'd actually recommend @SebMa's answer, which appears to be based on mine but is much more future-proof thanks to using ffprobe's JSON output. But I'll incorporate your tips anyway.
- hyiltiz (over 3 years): Instead of posting the adjusted script, you may as well send a pull request, since there is a link to the GitHub project.
- Andras Hatvani (over 3 years): This one puts the information for every chapter except the previous ones into each file, i.e. 1..23 in the first, 2..23 in the second, and so on.
- bryc (over 2 years): This is broken. -ss and -to should be AFTER -i, and %%J shouldn't be enclosed in quotes because it's already in quotes. Also, %%J contains a CR character (0x0D), which causes problems and needs to be stripped away.
- bryc (over 2 years): Also, because you are using -print_format csv, this breaks if the title contains newlines (and/or commas, possibly).
- Sirach Matthews (over 2 years): The JSON from the ffprobe output has been invaluable recently. Admittedly, I have not taken advantage of the Python script. Very helpful, thank you.
- akostadinov (over 2 years): It seems to remove video, hardcodes the AAX secret, and is a little broken here and there. But I liked the playlist and filename/metadata features, so I posted a fixed-up version: gist.github.com/akostadinov/…