How to convert PDF to eBook format
Solution 1
You should try pdftotext (under Ubuntu it comes in the package poppler-utils).
It is a command-line converter. It assumes that the PDF contains text and does not consist of images only.
If the PDF file consists of images (without an OCR text layer), you have to go for an OCR solution, which is much slower.
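A minimal sketch of the pdftotext route, assuming poppler-utils is installed. The function names `has_text` and `pdf_to_text` are hypothetical helpers, not part of poppler. It writes `book.txt` next to `book.pdf` and treats an output with no visible characters as a sign that the PDF is image-only and needs OCR.

```bash
#!/usr/bin/env bash
# Sketch: extract text with pdftotext (from poppler-utils) and flag PDFs that
# yield no text at all, which usually means they are scanned images.

# Hypothetical helper: succeed if the argument contains any non-whitespace character.
has_text() {
  [ -n "$(printf '%s' "$1" | tr -d '[:space:]')" ]
}

# Hypothetical wrapper: book.pdf -> book.txt (UTF-8 encoded).
pdf_to_text() {
  local pdf="$1"
  local out="${pdf%.pdf}.txt"
  pdftotext -enc UTF-8 "$pdf" "$out" || return 1
  if ! has_text "$(cat "$out")"; then
    echo "No extractable text in $pdf; it probably needs OCR." >&2
    return 2
  fi
  echo "Wrote $out"
}

# Only convert when a file is given, e.g.: ./pdf2txt.sh book.pdf
if [ "$#" -gt 0 ]; then
  pdf_to_text "$1"
fi
```

The empty-output check is only a heuristic: a PDF can have a text layer that is garbage, which is the scrambled-text case described next.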
I have also used the OCR method successfully on a PDF whose text was scrambled (the individual characters were positioned on the page in a non-linear order). In that case you use, e.g., pdftoppm to render each page as an image and then OCR those images.
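A sketch of that render-then-OCR pipeline, assuming poppler-utils and tesseract are installed (the `stdout` output name needs a reasonably recent tesseract). `sorted_pages` and `ocr_pdf` are hypothetical names; 300 DPI is a common choice for OCR, not a requirement.

```bash
#!/usr/bin/env bash
# Sketch: OCR a PDF by rendering each page with pdftoppm (poppler-utils)
# and running tesseract on every page image, in page order.

# Hypothetical helper: list page images so that page-2 sorts before page-10
# (GNU sort -V compares the embedded numbers numerically).
sorted_pages() {
  printf '%s\n' "$1"/page-*.png | sort -V
}

# Hypothetical wrapper: book.pdf -> book.ocr.txt
ocr_pdf() {
  local pdf="$1"
  local out="${pdf%.pdf}.ocr.txt"
  local dir
  dir="$(mktemp -d)" || return 1
  # -r 300 renders at 300 DPI; -png writes page-N.png files into $dir
  if ! pdftoppm -r 300 -png "$pdf" "$dir/page"; then
    rm -rf "$dir"
    return 1
  fi
  : > "$out"   # start with an empty output file
  sorted_pages "$dir" | while IFS= read -r img; do
    # "stdout" as the output base makes tesseract print the text instead of writing a file
    tesseract "$img" stdout </dev/null >> "$out"
  done
  rm -rf "$dir"
  echo "Wrote $out"
}

if [ "$#" -gt 0 ]; then
  ocr_pdf "$1"
fi
```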
Solution 2
I generally use Calibre to convert between the various formats (epub, mobi, and pdf). It's pretty straightforward to convert with it; screenshots and a video tutorial are available as well.
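Calibre also ships a command-line converter, `ebook-convert`, which picks the output format from the extension of the target file. A minimal batch sketch, assuming calibre is installed and `ebook-convert` is on the `PATH`; `convert_all` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch: convert every PDF in the current directory with calibre's CLI.
# ebook-convert infers the output format from the target file's extension.

# Hypothetical wrapper: convert_all [format], e.g. convert_all mobi
convert_all() {
  local fmt="${1:-epub}" pdf
  for pdf in *.pdf; do
    [ -e "$pdf" ] || continue   # no matches: the glob stays unexpanded
    ebook-convert "$pdf" "${pdf%.pdf}.$fmt"
  done
}

# e.g.: ./convert_all.sh epub
if [ "$#" -gt 0 ]; then
  convert_all "$1"
fi
```

This uses the same conversion engine as the GUI, so it will not be faster; it only makes the conversion scriptable.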
Solution 3
I had to do this for a PDF file once, and this was the result (using pdftohtml from poppler):
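#!/bin/bash
# Usage: <this script> file.pdf   (writes file.pdf.zip into the current directory)
[ -f "$1" ] || { echo "usage: $0 file.pdf" >&2; exit 1; }
pwddir="$(pwd)"
tmpdir="$(mktemp -d)"
trap 'rm -rf "$tmpdir"' EXIT  # delete the scratch directory when the script ends
# Convert to a single HTML file (index.html) plus the extracted images
pdftohtml -enc UTF-8 -noframes -p -nomerge -nodrm -q "$1" "$tmpdir"/index
cd "$tmpdir" || exit 1
# Join all lines into one so the substitutions below can span line breaks
sed -e :a -e '$!N;s/\n/ /;ta' \
-i index.html
# Non-breaking spaces and page rules become spaces, a double <br/> becomes
# a paragraph break, and a single <br/> becomes a space
sed -e 's@&#160;@ @g' \
-e 's@&nbsp;@ @g' \
-e 's@<hr>@ @g' \
-e 's@<br/>\s*<br/>@</p><p>@g' \
-e 's@<br/>@ @g' \
-i index.html
# Re-indent and repair the HTML in place (-m)
tidy -utf8 -i -wrap 9999999 -m index.html
# Remove the empty <a name="..."></a> anchors
sed -e 's@<a name="[^"]*"></a>@@g' \
-i index.html
# -f: do not fail when there is no previous zip to remove
rm -f "$pwddir"/"$1".zip
zip "$pwddir"/"$1".zip *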
Feed the zip to Calibre and convert it to EPUB, filtering out all CSS properties (such as colors and fonts).
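The same step can be done from the command line with calibre's `ebook-convert`. A sketch assuming the zip was produced by the script above (so it is named `book.pdf.zip`) and that your calibre version accepts a ZIP archive as input, as the GUI does. `zip_to_epub` is a hypothetical name, and the property list passed to `--filter-css` is only an example.

```bash
#!/usr/bin/env bash
# Sketch: the "feed the zip to Calibre" step done with calibre's CLI.
# --filter-css removes the listed CSS properties from all style rules.

# Hypothetical wrapper: book.pdf.zip -> book.epub
zip_to_epub() {
  local zip="$1"
  local epub="${zip%.pdf.zip}.epub"
  ebook-convert "$zip" "$epub" \
    --filter-css "font-family,font-size,color,background-color"
}

if [ "$#" -gt 0 ]; then
  zip_to_epub "$1"
fi
```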
Every PDF file is different; there is no definitive solution. The above worked for one specific case: you have to tweak the pdftohtml/pdftotext options and then adjust the output to fit your needs.
If this fails and you have to resort to OCR, I've had some luck with cuneiform, but also try tesseract, ocrad, and gocr. All of them require manual work to get a good result.
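Since results vary a lot between engines, it can help to OCR a single page with each of them and compare. A sketch assuming poppler-utils plus whichever of tesseract, gocr and ocrad are installed; cuneiform is left out rather than guessing its command-line options. `compare_ocr` is a hypothetical name, and the page is rendered as greyscale PGM because all three engines read PNM images.

```bash
#!/usr/bin/env bash
# Sketch: OCR one page with several engines so their output can be compared.
# Engines that are not installed are skipped.

# Hypothetical wrapper: compare_ocr book.pdf [page] -> book.p<page>.<engine>.txt
compare_ocr() {
  local pdf="$1" page="${2:-1}" dir engine
  dir="$(mktemp -d)" || return 1
  # Render only page $page as greyscale (-gray); -singlefile drops the page-number suffix
  if ! pdftoppm -r 300 -gray -f "$page" -l "$page" -singlefile "$pdf" "$dir/page"; then
    rm -rf "$dir"
    return 1
  fi
  for engine in tesseract gocr ocrad; do
    if ! command -v "$engine" >/dev/null 2>&1; then
      echo "skipping $engine (not installed)" >&2
      continue
    fi
    # Each engine prints the recognized text to stdout
    case "$engine" in
      tesseract) tesseract "$dir/page.pgm" stdout ;;
      gocr)      gocr -i "$dir/page.pgm" ;;
      ocrad)     ocrad "$dir/page.pgm" ;;
    esac > "${pdf%.pdf}.p$page.$engine.txt"
  done
  rm -rf "$dir"
}

if [ "$#" -gt 0 ]; then
  compare_ocr "$@"
fi
```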
Andre Morua
Updated on September 18, 2022
Comments
-
Andre Morua almost 2 years
Is there a way to convert a PDF document into an eBook format such as EPUB, AZW or MOBI? I am looking for an application that converts quickly. I have just tried calibre: after 10 minutes, not even 2% of the conversion had been completed. So please, no calibre. A CLI tool is preferred.
-
Albert Maier over 3 years
abiword (3.0.2, under Ubuntu 18.04): excellent and rapid
-
Maximus about 11 years
What part of "please no calibre" is unclear?
-
mohan rathour about 5 years
I am not able to convert the PDF file to EPUB with a fixed layout. Could you please tell me which steps I need to follow to convert a PDF to a fixed-layout EPUB?