How to convert PDF to eBook format


Solution 1

You should try pdftotext (on Ubuntu it comes in the poppler-utils package). It is a command-line converter. It assumes that the PDF contains text and does not consist of images only.
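A minimal sketch of that workflow, assuming poppler-utils is installed. The helper names (`count_text_chars`, `pdf_has_text`, `pdf_to_txt`) are hypothetical, and treating "no non-whitespace text extracted" as "image-only PDF" is a rough heuristic, not something pdftotext reports directly.

```shell
#!/bin/bash
# Sketch only: these functions are hypothetical helpers, not part of poppler-utils.

# Count the non-whitespace characters read from stdin.
count_text_chars() {
    tr -d '[:space:]' | wc -c | tr -d ' '
}

# Succeed if pdftotext extracts at least one non-whitespace character,
# i.e. the PDF probably has a text layer and is not images only.
pdf_has_text() {
    local pdf="$1" n
    # "-" as the output file makes pdftotext write to stdout
    n="$(pdftotext -enc UTF-8 "$pdf" - 2>/dev/null | count_text_chars)"
    [ "${n:-0}" -gt 0 ]
}

# Extract the text of book.pdf to book.txt.
# Without -layout, pdftotext outputs text in reading order instead of
# mimicking the physical page layout, which reflows better in a reader.
pdf_to_txt() {
    local pdf="$1"
    pdftotext -enc UTF-8 "$pdf" "${pdf%.pdf}.txt"
}
```

Usage: `if pdf_has_text book.pdf; then pdf_to_txt book.pdf; else echo "image-only PDF, needs OCR"; fi`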

If the PDF consists only of images (without an OCR text layer), you have to use an OCR solution instead, which is much slower.

I have also successfully used OCR on PDFs whose text was scrambled (the individual characters were positioned on the page in a non-linear order, so the extracted text came out jumbled). In that case, use e.g. pdftoppm to render each page to an image and run OCR on those images.
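A rough sketch of that render-then-OCR pipeline, assuming poppler-utils and tesseract (with English language data) are installed. `sorted_pages` and `ocr_pdf` are hypothetical helpers; 300 dpi and `-l eng` are example choices, not requirements.

```shell
#!/bin/bash
# Sketch: rasterize each page with pdftoppm, then OCR it with tesseract.

# Print page images in page order. pdftoppm may zero-pad page numbers
# (page-01.png) depending on page count; sort -V handles both forms.
sorted_pages() {
    local dir="$1"
    ls "$dir"/page-*.png 2>/dev/null | sort -V
}

# Hypothetical helper: OCR book.pdf into book.txt.
ocr_pdf() {
    local pdf="$1" out="${1%.pdf}.txt" workdir img
    workdir="$(mktemp -d)" || return 1
    # Render every page to PNG at 300 dpi as page-N.png
    if ! pdftoppm -r 300 -png "$pdf" "$workdir/page"; then
        rm -rf "$workdir"
        return 1
    fi
    : > "$out"   # start with an empty output file
    while IFS= read -r img; do
        # "stdout" as the output base makes tesseract print the text
        tesseract "$img" stdout -l eng >> "$out" 2>/dev/null
    done < <(sorted_pages "$workdir")
    rm -rf "$workdir"
}
```

Usage: `ocr_pdf scanned.pdf` writes `scanned.txt`, which you can then proofread and convert like any other text.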

Solution 2

I generally use Calibre to convert between the various formats (EPUB, MOBI and PDF). Converting with it is pretty straightforward; here's a screenshot, and there are other screenshots and a video tutorial as well.
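Calibre also includes a command-line converter, `ebook-convert`, which picks the output format from the target file's extension. A minimal batch sketch, assuming Calibre is installed and on `PATH`; `epub_name_for` and `convert_all` are hypothetical helpers. It uses the same conversion pipeline as the GUI, so it is unlikely to be faster.

```shell
#!/bin/bash
# Hypothetical helper: derive book.epub from book.pdf.
epub_name_for() {
    printf '%s\n' "${1%.*}.epub"
}

# Hypothetical helper: convert every PDF in a directory to EPUB.
convert_all() {
    local dir="${1:-.}" pdf
    for pdf in "$dir"/*.pdf; do
        [ -e "$pdf" ] || continue   # no PDFs: the glob stays unexpanded
        ebook-convert "$pdf" "$(epub_name_for "$pdf")"
    done
}
```

Usage: `convert_all ~/books`. Changing the extension in `epub_name_for` to `.mobi` or `.azw3` selects those formats instead.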

[Screenshot of Calibre]

Solution 3

I had to do this for a PDF file once, and this was the result (using pdftohtml from poppler):

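#!/bin/bash
# Usage: ./script.sh book.pdf   (writes book.pdf.zip to the current directory)

pwddir="$(pwd)"
tmpdir="$(mktemp -d)"
out="$pwddir/$(basename "$1").zip"

# One UTF-8 HTML file (index.html) plus extracted images, no frames
pdftohtml -enc UTF-8 -noframes -p -nomerge -nodrm -q "$1" "$tmpdir"/index

cd "$tmpdir" || exit 1

# Join all lines into one so the patterns below can match across line breaks
sed -e :a -e '$!N;s/\n/ /;ta' \
    -i index.html

# Non-breaking spaces (&#160;) and page rules become plain spaces;
# a double <br/> becomes a paragraph break, a single <br/> just a space
sed -e 's@&#160;@ @g' \
    -e 's@<hr>@ @g' \
    -e 's@<br/>\s*<br/>@</p><p>@g' \
    -e 's@<br/>@ @g' \
    -i index.html

# Re-indent and clean up the markup in place (-m) without re-wrapping lines
tidy -utf8 -i -wrap 9999999 -m index.html

# Remove the empty page anchors left by pdftohtml
sed -e 's@<a name="[^"]*"></a>@@g' \
    -i index.html

rm -f "$out"
zip "$out" ./*
cd "$pwddir" && rm -rf "$tmpdir"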

Feed the zip to Calibre and convert it to EPUB, filtering out all CSS properties (such as colors and fonts).
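Without the GUI, this step can be sketched with Calibre's `ebook-convert`, run on the `index.html` inside the zip produced by the script above. This assumes `unzip` and Calibre are installed; `join_by_comma` and `zip_to_epub` are hypothetical helpers, and the property list passed to `--filter-css` is only an example of "filter all CSS properties", not an exhaustive list.

```shell
#!/bin/bash
# Hypothetical helper: join arguments with commas ("a" "b" -> "a,b").
join_by_comma() {
    local IFS=,
    printf '%s\n' "$*"
}

# Hypothetical helper: unpack the zip and convert its index.html to EPUB,
# removing the listed properties from every CSS rule.
zip_to_epub() {
    local zipfile="$1" out="$2" work rc
    work="$(mktemp -d)" || return 1
    if unzip -q "$zipfile" -d "$work"; then
        ebook-convert "$work/index.html" "$out" \
            --filter-css "$(join_by_comma font-family font-size color background-color)"
        rc=$?
    else
        rc=1
    fi
    rm -rf "$work"
    return "$rc"
}
```

Usage: `zip_to_epub book.pdf.zip book.epub`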

Every PDF file is different, so there is no definitive solution. The above worked for one specific case; you have to tweak the pdftohtml/pdftotext options and then tweak the output to fit your needs.

If this fails and you have to resort to OCR, I've had some luck with cuneiform, but also try tesseract, ocrad and gocr. All of them, however, require manual work to get a good result.
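Since results vary a lot between engines, it can help to run all of them on the same page and compare. A rough sketch, assuming the page was rendered to a greyscale PGM (for example with `pdftoppm -r 300 -gray`); `available_ocr_engines` and `compare_ocr` are hypothetical helpers, and the per-engine flags are only the basic ones, so check the man pages of your installed versions.

```shell
#!/bin/bash
# Print the engines mentioned above that are installed (on PATH).
available_ocr_engines() {
    local e
    for e in cuneiform tesseract ocrad gocr; do
        command -v "$e" >/dev/null 2>&1 && printf '%s\n' "$e"
    done
    return 0
}

# Hypothetical helper: OCR one page image with every available engine,
# writing <image>.<engine>.txt so the results can be compared by hand.
compare_ocr() {
    local img="$1" e
    for e in $(available_ocr_engines); do
        case "$e" in
            cuneiform) cuneiform -o "$img.cuneiform.txt" "$img" ;;
            tesseract) tesseract "$img" "$img.tesseract" ;;  # adds .txt itself
            ocrad)     ocrad -o "$img.ocrad.txt" "$img" ;;
            gocr)      gocr -i "$img" -o "$img.gocr.txt" ;;
        esac
    done
}
```

Usage: `compare_ocr page-1.pgm`, then diff the resulting `.txt` files.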



Author: Andre Morua

Updated on September 18, 2022

Comments

  • Andre Morua (almost 2 years ago)

    Is there a way to convert a PDF document into an eBook format such as EPUB, AZW or MOBI? I am looking for an application that converts quickly. I have just tried Calibre: after 10 minutes, not even 2% of the conversion had been completed. So please, no Calibre. A CLI tool is preferred.

    • Albert Maier (over 3 years ago)
      AbiWord (3.0.2, under Ubuntu 18.04): excellent and fast
  • Maximus (about 11 years ago)
    What part of "please no calibre" is unclear?
  • mohan rathour (about 5 years ago)
    I am not able to convert the PDF file to EPUB in a fixed layout. Could you please tell me what steps I need to follow to convert a PDF to EPUB in a fixed layout?