How to convert a .pdf file into a folder of images?


Solution 1

OK, I did some more research, and although tohuwawohu's method does work, I found it easier to use a program called pdftoppm to do what I wanted. Since I am pretty much a layperson when it comes to command-line apps, I will do my best to explain how I got this to work for me.

  1. Open a terminal and navigate to the folder containing the .pdf you want to convert. I did this with a command like:

    cd ~/Documents/PDF
    
  2. Let's say the file I want to convert is called Sample.pdf. What I want to do is use pdftoppm to create an image file of each page of the .pdf. Several formats can be chosen (see the pdftoppm man page), but I prefer .png. The basic command looks like this:

    pdftoppm -FORMAT FILENAME.pdf PREFIX
    

    or in the example above:

    pdftoppm -png Sample.pdf Sample
    

    This command creates one image file per page in the same folder as the original .pdf, with names like Sample-01.png, Sample-02.png and so on. I have tried it successfully with the -png and -jpeg options (-jpeg writes files with a .jpg extension). A -jpg option is apparently not supported.

  3. Then I just use Archive Manager by selecting all the newly-created image files, right-clicking, and choosing "Compress" from the context menu. I then choose the archive format I prefer (in this case .cbz or Comic Book Zip) and create the new archive.

  4. Now I have a shiny new .cbz file called Sample.cbz which I can then view with my Comix reader!
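For anyone who prefers to stay in the terminal, steps 2 to 4 can also be done without Archive Manager, since a .cbz is just a ZIP archive of images. This is a minimal sketch, assuming pdftoppm (from poppler-utils) and zip are installed; the function name pdf_to_cbz is hypothetical, not part of any package.

```shell
# Hypothetical helper: render each page of a PDF to PNG with pdftoppm,
# then pack the images into a .cbz (a plain ZIP archive of images).
# Assumes pdftoppm (poppler-utils) and zip are installed.
pdf_to_cbz() {
    local pdf="$1"
    local prefix="${pdf%.*}"              # "Sample.pdf" -> "Sample"
    pdftoppm -png "$pdf" "$prefix"        # one PNG per page, e.g. Sample-01.png
    zip "${prefix}.cbz" "${prefix}"-*.png # glob expands in sorted (page) order
}
```

Paste it into a bash shell or script, then run pdf_to_cbz Sample.pdf to get Sample.cbz. The PNG files are left in place, as in the steps above; note that the Sample-*.png glob would also pick up any older files that happen to match that pattern.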

Hopefully what I have posted above makes enough sense that someone else can learn from it. If I need to change it in any way please let me know.

Solution 2

I'm not very familiar with *.cbr / *.cbz, but it seems you'll have to combine two steps:

  1. Convert PDF to Images
  2. Compress them into a ZIP / RAR archive.

Regarding step 1, you could use ImageMagick's convert command. You can feed convert a PDF with multiple pages, and convert will write each page as a separate image file. I've tested it with a text scanned at 400 dpi, and the following command produced nice single JPEGs:

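$ convert -verbose -colorspace RGB -interlace none -density 400 -quality 100 yourPdfFile.pdf %03d.jpeg

(%03d pads the page number to three digits, so the files sort in page order; a pattern like 00%d would give 0010.jpeg, which sorts before 002.jpeg.)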

(credits regarding the -quality option: this forum entry)

As a result, you get 000.jpeg, 001.jpeg and so on. Just zip them into a .cbz file, and you're done.

You could even combine both steps by "concatenating" them:

$ convert -verbose -colorspace RGB -interlace none -density 400 -quality 100 yourPdfFile.pdf %03d.jpg && zip -vm comic.cbz *.jpg

(Make sure there aren't any other JPEGs in your current working directory: with the -m option, zip moves every *.jpg file it matches into the .cbz and deletes the originals.)
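If you'd rather not worry about other JPEGs in the directory, one option is to render the pages into a temporary directory and zip them from there. This is a sketch under the same assumptions as the commands above (ImageMagick's convert, which relies on Ghostscript to read PDFs, plus zip); the function name pdf_to_cbz_convert is hypothetical.

```shell
# Hypothetical helper: convert + zip, but rendered into a scratch directory
# so unrelated JPEGs in the current directory are never swept into the archive.
# Assumes ImageMagick (with Ghostscript for PDF input) and zip are installed.
pdf_to_cbz_convert() {
    local pdf="$1" out="$2"
    local tmp
    tmp="$(mktemp -d)"                    # private scratch directory
    convert -colorspace RGB -interlace none -density 400 -quality 100 \
        "$pdf" "$tmp/%03d.jpg"            # 000.jpg, 001.jpg, ... in page order
    zip -j "$out" "$tmp"/*.jpg            # -j: store names without the temp path
    rm -rf "$tmp"                         # clean up the rendered pages
}
```

For example, pdf_to_cbz_convert yourPdfFile.pdf comic.cbz. Because of -j, the archive contains just 000.jpg, 001.jpg and so on, without the temporary directory's path.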

Solution 3

I have written a simple bash script for exactly this purpose. You will need poppler installed:

sudo apt-get install poppler-utils  # Ubuntu
brew install poppler                # macOS

Here is the bash script (save it as convert_to_cbz.sh):

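filename="${1%.*}"                        # strip the extension: "Book.pdf" -> "Book"
echo "Converting $filename to cbz"
mkdir "./$filename"                       # temporary folder for the page images
pdftoppm -jpeg "$1" "./$filename/000"     # render each page as a JPEG
zip -r9 "${filename}.cbz" "./$filename"   # zip the folder with maximum compression
rm -rf "./$filename"                      # remove the temporary images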

To use the bash script:

chmod +x convert_to_cbz.sh
./convert_to_cbz.sh "Nintendo Official Magazine 066 (OldGameMags).pdf"
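In the comments, Eric Duminil reports that pdftoppm used all the RAM on his computer and that pdfimages -j (also part of poppler-utils) worked much better. A sketch of the same script with that substitution, written as a hypothetical bash function pdf_to_cbz_pdfimages, might look like this. Keep in mind that pdfimages extracts the images embedded in the PDF rather than rendering each page, so it suits scans where each page is a single image; with -j, only JPEG (DCT) images come out as .jpg files, and any others are written as PPM/PBM, which many comic readers may not display.

```shell
# Hypothetical variant of convert_to_cbz.sh using pdfimages -j (poppler-utils).
pdf_to_cbz_pdfimages() {
    local filename="${1%.*}"                 # "Book.pdf" -> "Book"
    echo "Converting $filename to cbz"
    mkdir "./$filename"                      # temporary folder for the images
    # Extract embedded images: DCT (JPEG) images are saved as .jpg,
    # everything else as PPM/PBM. Names look like 000-000.jpg, 000-001.jpg.
    pdfimages -j "$1" "./$filename/000"
    zip -r9 "${filename}.cbz" "./$filename"  # zip the folder, max compression
    rm -rf "./$filename"                     # remove the temporary images
}
```

Usage is the same as for the script: pdf_to_cbz_pdfimages "Nintendo Official Magazine 066 (OldGameMags).pdf".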

Hopefully this will be useful for someone!

Solution 4

Try using calibre to directly convert the .pdf to .cbr or .cbz.


Author: Shawn

Updated on September 18, 2022

Comments

  • Shawn, over 1 year ago

    I have some .pdf files that I would like to convert to my preferred reading format of .cbr or .cbz or, if this isn't directly possible, I need to extract all pages from the .pdf as images and then compress them into my format of choice. I have only been able to save pages one at a time with Document Viewer. Obviously, I'd like to do it a little quicker. I have tried pdfsam, pdf shuffler, and pdfmod all with no luck. I am using Ubuntu 11.10.

  • tohuwawohu, over 12 years ago
    Very nice! It seems that pdftoppm is in fact easier to use than ImageMagick's convert.
  • Shawn, over 12 years ago
    Thanks for the suggestion, but Calibre won't work for me. I installed the program and, I'm sorry to say, it sticks out like a sore thumb on my desktop! I also found that the pdftoppm command in my answer is WAY faster than installing and configuring Calibre before converting.
  • Anny Igi, over 12 years ago
    @Shawn Yes, I would say that Calibre is ugly and slow, but it does do the job. I'm glad you found a better solution, though :)
  • Pankaj Badukale, over 8 years ago
    Yes, it's very helpful and easy. I just want to know: can I add an alphabetic suffix instead, like prefix-a, prefix-b, prefix-c?
  • Eric Duminil, over 6 years ago
    Wonderful, thanks. I had to change the script a bit, though: pdftoppm used all my RAM and crashed my computer. Replacing the 4th line (the pdftoppm call) with pdfimages -j "$1" "./$filename/000" did the trick. It works fine, it's fast, there doesn't seem to be any quality loss, and the .cbz is slightly smaller than the original PDF. pdfimages is also included in poppler-utils.
  • Eric Duminil, over 6 years ago
    pdftoppm is extremely slow and uses all the RAM on my computer. pdfimages -j worked much better.
  • mchid, about 3 years ago
    This doesn't work. All you get is a help message when using this syntax for pdftoppm.