imagemagick wand save pdf pages as images

12,691

Solution 1

A slight simplification of @rikAtee's answer, with the page count detected automatically from the length of the sequence:

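from wand.image import Image


def convert_pdf_to_png(blob):
    pdf = Image(blob=blob)

    # Number of pages in the PDF
    pages = len(pdf.sequence)

    # Blank canvas tall enough for every page (assumes equal page sizes)
    image = Image(
        width=pdf.width,
        height=pdf.height * pages
    )

    # Paste each page directly below the previous one
    for i in range(pages):
        image.composite(
            pdf.sequence[i],
            top=pdf.height * i,
            left=0
        )

    return image.make_blob('png')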

I haven't noticed any memory leak issues, although my PDFs tend to be only 2 or 3 pages long.

Solution 2

My solution:

from wand.image import Image

diag = 'yourpdf.pdf'

with Image(filename=diag, resolution=200) as source:
    # Save every page as its own PNG: 0.png, 1.png, ...
    for i, page in enumerate(source.sequence):
        with Image(image=page) as single:
            single.save(filename=str(i) + '.png')

It works and, compared with the other answers, it copes better with multi-page PDF files whose pages differ in size, because each page is saved as its own image.
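If the variable-size pages should still end up in one tall image, the canvas size and the vertical offset of each page have to be computed from the individual page sizes rather than from `pdf.height * pages`. The following is only a sketch under that assumption; `stacked_layout` and `stitch_pdf` are hypothetical helper names, not part of Wand, and the Wand calls are the same ones used in the answers here (`Image(width=..., height=...)`, `composite`, `save`).

```python
def stacked_layout(sizes):
    """Return (canvas_width, canvas_height, tops) for pages stacked vertically.

    `sizes` is a non-empty list of (width, height) tuples, one per page.
    """
    canvas_width = max(width for width, _ in sizes)
    tops = []
    offset = 0
    for _, height in sizes:
        tops.append(offset)   # this page starts where the previous one ended
        offset += height
    return canvas_width, offset, tops


def stitch_pdf(pdf_path, out_path, resolution=200):
    """Composite every page of `pdf_path` into one tall image at `out_path`."""
    # Imported lazily so the layout helper can be used without Wand installed.
    from wand.image import Image

    with Image(filename=pdf_path, resolution=resolution) as pdf:
        sizes = [(page.width, page.height) for page in pdf.sequence]
        width, height, tops = stacked_layout(sizes)
        with Image(width=width, height=height) as canvas:
            for page, top in zip(pdf.sequence, tops):
                canvas.composite(page, left=0, top=top)
            canvas.save(filename=out_path)
```

Narrower pages are left-aligned on the canvas here; centering them would be a matter of changing `left`.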

Solution 3

Note: this causes a memory leak.

I found a way. There is probably a better way, but it works.

from wand.image import Image


class Preview(object):
    def __init__(self, file):
        self.image = Image(file=file)

    def join_pages(self, page_count):
        canvas = self.create_canvas(page_count=page_count)
        for page_number in range(page_count):
            canvas.composite(
                self.image.sequence[page_number],
                top=self.image.height * page_number,
                left=0,
            )
        # Hand the joined image back so the caller can save it
        return canvas

    def create_canvas(self, page_count):
        # Blank canvas tall enough for page_count pages of equal height
        return Image(
            width=self.image.width,
            height=self.image.height * page_count,
        )


# PDFs are binary files, so open in 'rb' mode
preview = Preview(open('path/to/pdf', 'rb'))
preview.join_pages(3)

Author: Code Review Doctor

Updated on June 04, 2022

Comments

  • Code Review Doctor, about 2 years ago

    I would like to use the ImageMagick Wand package to convert all pages of a PDF file into a single image file. However, I am running into the following problem (the comments in the code highlight it):

    import tempfile
    from wand.image import Image


    def save_using_filename(image):
        with tempfile.NamedTemporaryFile() as temp:
            # this saves all pages, but a file for each page (so 3 files)
            image.save(filename=temp.name)


    def save_using_file(image):
        with tempfile.NamedTemporaryFile() as temp:
            # this only saves the first page as an image
            image.save(file=temp)


    with open('my_pdf_with_5_pages.png', 'rb') as f:
        image = Image(file=f, format='png')
        save_using_filename(image)
        save_using_file(image)
    

    My end goal is to be able to specify which pages are converted into one continuous image. This is possible from the command line with something like

    convert -append input.pdf[0-4]
    

    but I am trying to do this in Python.

    I see we can get slices by doing this:

    [x for x in w.sequence[0:2]] # get page 1 and 2
    

    Now it's a question of how to join these pages together.
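For the page-selection part, one option is to turn a command-line style selection such as `0-4` into a list of zero-based indices and then pick those entries from the image's `sequence` before compositing. This is a minimal sketch only; `parse_page_spec` is a hypothetical helper, and the comma-separated form (`0-2,4`) is an assumption made for illustration rather than a statement about what ImageMagick itself accepts.

```python
def parse_page_spec(spec, page_count):
    """Expand a spec such as "0-2,4" into a list of zero-based page indices.

    Raises ValueError if any index falls outside 0..page_count-1.
    """
    indices = []
    for part in spec.split(','):
        part = part.strip()
        if '-' in part:
            first, last = (int(n) for n in part.split('-', 1))
        else:
            first = last = int(part)
        if not 0 <= first <= last < page_count:
            raise ValueError(
                'page range %r is outside 0..%d' % (part, page_count - 1)
            )
        indices.extend(range(first, last + 1))
    return indices
```

With a Wand image `w`, the selected pages would then be `[w.sequence[i] for i in parse_page_spec('0-4', len(w.sequence))]`, which can be composited onto a canvas one below the other.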