Extract vector images from a PDF file

Solution 1

This is not limited to images, which is what you seem to need, but

  • pdftocairo

http://poppler.freedesktop.org/

http://www.manpagez.com/man/1/pdftocairo/ (manpage)

can render a PDF page to other vector formats such as PS, EPS, and SVG.

Assuming you have a PDF page with vector images, you can render the page to SVG and then copy out only the image you are interested in.

Note: pdftocairo cannot render a multipage PDF to a multipage SVG.

If you need to convert several PDF pages to SVG, first pick the page range and then burst it into single-page PDFs.

Example (converting pages 1-10 of a PDF file to SVG):

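pdftk file.pdf cat 1-10 output 1-10.pdf

pdftk 1-10.pdf burst

# burst writes pg_0001.pdf, pg_0002.pdf, ...; match only those so file.pdf and 1-10.pdf are skipped
for f in pg_*.pdf; do pdftocairo -svg "$f" "${f%.pdf}.svg"; done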
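As an alternative to the pdftk steps above, pdftocairo's own `-f` (first page) and `-l` (last page) options can select one page per call. The following is only a sketch: the function name `pdf_pages_to_svg` and the `NAME-PAGE.svg` output naming are assumptions made for illustration, not part of any tool.

```shell
# Hypothetical helper: render pages FIRST..LAST of a PDF to one SVG per page.
# Output names (file-1.svg, file-2.svg, ...) are an illustrative choice.
pdf_pages_to_svg() {
    input=$1
    first=$2
    last=$3
    base=${input%.pdf}
    page=$first
    while [ "$page" -le "$last" ]; do
        # -f and -l set to the same number restrict the run to a single page
        pdftocairo -svg -f "$page" -l "$page" "$input" "$base-$page.svg" || return 1
        page=$((page + 1))
    done
}
```

For example, `pdf_pages_to_svg file.pdf 1 10` would call pdftocairo ten times and skip the intermediate single-page PDFs.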

Finally, with Sodipodi or Inkscape, you can extract the images you are interested in from the SVG rendering of each PDF page.
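If the extraction step should also run from the command line, Inkscape can export a single object by its SVG id. This is a rough sketch that assumes Inkscape 1.0 or later (older releases use different export options); the function name `extract_svg_object` and the id `fig1` are hypothetical, and the real ids depend on how the page was rendered.

```shell
# Hypothetical helper: export one object (selected by id) from an SVG file.
# Assumes the Inkscape 1.x command-line options.
# Object ids can be listed first with: inkscape --query-all page.svg
extract_svg_object() {
    src=$1
    id=$2
    out=$3
    # --export-id-only leaves every other object out of the exported file
    inkscape --export-id="$id" --export-id-only --export-plain-svg \
        --export-filename="$out" "$src"
}
```

A call such as `extract_svg_object pg_0001.svg fig1 fig1.svg` would then write only that object to `fig1.svg`.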

Solution 2

What do you consider a "figure"? This is a concept that doesn't exist in PDF. The reason so many tools can extract images from a PDF file is that images are a very clearly identified entity.

Your "figures" however, are much less clearly defined. PDF files may contain lots of vector content that you wouldn't call a figure. Text can be stroked for example, which would make it vector art and as such it might be confused with your figures. Other decorative elements may be used in the background of the pages. Text may be underlined, which would be a vector element...

In the other direction, your "figure" may contain a caption that is text, further complicating things.

As PDF doesn't have the notion of a figure, you'll have to figure out how to isolate one on a PDF page (perhaps because the creator application always adds metadata to them, or because they use a special color, or...). If you can isolate them, it should be possible to trim everything irrelevant on the page and export what you need as EPS or SVG using some of the techniques described in the other answer.
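Once a figure's bounding box on the page is known by some means, one way to trim everything else is pdftocairo's crop options `-x`, `-y`, `-W` and `-H`, which are given in points for vector output. The sketch below assumes the box has already been found; the function name `crop_page_region_to_svg` is hypothetical, and passing the same size to `-paperw`/`-paperh` is an assumption about how to keep the output page as small as the crop.

```shell
# Hypothetical helper: render only a rectangular region of one page to SVG.
# x, y: top-left corner of the region; w, h: its size (points for SVG output).
crop_page_region_to_svg() {
    input=$1
    page=$2
    x=$3
    y=$4
    w=$5
    h=$6
    output=$7
    # -paperw/-paperh are set to the crop size so the result is not padded
    pdftocairo -svg -f "$page" -l "$page" \
        -x "$x" -y "$y" -W "$w" -H "$h" \
        -paperw "$w" -paperh "$h" \
        "$input" "$output"
}
```

For instance, `crop_page_region_to_svg file.pdf 3 72 100 300 200 figure.svg` would export a 300 by 200 point region of page 3. Any text or decoration that falls inside the box is exported along with the figure.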

Solution 3

This article describes the tools gpdfx, Inkscape, and pdf2svg, which are not completely command-line based but still sound helpful.

Author by

v923z

Updated on June 12, 2022

Comments

  • v923z
    v923z almost 2 years

    Is there a command line tool on linux that would extract figures from a pdf file, and save them in vector format? I know about pdfimages, but that would create a bitmap, and that is not what I need.