extract vector image from a pdf file
Solution 1
This is not for images only, as you seem to need, but
- pdftocairo
http://poppler.freedesktop.org/
http://www.manpagez.com/man/1/pdftocairo/ (manpage)
can render a PDF page to other vector formats such as PS, EPS, and SVG.
Assuming you have a PDF page with vector images, you can render the page to SVG and then copy out only the image you are interested in.
Note: pdftocairo cannot render a multipage PDF to a multipage SVG.
If you need to convert several PDF pages to SVG, first select the page range and then burst it into single-page PDF files.
Example (converting pages 1-10 of a PDF file to SVG):
- 1°
pdftk file.pdf cat 1-10 output 1-10.pdf
- 2°
pdftk 1-10.pdf burst
- 3°
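for f in pg_*.pdf; do pdftocairo -svg "$f" "${f%.pdf}.svg"; done
(pdftk burst names its output pg_0001.pdf, pg_0002.pdf, ... by default; matching pg_*.pdf avoids converting file.pdf and 1-10.pdf again)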
- 4°
Finally, with Sodipodi or Inkscape, you can extract the images you are interested in from the SVG rendering of each PDF page.
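
As a variation on the steps above, the pdftk split can probably be skipped by asking pdftocairo for one page at a time with its -f (first page) and -l (last page) options. This is only a sketch: it assumes pdftocairo from poppler is installed, and the function names and the output naming scheme (file-p003.svg) are invented for the example.

```shell
#!/bin/sh
# Sketch: render pages FIRST..LAST of a PDF to one SVG file per page.
# Assumes pdftocairo (poppler) is on PATH. The function names and the
# "-pNNN.svg" naming scheme are hypothetical, chosen for this example.

# Print the SVG file name used for page $2 of PDF $1.
svg_name() {
    base=${1%.pdf}
    printf '%s-p%03d.svg\n' "$base" "$2"
}

# Render each page in the range to its own SVG file.
pdf_pages_to_svg() {
    pdf=$1
    first=$2
    last=$3
    page=$first
    while [ "$page" -le "$last" ]; do
        # -f and -l set to the same number restrict output to a single page
        pdftocairo -svg -f "$page" -l "$page" "$pdf" "$(svg_name "$pdf" "$page")" || return 1
        page=$((page + 1))
    done
}

# Usage: pdf_pages_to_svg file.pdf 1 10
```

The resulting SVG files can then be opened in Inkscape exactly as in step 4.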
Solution 2
What do you consider a "figure"? This is a concept that doesn't exist in PDF. There are so many tools that can extract images from a PDF file because images are a very clearly identified entity.
Your "figures" however, are much less clearly defined. PDF files may contain lots of vector content that you wouldn't call a figure. Text can be stroked for example, which would make it vector art and as such it might be confused with your figures. Other decorative elements may be used in the background of the pages. Text may be underlined, which would be a vector element...
In the other direction, your "figure" may contain a caption that is text, further complicating things.
As PDF doesn't have the notion of a figure, you'll have to figure out how to isolate one on a PDF page (perhaps because the creator application always adds metadata to them, or because they use a special color, or...). If you can isolate them, it should be possible to trim everything irrelevant on the page and export what you need as EPS or SVG using some of the techniques described in the other answer.
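
If the figure can be isolated as a rectangle on the page, one possible way to do the trimming is pdftocairo's crop options (-x, -y, -W, -H, given in points for vector output). The sketch below assumes you already know the rectangle; the function name, the DRY_RUN switch, and the choice to set -paperw/-paperh to the crop size are assumptions made for this example, and the exact placement of the cropped area may differ between poppler versions, so check the result.

```shell
#!/bin/sh
# Sketch: export one rectangular region of one PDF page as SVG.
# Assumes pdftocairo (poppler) is on PATH. crop_page_to_svg and DRY_RUN
# are hypothetical names used only in this example.
#
# Arguments: pdf page x y width height output.svg
# x and y are the top-left corner of the crop area, in points.
crop_page_to_svg() {
    pdf=$1
    page=$2
    x=$3
    y=$4
    w=$5
    h=$6
    out=$7
    # With DRY_RUN set, print the command instead of running it.
    ${DRY_RUN:+echo} pdftocairo -svg -f "$page" -l "$page" \
        -x "$x" -y "$y" -W "$w" -H "$h" \
        -paperw "$w" -paperh "$h" \
        "$pdf" "$out"
}

# Usage: crop_page_to_svg file.pdf 3 72 100 300 200 figure.svg
```

Captions and stray decorative elements inside the rectangle will still be part of the SVG, so some cleanup in Inkscape may remain necessary.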
Solution 3
This article describes the tools gpdfx, inkscape and pdf2svg which are not completely commandline-based, but still sound helpful.
v923z
Updated on June 12, 2022
Is there a command line tool on linux that would extract figures from a pdf file, and save them in vector format? I know about pdfimages, but that would create a bitmap, and that is not what I need.