Read PDF with C

Solution 1

Check out libpoppler. I've never used it for extracting text, only for querying PDF attributes, but it's pretty easy to use.

Solution 2

How well do you need to parse them? Just extracting the strings should be relatively easy; fully accurate rendering is harder. Perhaps take a look at the source code of Evince or Ghostscript.

This is for C++, but it might be a good starting point for understanding PDF structure: http://www.codeproject.com/KB/cpp/ExtractPDFText.aspx

Author: Rui Carneiro

Software Engineer, computer games enthusiast and geek.

Updated on June 26, 2022

Comments

  • Rui Carneiro
    Rui Carneiro almost 2 years

    I want to be able to read the content of PDF files. I need to do that with C on Linux.

    The closest I could get to this was here, but I think Haru can only create PDFs and is not able to read them (not 100% sure).

    PS: I only need the plain text from the PDF files.

  • Rui Carneiro
    Rui Carneiro about 15 years
    I only need the plain text from PDF files.
  • Rui Carneiro
    Rui Carneiro about 15 years
    I think libpoppler is too "big" for what I want. It uses Qt and other stuff that I think is unnecessary.
  • Ehab Developer
    Ehab Developer about 15 years
    Poppler has optional frontends for GLib and Qt (to fit nicely into their object systems), but they are not required.
  • Rui Carneiro
    Rui Carneiro about 15 years
    OK! I saw that it is already in the Ubuntu repositories. I will take a look.