From PDF to String

Solution 1

Use iText. The following snippet, for example, extracts the text of page 3.

PdfReader reader = new PdfReader("C:/Text.pdf");
PdfTextExtractor parser = new PdfTextExtractor(reader);
String text = parser.getTextFromPage(3); // page numbers start at 1

Solution 2

PDFBox barfs on many newer PDFs, especially those with embedded PNG images.

I was very impressed with PDFTextStream.

Solution 3

JPedal and Multivalent also offer text extraction in Java, or you could call xpdf from Java using Runtime.exec.
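A rough sketch of the xpdf route, in case it helps. It assumes the pdftotext command-line tool that ships with xpdf is installed and on the PATH, and it uses ProcessBuilder (the more convenient relative of Runtime.exec) on Java 9 or later. The class and method names (PdfToText, buildCommand, extractText, splitWords) are made up for illustration, not part of any library.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class PdfToText {

    // Builds the command line. Passing "-" as the output file asks
    // pdftotext to write the extracted text to standard output.
    static List<String> buildCommand(String pdfPath) {
        return List.of("pdftotext", "-enc", "UTF-8", pdfPath, "-");
    }

    // Runs pdftotext (assumed to be on the PATH) and returns the
    // text of the whole document as one String.
    static String extractText(String pdfPath) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(pdfPath))
                .redirectError(ProcessBuilder.Redirect.DISCARD) // ignore warnings on stderr
                .start();
        byte[] output = process.getInputStream().readAllBytes();
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("pdftotext exited with code " + exitCode);
        }
        return new String(output, StandardCharsets.UTF_8);
    }

    // Splits extracted text into words on any run of whitespace
    // (spaces, tabs, line breaks, form feeds between pages).
    static String[] splitWords(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java PdfToText <file.pdf>");
            return;
        }
        String text = extractText(args[0]);
        System.out.println(splitWords(text).length + " words extracted");
    }
}
```

extractText gives you the one long String, and splitWords turns it into the array of Strings. Splitting on whitespace is only one possible definition of a "word", so adjust the regex if punctuation should be stripped too.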

Author by

Ankur

Updated on June 09, 2022

Comments

  • Ankur almost 2 years

    What is the easiest way to get the text (words) of a PDF file as one long String or an array of Strings?

    I have tried PDFBox, but it is not working for me.