Extract text from PDF files
Solution 1
In iText, PdfTextExtractor contains only static methods and its constructor is private, so it cannot be instantiated with new.
Call the static method on the class directly instead:
String myLine = PdfTextExtractor.getTextFromPage(reader, pageNumber);
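To illustrate why new PdfTextExtractor() is rejected while the static call works, here is a minimal sketch of the same pattern. TextUtil and shout are hypothetical names made up for this example; they are not part of iText.

```java
import java.util.Locale;

// Hypothetical stand-in for a static-only utility class such as PdfTextExtractor.
final class TextUtil {
    // Private constructor: code outside this class cannot write `new TextUtil()`.
    private TextUtil() {
    }

    // Static method: called on the class itself, no instance required.
    static String shout(String text) {
        return text.toUpperCase(Locale.ROOT);
    }
}

public class StaticUtilityDemo {
    public static void main(String[] args) {
        // TextUtil util = new TextUtil(); // would not compile: the constructor is private
        System.out.println(TextUtil.shout("hello"));
    }
}
```

The fix in the question's code follows the same shape: drop the line that creates the extractor object and call getTextFromPage on the class name.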
Solution 2
To extract all the text from a PDF file and save it to a text file, you can use the code below.
It uses the pdfutil.jar library.
import java.io.IOException;
import java.io.PrintWriter;

import com.testautomationguru.utility.PDFUtil;

public class PDFToText {
    public static void main(String[] args) {
        try {
            String pdfFilePath = "C:\\abc.pdf";
            PDFUtil pdfUtil = new PDFUtil();
            // Read the text of the whole document into one string
            String content = pdfUtil.getText(pdfFilePath);
            // try-with-resources closes the writer even if writing fails
            try (PrintWriter out = new PrintWriter("C:\\abc.txt")) {
                out.println(content);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
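If the goal is to process the text word by word, the extracted string can be split on whitespace afterwards. The following is a minimal sketch under that assumption; WordSplitter and splitWords are hypothetical names, and the sample text stands in for whatever string the extractor returned.

```java
import java.util.ArrayList;
import java.util.List;

public class WordSplitter {
    // Splits extracted text into words on runs of whitespace (spaces, tabs, newlines).
    static List<String> splitWords(String text) {
        List<String> words = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            // An empty or blank input yields one empty token; skip it.
            if (!token.isEmpty()) {
                words.add(token);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // Placeholder text; in practice this would be the string returned by the extractor.
        String content = "Ontology is the  study\nof what exists.";
        for (String word : splitWords(content)) {
            System.out.println(word);
        }
    }
}
```

Punctuation stays attached to the neighbouring word with this approach (for example "exists."), so further cleanup may be needed depending on what counts as a word.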
Author by
Rim
Updated on June 04, 2022
Comments
Rim almost 2 years
I need to extract text (word by word) from a pdf file.
import java.io.*;
import com.itextpdf.text.*;
import com.itextpdf.text.pdf.*;
import com.itextpdf.text.pdf.parser.*;

public class pdf {
    private static String INPUTFILE = "http://ontology.buffalo.edu/ontology%28PIC%29.pdf";
    private static String OUTPUTFILE = "c:/new3.pdf";

    public static void main(String[] args) throws DocumentException, IOException {
        Document document = new Document();
        PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(OUTPUTFILE));
        document.open();
        PdfReader reader = new PdfReader(INPUTFILE);
        int n = reader.getNumberOfPages();
        PdfImportedPage page;
        // Go through all pages
        for (int i = 1; i <= n; i++) {
            page = writer.getImportedPage(reader, i);
            System.out.println(i);
            Image instance = Image.getInstance(page);
            document.add(instance);
        }
        document.close();
        PdfReader readerN = new PdfReader(OUTPUTFILE);
        PdfTextExtractor parse = new PdfTextExtractor();
        for (int i = 1; i <= n; i++)
            System.out.println(parser.getTextFromPage(reader,i));
    }
}
When I compile the code, I get this error:
the constructor PdfTextExtractor is undefined
How do I fix this?