Can Selenium verify text inside a PDF loaded by the browser?

43,164

Solution 1

While not natively supported, I have found a couple ways using the java driver. One way is to have the pdf open in your browser (having adobe acrobat installed) and then use keyboard shortcut keys to select all text (CTRL+A), then copy it to the clipboard (CTRL+C) and then you can verify the text in the clipboard. eg:

protected String getLastWindow() {
    return session().getEval("var windowId; for(var x in selenium.browserbot.openedWindows ){windowId=x;} ");
}

@Test
public void testTextInPDF() {
    session().click("link=View PDF");
    String popupName = getLastWindow();
    session().waitForPopUp(popupName, PAGE_LOAD_TIMEOUT);
    session().selectWindow(popupName);

    session().windowMaximize();
    session().windowFocus();
    Thread.sleep(3000);

    session().keyDownNative("17"); // Stands for CTRL key
    session().keyPressNative("65"); // Stands for A "ascii code for A"
    session().keyUpNative("17"); //Releases CTRL key
    Thread.sleep(1000);

    session().keyDownNative("17"); // Stands for CTRL key
    session().keyPressNative("67"); // Stands for C "ascii code for C"
    session().keyUpNative("17"); //Releases CTRL key

    TextTransfer textTransfer = new TextTransfer();
    assertTrue(textTransfer.getClipboardContents().contains("Some text in my pdf"));
}

Another way, still in java, is to download the pdf and then convert the pdf to text with PDFBox, see http://www.prasannatech.net/2009/01/convert-pdf-text-parser-java-api-pdfbox.html for an example on how to do this.

Solution 2

You cannot do this using WebDriver natively. However, PDFBox API can be used here to read content of PDF file. You will have to first of all shift a focus to browser window where PDF file is opened. You can then parse all the content of PDF file and search for the desired text string.

Here is a code to use PDFBox API to search within PDF document.

Solution 3

import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.PrintWriter;
import org.pdfbox.cos.COSDocument;
import org.pdfbox.pdfparser.PDFParser;
import org.pdfbox.pdmodel.PDDocument;
import org.pdfbox.util.PDFTextStripper;

public class pdfToTextConverter {

public static void pdfToText(String path_to_PDF_file, String Path_to_output_text_file) throws FileNotFoundException, IOException{
     //Parse text from a PDF into a string variable
     File f = new File("path_to_PDF_file");

     PDFParser parser = new PDFParser(new FileInputStream(f));
     parser.parse();

     COSDocument cosDoc = parser.getDocument();
     PDDocument pdDoc = new PDDocument(cosDoc);

     PDFTextStripper pdfStripper = new PDFTextStripper();
     String parsedText = pdfStripper.getText(pdDoc);

     System.out.println(parsedText);

     //Write parsed text into a file
     PrintWriter pw = new PrintWriter("Path_to_output_text_file");
     pw.print(parsedText);
     pw.close(); 

}

}


JAR Source
http://sourceforge.net/projects/pdfbox/files/latest/download?source=files
Share:
43,164

Related videos on Youtube

Daniel Alexiuc
Author by

Daniel Alexiuc

ლ(ಠ益ಠლ)

Updated on July 09, 2022

Comments

  • Daniel Alexiuc
    Daniel Alexiuc almost 2 years

    My web application loads a pdf in the browser. I have figured out how to check that the pdf has loaded correctly using:

    verifyAttribute xpath=//embed/@src {URL of PDF goes here}

    It would be really nice to be able to check the contents of the pdf with Selenium - for example verify that some text is present. Is there any way to do this?

    • Ihor Kaharlichenko
      Ihor Kaharlichenko over 13 years
      I guess you are talking about a PDF file rendered embedded in a page via some kind of 3rd party plugin, don't you?
    • Daniel Alexiuc
      Daniel Alexiuc over 13 years
      Hmm yeah the Adobe PDF plugin for firefox I guess. I'm not too tied to that though - if there is anything at all I can test about this pdf using Selenium then I am interested.
  • mzjn
    mzjn over 8 years
    Don't just post a wall of code. Explain how it solves the problem.
  • Chandrashekhar Swami
    Chandrashekhar Swami about 8 years
    agree with mzjn.. please elaborate it more