Display first page of PDF as Image

42,766

Solution 1

This is what I used

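// Imports this snippet needs (Document, Page and GraphicsRenderingHints are ICEpdf classes)
import java.awt.image.BufferedImage;
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import javax.imageio.ImageIO;
import org.icepdf.core.exceptions.PDFException;
import org.icepdf.core.exceptions.PDFSecurityException;
import org.icepdf.core.pobjects.Document;
import org.icepdf.core.pobjects.Page;
import org.icepdf.core.util.GraphicsRenderingHints;

// Open the PDF; myProjectPath is the path of the PDF file
Document document = new Document();
try {
    document.setFile(myProjectPath);
    System.out.println("Parsed successfully...");
} catch (PDFException ex) {
    System.out.println("Error parsing PDF document " + ex);
} catch (PDFSecurityException ex) {
    System.out.println("Error encryption not supported " + ex);
} catch (FileNotFoundException ex) {
    System.out.println("Error file not found " + ex);
} catch (IOException ex) {
    System.out.println("Error handling PDF document " + ex);
}

// Save page captures to file.
float scale = 1.0f;
float rotation = 0f;

System.out.println("scale == " + scale);

// Paint the first page's content to an image and write the image to file
InputStream fis2 = null;
File file = null;
for (int i = 0; i < 1; i++) { // only index 0, i.e. the first page
    BufferedImage image = (BufferedImage) document.getPageImage(i,
            GraphicsRenderingHints.SCREEN,
            Page.BOUNDARY_CROPBOX, rotation, scale);
    // capture the page image to file
    try {
        System.out.println("\t capturing page " + i);
        file = new File(myProjectActualPath + "myImage.png");
        ImageIO.write(image, "png", file);
        // reopen the written PNG as a stream for later use
        fis2 = new BufferedInputStream(new FileInputStream(file));
    } catch (IOException ioe) {
        System.out.println("IOException :: " + ioe);
    } catch (Exception e) {
        System.out.println("Exception :: " + e);
    }
    image.flush();
}

// release the resources held by the document
document.dispose();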
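
The snippet above renders the page at scale 1.0, so the PNG is full size. For a thumbnail, the rendered image could be scaled down afterwards. The following is a minimal sketch using only the standard library; the class name `Thumbnails` and the method `fitWithin` are hypothetical and not part of the answer's code or of any PDF library.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

// Hypothetical helper, not part of the original answer.
final class Thumbnails {

    // Scales src down so that it fits inside maxW x maxH, keeping the aspect ratio.
    // Images that already fit are copied at their original size (no upscaling).
    static BufferedImage fitWithin(BufferedImage src, int maxW, int maxH) {
        double factor = Math.min((double) maxW / src.getWidth(),
                                 (double) maxH / src.getHeight());
        factor = Math.min(factor, 1.0);
        int w = Math.max(1, (int) Math.round(src.getWidth() * factor));
        int h = Math.max(1, (int) Math.round(src.getHeight() * factor));

        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = out.createGraphics();
        try {
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                               RenderingHints.VALUE_INTERPOLATION_BILINEAR);
            g.drawImage(src, 0, 0, w, h, null);
        } finally {
            g.dispose();
        }
        return out;
    }
}
```

With a 612x792 page image and a 200x200 bounding box, `Thumbnails.fitWithin(image, 200, 200)` returns a 155x200 image, which can then be passed to `ImageIO.write` in place of the full-size one.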

Solution 2

I'm not sure if all browsers display your embedded PDF (done via <h:graphicImage value="some.pdf" ... /> ) equally well.

Extracting 1st Page as PDF

If you insist on using PDF, I'd recommend one of these two command-line tools to extract the first page of any PDF:

  1. pdftk
  2. Ghostscript

Both are available for Linux, Mac OS X and Windows.

pdftk command

pdftk input.pdf cat 1 output page-1-of-input.pdf

Ghostscript command

gs -o page-1-of-input.pdf -sDEVICE=pdfwrite -dLastPage=1 input.pdf

(On Windows use gswin32c.exe or gswin64c.exe instead of gs.)
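
Since the question comes from a Java web application, the Ghostscript call could be started from Java with `ProcessBuilder`. What follows is only a sketch under assumptions: the class `GsFirstPage` and its methods are hypothetical, it assumes Ghostscript is on the `PATH`, it picks the 64-bit console binary on Windows, and it uses Ghostscript's `-dLastPage=1` switch to stop after the first page.

```java
import java.io.IOException;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of the original answer.
final class GsFirstPage {

    // "gs" on Linux and Mac OS X; on Windows this sketch assumes gswin64c.
    static String executable(String osName) {
        return osName.toLowerCase(Locale.ROOT).startsWith("windows") ? "gswin64c" : "gs";
    }

    // Builds the argument list for extracting page 1 as a PDF.
    static List<String> command(String osName, String inputPdf, String outputPdf) {
        return List.of(executable(osName), "-o", outputPdf,
                       "-sDEVICE=pdfwrite", "-dLastPage=1", inputPdf);
    }

    // Runs the command and returns Ghostscript's exit code (0 means success).
    static int run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).inheritIO().start();
        return process.waitFor();
    }

    public static void main(String[] args) {
        // Only prints the command; call run(...) to actually execute it.
        List<String> cmd = command(System.getProperty("os.name"),
                                   "input.pdf", "page-1-of-input.pdf");
        System.out.println(String.join(" ", cmd));
    }
}
```

Passing the arguments as a list, rather than as one shell string, avoids quoting problems with file names that contain spaces.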

pdftk used to be slightly faster than Ghostscript at page extraction, though for a single page that difference was probably negligible. As of the most recent released version of Ghostscript, v9.05, this is no longer true: I found that Ghostscript (including all startup overhead) requires ~1 second to extract the 1st page from the 756-page PDF specification, while pdftk needed ~11 seconds.

Converting 1st Page to JPEG

If you want to be sure that even older browsers can display your 1st page well, then convert it to JPEG. Ghostscript is your friend here (ImageMagick cannot do it by itself, it needs the help of Ghostscript anyway):

gs -o page-1-of-input-PDF.jpeg -sDEVICE=jpeg -dLastPage=1 input.pdf

Should you need page 33, you can do it like this:

gs -o page-33-of-input-PDF.jpeg -sDEVICE=jpeg -dFirstPage=33 -dLastPage=33 input.pdf

If you need a range of pages, like pages 17-23, try this:

gs -o page-16+%03d-of-input-PDF.jpeg -sDEVICE=jpeg -dFirstPage=17 -dLastPage=23 input.pdf

Note that the %03d notation increments with each page processed, starting with 1. So your first JPEG's name would be page-16+001-of-input-PDF.jpeg.
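
To see which file names a page range produces, the numbering can be reproduced in a few lines of Java. This is an illustrative sketch, not Ghostscript code: the class `GsOutputNames` is hypothetical, and it relies on `String.format` treating `%03d` the same way for these small positive numbers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of the original answer.
final class GsOutputNames {

    // Lists the names expected for pages firstPage..lastPage.
    // The counter starts at 1 regardless of the first page number.
    static List<String> expectedNames(String pattern, int firstPage, int lastPage) {
        List<String> names = new ArrayList<>();
        int pageCount = lastPage - firstPage + 1;
        for (int n = 1; n <= pageCount; n++) {
            names.add(String.format(Locale.ROOT, pattern, n));
        }
        return names;
    }

    public static void main(String[] args) {
        for (String name : expectedNames("page-16+%03d-of-input-PDF.jpeg", 17, 23)) {
            System.out.println(name);
        }
    }
}
```

For pages 17-23 this prints seven names, from page-16+001-of-input-PDF.jpeg to page-16+007-of-input-PDF.jpeg, which is why the "16+" prefix makes the number in each name add up to the real page number.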

Maybe PNG is better?

Be aware that JPEG isn't well suited to images with high black+white contrast and sharp edges, such as text pages. PNG is much better for this.

Creating a PNG from the 1st PDF page with Ghostscript is easy:

gs -o page-1-of-input-PDF.png -sDEVICE=pngalpha -dLastPage=1 input.pdf

The same options as for JPEGs apply when it comes to extracting ranges of pages.

Solution 3

Warning: Don't use Ma9ic's script (posted in another answer) unless you want to...

  • ...make the PDF->JPEG conversion consume much more time + resources than it should
  • ...give up your own control over the PDF->JPEG conversion process altogether.

While it may work well for you, there are many problems in these 8 little lines of Bash.

First,
it uses identify to extract the number of pages from the input PDF. However, identify (part of ImageMagick) is completely unable to process PDFs all by itself. It has to run Ghostscript as a 'delegate' to handle PDF input. It would be much more efficient to use Ghostscript directly instead of running it indirectly, via ImageMagick.

Second,
it uses convert for the PDF->JPEG conversion. Same remark as above: it uses Ghostscript anyway, so why not run it directly?

Third,
it loops over the pages and runs a different convert process for every single page of the PDF, that is 100 converts for a 100 page PDF file. That means: it also runs 100 Ghostscript commands to produce 100 JPEGs.

Fourth,
Fahim Parkar's question was to get a thumbnail from the first page of the PDF, not from all of them.

The script runs at least 201 different commands for a 100-page PDF, when it could all be done in just 1 command. If you use Ghostscript directly...

  1. ...not only will it run faster and more efficiently,
  2. ...but also it will give you more fine-grained and better control over the JPEGs' quality settings.

Use the right tool for the job, and use it correctly!


Update:

Since I was asked, here is my alternative implementation to Ma9ic's script.

#!/bin/bash 
infile=${1}

gs -q -o "$(basename "${infile}")_p%04d.jpeg" -sDEVICE=jpeg "${infile}"

# To get thumbnail JPEGs with a width of 200 pixels, use the following command:
# gs -q -o name_200px_p%04d.jpg -sDEVICE=jpeg -dPDFFitPage -g200x400 "${infile}"

# To get higher quality JPEGs (but also bigger-in-size ones) with a 
# resolution of 300 dpi use the following command:
# gs -q -o name_300dpi_p%04d.jpg -sDEVICE=jpeg -dJPEGQ=100 -r300 "${infile}"

echo "Done"
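
The original question only needs a thumbnail of the first page, so a Java web application might build the equivalent Ghostscript call roughly like this. It is a sketch under assumptions: the class `GsThumbnail` is hypothetical, the flags are the ones from the commented thumbnail command above, and `-dFirstPage=1 -dLastPage=1` is added to limit the output to page 1.

```java
import java.util.List;

// Hypothetical helper, not part of the original answer.
final class GsThumbnail {

    // Builds the Ghostscript arguments for a JPEG thumbnail of page 1,
    // fitted into a canvas of widthPx x heightPx pixels.
    static List<String> command(String gsExecutable, String inputPdf,
                                String outputJpeg, int widthPx, int heightPx) {
        return List.of(gsExecutable, "-q", "-o", outputJpeg,
                       "-sDEVICE=jpeg",
                       "-dFirstPage=1", "-dLastPage=1",
                       "-dPDFFitPage", "-g" + widthPx + "x" + heightPx,
                       inputPdf);
    }

    public static void main(String[] args) {
        // Only prints the command line; it does not start Ghostscript.
        System.out.println(String.join(" ",
                command("gs", "input.pdf", "thumb.jpg", 200, 400)));
    }
}
```

The resulting list can be handed to `new ProcessBuilder(command).inheritIO().start()`, and the generated JPEG can then be referenced from `<h:graphicImage>` like any other image.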

I've even run a benchmark on it. I converted the 756-page PDF-1.7 specification to JPEGs with both scripts:

  • Ma9ic's version needs 1413 seconds to generate the 756 JPEGs.
  • My version saves 93% of that time and takes 91 seconds.
  • Moreover, on my system Ma9ic's script produces mostly black JPEG images; mine are OK.
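
For reference, the saving quoted above follows directly from the two timings, and the same numbers give the speed-up factor:

```latex
\[
\text{time saved} = \frac{1413 - 91}{1413} = \frac{1322}{1413} \approx 0.936,
\qquad
\text{speed-up} = \frac{1413}{91} \approx 15.5
\]
```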
Fahim Parkar
Author by

Fahim Parkar

Passionate about programming. First worked as TAB Programmer (MR company programming using Quantam Software) Then started working on JSF platform to make websites. Then shifted to Kuwait & started with iPhone development. After almost 4 years of iPhone development, now in process to Learn Android too... Alhamdullilah!!! Either sleep or code!!! SOreadytohelp

Updated on December 02, 2020

Comments

  • Fahim Parkar
    Fahim Parkar over 3 years

    I am creating a web application where I display images/PDFs as thumbnails. On clicking the respective image/PDF, it opens in a new window.

    For PDF, I have (this is code of the new window)

    <iframe src="images/testes.pdf" width="800" height="200" />
    

    Using this I can see all PDF in web browser. However for thumbnail purpose, I want to display only first page of PDF as an Image.

    I tried

     <h:graphicImage value="images/testes.pdf" width="800" height="200" />
    

    however it is not working. Any idea how to get this done?

    Update 1

    I am providing path of pdf file for example purpose. However I have images in Database. In actual I have code as below.

    <iframe src="#{PersonalInformationDataBean.myAttachmentString}" width="800" height="200" />
    

    Update 2

    For sake of thumbnail, what I am using is

     <h:graphicImage height=200 width=200 value="...."> 
    

    however I need to achieve same for PDF also.

    Hope I am clear what I am expecting...

    • Daniel
      Daniel almost 12 years
      I don't think that it could be done without some third party api that will convert the pdf (first page) into image first....
    • Fahim Parkar
      Fahim Parkar almost 12 years
      @Daniel : Safari is displaying pdf in h:graphicImage.
    • Fahim Parkar
      Fahim Parkar almost 12 years
      @Daniel : See my updated question. Any idea how to get this done?
  • Fahim Parkar
    Fahim Parkar almost 12 years
    Thanks for answer.. which tool should I use for this??
  • user85461
    user85461 over 10 years
    You can limit to just the first page by adding the options: -dFirstPage=1 -dLastPage=1.
  • COil
    COil over 7 years
    Works well ! But when using gs for jpeg, it is better to first generate the png format and then convert it to jpeg so you can set the quality you want. (the default one is bad)
  • Archimedes Trajano
    Archimedes Trajano almost 7 years
    which PDF API is used?