Convert HTML with images to PDF using iText

The following is based on iText 5, version 5.5.12.

Suppose you have this directory structure:

[Image: screenshot of the directory structure]

The following code, run against the latest iText 5, converts index.html to a PDF:

package converthtmltopdf;

import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.tool.xml.XMLWorker;
import com.itextpdf.tool.xml.XMLWorkerHelper;
import com.itextpdf.tool.xml.html.Tags;
import com.itextpdf.tool.xml.net.FileRetrieve;
import com.itextpdf.tool.xml.net.FileRetrieveImpl;
import com.itextpdf.tool.xml.parser.XMLParser;
import com.itextpdf.tool.xml.pipeline.css.CSSResolver;
import com.itextpdf.tool.xml.pipeline.css.CssResolverPipeline;
import com.itextpdf.tool.xml.pipeline.end.PdfWriterPipeline;
import com.itextpdf.tool.xml.pipeline.html.AbstractImageProvider;
import com.itextpdf.tool.xml.pipeline.html.HtmlPipeline;
import com.itextpdf.tool.xml.pipeline.html.HtmlPipelineContext;
import com.itextpdf.tool.xml.pipeline.html.LinkProvider;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

/**
 *
 * @author george.mavrommatis
 */
public class ConvertHtmlToPdf {
    public static final String HTML = "C:\\Users\\zzz\\Desktop\\itext\\index.html";
    public static final String DEST = "C:\\Users\\zzz\\Desktop\\itext\\index.pdf";
    public static final String IMG_PATH = "C:\\Users\\zzz\\Desktop\\itext\\";
    public static final String RELATIVE_PATH = "C:\\Users\\zzz\\Desktop\\itext\\";
    public static final String CSS_DIR = "C:\\Users\\zzz\\Desktop\\itext\\";

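    /**
     * Converts the HTML file at {@link #HTML} to a PDF, resolving the CSS,
     * images and links relative to the directories configured above.
     * @param file path of the PDF file to create
     * @throws IOException if the HTML file cannot be read or the PDF cannot be written
     * @throws DocumentException if iText fails to build the document
     */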
    public void createPdf(String file) throws IOException, DocumentException {
        // step 1
        Document document = new Document();
        // step 2
        PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(file));
        // step 3
        document.open();
        // step 4

        // CSS
        CSSResolver cssResolver =
                XMLWorkerHelper.getInstance().getDefaultCssResolver(false);
        FileRetrieve retrieve = new FileRetrieveImpl(CSS_DIR);
        cssResolver.setFileRetrieve(retrieve);

        // HTML
        HtmlPipelineContext htmlContext = new HtmlPipelineContext(null);
        htmlContext.setTagFactory(Tags.getHtmlTagProcessorFactory());
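        // Relative <img src="..."> values are resolved against IMG_PATH
        htmlContext.setImageProvider(new AbstractImageProvider() {
            @Override
            public String getImageRootPath() {
                return IMG_PATH;
            }
        });
        // Relative links are resolved against RELATIVE_PATH
        htmlContext.setLinkProvider(new LinkProvider() {
            @Override
            public String getLinkRoot() {
                return RELATIVE_PATH;
            }
        });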

        // Pipelines
        PdfWriterPipeline pdf = new PdfWriterPipeline(document, writer);
        HtmlPipeline html = new HtmlPipeline(htmlContext, pdf);
        CssResolverPipeline css = new CssResolverPipeline(cssResolver, html);

        // XML Worker
        XMLWorker worker = new XMLWorker(css, true);
        XMLParser p = new XMLParser(worker);
        // Close the HTML input stream once parsing is done
        try (FileInputStream in = new FileInputStream(HTML)) {
            p.parse(in);
        }

        // step 5
        document.close();
    }
    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) throws IOException, DocumentException {
        // Convert the file at HTML and write the PDF to DEST
        new ConvertHtmlToPdf().createPdf(DEST);
    }

}
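
In the class above, IMG_PATH, RELATIVE_PATH and CSS_DIR all point to the folder that holds index.html. If you would rather not hard-code that folder three times, it can be derived from the HTML file's location. This is only a minimal sketch using the standard library; the class name `HtmlPaths` and both helper methods are made up for illustration and are not part of iText:

```java
import java.io.File;
import java.nio.file.Path;
import java.nio.file.Paths;

public class HtmlPaths {

    /**
     * Hypothetical helper: returns the directory that contains the HTML file,
     * with a trailing separator, so it can serve as image, link and CSS root.
     */
    public static String baseDirOf(String htmlFile) {
        Path parent = Paths.get(htmlFile).toAbsolutePath().getParent();
        if (parent == null) {
            throw new IllegalArgumentException("No parent directory for: " + htmlFile);
        }
        String dir = parent.toString();
        // Only append a separator if the directory does not already end with one
        return dir.endsWith(File.separator) ? dir : dir + File.separator;
    }

    /**
     * Hypothetical helper: returns a sibling path with the same file name
     * but a .pdf extension, for example index.html becomes index.pdf.
     */
    public static String pdfNameFor(String htmlFile) {
        Path path = Paths.get(htmlFile);
        String name = path.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String stem = dot <= 0 ? name : name.substring(0, dot);
        return path.resolveSibling(stem + ".pdf").toString();
    }

    public static void main(String[] args) {
        String html = args.length > 0 ? args[0] : "index.html";
        System.out.println("Base directory: " + baseDirOf(html));
        System.out.println("PDF file:       " + pdfNameFor(html));
    }
}
```

The value returned by `baseDirOf` could then be passed wherever the three constants are used in ConvertHtmlToPdf, assuming the images and stylesheet really do live next to the HTML file.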

Running ConvertHtmlToPdf produces this result:

[Image: screenshot of the generated PDF]

This example uses code from: https://developers.itextpdf.com/examples/xml-worker-itext5/xml-worker-examples
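
To see why returning the folder from getImageRootPath() makes a relative image such as images/Vader_TFU.jpg show up, it helps to think of the root path as a prefix for any src that is not already an absolute URL. The sketch below only illustrates that idea with plain string handling. It is an assumption about the general behaviour, not iText's actual implementation, and `ImageSrcResolver` and `resolve` are hypothetical names:

```java
public class ImageSrcResolver {

    /**
     * Hypothetical helper: returns the location an image would be loaded from
     * if the root path is simply prefixed to a relative src value.
     */
    public static String resolve(String imageRootPath, String src) {
        // Absolute URLs already say where the image lives, so leave them untouched
        if (src.startsWith("http://") || src.startsWith("https://")) {
            return src;
        }
        // Normalise Windows separators so the result reads the same on any OS
        String root = imageRootPath.replace('\\', '/');
        if (!root.endsWith("/")) {
            root += "/";
        }
        // Avoid a doubled slash when the src itself starts with one
        String relative = src.startsWith("/") ? src.substring(1) : src;
        return root + relative;
    }

    public static void main(String[] args) {
        String root = "C:\\Users\\zzz\\Desktop\\itext\\";
        // prints C:/Users/zzz/Desktop/itext/images/Vader_TFU.jpg
        System.out.println(resolve(root, "images/Vader_TFU.jpg"));
        // prints https://www.w3schools.com/images/picture.jpg
        System.out.println(resolve(root, "https://www.w3schools.com/images/picture.jpg"));
    }
}
```

Under that assumption, an image root of null or a wrong folder would leave the relative image unresolved while the absolute URL still works, which matches the behaviour described in the question.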

Hope this helps

Author: jdubicki

Updated on June 04, 2022

Comments

  • jdubicki
    jdubicki almost 2 years

    I have searched the existing questions and have not been able to find a solution to my specific problem. I need to convert HTML files that contain images and CSS styling to PDF. I am using iText 5 and have been able to include the styling in the generated PDF. However, I am still struggling to include the images. I have included my code below. The image with the absolute path is included in the generated PDF; the image with the relative path is not. I know I need to implement AbstractImageProvider, but I do not know how to do it. Any help is greatly appreciated.

    Java File:

    public class Converter {
    
        static String in = "C:/Users/APPS/Desktop/Test_Html/index.htm";
        static String out = "C:/Users/APPS/Desktop/index.pdf";
        static String css = "C:/Users/APPS/Desktop/Test_Html/style.css";
    
        public static void main(String[] args) {
            try {
                convertHtmlToPdf();
            } catch (DocumentException e) {
                e.printStackTrace();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    
        private static void convertHtmlToPdf() throws DocumentException, IOException {
            Document document = new Document();
            PdfWriter pdfWriter = PdfWriter.getInstance(document, new FileOutputStream(out));
            document.open();
            XMLWorkerHelper.getInstance().parseXHtml(pdfWriter, document, new FileInputStream(in), new FileInputStream(css));
            document.close();
            System.out.println("PDF Created!");
        }
    
        /**
         * Not sure how to implement this
         * @author APPS
         *
         */
        public class myImageProvider extends AbstractImageProvider {
    
            @Override
            public String getImageRootPath() {
                // TODO Auto-generated method stub
                return null;
            }
    
        }
    
    }
    

    Html File:

    <!DOCTYPE html>
    <html lang="en">
    
    <head>
        <title>HTML to PDF</title>
        <link href="style.css" rel="stylesheet" type="text/css" />
    </head>
    
    <body>
        <h1>HTML to PDF</h1>
        <p>
            <span class="itext">itext</span> 5.4.2
            <span class="description"> converting HTML to PDF</span>
        </p>
        <table>
            <tr>
                <th class="label">Title</th>
                <td>iText - Java HTML to PDF</td>
            </tr>
            <tr>
                <th>URL</th>
                <td>http://wwww.someurl.com</td>
            </tr>
        </table>
        <div class="center">
            <h2>Here is an image</h2>
            <div>
                <img src="images/Vader_TFU.jpg" />
            </div>
            <div>
                <img src="https://www.w3schools.com/images/picture.jpg" alt="Mountain" />
            </div>
        </div>
    </body>
    </html>
    

    Css File:

    h1 {
        color: #ccc;
    }
    
    table tr td {
        text-align: center;
        border: 1px solid gray;
        padding: 4px;
    }
    
    table tr th {
        background-color: #84C7FD;
        color: #fff;
        width: 100px;
    }
    
    .itext {
        color: #84C7FD;
        font-weight: bold;
    }
    
    .description {
        color: gray;
    }
    
    .center {
        text-align: center;
    }
    
  • MaVRoSCy
    MaVRoSCy over 6 years
    @jdubicki Glad I was able to help. Don't forget that if an answer helps you, you can upvote it and then accept it. Thanks.
  • jdubicki
    jdubicki over 6 years
    I am having another issue that I need help with. I had to make some modifications in order to read and render nested unordered lists. I noticed that my header tags are not being parsed into the PDF. Can anyone help me correct this? Below are the changes I had to make.
  • jdubicki
    jdubicki over 6 years
    PdfWriter.getInstance(document, new FileOutputStream(file));
    // Pipelines
    ElementList elements = new ElementList();
    ElementHandlerPipeline end = new ElementHandlerPipeline(elements, null);
    HtmlPipeline html = new HtmlPipeline(htmlPipelineContext, end);
    CssResolverPipeline css = new CssResolverPipeline(cssResolver, html);
    document.open();
    for (Element e : elements) {
        document.add(e);
    }
    document.add(Chunk.NEWLINE);
  • jdubicki
    jdubicki over 6 years
    I have posted a new question. stackoverflow.com/questions/46980365/…