Convert HTML with images to PDF using iText

java html css pdf itext

The following is based on iText 5, version 5.5.12.

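Suppose you have this directory structure (inferred here from the paths in the code below and from the question's HTML):

C:\Users\zzz\Desktop\itext\
    index.html
    style.css
    images\
        Vader_TFU.jpg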

With the following code, using the latest iText 5 release:

package converthtmltopdf;

import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.tool.xml.XMLWorker;
import com.itextpdf.tool.xml.XMLWorkerHelper;
import com.itextpdf.tool.xml.html.Tags;
import com.itextpdf.tool.xml.net.FileRetrieve;
import com.itextpdf.tool.xml.net.FileRetrieveImpl;
import com.itextpdf.tool.xml.parser.XMLParser;
import com.itextpdf.tool.xml.pipeline.css.CSSResolver;
import com.itextpdf.tool.xml.pipeline.css.CssResolverPipeline;
import com.itextpdf.tool.xml.pipeline.end.PdfWriterPipeline;
import com.itextpdf.tool.xml.pipeline.html.AbstractImageProvider;
import com.itextpdf.tool.xml.pipeline.html.HtmlPipeline;
import com.itextpdf.tool.xml.pipeline.html.HtmlPipelineContext;
import com.itextpdf.tool.xml.pipeline.html.LinkProvider;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

/**
 *
 * @author george.mavrommatis
 */
public class ConvertHtmlToPdf {
    public static final String HTML = "C:\\Users\\zzz\\Desktop\\itext\\index.html";
    public static final String DEST = "C:\\Users\\zzz\\Desktop\\itext\\index.pdf";
    public static final String IMG_PATH = "C:\\Users\\zzz\\Desktop\\itext\\";
    public static final String RELATIVE_PATH = "C:\\Users\\zzz\\Desktop\\itext\\";
    public static final String CSS_DIR = "C:\\Users\\zzz\\Desktop\\itext\\";

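    /**
     * Converts the HTML file at {@link #HTML} to a PDF, resolving the
     * stylesheet, images and links against the configured root directories.
     * @param file path of the PDF file to create
     * @throws IOException if the HTML cannot be read or the PDF cannot be written
     * @throws DocumentException if iText fails to build the document
     */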
    public void createPdf(String file) throws IOException, DocumentException {
        // step 1
        Document document = new Document();
        // step 2
        PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(file));
        // step 3
        document.open();
        // step 4

        // CSS
        CSSResolver cssResolver =
                XMLWorkerHelper.getInstance().getDefaultCssResolver(false);
        FileRetrieve retrieve = new FileRetrieveImpl(CSS_DIR);
        cssResolver.setFileRetrieve(retrieve);

        // HTML
        HtmlPipelineContext htmlContext = new HtmlPipelineContext(null);
        htmlContext.setTagFactory(Tags.getHtmlTagProcessorFactory());
        htmlContext.setImageProvider(new AbstractImageProvider() {
            public String getImageRootPath() {
                return IMG_PATH;
            }
        });
        htmlContext.setLinkProvider(new LinkProvider() {
            public String getLinkRoot() {
                return RELATIVE_PATH;
            }
        });

        // Pipelines
        PdfWriterPipeline pdf = new PdfWriterPipeline(document, writer);
        HtmlPipeline html = new HtmlPipeline(htmlContext, pdf);
        CssResolverPipeline css = new CssResolverPipeline(cssResolver, html);

        // XML Worker: parse the HTML and close the input stream afterwards
        XMLWorker worker = new XMLWorker(css, true);
        XMLParser p = new XMLParser(worker);
        try (FileInputStream in = new FileInputStream(HTML)) {
            p.parse(in);
        }

        // step 5
        document.close();
    }
    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) throws IOException, DocumentException {
        // Convert the HTML file and write the PDF to DEST
        new ConvertHtmlToPdf().createPdf(DEST);
    }

}

And here is the result:

This example uses code from: https://developers.itextpdf.com/examples/xml-worker-itext5/xml-worker-examples
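
The three root constants in the code above all point at the same folder. A minimal sketch, assuming the stylesheet and images always live next to the HTML file, derives that folder from the HTML path instead of hard-coding it. `ResourceRoots` and `rootOf` are hypothetical names used only for illustration; they are not part of iText.

```java
import java.io.File;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of iText: derives one root folder
// from the location of the HTML file.
public class ResourceRoots {

    /** Returns the absolute directory containing htmlFile, with a trailing separator. */
    public static String rootOf(String htmlFile) {
        Path parent = Paths.get(htmlFile).toAbsolutePath().normalize().getParent();
        if (parent == null) {
            // Only happens when the argument is a filesystem root itself
            throw new IllegalArgumentException("Not a file path: " + htmlFile);
        }
        String root = parent.toString();
        return root.endsWith(File.separator) ? root : root + File.separator;
    }

    public static void main(String[] args) {
        // Assumption: the HTML path is passed as the first argument
        String html = args.length > 0 ? args[0] : "index.html";
        System.out.println("Root for CSS, images and links: " + rootOf(html));
    }
}
```

The returned string could then be used wherever the code above uses `CSS_DIR`, `IMG_PATH` and `RELATIVE_PATH`, so that moving the HTML folder does not require editing three constants.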

Hope this helps

Author by

jdubicki

My main job is as a Java developer for a software services company. I am currently contributing to a Swing-based UI application, but I am also exploring ways to bring the UI to the web using JSF. For fun, I enjoy developing native and hybrid mobile apps for Android. I also play the drums.

Updated on June 04, 2022

Comments

jdubicki almost 2 years

I have searched the existing questions and have not found a solution to my specific problem. I need to convert HTML files that contain images and CSS styling to PDF. I am using iText 5 and have been able to carry the styling over into the generated PDF, but I am still struggling to include the images. My code is below. The image referenced by an absolute URL appears in the generated PDF; the image referenced by a relative path does not. I know I need to implement AbstractImageProvider, but I do not know how. Any help is greatly appreciated.

Java File:

public class Converter {

    static String in = "C:/Users/APPS/Desktop/Test_Html/index.htm";
    static String out = "C:/Users/APPS/Desktop/index.pdf";
    static String css = "C:/Users/APPS/Desktop/Test_Html/style.css";

    public static void main(String[] args) {
        try {
            convertHtmlToPdf();
        } catch (DocumentException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    private static void convertHtmlToPdf() throws DocumentException, IOException {
        Document document = new Document();
        PdfWriter pdfWriter = PdfWriter.getInstance(document, new FileOutputStream(out));
        document.open();
        XMLWorkerHelper.getInstance().parseXHtml(pdfWriter, document, new FileInputStream(in), new FileInputStream(css));
        document.close();
        System.out.println("PDF Created!");
    }

    /**
     * Not sure how to implement this
     * @author APPS
     *
     */
    public class myImageProvider extends AbstractImageProvider {

        @Override
        public String getImageRootPath() {
            // TODO Auto-generated method stub
            return null;
        }

    }

}

Html File:

<!DOCTYPE html>
<html lang="en">

<head>
    <title>HTML to PDF</title>
    <link href="style.css" rel="stylesheet" type="text/css" />
</head>

<body>
    <h1>HTML to PDF</h1>
    <p>
        <span class="itext">itext</span> 5.4.2
        <span class="description"> converting HTML to PDF</span>
    </p>
    <table>
        <tr>
            <th class="label">Title</th>
            <td>iText - Java HTML to PDF</td>
        </tr>
        <tr>
            <th>URL</th>
            <td>http://wwww.someurl.com</td>
        </tr>
    </table>
    <div class="center">
        <h2>Here is an image</h2>
        <div>
            <img src="images/Vader_TFU.jpg" />
        </div>
        <div>
            <img src="https://www.w3schools.com/images/picture.jpg" alt="Mountain" />
        </div>
    </div>
</body>
</html>

Css File:

h1 {
    color: #ccc;
}

table tr td {
    text-align: center;
    border: 1px solid gray;
    padding: 4px;
}

table tr th {
    background-color: #84C7FD;
    color: #fff;
    width: 100px;
}

.itext {
    color: #84C7FD;
    font-weight: bold;
}

.description {
    color: gray;
}

.center {
    text-align: center;
}
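
One plausible reason the relative image in the question goes missing: if no image root is configured and the `src` ends up being opened as a plain file path (an assumption about the default behaviour, not something confirmed here), Java resolves it against the process working directory rather than the folder holding the HTML file. The sketch below uses only `java.io.File` to show the two candidate locations; `RelativePathDemo` and its methods are hypothetical names.

```java
import java.io.File;

// Hypothetical demo, independent of iText: compares where a relative
// image path points under two different base directories.
public class RelativePathDemo {

    /** Where a bare relative path lands: the JVM working directory (user.dir). */
    public static String againstWorkingDir(String src) {
        return new File(src).getAbsolutePath();
    }

    /** Where the same path lands when resolved against the HTML file's folder. */
    public static String againstHtmlDir(String htmlFile, String src) {
        File htmlDir = new File(htmlFile).getAbsoluteFile().getParentFile();
        return new File(htmlDir, src).getAbsolutePath();
    }

    public static void main(String[] args) {
        String html = "Test_Html/index.htm";   // relative stand-in for the question's file
        String src = "images/Vader_TFU.jpg";   // the relative src from the question's HTML
        System.out.println("Working dir: " + againstWorkingDir(src));
        System.out.println("HTML dir:    " + againstHtmlDir(html, src));
    }
}
```

Unless the program happens to be started from the HTML folder, the two paths differ, which is consistent with the absolute-URL image rendering while the relative one does not.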

MaVRoSCy over 6 years

@jdubicki Glad I was able to help. Don't forget that if an answer helps you, you can upvote it and then accept it. Thanks.

jdubicki over 6 years

I am having another issue that I need help with. I had to make some modifications in order to read and render nested unordered lists, and I noticed that my header tags are no longer being parsed into the PDF. Can anyone help me correct this? Below are the changes I had to make.
jdubicki over 6 years

PdfWriter.getInstance(document, new FileOutputStream(file));

// Pipelines
ElementList elements = new ElementList();
ElementHandlerPipeline end = new ElementHandlerPipeline(elements, null);
HtmlPipeline html = new HtmlPipeline(htmlPipelineContext, end);
CssResolverPipeline css = new CssResolverPipeline(cssResolver, html);

document.open();
for (Element e : elements) {
    document.add(e);
}
document.add(Chunk.NEWLINE);
jdubicki over 6 years

I have posted a new question: stackoverflow.com/questions/46980365/…