Convert HTML with images to PDF using iText
The following is based on iText 5, version 5.5.12.
Suppose you have this directory structure:
With this code, using the latest iText 5:
package converthtmltopdf;

import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.tool.xml.XMLWorker;
import com.itextpdf.tool.xml.XMLWorkerHelper;
import com.itextpdf.tool.xml.html.Tags;
import com.itextpdf.tool.xml.net.FileRetrieve;
import com.itextpdf.tool.xml.net.FileRetrieveImpl;
import com.itextpdf.tool.xml.parser.XMLParser;
import com.itextpdf.tool.xml.pipeline.css.CSSResolver;
import com.itextpdf.tool.xml.pipeline.css.CssResolverPipeline;
import com.itextpdf.tool.xml.pipeline.end.PdfWriterPipeline;
import com.itextpdf.tool.xml.pipeline.html.AbstractImageProvider;
import com.itextpdf.tool.xml.pipeline.html.HtmlPipeline;
import com.itextpdf.tool.xml.pipeline.html.HtmlPipelineContext;
import com.itextpdf.tool.xml.pipeline.html.LinkProvider;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;

/**
 *
 * @author george.mavrommatis
 */
public class ConvertHtmlToPdf {

    public static final String HTML = "C:\\Users\\zzz\\Desktop\\itext\\index.html";
    public static final String DEST = "C:\\Users\\zzz\\Desktop\\itext\\index.pdf";
    public static final String IMG_PATH = "C:\\Users\\zzz\\Desktop\\itext\\";
    public static final String RELATIVE_PATH = "C:\\Users\\zzz\\Desktop\\itext\\";
    public static final String CSS_DIR = "C:\\Users\\zzz\\Desktop\\itext\\";

    /**
     * Converts the HTML file at HTML into a PDF, resolving relative image
     * paths against IMG_PATH, links against RELATIVE_PATH and stylesheets
     * against CSS_DIR.
     * @param file the path of the PDF to create
     * @throws IOException
     * @throws DocumentException
     */
    public void createPdf(String file) throws IOException, DocumentException {
        // step 1
        Document document = new Document();
        // step 2
        PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(file));
        // step 3
        document.open();
        // step 4
        // CSS: stylesheet files are looked up in CSS_DIR
        CSSResolver cssResolver =
                XMLWorkerHelper.getInstance().getDefaultCssResolver(false);
        FileRetrieve retrieve = new FileRetrieveImpl(CSS_DIR);
        cssResolver.setFileRetrieve(retrieve);
        // HTML: relative <img src="..."> values are resolved against IMG_PATH
        HtmlPipelineContext htmlContext = new HtmlPipelineContext(null);
        htmlContext.setTagFactory(Tags.getHtmlTagProcessorFactory());
        htmlContext.setImageProvider(new AbstractImageProvider() {
            @Override
            public String getImageRootPath() {
                return IMG_PATH;
            }
        });
        htmlContext.setLinkProvider(new LinkProvider() {
            @Override
            public String getLinkRoot() {
                return RELATIVE_PATH;
            }
        });
        // Pipelines
        PdfWriterPipeline pdf = new PdfWriterPipeline(document, writer);
        HtmlPipeline html = new HtmlPipeline(htmlContext, pdf);
        CssResolverPipeline css = new CssResolverPipeline(cssResolver, html);
        // XML Worker
        XMLWorker worker = new XMLWorker(css, true);
        XMLParser p = new XMLParser(worker);
        // try-with-resources closes the HTML input stream after parsing
        try (InputStream in = new FileInputStream(HTML)) {
            p.parse(in);
        }
        // step 5
        document.close();
    }

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) throws IOException, DocumentException {
        new ConvertHtmlToPdf().createPdf(DEST);
    }
}
And here is the result:
This example uses code from: https://developers.itextpdf.com/examples/xml-worker-itext5/xml-worker-examples
Hope this helps
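The answer hardcodes the same folder in IMG_PATH, RELATIVE_PATH and CSS_DIR. A minimal sketch, assuming the images and stylesheet sit next to the HTML file (as with the question's images/Vader_TFU.jpg and style.css), derives that folder from the HTML path instead of hardcoding it. The class and method names are hypothetical; the trailing separator just follows the convention of the answer's constants.

```java
import java.io.File;
import java.nio.file.Path;
import java.nio.file.Paths;

public class HtmlResourceRoot {

    /**
     * Hypothetical helper: returns the absolute directory containing the
     * given HTML file, with a trailing separator, in the same shape as the
     * answer's IMG_PATH / RELATIVE_PATH / CSS_DIR constants.
     */
    static String rootOf(String htmlFile) {
        Path parent = Paths.get(htmlFile).toAbsolutePath().getParent();
        if (parent == null) {
            throw new IllegalArgumentException("No parent directory: " + htmlFile);
        }
        return parent.toString() + File.separator;
    }

    public static void main(String[] args) {
        // First argument is the HTML path; defaults to index.html in the working directory
        String html = args.length > 0 ? args[0] : "index.html";
        System.out.println(rootOf(html));
    }
}
```

The returned string could then be passed to new FileRetrieveImpl(...) and returned from getImageRootPath() and getLinkRoot() in place of the constants.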
jdubicki
My main job is as a Java developer for a software services company. I am currently contributing to a Swing UI-based application, but am also exploring ways to bring the UI to the web using JSF. For fun, I enjoy developing native and hybrid apps for Android. I also play the drums.
Updated on June 04, 2022
Comments
-
jdubicki almost 2 years
I have searched existing questions and have not been able to find a solution to my specific problem. I need to convert HTML files that contain images and CSS styling to PDF. I am using iText 5 and have been able to include the styling in the generated PDF. However, I am still struggling to include the images. I have included my code below. The image with the absolute URL is included in the generated PDF; the image with the relative path is not. I know I need to implement AbstractImageProvider, but I do not know how to do it. Any help is greatly appreciated.
Java File:
import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.tool.xml.XMLWorkerHelper;
import com.itextpdf.tool.xml.pipeline.html.AbstractImageProvider;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class Converter {

    static String in = "C:/Users/APPS/Desktop/Test_Html/index.htm";
    static String out = "C:/Users/APPS/Desktop/index.pdf";
    static String css = "C:/Users/APPS/Desktop/Test_Html/style.css";

    public static void main(String[] args) {
        try {
            convertHtmlToPdf();
        } catch (DocumentException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    private static void convertHtmlToPdf() throws DocumentException, IOException {
        Document document = new Document();
        PdfWriter pdfWriter = PdfWriter.getInstance(document, new FileOutputStream(out));
        document.open();
        XMLWorkerHelper.getInstance().parseXHtml(pdfWriter, document,
                new FileInputStream(in), new FileInputStream(css));
        document.close();
        System.out.println("PDF Created!");
    }

    /**
     * Not sure how to implement this
     * @author APPS
     *
     */
    public class myImageProvider extends AbstractImageProvider {

        @Override
        public String getImageRootPath() {
            // TODO Auto-generated method stub
            return null;
        }
    }
}
Html File:
<!DOCTYPE html>
<html lang="en">
<head>
    <title>HTML to PDF</title>
    <link href="style.css" rel="stylesheet" type="text/css" />
</head>
<body>
    <h1>HTML to PDF</h1>
    <p>
        <span class="itext">itext</span> 5.4.2
        <span class="description"> converting HTML to PDF</span>
    </p>
    <table>
        <tr>
            <th class="label">Title</th>
            <td>iText - Java HTML to PDF</td>
        </tr>
        <tr>
            <th>URL</th>
            <td>http://wwww.someurl.com</td>
        </tr>
    </table>
    <div class="center">
        <h2>Here is an image</h2>
        <div>
            <img src="images/Vader_TFU.jpg" />
        </div>
        <div>
            <img src="https://www.w3schools.com/images/picture.jpg" alt="Mountain" />
        </div>
    </div>
</body>
</html>
Css File:
h1 {
    color: #ccc;
}
table tr td {
    text-align: center;
    border: 1px solid gray;
    padding: 4px;
}
table tr th {
    background-color: #84C7FD;
    color: #fff;
    width: 100px;
}
.itext {
    color: #84C7FD;
    font-weight: bold;
}
.description {
    color: gray;
}
.center {
    text-align: center;
}
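In the question, the two img tags differ only in how src is written: "https://www.w3schools.com/images/picture.jpg" carries a scheme, while "images/Vader_TFU.jpg" has none and can only be located relative to some base folder, which is what a getImageRootPath() implementation supplies. The plain-Java sketch below (hypothetical names, no iText involved) classifies the two values with java.net.URI and resolves the relative one against a folder, only to make the distinction concrete; it is not a description of how XML Worker resolves images internally.

```java
import java.net.URI;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ImageSrcCheck {

    /**
     * True when src has no URI scheme, e.g. "images/Vader_TFU.jpg".
     * Caveats: a Windows path such as "C:/img.jpg" parses with scheme "C",
     * and a src containing spaces makes URI.create throw.
     */
    static boolean isRelative(String src) {
        return !URI.create(src).isAbsolute();
    }

    /** Hypothetical helper: resolves a relative src against a base folder. */
    static Path resolveAgainst(String baseDir, String src) {
        return Paths.get(baseDir).resolve(src).normalize();
    }

    public static void main(String[] args) {
        String base = "C:/Users/APPS/Desktop/Test_Html";
        String[] sources = {
            "images/Vader_TFU.jpg",
            "https://www.w3schools.com/images/picture.jpg"
        };
        for (String src : sources) {
            if (isRelative(src)) {
                System.out.println(src + " -> needs a base folder: " + resolveAgainst(base, src));
            } else {
                System.out.println(src + " -> absolute URL, no base folder needed");
            }
        }
    }
}
```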
-
MaVRoSCy over 6 years
@jdubicki Glad I was able to help. Don't forget: if an answer helps you, you can upvote it and then accept it. Thanks.
-
jdubicki over 6 years
I am having another issue that I need help with. I had to make some modifications in order to read and render nested unordered lists. I noticed that my header tags are not being parsed into the PDF. Can anyone help me correct this? Below are the changes I made.
-
jdubicki over 6 years
PdfWriter.getInstance(document, new FileOutputStream(file));
// Pipelines
ElementList elements = new ElementList();
ElementHandlerPipeline end = new ElementHandlerPipeline(elements, null);
HtmlPipeline html = new HtmlPipeline(htmlPipelineContext, end);
CssResolverPipeline css = new CssResolverPipeline(cssResolver, html);
document.open();
for (Element e : elements) {
    document.add(e);
}
document.add(Chunk.NEWLINE);
-
jdubicki over 6 years
I have posted a new question: stackoverflow.com/questions/46980365/…