Converting RTF to HTML with formatting in Java
You could try converting with OpenOffice or LibreOffice, using this converter library as described in this blog post.
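For reference, here is a minimal sketch of the LibreOffice route that skips the converter library and simply shells out to the command-line tool. It assumes a `soffice` binary is on the PATH; the class name `SofficeRtfToHtml` and the helpers `buildCommand` and `convert` are made up for illustration, and I have not checked how faithfully LibreOffice renders strike-through for this particular input.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class SofficeRtfToHtml {

    /** Builds the headless conversion command (assumes "soffice" is on the PATH). */
    public static List<String> buildCommand(Path rtfFile, Path outDir) {
        return Arrays.asList(
                "soffice", "--headless",
                "--convert-to", "html",
                "--outdir", outDir.toString(),
                rtfFile.toString());
    }

    /** Runs the conversion and returns the process exit code. */
    public static int convert(Path rtfFile, Path outDir)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(rtfFile, outDir))
                .inheritIO()   // show soffice's own messages on our console
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("usage: SofficeRtfToHtml <input.rtf> <output-dir>");
            return;
        }
        int exit = convert(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("soffice exited with code " + exit);
    }
}
```

The converted file is written into the output directory by LibreOffice itself, so the Java side only has to wait for the process and read the result afterwards.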
Updated on June 05, 2022
I can use JEditorPane to parse the RTF text and convert it to HTML, but the HTML output is missing some formatting, namely the strike-through markup in this case. As the output shows, the underlined text is correctly wrapped in <u>, but there is no strike-through wrapper. Any ideas?
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import javax.swing.JEditorPane;
import javax.swing.text.StyledEditorKit;

public void testRtfToHtml() {
    JEditorPane pane = new JEditorPane();
    pane.setContentType("text/rtf");
    StyledEditorKit kitRtf = (StyledEditorKit) pane.getEditorKitForContentType("text/rtf");
    try {
        // Parse the RTF source into the pane's document.
        kitRtf.read(
            new StringReader(
                "{\\rtf1\\ansi \\deflang1033\\deff0"
                + "{\\fonttbl {\\f0\\froman \\fcharset0 \\fprq2 Times New Roman;}}"
                + "{\\colortbl;\\red0\\green0\\blue0;} "
                + "{\\stylesheet{\\fs20 \\snext0 Normal;}} "
                + "{\\plain \\fs26 \\strike\\fs26 This is supposed to be strike-through.}"
                + "{\\plain \\fs26 \\fs26 } "
                + "{\\plain \\fs26 \\ul\\fs26 Underline text here} "
                + "{\\plain \\fs26 \\fs26 .{\\u698\\'20}}"),
            pane.getDocument(), 0);
        kitRtf = null;

        // Write the same document back out through the HTML editor kit.
        StyledEditorKit kitHtml = (StyledEditorKit) pane.getEditorKitForContentType("text/html");
        Writer writer = new StringWriter();
        kitHtml.write(writer, pane.getDocument(), 0, pane.getDocument().getLength());
        System.out.println(writer.toString());
    } catch (Exception e) {
        e.printStackTrace();
    }
}
Output:
<html>
  <head>
    <style>
      <!--
        p.Normal {
          RightIndent:0.0;
          FirstLineIndent:0.0;
          LeftIndent:0.0;
        }
      -->
    </style>
  </head>
  <body>
    <p class=default>
      <span style="color: #000000; font-size: 13pt; font-family: Times New Roman">
        This is supposed to be strike-through.
      </span>
      <span style="color: #000000; font-size: 13pt; font-family: Times New Roman">
      </span>
      <span style="color: #000000; font-size: 13pt; font-family: Times New Roman">
        <u>Underline text here</u>
      </span>
      <span style="color: #000000; font-size: 13pt; font-family: Times New Roman">
        .?
      </span>
    </p>
  </body>
</html>
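One way to narrow this down is to check whether the strike-through attribute ever reaches the document model, or whether it is only lost when the HTML is written. The sketch below dumps the attributes of every text run after the RTF has been parsed. The class name `RtfAttributeDump` and the helper `describeRuns` are made up for illustration, the RTF string is a shortened variant of the one in the question, and the document is read with `RTFEditorKit` directly rather than through a `JEditorPane`, which I assume behaves the same way.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import javax.swing.text.AttributeSet;
import javax.swing.text.Document;
import javax.swing.text.Element;
import javax.swing.text.StyleConstants;
import javax.swing.text.rtf.RTFEditorKit;

public class RtfAttributeDump {

    /** Returns one line per leaf element: its text plus its attribute names and values. */
    public static List<String> describeRuns(Document doc) throws Exception {
        List<String> lines = new ArrayList<>();
        collect(doc.getDefaultRootElement(), doc, lines);
        return lines;
    }

    private static void collect(Element el, Document doc, List<String> out) throws Exception {
        if (!el.isLeaf()) {
            for (int i = 0; i < el.getElementCount(); i++) {
                collect(el.getElement(i), doc, out);
            }
            return;
        }
        int start = el.getStartOffset();
        // The last element can extend one position past the document length.
        int end = Math.min(el.getEndOffset(), doc.getLength());
        String text = end > start ? doc.getText(start, end - start) : "";

        AttributeSet attrs = el.getAttributes();
        StringBuilder sb = new StringBuilder("\"" + text.trim() + "\" ->");
        for (Enumeration<?> names = attrs.getAttributeNames(); names.hasMoreElements();) {
            Object name = names.nextElement();
            sb.append(' ').append(name).append('=').append(attrs.getAttribute(name));
        }
        // What the standard Swing style lookup sees for this run.
        sb.append(" | StyleConstants.isStrikeThrough=")
          .append(StyleConstants.isStrikeThrough(attrs));
        out.add(sb.toString());
    }

    public static void main(String[] args) throws Exception {
        RTFEditorKit kit = new RTFEditorKit();
        Document doc = kit.createDefaultDocument();
        kit.read(new StringReader(
                "{\\rtf1\\ansi \\deflang1033\\deff0"
                + "{\\fonttbl {\\f0\\froman \\fcharset0 \\fprq2 Times New Roman;}}"
                + "{\\colortbl;\\red0\\green0\\blue0;} "
                + "{\\plain \\fs26 \\strike\\fs26 This is supposed to be strike-through.}"
                + "{\\plain \\fs26 \\ul\\fs26 Underline text here}}"),
                doc, 0);
        for (String line : describeRuns(doc)) {
            System.out.println(line);
        }
    }
}
```

If the strike-through run lists an attribute whose name mentions "strike" while `StyleConstants.isStrikeThrough` still reports false, the reader is probably storing it under a key the HTML writer does not look at. If no such attribute shows up at all, the RTF reader dropped it during parsing, and the HTML side is not the place to look.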