Converting RTF to HTML with formatting in Java

You could try converting with OpenOffice or LibreOffice, using this converter library as described in this blog post.
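
Independently of any wrapper library, LibreOffice can also be driven straight from its command line (`soffice --headless --convert-to html --outdir <dir> <file>`), which Java can launch with ProcessBuilder. This is a minimal sketch assuming LibreOffice is installed and `soffice` is on the PATH. The class and method names (LibreOfficeRtfToHtml, buildCommand, convert) are hypothetical, and this is not the library from the blog post.

```java
import java.io.File;
import java.io.IOException;
import java.util.List;

// Hypothetical helper that shells out to LibreOffice for RTF -> HTML conversion.
public class LibreOfficeRtfToHtml {

    // Builds the soffice command line. Assumes "soffice" is on the PATH.
    public static List<String> buildCommand(File rtfFile, File outDir) {
        return List.of("soffice", "--headless", "--convert-to", "html",
                "--outdir", outDir.getPath(), rtfFile.getPath());
    }

    // Runs the conversion, forwarding soffice's output to this console,
    // and returns the process exit code.
    public static int convert(File rtfFile, File outDir) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(rtfFile, outDir))
                .inheritIO()
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: LibreOfficeRtfToHtml <input.rtf> <output-dir>");
            return;
        }
        int exit = convert(new File(args[0]), new File(args[1]));
        System.out.println("soffice exited with " + exit);
    }
}
```

Usage: `java LibreOfficeRtfToHtml input.rtf out` should leave the converted file in `out/` under the input's base name with an `.html` extension. Whether the strike-through survives depends on LibreOffice's HTML export filter, so it is worth checking on a sample file.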

Author: Admin

Updated on June 05, 2022

Question:

    I can use JEditorPane to parse RTF text and convert it to HTML, but the HTML output is missing some formatting, namely the strike-through markup in this case. As you can see in the output, the underlined text is correctly wrapped in <u>, but there is no strike-through wrapper. Any idea?

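    import java.io.StringReader;
    import java.io.StringWriter;
    import java.io.Writer;
    import javax.swing.JEditorPane;
    import javax.swing.text.StyledEditorKit;
    
    public class RtfToHtmlTest
    {
        public static void main(String[] args)
        {
            new RtfToHtmlTest().testRtfToHtml();
        }
    
        public void testRtfToHtml()
        {
            JEditorPane pane = new JEditorPane();
            pane.setContentType("text/rtf");
    
            // For "text/rtf" this is an RTFEditorKit, which parses RTF into the pane's StyledDocument
            StyledEditorKit kitRtf = (StyledEditorKit) pane.getEditorKitForContentType("text/rtf");
    
            try
            {
                kitRtf.read(
                    new StringReader(
                        "{\\rtf1\\ansi \\deflang1033\\deff0{\\fonttbl {\\f0\\froman \\fcharset0 \\fprq2 Times New Roman;}}{\\colortbl;\\red0\\green0\\blue0;} {\\stylesheet{\\fs20 \\snext0 Normal;}} {\\plain \\fs26 \\strike\\fs26 This is supposed to be strike-through.}{\\plain \\fs26 \\fs26  } {\\plain \\fs26 \\ul\\fs26 Underline text here} {\\plain \\fs26 \\fs26 .{\\u698\\'20}}}"),
                    pane.getDocument(), 0);
    
                // For "text/html" this is an HTMLEditorKit; given a StyledDocument that is not
                // an HTMLDocument, its write() falls back to javax.swing.text.html.MinimalHTMLWriter
                StyledEditorKit kitHtml =
                    (StyledEditorKit) pane.getEditorKitForContentType("text/html");
    
                Writer writer = new StringWriter();
                kitHtml.write(writer, pane.getDocument(), 0, pane.getDocument().getLength());
                System.out.println(writer.toString());
            }
            catch (Exception e)
            {
                e.printStackTrace();
            }
        }
    }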
    

    Output:

    <html>
      <head>
        <style>
          <!--
            p.Normal {
              RightIndent:0.0;
              FirstLineIndent:0.0;
              LeftIndent:0.0;
            }
          -->
        </style>
      </head>
      <body>
        <p class=default>
                  <span style="color: #000000; font-size: 13pt; font-family: Times New Roman">
    This is supposed to be strike-through.
          </span>
          <span style="color: #000000; font-size: 13pt; font-family: Times New Roman">
    
          </span>
           <span style="color: #000000; font-size: 13pt; font-family: Times New Roman">
    <u>Underline text here</u>
          </span>
           <span style="color: #000000; font-size: 13pt; font-family: Times New Roman">
    .?
          </span>
    
        </p>
      </body>
    </html>
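
A likely cause: when the document being written is not an HTMLDocument, HTMLEditorKit.write hands the StyledDocument to MinimalHTMLWriter, which turns only bold, italic and underline into <b>, <i> and <u>; other character attributes end up as inline style or are dropped. The `<p class=default>` and inline `style` spans in the output above look like that writer's output. One JDK-only workaround is to read the RTF with RTFEditorKit and walk the resulting document yourself, emitting <s> for struck runs. This is a minimal sketch under a few assumptions: the names StrikeAwareHtml and rtfToHtml are hypothetical, it handles only bold, italic, underline, strike-through and paragraph breaks (no fonts, colours or sizes), and it checks both StyleConstants.StrikeThrough and a plain "strike" attribute key, because the JDK's RTF reader may store \strike under its own key rather than the StyleConstants one.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.AttributeSet;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.Element;
import javax.swing.text.StyleConstants;
import javax.swing.text.rtf.RTFEditorKit;

// Hypothetical helper: RTF -> simple HTML that keeps b/i/u and strike-through.
public class StrikeAwareHtml {

    // Inline tags for one leaf's character attributes, outermost first.
    static List<String> tagsFor(AttributeSet a) {
        List<String> tags = new ArrayList<>();
        if (StyleConstants.isBold(a)) tags.add("b");
        if (StyleConstants.isItalic(a)) tags.add("i");
        if (StyleConstants.isUnderline(a)) tags.add("u");
        // Assumption: the JDK RTF reader may store \strike under a plain "strike"
        // key instead of StyleConstants.StrikeThrough, so accept either one.
        if (StyleConstants.isStrikeThrough(a) || Boolean.TRUE.equals(a.getAttribute("strike"))) {
            tags.add("s");
        }
        return tags;
    }

    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    static void open(StringBuilder out, List<String> tags) {
        for (String t : tags) out.append('<').append(t).append('>');
    }

    // Closes tags in reverse order so the nesting stays well-formed.
    static void close(StringBuilder out, List<String> tags) {
        for (int i = tags.size() - 1; i >= 0; i--) out.append("</").append(tags.get(i)).append('>');
    }

    public static String rtfToHtml(String rtf) throws IOException, BadLocationException {
        RTFEditorKit kit = new RTFEditorKit();
        Document doc = kit.createDefaultDocument();
        kit.read(new StringReader(rtf), doc, 0);

        StringBuilder html = new StringBuilder("<html><body>\n");
        Element root = doc.getDefaultRootElement();
        for (int p = 0; p < root.getElementCount(); p++) {
            Element para = root.getElement(p);          // one paragraph
            StringBuilder body = new StringBuilder();
            List<String> current = new ArrayList<>();   // tags currently open
            for (int i = 0; i < para.getElementCount(); i++) {
                Element leaf = para.getElement(i);      // one run of uniform attributes
                int start = leaf.getStartOffset();
                String text = doc.getText(start, leaf.getEndOffset() - start).replace("\n", "");
                if (text.isEmpty()) continue;
                List<String> tags = tagsFor(leaf.getAttributes());
                // Reopen tags only when the formatting changes, so equal neighbours merge.
                if (!tags.equals(current)) {
                    close(body, current);
                    open(body, tags);
                    current = tags;
                }
                body.append(escape(text));
            }
            close(body, current);
            if (body.length() > 0) html.append("<p>").append(body).append("</p>\n");
        }
        return html.append("</body></html>").toString();
    }

    public static void main(String[] args) throws Exception {
        String rtf = "{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Times New Roman;}}"
                + "{\\strike struck text} {\\ul underlined text}}";
        System.out.println(rtfToHtml(rtf));
    }
}
```

Adjacent leaf elements with identical tags are merged into one run, so the result does not depend on how the document happens to split its leaves. For richer content (tables, images, fonts), the LibreOffice route from the answer is probably the more practical option.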