How to convert a docx file to an HTML file using Open XML while retaining formatting
PowerTools for Open XML just released a new HtmlConverter module. It now contains an open source, free implementation of a conversion from DOCX to HTML formatted with CSS. The module HtmlConverter.cs supports all paragraph, character, and table styles, fonts and text formatting, numbered and bulleted lists, images, and more. See http://bit.ly/1bclyg9
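A minimal sketch of how the newer module might be called so that formatting is emitted as CSS. It assumes a PowerTools for Open XML release in which HtmlConverterSettings exposes FabricateCssClasses, CssClassPrefix, RestrictToSupportedLanguages and RestrictToSupportedNumberingFormats; older releases may not have these properties. The class name DocxToHtmlSketch, the method name and both path parameters are hypothetical, and images are left out because no ImageHandler is supplied.

```csharp
using System.IO;
using System.Text;
using System.Xml.Linq;
using DocumentFormat.OpenXml.Packaging;
using OpenXmlPowerTools;

// Hypothetical helper, not part of PowerTools itself.
public static class DocxToHtmlSketch
{
    public static void Convert(string docxPath, string htmlPath)
    {
        // Copy the file into a resizable in-memory stream so the
        // original .docx on disk is never modified by the converter.
        byte[] bytes = File.ReadAllBytes(docxPath);
        using (var stream = new MemoryStream())
        {
            stream.Write(bytes, 0, bytes.Length);
            using (WordprocessingDocument doc = WordprocessingDocument.Open(stream, true))
            {
                // Assumed settings (see the note above): ask the converter
                // to generate CSS classes for the document's formatting.
                var settings = new HtmlConverterSettings
                {
                    PageTitle = "My Page Title",
                    FabricateCssClasses = true,
                    CssClassPrefix = "pt-",
                    RestrictToSupportedLanguages = false,
                    RestrictToSupportedNumberingFormats = false
                };

                XElement html = HtmlConverter.ConvertToHtml(doc, settings);

                // Wrap the element in a document with an HTML doctype and
                // write it without extra whitespace, which could otherwise
                // change how inline text is rendered.
                var document = new XDocument(
                    new XDocumentType("html", null, null, null), html);
                File.WriteAllText(
                    htmlPath,
                    document.ToString(SaveOptions.DisableFormatting),
                    Encoding.UTF8);
            }
        }
    }
}
```

It could be called as DocxToHtmlSketch.Convert(@"E:\Test.docx", @"E:\Test.html"), where both paths are placeholders.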
Sachin
Updated on July 10, 2022
Comments
-
Sachin almost 2 years
I know there are a lot of questions with the same title, but none of them resolved my issue, so I still haven't found the correct way to go.
I am using the Open XML SDK 2.5 along with PowerTools to convert a .docx file to an .html file, using the HtmlConverter class for the conversion. I am able to convert the docx file into an HTML file, but the problem is that the HTML file doesn't retain the original formatting of the document. For example, font size, color, underline, bold, etc. are not reflected in the HTML file. Here is my existing code:
    using System.IO;
    using System.Xml.Linq;
    using DocumentFormat.OpenXml.Packaging;
    using OpenXmlPowerTools;

    public void ConvertDocxToHtml(string fileName)
    {
        // Load the document into memory so the file on disk is left untouched.
        byte[] byteArray = File.ReadAllBytes(fileName);
        using (MemoryStream memoryStream = new MemoryStream())
        {
            memoryStream.Write(byteArray, 0, byteArray.Length);
            using (WordprocessingDocument doc = WordprocessingDocument.Open(memoryStream, true))
            {
                HtmlConverterSettings settings = new HtmlConverterSettings()
                {
                    PageTitle = "My Page Title"
                };
                // Convert the document to an XHTML element and write it out.
                XElement html = HtmlConverter.ConvertToHtml(doc, settings);
                File.WriteAllText(@"E:\Test.html", html.ToStringNewLineOnAttributes());
            }
        }
    }
So I just want to know whether there is any way to retain the formatting in the converted HTML file.
I know about some third-party APIs that do the same thing, but I would prefer a way to do this using Open XML or another open source library.
-
Sachin over 10 years
This does not convert formatting, such as paragraph fonts or character fonts.
-
Иво Недев over 5 years