How to load text of MS Word document in C# (.NET)?
Solution 1
If you are dealing with .docx, you can do this without any interop with Word. A .docx file is actually a ZIP archive that contains XML files, so you can read the XML directly. Please refer to the links below:
http://conceptdev.blogspot.com/2007/03/open-docx-using-c-to-extract-text-for.html
Office (2007) Open XML File Formats
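As a rough illustration of the ZIP-plus-XML idea, here is a minimal sketch. It assumes .NET 4.5 or later (for System.IO.Compression) and that the main body text lives in the word/document.xml part. The class and method names (DocxTextReader, ExtractText) are made up for this example and do not come from the linked articles. It only collects the text runs of the main document body, so headers, footers, footnotes and tab characters are ignored.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;
using System.Xml.Linq;

// Hypothetical helper for illustration only.
public static class DocxTextReader
{
    // Main WordprocessingML namespace used inside word/document.xml.
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the body text of a .docx file, with a line break between paragraphs.
    public static string ExtractText(string path)
    {
        XName paragraph = W + "p";
        XName textRun = W + "t";

        using (ZipArchive zip = ZipFile.OpenRead(path))
        {
            ZipArchiveEntry entry = zip.GetEntry("word/document.xml");
            if (entry == null)
                throw new InvalidDataException("word/document.xml not found; is this really a .docx file?");

            using (Stream stream = entry.Open())
            {
                XDocument xml = XDocument.Load(stream);
                var sb = new StringBuilder();

                // Walk all elements in document order: start a new line at each
                // paragraph (except the first) and append the text of each w:t.
                foreach (XElement e in xml.Descendants())
                {
                    if (e.Name == paragraph)
                    {
                        if (sb.Length > 0) sb.AppendLine();
                    }
                    else if (e.Name == textRun)
                    {
                        sb.Append(e.Value);
                    }
                }
                return sb.ToString();
            }
        }
    }
}
```

Usage would be something like string text = DocxTextReader.ExtractText(@"C:\docs\report.docx"); where the path is just a placeholder. Word never gets started, so this should be cheap enough to run over a large batch of files.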
Solution 2
For docx-formatted Word documents, I found this interesting article on The Code Project:
Using DocxToText to Extract Text from DOCX Files
In the article, the author discusses stripping out just the words themselves.
For your doc (non-docx) Word documents, the usual route is the Office APIs, which spawn an instance of Word in the background. As an alternative, you could try shelling out to one of the many Doc2Docx converters on the market and then applying the above process to both kinds of file.
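To make the "shell out to a converter" idea concrete, here is a sketch that uses LibreOffice in headless mode as the converter. That choice is my assumption, not something named in the article: it is free, and it has to be installed with soffice available on the PATH. The class and method names (DocConverter, BuildArguments, ConvertToDocx) are invented for this example.

```csharp
using System;
using System.Diagnostics;
using System.IO;

// Hypothetical helper for illustration only.
public static class DocConverter
{
    // Builds the command line for LibreOffice's headless conversion mode.
    public static string BuildArguments(string docPath, string outputDir)
    {
        return "--headless --convert-to docx --outdir \"" + outputDir + "\" \"" + docPath + "\"";
    }

    // Converts a .doc file to .docx and returns the path of the converted file.
    // Assumption: "soffice" (LibreOffice) is installed and on the PATH.
    public static string ConvertToDocx(string docPath, string outputDir)
    {
        var startInfo = new ProcessStartInfo
        {
            FileName = "soffice",
            Arguments = BuildArguments(docPath, outputDir),
            UseShellExecute = false,
            CreateNoWindow = true
        };

        using (Process process = Process.Start(startInfo))
        {
            process.WaitForExit();
            if (process.ExitCode != 0)
                throw new InvalidOperationException("Converter exited with code " + process.ExitCode);
        }

        // LibreOffice keeps the base file name and swaps the extension.
        string result = Path.Combine(outputDir, Path.GetFileNameWithoutExtension(docPath) + ".docx");
        if (!File.Exists(result))
            throw new FileNotFoundException("Conversion produced no output file.", result);
        return result;
    }
}
```

The converted file can then go through the same ZIP/XML reading as any other .docx. For a large batch it is probably worth passing several input files to a single converter run rather than starting one process per document, but I have not measured that.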
Solution 3
I recently did some research on this topic. It turns out that manipulating Word files programmatically without opening Word itself requires some very expensive tools.
There's an article over at The Code Project on manipulating Word that you might find useful. The author built a C# COM wrapper for dealing with calls to Word. It looks like it actually pops open the Word application, though.
This post over at the Neowin forums looks promising too. It includes quite a few P/Invoke calls for the purpose of text extraction.
Maybe if you could find a way to keep the window hidden, it would be acceptable.
user2120901
Updated on September 23, 2020
Comments
-
user2120901 over 3 years
How do I load an MS Word document (.doc or .docx) into memory (a variable) without doing this?
wordApp.Documents.Open
I don't want to open MS Word, I just want the text inside.
You gave me an answer for DOCX, but what about DOC? I want a free, high-performance solution, not one that opens 12,000 instances of Word to process all of them. :( Aspose is a commercial product, and $900 is way too much for what I do.
-
user2120901 over 15 years
Is there any free doc to docx solution?
-
user2120901 over 15 years
Free library? -> Aspose: US$899
-
user2120901 over 15 years
I want to process 12,000 Word documents every day. Guess why I don't want to open 12,000 instances of Word.