How to load text of MS Word document in C# (.NET)?

Solution 1

If you are dealing with .docx files, you can do this without any interop with Word. A .docx file is actually a ZIP archive containing XML files, so you can read the XML directly. Please refer to the links below:

http://conceptdev.blogspot.com/2007/03/open-docx-using-c-to-extract-text-for.html

Office (2007) Open XML File Formats
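
As a rough illustration of the ZIP-plus-XML idea above, here is a minimal sketch that opens a .docx as a ZIP archive and collects the text of each paragraph. It is not taken from the linked articles. It assumes .NET 4.5 or later (for System.IO.Compression.ZipArchive) and that the main document part lives at the conventional path word/document.xml; the class name DocxText and its method names are made up for this example.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

public static class DocxText
{
    // WordprocessingML main namespace used inside word/document.xml.
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Reads the main document part from a .docx stream and returns its text,
    // one line per paragraph. The caller keeps ownership of the stream.
    public static string Extract(Stream docx)
    {
        using (var zip = new ZipArchive(docx, ZipArchiveMode.Read, leaveOpen: true))
        {
            // Assumption: the main part is at the conventional location.
            ZipArchiveEntry entry = zip.GetEntry("word/document.xml");
            if (entry == null)
                throw new InvalidDataException("word/document.xml not found.");

            using (Stream part = entry.Open())
            {
                // PreserveWhitespace keeps runs that contain only spaces.
                XDocument xml = XDocument.Load(part, LoadOptions.PreserveWhitespace);

                // Each w:p is a paragraph; each w:t holds a piece of its text.
                var paragraphs = xml.Descendants(W + "p")
                    .Select(p => string.Concat(
                        p.Descendants(W + "t").Select(t => t.Value)));

                return string.Join(Environment.NewLine, paragraphs);
            }
        }
    }

    // Convenience overload for a file on disk.
    public static string Extract(string path)
    {
        using (FileStream fs = File.OpenRead(path))
            return Extract(fs);
    }
}
```

Calling DocxText.Extract(path) for each file never starts Word. The sketch ignores tabs, line breaks, headers, footers and footnotes, and text inside text boxes may appear twice because those paragraphs are nested inside another paragraph.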

Solution 2

For docx-formatted Word documents, I found this interesting article on The Code Project:

Using DocxToText to Extract Text from DOCX Files

In the article the author discusses stripping out just the words themselves.

For your .doc (non-docx) Word documents, the alternative to using the Office APIs (and spawning an instance of Word in the background) is to shell out to one of the many Doc2Docx converters on the market and then apply the above process to both formats.
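
Shelling out to a converter could look roughly like the sketch below. The converter itself is hypothetical: the executable path is passed in by the caller, and the "input path, then output path" argument order is only an assumption, so check the documentation of whichever tool you pick. The class name DocConverter is also made up for this example.

```csharp
using System;
using System.Diagnostics;

public static class DocConverter
{
    // Builds launch settings for a HYPOTHETICAL command-line converter that
    // is assumed to take the input .doc path and then the output .docx path.
    public static ProcessStartInfo BuildStartInfo(
        string converterExe, string docPath, string docxPath)
    {
        return new ProcessStartInfo
        {
            FileName = converterExe,
            // Simple quoting; paths that themselves contain quotes are not handled.
            Arguments = "\"" + docPath + "\" \"" + docxPath + "\"",
            UseShellExecute = false, // start the executable directly
            CreateNoWindow = true    // no console window per document
        };
    }

    // Runs the converter and reports whether it exited with code 0.
    public static bool Convert(string converterExe, string docPath, string docxPath)
    {
        using (Process p = Process.Start(BuildStartInfo(converterExe, docPath, docxPath)))
        {
            if (p == null)
                return false;
            p.WaitForExit();
            return p.ExitCode == 0;
        }
    }
}
```

Once Convert succeeds, the resulting .docx can go through the same text extraction as any other .docx file.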

Solution 3

I recently did some research on this topic. It turns out that to manipulate Word files programmatically without opening Word itself, you need some very expensive tools.

There's an article over at The Code Project on manipulating Word that you might find useful. The author built a C# COM wrapper for dealing with calls to Word. It looks like it actually pops open the Word application, though.

This post over at the Neowin forums looks promising too. It includes quite a few P/Invoke calls for the purpose of text extraction.

Maybe if you could find a way to keep the window hidden it would be acceptable.

Author by

user2120901

Updated on September 23, 2020

Comments

  • user2120901
    user2120901 over 3 years

    How do I load an MS Word document (.doc or .docx) into memory (a variable) without doing this:

    wordApp.Documents.Open

    I don't want to open MS Word, I just want that text inside.

    You gave me an answer for DOCX, but what about DOC? I want a free, high-performance solution, not one that opens 12,000 instances of Word to process all of them. :( Aspose is a commercial product, and $900 is way too much for what I do.

  • user2120901
    user2120901 over 15 years
    Is there any free .doc to .docx solution?
  • user2120901
    user2120901 over 15 years
    free library, -> Aspose: US$899
  • user2120901
    user2120901 over 15 years
    If I want to process 12,000 Word documents every day... guess why I don't want to open 12,000 instances of Word.