How to extract text data from an MS Word doc file


See the following:

http://msdn.microsoft.com/en-us/library/cc974107%28office.12%29.aspx

How can I read a .docx file?

Author by

Thomas

I am a developer. I have been working with .NET technology (v1.1 and v2.0) for the last 4 years. I like this forum for its fast and good responses, and that is why I joined. My friends' profiles are Mou: http://stackoverflow.com/users/728750/user728750?tab=questions and Keith: http://stackoverflow.com/users/750398/keith-costa. Thanks.

Updated on June 04, 2022

Comments

  • Thomas
    Thomas almost 2 years

    I am developing a resume archive where people upload their resumes, and each resume is saved in a specific location. The important point is that people may use any version of MS Word to prepare their resume, so the file extension could be doc or docx. Is there a free library I can use to extract text from doc or docx files that works for all MS Word versions and also works if MS Word is not installed on the PC? I searched Google and found some articles on extracting text from doc files, but I am not sure whether they work for all MS Word versions. Please tell me which library I should use to extract text from MS Word files irrespective of the Word version, and point me to some good articles on this issue.

    Also, is there a viewer I can use to show doc file content from my C# apps irrespective of the MS Word version? Thanks.

    I got the answer:

    **Add a reference to Microsoft.Office.Interop.Word** (this approach requires Microsoft Word to be installed on the machine).
    
    using System;
    using System.IO;
    
        // Reads a plain-text file and then deletes it.
        // Pass the .txt path returned by SaveAsText, not the original document.
        // SendErrorMail is the author's own error-reporting helper (not shown).
        public static string GetText(string strfilename)
        {
            System.Text.StringBuilder strBuilder = new System.Text.StringBuilder();
            if (File.Exists(strfilename))
            {
                try
                {
                    using (StreamReader sr = File.OpenText(strfilename))
                    {
                        string s;
                        while ((s = sr.ReadLine()) != null)
                        {
                            strBuilder.AppendLine(s);
                        }
                    }
                }
                catch (Exception ex)
                {
                    SendErrorMail(ex);
                }
                finally
                {
                    // The text file is only a temporary export, so remove it.
                    if (File.Exists(strfilename))
                        File.Delete(strfilename);
                }
            }
    
            // Return an empty string when the file held only whitespace.
            string text = strBuilder.ToString();
            return text.Trim() != "" ? text : "";
        }
    
        // Opens the document in Word and exports it as plain text next to the original.
        // Returns the path of the .txt file, which can then be passed to GetText.
        public static string SaveAsText(string strfilename)
        {
            string fileName = "";
            object miss = System.Reflection.Missing.Value;
            Microsoft.Office.Interop.Word.Application wordApp = null;
            Microsoft.Office.Interop.Word.Document doc = null;
            try
            {
                wordApp = new Microsoft.Office.Interop.Word.Application();
                fileName = Path.ChangeExtension(strfilename, ".txt");
                doc = wordApp.Documents.Open(strfilename, false);
                doc.SaveAs(fileName, Microsoft.Office.Interop.Word.WdSaveFormat.wdFormatDOSText);
            }
            catch (Exception ex)
            {
                SendErrorMail(ex);
            }
            finally
            {
                if (doc != null)
                {
                    doc.Close(ref miss, ref miss, ref miss);
                    System.Runtime.InteropServices.Marshal.ReleaseComObject(doc);
                    doc = null;
                }
                if (wordApp != null)
                {
                    // Quit Word, otherwise a WINWORD.EXE process is left running.
                    wordApp.Quit(ref miss, ref miss, ref miss);
                    System.Runtime.InteropServices.Marshal.ReleaseComObject(wordApp);
                    wordApp = null;
                }
                GC.Collect();
                GC.WaitForPendingFinalizers();
            }
            return fileName;
        }
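
The interop code needs Word installed on the machine. For .docx files only, the text can be read without Word, because a .docx file is a ZIP archive whose main body lives in word/document.xml. Below is a minimal sketch of that idea, assuming .NET 4.5 or later (for System.IO.Compression.ZipArchive). The class name DocxText and the method Extract are hypothetical names chosen for illustration. The sketch reads body paragraphs only (no headers, footers or footnotes), text inside text boxes may be repeated, and it does not handle the older binary .doc format.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Text;
using System.Xml.Linq;

public static class DocxText
{
    // WordprocessingML main namespace used by word/document.xml.
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical helper: returns the body text of a .docx stream,
    // one line per paragraph (w:p element).
    public static string Extract(Stream docx)
    {
        using (var zip = new ZipArchive(docx, ZipArchiveMode.Read, leaveOpen: true))
        {
            ZipArchiveEntry entry = zip.GetEntry("word/document.xml");
            if (entry == null)
                throw new InvalidDataException("word/document.xml not found; not a .docx file?");

            using (Stream xml = entry.Open())
            {
                XDocument doc = XDocument.Load(xml);
                var sb = new StringBuilder();
                foreach (XElement p in doc.Descendants(W + "p"))
                {
                    // Join every text run (w:t) that belongs to this paragraph.
                    sb.AppendLine(string.Concat(p.Descendants(W + "t").Select(t => t.Value)));
                }
                return sb.ToString();
            }
        }
    }
}
```

It could be called as `using (var fs = File.OpenRead(path)) { string text = DocxText.Extract(fs); }`, where path points to an uploaded .docx file.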