How to extract text data from an MS Word .doc file
See the following:
http://msdn.microsoft.com/en-us/library/cc974107%28office.12%29.aspx
Thomas
Updated on June 04, 2022
Comments
Thomas almost 2 years
I am developing a resume archive where people upload their resumes, which are then saved to a specific location. The important point is that people may use any version of MS Word to prepare their resume, so the file extension could be .doc or .docx. Is there a free library I can use to extract text from .doc or .docx files that works for every MS Word version and also works when MS Word is not installed on the PC? I searched Google and found some articles on extracting text from .doc files, but I am not sure whether they work for all MS Word versions. Please tell me which library I should use to extract text from MS Word files regardless of version, and point me to some good articles on this issue.
Also, is there a viewer I can use to show .doc file content from my C# apps regardless of MS Word version? Thanks.
I got the answer.
**Need to add this reference: Microsoft.Office.Interop.Word**

    using System;
    using System.IO;
    using System.Runtime.InteropServices;
    using System.Text;

    // Reads the text file produced by SaveAsText, then deletes it.
    public static string GetText(string strfilename)
    {
        StringBuilder strBuilder = new StringBuilder();
        if (File.Exists(strfilename))
        {
            try
            {
                using (StreamReader sr = File.OpenText(strfilename))
                {
                    string s;
                    while ((s = sr.ReadLine()) != null)
                    {
                        strBuilder.AppendLine(s);
                    }
                }
            }
            catch (Exception ex)
            {
                SendErrorMail(ex); // error-reporting helper defined elsewhere in the project
            }
            finally
            {
                // The .txt file is only a temporary artifact, so remove it.
                if (File.Exists(strfilename))
                    File.Delete(strfilename);
            }
        }

        // Return an empty string when the file held only whitespace.
        string text = strBuilder.ToString();
        return text.Trim() != "" ? text : "";
    }

    // Opens the document in Word through interop and saves it as plain text
    // next to the original file. Returns the path of the .txt file.
    public static string SaveAsText(string strfilename)
    {
        string fileName = "";
        object miss = System.Reflection.Missing.Value;
        Microsoft.Office.Interop.Word.Application wordApp = null;
        Microsoft.Office.Interop.Word.Document doc = null;
        try
        {
            wordApp = new Microsoft.Office.Interop.Word.Application();
            fileName = Path.Combine(
                Path.GetDirectoryName(strfilename),
                Path.GetFileNameWithoutExtension(strfilename) + ".txt");
            doc = wordApp.Documents.Open(strfilename, false);
            doc.SaveAs(fileName, Microsoft.Office.Interop.Word.WdSaveFormat.wdFormatDOSText);
        }
        catch (Exception ex)
        {
            SendErrorMail(ex);
        }
        finally
        {
            if (doc != null)
            {
                doc.Close(ref miss, ref miss, ref miss);
                Marshal.ReleaseComObject(doc);
                doc = null;
            }
            // Quit Word as well; otherwise a WINWORD.EXE process stays alive per call.
            if (wordApp != null)
            {
                wordApp.Quit(ref miss, ref miss, ref miss);
                Marshal.ReleaseComObject(wordApp);
                wordApp = null;
            }
            GC.Collect();
            GC.WaitForPendingFinalizers();
        }
        return fileName;
    }
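
Note that the interop code above only works on a machine where MS Word is installed. For the .docx half of the question, here is a rough sketch of reading the text without Word at all, since a .docx file is a ZIP archive whose main body lives in `word/document.xml`. It is only an illustration under some assumptions: the class name `DocxTextExtractor` is made up, it needs .NET 4.5 or later (for `System.IO.Compression.ZipArchive`), it does not handle the old binary .doc format, and it ignores tabs, line breaks inside a paragraph, headers, footers and footnotes.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;
using System.Xml.Linq;

// Hypothetical helper, not part of any library.
public static class DocxTextExtractor
{
    // WordprocessingML main namespace used inside word/document.xml.
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Extracts the body text of a .docx stream, one line per paragraph.
    public static string ExtractText(Stream docxStream)
    {
        using (var archive = new ZipArchive(docxStream, ZipArchiveMode.Read, true))
        {
            ZipArchiveEntry entry = archive.GetEntry("word/document.xml");
            if (entry == null)
                throw new InvalidDataException("Not a .docx file: word/document.xml is missing.");

            XDocument xml;
            using (Stream s = entry.Open())
            {
                xml = XDocument.Load(s);
            }

            var sb = new StringBuilder();
            // Each w:p is a paragraph; its text is spread over w:t elements.
            // Limitation: paragraphs nested in text boxes would be emitted twice.
            foreach (XElement paragraph in xml.Descendants(W + "p"))
            {
                foreach (XElement textNode in paragraph.Descendants(W + "t"))
                {
                    sb.Append(textNode.Value);
                }
                sb.AppendLine();
            }
            return sb.ToString();
        }
    }

    // Convenience overload for a file on disk.
    public static string ExtractText(string path)
    {
        using (FileStream fs = File.OpenRead(path))
        {
            return ExtractText(fs);
        }
    }
}
```

Usage would be something like `string text = DocxTextExtractor.ExtractText(uploadedFilePath);`, where `uploadedFilePath` is whatever path the resume was saved to. Older .doc uploads would still need the interop route or a third-party library.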