Read from Word document line by line

Solution 1

OK, I found the solution: read doc.Paragraphs instead of doc.StoryRanges.


The final code is as follows:

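using System.Collections.Generic;
using Microsoft.Office.Interop.Word;

Application word = new Application();

object fileName = path;
// Define an object to pass to the API for missing parameters
object missing = System.Type.Missing;
// Open returns the Document, so no separate "new Document()" is needed
Document doc = word.Documents.Open(ref fileName,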
        ref missing, ref missing, ref missing, ref missing,
        ref missing, ref missing, ref missing, ref missing,
        ref missing, ref missing, ref missing, ref missing,
        ref missing, ref missing, ref missing);

List<string> data = new List<string>();
// Word collections are 1-based, so index from 1 to Count
for (int i = 1; i <= doc.Paragraphs.Count; i++)
{
    string temp = doc.Paragraphs[i].Range.Text.Trim();
    if (temp != string.Empty)
        data.Add(temp);
}
((_Document)doc).Close();
((_Application)word).Quit();

GridView1.DataSource = data;
GridView1.DataBind();

Solution 2

The code above is correct, but it is too slow. This version iterates with foreach instead of indexing into doc.Paragraphs, and it is much faster.

List<string> data = new List<string>();
Application app = new Application();
// readFromPath is an object holding the file path (Open takes it by ref)
Document doc = app.Documents.Open(ref readFromPath);

foreach (Paragraph objParagraph in doc.Paragraphs)
    data.Add(objParagraph.Range.Text.Trim());

((_Document)doc).Close();
((_Application)app).Quit();

Solution 3

How about this: get all the text from the document, split it on the paragraph mark (or whatever delimiter suits you better), and turn the result into a list.

// Word ends each paragraph with '\r', not '\n'
List<string> lines = doc.Content.Text.Split('\r').ToList();
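
The split can be tried without Word installed by running it on a plain string. The following is only a sketch: WordTextSplitter and SplitParagraphs are hypothetical names, not part of the Interop API. It assumes, as the comments below note, that Word ends a paragraph with \r and a table cell with \r\a. It also drops empty paragraphs, as Solution 1 does.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordTextSplitter
{
    // Hypothetical helper: splits text such as doc.Content.Text
    // into a list of non-empty, trimmed lines.
    public static List<string> SplitParagraphs(string text)
    {
        return text
            .Split('\r')                               // '\r' is Word's paragraph mark
            .Select(p => p.Replace("\a", "").Trim())   // strip end-of-cell markers, then whitespace
            .Where(p => p.Length > 0)                  // skip empty paragraphs
            .ToList();
    }
}
```

With Interop this would be called as WordTextSplitter.SplitParagraphs(doc.Content.Text), and the result can be bound to the GridView in the same way as in Solution 1.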

Author: Bat_Programmer

Updated on July 05, 2022

Comments

  • Bat_Programmer almost 2 years

    I'm trying to read a Word document using C#. I can get all the text, but I want to read it line by line, store the lines in a list, and bind the list to a GridView. Currently my code returns a list with a single item containing all the text (not line by line as desired). I'm using the Microsoft.Office.Interop.Word library to read the file. Below is my code so far:

        Application word = new Application();
        Document doc = new Document();
    
        object fileName = path;
        // Define an object to pass to the API for missing parameters
        object missing = System.Type.Missing;
        doc = word.Documents.Open(ref fileName,
                ref missing, ref missing, ref missing, ref missing,
                ref missing, ref missing, ref missing, ref missing,
                ref missing, ref missing, ref missing, ref missing,
                ref missing, ref missing, ref missing);
    
        String read = string.Empty;
        List<string> data = new List<string>();
        foreach (Range tmpRange in doc.StoryRanges)
        {
            //read += tmpRange.Text + "<br>";
            data.Add(tmpRange.Text);
        }
        ((_Document)doc).Close();
        ((_Application)word).Quit();
    
        GridView1.DataSource = data;
        GridView1.DataBind();
    
    • Doug Hauf over 10 years
      Is this all of the code that is listed above? I am starting a project this weekend that will read in a Word file, take out all of the code between double quotes, and insert a variable: "A," he said. Then I have to replace the part after the comma, giving "A," B, for a writer who wants to do some statistics on his code. I will put my code up for all to see. Are there any special imports that have to be done?
    • Hamdi about 10 years
      I would use a lightweight library like DocX (docx.codeplex.com).
    • Bat_Programmer about 10 years
      @Hamdi Thanks, I didn't know about that. I have tried it, and it sure is simple to use compared to Interop. Thanks once again.
    • John Saunders about 9 years
      It is a horrible idea to use Office Interop from ASP.NET or another server technology. These APIs were written for use in a desktop application, for automating Office (a suite of desktop applications). Server applications are different in many ways that make it a very, very bad idea to use Office Interop in them. It's also unsupported by Microsoft, and may violate your Office license. See Considerations for server-side Automation of Office
  • Shyam Dixit over 10 years
    In my code, the Open method reports that the path is not valid, and I get a 'COMException was not handled' error.
  • thang over 6 years
    It's \r\a, but \r would do, not \n.
  • Dan about 2 years
    @thang AFAIK \r\a indicates the end of a table cell, and \r is the end of a line.