Reading a PDF File using iText5 for .NET


Try using LocationTextExtractionStrategy instead of SimpleTextExtractionStrategy; it adds newline characters to the returned text. You can then call strText.Split('\n') to split the text into a string[] and process it line by line.
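Here is a minimal sketch of that suggestion, assuming iTextSharp 5.x is referenced. The class name `PdfLineReader` and the helpers `SplitLines` and `ReadPdfLines` are made up for illustration and are not part of iTextSharp; the iTextSharp calls are the same ones used in the question, with only the strategy swapped.

```csharp
using System;
using System.Collections.Generic;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// Hypothetical helper class, not part of iTextSharp.
public static class PdfLineReader
{
    // Splits extracted text on '\n', drops a trailing '\r' if present,
    // and skips empty lines.
    public static List<string> SplitLines(string text)
    {
        var lines = new List<string>();
        foreach (string raw in text.Split('\n'))
        {
            string line = raw.TrimEnd('\r');
            if (line.Length > 0)
            {
                lines.Add(line);
            }
        }
        return lines;
    }

    // Reads every page of the PDF at 'path' and returns its text as lines.
    public static List<string> ReadPdfLines(string path)
    {
        var lines = new List<string>();
        PdfReader reader = new PdfReader(path);
        try
        {
            // iTextSharp page numbers start at 1.
            for (int page = 1; page <= reader.NumberOfPages; page++)
            {
                // Use a fresh strategy per page so text from earlier pages
                // is not carried over into this page's result.
                ITextExtractionStrategy strategy = new LocationTextExtractionStrategy();
                string pageText = PdfTextExtractor.GetTextFromPage(reader, page, strategy);
                lines.AddRange(SplitLines(pageText));
            }
        }
        finally
        {
            reader.Close();
        }
        return lines;
    }
}
```

Splitting each page separately, rather than splitting one big concatenated string, keeps the last line of a page from being glued to the first line of the next page. You would then consume the result with something like `foreach (string line in PdfLineReader.ReadPdfLines(fileName)) { ... }`.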

Author by

Mark

Updated on October 31, 2020

Comments

  • Mark, over 3 years ago

    I'm using C# with iTextSharp to read PDF content. The code below reads the content, but it seems to read it one page at a time.

            // Requires: using System; using System.Text; using System.Windows.Forms;
            //           using iTextSharp.text.pdf; using iTextSharp.text.pdf.parser;
            public string ReadPdfFile(object Filename)
            {
                string strText = string.Empty;
                try
                {
                    PdfReader reader = new PdfReader((string)Filename);

                    // iTextSharp page numbers start at 1.
                    for (int page = 1; page <= reader.NumberOfPages; page++)
                    {
                        ITextExtractionStrategy its = new iTextSharp.text.pdf.parser.SimpleTextExtractionStrategy();
                        string s = PdfTextExtractor.GetTextFromPage(reader, page, its);

                        // Round-trip the page text through the system default encoding.
                        s = Encoding.UTF8.GetString(Encoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(s)));

                        // Append this page's text to the running result.
                        strText = strText + s;
                    }
                    reader.Close();
                }
                catch (Exception ex)
                {
                    MessageBox.Show(ex.Message);
                }
                return strText;
            }
    

    Can anyone help me write code that reads the PDF content line by line?