Replace multiple newlines with a single newline while reading a file

Solution 1

You could use a second regex to replace multiple newlines with a single newline, and use strip() to get rid of any leading and trailing newlines.

import os
import re

files=[]
pars=[]

for i in os.listdir('path_to_dir_with_files'):
    files.append(i)

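for f in files:
    with open(os.path.join('path_to_dir_with_files', f), 'r') as a:
        # Drop 'someword=' and everything from a comma or hash to the end of the line.
        word = re.sub(r'someword=|\,.*|\#.*', '', a.read())
        # Collapse each run of newlines into one, then trim both ends.
        word = re.sub(r'\n+', '\n', word).strip()
        pars.append(word)

for k in pars:
    print(k)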
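
To see the two substitutions in isolation, here is a minimal sketch that applies them to an in-memory string instead of a file. The helper name clean_text and the sample input are made up for illustration; the patterns are the ones used above.

```python
import re

def clean_text(text):
    """Hypothetical helper: the two substitutions from the loop above, applied to a string."""
    # Remove 'someword=' and anything from a comma or hash to the end of the line.
    text = re.sub(r'someword=|\,.*|\#.*', '', text)
    # Collapse every run of newlines into one, then trim leading/trailing whitespace.
    return re.sub(r'\n+', '\n', text).strip()

# Invented sample: two values separated by a comment line and blank lines.
sample = "someword=test1, trailing\n# a comment line\n\n\nsomeword=test2\n"
print(clean_text(sample))
```

Note that strip() removes all leading and trailing whitespace, not only newlines, which is usually what you want here.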

Solution 2

Without changing your code much, one easy way would just be to check if the line is empty before you print it, e.g.:

import os
import re

files=[]
pars=[]

for i in os.listdir('path_to_dir_with_files'):
    files.append(i)

for f in files:
    # os.path.join inserts the path separator that plain concatenation omits.
    with open(os.path.join('path_to_dir_with_files', f), 'r') as a:
        pars.append(re.sub(r'someword=|\,.*|\#.*', '', a.read()))

for k in pars:
    if not k.strip() == "":
        print(k)

*** EDIT: Since each element in pars is actually the entire content of a file (not just a line), you need to go through and replace any runs of consecutive newlines with a single one, which is easiest to do with re:

import os
import re

files=[]
pars=[]

for i in os.listdir('path_to_dir_with_files'):
    files.append(i)

for f in files:
    with open(os.path.join('path_to_dir_with_files', f), 'r') as a:
        pars.append(re.sub(r'someword=|\,.*|\#.*', '', a.read()))

for k in pars:
    # Collapse each run of consecutive newlines into a single newline.
    k = re.sub(r"\n+", "\n", k)
    if not k.strip() == "":
        print(k)

Note that this doesn't take care of the case where a file ends with a newline and the next one begins with one. If that's a case you are worried about, you need to either add extra logic to deal with it or change the way you're reading the data in.
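
One possible way to change how the data is read is to process the files line by line and skip any line that is empty after the substitution. This is only a sketch: the function name iter_clean_lines is hypothetical, sorting the directory listing is my own choice to get a stable order, and it assumes every entry in the directory is a readable text file. Because blank lines are dropped one by one, a file that ends with a newline followed by a file that begins with one leaves no gap.

```python
import os
import re

# Same pattern as in the answers above, compiled once for reuse.
PATTERN = re.compile(r'someword=|\,.*|\#.*')

def iter_clean_lines(directory):
    """Yield the non-empty cleaned lines of every file in `directory` (hypothetical helper)."""
    for name in sorted(os.listdir(directory)):
        with open(os.path.join(directory, name), 'r') as handle:
            for line in handle:
                # strip() also removes the trailing newline and surrounding spaces.
                cleaned = PATTERN.sub('', line).strip()
                if cleaned:  # skip lines that are empty after cleaning
                    yield cleaned
```

Usage would look something like for line in iter_clean_lines('path_to_dir_with_files'): print(line), with the directory name being the same placeholder as above.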

Author by

user54

Updated on June 16, 2022

Comments

  • user54 almost 2 years

    I have the following code, which reads from multiple files, parses the lines it obtains, and prints the result:

    import os
    import re
    
    files=[]
    pars=[]
    
    for i in os.listdir('path_to_dir_with_files'):
        files.append(i)
    
    for f in files:
        with open('path_to_dir_with_files'+str(f), 'r') as a:
            pars.append(re.sub('someword=|\,.*|\#.*','',a.read()))

    for k in pars:
        print k
    

    But I have a problem with multiple newlines in the output:

    test1
    
    
    test2
    

    Instead, I want to obtain the following result, without empty lines in the output:

     test1
     test2
    

    and so on.

    I tried playing with regexp:

    pars.append(re.sub('someword=|\,.*|\#.*|^\n$','',a.read()))
    

    But it doesn't work. I also tried strip(), rstrip(), and replace(), and they didn't work either.

  • Patrick Haugh about 7 years
    Or just if k.strip()
  • vallentin about 7 years
    This should also be done while adding to pars and not when iterating over pars.
  • user54 about 7 years
    Unfortunately, it didn't give the right result. With if not k.strip() == "" I still get multiple empty lines. If I display the list itself without iterating through it, I get: test1[]\n\n\n test2\n test5\ntest7[]\ntest[*]\n etc...
  • Kewl about 7 years
    Oh I see, it's because you are reading the entire file into each item in pars, so it isn't printing line by line. I edited my answer; it now uses a regular expression to replace any run of repeated \n with a single \n.
  • Yuri Olive over 4 years
    This won't work if the file contains more than 2 consecutive "\n" like "whatever\nmay\n\n\nhappen"
  • vincent-lg over 4 years
    That's true, but you could still do it with a loop: while "\n\n" in text: text = text.replace("\n\n", "\n")
  • amcgregor over 4 years
    This form of 'elision' is fragile and requires adaptation based on the length of the desired run. E.g. desiring two newlines between "paragraphs" would require three .replace("\n\n\n", "\n\n") calls. Iterative reconstruction means a duplication of the entire string per iteration. Regular expressions can far more easily combine actual measured runs of repeating characters, with explicit control over run length: \n{min,max}, and perform such an operation in, essentially, a single pass without excessive memory duplication.
  • Timo over 3 years
    Could you do this line-wise, not file-wise, like for line in f:? And can you explain what the re.sub does? The comma and hash are escaped, but I do not understand the someword= part. There is no = in the example.
  • Kris over 3 years
    Sure, you can do it line-wise, but f is the filename in this case, not the content. re.sub replaces whatever matches the first argument with whatever you put in the second argument. Check the docs and try it out.
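
The approaches discussed in the comments can be compared on a small string. The sketch below is illustrative only: the function names and the keep parameter are invented for the example, and the sample text is the one quoted in the comments. It shows the repeated str.replace loop, the single \n+ substitution, and a quantified pattern that caps runs at two newlines so that paragraph breaks survive.

```python
import re

def collapse_with_loop(text):
    # Repeatedly replace doubled newlines until none remain.
    while "\n\n" in text:
        text = text.replace("\n\n", "\n")
    return text

def collapse_with_regex(text):
    # One substitution: every run of one or more newlines becomes a single newline.
    return re.sub(r"\n+", "\n", text)

def cap_newline_runs(text, keep=2):
    # Runs longer than `keep` newlines are shortened to exactly `keep`.
    return re.sub(r"\n{%d,}" % (keep + 1), "\n" * keep, text)

sample = "whatever\nmay\n\n\nhappen"
print(repr(collapse_with_loop(sample)))
print(repr(collapse_with_regex(sample)))
print(repr(cap_newline_runs(sample)))
```

The first two functions give the same result on this input; the third leaves one blank line where the original had two.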