Replace multiple newlines with a single newline while reading a file

python regex file

16,845

Solution 1

You could use a second regex to replace each run of newlines with a single newline, and use strip() to get rid of any leading or trailing newline.

import os
import re

files=[]
pars=[]

for i in os.listdir('path_to_dir_with_files'):
    files.append(i)

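for f in files:
    with open('path_to_dir_with_files/'+str(f), 'r') as a:
        # Remove "someword=", anything from a comma onward, and '#' comments
        word = re.sub(r'someword=|\,.*|\#.*','', a.read())
        # Collapse runs of newlines, then trim leading/trailing whitespace
        word = re.sub(r'\n+', '\n', word).strip()
        pars.append(word)

for k in pars:
    print(k)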
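
To see what the two substitutions do in isolation, here is a minimal sketch that runs them on an in-memory string instead of files. The helper name `clean` and the sample text are assumptions made for illustration; only the two patterns come from the answer above.

```python
import re

def clean(text):
    """Strip unwanted fragments, then collapse runs of newlines."""
    # Remove "someword=", anything from a comma onward, and '#' comments.
    text = re.sub(r'someword=|,.*|#.*', '', text)
    # Replace each run of one or more newlines with a single newline,
    # then trim leading/trailing whitespace (including stray newlines).
    return re.sub(r'\n+', '\n', text).strip()

# Hypothetical file content, chosen to leave blank lines after the first pass.
sample = "someword=test1,extra\n# a comment\n\nsomeword=test2\n"
print(clean(sample))
```

The first substitution is what creates the blank lines here: removing a whole comment line leaves its newline behind, which is why the second pass is needed.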

Solution 2

Without changing your code much, one easy way would just be to check if the line is empty before you print it, e.g.:

import os
import re

files=[]
pars=[]

for i in os.listdir('path_to_dir_with_files'):
    files.append(i)

for f in files:
    with open('path_to_dir_with_files/'+str(f), 'r') as a:
        pars.append(re.sub(r'someword=|\,.*|\#.*','',a.read()))

for k in pars:
    if not k.strip() == "":
        print(k)

*** EDIT: Since each element in pars is actually the entire content of a file (not just a line), you need to go through and replace any repeated newlines, which is easiest to do with re:

import os
import re

files=[]
pars=[]

for i in os.listdir('path_to_dir_with_files'):
    files.append(i)

for f in files:
    with open('path_to_dir_with_files/'+str(f), 'r') as a:
        pars.append(re.sub(r'someword=|\,.*|\#.*','',a.read()))

for k in pars:
    # Collapse each run of newlines into a single newline
    k = re.sub(r"\n+", "\n", k)
    if not k.strip() == "":
        print(k)

Note that this does not handle the case where one file ends with a newline and the next one begins with one. If that is a case you are worried about, you need to either add extra logic to deal with it or change the way you read the data in.
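
One possible way to cover that boundary case is to strip each file's text and join the pieces with a single newline before printing. This is only a sketch: the function name `join_clean` and the sample list are hypothetical, and it assumes the per-file strings have already been collected as in the code above.

```python
import re

def join_clean(chunks):
    """Collapse blank lines inside each chunk and across chunk boundaries."""
    cleaned = []
    for chunk in chunks:
        # Collapse newline runs, then drop leading/trailing whitespace so
        # no chunk starts or ends with a newline.
        chunk = re.sub(r'\n+', '\n', chunk).strip()
        if chunk:  # skip chunks that were empty or whitespace-only
            cleaned.append(chunk)
    return '\n'.join(cleaned)

# Hypothetical contents of three files, each with stray newlines at the edges.
pars = ["test1\n\n", "\n\ntest2\n\n\ntest5\n", "\n\n"]
print(join_clean(pars))
```

Because every chunk is stripped before joining, the only newlines between files are the ones `'\n'.join` inserts.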

Author by

user54

Updated on June 16, 2022

Comments

user54 almost 2 years
I have the following code, which reads from multiple files, parses the lines obtained, and prints the result:
```
import os
import re

files=[]
pars=[]

for i in os.listdir('path_to_dir_with_files'):
    files.append(i)

for f in files:
    with open('path_to_dir_with_files'+str(f), 'r') as a:
       pars.append(re.sub('someword=|\,.*|\#.*','',a.read()))

for k in pars:
   print k
```
But the output contains multiple blank lines:
```
test1


test2
```
Instead, I want to obtain the following result, without empty lines in the output:
```
 test1
 test2
```
and so on.

I tried playing with regexp:
```
pars.append(re.sub('someword=|\,.*|\#.*|^\n$','',a.read()))
```
But it doesn't work. I also tried strip(), rstrip(), and replace(); none of them worked either.
Patrick Haugh about 7 years

or just if k.strip()
vallentin about 7 years

This should also be done while adding to pars and not when iterating over pars.
user54 about 7 years

Unfortunately it didn't give an appropriate result. In case of if not k.strip() == "" I still obtain multiple empty lines. If displaying just list without iterating through it I obtain: test1[]\n\n\n test2\n test5\ntest7[]\ntest[*]\n etc...
Kewl about 7 years

Oh I see, it's because you are reading the entire file into each item in pars, so it isn't printing line by line. I edited my answer; it now uses a regular expression to replace any run of \n with a single \n.
Yuri Olive over 4 years

This won't work if the file contains more than 2 consecutive "\n" like "whatever\nmay\n\n\nhappen"
vincent-lg over 4 years

It's true, but still could do with a loop: while "\n\n" in text: text = text.replace("\n\n", "\n")
amcgregor over 4 years

This form of 'elision' is fragile and requires adaptation based on the length of the desired run. E.g. desiring two newlines between "paragraphs" would require three .replace("\n\n\n", "\n\n") calls. Iterative reconstruction means a duplication of the entire string per iteration. Regular expressions can far more easily combine actual measured runs of repeating characters, with explicit control over run length: \n{min,max}, and perform such an operation in a single pass without excessive memory duplication.
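
The quantifier form mentioned in this comment can be sketched as follows. Runs of three or more newlines are capped at two, which keeps one blank line between paragraphs; a second call shows the stricter variant that removes all blank lines. The sample text is made up for illustration.

```python
import re

text = "para one\n\n\n\n\npara two\nstill para two\n\n\npara three"

# \n{3,} matches a run of three or more newlines; each run becomes exactly
# two, so at most one blank line separates paragraphs.
capped = re.sub(r'\n{3,}', '\n\n', text)

# \n{2,} matches a run of two or more newlines; each run becomes one,
# which removes every blank line.
single = re.sub(r'\n{2,}', '\n', text)

print(repr(capped))
print(repr(single))
```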
Timo over 3 years

Could you do this line-wise, not file-wise? Like for line in f: And can you explain what the re.sub does? Comma and hash are escaped, I do not understand the someword=. There is no = in the example..
Kris over 3 years

Sure you can do it line-wise but f is the filename in this case not the content. re.sub replaces stuff that matches the first argument with whatever you put in the second argument. Check the docs and try it out.
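
A line-wise variant, as asked about above, might look like the sketch below. It iterates over any iterable of lines (an open file object would work), applies the same substitution to each line, and skips lines that end up empty. The function name `clean_lines` is hypothetical, and `io.StringIO` stands in for a real file; Python 3 is assumed.

```python
import io
import re

def clean_lines(lines):
    """Yield cleaned, non-empty lines from an iterable of text lines."""
    for line in lines:
        # Same pattern as in the answers, applied to one line at a time.
        line = re.sub(r'someword=|,.*|#.*', '', line).strip()
        if line:  # drop lines that are empty after cleaning
            yield line

# io.StringIO mimics a file opened in text mode (hypothetical content).
fake_file = io.StringIO("someword=test1,x\n\n\n# note\nsomeword=test2\n")
for line in clean_lines(fake_file):
    print(line)
```

Working per line also drops lines that contain only spaces or tabs, which a plain `\n+` substitution on the whole file would leave in place.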