Replace multiple newlines with a single newline while reading a file
Solution 1
You could use a second regex to replace multiple new lines with a single new line and use strip to get rid of the last new line.
import os
import re

files = []
pars = []
for i in os.listdir('path_to_dir_with_files'):
    files.append(i)
for f in files:
    with open('path_to_dir_with_files/' + str(f), 'r') as a:
        # Remove 'someword=' and anything following a comma or a hash
        word = re.sub(r'someword=|\,.*|\#.*', '', a.read())
        # Collapse runs of newlines into one and trim both ends
        word = re.sub(r'\n+', '\n', word).strip()
        pars.append(word)
for k in pars:
    print(k)
Solution 2
Without changing your code much, one easy way would just be to check if the line is empty before you print it, e.g.:
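import os
import re

files = []
pars = []
for i in os.listdir('path_to_dir_with_files'):
    files.append(i)
for f in files:
    # os.path.join inserts the path separator between directory and file name
    with open(os.path.join('path_to_dir_with_files', f), 'r') as a:
        pars.append(re.sub(r'someword=|\,.*|\#.*', '', a.read()))
for k in pars:
    # Skip entries that contain only whitespace
    if not k.strip() == "":
        print(k)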
*** EDIT Since each element in pars is actually the entire content of a file (not just a line), you need to go through it and replace any repeated newlines, which is easiest to do with re:
import os
import re

files = []
pars = []
for i in os.listdir('path_to_dir_with_files'):
    files.append(i)
for f in files:
    # os.path.join inserts the path separator between directory and file name
    with open(os.path.join('path_to_dir_with_files', f), 'r') as a:
        pars.append(re.sub(r'someword=|\,.*|\#.*', '', a.read()))
for k in pars:
    # Collapse every run of newlines into a single newline
    k = re.sub(r"\n+", "\n", k)
    if not k.strip() == "":
        print(k)
Note that this doesn't handle the case where one file ends with a newline and the next one begins with one. If that case worries you, you need to either add extra logic to deal with it or change the way you read the data in.
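One way to change how the data is read, sketched here under some assumptions, is to process the files line by line and skip every line that ends up empty. Blank lines at the boundary between two files then disappear as well. The function name cleaned_lines is hypothetical, and sorting the file names is my own addition to make the order predictable (os.listdir returns names in arbitrary order).

```python
import os
import re

def cleaned_lines(directory):
    """Yield the non-empty, cleaned lines of every file in `directory`."""
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue  # ignore subdirectories
        with open(path, 'r') as handle:
            for line in handle:
                # Same pattern as above, applied to one line at a time.
                line = re.sub(r'someword=|,.*|#.*', '', line).strip()
                if line:  # drop lines that are empty after cleaning
                    yield line
```

It could be used as for line in cleaned_lines('path_to_dir_with_files'): print(line). Unlike the whole-file version, strip() here also removes leading and trailing spaces on each line, which may or may not be what you want.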
user54
Updated on June 16, 2022
Comments
-
user54 almost 2 years
I have the following code, which reads from multiple files, parses the lines it obtains, and prints the result:
import os
import re
files=[]
pars=[]
for i in os.listdir('path_to_dir_with_files'):
    files.append(i)
for f in files:
    with open('path_to_dir_with_files'+str(f), 'r') as a:
        pars.append(re.sub('someword=|\,.*|\#.*','',a.read()))
for k in pars:
    print k
But I have a problem with multiple newlines in the output:
test1

test2
Instead, I want to get the following result, with no empty lines in the output:
test1
test2
and so on.
I tried playing with the regular expression:
pars.append(re.sub('someword=|\,.*|\#.*|^\n$','',a.read()))
But it doesn't work. I also tried strip(), rstrip(), and replace(), and none of them worked either.
-
Patrick Haugh about 7 years
or just if k.strip()
-
vallentin about 7 years
This should also be done while adding to pars, not when iterating over pars.
-
user54 about 7 years
Unfortunately it didn't give the expected result. With if not k.strip() == "" I still get multiple empty lines. If I display the list itself without iterating through it, I get: test1[]\n\n\n test2\n test5\ntest7[]\ntest[*]\n etc...
-
Kewl about 7 years
Oh I see: each item in pars holds the entire content of a file rather than a single line, so it isn't printing line by line. I edited my answer; it now uses a regular expression to replace any run of repeated \n with a single \n.
-
Yuri Olive over 4 years
This won't work if the file contains more than 2 consecutive "\n", like "whatever\nmay\n\n\nhappen".
-
vincent-lg over 4 years
It's true, but you could still do it with a loop: while "\n\n" in text: text = text.replace("\n\n", "\n")
-
amcgregor over 4 years
This form of 'elision' is fragile and requires adaptation based on the length of the desired run. E.g. desiring two newlines between "paragraphs" would require three .replace("\n\n\n", "\n\n") calls. Iterative reconstruction means a duplication of the entire string per iteration. Regular expressions can far more easily combine actual measured runs of repeating characters, with explicit control over run length: \n{min,max}, and perform such an operation in essentially a single pass over the text, without excessive memory duplication.
-
Timo over 3 years
Could you do this line-wise, not file-wise? Like for line in f: And can you explain what the re.sub does? The comma and hash are escaped, but I do not understand the someword=. There is no = in the example.
-
Kris over 3 years
Sure, you can do it line-wise, but f is the filename in this case, not the content. re.sub replaces stuff that matches the first argument with whatever you put in the second argument. Check the docs and try it out.
-
Admin about 2 years
Your answer could be improved with additional supporting information. Please edit to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers in the help center.