Python using re module to parse an imported text file
This should do the trick. Check the comments for an explanation of what I'm doing here =) Good luck!
import re

filename = 'sliceeverfile3.txt'
# Raw string so the \d escapes reach the regex engine unchanged
pattern = r'\d\d,\d\d,\d\d,\d\d,\d\d,\d\d,\d\d'
new_file = []

# Make sure the file gets closed after being read
with open(filename, 'r') as f:
    # Read the file contents into a list with one entry per line
    lines = f.readlines()

# Iterate over each line
for line in lines:
    # Apply the regex to each line
    match = re.search(pattern, line)
    if match:
        # Add \n so each match lands on its own line when we write it back
        new_line = match.group() + '\n'
        print(match.group())
        new_file.append(new_line)

with open(filename, 'w') as f:
    # Opening with 'w' truncates the file, so writing starts at the beginning
    f.writelines(new_file)
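If you would rather keep the findall approach from the question, here is a minimal sketch of that variant. It assumes the file is small enough to read in one go, and the names extract_number_sets and rewrite_file (plus the separate output path) are just made up for illustration. The point is that re.findall needs the file's contents as a string (f.read()), not the file object itself, and that each match needs its own '\n' when written out.

```python
import re

# Same pattern as above: seven two-digit numbers separated by commas.
PATTERN = re.compile(r'\d\d,\d\d,\d\d,\d\d,\d\d,\d\d,\d\d')


def extract_number_sets(text):
    """Return every run of seven comma-separated two-digit numbers in text."""
    return PATTERN.findall(text)


def rewrite_file(in_path, out_path):
    """Read in_path as one string and write one match per line to out_path."""
    with open(in_path, 'r') as src:
        text = src.read()  # a str, which is what the re functions expect
    matches = extract_number_sets(text)
    # The with block closes (and therefore flushes) the file on exit,
    # so the written lines actually reach the disk.
    with open(out_path, 'w') as dst:
        dst.writelines(match + '\n' for match in matches)
    return matches
```

Writing to a different file than the one being read is a deliberate choice here: it leaves the original data untouched if the pattern turns out to be wrong.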
Author by
user1478335
Updated on May 13, 2020
Comments
- user1478335 almost 4 years
def regexread():
    import re
    result = ''
    savefileagain = open('sliceeverfile3.txt','w')
    #text=open('emeverslicefile4.txt','r')
    text='09,11,14,34,44,10,11, 27886637, 0\n561, Tue, 5,Feb,2013, 06,25,31,40,45,06,07, 19070109, 0\n560, Fri, 1,Feb,2013, 05,21,34,37,38,01,06, 13063500, 0\n559, Tue,29,Jan,2013,'
    pattern='\d\d,\d\d,\d\d,\d\d,\d\d,\d\d,\d\d'
    #with open('emeverslicefile4.txt') as text:
    f = re.findall(pattern,text)
    for item in f:
        print(item)
        savefileagain.write(item)
    #savefileagain.close()
The above function as written parses the text and returns sets of seven numbers. I have three problems.
- Firstly, reading from the 'read' file, which contains exactly the same text as text='09,...etc', raises "TypeError: expected string or buffer", which I cannot solve even after reading some of the posts.
- Secondly, when I try to write the results to the 'write' file, nothing is written.
- Thirdly, I am not sure how to get the same output in the file that I get with the print statement: three lines of seven numbers each, which is the output that I want.
This is the first time that I have used regex, so be gentle please!
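For what it's worth, the TypeError in the first problem most likely comes from handing the open file object to re.findall instead of the text inside it (that is an assumption based on the commented-out text=open(...) line). The sketch below reproduces the error with io.StringIO standing in for an open file; find_sets is a hypothetical helper that accepts either a string or a file-like object.

```python
import io
import re

pattern = r'\d\d,\d\d,\d\d,\d\d,\d\d,\d\d,\d\d'


def find_sets(source):
    """Accept a string or a file-like object and return all matches."""
    # File-like objects expose read(); plain strings do not.
    text = source.read() if hasattr(source, 'read') else source
    return re.findall(pattern, text)


# io.StringIO behaves like a file opened in text mode.
fake_file = io.StringIO('560, Fri, 1,Feb,2013, 05,21,34,37,38,01,06, 13063500, 0\n')

error_message = None
try:
    re.findall(pattern, fake_file)  # passing the file object itself
except TypeError as exc:
    error_message = str(exc)  # the exact wording differs between Python versions

matches = find_sets(fake_file)  # reads the contents first, then searches
```

The second problem has a related likely cause: with close() commented out, the written data may still be sitting in a buffer, which a with block or an explicit close() would flush.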