Python using re module to parse an imported text file

This should do the trick. Check the comments in the code for an explanation of what I'm doing here =) Good luck!

import re
filename = 'sliceeverfile3.txt'
pattern  = r'\d\d,\d\d,\d\d,\d\d,\d\d,\d\d,\d\d'
new_file = []

# Make sure file gets closed after being iterated
with open(filename, 'r') as f:
    # Read the file contents and generate a list with each line
    lines = f.readlines()

# Iterate each line
for line in lines:

    # Regex applied to each line 
    match = re.search(pattern, line)
    if match:
        # Make sure to add \n to display correctly when we write it back
        new_line = match.group() + '\n'
        print(new_line)
        new_file.append(new_line)

with open(filename, 'w') as f:
    # Opening in 'w' mode truncates the file, so writing starts at the beginning
    f.writelines(new_file)
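
As a variation on the code above, here is a minimal sketch that runs the regex over the whole file content with `re.findall` and writes the result to a separate file, so the input is not overwritten. The names `extract_draws`, `rewrite_file`, `src` and `dst` are made up for illustration. Note one difference in behaviour: `re.search` per line keeps only the first match on each line, while `re.findall` keeps every match.

```python
import re

# Seven two-digit numbers separated by commas, e.g. 06,25,31,40,45,06,07
PATTERN = re.compile(r'\d\d(?:,\d\d){6}')


def extract_draws(text):
    """Return every seven-number group found in text, in order."""
    return PATTERN.findall(text)


def rewrite_file(src, dst):
    """Read src, keep only the matched groups, write them one per line to dst."""
    with open(src, 'r') as f:
        draws = extract_draws(f.read())
    with open(dst, 'w') as f:
        # Terminate every group with a newline so each lands on its own line
        f.writelines(draw + '\n' for draw in draws)
    return draws
```

Calling `rewrite_file('sliceeverfile3.txt', 'output.txt')` would leave the original file untouched; `output.txt` is only a placeholder name.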
Author by

user1478335

Updated on May 13, 2020

Comments

  • user1478335 almost 4 years
    def regexread():
        import re
    
        result = ''
        savefileagain = open('sliceeverfile3.txt','w')
    
        #text=open('emeverslicefile4.txt','r')
        text='09,11,14,34,44,10,11,  27886637,    0\n561, Tue, 5,Feb,2013, 06,25,31,40,45,06,07,  19070109,    0\n560, Fri, 1,Feb,2013, 05,21,34,37,38,01,06,  13063500,    0\n559, Tue,29,Jan,2013,'
    
        pattern='\d\d,\d\d,\d\d,\d\d,\d\d,\d\d,\d\d'
        #with open('emeverslicefile4.txt') as text:     
        f = re.findall(pattern,text)
    
        for item in f:
            print(item)
    
        savefileagain.write(item)
        #savefileagain.close()
    

    The function above, as written, parses the text and returns sets of seven numbers. I have three problems:

    1. When I read from the file, which contains exactly the same text as text='09,...etc', I get "TypeError: expected string or buffer", which I cannot solve even after reading some of the posts.
    2. When I try to write the results to the output file, nothing is written.
    3. I am not sure how to get the same output in the file that I get with the print statement: three lines of seven numbers each, which is what I want.

    This is the first time that I have used regex, so be gentle please!
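
The three problems listed above can be sketched in one place. This is a guess at the cause, assuming the TypeError came from passing the object returned by `open(...)` straight to `re.findall`, which expects a string, and that the empty output came from writing outside the loop without ever closing the file. The `infile` and `outfile` parameters are added here for illustration; the original function hard-codes its file names.

```python
import re

pattern = r'\d\d,\d\d,\d\d,\d\d,\d\d,\d\d,\d\d'


def regexread(infile, outfile):
    # Problem 1: re.findall needs a string, not a file object, so call .read()
    with open(infile, 'r') as source:
        text = source.read()

    matches = re.findall(pattern, text)

    # Problems 2 and 3: write inside the loop and add '\n' after each match,
    # so the file gets one group per line, just like the printed output.
    # The with-block closes the file, which flushes the data to disk.
    with open(outfile, 'w') as target:
        for item in matches:
            print(item)
            target.write(item + '\n')

    return matches
```

With the file names from the question, the call would look like `regexread('emeverslicefile4.txt', 'sliceeverfile3.txt')`.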