Match multiline regex in file object
Solution 1
You can read the data from the file object into a string with ifile.read()
Solution 2
times = [match.group(1) for match in pattern.finditer(ifile.read())]
finditer yields MatchObjects. If the regex doesn't match anything, times will be an empty list.
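As a rough illustration of the one-liner above, here is a minimal sketch. The sample text is made up (the real layout of data.txt is not shown in the thread), io.StringIO stands in for the opened file, and Python 3 is assumed. It also shows one possible way to end up with an empty list even though the regex is fine: calling read() a second time on the same file object returns an empty string.

```python
import io
import re

# Hypothetical sample data; the real contents of data.txt are not shown in the thread
SAMPLE = (
    "Time:12:34:56\n"
    "\n"
    "values: storeU=1.25 uIx=3 storeI=-0.50 iIx=7 avgCI=-2.75\n"
    "Time:12:35:01\n"
    "\n"
    "values: storeU=2.00 uIx=4 storeI=0.25 iIx=8 avgCI=1.50\n"
)

pattern = re.compile(r"""
    ^Time:(\d{2}:\d{2}:\d{2})   # time at the beginning of a line
    \n{2}                       # two newlines
    \D+                         # one or more non-digits
    storeU=(\d+\.\d+)
    \s
    uIx=(\d+)
    \s
    storeI=(-?\d+\.\d+)
    \s
    iIx=(\d+)
    \s
    avgCI=(-?\d+\.\d+)
    """, re.VERBOSE | re.MULTILINE)

# io.StringIO stands in for the file object returned by open("data.txt")
ifile = io.StringIO(SAMPLE)

# First read(): the whole text is searched, one match per record
times = [match.group(1) for match in pattern.finditer(ifile.read())]
print(times)

# Second read(): the position is already at the end, so read() returns ""
# and the comprehension produces an empty list
again = [match.group(1) for match in pattern.finditer(ifile.read())]
print(again)
```

If the list comes back empty on a real file, it may be worth checking that the file has not already been read or iterated earlier in the script.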
You can also modify your regex to use non-capturing groups for storeU, uIx, storeI, iIx and avgCI; then pattern.findall will return only the matched times.
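A sketch of that variant, assuming Python 3 and the same made-up sample layout as elsewhere in this thread: every group except the time is written as (?:...), so findall returns a flat list of strings rather than a list of tuples.

```python
import re

# Hypothetical sample text; the real file layout is not shown in the thread
text = (
    "Time:12:34:56\n"
    "\n"
    "values: storeU=1.25 uIx=3 storeI=-0.50 iIx=7 avgCI=-2.75\n"
    "Time:12:35:01\n"
    "\n"
    "values: storeU=2.00 uIx=4 storeI=0.25 iIx=8 avgCI=1.50\n"
)

pattern = re.compile(r"""
    ^Time:(\d{2}:\d{2}:\d{2})   # the only capturing group: the time
    \n{2}                       # two newlines
    \D+                         # one or more non-digits
    storeU=(?:\d+\.\d+)         # (?:...) groups without capturing
    \s
    uIx=(?:\d+)
    \s
    storeI=(?:-?\d+\.\d+)
    \s
    iIx=(?:\d+)
    \s
    avgCI=(?:-?\d+\.\d+)
    """, re.VERBOSE | re.MULTILINE)

# With exactly one capturing group, findall returns a list of strings
times = pattern.findall(text)
print(times)
```

The rest of the record still has to match for a time to be reported; only what gets returned changes.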
Note: naming the variable time might shadow the standard library module; times would be a better option.
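The shadowing only bites if the time module is imported in the same scope, which the question's code does not do. A small sketch of what would happen if it were (Python 3 assumed):

```python
import time

start = time.time()      # works: "time" still refers to the module

time = []                # rebinding the name hides the module from here on

try:
    time.sleep(0)        # "time" is now a list, which has no sleep()
except AttributeError as exc:
    error = str(exc)

print(error)
```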
Solution 3
Why don't you read the whole file into a buffer using
buffer = open("data.txt").read()
and then do a search with that?
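A minimal sketch of this approach, assuming Python 3. The temporary file and its contents are made up so that the example can run on its own, and the pattern is shortened to the time and storeU only. Reading inside a with-block also takes care of closing the file.

```python
import os
import re
import tempfile

# Create a small stand-in for data.txt (hypothetical contents)
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "data.txt")
    with open(path, "w") as f:
        f.write("Time:12:34:56\n\nvalues: storeU=1.25\n")

    # The with-block closes the file automatically, even if read() raises
    with open(path) as ifile:
        buffer = ifile.read()

# Simplified pattern (time and storeU only) to keep the example short
pattern = re.compile(
    r"^Time:(\d{2}:\d{2}:\d{2})\n{2}\D+storeU=(\d+\.\d+)",
    re.MULTILINE,
)

match = pattern.search(buffer)
found = match.groups() if match else None
print(found)
```

Note that search returns only the first match; finditer or findall would be needed to collect every record.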
williamx
Updated on March 15, 2020
Comments
-
williamx about 4 years
How can I extract the groups from this regex from a file object (data.txt)?
import numpy as np
import re
import os

ifile = open("data.txt",'r')

# Regex pattern
pattern = re.compile(r"""
    ^Time:(\d{2}:\d{2}:\d{2})   # Time: 12:34:56 at beginning of line
    \r{2}                       # Two carriage return
    \D+                         # 1 or more non-digits
    storeU=(\d+\.\d+)
    \s
    uIx=(\d+)
    \s
    storeI=(-?\d+.\d+)
    \s
    iIx=(\d+)
    \s
    avgCI=(-?\d+.\d+)
    """, re.VERBOSE | re.MULTILINE)

time = [];

for line in ifile:
    match = re.search(pattern, line)
    if match:
        time.append(match.group(1))
The problem in the last part of the code is that I iterate line by line, which obviously doesn't work with a multiline regex. I have tried to use
pattern.finditer(ifile)
like this:
for match in pattern.finditer(ifile):
    print match
... just to see if it works, but the finditer method requires a string or buffer.
I have also tried this method, but can't get it to work:
matches = [m.groups() for m in pattern.finditer(ifile)]
Any idea?
After comments from Mike and Tuomas, I was told to use .read(), something like this:
ifile = open("data.txt",'r').read()
This works fine, but would this be the correct way to search through the file? Can't get it to work...
for i in pattern.finditer(ifile):
    match = re.search(pattern, i)
    if match:
        time.append(match.group(1))
Solution
import re

# Open file as file object and read to string
ifile = open("data.txt",'r')

# Read file object to string
text = ifile.read()

# Close file object
ifile.close()

# Regex pattern
pattern_meas = re.compile(r"""
    ^Time:(\d{2}:\d{2}:\d{2})   # Time: 12:34:56 at beginning of line
    \n{2}                       # Two newlines
    \D+                         # 1 or more non-digits
    storeU=(\d+\.\d+)           # Decimal-number
    \s
    uIx=(\d+)                   # Fetch uIx-variable
    \s
    storeI=(-?\d+\.\d+)         # Fetch storeI-variable (dot escaped to match a literal ".")
    \s
    iIx=(\d+)                   # Fetch iIx-variable
    \s
    avgCI=(-?\d+\.\d+)          # Fetch avgCI-variable (dot escaped to match a literal ".")
    """, re.VERBOSE | re.MULTILINE)

file_times = open("output_times.txt","w")
for match in pattern_meas.finditer(text):
    output = "%s,\t%s,\t\t%s,\t%s,\t\t%s,\t%s\n" % (match.group(1), match.group(2), match.group(3), match.group(4), match.group(5), match.group(6))
    file_times.write(output)
file_times.close()
Maybe it can be written in a more compact and pythonic way, though.
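One possible more compact form, offered as a sketch rather than the definitive version: with-blocks replace the explicit close() calls, and match.groups() replaces the six group(n) calls. The function names format_measurements and convert are made up for this example, the sample record is hypothetical, and Python 3 is assumed.

```python
import re

PATTERN_MEAS = re.compile(r"""
    ^Time:(\d{2}:\d{2}:\d{2})   # time at the beginning of a line
    \n{2}                       # two newlines
    \D+                         # one or more non-digits
    storeU=(\d+\.\d+)
    \s
    uIx=(\d+)
    \s
    storeI=(-?\d+\.\d+)
    \s
    iIx=(\d+)
    \s
    avgCI=(-?\d+\.\d+)
    """, re.VERBOSE | re.MULTILINE)

# Same column layout as the output line in the solution above
ROW_FORMAT = "%s,\t%s,\t\t%s,\t%s,\t\t%s,\t%s\n"


def format_measurements(text):
    """Return one formatted output line per record found in text."""
    # match.groups() is the tuple of all six captured strings
    return [ROW_FORMAT % match.groups() for match in PATTERN_MEAS.finditer(text)]


def convert(in_path, out_path):
    """Read in_path, extract the records, and write them to out_path."""
    # Each with-block closes its file automatically
    with open(in_path) as src:
        text = src.read()
    with open(out_path, "w") as dst:
        dst.writelines(format_measurements(text))


# Hypothetical sample record; the real file layout is not shown in the thread
sample = "Time:12:34:56\n\nvalues: storeU=1.25 uIx=3 storeI=-0.50 iIx=7 avgCI=-2.75\n"
lines = format_measurements(sample)
print(lines)
```

With the file names from the solution, the call would be convert("data.txt", "output_times.txt"). Keeping the formatting separate from the file handling makes it possible to try the regex on a plain string first.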
-
williamx about 14 years
Seems like the correct way to do it! But I still have some problems with the search though...
-
williamx about 14 years
I get the correct result by checking match.group(n), where n goes from 1 to 6. That means the regex works. But I don't get any results from the expression you provided, only an empty list. I have tried it on a text string, which works fine, so it's probably the ifile.read() which doesn't work. Any tips?
-
SilentGhost about 14 years
@william: you need to post an example of your subject string, and probably do this in another question.
-
williamx about 14 years
This solution seems to give a problem with closing the file. Maybe it's not important to do that though :)