Match multiline regex in file object


Solution 1

You can read the data from the file object into a string with ifile.read() and run the regex on that string.
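
A minimal sketch of why this helps, assuming Python 3. The sample text and the shortened pattern are illustrative stand-ins, not the asker's actual data, and io.StringIO is used here only to imitate an open file:

```python
import io
import re

# Hypothetical sample standing in for data.txt; the real layout may differ.
sample = "Time:12:34:56\n\nstoreU=1.50\nTime:12:35:10\n\nstoreU=2.25\n"

# Shortened, illustrative pattern: a time stamp, a blank line, then one value.
pattern = re.compile(r"^Time:(\d{2}:\d{2}:\d{2})\n{2}storeU=(\d+\.\d+)",
                     re.MULTILINE)

# io.StringIO behaves like a text file object opened for reading.
ifile = io.StringIO(sample)

# Iterating over the file yields one line at a time, so a pattern that
# spans several lines never sees enough text to match.
line_matches = [m.groups() for line in ifile for m in pattern.finditer(line)]

# Rewind, read everything into one string, and search that instead.
ifile.seek(0)
text = ifile.read()
text_matches = [m.groups() for m in pattern.finditer(text)]
```

Here line_matches stays empty, while text_matches holds one tuple per record.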

Solution 2

times = [match.group(1) for match in pattern.finditer(ifile.read())]

finditer yields match objects. If the regex doesn't match anything, times will be an empty list.

You can also modify your regex to use non-capturing groups for storeU, uIx, storeI, iIx and avgCI, so that time is the only capturing group; pattern.findall will then return only the matched times.

Note: naming a variable time can shadow the standard library module of the same name. times would be a better option.
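
The difference between capturing and non-capturing groups in findall can be sketched as follows, assuming Python 3. The record and the shortened patterns are made up for illustration:

```python
import re

# Hypothetical record; the real file layout may differ.
text = "Time:12:34:56\n\nmeas storeU=1.50 uIx=3\n"

# Every group captures, so findall returns one tuple per match.
capturing = re.compile(
    r"^Time:(\d{2}:\d{2}:\d{2})\n{2}\D+storeU=(\d+\.\d+)\suIx=(\d+)",
    re.MULTILINE)

# Only the time captures; (?:...) groups match without capturing,
# so findall returns a flat list of strings.
time_only = re.compile(
    r"^Time:(\d{2}:\d{2}:\d{2})\n{2}\D+storeU=(?:\d+\.\d+)\suIx=(?:\d+)",
    re.MULTILINE)

records = capturing.findall(text)
times = time_only.findall(text)
```

With a single capturing group left in the pattern, times is a plain list of time strings rather than a list of tuples.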

Solution 3

Why don't you read the whole file into a buffer using

buffer = open("data.txt").read()

and then do a search with that?
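
One way to do the same read while making sure the file gets closed is a with block. This is a sketch assuming Python 3; the temporary file and its content are made up so the example is self-contained, and the variable is called text because buffer is the name of a built-in in Python 2:

```python
import os
import re
import tempfile

# Create a throwaway file so the sketch runs on its own; in practice the
# path would be data.txt.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("Time:12:34:56\n\nstoreU=1.50\n")
    path = tmp.name

# The with block closes the file as soon as the read is done.
with open(path) as ifile:
    text = ifile.read()

file_closed = ifile.closed

# Search the whole string at once (shortened, illustrative pattern).
match = re.search(r"^Time:(\d{2}:\d{2}:\d{2})\n{2}storeU=(\d+\.\d+)",
                  text, re.MULTILINE)
first_time = match.group(1) if match else None

os.remove(path)
```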

Author by

williamx

Updated on March 15, 2020

Comments

  • williamx
    williamx about 4 years

    How can I extract the groups from this regex from a file object (data.txt)?

    import numpy as np
    import re
    import os
    ifile = open("data.txt",'r')
    
    # Regex pattern
    pattern = re.compile(r"""
                    ^Time:(\d{2}:\d{2}:\d{2})   # Time: 12:34:56 at beginning of line
                    \r{2}                       # Two carriage return
                    \D+                         # 1 or more non-digits
                    storeU=(\d+\.\d+)
                    \s
                    uIx=(\d+)
                    \s
                    storeI=(-?\d+.\d+)
                    \s
                    iIx=(\d+)
                    \s
                    avgCI=(-?\d+.\d+)
                    """, re.VERBOSE | re.MULTILINE)
    
    time = [];
    
    for line in ifile:
        match = re.search(pattern, line)
        if match:
            time.append(match.group(1))
    

    The problem with the last part of the code is that I iterate line by line, which obviously doesn't work with a multiline regex. I have tried to use pattern.finditer(ifile) like this:

    for match in pattern.finditer(ifile):
        print match
    

    ... just to see if it works, but the finditer method requires a string or buffer.

    I have also tried this method, but can't get it to work:

    matches = [m.groups() for m in pattern.finditer(ifile)]
    

    Any idea?


    After comments from Mike and Tuomas, I was told to use .read(), something like this:

    ifile = open("data.txt",'r').read()
    

    This works fine, but would this be the correct way to search through the file? Can't get it to work...

    for i in pattern.finditer(ifile):
        match = re.search(pattern, i)
        if match:
            time.append(match.group(1))
    

    Solution

    # Open file as file object
    ifile = open("data.txt",'r')
    
    # Read file object to string
    text = ifile.read()
    
    # Close file object
    ifile.close()
    
    # Regex pattern
    pattern_meas = re.compile(r"""
                    ^Time:(\d{2}:\d{2}:\d{2})   # Time: 12:34:56 at beginning of line
                    \n{2}                       # Two newlines
                    \D+                         # 1 or more non-digits
                    storeU=(\d+\.\d+)           # Decimal-number
                    \s
                    uIx=(\d+)                   # Fetch uIx-variable
                    \s
                    storeI=(-?\d+\.\d+)         # Fetch storeI-variable
                    \s
                    iIx=(\d+)                   # Fetch iIx-variable
                    \s
                    avgCI=(-?\d+\.\d+)          # Fetch avgCI-variable
                    """, re.VERBOSE | re.MULTILINE)
    
    file_times = open("output_times.txt","w")
    for match in pattern_meas.finditer(text):
        output = "%s,\t%s,\t\t%s,\t%s,\t\t%s,\t%s\n" % (match.group(1), match.group(2), match.group(3), match.group(4), match.group(5), match.group(6))
        file_times.write(output)
    file_times.close()
    

    It could probably be written in a more compact and Pythonic way, though.

  • williamx
    williamx about 14 years
    Seems like the correct way to do it! But I still have some problems with the search though...
  • williamx
    williamx about 14 years
    I get the correct result by checking match.group(n), where n goes from 1 to 6. That means the regex works. But I don't get any results from the expression you provided, only an empty list. I have tried it on a text string, which works fine, so it's probably the ifile.read() that doesn't work. Any tips?
  • SilentGhost
    SilentGhost about 14 years
    @william: you need to post example of your subject string and probably do this in another question.
  • williamx
    williamx about 14 years
    This solution seems to give a problem with closing the file. Maybe it's not important to do that, though :)
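
On the asker's remark that the solution could be more compact: one possible tightening, assuming Python 3, uses with blocks for both files and match.groups() in place of six separate match.group(n) calls. The extract helper, the ROW constant and the sample record are hypothetical names and data introduced for this sketch:

```python
import os
import re
import tempfile

pattern_meas = re.compile(r"""
    ^Time:(\d{2}:\d{2}:\d{2})            # time stamp at the beginning of a line
    \n{2}                                # two newlines
    \D+                                  # one or more non-digits
    storeU=(\d+\.\d+) \s uIx=(\d+)       # storeU and its index
    \s storeI=(-?\d+\.\d+) \s iIx=(\d+)  # storeI and its index
    \s avgCI=(-?\d+\.\d+)                # avgCI
    """, re.VERBOSE | re.MULTILINE)

# Same output layout as the asker's format string.
ROW = "%s,\t%s,\t\t%s,\t%s,\t\t%s,\t%s\n"


def extract(src, dst):
    """Write one formatted row to dst for every record matched in src."""
    with open(src) as ifile:
        text = ifile.read()
    with open(dst, "w") as ofile:
        # match.groups() returns all six captures as a tuple, in order.
        ofile.writelines(ROW % match.groups()
                         for match in pattern_meas.finditer(text))


# Demonstration in a throwaway directory with one made-up record.
with tempfile.TemporaryDirectory() as tmpdir:
    src = os.path.join(tmpdir, "data.txt")
    dst = os.path.join(tmpdir, "output_times.txt")
    with open(src, "w") as f:
        f.write("Time:12:34:56\n\n"
                "meas storeU=1.50 uIx=3 storeI=-0.25 iIx=7 avgCI=0.10\n")
    extract(src, dst)
    with open(dst) as f:
        result = f.read()
```

Both files are closed when their with blocks end, which also addresses the concern about closing the file.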