Python Regex to match a string as a pattern and return number


Solution 1

Try this one out:

((\S+)\s+SUCCESS Number of wins : (\d+))

These are the results:

>>> import re
>>> regex = re.compile(r"((\S+)\s+SUCCESS Number of wins : (\d+))")
>>> r = regex.search(string)
>>> r
<_sre.SRE_Match object at 0xc827cf478a56b350>
>>> regex.match(string)
<_sre.SRE_Match object at 0xc827cf478a56b228>

# List the groups found
>>> r.groups()
(u'TheBears SUCCESS Number of wins : 14', u'TheBears', u'14')

# List the named dictionary objects found
>>> r.groupdict()
{}

# Run findall
>>> regex.findall(string)
[(u'TheBears SUCCESS Number of wins : 14', u'TheBears', u'14')]
# So you can do this for the name and number:
>>> fullstring, name, number = r.groups()

If you don't need the full string, just remove the surrounding parentheses.
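To apply the same pattern to several lines at once, a minimal sketch might look like the following. It assumes Python 3 and a hypothetical sample string `text`; the team names and numbers are only illustrative.

```python
import re

# Hypothetical sample input; each line follows the format from the question.
text = (
    "TheBears      SUCCESS Number of wins : 14\n"
    "OtherTeam     SUCCESS Number of wins : 6\n"
)

# Same pattern as above, without the outer group. The raw string keeps the
# backslashes intact.
regex = re.compile(r"(\S+)\s+SUCCESS Number of wins : (\d+)")

# finditer yields one match object per matching line.
wins = {m.group(1): int(m.group(2)) for m in regex.finditer(text)}
print(wins)  # {'TheBears': 14, 'OtherTeam': 6}
```

Converting with int() here is a design choice, not a requirement; the groups are plain strings until you convert them.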

Solution 2

I don't think a regex is actually needed here. If it suits you, the following code does the same job (I'm posting it simply to give you another option):

dict((line[:line.lower().index('success')].strip(), line[line.rindex(':') + 1:].strip()) for line in text.split('\n') if 'success' in line.lower())

Or, if you are sure that the name is the first whitespace-separated word on each line and the number is the last:

output = {}
for line in text.splitlines():
    if 'success' in line.lower():
        # split() with no argument handles runs of spaces
        words = line.split()
        output[words[0]] = words[-1]

Solution 3

If the text in the middle is always constant, there is no need for a regular expression. The built-in string methods will be more efficient and easier to develop, debug and maintain. In this case, you can just use the built-in split() method to get the pieces, and then clean up the two pieces as appropriate:

>>> def winnumber(s):
...     parts = s.split('SUCCESS Number of wins : ')
...     return (parts[0].strip(), int(parts[1]))
... 
>>> winnumber('TheBears      SUCCESS Number of wins : 14')
('TheBears', 14)

Note that I have returned the number of wins as an integer (as presumably this will always be a whole number), but you can easily substitute float(), or any other conversion function, for int() if you desire.

Edit: Obviously this will only work for single lines; if you call the function with several lines at once it will raise an error. To process an entire file, I'd use map() (wrapped in list() on Python 3, where map() returns a lazy iterator):

>>> list(map(winnumber, open(filename, 'r')))
[('TheBears', 14), ('OtherTeam', 6)]

Also, I'm not sure of your end use for this code, but you might find it easier to work with the outputs as a dictionary:

>>> dict(map(winnumber, open(filename, 'r')))
{'OtherTeam': 6, 'TheBears': 14}
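If the file may also contain blank or unrelated lines, the map() call above would raise an error on them. A minimal sketch of one way to skip such lines follows; `read_wins` and `SEPARATOR` are hypothetical names, and `io.StringIO` stands in for an open file so the example is self-contained.

```python
import io

SEPARATOR = 'SUCCESS Number of wins : '

def winnumber(s):
    # Split on the constant middle text, then clean up both pieces.
    parts = s.split(SEPARATOR)
    return (parts[0].strip(), int(parts[1]))

def read_wins(lines):
    # Keep only lines that contain the separator (skips blanks and other text).
    return dict(winnumber(line) for line in lines if SEPARATOR in line)

# Hypothetical sample data; a real file object can be passed the same way.
sample = io.StringIO(
    'TheBears      SUCCESS Number of wins : 14\n'
    '\n'
    'OtherTeam     SUCCESS Number of wins : 6\n'
)
result = read_wins(sample)
print(result)  # {'TheBears': 14, 'OtherTeam': 6}
```

With a real file, wrapping the call in `with open(filename) as f:` also makes sure the file is closed afterwards.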
Author by

Simos Anderson

Updated on June 04, 2022

Comments

  • Simos Anderson almost 2 years

    I have some lines that represent some data in a text file. They are all of the following format:

    s = 'TheBears      SUCCESS Number of wins : 14'
    

    They all begin with the name then whitespace and the text 'SUCCESS Number of wins : ' and finally the number of wins, n1. There are multiple strings each with a different name and value. I am trying to write a program that can parse any of these strings and return the name of the dataset and the numerical value at the end of the string. I am trying to use regular expressions to do this and I have come up with the following:

    import re
    def winnumbers(s):
        pattern = re.compile(r"""(?P<name>.*?)     #starting name
                                 \s*SUCCESS        #whitespace and success
                                 \s*Number\s*of\s*wins  #whitespace and strings
                                 \s*\:\s*(?P<n1>.*?)""",re.VERBOSE)
        match = pattern.match(s)
    
        name = match.group("name")
        n1 = match.group("n1")
    
        return (name, n1)
    

    So far, my program can return the name, but the trouble comes after that. They all have the text "SUCCESS Number of wins : " so my thinking was to find a way to match this text. But I realize that my method of matching an exact substring isn't correct right now. Is there any way to match a whole substring as part of the pattern? I have been reading quite a bit on regular expressions lately but haven't found anything like this. I'm still really new to programming and I appreciate any assistance.

    Eventually, I will use float() to return n1 as a number, but I left that out because it doesn't properly find the number in the first place right now and would only return an error.
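For reference, the literal text in the pattern above already matches as written. The empty result for n1 comes from the trailing lazy `.*?`: nothing follows it in the pattern, so it is satisfied by matching zero characters. A minimal sketch of one possible adjustment follows, assuming the number is always a run of digits; the `None` check and the error message are additions for illustration.

```python
import re

PATTERN = re.compile(
    r"""(?P<name>.*?)            # starting name (lazy)
        \s*SUCCESS               # whitespace and the literal SUCCESS
        \s*Number\s*of\s*wins    # the literal words, optional whitespace between
        \s*:\s*(?P<n1>\d+)       # colon, then one or more digits
    """,
    re.VERBOSE,
)

def winnumbers(s):
    match = PATTERN.match(s)
    if match is None:
        # Assumption: a non-matching line should be reported, not ignored.
        raise ValueError("unexpected line format: %r" % s)
    return match.group("name"), float(match.group("n1"))

print(winnumbers("TheBears      SUCCESS Number of wins : 14"))  # ('TheBears', 14.0)
```

If the values can contain a decimal point, `\d+` would need to be widened, for example to `\d+(?:\.\d+)?`.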