Build a dictionary from successful regex matches in Python
Solution 1
Thanks for the replies. After putting them together I got:
file_data = open('x:\\path\\to\\file','r').read()
my_list = re.findall(pattern, file_data, re.MULTILINE)
my_dict = {c:b for a,b,c in my_list}
but I don't think I could have gotten there today without the help.
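For anyone who wants to try this without a file on disk, here is a minimal sketch of the same findall-plus-comprehension approach. The sample text and the pattern are assumptions loosely based on the demo in Solution 5, not the original file or regex. The sketch uses `[ \t]+` instead of `\s` because `\s` also matches a newline, which could let a match spill onto the next line when searching a whole file at once.

```python
import re

# Assumed sample data and pattern (hypothetical, for illustration only).
file_data = "1foo bar\n2bing baz\n3spam eggs\nnomatch\n"
pattern = r"^(.)(\w+)[ \t]+(\w+)$"

# With three groups, findall returns a list of 3-tuples, one per matching line.
my_list = re.findall(pattern, file_data, re.MULTILINE)

# Key on the third group, take the value from the second; the first is ignored.
my_dict = {c: b for a, b, c in my_list}
print(my_dict)
```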
Solution 2
Here's some quick'n'dirty optimisations to your code:
my_dict = dict()
with open(r'x:\path\to\file', 'r') as data:
    for line in data:
        match = re.search(pattern, line)
        if match:
            one_tuple = match.group(3, 2)
            my_dict[one_tuple[0]] = one_tuple[1]
Solution 3
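In the spirit of EAFP ("easier to ask forgiveness than permission") I'd suggest:
my_dict = {}
with open(r'x:\path\to\file', 'r') as data:
    for line in data:
        try:
            m = re.search(pattern, line)
            # key is group 3, value is group 2, as in the question
            my_dict[m.group(3)] = m.group(2)
        except AttributeError:
            # re.search returned None, so the line did not match
            pass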
Another way is to keep using lists, but redesign the pattern so that it contains only two groups (key, value). Then you could simply do:
matches = [re.findall(pattern, line) for line in data]
mydict = dict(x[0] for x in matches if x)
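A runnable sketch of the two-group idea follows. It assumes a line layout of `<key> <value>`, which is not the layout from the question, and uses `io.StringIO` as a stand-in for the open file; both the data and the pattern are hypothetical.

```python
import io
import re

# Assumed layout (not from the original file): "<key> <value>" per line.
data = io.StringIO("bar foo\nbaz bing\nnomatch\neggs spam\n")
pattern = r"^(\w+)[ \t]+(\w+)$"

# findall on a single line gives [] or a one-element list holding (key, value).
matches = [re.findall(pattern, line) for line in data]

# Skip the empty lists and feed the (key, value) tuples straight to dict().
mydict = dict(x[0] for x in matches if x)
print(mydict)
```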
Solution 4
# pattern must be a compiled regex that uses named groups, (?P<name>...)
matchRes = pattern.match(line)
if matchRes:
    my_dict = matchRes.groupdict()
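To make the snippet concrete, here is a small sketch with a compiled pattern that uses named groups. The group names (`index`, `value`, `key`) and the sample line are assumptions. Note that `groupdict()` returns one dictionary per match, keyed by group name, so building the key-to-value mapping the question asks for still takes one more step.

```python
import re

# Hypothetical pattern with named groups; the names are assumptions.
pattern = re.compile(r"(?P<index>.)(?P<value>\w+)\s(?P<key>\w+)")

line = "1foo bar"
result = {}

matchRes = pattern.match(line)
if matchRes:
    # groupdict() maps each group name to the text that group captured.
    my_dict = matchRes.groupdict()
    print(my_dict)

    # Accumulate the mapping the question wants: key text -> value text.
    result[my_dict["key"]] = my_dict["value"]

print(result)
```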
Solution 5
I'm not sure I'd recommend it, but here's a way you could try to use a comprehension instead (I substituted a string for the file for simplicity):
>>> import re
>>> data = """1foo bar
... 2bing baz
... 3spam eggs
... nomatch
... """
>>> pattern = r"(.)(\w+)\s(\w+)"
>>> {x[0]: x[1] for x in (m.group(3, 2) for m in (re.search(pattern, line) for line in data.splitlines()) if m)}
{'baz': 'bing', 'eggs': 'spam', 'bar': 'foo'}
WiringHarness
Updated on June 22, 2022
Comments
-
WiringHarness almost 2 years
I'm pretty new to Python, and I'm trying to parse a file. Only certain lines in the file contain data of interest, and I want to end up with a dictionary of the stuff parsed from valid matching lines in the file.
The code below works, but it's a bit ugly and I'm trying to learn how it should be done, perhaps with a comprehension, or else with a multiline regex. I'm using Python 3.2.
file_data = open('x:\\path\\to\\file','r').readlines()
my_list = []
for line in file_data:
    # discard lines which don't match at all
    if re.search(pattern, line):
        # icky, repeating search!!
        one_tuple = re.search(pattern, line).group(3,2)
        my_list.append(one_tuple)
my_dict = dict(my_list)
Can you suggest a better implementation?
-
WiringHarness almost 12 years
Dict comprehension; I like it!
-
georg almost 12 years
You might want to make the first group in your regex non-capturing (?:) to skip the comprehension step: my_dict = dict(re.findall...)
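A sketch of the non-capturing variant, under assumed sample data and an assumed pattern: with only two capturing groups, `re.findall` returns 2-tuples that `dict()` accepts directly. One caveat is that `dict()` keys on the first captured group, which is the reverse of the `group(3, 2)` order used in the question, so a flip may still be needed.

```python
import re

# Assumed sample data and pattern (hypothetical, for illustration only).
file_data = "1foo bar\n2bing baz\n3spam eggs\nnomatch\n"

# (?:.) matches the leading character without capturing it.
pattern = r"^(?:.)(\w+)[ \t]+(\w+)$"

# Two capturing groups, so findall returns a list of 2-tuples.
pairs = re.findall(pattern, file_data, re.MULTILINE)
my_dict = dict(pairs)
print(my_dict)

# If the second captured word should be the key, as in the question, flip it.
flipped = {v: k for k, v in pairs}
print(flipped)
```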
-
Victor Polevoy almost 9 years
Please follow your code snippet with some details and explanations to make it clearer for readers.
-
smci over 6 years
Pretty good improvement. However: reading all the data into a variable, rather than iterating over a file object (and implicitly calling the readline() method), is not very scalable. re.findall() works perfectly well on an iterator rather than a variable.
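As far as I know, `re.findall()` itself expects a string (or bytes), so one way to avoid loading the whole file is to search line by line and feed the results to `dict()` lazily. The sketch below is only one possible approach; `iter_pairs` is a hypothetical helper name, and `io.StringIO` stands in for the open file.

```python
import io
import re


def iter_pairs(lines, pattern):
    """Yield (key, value) for each matching line, reading one line at a time."""
    regex = re.compile(pattern)
    for line in lines:
        m = regex.search(line)
        if m:
            # Same group order as the question: key is group 3, value is group 2.
            yield m.group(3, 2)


# Stand-in for: with open(r'x:\path\to\file', 'r') as data: ...
data = io.StringIO("1foo bar\n2bing baz\n3spam eggs\nnomatch\n")
my_dict = dict(iter_pairs(data, r"(.)(\w+)\s(\w+)"))
print(my_dict)
```

Because `iter_pairs` is a generator, only one line is held in memory at a time, which is the scalability point raised above.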