Python: parsing a list of strings
Solution 1
OK, with the help of brandizzi, I have found THE answer to this question.
Solution:
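import re

listconfig = []
for line in list_of_strings:
    # Only "Key" is mandatory; every other field is an optional group.
    # The lookahead stops field_2 (letters, digits, spaces) before "Field 3:",
    # which it would otherwise swallow, leaving field_3 unmatched.
    matched = re.search(r"Key:[\s]*(?P<key>[0-9A-Fa-f]+)[\s]*" \
                        r"(Index:[\s]*(?P<index>[0-9]+)[\s]*)?" \
                        r"(Field 1:[\s]*(?P<field_1>[0-9]+)[\s]*)?" \
                        r"(Field 2:[\s]*(?P<field_2>(?:(?!Field 3:)[0-9 A-Za-z])+)[\s]*)?" \
                        r"(Field 3:[\s]*(?P<field_3>[-+]?[0-9]+)[\s]*)?", line)
    if matched:
        print(matched.groupdict())
        listconfig.append(matched.groupdict())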
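For readers who want to try this on sample data, here is a minimal Python 3 sketch of the same idea. The function name `parse_lines` and the sample lines are assumptions made for illustration (the first sample line is the one quoted in the question). The `field_2` group uses a negative lookahead so that its character class, which accepts letters, digits and spaces, cannot run into the `Field 3:` label.

```python
import re

# Compiled once; only "Key" is mandatory, the other fields are optional groups.
PATTERN = re.compile(
    r"Key:\s*(?P<key>[0-9A-Fa-f]+)\s*"
    r"(Index:\s*(?P<index>[0-9]+)\s*)?"
    r"(Field 1:\s*(?P<field_1>[0-9]+)\s*)?"
    r"(Field 2:\s*(?P<field_2>(?:(?!Field 3:)[0-9 A-Za-z])+)\s*)?"
    r"(Field 3:\s*(?P<field_3>[-+]?[0-9]+)\s*)?"
)


def parse_lines(lines):
    """Return one dict of named groups per line that contains a Key field."""
    result = []
    for line in lines:
        matched = PATTERN.search(line)
        if matched:
            result.append(matched.groupdict())
    return result


# Hypothetical sample input.
sample = [
    "Key: af12d9 Index: 0 Field 1: 1234 Field 2: 1234 Field 3: -10",
    "no fields here",
    "Key: 0b1e Index: 7",
]
listconfig = parse_lines(sample)
```

Optional fields that are absent from a line show up as `None` in the dictionary, and `field_2` keeps any trailing space it captured, so a `.strip()` on the values may be wanted afterwards.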
Solution 2
First, your regex does not seem to work properly. The Key field can contain values that include f, right? So its group should be ([0-9A-Fa-f]+), not ([0-9A-Ea-e]+). Also, it is good practice (actually, wonderful practice) to prefix the pattern string with r when dealing with regexes, because it avoids problems with \ escape characters. (If you do not understand why, look up raw strings.)
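To illustrate the raw-string point, here is a small sketch; the sample text is made up. In an ordinary string literal Python turns `\b` into a backspace character before the `re` module ever sees it, whereas a raw literal passes the backslash through so the regex engine reads it as a word boundary.

```python
import re

text = "a cat sat"

# Ordinary literal: "\b" becomes the single backspace character (0x08).
plain = "\bcat\b"
# Raw literal: the backslash and "b" stay separate, so re sees a word boundary.
raw = r"\bcat\b"

plain_match = re.search(plain, text)  # looks for backspace + "cat" + backspace
raw_match = re.search(raw, text)      # looks for the whole word "cat"
```

Here `plain_match` is `None` because the text contains no backspace characters, while `raw_match` finds the word.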
Now, my approach to the problem. First, I would create a regex without pipes:
>>> regex = r"(Key):[\s]*([0-9A-Fa-f]+)[\s]*" \
... r"(Index):[\s]*([0-9]+)[\s]*" \
... r"(Field 1):[\s]*([0-9]+)[\s]*" \
... r"(Field 2):[\s]*([0-9 A-Za-z]+)[\s]*" \
... r"(Field 3):[\s]*([-+]?[0-9]+)[\s]*"
With this change, findall() returns only one tuple of groups for an entire line. In this tuple, each key is followed by its value:
>>> re.findall(regex, line)
[('Key', 'af12d9', 'Index', '0', 'Field 1', '1234', 'Field 2', '1234 Ring ', 'Field 3', '-10')]
So I get the tuple...
>>> found = re.findall(regex, line)[0]
>>> found
('Key', 'af12d9', 'Index', '0', 'Field 1', '1234', 'Field 2', '1234 Ring ', 'Field 3', '-10')
...and using slices I get only the keys...
>>> found[::2]
('Key', 'Index', 'Field 1', 'Field 2', 'Field 3')
...and also only the values:
>>> found[1::2]
('af12d9', '0', '1234', '1234 Ring ', '-10')
Then I create a list of tuples, each containing a key and its corresponding value, with the zip() function:
>>> zip(found[::2], found[1::2])
[('Key', 'af12d9'), ('Index', '0'), ('Field 1', '1234'), ('Field 2', '1234 Ring '), ('Field 3', '-10')]
The grand finale is to pass the list of tuples to the dict() constructor:
>>> dict(zip(found[::2], found[1::2]))
{'Field 3': '-10', 'Index': '0', 'Field 1': '1234', 'Key': 'af12d9', 'Field 2': '1234 Ring '}
I find this solution the best, but it is indeed a subjective question in some sense. HTH anyway :)
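The findall, slice, zip and dict steps can be collected into one helper. This is only a sketch: the name `line_to_dict` is hypothetical, and the sample line is reconstructed from the output shown in the session, since the original `line` is not given. Note that this regex requires all five fields to be present.

```python
import re

# Labels and values are captured in alternating groups; all five are required.
REGEX = (
    r"(Key):\s*([0-9A-Fa-f]+)\s*"
    r"(Index):\s*([0-9]+)\s*"
    r"(Field 1):\s*([0-9]+)\s*"
    r"(Field 2):\s*([0-9 A-Za-z]+)\s*"
    r"(Field 3):\s*([-+]?[0-9]+)\s*"
)


def line_to_dict(line):
    """Return {label: value} for a line with all five fields, else None."""
    found = re.findall(REGEX, line)
    if not found:
        return None
    groups = found[0]
    # Even positions hold the labels, odd positions hold the values.
    return dict(zip(groups[::2], groups[1::2]))


# Assumed sample line, inferred from the output above.
line = "Key: af12d9 Index: 0 Field 1: 1234 Field 2: 1234 Ring Field 3: -10"
record = line_to_dict(line)
```

Because `dict()` consumes the `zip()` result directly, the helper behaves the same whether `zip()` returns a list (Python 2) or an iterator (Python 3).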
sudo
Updated on June 04, 2022
I have a list of strings, and I'm looking for lines like this:
Key: af12d9 Index: 0 Field 1: 1234 Field 2: 1234 Field 3: -10
After finding lines like this, I want to store each one as a dictionary {'key' : af12d9, 'index' : 0, 'field 1' : .... }, then append this dictionary to a list, so I will have a list of dictionaries.
I was able to get it working like this:
listconfig = []
for line in list_of_strings:
    matched = findall("(Key:[\s]*[0-9A-Fa-f]+[\s]*)|(Index:[\s]*[0-9]+[\s]*)|(Field 1:[\s]*[0-9]+[\s]*)|(Field 2:[\s]*[0-9]+[\s]*)|(Field 3:[\s]*[-+]?[0-9]+[\s]*)", line)
    if matched:
        listconfig += [dict(map(lambda pair: (pair[0].strip().lower(), pair[1].strip().lower()),
                                map(lambda line: line[0].split(':'),
                                    [filter(lambda x: x, group) for group in matched])))]
I'm just wondering if there could be a better way (short and efficient) to do this, because I think findall will do 5 searches per string. (Correct? It returns a list of 5 tuples.)
Thank you.