Why does re.findall return a list of tuples when my pattern only contains one group?
Solution 1
Your pattern has two groups: the outer group,
(1([a-z]+)2|[a-z])
and a second, smaller group nested inside the first:
([a-z]+)
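As a quick check of the group count, here is a minimal sketch using the pattern from the question. Groups are numbered by the position of their opening parenthesis, and the compiled pattern's `groups` attribute reports how many there are. The sample strings `'1cd2'` and `'a1cd2'` are shortened inputs chosen for illustration.

```python
import re

# The pattern from the question: groups are numbered by their opening parenthesis.
pattern = re.compile(r'(1([a-z]+)2|[a-z])')

# Pattern.groups reports how many capturing groups the pattern contains.
print(pattern.groups)  # 2

# For a delimited chunk, group 1 is the whole alternative and group 2 the inner letters.
m = pattern.search('1cd2')
print(m.group(1), m.group(2))  # 1cd2 cd

# For a single letter the inner group does not participate, so findall reports ''.
print(pattern.findall('a1cd2'))  # [('a', ''), ('1cd2', 'cd')]
```

Because the count is 2, findall returns one 2-tuple per match rather than a plain string.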
Here is a solution that gives you the expected result, although, mind you, it is really ugly and there is probably a better way. I just can't figure it out:
import re
s = 'ab1cd2efg1hij2k'
a = re.findall( r'((?:1)([a-z]+)(?:2)|([a-z]))', s )
a = [tuple(j for j in i if j)[-1] for i in a]
>>> print(a)
['a', 'b', 'cd', 'e', 'f', 'g', 'hij', 'k']
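A possibly simpler variant of the same idea, offered as a sketch rather than the definitive fix: drop the outer group and the non-capturing wrappers so the pattern has two sibling groups, then pick whichever one matched. The names `pairs`, `inner` and `single` are made up for this example.

```python
import re

s = 'ab1cd2efg1hij2k'

# Two sibling groups: one for delimited text, one for a single letter.
pairs = re.findall(r'1([a-z]+)2|([a-z])', s)

# Exactly one group is non-empty for each match, so `or` selects it.
result = [inner or single for inner, single in pairs]
print(result)  # ['a', 'b', 'cd', 'e', 'f', 'g', 'hij', 'k']
```

This relies on the two groups being alternatives of each other, so at most one of them is non-empty per match.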
Solution 2
I am 5 years too late to the party, but I think I might have found an elegant solution to the re.findall() ugly tuple-ridden output with multiple capture groups.
In general, if you end up with output that looks something like this:
[('pattern_1', '', ''), ('', 'pattern_2', ''), ('pattern_1', '', ''), ('', '', 'pattern_3')]
Then you can bring it into a flat list with this little trick:
["".join(x) for x in re.findall(all_patterns, iterable)]
The expected output will be like so:
['pattern_1', 'pattern_2', 'pattern_1', 'pattern_3']
It was tested on Python 3.7. Hope it helps!
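One caveat worth sketching: the join trick assumes that at most one group is non-empty in each tuple. With sibling groups that holds. With nested groups, as in the pattern from the question, both groups can be filled and joining duplicates text. The variable names below are illustrative only.

```python
import re

s = 'ab1cd2efg1hij2k'

# Sibling groups: at most one is non-empty per match, so joining is safe.
flat = ["".join(t) for t in re.findall(r'1([a-z]+)2|([a-z])', s)]
print(flat)  # ['a', 'b', 'cd', 'e', 'f', 'g', 'hij', 'k']

# Nested groups: both can be non-empty, so joining concatenates outer and inner text.
nested = ["".join(t) for t in re.findall(r'(1([a-z]+)2|[a-z])', s)]
print(nested[2])  # 1cd2cd
```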
Solution 3
Your regular expression has 2 groups, just look at the number of parentheses you are using :). One group is ([a-z]+) and the other is (1([a-z]+)2|[a-z]). The key is that you can have groups inside other groups. So, if possible, you should build a regular expression with only one group, so that you don't have to post-process the result.
An example of a regular expression with only one group would be:
>>> import re
>>> s = 'ab1cd2efg1hij2k'
>>> re.findall('((?<=1)[a-z]+(?=2)|[a-z])', s)
['a', 'b', 'cd', 'e', 'f', 'g', 'hij', 'k']
Solution 4
See this discussion of a similar question: https://bugs.python.org/issue6663
If you are using findall, just drop the parentheses:
import re
s = 'ab1cd2efg1hij2k'
re.findall( r'(?<=1)[a-z]+(?=2)|[a-z]', s )
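To make the effect visible, here is a small sketch that prints the result of the group-free pattern. It also shows re.finditer as an alternative that should give the same list even when the pattern keeps its parentheses, since `group(0)` is always the whole match. The variable names are hypothetical.

```python
import re

s = 'ab1cd2efg1hij2k'

# No capturing groups: findall returns the full text of each match.
no_groups = re.findall(r'(?<=1)[a-z]+(?=2)|[a-z]', s)
print(no_groups)  # ['a', 'b', 'cd', 'e', 'f', 'g', 'hij', 'k']

# finditer yields match objects; group(0) is the whole match regardless of groups.
via_finditer = [m.group(0) for m in re.finditer(r'((?<=1)[a-z]+(?=2)|[a-z])', s)]
print(via_finditer == no_groups)  # True
```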
Solution 5
If you want an 'or' match without splitting the result into match groups, add '?:' at the start of the group containing the 'or', which makes that group non-capturing.
Without '?:'
re.findall('(test (word1|word2))', 'test word1')
Output:
[('test word1', 'word1')]
With '?:'
re.findall('(test (?:word1|word2))', 'test word1')
Output:
['test word1']
Further explanation: https://www.ocpsoft.org/tutorials/regular-expressions/or-in-regex/
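As a rough illustration of why the output shape changes, the sketch below compares the number of capturing groups in the two patterns and runs both on a slightly longer input. The input string `'test word1, test word2'` is an assumption added for the example.

```python
import re

capturing = re.compile(r'(test (word1|word2))')
non_capturing = re.compile(r'(test (?:word1|word2))')

# (?:...) groups are not counted, so the second pattern has a single group.
print(capturing.groups, non_capturing.groups)  # 2 1

text = 'test word1, test word2'

# Two groups: one tuple per match.
print(capturing.findall(text))      # [('test word1', 'word1'), ('test word2', 'word2')]

# One group: one string per match.
print(non_capturing.findall(text))  # ['test word1', 'test word2']
```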
usual me
Updated on December 30, 2020
Comments
-
usual me over 3 years
Say I have a string s containing letters and two delimiters, 1 and 2. I want to split the string in the following way:
- if a substring t falls between 1 and 2, return t
- otherwise, return each character
So if s = 'ab1cd2efg1hij2k', the expected output is ['a', 'b', 'cd', 'e', 'f', 'g', 'hij', 'k'].
I tried to use regular expressions:
import re
s = 'ab1cd2efg1hij2k'
re.findall( r'(1([a-z]+)2|[a-z])', s )
[('a', ''), ('b', ''), ('1cd2', 'cd'), ('e', ''), ('f', ''), ('g', ''), ('1hij2', 'hij'), ('k', '')]
From there I can do
[ x[x[-1]!=''] for x in re.findall( r'(1([a-z]+)2|[a-z])', s ) ]
to get my answer, but I still don't understand the output. The documentation says that findall returns a list of tuples if the pattern has more than one group. However, my pattern only contains one group. Any explanation is welcome.
-
Blckknght almost 10 years
Your pattern is pretty odd. You don't need the non-capturing groups around 1 and 2, or the group around the whole pattern (which you expend a bunch of effort to skip in the output). Instead, just accept that the findall call will return 2-tuples and turn them into single values with a = [x or y for x, y in a].
-
A. Rabus over 3 years
"non-capturing group" is the keyword here... (added solely for the search engine)
-
grantr about 2 years
saved me, thanks