Flatten an irregular list of lists
Solution 1
Using generator functions can make your example easier to read and improve performance.
Python 2
Using the Iterable
ABC added in 2.6:
from collections import Iterable
def flatten(xs):
for x in xs:
if isinstance(x, Iterable) and not isinstance(x, basestring):
for item in flatten(x):
yield item
else:
yield x
Python 3
In Python 3, basestring
is no more, but the tuple (str, bytes)
gives the same effect. Also, the yield from
operator returns an item from a generator one at a time.
from collections.abc import Iterable
def flatten(xs):
for x in xs:
if isinstance(x, Iterable) and not isinstance(x, (str, bytes)):
yield from flatten(x)
else:
yield x
Solution 2
My solution:
import collections
def flatten(x):
if isinstance(x, collections.Iterable):
return [a for i in x for a in flatten(i)]
else:
return [x]
A little more concise, but pretty much the same.
Solution 3
Generator using recursion and duck typing (updated for Python 3):
def flatten(L):
for item in L:
try:
yield from flatten(item)
except TypeError:
yield item
list(flatten([[[1, 2, 3], [4, 5]], 6]))
>>>[1, 2, 3, 4, 5, 6]
Solution 4
Here is my functional version of recursive flatten which handles both tuples and lists, and lets you throw in any mix of positional arguments. Returns a generator which produces the entire sequence in order, arg by arg:
flatten = lambda *n: (e for a in n
for e in (flatten(*a) if isinstance(a, (tuple, list)) else (a,)))
Usage:
l1 = ['a', ['b', ('c', 'd')]]
l2 = [0, 1, (2, 3), [[4, 5, (6, 7, (8,), [9]), 10]], (11,)]
print list(flatten(l1, -2, -1, l2))
['a', 'b', 'c', 'd', -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
Solution 5
Generator version of @unutbu's non-recursive solution, as requested by @Andrew in a comment:
def genflat(l, ltypes=collections.Sequence):
l = list(l)
i = 0
while i < len(l):
while isinstance(l[i], ltypes):
if not l[i]:
l.pop(i)
i -= 1
break
else:
l[i:i + 1] = l[i]
yield l[i]
i += 1
Slightly simplified version of this generator:
def genflat(l, ltypes=collections.Sequence):
l = list(l)
while l:
while l and isinstance(l[0], ltypes):
l[0:1] = l[0]
if l: yield l.pop(0)
telliott99
Updated on September 30, 2021Comments
-
telliott99 over 2 years
Yes, I know this subject has been covered before (here, here, here, here), but as far as I know, all solutions, except for one, fail on a list like this:
L = [[[1, 2, 3], [4, 5]], 6]
Where the desired output is
[1, 2, 3, 4, 5, 6]
Or perhaps even better, an iterator. The only solution I saw that works for an arbitrary nesting is found in this question:
def flatten(x): result = [] for el in x: if hasattr(el, "__iter__") and not isinstance(el, basestring): result.extend(flatten(el)) else: result.append(el) return result flatten(L)
Is this the best model? Did I overlook something? Any problems?
-
Andrew Wagner over 14 yearsit's a pre-order traversal of the tree formed by the nested lists. only the leaves are returned. Note that this implementation will consume the original data structure, for better or worse. Could be fun to write one that both preserves the original tree, but also doesn't have to copy the list entries.
-
c-urchin over 13 yearsI think you need to test for strings -- eg add "and not isinstance(l[0], basestring)" as in Cristian's solution. Otherwise you get an infinite loop around l[0:1] = l[0]
-
telliott99 over 13 yearsI like simple too. In this case though, you iterate over the list as many times as there are nestings or levels. Could get expensive.
-
clay over 13 years@telliott99: You're right if your lists are really big and/or nested to great depths. However, if that isn't the case, then the simpler solution works just as well, and without the deep magic of some of the other answers. There is a place for multi-stage recursive generator comprehensions, but I'm not convinced that should be where you look first. (I guess you know where I fall in the "Worse is Better" debate.)
-
clay over 13 years@telliott99: Or to put that another way, you won't have to "try to Grok" my solution. If performance isn't a bottleneck, what matters most to you as a programmer?
-
JWL almost 12 yearsOf all the suggestions on this page, this is the only one that flattened this list
l = ([[chr(i),chr(i-32)] for i in xrange(ord('a'), ord('z')+1)] + range(0,9))
in a snap when i did thislist(flatten(l))
. All the others, would start working and take forever! -
DanB almost 12 yearsThis is clever (and probably fast)... but it's not very pythonic.
-
gb. over 11 years"why not just do something like this" you say? Because it is very easy to break! Very bad idea. One example, what if your items are strings, not ints? Then if a string contains a '[' you are doomed. And what if your items have no good (or very long) string representation?
-
Daniel 'Dang' Griffith over 11 yearsThis is a good example of making a generator, but as c-urchin mentions, the algorithm itself fails when the sequence contains strings.
-
abarnert almost 11 yearsYou can do this without importing anything if you just
try: iter(x)
to test whether it's iterable… But I don't think having to import a stdlib module is a downside worth avoiding. -
dansalmo almost 11 yearsUse
isinstance(i, (tuple, list))
. Initializing empty variables is a flag for me to look to alternate code structures, typically comprehensions, generators, recursion, etc. -
wim over 10 yearsany consensus on whether to use ABC Iterable or ABC Sequence?
-
unutbu over 10 years
sets
,dicts
,deques
,listiterators
,generators
, filehandles, and custom classes with__iter__
defined are all instances ofcollections.Iterable
, but notcollections.Sequence
. The result of flattening adict
is a bit iffy, but otherwise, I thinkcollections.Iterable
is a better default thancollections.Sequence
. It's definitely the more liberal. -
unutbu over 10 years@wim: One problem with using
collections.Iterable
is that this includes infinite generators. I've changed my answer handle this case. -
Kristof Pal over 10 yearsgreat solution, however would be much helpful if you added some comment to describe what
e
,a
,n
refer to -
SourceSeeker over 10 years@WolfgangKuehne: Try
args
forn
,intermediate
(or the shortermid
or you might preferelement
) fora
andresult
fore
, so:flatten = lambda *args: (result for mid in args for result in (flatten(*mid) if isinstance(mid, (tuple, list)) else (mid,)))
-
dash-tom-bang about 10 yearsSimpler solutions have less logic. Recursion is a pretty fundamental programming construct that anyone who considers themselves a programmer should be completely comfortable with. Generators are very much the Python Way and (along with comprehensions) are something that any professional Python programmer should grok instantly.
-
dash-tom-bang about 10 years
return type(l)(ret)
will get you the same container type back as was passed in, also. :) -
clay about 10 yearsI agree about recursion. When I wrote my answer, python still broke recursion at 1000 cycles. Have they changed this? As for being a professional python programmer, I'm not. Moreover, I imagine many people programming in python do not do so full time.
-
Xolve about 10 years@dash-tom-bang Can you please explain what it means in a bit detail.
-
dash-tom-bang almost 10 yearsIf you pass in a list, you probably want a list back. If you pass in a tuple, you probably want a tuple back. If you pass in a mishmash of the two, you'll get whatever the outer enclosing thing was.
-
abarnert over 9 yearsIf you try to generalize this to something other than int values, it'll be fun with, e.g.,
[['C=64', 'APPLE ]['], ['Amiga', 'Mac', 'ST']]
:) On the other hand, given a list that contains itself, it'll do a little better than the other answers, raising an exception instead of just looping until you run out of memory/recursing until you exhaust the stack… -
clay over 9 yearsThe original prompt was about flattening a list of integers. If you just change the list comprehension to d=[x for x in c] it should work fine for your sample.
-
abarnert over 9 yearsFirst,
[x for x in c]
is just a slow and verbose way to make a copy ofc
, so why would you do that? Second, your code is clearly going to convert'APPLE ]['
into'APPLE '
, because it doesn't handle quoting, it just assumes any brackets are list brackets. -
clay over 9 yearsHa! The way your comment got formatted on my computer, I didn't even realize that was supposed to be Apple II as it appeared on the old computers. In any case, my answer to both your questions is that this exercise--for me--is merely an experiment to find a creative solution to flattening a list. I'm not sure I would generalize it to flattening every list out there.
-
Nir Alfasi over 9 yearsWorth to note that this solution works only if all the items are of type
int
-
josch almost 9 yearsThis also flattens dictionaries. Maybe you want to use
collections.Sequence
instead ofcollections.Iteratable
? -
bcdan almost 9 yearsThis is significantly faster than
compiler.ast.flatten
. Great, compact code, works for any (I think) object type. -
RolKau over 8 yearsThis doesn't work with things that aren't lists initially, e.g.
for i in flatten(42): print (i)
. This could be fixed by moving theisinstance
-test and the else-clause outside of thefor el
-loop. (Then you could throw anything at it, and it would make a flattened list out of it) -
dansalmo over 8 yearsThanks, that works nice for Python 3. For 2.x the previous is needed:
for i in flatten(item): yield i
-
Zion over 8 years@gb. Well what if this was what the op needed? and the example was clearly a list of
ints
so "what if's" don't apply here, had the OP stated otherwise, but then again he didn't so, this is the one of the simplest and most valid answers according to what was given. -
gb. over 8 yearsWell sorry, "what ifs" apply, careful considerations of all "what ifs" is the blood and guts of programming.
-
yelsayed almost 8 yearsWhy do you need the compound check
if isinstance(el, collections.Iterable) and not isinstance(el, basestring)
? Why not justif not isinstance(el, list)
? -
Zero almost 7 yearsCould make it more concise,
def flatten(x): return [a for i in x for a in flatten(i)] if isinstance(x, collections.Iterable) else [x]
- but readability might be subjective here. -
cs95 over 6 yearsIt's not a one liner. No matter how much you attempt to fit it into one, the
def foo():
is a separate line. Also, this is very unreadable. -
Emilio M Bumachar about 6 yearsI've de-one-line-ified the code, and did some further refactoring. (edit is pending peer review as I write this) This particular method seemed very readable to me, though the original code did need some refactoring.
-
sten about 6 yearslist(flatten([['X'], 'Y'])) fails on 2.X variant
-
dansalmo about 6 years@user1019129 see my comment above yours
-
sten about 6 yearsyes it fails with the cycle.. i think because a string is also an "array"-of-chars
-
cglacet almost 6 years"nested list with any depth" not true. Just try you'll see:
current_value = combiner(current_value,each_item) RecursionError: maximum recursion depth exceeded
-
Oldyoung almost 6 yearshmmm I are you trying to flaten list with more than 1000 layers?
-
cglacet almost 6 yearsOf course, that's the whole point of the discussion about recursive vs. iterative solutions. If you know in advance that the number of layers is < than 1000 then the most simple solution will work. When you say "any depth" this includes list with depth > 1000.
-
cglacet almost 6 yearsThis has no advantage over the simple solution as this fails to process deep lists, ie "RecursionError: maximum recursion depth exceeded while getting the repr of an object".
-
dawg over 5 yearsFor Python 3.7, using
collections.Iterable
is deprecated. Usecollections.abc.Iterable
instead. -
aandis over 5 yearsthis doesn't work on strings because strings are iterable too. Replace the condition with
if isinstance(x, collections.Iterable) and not isinstance(x, basestring)
-
Statham over 5 yearsthe multi-nest is redundant
-
gosuto over 5 years@yelsayed because you want to prevent separating strings into their individual letters.
-
gosuto over 5 yearsPut a concise version of this in a Gist for further reference: gist.github.com/jorijnsmit/b6deccd3fd5fefbabb72759c74040745
-
Jorge Orpinel Pérez over 5 yearsNo recursion needed.
-
Jorge Orpinel Pérez over 5 yearsYou just need to
arr_str = str(arr)
and then[int(s) for s in re.findall(r'\d+', arr_str)]
really. See github.com/jorgeorpinel/flatten_nested_lists/blob/master/… -
Jorge Orpinel Pérez over 5 yearsYes! Very similar to my code at github.com/jorgeorpinel/flatten_nested_lists/blob/master/…
-
cglacet about 5 yearsIndeed, recursion is never needed. In this specific case using recursion is not the best solution as it will crash on deeply nested lists (depth > 1000). But if you don't aim at having something safe, then yes recursive function are better as they are way easier to read/write.
-
Georgy almost 5 yearsThis doesn't seem to work for the 3rd and the 4th examples. It throws
StopIteration
. Also, looks likewhile True: first = next(remainder)
could be replaced byfor first in remainder:
. -
Georgy almost 5 yearsFYI: it uses recursive solution: link to source
-
Max Ghenis over 4 yearsI've recently started getting this warning with this:
DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
-
baduker about 4 years@Georgy this could be fixed with encapsulating the body of flatten in a
try-except StopIteration block
. -
jbuddy_13 about 4 yearsCould somebody walk me through what's happening here? It goes a bit over my head
-
noobninja almost 4 yearsreplace
collections.Iterable
withlist
-
Alexej Magura almost 4 yearsThis is the only solution I have come across, in a moderate google search, on any website that actually works for lists nested deeper than one level.
-
Jake Schmidt over 3 yearsthis only works for types (like int) which are convertible to string and back. something with the complexity of regular expressions is also not needed to tackle such a simple problem. If you want a simple solution, pradyunsg's is best.
-
A Kareem over 3 yearsI applaud your idea of using generators but your code seems to take awfully long to run (it takes more than an hour to run over a single list with 1 million empty strings) since every time it reaches a string/byte object it only yields when it reaches a maximum recursion depth limit, and it repeats the whole process for every string/byte object. (which im pretty sure is an unintended side effect from catching all exceptions) Although I really like your idea so I implemented a correct version, I would like to edit your answer to fix the problem but I think that would be awfully rude of me.
-
Marko over 3 yearsThank you for your comment, @AKareem, I hope I've improved my code now, I will look into your version too :-)
-
Jeff Hykin over 3 yearsThis is a work of art. So few characters, and still nearly impossible to understand. 10/10 best Python code golf I've seen yet 🏌️♂️🏌️♀️⛳️. Having something this short almost makes up for the fact that Python doesn't have a built-in flatten function.
-
rvrvrv over 3 yearsGreat stuff! For people (like me) who are using pandas anyway, this is a beautifully simple way
-
Eduardo Luis Santos about 3 yearsRecursionError: maximum recursion depth exceeded in comparison
-
Shreyas almost 3 yearsPlease do not edit the answer. If you feel the need to "refactor", feel free to post as your own answer. There is a reason why the code is presented the way it is. It is to emphasise that the approach came from lisp. You can plain ignore the "one-liner" part of it - it was not intended as some kind of boasting. It was, again, to indicate that the thought behind it is still "one-liner": that of first and rest list processing.
-
mattino over 2 yearsThis looks very elegant and simple. Why it does not have more upvotes? Are there any problems with this solution?
-
Nevermore over 2 years@jbuddy_13
yield from <iterable>
is exactly the same asfor x in <iterable>: yield x
. It's basically a one liner syntax for a for-loop generator. So whatflatten()
is doing is it's checking for either one of two cases. The base case (in theelse
) is that the element is not an iterable, so yield that element. The recursive case (in theif
) is that the element is an iterable, so pass it back intoflatten()
and yield whatever the results are. -
ATOMP about 2 yearsThis could be made much more efficient by using
collections.deque
and not copying. Assuming you have control over the creation of the nested lists. Also,deepcopy
hits the recursion limit.