Flatten an irregular list of lists

176,422

Solution 1

Using generator functions can make your example easier to read and improve performance.

Python 2

Using the Iterable ABC added in 2.6:

from collections import Iterable

def flatten(xs):
    for x in xs:
        if isinstance(x, Iterable) and not isinstance(x, basestring):
            for item in flatten(x):
                yield item
        else:
            yield x

Python 3

In Python 3, basestring is no more, but the tuple (str, bytes) gives the same effect. Also, the yield from operator returns an item from a generator one at a time.

from collections.abc import Iterable

def flatten(xs):
    for x in xs:
        if isinstance(x, Iterable) and not isinstance(x, (str, bytes)):
            yield from flatten(x)
        else:
            yield x

Solution 2

My solution:

import collections


def flatten(x):
    if isinstance(x, collections.Iterable):
        return [a for i in x for a in flatten(i)]
    else:
        return [x]

A little more concise, but pretty much the same.

Solution 3

Generator using recursion and duck typing (updated for Python 3):

def flatten(L):
    for item in L:
        try:
            yield from flatten(item)
        except TypeError:
            yield item

list(flatten([[[1, 2, 3], [4, 5]], 6]))
>>>[1, 2, 3, 4, 5, 6]

Solution 4

Here is my functional version of recursive flatten which handles both tuples and lists, and lets you throw in any mix of positional arguments. Returns a generator which produces the entire sequence in order, arg by arg:

flatten = lambda *n: (e for a in n
    for e in (flatten(*a) if isinstance(a, (tuple, list)) else (a,)))

Usage:

l1 = ['a', ['b', ('c', 'd')]]
l2 = [0, 1, (2, 3), [[4, 5, (6, 7, (8,), [9]), 10]], (11,)]
print list(flatten(l1, -2, -1, l2))
['a', 'b', 'c', 'd', -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

Solution 5

Generator version of @unutbu's non-recursive solution, as requested by @Andrew in a comment:

def genflat(l, ltypes=collections.Sequence):
    l = list(l)
    i = 0
    while i < len(l):
        while isinstance(l[i], ltypes):
            if not l[i]:
                l.pop(i)
                i -= 1
                break
            else:
                l[i:i + 1] = l[i]
        yield l[i]
        i += 1

Slightly simplified version of this generator:

def genflat(l, ltypes=collections.Sequence):
    l = list(l)
    while l:
        while l and isinstance(l[0], ltypes):
            l[0:1] = l[0]
        if l: yield l.pop(0)
Share:
176,422
telliott99
Author by

telliott99

Updated on September 30, 2021

Comments

  • telliott99
    telliott99 over 2 years

    Yes, I know this subject has been covered before (here, here, here, here), but as far as I know, all solutions, except for one, fail on a list like this:

    L = [[[1, 2, 3], [4, 5]], 6]
    

    Where the desired output is

    [1, 2, 3, 4, 5, 6]
    

    Or perhaps even better, an iterator. The only solution I saw that works for an arbitrary nesting is found in this question:

    def flatten(x):
        result = []
        for el in x:
            if hasattr(el, "__iter__") and not isinstance(el, basestring):
                result.extend(flatten(el))
            else:
                result.append(el)
        return result
    
    flatten(L)
    

    Is this the best model? Did I overlook something? Any problems?

  • Andrew Wagner
    Andrew Wagner over 14 years
    it's a pre-order traversal of the tree formed by the nested lists. only the leaves are returned. Note that this implementation will consume the original data structure, for better or worse. Could be fun to write one that both preserves the original tree, but also doesn't have to copy the list entries.
  • c-urchin
    c-urchin over 13 years
    I think you need to test for strings -- eg add "and not isinstance(l[0], basestring)" as in Cristian's solution. Otherwise you get an infinite loop around l[0:1] = l[0]
  • telliott99
    telliott99 over 13 years
    I like simple too. In this case though, you iterate over the list as many times as there are nestings or levels. Could get expensive.
  • clay
    clay over 13 years
    @telliott99: You're right if your lists are really big and/or nested to great depths. However, if that isn't the case, then the simpler solution works just as well, and without the deep magic of some of the other answers. There is a place for multi-stage recursive generator comprehensions, but I'm not convinced that should be where you look first. (I guess you know where I fall in the "Worse is Better" debate.)
  • clay
    clay over 13 years
    @telliott99: Or to put that another way, you won't have to "try to Grok" my solution. If performance isn't a bottleneck, what matters most to you as a programmer?
  • JWL
    JWL almost 12 years
    Of all the suggestions on this page, this is the only one that flattened this list l = ([[chr(i),chr(i-32)] for i in xrange(ord('a'), ord('z')+1)] + range(0,9)) in a snap when i did this list(flatten(l)). All the others, would start working and take forever!
  • DanB
    DanB almost 12 years
    This is clever (and probably fast)... but it's not very pythonic.
  • gb.
    gb. over 11 years
    "why not just do something like this" you say? Because it is very easy to break! Very bad idea. One example, what if your items are strings, not ints? Then if a string contains a '[' you are doomed. And what if your items have no good (or very long) string representation?
  • Daniel 'Dang' Griffith
    Daniel 'Dang' Griffith over 11 years
    This is a good example of making a generator, but as c-urchin mentions, the algorithm itself fails when the sequence contains strings.
  • abarnert
    abarnert almost 11 years
    You can do this without importing anything if you just try: iter(x) to test whether it's iterable… But I don't think having to import a stdlib module is a downside worth avoiding.
  • dansalmo
    dansalmo almost 11 years
    Use isinstance(i, (tuple, list)). Initializing empty variables is a flag for me to look to alternate code structures, typically comprehensions, generators, recursion, etc.
  • wim
    wim over 10 years
    any consensus on whether to use ABC Iterable or ABC Sequence?
  • unutbu
    unutbu over 10 years
    sets, dicts, deques, listiterators, generators, filehandles, and custom classes with __iter__ defined are all instances of collections.Iterable, but not collections.Sequence. The result of flattening a dict is a bit iffy, but otherwise, I think collections.Iterable is a better default than collections.Sequence. It's definitely the more liberal.
  • unutbu
    unutbu over 10 years
    @wim: One problem with using collections.Iterable is that this includes infinite generators. I've changed my answer handle this case.
  • Kristof Pal
    Kristof Pal over 10 years
    great solution, however would be much helpful if you added some comment to describe what e, a, n refer to
  • SourceSeeker
    SourceSeeker over 10 years
    @WolfgangKuehne: Try args for n, intermediate (or the shorter mid or you might prefer element) for a and result for e, so: flatten = lambda *args: (result for mid in args for result in (flatten(*mid) if isinstance(mid, (tuple, list)) else (mid,)))
  • dash-tom-bang
    dash-tom-bang about 10 years
    Simpler solutions have less logic. Recursion is a pretty fundamental programming construct that anyone who considers themselves a programmer should be completely comfortable with. Generators are very much the Python Way and (along with comprehensions) are something that any professional Python programmer should grok instantly.
  • dash-tom-bang
    dash-tom-bang about 10 years
    return type(l)(ret) will get you the same container type back as was passed in, also. :)
  • clay
    clay about 10 years
    I agree about recursion. When I wrote my answer, python still broke recursion at 1000 cycles. Have they changed this? As for being a professional python programmer, I'm not. Moreover, I imagine many people programming in python do not do so full time.
  • Xolve
    Xolve about 10 years
    @dash-tom-bang Can you please explain what it means in a bit detail.
  • dash-tom-bang
    dash-tom-bang almost 10 years
    If you pass in a list, you probably want a list back. If you pass in a tuple, you probably want a tuple back. If you pass in a mishmash of the two, you'll get whatever the outer enclosing thing was.
  • abarnert
    abarnert over 9 years
    If you try to generalize this to something other than int values, it'll be fun with, e.g., [['C=64', 'APPLE ]['], ['Amiga', 'Mac', 'ST']] :) On the other hand, given a list that contains itself, it'll do a little better than the other answers, raising an exception instead of just looping until you run out of memory/recursing until you exhaust the stack…
  • clay
    clay over 9 years
    The original prompt was about flattening a list of integers. If you just change the list comprehension to d=[x for x in c] it should work fine for your sample.
  • abarnert
    abarnert over 9 years
    First, [x for x in c] is just a slow and verbose way to make a copy of c, so why would you do that? Second, your code is clearly going to convert 'APPLE ][' into 'APPLE ', because it doesn't handle quoting, it just assumes any brackets are list brackets.
  • clay
    clay over 9 years
    Ha! The way your comment got formatted on my computer, I didn't even realize that was supposed to be Apple II as it appeared on the old computers. In any case, my answer to both your questions is that this exercise--for me--is merely an experiment to find a creative solution to flattening a list. I'm not sure I would generalize it to flattening every list out there.
  • Nir Alfasi
    Nir Alfasi over 9 years
    Worth to note that this solution works only if all the items are of type int
  • josch
    josch almost 9 years
    This also flattens dictionaries. Maybe you want to use collections.Sequence instead of collections.Iteratable?
  • bcdan
    bcdan almost 9 years
    This is significantly faster than compiler.ast.flatten. Great, compact code, works for any (I think) object type.
  • RolKau
    RolKau over 8 years
    This doesn't work with things that aren't lists initially, e.g. for i in flatten(42): print (i). This could be fixed by moving the isinstance-test and the else-clause outside of the for el-loop. (Then you could throw anything at it, and it would make a flattened list out of it)
  • dansalmo
    dansalmo over 8 years
    Thanks, that works nice for Python 3. For 2.x the previous is needed: for i in flatten(item): yield i
  • Zion
    Zion over 8 years
    @gb. Well what if this was what the op needed? and the example was clearly a list of ints so "what if's" don't apply here, had the OP stated otherwise, but then again he didn't so, this is the one of the simplest and most valid answers according to what was given.
  • gb.
    gb. over 8 years
    Well sorry, "what ifs" apply, careful considerations of all "what ifs" is the blood and guts of programming.
  • yelsayed
    yelsayed almost 8 years
    Why do you need the compound check if isinstance(el, collections.Iterable) and not isinstance(el, basestring)? Why not just if not isinstance(el, list)?
  • Zero
    Zero almost 7 years
    Could make it more concise, def flatten(x): return [a for i in x for a in flatten(i)] if isinstance(x, collections.Iterable) else [x] - but readability might be subjective here.
  • cs95
    cs95 over 6 years
    It's not a one liner. No matter how much you attempt to fit it into one, the def foo(): is a separate line. Also, this is very unreadable.
  • Emilio M Bumachar
    Emilio M Bumachar about 6 years
    I've de-one-line-ified the code, and did some further refactoring. (edit is pending peer review as I write this) This particular method seemed very readable to me, though the original code did need some refactoring.
  • sten
    sten about 6 years
    list(flatten([['X'], 'Y'])) fails on 2.X variant
  • dansalmo
    dansalmo about 6 years
    @user1019129 see my comment above yours
  • sten
    sten about 6 years
    yes it fails with the cycle.. i think because a string is also an "array"-of-chars
  • cglacet
    cglacet almost 6 years
    "nested list with any depth" not true. Just try you'll see: current_value = combiner(current_value,each_item) RecursionError: maximum recursion depth exceeded
  • Oldyoung
    Oldyoung almost 6 years
    hmmm I are you trying to flaten list with more than 1000 layers?
  • cglacet
    cglacet almost 6 years
    Of course, that's the whole point of the discussion about recursive vs. iterative solutions. If you know in advance that the number of layers is < than 1000 then the most simple solution will work. When you say "any depth" this includes list with depth > 1000.
  • cglacet
    cglacet almost 6 years
    This has no advantage over the simple solution as this fails to process deep lists, ie "RecursionError: maximum recursion depth exceeded while getting the repr of an object".
  • dawg
    dawg over 5 years
    For Python 3.7, using collections.Iterable is deprecated. Use collections.abc.Iterable instead.
  • aandis
    aandis over 5 years
    this doesn't work on strings because strings are iterable too. Replace the condition with if isinstance(x, collections.Iterable) and not isinstance(x, basestring)
  • Statham
    Statham over 5 years
    the multi-nest is redundant
  • gosuto
    gosuto over 5 years
    @yelsayed because you want to prevent separating strings into their individual letters.
  • gosuto
    gosuto over 5 years
    Put a concise version of this in a Gist for further reference: gist.github.com/jorijnsmit/b6deccd3fd5fefbabb72759c74040745
  • Jorge Orpinel Pérez
    Jorge Orpinel Pérez over 5 years
    No recursion needed.
  • Jorge Orpinel Pérez
    Jorge Orpinel Pérez over 5 years
    You just need to arr_str = str(arr) and then [int(s) for s in re.findall(r'\d+', arr_str)] really. See github.com/jorgeorpinel/flatten_nested_lists/blob/master/…
  • Jorge Orpinel Pérez
    Jorge Orpinel Pérez over 5 years
  • cglacet
    cglacet about 5 years
    Indeed, recursion is never needed. In this specific case using recursion is not the best solution as it will crash on deeply nested lists (depth > 1000). But if you don't aim at having something safe, then yes recursive function are better as they are way easier to read/write.
  • Georgy
    Georgy almost 5 years
    This doesn't seem to work for the 3rd and the 4th examples. It throws StopIteration. Also, looks like while True: first = next(remainder) could be replaced by for first in remainder:.
  • Georgy
    Georgy almost 5 years
    FYI: it uses recursive solution: link to source
  • Max Ghenis
    Max Ghenis over 4 years
    I've recently started getting this warning with this: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  • baduker
    baduker about 4 years
    @Georgy this could be fixed with encapsulating the body of flatten in a try-except StopIteration block.
  • jbuddy_13
    jbuddy_13 about 4 years
    Could somebody walk me through what's happening here? It goes a bit over my head
  • noobninja
    noobninja almost 4 years
    replace collections.Iterable with list
  • Alexej Magura
    Alexej Magura almost 4 years
    This is the only solution I have come across, in a moderate google search, on any website that actually works for lists nested deeper than one level.
  • Jake Schmidt
    Jake Schmidt over 3 years
    this only works for types (like int) which are convertible to string and back. something with the complexity of regular expressions is also not needed to tackle such a simple problem. If you want a simple solution, pradyunsg's is best.
  • A Kareem
    A Kareem over 3 years
    I applaud your idea of using generators but your code seems to take awfully long to run (it takes more than an hour to run over a single list with 1 million empty strings) since every time it reaches a string/byte object it only yields when it reaches a maximum recursion depth limit, and it repeats the whole process for every string/byte object. (which im pretty sure is an unintended side effect from catching all exceptions) Although I really like your idea so I implemented a correct version, I would like to edit your answer to fix the problem but I think that would be awfully rude of me.
  • Marko
    Marko over 3 years
    Thank you for your comment, @AKareem, I hope I've improved my code now, I will look into your version too :-)
  • Jeff Hykin
    Jeff Hykin over 3 years
    This is a work of art. So few characters, and still nearly impossible to understand. 10/10 best Python code golf I've seen yet 🏌️‍♂️🏌️‍♀️⛳️. Having something this short almost makes up for the fact that Python doesn't have a built-in flatten function.
  • rvrvrv
    rvrvrv over 3 years
    Great stuff! For people (like me) who are using pandas anyway, this is a beautifully simple way
  • Eduardo Luis Santos
    Eduardo Luis Santos about 3 years
    RecursionError: maximum recursion depth exceeded in comparison
  • Shreyas
    Shreyas almost 3 years
    Please do not edit the answer. If you feel the need to "refactor", feel free to post as your own answer. There is a reason why the code is presented the way it is. It is to emphasise that the approach came from lisp. You can plain ignore the "one-liner" part of it - it was not intended as some kind of boasting. It was, again, to indicate that the thought behind it is still "one-liner": that of first and rest list processing.
  • mattino
    mattino over 2 years
    This looks very elegant and simple. Why it does not have more upvotes? Are there any problems with this solution?
  • Nevermore
    Nevermore over 2 years
    @jbuddy_13 yield from <iterable> is exactly the same as for x in <iterable>: yield x. It's basically a one liner syntax for a for-loop generator. So what flatten() is doing is it's checking for either one of two cases. The base case (in the else) is that the element is not an iterable, so yield that element. The recursive case (in the if) is that the element is an iterable, so pass it back into flatten() and yield whatever the results are.
  • ATOMP
    ATOMP about 2 years
    This could be made much more efficient by using collections.deque and not copying. Assuming you have control over the creation of the nested lists. Also, deepcopy hits the recursion limit.