Remove empty string from list

16,480

Solution 1

You can use a list comprehension to remove all elements that are '':

mylist = [1, 2, 3, '', 4]
mylist = [i for i in mylist if i != '']

Then you can calculate the average by taking the sum and dividing it by the number of elements in the list:

avg = sum(mylist)/len(mylist)
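Putting the two steps together, a minimal sketch under Python 3 might look like the following. The function name `average_ignoring_empty` is hypothetical, and returning `None` when nothing is left to average is an assumption, not something the question specifies.

```python
def average_ignoring_empty(values):
    """Return the mean of values, skipping empty strings."""
    numbers = [v for v in values if v != '']
    if not numbers:
        # Assumption: with nothing left to average, signal that with None
        # instead of letting the division raise ZeroDivisionError.
        return None
    return sum(numbers) / len(numbers)


print(average_ignoring_empty([1, 2, 3, '', 4]))  # 2.5
```

The guard matters because a list made up only of empty strings would otherwise divide by zero.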

Floating-Point Average (Assuming Python 2)

In Python 2, dividing one int by another with / discards the fractional part. Depending on your application, you may want your average to be a float and not an int. If that is the case, cast one of these values to a float first:

avg = float(sum(mylist))/len(mylist)

Alternatively, you can enable Python 3's true division in Python 2:

from __future__ import division
avg = sum(mylist)/len(mylist)
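For comparison, this is how the two division operators behave in Python 3 (and in Python 2 once the `__future__` import is in place). The sample list is chosen only for illustration.

```python
mylist = [1, 2, 3, 4]

true_avg = sum(mylist) / len(mylist)    # true division keeps the fraction
floor_avg = sum(mylist) // len(mylist)  # floor division rounds down to an int

print(true_avg)   # 2.5
print(floor_avg)  # 2
```

Use `//` only when an integer average is really what you want.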

Solution 2

You can use filter():

In Python 2, filter() returns a list when you pass it a list; in Python 3 it returns an iterator. As suggested by @PhilH, you can use itertools.ifilter() in Python 2 to get an iterator.

To get a list as output in Python 3, use list(filter(lambda x: x != '', lis)). The session below is from Python 2:

In [29]: lis = [1, 2, 3, '', 4, 0]

In [30]: filter(lambda x:x != '', lis)
Out[30]: [1, 2, 3, 4, 0]

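Note that to filter out every falsy value you can simply use filter(None, ...). As the output shows, this also removes 0, which is usually not what you want when computing an average: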

>>> lis = [1, 2, 3, '', 4, 0]
>>> filter(None, lis)
[1, 2, 3, 4]
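The same comparison can be sketched for Python 3, where `filter()` returns an iterator and has to be wrapped in `list()` to see the result. The variable names `only_empty_removed` and `all_falsy_removed` are introduced here for illustration.

```python
lis = [1, 2, 3, '', 4, 0]

# Keeps everything except the empty string, so the 0 survives.
only_empty_removed = list(filter(lambda x: x != '', lis))

# Passing None drops every falsy value, including the 0.
all_falsy_removed = list(filter(None, lis))

print(only_empty_removed)  # [1, 2, 3, 4, 0]
print(all_falsy_removed)   # [1, 2, 3, 4]
```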

Solution 3

The other answers show you how to create a new list with the desired element removed (which is the usual way to do this in Python). However, there are occasions where you want to operate on a list in place. Here's a way to do that:

while True:
    try:
        mylist.remove('')
    except ValueError:
        break

Although I suppose it could be argued that you could do this with slice assignment and a list comprehension:

mylist[:] = [i for i in mylist if i != '']

And, as some have raised issues about memory usage and the wonders of generators:

mylist[:] = (i for i in mylist if i != '')

works too.
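To see why in-place modification can matter, here is a small sketch under Python 3. The names `alias`, `other` and `other_alias` are made up for the example. Slice assignment changes the existing list object, so every name that refers to it sees the change, whereas plain assignment only rebinds one name.

```python
mylist = [1, 2, 3, '', 4]
alias = mylist  # a second name for the same list object

# Slice assignment replaces the contents of the existing object.
mylist[:] = [i for i in mylist if i != '']
print(alias)  # [1, 2, 3, 4]

other = [1, '', 2]
other_alias = other

# Plain assignment binds the name to a brand-new list instead.
other = [i for i in other if i != '']
print(other_alias)  # [1, '', 2]
```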

Solution 4

import itertools
itertools.ifilterfalse(lambda x: x == '', myList)  # Python 2 only

This uses iterators, so it doesn't create copies of the list and should be more efficient both in time and memory, making it robust for long lists.
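In Python 3 `ifilterfalse` no longer exists; the equivalent is `itertools.filterfalse`. A minimal sketch, with the sample list and the names `lazy` and `result` assumed for illustration:

```python
from itertools import filterfalse

myList = [1, 2, 3, '', 4]

# filterfalse yields the items for which the predicate is false,
# so this lazily skips the empty strings.
lazy = filterfalse(lambda x: x == '', myList)

# Nothing is computed until the iterator is consumed.
result = list(lazy)
print(result)  # [1, 2, 3, 4]
```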

JonClements points out that this means keeping track of the length separately, so to show that process:

def ave(anyOldIterator):
    elementCount = 0
    runningTotal = 0
    for element in anyOldIterator:
        runningTotal += element
        elementCount += 1
    return runningTotal/elementCount

Or even better

def ave(anyOldIterator):
    idx = 0
    runningTotal = 0
    # Start counting at 1 so idx ends up as the number of elements seen
    for idx, element in enumerate(anyOldIterator, 1):
        runningTotal += element
    return runningTotal/idx

Reduce:

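from functools import reduce  # a builtin in Python 2; this import is required in Python 3

def ave(anyOldIterator):
    # Each step keeps the latest index and the running sum of the elements
    pieces = reduce(lambda x, y: (y[0], x[1] + y[1]), enumerate(anyOldIterator))
    return pieces[1]/(pieces[0]+1)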

Timeit on the average of range(0,1000) run 10000 times gives the list comprehension a time of 0.9s and the reduce version 0.16s. So it's already 5x faster before we add in filtering.
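How the timing above was measured is not stated. The following is only a sketch of one way such a comparison could be set up with `timeit` under Python 3; the helper names `ave_list` and `ave_reduce` are hypothetical, the run count is reduced to keep it quick, and the numbers will differ by machine and Python version.

```python
import timeit
from functools import reduce


def ave_list(values):
    # Build a full list first, then use sum() and len().
    items = [v for v in values]
    return sum(items) / len(items)


def ave_reduce(values):
    # Single pass: carry the latest index and the running total.
    pieces = reduce(lambda x, y: (y[0], x[1] + y[1]), enumerate(values))
    return pieces[1] / (pieces[0] + 1)


data = range(0, 1000)
list_time = timeit.timeit(lambda: ave_list(data), number=200)
reduce_time = timeit.timeit(lambda: ave_reduce(data), number=200)
print(list_time, reduce_time)
```

Neither function filters anything, so this compares only the two ways of averaging, not the filtering step.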

Solution 5

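You can use filter() with None or bool as the predicate (Python 2 shown here; in Python 3 wrap the call in list()). Note that this drops every falsy value, including 0, not just '':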

alist = ['',1,2]
new_alist = filter(None, alist)
new_alist_2 = filter(bool, alist)

Result:

new_alist = [1,2]
new_alist_2 = [1,2]
Author by

user1783702

Updated on July 03, 2022

Comments

  • user1783702
    user1783702 almost 2 years

    I just started Python classes and I'm really in need of some help. Please keep in mind that I'm new if you're answering this.

    I have to make a program that takes the average of all the elements in a certain list "l". That is a pretty easy function by itself; the problem is that the teacher wants us to remove any empty string present in the list before doing the average.

    So when I receive the list [1,2,3,'',4] I want the function to ignore the '' for the average, and just take the average of the other 4/len(l). Can anyone help me with this?

    Maybe a cycle that keeps comparing a certain position from the list with the '' and removes those from the list? I've tried that but it's not working.

  • Gareth Latty
    Gareth Latty over 11 years
    This is a little dangerous as 0 could get stripped out too.
  • Matt
    Matt over 11 years
    Is there a reason you can't just do [x for x in a if a != '']?
  • Gareth Latty
    Gareth Latty over 11 years
    As the OP specifically wants to get rid of empty strings, I think this is the best solution, it's clear and concise.
  • Gareth Latty
    Gareth Latty over 11 years
    With the update, this becomes a good all-round solution, however it may be overkill for the asker's use case.
  • Gareth Latty
    Gareth Latty over 11 years
    @Matt This potentially strips out [] and other similar things too, but to be honest, if you want that extra functionality, I'd go with ragsagar's updated answer as it'll handle more cases.
  • Rag Sagar
    Rag Sagar over 11 years
    @Matt it is [x for x in a if x != '']
  • Russell Smith
    Russell Smith over 11 years
    Comparing to false is decidedly different than comparing for equality to '', and obscures the actual intent (which is specifically to strip out '')
  • Rag Sagar
    Rag Sagar over 11 years
    Updated to consider 0 as Lattyware said. But I too think it is overkill.
  • Russell Smith
    Russell Smith over 11 years
    Too complex for a case where a simple list comprehension will suffice.
  • Rag Sagar
    Rag Sagar over 11 years
    By the time I was about to answer, Matt had answered. I thought I'd leave it here as an alternative.
  • Jon Clements
    Jon Clements over 11 years
    How about: idx = [idx for idx, val in enumerate(x) if val=='']; for i in reversed(idx): x.pop(i) ? - more efficient than .remove...
  • Matt
    Matt over 11 years
    This will also remove 0 were it an element in the list.
  • Phil H
    Phil H over 11 years
    An equivalent generator solution would be icing on the cake, to prevent extra copies of the lists.
  • Matt
    Matt over 11 years
    @PhilH I thought about that, but you can't get the length of generator expressions so it would make this much more complicated.
  • Jon Clements
    Jon Clements over 11 years
    Unfortunately with the side effect that the length will need to be tracked separately to compute an average...
  • Phil H
    Phil H over 11 years
    I'd suggest using itertools.ifilter() as it is iterator based rather than creating a copy of the list just for an average.
  • Ashwini Chaudhary
    Ashwini Chaudhary over 11 years
    @PhilH yes on python 2.x ifilter() should be preferred, on python 3.x filter() returns an iterator itself.
  • Phil H
    Phil H over 11 years
    @JonClements, why would that make a difference? I'll illuminate in an edit.
  • mgilson
    mgilson over 11 years
    @JonClements -- You're right, it is more efficient, but it's also less explicit. (and performs slightly slower than the easier to read slice-assignment version for a list set up like lst = [1,2,3,'',4]*20)
  • Matt
    Matt over 11 years
    @pistache how did you calculate the average from this in your speed test?
  • Matt
    Matt over 11 years
    @ragsagar yes, I mistyped that, you have what I meant.
  • mgilson
    mgilson over 11 years
    If you explain how I could improve this answer, I'd be happy to edit (or if you have appropriate privileges yourself, feel free to edit)
  • DSM
    DSM over 11 years
    This will be a fair bit slower than a listcomp.
  • mgilson
    mgilson over 11 years
    @Matt -- If we're interested in keeping py2k compatibility, you may want to consider dividing by float(len(mylist)) -- just to make sure you do "true division" here, but +1 from me
  • pistache
    pistache over 11 years
    @Matt I used timeit(), but for this one I was hesitating between using timeit() on list(itertools.i...) and on computing the generator. If I use list(), the result is different. I'll correct my post.
  • mgilson
    mgilson over 11 years
    for the solution by Phil H, are you actually iterating over the iterator that is returned, or are you just creating the iterator and doing nothing useful with it??
  • Matt
    Matt over 11 years
    @mgilson For some applications that may be what you want, for others you may need an integer average. I'll include it.
  • mgilson
    mgilson over 11 years
    @Matt -- If you want an integer average, you should use // instead of / :-)
  • Jon Clements
    Jon Clements over 11 years
    Umm, well +1 from me anyway for alternate solutions - not sure why you got -1 though...
  • Phil H
    Phil H over 11 years
    @pistache: surely the most appropriate timing is the calculation of an average? If you do it without any list() call as in my edit above, or using a reduce step, then it should be faster for large lists.
  • mgilson
    mgilson over 11 years
    @JonClements -- I don't know either, that's how it goes sometimes :-). (I liked your clever use of reversed and pop by the way -- I would have added it if it did better than the in place version with slice assignment)
  • Jon Clements
    Jon Clements over 11 years
    Well, it wasn't meant to "compete" with the inplace assignment, more so the try/remove/except method... But thanks ;)
  • mgilson
    mgilson over 11 years
    I think runningTotal ought to be initialized to 0, otherwise you get None += first_element which is nonsense.
  • Jon Clements
    Jon Clements over 11 years
    JonClements suggests this makes the length a problem - to be fair, that's not what I said - my statement the length will need to be tracked separately was referring to the fact that extra work would need to be done (as you have shown) as sum(blah)/len(blah) isn't possible
  • Phil H
    Phil H over 11 years
    @JonClements: Yes, my apologies. My original reading was somehow that it was more critical than it actually is. You were spot on. I had a vague impression there was a last in Python, which would have made the enumerate version easy, but there isn't.
  • Matt
    Matt over 11 years
    @pistache The op said he wanted to strip out '' and nothing else. He did not say he wanted to keep integers, and convert strings to integers, as ragsagar's answer does.
  • Matt
    Matt over 11 years
    @pistache How did you calculate my time vs your time? After your edit, are we not doing the same thing?
  • pistache
    pistache over 11 years
    @Matt the time I count for me is by testing for False while including 0s (my first solution).
  • Matt
    Matt over 11 years
    @pistache That is not what the op asked for
  • Rag Sagar
    Rag Sagar over 11 years
    I don't know what this is all about. This is a homework problem and I don't think there is any need for optimization. I wonder how you calculated the time of execution. IMHO Matt's solution is straightforward and clear.
  • Matt
    Matt over 11 years
    "gives the list comprehension a time of 0.9s and the reduce version 0.16s. So it's already 5x faster before we add in filtering" -- are you comparing the iterator version that doesn't filter out empty strings to the list comprehension version that does? That doesn't seem like a valid comparison.
  • Phil H
    Phil H over 11 years
    @Matt, no, just comparing the iterator version that doesn't filter with the list comprehension that doesn't filter. It's a straight iterator vs LC. It might be more relevant to include the filtering here, but it would probably also be relevant to test it on the real inputs. The iterator/LC comparison is a general point really, not limited to this use case.
  • Matt
    Matt over 11 years
    @PhilH the list comprehension that doesn't filter doesn't actually do anything though
  • Phil H
    Phil H over 11 years
    Ok, Matt, it's not meant to be a complete program, it's supposed to illustrate the differences between iterators and lists.