Is there a better way to use strip() on a list of strings? - python

23,212

Solution 1

You probably shouldn't be using list as a variable name since it's a type. Regardless:

list = map(str.strip, list) 

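This applies the function str.strip to every element in list and stores the result back in list. On Python 2, map returns a new list; on Python 3 it returns a lazy iterator, so wrap the call in the list constructor if you need an actual list (which only works if you have not shadowed the name list).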
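As a minimal sketch of the map approach on Python 3, where the iterator has to be materialized explicitly (the name raw_lines and its contents are made up for illustration):

```python
# Hypothetical sample data; any list of strings works.
raw_lines = ["  alpha ", "beta\n", "\tgamma"]

# map() is lazy on Python 3, so list() is needed to get a real list back.
stripped = list(map(str.strip, raw_lines))

print(stripped)  # ['alpha', 'beta', 'gamma']
```

The original raw_lines is left untouched here; only the new list holds the stripped strings.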

Solution 2

You could use a list comprehension:

stripped_list = [j.strip() for j in initial_list]

Solution 3

Some intriguing discussions on performance happened here, so let me provide a benchmark:

http://ideone.com/ldId8

noslice_map              : 0.0814900398254
slice_map                : 0.084676027298
noslice_comprehension    : 0.0927240848541
slice_comprehension      : 0.124806165695
iter_manual              : 0.133514881134
iter_enumerate           : 0.142778873444
iter_range               : 0.160353899002
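The code behind the link isn't reproduced here, so the following is only a rough sketch of how such a comparison could be run with timeit on Python 3. The function names mirror the labels in the results, but their bodies are my guess at what each label means, and make_data is a made-up helper:

```python
from timeit import timeit

def make_data(n=1000):
    # Hypothetical input: n short strings padded with spaces.
    return [" %d " % i for i in range(n)]

def noslice_map(data):
    return list(map(str.strip, data))

def slice_map(data):
    data[:] = map(str.strip, data)
    return data

def noslice_comprehension(data):
    return [s.strip() for s in data]

def slice_comprehension(data):
    data[:] = [s.strip() for s in data]
    return data

def iter_manual(data):
    # Manual counter, as in the question.
    i = 0
    for s in data:
        data[i] = s.strip()
        i += 1
    return data

def iter_enumerate(data):
    for i, s in enumerate(data):
        data[i] = s.strip()
    return data

def iter_range(data):
    for i in range(len(data)):
        data[i] = data[i].strip()
    return data

CANDIDATES = [noslice_map, slice_map, noslice_comprehension,
              slice_comprehension, iter_manual, iter_enumerate, iter_range]

for func in CANDIDATES:
    # Each run gets fresh data; the cost of make_data() is included in the
    # timing, but it is the same for every candidate.
    seconds = timeit(lambda: func(make_data()), number=200)
    print("%-24s: %.4f" % (func.__name__, seconds))
```

Absolute numbers depend on the machine and the Python version, so the ordering may not match the figures above.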

So:

  1. map(str.strip, my_list) is the fastest way; it's just a little bit faster than comprehensions.
    • Use map or itertools.imap (Python 2 only) if there's a single function that you want to apply (like str.split)
    • Use comprehensions if there's a more complicated expression
  2. Manual iteration is the slowest way; a reasonable explanation is that more of the work is done by the interpreter and less by the efficient C runtime.
  3. Go ahead and assign the result like my_list[:] = map...; the slice notation introduces only a small overhead and is likely to spare you some bugs if there are multiple references to that list.
    • Know the difference between mutating a list and re-creating it.
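The difference between re-creating and mutating can be sketched like this (the variable names are only illustrative):

```python
names = [" ann ", " bob "]
alias = names  # second reference to the same list object

# Re-creating: names is rebound to a new list; alias still sees the old one.
names = [s.strip() for s in names]
print(alias)  # [' ann ', ' bob ']

# Mutating: slice assignment replaces the contents of the existing list,
# so every reference observes the change.
shared = [" ann ", " bob "]
other_ref = shared
shared[:] = map(str.strip, shared)
print(other_ref)            # ['ann', 'bob']
print(other_ref is shared)  # True
```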

Solution 4

I think you mean

a_list = [s.strip() for s in a_list]

Using a generator expression may be a better approach, like this:

stripped_list = (s.strip() for s in a_list)

This offers the benefit of lazy evaluation, so strip only runs when a given stripped element is actually needed.
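To make the laziness visible, here is a small sketch; noisy_strip and the calls list are hypothetical helpers added only to record when stripping actually happens:

```python
calls = []

def noisy_strip(s):
    # Hypothetical helper: records each call so laziness is observable.
    calls.append(s)
    return s.strip()

a_list = [" x ", " y ", " z "]
stripped = (noisy_strip(s) for s in a_list)

print(calls)         # [] because nothing has been stripped yet
first = next(stripped)
print(first, calls)  # x [' x ']
rest = list(stripped)
print(rest)          # ['y', 'z']
```

Keep in mind that a generator can be consumed only once and does not support indexing or len(), so it is not a drop-in replacement for a list.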

If you need references to the list to remain intact outside the current scope, you might want to use list slice syntax:

a_list[:] = [s.strip() for s in a_list]
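A short sketch of why this matters across scopes, using two made-up functions, strip_rebind and strip_in_place:

```python
def strip_rebind(strings):
    # Rebinding only changes the local name; the caller's list is untouched.
    strings = [s.strip() for s in strings]

def strip_in_place(strings):
    # Slice assignment mutates the list object the caller passed in.
    strings[:] = [s.strip() for s in strings]

a_list = [" a ", " b "]

strip_rebind(a_list)
print(a_list)  # [' a ', ' b ']

strip_in_place(a_list)
print(a_list)  # ['a', 'b']
```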

For commenters interested in the speed of various approaches, it looks as if in CPython the generator-to-slice approach is the least efficient:

>>> from timeit import timeit as t
>>> t("""a[:]=(s.strip() for s in a)""", """a=[" %d " % s for s in range(10)]""")
4.35184121131897
>>> t("""a[:]=[s.strip() for s in a]""", """a=[" %d " % s for s in range(10)]""")
2.9129951000213623
>>> t("""a=[s.strip() for s in a]""", """a=[" %d " % s for s in range(10)]""")
2.47947096824646
Author by

alvas

食飽未?

Updated on August 30, 2020

Comments

  • alvas
    alvas over 3 years

    So far I've been trying to perform strip() on a list of strings, and I did this:

    i = 0
    for j in alist:
        alist[i] = j.strip()
        i+=1
    

    Is there a better way of doing that?

    • KRyan
      KRyan over 11 years
      Upvoting for random anonymous uncommented downvote. If there is something wrong with the question, it's utterly meaningless to downvote without telling the author what.
    • Kos
      Kos over 11 years
      If you want to iterate using indices, do for (i, value) in enumerate(alist)
    • Kos
      Kos over 11 years
      I've added a benchmark which compares some options described here.
  • Kos
    Kos over 11 years
    Why say "supposedly slightly more efficient" instead of profiling and checking? And BTW [:] is useful because then it alters the same list, not re-assigns the variable to a new list.
  • Admin
    Admin over 11 years
    It's less efficient because it has to copy N items instead of replacing the reference to the list. The only "advantage", which you may not need or want, is that the change is visible to anyone who has another reference to the original list object.
  • Kos
    Kos over 11 years
    +1 that's the way. And if you want to alter the same list instance instead of binding the variable to a new one (say, not to break other references to this list), use the slice syntax like @kojiro said
  • Marcin
    Marcin over 11 years
    An example where map is an excellent choice. (itertools.imap might or might not be better, of course, as for example when assigning to a slice).
  • Sean W.
    Sean W. over 11 years
    imho, that's unpythonic.
  • Marcin
    Marcin over 11 years
    @Kos In that case, an iterator-based solution would be even better (as it avoids creating a whole list which is then unreferenced and awaiting garbage collection).
  • Marcin
    Marcin over 11 years
    I've changed this to a generator expression, as it's vastly more appropriate.
  • Surya
    Surya over 11 years
    Do you think list comprehensions make code work faster?? or just smaller??
  • kojiro
    kojiro over 11 years
    @Marcin it might be a more appropriate approach, but it's an incorrect answer to the question asked. I edited the question to describe both options.
  • alvas
    alvas over 11 years
    no worries, memory shouldn't be a problem since i'm reading a file, searching a string and dumping it away once i've found the index of a string. =)
  • karthikr
    karthikr over 11 years
    List comprehensions are very efficient for iterables with simple rules. You can use map or a list comprehension depending on the complexity. But yes, they do provide a quick and efficient implementation.
  • Izkata
    Izkata over 11 years
    Do you mean my_list = map(str.strip, list[:])? 'Cause the other way gives me a NameError.
  • Marcin
    Marcin over 11 years
    @kojiro If you are assigning to a slice, a generator is more appropriate. You have edited your question to eliminate slice assignment.
  • kojiro
    kojiro over 11 years
    @Marcin how is it more appropriate? I've added timeits and it doesn't seem to be as efficient. (I'm not equating efficiency and appropriateness, but I genuinely don't know why it would be more appropriate in the absence of efficiency.)
  • Marcin
    Marcin over 11 years
    @kojiro You'll likely see better efficiency for larger lists, as less memory allocation will occur; secondly in real usage it is likely to lead to better overall performance, as there will be less in the way of garbage collectible, but uncollected objects hanging around.
  • Marcin
    Marcin over 11 years
    Also, it's generally nicer for everybody else if you don't copy the interpreter prompts
  • kojiro
    kojiro over 11 years
    @Marcin if I don't copy the interpreter prompts, how can you tell the difference between a command and its output?
  • Marcin
    Marcin over 11 years
    @kojiro I usually comment the output like so # => , but a simple comment will suffice.
  • Kos
    Kos over 11 years
    I mean my_list[:] = map(str.strip, my_list). See the code under the link.
  • shantanoo
    shantanoo over 10 years
    Instead of using map and storing the data in the list again, itertools.imap is better in the case of Python 2.x. In Python 3.x, map returns an iterator.