Splitting a string by list of indices

Solution 1

s = 'long string that I want to split up'
indices = [0,5,12,17]
parts = [s[i:j] for i,j in zip(indices, indices[1:]+[None])]

returns

['long ', 'string ', 'that ', 'I want to split up']

which you can print using:

print('\n'.join(parts))

Another possibility, which avoids copying indices but appends None to it in place, would be:

s = 'long string that I want to split up'
indices = [0,5,12,17]
indices.append(None)
parts = [s[indices[i]:indices[i+1]] for i in range(len(indices)-1)]
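
To see why the trailing None works, here is a minimal sketch that prints the (start, end) pairs the first comprehension slices with. The name `pairs` is introduced here only for illustration. A slice end of None means "up to the end of the string", so s[17:None] gives the same result as s[17:].

```python
s = 'long string that I want to split up'
indices = [0, 5, 12, 17]

# Pair each start index with the following one; None closes the last pair.
pairs = list(zip(indices, indices[1:] + [None]))
print(pairs)  # [(0, 5), (5, 12), (12, 17), (17, None)]

# A slice end of None means "up to the end of the string".
assert s[17:None] == s[17:]

parts = [s[i:j] for i, j in pairs]
print(parts)  # ['long ', 'string ', 'that ', 'I want to split up']
```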

Solution 2

Here is a short solution that relies on the itertools module. The tee function is used to iterate pairwise over the indices, and izip_longest pads the last pair with None. See the Recipes section of the itertools documentation for more help.

>>> from itertools import tee, izip_longest
>>> s = 'long string that I want to split up'
>>> indices = [0,5,12,17]
>>> start, end = tee(indices)
>>> next(end)
0
>>> [s[i:j] for i,j in izip_longest(start, end)]
['long ', 'string ', 'that ', 'I want to split up']

Edit: This is a version that does not copy the indices list, so it should be faster.
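
The session above is Python 2. On Python 3, izip_longest is named zip_longest, so a rough equivalent might look like the sketch below. The helper name `pairwise_longest` is made up for this example and is not part of itertools; it follows the same tee-based idea as the pairwise recipe in the docs.

```python
from itertools import tee, zip_longest

def pairwise_longest(iterable):
    """Yield (a, b) pairs of neighbours; the final pair is (last, None)."""
    start, end = tee(iterable)
    next(end, None)  # advance the second iterator by one; tolerate empty input
    return zip_longest(start, end)

s = 'long string that I want to split up'
indices = [0, 5, 12, 17]

parts = [s[i:j] for i, j in pairwise_longest(indices)]
print(parts)  # ['long ', 'string ', 'that ', 'I want to split up']
```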

Solution 3

You can write a generator if you don't want to make any modifications to the list of indices:

>>> def split_by_idx(S, list_of_indices):
...     left, right = 0, list_of_indices[0]
...     yield S[left:right]
...     left = right
...     for right in list_of_indices[1:]:
...         yield S[left:right]
...         left = right
...     yield S[left:]
... 
>>> s = 'long string that I want to split up'
>>> indices = [5,12,17]
>>> list(split_by_idx(s, indices))
['long ', 'string ', 'that ', 'I want to split up']
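
As a variation on the same idea, the sketch below starts the left edge at 0 and loops over every index, which avoids the list_of_indices[1:] copy and also copes with an empty index list. The name `split_at` is hypothetical and chosen just for this example. Like the generator above, it expects the indices without the leading 0.

```python
def split_at(text, cut_points):
    """Yield consecutive pieces of text, cutting before each index in cut_points."""
    left = 0
    for right in cut_points:
        yield text[left:right]
        left = right
    yield text[left:]  # whatever remains after the last cut

s = 'long string that I want to split up'

parts = list(split_at(s, [5, 12, 17]))
print(parts)  # ['long ', 'string ', 'that ', 'I want to split up']

# Stripping each piece drops the trailing blanks, if those are not wanted.
stripped = [part.strip() for part in split_at(s, [5, 12, 17])]
print(stripped)  # ['long', 'string', 'that', 'I want to split up']
```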
Author by

Yarin

Updated on May 26, 2020

Comments

  • Yarin
    Yarin almost 4 years

    I want to split a string by a list of indices, where the split segments begin with one index and end before the next one.

    Example:

    s = 'long string that I want to split up'
    indices = [0,5,12,17]
    parts = [s[index:] for index in indices]
    for part in parts:
        print(part)
    

    This will return:

    long string that I want to split up
    string that I want to split up
    that I want to split up
    I want to split up

    I'm trying to get:

    long
    string
    that
    I want to split up

  • jamylak
    jamylak almost 12 years
    Another way is, [s[i:j] for i,j in izip_longest(indices,indices[1:])] but I like your way better!
  • schlamar
    schlamar almost 12 years
    This copies the indices list with indices[1:] and creates a new list with double size by the zip function -> Bad performance and memory consumption.
  • jamylak
    jamylak almost 12 years
    @ms4py This is fine, performance is not an issue in this case, this is a very readable solution. If performance is an issue my suggestion can be used.
  • Yarin
    Yarin almost 12 years
    eumiro- thank you, this works great. Can you explain how the +[None] part works?
  • eumiro
    eumiro almost 12 years
    @ms4py - ok, there's an updated version without copying of the list and without zip. Although your itertools version is probably more performant.
  • eumiro
    eumiro almost 12 years
    @Yarin - indices[1:] + [None] copies the list without the first element and adds a None at the end. So for your indices it looks like [5,12,17,None]. I am using it to be able to access the last part of the string with s[17:None] (the same as s[17:], just using two variables I have anyway).
  • jamylak
    jamylak almost 12 years
    @Yarin [1:None] for example is the same as [1:]
  • Yarin
    Yarin almost 12 years
    Thanks for the alt approach, I'll have to check out itertools sometime
  • jamylak
    jamylak almost 12 years
    @ms4py What do you mean by that?
  • Levon
    Levon almost 12 years
    Neat approach, learned something new. Is there an easy way to get rid of the extra blank at the end of the first 3 strings inside the expression? I tried s[i:j].strip() but that didn't work at all (not sure why not)
  • jamylak
    jamylak almost 12 years
    If you are gonna use this you may as well use the pairwise function straight from the itertools docs. Also using next(end) is preferred to end.next() for python 3 compatibility.
  • lonewarrior556
    lonewarrior556 about 4 years
    Not sure it's your forte, but how would one do this in Node.js?
  • Siva Sankar
    Siva Sankar about 2 years
    This had been a headache for me for an hour and a half. Thanks @eumiro