Is there a way to split a string by every nth separator in Python?

Solution 1

Here’s another solution: split on every separator, then re-join the pieces in groups of span:

span = 2
words = "this-is-a-string".split("-")
# Parenthesized print works in both Python 2 and Python 3.
print(["-".join(words[i:i+span]) for i in range(0, len(words), span)])
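
The same idea can be wrapped in a small helper. This is only a sketch: the name join_every_nth and its parameters are made up for illustration and are not part of the original answer. It is written for Python 3 and keeps a short final group when the number of pieces is not a multiple of n.

```python
def join_every_nth(text, sep="-", n=2):
    """Split text on sep, then re-join the pieces in groups of n."""
    if n < 1:
        raise ValueError("n must be at least 1")
    parts = text.split(sep)
    # Slicing past the end of a list is safe, so a short final group is kept.
    return [sep.join(parts[i:i + n]) for i in range(0, len(parts), n)]


print(join_every_nth("this-is-a-string"))   # ['this-is', 'a-string']
print(join_every_nth("a-b-c-d-e", n=2))     # ['a-b', 'c-d', 'e']
```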

Solution 2

>>> s="a-b-c-d-e-f-g-h-i-j-k-l"         # use zip(*[i]*n)
>>> i=iter(s.split('-'))                # for the nth case    
>>> map("-".join,zip(i,i))    
['a-b', 'c-d', 'e-f', 'g-h', 'i-j', 'k-l']

>>> i=iter(s.split('-'))
>>> map("-".join,zip(*[i]*3))
['a-b-c', 'd-e-f', 'g-h-i', 'j-k-l']
>>> i=iter(s.split('-'))
>>> map("-".join,zip(*[i]*4))
['a-b-c-d', 'e-f-g-h', 'i-j-k-l']
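
Why zip(*[i]*n) groups the items may not be obvious, so here is a commented Python 3 sketch of the same trick. The function name group_complete is hypothetical. Note that in Python 3 map and zip return lazy iterators, so the sketch builds the list explicitly.

```python
def group_complete(text, sep="-", n=2):
    parts = iter(text.split(sep))
    # [parts] * n is a list holding the SAME iterator n times, so every
    # tuple that zip builds pulls n consecutive items from that one iterator.
    # zip stops as soon as the iterator runs dry, so an incomplete final
    # group is silently dropped.
    return [sep.join(group) for group in zip(*[parts] * n)]


print(group_complete("a-b-c-d-e-f", n=3))   # ['a-b-c', 'd-e-f']
print(group_complete("a-b-c-d-e", n=2))     # ['a-b', 'c-d']
```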

In Python 2, itertools.izip can be faster than zip because it does not build an intermediate list, as the timings below show:

>>> from itertools import izip
>>> s="a-b-c-d-e-f-g-h-i-j-k-l"
>>> i=iter(s.split("-"))
>>> ["-".join(x) for x in izip(i,i)]
['a-b', 'c-d', 'e-f', 'g-h', 'i-j', 'k-l']

Here is a version that keeps the leftover part when the number of parts is odd. The fill value leaves a trailing '-' on the last element; if you don't want it, trim it with .rstrip('-').

>>> from itertools import izip_longest
>>> s="a-b-c-d-e-f-g-h-i-j-k-l-m"
>>> i=iter(s.split('-'))
>>> map("-".join,izip_longest(i,i,fillvalue=""))
['a-b', 'c-d', 'e-f', 'g-h', 'i-j', 'k-l', 'm-']
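
If the trailing '-' is not wanted, one option is to fill with a sentinel and filter it out before joining. This is a Python 3 sketch (izip_longest is called zip_longest there); the function name group_with_remainder and the sentinel approach are my own illustration, not part of the original answer.

```python
from itertools import zip_longest


def group_with_remainder(text, sep="-", n=2):
    parts = iter(text.split(sep))
    missing = object()  # sentinel that cannot collide with a real piece
    groups = zip_longest(*[parts] * n, fillvalue=missing)
    # Drop the sentinel so a short final group is joined without padding.
    return [sep.join(p for p in group if p is not missing) for group in groups]


print(group_with_remainder("a-b-c-d-e-f-g-h-i-j-k-l-m"))
# ['a-b', 'c-d', 'e-f', 'g-h', 'i-j', 'k-l', 'm']
```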

Here are some timings, fastest first:

$ python -m timeit -s 'import re;r=re.compile("[^-]+-[^-]+");s="a-b-c-d-e-f-g-h-i-j-k-l"' 'r.findall(s)'
100000 loops, best of 3: 4.31 usec per loop

$ python -m timeit -s 'from itertools import izip;s="a-b-c-d-e-f-g-h-i-j-k-l"' 'i=iter(s.split("-"));["-".join(x) for x in izip(i,i)]'
100000 loops, best of 3: 5.41 usec per loop

$ python -m timeit -s 's="a-b-c-d-e-f-g-h-i-j-k-l"' 'i=iter(s.split("-"));["-".join(x) for x in zip(i,i)]'
100000 loops, best of 3: 7.3 usec per loop

$ python -m timeit -s 's="a-b-c-d-e-f-g-h-i-j-k-l"' 't=s.split("-");["-".join(t[i:i+2]) for i in range(0, len(t), 2)]'
100000 loops, best of 3: 7.49 usec per loop

$ python -m timeit -s 's="a-b-c-d-e-f-g-h-i-j-k-l"' '["-".join([x,y]) for x,y in zip(s.split("-")[::2], s.split("-")[1::2])]'
100000 loops, best of 3: 9.51 usec per loop
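
To repeat a comparison like this on Python 3, something along these lines should work. It is a rough sketch, not the original benchmark: the function names are made up, izip is omitted because Python 3's zip is already lazy, and the numbers will differ from those above depending on machine and interpreter.

```python
import re
import timeit

s = "a-b-c-d-e-f-g-h-i-j-k-l"
pair_re = re.compile("[^-]+-[^-]+")


def by_regex():
    return pair_re.findall(s)


def by_zip():
    i = iter(s.split("-"))
    return ["-".join(x) for x in zip(i, i)]


def by_slices():
    t = s.split("-")
    return ["-".join(t[i:i + 2]) for i in range(0, len(t), 2)]


candidates = [by_regex, by_zip, by_slices]
# Sanity check: all approaches agree on this input before timing them.
assert all(f() == by_regex() for f in candidates)

loops = 10000
for f in candidates:
    # timeit.repeat returns total seconds per run; take the best of 3.
    best = min(timeit.repeat(f, number=loops, repeat=3))
    print("%-10s %.2f usec per loop" % (f.__name__, best / loops * 1e6))
```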

Solution 3

Regular expressions handle this easily:

import re
s = "aaaa-aa-bbbb-bb-c-ccccc-d-ddddd"
print(re.findall("[^-]+-[^-]+", s))

Output:

['aaaa-aa', 'bbbb-bb', 'c-ccccc', 'd-ddddd']

Update for Nick D (generalized to n parts per group):

n = 3
print(re.findall("-".join(["[^-]+"] * n), s))

Output (the leftover 'd-ddddd' is dropped because it has fewer than n parts):

['aaaa-aa-bbbb', 'bb-c-ccccc']
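
A comment on this answer says a simple change to the regex can keep the leftover parts as well. One possible way to do that, as a sketch only (the function name regex_groups and this particular pattern are my assumption, not the answerer's code), is to match one field followed by up to n - 1 further fields. Empty fields, such as those produced by two hyphens in a row, are skipped by this pattern.

```python
import re


def regex_groups(text, n=2):
    # One field, then up to n - 1 further "-field" repeats. Greedy matching
    # takes a full group of n whenever enough fields remain.
    pattern = re.compile(r"[^-]+(?:-[^-]+){0,%d}" % (n - 1))
    return pattern.findall(text)


s = "aaaa-aa-bbbb-bb-c-ccccc-d-ddddd"
print(regex_groups(s, 3))   # ['aaaa-aa-bbbb', 'bb-c-ccccc', 'd-ddddd']
```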

Solution 4

EDIT: The original code I posted didn't work. This version does:

I don't think you can split on every other separator directly, but you can split on every - and then join each pair:

chunks = []
content = "this-is-a-string"
split_string = content.split('-')

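# Step through the parts two at a time. The range must run to
# len(split_string), otherwise a trailing unpaired part is never visited.
for i in range(0, len(split_string), 2):
    if i < len(split_string) - 1:
        # A full pair is available: join it back together.
        chunks.append("-".join([split_string[i], split_string[i+1]]))
    else:
        # Odd number of parts: keep the last one on its own.
        chunks.append(split_string[i])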

Author: Gnuffo1

Updated on September 20, 2020

Comments

  • Gnuffo1
    Gnuffo1 over 3 years

    For example, if I had the following string:

    "this-is-a-string"

    Could I split it by every 2nd "-" rather than every "-" so that it returns two values ("this-is" and "a-string") rather than returning four?

  • Mamey
    Mamey over 14 years
    Sorry, I misread your question the first time and rewrote my answer; n was a leftover from the previous version. Now it gives a list of strings.
  • Jed Smith
    Jed Smith over 14 years
    Probably the most elegant solution which is still readable, the rest are stretching it.
  • Gumbo
    Gumbo over 14 years
    … and only for an even number of words.
  • B Bulfin
    B Bulfin over 14 years
    Nick: Not so. See my update. Gumbo: Also not so. Just a simple change to the regex will handle that case as well if it is desired.
  • Nick Dandoulakis
    Nick Dandoulakis over 14 years
    @recursive, ok but I don't see the d-ddddd in the output ;-)
  • hasen
    hasen over 14 years
    Sorry, have to -1: too complicated, uses regex.
  • Gumbo
    Gumbo over 14 years
    You’re using the wrong code for my proposal. I’m operating on the words and not the string. python -m timeit -s 's="a-b-c-d-e-f-g-h-i-j-k-l".split("-")' '["-".join(s[i:i+2]) for i in range(0, len(s), 2)]'
  • RedGlyph
    RedGlyph over 14 years
    Nicely done, but fails for an odd number of elements. It shouldn't be too hard to overcome though.
  • John La Rooy
    John La Rooy over 14 years
    @Gumbo, sorry, I fixed it to match your comment, I've just moved the split() out of the setup clause and used t as a temporary variable
  • Bruno Feroleto
    Bruno Feroleto over 14 years
    This is very complicated (long to read/decipher), compared to many of the other solutions proposed here…
  • Gnuffo1
    Gnuffo1 over 14 years
    This seems the simplest approach that works with variable lengths between separators.
  • Mamey
    Mamey over 14 years
    Rubbish. It's actually much easier to decipher. Length is largely irrelevant, and it could be shortened by making it less readable. It should perform well since the loop only has a simple test condition to deal with. It also has the most flexibility for handling other processing inside the loop. And the winning answer fails on a string with an odd number of hyphens. Iter and list ops might be pythonic, but that doesn't necessarily make them 'better'.
  • Patrick Artner
    Patrick Artner about 5 years
    Your code has many errors: # TypeError: insert() takes exactly 2 arguments (1 given) can be fixed by using append instead -> returns ['-this-is-a-string']. While it now runs, the result is wrong. Fixing the split character to '-' returns ['-this', 'is-a'], which is better but still wrong. By moving the c += 1 line after the else clause you can fix that as well. After those fixes the solution is fine.
