Is there a way to split a string by every nth separator in Python?
Solution 1
Here’s another solution:
span = 2
words = "this-is-a-string".split("-")
print(["-".join(words[i:i+span]) for i in range(0, len(words), span)])
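The same idea can be wrapped in a small reusable function. This is only a sketch written for Python 3; the name split_every and its parameters are made up for illustration and are not part of the original answer.

```python
def split_every(text, n, sep="-"):
    """Split text at every n-th occurrence of sep (hypothetical helper)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    words = text.split(sep)
    # Re-join the pieces in groups of n; the last group may be shorter.
    return [sep.join(words[i:i + n]) for i in range(0, len(words), n)]


print(split_every("this-is-a-string", 2))  # ['this-is', 'a-string']
print(split_every("a-b-c-d-e", 2))         # ['a-b', 'c-d', 'e']
```

Because slicing past the end of a list is harmless, a leftover part simply ends up in a shorter final chunk.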
Solution 2
>>> s="a-b-c-d-e-f-g-h-i-j-k-l" # use zip(*[i]*n)
>>> i=iter(s.split('-')) # for the nth case
>>> map("-".join,zip(i,i))
['a-b', 'c-d', 'e-f', 'g-h', 'i-j', 'k-l']
>>> i=iter(s.split('-'))
>>> map("-".join,zip(*[i]*3))
['a-b-c', 'd-e-f', 'g-h-i', 'j-k-l']
>>> i=iter(s.split('-'))
>>> map("-".join,zip(*[i]*4))
['a-b-c-d', 'e-f-g-h', 'i-j-k-l']
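The session above is Python 2, where map returns a list. A rough Python 3 equivalent, assuming the same input string, needs list() around map. It also shows that zip silently drops an incomplete trailing group; the variable names here are illustrative only.

```python
s = "a-b-c-d-e-f-g-h-i-j-k-l"
n = 3

# Passing the same iterator n times makes zip consume n parts per tuple.
parts = iter(s.split("-"))
result = list(map("-".join, zip(*[parts] * n)))
print(result)  # ['a-b-c', 'd-e-f', 'g-h-i', 'j-k-l']

# With a part count that is not a multiple of n, zip stops early.
odd = iter("a-b-c-d-e".split("-"))
truncated = list(map("-".join, zip(odd, odd)))
print(truncated)  # ['a-b', 'c-d'] (the trailing 'e' is dropped)
```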
itertools.izip is sometimes faster than zip, as the timings below show:
>>> from itertools import izip
>>> s="a-b-c-d-e-f-g-h-i-j-k-l"
>>> i=iter(s.split("-"))
>>> ["-".join(x) for x in izip(i,i)]
['a-b', 'c-d', 'e-f', 'g-h', 'i-j', 'k-l']
Here is a version that sort of works with an odd number of parts, depending on what output you want in that case. You might prefer to trim the '-' off the end of the last element with .rstrip('-'), for example.
>>> from itertools import izip_longest
>>> s="a-b-c-d-e-f-g-h-i-j-k-l-m"
>>> i=iter(s.split('-'))
>>> map("-".join,izip_longest(i,i,fillvalue=""))
['a-b', 'c-d', 'e-f', 'g-h', 'i-j', 'k-l', 'm-']
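One possible way to avoid the trailing '-' without rstrip is to pad with a sentinel object and filter it out before joining. This is a Python 3 sketch (izip_longest is called zip_longest there); group_join and _missing are hypothetical names, not part of the original answer.

```python
from itertools import zip_longest


def group_join(text, n, sep="-"):
    """Join every n parts of text; the last group may be shorter (sketch)."""
    parts = iter(text.split(sep))
    _missing = object()  # sentinel that cannot collide with a real part
    groups = zip_longest(*[parts] * n, fillvalue=_missing)
    # Drop the padding before joining so no trailing separator appears.
    return [sep.join(p for p in group if p is not _missing) for group in groups]


print(group_join("a-b-c-d-e-f-g-h-i-j-k-l-m", 2))
# ['a-b', 'c-d', 'e-f', 'g-h', 'i-j', 'k-l', 'm']
```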
Here are some timings
$ python -m timeit -s 'import re;r=re.compile("[^-]+-[^-]+");s="a-b-c-d-e-f-g-h-i-j-k-l"' 'r.findall(s)'
100000 loops, best of 3: 4.31 usec per loop
$ python -m timeit -s 'from itertools import izip;s="a-b-c-d-e-f-g-h-i-j-k-l"' 'i=iter(s.split("-"));["-".join(x) for x in izip(i,i)]'
100000 loops, best of 3: 5.41 usec per loop
$ python -m timeit -s 's="a-b-c-d-e-f-g-h-i-j-k-l"' 'i=iter(s.split("-"));["-".join(x) for x in zip(i,i)]'
100000 loops, best of 3: 7.3 usec per loop
$ python -m timeit -s 's="a-b-c-d-e-f-g-h-i-j-k-l"' 't=s.split("-");["-".join(t[i:i+2]) for i in range(0, len(t), 2)]'
100000 loops, best of 3: 7.49 usec per loop
$ python -m timeit -s 's="a-b-c-d-e-f-g-h-i-j-k-l"' '["-".join([x,y]) for x,y in zip(s.split("-")[::2], s.split("-")[1::2])]'
100000 loops, best of 3: 9.51 usec per loop
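The figures above come from Python 2 on the answerer's machine. To repeat the comparison on a current interpreter, something like the following Python 3 sketch could be used; the function names are invented here, izip is omitted because Python 3's zip is already lazy, and no particular ranking is implied.

```python
import re
import timeit

s = "a-b-c-d-e-f-g-h-i-j-k-l"
pair_re = re.compile("[^-]+-[^-]+")


def with_regex():
    return pair_re.findall(s)


def with_zip():
    i = iter(s.split("-"))
    return ["-".join(x) for x in zip(i, i)]


def with_slices():
    t = s.split("-")
    return ["-".join(t[i:i + 2]) for i in range(0, len(t), 2)]


# Time each variant; absolute numbers depend on the machine and version.
for func in (with_regex, with_zip, with_slices):
    seconds = timeit.timeit(func, number=10000)
    print("%-12s %.4f s per 10000 calls" % (func.__name__, seconds))
```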
Solution 3
Regular expressions handle this easily:
import re
s = "aaaa-aa-bbbb-bb-c-ccccc-d-ddddd"
print(re.findall("[^-]+-[^-]+", s))
Output:
['aaaa-aa', 'bbbb-bb', 'c-ccccc', 'd-ddddd']
Update for Nick D:
n = 3
print(re.findall("-".join(["[^-]+"] * n), s))
Output:
['aaaa-aa-bbbb', 'bb-c-ccccc']
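With n = 3 the leftover 'd-ddddd' is missing from the output because the pattern requires exactly n fields. One way to keep a shorter final group is to make the extra fields optional. The sketch below assumes Python 3 and a single-character separator; findall_chunks is a made-up name.

```python
import re


def findall_chunks(text, n, sep="-"):
    """Regex-based chunking that keeps a shorter trailing group (sketch)."""
    if len(sep) != 1:
        raise ValueError("this sketch only supports a single-character separator")
    if n < 1:
        raise ValueError("n must be a positive integer")
    esc = re.escape(sep)
    field = "[^" + esc + "]+"
    # One field, then up to n - 1 further "separator + field" pairs (greedy).
    pattern = field + "(?:" + esc + field + "){0," + str(n - 1) + "}"
    return re.findall(pattern, text)


s = "aaaa-aa-bbbb-bb-c-ccccc-d-ddddd"
print(findall_chunks(s, 3))  # ['aaaa-aa-bbbb', 'bb-c-ccccc', 'd-ddddd']
```

Note that, unlike str.split, this pattern skips empty fields such as those produced by two consecutive separators.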
Solution 4
EDIT: The original code I posted didn't work. This version does:
I don't think you can split on every other one, but you could split on every - and join every pair.
chunks = []
content = "this-is-a-string"
split_string = content.split('-')
for i in range(0, len(split_string), 2):
    if i < len(split_string) - 1:
        # a full pair is available: join it back together
        chunks.append("-".join([split_string[i], split_string[i+1]]))
    else:
        # odd number of parts: keep the last one on its own
        chunks.append(split_string[i])
Gnuffo1
Updated on September 20, 2020
Comments
-
Gnuffo1 over 3 years
For example, if I had the following string:
"this-is-a-string"
Could I split it by every 2nd "-" rather than every "-" so that it returns two values ("this-is" and "a-string") rather than returning four?
-
Mamey over 14 years
Sorry, I misread your question the first time and rewrote it; n was a leftover from the previous version. Now it gives a list of strings.
-
Jed Smith over 14 years
Probably the most elegant solution which is still readable; the rest are stretching it.
-
Gumbo over 14 years
… and only for an even number of words.
-
B Bulfin over 14 years
Nick: Not so. See my update. Gumbo: Also not so. A simple change to the regex will handle that case as well, if desired.
-
Nick Dandoulakis over 14 years
@recursive, OK, but I don't see the d-ddddd in the output ;-)
-
hasen over 14 years
Sorry, have to -1: too complicated, uses regex.
-
Gumbo over 14 years
You’re using the wrong code for my proposal. I’m operating on the words and not the string.
python -m timeit -s 's="a-b-c-d-e-f-g-h-i-j-k-l".split("-")' '["-".join(s[i:i+2]) for i in range(0, len(s), 2)]'
-
RedGlyph over 14 years
Nicely done, but fails for an odd number of elements. It shouldn't be too hard to overcome though.
-
John La Rooy over 14 years
@Gumbo, sorry, I fixed it to match your comment. I've just moved the split() out of the setup clause and used t as a temporary variable.
-
Bruno Feroleto over 14 years
This is very complicated (long to read/decipher), compared to many of the other solutions proposed here…
-
Gnuffo1 over 14 years
This seems the simplest way to handle a variable length between separators.
-
Mamey over 14 years
Rubbish. It's actually much easier to decipher. Length is largely irrelevant, and it could be shortened by making it less readable. It should have good performance, since the loop only has a simple test condition to deal with. It also has the most flexibility for handling other processing inside the loop. And the winning answer will crash on a string with an odd number of hyphens. Iter and list ops might be pythonic, but that doesn't necessarily make them 'better'.
-
Patrick Artner about 5 years
Your code has many errors: "TypeError: insert() takes exactly 2 arguments (1 given)" can be fixed by using append instead -> returns ['-this-is-a-string']. While it now runs, the result is wrong. Fixing the split character to '-': returns ['-this', 'is-a'], which is better but still wrong. By moving the c += 1 line after the else-clause you can fix that as well. After those fixes the solution is fine.