Most pythonic way to interleave two strings
Solution 1
For me, the most pythonic* way is the following, which does pretty much the same thing but uses the + operator to concatenate the individual characters from each string:
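u = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
l = 'abcdefghijklmnopqrstuvwxyz'
# zip pairs up the characters; i + j glues each pair together
res = "".join(i + j for i, j in zip(u, l))
print(res)
# AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz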
It is also faster than using two join() calls:
In [5]: l1 = 'A' * 1000000; l2 = 'a' * 1000000
In [6]: %timeit "".join("".join(item) for item in zip(l1, l2))
1 loops, best of 3: 442 ms per loop
In [7]: %timeit "".join(i + j for i, j in zip(l1, l2))
1 loops, best of 3: 360 ms per loop
Faster approaches exist, but they often obfuscate the code.
Note: If the two input strings are not the same length, the longer one will be truncated, because zip stops iterating at the end of the shorter string. In that case, instead of zip, use zip_longest (izip_longest in Python 2) from the itertools module with fillvalue='' to ensure that both strings are fully exhausted. The default fill value is None, which would make i + j raise a TypeError.
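A minimal Python 3 sketch of that fix, keeping the i + j style from above; the shorter l here is made up for illustration:

```python
from itertools import zip_longest

u = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
l = 'abcdefghijkl'  # shorter than u

# zip_longest pads the shorter string; fillvalue='' keeps i + j valid
# (with the default fill value None, 'M' + None would raise TypeError)
res = "".join(i + j for i, j in zip_longest(u, l, fillvalue=''))
print(res)  # AaBbCcDdEeFfGgHhIiJjKkLlMNOPQRSTUVWXYZ
```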
*To take a quote from the Zen of Python: Readability counts.
Pythonic = readability for me; i + j is just visually parsed more easily, at least for my eyes.
Solution 2
Faster Alternative
Another way:
res = [''] * len(u) * 2
res[::2] = u
res[1::2] = l
print(''.join(res))
Output:
AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz
Speed
Looks like it is faster:
%%timeit
res = [''] * len(u) * 2
res[::2] = u
res[1::2] = l
''.join(res)
100000 loops, best of 3: 4.75 µs per loop
than the fastest solution so far:
%timeit "".join(list(chain.from_iterable(zip(u, l))))
100000 loops, best of 3: 6.52 µs per loop
Also for the larger strings:
l1 = 'A' * 1000000; l2 = 'a' * 1000000
%timeit "".join(list(chain.from_iterable(zip(l1, l2))))
1 loops, best of 3: 151 ms per loop
%%timeit
res = [''] * len(l1) * 2
res[::2] = l1
res[1::2] = l2
''.join(res)
10 loops, best of 3: 92 ms per loop
Timings measured with Python 3.5.1.
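The %timeit and %%timeit lines above are IPython magics. To rerun the comparison as a plain Python 3 script, here is a minimal sketch using the standard timeit module. The function names are made up for illustration, absolute numbers will differ by machine and Python version, and [''] * (len(a) * 2) is parenthesized so that only one list is allocated:

```python
import timeit
from itertools import chain

l1 = 'A' * 1000000
l2 = 'a' * 1000000

def slice_assign(a, b):
    # Preallocate the final list, then fill the even and odd slots
    res = [''] * (len(a) * 2)
    res[::2] = a
    res[1::2] = b
    return ''.join(res)

def chain_zip(a, b):
    # Flatten the (a[k], b[k]) pairs into one stream of characters
    return ''.join(chain.from_iterable(zip(a, b)))

def plus_genexp(a, b):
    # Concatenate each pair, then join the two-character strings
    return ''.join(i + j for i, j in zip(a, b))

candidates = [slice_assign, chain_zip, plus_genexp]

# Make sure every approach agrees before comparing timings
expected = slice_assign(l1, l2)
assert all(f(l1, l2) == expected for f in candidates)

for f in candidates:
    # Best of 3 single runs, similar to IPython's "best of 3" report
    best = min(timeit.repeat(lambda: f(l1, l2), number=1, repeat=3))
    print(f"{f.__name__}: {best * 1000:.1f} ms")
```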
Variation for strings with different lengths
u = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
l = 'abcdefghijkl'
Shorter one determines length (zip() equivalent)
min_len = min(len(u), len(l))
res = [''] * min_len * 2
res[::2] = u[:min_len]
res[1::2] = l[:min_len]
print(''.join(res))
Output:
AaBbCcDdEeFfGgHhIiJjKkLl
Longer one determines length (itertools.zip_longest(fillvalue='') equivalent)
min_len = min(len(u), len(l))
res = [''] * min_len * 2
res[::2] = u[:min_len]
res[1::2] = l[:min_len]
res += u[min_len:] + l[min_len:]
print(''.join(res))
Output:
AaBbCcDdEeFfGgHhIiJjKkLlMNOPQRSTUVWXYZ
Solution 3
With join() and zip():
>>> ''.join(''.join(item) for item in zip(u,l))
'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'
Solution 4
On Python 2, by far the fastest way to do this, at ~3x the speed of list slicing for small strings and ~30x for long ones, is
res = bytearray(len(u) * 2)
res[::2] = u
res[1::2] = l
str(res)
This wouldn't work on Python 3, though. You could implement something like
res = bytearray(len(u) * 2)
res[::2] = u.encode("ascii")
res[1::2] = l.encode("ascii")
res.decode("ascii")
but by then you've already lost the gains over list slicing for small strings (it's still 20x the speed for long strings) and this doesn't even work for non-ASCII characters yet.
FWIW, if you are doing this on massive strings and need every cycle, and for some reason have to use Python strings... here's how to do it:
res = bytearray(len(u) * 4 * 2)
u_utf32 = u.encode("utf_32_be")
res[0::8] = u_utf32[0::4]
res[1::8] = u_utf32[1::4]
res[2::8] = u_utf32[2::4]
res[3::8] = u_utf32[3::4]
l_utf32 = l.encode("utf_32_be")
res[4::8] = l_utf32[0::4]
res[5::8] = l_utf32[1::4]
res[6::8] = l_utf32[2::4]
res[7::8] = l_utf32[3::4]
res.decode("utf_32_be")
Special-casing the common case of smaller types will help too. FWIW, this is only 3x the speed of list slicing for long strings and a factor of 4 to 5 slower for small strings.
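One way to read "special-casing the common case of smaller types" is to try a 1-byte encoding first and fall back to UTF-32 only when some character does not fit. Below is a minimal sketch of that idea. It assumes both strings have the same length. interleave_fast is a hypothetical name. Latin-1 is chosen because it maps code points U+0000 to U+00FF one-to-one onto single bytes. No timings were measured for this variant:

```python
def interleave_fast(u, l):
    """Hypothetical helper: interleave two equal-length str objects via bytearray."""
    if len(u) != len(l):
        raise ValueError("this sketch assumes equal-length strings")

    try:
        # Fast path: every character fits in one Latin-1 byte
        ub = u.encode("latin_1")
        lb = l.encode("latin_1")
    except UnicodeEncodeError:
        pass
    else:
        res = bytearray(len(u) * 2)
        res[::2] = ub
        res[1::2] = lb
        return res.decode("latin_1")

    # General path: 4 bytes per character in big-endian UTF-32 (no BOM)
    ub = u.encode("utf_32_be")
    lb = l.encode("utf_32_be")
    res = bytearray(len(u) * 8)
    for k in range(4):
        # byte k of each u character goes to offset k of each 8-byte slot,
        # byte k of each l character goes to offset 4 + k
        res[k::8] = ub[k::4]
        res[4 + k::8] = lb[k::4]
    return res.decode("utf_32_be")
```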
Either way I prefer the join solutions, but since timings were mentioned elsewhere I thought I might as well join in.
Solution 5
If you want the fastest way, you can combine itertools with operator.add:
In [36]: from operator import add
In [37]: from itertools import starmap, izip
In [38]: timeit "".join([i + j for i, j in izip(l1, l2)])
1 loops, best of 3: 142 ms per loop
In [39]: timeit "".join(starmap(add, izip(l1,l2)))
1 loops, best of 3: 117 ms per loop
In [40]: timeit "".join(["".join(item) for item in zip(l1, l2)])
1 loops, best of 3: 196 ms per loop
In [41]: "".join(starmap(add, izip(l1,l2))) == "".join([i + j for i, j in izip(l1, l2)]) == "".join(["".join(item) for item in izip(l1, l2)])
Out[41]: True
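izip only exists in Python 2. In Python 3 the built-in zip is already lazy. Here is a Python 3 sketch of the same operator.add idea. It includes the map(add, a, b) form suggested in the comments, where map passes one item from each iterable to add, so neither starmap nor zip is needed:

```python
from itertools import starmap
from operator import add

u = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
l = 'abcdefghijklmnopqrstuvwxyz'

# Python 3: zip is lazy, so it plays the role of Python 2's izip
via_starmap = "".join(starmap(add, zip(u, l)))

# map with two iterables calls add(u[k], l[k]) for each position k
via_map = "".join(map(add, u, l))

print(via_map)  # AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz
```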
But combining izip and chain.from_iterable is faster still:
In [2]: from itertools import chain, izip
In [3]: timeit "".join(chain.from_iterable(izip(l1, l2)))
10 loops, best of 3: 98.7 ms per loop
There is also a substantial difference between chain(*...) and chain.from_iterable(...):
In [5]: timeit "".join(chain(*izip(l1, l2)))
1 loops, best of 3: 212 ms per loop
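A plausible explanation is argument unpacking. chain(*izip(l1, l2)) has to exhaust the iterator and pack every pair into a tuple of positional arguments before chain even starts. chain.from_iterable instead pulls the pairs lazily, one at a time. Here is a Python 3 sketch of both forms, with zip replacing izip:

```python
from itertools import chain

u = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
l = 'abcdefghijklmnopqrstuvwxyz'

# chain.from_iterable consumes the ('A', 'a'), ('B', 'b'), ... pairs lazily
lazy = "".join(chain.from_iterable(zip(u, l)))

# chain(*...) first unpacks every pair into chain's argument tuple,
# so the whole zip is materialized before any flattening happens
eager = "".join(chain(*zip(u, l)))

print(lazy)  # AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz
```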
There is no such thing as a generator with join: passing one is always going to be slower than passing a list, because Python will first build a list from its contents. It does two passes over the data, one to figure out the size needed and one to actually do the join, which would not be possible with a generator:
/* Here is the general case. Do a pre-pass to figure out the total
* amount of space we'll need (sz), and see whether all arguments are
* bytes-like.
*/
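One rough way to check this yourself in Python 3 is to time the same i + j join, once with a list comprehension and once with a generator expression. This is only a sketch, and the numbers will vary by machine and version:

```python
import timeit

l1 = 'A' * 1000000
l2 = 'a' * 1000000

def with_list():
    # join receives a ready-made list of two-character strings
    return "".join([i + j for i, j in zip(l1, l2)])

def with_generator():
    # In CPython, join turns the generator into a sequence before joining,
    # so the generator only adds pause/resume overhead here
    return "".join(i + j for i, j in zip(l1, l2))

assert with_list() == with_generator()

for f in (with_list, with_generator):
    best = min(timeit.repeat(f, number=1, repeat=3))
    print(f"{f.__name__}: {best * 1000:.1f} ms")
```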
Also, if you have strings of different lengths and don't want to lose data, you can use izip_longest:
In [22]: from itertools import izip_longest
In [23]: a,b = "hlo","elworld"
In [24]: "".join(chain.from_iterable(izip_longest(a, b,fillvalue="")))
Out[24]: 'helloworld'
For Python 3 it is called zip_longest.
But for Python 2, Veedrac's suggestion is by far the fastest:
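Here is the same izip_longest example as a Python 3 sketch, with the itertools names swapped for their Python 3 equivalents:

```python
from itertools import chain, zip_longest

a, b = "hlo", "elworld"

# fillvalue="" pads the shorter string with empty strings, which join ignores
res = "".join(chain.from_iterable(zip_longest(a, b, fillvalue="")))
print(res)  # helloworld
```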
In [18]: %%timeit
res = bytearray(len(u) * 2)
res[::2] = u
res[1::2] = l
str(res)
....:
100 loops, best of 3: 2.68 ms per loop
Brandon Deo
Updated on June 08, 2022
Comments
-
Brandon Deo almost 2 years
What's the most pythonic way to mesh two strings together?
For example:
Input:
u = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
l = 'abcdefghijklmnopqrstuvwxyz'
Output:
'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'
-
SuperBiasedMan over 8 years
Answers here have largely assumed that your two input strings will be the same length. Is that a safe assumption, or do you need that to be handled?
-
Brandon Deo over 8 years
@SuperBiasedMan It may be helpful to see how to handle all conditions if you have a solution. It's relevant to the question, but not my case specifically.
-
SuperBiasedMan over 8 years
@drexx The top answerer commented with a solution for it anyway, so I just edited it into their post so it's comprehensive.
-
Blender over 8 years
Or ''.join(itertools.chain.from_iterable(zip(u, l)))
-
TigerhawkT3 over 8 years
Coding effort for n strings is O(n), though. Still, it's good as long as n is small.
-
Padraic Cunningham over 8 years
Your generator is probably causing more overhead than the join.
-
Padraic Cunningham over 8 years
Run "".join([i + j for i, j in zip(l1, l2)]) and it will definitely be the fastest.
-
Copperfield over 8 years
Why list?? It is unneeded.
-
Copperfield over 8 years
Not according to my tests: you lose time building the intermediate list, and that defeats the purpose of using iterators. Timing "".join(list(...)) gives me 6.715280318699769, and timing "".join(starmap(...)) gives me 6.46332361384313.
-
SuperBiasedMan over 8 years
This will truncate a list if one is shorter than the other, as zip stops when the shorter list has been fully iterated over.
-
TigerhawkT3 over 8 years
@SuperBiasedMan - Yep. itertools.zip_longest can be used if it becomes an issue.
-
Copperfield over 8 years
Then what, is it machine dependent?? No matter where I run the test, I get exactly the same result: "".join(list(starmap(add, izip(l1,l2)))) is slower than "".join(starmap(add, izip(l1,l2))). I ran the test on my machine in Python 2.7.11 and in Python 3.5.1, and even in the virtual console on www.python.org with Python 3.4.3; all of them say the same, and I ran it a couple of times with the same result every time.
-
Copperfield over 8 years
I read it, and what I see is that it always builds a list internally, in its buffers variable, regardless of what you pass to it, so all the more reason NOT to give it a list.
-
Aleksi Torhamo over 8 years
"".join(map("".join, zip(l1, l2))) is even faster, although not necessarily more pythonic.
-
Padraic Cunningham over 8 years
@Copperfield, are you talking about the list call or passing a list?
-
Copperfield over 8 years
The list call, in list(starmap(...)) vs starmap(...), or similar with any of the itertools functions. For passing a list vs passing a generator, like join([a+b for ...]) vs join(a+b for ...), my tests agree with yours.
-
Veedrac over 8 years
There's always map(add, l1, l2) for prettiness. It seems to be slower than starmap, though. That said, I can't repro the list comprehension being slower than starmap.
-
Veedrac over 8 years
@PadraicCunningham wrt. list(...) being slower, manually calling list won't make things faster. The only reason "".join([x for x in y]) is recommended over "".join(x for x in y) is that the latter creates a generator, which has pause-resume overhead. Doing "".join(list(x for x in y)) wouldn't help things.
-
Padraic Cunningham over 8 years
@Veedrac, I thought they were talking about a list vs a generator. The list call is not needed, but it adds only about 1 percent overhead, so it does not have much of a bearing in either case. The only thing that makes a significant difference is using a generator vs a list comprehension.
-
Curt over 8 years
He said most Pythonic, not most Haskellic ;)
-
scnerd over 8 years
Still not as fast as the fastest answer, though, which got 50.3 ms on this same data and computer.
-
jfs about 6 years
You don't need starmap here: ''.join(map(add, a, b))
-
Kelly Bundy almost 2 years
This builds the list [''] * len(u) and then throws it away. Better to do [''] * (len(u) * 2).
-
Kelly Bundy almost 2 years
Makes the solution ~10% faster in my test.