Element-wise string concatenation in numpy
Solution 1
This can be done using numpy.core.defchararray.add (also reachable through the alias np.char.add). Here is an example:
>>> import numpy as np
>>> a1 = np.array(['a', 'b'])
>>> a2 = np.array(['E', 'F'])
>>> np.core.defchararray.add(a1, a2)
array(['aE', 'bF'], dtype='<U2')
There are other useful string operations available for NumPy data types.
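As a minimal sketch of the same call through the np.char alias, the example below repeats the concatenation, adds a delimiter variant, and shows a broadcast 2-D case. The helper name join_with is hypothetical and is not part of NumPy.

```python
import numpy as np

def join_with(sep, left, right):
    """Element-wise left + sep + right (hypothetical helper, not a NumPy function)."""
    # np.char.add broadcasts its arguments, so the scalar separator
    # is combined with every element of `left`.
    return np.char.add(np.char.add(left, sep), right)

a1 = np.array(['a', 'b'])
a2 = np.array(['E', 'F'])

plain = np.char.add(a1, a2)        # element-wise concatenation
hashed = join_with('#', a1, a2)    # concatenation with a delimiter

# Broadcasting: a (2, 1) column against a (2,) row gives a (2, 2) result.
grid = np.char.add(a1.reshape(2, 1), a2)

print(plain.tolist())    # ['aE', 'bF']
print(hashed.tolist())   # ['a#E', 'b#F']
print(grid.tolist())     # [['aE', 'aF'], ['bE', 'bF']]
```

The broadcast case is where a NumPy routine tends to be more convenient than a flat zip over two lists.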
Solution 2
You can use the chararray subclass to perform array operations with strings:
a1 = np.char.array(['a', 'b'])
a2 = np.char.array(['E', 'F'])
a1 + a2
#chararray(['aE', 'bF'], dtype='|S2')
Another nice example, repeating each string a given number of times:
b = np.array([2, 4])
a1*b
#chararray(['aa', 'bbbb'], dtype='|S4')
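If you would rather keep plain ndarrays than create chararray objects, the repetition above can probably be written with the function form np.char.multiply. This is a sketch; the variable names are illustrative only.

```python
import numpy as np

a1 = np.array(['a', 'b'])
b = np.array([2, 4])

# Repeat each string element-wise: 'a' twice, 'b' four times.
repeated = np.char.multiply(a1, b)

print(repeated.tolist())   # ['aa', 'bbbb']
```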
Solution 3
This can (and should) be done in pure Python, as numpy also uses the Python string manipulation functions internally:
>>> a1 = ['a','b']
>>> a2 = ['E','F']
>>> list(map(''.join, zip(a1, a2)))
['aE', 'bF']
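A short Python 3 sketch of the pure-Python route: map() returns a lazy iterator there, so the result has to be materialised. The list-comprehension variant with a '#' separator is an illustration, not part of the original answer.

```python
a1 = ['a', 'b']
a2 = ['E', 'F']

# In Python 3, map() returns a lazy iterator, so wrap it in list() to see the values.
joined = list(map(''.join, zip(a1, a2)))

# An equivalent list comprehension; using '#' instead of '' inserts a delimiter.
delimited = ['#'.join(pair) for pair in zip(a1, a2)]

print(joined)      # ['aE', 'bF']
print(delimited)   # ['a#E', 'b#F']
```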
Solution 4
Another solution is to convert the string arrays into arrays of Python objects, so that str.__add__ is called for each element:
>>> import numpy as np
>>> a = np.array(['a', 'b', 'c', 'd'], dtype=object)
>>> a + a
array(['aa', 'bb', 'cc', 'dd'], dtype=object)
This is not that slow (less than twice as slow as adding integer arrays).
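When the inputs are already fixed-width string arrays, the same idea can be sketched with astype: cast to object, add, and cast back if a string dtype is needed afterwards. The round trip back to str is an assumption about what downstream code wants, not something the answer requires.

```python
import numpy as np

a1 = np.array(['a', 'b'])
a2 = np.array(['E', 'F'])

# Cast to object dtype so that + dispatches to Python's str.__add__ per element.
joined_obj = a1.astype(object) + a2.astype(object)

# Cast back to a unicode string array if a string dtype is needed downstream.
joined_str = joined_obj.astype(str)

print(joined_obj.tolist())   # ['aE', 'bF']
print(joined_str.dtype.kind) # 'U'
```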
Solution 5
One more basic, elegant and fast solution:
In [11]: np.array([x1 + x2 for x1,x2 in zip(a1,a2)])
Out[11]: array(['aE', 'bF'], dtype='<U2')
It is very fast for smaller arrays.
In [12]: %timeit np.array([x1 + x2 for x1,x2 in zip(a1,a2)])
3.67 µs ± 136 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [13]: %timeit np.core.defchararray.add(a1, a2)
6.27 µs ± 28.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [14]: %timeit np.char.array(a1) + np.char.array(a2)
22.1 µs ± 319 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
For larger arrays, the time difference between the three approaches is small.
In [15]: b1 = np.full(10000,'a')
In [16]: b2 = np.full(10000,'b')
In [189]: %timeit np.array([x1 + x2 for x1,x2 in zip(b1,b2)])
6.74 ms ± 66.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [188]: %timeit np.core.defchararray.add(b1, b2)
7.03 ms ± 419 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [187]: %timeit np.char.array(b1) + np.char.array(b2)
6.97 ms ± 284 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
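To rerun this comparison outside IPython, here is a rough sketch using the standard timeit module. It first checks that the approaches agree and then times each one. The repeat counts are arbitrary choices, and the numbers will differ by machine and NumPy version.

```python
import timeit
import numpy as np

b1 = np.full(10000, 'a')
b2 = np.full(10000, 'b')

candidates = {
    'list comprehension': lambda: np.array([x1 + x2 for x1, x2 in zip(b1, b2)]),
    'np.char.add': lambda: np.char.add(b1, b2),
    'object dtype': lambda: b1.astype(object) + b2.astype(object),
}

# Check that every approach produces the same strings before timing it.
reference = np.char.add(b1, b2).tolist()
for name, fn in candidates.items():
    assert fn().tolist() == reference, name

timings = {}
for name, fn in candidates.items():
    # Best of 3 repeats of 5 calls each, reported per call in milliseconds.
    best = min(timeit.repeat(fn, number=5, repeat=3)) / 5
    timings[name] = best * 1e3
    print(f'{name:20s} {timings[name]:.3f} ms per call')
```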
Dave31415
Updated on July 09, 2022
Comments
-
Dave31415 almost 2 years
Is this a bug?
import numpy as np
a1=np.array(['a','b'])
a2=np.array(['E','F'])
In [20]: add(a1,a2)
Out[20]: NotImplemented
I am trying to do element-wise string concatenation. I thought Add() was the way to do it in numpy but obviously it is not working as expected.
-
Keith about 12 years
As the name implies, numpy is for numbers. Python itself has pretty good string operations. Why not just use that? "".join(["a", "b"]) works fine.
-
Dave31415 about 12 years
I was looking at this: docs.scipy.org/doc/numpy/reference/routines.char.html
-
Keith about 12 years
That's cool. But: "All of them are based on the string methods in the Python standard library." So if you just use the standard library, you can write code that doesn't depend on numpy.
-
gypaetus over 8 years
The add operation does not do the same thing as join. numpy's add can be useful for multidimensional arrays or nested lists.
-
Eric over 7 years
Where did add come from?
-
Dave31415 about 12 years
Ok, so the add function I was using is not at the top level in numpy. Is either of those faster, better, or preferred for any reason?
-
apdnu about 11 years
This doesn't answer the question. There are times when one might want to do this in numpy, e.g. when working with large arrays of strings. The original poster gave a simple example for which one would use pure Python, but was asking for a numpy solution.
-
Francesco Montesano over 10 years
The add string operation you link to gives a NotImplemented (as in the question) for numpy 1.6.1 under Python 3.2. Do you know from which version it is implemented?
-
Mike T over 10 years
@FrancescoMontesano Checking with that version combination on Ubuntu 12.04.2 LTS, the example in my answer works as expected. Generally speaking, using np.add also raises NotImplemented with any version. Ensure you are using np.core.defchararray.add.
-
Francesco Montesano over 10 years
Now I've seen the full signature of add in the docs (I missed that before). Anyway, it would be nice if numpy wrapped np.core.defchararray.* into the corresponding numeric ndarray operations. I think it's much neater and easier to remember to do np.add.
-
Niklas B. almost 10 years
@Thucydides411 From what I understood at the time of writing my answer, numpy just used the builtin Python primitives, so I didn't see what advantage that would have. I'm not sure whether that is true; it seems that it is not. Maybe I misinterpreted the statement "All of them are based on the string methods in the Python standard library." in the docs.
-
jdehesa almost 7 years
As noted in the docstring of the module, "the preferred alias for defchararray is numpy.char", so you can just say np.char.add.
-
PanwarS87 over 6 years
@MikeT: Is it possible to define a delimiter to get an output like array(['a#E', 'b#F'])? By the way, thank you for the above solution. I can do it using map('#'.join, zip(a1, a2)), but I'm curious whether it is possible with numpy.
-
PanwarS87 over 6 years
@NiklasB. Thank you, Nick. I was looking for the exact same thing. Just curious how I can implement the same using numpy. I will dig into the numpy docs.