Element-wise string concatenation in numpy

python arrays string numpy elementwise-operations

64,925

Solution 1

This can be done using numpy.core.defchararray.add, which is also exposed under the preferred alias numpy.char.add. Here is an example:

>>> import numpy as np
>>> a1 = np.array(['a', 'b'])
>>> a2 = np.array(['E', 'F'])
>>> np.core.defchararray.add(a1, a2)
array(['aE', 'bF'], 
      dtype='<U2')

There are other useful string operations available for NumPy data types.
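As a rough sketch of what those other operations look like, the snippet below goes through the `np.char` alias and applies two more routines from the same module. The variable names (`joined`, `shouted`, `lengths`) are made up for illustration.

```python
import numpy as np

a1 = np.array(['a', 'b'])
a2 = np.array(['E', 'F'])

# Element-wise concatenation through the np.char alias
joined = np.char.add(a1, a2)

# Two other element-wise string routines from the same module
shouted = np.char.upper(joined)    # upper-case every element
lengths = np.char.str_len(joined)  # length of every element

print(joined)
print(shouted)
print(lengths)
```

All three calls return plain ndarrays, so the results can be indexed and reshaped like any other array.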

Solution 2

You can use the chararray subclass to perform array operations with strings:

a1 = np.char.array(['a', 'b'])
a2 = np.char.array(['E', 'F'])

a1 + a2
#chararray(['aE', 'bF'], dtype='|S2')

Another nice example:

b = np.array([2, 4])
a1*b
#chararray(['aa', 'bbbb'], dtype='|S4')
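If you would rather stay with plain ndarrays than the chararray subclass, the same two operations are available as functions. This is a minimal sketch, assuming the `a1`, `a2` and `b` values from the example above; `summed` and `repeated` are illustrative names.

```python
import numpy as np

a1 = np.array(['a', 'b'])
a2 = np.array(['E', 'F'])
b = np.array([2, 4])

# Function equivalent of a1 + a2 on chararrays
summed = np.char.add(a1, a2)

# Function equivalent of a1 * b on chararrays: repeat each string b[i] times
repeated = np.char.multiply(a1, b)

print(summed)
print(repeated)
```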

Solution 3

This can (and should) be done in pure Python, as numpy also uses the Python string manipulation functions internally:

>>> a1 = ['a','b']
>>> a2 = ['E','F']
>>> list(map(''.join, zip(a1, a2)))  # list() is needed on Python 3, where map is lazy
['aE', 'bF']

Solution 4

Another solution is to convert the string arrays into arrays of Python objects, so that str.__add__ is called for each pair of elements:

>>> import numpy as np
>>> a = np.array(['a', 'b', 'c', 'd'], dtype=object)  # np.object was removed in newer NumPy
>>> a + a
array(['aa', 'bb', 'cc', 'dd'], dtype=object)

This is not that slow (less than twice as slow as adding integer arrays).
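When the inputs are already fixed-width string arrays, one way to apply this idea is to cast to object, add, and cast back. The following is only a sketch of that round trip; `as_objects` and `as_strings` are names chosen for illustration.

```python
import numpy as np

a1 = np.array(['a', 'b'])
a2 = np.array(['E', 'F'])

# Cast to object so that + dispatches to each element's str.__add__
as_objects = a1.astype(object) + a2.astype(object)

# Cast back to a fixed-width unicode array if one is needed downstream
as_strings = as_objects.astype(str)

print(as_objects.dtype, as_strings.dtype)
```

The cast back is optional; skip it if the rest of your code is happy with an object array.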

Solution 5

One more basic, elegant and fast solution:

In [11]: np.array([x1 + x2 for x1,x2 in zip(a1,a2)])
Out[11]: array(['aE', 'bF'], dtype='<U2')

It is very fast for smaller arrays.

In [12]: %timeit np.array([x1 + x2 for x1,x2 in zip(a1,a2)])
3.67 µs ± 136 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [13]: %timeit np.core.defchararray.add(a1, a2)
6.27 µs ± 28.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [14]: %timeit np.char.array(a1) + np.char.array(a2)
22.1 µs ± 319 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

For larger arrays, the difference in timing is small.

In [15]: b1 = np.full(10000,'a')    
In [16]: b2 = np.full(10000,'b')    

In [189]: %timeit np.array([x1 + x2 for x1,x2 in zip(b1,b2)])
6.74 ms ± 66.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [188]: %timeit np.core.defchararray.add(b1, b2)
7.03 ms ± 419 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [187]: %timeit np.char.array(b1) + np.char.array(b2)
6.97 ms ± 284 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
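To rerun this kind of comparison outside IPython, something like the following sketch with the standard `timeit` module should work. It covers two of the three variants (a chararray entry can be added to the dictionary in the same way), and the `candidates` dictionary and the repeat count of 20 are arbitrary choices for illustration. Timings will differ by machine and NumPy version.

```python
import timeit

import numpy as np

b1 = np.full(10000, 'a')
b2 = np.full(10000, 'b')

# Hypothetical registry of the variants to compare
candidates = {
    "list comprehension": lambda: np.array([x1 + x2 for x1, x2 in zip(b1, b2)]),
    "np.char.add": lambda: np.char.add(b1, b2),
}

reference = candidates["np.char.add"]()
for name, func in candidates.items():
    # Check that each variant produces the same result before timing it
    assert np.array_equal(func(), reference)
    seconds = timeit.timeit(func, number=20) / 20
    print(f"{name}: {seconds * 1e3:.2f} ms per call")
```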


Author by

Dave31415

Updated on July 09, 2022

Comments

Dave31415 almost 2 years
Is this a bug?
```
import numpy as np
a1=np.array(['a','b'])
a2=np.array(['E','F'])

In [20]: add(a1,a2)
Out[20]: NotImplemented
```
I am trying to do element-wise string concatenation. I thought Add() was the way to do it in numpy but obviously it is not working as expected.
- Keith about 12 years
  
  As the name implies, numpy is for numbers. Python itself has pretty good string operations. Why not just use that? "".join(["a", "b"]) works fine.
- Dave31415 about 12 years
  
  I was looking at this docs.scipy.org/doc/numpy/reference/routines.char.html
- Keith about 12 years
  
  That's cool. But: "All of them are based on the string methods in the Python standard library.". So if you just use the standard library you can write code that doesn't depend on numpy.
- gypaetus over 8 years
  
  The add operation does not do the same thing as join. numpy's add can be useful for multidimensional arrays or nested lists.
- Eric over 7 years
  
  Where did add come from?
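The point made in the comments about multidimensional arrays can be illustrated with a small sketch. The 2-D `grid` and the `suffix` row are invented values; the example shows element-wise addition with broadcasting, which `"".join` does not do.

```python
import numpy as np

grid = np.array([['a', 'b'],
                 ['c', 'd']])
suffix = np.array(['1', '2'])

# Element-wise add: suffix is broadcast across each row of grid
labelled = np.char.add(grid, suffix)

# str.join, by contrast, collapses a sequence into one string
collapsed = "".join(['a', 'b'])

print(labelled)
print(collapsed)
```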
Dave31415 about 12 years

Ok, so the add function I was using is not at top level in numpy. Is either of those faster/better or preferred for any reason?
apdnu about 11 years

This doesn't answer the question. There are times when one might want to do this in numpy, e.g. when working with large arrays of strings. The original poster gave a simple example for which one would use pure Python, but was asking for a numpy solution.
Francesco Montesano over 10 years

The add string operation you link to gives NotImplemented (as in the question) for numpy 1.6.1 under python 3.2. Do you know from which version it is implemented?
Mike T over 10 years

@FrancescoMontesano checking with that version combination on Ubuntu 12.04.2 LTS, the example in my answer works as expected. Generally speaking, using np.add also raises NotImplemented with any version. Ensure you are using np.core.defchararray.add.
Francesco Montesano over 10 years

Now I've seen the full signature of add in the docs (I missed that before). Anyway, it would be nice if numpy wrapped np.core.defchararray.* into the corresponding numeric ndarray operations. I think it's much neater and easier to remember to do np.add.
Niklas B. almost 10 years

@Thucydides411 From what I understood at the time of writing my answer, numpy just used the builtin Python primitives, so I didn't see what advantage that would have. Not sure whether that is true, it seems like it is not. Maybe I misinterpreted the statement "All of them are based on the string methods in the Python standard library." in the docs
jdehesa almost 7 years

As noted in the docstring of the module, "the preferred alias for defchararray is numpy.char", so you can just say np.char.add.
PanwarS87 over 6 years

@MikeT: Is it possible to define a delimiter to get an output like array(['a#E', 'b#F'])? By the way, thank you for the above solution. I can do it with map('#'.join, zip(a1, a2)), but I'm curious whether it is possible with numpy.
PanwarS87 over 6 years

@NiklasB. Thank you, Nick. I was looking for the exact same thing. Just curious how I can implement the same using numpy. I will dig numpy docs.
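One possible numpy answer to the delimiter question is to nest two add calls, relying on the scalar separator being broadcast against the array. This is a sketch under that assumption, not necessarily the fastest approach; `sep` and `with_sep` are illustrative names.

```python
import numpy as np

a1 = np.array(['a', 'b'])
a2 = np.array(['E', 'F'])
sep = '#'

# First append the separator to every element of a1, then append a2
with_sep = np.char.add(np.char.add(a1, sep), a2)

print(with_sep)
```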