Element-wise string concatenation in numpy


Solution 1

This can be done using numpy.core.defchararray.add, which is also exposed under the preferred alias numpy.char.add. Here is an example:

>>> import numpy as np
>>> a1 = np.array(['a', 'b'])
>>> a2 = np.array(['E', 'F'])
>>> np.core.defchararray.add(a1, a2)
array(['aE', 'bF'], 
      dtype='<U2')

There are other useful string operations available for NumPy data types.
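
As a minimal sketch of the same operation through the np.char alias: a scalar string broadcasts against the array, so chaining two calls is one way to insert a delimiter. The '#' delimiter and the variable names joined and delimited are only illustrative.

```python
import numpy as np

a1 = np.array(['a', 'b'])
a2 = np.array(['E', 'F'])

# Element-wise concatenation via the np.char alias.
joined = np.char.add(a1, a2)

# A scalar string broadcasts against the array, so a delimiter
# can be inserted by chaining two calls.
delimited = np.char.add(np.char.add(a1, '#'), a2)

print(joined)     # ['aE' 'bF']
print(delimited)  # ['a#E' 'b#F']
```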

Solution 2

You can use the chararray subclass to perform array operations with strings:

a1 = np.char.array(['a', 'b'])
a2 = np.char.array(['E', 'F'])

a1 + a2
#chararray(['aE', 'bF'], dtype='|S2')

Another nice example:

b = np.array([2, 4])
a1*b
#chararray(['aa', 'bbbb'], dtype='|S4')
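
If you would rather keep plain ndarrays than convert to chararray, the repetition above can probably be written with the function form np.char.multiply instead. This is a small sketch, not a claim about which form is preferable; the name repeated is illustrative.

```python
import numpy as np

a1 = np.array(['a', 'b'])
b = np.array([2, 4])

# Repeat each string element-wise; the counterpart of chararray * b.
repeated = np.char.multiply(a1, b)

print(repeated)  # ['aa' 'bbbb']
```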

Solution 3

This can (and should) be done in pure Python, as numpy also uses the Python string manipulation functions internally:

>>> a1 = ['a','b']
>>> a2 = ['E','F']
>>> list(map(''.join, zip(a1, a2)))  # list() is needed on Python 3, where map is lazy
['aE', 'bF']
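
A short sketch of the pure-Python approach written as a script, including a variant with a delimiter. The '#' delimiter and the names plain and delimited are illustrative and not part of the original answer.

```python
a1 = ['a', 'b']
a2 = ['E', 'F']

# Pair the elements up and join each pair into one string.
plain = [''.join(pair) for pair in zip(a1, a2)]

# Same idea with a delimiter between the two parts.
delimited = ['#'.join(pair) for pair in zip(a1, a2)]

print(plain)      # ['aE', 'bF']
print(delimited)  # ['a#E', 'b#F']
```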

Solution 4

Another solution is to convert the string arrays into arrays of Python objects so that str.__add__ is called:

>>> import numpy as np
>>> a = np.array(['a', 'b', 'c', 'd'], dtype=object)  # np.object was removed in newer NumPy
>>> a + a
array(['aa', 'bb', 'cc', 'dd'], dtype=object)

This is not that slow (less than twice as slow as adding integer arrays).
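
When the inputs are already fixed-width string arrays, the object-dtype trick can be sketched as a round trip: cast to object, add, then cast back if a string dtype is needed afterwards. The names as_objects and as_strings are illustrative.

```python
import numpy as np

a1 = np.array(['a', 'b'])
a2 = np.array(['E', 'F'])

# Casting to object makes + dispatch to Python's str concatenation.
as_objects = a1.astype(object) + a2.astype(object)

# Cast back to a fixed-width unicode dtype if one is needed downstream.
as_strings = as_objects.astype(str)

print(as_objects)  # ['aE' 'bF']
print(as_strings.dtype.kind)  # U
```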

Solution 5

One more basic, elegant and fast solution:

In [11]: np.array([x1 + x2 for x1,x2 in zip(a1,a2)])
Out[11]: array(['aE', 'bF'], dtype='<U2')

It is very fast for smaller arrays.

In [12]: %timeit np.array([x1 + x2 for x1,x2 in zip(a1,a2)])
3.67 µs ± 136 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [13]: %timeit np.core.defchararray.add(a1, a2)
6.27 µs ± 28.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [14]: %timeit np.char.array(a1) + np.char.array(a2)
22.1 µs ± 319 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

For larger arrays, the three approaches take about the same time.

In [15]: b1 = np.full(10000, 'a')
In [16]: b2 = np.full(10000, 'b')

In [189]: %timeit np.array([x1 + x2 for x1,x2 in zip(b1,b2)])
6.74 ms ± 66.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [188]: %timeit np.core.defchararray.add(b1, b2)
7.03 ms ± 419 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [187]: %timeit np.char.array(b1) + np.char.array(b2)
6.97 ms ± 284 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
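
To repeat this comparison outside IPython, a rough harness along the following lines might help. It is only a sketch: the candidates dictionary, the repeat counts, and the inclusion of the object-dtype variant are choices made here, and the timings will differ between machines and NumPy versions.

```python
import timeit
import numpy as np

b1 = np.full(10000, 'a')
b2 = np.full(10000, 'b')

# Hypothetical set of candidates; each callable returns the concatenated array.
candidates = {
    'list comprehension': lambda: np.array([x1 + x2 for x1, x2 in zip(b1, b2)]),
    'np.char.add': lambda: np.char.add(b1, b2),
    'object dtype': lambda: b1.astype(object) + b2.astype(object),
}

for name, func in candidates.items():
    # Best of 5 repeats, 10 calls each, reported per call.
    seconds = min(timeit.repeat(func, number=10, repeat=5)) / 10
    print(f'{name}: {seconds * 1e3:.3f} ms per call')
```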
Author by

Dave31415

Updated on July 09, 2022

Comments

  • Dave31415
    Dave31415 almost 2 years

    Is this a bug?

    import numpy as np
    a1=np.array(['a','b'])
    a2=np.array(['E','F'])
    
    In [20]: add(a1,a2)
    Out[20]: NotImplemented
    

    I am trying to do element-wise string concatenation. I thought Add() was the way to do it in numpy but obviously it is not working as expected.

    • Keith
      Keith about 12 years
      As the name implies, numpy is for numbers. Python itself has pretty good string operations. Why not just use that? "".join(["a", "b"]) works fine.
    • Dave31415
      Dave31415 about 12 years
    • Keith
      Keith about 12 years
      That's cool. But: "All of them are based on the string methods in the Python standard library.". So if you just use the standard library you can write code that doesn't depend on numpy.
    • gypaetus
      gypaetus over 8 years
      The add operation does not do the same thing as join. numpy's add can be useful for multidimensional arrays or nested lists.
    • Eric
      Eric over 7 years
      Where did add come from?
  • Dave31415
    Dave31415 about 12 years
    Ok, so the add function I was using is not at top level in numpy. Is either of those faster/better or preferred for any reason?
  • apdnu
    apdnu about 11 years
    This doesn't answer the question. There are times when one might want to do this in numpy, e.g. when working with large arrays of strings. The original poster gave a simple example for which one would use pure Python, but was asking for a numpy solution.
  • Francesco Montesano
    Francesco Montesano over 10 years
    The add string operation you link to gives NotImplemented (as in the question) for numpy 1.6.1 under Python 3.2. Do you know from which version it is implemented?
  • Mike T
    Mike T over 10 years
    @FrancescoMontesano checking with that version combination on Ubuntu 12.04.2 LTS, the example in my answer works as expected. Generally speaking, using np.add also raises NotImplemented with any version. Ensure you are using np.core.defchararray.add.
  • Francesco Montesano
    Francesco Montesano over 10 years
    Now I've seen the full signature of add in the docs (I missed that before). Anyway, it would be nice if numpy wrapped np.core.defchararray.* into the corresponding numeric ndarray operations. I think it's much neater and easier to remember to do np.add.
  • Niklas B.
    Niklas B. almost 10 years
    @Thucydides411 From what I understood at the time of writing my answer, numpy just used the builtin Python primitives, so I didn't see what advantage that would have. Not sure whether that is true, it seems like it is not. Maybe I misinterpreted the statement "All of them are based on the string methods in the Python standard library." in the docs
  • jdehesa
    jdehesa almost 7 years
    As noted in the docstring of the module, "the preferred alias for defchararray is numpy.char", so you can just say np.char.add.
  • PanwarS87
    PanwarS87 over 6 years
    @MikeT: Is it possible to define a delimiter to get an output like array(['a#E', 'b#F'])? By the way, thank you for the above solution. I can do it using map('#'.join, zip(a1, a2)), but I am curious whether it is possible with numpy.
  • PanwarS87
    PanwarS87 over 6 years
    @NiklasB. Thank you, Nick. I was looking for the exact same thing. Just curious how I can implement the same using numpy. I will dig numpy docs.