convert 2d numpy array to string

Solution

A one-liner will do:

b = '\n'.join('\t'.join('%0.3f' %x for x in y) for y in a)

Using a simpler example:

>>> a = np.arange(25, dtype=float).reshape(5, 5)
>>> a
array([[  0.,   1.,   2.,   3.,   4.],
       [  5.,   6.,   7.,   8.,   9.],
       [ 10.,  11.,  12.,  13.,  14.],
       [ 15.,  16.,  17.,  18.,  19.],
       [ 20.,  21.,  22.,  23.,  24.]])

This:

b = '\n'.join('\t'.join('%0.3f' %x for x in y) for y in a)
print(b)

prints this:

0.000   1.000   2.000   3.000   4.000
5.000   6.000   7.000   8.000   9.000
10.000  11.000  12.000  13.000  14.000
15.000  16.000  17.000  18.000  19.000
20.000  21.000  22.000  23.000  24.000

Explanation

You already used a list comprehension in your second method. Here we have a generator expression, which looks just like a list comprehension; the only syntactic difference is that the [] are replaced by (). A generator expression does not build a list but hands a so-called generator to join(). The result is the same, but the step of building the intermediate list in your own code is skipped.
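As a minimal sketch of that difference (the small 2x3 array and the names rows_list, b_list and b_gen are illustrative assumptions, not part of the original question), both forms give the same string:

```python
import numpy as np

# Small stand-in array, chosen only for illustration.
a = np.arange(6, dtype=float).reshape(2, 3)

# List comprehension: the list of row strings is built first, then joined.
rows_list = ['\t'.join('%0.3f' % x for x in y) for y in a]
b_list = '\n'.join(rows_list)

# Generator expression: join() pulls the row strings from the generator.
b_gen = '\n'.join('\t'.join('%0.3f' % x for x in y) for y in a)

print(b_list == b_gen)
```

Which form to use is mostly a matter of taste here; the generator expression just saves the extra brackets.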

Generator expressions can be nested: here the inner one formats the elements of one row, and the outer one iterates over the rows. This:

b = '\n'.join('\t'.join('%0.3f' %x for x in y) for y in a)

is equivalent to:

res = []
for y in a:
    res.append('\t'.join('%0.3f' %x for x in y))
b = '\n'.join(res)
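On how Python "knows" which dimension to iterate over: iterating over a 2D NumPy array walks its first axis, so each y is a row, and iterating over a row yields its elements. The following is a small sketch of that, with the array and the names elements, lines, fields, b_loops and b_one_liner chosen only for illustration:

```python
import numpy as np

a = np.arange(6, dtype=float).reshape(2, 3)

# Iterating over a 2D array walks the first axis: each y is one row.
for y in a:
    print(y.shape)

# Iterating over a row (a 1D array) yields its scalar elements.
elements = [float(x) for x in a[0]]
print(elements)

# Fully explicit nested loops that mirror the one-liner.
lines = []
for y in a:
    fields = []
    for x in y:
        fields.append('%0.3f' % x)
    lines.append('\t'.join(fields))
b_loops = '\n'.join(lines)

b_one_liner = '\n'.join('\t'.join('%0.3f' % x for x in y) for y in a)
print(b_loops == b_one_liner)
```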

Performance

I use %%timeit in the IPython Notebook:

%%timeit
b = '\n'.join('\t'.join('%0.3f' %x for x in y) for y in a)

10 loops, best of 3: 42.4 ms per loop


%%timeit
b=''
for i in range(0,a.shape[0]):
    for j in range(0,a.shape[1]-1):
        b+=str(a[i,j])+'\t'
    b+=str(a[i,-1])+'\n'

10 loops, best of 3: 50.2 ms per loop


%%timeit
b=''
for i in range(0,a.shape[0]):
    b+='\t'.join(['%0.3f' %x for x in a[i,:]])+'\n'

10 loops, best of 3: 43.8 ms per loop

Looks like they are all about the same speed. The += on strings is optimized in CPython; otherwise, it would be much slower than the join() approach. Other Python implementations such as Jython or PyPy can show much bigger time differences, with join() being much faster than +=.
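Outside the IPython Notebook, a comparable measurement can be sketched with the standard timeit module. The array size and the function names with_join and with_plus_equals are assumptions made for this example, so the numbers will not match the timings above:

```python
import timeit

import numpy as np

# Small stand-in array (an assumption); the timings above used a larger one.
a = np.arange(2000, dtype=float).reshape(500, 4)


def with_join(a):
    # Nested generator expressions, with a trailing newline added.
    return '\n'.join('\t'.join('%0.3f' % x for x in y) for y in a) + '\n'


def with_plus_equals(a):
    # Row-by-row string concatenation with +=.
    b = ''
    for i in range(a.shape[0]):
        b += '\t'.join(['%0.3f' % x for x in a[i, :]]) + '\n'
    return b


# Timing is only meaningful if both variants produce the same string.
same_output = with_join(a) == with_plus_equals(a)

for func in (with_join, with_plus_equals):
    # repeat() returns one total time per run; keep the best of 3 runs of 10 calls.
    best = min(timeit.repeat(lambda: func(a), number=10, repeat=3))
    print('%s: %.2f ms per call' % (func.__name__, best / 10 * 1e3))
```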

Author by

A B

Updated on June 23, 2022

Comments

  • A B
    A B almost 2 years

    I am new to Python and am trying to convert a 2d numpy array, like:

    a=numpy.array([[191.25,0,0,1],[191.251,0,0,1],[191.252,0,0,1]])
    

    to a string in which the column entries are separated by one delimiter, '\t', and the rows are separated by another delimiter, '\n', with control over the precision of each column, to get:

    b='191.250\t0.00\t0\t1\n191.251\t0.00\t0\t1\n191.252\t0.00\t0\t1\n'
    

    First, I create the array by:

    import numpy as np
    
    col1=np.arange(191.25,196.275,.001)[:, np.newaxis]
    nrows=col1.shape[0]
    
    # np.int was removed in NumPy 1.24; the built-in int works in all versions
    col2=np.zeros((nrows,1),dtype=int)
    col3=np.zeros((nrows,1),dtype=int)
    col4=np.ones((nrows,1),dtype=int)
    
    a=np.hstack((col1,col2,col3,col4))
    

    Then I produce b, by one of 2 methods:

    Method 1:

    b=''
    for i in range(0,a.shape[0]):
        for j in range(0,a.shape[1]-1):
            b+=str(a[i,j])+'\t'
        b+=str(a[i,-1])+'\n'
    b
    

    Method 2:

    b=''
    for i in range(0,a.shape[0]):
        b+='\t'.join(['%0.3f' %x for x in a[i,:]])+'\n'
    b
    

    However, I'm guessing there are better ways of producing a and b. I am looking for the most efficient ways (i.e. memory, time, code compactness) to create a and b.


    Follow up questions

    Thank you Mike,

    b = '\n'.join('\t'.join('%0.3f' %x for x in y) for y in a)+'\n'
    

    worked for me but I have a few follow up questions (this couldn't fit in the comment section):

    1. Though this is more compact, is the speed the same as executing a nested for loop, as this is what seems to be going on within the parentheses?
    2. I understand that x and y are iterators across the 2 dimensions of a; however, how does Python "know" they are, and which dimensions they are supposed to iterate across? In Matlab, for example, these things have to be explicitly stated.
    3. Is there a way to independently set the precision for each column (e.g. I'd like %0.3f for the first three columns and %0.0f for the last column)?
    4. Is there an easy way to do the reverse procedure, i.e. given b, produce a? I have come up with 2 methods:

    Method 1

    y=b.split('\n')[:-1]
    z=[y[i].split('\t') for i in range(0,len(y))]
    a=numpy.array(z,dtype=float)
    

    Method 2

    import re
    a=numpy.array(list(filter(None,re.split('[\n\t]+',b))),dtype=float).reshape(-1,4)
    

    Is there a better way?

  • A B
    A B over 8 years
    Hi Mike, thanks, that worked for me. I have a few follow up questions but was unable to fit them here so I have included them in an edit of my original question.
  • Mike Müller
    Mike Müller over 8 years
    @AB I added an explanation to your first two additional questions. You can accept an answer if it solves your problem. I think my answer does.
  • Mike Müller
    Mike Müller over 8 years
    @AB I recommend that you create two new questions, such as "Conditional formatting of array rows" for 3. and "How to make a NumPy array from a string?" for 4. Otherwise, this question becomes too crowded. Also, the answers should be useful for others too. But you need a good question formulation to find what you are looking for. Hiding answers in other questions does not help. Just point me to these new questions and I will have a look at them.
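For follow-up questions 3 and 4, one possible approach (a sketch that is not part of the original answer) is to let NumPy do the formatting and parsing through an in-memory text buffer. It assumes a NumPy version recent enough for np.savetxt to write to io.StringIO; the names buf and a_back are illustrative:

```python
import io

import numpy as np

a = np.array([[191.25, 0, 0, 1],
              [191.251, 0, 0, 1],
              [191.252, 0, 0, 1]])

# Array -> string: one format per column, written to an in-memory text buffer.
buf = io.StringIO()
np.savetxt(buf, a, fmt=['%0.3f', '%0.3f', '%0.3f', '%0.0f'], delimiter='\t')
b = buf.getvalue()
print(repr(b))

# String -> array: parse the same tab-separated text back into floats.
a_back = np.loadtxt(io.StringIO(b), delimiter='\t')
print(a_back.shape)
```

np.savetxt ends every row with a newline, so b also ends with the trailing '\n' asked for in the question.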