Appending to numpy arrays

Solution 1

Unlike Python's list.append, numpy.append does not modify the array in place; it returns a new array. You therefore need to assign the result back to a variable, as below.

import numpy

xyz_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
nums = numpy.array([])
coords = numpy.array([])

for i in range(len(xyz_list) // 4):
    nums = numpy.append(nums, xyz_list[i*4])
    coords = numpy.append(coords, xyz_list[i*4+1:(i+1)*4])

print(nums)    # [ 1.  5.  9.]
print(coords)  # [  2.   3.   4.   6.   7.   8.  10.  11.  12.]

You can reshape coords as follows:

coords = coords.reshape(3, 3)

# array([[  2.,   3.,   4.],
#        [  6.,   7.,   8.],
#        [ 10.,  11.,  12.]])

More details on numpy.append behaviour

Documentation:

Returns: A copy of arr with values appended to axis. Note that append does not occur in-place: a new array is allocated and filled.

If you know the shape of your numpy array output beforehand, it is efficient to instantiate via np.zeros(n) and fill it with results later.
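
A minimal sketch of that pre-allocation approach, reusing xyz_list from the example above and assuming its length is a multiple of 4:

```python
import numpy as np

xyz_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
n = len(xyz_list) // 4  # number of records, known before the loop

# Allocate both outputs once, then fill them in place.
nums = np.zeros(n)
coords = np.zeros((n, 3))

for i in range(n):
    nums[i] = xyz_list[i * 4]
    coords[i] = xyz_list[i * 4 + 1:(i + 1) * 4]  # fills one row of three values
```

No array is reallocated inside the loop, and coords comes out 2D without a separate reshape.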

Another option: if your calculations make heavy use of inserting elements to the left of an array, consider using collections.deque from the standard library.
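
A small sketch of the deque option, with made-up values for illustration: insert on the left as often as needed, then convert to an array once at the end.

```python
from collections import deque

import numpy as np

values = deque()
for x in [3, 2, 1]:
    values.appendleft(x)  # constant-time insertion at the left end

# Convert once, after all insertions are done.
arr = np.array(list(values))
```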

Solution 2

np.append is not a clone of list.append. It is a clumsy wrapper around np.concatenate. It is better to learn to use np.concatenate correctly.

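import numpy as np

xyz_list = frag_str.split()
nums = []
coords = []
for i in range(len(xyz_list) // 4):
    nums.append(xyz_list[i*4])
    coords.append(xyz_list[i*4+1:(i+1)*4])
# These lists hold strings, not arrays, so build the arrays with np.array.
nums = np.array(nums)      # 1d array
coords = np.array(coords)  # 2d array, one row per record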

List append is faster, and easier to initialize. np.concatenate works fine with a list of arrays. np.append uses concatenate, but only accepts two inputs. np.array is needed if the list contains numbers or strings.
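
To illustrate the difference between the two functions, here is a short sketch with made-up arrays (the names pieces, joined and appended are only for this example):

```python
import numpy as np

pieces = [np.array([1, 2]), np.array([3, 4]), np.array([5, 6])]

# np.concatenate joins any number of arrays in a single call.
joined = np.concatenate(pieces)

# np.append takes only two inputs, so the same result needs one call
# (and one full copy) per piece.
appended = np.array([], dtype=int)
for piece in pieces:
    appended = np.append(appended, piece)
```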


You don't give an example of frag_str. But the name and the use of split suggest it is a string. I don't think anything else has a split method.

In [74]: alist = 'one two three four five six seven eight'.split()

That's a list of strings. Using your indexing I can construct 2 lists:

In [76]: [alist[i*4] for i in range(2)]
Out[76]: ['one', 'five']

In [77]: [alist[i*4+1:(i+1)*4] for i in range(2)]
Out[77]: [['two', 'three', 'four'], ['six', 'seven', 'eight']]

And I can make arrays from each of those lists:

In [78]: np.array(Out[76])
Out[78]: array(['one', 'five'], dtype='<U4')
In [79]: np.array(Out[77])
Out[79]: 
array([['two', 'three', 'four'],
       ['six', 'seven', 'eight']], dtype='<U5')

In the first case the array is 1d, in the second, 2d.

If the string contains digits, we can make an integer array by specifying dtype.

In [80]: alist = '1 2 3 4 5 6 7 8'.split()
In [81]: np.array([alist[i*4] for i in range(2)])
Out[81]: array(['1', '5'], dtype='<U1')
In [82]: np.array([alist[i*4] for i in range(2)], dtype=int)
Out[82]: array([1, 5])
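
As an alternative to the list comprehensions (a different technique, not what the session above does), all tokens can be converted at once, reshaped to one row per record, and sliced. This is only a sketch: the input string is hypothetical and its token count is assumed to be a multiple of 4.

```python
import numpy as np

frag_str = '1 0 0 0 2 0 0 1 3 0 0 -1'  # hypothetical input, 4 tokens per record
table = np.array(frag_str.split(), dtype=int).reshape(-1, 4)

nums = table[:, 0]     # first column: one number per record
coords = table[:, 1:]  # remaining three columns: a 2d array
```

Both nums and coords are views into table, so no extra copies are made.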

Solution 3

As stated above, numpy.append does not append items in place, and the reason why is important. You must assign the array returned by numpy.append back to the original variable, or else your code will not work. That being said, you should likely rethink your logic.

NumPy stores array data in a C-style contiguous buffer, with no unused elements before or after the data. To append an item, NumPy must therefore allocate a new buffer of the array size + 1, copy all the existing data over, and then add the appended element.

In pseudo-C code, this comes to the following:

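#include <stdlib.h>
#include <string.h>

int* numpy_append(int* arr, size_t size, int element)
{
    /* Allocate a new buffer with room for one more element. */
    int* new_arr = malloc(sizeof(int) * (size + 1));
    /* Copy every existing element into the new buffer. */
    memcpy(new_arr, arr, sizeof(int) * size);
    new_arr[size] = element;
    return new_arr;
}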

This is extremely inefficient, since a new array must be allocated each time (memory allocation is slow), all the elements must be copied over, and the new element added to the end of the new array.

In comparison, Python lists reserve spare capacity beyond their current length and grow geometrically whenever that capacity runs out, so most appends need no reallocation. This is much more efficient for insertions at the end of the container than reallocating the entire buffer each time.
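
The over-allocation can be observed indirectly. The sketch below relies on sys.getsizeof, whose values are a CPython implementation detail, so treat it as an illustration rather than a guarantee:

```python
import sys

items = []
sizes = []
for i in range(64):
    items.append(i)
    sizes.append(sys.getsizeof(items))  # reported size after each append

# The reported size changes only when the list had to reallocate.
distinct_sizes = len(set(sizes))
print(distinct_sizes, "distinct sizes over", len(items), "appends")
```

The number of distinct sizes is far smaller than the number of appends, because most appends fit into capacity that was already reserved.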

You should use Python lists and list.append, and then convert the finished list to a NumPy array. Or, if performance is truly critical, write a C++ extension that uses std::vector. Either way, avoid numpy.append in a loop: re-write your code, or it will be glacial.
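
A rough sketch of the two strategies side by side; the function names are made up for this example:

```python
import numpy as np

def build_with_np_append(n):
    arr = np.array([], dtype=int)
    for i in range(n):
        arr = np.append(arr, i)  # allocates and copies on every iteration
    return arr

def build_with_list(n):
    items = []
    for i in range(n):
        items.append(i)  # amortized constant time
    return np.array(items)  # a single conversion at the end

a = build_with_np_append(1000)
b = build_with_list(1000)
```

Both functions return the same values. Timing them with timeit on realistic sizes is the reliable way to see how large the gap is on your machine.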

Edit

Also, as pointed out in the comments, if you know the size of a NumPy array beforehand, pre-allocating it with np.zeros(n) is efficient, as is using a custom wrapper around a NumPy array:

import numpy as np

class extendable_array:
    def __init__(self, size=0, dtype=int):
        # Start with `size` zero-initialized elements; the buffer is exactly full.
        self.arr = np.zeros(size, dtype=dtype)
        self.size = size

    def grow(self):
        '''Double the capacity (to at least 1), keeping the existing data'''

        arr = self.arr
        self.arr = np.zeros(max(arr.size * 2, 1), dtype=arr.dtype)
        self.arr[:arr.size] = arr

    def append(self, value):
        '''Append a value to the array'''

        # Grow only when the buffer is full.
        if self.arr.size == self.size:
            self.grow()

        self.arr[self.size] = value
        self.size += 1

    # add more methods here

Author by

Brynhildr Xie

Confused undergrad at work, trying to amuse himself by applying real-life situations and video games into Python scripts.

Updated on May 25, 2022

Comments

  • Brynhildr Xie
    Brynhildr Xie almost 2 years

    I'm trying to construct a numpy array, and then append integers and another array to it. I tried doing this:

    xyz_list = frag_str.split()
    nums = numpy.array([])
    coords = numpy.array([])
    for i in range(int(len(xyz_list)/4)):
        numpy.append(nums, xyz_list[i*4])
        numpy.append(coords, xyz_list[i*4+1:(i+1)*4])
    print(nums)
    print(coords)
    

    Printing out the output only gives my empty arrays. Why is that? In addition, how can I rewrite coords in a way that allows me to have 2D arrays like this: array[[0,0,0],[0,0,1],[0,0,-1]]?

    • hpaulj
      hpaulj about 6 years
      Looks like frag_str is a string, which you are splitting into a list on whitespace. I don't see where you are getting integers.
  • juanpa.arrivillaga
    juanpa.arrivillaga about 6 years
    Note, if you are only appending to "the right", i.e. .append, then a list is a better substitute. deque is optimized for appending from both sides.
  • jpp
    jpp about 6 years
    I would add: if you know the size of your resulting numpy array, then instantiating an empty array, e.g. np.zeros(n), beforehand and filling it is efficient.
  • Alex Huszagh
    Alex Huszagh about 6 years
    @jp_data_analysis Edited, also added a custom wrapper to create an extendable array with a growth factor of 2.