Efficient evaluation of a function at every cell of a NumPy array

Solution 1

You can vectorize the function and then apply it directly to a NumPy array whenever you need it:

import numpy as np

def f(x):
    return x * x + 3 * x - 2 if x > 0 else x * 5 + 8

f = np.vectorize(f)  # or use a different name if you want to keep the original f

result_array = f(A)  # where A is your NumPy array

It's probably better to specify an explicit output type directly when vectorizing:

f = np.vectorize(f, otypes=[float])  # np.float was removed in NumPy 1.24; use float or np.float64
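Since np.vectorize still calls the Python function once per element, it may be worth checking whether f can be written with whole-array operations instead. The sketch below assumes the piecewise f from this answer; the name f_where and the sample array are made up for illustration. Note that np.where evaluates both branches for every element before selecting, which is fine here but would matter if one branch were expensive or invalid for some inputs.

```python
import numpy as np

def f(x):
    # Scalar version, as in the answer above
    return x * x + 3 * x - 2 if x > 0 else x * 5 + 8

def f_where(a):
    # Whole-array version (illustrative name): compute both branches,
    # then let np.where pick one result per element
    a = np.asarray(a, dtype=float)
    return np.where(a > 0, a * a + 3 * a - 2, a * 5 + 8)

A = np.array([[-2.0, 0.0], [1.0, 3.0]])  # sample data, not from the question

f_vec = np.vectorize(f, otypes=[float])
expected = f_vec(A)   # element-by-element Python calls
result = f_where(A)   # array arithmetic only

print(result)
print(np.allclose(result, expected))
```

Both versions should agree; the np.where form avoids the per-element Python call, which is usually where the time goes on large arrays.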

Solution 2

See the similar question "Mapping a NumPy array in place". If your f() can be expressed with a ufunc, use its out parameter so the result is written directly into an existing array instead of a new one.
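As a rough illustration of the out parameter, the sketch below overwrites an array in place with np.sqrt, and then builds x*x + 3*x - 2 from basic ufuncs using one scratch array. The arrays are sample data chosen for the example. One assumption to keep in mind: the output array must have a dtype that can hold the result, so an integer array cannot receive the output of np.sqrt this way.

```python
import numpy as np

# Single ufunc: the result is written into A's own buffer
A = np.array([[1.0, 4.0], [9.0, 16.0]])
np.sqrt(A, out=A)

# Several ufuncs chained in place: B becomes B*B + 3*B - 2
B = np.array([[1.0, 2.0], [3.0, 4.0]])
tmp = np.multiply(B, 3.0)    # scratch array holding 3*x
np.multiply(B, B, out=B)     # B now holds x*x
np.add(B, tmp, out=B)        # B now holds x*x + 3*x
np.subtract(B, 2.0, out=B)   # B now holds x*x + 3*x - 2

print(A)
print(B)
```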

Solution 3

If you are working with numbers and f(A(i,j)) = f(A(j,i)), you could use scipy.spatial.distance.cdist, defining f as a distance between A(i) and A(j).
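One way to read this suggestion is that each output cell (i, j) comes from a function of row i and row j. The sketch below assumes that reading; pair_f is a hypothetical symmetric function (a sum of absolute differences) and X is sample data. cdist accepts a Python callable as its metric and calls it with two 1-D arrays, so this is still a Python-level call per pair.

```python
import numpy as np
from scipy.spatial.distance import cdist

def pair_f(u, v):
    # Hypothetical symmetric function of two rows
    return np.abs(u - v).sum()

X = np.array([[0.0, 0.0],
              [1.0, 2.0],
              [3.0, 1.0]])  # sample data, one observation per row

# D[i, j] = pair_f(X[i], X[j])
D = cdist(X, X, metric=pair_f)
print(D)
```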

Author: Peter

Updated on April 26, 2020

Comments

  • Peter, about 4 years ago

    Given a NumPy array A, what is the fastest/most efficient way to apply the same function, f, to every cell?

    1. Suppose that we assign f(A(i,j)) back to A(i,j).

    2. The function f doesn't have a binary output, so masking operations won't help.

    Is the "obvious" double loop iteration (through every cell) the optimal solution?

  • Peter, over 12 years ago
    I am afraid that the vectorized function cannot be faster than the "manual" double loop iteration and assignment through all the array elements. Especially, because it stores the result to a newly created variable (and not directly to the initial input). Thanks a lot for your reply though:)
  • blubberdiblub, over 12 years ago
    @Peter: Ah, now I see that you have mentioned assigning the result back to the former array in your original question. I'm sorry I missed that when first reading it. Yeah, in that case the double loop must be faster. But have you also tried a single loop on the flattened view of the array? That might be slightly faster, since you save a little loop overhead and Numpy needs to do one less multiplication and addition (for calculating the data offset) at each iteration. Plus it works for arbitrarily dimensioned arrays. Might be slower on very small arrays, tho.
  • Gabriel, almost 8 years ago
    Notice the warning given in the vectorize function description: "The vectorize function is provided primarily for convenience, not for performance. The implementation is essentially a for loop." So this will very likely not speed up the process at all.
  • hpaulj, almost 8 years ago
    Pay attention to how vectorize determines the return type. That has produced bugs. frompyfunc is a bit faster, but returns an array of dtype object. Both feed scalars to the function, not rows or columns.
  • abukaj, about 7 years ago
    @blubberdiblub np.vectorize(f)(np.array([])) raises IndexError: index 0 is out of bounds for axis 0 with size 0
  • hpaulj, about 7 years ago
    Specifying otypes as mentioned in the answer should take care of that IndexError.
  • Suuuehgi, over 5 years ago
    @Gabriel Just throwing np.vectorize on my function (which utilizes RK45) gives me a speed up of a factor of ~ 20.
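Several points from the comments can be tried out in a short sketch: a single loop over the flattened view that writes back into the original array, the object dtype returned by np.frompyfunc, and the empty-array case with otypes set. The function f is the one from Solution 1; the arrays are sample data. This only shows the behaviour and makes no claim about relative speed, which depends on the function and array size.

```python
import numpy as np

def f(x):
    return x * x + 3 * x - 2 if x > 0 else x * 5 + 8

# 1. Single loop over the flattened view, assigning in place.
#    A.flat works for any number of dimensions.
A = np.array([[-2.0, 0.0], [1.0, 3.0]])
flat = A.flat
for i in range(A.size):
    flat[i] = f(flat[i])

# 2. frompyfunc returns an object array; convert if a numeric dtype is needed
g = np.frompyfunc(f, 1, 1)
obj_result = g(np.array([1.0, 3.0]))
num_result = obj_result.astype(float)

# 3. With otypes given, vectorize does not need a sample call to infer
#    the output type, so an empty input is handled
f_vec = np.vectorize(f, otypes=[float])
empty_result = f_vec(np.array([]))

print(A)
print(obj_result.dtype, num_result.dtype)
print(empty_result.shape)
```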