Explain the speed difference between NumPy's vectorized function application and a Python for loop


Solution 1

Most of it is due to the internal workings of NumPy: as far as I know, NumPy's core is written in C, so everything you push down into NumPy runs as compiled code rather than interpreted Python, which is much faster.

Edit: taking out the zip and replacing it with direct indexing (or a vstack) should go faster too. zip tends to be slow when its arguments are very large, and the replacement keeps the work inside NumPy (thus in C), while zip runs in Python.

And yes, if I understand correctly (not sure about that), you are running a try/except block 42k times, which should definitely be bad for speed.

test:

import numpy

T = numpy.ndarray((5, 10))
for t in T:
    print t.shape

which prints (10,) five times, confirming that iterating over a 2-D array yields its rows.

This means that yes, if your matrices are 42k x 784, you are entering the try/except block 42k times, and I assume that affects the computation time, as does building a zip on each iteration, though I am not certain that is the main cause.

(So each of your 42k iterations takes about 0.17 s on average. I am quite certain a try/except block alone does not take 0.17 seconds, but perhaps the overhead it causes contributes.)
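One way to sanity-check that hunch is to time an equivalent function with and without the try/except when no exception is actually raised. This is a sketch with hypothetical toy functions (written in Python 3, unlike the Python 2 code in the post):

```python
import timeit

def with_try(x):
    # Guard a division with try/except, as the question's loop does.
    try:
        return 1.0 / x
    except ZeroDivisionError:
        return 0.0

def without_try(x):
    # Same result via an explicit check instead of exception handling.
    return 1.0 / x if x != 0 else 0.0

# In CPython, entering a try block that raises nothing is very cheap,
# so the two timings should be of the same order of magnitude.
t_try = timeit.timeit(lambda: with_try(3.0), number=100_000)
t_plain = timeit.timeit(lambda: without_try(3.0), number=100_000)
print(t_try, t_plain)
```

If the two numbers come out close, the try/except is unlikely to explain a two-hour runtime on its own.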

try changing the following:

double_vec = zip(img_vec,img_pix_vec)
result_row = np.array([tfidf(x[0],x[1],img_total) for x in double_vec])

to

result_row = np.array([tfidf(img_vec[i], img_pix_vec[i], img_total) for i in xrange(len(img_vec))])

That at least gets rid of the zip statement. I am not sure whether zip costs you one minute or nearly the full two hours (I know zip is slow compared to NumPy's vstack, but I have no idea whether removing it would win you two hours).
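A small self-contained check that the indexed version computes the same thing as the zip version. The arrays and img_total here are toy stand-ins for the question's data, and it is written in Python 3 (so range instead of xrange):

```python
import numpy as np

# Hypothetical stand-ins for one row of the question's data.
img_vec = np.array([3.0, 0.0, 5.0, 2.0])
img_pix_vec = np.array([2.0, 0.0, 1.0, 4.0])
img_total = 10.0

def tfidf(color, img_pix, img_total):
    # Elementwise weighting function from the question.
    if img_pix == 0:
        return 0
    return color * np.log(img_total / img_pix)

# Original approach: build pairs with zip, then iterate over them.
double_vec = list(zip(img_vec, img_pix_vec))
with_zip = np.array([tfidf(x[0], x[1], img_total) for x in double_vec])

# Suggested approach: index directly, skipping the intermediate pair list.
without_zip = np.array([tfidf(img_vec[i], img_pix_vec[i], img_total)
                        for i in range(len(img_vec))])

assert np.allclose(with_zip, without_zip)
```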

Solution 2

The difference you're seeing isn't due to anything fancy like SSE vectorization. There are two primary reasons. The first is that NumPy is written in C, and the C implementation doesn't have to go through the tons of runtime method dispatch and exception checking and so on that a Python implementation goes through.
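A rough illustration of that per-element interpreter overhead: summing a million floats with a Python loop versus a single call into NumPy's C implementation (a sketch in Python 3; the exact ratio varies by machine, but it is typically tens to hundreds of times):

```python
import timeit
import numpy as np

n = 1_000_000
a = np.arange(n, dtype=np.float64)

def python_sum(arr):
    # One interpreted iteration per element: each step pays for method
    # dispatch, boxing the float into a Python object, and loop bookkeeping.
    total = 0.0
    for x in arr:
        total += x
    return total

loop_t = timeit.timeit(lambda: python_sum(a), number=3)
vec_t = timeit.timeit(lambda: a.sum(), number=3)  # one call, tight C loop
print(loop_t / vec_t)
```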

The second reason is that even for Python code, your loop-based implementation is inefficient. You're using vstack in a loop, and every time you call vstack, it has to completely copy all arrays you've passed to it. That adds an extra factor of len(data_matrix) to your asymptotic complexity.
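A minimal sketch of the fix: collect the rows in a Python list and call vstack once at the end, so each row is copied only once instead of recopying everything accumulated so far. Starting from an empty (0, n) array also removes the need for the try/except. The data here is a toy stand-in for the 42000 x 784 matrix, and doubling each row stands in for the real per-row computation:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.random((200, 8))  # hypothetical stand-in for data_matrix

# Quadratic pattern from the question: every vstack call copies the
# entire result built so far, plus the new row.
result = np.empty((0, data.shape[1]))
for row in data:
    result = np.vstack((result, row * 2))

# Linear pattern: accumulate rows in a list, copy once at the end.
rows = [row * 2 for row in data]
result_fast = np.vstack(rows)

assert np.array_equal(result, result_fast)
```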

Updated on June 09, 2022

Comments

  • Admin, almost 2 years ago

    I was implementing a weighting system called TF-IDF on a set of 42000 images, each consisting 784 pixels. This is basically a 42000 by 784 matrix.

    The first method I attempted made use of explicit loops and took more than 2 hours.

    def tfidf(color,img_pix,img_total):
        if img_pix==0:
            return 0
        else:
            return color * np.log(img_total/img_pix)
    
    ...
    
    result = np.array([])
    for img_vec in data_matrix:
        double_vec = zip(img_vec,img_pix_vec)
        result_row = np.array([tfidf(x[0],x[1],img_total) for x in double_vec])
        try:
            result = np.vstack((result,result_row))
        # first row will throw a ValueError since vstack accepts rows of same len
        except ValueError:
            result = result_row
    

    The second method I attempted used numpy matrices and took less than 5 minutes. Note that data_matrix, img_pix_mat are both 42000 by 784 matrices while img_total is a scalar.

    result = data_matrix * np.log(np.divide(img_total,img_pix_mat))
    

    I was hoping someone could explain the immense difference in speed.

    The authors of the following paper entitled "The NumPy array: a structure for efficient numerical computation" (http://arxiv.org/pdf/1102.1523.pdf), state on the top of page 4 that they observe a 500 times speed increase due to vectorized computation. I'm presuming much of the speed increase I'm seeing is due to this. However, I would like to go a step further and ask why numpy vectorized computations are that much faster than standard python loops?

    Also, perhaps you might know of other reasons why the first method is slow. Do try/except structures have high overhead? Or does forming a new np.array on each iteration take a long time?

    Thanks.

  • usethedeathstar, almost 11 years ago
    Does this answer the question, or do you want a more technical answer? (In general, Python is not made to run fast; other languages exist for that. NumPy exists for those who want to use Python but still get speeds at least comparable to C or Fortran, which were built for number-crunching applications.)
  • usethedeathstar, almost 11 years ago
    Is it the zip which causes it? (Just to know for future use: is it calling zip 42k times that does the damage? It would be nice to know whether that accounts for the two-hour difference or would only improve things by 30 minutes or so.)
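For completeness, the two approaches from the question can be checked against each other on a toy array. np.where is one way to keep the img_pix == 0 guard in the vectorized version (a sketch, not taken from the original post; the matrices here are small stand-ins for the real 42000 x 784 data):

```python
import numpy as np

def tfidf(color, img_pix, img_total):
    # Elementwise version from the question.
    if img_pix == 0:
        return 0
    return color * np.log(img_total / img_pix)

data_matrix = np.array([[3.0, 0.0], [5.0, 2.0]])  # toy stand-ins
img_pix_mat = np.array([[2.0, 0.0], [1.0, 4.0]])
img_total = 10.0

# Loop version: one Python-level call per element.
looped = np.array([[tfidf(c, p, img_total)
                    for c, p in zip(row_c, row_p)]
                   for row_c, row_p in zip(data_matrix, img_pix_mat)])

# Vectorized version: np.where applies the zero guard across the whole
# array at once (errstate silences the harmless divide-by-zero warning
# from the positions that np.where discards anyway).
with np.errstate(divide="ignore", invalid="ignore"):
    vectorized = np.where(img_pix_mat == 0, 0.0,
                          data_matrix * np.log(img_total / img_pix_mat))

assert np.allclose(looped, vectorized)
```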