Is the Julia language really as fast as it claims?

19,053

Solution 1

Vectorized operations like .^ are exactly the kind of thing that Octave is good at because they're actually entirely implemented in specialized C code. Somewhere in the code that is compiled when Octave is built, there is a C function that computes .^ for a double and an array of doubles – that's what you're really timing here, and it's fast because it's written in C. Julia's .^ operator, on the other hand, is written in Julia:

julia> a = 0.9999;

julia> @which a.^(1:10000)
.^(x::Number,r::Ranges{T}) at range.jl:327

That definition consists of this:

.^(x::Number, r::Ranges) = [ x^y for y=r ]

It uses a one-dimensional array comprehension to raise x to each value y in the range r, returning the result as a vector.
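
As a rough sketch in current Julia 1.x syntax (the listing above comes from an older release, and `pow_range` is a hypothetical name used only for illustration), the same comprehension can be written as an ordinary function and checked against the built-in broadcasting form:

```julia
# Hypothetical helper mirroring the comprehension-based definition above.
# AbstractRange is the Julia 1.x supertype of ranges such as 1:10000.
pow_range(x::Number, r::AbstractRange) = [x^y for y in r]

a = 0.9999
y = pow_range(a, 1:10000)

# In Julia 1.x the dotted operator broadcasts `^` over the range elementwise.
y_broadcast = a .^ (1:10000)
```

Both forms return a 10000-element vector. The point is only that the elementwise power is plain Julia code, not a call into a C routine.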

Edward Garson is quite right that one shouldn't use globals for optimal performance in Julia. The reason is that the compiler can't reason very well about the types of globals because they can change at any point where execution leaves the current scope. Leaving the current scope doesn't sound like it happens that often, but in Julia, even basic things like indexing into an array or adding two integers are actually method calls and thus leave the current scope. In the code in this question, however, all the time is spent inside the .^ function, so the fact that a is a global doesn't actually matter:

julia> @elapsed a.^(1:10000)
0.000809698

julia> let a = 0.9999;
         @elapsed a.^(1:10000)
       end
0.000804208

Ultimately, if all you're ever doing is calling vectorized operations on floating point arrays, Octave is just fine. However, this is often not actually where most of the time is spent even in high-level dynamic languages. If you ever find yourself wanting to iterate over an array with a for loop, operating on each element with scalar arithmetic, you'll find that Octave is quite slow at that sort of thing – often thousands of times slower than C or Julia code doing the same thing. Writing for loops in Julia, on the other hand, is a perfectly reasonable thing to do – in fact, all our sorting code is written in Julia and is comparable to C in performance. There are also many other reasons to use Julia that don't have to do with performance. As a Matlab clone, Octave inherits many of Matlab's design problems, and doesn't fare very well as a general purpose programming language. You wouldn't, for example, want to write a web service in Octave or Matlab, but it's quite easy to do so in Julia.
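
To make the for-loop point concrete, here is a minimal sketch (Julia 1.x syntax; `powers_loop` is a hypothetical name, not part of Julia's base library) that builds the same powers of `a` with a scalar loop and a running product:

```julia
# Hypothetical example: fill y[i] = a^i using a running product in a scalar loop.
function powers_loop(a::Float64, n::Int)
    y = Vector{Float64}(undef, n)   # uninitialized output vector
    acc = 1.0
    for i in 1:n
        acc *= a                    # scalar arithmetic on each iteration
        y[i] = acc
    end
    return y
end

y = powers_loop(0.9999, 10000)
```

Because the loop lives inside a function with concretely typed arguments, the compiler can specialize it. This is the style of code the paragraph above describes as perfectly reasonable in Julia.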

Solution 2

You're using global variables, which is a performance gotcha in Julia.

The issue is that globals can potentially change type whenever your code calls another function. As a result, the compiler has to generate extremely slow code that cannot make any assumptions about the types of global variables that are used.

Simple modifications of your code in line with https://docs.julialang.org/en/stable/manual/performance-tips/ should yield more satisfactory results.
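
One such modification, sketched here under assumptions (Julia 1.x syntax; the name `powers` is hypothetical), is to pass the value in as a function argument instead of reading a global, and to time a call only after a warm-up call has triggered compilation:

```julia
# Hypothetical wrapper: `a` is a function argument, so its type is known
# to the compiler when the function body is specialized.
powers(a, n) = a .^ (1:n)

powers(0.9999, 10)                    # warm-up call; compiles for (Float64, Int)
t = @elapsed powers(0.9999, 10000)    # elapsed seconds for an already-compiled call
```

The performance tips also suggest declaring a global `const` when it cannot be avoided. A single `@elapsed` measurement this short is noisy, so repeating it and comparing runs is advisable.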

Author by

juliohm

Updated on June 04, 2022

Comments

  • juliohm
    juliohm almost 2 years

    Following this post, I decided to benchmark Julia against GNU Octave, and the results were inconsistent with the speed-ups shown on julialang.org.

    I compiled both Julia and GNU Octave with CXXFLAGS='-std=c++11 -O3'. These are the results I got:

    GNU Octave

    a=0.9999;
    
    tic;y=a.^(1:10000);toc
    Elapsed time is 0.000159025 seconds.
    
    tic;y=a.^(1:10000);toc
    Elapsed time is 0.000162125 seconds.
    
    tic;y=a.^(1:10000);toc
    Elapsed time is 0.000159979 seconds.
    

    --

    tic;y=cumprod(ones(1,10000)*a);toc
    Elapsed time is 0.000280142 seconds.
    
    tic;y=cumprod(ones(1,10000)*a);toc
    Elapsed time is 0.000280142 seconds.
    
    tic;y=cumprod(ones(1,10000)*a);toc
    Elapsed time is 0.000277996 seconds.
    

    Julia

    tic();y=a.^(1:10000);toc()
    elapsed time: 0.003486508 seconds
    
    tic();y=a.^(1:10000);toc()
    elapsed time: 0.003909662 seconds
    
    tic();y=a.^(1:10000);toc()
    elapsed time: 0.003465313 seconds
    

    --

    tic();y=cumprod(ones(1,10000)*a);toc()
    elapsed time: 0.001692931 seconds
    
    tic();y=cumprod(ones(1,10000)*a);toc()
    elapsed time: 0.001690245 seconds
    
    tic();y=cumprod(ones(1,10000)*a);toc()
    elapsed time: 0.001689241 seconds
    

    Could someone explain why Julia is slower than GNU Octave for these basic operations? Once warmed up, it should call LAPACK/BLAS without overhead, right?

    EDIT:

    As explained in the comments and answers, the code above is not a good benchmark, nor does it illustrate the benefits of using the language in a real application. I used to think of Julia as a faster "Octave/MATLAB", but it is much more than that. It is a huge step towards productive, high-performance scientific computing. By using Julia, I was able to 1) outperform software in my research field written in Fortran and C++, and 2) provide users with a much nicer API.

    • StefanKarpinski
      StefanKarpinski over 10 years
      I'm fairly certain that neither of these operations – .^ or cumprod – are part of BLAS or LAPACK. These operations are just implemented in C as part of Octave's source and in Julia as part of Julia's base distribution.
    • juliohm
      juliohm over 10 years
      @StefanKarpinski, I mean ones(1,10000)*a and the internals probably using Horner's rule or something. But you're right, I shouldn't have mentioned LAPACK/BLAS for this particular snippet of code, my fingers always type them unconsciously. :)
    • Jack Tang
      Jack Tang about 9 years
      A run time under 1 second is too short to be meaningful.
  • StefanKarpinski
    StefanKarpinski over 10 years
    I've briefly addressed why this happens in my answer to your question. Basically, the issue is that globals can potentially change type whenever your code calls another function. As a result, the compiler has to generate extremely slow code that cannot make any assumptions about the types of global variables that are used.
  • StefanKarpinski
    StefanKarpinski over 10 years
    Glad to help answer your questions. As a policy, we try to never resort to implementing things in C. This keeps us honest – we have to make Julia fast enough to allow us to do that. It's better to do everything in the language and be a bit slower until we can make the compiler better and then everything gets fast – system code and user code alike.
  • EdwardG
    EdwardG over 10 years
    Yes, it's not entirely intuitive - I should have added a bit more to my response.
  • EdwardG
    EdwardG about 9 years
    @StefanKarpinski thank you for improving this answer.
  • Grayscale
    Grayscale over 4 years
    It looks like the "quite easy" link now leads to a 404 error :(
  • Grayscale
    Grayscale over 4 years
    It looks like the link gives a 404 error. I think this link should give the intended page though: docs.julialang.org/en/v1/manual/performance-tips.