F# performance in scientific computing


Solution 1

  • F# does floating-point computation as fast as the .NET CLR allows it; there is not much difference from C# or other .NET languages.
  • F# does not expose vector instructions by itself, but if your CLR has an API for these, F# should have no problem using it. See, for instance, Mono.
  • As far as I know, there is only one F# compiler at the moment, so maybe the question should be "how good is the F# compiler when it comes to optimisation?". The answer is in any case "potentially as good as the C# compiler, probably a little bit worse at the moment". Note that F# differs from e.g. C# in its support for inlining at compile time, which potentially allows for more efficient code that relies on generics (see the sketch after this list).
  • Memory footprints of F# programs are similar to those of other .NET languages. The amount of control you have over allocation and garbage collection is the same as in other .NET languages.
  • I don't know about the support for distributed memory.
  • F# has very nice primitives for dealing with flat data structures, e.g. arrays and lists. Look, for instance, at the contents of the Array module: map, map2, mapi, iter, fold, zip... (also illustrated in the sketch below). Arrays are popular in scientific computing, I guess due to their inherently good memory-locality properties.
  • For scientific computation packages using F#, you may want to look at what Jon Harrop is doing.
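
A minimal sketch of the two points above (compile-time inlining over generics, and the flat-array primitives of the Array module), using only the standard F# library:

```fsharp
// 'inline' + statically resolved type parameters: the same definition is
// specialized at compile time for float, float32, int, ... so the generic
// arithmetic carries no boxing or virtual-call overhead.
let inline sumOfSquares xs =
    Array.fold (fun acc x -> acc + x * x) LanguagePrimitives.GenericZero xs

let a = sumOfSquares [| 1.0; 2.0; 3.0 |]   // float
let b = sumOfSquares [| 1; 2; 3 |]         // int -- same source, specialized code

// Flat-array primitives from the Array module:
let xs = [| 1.0; 2.0; 3.0 |]
let ys = Array.map (fun x -> x * x) xs                        // elementwise map
let dot = Array.fold2 (fun acc x y -> acc + x * y) 0.0 xs ys  // reduction over two arrays
printfn "a = %f, b = %d, dot = %f" a b dot
```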

Solution 2

I am curious as to how F# performance compares to C++ performance?

Varies wildly depending upon the application. If you are making extensive use of sophisticated data structures in a multi-threaded program then F# is likely to be a big win. If most of your time is spent in tight numerical loops mutating arrays then C++ might be 2-3× faster.

Case study: Ray tracer

My benchmark here uses a tree for hierarchical culling and numerical ray-sphere intersection code to generate an output image. This benchmark is several years old, and the C++ code has been improved upon dozens of times over the years and read by hundreds of thousands of people. Don Syme at Microsoft managed to write an F# implementation that is slightly faster than the fastest C++ code when compiled with MSVC and parallelized using OpenMP.

I have read that F# is supposed to be more scalable and more performant, but how does this real-world performance compare to C++?

Developing code is much easier and faster with F# than C++, and this applies to optimization as well as maintenance. Consequently, when you start optimizing a program the same amount of effort will yield much larger performance gains if you use F# instead of C++. However, F# is a higher-level language and, consequently, places a lower ceiling on performance. So if you have infinite time to spend optimizing you should, in theory, always be able to produce faster code in C++.

This is exactly the same benefit that C++ had over Fortran and Fortran had over hand-written assembler, of course.

Case study: QR decomposition

This is a basic numerical method from linear algebra provided by libraries like LAPACK. The reference LAPACK implementation is 2,077 lines of Fortran. I wrote an F# implementation in under 80 lines of code that achieves the same level of performance. But the reference implementation is not fast: vendor-tuned implementations like Intel's Math Kernel Library (MKL) are often 10× faster. Remarkably, I managed to optimize my F# code well beyond the performance of Intel's implementation running on Intel hardware whilst keeping my code under 150 lines of code and fully generic (it can handle single and double precision, and complex and even symbolic matrices!): for tall thin matrices my F# code is up to 3× faster than the Intel MKL.

Note that the moral of this case study is not that you should expect your F# to be faster than vendor-tuned libraries but, rather, that even experts like Intel's engineers will miss productive high-level optimizations if they use only lower-level languages. I suspect Intel's numerical optimization experts failed to exploit parallelism fully because their tools make it extremely cumbersome, whereas F# makes it effortless.
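
To illustrate the "effortless" claim, here is a sketch of the general idiom only (not the QR code from the case study): when each output column is independent, the outer loop parallelizes across cores with one line of ceremony.

```fsharp
open System.Threading.Tasks

// Sketch of the idiom, not the QR factorization itself: each column of the
// result is computed independently, so the outer loop is handed to the
// task-parallel library as-is.
let multiply (a: float[,]) (b: float[,]) =
    let n = Array2D.length1 a
    let m = Array2D.length2 a
    let p = Array2D.length2 b
    let c = Array2D.zeroCreate n p
    Parallel.For(0, p, fun j ->
        for i in 0 .. n - 1 do
            let mutable s = 0.0
            for k in 0 .. m - 1 do
                s <- s + a.[i, k] * b.[k, j]
            c.[i, j] <- s) |> ignore
    c
```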

How well does it do floating-point?

Performance is similar to ANSI C but some functionality (e.g. rounding modes) is not available from .NET.

Does it allow vector instructions?

No.

How friendly is it towards optimizing compilers?

This question does not make sense: F# is a proprietary .NET language from Microsoft with a single compiler.

How big a memory footprint does it have?

An empty application uses 1.3 MB here.

Does it allow fine-grained control over memory locality?

Better than most memory-safe languages but not as good as C. For example, you can unbox arbitrary data structures in F# by representing them as "structs".
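
A sketch of what "representing them as structs" means in practice: a value type makes an array of elements one contiguous, unboxed block rather than an array of references to separate heap objects.

```fsharp
// A value type: an array of Vec3 is a single flat block of 3 million floats,
// with no per-element heap object or pointer indirection.
[<Struct>]
type Vec3 =
    val X : float
    val Y : float
    val Z : float
    new (x, y, z) = { X = x; Y = y; Z = z }

let particles : Vec3[] = Array.zeroCreate 1000000   // one contiguous allocation
```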

Does it have capacity for distributed-memory processors, for example a Cray?

Depends what you mean by "capacity for". If you can run .NET on that Cray then you could use message passing in F# (just like the next language) but F# is intended primarily for desktop multicore x86 machines.

What features does it have that may be of interest to computational science where heavy number processing is involved?

Memory safety means you do not get segmentation faults and access violations. The support for parallelism in .NET 4 is good. The ability to execute code on-the-fly via the F# interactive session in Visual Studio 2010 is extremely useful for interactive technical computing.

Are there actual scientific computing implementations that use it?

Our commercial products for scientific computing in F# already have hundreds of users.

However, your line of questioning indicates that you think of scientific computing as high-performance computing (e.g. Cray) and not interactive technical computing (e.g. MATLAB, Mathematica). F# is intended for the latter.

Solution 3

In addition to what others said, there is one important point about F#, and that's parallelism. The performance of ordinary F# code is determined by the CLR, although you may be able to use LAPACK from F#, or make native calls using C++/CLI as part of your project.

However, well-designed functional programs tend to be much easier to parallelize, which means that you can easily gain performance by using multi-core CPUs, which are definitely available to you if you're doing scientific computing.

Regarding distributed computing, you can use any distributed computing framework that's available for the .NET platform. There is an MPI.NET project, which works well with F#, but you may also be able to use DryadLINQ, which is an MSR project.

Solution 4

As with all language/performance comparisons, your mileage depends greatly on how well you can code.

F# is a derivative of OCaml. I was surprised to find out that OCaml is used a lot in the financial world, where number crunching performance is very important. I was further surprised to find out that OCaml is one of the faster languages, with performance on par with the fastest C and C++ compilers.

F# is built on the CLR. In the CLR, code is expressed in a form of bytecode called the Common Intermediate Language (CIL). As such, it benefits from the optimizing capabilities of the JIT and has performance comparable to C# (but not necessarily C++), if the code is written well.

CIL code can be compiled to native code in a separate step prior to runtime by using the Native Image Generator (NGEN). This speeds up all later runs of the software as the CIL-to-native compilation is no longer necessary.

One thing to consider is that functional languages like F# benefit from a more declarative style of programming. In a sense, you are over-specifying the solution in imperative languages such as C++, and this limits the compiler's ability to optimize. A more declarative programming style can theoretically give the compiler additional opportunities for algorithmic optimization.
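
A small sketch of the contrast (plain F#, nothing assumed beyond the standard library): the imperative version pins down every step and its order, while the declarative version states only the intent.

```fsharp
// Imperative style: every step and its order are spelled out explicitly.
let sumOfSquaresImperative (xs: float[]) =
    let mutable acc = 0.0
    for i in 0 .. xs.Length - 1 do
        acc <- acc + xs.[i] * xs.[i]
    acc

// Declarative style: only the intent is stated; how it is evaluated is
// left to the library and, in principle, the compiler.
let sumOfSquaresDeclarative (xs: float[]) =
    xs |> Array.sumBy (fun x -> x * x)
```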

Solution 5

It depends on what kind of scientific computing you are doing.

If you are doing traditional heavy computing, e.g. linear algebra or numerical optimization, then you should not put your code in the .NET framework, and F# in particular is not a good fit. This is at the algorithm level, and most such algorithms must be coded in an imperative language to get good performance in running time and memory usage. Others mentioned parallelism; I must say it is probably useless when you are doing low-level work like parallelizing an SVD implementation, because by the time you know how to parallelize an SVD you simply won't be using a high-level language: Fortran, C, or modified C (e.g. Cilk) are your friends.

However, a lot of scientific computing today is not of this kind, but rather consists of higher-level applications, e.g. statistical computing and data mining. In these tasks, aside from some linear algebra or optimization, there are also a lot of data flows, IO, preprocessing, graphics, etc. For these tasks F# is really powerful, thanks to its succinctness, functional style, safety, and ease of parallelization.
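
As a small, hypothetical example of that data-flow side (the file name and format are made up for illustration), a read-parse-filter-summarize step is a single pipeline in F#:

```fsharp
open System
open System.IO

// Hypothetical input: "measurements.csv" with one numeric value per line.
// Reading, parsing, cleaning and summarizing is one declarative pipeline.
let meanOfValid (path: string) =
    File.ReadLines path
    |> Seq.map float
    |> Seq.filter (fun v -> not (Double.IsNaN v))
    |> Seq.average

printfn "mean = %f" (meanOfValid "measurements.csv")
```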

As others have mentioned, .NET supports Platform Invoke (P/Invoke) well; quite a few projects inside Microsoft use .NET and P/Invoke together to improve performance at the bottlenecks.
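
For example, a P/Invoke declaration calling a native BLAS dot-product routine from F# is only a few lines. This is a sketch: the library name "openblas" and its presence on your loader path are assumptions, and marshalling is left at the defaults.

```fsharp
open System.Runtime.InteropServices

// Assumes a native BLAS binary (here called "openblas") is available;
// cblas_ddot is the standard CBLAS double-precision dot product.
[<DllImport("openblas", CallingConvention = CallingConvention.Cdecl)>]
extern double cblas_ddot(int n, double[] x, int incx, double[] y, int incy)

let x = [| 1.0; 2.0; 3.0 |]
let y = [| 4.0; 5.0; 6.0 |]
printfn "dot = %f" (cblas_ddot (x.Length, x, 1, y, 1))
```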


Comments

  • Anycorn
    Anycorn almost 2 years

    I am curious as to how F# performance compares to C++ performance? I asked a similar question with regards to Java, and the impression I got was that Java is not suitable for heavy number crunching.

    I have read that F# is supposed to be more scalable and more performant, but how does this real-world performance compare to C++? Specific questions about the current implementation are:

    • How well does it do floating-point?
    • Does it allow vector instructions?
    • How friendly is it towards optimizing compilers?
    • How big a memory footprint does it have? Does it allow fine-grained control over memory locality?
    • Does it have capacity for distributed-memory processors, for example a Cray?
    • What features does it have that may be of interest to computational science where heavy number processing is involved?
    • Are there actual scientific computing implementations that use it?

    Thanks

  • Anycorn
    Anycorn almost 14 years
    Interesting. My world is limited somewhat to Fortran and C++, but I am trying to expand my horizons. I have not really seen OCaml applications in my field.
  • duffymo
    duffymo almost 14 years
    Sorry, I don't understand this comment at all.
  • Anycorn
    Anycorn almost 14 years
    Most of them are still Fortran because of inertia (I do not think Fortran has much advantage today). The same goes for LINPACK (which is superseded by LAPACK). Some recent BLAS implementations, such as ATLAS and GotoBLAS, are actually C and platform intrinsics rather than Fortran.
  • duffymo
    duffymo almost 14 years
    My data is dated, I'll admit. But I'd be interested in seeing a benchmark comparing Fortran and C today for linear algebra. The big key question: What language are vendors of modern, commercial packages using?
  • Anycorn
    Anycorn almost 14 years
    That I do not know. I looked at the binary strings of MKL and it appears to be a mixture of C and Fortran, more Fortran. However, I would have thought that there would be some large hand-tuned assembly kernels. It would be interesting to know indeed.
  • Rune FS
    Rune FS almost 14 years
    The F# compiler might be new (and the performance of the code generated by the F# compiler therefore unknown), but the functional part of F# is far from new. F# written in a specific way can, with no changes, be compiled as OCaml, which has been around for ages. OCaml is provably a very optimizer-friendly language (due to immutability, for one); if the optimizer in F# is on par with the OCaml optimizer then heavy number crunching is very much suited to F#.
  • Onorio Catenacci
    Onorio Catenacci almost 14 years
    @Robert Harvey--I've heard that about OCaml as well. Blazing fast performance and small code as well.
  • kvb
    kvb almost 14 years
    @RuneFS - Achieving good performance in O'Caml often comes at the price of not using its higher-level constructs (see section 3.3 of janestreetcapital.com/minsky_weeks-jfp_18.pdf, for example). When talking about F# performance in the real world, the fact that the only current F# implementation runs on .NET (CLR or Mono) also means that certain optimizations may not be available. I am a huge F# fan, and in the future further optimizations may provide more speed, but at the moment I suspect that there are many applications where "optimal" C++ code would outperform "optimal" F# code.
  • Rune FS
    Rune FS almost 14 years
    @kvb I pretty much agree with you. I was just trying to point out that the language is not new but the compiler is, and your arguments would hold true under that statement (rephrased to make it clear what I meant: "The F# compiler is still a very new compiler").
  • Juliet
    Juliet almost 14 years
    "at the algorithm level, most of the algorithms must be coded in an imperative languages to have good performance in running time and memory usage" [citation needed]
  • Yin Zhu
    Yin Zhu almost 14 years
    The running time of these algorithms is measured in FLOPs, which is hard to reason about in high-level languages. The memory usage is also hard to predict, whereas in C and Fortran you are able to count precisely how many bytes you will be using.
  • Eamon Nerbonne
    Eamon Nerbonne almost 14 years
    F# is implemented in .NET, however, and that means it inherits some of its problems with regard to overspecification. F# functions are .NET methods behind the scenes, and these are guaranteed to execute in a particular order since they might have side effects - even if 99% of the time F# won't have these or you don't care about their order (e.g. debugging/logging statements). So, I'd caution against expecting too much performance from F# - it's nice; it can be reasonably fast - but it mostly gains brevity from its functional nature, not optimizability.
  • Eamon Nerbonne
    Eamon Nerbonne almost 14 years
    F# runs fast enough. I don't expect its compiler to be able to drastically improve; the language is still at its core a side-effect-permitting language which guarantees a particular order of execution, greatly constraining optimization. E.g. let f x y = (expensive x |> g) y is fundamentally different from let f x = expensive x |> g in F#, even though they are semantically equivalent in a functional world.
  • kvb
    kvb almost 14 years
    @Eamon - There are certainly challenges. However, I think that your position is overly bleak. Because F# runs on the CLR, improvements to either the F# compiler itself or the CLR JIT will affect performance. There are probably plenty of places where the .NET JIT compiler can be dramatically improved (e.g. skipping a wider variety of provably unnecessary array bounds checks, inlining heuristic improvements, etc.). Given that this is the first production release of a language created by a small team, I also wouldn't be surprised if further effort could improve the F# compiler's output.
  • kvb
    kvb almost 14 years
    Also, there's always the possibility that something like purity annotations might become available in a future version of the language, which would provide much more latitude for aggressive optimization.
  • Totti
    Totti almost 14 years
    -1: This is not true: "If you code LBFGS in F#, the performance can not be better than C++/CLI or C#, (but would be very close).". This is exactly the kind of application where F# can be a lot faster than C#.
  • Yin Zhu
    Yin Zhu almost 14 years
    @Jon Why? Do you mean 'parallel'?
  • Totti
    Totti almost 14 years
    Optimization algorithms are higher-order functions that accept the function to be minimized as an argument. This design pattern is better represented in F# where both functions can be marked inline in order to remove the performance overhead of the abstraction. C# cannot express this so you must pay a hefty performance penalty for the functional abstraction in C# but not in F#.
  • Totti
    Totti almost 14 years
    @Eamon: Actually most F# functions are not compiled into methods, execution order is not guaranteed in that sense (the compiler is free to reorder expressions just like the compilers for any high-performance language) and its functional nature does expose optimization opportunities (e.g. inlined higher-order functions with minimal overhead compared to hand-rolled loops).
  • Yin Zhu
    Yin Zhu almost 14 years
    @Jon. I have coded LBFGS, and I know the tricks to improve performance and memory usage that must be coded in an imperative style. FP seems to have good design patterns here, but the performance has less to do with style, especially for highly optimized numerical code. In most problems that use LBFGS, the time cost is mainly in the function-value and gradient calculations; very little is spent in LBFGS itself. Making it inline does boost the performance if there are far more LBFGS or line-search iterations than computation in the function value and gradient. However, this is generally not true.
  • Yin Zhu
    Yin Zhu almost 14 years
    Second, I don't see a performance issue in directly passing a vector (an array pointer) to a function, running it, and getting back another pointer to the gradient array. Inlining helps if this function costs only a little time and there is some overhead in the interaction. Because the gradient array is often of a big size (this is why we need limited-memory BFGS), we must make sure the gradient array is pre-allocated and reused in future iterations. There is just a lot of imperative thinking in the implementation of this kind of thing.
  • Eamon Nerbonne
    Eamon Nerbonne almost 14 years
    Right, so if you use inlined functions and only use side-effect free operations (i.e. no .NET interop) then it can reorder. Unfortunately, as can be verified with reflector, plain F# functions are compiled into .NET methods. MS itself, on the MSDN page about inline functions, says "you should avoid using inline functions for optimization unless you have tried all other optimization techniques". But even if you do, what optimizations will F# make that similar code in C++ (static inline) couldn't make? With manual help, I'm sure F# is a step in the right direction - but it's no Haskell.
  • Eamon Nerbonne
    Eamon Nerbonne almost 14 years
    What I'm trying to say is not that it's impossible for F# to have specific advantages in particular situations, but that people shouldn't be led to believe those advantages are in any way automatic or even always achievable. Semantically, the language is just not that different from C# - even if it encourages you to use structures that are side-effect free on a local scope, and even if the current compiler uses that information better than C#'s current compiler does. I really don't see how F#'s semantics enable more new compiler optimizations over, say, C++. No magic bullet, this...
  • Eamon Nerbonne
    Eamon Nerbonne almost 14 years
    Purity annotations might be a big win for performance. And I'm not trying to belittle F# - it's just that I see its benefits more on the code brevity and readability side, rather than expecting many performance benefits. I'd rather people choose F# for those reasons than because they think perf is better - and then discard it when they discover it rarely is. As to new-and-improved CLR optimizations: the CLR is 10 years old. While it's certainly not perfect, I wouldn't count on radical performance enhancements anymore; the obvious improvements will have already been made.
  • Totti
    Totti almost 14 years
    No, the main benefit of inline in F# is not that it removes the overhead of function calls but, rather, that it causes the CLR to type-specialize your code. If your LBFGS is only handling float array or vector inputs and outputs then you have type specialized it by hand for one particular case and that has made it much less useful. A general-purpose BFGS implementation should read its input and write its output directly in the user's data structures using functions that the user supplies. F# has a huge performance advantage over C# here.
  • Totti
    Totti almost 14 years
    Our modern commercial packages for numerical computing are written in F# and it beats Fortran quite happily. FFTW provides the FFT routines in MATLAB and is written in OCaml and beats everything else quite happily.
  • duffymo
    duffymo almost 14 years
    In my earlier comments I'm thinking about what you're calling high-performance computing, not interactive.
  • Totti
    Totti almost 14 years
    Why would you expect purity annotations to be a big win for performance? How is that not just a step towards Haskell and its dire performance?
  • kvb
    kvb almost 14 years
    @Jon - Is it your contention that there are no situations where an optimization (e.g. stream fusion, perhaps) could only be safely applied if a function is known to be pure? How would being able to add purity annotations to an impure language put it on a path towards Haskell's "dire" performance?
  • Totti
    Totti almost 14 years
    @kvb: There are obviously optimizations that depend upon knowledge of purity but Haskell already showed that such optimizations are virtually useless in practice: they unpredictably provide insignificant performance improvements in a few practically-irrelevant corner cases so developers end up enforcing the same optimizations by hand anyway to ensure that they get done when it is important. Where is this "big win" in performance of adding purity annotations supposed to come from?
  • Ben Voigt
    Ben Voigt almost 14 years
    "it's easier to figure out the performance by inspection in an imperative language" is VERY different from "only imperative languages give good performance". And also wrong. Second-order effects such as cache coherency are so important on modern processors, that measuring algorithms in FLOPs is worthless. Between a FLOP-optimized algorithm and a locality-optimized algorithm that required 10x the FLOPs, the locality-optimized algorithm will win. Repeat after me: the FPU is no longer the bottleneck.
  • ZXX
    ZXX over 13 years
    You haven't exactly posted that F# implementation that allegedly outperformed MATLAB :-)
  • Totti
    Totti over 13 years
    @Eamon: As Haskell has shown, the optimizations that purity facilitates rarely lead to better performance in practice and usually result in uselessly unpredictable performance. Thanks to languages like Haskell, we now know that predictable performance characteristics are more valuable in practice than the mythical sufficiently smart compiler. Fortunately, F# did not copy Haskell's mistake.
  • Totti
    Totti over 13 years
    The optimization of pattern matches is one area where F# has the potential to do a lot but C# does nothing. This is relevant to symbolic computations in scientific computing. Not uncoincidentally, some of the world's largest symbolic computations were written in F#'s predecessor, OCaml.
  • Eamon Nerbonne
    Eamon Nerbonne over 13 years
    @Jon: you're talking about enforced, absolute purity. I'm talking about the fact that F# can't take advantage of purity where practical at all (well, except if you use inline) - and that hurts. An automatically inferred (or occasionally manually placed) purity annotation could dramatically improve performance in many cases and leave semantics otherwise completely unaltered - compare with non-aliased parameters in some languages, or rvalue references in C++, or inlined functions in many languages. It doesn't require a super-smart compiler; compilers can detect unmodified expressions today.
  • Eamon Nerbonne
    Eamon Nerbonne over 13 years
    Another example is .NET's ngen-across-image boundaries attribute (I forget what it's called)
  • Totti
    Totti over 13 years
    Can you give concrete examples of practical cases where you believe purity can dramatically improve performance?
  • user492238
    user492238 about 12 years
    @Jon Harrop 'memory locality? Better than most memory-safe languages but not as good as C' Which options for such locality-control exist for C, which are not available in F#? And is this a language or a platform restriction? Thanks
  • Totti
    Totti about 12 years
    @user492238: In C, you can do things like smuggling bits in pointers and obtain interior pointers that point into the middle of a heap-allocated block of memory. Garbage collected languages will almost always prohibit this. So there are some sacrifices but they are relatively tiny.
  • user492238
    user492238 about 12 years
    @Jon Harrop. Right. But does this really differ from indexing into an array? Furthermore, as you know, C# does allow extensive pointer manipulation. But right, this question is about F#.
  • Totti
    Totti about 12 years
    @user492238: Indexing into an array requires you to multiply the index by the size of an element and add that to the pointer to the start of the array before you can dereference an element. That is more expensive than just dereferencing a pointer but the difference is usually masked by bigger cache effects. Writing pointers is much more expensive in managed languages than C or C++ due to the write barrier (and that will also impact locality). C unions are another example.
  • user492238
    user492238 about 12 years
    @JonHarrop Thanks. I agree on the first part (even if it's not true for C# and its pointers ;). Which write barrier are you referring to? It sounds like there would be some mechanism preventing a computational loop in IL from performing as fast as the same loop in C++? Not that I know of...?
  • Totti
    Totti about 12 years
    @user492238: "not true for C#". In what sense is that not true in C#? The write barrier I was referring to is a piece of code injected by the VM whenever user code writes a pointer into the heap. It is used to keep the GC apprised of changes to the heap topology: memorymanagement.org/glossary/w.html#write.barrier
  • user492238
    user492238 about 12 years
    @JonHarrop Ah. The remembered list of younger objects. Right. "In what sense.." C# allows this dereferencing as well: given double[] A, fixed (double* a = A) { double* p = a + 2000; p[0] = 4.0; p[1] = 5.0; } is equivalent to A[2000] = 4.0; A[2001] = 5.0. Once optimized and JIT'ed, there is no overhead for execution.
  • Totti
    Totti about 12 years
    @user492238: "there is no overhead for execution". Interior pointers must be saved and reloaded across potential GC safe points because .NET has a moving collector that might move the heap block that the interior pointer is pointing into. There must also be some way to find the parent heap block given an interior pointer in order to prevent the GC from reclaiming it. This is most likely accomplished by carrying the parent pointer around "inside" the interior pointer. So I doubt there is no overhead in the general case.
  • user492238
    user492238 about 12 years
    @JonHarrop By watching the generated machine instructions it becomes clear there is really no overhead. I don't know exactly how this relates to your concerns about pointers and movement by the GC. Possibly because such pointer operations are only allowed inside a fixed context, so the target heap area is pinned and not moved.
  • Totti
    Totti about 12 years
    @user492238 "only allowed inside a fixed context". Do you mean you cannot take an interior pointer and store it in the heap in C#? You can in C, of course.
  • ziggystar
    ziggystar about 10 years
    So are you telling us, that your parallel QR decomposition is faster than a single-threaded one? Because it reads like that.
  • Totti
    Totti almost 9 years
    @ziggystar: Intel's MKL is parallelised, of course.
  • Matthieu M.
    Matthieu M. over 8 years
    I would just like to point out that the question was F# vs C++ and this answer is F# vs C# and that C++ and C# are different languages.
  • VoronoiPotato
    VoronoiPotato over 7 years
    @JonHarrop it's very hard to write Fortran code that is always faster; however, it is possible, and even if he were able to write it, duffymo's peers certainly wouldn't be able to read it.
  • duffymo
    duffymo over 7 years
    I see no reason to single me out on a comment that's five years old. You have no idea what I can read and understand.
  • jackmott
    jackmott over 7 years
    This post is full of unsubstantiated assertions. The idea that F# easily lets you create more performant code than C++ is especially questionable. I've been pretty deeply involved in F#, including many PRs to speed up the higher order Array functions and I can assure you this is not generally the case. That the creator of F# can create a faster thing in F# than you can in C++ may speak more to your relative talents in each language than any innate property of them.
  • Totti
    Totti over 7 years
    @jackmott: I opened with "If you are making extensive use of sophisticated data structures in a multi-threaded program then F# is likely to be a big win. If most of your time is spent in tight numerical loops mutating arrays then C++ might be 2-3× faster" so I think it is pretty clear I'm not talking about the functions of the Array module here.
  • kam
    kam almost 3 years
    Just to add a comment on it years later: F# has access to native untyped pointer arithmetic through nativeint and nativeptr, and the ability to allocate both stack and heap memory through NativeInterop and System.Runtime.InteropServices.Marshal.