How much faster is C++ than C#?

c# c++ performance benchmarking

203,206

Solution 1

There is no strict reason why a bytecode-based language like C# or Java that has a JIT cannot be as fast as C++ code. However, C++ code was significantly faster for a long time, and in many cases it still is today. This is mainly because the more advanced JIT optimizations are complicated to implement, and the really cool ones are only arriving now.

So C++ is faster in many cases. But this is only part of the answer. The cases where C++ is actually faster are highly optimized programs, where expert programmers have thoroughly optimized the hell out of the code. This is not only very time-consuming (and thus expensive), but also commonly leads to errors due to over-optimizations.

On the other hand, code in interpreted languages gets faster in later versions of the runtime (.NET CLR or Java VM), without you doing anything. And there are a lot of useful optimizations JIT compilers can do that are simply impossible in languages with pointers. Also, some argue that garbage collection should generally be as fast as, or faster than, manual memory management, and in many cases it is. You can generally implement and achieve all of this in C++ or C, but it's going to be much more complicated and error-prone.

As Donald Knuth said, "premature optimization is the root of all evil". If you really know for sure that your application will mostly consist of very performance-critical arithmetic, that it will be the bottleneck, that it's certainly going to be faster in C++, and that C++ won't conflict with your other requirements, go for C++. In any other case, concentrate on first implementing your application correctly in whatever language suits you best, then find performance bottlenecks if it runs too slowly, and then think about how to optimize the code. In the worst case, you might need to call out to C code through a foreign function interface, so you'll still have the ability to write critical parts in a lower-level language.
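As a rough sketch of that last fallback, a hot routine can be exported with C linkage so that a managed runtime can call it through its foreign function interface (P/Invoke on .NET). The function name `sum_of_squares` is hypothetical, and the managed declaration and the platform-specific export attributes are left out:

```cpp
#include <cstddef>

// Hypothetical hot-path routine. extern "C" disables C++ name mangling so a
// foreign function interface can look the symbol up by its plain name.
extern "C" double sum_of_squares(const double* values, std::size_t count)
{
    double total = 0.0;
    for (std::size_t i = 0; i < count; ++i)
    {
        total += values[i] * values[i];
    }
    return total;
}
```

Only the small performance-critical part crosses the language boundary here; the rest of the application stays in whatever language it was written in.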

Keep in mind that it's relatively easy to optimize a correct program, but much harder to correct an optimized program.

Giving actual percentages of speed advantages is impossible; it largely depends on your code. In many cases, the programming language implementation isn't even the bottleneck. Take the benchmarks at http://benchmarksgame.alioth.debian.org/ with a great deal of scepticism, as these largely test arithmetic code, which is most likely not similar to your code at all.

Solution 2

C# may not be faster, but it makes YOU/ME faster. That's the most important measure for what I do. :)

Solution 3

I'm going to start by disagreeing with part of the accepted (and well-upvoted) answer to this question by stating:

There are actually plenty of reasons why JITted code will run slower than a properly optimized C++ (or other language without runtime overhead) program including:

- Compute cycles spent on JITting code at runtime are by definition unavailable for use in program execution.
- Any hot paths in the JITter will be competing with your code for instruction and data cache in the CPU. We know that cache dominates when it comes to performance, and native languages like C++ do not have this type of contention, by design.
- A run-time optimizer's time budget is necessarily much more constrained than that of a compile-time optimizer (as another commenter pointed out).

Bottom line: Ultimately, you will almost certainly be able to create a faster implementation in C++ than you could in C#.

Now, with that said, how much faster really isn't quantifiable, as there are too many variables: the task, problem domain, hardware, quality of implementations, and many other factors. You'll have to run tests on your scenario to determine the difference in performance, and then decide whether it is worth the additional effort and complexity.

This is a very long and complex topic, but I feel it's worth mentioning for the sake of completeness that C#'s runtime optimizer is excellent, and is able to perform certain dynamic optimizations at runtime that are simply not available to C++ with its compile-time (static) optimizer. Even with this, the advantage is still typically deeply in the native application's court, but the dynamic optimizer is the reason for the "almost certainly" qualifier given above.

In terms of relative performance, I was also disturbed by the figures and discussions I saw in some other answers, so I thought I'd chime in and at the same time, provide some support for the statements I've made above.

A huge part of the problem with those benchmarks is that you can't write C++ code as if you were writing C# and expect to get representative results (e.g., performing thousands of memory allocations in C++ is going to give you terrible numbers).

Instead, I wrote slightly more idiomatic C++ code and compared against the C# code @Wiory provided. The two major changes I made to the C++ code were:

used vector::reserve()
flattened the 2d array to 1d to achieve better cache locality (contiguous block)
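The flattening change can be sketched on its own as follows. This is a minimal illustration under my own naming (`FlatGrid` is hypothetical and is not part of the benchmark): all elements live in one contiguous block and a 2D coordinate is mapped to a single index.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: a rows x cols grid stored in one contiguous block
// instead of one separately allocated array per row.
class FlatGrid
{
public:
    FlatGrid(std::size_t rows, std::size_t cols)
        : cols_(cols), data_(rows * cols, 0.0) {}

    // Row-major mapping: element (r, c) lives at index r * cols + c.
    double& at(std::size_t r, std::size_t c) { return data_[r * cols_ + c]; }

    std::size_t size() const { return data_.size(); }
    const double* raw() const { return data_.data(); }

private:
    std::size_t cols_;
    std::vector<double> data_;
};
```

Walking the grid row by row then touches memory sequentially, which is the cache-locality benefit the change is after.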

C# (.NET 4.6.1)

private static void TestArray()
{
    const int rows = 5000;
    const int columns = 9000;
    DateTime t1 = System.DateTime.Now;
    double[][] arr = new double[rows][];
    for (int i = 0; i < rows; i++)
        arr[i] = new double[columns];
    DateTime t2 = System.DateTime.Now;

    Console.WriteLine(t2 - t1);

    t1 = System.DateTime.Now;
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < columns; j++)
            arr[i][j] = i;
    t2 = System.DateTime.Now;

    Console.WriteLine(t2 - t1);
}

Run time (Release): Init: 124ms, Fill: 165ms

C++14 (Clang v3.8/C2)

#include <chrono>
#include <cstddef>
#include <iostream>
#include <utility>
#include <vector>

auto TestSuite::ColMajorArray()
{
    constexpr std::size_t ROWS = 5000;
    constexpr std::size_t COLS = 9000;

    auto initStart = std::chrono::steady_clock::now();

    // reserve() allocates capacity up front but leaves size() at 0
    auto arr = std::vector<double>();
    arr.reserve(ROWS * COLS);

    auto initFinish = std::chrono::steady_clock::now();
    auto initTime = std::chrono::duration_cast<std::chrono::microseconds>(initFinish - initStart);

    auto fillStart = std::chrono::steady_clock::now();

    for (std::size_t r = 0; r < ROWS; ++r)
    {
        for (std::size_t c = 0; c < COLS; ++c)
        {
            // Append into the reserved storage; indexing past size() with
            // operator[] would be undefined behavior
            arr.push_back(static_cast<double>(r * c));
        }
    }

    auto fillFinish = std::chrono::steady_clock::now();
    auto fillTime = std::chrono::duration_cast<std::chrono::milliseconds>(fillFinish - fillStart);

    return std::make_pair(initTime, fillTime);
}

Run time (Release): Init: 398µs (yes, that's microseconds), Fill: 152ms

Total Run times: C#: 289ms, C++ 152ms (roughly 90% faster)

Observations

Changing the C# implementation to the same 1d array implementation yielded Init: 40ms, Fill: 171ms, Total: 211ms (C++ was still almost 40% faster).
It is much harder to design and write "fast" code in C++ than it is to write "regular" code in either language.
It's (perhaps) astonishingly easy to get poor performance in C++; we saw that with the performance of unreserved vectors. And there are lots of pitfalls like this.
C#'s performance is rather amazing when you consider all that is going on at runtime. And that performance is comparatively easy to access.
More anecdotal data comparing the performance of C++ and C#: https://benchmarksgame.alioth.debian.org/u64q/compare.php?lang=gpp&lang2=csharpcore
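To make the unreserved-vector pitfall concrete, here is a small sketch that counts how often a vector's buffer is reallocated while it grows. The function name `count_reallocations` is my own, and the exact count without `reserve()` depends on the standard library's growth policy, so treat it as an illustration rather than a measurement:

```cpp
#include <cstddef>
#include <vector>

// Counts how often the vector's capacity changes while n elements are
// appended. Each change means a new buffer was allocated and the existing
// elements were moved into it.
std::size_t count_reallocations(std::size_t n, bool reserve_first)
{
    std::vector<double> v;
    if (reserve_first)
    {
        v.reserve(n); // one up-front allocation
    }

    std::size_t reallocations = 0;
    std::size_t last_capacity = v.capacity();
    for (std::size_t i = 0; i < n; ++i)
    {
        v.push_back(static_cast<double>(i));
        if (v.capacity() != last_capacity)
        {
            ++reallocations;
            last_capacity = v.capacity();
        }
    }
    return reallocations;
}
```

With `reserve_first` set, no reallocation happens during the loop; without it, the vector has to grow repeatedly, and every growth step copies or moves everything stored so far.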

The bottom line is that C++ gives you much more control over performance. Do you want to use a pointer? A reference? Stack memory? Heap? Dynamic polymorphism or eliminate the runtime overhead of a vtable with static polymorphism (via templates/CRTP)? In C++ you have to... er, get to make all these choices (and more) yourself, ideally so that your solution best addresses the problem you're tackling.
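As one illustration of those choices, the sketch below shows the same operation written with dynamic polymorphism (a virtual call) and with static polymorphism via CRTP. All type and function names are hypothetical, and whether the static version actually ends up faster depends on the compiler and the surrounding code:

```cpp
// Dynamic polymorphism: area() is resolved through the vtable at run time.
struct Shape
{
    virtual ~Shape() = default;
    virtual double area() const = 0;
};

struct Square : Shape
{
    explicit Square(double s) : side(s) {}
    double area() const override { return side * side; }
    double side;
};

// Static polymorphism (CRTP): the base knows the derived type at compile
// time, so the call is resolved without a vtable and can usually be inlined.
template <typename Derived>
struct ShapeCrtp
{
    double area() const { return static_cast<const Derived&>(*this).area_impl(); }
};

struct SquareCrtp : ShapeCrtp<SquareCrtp>
{
    explicit SquareCrtp(double s) : side(s) {}
    double area_impl() const { return side * side; }
    double side;
};

double area_dynamic(const Shape& s) { return s.area(); }

template <typename T>
double area_static(const ShapeCrtp<T>& s) { return s.area(); }
```

The trade-off is the one described above: the CRTP version gives up run-time substitutability (there is no common base type to store in a container) in exchange for calls the optimizer can see through.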

Ask yourself if you actually want or need that control, because even for the trivial example above, you can see that although there is a significant improvement in performance, it requires a deeper investment to access.

Solution 4

It's five oranges faster. Or rather: there can be no (correct) blanket answer. C++ is a statically compiled language (but then, there's profile guided optimization, too), C# runs aided by a JIT compiler. There are so many differences that questions like “how much faster” cannot be answered, not even by giving orders of magnitude.

Solution 5

In my experience (and I have worked a lot with both languages), the main problem with C# compared to C++ is high memory consumption, and I have not found a good way to control it. It was the memory consumption that would eventually slow down .NET software.

Another factor is that the JIT compiler cannot afford too much time to do advanced optimizations, because it runs at runtime, and the end user would notice it if it takes too much time. On the other hand, a C++ compiler has all the time it needs to do optimizations at compile time. This factor is much less significant than memory consumption, IMHO.

Trap

I'm a software developer working for a videogame company.

Updated on July 26, 2021

Comments

Trap almost 3 years

Or is it now the other way around?

From what I've heard there are some areas in which C# proves to be faster than C++, but I've never had the guts to test it by myself.

Thought any of you could explain these differences in detail or point me to the right place for information on this.
- Yinda Yin about 13 years
  
  Protected, to prevent any more random benchmarks from being posted. If you think you can make your case, you will need 10 rep to do so.
- pixelpax about 6 years
  
  It's almost a moot question, given that we live in an age in which IL can be converted to CPP and optimized from there: docs.unity3d.com/Manual/IL2CPP.html
- Seva Alekseyev about 4 years
  
  A language that checks for out of range array access will never outperform one that doesn't.
- Trap about 4 years
  
  @SevaAlekseyev It's not the language who does this but the compiler. One of the reasons C++ is that fast (apart from the obvious ones) is that C++ compilers have been around for past 35 years (if not more). There's nothing that prevents C# compilers to get better over time. For the case you mention, please read this stackoverflow.com/questions/16713076/…
Martin York over 15 years

<quote>code in interpreted languages gets faster in later versions of the runtime</quote> As code compiled by a better version of the compiler will also get faster.
Nemanja Trifunovic over 15 years

In fact there is at least one reason: JIT needs to be fast, and cannot afford to spend time on various advanced optimizations available to a C++ compiler.
Karl over 15 years

Could you name a few examples? Games written in C# that you've found slow?
Trap over 15 years

I'd say it's quicker most of the time :)
Martin Probst over 15 years

@Nemanja Trifunovic: depends on your scenario. In server applications, JIT doesn't really need to be fast - you can amortize the cost over time very well, and perform incremental enhancements on the code.
Martin Probst over 15 years

Actually, Java VMs (and probably .NET) go to great lengths to avoid dynamic dispatch. Basically, if there is a way to avoid polymorphism, you can be pretty sure your VM will do it.
David The Man over 15 years

Even the example applications that came with the installation felt slow.
Konrad Rudolph over 15 years

I'm aware of the VMs' abilities. However, this goes much farther. The point is that template C++ codes do use “dynamic” dispatching, or rather, something analogous.
Robert Fraser over 15 years

I like C++, but that's probably because I program games where almost everything is insanely math heavy (physics, collision, weighted mesh deformation - stuff like that.)
Paolo Di Pietro over 15 years

I fully agree. I wonder why people expect a precise answer (63.5%) when they ask a general question. I don't think there is a general answer to this kind of question.
Michael Entin over 15 years

Most modern games are GPU-limited. For such games it does not matter if the logic (executed on CPU) is 10% slower, they are still limited by GPU, not CPU. Garbage collector is a real problem, causing random short freezes if the memory allocations are not tuned well.
Todd Gamblin over 15 years

"but also commonly leads to errors due to over-optimizations." [citation desperately needed]. I work at a national lab, and we optimize the hell out of our code. This does not commonly result in buggy code.
user49117 over 15 years

Yes. The answer is "It depends.".
Blake about 15 years

@martinprobst "with a great deal of scepticism" - no, not scepticism. The appropriate attitude is curiosity - see the benchmarks game FAQ "Flawed Benchmarks".
Alex over 14 years

Have you got any evidence to support your outrageous five oranges claim? My experiments all point to 2 oranges at most, with a 3 mango improvement when doing template metaprogramming.
Zan Lynx about 14 years

I'm not a huge Java fan but there's nothing that says Java can't use a real-time friendly GC.
Arafangion almost 14 years

You're using different data structures and library code there, although "370 seconds" does indicate something horrible - you aren't running it in the debugger by any chance are you? I suspect that the performance of the CSV library you are using is more interesting than the performance of the language you are using. I would question the use of a vector in that context, and what optimisations you used. Additionally, it is widely known that iostreams (in particular, the "myfile << *j << ", ";") is much slower than other methods of writing to the file, for at least some common implementations.
Arafangion almost 14 years

Finally, you're doing more work in the C++ version. (Why are you clearing the csvColumn, csvElement and csvLines?)
Arafangion almost 14 years

There are plenty of real-time GC implementations if you care to look. (GC is an area that is overflowing with research papers)
gradbot almost 14 years

"It's relatively easy to optimize a correct program, but much harder to correct an optimized program."
Roman Starkov over 13 years

+1 I always have trouble explaining this to my C# colleagues who know little C++ in a way that would enable them to appreciate the significance. You've explained it rather nicely.
yzorg over 13 years

But when combining code from multiple owners I believe it is still true that template instantiations are very hard to share across module boundaries. I'm talking about sharing common code like List<T> or vector<T> across many modules in an application. So for composable systems (many modules, many owners) runtimes like the CLR start to make up for their fixed overhead by reducing thrashing of the CPU cache with many copies of the same template instantiations. I think over time the C++ performance lead will shrink until only niche libraries and untyped C libraries remain.
Konrad Rudolph over 13 years

@crtracy: you are making your bet without high-performance computing applications. Consider weather forecasting, bioinformatics and numeric simulations. The performance lead of C++ in these areas will not shrink, because no other code can achieve comparable performance at the same level of high abstraction.
Wouter van Nifterick over 13 years

@callmesteve: I know what you mean, but your last sentence should sound like nails over a chalk board to any programmer.
Aidiakapi about 13 years

@Martin York: But the new JIT can also execute older assemblies faster, the new compiler is useless without a new compile ;)
BlueRaja - Danny Pflughoeft almost 13 years

@postfuturist: That is not true on PC; the garbage collector does such a good job of getting in and out I've never experienced any problems with it. However, on XBox 360 and Zune/Windows-7-Phone, the garbage collector is not nearly as smart as on PC; I've never written for either, but people who have tell me the garbage collector is a huge problem.
Wiory almost 13 years

I'd love to see index accessor in std::list. Anyway, it takes 37 secs with list, release mode. Release without debugging: 3s list, 0,3 s vector. Probably dereferencing issue or sth. Sample: nopaste.pl/12fb
Totti over 12 years

-1: This is actually a myth. Firstly, the latency of idiomatic C++ is actually awful and often much worse than .NET because RAII causes avalanches of destructors when large data structures fall out of scope whereas modern GCs are incremental and .NET's is even concurrent. Secondly, you can actually completely remove GC pauses on .NET by not allocating.
Totti over 12 years

Best case for VMs will be run-time compilation of generated code (e.g. to match a regular expression read in at run time) because statically compiled vanilla C++ programs can only use interpretation because they do not have a JIT compiler built in.
Totti over 12 years

FWIW, Richard Jones just published an updated version of his garbage collection book that covers, amongst other things, state-of-the-art real-time GC designs.
Florian Doyon over 12 years

If you do this, you then have to forego using the BCL as most of the methods create transient objects.
Bogatyr over 12 years

In one project at work we had to mine gargantuan amounts of data, including holding many GB in memory simultaneously and performing expensive calculations on all of it -- this required precise control of all allocations, C++ was pretty much the only choice. +1 for C++. On the other hand, that was just one project, we spent most of our time writing systems which interacted with slow simulators, and debugging could be a nightmare, so I wished we could have used a programmer-time-optimizing language for all that other stuff.
Justin over 12 years

This is quite true, it wasn't until .net 4 that the GC was made incremental. We have a large C# app that pauses for seconds at a time for GC. For performance critical apps this is a killer.
filozof almost 12 years

I was intrigued by your answer. Have you tested the same benchmark with unsafe code and lockbits, and drawing the random lines yourself? Now that would be an interesting thing to look at.
QBziZ almost 12 years

@Pedery nope I haven't. just using the GDI and .NET.Graphics in the most basic of ways. what do you mean by "drawing the random lines yourself"?
filozof almost 12 years

Then you should perhaps consider testing this to get a more realistic metric for how fast C# can be. Here's a nice overview of the technique: bobpowell.net/lockingbits.htm
QBziZ almost 12 years

That is not what we want to do, putting separate pixels in a frame buffer ourselves. If you have to implement everything yourself what's the point of having an API/Platform to code against? For me this is a non-argument. We never needed to put separate pixels in a framebuffer in GDI for drawing lines, and we are not planning to do this in .NET either. In my view, we did use a realistic metric, and .NET turned out to be slow.
filozof almost 12 years

It's realistic since sometimes you need to speed up your graphics and you can do this while still working within C#. I think it's interesting you have tested this, so it would have been even more interesting to see whether the C# lockbits method is as fast as its C++ counterparts.
VoronoiPotato over 11 years

There's a reason why programs that tend to push the hardware tend to use C++. You have more fine tuned control when you need it. Performance is only key when you're pushing the system, otherwise use C# or Java to save you time.
Admin over 11 years

C# is programmed in C++, so I don't see how it could possibly be any faster than C++. At best C# can only be as fast as C++. Then again, .NET has a managed heap, managed thread pool, etc. that may make your code faster than a C++ program that does not have these managers - but these are features of .NET and not the C# programming language itself.
Admin over 11 years

..to clarify: the C# compiler and the .NET CLR & CLI is programmed in C++.
Admin over 11 years

You can force the managed heap to dispose objects. See msdn.microsoft.com/en-us/library/… for more info.
Nemanja Trifunovic over 11 years

@IngeHenriksen: I am well aware of the Dispose pattern, but it does not help with the managed memory at all.
Martin Probst over 11 years

Inge: not sure you're on the right track there. Yes, C# is implemented in another language, but the JIT compiler is generating machine code, so it's not an interpreted language. Thus it is not limited by its C++ implementation. I'm not quite sure why you think adding some manager to something inherently makes it faster.
doug65536 over 11 years

@IngeHenriksen disposing it only ensures that the Dispose method has been called. Disposal never frees garbage collected memory. The Dispose method is only intended for cleaning up unmanaged resources like file handles and has nothing to do with memory management.
doug65536 over 11 years

Every iteration of the while loop is going to destruct and reconstruct a std::istream and a std::vector and a std::string. The while body goes out of scope every iteration, all those variables inside the while scope are going to destruct and construct on every iteration.
deceleratedcaviar over 11 years

@doug65536 yes I realise that now, having done C#. Perhaps I'll just remove all that from history...
doug65536 almost 11 years

@Daniel ok, I'll remove my comment too.
Totti over 10 years

@NemanjaTrifunovic: "JIT needs to be fast, and cannot afford to spend time on various advanced optimizations available to a C++ compiler". That is often claimed but I have never seen any evidence for it. Can you cite some concrete examples of such optimizations?
Totti over 10 years

@NemanjaTrifunovic: "JIT compiler cannot afford too much time to do advanced optimizations". Can you cite some optimizations that are not done by JITs because they would take too long?
Totti over 10 years

C# has native code compilation of run-time generated code which is a more general solution to this problem. For example, regular expressions are compiled to machine code in C# whereas C++ resorts to an interpreter. You cannot use templates to do that in C++ if the regular expression is only available at run-time. So, in the general case, C# is orders of magnitude faster than C++ in the context of metaprogramming.
Konrad Rudolph over 10 years

@Jon Use a better regex engine in C++ then (there might not be one; but that’s not a fundamental restriction – but actually there are such engines, e.g. Boost.Xpressive). Your second statement is laughable.
Puppy over 10 years

You can use a library like LLVM to dynamically generate native code in C++ if you want to. There's no language restriction against it. It would be a legal implementation of std::regex.
Konrad Rudolph over 10 years

@JonHarrop Incidentally, to avoid misunderstanding, I concede that C# metaprogramming is more powerful because you can use it at runtime. I’m taking issue with your “orders of magnitude faster” claim which is flat out wrong.
Totti over 10 years

@DeadMG: Yes, of course. You can solve any problem quickly in any language by generating custom machine code.
Totti over 10 years

@KonradRudolph: "I’m taking issue with your “orders of magnitude faster” claim which is flat out wrong". How big do you think the performance difference between an interpreter and the code generated by an optimizing compiler is? I've seen Mathematica run 100,000x slower than C...
Konrad Rudolph over 10 years

@Jon Apples and oranges. Your specific claim was “C# is orders of magnitude faster than C++ in the context of metaprogramming”, not “using precompiled code is orders of magnitude faster than interpreted code”. While we’re at it, your claim that runtime code generation is “more general” than compile-time code generation is also clearly wrong – they both have strengths and weaknesses. Compile-time code generation uses the type system to provide static type safety – runtime code generation cannot do that (it can provide strong type safety, but not static type safety).
Totti over 10 years

@KonradRudolph: "your claim that runtime code generation is “more general” than compile-time code generation is also clearly wrong". Run-time codegen can handle both statically- and dynamically-available programs whereas compile-time can only handle statically-available programs. "runtime code generation cannot do that". MetaOCaml is a counter example.
Konrad Rudolph over 10 years

@Jon That’s all nice and well but completely academic. Neither C++ nor C# can do this. This whole discussion is about C# and C++.
Totti over 10 years

@KonradRudolph: Perhaps you could comment on the 13x performance difference I've described here: stackoverflow.com/questions/19798653/c-vs-net-regex-performance
Konrad Rudolph over 10 years

@BlueRaja There’s a great article which explains how and when GCs are slow. Given high enough memory pressure, they can be slow even on a PC. I don’t know the exact requirements for AAA game titles but it’s not hard to imagine that they routinely go beyond the boundaries within which a modern GC performs well.
eonil about 10 years

@Aidiakapi JIT means you have a compiler at runtime environment. There's no reason not to recompile C++ program with a new compiler.
eonil about 10 years

Just talking like "there's a RTGC" doesn't answer the question about determinism. I believe many people are still in doubt on this due to lack of real clear explanation.
Aidiakapi about 10 years

@Eonil Except that it requires redeployment? For server systems, sure go ahead and recompile software with more modern compilers that have more advanced optimizations. But for the consumer market you usually can't do that until you ship a new version. With .NET a program compiled 8 years ago, may suddenly become faster because of an optimization in a new .NET Framework.
Aidiakapi about 10 years

@KonradRudolph Although games often use a lot of memory, most of it can remain untouched. Especially with loading screens a lot of data is retrieved from the HDD, put in RAM, transferred to video card, and some of it can be reclaimed. By forcing a full collection of the GC after that loading phase, a lot of pressure can be released during run time. General game's performance problems come from requiring lots of processing power in small time frames while having spare in others. The GC is pretty good at determining when you have spare, and will try to collect in those frames.
Konrad Rudolph about 10 years

@Aidiakapi Not to contradict you in principle, but when (in the game loop during normal game time) is there so much less pressure that a large collection could run without impacting the experience? Also, even when memory remains untouched it still needs to be available. Maybe pervasive use of weak references (to implement a cache, and reload from secondary memory upon collection) could help, but GCs only work well when there is vastly more memory available than in use. And finally, theory’s one thing: in practice, which modern GC reliably does what you’ve described?
Henk Holterman almost 10 years

This is a nonsense argument, Windows (and Linux) are not Real Time OSes. Your C++ code could be swapped out for a number of 18 ms slots at any time too.
Arsalan Ahmad almost 10 years

What do you think about the new C# "roslyn" compiler? I've heard it really speeds up the code significantly
Sam over 9 years

For more precise measurements you shouldn't be using System.DateTime.Now, but rather, the Stopwatch class.
Zachary Kraus over 9 years

Part of the reason you are getting such slow fill times for the vector in C++ is you are using push_back. This has been shown on numerous posts to be slower than using the at method or operator []. In order to use either of those methods you need to use the resize or reserve method. Additionally, the reason your initialization is taking so long for the c++ vector case is that you are forcing a copy or assignment operator (not sure which in this case) to initialize your c++ vector. For the array in c++ there is an algorithm that uses 2 new calls rather than 5001 and is faster iterating as well.
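One way to read the "2 new calls" remark is sketched below; this is my own interpretation, not the commenter's code, and the `Array2D` type is hypothetical. One allocation holds all the elements and a second holds a table of row pointers into that block, so `a[r][c]` syntax still works:

```cpp
#include <cstddef>
#include <memory>

// Hypothetical 2D array built from two allocations: one block for all the
// elements and one block of row pointers into it.
struct Array2D
{
    Array2D(std::size_t rows, std::size_t cols)
        : data(new double[rows * cols]()), // allocation 1: zero-initialized elements
          row(new double*[rows])           // allocation 2: row pointer table
    {
        for (std::size_t r = 0; r < rows; ++r)
        {
            row[r] = data.get() + r * cols;
        }
    }

    // Returns a pointer to the start of row r, so a[r][c] indexes an element.
    double* operator[](std::size_t r) { return row[r]; }

    std::unique_ptr<double[]> data;
    std::unique_ptr<double*[]> row;
};
```

Because the rows sit back to back in one block, iterating over the whole array is a single linear pass through memory.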
Zachary Kraus over 9 years

from the looks of reading your c++ code you are trying to copy from one file to another file. Instead of using the complex interactions between file streams, strings, vectors and string streams, you could have just copied the input file stream to the output file stream. This would have saved a lot of time and memory.
Aidiakapi almost 9 years

@ArsalanDotMe Roslyn is a compiler from C# to bytecode. There's very few optimizations done at this stage, since aggressively optimizing at that point would prevent more advanced optimizations at the JIT time. However, you're probably talking about RyuJIT, the new JIT compiler, which does have several major performance improvements.
AndersK over 8 years

actually the main point is that certain things cannot be done effectively in C# but can be done effectively in C++ e.g. AMP Maybe it is just temporary, but currently it is like that.
emlai over 8 years

"you will just need to remember to pair your new T[] with a corresponding delete[]" – No you don't. There's std::unique_ptr to do that for you.
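A minimal sketch of that point, with a hypothetical function name: the array specialization of `std::unique_ptr` calls `delete[]` automatically when the pointer goes out of scope, so there is no manual pairing to remember.

```cpp
#include <cstddef>
#include <memory>

// Fills a heap-allocated buffer with 1..n and returns the sum.
// The unique_ptr<double[]> releases the array with delete[] when it goes out
// of scope, including on an early return or an exception.
double sum_first_n(std::size_t n)
{
    auto buffer = std::make_unique<double[]>(n);
    for (std::size_t i = 0; i < n; ++i)
    {
        buffer[i] = static_cast<double>(i) + 1.0;
    }

    double total = 0.0;
    for (std::size_t i = 0; i < n; ++i)
    {
        total += buffer[i];
    }
    return total;
}
```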
Krythic over 8 years

@postfuturist If you program your game correctly you won't use the GC at all. Most data types and calculations are done via structs, and objects typically persist during the entire lifetime of a level. When the level is over, simply force GC before loading the next level. (Speaking of C# in this regard)
Krythic over 8 years

In terms of C# game dev: You can properly code a game to never use the GC until a level transition. All math is done via structs, this means you won't need to init objects or toss a continuous load onto the GC. In fact, the biggest optimization you can do for C# game dev is to try not to use the GC at all. C# also allows you to force GC at any time, which can be done at level transitions. Basically, the GC is only a problem if you let it be.
zackery.fix over 8 years

@HenkHolterman True, but you could always write a boot-loader in assembly, tie that into a kernel bootstrap for your application and execute your C++ apps directly against the hardware (in RT btw). You can't do this in C# and any efforts that I have seen only mimic pre-compiled assembly in C# and use a ton of C code, which make it point-less to use C#. Reading all this is kinda funny, because C# is truly useless without the .NET framework.
zackery.fix over 8 years

In C++, you have the option of using different allocation methods, so depending on how memory was allocated (AOT?) in C#, it could be done the same way (but much faster) in C++.
DAG about 8 years

I think you didn't do c++ in an appropriate way. At just a glance I found so many issues. E.g. vector<vector<double>> myList=vector<vector<double>>()
DAG about 8 years

And have you noticed how many times you constructed such a 2d vector? And would you mind doing a reserve() to accelerate push_back()? I think your benchmark does not make any sense. And the way you measure time is far from accurate as well. Just my 2 cents
DAG about 8 years

if you can't manage the cache behavior, you can't beat optimized c++ code. A cache miss from L1 to main memory could slow your operation 100 times.
Peter about 8 years

Assuming you wrote something in graphics, why write safe code in C#? Have you considered using unsafe code and comparing again?
Peter about 8 years

To do speed tests, test things in memory and don't go to disk IO, unless you're testing on the latest SSDs and they're dedicated to your performance app, as computers constantly write to disk, even if you don't touch the keyboard.
Peter about 8 years

realtime computers are an oxymoron
Peter about 8 years

Disagree, I wrote a camera filter that does blob detection in 20ms
Peter about 8 years

actually C# does include ways to inline your functions using System.Runtime.CompilerServices; ... [MethodImpl(MethodImplOptions.AggressiveInlining)] void MyMethod(...)
Konrad Rudolph about 8 years

@user3800527 Nobody disputes that, but I don't know a single implementation of .net which could use that to inline, say, calls to an IComparer inside the loop of a sorting algorithm. C++ can do that.
Peter about 8 years

You place it above the function you're calling. You can do it multiple times, so a function can use a sub-function, and the sub-function will be placed inside your main function when compiled (i.e. no jumps or calls); but you might need to write your own sorting mechanisms (and maybe even parallelize them).
Peter about 8 years

But memory is also cheap these days. Is it a bigger problem to add 32 GB, or to let someone code a few months more in C++ (large project) compared to C#?
Nemanja Trifunovic about 8 years

@user3800527: Even if adding RAM was always feasible (and it is not - imagine Microsoft adding RAM to each MS Office user) that will not solve the problem. Memory is hierarchical and a C# program will have many more cache misses than a C++ one.
Konrad Rudolph about 8 years

@user3800527 That won’t work if the sorting algorithm is generic and can be called with distinct IComparer types. That’s the whole problem here, and which is solved by C++’ compile-time generics (= templates).
Peter about 8 years

Not exactly sure what you have in mind here; maybe put your data in a struct and your functions inside a class so you could do without object functions (but maybe I need to see the problem you have). Structs, arrays, lists, hashtables, objects and classes are all design- and performance-related choices, but they are not always properly chosen.
Konrad Rudolph about 8 years

@user3800527 I think you’re missing the whole point of this answer. Of course you can work around this by breaking encapsulation and dropping down to low-level structures — you can write assembly in (most) any language. The thing that makes C++ (almost) unique, and uniquely suited for high-performance programming, is that you can build high-level abstractions that come at no runtime cost. So you don’t need to write assembly-like code in C++ to get premium performance: a well-written sort(arr, generic_comparer) will be as efficient as a hand-written loop in C++. It never will be in C#.
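A rough sketch of the template mechanism discussed in this exchange, using a hypothetical insertion_sort and Descending comparator rather than any code from the answer. Because the comparator's type is a template parameter, each distinct comparator type gets its own instantiation of the sort, so the call to less(...) is resolved statically and the compiler is able (though not obliged) to inline it.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Insertion sort parameterised on the comparator's static type.
// One copy of this function is generated per distinct Compare type.
template <typename T, typename Compare>
void insertion_sort(std::vector<T>& v, Compare less) {
    for (std::size_t i = 1; i < v.size(); ++i) {
        // Shift v[i] left until it is no longer "less" than its neighbour.
        for (std::size_t j = i; j > 0 && less(v[j], v[j - 1]); --j) {
            std::swap(v[j], v[j - 1]);
        }
    }
}

// Hypothetical comparator: a plain function object, no virtual dispatch.
struct Descending {
    bool operator()(int a, int b) const { return a > b; }
};
```

insertion_sort(v, Descending{}) and insertion_sort(v, std::less<int>{}) compile to two separate functions, each specialised for its comparator. Whether the comparison is actually inlined is up to the optimiser and the build flags.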
QBziZ about 8 years

Well, I have just a slight idea what blob detection is, but just stating one timing does rather prove nothing at all. Have you written one in C++? In JavaScript? And compared those to the one in C#? And besides that, I don't think blob detection is using many graphics primitives. Correct me if wrong, but I guess it's statistical algorithms performing operations on pixels.
Johan Boulé about 8 years

[Donald E. Knuth, 2008-04-25] "X" can produce thousands of binaries, tuned perfectly to the configurations of individual users, whereas "Y" usually will exist in only a few versions. A generic binary executable file must include things like inefficient "sync" instructions that are totally inappropriate for many installations; such wastage goes away when the source code is highly configurable. This should be a huge win for "X".
Tas almost 8 years

This doesn't appear to answer the question, and reads more as a comment or rant.
Admin almost 8 years

I would say that it's more than bytecode vs native; it's also about the language itself. If the language is not expressive enough (and I think both C# and Java fit in this category) to express things efficiently for the backend, you're going to lose regardless of whether you're native or bytecode/IL etc. Native languages can be slow too. I personally believe bytecode gets more merit than it should, and for the most part everything should be native; native can be usable too. It's also about how far you need to go: do you really need it all, up to the last cycle? Sometimes you do, most times you don't.
paulm almost 8 years

vector<vector<double>> myList=vector<vector<double>>() I see C# devs who write C++ do this all the time, this is SLOW and USELESS, don't make pointless copies of stuff
Luaan almost 8 years

@AndersK. This hasn't been true even when you posted that. It's not commonly used, but .NET has had support for AMP for a while, and AFAIK JVM has it too. Don't forget that C# can deal with unmanaged memory and pointers just fine - it's just a bit more work than in C++ (a bit similar to C, in fact). If you want managed code and C++, Managed C++ gives you both at the same time (though obviously, the native part of code is not multi-platform). And .NET makes interop extremely easy, so you can always drop to native C++ if you need to. Or VB6, whatever floats your boat.
Luaan almost 8 years

Yes, C++ can do this, and it's a great thing. But don't think that .NET (or JVM) can't - as far as I know, right now, you're right that the support isn't there. But in both, you already have the JIT compiler getting around virtual method calls if you mostly call the same method. Expanding this to allow inlining the virtual method bodies is obviously possible - though I wouldn't hold my breath, since it isn't important for most projects, and the workaround is very simple. Instead, we can expect the higher level abstractions to get better - e.g. smarter anonymous delegates.
Luaan almost 8 years

@JonHarrop Um? WPF doesn't use GDI, or the C# Graphics class. And very slow compared to what? On what kind of hardware? And how do you measure "slow"? And how does C# WPF compare to C++ WPF? I agree that e.g. MFC in C++ is a lot faster than Winforms in .NET, but WPF isn't really comparable. I can easily have tens of thousands of controls in WPF without virtualisation, and still be snappy - neither native MFC nor Winforms comes anywhere close.
Luaan almost 8 years

@zackery.fix Haha, no. I'm not saying that C# would be my first choice for an embedded system with 32 kiB of memory, but that's only because there's little point in optimizing any part of C# or .NET for that kind of environment. But if you have at least half a meg of memory, you're good to go. And I've written an OS in C# before, and I can tell you that it's awesome. Without the .NET framework, of course. It's funny how you say that C# can only "mimic" pre-compiled assembly, while in C++, you get this natively... with assembly. So can you write an OS in C++ or not? :)
Luaan almost 8 years

@zackery.fix .NET has an interesting edge in heap allocation, because it only has to move a pointer to allocate a new object. This is only feasible due to the compacting garbage collector. Of course you can do the same thing in C++, but C++ doesn't do that. It's funny how you use the same argument to say "C# could but doesn't, so it's garbage" and "C++ doesn't, but it could, so it's awesome" :)
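For illustration, the "only has to move a pointer" idea can be written down in C++ as a bump (arena) allocator. This is a simplified sketch with hypothetical names, not how the .NET runtime is implemented: it has no garbage collection or compaction, individual objects cannot be freed, and it assumes the requested alignment is a power of two no larger than alignof(std::max_align_t).

```cpp
#include <cstddef>
#include <memory>

// Hypothetical bump allocator: allocation is an alignment round-up
// plus one addition; memory is released all at once with reset().
class BumpArena {
public:
    explicit BumpArena(std::size_t bytes)
        : buffer_(new unsigned char[bytes]), capacity_(bytes), offset_(0) {}

    // Returns `bytes` of storage aligned to `align` (a power of two),
    // or nullptr when the arena does not have enough room left.
    void* allocate(std::size_t bytes, std::size_t align = alignof(std::max_align_t)) {
        std::size_t start = (offset_ + align - 1) & ~(align - 1);
        if (start > capacity_ || bytes > capacity_ - start) return nullptr;
        offset_ = start + bytes;  // "move the pointer"
        return buffer_.get() + start;
    }

    void reset() { offset_ = 0; }  // discard everything at once
    std::size_t used() const { return offset_; }

private:
    std::unique_ptr<unsigned char[]> buffer_;
    std::size_t capacity_;
    std::size_t offset_;
};
```

Two consecutive 8-byte allocations from one arena come back 8 bytes apart, and a request larger than the remaining space yields nullptr. A compacting collector can keep allocating this way for the whole heap because it periodically moves live objects together, which this sketch does not attempt.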
Luaan almost 8 years

Note from the future: .NET does have support for SIMD and friends since about 2014, though it's not widely used.
Konrad Rudolph almost 8 years

@Luaan No, it really isn’t possible. What C++ can do and other languages can’t is that even if the outer function isn’t inlined, the inner function will. For instance, imagine a Sort method that accepts a comparator argument. In C++, this comparator can be inlined even when Sort won’t be. In the languages you listed, this is fundamentally not possible, and this is a design restriction, not a technical one that might vanish in the future.
AndersK almost 8 years

@Luaan I think I saw that in some old interview in relation to the "Native C++" initiative, don't remember exactly who or when so you are probably right.
Luaan almost 8 years

@KonradRudolph You underestimate the fact that the runtime is free to optimize whatever it wishes as long as the executing program can't notice the difference (on a single thread). It's a perfectly valid optimization to inline the inner function, though it would certainly require a much smarter JIT than .NET has right now. It would happen at runtime, yes, but .NET has enough metadata to make it possible. And due to the runtime compiler features, it's actually quite easy for me to add this at runtime, without relying on the JIT compiler - IL is a lot easier to handle than x86 assembly.
Konrad Rudolph almost 8 years

@Luaan You confuse what’s legal with what’s technically possible. C++ can only do this because it generates a different outside function for every template type (for details, see stackoverflow.com/a/13722515/1968): the function itself is tagged with a (static) type. In .NET, the outside function isn’t tagged with distinct types, neither static nor dynamic. To generate separate code for it based on different parameters, the optimiser would have to solve a problem that is, as far as I know, intractable.
Konrad Rudolph almost 8 years

@Luaan And, just to clarify: with that I don’t mean that it is absolutely impossible to perform this optimisation (that’s obviously silly), just that it’s impossible to do this in reasonable time (= constant time complexity with regards to expression complexity) — which is what matters when assessing whether a given optimisation can be performed at runtime.
Luaan almost 8 years

@KonradRudolph Oh, but it doesn't have to be the same method. That already happens now in fact, for example with .NET's generics. And since you have delegate identity, it's possible to make this as efficient as templates. Don't forget that the JIT compiler deals with IL, which is a very high-level "assembly" language - it has tons of information. Don't get me wrong - it's definitely a much easier problem with C++'s templates, and as I said, I don't expect such an optimization to ever be implemented for .NET, since there's so many more useful things they can implement instead.
Konrad Rudolph almost 8 years

@Luaan Apologies, got side-tracked by my own example, where Sort was a non-generic method in my mind. You’re right, it has a properly distinct type in .NET if it’s a generic (not sure about Java, which doesn’t have reified generics after all … but maybe the runtime keeps track of this data anyway). Yes, you’re right.
Luaan almost 8 years

@KonradRudolph Yeah, in Java, generics are little more than syntactic sugar. In .NET, they are a runtime fact as well - though I certainly wished for C++ templates in C# at times :) But I think that there's some examples on the JVM/JRE where a more powerful quotation system is used to basically make all code into data, quite similar to LISP - maybe Clojure? In that case, all the indirection can be removed at any time at runtime, which should allow such inlining to work perfectly, long before you have to care about complications like local data slots.
Johan Boulé almost 8 years

This is plainly wrong. I get totally different results, where C# is slower by a significant margin in both tests: 177% for array, and 212% for vector/list.
Joanna Marietti over 7 years

@doug65536: It sounds like Dispose does have something to do with memory management... When you "clean up unmanaged resources", the memory that contained those resources is freed up, right?
doug65536 over 7 years

@JoannaMarietti Well, yes, but eventually those resources will be freed by the finalizer even if you forget to call Dispose. Forgetting to immediately dispose a file opened exclusively could be a major issue, it might block it from being opened elsewhere for an unreasonable amount of time. You might also accidentally leave file locks held. Leaks are still possible with a garbage collector, if you accidentally keep reachable references unnecessarily.
U007D over 7 years

Wow. Not sure what conclusions one can draw from comparing Lists versus resizable arrays, but if you're going to use vectors like this, you'll want to learn about reserve(), my friend, reserve().
Ferruccio over 7 years

I don't think that code generated by a C++ compiler is necessarily faster (or slower) than the code generated by a C# compiler. I think C++ gives much more control over the performance characteristics of your code than C# does.
U007D about 7 years

@Quonux thank you for the comment. Of course this is not a "real program". The point of the benchmarks was to refactor a C# benchmark offered elsewhere on this page as evidence that JITted code is somehow faster than native. It's not, and the benchmark was potentially misleading to new people.
Jose Fernando Lopez Fernandez almost 7 years

When you mentioned this: "And there are a lot of useful optimizations JIT compilers can do that are simply impossible in languages with pointers," were you referencing aliasing-related optimizations?
Markus Knappen Johansson almost 7 years

@Quonux why do you have to write like that? It's people like you that make me dislike stackoverflow.
HumbleWebDev almost 7 years

I'd say the fairest way to compare the languages would be to use their own native tools, and perform the operations in almost exactly the same way. Maybe using each language's default string.split function and reading in line by line the default way would have been better.
Quonux almost 7 years

@MarkusKnappenJohansson I had a bad day ;) , I'm just a human too, removed my downvote, yet my opinion still applies. Oh please don't dislike SO just because there are some "stupid" people :) . Have a nice one.
sebjwallace over 6 years

Thank you, someone who actually provides an answer rather than just saying "oh it depends... blah".
Clearer over 6 years

@JonHarrop Just move your destructors to a separate thread. Gone are the latencies.
Totti over 6 years

@DAG: True but you can manage the cache behaviour in all of these languages, of course.
Totti over 6 years

@Clearer: You're Greenspunning a GC.
Clearer over 6 years

@JonHarrop Not at all. A garbage collector identifies which part of memory is safe to reclaim and then reclaims it -- either in the same thread as the rest of the program or in some other thread. I'm moving the clean up code from one thread to another. I still manage the clean up and do not rely on magic to deal with it.
Totti over 6 years

@Clearer: So you're saying that you're not Greenspunning a GC because you're not using "magic"?
nikib3ro over 6 years

COMPLETELY MISLEADING BENCHMARK. In the C++ version you are simply reserving part of memory (and then marveling at how that operation takes microseconds to execute). In the C# version you are creating 5000 ARRAYS (instantiating objects in memory). C++ is faster than C#... but the difference is nowhere near 40%... right now it's more in the range of <10%. What your example illustrates is that programmers should stick with the language of their choice (and from your profile it's obvious that you are a career C++ programmer). In C# you can do a 2D array int[,]... following up with an example.
nikib3ro over 6 years

So, I've redone the C# example on my machine to use double[,], which is closer to what is used in the benchmark. It is still not a perfect comparison since almost anything in C# is an object that gets initialized in memory (and there is overhead there). Results: initialization took 3 ms and execution took 95 ms. So I could now declare that C# is 40% FASTER than C++. Which is obviously not true. Maybe my machine is faster than @U007D's. Maybe I forgot to set Release mode (then my code takes >300 ms to execute). Conclusion: speed of compiled code is equal. How well you've written the code is a way bigger factor.
Clearer over 6 years

@JonHarrop Any proper GC will traverse all allocated memory to figure out which bits are no longer reachable and will then reclaim those bits. That's not what's going on if you move the destruction of objects to a different thread; you're manually declaring that those bits are safe to reclaim with all the benefits and problems that you usually have, and none (as in not a single one) of the benefits of a GC. A plain GC will introduce latencies at semi-random locations in your program; usually much worse than RAII will and a lot less predictably.
Totti over 6 years

@Clearer: "Any proper GC will traverse all allocated memory to figure out which bits are no longer reachable and will then reclaim those bits". Reference counting is an obvious counter example. "A plain GC will introduce latencies at semi-random locations in your program; usually much worse than RAII". I have only ever seen latencies decrease when RAII is replaced with a GC so I'd love to see that hypothesis tested properly.
Clearer over 6 years

@JonHarrop Reference counting does exactly what I described or it's potentially broken. I have never experienced any significant delays caused by destruction of objects using RAII; if you're seeing decreased latencies with a GC, I can only imagine one of the following scenarios: (1) you're doing something wrong (i.e. allocating 1 object 1000 times, rather than 1000 objects at a time), (2) the GC is not doing what it's telling you it's doing (not cleaning everything up) or (3) it's moving destruction to a separate thread. I've seen GCs move destruction to new threads before.
Clearer over 6 years

Any proper C++ developer will not run into the problems you describe. Only bad C programmers who decided to slap classes on their programs and call it C++ have those problems.
Totti over 6 years

@Clearer: "Reference counting does exactly what I described or it's potentially broken". Sorry but that's just not how RC works. I suggest you read gchandbook.org
Totti over 6 years

@Clearer: "I can only imagine one of the following scenarios". When collections of collections fall out of scope the parent destructors recursively call the child destructors until the entire DAG has been destructed. That is an arbitrarily-long pause. Incremental GCs don't do that.
Quonux over 6 years

for the love of gods, this is 8 years old, OMFGz
Quonux over 6 years

feel free to give a better more up to date answer
Bill K about 6 years

Notch coded minecraft to be pretty fast considering the amount of data he's manipulating. Also, he coded it mostly single-handedly in a comparatively short amount of time, something that would have been virtually impossible in C++. I do agree with the optimization techniques though--if you have the extra 10x dev time to spend so your code runs twice as fast, it's probably worth it.
user541686 about 6 years

"There is no strict reason why a bytecode based language like C# or Java that has a JIT cannot be as fast as C++ code." Sure there is. For one thing, JIT compilers have more limited time in which to compile the code than AoT compilers. That's enough to put them at a disadvantage.
Peter Cordes about 6 years

Did you even compile with optimizations enabled? 9 sec vs. 370 sec sounds unlikely, unless MSVC missed optimizing away some very expensive stuff that g++ found.
Krish1992 almost 6 years

"premature optimization is the root of all evil": this old quote doesn't hold in practice. You need to design for performance from the get-go. There has always been this belief that the JIT will outperform statically compiled languages, but in practice it never seems to quite happen, for logical reasons (mostly abstracting away memory management). Besides, the presumed benefit of a JIT is based on the idea that there's a large number of potential CPUs out there to optimize for, which in practice definitely isn't the case.
Jax almost 6 years

From what I can tell, the code in your C++ example is literally just allocating the memory ahead of time. The PROPER C# implementation would simply write 'List<double> arrs = new List<double>(ROWS*COLS)' which allocates the memory required to index a 2 dimensional array in 1 dimensional format (eg, what you did in C++). There's absolutely no reason to allocate a 2-dimensional array and manually flatten it -- the massive amount of iterations in your pre-test is the cause of the shitty performance. I imagine the overhead would still be more in C#, but not by a considerable amount.
Orestis P. over 5 years

Very inefficient C# implementation, as others have stated. Also, time was measured by subtracting DateTime objects, which is not the correct approach (not enough precision, heavier than Stopwatch). This post just proves that people should prefer the language that they are experienced with as opposed to online benchmarks.
Richmar1 about 5 years

@Karl I also have consistently seen that games written in C# are slow. Believe it or not, I actually look to see if a game was made in C# because I would try to avoid it for performance reasons, it's that bad. The Forest, 7 Days To Die, Broforce, Escape Plan, and on and on. I REALLY like Broforce though, so I put up with it. C# games are consistently embarrassingly slow in my experience. It is very possible that the developers did a poor job on optimization though. C# is one of my favorite languages, however it is rather slow for real-time applications.
U007D almost 5 years

FYI, for those commenting on the quality of C# implementation, please be aware that I did not write the C# benchmarks. As I have stated, they were posted by another person on this page as evidence that C# was faster than C++, which, as I have argued, is generally false.
U007D almost 5 years

@JDSweetBeat, you may have missed the comparison of a comparable C# implementation I provided. Please see the first bullet under "Observations", above.
Jax almost 5 years

@U007D Thanks for pointing that out, I did indeed overlook that bulletpoint :-)
rxantos over 4 years

C++ : Long optimization phase, done one time before running.
Stefan Reich over 4 years

There is no fundamental reason why JIT solutions should be slower than precompiled code, if you disregard warm-up time (which may actually happen before the program is shipped). In fact, mathematically, the opposite is true: JITs have the option of optimizing further at runtime which once-compilers like C++ don't have. What we currently have available is another question, but long-term, JITs will basically win the performance race.
Lokanath over 4 years

A simple for loop proves nothing, because C# would have optimized lots of stuff to make it faster.
Sturla Molden about 4 years

"Premature optimization is the root of all evil" is commonly attributed to Donald Knuth, but it was actually Tony Hoare (inventor of quicksort) who made this statement.
gjvdkamp over 3 years

It's true that C++ gives you so many tools to eke out the last bit of perf, but that all takes up space in your head. Those technical hardware-level decisions break the abstraction of how you want to reason about your algorithms. C# allows you to reason in a cleaner, more abstracted environment, and by using that you specify more of what needs to be done instead of spelling out how. This allows the compiler to rewrite it into equivalent and faster machine code.
J. Bailleul about 3 years

Your arguments are way too generic and mostly irrelevant. You don't have to put any special effort into writing C++ code for standard uses to get a program that runs very fast with little memory. And if you need to optimize C++ code, you have direct access to a vast array of optimizations that will be hard for other languages to beat; in the best case, they can only mimic them. The compiler makes a gigantic effort because it has time to do so, both across and within modules. And it's very, very strong at enabling zero-cost abstractions thanks to everything that is known statically at compile time.
user982042 about 3 years

We should also consider the compile-time metaprogramming capability of C++, which enables developers to execute pieces of code at compile time. Note modern C++ is more like C with templates. It leans more towards functional programming than OOP. The introduction of constexpr has made metaprogramming easier to read and write.
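As a small illustration of code executed at compile time, here is a constexpr function (the name factorial is just an example) whose result is checked by static_assert, so the check runs during compilation and costs nothing at run time.

```cpp
#include <cstdint>

// Ordinary-looking function that the compiler may evaluate
// during compilation when its argument is a constant expression.
constexpr std::uint64_t factorial(unsigned n) {
    return n <= 1 ? 1 : n * factorial(n - 1);
}

// Verified by the compiler; a wrong value here is a build error.
static_assert(factorial(10) == 3628800, "evaluated during compilation");
```

The same function can still be called with run-time values, in which case it behaves like any other function.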
Per almost 3 years

Huh? Game engines are "real-time systems", and all of them do garbage collection, even in C++, because immediate deallocation of ref-counted objects is too expensive and interrupts gameplay. Garbage collection in real-time systems is a feature, not a bug, because you can put all your deallocation work at a time where it doesn't matter.
FreelanceConsultant about 2 years

I would be interested to see the results of this where the C++ implementation was closer to the C# implementation. In other words: allocating a vector<vector<double>> structure, where reserve() is used to reserve space for BOTH dimensions of the arrays.
FreelanceConsultant about 2 years

This could be due to so many other factors - there is a lot of software stack between the functions which cause graphical lines to be stored prior to being rendered as pixels and the function calls from C++ or C#.
FreelanceConsultant about 2 years

This answer is completely wrong and does not make sense. When GC happens is completely deterministic, unless you are arguing external factors drive GC. (In which case, there is still no difference between when GC happens in Java or C#, or when functions to deallocate memory in C++ are triggered.) Examples of external factors might be user input (keyboard/mouse) or network traffic. These are generally beyond your control as a software developer, and so you could argue modelling them statistically is an appropriate thing to do. (eg: Assuming they are "random" is not totally unreasonable.)
FreelanceConsultant about 2 years

In other words, something happens in your program which triggers GC. In C++ you might say that if a vector is declared in a function then the GC (dealloc) happens when the vector goes out of scope. There is nothing "random" about that. The same thing is true in any GC language. When the GC happens is not random but triggered by some, completely deterministic, process. That said, I am about to second-guess myself here. I assume there are no languages where something like a random number is generated which triggers a timer after a random length of time, which causes GC to occur.
ShadowRanger about 2 years

Fairly sure what you're doing in the C++ example is illegal (almost certain to work, but illegal). You call arr.reserve(ROWS * COLS);, which makes it possible to push_back ROWS * COLS times without triggering reallocation. But instead of repeated push_back, you manually assign to progressively increasing indices of arr. This works in practice, because the memory is there and operator[] isn't doing bounds-checks, but it's undefined behavior (the size of the vector remains 0, and you're accessing a non-existent element).
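The two well-defined alternatives can be sketched as follows; the function names are hypothetical and this is not the benchmark's code. Either reserve capacity and grow the vector with push_back, or create the elements first (constructor with a size, or resize) and only then assign through operator[].

```cpp
#include <cstddef>
#include <vector>

// Option A: reserve capacity, then append. size() grows with each push_back.
std::vector<double> fill_with_push_back(std::size_t n) {
    std::vector<double> v;
    v.reserve(n);  // capacity >= n, but size() is still 0
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<double>(i));
    }
    return v;
}

// Option B: create n elements first, then assign by index.
std::vector<double> fill_with_index(std::size_t n) {
    std::vector<double> v(n);  // size() == n, elements start as 0.0
    for (std::size_t i = 0; i < n; ++i) {
        v[i] = static_cast<double>(i);  // in bounds, so well-defined
    }
    return v;
}
```

Both produce the same contents. Option B pays for zero-initialising the elements before overwriting them; Option A pays for the size bookkeeping in each push_back. Which is cheaper is something to measure for the case at hand.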
U007D about 2 years

Nice catch, @ShadowRanger, I think you are probably correct in this.
GoldenretriverYT about 2 years

C# isn't slower than Python. I made a simple test program that creates a 1024*1024*8 sized byte array and then fills it with index % 255, and it took 1.4 seconds in Python and 45 ms in C# (first execution, not after running it multiple times). Also, the question was how much faster C++ is, not Python.