C#: Virtual Function invocation is even faster than a delegate invocation?

13,294

Solution 1

Think about what's required in each case:

Virtual call

  • Check for nullity
  • Navigate from object pointer to type pointer
  • Look up method address in instruction table
  • (Not sure - even Richter doesn't cover this) Go to base type if method isn't overridden? Recurse until we find the right method address. (I don't think so - see edit at bottom.)
  • Push original object pointer onto stack ("this")
  • Call method

Delegate call

  • Check for nullity
  • Navigate from object pointer to array of invocations (all delegates are potentially multicast)
  • Loop over array, and for each invocation:
    • Fetch method address
    • Work out whether or not to pass the target as first argument
    • Push arguments onto stack (may have been done already - not sure)
    • Optionally (depending on whether the invocation is open or closed) push the invocation target onto the stack
    • Call method

There may be some optimisation so that there's no looping involved in the single-call case, but even so that will take a very quick check.

But basically there's just as much indirection involved with a delegate. Given the bit I'm unsure of in the virtual method call, it's possible that a call to an unoverridden virtual method in a massively deep type hierarchy would be slower... I'll give it a try and edit with the answer.

EDIT: I've tried playing around with both the depth of inheritance hierarchy (up to 20 levels), the point of "most derived overriding" and the declared variable type - and none of them seems to make a difference.

EDIT: I've just tried the original program using an interface (which is passed in) - that ends up having about the same performance as the delegate.

Solution 2

A virtual call is dereferencing two pointers at a well-known offset in the memory. It's not actually dynamic binding; there is no code at runtime to reflect over the metadata to discover the right method. The compiler generates couple of instructions to do the call, based on the this pointer. in fact, the virtual call is a single IL instruction.

A predicate call is creating an anonymous class to encapsulate the predicate. That class has to be instantiated and there is some code generated to actually check whether the predicate function pointer is null or not.

I would suggest you look at the IL constructs for both. Compile a simplified version of your source above with a single call to each of the two DoSomthing. Then use ILDASM to see what is the actual code for each pattern.

(And I am sure I'll get downvoted for not using the right terminology :-))

Solution 3

Test result worth 1000 of words: http://kennethxu.blogspot.com/2009/05/strong-typed-high-performance_15.html

Solution 4

It is possible that since you don't have any methods that override the virtual method that the JIT is able to recognize this and use a direct call instead.

For something like this it's generally better to test it out as you have done than try to guess what the performance will be. If you want to know more about how delegate invocation works, I suggest the excellent book "CLR Via C#" by Jeffrey Richter.

Solution 5

I doubt it accounts for all of your difference, but one thing off the top of my head that may account for some of the difference is that virtual method dispatch already has the this pointer ready to go. When calling through a delegate the this pointer has to be fetched from the delegate.

Note that according to this blog article the difference was even greater in .NET v1.x.

Share:
13,294

Related videos on Youtube

Morgan Cheng
Author by

Morgan Cheng

I am a software engineer in China. This is my Blog ??????????,??????

Updated on April 16, 2022

Comments

  • Morgan Cheng
    Morgan Cheng about 2 years

    It just happens to me about one code design question. Say, I have one "template" method that invokes some functions that may "alter". A intuitive design is to follow "Template Design Pattern". Define the altering functions to be "virtual" functions to be overridden in subclasses. Or, I can just use delegate functions without "virtual". The delegate functions is injected so that they can be customized too.

    Originally, I thought the second "delegate" way would be faster than "virtual" way, but some coding snippet proves it is not correct.

    In below code, the first DoSomething method follows "template pattern". It calls on the virtual method IsTokenChar. The second DoSomthing method doesn't depend on virtual function. Instead, it has a pass-in delegate. In my computer, the first DoSomthing is always faster than the second. The result is like 1645:1780.

    "Virtual invocation" is dynamic binding and should be more time-costing than direct delegation invocation, right? but the result shows it is not.

    Anybody can explain this?

    using System;
    using System.Diagnostics;
    
    class Foo
    {
        public virtual bool IsTokenChar(string word)
        {
            return String.IsNullOrEmpty(word);
        }
    
        // this is a template method
        public int DoSomething(string word)
        {
            int trueCount = 0;
            for (int i = 0; i < repeat; ++i)
            {
                if (IsTokenChar(word))
                {
                    ++trueCount;
                }
            }
            return trueCount;
        }
    
        public int DoSomething(Predicate<string> predicator, string word)
        {
            int trueCount = 0;
            for (int i = 0; i < repeat; ++i)
            {
                if (predicator(word))
                {
                    ++trueCount;
                }
            }
            return trueCount;
        }
    
        private int repeat = 200000000;
    }
    
    class Program
    {
        static void Main(string[] args)
        {
            Foo f = new Foo();
    
            {
                Stopwatch sw = Stopwatch.StartNew();
                f.DoSomething(null);
                sw.Stop();
                Console.WriteLine(sw.ElapsedMilliseconds);
            }
    
            {
                Stopwatch sw = Stopwatch.StartNew();
                f.DoSomething(str => String.IsNullOrEmpty(str), null);
                sw.Stop();
                Console.WriteLine(sw.ElapsedMilliseconds);
            }
        }
    }
    
    • Mitch Wheat
      Mitch Wheat over 15 years
      One for Jon Skeet, I feel! ;)
  • technophile
    technophile over 15 years
    Not only that, but if storing/calling delegates were more efficient than virtual dispatch, the framework would almost certainly store and call delegates instead of using virtual dispatch. Given that it doesn't, it probably isn't (except in rare/pathological cases).
  • Morgan Cheng
    Morgan Cheng over 15 years
    I don't think it make difference the method is "virtual" in parent class or "override" in subclass. If it is "virtual", it is always dynamic binding.
  • Jon Skeet
    Jon Skeet over 15 years
    QUick correction: I think you meant "virtual overrides" rather than "virtual overloads"
  • Morgan Cheng
    Morgan Cheng over 15 years
    In my understanding, "the actual method to invoke is decided at runtime" is called "dynamic binding".
  • Morgan Cheng
    Morgan Cheng over 15 years
    A Predicate is just a delegate instance. It is a instance, but I'm not so sure there is a "anonymous class" created. Is there "anonymous class" concept in C#/.NET?
  • Franci Penov
    Franci Penov over 15 years
    A virtual call is not "deciding which method to invoke at runtime". the method to invoke is well known and already associated with the class.
  • Franci Penov
    Franci Penov over 15 years
    When I say "anonymous class", I meant a helper class generated by the compiler to wrap the delegate call. There is a separate concept of anonymous classes, introduced with LINQ.
  • Franci Penov
    Franci Penov over 15 years
    On virtual calls there's no check for null. Also, the vtable of the methods is determined at compile time, so there's no run-time recursion over base classes. The compiler generates the pointer to the right method from the proper base class and puts it in the proper slot in the vtable.
  • Jon Skeet
    Jon Skeet over 15 years
    callvirt does check for null - see section 4.2 of partition 3 in the CLI spec, or P166 of CLR via C#. (If the reference were null, which implementation would be called?) Thanks for the confirmation of the "no recursion" bit though. That's what the experiment was basically suggesting.
  • thinkbeforecoding
    thinkbeforecoding about 15 years
    +1 for the no recursion, vtables are flattened at compile type.
  • stakx - no longer contributing
    stakx - no longer contributing about 9 years
    This answer contains a fundamental flaw: IL code generated by the C# compiler won't tell you how fast the code will run; the assembly code output by the JIT would be a more reliable way of gauging execution speed. This is because (1) IL is based on an abstract stack machine that is most likely different from the underlying (typically register-based) computer architecture, and therefore IL has to be transformed quite heavily before it can be executed; and because (2) optimizations are apparently performed mostly after, not before IL (i.e. by the JIT, not by the C# compiler).
  • stakx - no longer contributing
    stakx - no longer contributing about 9 years
    (Regarding that final claim in the above comment, see Eric Lippert's blog article, "What does the optimize switch do?")
  • Franci Penov
    Franci Penov about 9 years
    @stakx of course, looking at the assembly code also tells you nothing about the perf comparison, given that you have zero knowledge about how the transistors in the CPU will be triggered by the different instructions, and whether two different instructions will be even executed by the integer or floating point unit (if the CPU has one), or might even be early executed/skipped by the predictive branching logic (if the CPU supports one), or even whether the instructions will be executed on the CPU or passed through to the host CPU (if running the code in virtual machine).
  • Franci Penov
    Franci Penov about 9 years
    It's also unclear what the perf implications would be if the instructions happen to be executed through light switching, instead of electron gates. Not to mention that it's completely unclear whether the computation will be calculated by bits, or qubits, which might get totally different performance profile. So, yeah, pretty much every single performance-related answer on SO happens to have exactly the same fundamental flaw.
  • stakx - no longer contributing
    stakx - no longer contributing about 9 years
    @FranciPenov: You are correct. That's why I only called looking at assembly code "[a] more reliable [way of gauging execution speed]". With assembly code, the relationship to what actually gets executed is far closer to a being a 1:1 relationship than with IL; that's the important bit in my comments. Concerning quantum computers: let's stay practical; they are not exactly seeing mainstream use yet. But yes, my comments might be completely wrong in 10-20 years' time.