How expensive is the lock statement?


Solution 1

Here is an article that goes into the cost. Short answer is 50ns.

Solution 2

The technical answer is that this is impossible to quantify: it depends heavily on the state of the CPU's memory write-back buffers and on how much of the data gathered by the prefetcher has to be discarded and re-read, both of which are very non-deterministic. I use 150 CPU cycles as a back-of-the-envelope approximation that avoids major disappointments.

The practical answer is that it is waaaay cheaper than the amount of time you'll burn on debugging your code when you think you can skip a lock.

To get a hard number you'll have to measure. Visual Studio has a slick concurrency analyzer available as an extension.
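As a rough illustration of what "measuring" can look like, here is a minimal single-threaded sketch that times a loop with and without the lock and reports the difference per iteration. The class and method names (`LockCostSketch`, `Measure`) are made up for this example, and it only captures the uncontended case, so treat the number it prints as a lower bound rather than a general answer.

```csharp
using System;
using System.Diagnostics;

static class LockCostSketch
{
    private static readonly object Gate = new object();
    private static long _counter;

    public static long Counter => _counter;

    // Runs `iterations` increments, with or without the lock, and returns the elapsed time.
    public static TimeSpan Measure(int iterations, bool useLock)
    {
        _counter = 0;
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            if (useLock)
            {
                lock (Gate) { _counter++; }
            }
            else
            {
                _counter++;
            }
        }
        sw.Stop();
        return sw.Elapsed;
    }

    static void Main()
    {
        const int iterations = 100_000_000;

        // Warm-up runs so that JIT compilation is not included in the timings.
        Measure(1_000_000, true);
        Measure(1_000_000, false);

        TimeSpan locked = Measure(iterations, true);
        TimeSpan unlocked = Measure(iterations, false);

        // Convert the difference to nanoseconds per lock acquisition.
        double nsPerLock =
            (locked - unlocked).TotalMilliseconds * 1_000_000.0 / iterations;

        Console.WriteLine($"With locks    = {locked.TotalMilliseconds:F0} ms");
        Console.WriteLine($"Without locks = {unlocked.TotalMilliseconds:F0} ms");
        Console.WriteLine($"Uncontended lock costs roughly {nsPerLock:F1} ns");
    }
}
```

Run it as a release build outside the debugger; the result varies with the CPU and runtime, and it says nothing about what happens once several threads compete for the same lock.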

Solution 3

Further reading:

I would like to present a few articles of mine on general synchronization primitives. They dig into Monitor and the C# lock statement: their behavior, properties, and costs across distinct scenarios and thread counts. The focus is specifically on CPU wastage and throughput periods, to understand how much work can be pushed through in each scenario:

https://www.codeproject.com/Articles/1236238/Unified-Concurrency-I-Introduction
https://www.codeproject.com/Articles/1237518/Unified-Concurrency-II-benchmarking-methodologies
https://www.codeproject.com/Articles/1242156/Unified-Concurrency-III-cross-benchmarking

Original answer:

Oh dear!

It seems that the answer flagged here as THE ANSWER is inherently incorrect! I would respectfully ask its author to read the linked article to the end.

The author of the 2003 article measured on a dual-core machine only. In the first case he measured locking with a single thread, and the result was about 50 ns per lock access.

That says nothing about a lock in a concurrent environment. So we have to keep reading: in the second half of the article, the author measured locking with two and three threads, which gets closer to the concurrency levels of today's processors.

The author reports that with two threads on a dual-core machine a lock costs 120 ns, and with three threads it rises to 180 ns. So the cost clearly depends on the number of threads accessing the lock concurrently.

So it is simple: it is not 50 ns unless there is only a single thread, in which case the lock is useless.

Another issue to consider is that this is measured as an average time!

If individual iterations were timed, some would take anywhere from 1 ms to 20 ms: the majority are fast, but a few threads end up waiting for processor time and incur delays that are milliseconds long.
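To see the gap between average and worst case for yourself, something along these lines could be used. It is only a sketch under assumptions: the names (`ContendedLockSketch`, `Run`) are hypothetical, the thread and iteration counts are arbitrary, and the two `Stopwatch.GetTimestamp()` calls add their own overhead to every sample. It times each individual acquire/release from several threads and reports both the average and the slowest one.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

static class ContendedLockSketch
{
    // Times every lock acquisition on `threadCount` threads and returns the
    // average and worst-case latency in microseconds, plus the final counter.
    public static (double AvgMicros, double MaxMicros, long Total) Run(
        int threadCount, int perThread)
    {
        object gate = new object();
        long shared = 0;
        long[] sumTicks = new long[threadCount];
        long[] maxTicks = new long[threadCount];
        var threads = new Thread[threadCount];

        for (int t = 0; t < threadCount; t++)
        {
            int id = t; // per-thread copy of the loop variable for the lambda
            threads[t] = new Thread(() =>
            {
                for (int i = 0; i < perThread; i++)
                {
                    long start = Stopwatch.GetTimestamp();
                    lock (gate) { shared++; }
                    long elapsed = Stopwatch.GetTimestamp() - start;

                    sumTicks[id] += elapsed;
                    if (elapsed > maxTicks[id]) maxTicks[id] = elapsed;
                }
            });
        }

        foreach (var th in threads) th.Start();
        foreach (var th in threads) th.Join();

        long sum = 0, max = 0;
        for (int t = 0; t < threadCount; t++)
        {
            sum += sumTicks[t];
            max = Math.Max(max, maxTicks[t]);
        }

        double microsPerTick = 1_000_000.0 / Stopwatch.Frequency;
        double avg = sum * microsPerTick / ((double)threadCount * perThread);
        return (avg, max * microsPerTick, shared);
    }

    static void Main()
    {
        var result = Run(threadCount: 4, perThread: 1_000_000);
        Console.WriteLine($"Average: {result.AvgMicros:F3} us");
        Console.WriteLine($"Worst:   {result.MaxMicros:F3} us");
    }
}
```

The exact figures depend on the machine and on how the scheduler behaves during the run, which is precisely why a single average hides the outliers.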

This is bad news for any application that requires high throughput and low latency.

The last issue to consider is that the operations inside the lock can be slow, and very often they are. The longer the block of code executing inside the lock, the higher the contention, and delays rise sky high.

Please consider that more than a decade has passed since 2003. That is several generations of processors designed specifically to run fully concurrently, and locking considerably harms their performance.

Solution 4

This doesn't answer your query about performance, but I can say that the .NET Framework does offer an Interlocked.Add method that will allow you to add your amount to your done member without manually locking on another object.
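For the counter in the question, that could look roughly like the following. This is a sketch, not the asker's actual class: `InterlockedCounter` and the `Done` property are names invented here, and `done` is assumed to be an `int` because `amount` is one.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class InterlockedCounter
{
    private int done; // assumed to be an int, matching the `amount` parameter

    // Atomically adds `amount` to `done`; no separate lock object is required.
    public void Count(int amount)
    {
        Interlocked.Add(ref done, amount);
    }

    // Volatile.Read makes sure the current value is fetched rather than a stale one.
    public int Done => Volatile.Read(ref done);

    static void Main()
    {
        var counter = new InterlockedCounter();

        // 1000 parallel calls, each adding 2.
        Parallel.For(0, 1000, _ => counter.Count(2));

        Console.WriteLine(counter.Done); // prints 2000
    }
}
```

This only covers a single variable. As soon as two or more fields have to change together, a lock is still the simpler tool.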

Solution 5

lock (Monitor.Enter/Exit) is very cheap, cheaper than alternatives like a WaitHandle or Mutex.
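To make the Monitor.Enter/Exit part concrete: the C# documentation describes the lock statement as shorthand for a Monitor.Enter call wrapped in try/finally. The sketch below puts both forms side by side on the counter from the question; the class name `MonitorCounter` and the two method names are made up for illustration.

```csharp
using System;
using System.Threading;

class MonitorCounter
{
    private readonly object mutex = new object();
    private int done;

    public int Done
    {
        get { lock (mutex) { return done; } }
    }

    // The form used in the question.
    public void CountWithLock(int amount)
    {
        lock (mutex)
        {
            done += amount;
        }
    }

    // Roughly what the compiler generates for the method above.
    public void CountWithMonitor(int amount)
    {
        bool lockTaken = false;
        try
        {
            Monitor.Enter(mutex, ref lockTaken);
            done += amount;
        }
        finally
        {
            // Release only if the lock was actually acquired.
            if (lockTaken) Monitor.Exit(mutex);
        }
    }

    static void Main()
    {
        var counter = new MonitorCounter();
        counter.CountWithLock(3);
        counter.CountWithMonitor(4);
        Console.WriteLine(counter.Done); // prints 7
    }
}
```

Both methods take the same monitor on `mutex`, so they exclude each other exactly as two `lock` blocks would.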

But what if it were (a little) slow? Would you rather have a fast program with incorrect results?

Author by

Kees C. Bakker


Updated on June 13, 2020

Comments

  • Kees C. Bakker
    Kees C. Bakker almost 4 years

    I've been experimenting with multithreading and parallel processing, and I needed a counter to do some basic counting and statistical analysis of the processing speed. To avoid problems with concurrent use of my class, I've used a lock statement on a private variable in my class:

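    private readonly object mutex = new object();
    // 'done' was not declared in the original snippet; an int field is assumed.
    private int done;

    public void Count(int amount)
    {
        // Serialize updates so that concurrent callers cannot lose increments.
        lock (mutex)
        {
            done += amount;
        }
    }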
    

    But I was wondering... how expensive is locking a variable? What are the negative effects on performance?

  • Henk Holterman
    Henk Holterman over 13 years
    Yes, this is probably the best answer. But mainly for reason of shorter and cleaner code. The difference in speed is not likely to be noticeable.
  • Kees C. Bakker
    Kees C. Bakker over 13 years
    So in conclusion the more objects you have the more expensive it gets.
  • Kees C. Bakker
    Kees C. Bakker over 13 years
    Haha... I was going for the fast program and the good results.
  • Kees C. Bakker
    Kees C. Bakker about 13 years
    Thanks for this answer. I'm doing more stuff with locks; adding ints is one of many. Love the suggestion, will use it from now on.
  • Herman
    Herman almost 10 years
    Short better answer: 50ns + time spent waiting if other thread is holding lock.
  • Arsen Zahray
    Arsen Zahray almost 9 years
    The more threads are entering and leaving the lock, the more expensive it gets. The cost expands exponentially with the number of threads.
  • hangar
    hangar over 8 years
    locks are much, much easier to get right, even if lock-free code is potentially faster. Interlocked.Add on its own has the same issues as += with no synchronization.
  • BlueRaja - Danny Pflughoeft
    BlueRaja - Danny Pflughoeft over 8 years
    Some context: dividing two numbers on a 3 GHz x86 takes about 10 ns (not including the time it takes to fetch/decode the instruction), and loading a single variable from (non-cached) memory into a register takes about 40 ns. So 50 ns is insanely, blindingly fast; you shouldn't worry about the cost of using lock any more than you'd worry about the cost of using a variable.
  • ipavlu
    ipavlu over 8 years
    Actually no, it can be quantified and measured. It just is not as easy as writing those locks all around the code, then stating that it is all just 50ns, a myth measured on single threaded access to the lock.
  • ipavlu
    ipavlu over 8 years
    @henk-holterman There are multiple issues with your statements. First, as this question and its answers clearly show, there is little understanding of the impact of locks on overall performance; people even repeat the 50 ns myth, which applies only in a single-threaded environment. Second, your statement is here and will stay for years, and in the meantime processors have grown in core count while the speed of individual cores has not grown much. Third, applications only become more complex over time, and then it is layer upon layer of locking in an environment of many cores, and the number is rising: 2, 4, 8, 10, 20, 16, 32.
  • Otis
    Otis over 8 years
    Also, that article was old when this question was asked.
  • ipavlu
    ipavlu over 8 years
    My usual approach is to build synchronization in a loosely coupled way, with as little interaction as possible. That leads very quickly to lock-free data structures. I wrote wrappers around spinlock to simplify development, and even though the TPL has special concurrent collections, I have developed spin-locked collections of my own around list, array, dictionary and queue, as I needed a little more control and sometimes some code running under the spinlock. I can tell you, it is possible, and it solves multiple scenarios that the TPL collections cannot, with a great performance/throughput gain.
  • Dmytro Zakharov
    Dmytro Zakharov almost 8 years
    Results from the article do not apply to a server environment. On multi-socket servers the cost will be even higher, as each CPU directly deals only with its own memory. So, readers, keep this in mind.
  • Snoop
    Snoop over 7 years
    "think you can skip a lock"... I think that's where a lot of people are at when they read this question...
  • Zar Shardan
    Zar Shardan over 7 years
    This might be a bad example because your loop really doesn't do anything, apart from a single variable assignment and a lock is at least 2 function calls. Also, 20ns per lock you are getting isn't that bad.
  • Milad
    Milad almost 7 years
    Running the test in single-threaded mode on a Core i3:
    With locks = 3215 milliseconds
    Without locks = 479 milliseconds
    Difference = 2736 milliseconds
    100000000 locks requires 2735 milliseconds
    Lock requires 0.02735 microseconds
  • Milad
    Milad almost 7 years
    Running the test in multithreaded mode on the same machine:
    With locks = 6474 milliseconds
    Without locks = 476 milliseconds
    Difference = 5998 milliseconds
    100000000 locks requires 5997 milliseconds
    Lock requires 0.05997 microseconds
  • Milad
    Milad almost 7 years
    It seems almost no cost on modern machines.
  • ipavlu
    ipavlu over 6 years
    Really great metric, "almost no cost", not to mention incorrect. You guys do not take into consideration that it is short and fast only and ONLY if there is no contention at all, i.e. one thread. IN SUCH A CASE, you DO NOT NEED A LOCK AT ALL. Second issue: lock is not a plain lock but a hybrid lock. Inside the CLR it detects, using atomic operations, that the lock is not held by anyone, and in that case it avoids calls into the operating system kernel, which runs in a different ring and is not measured by these tests. What is measured as 25 ns to 50 ns is actually application-level interlocked instruction code executed when the lock is not taken.
  • Gooseberry
    Gooseberry about 6 years
    To clarify, the article is not saying the lock performance degrades with the number of threads in the application; performance degrades with the number of threads contending over the lock. (That is implied, but not clearly stated, in the answer above.)
  • ipavlu
    ipavlu about 6 years
    I presume you mean this: "So it seems to be clearly dependent on the number of concurrently accessed threads and more is worse." Yes, the wording could be better. I meant "concurrently accessed" as threads concurrently accessing the lock, thus creating contention.
  • antikbd
    antikbd over 3 years
    Lock-free isn't "potentially faster". It can be orders of magnitude faster in extremely tight, long-running, concurrent loops.