Are "data races" and "race condition" actually the same thing in context of concurrent programming

38,365

Solution 1

No, they are not the same thing. They are not a subset of one another. They are also neither the necessary, nor the sufficient condition for one another.

The definition of a data race is pretty clear, and therefore, its discovery can be automated. A data race occurs when 2 instructions from different threads access the same memory location, at least one of these accesses is a write and there is no synchronization that is mandating any particular order among these accesses.

A race condition is a semantic error. It is a flaw that occurs in the timing or the ordering of events that leads to erroneous program behavior. Many race conditions can be caused by data races, but this is not necessary.

Consider the following simple example where x is a shared variable:

Thread 1    Thread 2

 lock(l)     lock(l)
 x=1         x=2
 unlock(l)   unlock(l)

In this example, the writes to x from thread 1 and 2 are protected by locks, therefore they are always happening in some order enforced by the order with which the locks are acquired at runtime. That is, the writes' atomicity cannot be broken; there is always a happens before relationship between the two writes in any execution. We just cannot know which write happens before the other a priori.

There is no fixed ordering between the writes, because locks cannot provide this. If the programs' correctness is compromised, say when the write to x by thread 2 is followed by the write to x in thread 1, we say there is a race condition, although technically there is no data race.

It is far more useful to detect race conditions than data races; however this is also very difficult to achieve.

Constructing the reverse example is also trivial. This blog post also explains the difference very well, with a simple bank transaction example.

Solution 2

According to Wikipedia, the term "race condition" has been in use since the days of the first electronic logic gates. In the context of Java, a race condition can pertain to any resource, such as a file, network connection, a thread from a thread pool, etc.

The term "data race" is best reserved for its specific meaning defined by the JLS.

The most interesting case is a race condition that is very similar to a data race, but still isn't one, like in this simple example:

class Race {
  static volatile int i;
  static int uniqueInt() { return i++; }
}

Since i is volatile, there is no data race; however, from the program correctness standpoint there is a race condition due to the non-atomicity of the two operations: read i, write i+1. Multiple threads may receive the same value from uniqueInt.

Solution 3

TL;DR: The distinction between data race and race condition depends on the nature of problem formulation, and where to draw the boundary between undefined behavior and well-defined but indeterminate behavior. The current distinction is conventional and best reflects the interface between processor architect and programming language.

1. Semantics

Data race specifically refers to the non-synchronized conflicting "memory accesses" (or actions, or operations) to the same memory location. If there is no conflict in the memory accesses, while there is still indeterminate behavior caused by operation ordering, that is a race condition.

Note "memory accesses" here have specific meaning. They refer to the "pure" memory load or store actions, without any additional semantics applied. For example, a memory store from one thread does not (necessarily) know how long it takes for the data to be written into the memory, and finally propagates to another thread. For another example, a memory store to one location before another store to another location by the same thread does not (necessarily) guarantee the first data written in the memory be ahead of the second. As a result, the order of those pure memory accesses are not (necessarily) able to be "reasoned" , and anything could happen, unless otherwise well defined.

When the "memory accesses" are well defined in terms of ordering through synchronization, additional semantics can ensure that, even if the timing of the memory accesses are indeterminate, their order can be "reasoned" through the synchronizations. Note, although the ordering between the memory accesses can be reasoned, they are not necessarily determinate, hence the race condition.

2. Why the difference?

But if the order is still indeterminate in race condition, why bother to distinguish it from data race? The reason is in practical rather than theoretical. It is because the distinction does exist in the interface between the programming language and processor architecture.

A memory load/store instruction in modern architecture is usually implemented as "pure" memory access, due to the nature of out-of-order pipeline, speculation, multi-level of cache, cpu-ram interconnection, especially multi-core, etc. There are lots of factors leading to indeterminate timing and ordering. To enforce ordering for every memory instruction incurs huge penalty, especially in a processor design that supports multi-core. So the ordering semantics are provided with additional instructions like various barriers (or fences).

Data race is the situation of processor instruction execution without additional fences to help reasoning the ordering of conflicting memory accesses. The result is not only indeterminate, but also possibly very weird, e.g., two writes to the same word location by different threads may result with each writing half of the word, or may only operate upon their locally cached values. -- These are undefined behavior, from the programmer's point of view. But they are (usually) well defined from the processor architect's point of view.

Programmers have to have a way to reason their code execution. Data race is something they cannot make sense, therefore should always avoid (normally). That is why the language specifications that are low level enough usually define data race as undefined behavior, different from the well-defined memory behavior of race condition.

3. Language memory models

Different processors may have different memory access behavior, i.e., processor memory model. It is awkward for programmers to study the memory model of every modern processor and then develop programs that can benefit from them. It is desirable if the language can define a memory model so that the programs of that language always behave as expected as the memory model defines. That is why Java and C++ have their memory models defined. It is the burden of the compiler/runtime developers to ensure the language memory models are enforced across different processor architectures.

That said, if a language does not want to expose the low level behavior of the processor (and is willing to sacrifice certain performance benefits of the modern architectures), they can choose to define a memory model that completely hide the details of "pure" memory accesses, but apply ordering semantics for all their memory operations. Then the compiler/runtime developers may choose to treat every memory variable as volatile in all processor architectures. For these languages (that support shared memory across threads), there are no data races, but may still be race conditions, even with a language of complete sequential consistence.

On the other hand, the processor memory model can be stricter (or less relaxed, or at higher level), e.g., implementing sequential consistency as early-days processor did. Then all memory operations are ordered, and no data race exists for any languages running in the processor.

4. Conclusion

Back to the original question, IMHO it is fine to define data race as a special case of race condition, and race condition at one level may become data race at a higher level. It depends on the nature of problem formulation, and where to draw the boundary between undefined behavior and well-defined but indeterminate behavior. Just the current convention defines the boundary at language-processor interface, does not necessarily mean that is always and must be the case; but the current convention probably best reflects the state-of-the-art interface (and wisdom) between processor architect and programming language.

Solution 4

No, they are different & neither of them is a subset of one or vice-versa.

The term race condition is often confused with the related term data race, which arises when synchronization is not used to coordinate all access to a shared nonfinal field. You risk a data race whenever a thread writes a variable that might next be read by another thread or reads a variable that might have last been written by another thread if both threads do not use synchronization; code with data races has no useful defined semantics under the Java Memory Model. Not all race conditions are data races, and not all data races are race conditions, but they both can cause concurrent programs to fail in unpredictable ways.

Taken from the excellent book - Java Concurrency in Practice by Joshua Bloch & Co.

Share:
38,365

Related videos on Youtube

Inquisitive
Author by

Inquisitive

Updated on October 05, 2020

Comments

  • Inquisitive
    Inquisitive over 3 years

    I often find these terms being used in context of concurrent programming . Are they the same thing or different ?

  • Geek
    Geek over 10 years
    can you drop in a line in your answer describing what data race actually means in JLS?
  • Marko Topolnik
    Marko Topolnik over 10 years
    @geek The word "JLS" is a hyperlink to the relevant section of the JLS.
  • josinalvo
    josinalvo about 9 years
    "data race (...) no synchronization that is mandating any particular order among these accesses." I am a bit confused. In your example, the operations might occur in both orders (either =1 then =2, or the other way around). Why is it not a data race ?
  • Baris Kasikci
    Baris Kasikci about 9 years
    @josinalvo : it is an artifact of the technical definition of a data race.The key point is that between the two accesses, there will be a lock release and a lock acquire (for either one of the possible orders). By definition, a lock release and a lock acquire establishes an order between the two accesses, and therefore there is no data race.
  • Marko Topolnik
    Marko Topolnik almost 7 years
    Synchronization never mandates any particular order between ops, so this is not a very fortunate way to express it. On the other hand, the JMM specifies that for each read operation there must be a definite write op it observes, even in a data race. It's hard to avoid explicitly mentioning happens-before and sync order, yet even the JLS definition is wrong in mentioning just happens-before: by its definition, two concurrent volatile writes constitute a data race.
  • Noldorin
    Noldorin almost 6 years
    @BarisKasikci "establishes an order" doesn't have any real meaning, as far as I'm concerned. They're just weasel words. I honestly don't believe "data race" is a remotely useful concept, as literally every memory location that gets accessed by multiple threads can be considered to be in a "data race"
  • Baris Kasikci
    Baris Kasikci almost 6 years
    Release-acquire pairs always establish an order. A general explanation is lengthy, but a trivial example is a signal-wait pair. @Noldorin "Establishes an order" refers to a happens-before order, which is a key concept of concurrency theory (see Lamport's seminal paper on the happened before relationship) and distributed systems. Data races are a useful concept, in that their presence poses many issues (e.g., undefined semantics as per C++ memory model, very complex semantics in Java etc.). Their detection and elimination constitute a vast literature in research and practice.
  • Noldorin
    Noldorin almost 6 years
    @BarisKasikci I'm afraid I'm still not remotely convinced they're useful outside of ivory-tower theoretical models of concurrency. For me, and I'd say most actual developers, race conditions are the useful and intuitive everyday concept. The C and C++ memory models actually define "data races" as a subset of race conditions, last time I checked, which makes a lot more sense.
  • Baris Kasikci
    Baris Kasikci almost 6 years
    @Noldorin The C++ memory model has a thorough discussion that is primarily centered around data races (you can also read the papers by Sarita Adve and Hans Boehm to see the initial proposals of the memory model). Given that the standard has no mentions of race conditions and many (above 60) mentions of data races, I doubt that your last statement is correct. Can you point to a reference?
  • Noldorin
    Noldorin almost 6 years
    @BarisKasikci en.wikipedia.org/wiki/Race_condition#Software "The memory model defined in the C11 and C++11 standards uses the term "data race" for a race condition caused by potentially concurrent operations on a shared memory location, of which at least one is a write. A C or C++ program containing a data race has undefined behavior.[2][3]"
  • Baris Kasikci
    Baris Kasikci almost 6 years
    @Noldorin that is a sentence on Wikipedia, not a definition in the standard. Can you point to a reference for your claim "The C and C++ memory models actually define "data races" as a subset of race conditions, last time I checked" from the memory model discussion in the standard?
  • Noldorin
    Noldorin almost 6 years
    No, I was referring to the Wikipedia claim, referencing the standards, sorry. It does make a lot of sense to me though. Your definition of a race condition above either seems to make everything trivially a race condition or conversely nothing...
  • Valentin Kovalenko
    Valentin Kovalenko over 5 years
    More about race condition vs data race specifically for Java: sites.google.com/site/aboutmale/techblog/raceconditiondatara‌​ce
  • aniliitb10
    aniliitb10 about 5 years
    @MarkoTopolnik I am little confused by the example. Could you please explain: "Since i is volatile, there is no data race"? Voltility only ensured that it is visible but still: 1) it is not synchronised and multiple threads can read/write at the same and 2) It is shared non-final field So, according to Java Concurrency in Practice (quoted below as well), it is data race and not race condition, isn't it?
  • Marko Topolnik
    Marko Topolnik about 5 years
    @aniliitb10 Instead of relying on second-hand quotes torn out of their context, you should review the JLS section 17.4 I linked to in my answer. Access to a volatile variable is a synchronization action as defined in §17.4.2.
  • martinkunev
    martinkunev over 4 years
    Note that the question has language-agnostic tag.
  • Xiao-Feng Li
    Xiao-Feng Li over 4 years
    @aniliitb10 Votaltiles do not cause data race, because their accesses can be ordered. That is, you can reason their order in this way or that way, leading to different result. With data race, you have no way to reason the order. For example, the i++ operation of each thread may just happen upon their respective locally cached value i. Globally you have no way to order those operations (from programmer's point of view) - unless you have a certain language memory model in place.
  • Marko Topolnik
    Marko Topolnik over 4 years
    To simplify the above, each individual read and write access to a volatile var appears in the global synchronization order that is unique across all threads within the same JVM instance. But the access to plain variables has a possibly different order for each thread, and the only guarantee is that a given thread orders its own accesses "correctly" (as specified by the Java code). When one thread observes the actions of another thread, those can be in any order and may also never be observed.
  • user805623
    user805623 over 3 years
    The problem I have with the bank account example in the link you provided is that the distinction between race condition and data race depends on what you consider a memory location. Consider transfer2: The article claims this is an example of a function suffering from a race condition but not from a data race. Oh really? I think it would be very sensible to regard the whole account struct a single memory location. So even if you protect individual struct members within against concurrent access, there is still concurrent access to the overall structured variable, i.e. you do have a data race!
  • Krishanu Singh
    Krishanu Singh over 3 years
    I am a bit confused about the claim that sequential consistency guarantees a global happens before relationship, hence data race freedom. I mean each thread's instructions would be ordered by the sc but its not clear to me why it should enforce any ordering between instructions of different threads. Thus it seems to me that there could still be data races.
  • Xiao-Feng Li
    Xiao-Feng Li over 3 years
    @KrishanuSingh Memory consistency refers to the memory access order seen by other processors. SC enforces the same memory access order seen by all processors (i.e., the threads on the cores) in one execution. That means, different processors execute their memory instructions in a same global order observed by all of them - any two memory instructions from different processors are executed "in order", in other words, "without data race". On the other hand, the global order is not necessarily the same across different executions of the same program, which may result different execution results.
  • Krishanu Singh
    Krishanu Singh over 3 years
    Okay I get it now. Thanks!
  • N.S.
    N.S. over 2 years
    You're one of the few people to make this distinction, which is an important one to make. But I think there's a further refinement possible. Although your example and the banking example you have in the link both illustrate race conditions, they have very different solutions. In the banking example, the problem was resolved by making the check and update code atomic. However atomicity won't address the race condition in your example. Rather we have to be careful to ensure that the threads are executed in the desired order, otherwise there is nondeterminism in the outcome.