Difference between sections and tasks in OpenMP


The difference between tasks and sections is in the time frame in which the code will execute. Sections are enclosed within the sections construct and (unless the nowait clause was specified) threads will not leave it until all sections have been executed:

                 [    sections     ]
Thread 0: -------< section 1 >---->*------
Thread 1: -------< section 2      >*------
Thread 2: ------------------------>*------
...                                *
Thread N-1: ---------------------->*------

Here N threads encounter a sections construct with two sections, the second taking more time than the first. The first two threads execute one section each. The other N-2 threads simply wait at the implicit barrier at the end of the sections construct (shown here as *).

Tasks are queued and executed whenever possible, at the so-called task scheduling points. Under some conditions, the runtime is allowed to migrate tasks between threads, even in the middle of their lifetime. Such tasks are called untied: an untied task might start executing on one thread and then, at some scheduling point, be migrated by the runtime to another thread.

Still, tasks and sections are in many ways similar. For example, the following two code fragments achieve essentially the same result:

// sections
...
#pragma omp sections
{
   #pragma omp section
   foo();
   #pragma omp section
   bar();
}
...

// tasks
...
#pragma omp single nowait
{
   #pragma omp task
   foo();
   #pragma omp task
   bar();
}
#pragma omp taskwait
...

taskwait works much like barrier, but for tasks: it ensures that the current execution flow pauses until all queued tasks have been executed. It is a scheduling point, i.e. it allows threads to process tasks. The single construct is needed so that the tasks are created by one thread only. If there were no single construct, each task would get created num_threads times, which is usually not what one wants. The nowait clause in the single construct instructs the other threads not to wait until the single construct has been executed (i.e. it removes the implicit barrier at the end of the single construct). So they hit the taskwait immediately and start processing tasks.

taskwait is an explicit scheduling point, shown here for clarity. There are also implicit scheduling points, most notably inside the barrier synchronisation, whether explicit or implicit. Therefore, the above code could also be written simply as:

// tasks
...
#pragma omp single
{
   #pragma omp task
   foo();
   #pragma omp task
   bar();
}
...

Here is one possible scenario of what might happen if there are three threads:

               +--+-->[ task queue ]--+
               |  |                   |
               |  |       +-----------+
               |  |       |
Thread 0: --< single >-|  v  |-----
Thread 1: -------->|< foo() >|-----
Thread 2: -------->|< bar() >|-----

Shown here within the | ... | is the action of the scheduling point (either the taskwait directive or the implicit barrier). Basically threads 1 and 2 suspend what they are doing at that point and start processing tasks from the queue. Once all tasks have been processed, threads resume their normal execution flow. Note that threads 1 and 2 might reach the scheduling point before thread 0 has exited the single construct, so the left |s need not necessarily be aligned (this is represented in the diagram above).

It might also happen that thread 1 is able to finish processing the foo() task and request another one even before the other threads are able to request tasks. So both foo() and bar() might get executed by the same thread:

               +--+-->[ task queue ]--+
               |  |                   |
               |  |      +------------+
               |  |      |
Thread 0: --< single >-| v             |---
Thread 1: --------->|< foo() >< bar() >|---
Thread 2: --------------------->|      |---

It is also possible that the singled out thread might execute the second task if thread 2 comes too late:

               +--+-->[ task queue ]--+
               |  |                   |
               |  |      +------------+
               |  |      |
Thread 0: --< single >-| v < bar() >|---
Thread 1: --------->|< foo() >      |---
Thread 2: ----------------->|       |---

In some cases the compiler or the OpenMP runtime might even bypass the task queue completely and execute the tasks serially:

Thread 0: --< single: foo(); bar() >*---
Thread 1: ------------------------->*---
Thread 2: ------------------------->*---

If no task scheduling points are present inside the region's code, the OpenMP runtime might start the tasks whenever it deems appropriate. For example, it is possible that all tasks will be deferred until the barrier at the end of the parallel region is reached.

Author: Arkerone

Updated on July 08, 2022

Comments

  • Arkerone
    Arkerone almost 2 years

    What is the difference in OpenMP between :

    #pragma omp parallel sections
    {
        #pragma omp section
        {
           fct1();
        }
        #pragma omp section
        {
           fct2();
        }
    }
    

    and :

    #pragma omp parallel 
    {
        #pragma omp single
        {
           #pragma omp task
           fct1();
           #pragma omp task
           fct2();
        }
    }
    

    I'm not sure that the second code is correct...

  • dreamcrash
    dreamcrash over 11 years
    +1, @Arkerone yes it's a good explanation, you should also give an up-vote :)
  • dreamcrash
    dreamcrash over 11 years
    Is there much of a difference between using 3 consecutive singles vs sections?
  • Chris
    Chris over 11 years
    @HristoIliev Do you have a source on a task being created num_threads times when a task pragma is not within a single pragma? I don't see anything that suggests this in IBM's OpenMP documentation.
  • Hristo Iliev
    Hristo Iliev over 11 years
    @Chris, OpenMP 3.1 specification §2.7.1: "When a thread encounters a task construct, a task is generated from the code for the associated structured block." Unless there is a single/master or a worksharing construct, or conditionals in place, each thread executes exactly the same code and hence all threads encounter the task directive.
  • towi_parallelism
    towi_parallelism about 11 years
    I believe there is no need to have "nowait" and "#pragma omp taskwait". Threads start doing their work as soon as the tasks are created. Correct me if I am wrong.
  • Joe C
    Joe C almost 9 years
    @HristoIliev: "The other N-2 threads simply wait at the implicit barrier at the end of the sections construct". If the other threads are not assigned to any section inside this parallel sections cluster, do they still need to wait?
  • Hristo Iliev
    Hristo Iliev almost 9 years
    @JoeC, sections is a worksharing construct, which means that all threads in the team associated with a given parallel region must encounter it in order for the construct to succeed. If it is not desirable that idle threads wait at the implicit barrier, one applies the nowait clause, which removes the implicit barrier.
  • Joe C
    Joe C almost 9 years
    @HristoIliev: I see. So this means threads not related to the parallel region do not need to wait at this barrier.
  • Hristo Iliev
    Hristo Iliev almost 9 years
    @JoeC, that's correct. But notice that mixing OpenMP with other threading paradigms, e.g. pthreads or std::thread, though functioning perfectly with e.g. the GCC OpenMP runtime, is not standardised and could result in non-portable code.
  • Manuel Selva
    Manuel Selva about 8 years
    @HristoIliev Can we thus answer the initial OP question on the difference between tasks and sections by saying that it is mainly a performance concern? Tasks allow more flexibility for the runtime scheduling (if we don't consider the new task dependency feature, of course).
  • Hristo Iliev
    Hristo Iliev about 8 years
    @ManuelSelva, I would refrain from stating that the main difference between the two constructs is the performance as it depends heavily on the OpenMP runtime.
  • Yiling Liu
    Yiling Liu almost 4 years
    I tested the fib sequence with both task and sections, but I'm not sure if it is right or not. It is too long to post in a comment so I put it below your answer. Is it right to say that "task is much wiser than sections while distributing computing resources"?
  • Hristo Iliev
    Hristo Iliev almost 4 years
    @YilingLiu I'm not sure what your definition of "wiser" is.
  • Hristo Iliev
    Hristo Iliev almost 4 years
    The two code examples are not equivalent. The one with sections is using nested parallelism, i.e., creating a new parallel region on each recursive call. Nested parallelism is disabled by default, so anything but the top recursion level is running with teams of one thread, which is why you see so many thread IDs equal to 0. Even if nested parallelism was enabled, you may end up with thousands of threads, which will be really inefficient.
  • Yiling Liu
    Yiling Liu almost 4 years
    @Hristo Iliev So can we calculate Fibonacci by using sections? I mean, enable parallelism while using sections
  • Hristo Iliev
    Hristo Iliev almost 4 years
    To a very limited extent only. Sections aren't meant for solving recursive problems. They are meant to solve the case of independent blocks in your program's linear execution.
  • Yiling Liu
    Yiling Liu almost 4 years
    @Hristo Iliev Got it
  • Laci
    Laci almost 3 years
    Excellent answer, but the most important difference between tasks and sections may be worth emphasizing (even though it does not apply to the code above): sections are static, i.e. the number of sections is set when the code is written, whereas tasks can be created at any time, and in any number, under the control of the program's logic.