How does OpenMP handle nested loops?

64,333

Solution 1

The lines you have written will parallelize only the outer loop. To parallelize both you need to add a collapse clause:

#pragma omp parallel for collapse(2)
    for (int i=0;i<N;i++)
    { 
      for (int j=0;j<M;j++)
      {
       //do task(i,j)//
      }
    }

You may want to check OpenMP 3.1 specifications (sec 2.5.1) for more details.

Solution 2

You will be able to better understand this with the following example. Let's do this with two threads.

#pragma omp parallel for num_threads(2)
for(int i=0; i< 3; i++) {
    for (int j=0; j< 3; j++) {
        printf("i = %d, j= %d, threadId = %d \n", i, j, omp_get_thread_num());
    }
}

then the result will be,

i = 0, j= 0, threadId = 0 
i = 0, j= 1, threadId = 0 
i = 0, j= 2, threadId = 0 
i = 1, j= 0, threadId = 0 
i = 1, j= 1, threadId = 0 
i = 1, j= 2, threadId = 0 
i = 2, j= 0, threadId = 1 
i = 2, j= 1, threadId = 1 
i = 2, j= 2, threadId = 1

That means, when you add #pragma omp parallel for to the uppermost for loop, the index of that for loop is divided among the threads. As you can see, when index of i is same the thread id is also the same.

Instead of that, we can parallel the combinations that we have in a nested for loop. In this example we can have following combinations of i and j.

i = 0, j= 0
i = 0, j= 1
i = 0, j= 2
i = 1, j= 0
i = 1, j= 1
i = 1, j= 2
i = 2, j= 0
i = 2, j= 1
i = 2, j= 2

In order to parallelize the code combination wise, we can add the collapse keyword as follows.

#pragma omp parallel for num_threads(2) collapse(2)
for(int i=0; i< 3; i++) {
    for (int j=0; j< 3; j++) {
        printf("i = %d, j= %d, threadId = %d \n", i, j, omp_get_thread_num());
    }
}

then the result will be as follows.

i = 0, j= 0, threadId = 0 
i = 0, j= 1, threadId = 0 
i = 1, j= 2, threadId = 1 
i = 2, j= 0, threadId = 1 
i = 2, j= 1, threadId = 1 
i = 2, j= 2, threadId = 1 
i = 0, j= 2, threadId = 0 
i = 1, j= 0, threadId = 0 
i = 1, j= 1, threadId = 0 

Then you can see that unlike before, for the same index i, there can be different thread ids ( when (i=1 and j=2 threadId=1) also (i=1 and j=0 threadId=0)). That means in this scenario, the combinations of i and j are divided among the threads.

Share:
64,333
user0002128
Author by

user0002128

Updated on July 08, 2022

Comments

  • user0002128
    user0002128 almost 2 years

    Does the following code just parallelize the first (outer) loops, or it parallelize the entire nested loops?

        #pragma omp parallel for
        for (int i=0;i<N;i++)
        { 
          for (int j=0;j<M;j++)
          {
           //do task(i,j)//
          }
        }
    

    I just want to make sure if the above code will parallelize the entire nested for-loops (thus one thread directly related task(i,j)), or it only parallelizes the outer for-loop (thus it ensures that, for each parrallel thread with loop index i, its inner loop will be done sequentially in a single thread, which is very import).