OpenMP function calls in parallel

c++ openmp

14,524

Solution 1

Is this what you are after?

Live On Coliru

#include <omp.h>
#include <cstdio>

int main()
{

    int nthreads, tid;

#pragma omp parallel private(tid)
    {

        tid = ::omp_get_thread_num();
        printf("Hello World from thread = %d\n", tid);

        /* Only master thread does this */
        if (tid == 0) {
            nthreads = ::omp_get_num_threads();
            printf("Number of threads = %d\n", nthreads);
        }

    } /* All threads join master thread and terminate */
}

Output:

Hello World from thread = 0
Number of threads = 8
Hello World from thread = 4
Hello World from thread = 3
Hello World from thread = 5
Hello World from thread = 2
Hello World from thread = 1
Hello World from thread = 6
Hello World from thread = 7

Solution 2

You should be doing something like this:

#pragma omp parallel private(tid)
{ 
    tid = omp_get_thread_num();
    parDF(tid);
}

I think it's quite straightforward.

Solution 3

There are two ways to achieve what you want:

1. Exactly the way you are describing it: each thread calls the function with its own thread id:
```
#pragma omp parallel
{
    int threadId = omp_get_thread_num();
    parDF(threadId);
}
```
The parallel block starts as many threads as the system reports that it supports, and each of them executes the block. Since they differ in threadId, they will process different data. To force more threads to start, you can add a num_threads(100) clause (or whatever count you need) to the pragma.
2. The correct way to do what you want is to use a parallel for block:
```
#pragma omp parallel for
for (int i=0; i < numThreads; ++i) {
    parDF(i);
}
```
This way each iteration of the loop (each value of i) is assigned to a thread, which executes it. As many iterations run in parallel as there are available threads.

Method 1. is not very general, and is inefficient because you have to have as many threads as you want function calls. Method 2. is the canonical (right) way to get your problem solved.
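The two methods can be compared side by side in a minimal sketch. The names runPerThread and runPerIteration are hypothetical, and the parDF below is only a stand-in that records which id it was called with (the real parDF is not reproduced here). The serial fallback for omp_get_thread_num is an assumption added so the sketch still builds when OpenMP is not enabled.

```cpp
#include <cassert>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#else
// Serial stand-in so the sketch still builds when OpenMP is not enabled.
static int omp_get_thread_num() { return 0; }
#endif

// Hypothetical stand-in for parDF: counts how often each id was processed.
// Every id touches only its own slot, so concurrent calls do not race.
void parDF(int id, std::vector<int>& calls) {
    calls[id] += 1;
}

// Method 1: one call per thread. num_threads(n) is a request; the runtime
// may start fewer threads, in which case the higher ids are never processed.
std::vector<int> runPerThread(int n) {
    assert(n > 0);
    std::vector<int> calls(n, 0);
    #pragma omp parallel num_threads(n)
    {
        parDF(omp_get_thread_num(), calls);
    }
    return calls;
}

// Method 2: one call per loop iteration. Every id is processed exactly once,
// no matter how many threads the runtime provides.
std::vector<int> runPerIteration(int n) {
    assert(n > 0);
    std::vector<int> calls(n, 0);
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        parDF(i, calls);
    }
    return calls;
}
```

Calling runPerIteration(4) should leave every slot at 1, while runPerThread(4) only guarantees that slot 0 was processed, which illustrates why Method 2 is the more general choice.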

MikkelSecher

Fresh out of uni with a master's in Computer Science, and have recently started working as an ASP.NET developer at a Danish e-commerce company. I'm hoping to be active on Stack Overflow and improve my professional skills in this way.

Updated on June 22, 2022

Comments

MikkelSecher about 2 years

I'm looking for a way to call a function in parallel.

For example, if I have 4 threads, I want each of them to call the same function with their own thread id as an argument.

Because of the argument, no thread will work on the same data.

#pragma omp parallel
{
    for(int p = 0; p < numberOfThreads; p++)
    {
        if(p == omp_get_thread_num())
            parDF(p);
    }
}

Thread 0 should run parDF(0)

Thread 1 should run parDF(1)

Thread 2 should run parDF(2)

Thread 3 should run parDF(3)

All this should be done at the same time...

This (obviously) doesn't work, but what is the right way to do parallel function calls?

EDIT: The actual code (This might be too much information... But it was asked for...)

From the function that calls parDF():

omp_set_num_threads(NUM_THREADS);
#pragma omp parallel
{

    numberOfThreads = omp_get_num_threads();
    //split nodeQueue
    #pragma omp master
    {
        splitNodeQueue(numberOfThreads);
    }
    int tid = omp_get_thread_num();

    //printf("Hello World from thread = %d\n", tid);
    #pragma omp parallel for private(tid)
    for(int i = 0; i < numberOfThreads; ++i)
    {
            parDF(tid, originalQueueSize, DFlevel);
    }
}
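
A possible reading of the snippet above, offered as a sketch rather than a confirmed diagnosis: the inner `#pragma omp parallel for` opens a new parallel region inside the one that is already running, so every outer thread executes the whole loop and parDF is called numberOfThreads times per thread. The `private(tid)` clause also gives the inner region an uninitialized copy of tid, and `master` has no implied barrier, so other threads can reach parDF before the queue is split. The version below keeps a single parallel region. The bodies of splitNodeQueue and parDF, the Queues alias, and runAll are hypothetical stand-ins, not the original code.

```cpp
#include <cassert>
#include <numeric>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#else
// Serial stand-ins so the sketch still builds when OpenMP is not enabled.
static int omp_get_thread_num() { return 0; }
static int omp_get_num_threads() { return 1; }
#endif

using Queues = std::vector<std::vector<int>>;

// Hypothetical stand-in for splitNodeQueue: every part gets the same items.
void splitNodeQueue(Queues& queues, int parts) {
    queues.assign(parts, std::vector<int>{1, 2, 3});
}

// Hypothetical stand-in for parDF: works only on the queue that belongs to id.
int parDF(const Queues& queues, int id) {
    return std::accumulate(queues[id].begin(), queues[id].end(), 0);
}

std::vector<int> runAll(int requested) {
    assert(requested > 0);
    Queues queues;
    std::vector<int> results(requested, -1);  // -1 marks ids that never ran
    #pragma omp parallel num_threads(requested)
    {
        // One thread splits the work; the implicit barrier at the end of
        // single keeps the other threads from reading the queues too early.
        #pragma omp single
        splitNodeQueue(queues, omp_get_num_threads());

        int tid = omp_get_thread_num();     // declared in the region: private
        results[tid] = parDF(queues, tid);  // exactly one call per thread
    }
    return results;
}
```

With this structure each thread calls parDF once with its own id, and no nested region is needed inside the called function.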

The parDF function:

bool Tree::parDF(int id, int originalQueueSize, int DFlevel)
{
double possibilities[20];
double sequence[3];
double workingSequence[3];
int nodesToExpand = originalQueueSize/omp_get_num_threads();
int tenthsTicks = nodesToExpand/10;
int numPossibilities = 0;
int percentage = 0;
list<double>::iterator i;
list<TreeNode*>::iterator n;

cout << "My ID is: "<< omp_get_thread_num() << endl;

        while(parNodeQueue[id].size() > 0 and parNodeQueue[id].back()->depth == DFlevel)
        {

            if(parNodeQueue[id].size()%tenthsTicks == 0)
            {
                cout << endl;
                cout << percentage*10 << "% done..." << endl;
                if(percentage == 10)
                {
                    percentage = 0;
                }
                percentage++;
            }

            //countStartPoints++;
            depthFirstQueue.push_back(parNodeQueue[id].back());
            numPossibilities = 0;

            for(i = parNodeQueue[id].back()->content.sortedPoints.begin(); i != parNodeQueue[id].back()->content.sortedPoints.end(); i++)
            {

                for(int j = 0; j < deltas; j++)
                {
                    if(parNodeQueue[id].back()->content.doesPointExist((*i) + delta[j]))
                    {
                        for(int k = 0; k <= numPossibilities; k++)
                        {
                            if(fabs((*i) + delta[j] - possibilities[k]) < 0.01)
                            {
                                goto pointAlreadyAdded;
                            }
                        }
                        possibilities[numPossibilities] = ((*i) + delta[j]);
                        numPossibilities++;
                        pointAlreadyAdded:
                        continue;
                    }
                }
            }

            // Out of the list of possible points. All combinations of 3 are added, building small subtrees in from the node.
            // If a subtree succesfully breaks the lower bound, true is returned.

            for(int i = 0; i < numPossibilities; i++)
            {
                for(int j = 0; j < numPossibilities; j++)
                {
                    for(int k = 0; k < numPossibilities; k++)
                    {
                        if( k != j and j != i and i != k)
                        {
                            sequence[0] = possibilities[i];
                            sequence[1] = possibilities[j];
                            sequence[2] = possibilities[k];
                            //countSeq++;
                            if(addSequence(sequence, id))
                            {
                                //successes++;
                                workingSequence[0] = sequence[0];
                                workingSequence[1] = sequence[1];
                                workingSequence[2] = sequence[2];
                                parNodeQueue[id].back()->workingSequence[0] = sequence[0];
                                parNodeQueue[id].back()->workingSequence[1] = sequence[1];
                                parNodeQueue[id].back()->workingSequence[2] = sequence[2];
                                parNodeQueue[id].back()->live = false;
                                succesfulNodes.push_back(parNodeQueue[id].back());
                                goto nextNode;
                            }
                            else
                            {
                                destroySubtree(parNodeQueue[id].back());
                            }
                        }
                    }
                }
            }
            nextNode:
            parNodeQueue[id].pop_back();
        }

dkg over 9 years

Do not forget to compile and link with OpenMP : -fopenmp with gcc.
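
A quick way to check whether OpenMP was actually enabled at compile time is the standard `_OPENMP` macro, which OpenMP-aware compilers define when the feature is switched on (for example with -fopenmp on gcc). This is a minimal sketch; the function name openmpStatus is hypothetical.

```cpp
#include <string>

// Reports whether this translation unit was compiled with OpenMP enabled.
std::string openmpStatus() {
#ifdef _OPENMP
    // _OPENMP expands to a yyyymm date identifying the supported spec version.
    return "OpenMP enabled, version macro " + std::to_string(_OPENMP);
#else
    // Without the flag, the pragmas are ignored and the code runs serially.
    return "OpenMP disabled: pragmas are ignored";
#endif
}
```

Printing the returned string at program start makes it obvious when a build silently fell back to serial execution.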

MikkelSecher over 9 years

That was my thought exactly, but this executes sequentially. With 2 threads, it first runs parDF(0) and then, when that is done, it runs parDF(1)...

jepio over 9 years

This sounds like you have a problem with your runtime environment. Are you sure OpenMP is even working for you? What machine/system are you using, how are you compiling the program?

dkg over 9 years

Shouldn't you add num_threads(4) at the end of the #pragma? Like this: #pragma omp parallel private(tid) num_threads(4). Then you have 4 threads executing your code.

sehe over 9 years

@dkg your guess is as good as anyone's. Why would I? The OP is also using all available threads as far as I can tell

dkg over 9 years

@sehe: OpenMP sets the number of threads dynamically. If the number chosen by OpenMP is lower than the number of times you want your function to be executed, then you won't get your desired results. Otherwise you should place the call in a loop iterating the desired number of times.

sehe over 9 years

@dkg I know all this. I'm assuming the OP also knows, and he was just asking about how to achieve... well what he asks: "For example, if I have 4 threads, I want to each of them to call the same function with their own thread id as an argument.". I think you're somehow reading a different question. (Did you notice you can replace printf with parDF e.g.?)

sehe over 9 years

@user3162941 You should try it before you claim this. Unless numthreads is already configured to be 1, you are just wrong. My answer shows exactly the same and it's live on Coliru, so you can even check this from your lounge chair.

MikkelSecher over 9 years

I put in some prints that sit at the beginning of the parDF function. It looks like the function is called several times by each thread, but without running all the code in the function. "My ID is: X" is printed a couple of times for each thread in random order, and then it looks like it runs the function 4 consecutive times... There are a few loops in the parDF function, but since they are called from a #pragma omp parallel block, they should be run individually by each thread, right?

sehe over 9 years

@user3162941 why don't you share the actual code? That'll be a lot more useful. Thanks

MikkelSecher over 9 years

The function is called, but the entire code is not executed. I print each thread's ID at the beginning of the function, but the actual loops, where it does something, run only once... Do I need a pragma parallel block inside the called function to get it to run in parallel, or is it enough that the function is called from a parallel block?

MikkelSecher over 9 years

I can do that, sure... But it will probably just confuse more... I'm trying to boil it down to the basic problem instead of confusing you guys with a small part of a big program...

sehe over 9 years

Don't! Unconfuse yourself by looking at the whole of a small program instead. See also Nobody Writes Testcases Anymore and Solve your problem by almost asking a question on Stackoverflow.

Anshul Sharma over 9 years

Can you post the code of the function? It's possible that you are doing something that creates a bottleneck that prevents parallel execution, or that later instructions modify the same resources in a similar way.