MPI gather array on root process

c mpi

Yes, MPI_Gather will do exactly that. From the ANL page for MPI_Gather:

int MPI_Gather(void *sendbuf, int sendcnt, MPI_Datatype sendtype, 
               void *recvbuf, int recvcnt, MPI_Datatype recvtype, 
               int root, MPI_Comm comm)

Here, sendbuf is the array on each process (my_array), and recvbuf is the long array (all_arrays) on the receiving process into which the short arrays are gathered. Pass the start of all_arrays as recvbuf rather than a per-rank offset; recvbuf is only significant on the root, and recvcnt is the number of elements received from each process (n*m), not the total. The root's own short array is copied into its position in the long array as well, so you don't need to do that yourself. The blocks from all processes are laid out contiguously in the long array, in rank order.
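The layout described above can be sketched as follows. This is a minimal illustration, not the asker's program: `count` stands in for n*m, the fill loop stands in for Populate(), and both `block_of_rank` and the `USE_MPI` macro are made-up names. The MPI driver is only compiled when building with something like `mpicc -DUSE_MPI`, so the helper can be read and checked without an MPI installation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: start of rank `rank`'s block inside the gathered
 * buffer when every rank sends `count` doubles via MPI_Gather. */
double *block_of_rank(double *all_arrays, int rank, int count)
{
    assert(rank >= 0 && count >= 0);
    return all_arrays + (size_t)rank * (size_t)count;
}

#ifdef USE_MPI /* made-up macro: build with `mpicc -DUSE_MPI` */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const int count = 4; /* assumption: stands in for n*m */
    double *my_array = malloc(count * sizeof *my_array);
    for (int i = 0; i < count; i++)
        my_array[i] = rank; /* placeholder for Populate() */

    /* The receive buffer is only needed on the root. */
    double *all_arrays = NULL;
    if (rank == 0)
        all_arrays = malloc((size_t)nprocs * count * sizeof *all_arrays);

    /* Every rank, root included, sends `count` doubles. The root passes the
     * base of all_arrays, and recvcnt is the per-rank count. */
    MPI_Gather(my_array, count, MPI_DOUBLE,
               all_arrays, count, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        for (int r = 0; r < nprocs; r++)
            printf("rank %d block starts with %g\n",
                   r, block_of_rank(all_arrays, r, count)[0]);
        free(all_arrays);
    }
    free(my_array);
    MPI_Finalize();
    return 0;
}
#endif
```

Note that with plain MPI_Gather the root's own block comes first, so rank 1's data begins at offset `count`, and the receive buffer must hold one block per process in the communicator.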

EDIT:

If the receiving process does not contribute a sendbuf of its own to the gather, you may want to use MPI_Gatherv instead, which takes a separate receive count and displacement for every rank, so the root's count can be zero (thanks to @HristoIliev for pointing this out).
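One way the MPI_Gatherv variant might look is sketched below, assuming the root is rank 0, every other rank sends the same number of doubles, and the blocks should be packed back to back so that rank 1's array comes first. `packed_layout` and the `USE_MPI` macro are hypothetical names introduced for this sketch; `count` again stands in for n*m.

```c
#include <assert.h>

/* Hypothetical helper: fill recvcounts/displs so that the root contributes
 * nothing and the other ranks' blocks are packed contiguously in rank order.
 * Returns the total number of elements the root will receive. */
int packed_layout(int nprocs, int root, int count,
                  int *recvcounts, int *displs)
{
    assert(nprocs > 0 && count >= 0);
    int offset = 0;
    for (int r = 0; r < nprocs; r++) {
        recvcounts[r] = (r == root) ? 0 : count;
        displs[r] = offset;
        offset += recvcounts[r];
    }
    return offset;
}

#ifdef USE_MPI /* made-up macro: build with `mpicc -DUSE_MPI` */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const int root = 0;
    const int count = 4; /* assumption: stands in for n*m */

    /* Only non-root ranks own data; the root sends zero elements. */
    double *my_array = NULL;
    int sendcount = 0;
    if (rank != root) {
        sendcount = count;
        my_array = malloc(count * sizeof *my_array);
        for (int i = 0; i < count; i++)
            my_array[i] = rank; /* placeholder for Populate() */
    }

    /* recvcounts, displs and the receive buffer matter only on the root. */
    int *recvcounts = NULL, *displs = NULL, total = 0;
    double *all_arrays = NULL;
    if (rank == root) {
        recvcounts = malloc(nprocs * sizeof *recvcounts);
        displs = malloc(nprocs * sizeof *displs);
        total = packed_layout(nprocs, root, count, recvcounts, displs);
        all_arrays = malloc((size_t)total * sizeof *all_arrays);
    }

    MPI_Gatherv(my_array, sendcount, MPI_DOUBLE,
                all_arrays, recvcounts, displs, MPI_DOUBLE,
                root, MPI_COMM_WORLD);

    if (rank == root)
        for (int i = 0; i < total; i += count)
            printf("block at offset %d starts with %g\n", i, all_arrays[i]);

    free(my_array);
    free(recvcounts);
    free(displs);
    free(all_arrays);
    MPI_Finalize();
    return 0;
}
#endif
```

With 4 processes this gives receive counts {0, count, count, count} and displacements {0, 0, count, 2*count}, which matches the 3*n*m layout used in the question.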

Author: covstat

Updated on June 28, 2022

Comments

  • covstat, almost 2 years ago

    I'm new to MPI. I have 4 processes: processes 1 through 3 populate a vector and send it to process 0, and process 0 collects the vectors into one very long vector. I have code that works (too long to post), but process 0's recv operation is clumsy and very slow.

    In abstract, the code does the following:

    MPI::Init();
    int id = MPI::COMM_WORLD.Get_rank();
    
    if(id>0) {
        double* my_array = new double[n*m]; //n,m are int
        Populate(my_array, id);
        MPI::COMM_WORLD.Send(my_array,n*m,MPI::DOUBLE,0,50);
    }
    
    if(id==0) {
        double* all_arrays = new double[3*n*m];
        /* Slow Code Starts Here */
        double startcomm = MPI::Wtime();
        for (int i=1; i<=3; i++) {
            MPI::COMM_WORLD.Recv(&all_arrays[(i-1)*m*n],n*m,MPI::DOUBLE,i,50);
        }
        double endcomm = MPI::Wtime();
        //Process 0 has more operations...
    }
    MPI::Finalize();
    

    It turns out that endcomm - startcomm accounts for roughly half of the total time (0.7 seconds out of the 1.5 seconds the program takes to complete).

    Is there a better way to receive the vectors from processes 1-3 and store them in process 0's all_arrays?

    I checked out MPI::Comm::Gather, but I'm not sure how to use it. In particular, will it allow me to specify that process 1's array is the first array in all_arrays, process 2's array the second, etc.? Thanks.

    Edit: I removed the "slow" loop, and instead put the following between the "if" blocks:

    MPI_Gather(my_array,n*m,MPI_DOUBLE,
        &all_arrays[(id-1)*m*n],n*m,MPI_DOUBLE,0,MPI_COMM_WORLD);
    

    The same slow performance resulted. Does this have something to do with the fact that the root process "waits" for each individual receive to complete before attempting the next one? Or is that not the right way to think about it?