MPI - Printing in order

Solution 1

There is no way to guarantee that messages from many different processes will arrive at another process in the "correct" order. This is essentially what is happening here.

Even though you aren't explicitly sending messages, when you print something to the screen, it has to be sent to the process on your local system (mpiexec or mpirun) before it can be printed. There is no way for MPI to know the correct order for these messages, so it just prints them as they arrive.

If you require that your messages are printed in a specific order, you must send them all to one rank which can print them in whatever order you like. As long as one rank does all of the printing, all of the messages will be ordered correctly.
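
To make that concrete, here is a minimal sketch of the gather-to-one-rank approach: every rank formats its own line, and rank 0 receives and prints them in rank order. The fixed-size buffer and the message format here are assumptions made for illustration, not part of any particular program.

#include <stdio.h>
#include <string.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int rank, size;
    char line[128];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Every rank formats its own output line locally. */
    snprintf(line, sizeof(line), "hello from rank %d of %d", rank, size);

    if (rank == 0) {
        /* Rank 0 prints its own line, then the others' lines in rank order. */
        printf("%s\n", line);
        for (int src = 1; src < size; src++) {
            MPI_Recv(line, sizeof(line), MPI_CHAR, src, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("%s\n", line);
        }
    } else {
        /* Everyone else ships their line to rank 0 instead of printing it. */
        MPI_Send(line, strlen(line) + 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}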

You will probably find answers out there saying that you can put a newline at the end of your string or use flush() to ensure that the buffers are flushed, but that won't guarantee ordering on the remote end, for the reasons mentioned above.

Solution 2

So, you can do something like this:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int size, rank;
    int message = 999;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        printf("1 SIZE = %d RANK = %d MESSAGE = %d \n", size, rank, message);
        // Start the chain: pass the message to rank 1 (if there is one).
        if (size > 1) {
            MPI_Send(&message, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        }
    } else {
        int count;
        MPI_Status status;
        // Block until the message from the previous rank arrives.
        MPI_Probe(MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &status);
        MPI_Get_count(&status, MPI_INT, &count);
        if (count == 1) {
            // Receive first, then print, so the printed message is valid.
            MPI_Recv(&message, 1, MPI_INT, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &status);
            printf("2 SIZE = %d RANK = %d MESSAGE = %d \n", size, rank, message);
            // Pass the message on, unless this is the last rank.
            if (rank + 1 != size) {
                MPI_Send(&message, 1, MPI_INT, rank + 1, 0, MPI_COMM_WORLD);
            }
        }
    }
    MPI_Finalize();
    return 0;
}

After executing:

$ mpirun -n 5 ./a.out 
1 SIZE = 5 RANK = 0 MESSAGE = 999 
2 SIZE = 5 RANK = 1 MESSAGE = 999 
2 SIZE = 5 RANK = 2 MESSAGE = 999 
2 SIZE = 5 RANK = 3 MESSAGE = 999 
2 SIZE = 5 RANK = 4 MESSAGE = 999 
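
For reference, the same token-passing idea can be wrapped in a small helper that each rank calls when it has something to print. This is only a sketch: print_in_rank_order is a hypothetical name, and, as Solution 1 explains, stdout forwarding means strict ordering is still not guaranteed, although in practice this usually comes out in rank order.

#include <stdio.h>
#include <mpi.h>

// Hypothetical helper: rank 0 prints first; every other rank waits for a
// token from rank - 1 before printing, then hands the token to rank + 1.
void print_in_rank_order(int rank, int size) {
    int token = 0;
    if (rank > 0) {
        // Block until the previous rank has finished printing.
        MPI_Recv(&token, 1, MPI_INT, rank - 1, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }
    printf("RANK = %d of SIZE = %d\n", rank, size);
    fflush(stdout);
    if (rank + 1 < size) {
        // Wake the next rank in the chain.
        MPI_Send(&token, 1, MPI_INT, rank + 1, 0, MPI_COMM_WORLD);
    }
}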

Solution 3

I was inspired by Святослав Павленко's answer: using blocking MPI communication to enforce serial-in-time output. That said, Wesley Bland has a point about MPI not being built for serial output. So if we want to output data, it makes sense to have each processor write its own (non-colliding) output. Alternatively, if the order of the data is important (and it's not too big), the recommended approach is to send it all to one CPU (say rank 0), which then formats the data correctly.

To me, this seems like a bit of overkill, especially when the data can be variable-length strings, which is all too often what std::cout << "a=" << some_variable << " b=" << some_other_variable produces. So if we want some quick-and-dirty in-order printing, we can exploit Святослав Павленко's answer to build a serial output stream. This solution works fine, but its performance scales badly with many CPUs, so don't use it for heavy data output!

#include <iostream>
#include <sstream>
#include <mpi.h>

MPI House-keeping:

int mpi_size;
int mpi_rank;

void init_mpi(int argc, char * argv[]) {
    MPI_Init(& argc, & argv);
    MPI_Comm_size(MPI_COMM_WORLD, & mpi_size);
    MPI_Comm_rank(MPI_COMM_WORLD, & mpi_rank);
}

void finalize_mpi() {
    MPI_Finalize();
}

General-purpose class which enables MPI message-chaining:

template<class T, MPI_Datatype MPI_T> class MPIChain{
    // Uses a chained MPI message (T) to coordinate serial execution of code (the content of the message is irrelevant).
    private:
        T message_out; // The messages aren't really used here
        T message_in;
        int size;
        int rank;

    public:
        void next(){
            // Send message to next core (if there is one)
            if(rank + 1 < size) {
                // MPI_Send - Performs a standard-mode blocking send.
                MPI_Send(& message_out, 1, MPI_T, rank + 1, 0, MPI_COMM_WORLD);
            }
        }

        void wait(int & msg_count) {
            // Waits for message to arrive. Message is well-formed if msg_count = 1
            MPI_Status status;

            // MPI_Probe - Blocking test for a message.
            MPI_Probe(MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, & status);
            // MPI_Get_count - Gets the number of top level elements.
            MPI_Get_count(& status, MPI_T, & msg_count);

            if(msg_count == 1) {
                // MPI_Recv - Performs a standard-mode blocking receive.
                MPI_Recv(& message_in, msg_count, MPI_T, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, & status);
            }
        }

        MPIChain(T message_init, int c_rank, int c_size): message_out(message_init), size(c_size), rank(c_rank) {}

        int get_rank() const { return rank;}
        int get_size() const { return size;}
};

We can now use our MPIChain class to create the class which manages the output stream:

class ChainStream : public MPIChain<int, MPI_INT> {
    // Uses the MPIChain class to implement a ostream with a serial operator<< implementation.
    private:
        std::ostream & s_out;

    public:
        ChainStream(std::ostream & os, int c_rank, int c_size)
            : MPIChain<int, MPI_INT>(0, c_rank, c_size), s_out(os) {};

        ChainStream & operator<<(const std::string & os){
            if(this->get_rank() == 0) {
                this->s_out << os;
                // Initiate chain of MPI messages
                this->next();
            } else {
                int msg_count;
                // Wait until a message arrives (MPIChain::wait uses a blocking test)
                this->wait(msg_count);
                if(msg_count == 1) {
                    // If the message is well-formed (i.e. exactly one message is received): output string
                    this->s_out << os;
                    // Pass onto the next member of the chain (if there is one)
                    this->next();
                }
            }

            // Ensure that the chain is resolved before returning the stream
            MPI_Barrier(MPI_COMM_WORLD);

            // Return the ChainStream rather than the raw ostream: printing through the raw stream would break the serial-in-time execution.
            return *this;
        };
};

Note the MPI_Barrier at the end of operator<<. It prevents the code from starting a second output chain while one is still in flight. Even though the barrier could be moved outside operator<<, I figured I would put it here, since this is supposed to be serial output anyway....

Putting it all together:

int main(int argc, char * argv[]) {
    init_mpi(argc, argv);

    ChainStream cs(std::cout, mpi_rank, mpi_size);

    std::stringstream str_1, str_2, str_3;
    str_1 << "FIRST:  " << "MPI_SIZE = " << mpi_size << " RANK = " << mpi_rank << std::endl;
    str_2 << "SECOND: " << "MPI_SIZE = " << mpi_size << " RANK = " << mpi_rank << std::endl;
    str_3 << "THIRD:  " << "MPI_SIZE = " << mpi_size << " RANK = " << mpi_rank << std::endl;

    cs << str_1.str() << str_2.str() << str_3.str();
    // Equivalent to:
    //cs << str_1.str();
    //cs << str_2.str();
    //cs << str_3.str();

    finalize_mpi();
}

Note that we are concatenating the strings str_1, str_2, and str_3 before we send them to the ChainStream instance. Normally one would do something like:

std::cout << "a" << "b" << "c" << std::endl;

but this applies operator<< from left-to-right, and we want the strings to be ready for output before sequentially running through each process.

g++-7 -O3 -lmpi serial_io_obj.cpp -o serial_io_obj
mpirun -n 10 ./serial_io_obj

Outputs:

FIRST:  MPI_SIZE = 10 RANK = 0
FIRST:  MPI_SIZE = 10 RANK = 1
FIRST:  MPI_SIZE = 10 RANK = 2
FIRST:  MPI_SIZE = 10 RANK = 3
FIRST:  MPI_SIZE = 10 RANK = 4
FIRST:  MPI_SIZE = 10 RANK = 5
FIRST:  MPI_SIZE = 10 RANK = 6
FIRST:  MPI_SIZE = 10 RANK = 7
FIRST:  MPI_SIZE = 10 RANK = 8
FIRST:  MPI_SIZE = 10 RANK = 9
SECOND: MPI_SIZE = 10 RANK = 0
SECOND: MPI_SIZE = 10 RANK = 1
SECOND: MPI_SIZE = 10 RANK = 2
SECOND: MPI_SIZE = 10 RANK = 3
SECOND: MPI_SIZE = 10 RANK = 4
SECOND: MPI_SIZE = 10 RANK = 5
SECOND: MPI_SIZE = 10 RANK = 6
SECOND: MPI_SIZE = 10 RANK = 7
SECOND: MPI_SIZE = 10 RANK = 8
SECOND: MPI_SIZE = 10 RANK = 9
THIRD:  MPI_SIZE = 10 RANK = 0
THIRD:  MPI_SIZE = 10 RANK = 1
THIRD:  MPI_SIZE = 10 RANK = 2
THIRD:  MPI_SIZE = 10 RANK = 3
THIRD:  MPI_SIZE = 10 RANK = 4
THIRD:  MPI_SIZE = 10 RANK = 5
THIRD:  MPI_SIZE = 10 RANK = 6
THIRD:  MPI_SIZE = 10 RANK = 7
THIRD:  MPI_SIZE = 10 RANK = 8
THIRD:  MPI_SIZE = 10 RANK = 9
Comments

  • Κωστας Ιωαννου, almost 2 years ago:

    I'm trying to write a function in C where every processor prints its own data. Here is what I have:

    void print_mesh(int p,int myid,int** U0,int X,int Y){
        int i,m,n;
        for(i=0;i<p;i++){
            if(myid==i){
                printf("myid=%d\n",myid);
                for(n=0;n<X;n++){
                    for(m=0;m<Y;m++){
                        printf("%d ",U0[n][m]);
                    }
                    printf("\n");
                }
            }
            else MPI_Barrier(MPI_COMM_WORLD);
        }
    }
    

    It doesn't work for some reason; the arrays are printed all mixed up. Do you have any insight as to why this doesn't work? Any other ideas that would work? If possible, I don't want to send the whole array to a master process. Also, I don't want to use precompiled functions.