Parallel output using MPI IO to a single file


Your binary file output is almost right, but your calculations of the offset within the file and of the amount of data to write are incorrect. You want your offset to be

MPI_Offset offset = sizeof(double)*Pstart;

not

MPI_Offset offset = sizeof(double)*rank;

otherwise each rank will overwrite the others' data: rank 3 out of nprocs=5, say, starts writing at double number 3 in the file, not at double number (30/5)*3 = 18.
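
The arithmetic can be checked without MPI. The following is a minimal sketch, not part of the original code: `byteOffset` is a hypothetical helper, and it assumes the number of values divides evenly among the ranks, as in the question (30 values, 5 ranks).

```cpp
#include <cstddef>

// Hypothetical helper (not in the original code): byte offset at which
// `rank` should start writing. Assumes n is divisible by nprocs.
std::size_t byteOffset(int rank, int nprocs, int n)
{
    // Index of the first element owned by this rank (Pstart in the question).
    std::size_t pstart = static_cast<std::size_t>(n / nprocs) * rank;
    // File offsets are measured in bytes, so scale by the element size.
    return sizeof(double) * pstart;
}
```

With n = 30 and nprocs = 5, `byteOffset(3, 5, 30)` is 18 * sizeof(double), whereas the question's `sizeof(double)*rank` would put rank 3 at element 3, inside the block that rank 0 writes.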

Also, you want each rank to write NNN/nprocs doubles, not sizeof(double) doubles, meaning you want

MPI_File_write(file, localArray, NNN/nprocs, MPI_DOUBLE, &status);

Writing a text file is a much bigger issue: you have to convert the data to strings internally and then write those strings, formatting them carefully so that you know exactly how many characters each line requires. That is described in this answer on this site.
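
As a rough illustration of the fixed-width idea (not the linked answer's code), the sketch below formats every value into exactly the same number of characters. The names `CHARS_PER_NUM` and `formatFixedWidth`, and the `%15.6e` format, are assumptions chosen for this example.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical constant: every value occupies exactly this many characters,
// including the trailing newline, so byte positions in the file are predictable.
const int CHARS_PER_NUM = 16;

// Formats each value as one fixed-width line of CHARS_PER_NUM characters.
std::string formatFixedWidth(const std::vector<double>& values)
{
    std::string out;
    out.reserve(values.size() * CHARS_PER_NUM);
    char buf[CHARS_PER_NUM + 1]; // 16 characters plus the terminating NUL
    for (double v : values) {
        // %15.6e pads the number to 15 characters; the newline makes 16.
        std::snprintf(buf, sizeof buf, "%15.6e\n", v);
        out += buf;
    }
    return out;
}
```

Because every line has the same length, a rank whose slice starts at element Pstart could write its string at byte offset CHARS_PER_NUM * Pstart, passing the string length as the count and MPI_CHAR as the datatype.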

Author: Arnold Klein

Updated on June 04, 2022

Comments

  • Arnold Klein (almost 2 years)

    I have a very simple task to do, but somehow I am still stuck.

    I have one BIG data file ("File_initial.dat") that should be read by all nodes on the cluster (using MPI). Each node performs some manipulation on its part of this file (File_size / number_of_nodes), and finally each node writes its result to one shared BIG file ("File_final.dat"). The number of elements in the file remains the same.

    1. By googling I understood that it is much better to write the data as a binary file (the file contains only decimal numbers) rather than as a "*.txt" file, since no human will read this file, only computers.

    2. I tried to implement this myself (using formatted input/output, NOT a binary file), but I get incorrect behavior.

    My code so far follows:

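    #include <mpi.h>
    #include <fstream>
    #include <iostream>
    using namespace std;

    #define NNN 30
    
    int main(int argc, char **argv)
    {   
        ifstream fin;
        double res[NNN]; // input values, read in full by every rank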
    
        // setting MPI environment
    
        int rank, nprocs;
        MPI_File file;
        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    
        // reading the initial file
    
        fin.open("initial.txt");
        for (int i=0;i<NNN;i++)
        {  
            fin  >> res[i];
            cout << res[i] << endl; // to see, what I have in the file
        }  
        fin.close();
    
        // starting position in the "res" array as a function of "rank" of process
        int Pstart = (NNN / nprocs) * rank ;
        // specifying Offset for writing to file
        MPI_Offset offset = sizeof(double)*rank;
        MPI_Status status;
    
        // opening one shared file
        MPI_File_open(MPI_COMM_WORLD, "final.txt", MPI_MODE_CREATE|MPI_MODE_WRONLY,
                              MPI_INFO_NULL, &file);
    
        // setting local for each node array
    
        double * localArray;
        localArray = new double [NNN/nprocs];
    
        // Performing some basic manipulation (squaring each element of array)
        for (int i=0;i<(NNN / nprocs);i++)
        {
            localArray[i] = res[Pstart+i]*res[Pstart+i];
        }
    
        // Writing the result of each local array to the shared final file:
    
        MPI_File_seek(file, offset, MPI_SEEK_SET);
        MPI_File_write(file, localArray, sizeof(double), MPI_DOUBLE, &status);
        MPI_File_close(&file);
    
        MPI_Finalize();
    
        return 0;
    }
    

    I understand that I am doing something wrong when trying to write doubles to a text file.

    How should one change the code in order to save the result

    1. as .txt file (format output)
    2. as .dat file (binary file)