MPI_Scatter of 2D array and malloc

c arrays malloc mpi

17,370

I think you have fundamentally misunderstood what the scatter operation does and how MPI expects memory to be allocated and used.

MPI_Scatter takes the source array and splits it into pieces, sending a unique piece to each member of the MPI communicator. In your example, you would need your matrix to be an allocation of contiguous p*p elements in linear memory, which would send p values to each process. Your source "matrix" is an array of pointers. There is no guarantee that the rows are sequentially arranged in memory, and MPI_Scatter doesn't know how to traverse the array of pointers you have passed it. As a result, the call is simply reading beyond the end of the first row you passed by indirection of the matrix pointer, treating whatever follows that in memory as data. That is why you get garbage values in the processes which receive the data after the first row.

All MPI data copy routines expect that source and destination memory are "flat" linear arrays. Multidimensional C arrays should be stored in row major order not in arrays of pointers as you have done here. A cheap and nasty hack of your example to illustrate a scatter call working correctly would be like this:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <mpi.h>

int *createMatrix (int nrows, int ncols) {
    int *matrix;
    int h, i, j;

    if (( matrix = malloc(nrows*ncols*sizeof(int))) == NULL) {
        printf("Malloc error");
        exit(1);
    }

    for (h=0; h<nrows*ncols; h++) {
        matrix[h] = h+1;
    }

    return matrix;
}

void printArray (int *row, int nElements) {
    int i;
    for (i=0; i<nElements; i++) {
        printf("%d ", row[i]);
    }
    printf("\n");
}

int main (int argc, char **argv) {

    if (MPI_Init(&argc, &argv) != MPI_SUCCESS) {
        perror("Error initializing MPI");
        exit(1);
    }

    int p, id;
    MPI_Comm_size(MPI_COMM_WORLD, &p); // Get number of processes
    MPI_Comm_rank(MPI_COMM_WORLD, &id); // Get own ID

    int *matrix;

    if (id == 0) {
        matrix = createMatrix(p, p); // Master process creates matrix
        printf("Initial matrix:\n");
        printArray(matrix, p*p);
    }

    int *procRow = malloc(sizeof(int) * p); // received row will contain p integers
    if (procRow == NULL) {
        perror("Error in malloc 3");
        exit(1);
    }

    if (MPI_Scatter(matrix, p, MPI_INT, // send one row, which contains p integers
                procRow, p, MPI_INT, // receive one row, which contains p integers
                0, MPI_COMM_WORLD) != MPI_SUCCESS) {

        perror("Scatter error");
        exit(1);
    }

    printf("Process %d received elements: ", id);
    printArray(procRow, p);

    MPI_Finalize();

    return 0;
}

which does this:

$ mpicc -o scatter scatter.c 
$ mpiexec -np 4 scatter
Initial matrix:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 
Process 0 received elements: 1 2 3 4 
Process 1 received elements: 5 6 7 8 
Process 2 received elements: 9 10 11 12 
Process 3 received elements: 13 14 15 16

ie. when you pass the data stored in linear memory, it works. The equivalent row major array would be statically allocated like this:

int matrix[4][4] = { {  1,  2,  3,  4 }, 
                     {  5,  6,  7,  8 },
                     {  9, 10, 11, 12 },
                     { 13, 14, 15, 16 } };

Note the difference between a statically allocated two dimensional array, and the array of pointers which your code allocated dynamically. They are not at all the same thing, even though they look superficially similar.

17,370

Author by

rvw

Updated on June 30, 2022

Comments

rvw almost 2 years

I'm trying to write a program in C with the MPI library, in which the master process creates a 2D-array and distributes its rows of to the other processes. The matrix has dimensions p*p, in which p is the number of processes.

Here's the code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <mpi.h>

int **createMatrix (int nrows, int ncols) {
    int **matrix;
    int h, i, j;

    if (( matrix = malloc(nrows*sizeof(int*))) == NULL) {
        printf("Malloc error");
        exit(1);
    }

    for (h=0; h<nrows; h++) {
        if (( matrix[h] = malloc( ncols * sizeof(int))) == NULL) {
            printf("Malloc error 2");
            exit(1);
        }
    }

    for (i=0; i<ncols; i++) {
        for (j=0; j<nrows; j++) {
            matrix[i][j] = ((i*nrows) + j);
        }
    }

    return matrix;
}

void printArray (int *row, int nElements) {
    int i;
    for (i=0; i<nElements; i++) {
        printf("%d ", row[i]);
    }
    printf("\n");
}

void printMatrix (int **matrix, int nrows, int ncols) {
    int i;
    for (i=0; i<nrows; i++) {
        printArray(matrix[i], ncols);
    }
}

int main (int argc, char **argv) {

    if (MPI_Init(&argc, &argv) != MPI_SUCCESS) {
        perror("Error initializing MPI");
        exit(1);
    }

    int p, id;
    MPI_Comm_size(MPI_COMM_WORLD, &p); // Get number of processes
    MPI_Comm_rank(MPI_COMM_WORLD, &id); // Get own ID

    int **matrix;

    if (id == 0) {
        matrix = createMatrix(p, p); // Master process creates matrix
        printf("Initial matrix:\n");
        printMatrix(matrix, p, p);
    }

    int *procRow = malloc(sizeof(int) * p); // received row will contain p integers
    if (procRow == NULL) {
        perror("Error in malloc 3");
        exit(1);
    }

    if (MPI_Scatter(*matrix, p, MPI_INT, // send one row, which contains p integers
                    procRow, p, MPI_INT, // receive one row, which contains p integers
                    0, MPI_COMM_WORLD) != MPI_SUCCESS) {

        perror("Scatter error");
        exit(1);
    }

    printf("Process %d received elements: ", id);
    printArray(procRow, p);

    MPI_Finalize();

    return 0;
}

The output I receive when running this code is

$ mpirun -np 4 test
Initial matrix:
0 1 2 3 
4 5 6 7 
8 9 10 11 
12 13 14 15 
Process 0 received elements: 0 1 2 3 
Process 1 received elements: 1 50 32 97 
Process 2 received elements: -1217693696 1 -1217684120 156314784 
Process 3 received elements: 1 7172196 0 0

Process 0 apparently receives the correct input, but the other processes show numbers I can't make sense of. Also note that the numbers of process 1 and 3 are consistent over multiple runs of the program, whereas the numbers of process 2 change in each run.

It seems to me that there's something wrong in my memory allocating or pointer usage, but I'm quite new to programming in C. Could anyone explain to me how and why this output is produced? Secondary, obviously, I'm also interested in how to solve my issue :) thanks in advance!