Values of array after cudaMemcpy

c++ c cuda

12,922

Solution 1

Yes, the device arrays hold the copied values, but you cannot print them directly from host code. To print them, first copy the data back to the host:

cudaMemcpy((void *) array_host_2, (void *) array_device_2, SIZE_INT*size, cudaMemcpyDeviceToHost);

You can then print the values of array_host_2.

A bit more explanation: array_device_* lives in GPU memory, and the CPU code that prints your output has no direct access to that data. You therefore need to copy it back into the CPU's memory before printing it.
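As a minimal sketch of that copy-back step, the helper below stages a device array in a temporary host buffer before printing it. The name printDeviceArray, the element count of 16, and the int element type are assumptions for illustration, not part of the question's code.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// Hypothetical helper: copies n ints from device memory into a temporary
// host buffer and prints them. Returns 0 on success, -1 on failure.
int printDeviceArray(const int *d_arr, int n) {
    int *h_tmp = (int *)malloc(n * sizeof(int));
    if (h_tmp == NULL)
        return -1;

    cudaError_t err = cudaMemcpy(h_tmp, d_arr, n * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMemcpy failed: %s\n", cudaGetErrorString(err));
        free(h_tmp);
        return -1;
    }

    for (int i = 0; i < n; ++i)
        printf("%d ", h_tmp[i]);
    printf("\n");

    free(h_tmp);
    return 0;
}

int main(void) {
    const int size = 16;  // assumed element count
    int array_host_2[16];
    int *array_device_2 = NULL;

    for (int i = 0; i < size; ++i)
        array_host_2[i] = i;

    // Allocate first, then copy: the device pointer is only valid after cudaMalloc.
    if (cudaMalloc((void **)&array_device_2, size * sizeof(int)) != cudaSuccess)
        return 1;
    cudaMemcpy(array_device_2, array_host_2, size * sizeof(int),
               cudaMemcpyHostToDevice);

    // array_device_2[i] must not be dereferenced here on the host;
    // the helper copies the data back before printing it.
    int status = printDeviceArray(array_device_2, size);

    cudaFree(array_device_2);
    return status == 0 ? 0 : 1;
}
```

The temporary buffer keeps the original host array untouched, which can help when you want to compare what was sent with what comes back.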

Solution 2

Example of copying an array to the device, modifying its values in a kernel, copying it back to the host, and printing the new values:

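#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// Function to run on device by many threads.
// No bounds check: the launch must use exactly as many threads as elements.
__global__ void myKernel(int *d_arr) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    d_arr[idx] = d_arr[idx]*2;
}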

int main(void) {
    int *h_arr, *d_arr;
    h_arr = (int *)malloc(10*sizeof(int));
    for (int i=0; i<10; ++i)
        h_arr[i] = i; // Or other values

    // Sends data to device
    cudaMalloc((void**) &d_arr, 10*sizeof(int));
    cudaMemcpy(d_arr, h_arr, 10*sizeof(int), cudaMemcpyHostToDevice);

    // Runs kernel on device
    myKernel<<< 2, 5 >>>(d_arr);

    // Retrieves data from device 
    cudaMemcpy(h_arr, d_arr, 10*sizeof(int), cudaMemcpyDeviceToHost);

    for (int i = 0; i<10; ++i)
        printf("Post kernel value in h_arr[%d] is: %d\n", i,h_arr[i]);

    cudaFree(d_arr);
    free(h_arr);
    return 0;
}

Solution 3

The code snippet you provided seems correct, apart from the first few lines, as leftaroundabout pointed out. Are you sure the kernel is correct? Perhaps it is not writing the modified values back to global memory. If you create a second set of host arrays and copy the GPU arrays back into them before running the kernel, do they hold the right values? From what you have shown, the values in array_host_* should have been copied to array_device_* properly.
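The check suggested above could look roughly like the following sketch. Since the question's kernel is not shown, the kernel here is a placeholder that doubles each element; deviceMatchesHost, check_host_1, and the element count of 16 are likewise hypothetical. The error checks after the launch are an addition that may help tell a failed launch apart from a kernel that simply never writes its results.

```cuda
#include <stdio.h>
#include <string.h>
#include <cuda_runtime.h>

// Placeholder kernel (the question's kernel is not shown): doubles each element.
__global__ void kernel(int *a, int n) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n)
        a[idx] *= 2;  // the result is written back to global memory
}

// Copies n ints from the device into h_scratch and compares them with h_arr.
// Returns 1 if the device contents match the host original, 0 otherwise.
int deviceMatchesHost(const int *d_arr, const int *h_arr, int *h_scratch, int n) {
    if (cudaMemcpy(h_scratch, d_arr, n * sizeof(int),
                   cudaMemcpyDeviceToHost) != cudaSuccess)
        return 0;
    return memcmp(h_scratch, h_arr, n * sizeof(int)) == 0;
}

int main(void) {
    const int n = 16;  // assumed element count
    int array_host_1[16], check_host_1[16];
    int *array_device_1 = NULL;

    for (int i = 0; i < n; ++i)
        array_host_1[i] = i;

    if (cudaMalloc((void **)&array_device_1, n * sizeof(int)) != cudaSuccess)
        return 1;
    cudaMemcpy(array_device_1, array_host_1, n * sizeof(int),
               cudaMemcpyHostToDevice);

    // Step 1: confirm the upload before the kernel touches the data.
    printf("upload %s\n",
           deviceMatchesHost(array_device_1, array_host_1, check_host_1, n)
               ? "ok" : "FAILED");

    // Step 2: launch, then check for launch and execution errors.
    kernel<<<1, n>>>(array_device_1, n);
    cudaError_t err = cudaGetLastError();  // reports launch errors
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();     // reports errors during execution
    if (err != cudaSuccess)
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));

    // Step 3: copy the results back and inspect them on the host.
    cudaMemcpy(array_host_1, array_device_1, n * sizeof(int),
               cudaMemcpyDeviceToHost);
    printf("array_host_1[3] after kernel: %d\n", array_host_1[3]);

    cudaFree(array_device_1);
    return 0;
}
```

If the upload check passes but the values are unchanged after the kernel, the problem is more likely in the kernel or its launch configuration than in the cudaMemcpy calls.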

Author by

spaghettifunk

Updated on June 29, 2022

Comments

spaghettifunk almost 2 years

I'd like to know whether, when I call cudaMemcpy(...) to move memory to the GPU, the values inside the array are copied as well. To explain further: I copy the values from one array to another and then call cudaMalloc and cudaMemcpy.

// Copying values of the arrays
for(int i = 0; i<16; i++){
    array_device_1[i] = array_host_1[i];
    array_device_2[i] = array_host_2[i];
}

// Memory allocation of array_device_1 and array_device_2
cudaMalloc((void**) &array_device_1, SIZE_INT*size);
cudaMalloc((void**) &array_device_2, SIZE_INT*size);

// Transfer array_device_1 and array_device_2
cudaMemcpy(array_device_1, array_host_1, SIZE_INT*size, cudaMemcpyHostToDevice);
cudaMemcpy(array_device_2, array_host_2, SIZE_INT*size, cudaMemcpyHostToDevice);

kernel<<<N, N>>>(array_device_1, array_device_2);

cudaMemcpy(array_host_1, array_device_1, SIZE_INT*size, cudaMemcpyDeviceToHost);
cudaMemcpy(array_host_2, array_device_2, SIZE_INT*size, cudaMemcpyDeviceToHost);

cudaFree(array_device_1);
cudaFree(array_device_2);

So, when I execute all of these instructions and use the arrays inside the kernel, do array_device_1 and array_device_2 contain the values or not? I tried printing them after the kernel and noticed that all the arrays are empty! I really can't understand how to keep the values inside them and then change those values with the kernel function.

spaghettifunk about 12 years

I tested my kernel's algorithm without the GPU part, and it works fine. I really can't explain what I'm missing.