colored image to greyscale image using CUDA parallel processing

Solution 1

Since posting this question I have kept working on the problem. I now realize my initial solution was wrong; a couple of changes are needed to get it right.
Changes to be made:

absolute_position_x = (blockIdx.x * blockDim.x) + threadIdx.x;
absolute_position_y = (blockIdx.y * blockDim.y) + threadIdx.y;

Secondly,

const dim3 blockSize(24, 24, 1);
const dim3 gridSize((numCols/16), (numRows/16), 1);

This solution uses a grid of numCols/16 × numRows/16 blocks and a block size of 24 × 24 threads.

The code executed in 0.040576 ms.
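As a rough host-side check of whether such a launch configuration reaches every pixel, here is a minimal sketch in plain C++. The helper name `covers_axis` is hypothetical and not part of the course code. With integer division, `numCols/16` blocks of 24 threads supply `(numCols/16) * 24` threads along x, roughly 1.5 times the image width, so the surplus threads have to be discarded by a bounds check in the kernel, and very narrow images can still be left partly uncovered.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not from the original post): reports whether a launch
// with `extent / divisor` blocks (integer division, as in gridSize above) of
// `threads_per_block` threads each reaches every index in [0, extent).
inline bool covers_axis(std::size_t extent, std::size_t divisor,
                        std::size_t threads_per_block) {
    assert(divisor > 0);
    const std::size_t blocks = extent / divisor;            // truncates toward zero
    const std::size_t threads = blocks * threads_per_block; // threads along this axis
    return threads >= extent;
}
```

For example, `covers_axis(512, 16, 24)` is true (768 threads for 512 columns), while `covers_axis(25, 16, 24)` is false because a single block of 24 threads cannot reach column 24.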

@datenwolf: thanks for answering above!

Solution 2

I recently joined this course and tried your solution, but it didn't work, so I tried my own. You are almost correct. The correct solution is this:

__global__
void rgba_to_greyscale(const uchar4* const rgbaImage,
                       unsigned char* const greyImage,
                       int numRows, int numCols)
{
    // Global pixel coordinates handled by this thread
    int pos_x = (blockIdx.x * blockDim.x) + threadIdx.x;
    int pos_y = (blockIdx.y * blockDim.y) + threadIdx.y;

    // Threads that fall outside the image do nothing
    if (pos_x >= numCols || pos_y >= numRows)
        return;

    // Row-major index: skip pos_y full rows of numCols pixels, then add pos_x
    uchar4 rgba = rgbaImage[pos_x + pos_y * numCols];
    greyImage[pos_x + pos_y * numCols] = (.299f * rgba.x + .587f * rgba.y + .114f * rgba.z);
}

The rest is the same as your code.
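To check a kernel like this against known values, a CPU reference can help. The following is only a sketch in plain C++: the `Pixel` struct is a stand-in for CUDA's `uchar4` so that it compiles without CUDA headers, and the function name `greyscale_reference` is hypothetical. It applies the same weights and the same row-major index as the kernel above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for CUDA's uchar4 (assumption: x = red, y = green, z = blue, w = alpha).
struct Pixel { unsigned char x, y, z, w; };

// Hypothetical CPU reference: converts a row-major RGBA image to greyscale
// using the same weights as the kernel (.299 R + .587 G + .114 B).
inline std::vector<unsigned char> greyscale_reference(const std::vector<Pixel>& rgba,
                                                      std::size_t numRows,
                                                      std::size_t numCols) {
    assert(rgba.size() == numRows * numCols);
    std::vector<unsigned char> grey(numRows * numCols);
    for (std::size_t row = 0; row < numRows; ++row) {
        for (std::size_t col = 0; col < numCols; ++col) {
            const std::size_t i = col + row * numCols;  // row-major index
            const Pixel p = rgba[i];
            // The float result is truncated when converted to unsigned char.
            grey[i] = static_cast<unsigned char>(.299f * p.x + .587f * p.y + .114f * p.z);
        }
    }
    return grey;
}
```

Comparing the GPU output with this reference pixel by pixel, within a small tolerance, is one way to localize an error such as the "Difference at pos 51" message from the question.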

Solution 3

Since you do not know the image size in advance, it is best to choose a reasonable size for the two-dimensional block of threads and then ensure two conditions. First, the pos_x and pos_y indexes in the kernel must stay below numCols and numRows, respectively. Second, the grid should be just large enough that the total number of threads across all blocks covers every pixel.

const dim3 blockSize(16, 16, 1);
const dim3 gridSize((numCols%16) ? numCols/16+1 : numCols/16,
(numRows%16) ? numRows/16+1 : numRows/16, 1);
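The conditional expressions above are a ceiling division. As a minimal sketch in plain C++, the same value can be computed with a small helper; the name `ceil_div` is hypothetical, not something the course code provides.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of blocks of size `block` needed to cover `n`
// items, i.e. n / block rounded up. Equivalent to
// (n % block) ? n / block + 1 : n / block.
inline std::size_t ceil_div(std::size_t n, std::size_t block) {
    assert(block > 0);
    return (n + block - 1) / block;
}
```

Under that assumption, the grid could be written as `const dim3 gridSize(ceil_div(numCols, 16), ceil_div(numRows, 16), 1);`. A 33-pixel-wide image then gets 3 blocks along x rather than the 2 that plain `numCols/16` would give.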

Solution 4

You should still have a problem at run time: the conversion will not give a correct result.

The lines:

uchar4 rgba = rgbaImage[absolute_image_position_x + absolute_image_position_y];
greyImage[absolute_image_position_x + absolute_image_position_y] = channelSum;

should be changed to:

uchar4 rgba = rgbaImage[absolute_image_position_x + absolute_image_position_y*numCols];
greyImage[absolute_image_position_x + absolute_image_position_y*numCols] = channelSum;
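The reason is that the image is stored row by row, so the pixel at (x, y) sits after y complete rows of numCols pixels. With plain `x + y`, different pixels collide: (1, 0) and (0, 1) would both map to index 1. A minimal sketch of the row-major mapping in plain C++, using a hypothetical helper name `flat_index`:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: row-major index of pixel (x, y) in an image that is
// numCols pixels wide. Skips y full rows, then moves x pixels into the row.
inline std::size_t flat_index(std::size_t x, std::size_t y, std::size_t numCols) {
    assert(x < numCols);
    return x + y * numCols;
}
```

For a 4-pixel-wide image, `flat_index(1, 0, 4)` is 1 and `flat_index(0, 1, 4)` is 4, so the two pixels no longer share a slot.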

Solution 5

libdc1394 error: Failed to initialize libdc1394

I don't think that this is a CUDA problem. libdc1394 is a library used to access IEEE 1394 (aka FireWire, aka i.LINK) video devices such as DV camcorders and the Apple iSight camera. That library doesn't initialize properly, hence you're not getting useful results. Basically it's NINO: Nonsense In, Nonsense Out.

Author by

Ashish Singh

Updated on June 23, 2022

Comments

  • Ashish Singh almost 2 years

    I am trying to solve a problem in which I am supposed to convert a colour image to a greyscale image. For this purpose I am using a CUDA parallel approach.

    The kernel code I am invoking on the GPU is as follows.

    __global__
    void rgba_to_greyscale(const uchar4* const rgbaImage,
                       unsigned char* const greyImage,
                       int numRows, int numCols)
    {
        int absolute_image_position_x = blockIdx.x;
        int absolute_image_position_y = blockIdx.y;

        if ( absolute_image_position_x >= numCols ||
             absolute_image_position_y >= numRows )
        {
            return;
        }
        uchar4 rgba = rgbaImage[absolute_image_position_x + absolute_image_position_y];
        float channelSum = .299f * rgba.x + .587f * rgba.y + .114f * rgba.z;
        greyImage[absolute_image_position_x + absolute_image_position_y] = channelSum;
    }
    
    void your_rgba_to_greyscale(const uchar4 * const h_rgbaImage,
                                uchar4 * const d_rgbaImage,
                                unsigned char* const d_greyImage,
                                size_t numRows,
                                size_t numCols)
    {
      //You must fill in the correct sizes for the blockSize and gridSize
      //currently only one block with one thread is being launched
      const dim3 blockSize(numCols/32, numCols/32 , 1);  //TODO
      const dim3 gridSize(numRows/12, numRows/12 , 1);  //TODO
      rgba_to_greyscale<<<gridSize, blockSize>>>(d_rgbaImage,
                                                 d_greyImage,
                                                 numRows,
                                                 numCols);
    
      cudaDeviceSynchronize(); checkCudaErrors(cudaGetLastError());
    }
    


    I see a line of dots in the first row of pixels.

    The error I am getting is:

    libdc1394 error: Failed to initialize libdc1394
    Difference at pos 51 exceeds tolerance of 5
    Reference: 255
    GPU : 0

    My input/output images. Can anyone help me with this? Thanks in advance.

  • Ashish Singh about 11 years
    @datenwolf Please see: I have added a link to the input/output images I am getting.
  • Ashish Singh about 11 years
    What I see is an error saying the difference at pos 51 exceeds the tolerance of 5, so I am guessing it is related to the colour pattern and not to some linker-type error.
  • datenwolf about 11 years
    @ashish173: It's not a linker problem, it's a runtime problem. The dc1394 library fails to initialize properly upon program startup and will likely produce only garbage when used to retrieve pictures. You must first fix that initialization problem (this is a runtime thing, i.e. something you must code).
  • Ashish Singh about 11 years
    Thanks for answering. I've figured it out: I wasn't using any threads. That was so stupid of me.
  • alvas almost 9 years
    Any idea why the blockSize needs to be 24,24 and the gridSize numCols/16, numRows/16? Is there a reason why? Can other numbers work?
  • labheshr over 6 years
    Can you explain the formula: pos_x + pos_y * numCols?
  • labheshr over 6 years
    Although you may get the right answer, you do this in a very weird way. You pass in columns where rows need to be passed into your gridSize, and your formula for pixel_pos does not tie in with the standard way of flattening a 2D array into a 1D array. It should either be numRows*y + x or numCols*x + y, but it all works out because your grid is set to cols, rows instead of rows, cols.
  • labheshr over 6 years
    Never mind: this answered my question stackoverflow.com/questions/2151084/…