colored image to greyscale image using CUDA parallel processing

image-processing cuda parallel-processing gpu

12,032

Solution 1

Now, since I posted this question I have been continuously working on this problem
there are a couple of improvements that should be done in order to get this problem correct now I realize my initial solution was wrong .
Changes to be done:-

 1. absolute_position_x =(blockIdx.x * blockDim.x) + threadIdx.x;
 2. absolute_position_y = (blockIdx.y * blockDim.y) + threadIdx.y;

Secondly,

 1. const dim3 blockSize(24, 24, 1);
 2. const dim3 gridSize((numCols/16), (numRows/16) , 1);

In the solution we are using a grid of numCols/16 * numCols/16
and blocksize of 24 * 24

code executed in 0.040576 ms

@datenwolf : thanks for answering above!!!

Solution 2

I recently joined this course and tried your solution but it don't work so, i tried my own. You are almost correct. The correct solution is this:

__global__`
void rgba_to_greyscale(const uchar4* const rgbaImage,
               unsigned char* const greyImage,
               int numRows, int numCols)
{`

int pos_x = (blockIdx.x * blockDim.x) + threadIdx.x;
int pos_y = (blockIdx.y * blockDim.y) + threadIdx.y;
if(pos_x >= numCols || pos_y >= numRows)
    return;

uchar4 rgba = rgbaImage[pos_x + pos_y * numCols];
greyImage[pos_x + pos_y * numCols] = (.299f * rgba.x + .587f * rgba.y + .114f * rgba.z); 

}

The rest is same as your code.

Solution 3

Since you are not aware of the image size. It is best to choose any reasonable dimension of the two-dimensional block of threads and then check for two conditions. The first one is that the pos_x and pos_y indexes in the kernel do not exceed numRows and numCols. Secondly the grid size should be just above the total number of threads in all the blocks.

const dim3 blockSize(16, 16, 1);
const dim3 gridSize((numCols%16) ? numCols/16+1 : numCols/16,
(numRows%16) ? numRows/16+1 : numRows/16, 1);

Solution 4

You still should have a problem with run time - the conversion will not give a proper result.

The lines:

uchar4 rgba = rgbaImage[absolute_image_position_x + absolute_image_position_y];
greyImage[absolute_image_position_x + absolute_image_position_y] = channelSum;

should be changed to:

uchar4 rgba = rgbaImage[absolute_image_position_x + absolute_image_position_y*numCols];
greyImage[absolute_image_position_x + absolute_image_position_y*numCols] = channelSum;

Solution 5

libdc1394 error: Failed to initialize libdc1394

I don't think that this is a CUDA problem. libdc1394 is a library used to access IEEE1394 aka FireWire aka iLink video devices (DV camcorders, Apple iSight camera). That library doesn'r properly initialize, hence you're not getting usefull results. Basically it's NINO: Nonsens In Nonsens Out.

View more solutions

12,032

Author by

Ashish Singh

Creator of Angularcasts Youtube Twitter

Updated on June 23, 2022

Comments

Ashish Singh almost 2 years

I am trying to solve a problem in which i am supposed to change a colour image to a greyscale image. For this purpose i am using CUDA parallel approach.

The kerne code i am invoking on the GPU is as follows.

__global__
void rgba_to_greyscale(const uchar4* const rgbaImage,
                   unsigned char* const greyImage,
                   int numRows, int numCols)
{
    int absolute_image_position_x = blockIdx.x;  
    int absolute_image_position_y = blockIdx.y;

  if ( absolute_image_position_x >= numCols ||
   absolute_image_position_y >= numRows )
 {
     return;
 }
uchar4 rgba = rgbaImage[absolute_image_position_x + absolute_image_position_y];
float channelSum = .299f * rgba.x + .587f * rgba.y + .114f * rgba.z;
greyImage[absolute_image_position_x + absolute_image_position_y] = channelSum;

}

void your_rgba_to_greyscale(const uchar4 * const h_rgbaImage,
                            uchar4 * const d_rgbaImage,
                            unsigned char* const d_greyImage,
                            size_t numRows,
                            size_t numCols)
{
  //You must fill in the correct sizes for the blockSize and gridSize
  //currently only one block with one thread is being launched
  const dim3 blockSize(numCols/32, numCols/32 , 1);  //TODO
  const dim3 gridSize(numRows/12, numRows/12 , 1);  //TODO
  rgba_to_greyscale<<<gridSize, blockSize>>>(d_rgbaImage,
                                             d_greyImage,
                                             numRows,
                                             numCols);

  cudaDeviceSynchronize(); checkCudaErrors(cudaGetLastError());
}

i see a line of dots in the first pixel line.

error i am getting is

libdc1394 error: Failed to initialize libdc1394
Difference at pos 51 exceeds tolerance of 5
Reference: 255
GPU : 0
my input/output images Can anyone help me with this??? thanks in advance.

Ashish Singh about 11 years

@datewolf please see i have added a link to input/output image output i am getting.
Ashish Singh about 11 years

what i see is an error at pos 51 exceeds tolernace of 5 so i am guessing if its related to color pattern and not any other linker type error.
datenwolf about 11 years

@ashish173: It's not a linker problem, it's a runtime problem. The dc1394 library fails to initialize properly upon program startup and will likely produce only garbage when used to retrieve pictures. You must first fix that initialization problem (this is a runtime thing, i.e. something you must code).
Ashish Singh about 11 years

thnks for answering i've figured it out ,i wasn't using any threads that was so stupid of me.
alvas almost 9 years

any idea why the blockSize needs to be 24,24 and gridSize numCols/16, numRows/16? Is there a reason why? Can other number work?
labheshr over 6 years

can you explain the formula: pos_x + pos_y * numCols?
labheshr over 6 years

although you may get the right answer, you do this in a very weird way..You pass in columns where rows need to be passed into your gridsize, and your formula for pixel_pos does not tie with the std. way of flattening a 2d array into 1d array...it should either be numRowsy + x, or numColsx+y, but it all works out b/c your gird is set to cols, rows instead of rows, cols
labheshr over 6 years

nevermind: this answered my question stackoverflow.com/questions/2151084/…