colored image to greyscale image using CUDA parallel processing
Solution 1
Since posting this question I have kept working on the problem, and I now realize my initial solution was wrong.
A couple of changes are needed to make it correct.
Changes to be done:
1. absolute_position_x =(blockIdx.x * blockDim.x) + threadIdx.x;
2. absolute_position_y = (blockIdx.y * blockDim.y) + threadIdx.y;
Secondly,
1. const dim3 blockSize(24, 24, 1);
2. const dim3 gridSize((numCols/16), (numRows/16) , 1);
In this solution we are using a grid of numCols/16 * numRows/16 blocks
and a block size of 24 * 24 threads.
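Whether a launch configuration like this reaches every pixel can be checked with simple arithmetic. The sketch below is plain host-side C++ (it also compiles inside a .cu file). The helper names threadsAlongAxis and coversAxis are hypothetical, and an image width such as 557 is only an illustrative value, not one taken from the question.

```cpp
#include <cstddef>

// Hypothetical helper: how many threads a launch provides along one axis.
constexpr std::size_t threadsAlongAxis(std::size_t gridDim, std::size_t blockDim) {
    return gridDim * blockDim;
}

// Hypothetical helper: true when every index in [0, extent) gets at least one thread.
constexpr bool coversAxis(std::size_t extent, std::size_t gridDim, std::size_t blockDim) {
    return threadsAlongAxis(gridDim, blockDim) >= extent;
}
```

For a width of 557, the truncating division 557/16 gives 34 blocks. With 24 threads per block that is 816 threads along x, more than enough, so the surplus threads must be rejected by a bounds check in the kernel. With 16 threads per block the same grid would provide only 544 threads and the rightmost columns would never be written.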
code executed in 0.040576 ms
@datenwolf : thanks for answering above!!!
Solution 2
I recently joined this course and tried your solution, but it doesn't work, so I tried my own. You are almost correct. The correct solution is this:
__global__
void rgba_to_greyscale(const uchar4* const rgbaImage,
                       unsigned char* const greyImage,
                       int numRows, int numCols)
{
    // Global pixel coordinates of this thread
    int pos_x = (blockIdx.x * blockDim.x) + threadIdx.x;
    int pos_y = (blockIdx.y * blockDim.y) + threadIdx.y;
    // Threads that fall outside the image do nothing
    if(pos_x >= numCols || pos_y >= numRows)
        return;
    // Row-major index: one full row of numCols pixels per step in y
    uchar4 rgba = rgbaImage[pos_x + pos_y * numCols];
    greyImage[pos_x + pos_y * numCols] = (.299f * rgba.x + .587f * rgba.y + .114f * rgba.z);
}
The rest is the same as your code.
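To check the kernel's output independently of the course harness, a CPU reference that applies the same weights and the same row-major index can be handy. This is a minimal sketch in host-side C++, not the course's reference implementation: Pixel is a stand-in for CUDA's uchar4 so it builds without the CUDA headers, and greyscaleReference is a hypothetical name.

```cpp
#include <cstddef>
#include <vector>

// Stand-in for CUDA's uchar4 (assumption: same x, y, z, w byte layout).
struct Pixel { unsigned char x, y, z, w; };

// Hypothetical CPU reference: each (x, y) pair plays the role of one GPU thread.
std::vector<unsigned char> greyscaleReference(const std::vector<Pixel>& rgba,
                                              std::size_t numRows,
                                              std::size_t numCols) {
    std::vector<unsigned char> grey(numRows * numCols);
    for (std::size_t y = 0; y < numRows; ++y) {
        for (std::size_t x = 0; x < numCols; ++x) {
            const std::size_t idx = x + y * numCols;  // same formula as the kernel
            const Pixel p = rgba[idx];
            // The float result is truncated when stored as a byte, as in the kernel.
            grey[idx] = static_cast<unsigned char>(.299f * p.x + .587f * p.y + .114f * p.z);
        }
    }
    return grey;
}
```

Comparing this buffer with the one copied back from the GPU, within a small tolerance, shows at which flat position the two first differ.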
Solution 3
Since the image size is not known in advance, it is best to choose any reasonable dimensions for the two-dimensional block of threads and then make sure two conditions hold. First, the kernel must check that the pos_x and pos_y indexes do not exceed numCols and numRows, respectively. Second, the grid must be just large enough that the threads of all blocks together cover every pixel, which means rounding the number of blocks up in each dimension.
const dim3 blockSize(16, 16, 1);
const dim3 gridSize((numCols%16) ? numCols/16+1 : numCols/16,
(numRows%16) ? numRows/16+1 : numRows/16, 1);
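The conditional expression above is a ceiling division. A common way to write it without the branch is shown in this small host-side C++ sketch; divUp and ternaryForm are hypothetical names, and ternaryForm simply restates the expression from the answer so the two can be compared.

```cpp
#include <cstddef>

// Hypothetical helper: smallest number of blocks of size d that cover n items.
constexpr std::size_t divUp(std::size_t n, std::size_t d) {
    return (n + d - 1) / d;
}

// The expression from the answer above, for a block edge of 16.
constexpr std::size_t ternaryForm(std::size_t n) {
    return (n % 16) ? n / 16 + 1 : n / 16;
}
```

With such a helper the launch configuration could be written as gridSize(divUp(numCols, 16), divUp(numRows, 16), 1), which keeps working if the block edge is later changed from 16 to another value.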
Solution 4
You will still have a problem at run time: the conversion will not give a proper result.
The lines:
- uchar4 rgba = rgbaImage[absolute_image_position_x + absolute_image_position_y];
- greyImage[absolute_image_position_x + absolute_image_position_y] = channelSum;
should be changed to:
- uchar4 rgba = rgbaImage[absolute_image_position_x + absolute_image_position_y*numCols];
- greyImage[absolute_image_position_x + absolute_image_position_y*numCols] = channelSum;
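The reason the stride matters is that x + y sends many different pixels to the same array element, while x + y*numCols gives every pixel its own slot. The host-side C++ sketch below counts the distinct indices each formula produces; distinctIndices is a hypothetical helper written only for this illustration.

```cpp
#include <cstddef>
#include <set>

// Hypothetical helper: number of distinct flat indices generated for an image
// of numRows x numCols, with or without the row stride.
std::size_t distinctIndices(std::size_t numRows, std::size_t numCols, bool useRowStride) {
    std::set<std::size_t> seen;
    for (std::size_t y = 0; y < numRows; ++y) {
        for (std::size_t x = 0; x < numCols; ++x) {
            seen.insert(useRowStride ? x + y * numCols : x + y);
        }
    }
    return seen.size();
}
```

For a 3 x 4 image the strided formula yields 12 distinct indices, one per pixel, whereas x + y yields only 6, so most of the output buffer is never written and the rest is overwritten by several threads.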
Solution 5
libdc1394 error: Failed to initialize libdc1394
I don't think that this is a CUDA problem. libdc1394 is a library used to access IEEE1394 aka FireWire aka iLink video devices (DV camcorders, Apple iSight camera). That library doesn't properly initialize, hence you're not getting useful results. Basically it's NINO: Nonsense In, Nonsense Out.
Comments
-
Ashish Singh almost 2 years
I am trying to solve a problem in which I am supposed to change a colour image to a greyscale image. For this purpose I am using a CUDA parallel approach.
The kernel code I am invoking on the GPU is as follows.
__global__
void rgba_to_greyscale(const uchar4* const rgbaImage,
                       unsigned char* const greyImage,
                       int numRows, int numCols)
{
    int absolute_image_position_x = blockIdx.x;
    int absolute_image_position_y = blockIdx.y;
    if ( absolute_image_position_x >= numCols ||
         absolute_image_position_y >= numRows )
    {
        return;
    }
    uchar4 rgba = rgbaImage[absolute_image_position_x + absolute_image_position_y];
    float channelSum = .299f * rgba.x + .587f * rgba.y + .114f * rgba.z;
    greyImage[absolute_image_position_x + absolute_image_position_y] = channelSum;
}

void your_rgba_to_greyscale(const uchar4 * const h_rgbaImage, uchar4 * const d_rgbaImage,
                            unsigned char* const d_greyImage, size_t numRows, size_t numCols)
{
    //You must fill in the correct sizes for the blockSize and gridSize
    //currently only one block with one thread is being launched
    const dim3 blockSize(numCols/32, numCols/32 , 1);  //TODO
    const dim3 gridSize(numRows/12, numRows/12 , 1);  //TODO
    rgba_to_greyscale<<<gridSize, blockSize>>>(d_rgbaImage, d_greyImage, numRows, numCols);
    cudaDeviceSynchronize(); checkCudaErrors(cudaGetLastError());
}
I see a line of dots in the first pixel row. The error I am getting is:
libdc1394 error: Failed to initialize libdc1394
Difference at pos 51 exceeds tolerance of 5
Reference: 255
GPU : 0
My input/output images. Can anyone help me with this? Thanks in advance.
-
Ashish Singh about 11 years
@datewolf please see, I have added a link to the input/output images I am getting.
-
Ashish Singh about 11 years
What I see is an error that pos 51 exceeds the tolerance of 5, so I am guessing it is related to the color pattern and not some linker-type error.
-
datenwolf about 11 years
@ashish173: It's not a linker problem, it's a runtime problem. The dc1394 library fails to initialize properly upon program startup and will likely produce only garbage when used to retrieve pictures. You must first fix that initialization problem (this is a runtime thing, i.e. something you must code).
-
Ashish Singh about 11 years
Thanks for answering. I've figured it out: I wasn't using any threads. That was so stupid of me.
-
alvas almost 9 years
Any idea why the blockSize needs to be 24,24 and the gridSize numCols/16, numRows/16? Is there a reason why? Can other numbers work?
-
labheshr over 6 years
Can you explain the formula: pos_x + pos_y * numCols?
-
labheshr over 6 years
Although you may get the right answer, you do this in a very weird way. You pass in columns where rows need to be passed into your gridSize, and your formula for pixel_pos does not tie with the standard way of flattening a 2D array into a 1D array. It should either be numRows*y + x, or numCols*x + y, but it all works out because your grid is set to cols, rows instead of rows, cols.
-
labheshr over 6 years
Never mind: this answered my question stackoverflow.com/questions/2151084/…