Finding max value in CUDA
Solution 1
This is purely a reduction problem. Here's a good presentation by NVIDIA on optimizing reduction on GPUs. You can use the same technique to find the minimum, maximum, or sum of all elements.
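To make the reduction idea concrete, here is a minimal sketch of a shared-memory max reduction. It is not the code from the NVIDIA presentation, just the basic tree pattern. The names `blockMax` and `deviceMax` are made up for this example, the block size is assumed to be a power of two, the input is assumed to be non-empty, and error checking is left out for brevity.

```cuda
#include <cuda_runtime.h>
#include <algorithm>
#include <cfloat>
#include <vector>

// Each block reduces its share of `in` to a single maximum and writes it to
// out[blockIdx.x]. Assumes blockDim.x is a power of two.
__global__ void blockMax(const float* in, float* out, int n) {
    extern __shared__ float sdata[];
    int tid = threadIdx.x;
    int stride = blockDim.x * gridDim.x;

    // Grid-stride loop: every thread keeps a private running maximum,
    // so no thread ever touches a shared "global max" variable here.
    float m = -FLT_MAX;
    for (int i = blockIdx.x * blockDim.x + tid; i < n; i += stride) {
        m = fmaxf(m, in[i]);
    }
    sdata[tid] = m;
    __syncthreads();

    // Tree reduction in shared memory: halve the active threads each step.
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) {
            sdata[tid] = fmaxf(sdata[tid], sdata[tid + s]);
        }
        __syncthreads();
    }
    if (tid == 0) {
        out[blockIdx.x] = sdata[0];
    }
}

// Hypothetical host helper: launches the kernel and combines the per-block
// maxima on the CPU. Assumes `h` is non-empty.
float deviceMax(const std::vector<float>& h) {
    const int threads = 256;  // power of two, as the kernel assumes
    const int blocks = 64;
    int n = static_cast<int>(h.size());

    float* dIn = nullptr;
    float* dOut = nullptr;
    cudaMalloc(&dIn, n * sizeof(float));
    cudaMalloc(&dOut, blocks * sizeof(float));
    cudaMemcpy(dIn, h.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    blockMax<<<blocks, threads, threads * sizeof(float)>>>(dIn, dOut, n);

    // This blocking copy also waits for the kernel to finish.
    std::vector<float> partial(blocks);
    cudaMemcpy(partial.data(), dOut, blocks * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    return *std::max_element(partial.begin(), partial.end());
}
```

Calling `deviceMax(values)` on a host `std::vector<float>` returns the maximum. Replacing `fmaxf` with `fminf` (and the initial value with `FLT_MAX`) gives the minimum, and replacing it with addition (initial value 0) gives the sum.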
Solution 2
The link to the Thrust library is broken.
If anyone wants to use Thrust for this, the documentation is here:
Thrust, extrema reductions
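For anyone going the Thrust route, a minimal sketch using `thrust::max_element` could look like the following. `thrustMax` is just a name chosen for this example, and the input is assumed to be non-empty.

```cuda
#include <thrust/device_vector.h>
#include <thrust/extrema.h>
#include <vector>

// Hypothetical helper: copies the data to the GPU and returns the largest value.
// Assumes `h` is non-empty.
float thrustMax(const std::vector<float>& h) {
    // Constructing a device_vector from host iterators copies the data to the device.
    thrust::device_vector<float> d(h.begin(), h.end());

    // max_element returns an iterator to the largest element.
    thrust::device_vector<float>::iterator it = thrust::max_element(d.begin(), d.end());

    // Dereferencing a device iterator on the host copies that one value back.
    return *it;
}
```

If you also need the position of the maximum, `it - d.begin()` gives its index. `thrust::min_element` works the same way for the minimum.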
kar
Updated on June 04, 2022
I am trying to write CUDA code to find the maximum value of a given set of numbers.
Assume you have 20 numbers and the kernel is running on 2 blocks of 5 threads. Now assume the 10 threads compare the first 10 values at the same time, and thread 2 finds a new maximum, so thread 2 updates the max value variable in global memory. While thread 2 is updating it, what happens to the remaining threads (1, 3-10), which are still comparing against the old value?
If I lock the global variable using atomicCAS(), will the threads (1, 3-10) still compare against the old max value? How can I overcome this problem?
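One way to reason about the stale-value worry is with a compare-and-swap retry loop. The sketch below is an illustration, not something taken from the answers: `atomicMaxFloat`, `maxKernel` and `atomicMaxOnDevice` are hypothetical names, NaN inputs are not handled, and error checking is omitted. A thread that compared against an old maximum is harmless here, because `atomicCAS` only writes if the variable still holds the value the thread saw. Otherwise it returns the current value and the thread compares again. Since the stored maximum only ever grows, a stale read can never cause a larger value to be overwritten by a smaller one.

```cuda
#include <cuda_runtime.h>
#include <cfloat>
#include <vector>

// Atomic max for float built on the integer atomicCAS.
// Returns the last value this thread observed at `addr`.
__device__ float atomicMaxFloat(float* addr, float val) {
    int* iaddr = reinterpret_cast<int*>(addr);
    int old = *iaddr;  // possibly stale read; the CAS below re-validates it
    while (val > __int_as_float(old)) {
        int assumed = old;
        // Writes val only if *iaddr still equals `assumed`;
        // always returns the value that was actually stored there.
        old = atomicCAS(iaddr, assumed, __float_as_int(val));
        if (old == assumed) {
            break;  // our write went through
        }
        // Another thread changed the value first: loop and compare again.
    }
    return __int_as_float(old);
}

// Every thread folds its elements into the single global result.
__global__ void maxKernel(const float* in, int n, float* result) {
    int stride = blockDim.x * gridDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride) {
        atomicMaxFloat(result, in[i]);
    }
}

// Hypothetical host helper. Assumes `h` is non-empty.
float atomicMaxOnDevice(const std::vector<float>& h, int blocks, int threads) {
    int n = static_cast<int>(h.size());
    float init = -FLT_MAX;  // start below every finite input
    float* dIn = nullptr;
    float* dResult = nullptr;
    cudaMalloc(&dIn, n * sizeof(float));
    cudaMalloc(&dResult, sizeof(float));
    cudaMemcpy(dIn, h.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dResult, &init, sizeof(float), cudaMemcpyHostToDevice);

    maxKernel<<<blocks, threads>>>(dIn, n, dResult);

    float out = 0.0f;
    cudaMemcpy(&out, dResult, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dResult);
    return out;
}
```

With the setup from the question, `atomicMaxOnDevice(values, 2, 5)` runs 2 blocks of 5 threads over the 20 numbers. This is correct but serializes every update on one memory location, so for large inputs the reduction approach is usually the faster choice, with at most one atomic per block at the end.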