How to launch custom OpenCL kernel in OpenCV (3.0.0) OCL?

11,045

Although I am not 100% sure, I figured out a way to do this. This example contains tips on how to pass/retrieve data to/from a custom kernel using cv::UMat, basic types (e.g. int/float/uchar), and Image2D.

#include <iostream>
#include <fstream>
#include <string>
#include <iterator>
#include <opencv2/opencv.hpp>
#include <opencv2/core/ocl.hpp>

using namespace std;

void main()
{
    if (!cv::ocl::haveOpenCL())
    {
        cout << "OpenCL is not avaiable..." << endl;
        return;
    }
    cv::ocl::Context context;
    if (!context.create(cv::ocl::Device::TYPE_GPU))
    {
        cout << "Failed creating the context..." << endl;
        return;
    }

    // In OpenCV 3.0.0 beta, only a single device is detected.
    cout << context.ndevices() << " GPU devices are detected." << endl;
    for (int i = 0; i < context.ndevices(); i++)
    {
        cv::ocl::Device device = context.device(i);
        cout << "name                 : " << device.name() << endl;
        cout << "available            : " << device.available() << endl;
        cout << "imageSupport         : " << device.imageSupport() << endl;
        cout << "OpenCL_C_Version     : " << device.OpenCL_C_Version() << endl;
        cout << endl;
    }

    // Select the first device
    cv::ocl::Device(context.device(0));

    // Transfer Mat data to the device
    cv::Mat mat_src = cv::imread("Lena.png", cv::IMREAD_GRAYSCALE);
    mat_src.convertTo(mat_src, CV_32F, 1.0 / 255);
    cv::UMat umat_src = mat_src.getUMat(cv::ACCESS_READ, cv::USAGE_ALLOCATE_DEVICE_MEMORY);
    cv::UMat umat_dst(mat_src.size(), CV_32F, cv::ACCESS_WRITE, cv::USAGE_ALLOCATE_DEVICE_MEMORY);

    std::ifstream ifs("shift.cl");
    if (ifs.fail()) return;
    std::string kernelSource((std::istreambuf_iterator<char>(ifs)), std::istreambuf_iterator<char>());
    cv::ocl::ProgramSource programSource(kernelSource);

    // Compile the kernel code
    cv::String errmsg;
    cv::String buildopt = cv::format("-D dstT=%s", cv::ocl::typeToStr(umat_dst.depth())); // "-D dstT=float"
    cv::ocl::Program program = context.getProg(programSource, buildopt, errmsg);

    cv::ocl::Image2D image(umat_src);
    float shift_x = 100.5;
    float shift_y = -50.0;
    cv::ocl::Kernel kernel("shift", program);
    kernel.args(image, shift_x, shift_y, cv::ocl::KernelArg::ReadWrite(umat_dst));

    size_t globalThreads[3] = { mat_src.cols, mat_src.rows, 1 };
    //size_t localThreads[3] = { 16, 16, 1 };
    bool success = kernel.run(3, globalThreads, NULL, true);
    if (!success){
        cout << "Failed running the kernel..." << endl;
        return;
    }

    // Download the dst data from the device (?)
    cv::Mat mat_dst = umat_dst.getMat(cv::ACCESS_READ);

    cv::imshow("src", mat_src);
    cv::imshow("dst", mat_dst);
    cv::waitKey();
}

Below is a "shift.cl" file.

__constant sampler_t samplerLN = CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_CLAMP_TO_EDGE | CLK_FILTER_LINEAR;
__kernel void shift(
   __global const image2d_t src,
   float shift_x,
   float shift_y,
   __global uchar* dst,
   int dst_step, int dst_offset, int dst_rows, int dst_cols)
{
   int x = get_global_id(0);
   int y = get_global_id(1);
   if (x >= dst_cols) return;
   int dst_index = mad24(y, dst_step, mad24(x, (int)sizeof(dstT), dst_offset));
   __global dstT *dstf = (__global dstT *)(dst + dst_index);
   float2 coord = (float2)((float)x+0.5f+shift_x, (float)y+0.5f+shift_y);
   dstf[0] = (dstT)read_imagef(src, samplerLN, coord).x;
}

The point is the use of UMat. We recieve 5 parameters in the kernel (*data_ptr, int step, int offset, int rows, int cols) with KernelArg::ReadOnly(umat); 3 (*data_ptr, int step, int offset) with KernelArg::ReadOnlyNoSize(umat); and only 1 (*data_prt) with KernelArg::PtrReadOnly(umat). This rule is the same for WriteOnly and ReadWrite.

The step and offset are required when accessing the data array, since UMat may not be dense matrix due to the memory-address alignment.

cv::ocl::Image2D can be constructed from an UMat instance, and can be directly passed to kernel.args(). With image2D_t and sampler_t, we can benefit from GPU's hardware texture-units for linear-interpolation sampling (with real-valued pixel coordinates).

Note that the "-D xxx=yyy " build-option offers text replacement from xxx to yyy in the kernel code.

You can find more codes at my post: http://qiita.com/tackson5/items/8dac6b083071d31baf00

Share:
11,045
Mickael Caruso
Author by

Mickael Caruso

Enthusiast Programmer working in mostly C#, the .net Framework, SQL Server, Flex/Actionscript, and (new to) Android.

Updated on June 15, 2022

Comments

  • Mickael Caruso
    Mickael Caruso almost 2 years

    I'm probably misusing OpenCV by using it as wrapper to the official OpenCL C++ bindings so that I can launch my own kernels.

    However, OpenCV does have classes like Program, ProgramSource, Kernel, Queue, etc. that seem to tell me that I can launch my own (even non-image-based) kernels with OpenCV. I am having trouble finding documentation out there for these classes, let alone examples. So, I took a stab at it so far:

    #include <fstream>
    #include <iostream>
    
    #include "opencv2/opencv.hpp"
    #include "opencv2/core/ocl.hpp"
    
    #define ARRAY_SIZE 128
    
    using namespace std;
    using namespace cv;
    
    int main(int, char)
    {
        std::ifstream file("kernels.cl");
        std::string kcode(std::istreambuf_iterator<char>(file),
            (std::istreambuf_iterator<char>()));
    
        cv::ocl::ProgramSource * programSource;
        programSource = new cv::ocl::ProgramSource(kcode.c_str());
    
        cv::String errorMessage;
        cv::ocl::Program * program;
        program = new cv::ocl::Program(*programSource, NULL, errorMessage);
    
        cv::ocl::Kernel * kernel;
        kernel = new cv::ocl::Kernel("simple_add", *program);
        /* I'm stuck here at the args. */
    
        size_t globalSize[2] = { ARRAY_SIZE, 1 };
        size_t localSize[2] = { ARRAY_SIZE, 1 };    
        kernel->run(ARRAY_SIZE, globalSize, localSize, true);
    
        return 0;
    }
    

    Note that I haven't set up my host variables yet. I'm stuck at kernel->args(...). There are 15 overloads and none of them specify what order I should specify the following, per argument:

    1. The parameter index, so I manually match the parameter in the order given in the kernel.
    2. The host variable itself.
    3. The host variable's array size - typically I say something like sizeof(int) * ARRAY_SIZE, though I used to specify that on the clEnqueueWriteBuffer function in plain OpenCL.
    4. Device buffer memory access, for example CL_MEM_READ_ONLY

    It doesn't look like I call enqueueWriteBufer(...), enqueueNDRangeKernel(...), or enqueueReadBuffer(...) because (I guess) the kernel->run() does all of that for me under the hood. I assume that kernel->run() will write the new values to my output parameter.

    I didn't specify a command queue, device, or context. I think that there is only one command queue and one context, and the default device - all created under-the-hood and are accessible from these classes.

    So again, how do I use the args function of the kernel?

  • max0r
    max0r over 7 years
    I can't compile your OpenCL kernel. error: parameter may not be qualified with an address space: __global const image2d_t src My OCL device is an Intel Iris GPU. Any suggestions?
  • Catree
    Catree over 7 years
    @max0r In my case, I solved the issue by replacing: __global const image2d_t src by read_only image2d_t src. Not sure if it is the correct way as I am starting to learn OpenCL.
  • WY Hsu
    WY Hsu over 6 years
    @Catree How do you know which input argument we should use? Any official document here? thx in advance
  • Catree
    Catree over 6 years
    @WeiYuangHsu You can find some references on the OpenCL site. There is a cheat sheet and the reference guide for OpenCL 2.2 here.
  • WY Hsu
    WY Hsu over 6 years
    @Catree Thx for your kindly reply!
  • Tomáš Zato
    Tomáš Zato over 3 years
    error: use of undeclared identifier 'dstT'; did you mean 'dst'? I cannot get past this. I don't know OpenCL at all, but I really wanted to try something afdter 6+hours of setup and compiling