Fastest method for calculating convolution

14,747

Solution 1

If your kernel is separable, the greatest speed gains will be realized by performing multiple sequential 1D convolutions.

Steve Eddins of MathWorks describes how to take advantage of the associativity of convolution to speed up convolution when the kernel is separable in a MATLAB context on his blog. For a P-by-Q kernel, the computational advantage of performing two separate and sequential convolutions vs. 2D convolution is PQ/(P+Q), which corresponds to 4.5x for a 9x9 kernel and ~11x for a 15x15 kernel. EDIT: An interesting unwitting demonstration of this difference was given in this Q&A.

To figure out if the kernel is separable (i.e. the outer product of two vectors) the blog goes on to describe how to check if your kernel is separable with SVD and how to get the 1D kernels. Their example is for a 2D kernel. For a solution for N-dimensional separable convolution, check this FEX submission.


Another resource worth pointing out is this SIMD (SSE3/SSE4) implementation of 3D convolution by Intel, which includes both source and a presentation. The code is for 16 bit integers. Unless you move to GPU (e.g. cuFFT), it is probably hard to get faster than Intel's implementations, which also includes Intel MKL. There is an example of 3D convolution (single-precision float) at the bottom of this page of the MKL documentation (link fixed, now mirrored in https://stackoverflow.com/a/27074295/2778484).

Solution 2

You could try the overlap-add and overlap-save methods. They involve breaking up your input signal into smaller chunks and then using either of the above methods.

An FFT is most likely - and I could be wrong - the fastest method, especially if you are using built-in routines in MATLAB or a library in C++. Apart from that, breaking the input signal into smaller chunks should be a good bet.

Solution 3

i have 2 way to calc fastconv

and 2 betther than 1

1- armadillo you can use armadillo library for calcing conv with this code

cx_vec signal(1024,fill::randn);
cx_vec code(300,fill::randn);
cx_vec ans = conv(signal,code);

2-use fftw ans sigpack and armadillo library for calcing fast conv in this way you must init fft of your code in constructor

FastConvolution::FastConvolution(cx_vec inpCode)
{
    filterCode = inpCode;
    fft_w = NULL;
}


cx_vec FastConvolution::filter(cx_vec inpData)
{
int length = inpData.size()+filterCode.size();
    if((length & (length - 1)) == 0)
    {

    }
    else
    {
        length = pow(2 , (int)log2(length) + 1);
    }
    if(length != fftCode.size())
        initCode(length);

    static cx_vec zeroPadedData;
    if(length!= zeroPadedData.size())
    {
        zeroPadedData.resize(length);
    }
    zeroPadedData.fill(0);
    zeroPadedData.subvec(0,inpData.size()-1) = inpData;


    cx_vec fftSignal = fft_w->fft_cx(zeroPadedData);
    cx_vec mullAns = fftSignal % fftCode;
    cx_vec ans = fft_w->ifft_cx(mullAns);
    return ans.subvec(filterCode.size(),inpData.size()+filterCode.size()-1);
}

void FastConvolution::initCode(int length)
{
    if(fft_w != NULL)
    {
        delete fft_w;
    }
    fft_w = new sp::FFTW(length,FFTW_ESTIMATE);
    cx_vec conjCode(length,fill::zeros);
    fftCode.resize(length);
    for(int i = 0; i < filterCode.size();i++)
    {
        conjCode.at(i) = filterCode.at(filterCode.size() - i - 1);
    }
    conjCode = conj(conjCode);
    fftCode = fft_w->fft_cx(conjCode);
}
Share:
14,747
Nicole
Author by

Nicole

Updated on July 24, 2022

Comments

  • Nicole
    Nicole over 1 year

    Anybody know about the fastest method for calculating convolution? Unfortunately the matrix which I deal with is very large (500x500x200) and if I use convn in MATLAB it takes a long time (I have to iterate this calculation in a nested loop). So, I used convolution with FFT and it is faster now. But, I am still looking for a faster method. Any idea?

  • Nicole
    Nicole over 10 years
    Since I want to use convolution for pattern matching, I think breaking down the matrix will be questionable !
  • blackbird
    blackbird over 10 years
    @Nicole If you can use the Signal Processing Toolbox, fftfilt should be able to do the heavy lifting for you. mathworks.de/de/help/signal/ref/fftfilt.html
  • Nicole
    Nicole over 10 years
    @blackbird: But how I can use that command instead of convn in matlab? Suppose I have a = rand(500,500,100) and b = rand(20,20,20)
  • blackbird
    blackbird over 10 years
    @Nicole I'm sorry, I failed to notice that you want to compute three-dimensional convolution. I have to pass on that one.
  • JustinBlaber
    JustinBlaber over 10 years
    Just as an interesting aside, the function imfilter actually does this implicitly. It takes a 2d array for the kernel but will check to see if it is separable before applying the filter. Also, as already mentioned, if you're doing circular convolutions FFT will be fast as well.
  • chappjc
    chappjc over 10 years
    @jucestain That's an excellent point. I have noted this before as the reason why imfilter is faster when called in a loop if you are trying to filter a stack of 2D images each with the same 2D kernel instead of giving it the stack of images even though it supports doing that. If it detects 3D data, it declares the kernel non-separable even if the 2D kernel is separable (feature or bug?).
  • Nicole
    Nicole over 10 years
    Unfortunately my matrix seems to be un-separatable !! and cannot use that function.
  • chappjc
    chappjc over 10 years
    @Nicole What is the kernel? The data matrix does not have to be anything in particular, just the kernel.