What are the downsides of convolution by FFT compared to real-space convolution?


Solution 1

FFT convolutions are based on the convolution theorem, which states that given two functions f and g, if Fd() and Fi() denote the direct and inverse Fourier transform, and * and . denote convolution and pointwise multiplication, then:

f*g = Fi(Fd(f).Fd(g))

To apply this to a signal f and a kernel g, there are some things you need to take care of:

  • f and g have to be of the same size for the multiplication step to be possible, so you need to zero-pad the kernel (or input, if the kernel is longer than it).
  • When doing a DFT, which is what the FFT computes, the signal is treated as periodic, so multiplying in the frequency domain gives a circular convolution: by default, your kernel wraps around the edge. If you want this, then all is great. But if not, you have to add extra zero-padding the size of the kernel to avoid it, as in the sketch after this list.
  • Most (all?) FFT packages only work well (performance-wise) with sizes that do not have any large prime factors. Rounding the signal and kernel size up to the next power of two is a common practice that may result in a (very) significant speed-up.
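
Putting those points together, here is a minimal sketch of an FFT-based linear convolution in numpy. It is illustrative only: the function name fft_convolve and the power-of-two rounding are my own choices, not part of any library API.

    import numpy as np

    def fft_convolve(x, h):
        # Zero-pad both inputs to a common FFT length of at least
        # len(x) + len(h) - 1 so the circular wrap-around falls into the
        # padding, then trim the result back to the linear-convolution length.
        n_full = len(x) + len(h) - 1
        n_fft = 1 << (n_full - 1).bit_length()   # round up to a power of two for speed
        X = np.fft.rfft(x, n_fft)                # rfft zero-pads to n_fft internally
        H = np.fft.rfft(h, n_fft)
        y = np.fft.irfft(X * H, n_fft)
        return y[:n_full]

    # Quick check against numpy's direct convolution:
    x = np.random.rand(1000)
    h = np.random.rand(31)
    assert np.allclose(fft_convolve(x, h), np.convolve(x, h))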

If your signal and kernel sizes are f_l and g_l, doing a straightforward convolution in the time domain requires g_l * (f_l - g_l + 1) multiplications and (g_l - 1) * (f_l - g_l + 1) additions.

For the FFT approach, you have to do 3 FFTs of size at least f_l + g_l - 1 (the length of the full linear convolution), as well as roughly that many pointwise multiplications.

For large sizes of both f and g, the FFT is clearly superior with its n*log(n) complexity. For small kernels, the direct approach may be faster.

scipy.signal has both convolve and fftconvolve functions for you to play around with, and fftconvolve handles all the padding described above transparently for you.
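
For example (a hedged sketch; the sizes and the random seed are arbitrary):

    import numpy as np
    from scipy import signal

    rng = np.random.default_rng(0)
    sig = rng.standard_normal(100_000)
    kernel = rng.standard_normal(51)

    direct = signal.convolve(sig, kernel, method='direct')
    via_fft = signal.fftconvolve(sig, kernel)   # padding and trimming handled internally

    print(np.allclose(direct, via_fft))         # True, up to floating-point error

signal.convolve with method='auto' (the default) even estimates which of the two approaches will be faster for the given sizes and picks it for you.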

Solution 2

While fast convolution has better "big O" complexity than direct-form convolution, there are a few drawbacks or caveats. I did some thinking about this topic for an article I wrote a while back.

  1. Better "big O" complexity is not always better. Direct form convolution can be faster than using FFTs for filters smaller than a certain size. The exact size depends on the platform and implementations used. The crossover point is usually in the 10-40 coefficient range.

  2. Latency. Fast convolution is inherently a blockwise algorithm. Queueing up hundreds or thousands of samples at a time before transforming them may be unacceptable for some real-time applications (see the overlap-add sketch after this list).

  3. Implementation complexity. Direct form is simpler in terms of memory, code space, and the theoretical background required of the writer/maintainer.

  4. On a fixed point DSP platform (not a general purpose CPU): the limited word size considerations of fixed-point FFT make large fixed point FFTs nearly useless. At the other end of the size spectrum, these chips have specialized MAC instructions that are well designed for performing direct form FIR computation, increasing the range over which the O(N^2) direct form is faster than O(N log N). These factors tend to create a limited "sweet spot" where fixed point FFTs are useful for fast convolution.
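
To make point 2 concrete, here is a rough sketch of the overlap-add scheme that blockwise fast convolution typically uses (scipy ships a ready-made version as scipy.signal.oaconvolve). The block size of 256 samples and the function name are arbitrary choices of mine; the point is that nothing can be output until a whole block of input has been queued up and transformed, which is exactly the latency described above.

    import numpy as np

    def overlap_add_convolve(x, h, block=256):
        # Overlap-add: convolve one block of input at a time via FFT and
        # add each block's tail into the region shared with the next block.
        n_fft = 1 << (block + len(h) - 2).bit_length()   # >= block + len(h) - 1
        H = np.fft.rfft(h, n_fft)
        y = np.zeros(len(x) + len(h) - 1)
        for start in range(0, len(x), block):
            seg = x[start:start + block]
            y_seg = np.fft.irfft(np.fft.rfft(seg, n_fft) * H, n_fft)
            y[start:start + len(seg) + len(h) - 1] += y_seg[:len(seg) + len(h) - 1]
        return y

    x = np.random.rand(10_000)
    h = np.random.rand(101)
    assert np.allclose(overlap_add_convolve(x, h), np.convolve(x, h))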


Author by ABDreverhaven

Updated on June 06, 2022

Comments

  • ABDreverhaven, almost 2 years

    So I am aware that a convolution by FFT has a lower computational complexity than a convolution in real space. But what are the downsides of an FFT convolution?

    Does the kernel size always have to match the image size, or are there functions that take care of this, for example in Python's numpy and scipy packages? And what about anti-aliasing effects?

    • Paul R, over 10 years
      Note that convolution in the frequency domain is only more efficient once the kernel exceeds a certain size. For relatively small kernels direct convolution is more efficient.
  • J. Martinot-Lagarde, over 10 years
    "Most (all?) FFT packages only work well with sizes that do not have any large prime factors.": it would help to specify that you are talking about speed only; the computation itself is no problem (in numpy and scipy, at least).
  • Jaime, over 10 years
    Yes indeed, added a note in my answer. The typical FFT algorithm breaks down a DFT of size N = a*b into a FFTs of size b, recursively. If N is prime, you need more sophisticated algorithms to speed things up. On my PC: %timeit numpy.fft.fft(np.random.rand(1024)); 10000 loops, best of 3: 36.2 us per loop; %timeit numpy.fft.fft(np.random.rand(1021)); 100 loops, best of 3: 2.15 ms per loop, a roughly 100x slowdown, which seems to indicate that the prime-sized DFT is computed directly, with no acceleration.
  • J. Martinot-Lagarde, over 10 years
    It looks like FFTW has acceleration: "Sizes with small prime factors are best, but FFTW uses O(N log N) algorithms even for prime sizes." It supports multithreading, too. It is usable from Python with pyfftw.
  • xioxox, almost 4 years
    Note that the kernel doesn't have to be zero-padded to the same size as the signal if using the "overlap-add" or "overlap-save" methods, which split the input signal into multiple pieces for processing.
  • Mattia Surricchio, almost 3 years
    I am really interested in the 4th point. Could you please elaborate "the limited word size considerations of fixed-point FFT make large fixed point FFTs nearly useless"? What do you mean?
  • Dave Neary, almost 3 years
    At each of the log2(N) FFT stages, a fixed point sum can either be scaled down or risk saturation. Either approach risks losing information -- either low or high magnitude sinusoids, respectively. Some libs alternate approaches between stages to hedge the losses. Regardless, the information loss increases with more stages (i.e. bigger FFTs).