Convolve2d just by using Numpy

python numpy image-processing matrix convolution

68,341

Solution 1

You could generate the subarrays using as_strided:

import numpy as np

a = np.array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24]])

sub_shape = (3,3)
view_shape = tuple(np.subtract(a.shape, sub_shape) + 1) + sub_shape
strides = a.strides + a.strides

sub_matrices = np.lib.stride_tricks.as_strided(a,view_shape,strides)

To get rid of your second "ugly" sum, alter your einsum so that the output array only has j and k. This implies your second summation.

conv_filter = np.array([[0,-1,0],[-1,5,-1],[0,-1,0]])
m = np.einsum('ij,ijkl->kl',conv_filter,sub_matrices)

# [[ 6  7  8]
#  [11 12 13]
#  [16 17 18]]

Solution 2

Cleaned up using as_strided and @Crispin 's einsum trick from above. Enforces the filter size into the expanded shape. Should even allow non-square inputs if the indices are compatible.

def conv2d(a, f):
    s = f.shape + tuple(np.subtract(a.shape, f.shape) + 1)
    strd = numpy.lib.stride_tricks.as_strided
    subM = strd(a, shape = s, strides = a.strides * 2)
    return np.einsum('ij,ijkl->kl', f, subM)

Solution 3

You can also use fft (one of the faster methods to perform convolutions)

from numpy.fft import fft2, ifft2
import numpy as np

def fft_convolve2d(x,y):
    """ 2D convolution, using FFT"""
    fr = fft2(x)
    fr2 = fft2(np.flipud(np.fliplr(y)))
    m,n = fr.shape
    cc = np.real(ifft2(fr*fr2))
    cc = np.roll(cc, -m/2+1,axis=0)
    cc = np.roll(cc, -n/2+1,axis=1)
    return cc

https://gist.github.com/thearn/5424195
you must pad the filter to be the same size as image ( place it in the middle of a zeros_like mat.)

cheers, Dan

68,341

Allosteric

Updated on May 11, 2022

Comments

Allosteric almost 2 years
I am studying image-processing using Numpy and facing a problem with filtering with convolution.

I would like to convolve a gray-scale image. (convolve a 2d Array with a smaller 2d Array)

Does anyone have an idea to refine my method ?

I know that scipy supports convolve2d but I want to make a convolve2d only by using Numpy.

What I have done

First, I made a 2d array the submatrices.
```
a = np.arange(25).reshape(5,5) # original matrix

submatrices = np.array([
     [a[:-2,:-2], a[:-2,1:-1], a[:-2,2:]],
     [a[1:-1,:-2], a[1:-1,1:-1], a[1:-1,2:]],
     [a[2:,:-2], a[2:,1:-1], a[2:,2:]]])
```
the submatrices seems complicated but what I am doing is shown in the following drawing.

Next, I multiplied each submatrices with a filter.
```
conv_filter = np.array([[0,-1,0],[-1,4,-1],[0,-1,0]])
multiplied_subs = np.einsum('ij,ijkl->ijkl',conv_filter,submatrices)
```
and summed them.
```
np.sum(np.sum(multiplied_subs, axis = -3), axis = -3)
#array([[ 6,  7,  8],
#       [11, 12, 13],
#       [16, 17, 18]])
```
Thus this procudure can be called my convolve2d.
```
def my_convolve2d(a, conv_filter):
    submatrices = np.array([
         [a[:-2,:-2], a[:-2,1:-1], a[:-2,2:]],
         [a[1:-1,:-2], a[1:-1,1:-1], a[1:-1,2:]],
         [a[2:,:-2], a[2:,1:-1], a[2:,2:]]])
    multiplied_subs = np.einsum('ij,ijkl->ijkl',conv_filter,submatrices)
    return np.sum(np.sum(multiplied_subs, axis = -3), axis = -3)
```
However, I find this my_convolve2d troublesome for 3 reasons.
1. Generation of the submatrices is too awkward that is difficult to read and can only be used when the filter is 3*3
2. The size of the varient submatrices seems to be too big, since it is approximately 9 folds bigger than the original matrix.
3. The summing seems a little non intuitive. Simply said, ugly.
Thank you for reading this far.

Kind of update. I wrote a conv3d for myself. I will leave this as a public domain.
```
def convolve3d(img, kernel):
    # calc the size of the array of submatracies
    sub_shape = tuple(np.subtract(img.shape, kernel.shape) + 1)

    # alias for the function
    strd = np.lib.stride_tricks.as_strided

    # make an array of submatracies
    submatrices = strd(img,kernel.shape + sub_shape,img.strides * 2)

    # sum the submatraces and kernel
    convolved_matrix = np.einsum('hij,hijklm->klm', kernel, submatrices)

    return convolved_matrix
```
- Marijn van Vliet about 7 years
  
  thank you for providing drawings of the matrices :) If I understand correctly, you want tips on how to make your solution more elegant?
- Allosteric about 7 years
  
  Glad it helps! Yes. I would be grateful if you can provide me tips to overcome the 3 problems written in the very last lines.
- Allosteric about 7 years
  
  I should add that the 3 points are rather arranged in a priority order. The first one is quite important for me and the last one seems kinda trivial. I will also be glad if there are other problems and refinements about it.
- Andreas K. over 5 years
  
  Isn't the second drawing (after the equality sign) wrong? Shouldn't each submatrix be multiplied (element-wise) with the filter, and then the elements of each of the resulting submatrices summed?
- Allosteric over 5 years
  
  @AndyK They will produce the same result.
NaN about 7 years

if a_s is the strided array and filter is your laplacian like filter, then try... np.sum(a_s*filter, axis=(2,3)) if indeed your answer is array([[ 6, 7, 8], [11, 12, 13], [16, 17, 18]])
NaN about 7 years

simplify further... see my comment below... np.sum(a_s * filter, axis=(2,3)) if indeed your answer is array([[ 6, 7, 8], [11, 12, 13], [16, 17, 18]]) ... where a_s is strided array and filter is the 3x3 filter
Allosteric about 7 years

Thank you for the tip. I am trying this myself right now. Maybe trivial but I believe that the name filter is not appropriate because it is a built-in function of python.
Daniel F about 7 years

Not sure why that works @NaN since it is certainly not doing what the problem asked - but it does, even for arbitrary a matrices
Crispin about 7 years

You can do the summation directly in the Einstein summation. See answer
Allosteric about 7 years

In this question, isn't it better to write, sub_shape = conv_filter.shape ?
Daniel F about 7 years

It doesn't seem that it would necessarily be the case. conv_filter.shape should be a.shape - sub_shape + 1. It just so happens in your example a.shape = [5, 5] and sub_shape = (3, 3) so conv_filter.shape == sub_shape is true but not strictly enforced. If you want it to be enforced, then you should set sub_shape = a.shape - conv_filter.shape + 1
Allosteric about 7 years

Got it! So view_shape = conv_filter.shape?
Daniel F about 7 years

well view_shape should be a 4D tuple, so view.shape = tuple(conv_filter.shape) + tuple(sub_shape), so you still need to find sub_shape
Allosteric about 7 years

at least at numpy v12, a.shape and f.shape is a tuple so s should be tuple(np.subtract(a.shape, f.shape)+1), I think.
Daniel F about 7 years

True! Still used to shape giving an ndarray. Fixed that.
michaelsnowden over 6 years

Follow-up question here: stackoverflow.com/questions/45580013/…
sirgogo about 6 years

Hi @DanielF, is there a generalization of this that works for RGB? Not super familiar with einsum notation, so an idea how to generalize would be awesome.
Daniel F about 6 years

@sirgogo, I suggest asking a seperate question linking this answer. There are people watching numpy way better at multidimensional convlcutions than I am.
sirgogo almost 6 years

@DanielF, thanks, here is the more general version of what I'm trying to do: [stackoverflow.com/questions/50239641/…
Foad S. Farimani over 5 years

I have a similar issue for higher dimensions. I was wondering if you could be so kind to take a look at my question here
Cris Luengo about 4 years

It is certainly not efficient to use the FFT for convolution if the kernel has only 9 elements. This is only useful for large kernels.
Dan Erez about 4 years

True. It depends on the kernel and image size but the threshold for fft to outperform is very low. It also depends on your computer architecture but for most generic use-cases fft is the go to in high performance libraries.
Naman Bansal over 3 years

The following article compares the performance of various image convolution methods, and thus, we can conclude that the FFT method of Numpy is the fastest method to perform convolution. laurentperrinet.github.io/sciblog/posts/…
Cris Luengo over 3 years

This code is not correct. It will shift the output image. The larger the kernel, the larger the shift.
Naman Bansal over 3 years

Can you elaborate on how will it shift the output image?
Cris Luengo over 3 years

stackoverflow.com/questions/54877892/…
Michael Kopp over 2 years

Is using einsum("ij,klij", ...) sufficient/better? Using the code from this answer with input of different shape, I got operands could not be broadcast together with remapped shapes, using "ij,klij" seems to work without issues.
OuttaSpaceTime over 2 years

is the kernel flipped here or is it without flipping?