Calculating input and output size for Conv2d in PyTorch for image classification

Solution 1

You have to reshape your input to the format (Batch, Channels, Height, Width). Currently your data is in the format (B, H, W, C), i.e. (4, 32, 32, 3), so you need to swap the 4th and 2nd axes to get (B, C, H, W). You can do it this way:

inputs, labels = Variable(inputs), Variable(labels)
# swap axes 1 and 3: (B, H, W, C) -> (B, C, W, H); fine here because H == W == 32
inputs = inputs.transpose(1, 3)
... the rest
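
If the height and width ever differ, transpose(1, 3) would also swap them; a permute keeps the spatial orientation intact. A minimal sketch of that alternative, reusing the inputs tensor from above:

# (B, H, W, C) -> (B, C, H, W) without swapping height and width
inputs = inputs.permute(0, 3, 1, 2).contiguous()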

Solution 2

I know it is an old question, but I stumbled upon it again while working with non-standard kernel sizes, dilations, etc. Here is a function I came up with that brute-forces the calculation for me and checks each combination against a given output shape:

def find_settings(shape_in, shape_out, kernel_sizes, dilation_sizes, padding_sizes, stride_sizes, transpose=False):
    from itertools import product

    import torch
    from torch import nn

    # Fake single-channel, square input
    x_in = torch.randn(4, 1, shape_in, shape_in)

    # Grid search through all combinations
    for kernel, dilation, padding, stride in product(kernel_sizes, dilation_sizes, padding_sizes, stride_sizes):
        # Define a layer with the candidate settings
        layer_cls = nn.ConvTranspose2d if transpose else nn.Conv2d
        layer = layer_cls(
            1, 1,
            kernel_size=kernel,
            stride=stride,
            padding=padding,
            dilation=dilation,
        )

        # Skip combinations that are invalid for the given input shape
        try:
            x_out = layer(x_in)
        except Exception:
            continue

        # Check the spatial size of the output tensor
        result = x_out.shape[-1]

        if shape_out == result:
            print('Correct shape for:\n ker: {}\n dil: {}\n pad: {}\n str: {}\n'.format(kernel, dilation, padding, stride))

Here is an example usage of it:

transpose = True
shape_in = 128
shape_out = 1024


kernel_sizes = [3, 4, 5, 7, 9, 11]
dilation_sizes = list(range(1, 20))
padding_sizes = list(range(15))
stride_sizes = list(range(4, 16))
find_settings(shape_in, shape_out, kernel_sizes, dilation_sizes, padding_sizes, stride_sizes, transpose)

I hope it can help people in the future with this problem. Note that it's not parallelized, and if given a lot of choices it can run for a while.
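
As a cross-check against the brute-force search, the output size can also be computed directly from the formulas in the PyTorch docs for Conv2d and ConvTranspose2d. A minimal sketch (the helper names are mine, and output_padding is assumed to be 0 unless stated):

import math

def conv2d_out(size, kernel, stride=1, padding=0, dilation=1):
    # Conv2d: floor((size + 2*padding - dilation*(kernel - 1) - 1) / stride + 1)
    return math.floor((size + 2 * padding - dilation * (kernel - 1) - 1) / stride + 1)

def conv_transpose2d_out(size, kernel, stride=1, padding=0, dilation=1, output_padding=0):
    # ConvTranspose2d: (size - 1)*stride - 2*padding + dilation*(kernel - 1) + output_padding + 1
    return (size - 1) * stride - 2 * padding + dilation * (kernel - 1) + output_padding + 1

# e.g. a 32x32 input through Conv2d(3, 6, 5): 32 -> 28
print(conv2d_out(32, kernel=5))  # 28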

Solution 3

I finally changed the input to a new shape using

inputs = inputs.view(4, 3, 32, 32), placed right under

inputs, labels = data['image'], data['class'].

Be aware that view only reinterprets the underlying memory, so reshaping (4, 32, 32, 3) into (4, 3, 32, 32) this way scrambles the pixel/channel layout; permute(0, 3, 1, 2) (as in Solution 1) is the correct way to reorder the axes.

Solution 4

You can use torch.nn.AdaptiveMaxPool2d to force a specific output size.

For example, nn.AdaptiveMaxPool2d((5, 7)) forces every feature map to a 5×7 spatial size, regardless of the input resolution. The input size of the following linear layer is then simply that 5×7 multiplied by out_channels of the previous Conv2d layer (here 16 * 5 * 7 = 560).

https://pytorch.org/docs/stable/nn.html#torch.nn.AdaptiveMaxPool2d

import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # forces the feature maps to a fixed 5x7 spatial size
        self.adapt = nn.AdaptiveMaxPool2d((5, 7))
        self.fc1 = nn.Linear(16 * 5 * 7, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.adapt(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 7)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
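
As a quick sanity check of the adaptive-pooling idea (the input resolutions below are arbitrary examples), the flattened feature size stays 16 * 5 * 7 no matter how large the image is, so the same fully connected layers work for every input size:

import torch

net = Net()
for size in (32, 64, 100):
    x = torch.randn(4, 3, size, size)   # batch of 4 RGB images, size x size
    print(size, net(x).shape)           # always torch.Size([4, 10])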

Comments

  • boltthrower about 4 years

    I'm trying to run the PyTorch tutorial on CIFAR10 image classification here - http://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html#sphx-glr-beginner-blitz-cifar10-tutorial-py

    I've made a small change and I'm using a different dataset. I have images from the Wikiart dataset that I want to classify by artist (label = artist name).

    Here is the code for the Net -

    class Net(nn.Module):
        def __init__(self):
            super(Net, self).__init__()
            self.conv1 = nn.Conv2d(3, 6, 5)
            self.pool = nn.MaxPool2d(2, 2)
            self.conv2 = nn.Conv2d(6, 16, 5)
            self.fc1 = nn.Linear(16*5*5, 120)
            self.fc2 = nn.Linear(120, 84)
            self.fc3 = nn.Linear(84, 10)
    
        def forward(self, x):
            x = self.pool(F.relu(self.conv1(x)))
            x = self.pool(F.relu(self.conv2(x)))
            x = x.view(-1, 16*5*5)
            x = F.relu(self.fc1(x))
            x = F.relu(self.fc2(x))
            x = self.fc3(x)
            return x
    

    Then there is this section of the code where I start training the Net.

    for epoch in range(2):
         running_loss = 0.0
    
         for i, data in enumerate(wiki_train_dataloader, 0):
            inputs, labels = data['image'], data['class']
            print(inputs.shape)
            inputs, labels = Variable(inputs), Variable(labels)
    
            optimizer.zero_grad()
    
            # forward + backward + optimize
            outputs = net(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
    
            # print statistics
            running_loss += loss.data[0]
            if i % 2000 == 1999:  # print every 2000 mini-batches
                print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
                running_loss = 0.0
    

    This line print(inputs.shape) gives me torch.Size([4, 32, 32, 3]) with my Wikiart dataset whereas in the original example with CIFAR10, it prints torch.Size([4, 3, 32, 32]).

    Now, I'm not sure how to change the Conv2d in my Net to be compatible with torch.Size([4, 32, 32, 3]).

    I get this error:

    RuntimeError: Given input size: (3 x 32 x 3). Calculated output size: (6 x 28 x -1). Output size is too small at /opt/conda/conda-bld/pytorch_1503965122592/work/torch/lib/THNN/generic/SpatialConvolutionMM.c:45

    While reading the images for the Wikiart dataset, I resize them to (32, 32) and these are 3-channel images.

    Things I tried:

    1) The CIFAR10 tutorial uses a transform which I am not using; I could not work out how to incorporate it into my code.

    2) Changing self.conv2 = nn.Conv2d(6, 16, 5) to self.conv2 = nn.Conv2d(3, 6, 5). This gave me the same error as above; I was only changing it to see whether the error message would change.

    Any resources on how to calculate input & output sizes in PyTorch or automatically reshape Tensors would be really appreciated. I just started learning Torch & I find the size calculations complicated.

  • theSekyi almost 3 years
    The link is broken