Calculating input and output size for Conv2d in PyTorch for image classification
Solution 1
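Conv2d expects its input in the format (Batch, Channels, Height, Width). Your data is currently in the format (B, H, W, C), i.e. (4, 32, 32, 3), so you need to move the channel axis to position 1 to get (B, C, H, W). Note that inputs.transpose(1, 3) also puts the channels in position 1, but it yields (B, C, W, H), so every image ends up transposed; permute avoids that. You can do it this way:
inputs, labels = Variable(inputs), Variable(labels)
inputs = inputs.permute(0, 3, 1, 2)
... the rest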
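As a minimal sketch of the axis reordering, assuming a random dummy batch in place of the real data (the names `batch_nhwc`, `batch_nchw`, `batch_swapped` and the non-square 32x48 size are made up for illustration), the difference between permuting all axes and swapping only two of them looks like this:

```python
import torch

# Hypothetical dummy batch in (B, H, W, C) layout; non-square on purpose
batch_nhwc = torch.randn(4, 32, 48, 3)

# permute reorders all axes at once: (B, H, W, C) -> (B, C, H, W)
batch_nchw = batch_nhwc.permute(0, 3, 1, 2)

# transpose(1, 3) only swaps two axes: (B, H, W, C) -> (B, C, W, H)
batch_swapped = batch_nhwc.transpose(1, 3)

print(batch_nchw.shape)     # torch.Size([4, 3, 32, 48])
print(batch_swapped.shape)  # torch.Size([4, 3, 48, 32])
```

For square images both results have the same shape, which is why the mix-up is easy to miss.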
Solution 2
I know it is an old question, but I stumbled upon this again when working with non-standard kernel sizes, dilations, etc. Here is a function I came up with, which does the calculation for me and checks for a given output shape:
def find_settings(shape_in, shape_out, kernel_sizes, dilation_sizes, padding_sizes, stride_sizes, transpose=False):
    from itertools import product
    import torch
    from torch import nn
    import numpy as np

    # Fake input
    x_in = torch.tensor(np.random.randn(4, 1, shape_in, shape_in), dtype=torch.float)

    # Grid search through all combinations
    for kernel, dilation, padding, stride in product(kernel_sizes, dilation_sizes, padding_sizes, stride_sizes):
        # Define a layer
        if transpose:
            layer = nn.ConvTranspose2d
        else:
            layer = nn.Conv2d
        layer = layer(
            1, 1,
            (4, kernel),
            stride=(2, stride),
            padding=(2, padding),
            dilation=(2, dilation)
        )

        # Check if layer is valid for given input shape
        try:
            x_out = layer(x_in)
        except Exception:
            continue

        # Check for shape of out tensor
        result = x_out.shape[-1]
        if shape_out == result:
            print('Correct shape for:\n ker: {}\n dil: {}\n pad: {}\n str: {}\n'.format(kernel, dilation, padding, stride))
Here is an example usage of it:
transpose = True
shape_in = 128
shape_out = 1024
kernel_sizes = [3, 4, 5, 7, 9, 11]
dilation_sizes = list(range(1, 20))
padding_sizes = list(range(15))
stride_sizes = list(range(4, 16))
find_settings(shape_in, shape_out, kernel_sizes, dilation_sizes, padding_sizes, stride_sizes, transpose)
I hope it can help people in the future with this problem. Note that it's not parallelized, and if given a lot of choices it can run for a while.
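If running actual layers is too slow, the same search can be sketched with the closed-form size formulas given in the PyTorch documentation for Conv2d and ConvTranspose2d. This is only a sketch under assumptions: the helper names `conv_out_size`, `conv_transpose_out_size` and `find_settings_closed_form` are hypothetical, only one spatial dimension is considered, and `output_padding` is assumed to be 0 unless given.

```python
from itertools import product


def conv_out_size(size_in, kernel, stride=1, padding=0, dilation=1):
    """Output length of nn.Conv2d (or nn.MaxPool2d) along one spatial dimension."""
    return (size_in + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def conv_transpose_out_size(size_in, kernel, stride=1, padding=0, dilation=1, output_padding=0):
    """Output length of nn.ConvTranspose2d along one spatial dimension."""
    return (size_in - 1) * stride - 2 * padding + dilation * (kernel - 1) + output_padding + 1


def find_settings_closed_form(size_in, size_out, kernels, dilations, paddings, strides, transpose=False):
    """Return every (kernel, dilation, padding, stride) that maps size_in to size_out."""
    size_fn = conv_transpose_out_size if transpose else conv_out_size
    return [
        (k, d, p, s)
        for k, d, p, s in product(kernels, dilations, paddings, strides)
        if size_fn(size_in, k, stride=s, padding=p, dilation=d) == size_out
    ]


# Trace the tutorial Net: conv(5) -> pool(2, 2) -> conv(5) -> pool(2, 2)
size = 32
for kernel, stride in [(5, 1), (2, 2), (5, 1), (2, 2)]:
    size = conv_out_size(size, kernel, stride)
print(size)  # 5, hence the 16*5*5 input of the first Linear layer

# A width of 3 with a 5x5 kernel gives the -1 seen in the question's error message
print(conv_out_size(3, 5))  # -1

# One transposed-convolution setting that maps 128 to 1024
print(conv_transpose_out_size(128, kernel=4, stride=8, padding=1, dilation=3))  # 1024
```

Because no tensors are created, the grid search over the example ranges above finishes almost instantly.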
Solution 3
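I finally changed the input to a new shape using inputs = inputs.view(4, 3, 32, 32), right under inputs, labels = data['image'], data['class'].
Note that view only reinterprets the existing memory layout: the shape now matches what Conv2d expects, but the colour values are not actually moved into the channel axis, so the images are scrambled. Use inputs.permute(0, 3, 1, 2) to reorder the axes properly.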
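The following sketch illustrates the difference between view and permute on a (B, H, W, C) batch. The tensor `batch_nhwc` is a hypothetical stand-in filled with consecutive numbers so that each value's position can be tracked:

```python
import torch

# Hypothetical (B, H, W, C) batch with distinct values in every position
batch_nhwc = torch.arange(4 * 32 * 32 * 3, dtype=torch.float32).reshape(4, 32, 32, 3)

viewed = batch_nhwc.view(4, 3, 32, 32)      # same memory order, new shape
permuted = batch_nhwc.permute(0, 3, 1, 2)   # axes actually reordered

print(viewed.shape == permuted.shape)  # True
print(torch.equal(viewed, permuted))   # False

# Channel 0 of the first image is batch_nhwc[0, :, :, 0]
print(torch.equal(permuted[0, 0], batch_nhwc[0, :, :, 0]))  # True
print(torch.equal(viewed[0, 0], batch_nhwc[0, :, :, 0]))    # False
```

Both tensors pass through Conv2d without error, so the network trains either way; only the permuted one feeds it real colour planes.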
Solution 4
You can use torch.nn.AdaptiveMaxPool2d to set a specific output size.
For example, with nn.AdaptiveMaxPool2d((5, 7)) the feature map is forced to 5x7 regardless of the input size. The number of input features of the first Linear layer is then 5*7 multiplied by the out_channels of the preceding Conv2d layer.
https://pytorch.org/docs/stable/nn.html#torch.nn.AdaptiveMaxPool2d
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # Forces the feature map to 5x7 whatever the input size
        self.adapt = nn.AdaptiveMaxPool2d((5, 7))
        # 16 output channels of conv2, each 5x7
        self.fc1 = nn.Linear(16*5*7, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.adapt(F.relu(self.conv2(x)))
        x = x.view(-1, 16*5*7)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
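To check the claim, here is a small sketch that pushes dummy batches of different sizes through the same convolutional stack. The `features` container, the `shapes` list and the chosen image sizes are illustrative assumptions, not part of the original answer:

```python
import torch
import torch.nn as nn

# Same conv stack as in the Net above, ending in adaptive pooling
features = nn.Sequential(
    nn.Conv2d(3, 6, 5),
    nn.ReLU(),
    nn.MaxPool2d(2, 2),
    nn.Conv2d(6, 16, 5),
    nn.ReLU(),
    nn.AdaptiveMaxPool2d((5, 7)),
)

shapes = []
for height, width in [(32, 32), (64, 48), (100, 200)]:
    dummy = torch.randn(4, 3, height, width)  # hypothetical (B, C, H, W) batch
    out = features(dummy)
    shapes.append(tuple(out.shape))
    print((height, width), tuple(out.shape))  # output shape is (4, 16, 5, 7) each time
```

Since the pooled output is always 16 x 5 x 7, the first Linear layer can keep 16*5*7 input features for any of these image sizes.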
boltthrower
Updated on March 02, 2020
Comments
-
boltthrower about 4 years
I'm trying to run the PyTorch tutorial on CIFAR10 image classification here - http://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html#sphx-glr-beginner-blitz-cifar10-tutorial-py
I've made a small change and I'm using a different dataset. I have images from the Wikiart dataset that I want to classify by artist (label = artist name).
Here is the code for the Net -
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16*5*5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16*5*5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
Then there is this section of the code where I start training the Net.
for epoch in range(2):
    running_loss = 0.0
    for i, data in enumerate(wiki_train_dataloader, 0):
        inputs, labels = data['image'], data['class']
        print(inputs.shape)
        inputs, labels = Variable(inputs), Variable(labels)
        optimizer.zero_grad()
        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        # print statistics
        running_loss += loss.data[0]
        if i % 2000 == 1999:  # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' % (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0
This line
print(inputs.shape)
gives me torch.Size([4, 32, 32, 3]) with my Wikiart dataset, whereas in the original example with CIFAR10 it prints torch.Size([4, 3, 32, 32]).
Now, I'm not sure how to change the Conv2d in my Net to be compatible with torch.Size([4, 32, 32, 3]). I get this error:
RuntimeError: Given input size: (3 x 32 x 3). Calculated output size: (6 x 28 x -1). Output size is too small at /opt/conda/conda-bld/pytorch_1503965122592/work/torch/lib/THNN/generic/SpatialConvolutionMM.c:45
While reading the images for the Wikiart dataset, I resize them to (32, 32) and these are 3-channel images.
Things I tried:
1) The CIFAR10 tutorial uses a transform which I am not using. I could not incorporate the same into my code.
2) Changing self.conv2 = nn.Conv2d(6, 16, 5) to self.conv2 = nn.Conv2d(3, 6, 5). This gave me the same error as above. I was only changing this to see if the error message changes.
Any resources on how to calculate input & output sizes in PyTorch or automatically reshape Tensors would be really appreciated. I just started learning Torch & I find the size calculations complicated.
-
theSekyi almost 3 years
The link is broken