PyTorch loss value does not change
Solution 1
I realised that the L2 penalty (weight_decay) in the Adam optimizer keeps the loss value from changing (I haven't tried other optimizers yet). It works when I remove it:
# optimizer = optim.Adam(net.parameters(), lr=0.01, weight_decay=0.1)
optimizer = optim.Adam(model.parameters(), lr=0.001)
=== UPDATE (see Solution 2 for more detail) ===
self.features = nn.Sequential(self.flat_layer)
self.classifier = nn.Linear(out_channels * len(filter_sizes), num_classes)
...
optimizer = optim.Adam([
    {'params': model.features.parameters()},
    {'params': model.classifier.parameters(), 'weight_decay': 0.1}
], lr=0.001)
Solution 2
I see that in your original code, weight_decay is set to 0.1. weight_decay is used to regularize the network's parameters. A value of 0.1 may be too strong, so that the regularization dominates the training. Try reducing the value of weight_decay.
For convolutional neural networks in computer vision tasks, weight_decay is usually set to 5e-4 or 5e-5. I am not familiar with text classification. These values may work for you out of the box, or you may have to tweak them a little by trial and error.
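The effect of a strong weight_decay can be illustrated on a toy problem. The following is a minimal sketch, not the asker's model: the linear regression task, the `train_linear` helper, the seed, and the step count are all assumptions chosen for illustration. It trains the same model twice with Adam, once with weight_decay=0.1 and once with weight_decay=5e-4, and compares the final weight norm and data loss.

```python
import torch
import torch.nn as nn
import torch.optim as optim


def train_linear(weight_decay, steps=1000, seed=0):
    """Fit a toy linear regression with Adam; return (weight norm, data loss)."""
    torch.manual_seed(seed)               # same data and same init for every call
    X = torch.randn(256, 10)              # hypothetical inputs
    true_w = torch.full((10, 1), 2.0)     # hypothetical target weights
    y = X @ true_w

    model = nn.Linear(10, 1)
    optimizer = optim.Adam(model.parameters(), lr=0.01, weight_decay=weight_decay)
    loss_fn = nn.MSELoss()

    for _ in range(steps):
        optimizer.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        optimizer.step()

    # Report the plain data loss (without any penalty term) after training.
    with torch.no_grad():
        final_loss = loss_fn(model(X), y).item()
    return model.weight.norm().item(), final_loss


strong_norm, strong_loss = train_linear(weight_decay=0.1)
weak_norm, weak_loss = train_linear(weight_decay=5e-4)
print("weight_decay=0.1  -> weight norm %.3f, loss %.5f" % (strong_norm, strong_loss))
print("weight_decay=5e-4 -> weight norm %.3f, loss %.5f" % (weak_norm, weak_loss))
```

With the stronger penalty the weights are pulled further toward zero and the data loss settles at a higher value. How large this effect is in a real network depends on the model and the data, so treat the numbers as illustrative only.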
Let me know if it works for you.
Viet Phan
Updated on June 04, 2022
Comments
-
Viet Phan almost 2 years
I wrote a module based on this article: http://www.wildml.com/2015/12/implementing-a-cnn-for-text-classification-in-tensorflow/
The idea is to pass the input through multiple streams, concatenate the results, and connect them to an FC layer. I divided my source code into 3 custom modules:
TextClassifyCnnNet
>>FlatCnnLayer
>>FilterLayer
FilterLayer:
class FilterLayer(nn.Module):
    def __init__(self, filter_size, embedding_size, sequence_length, out_channels=128):
        super(FilterLayer, self).__init__()
        self.model = nn.Sequential(
            nn.Conv2d(1, out_channels, (filter_size, embedding_size)),
            nn.ReLU(inplace=True),
            nn.MaxPool2d((sequence_length - filter_size + 1, 1), stride=1)
        )
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                m.weight.data.normal_(0, math.sqrt(2. / n))

    def forward(self, x):
        return self.model(x)
FlatCnnLayer:
class FlatCnnLayer(nn.Module):
    def __init__(self, embedding_size, sequence_length, filter_sizes=[3, 4, 5], out_channels=128):
        super(FlatCnnLayer, self).__init__()
        self.filter_layers = nn.ModuleList(
            [FilterLayer(filter_size, embedding_size, sequence_length, out_channels=out_channels)
             for filter_size in filter_sizes])

    def forward(self, x):
        pools = []
        for filter_layer in self.filter_layers:
            out_filter = filter_layer(x)
            # reshape from (batch_size, out_channels, h, w) to (batch_size, h, w, out_channels)
            pools.append(out_filter.view(out_filter.size()[0], 1, 1, -1))
        x = torch.cat(pools, dim=3)
        x = x.view(x.size()[0], -1)
        x = F.dropout(x, p=dropout_prob, training=True)
        return x
TextClassifyCnnNet (main module):
class TextClassifyCnnNet(nn.Module):
    def __init__(self, embedding_size, sequence_length, num_classes, filter_sizes=[3, 4, 5], out_channels=128):
        super(TextClassifyCnnNet, self).__init__()
        self.flat_layer = FlatCnnLayer(embedding_size, sequence_length, filter_sizes=filter_sizes, out_channels=out_channels)
        self.model = nn.Sequential(
            self.flat_layer,
            nn.Linear(out_channels * len(filter_sizes), num_classes)
        )

    def forward(self, x):
        x = self.model(x)
        return x


def fit(net, data, save_path):
    if torch.cuda.is_available():
        net = net.cuda()

    for param in list(net.parameters()):
        print(type(param.data), param.size())

    optimizer = optim.Adam(net.parameters(), lr=0.01, weight_decay=0.1)

    X_train, X_test = data['X_train'], data['X_test']
    Y_train, Y_test = data['Y_train'], data['Y_test']
    X_valid, Y_valid = data['X_valid'], data['Y_valid']

    n_batch = len(X_train) // batch_size
    for epoch in range(1, n_epochs + 1):  # loop over the dataset multiple times
        net.train()
        start = 0
        end = batch_size

        for batch_idx in range(1, n_batch + 1):
            # get the inputs
            x, y = X_train[start:end], Y_train[start:end]
            start = end
            end = start + batch_size

            # zero the parameter gradients
            optimizer.zero_grad()

            # forward + backward + optimize
            predicts = _get_predict(net, x)
            loss = _get_loss(predicts, y)
            loss.backward()
            optimizer.step()

            if batch_idx % display_step == 0:
                print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                    epoch, batch_idx * len(x), len(X_train),
                    100. * batch_idx / (n_batch + 1), loss.data[0]))

        # print statistics
        if epoch % display_step == 0 or epoch == 1:
            net.eval()
            valid_predicts = _get_predict(net, X_valid)
            valid_loss = _get_loss(valid_predicts, Y_valid)
            valid_accuracy = _get_accuracy(valid_predicts, Y_valid)
            print('\r[%d] loss: %.3f - accuracy: %.2f' % (epoch, valid_loss.data[0], valid_accuracy * 100))

    print('\rFinished Training\n')

    net.eval()
    test_predicts = _get_predict(net, X_test)
    test_loss = _get_loss(test_predicts, Y_test).data[0]
    test_accuracy = _get_accuracy(test_predicts, Y_test)
    print('Test loss: %.3f - Test accuracy: %.2f' % (test_loss, test_accuracy * 100))

    torch.save(net.flat_layer.state_dict(), save_path)


def _get_accuracy(predicts, labels):
    predicts = torch.max(predicts, 1)[1].data[0]
    return np.mean(predicts == labels)


def _get_predict(net, x):
    # wrap them in Variable
    inputs = torch.from_numpy(x).float()
    # convert to cuda tensors if cuda flag is true
    if torch.cuda.is_available:
        inputs = inputs.cuda()
    inputs = Variable(inputs)
    return net(inputs)


def _get_loss(predicts, labels):
    labels = torch.from_numpy(labels).long()
    # convert to cuda tensors if cuda flag is true
    if torch.cuda.is_available:
        labels = labels.cuda()
    labels = Variable(labels)
    return F.cross_entropy(predicts, labels)
It seems that the parameters are only updated slightly each epoch, and the accuracy stays the same for the whole process. With the same implementation and the same parameters in TensorFlow, it runs correctly.
I'm new to PyTorch, so maybe I have done something wrong; please help me find out. Thank you!
P.S.: I tried using F.nll_loss + F.log_softmax instead of F.cross_entropy. Theoretically, it should return the same result, but in fact a different value is printed (and it is still a wrong loss value).
-
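The claimed equivalence can be checked in isolation. This is a minimal sketch on made-up inputs (the `logits` and `labels` tensors are hypothetical, not taken from the question), assuming the model outputs raw scores and that log_softmax is applied over the class dimension:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 3)            # hypothetical raw scores: 4 samples, 3 classes
labels = torch.tensor([0, 2, 1, 2])   # hypothetical class indices

# cross_entropy expects raw scores; nll_loss expects log-probabilities.
loss_ce = F.cross_entropy(logits, labels)
loss_nll = F.nll_loss(F.log_softmax(logits, dim=1), labels)

print(loss_ce.item(), loss_nll.item())
same = torch.allclose(loss_ce, loss_nll)
```

On identical inputs the two values agree. If they differ in a real training run, one possible cause (an assumption, not something confirmed in this thread) is that the two losses were computed on different forward passes, for example with dropout active, or that log_softmax was applied over a different dimension.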
Viet Phan over 6 years
How can I set weight_decay for the fully connected layer only, or set a specific weight_decay for each type of layer?
-
jdhao over 6 years
This is easy to achieve in PyTorch. The optimizer accepts parameter groups, and in each parameter group you can set lr and weight_decay separately. See here for more info. Also, searching Google for "different learning rate for different layer in pytorch" will give you plenty of information. Another resource is the wonderful PyTorch forum. Make sure to search through the forum before posting your questions, as many questions have already been asked and have good answers.
-
jdhao over 6 years
@VietPhan Did decreasing the weight decay value work for you?
-
Viet Phan over 6 years
I disabled weight decay for Conv2d and only used it on the FC layer. It works!
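The setup described in the comments above (no weight decay on Conv2d, weight decay only on the fully connected layer) can be sketched by grouping parameters by layer type. This is an illustrative sketch only: the small `nn.Sequential` model and its layer sizes are hypothetical stand-ins for the asker's network, and the weight_decay value of 0.1 is taken from the thread rather than recommended.

```python
import torch.nn as nn
import torch.optim as optim

# Hypothetical stand-in model: one conv layer followed by one FC layer.
model = nn.Sequential(
    nn.Conv2d(1, 8, (3, 16)),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(8 * 6, 2),
)

# Collect parameters by layer type so each type gets its own options.
conv_params, fc_params = [], []
for module in model.modules():
    if isinstance(module, nn.Conv2d):
        conv_params += list(module.parameters())
    elif isinstance(module, nn.Linear):
        fc_params += list(module.parameters())

# Each dict is one parameter group; keys set here override the defaults
# passed as keyword arguments (lr applies to both groups).
optimizer = optim.Adam([
    {'params': conv_params, 'weight_decay': 0.0},
    {'params': fc_params, 'weight_decay': 0.1},
], lr=0.001)

decays = [group['weight_decay'] for group in optimizer.param_groups]
print(decays)
```

A per-group 'lr' key can be added in the same way if a layer type should also use its own learning rate.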