PyTorch loss inf nan
Solution 1
Once the loss becomes inf after a certain pass, your model gets corrupted after backpropagating. This probably happens because the values in the "Salary" column are too big. Try normalizing the salaries.
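A minimal sketch of the normalization suggested here, using the rows listed in the question's salaries.csv. It standardizes the years as well as the salaries, which is an assumption that goes beyond this answer, and the names X_norm, Y_norm and pred_salary are illustrative only.

```python
import torch

# Rows copied from the salaries.csv listing in the question.
years = [1.1, 1.3, 1.5, 2.0, 2.2, 2.9, 3.0, 3.2, 3.2,
         3.7, 3.9, 4.0, 4.0, 4.1, 4.5, 4.9, 5.1, 5.3]
salary = [39343, 46205, 37731, 43525, 39891, 56642, 60150, 54445, 64445,
          57189, 63218, 55794, 56957, 57081, 61111, 67938, 66029, 83088]

X = torch.tensor(years).unsqueeze(1)                          # shape (18, 1)
Y = torch.tensor(salary, dtype=torch.float32).unsqueeze(1)    # shape (18, 1)

# Standardize both columns to zero mean and unit standard deviation.
# Scaling X as well is an assumption of this sketch, not part of the answer.
x_mean, x_std = X.mean(), X.std()
y_mean, y_std = Y.mean(), Y.std()
X_norm = (X - x_mean) / x_std
Y_norm = (Y - y_mean) / y_std

torch.manual_seed(0)
model = torch.nn.Linear(1, 1)
loss_func = torch.nn.MSELoss(reduction="sum")   # same summed loss as the question
optim = torch.optim.SGD(model.parameters(), lr=0.01)

for epoch in range(200):
    loss = loss_func(model(X_norm), Y_norm)
    optim.zero_grad()
    loss.backward()
    optim.step()

final_loss = loss.item()

# Undo the scaling to get a prediction in the original salary units.
with torch.no_grad():
    test_exp = (torch.tensor([[8.0]]) - x_mean) / x_std
    pred_salary = (model(test_exp) * y_std + y_mean).item()

print("final loss:", final_loss)
print("8 years experience -->", pred_salary)
```

Note that the test input has to be scaled with the same mean and standard deviation as the training data, and the prediction has to be scaled back before it is read as a salary.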
Alternatively, you could try to initialize the parameters by hand (rather than letting them be initialized randomly), setting the bias term to the average of the salaries and the slope of the line to 0 (for instance). That way the initial model would be close enough to the optimal solution that the loss does not blow up.
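The hand initialization described here could look roughly like the sketch below. The data is the listing from the question; the learning rate of 0.001 is borrowed from Solution 2 and is an assumption of this sketch, since whether training stays stable still depends on the step size.

```python
import torch

# Rows copied from the salaries.csv listing in the question.
years = [1.1, 1.3, 1.5, 2.0, 2.2, 2.9, 3.0, 3.2, 3.2,
         3.7, 3.9, 4.0, 4.0, 4.1, 4.5, 4.9, 5.1, 5.3]
salary = [39343, 46205, 37731, 43525, 39891, 56642, 60150, 54445, 64445,
          57189, 63218, 55794, 56957, 57081, 61111, 67938, 66029, 83088]

X = torch.tensor(years).unsqueeze(1)
Y = torch.tensor(salary, dtype=torch.float32).unsqueeze(1)

model = torch.nn.Linear(1, 1)

# Overwrite the random initialization: slope 0, bias = mean salary.
with torch.no_grad():
    model.weight.fill_(0.0)
    model.bias.fill_(Y.mean().item())

loss_func = torch.nn.MSELoss(reduction="sum")
# lr=0.001 is an assumption taken from Solution 2, not from this answer.
optim = torch.optim.SGD(model.parameters(), lr=0.001)

init_loss = loss_func(model(X), Y).item()

for epoch in range(200):
    loss = loss_func(model(X), Y)
    optim.zero_grad()
    loss.backward()
    optim.step()

final_loss = loss_func(model(X), Y).item()
print("loss at initialization:", init_loss)
print("loss after training:   ", final_loss)
```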
Solution 2
Please reduce the learning rate "lr" to 0.001 or 0.0001. A larger lr makes the gradients explode, which results in inf. I tried both lr=0.001 and lr=0.0001, and both work fine for me. Please try it and let me know.
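To compare the two settings side by side, here is a small sketch that trains the question's model once with lr=0.01 and once with lr=0.001. The helper train is hypothetical and exists only for this comparison; the data is the listing from the question. On this data the first run should end in inf or nan, while the second should stay finite.

```python
import torch

# Rows copied from the salaries.csv listing in the question.
years = [1.1, 1.3, 1.5, 2.0, 2.2, 2.9, 3.0, 3.2, 3.2,
         3.7, 3.9, 4.0, 4.0, 4.1, 4.5, 4.9, 5.1, 5.3]
salary = [39343, 46205, 37731, 43525, 39891, 56642, 60150, 54445, 64445,
          57189, 63218, 55794, 56957, 57081, 61111, 67938, 66029, 83088]

X = torch.tensor(years).unsqueeze(1)
Y = torch.tensor(salary, dtype=torch.float32).unsqueeze(1)


def train(lr, epochs=200):
    """Hypothetical helper: train the question's model, return the last loss."""
    torch.manual_seed(0)  # same starting weights for both runs
    model = torch.nn.Linear(1, 1)
    loss_func = torch.nn.MSELoss(reduction="sum")
    optim = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        loss = loss_func(model(X), Y)
        optim.zero_grad()
        loss.backward()
        optim.step()
    return loss.item()


loss_big = train(lr=0.01)     # learning rate from the question
loss_small = train(lr=0.001)  # learning rate suggested in this answer
print("lr=0.01  ->", loss_big)
print("lr=0.001 ->", loss_small)
```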
JAbrams
Updated on June 04, 2022
Comments
-
JAbrams almost 2 years
I'm trying to do simple linear regression with 1 feature. It's a simple 'predict salary given years experience' problem. The NN trains on years experience (X) and a salary (Y). For some reason the loss is exploding and ultimately returns inf or nan.
This is the code I have:
    import torch
    import torch.nn as nn
    import pandas as pd
    import numpy as np

    dataset = pd.read_csv('./salaries.csv')

    x_temp = dataset.iloc[:, :-1].values
    y_temp = dataset.iloc[:, 1:].values

    X_train = torch.FloatTensor(x_temp)
    Y_train = torch.FloatTensor(y_temp)

    class Model(torch.nn.Module):
        def __init__(self):
            super().__init__()
            self.linear = torch.nn.Linear(1,1)

        def forward(self, x):
            y_pred = self.linear(x)
            return y_pred

    model = Model()

    loss_func = torch.nn.MSELoss(size_average=False)
    optim = torch.optim.SGD(model.parameters(), lr=0.01)

    #training
    for epoch in range(200):
        #calculate y_pred
        y_pred = model(X_train)

        #calculate loss
        loss = loss_func(y_pred, Y_train)
        print(epoch, "{:.2f}".format(loss.data))

        #backward pass + update weights
        optim.zero_grad()
        loss.backward()
        optim.step()

    test_exp = torch.FloatTensor([[8.0]])
    print("8 years experience --> ", model(test_exp).data[0][0].item())
As I mentioned, once it starts training the loss gets super big and ends up showing inf after like the 10th epoch. I suspect it may have something to do with how I'm loading the data? This is what is in the salaries.csv file:

    Years  Salary
    1.1    39343
    1.3    46205
    1.5    37731
    2      43525
    2.2    39891
    2.9    56642
    3      60150
    3.2    54445
    3.2    64445
    3.7    57189
    3.9    63218
    4      55794
    4      56957
    4.1    57081
    4.5    61111
    4.9    67938
    5.1    66029
    5.3    83088

Thank you for your help
-
Vadim almost 6 years
Do you have NaNs or infs in your dataset?
-
Ryan almost 6 years
Can you post the link to the salaries csv?
-
dedObed almost 6 years
I would start by taking the average loss instead of the sum (why did you avoid averaging in the first place?). And/or decrease the learning rate. Finally, you would make the problem more sensible for MSE by downscaling the output values (I'd suggest a factor of 10 000, so the values stay readable).
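A rough sketch of the first and last suggestions in this comment: the loss is averaged instead of summed, and the salaries are divided by 10 000, while the learning rate stays at the question's 0.01. The constant SCALE and the variable names are illustrative, and the data is the listing from the question. At this learning rate 200 epochs may not be enough for the fit to converge fully, so the printed prediction should be read as indicative only.

```python
import torch

# Rows copied from the salaries.csv listing in the question.
years = [1.1, 1.3, 1.5, 2.0, 2.2, 2.9, 3.0, 3.2, 3.2,
         3.7, 3.9, 4.0, 4.0, 4.1, 4.5, 4.9, 5.1, 5.3]
salary = [39343, 46205, 37731, 43525, 39891, 56642, 60150, 54445, 64445,
          57189, 63218, 55794, 56957, 57081, 61111, 67938, 66029, 83088]

SCALE = 10_000.0  # downscaling factor suggested in the comment

X = torch.tensor(years).unsqueeze(1)
Y_scaled = torch.tensor(salary, dtype=torch.float32).unsqueeze(1) / SCALE

torch.manual_seed(0)
model = torch.nn.Linear(1, 1)
loss_func = torch.nn.MSELoss(reduction="mean")  # average instead of sum
optim = torch.optim.SGD(model.parameters(), lr=0.01)

first_loss = None
for epoch in range(200):
    loss = loss_func(model(X), Y_scaled)
    if first_loss is None:
        first_loss = loss.item()
    optim.zero_grad()
    loss.backward()
    optim.step()

final_loss = loss.item()

# Multiply by SCALE to read the prediction in the original salary units.
with torch.no_grad():
    pred_salary = model(torch.tensor([[8.0]])).item() * SCALE

print("first loss:", first_loss, "final loss:", final_loss)
print("8 years experience -->", pred_salary)
```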