PyTorch loss becomes inf/nan

Solution 1

Once the loss becomes inf after a certain pass, the model is corrupted by the subsequent backpropagation step. This probably happens because the values in the "Salary" column are too large; try normalizing the salaries.
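
For example, a minimal sketch of standardizing the targets (assuming the Y_train tensor from the question's code; the exact scaling scheme is a judgment call):

    # Standardize salaries to zero mean and unit variance before training.
    y_mean = Y_train.mean()
    y_std = Y_train.std()
    Y_train = (Y_train - y_mean) / y_std

    # After training, map a prediction back to the original salary scale:
    pred_salary = model(test_exp) * y_std + y_mean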

Alternatively, you could initialize the parameters by hand (rather than letting them be initialized randomly), setting the bias term to the average salary and the slope of the line to 0 (for instance). That way the initial model would be close enough to the optimal solution that the loss does not blow up.
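
A rough sketch of that initialization (assuming the Model class and Y_train tensor from the question's code):

    model = Model()
    with torch.no_grad():
        model.linear.weight.fill_(0.0)                  # slope starts at 0
        model.linear.bias.fill_(Y_train.mean().item())  # bias starts at the average salary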

Solution 2

Please reduce the learning rate lr to 0.001 or 0.0001. Larger values of lr make the gradient explode and result in inf. I have tried both lr=0.001 and lr=0.0001 and it works fine for me. Please try it and let me know.
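
In the question's code that is a one-line change:

    optim = torch.optim.SGD(model.parameters(), lr=0.0001)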

Comments

  • JAbrams almost 2 years

    I'm trying to do linear regression with one feature: a simple 'predict salary given years of experience' problem. The NN trains on years of experience (X) and salary (Y). For some reason the loss explodes and ultimately returns inf or nan.

    This is the code I have:

        import torch
        import torch.nn as nn
        import pandas as pd
        import numpy as np
        
        dataset = pd.read_csv('./salaries.csv')
        
        x_temp = dataset.iloc[:, :-1].values
        y_temp = dataset.iloc[:, 1:].values
        
        X_train = torch.FloatTensor(x_temp)
        Y_train = torch.FloatTensor(y_temp)
       
        class Model(torch.nn.Module): 
            def __init__(self):
                super().__init__()
                self.linear = torch.nn.Linear(1,1)
        
            def forward(self, x):
                y_pred = self.linear(x)
                return y_pred
        
        model = Model()
        
        loss_func = torch.nn.MSELoss(size_average=False)
        optim = torch.optim.SGD(model.parameters(), lr=0.01)
        
        #training 
        for epoch in range(200):
            #calculate y_pred
            y_pred = model(X_train)
        
            #calculate loss
            loss = loss_func(y_pred, Y_train)
            print(epoch, "{:.2f}".format(loss.data))
        
            #backward pass + update weights
            optim.zero_grad()
            loss.backward()
            optim.step()
        
        
        test_exp = torch.FloatTensor([[8.0]])
        print("8 years experience --> ", model(test_exp).data[0][0].item())
    
    

    As I mentioned, once it starts training, the loss gets very large and ends up showing inf after around the 10th epoch.

    I suspect it may have something to do with how I'm loading the data. This is what is in the salaries.csv file:

    Years Salary
    1.1 39343
    1.3 46205
    1.5 37731
    2   43525
    2.2 39891
    2.9 56642
    3   60150
    3.2 54445
    3.2 64445
    3.7 57189
    3.9 63218
    4   55794
    4   56957
    4.1 57081
    4.5 61111
    4.9 67938
    5.1 66029
    5.3 83088
    

    Thank you for your help

    • Vadim almost 6 years
      do you have NaNs or infs in your dataset?
    • Ryan almost 6 years
      Can you post the link to the salaries csv?
    • dedObed almost 6 years
      I would start by using the average loss instead of the sum (why avoid averaging in the first place?). And/or decrease the learning rate. Finally, you could make the problem better conditioned for MSE by downscaling the output values (I'd suggest a factor of 10,000, so the values stay readable).
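
      A sketch of those changes against the question's code (reduction='mean' is the current PyTorch equivalent of the deprecated size_average argument, switched here to averaging; the factor of 10,000 follows the suggestion above):

          loss_func = torch.nn.MSELoss(reduction='mean')          # average instead of sum
          optim = torch.optim.SGD(model.parameters(), lr=0.001)   # smaller learning rate
          Y_train = Y_train / 10_000.0                            # downscale the salaries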