PyTorch loss becomes inf/nan

Solution 1

Once the loss becomes inf after a certain pass, the model is corrupted by the subsequent backpropagation step. This probably happens because the values in the "Salary" column are too large; try normalizing the salaries.
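
For example, a minimal sketch of standardizing the targets (assuming the Y_train tensor from the question's code; the exact scaling scheme is a judgment call):

    # Standardize salaries to zero mean and unit variance before training.
    y_mean = Y_train.mean()
    y_std = Y_train.std()
    Y_train = (Y_train - y_mean) / y_std

    # After training, map a prediction back to the original salary scale:
    pred_salary = model(test_exp) * y_std + y_mean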

Alternatively, you could initialize the parameters by hand (rather than letting them be initialized randomly), setting the bias term to the average salary and the slope of the line to 0 (for instance). That way the initial model would be close enough to the optimal solution that the loss does not blow up.
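
A rough sketch of that initialization (assuming the Model class and Y_train tensor from the question's code):

    model = Model()
    with torch.no_grad():
        model.linear.weight.fill_(0.0)                  # slope starts at 0
        model.linear.bias.fill_(Y_train.mean().item())  # bias starts at the average salary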

Solution 2

Please reduce the learning rate lr to 0.001 or 0.0001. Larger values of lr make the gradient explode and result in inf. I have tried both lr=0.001 and lr=0.0001 and it works fine for me. Please try it and let me know.
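
In the question's code that is a one-line change:

    optim = torch.optim.SGD(model.parameters(), lr=0.0001)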

Comments

  • JAbrams almost 2 years

    I'm trying to do linear regression with one feature: a simple 'predict salary given years of experience' problem. The NN trains on years of experience (X) and salary (Y). For some reason the loss explodes and ultimately returns inf or nan.

    This is the code I have:

        import torch
        import torch.nn as nn
        import pandas as pd
        import numpy as np
        
        dataset = pd.read_csv('./salaries.csv')
        
        x_temp = dataset.iloc[:, :-1].values
        y_temp = dataset.iloc[:, 1:].values
        
        X_train = torch.FloatTensor(x_temp)
        Y_train = torch.FloatTensor(y_temp)
       
        class Model(torch.nn.Module): 
            def __init__(self):
                super().__init__()
                self.linear = torch.nn.Linear(1,1)
        
            def forward(self, x):
                y_pred = self.linear(x)
                return y_pred
        
        model = Model()
        
        loss_func = torch.nn.MSELoss(size_average=False)
        optim = torch.optim.SGD(model.parameters(), lr=0.01)
        
        #training 
        for epoch in range(200):
            #calculate y_pred
            y_pred = model(X_train)
        
            #calculate loss
            loss = loss_func(y_pred, Y_train)
            print(epoch, "{:.2f}".format(loss.data))
        
            #backward pass + update weights
            optim.zero_grad()
            loss.backward()
            optim.step()
        
        
        test_exp = torch.FloatTensor([[8.0]])
        print("8 years experience --> ", model(test_exp).data[0][0].item())
    
    

    As I mentioned, once it starts training, the loss gets very large and ends up showing inf after around the 10th epoch.

    I suspect it may have something to do with how I'm loading the data. This is what is in the salaries.csv file:

    Years Salary
    1.1 39343
    1.3 46205
    1.5 37731
    2   43525
    2.2 39891
    2.9 56642
    3   60150
    3.2 54445
    3.2 64445
    3.7 57189
    3.9 63218
    4   55794
    4   56957
    4.1 57081
    4.5 61111
    4.9 67938
    5.1 66029
    5.3 83088
    

    Thank you for your help

    • Vadim almost 6 years
      do you have NaNs or infs in your dataset?
    • Ryan almost 6 years
      Can you post the link to the salaries csv?
    • dedObed almost 6 years
      I would start by using the average loss instead of the sum (why avoid averaging in the first place?). And/or decrease the learning rate. Finally, you could make the problem better conditioned for MSE by downscaling the output values (I'd suggest a factor of 10,000, so the values stay readable).
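
      A sketch of those changes against the question's code (reduction='mean' is the current PyTorch equivalent of the deprecated size_average argument, switched here to averaging; the factor of 10,000 follows the suggestion above):

          loss_func = torch.nn.MSELoss(reduction='mean')          # average instead of sum
          optim = torch.optim.SGD(model.parameters(), lr=0.001)   # smaller learning rate
          Y_train = Y_train / 10_000.0                            # downscale the salaries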