TypeError: object of type 'numpy.int64' has no len()

Solution 1

Reference:
https://github.com/pytorch/pytorch/issues/9211

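Add .tolist() to the line that builds indices in random_split, so that each index handed to the dataset is a plain Python int rather than a tensor element.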

from itertools import accumulate

from torch import randperm
from torch.utils.data import Subset


def random_split(dataset, lengths):
    """
    Randomly split a dataset into non-overlapping new datasets of given lengths.
    Arguments:
        dataset (Dataset): Dataset to be split
        lengths (sequence): lengths of splits to be produced
    """
    if sum(lengths) != len(dataset):
        raise ValueError("Sum of input lengths does not equal the length of the input dataset!")

    # .tolist() turns the index tensor into a list of plain Python ints.
    indices = randperm(sum(lengths)).tolist()
    return [Subset(dataset, indices[offset - length:offset]) for offset, length in zip(accumulate(lengths), lengths)]
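To see why the slicing in the last line yields non-overlapping splits, here is a minimal pure-Python sketch of the same logic. It is not PyTorch's implementation: random.sample stands in for randperm(...).tolist(), and split_indices is a hypothetical name used only for illustration.

```python
import random
from itertools import accumulate


def split_indices(n, lengths, seed=None):
    """Return one list of plain Python ints per requested split length."""
    if sum(lengths) != n:
        raise ValueError("Sum of input lengths does not equal n")
    rng = random.Random(seed)
    # Stand-in for randperm(n).tolist(): a shuffled list of Python ints.
    indices = rng.sample(range(n), n)
    # accumulate(lengths) gives the end offset of each split,
    # so offset - length is where that split starts.
    return [indices[offset - length:offset]
            for offset, length in zip(accumulate(lengths), lengths)]


parts = split_indices(10, [6, 4], seed=0)
```

Every element of parts is a built-in int, which is the property that .tolist() restores in the patched random_split.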

Solution 2

I think the issue is that, after random_split, the index passed to __getitem__ is a torch.Tensor rather than an int. Adding a quick type check to __getitem__ and calling .item() on the tensor works for me:

def __getitem__(self, index):
    # random_split may hand over a tensor index; convert it to a Python int.
    if isinstance(index, torch.Tensor):
        index = index.item()

    x = torch.tensor(self.x_data.iloc[index].values, dtype=torch.float)
    y = torch.tensor(self.y_data.iloc[index], dtype=torch.float)
    return (x, y)

Source: https://discuss.pytorch.org/t/issues-with-torch-utils-data-random-split/22298/8
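The same check can be written without naming a specific type. Both torch.Tensor and NumPy scalars such as numpy.int64 expose an .item() method that returns a plain Python number, so a small helper can normalise whatever index arrives. This is only a sketch: to_python_int is a hypothetical helper name, and FakeScalar is a stand-in so the example runs without PyTorch installed.

```python
def to_python_int(index):
    """Convert a tensor-like or NumPy-like scalar index to a built-in int."""
    # One-element torch.Tensor objects and NumPy scalars both provide .item().
    if hasattr(index, "item"):
        index = index.item()
    return int(index)


class FakeScalar:
    """Stand-in for a 0-dim tensor; used here only for demonstration."""

    def __init__(self, value):
        self._value = value

    def item(self):
        return self._value


plain = to_python_int(3)
wrapped = to_python_int(FakeScalar(7))
```

Inside __getitem__, calling index = to_python_int(index) before self.x_data.iloc[index] should give pandas a scalar integer to index with.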

Author by

joe

Updated on September 15, 2021

Comments

  • joe
    joe over 2 years

    I am building a DataLoader from a Dataset in PyTorch.

    I start by loading the DataFrame with every column's dtype set to np.float64:

    result = pd.read_csv('dummy.csv', header=0, dtype=DTYPE_CLEANED_DF)

    Here is my dataset class:

    import torch
    from torch.utils.data import Dataset, DataLoader
    class MyDataset(Dataset):
        def __init__(self, result):
            headers = list(result)
            headers.remove('classes')
    
            self.x_data = result[headers]
            self.y_data = result['classes']
            self.len = self.x_data.shape[0]
    
        def __getitem__(self, index):
            x = torch.tensor(self.x_data.iloc[index].values, dtype=torch.float)
            y = torch.tensor(self.y_data.iloc[index], dtype=torch.float)
            return (x, y)
    
        def __len__(self):
            return self.len
    

    Then I prepare train_loader and test_loader:

    train_size = int(0.5 * len(full_dataset))
    test_size = len(full_dataset) - train_size
    train_dataset, test_dataset = torch.utils.data.random_split(full_dataset, [train_size, test_size])
    
    train_loader = DataLoader(dataset=train_dataset, batch_size=16, shuffle=True, num_workers=1)
    test_loader = DataLoader(dataset=test_dataset)
    

    Here is my csv file

    When I try to iterate over train_loader, it raises the following error:

    for i , (data, target) in enumerate(train_loader):
        print(i)
    
    TypeError                                 Traceback (most recent call last)
    <ipython-input-32-0b4921c3fe8c> in <module>
    ----> 1 for i , (data, target) in enumerate(train_loader):
          2     print(i)
    
    /opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py in __next__(self)
        635                 self.reorder_dict[idx] = batch
        636                 continue
    --> 637             return self._process_next_batch(batch)
        638 
        639     next = __next__  # Python 2 compatibility
    
    /opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py in _process_next_batch(self, batch)
        656         self._put_indices()
        657         if isinstance(batch, ExceptionWrapper):
    --> 658             raise batch.exc_type(batch.exc_msg)
        659         return batch
        660 
    
    TypeError: Traceback (most recent call last):
      File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 138, in _worker_loop
        samples = collate_fn([dataset[i] for i in batch_indices])
      File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 138, in <listcomp>
        samples = collate_fn([dataset[i] for i in batch_indices])
      File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataset.py", line 103, in __getitem__
        return self.dataset[self.indices[idx]]
      File "<ipython-input-27-107e03bc3c6a>", line 12, in __getitem__
        x = torch.tensor(self.x_data.iloc[index].values, dtype=torch.float)
      File "/opt/conda/lib/python3.6/site-packages/pandas/core/indexing.py", line 1478, in __getitem__
        return self._getitem_axis(maybe_callable, axis=axis)
      File "/opt/conda/lib/python3.6/site-packages/pandas/core/indexing.py", line 2091, in _getitem_axis
        return self._get_list_axis(key, axis=axis)
      File "/opt/conda/lib/python3.6/site-packages/pandas/core/indexing.py", line 2070, in _get_list_axis
        return self.obj._take(key, axis=axis)
      File "/opt/conda/lib/python3.6/site-packages/pandas/core/generic.py", line 2789, in _take
        verify=True)
      File "/opt/conda/lib/python3.6/site-packages/pandas/core/internals.py", line 4537, in take
        new_labels = self.axes[axis].take(indexer)
      File "/opt/conda/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2195, in take
        return self._shallow_copy(taken)
      File "/opt/conda/lib/python3.6/site-packages/pandas/core/indexes/range.py", line 267, in _shallow_copy
        return self._int64index._shallow_copy(values, **kwargs)
      File "/opt/conda/lib/python3.6/site-packages/pandas/core/indexes/numeric.py", line 68, in _shallow_copy
        return self._shallow_copy_with_infer(values=values, **kwargs)
      File "/opt/conda/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 538, in _shallow_copy_with_infer
        if not len(values) and 'dtype' not in kwargs:
    TypeError: object of type 'numpy.int64' has no len()
    

    Related issues:
    https://github.com/pytorch/pytorch/issues/10165
    https://github.com/pytorch/pytorch/pull/9237
    https://github.com/pandas-dev/pandas/issues/21946

    Question:
    How can I work around the pandas issue here?

    • Sheldore
      Sheldore over 5 years
      Try looking at the shape of train_loader using train_loader.shape. Most probably, there is some issue with the number of entries.
    • joe
      joe over 5 years
      @Bazingaa It does not have a shape attribute. Its attributes are: ['_DataLoader__initialized', 'batch_sampler', 'batch_size', 'collate_fn', 'dataset', 'drop_last', 'num_workers', 'pin_memory', 'sampler', 'timeout', 'worker_init_fn']
    • MBT
      MBT over 5 years
      Your problem is caused by this line: x = torch.tensor(self.x_data.iloc[index].values, dtype=torch.float). More precisely, I guess it is caused by calling .values, but I'm no expert in pandas. So this doesn't seem to have anything to do with PyTorch itself. I added the pandas tag to your question; someone there should be able to tell you exactly what the problem is.
    • joe
      joe over 5 years
      @blue-phoenox same error