PyTorch: How to use DataLoaders for custom Datasets
Solution 1
Yes, that is possible. Just create the objects by yourself, e.g.
import torch.utils.data as data_utils
train = data_utils.TensorDataset(features, targets)
train_loader = data_utils.DataLoader(train, batch_size=50, shuffle=True)
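where features and targets are tensors. In the typical case features is 2-D, i.e. a matrix where each row represents one training sample, and targets is 1-D or 2-D, depending on whether you are trying to predict a scalar or a vector. Strictly speaking, TensorDataset only requires that both tensors have the same size along the first dimension.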
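A minimal, self-contained sketch of the snippet above. The data is randomly generated and the sizes (100 samples, 10 features, scalar targets) are made up purely for illustration:

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Hypothetical data: 100 samples with 10 features each, one scalar target per sample.
features = torch.randn(100, 10)
targets = torch.randn(100)

train = TensorDataset(features, targets)
train_loader = DataLoader(train, batch_size=50, shuffle=True)

# Each iteration yields one (features, targets) batch.
batch_shapes = [(x.shape, y.shape) for x, y in train_loader]
print(batch_shapes)
```

With 100 samples and batch_size=50 the loader yields two batches per epoch; shuffle=True only changes which samples end up in which batch.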
Hope that helps!
EDIT: response to @sarthak's question
Basically yes. If you create an object of type TensorDataset, the constructor checks that the first dimensions of the feature tensor (which is actually called data_tensor) and the target tensor (called target_tensor) have the same length:
assert data_tensor.size(0) == target_tensor.size(0)
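The check can be observed directly. The sketch below assumes a recent PyTorch version, where the constructor takes any number of tensors and the size check is still a plain assert (so it would be skipped if Python runs with -O). The tensor sizes are made up for illustration:

```python
import torch
from torch.utils.data import TensorDataset

features = torch.zeros(5, 3)
good_targets = torch.zeros(5)
bad_targets = torch.zeros(4)  # one sample short

# First dimensions match (5 and 5), so construction succeeds.
dataset = TensorDataset(features, good_targets)

# First dimensions differ (5 and 4), so the constructor's assert fails.
try:
    TensorDataset(features, bad_targets)
    mismatch_detected = False
except AssertionError:
    mismatch_detected = True

print(mismatch_detected)
```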
However, if you want to feed this data into a neural network afterwards, you need to be careful. While convolution layers work on data like yours, (I think) all of the other types of layers expect the data to be given in matrix form. So, if you run into an issue like this, an easy solution is to convert your 4-D dataset (given as some kind of tensor, e.g. FloatTensor) into a matrix using the view method. For your 5000xnxnx3 dataset, this would look like this:
dataset_2d = dataset_4d.view(5000, -1)
(The value -1 tells PyTorch to figure out the length of the second dimension automatically.)
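A small sketch of the flattening step, which also shows that the 4-D tensor can be wrapped in a TensorDataset without flattening. The image side length n = 4, the zero-filled tensors, and the variable names are assumptions for illustration only:

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

n = 4  # hypothetical image side length
dataset_4d = torch.zeros(5000, n, n, 3)

# Flatten every image into one row; -1 lets PyTorch infer n * n * 3 = 48.
dataset_2d = dataset_4d.view(5000, -1)
print(dataset_2d.shape)  # torch.Size([5000, 48])

# The 4-D tensor can also be passed as-is; only the first dimension
# has to match the targets.
targets = torch.zeros(5000)
loader = DataLoader(TensorDataset(dataset_4d, targets), batch_size=50)
first_images, first_targets = next(iter(loader))
print(first_images.shape)  # torch.Size([50, 4, 4, 3])
```

Note that view needs a contiguous tensor; reshape is the usual alternative when that is not guaranteed.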
Solution 2
You can easily do this by extending the data.Dataset class.
According to the API, all you have to do is implement two methods: __getitem__ and __len__.
You can then wrap the dataset with the DataLoader as shown in the API and in @pho7's answer.
I think the ImageFolder class is a good reference. See code here.
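As a rough sketch of what such a subclass can look like: the SquaresDataset class below is a hypothetical toy example (not taken from the PyTorch docs or from ImageFolder) in which sample i is the pair (i, i squared):

```python
import torch
from torch.utils.data import Dataset, DataLoader


class SquaresDataset(Dataset):
    """Hypothetical toy dataset: sample i is the pair (i, i**2)."""

    def __init__(self, size):
        self.size = size

    def __len__(self):
        # Number of samples; the DataLoader uses this to build its index range.
        return self.size

    def __getitem__(self, index):
        # Return one (input, target) pair as tensors.
        x = torch.tensor([float(index)])
        y = torch.tensor([float(index) ** 2])
        return x, y


dataset = SquaresDataset(10)
loader = DataLoader(dataset, batch_size=4, shuffle=False)

# The default collate function stacks the individual samples into batches.
batches = list(loader)
for x_batch, y_batch in batches:
    print(x_batch.shape, y_batch.shape)
```

In a real dataset, __getitem__ would typically load and preprocess one sample from disk instead of computing it on the fly.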
Solution 3
Yes, you can do it. Hope this helps for future readers.
import torch
from torch.utils.data import TensorDataset, DataLoader

inputs = [[1, 2, 3, 4, 5], [2, 3, 4, 5, 6]]
targets = [6, 7]
batch_size = 2

# Convert the Python lists to tensors
inputs = torch.tensor(inputs)
targets = torch.IntTensor(targets)

# Pair each input row with its target and batch them
dataset = TensorDataset(inputs, targets)
data_loader = DataLoader(dataset, batch_size, shuffle=True)
Solution 4
In addition to user3693922's answer, which links the "quick" PyTorch documentation example for creating custom dataloaders for custom datasets, and the accepted answer, which creates a custom dataloader in the "simplest" case, there is a much more detailed official PyTorch tutorial on creating a custom dataloader with the associated preprocessing: "Writing Custom Datasets, DataLoaders and Transforms".
Sarthak
Updated on March 10, 2021
Comments
-
Sarthak about 3 years
How to make use of the torch.utils.data.Dataset and torch.utils.data.DataLoader on your own data (not just the torchvision.datasets)? Is there a way to use the inbuilt DataLoaders which they use on TorchVisionDatasets to be used on any dataset?
-
Sarthak about 7 years
I have 3D features: 2D for an image and one extra dimension for color channels. Would it still work if I pass the features as 5000xnxnx3? 5000 is the number of data points and nxnx3 is the image size.
-
Sarthak about 7 years
A 4D dataset can be passed as features; there is no need for the view statement.
-
flaudre almost 7 years
@pho7 You say the features matrix is 2D and contains lines of input data. This makes sense to me if the input data is 1D (such as a voice signal or so), but what if it is an image (2D), say 32x32? What would the features matrix look like?
-
YellowPillow over 6 years
It's probably flattened, and you would need to reshape it when you load it from the DataLoader? I'm not sure though.