TensorFlow - About mnist.train.next_batch()

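On question 1: when shuffle=True, the order of the examples is randomized. In the quoted code, the arrays are permuted at the start of each epoch and batches are then sliced from the permuted arrays. On question 2: yes, with shuffle=False it should respect whatever order the examples have in the NumPy arrays.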
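Both points can be checked on a toy array. This is only a minimal sketch, not the TensorFlow implementation: `epoch_order` is a hypothetical helper that mirrors the permutation indexing in the `next_batch` code quoted in the question, and the six stand-in "images" and labels are made up for illustration.

```python
import numpy as np

def epoch_order(images, labels, shuffle, rng):
    """Hypothetical helper: return the arrays in the order one epoch serves them."""
    if shuffle:
        perm = np.arange(len(images))
        rng.shuffle(perm)                  # one random order for the whole epoch
        return images[perm], labels[perm]  # the same perm keeps pairs aligned
    return images, labels                  # shuffle=False: array order is kept

images = np.arange(6) * 10       # stand-in "images": 0, 10, ..., 50
labels = np.arange(6)            # stand-in labels: 0..5
rng = np.random.default_rng(0)   # seeded only to make the run repeatable

seq_images, seq_labels = epoch_order(images, labels, shuffle=False, rng=rng)
shuf_images, shuf_labels = epoch_order(images, labels, shuffle=True, rng=rng)

print(seq_labels.tolist())            # labels in their original array order
print(sorted(shuf_labels.tolist()))   # every example still appears exactly once
print(bool(np.all(shuf_images == shuf_labels * 10)))  # image/label pairs match
```

Because the permutation covers the whole epoch, shuffle=True does not draw 100 independent random examples per call. Each example is served once per epoch, in a random order.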

Author by

YOON

Updated on June 04, 2022

Comments

  • YOON, almost 2 years ago

    When I searched for mnist.train.next_batch(), I found this: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/learn/python/learn/datasets/mnist.py

    It contains this code:

    def next_batch(self, batch_size, fake_data=False, shuffle=True):
      """Return the next `batch_size` examples from this data set."""
      if fake_data:
        fake_image = [1] * 784
        if self.one_hot:
          fake_label = [1] + [0] * 9
        else:
          fake_label = 0
        return [fake_image for _ in xrange(batch_size)], [
            fake_label for _ in xrange(batch_size)
        ]
      start = self._index_in_epoch
      # Shuffle for the first epoch
      if self._epochs_completed == 0 and start == 0 and shuffle:
        perm0 = numpy.arange(self._num_examples)
        numpy.random.shuffle(perm0)
        self._images = self.images[perm0]
        self._labels = self.labels[perm0]
      # Go to the next epoch
      if start + batch_size > self._num_examples:
        # Finished epoch
        self._epochs_completed += 1
        # Get the rest examples in this epoch
        rest_num_examples = self._num_examples - start
        images_rest_part = self._images[start:self._num_examples]
        labels_rest_part = self._labels[start:self._num_examples]
        # Shuffle the data
        if shuffle:
          perm = numpy.arange(self._num_examples)
          numpy.random.shuffle(perm)
          self._images = self.images[perm]
          self._labels = self.labels[perm]
        # Start next epoch
        start = 0
        self._index_in_epoch = batch_size - rest_num_examples
        end = self._index_in_epoch
        images_new_part = self._images[start:end]
        labels_new_part = self._labels[start:end]
        return (numpy.concatenate((images_rest_part, images_new_part), axis=0),
                numpy.concatenate((labels_rest_part, labels_new_part), axis=0))
      else:
        self._index_in_epoch += batch_size
        end = self._index_in_epoch
        return self._images[start:end], self._labels[start:end]
    

    As I understand it, mnist.train.next_batch(batch_size=100) randomly picks 100 examples from the MNIST dataset. Here are my questions:

    1. What does shuffle=True mean?
    2. If I call next_batch(batch_size=100, fake_data=False, shuffle=False), does it pick 100 examples sequentially, from the start to the end of the MNIST dataset, rather than randomly?
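
One detail of the quoted code bears on question 2. With shuffle=False the batches are sequential, but a batch that crosses the end of an epoch wraps around to the start of the data. The following is a rough sketch of that arithmetic only, assuming a toy dataset of 10 examples and a batch size of 4. `wrap_batch` is a hypothetical helper, not a TensorFlow function.

```python
import numpy as np

def wrap_batch(data, start, batch_size):
    """Hypothetical helper mirroring the epoch wrap-around, with no shuffling."""
    num_examples = len(data)
    if start + batch_size > num_examples:
        rest = data[start:num_examples]     # tail of the current epoch
        new_index = batch_size - len(rest)  # count taken from the next epoch
        batch = np.concatenate((rest, data[:new_index]), axis=0)
        return batch, new_index
    end = start + batch_size
    return data[start:end], end

data = np.arange(10)   # stand-in examples 0..9
index = 0
batches = []
for _ in range(3):
    batch, index = wrap_batch(data, index, 4)
    batches.append(batch.tolist())

print(batches)   # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 0, 1]]
```

The third batch joins the last two examples of the first epoch with the first two of the next, so every batch has the full batch size even when the dataset size is not a multiple of it.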