Solution 1

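It's probably an issue with how the input data is passed to Keras' fit() function. I would recommend using a tf.data.Dataset as input to fit(), like so:

import tensorflow as tf

train_data = tf.data.Dataset.from_tensor_slices((trainMixed, trainVocals))
valid_data = tf.data.Dataset.from_tensor_slices((testMixed, testVocals))

model.fit(train_data, epochs=10, validation_data=valid_data)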

You can then also use functions like shuffle() and batch() on the TF datasets.
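As a rough illustration of shuffle() and batch(), the sketch below uses small zero-filled arrays as stand-ins for the real spectrogram data. The names mixed and vocals, the array sizes, and the batch size are placeholders chosen for the example, not values from the question.

```python
import numpy as np
import tensorflow as tf

# Placeholder data: 8 examples already in the (513, 25, 1) layout the model expects.
mixed = np.zeros((8, 513, 25, 1), dtype=np.float32)
vocals = np.zeros((8, 513, 25, 1), dtype=np.float32)

# Pair inputs with targets, then shuffle and group into batches of 4.
dataset = tf.data.Dataset.from_tensor_slices((mixed, vocals))
dataset = dataset.shuffle(buffer_size=8).batch(4)

# Each element is now an (inputs, targets) pair with a leading batch axis.
batch_shapes = [(tuple(x.shape), tuple(y.shape)) for x, y in dataset]
print(batch_shapes)
```

A dataset batched this way can be passed straight to model.fit(), which then takes the batch size from the dataset rather than from its own batch_size argument.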

EDIT: It also seems like your input shapes are incorrect. The input_shape you specified for the first conv layer is (513, 25, 1), so the input should be a batch tensor of shape (batch_size, 513, 25, 1), whereas you're inputting the shape (batch_size, 2584). So you'll need to reshape and probably cut your inputs to the specified shape, or specify a new shape.
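One possible way to do the reshaping and cutting is sketched below. It assumes each spectrogram is a NumPy array of shape (513, 2584) and that the model should see 25-frame patches. slice_spectrogram is a hypothetical helper written for this example, not part of the original code, and it simply drops the frames left over at the end.

```python
import numpy as np

def slice_spectrogram(spec, width=25):
    """Cut a (freq, time) array into patches of shape (n, freq, width, 1)."""
    n_freq, n_time = spec.shape
    n_patches = n_time // width                  # leftover frames are dropped
    trimmed = spec[:, :n_patches * width]
    patches = trimmed.reshape(n_freq, n_patches, width)
    patches = patches.transpose(1, 0, 2)         # put the patch index first
    return patches[..., np.newaxis]              # add the channel axis

# Random stand-in for one magnitude spectrogram.
spec = np.random.rand(513, 2584).astype(np.float32)
patches = slice_spectrogram(spec)
print(patches.shape)  # (103, 513, 25, 1)
```

The same slicing has to be applied to the vocals, so that every input patch still lines up with its target patch.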

Solution 2

Basically, whether the shape you define for Conv2D is 2D, 3D, or something else, the layer requires a 4D array when you feed an input X to it, where X.shape looks like (batch, row, col, channel).

The example below clarifies this for Conv2D:

import numpy as np
from tensorflow.keras import layers

input_layer = layers.InputLayer(input_shape=(2, 2, 1))  # 3D: (row, col, channel), no batch axis
conv1 = layers.Conv2D(3, (2, 2))
X = np.ones((2, 2))
X = X.reshape(1, X.shape[0], X.shape[1], 1)  # X is now 4D: (1, 2, 2, 1)

Now let's walk through the code above.

input_layer is defined with a 3D shape, while X is reshaped to a 4D shape, so at first glance the two do not match. The reason is that input_shape leaves out the batch dimension: any input X fed to input_layer or Conv2D must still be 4D.
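To see the 4D requirement in action, here is a minimal sketch that calls a Conv2D layer once with a 4D array and once with a 2D array. The import path tensorflow.keras and the variable names are assumptions for the example, and the exact wording of the error depends on the Keras version.

```python
import numpy as np
from tensorflow.keras import layers

conv = layers.Conv2D(3, (2, 2))

# 4D input (batch, row, col, channel): accepted.
X4 = np.ones((1, 2, 2, 1), dtype="float32")
out = conv(X4)
print(tuple(out.shape))  # (1, 1, 1, 3)

# 2D input (row, col): rejected because the layer expects at least 4 dimensions.
X2 = np.ones((2, 2), dtype="float32")
try:
    conv(X2)
    rejected = False
except ValueError as err:
    rejected = True
    print("rejected:", err)
```

The 2x2 kernel with the default 'valid' padding collapses the 2x2 image to a single position, and the 3 filters become the last axis of the output.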

Updated on August 09, 2022


    I'm working on a project that isolates the vocal parts from audio. I'm using the DSD100 dataset, but for testing I'm using the DSD100subset dataset. I only use the mixtures and the vocals. I'm basing this work on this article.

    First I process each audio file to extract a spectrogram and append it to a list, so that all the files together form four lists (trainMixed, trainVocals, testMixed, testVocals). Like this:

    def to_spec(wav, n_fft=1024, hop_length=256):
        return librosa.stft(wav, n_fft=n_fft, hop_length=hop_length)
    def prepareData(filename, sr=22050, hop_length=256, n_fft=1024):
      audio_wav = librosa.load(filename, sr=sr, mono=True, duration=30)[0]
      audio_spec=to_spec(audio_wav, n_fft=n_fft, hop_length=hop_length)
      audio_spec_mag = np.abs(audio_spec)
      maxVal = np.max(audio_spec_mag)
      return audio_spec_mag, maxVal
    # FOR EVERY LIST (trainMixed, trainVocals, testMixed, testVocals)
    trainMixed = []
    trainMixedNum = 0
    for (root, dirs, files) in walk('./Dev-subset-mix/Dev/'):
      for d in dirs:
        filenameMix = './Dev-subset-mix/Dev/'+d+'/mixture.wav'
        spec_mag, maxVal = prepareData(filenameMix, n_fft=1024, hop_length=256)

    Next I build the model:

    import keras
    from keras.models import Sequential
    from keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPooling2D
    from keras.optimizers import SGD
    from keras.layers.advanced_activations import LeakyReLU
    model = Sequential()
    model.add(Conv2D(16, (3,3), padding='same', input_shape=(513, 25, 1)))
    model.add(Conv2D(16, (3,3), padding='same'))
    model.add(Conv2D(16, (3,3), padding='same'))
    model.add(Conv2D(16, (3,3), padding='same'))
    model.add(Dense(1, activation='sigmoid'))
    sgd = SGD(lr=0.001, decay=1e-6, momentum=0.9, nesterov=True)
    model.compile(loss=keras.losses.binary_crossentropy, optimizer=sgd, metrics=['accuracy'])

    And run the model:

    model.fit(trainMixed, trainVocals, epochs=10, validation_data=(testMixed, testVocals))

    But I'm getting this result:

    ValueError: in user code:
        /usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/ train_function  *
            return step_function(self, iterator)
        /usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/ step_function  **
            outputs =, args=(data,))
        /usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/ run
            return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
        /usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/ call_for_each_replica
            return self._call_for_each_replica(fn, args, kwargs)
        /usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/ _call_for_each_replica
            return fn(*args, **kwargs)
        /usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/ run_step  **
            outputs = model.train_step(data)
        /usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/ train_step
            y_pred = self(x, training=True)
        /usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/ __call__
        /usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/ assert_input_compatibility
            ' input tensors. Inputs received: ' + str(inputs))
        ValueError: Layer sequential_1 expects 1 inputs, but it received 2 input tensors. Inputs received: [<tf.Tensor 'IteratorGetNext:0' shape=(None, 2584) dtype=float32>, <tf.Tensor 'IteratorGetNext:1' shape=(None, 2584) dtype=float32>]

    I am new to this topic, thanks for the help provided in advance.

  • Jorge Ramón over 3 years
    Hi, thanks for the help. I tried your code but the error changed to: ValueError: Input 0 of layer sequential is incompatible with the layer: : expected min_ndim=4, found ndim=2. Full shape received: [513, 2584]
  • Aaron Keesing over 3 years
    I've updated my answer. The problem is that the shapes are incompatible, so you'll need to get your input into the shape that the Conv2D layer expects. Where did the shape (513, 25, 1) come from?
  • Jorge Ramón over 3 years
    Oh yes, I forgot to slice the input data into that shape. Thanks for the reply.
  • Danny Bullis about 2 years
    Could you perhaps add some context as to why you are making this recommendation? How/why does converting these into TensorFlow datasets solve the problem? Asking as someone a bit new to TensorFlow/Keras who is trying to wrap my head around this. Cheers
  • Aaron Keesing about 2 years
    @DannyBullis In my experience, using the TensorFlow data pipeline minimises incompatibilities to do with converting between NumPy arrays, Python objects, and tensors, as well as being useful for manipulating data asynchronously, using multiple workers, etc.