InvalidArgumentError : ConcatOp : Dimensions of inputs should match

14,057

The error is coming from LSTMCell.call call method. There we are trying to tf.concat([inputs, h], 1) meaning that we want to concatenate the next input with the current hidden state before matmul'ing with the kernel variables matrix. The error is saying that you can't do it because the batch (0th) dimensions don't match up - your input is shaped [20,25] and your hidden state is shaped [30,100].

For some reason on your 32nd iteration, or whenever you see the error, the input is not batched to 30, but only to 20. This usually happens at the end of your training data when the total number of training examples does not evenly divide your batch size. This hypothesis is also consistent with "When i used smaller batch , it seems the code can run longer" statement.

14,057

Author by

DiIli

Updated on June 25, 2022

Comments

DiIli almost 2 years

Tensorflow 1.7 when using dynamic_rnn.It runs fine at first , but at the 32th(it changes when i run the code) step , the error appears. When i used smaller batch , it seems the code can run longer , however the error still poped up .Just cannt figure out what's wrong.

    from mapping import *


def my_input_fn(features, targets, batch_size=20, shuffle=True, num_epochs=None, sequece_lenth=None):
    ds = tf.data.Dataset.from_tensor_slices(
        (features, targets, sequece_lenth))  # warning: 2GB limit
    ds = ds.batch(batch_size).repeat(num_epochs)

    if shuffle:
        ds = ds.shuffle(10000)
    features, labels, sequence = ds.make_one_shot_iterator().get_next()
    return features, labels, sequence


def lstm_cell(lstm_size=50):
    return tf.contrib.rnn.BasicLSTMCell(lstm_size)


class RnnModel:
    def __init__(self,
                 batch_size,
                 hidden_units,
                 time_steps,
                 num_features
                 ):
        self.batch_size = batch_size
        self.hidden_units = hidden_units
        stacked_lstm = tf.contrib.rnn.MultiRNNCell(
            [lstm_cell(i) for i in self.hidden_units])
        self.initial_state = stacked_lstm.zero_state(batch_size, tf.float32)
        self.model = stacked_lstm
        self.state = self.initial_state
        self.time_steps = time_steps
        self.num_features = num_features

    def loss_mean_squre(self, outputs, targets):
        pos = tf.add(outputs, tf.ones(self.batch_size))
        eve = tf.div(pos, 2)
        error = tf.subtract(eve,
                            targets)
        return tf.reduce_mean(tf.square(error))

    def train(self,
              num_steps,
              learningRate,
              input_fn,
              inputs,
              targets,
              sequenceLenth):

        periods = 10
        step_per_periods = int(num_steps / periods)

        input, target, sequence = input_fn(inputs, targets, self.batch_size, shuffle=True, sequece_lenth=sequenceLenth)

        initial_state = self.model.zero_state(self.batch_size, tf.float32)

        outputs, state = tf.nn.dynamic_rnn(self.model, input, initial_state=initial_state)  

        loss = self.loss_mean_squre(tf.reshape(outputs, [self.time_steps, self.batch_size])[-1], target)
        optimizer = tf.train.AdamOptimizer(learning_rate=learningRate)
        grads_and_vars = optimizer.compute_gradients(loss, self.model.variables)
        optimizer.apply_gradients(grads_and_vars)

        init_op = tf.global_variables_initializer()
        with tf.Session() as sess:

            for i in range(num_steps):
                sess.run(init_op)
                state2, current_loss= sess.run([state, loss])
                if i % step_per_periods == 0:
                    print("period " + str(int(i / step_per_periods)) + ":" + str(current_loss))
        return self.model, self.state


def processFeature(df):
    df = df.drop('class', 1)
    features = []

    for i in range(len(df["vecs"])):
        features.append(df["vecs"][i])

    aa = pd.Series(features).tolist()  # tramsform into list
    featuresList = []
    for i in features:
        p1 = []
        for k in i:
            p1.append(list(k))
        featuresList.append(p1)

    return featuresList


def processTargets(df):
    selected_features = df[
        "class"]
    processed_features = selected_features.copy()
    return tf.convert_to_tensor(processed_features.astype(float).tolist())


if __name__ == '__main__':
    dividNumber = 30
    """
    some code here to modify my data to input 
    
    it looks like this:
    inputs before use input function : [fullLenth, charactorLenth, embeddinglenth]
    """

    model = RnnModel(15, [100, 80, 80, 1], time_steps=dividNumber, num_features=25)
    model.train(5000, 0.0001, my_input_fn, training_examples, training_targets, sequenceLenth=trainSequenceL)

And error is under here

Traceback (most recent call last):
      File "D:\Anaconda3\envs\tensorflow-cpu\lib\site-packages\tensorflow\python\client\session.py", line 1330, in _do_call
        return fn(*args)
      File "D:\Anaconda3\envs\tensorflow-cpu\lib\site-packages\tensorflow\python\client\session.py", line 1315, in _run_fn
        options, feed_dict, fetch_list, target_list, run_metadata)
      File "D:\Anaconda3\envs\tensorflow-cpu\lib\site-packages\tensorflow\python\client\session.py", line 1423, in _call_tf_sessionrun
        status, run_metadata)
      File "D:\Anaconda3\envs\tensorflow-cpu\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 516, in __exit__
        c_api.TF_GetCode(self.status.status))
    tensorflow.python.framework.errors_impl.InvalidArgumentError: ConcatOp : Dimensions of inputs should match: shape[0] = [20,25] vs. shape[1] = [30,100]
         [[Node: rnn/while/rnn/multi_rnn_cell/cell_0/basic_lstm_cell/concat = ConcatV2[N=2, T=DT_FLOAT, Tidx=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"](rnn/while/TensorArrayReadV3, rnn/while/Switch_4:1, rnn/while/rnn/multi_rnn_cell/cell_3/basic_lstm_cell/Const)]]
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "D:/programming/mlwords/dnn_gragh.py", line 198, in <module>
        model.train(5000, 0.0001, my_input_fn, training_examples, training_targets, sequenceLenth=trainSequenceL)
      File "D:/programming/mlwords/dnn_gragh.py", line 124, in train
        state2, current_loss, nowAccuracy = sess.run([state, loss, accuracy])
      File "D:\Anaconda3\envs\tensorflow-cpu\lib\site-packages\tensorflow\python\client\session.py", line 908, in run
        run_metadata_ptr)
      File "D:\Anaconda3\envs\tensorflow-cpu\lib\site-packages\tensorflow\python\client\session.py", line 1143, in _run
        feed_dict_tensor, options, run_metadata)
      File "D:\Anaconda3\envs\tensorflow-cpu\lib\site-packages\tensorflow\python\client\session.py", line 1324, in _do_run
        run_metadata)
      File "D:\Anaconda3\envs\tensorflow-cpu\lib\site-packages\tensorflow\python\client\session.py", line 1343, in _do_call
        raise type(e)(node_def, op, message)
    tensorflow.python.framework.errors_impl.InvalidArgumentError: ConcatOp : Dimensions of inputs should match: shape[0] = [20,25] vs. shape[1] = [30,100]
         [[Node: rnn/while/rnn/multi_rnn_cell/cell_0/basic_lstm_cell/concat = ConcatV2[N=2, T=DT_FLOAT, Tidx=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"](rnn/while/TensorArrayReadV3, rnn/while/Switch_4:1, rnn/while/rnn/multi_rnn_cell/cell_3/basic_lstm_cell/Const)]]

Caused by op 'rnn/while/rnn/multi_rnn_cell/cell_0/basic_lstm_cell/concat', defined at:
  File "D:/programming/mlwords/dnn_gragh.py", line 198, in <module>
    model.train(5000, 0.0001, my_input_fn, training_examples, training_targets, sequenceLenth=trainSequenceL)
  File "D:/programming/mlwords/dnn_gragh.py", line 95, in train
    outputs, state = tf.nn.dynamic_rnn(self.model, input, initial_state=initial_state)#,sequence_length=sequence
  File "D:\Anaconda3\envs\tensorflow-cpu\lib\site-packages\tensorflow\python\ops\rnn.py", line 627, in dynamic_rnn
    dtype=dtype)
  File "D:\Anaconda3\envs\tensorflow-cpu\lib\site-packages\tensorflow\python\ops\rnn.py", line 824, in _dynamic_rnn_loop
    swap_memory=swap_memory)
  File "D:\Anaconda3\envs\tensorflow-cpu\lib\site-packages\tensorflow\python\ops\control_flow_ops.py", line 3205, in while_loop
    result = loop_context.BuildLoop(cond, body, loop_vars, shape_invariants)
  File "D:\Anaconda3\envs\tensorflow-cpu\lib\site-packages\tensorflow\python\ops\control_flow_ops.py", line 2943, in BuildLoop
    pred, body, original_loop_vars, loop_vars, shape_invariants)
  File "D:\Anaconda3\envs\tensorflow-cpu\lib\site-packages\tensorflow\python\ops\control_flow_ops.py", line 2880, in _BuildLoop
    body_result = body(*packed_vars_for_body)
  File "D:\Anaconda3\envs\tensorflow-cpu\lib\site-packages\tensorflow\python\ops\control_flow_ops.py", line 3181, in <lambda>
    body = lambda i, lv: (i + 1, orig_body(*lv))
  File "D:\Anaconda3\envs\tensorflow-cpu\lib\site-packages\tensorflow\python\ops\rnn.py", line 795, in _time_step
    (output, new_state) = call_cell()
  File "D:\Anaconda3\envs\tensorflow-cpu\lib\site-packages\tensorflow\python\ops\rnn.py", line 781, in <lambda>
    call_cell = lambda: cell(input_t, state)
  File "D:\Anaconda3\envs\tensorflow-cpu\lib\site-packages\tensorflow\python\ops\rnn_cell_impl.py", line 232, in __call__
    return super(RNNCell, self).__call__(inputs, state)
  File "D:\Anaconda3\envs\tensorflow-cpu\lib\site-packages\tensorflow\python\layers\base.py", line 714, in __call__
    outputs = self.call(inputs, *args, **kwargs)
  File "D:\Anaconda3\envs\tensorflow-cpu\lib\site-packages\tensorflow\python\ops\rnn_cell_impl.py", line 1283, in call
    cur_inp, new_state = cell(cur_inp, cur_state)
  File "D:\Anaconda3\envs\tensorflow-cpu\lib\site-packages\tensorflow\python\ops\rnn_cell_impl.py", line 339, in __call__
    *args, **kwargs)
  File "D:\Anaconda3\envs\tensorflow-cpu\lib\site-packages\tensorflow\python\layers\base.py", line 714, in __call__
    outputs = self.call(inputs, *args, **kwargs)
  File "D:\Anaconda3\envs\tensorflow-cpu\lib\site-packages\tensorflow\python\ops\rnn_cell_impl.py", line 620, in call
    array_ops.concat([inputs, h], 1), self._kernel)
  File "D:\Anaconda3\envs\tensorflow-cpu\lib\site-packages\tensorflow\python\ops\array_ops.py", line 1181, in concat
    return gen_array_ops.concat_v2(values=values, axis=axis, name=name)
  File "D:\Anaconda3\envs\tensorflow-cpu\lib\site-packages\tensorflow\python\ops\gen_array_ops.py", line 1101, in concat_v2
    "ConcatV2", values=values, axis=axis, name=name)
  File "D:\Anaconda3\envs\tensorflow-cpu\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "D:\Anaconda3\envs\tensorflow-cpu\lib\site-packages\tensorflow\python\framework\ops.py", line 3309, in create_op
    op_def=op_def)
  File "D:\Anaconda3\envs\tensorflow-cpu\lib\site-packages\tensorflow\python\framework\ops.py", line 1669, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): ConcatOp : Dimensions of inputs should match: shape[0] = [20,25] vs. shape[1] = [30,100]
     [[Node: rnn/while/rnn/multi_rnn_cell/cell_0/basic_lstm_cell/concat = ConcatV2[N=2, T=DT_FLOAT, Tidx=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"](rnn/while/TensorArrayReadV3, rnn/while/Switch_4:1, rnn/while/rnn/multi_rnn_cell/cell_3/basic_lstm_cell/Const)]]

this is my code used to check my input

def checkData(inputs, targets, sequencelence):
    batch_size = 20
    features, target, sequece = my_input_fn(inputs, targets, batch_size=batch_size, shuffle=True, num_epochs=None,
                                            sequece_lenth=sequencelence)
    with tf.Session() as sess:
        for i in range(1000):
            features1, target1, sequece1 = sess.run([features, target, sequece])
            assert len(features1) == batch_size
            for sentence in features1 :
                assert len(sentence) == 30
                for word in sentence:
                    assert len(word) == 25

            assert len(target1) == batch_size
            assert len(sequece1) == batch_size
            print(target1)
    print("OK")

DiIli about 6 years

At first , it can run without error popped up if the batch size is under a threshold like 50 .But after i add a function to calculate the accuracy, it cannot run even with small batch size. now i delete the code again.The code up there is the original type.

DiIli about 6 years

Thanks to reply. But when I check the input , it is exactly [batch_size, time_steps, embedding_lenth].The check code is attached up there.
iga almost 6 years

I don't see how you call this function, but here are some possible explanations. You call it once where there is still a full batch of data, so everything matches. It seems like in the code you use batch of 15, but in this test function it is 20. It might be that total number of examples is divisible by 20 but not by 15. Here is an issue discussing this and some ways of handling it - github.com/tensorflow/tensorflow/issues/13161
Admin over 2 years

Your answer could be improved with additional supporting information. Please edit to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers in the help center.