InvalidArgumentError : ConcatOp : Dimensions of inputs should match

The error comes from the LSTM cell's call method (BasicLSTMCell.call in your traceback). There, the cell runs tf.concat([inputs, h], 1), that is, it concatenates the next input with the current hidden state along axis 1 before matmul'ing the result with the kernel variable matrix. The error says this cannot be done because the batch (0th) dimensions do not match: your input is shaped [20,25] and your hidden state is shaped [30,100].
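
To make the shape rule concrete, here is a minimal pure-Python sketch of the check that concatenation along axis 1 has to pass. The helper name concat_axis1_shape is made up for illustration and is not a TensorFlow function; it only mimics the rule that all axes other than the concatenation axis must agree.

```python
def concat_axis1_shape(input_shape, hidden_shape):
    """Return the shape of concat([inputs, h], axis=1) for two 2-D shapes.

    Concatenating along axis 1 places columns side by side, so the other
    axis (axis 0, the batch axis) must be identical in both operands.
    """
    batch_in, features = input_shape
    batch_h, units = hidden_shape
    if batch_in != batch_h:
        raise ValueError(
            "Dimensions of inputs should match: shape[0] = %s vs. shape[1] = %s"
            % (list(input_shape), list(hidden_shape))
        )
    return (batch_in, features + units)


# A full batch: input [30, 25] and hidden state [30, 100] concatenate fine.
full = concat_axis1_shape((30, 25), (30, 100))

# The failing step from the traceback: input [20, 25], hidden state [30, 100].
try:
    concat_axis1_shape((20, 25), (30, 100))
    failed = False
    message = ""
except ValueError as exc:
    failed = True
    message = str(exc)
```

With matching batch sizes the result has 25 + 100 columns; with 20 versus 30 rows there is no valid result, which is what the ConcatOp error reports.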

For some reason, on your 32nd iteration (or whenever you see the error), the input is batched to only 20 examples instead of 30. This usually happens at the end of the training data, when the total number of training examples is not evenly divisible by the batch size, so the last batch is smaller than the rest. This hypothesis is also consistent with your observation that the code seems to run longer with a smaller batch.
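
As a rough illustration of how a short final batch arises, the sketch below lists the batch sizes produced by one pass over the data. The function batch_sizes and the count of 950 examples are assumptions made up for this example (the real dataset size is not given in the question). In TensorFlow itself, newer releases let you pass drop_remainder=True to Dataset.batch, and I believe the 1.x contrib module offered tf.contrib.data.batch_and_drop_remainder for the same purpose; check the documentation for your version.

```python
def batch_sizes(num_examples, batch_size, drop_remainder=False):
    """List the size of each batch in one pass over the data."""
    full_batches, leftover = divmod(num_examples, batch_size)
    sizes = [batch_size] * full_batches
    # The leftover examples form a final, smaller batch unless dropped.
    if leftover and not drop_remainder:
        sizes.append(leftover)
    return sizes


# Hypothetical example count: 950 = 31 * 30 + 20.
num_examples = 950

sizes = batch_sizes(num_examples, 30)
sizes_dropped = batch_sizes(num_examples, 30, drop_remainder=True)
```

In this made-up case the first 31 batches hold 30 examples and the 32nd holds only 20, while dropping the remainder leaves 31 full batches, so a state built for a fixed batch size always matches the input.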

Author by

DiIli

Updated on June 25, 2022

Comments

  • DiIli
    DiIli almost 2 years

    I am using TensorFlow 1.7 with dynamic_rnn. The code runs fine at first, but at the 32nd step (the exact step changes from run to run) the error appears. When I used a smaller batch, the code seemed to run longer, but the error still popped up. I just can't figure out what's wrong.

    from mapping import *
    
    
    def my_input_fn(features, targets, batch_size=20, shuffle=True, num_epochs=None, sequece_lenth=None):
        ds = tf.data.Dataset.from_tensor_slices(
            (features, targets, sequece_lenth))  # warning: 2GB limit
        ds = ds.batch(batch_size).repeat(num_epochs)
    
        if shuffle:
            ds = ds.shuffle(10000)
        features, labels, sequence = ds.make_one_shot_iterator().get_next()
        return features, labels, sequence
    
    
    def lstm_cell(lstm_size=50):
        return tf.contrib.rnn.BasicLSTMCell(lstm_size)
    
    
    class RnnModel:
        def __init__(self,
                     batch_size,
                     hidden_units,
                     time_steps,
                     num_features
                     ):
            self.batch_size = batch_size
            self.hidden_units = hidden_units
            stacked_lstm = tf.contrib.rnn.MultiRNNCell(
                [lstm_cell(i) for i in self.hidden_units])
            self.initial_state = stacked_lstm.zero_state(batch_size, tf.float32)
            self.model = stacked_lstm
            self.state = self.initial_state
            self.time_steps = time_steps
            self.num_features = num_features
    
        def loss_mean_squre(self, outputs, targets):
            pos = tf.add(outputs, tf.ones(self.batch_size))
            eve = tf.div(pos, 2)
            error = tf.subtract(eve,
                                targets)
            return tf.reduce_mean(tf.square(error))
    
        def train(self,
                  num_steps,
                  learningRate,
                  input_fn,
                  inputs,
                  targets,
                  sequenceLenth):
    
            periods = 10
            step_per_periods = int(num_steps / periods)
    
            input, target, sequence = input_fn(inputs, targets, self.batch_size, shuffle=True, sequece_lenth=sequenceLenth)
    
            initial_state = self.model.zero_state(self.batch_size, tf.float32)
    
            outputs, state = tf.nn.dynamic_rnn(self.model, input, initial_state=initial_state)  
    
            loss = self.loss_mean_squre(tf.reshape(outputs, [self.time_steps, self.batch_size])[-1], target)
            optimizer = tf.train.AdamOptimizer(learning_rate=learningRate)
            grads_and_vars = optimizer.compute_gradients(loss, self.model.variables)
            optimizer.apply_gradients(grads_and_vars)
    
            init_op = tf.global_variables_initializer()
            with tf.Session() as sess:
    
                for i in range(num_steps):
                    sess.run(init_op)
                    state2, current_loss= sess.run([state, loss])
                    if i % step_per_periods == 0:
                        print("period " + str(int(i / step_per_periods)) + ":" + str(current_loss))
            return self.model, self.state
    
    
    def processFeature(df):
        df = df.drop('class', 1)
        features = []
    
        for i in range(len(df["vecs"])):
            features.append(df["vecs"][i])
    
        aa = pd.Series(features).tolist()  # transform into list
        featuresList = []
        for i in features:
            p1 = []
            for k in i:
                p1.append(list(k))
            featuresList.append(p1)
    
        return featuresList
    
    
    def processTargets(df):
        selected_features = df[
            "class"]
        processed_features = selected_features.copy()
        return tf.convert_to_tensor(processed_features.astype(float).tolist())
    
    
    if __name__ == '__main__':
        dividNumber = 30
        """
        some code here to modify my data to input 
        
        it looks like this:
        inputs before use input function : [fullLenth, charactorLenth, embeddinglenth]
        """
    
        model = RnnModel(15, [100, 80, 80, 1], time_steps=dividNumber, num_features=25)
        model.train(5000, 0.0001, my_input_fn, training_examples, training_targets, sequenceLenth=trainSequenceL)
    

    The error is shown below:

    Traceback (most recent call last):
          File "D:\Anaconda3\envs\tensorflow-cpu\lib\site-packages\tensorflow\python\client\session.py", line 1330, in _do_call
            return fn(*args)
          File "D:\Anaconda3\envs\tensorflow-cpu\lib\site-packages\tensorflow\python\client\session.py", line 1315, in _run_fn
            options, feed_dict, fetch_list, target_list, run_metadata)
          File "D:\Anaconda3\envs\tensorflow-cpu\lib\site-packages\tensorflow\python\client\session.py", line 1423, in _call_tf_sessionrun
            status, run_metadata)
          File "D:\Anaconda3\envs\tensorflow-cpu\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 516, in __exit__
            c_api.TF_GetCode(self.status.status))
        tensorflow.python.framework.errors_impl.InvalidArgumentError: ConcatOp : Dimensions of inputs should match: shape[0] = [20,25] vs. shape[1] = [30,100]
             [[Node: rnn/while/rnn/multi_rnn_cell/cell_0/basic_lstm_cell/concat = ConcatV2[N=2, T=DT_FLOAT, Tidx=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"](rnn/while/TensorArrayReadV3, rnn/while/Switch_4:1, rnn/while/rnn/multi_rnn_cell/cell_3/basic_lstm_cell/Const)]]
        
        During handling of the above exception, another exception occurred:
        
        Traceback (most recent call last):
          File "D:/programming/mlwords/dnn_gragh.py", line 198, in <module>
            model.train(5000, 0.0001, my_input_fn, training_examples, training_targets, sequenceLenth=trainSequenceL)
          File "D:/programming/mlwords/dnn_gragh.py", line 124, in train
            state2, current_loss, nowAccuracy = sess.run([state, loss, accuracy])
          File "D:\Anaconda3\envs\tensorflow-cpu\lib\site-packages\tensorflow\python\client\session.py", line 908, in run
            run_metadata_ptr)
          File "D:\Anaconda3\envs\tensorflow-cpu\lib\site-packages\tensorflow\python\client\session.py", line 1143, in _run
            feed_dict_tensor, options, run_metadata)
          File "D:\Anaconda3\envs\tensorflow-cpu\lib\site-packages\tensorflow\python\client\session.py", line 1324, in _do_run
            run_metadata)
          File "D:\Anaconda3\envs\tensorflow-cpu\lib\site-packages\tensorflow\python\client\session.py", line 1343, in _do_call
            raise type(e)(node_def, op, message)
        tensorflow.python.framework.errors_impl.InvalidArgumentError: ConcatOp : Dimensions of inputs should match: shape[0] = [20,25] vs. shape[1] = [30,100]
             [[Node: rnn/while/rnn/multi_rnn_cell/cell_0/basic_lstm_cell/concat = ConcatV2[N=2, T=DT_FLOAT, Tidx=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"](rnn/while/TensorArrayReadV3, rnn/while/Switch_4:1, rnn/while/rnn/multi_rnn_cell/cell_3/basic_lstm_cell/Const)]]
    
    Caused by op 'rnn/while/rnn/multi_rnn_cell/cell_0/basic_lstm_cell/concat', defined at:
      File "D:/programming/mlwords/dnn_gragh.py", line 198, in <module>
        model.train(5000, 0.0001, my_input_fn, training_examples, training_targets, sequenceLenth=trainSequenceL)
      File "D:/programming/mlwords/dnn_gragh.py", line 95, in train
        outputs, state = tf.nn.dynamic_rnn(self.model, input, initial_state=initial_state)#,sequence_length=sequence
      File "D:\Anaconda3\envs\tensorflow-cpu\lib\site-packages\tensorflow\python\ops\rnn.py", line 627, in dynamic_rnn
        dtype=dtype)
      File "D:\Anaconda3\envs\tensorflow-cpu\lib\site-packages\tensorflow\python\ops\rnn.py", line 824, in _dynamic_rnn_loop
        swap_memory=swap_memory)
      File "D:\Anaconda3\envs\tensorflow-cpu\lib\site-packages\tensorflow\python\ops\control_flow_ops.py", line 3205, in while_loop
        result = loop_context.BuildLoop(cond, body, loop_vars, shape_invariants)
      File "D:\Anaconda3\envs\tensorflow-cpu\lib\site-packages\tensorflow\python\ops\control_flow_ops.py", line 2943, in BuildLoop
        pred, body, original_loop_vars, loop_vars, shape_invariants)
      File "D:\Anaconda3\envs\tensorflow-cpu\lib\site-packages\tensorflow\python\ops\control_flow_ops.py", line 2880, in _BuildLoop
        body_result = body(*packed_vars_for_body)
      File "D:\Anaconda3\envs\tensorflow-cpu\lib\site-packages\tensorflow\python\ops\control_flow_ops.py", line 3181, in <lambda>
        body = lambda i, lv: (i + 1, orig_body(*lv))
      File "D:\Anaconda3\envs\tensorflow-cpu\lib\site-packages\tensorflow\python\ops\rnn.py", line 795, in _time_step
        (output, new_state) = call_cell()
      File "D:\Anaconda3\envs\tensorflow-cpu\lib\site-packages\tensorflow\python\ops\rnn.py", line 781, in <lambda>
        call_cell = lambda: cell(input_t, state)
      File "D:\Anaconda3\envs\tensorflow-cpu\lib\site-packages\tensorflow\python\ops\rnn_cell_impl.py", line 232, in __call__
        return super(RNNCell, self).__call__(inputs, state)
      File "D:\Anaconda3\envs\tensorflow-cpu\lib\site-packages\tensorflow\python\layers\base.py", line 714, in __call__
        outputs = self.call(inputs, *args, **kwargs)
      File "D:\Anaconda3\envs\tensorflow-cpu\lib\site-packages\tensorflow\python\ops\rnn_cell_impl.py", line 1283, in call
        cur_inp, new_state = cell(cur_inp, cur_state)
      File "D:\Anaconda3\envs\tensorflow-cpu\lib\site-packages\tensorflow\python\ops\rnn_cell_impl.py", line 339, in __call__
        *args, **kwargs)
      File "D:\Anaconda3\envs\tensorflow-cpu\lib\site-packages\tensorflow\python\layers\base.py", line 714, in __call__
        outputs = self.call(inputs, *args, **kwargs)
      File "D:\Anaconda3\envs\tensorflow-cpu\lib\site-packages\tensorflow\python\ops\rnn_cell_impl.py", line 620, in call
        array_ops.concat([inputs, h], 1), self._kernel)
      File "D:\Anaconda3\envs\tensorflow-cpu\lib\site-packages\tensorflow\python\ops\array_ops.py", line 1181, in concat
        return gen_array_ops.concat_v2(values=values, axis=axis, name=name)
      File "D:\Anaconda3\envs\tensorflow-cpu\lib\site-packages\tensorflow\python\ops\gen_array_ops.py", line 1101, in concat_v2
        "ConcatV2", values=values, axis=axis, name=name)
      File "D:\Anaconda3\envs\tensorflow-cpu\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
        op_def=op_def)
      File "D:\Anaconda3\envs\tensorflow-cpu\lib\site-packages\tensorflow\python\framework\ops.py", line 3309, in create_op
        op_def=op_def)
      File "D:\Anaconda3\envs\tensorflow-cpu\lib\site-packages\tensorflow\python\framework\ops.py", line 1669, in __init__
        self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access
    
    InvalidArgumentError (see above for traceback): ConcatOp : Dimensions of inputs should match: shape[0] = [20,25] vs. shape[1] = [30,100]
         [[Node: rnn/while/rnn/multi_rnn_cell/cell_0/basic_lstm_cell/concat = ConcatV2[N=2, T=DT_FLOAT, Tidx=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"](rnn/while/TensorArrayReadV3, rnn/while/Switch_4:1, rnn/while/rnn/multi_rnn_cell/cell_3/basic_lstm_cell/Const)]]
    

    This is the code I used to check my input:

    def checkData(inputs, targets, sequencelence):
        batch_size = 20
        features, target, sequece = my_input_fn(inputs, targets, batch_size=batch_size, shuffle=True, num_epochs=None,
                                                sequece_lenth=sequencelence)
        with tf.Session() as sess:
            for i in range(1000):
                features1, target1, sequece1 = sess.run([features, target, sequece])
                assert len(features1) == batch_size
                for sentence in features1 :
                    assert len(sentence) == 30
                    for word in sentence:
                        assert len(word) == 25
    
                assert len(target1) == batch_size
                assert len(sequece1) == batch_size
                print(target1)
        print("OK")
    
    • DiIli
      DiIli about 6 years
      At first, the code ran without the error as long as the batch size was under some threshold, such as 50. But after I added a function to calculate the accuracy, it would not run even with a small batch size. I have since deleted that function again; the code above is the original version.
  • DiIli
    DiIli about 6 years
    Thanks for the reply. But when I check the input, it is exactly [batch_size, time_steps, embedding_lenth]. The check code is attached above.
  • iga
    iga almost 6 years
    I don't see how you call this function, but here are some possible explanations. You may be calling it once, while there is still a full batch of data, so everything matches. It also seems that your training code uses a batch size of 15, while this test function uses 20. It might be that the total number of examples is divisible by 20 but not by 15. Here is an issue discussing this and some ways of handling it: github.com/tensorflow/tensorflow/issues/13161
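
To illustrate the 20-versus-15 point from the comments, here is a small pure-Python simulation. It uses a made-up count of 100 examples and a hypothetical batch helper; it does not use tf.data, it only mimics the order of operations. Batching before repeating, as in ds.batch(batch_size).repeat(num_epochs), ends every epoch with a short batch whenever the count is not divisible by the batch size. Repeating before batching lets batches run across the epoch boundary, so with an endless repeat every batch is full. This is only one possible way of handling the problem.

```python
import itertools


def batch(iterable, batch_size):
    """Group an iterable into lists of batch_size; the last one may be short."""
    iterator = iter(iterable)
    while True:
        chunk = list(itertools.islice(iterator, batch_size))
        if not chunk:
            return
        yield chunk


examples = list(range(100))  # hypothetical: divisible by 20 but not by 15
steps = 10


def batch_then_repeat_sizes(batch_size):
    # batch -> repeat: the same per-epoch batches are replayed forever.
    batches = itertools.cycle(list(batch(examples, batch_size)))
    return [len(b) for b in itertools.islice(batches, steps)]


def repeat_then_batch_sizes(batch_size):
    # repeat -> batch: batches are cut from one endless stream of examples.
    batches = batch(itertools.cycle(examples), batch_size)
    return [len(b) for b in itertools.islice(batches, steps)]


check_sizes = batch_then_repeat_sizes(20)   # like the check function
train_sizes = batch_then_repeat_sizes(15)   # like the training call
fixed_sizes = repeat_then_batch_sizes(15)   # repeat first, then batch
```

In this sketch the check with batch size 20 never sees a short batch, the training order with batch size 15 produces a batch of 10 at the end of each epoch, and repeating first yields only full batches of 15.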