Tensorflow 3 channel order of color inputs
TL;DR: With your current program, the in-memory layout of the data should be R-G-B-R-G-B-R-G-B-R-G-B...
I assume from this line that you are passing in RGB images with 28x28 pixels:
self.x_image = tf.reshape(self.c_x, [-1, 28, 28, 3])
We can call the dimensions of self.x_image "batch", "height", "width", and "channel". This matches the default data format for tf.nn.conv2d() and tf.nn.max_pool().
In TensorFlow, the in-memory representation of a tensor is row-major order (or "C" ordering, because that is the representation of arrays in the C programming language). Essentially this means that the rightmost dimension is the fastest changing, and the elements of the tensor are packed together in memory in the following order (where ? stands for the unknown batch size, minus 1):
[0, 0, 0, 0]
[0, 0, 0, 1]
[0, 0, 0, 2]
[0, 0, 1, 0]
...
[?, 27, 27, 1]
[?, 27, 27, 2]
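To make the ordering above concrete, here is a minimal sketch in plain Python of how a 4-D index maps to a flat position under row-major layout. The helper name nhwc_offset is hypothetical (it is not a TensorFlow function), and the 28x28x3 shape is taken from the reshape call above.

```python
def nhwc_offset(b, h, w, c, height=28, width=28, channels=3):
    """Flat row-major offset of element [b, h, w, c] in an NHWC tensor.

    Hypothetical helper for illustration only; not part of TensorFlow.
    """
    return ((b * height + h) * width + w) * channels + c


# The first few indices from the listing above: the channel index moves fastest.
first_indices = [(0, 0, 0, 0), (0, 0, 0, 1), (0, 0, 0, 2), (0, 0, 1, 0)]
offsets = [nhwc_offset(*idx) for idx in first_indices]
print(offsets)  # [0, 1, 2, 3]

# Last element of the first image: 28 * 28 * 3 - 1.
print(nhwc_offset(0, 27, 27, 2))  # 2351
```

In other words, three consecutive values in memory are read as the R, G, and B of a single pixel, which is not what an rrrgggbbb buffer contains.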
Therefore your program probably isn't interpreting the image data correctly. There are at least two options:
- Reshape your data to match its true order ("batch", "channels", "height", "width"):
  self.x_image = tf.reshape(self.c_x, [-1, 3, 28, 28])
  In fact, this format is sometimes more efficient for convolutions. You can instruct tf.nn.conv2d() and tf.nn.max_pool() to use it without transposing by passing the optional argument data_format="NCHW", but you will also need to change the shape of your bias variables to match.
- Transpose your image data to match the result of your program using tf.transpose():
  self.x_image = tf.transpose(tf.reshape(self.c_x, [-1, 3, 28, 28]), [0, 2, 3, 1])
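The effect of the two reshapes can be checked on a small synthetic image. The sketch below uses NumPy as a stand-in for tf.reshape() and tf.transpose(), on the assumption that both follow the same row-major semantics for these calls. The test image and all variable names are made up for illustration.

```python
import numpy as np

H, W, C = 28, 28, 3

# Hypothetical test image in HWC form: every pixel is (R, G, B) = (0, 1, 2).
image_hwc = np.zeros((H, W, C), dtype=np.float32)
image_hwc[..., 1] = 1.0
image_hwc[..., 2] = 2.0

# Flatten it the way the question describes: rrr...ggg...bbb...
planar_flat = image_hwc.transpose(2, 0, 1).reshape(1, -1)

# Reshaping planar data straight to NHWC mixes the channels.
wrong = planar_flat.reshape(-1, H, W, C)

# Second option: reshape to NCHW first, then transpose to NHWC.
right = planar_flat.reshape(-1, C, H, W).transpose(0, 2, 3, 1)

print(np.array_equal(right[0], image_hwc))  # True
print(np.array_equal(wrong[0], image_hwc))  # False
print(wrong[0, 0, 0])  # three red values, not one RGB pixel
```

The direct reshape still produces a tensor of the right shape, which is why the original program runs without error even though the pixels are scrambled.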
D Liebman
Updated on June 04, 2022
Comments
-
D Liebman about 2 years
I'm using TensorFlow to process color images with a convolutional neural network. A code snippet is below.
My code runs, so I think I got the number of channels right. My question is, how do I correctly order the RGB data? Is it in the form rgbrgbrgb or would it be rrrgggbbb? Presently I am using the latter. Thanks. Any help would be appreciated.
c_output = 2
c_input = 784 * 3

def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

self.c_x = tf.placeholder(tf.float32, shape=[None, c_input])
self.c_y_ = tf.placeholder(tf.float32, shape=[None, c_output])

self.W_conv1 = weight_variable([5, 5, 3, 32])
self.b_conv1 = bias_variable([32])
self.x_image = tf.reshape(self.c_x, [-1, 28, 28, 3])
self.h_conv1 = tf.nn.relu(conv2d(self.x_image, self.W_conv1) + self.b_conv1)
self.h_pool1 = max_pool_2x2(self.h_conv1)

self.W_conv2 = weight_variable([5, 5, 32, 64])
self.b_conv2 = bias_variable([64])
self.h_conv2 = tf.nn.relu(conv2d(self.h_pool1, self.W_conv2) + self.b_conv2)
self.h_pool2 = max_pool_2x2(self.h_conv2)

self.W_fc1 = weight_variable([7 * 7 * 64, 1024])
self.b_fc1 = bias_variable([1024])
self.h_pool2_flat = tf.reshape(self.h_pool2, [-1, 7 * 7 * 64])
self.h_fc1 = tf.nn.relu(tf.matmul(self.h_pool2_flat, self.W_fc1) + self.b_fc1)

self.keep_prob = tf.placeholder(tf.float32)
self.h_fc1_drop = tf.nn.dropout(self.h_fc1, self.keep_prob)

self.W_fc2 = weight_variable([1024, c_output])
self.b_fc2 = bias_variable([c_output])
self.y_conv = tf.matmul(self.h_fc1_drop, self.W_fc2) + self.b_fc2

self.c_cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(self.y_conv, self.c_y_))
self.c_train_step = tf.train.AdamOptimizer(1e-4).minimize(self.c_cross_entropy)
self.c_correct_prediction = tf.equal(tf.argmax(self.y_conv, 1), tf.argmax(self.c_y_, 1))
self.c_accuracy = tf.reduce_mean(tf.cast(self.c_correct_prediction, tf.float32))
-
jbm over 6 years
I'm also trying to format some (non-image) data for training using an image-based architecture. You mention "batch", "height", "width", "channel", but I'm a bit confused about what these dimensions contain. My guess would be <batch_number>, <pixel_x_value>, <pixel_y_value>, <channel_0_value>, <batch_number>, <pixel_x_value>, <pixel_y_value>, <channel_1_value>, <batch_number>, <pixel_x_value>, <pixel_y_value>, <channel_2_value>, etc. Is that correct?
-
mrry over 6 years
Almost: element [i, j, k, l] in a 4-D tensor in NHWC format is the pixel for batch element i, y-coordinate j, x-coordinate k, and channel l.
-
jbm over 6 years
Oh, of course... Just got my coordinates reversed. Thanks!
-
jbm over 6 years
One more question on this: My data won't explicitly represent every pixel, so should I pad out the empty pixels? Or is there a library function to pad/zero pixels that haven't been given a value?
-
mrry over 6 years
Maybe tf.image.resize_image_with_crop_or_pad() would work for you?
-
jbm over 6 years
Thanks, I'll give that a look.