The auto-encoder has been widely applied for unsupervised learning, and is usually composed of two symmetric parts, namely an encoder and a decoder. It is easy to realize an autoencoder with only fully-connected layers, i.e., a DNN, but the convolutional case is less straightforward.

In the convolutional case, each layer in the decoder keeps the shape and kernel configuration of its symmetric layer in the encoder, so the deconvolution, or transposed convolution, operation is used in place of the convolution operation.

TensorFlow provides a convenient method named conv2d_transpose in both the tf.nn module and the tf.contrib.layers module. However, with tf.contrib.layers.conv2d_transpose, when the convolution stride is set to 2 and the desired output shape is odd, the output shape of the transposed convolution cannot be controlled to match the desired one.

For example, let $X$ be a [None, 9, 9, 1] 4D tensor. Convolving it with a kernel of size [3, 3], a stride of 2, and half padding (SAME) produces a [None, 5, 5, 1] output tensor $Y$. However, the transposed convolution of $Y$ with the same parameter settings generates $X'$ as a [None, 10, 10, 1] tensor, not [None, 9, 9, 1].
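The mismatch follows from the output-size formulas: with SAME padding, a convolution of spatial size n and stride s produces ceil(n / s), while the transposed convolution produces n * s, which can never be odd for stride 2. A minimal sketch of the arithmetic (plain Python, no TensorFlow required):

```python
import math

def conv_same_out(n, stride):
    # Output spatial size of a SAME-padded convolution.
    return math.ceil(n / stride)

def conv_transpose_same_out(n, stride):
    # Output spatial size of a SAME-padded transposed convolution.
    return n * stride

encoded = conv_same_out(9, 2)                   # 9 -> 5
decoded = conv_transpose_same_out(encoded, 2)   # 5 -> 10, not 9
```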

To handle this, I provide a naive but effective workaround, shown as follows:

import tensorflow as tf
import tensorflow.contrib.layers as layers

x = tf.placeholder(tf.float32, shape=[None, 5, 5, 1])
y = tf.placeholder(tf.float32, shape=[None, 9, 9, 1])
kernel_size = [3, 3]
stride = 2
# SAME padding with stride 2 upsamples 5x5 to 10x10.
x_r = layers.conv2d_transpose(
    inputs=x,
    num_outputs=x.get_shape().as_list()[-1],  # channel count, here 1
    kernel_size=kernel_size,
    padding='SAME',
    stride=stride,
    scope='conv2d_transpose'
)
# Crop the extra row and column to recover the desired 9x9 shape.
x_r = x_r[:, 0:-1, 0:-1, :]

The above solution has worked well in my code, though the crop may introduce a bias.
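If the crop is needed in several places, it can be wrapped in a small helper that trims a transposed-convolution output to an arbitrary target spatial size. This is my own sketch (the name crop_to_shape is not a TensorFlow API); the same slicing works on both NumPy arrays and TensorFlow tensors in NHWC layout:

```python
import numpy as np

def crop_to_shape(tensor, target_h, target_w):
    # Keep only the top-left target_h x target_w spatial region
    # of an NHWC tensor; identical slicing works in NumPy and TensorFlow.
    return tensor[:, :target_h, :target_w, :]

upsampled = np.zeros((4, 10, 10, 1))   # e.g. a conv2d_transpose output
cropped = crop_to_shape(upsampled, 9, 9)
print(cropped.shape)                    # (4, 9, 9, 1)
```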