Today, let’s talk about transfer learning with TensorFlow. Firstly, what is transfer learning? It is a strategy for building your own deep-learning project on top of existing well-trained networks, so as to avoid problems such as overfitting and long training times.

In our work, we are trying to classify astronomical images with convolutional neural networks (CNNs). As we know, a CNN needs a very large number of samples to reach good values for its weights and biases. However, we only have thousands of labelled samples, which is not enough to exploit the full capacity of the network. For that reason, we propose to train the network by transfer learning.

Here we come to the main body, i.e. how to realize transfer learning on your computer? In this blog, I’m going to show you some tricks to do this with TensorFlow, a famous deep-learning framework based on Python.

1. How to save and restore the net?

In order to realize transfer learning, your script should be able to save and restore the network. Below is a code example of these two processes.

import tensorflow as tf

# Save the session as a checkpoint
# [Variable and model creation goes here.]
sess = tf.InteractiveSession()   # instance a session
saver = tf.train.Saver()         # instance a saver
saver.save(sess, savepath)       # save the session, i.e. the network

# Restore
# [Note] The saver should be instanced after the graph is defined.
graph = tf.Graph()
with graph.as_default():
    # [Variable and model creation goes here.]
    saver = tf.train.Saver()
sess = tf.InteractiveSession(graph=graph)
saver.restore(sess, netpath)

It should be noted that the instancing of the saver should come after the definition of the network. I also find that although the restored session sess contains the graph of the network, including tensors, variables, and operations, the variables still need to be initialized when training after restoration.
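
If you want to check this yourself, tf.report_uninitialized_variables() tells you exactly which variables of the restored session still lack values. The snippet below is only a small sketch (assuming sess is the restored session from above): it initializes just the variables that were not restored, so any values the saver did load are kept.

# Sketch: initialize only the variables that the restore left uninitialized,
# so values loaded from the checkpoint are not overwritten.
uninit_names = set(sess.run(tf.report_uninitialized_variables()))
uninit_vars = [v for v in tf.global_variables()
               if v.name.split(':')[0].encode() in uninit_names]
sess.run(tf.variables_initializer(uninit_vars))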

2. How to realize transfer learning?

Here we come to the transfer learning. A typical process is that we keep the parameters of the convolutional layers (ConvLayers), and replace the fully connected and output layers with new weights and biases. This is because the ConvLayers are the part that extracts features from the samples, while the fully connected layers compose the classifier itself. Our goal is to keep the feature-representation part and rebuild the classifier according to our project.
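
As a practical tip (not part of the original recipe): if you do not remember the exact tensor names in the pretrained graph, you can list the operations of the restored graph and pick out the output of the last convolutional layer.

# List all operation names in the restored graph to find the output of the
# last convolutional layer (e.g. something like 'ConvLayer_output').
for op in sess.graph.get_operations():
    print(op.name)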

Suppose the last output of the ConvLayers is a tensor named ConvLayer_output, and suppose a session named sess has been restored. Then we can build our own classifier as follows,

# Get tensor 'ConvLayer_output' from the restored graph
# (a tensor name needs the output index, hence the ':0')
l_conv_output = sess.graph.get_tensor_by_name("ConvLayer_output:0")

# Add new fully connected layers and a softmax layer
# Suppose we have 10 classes to be classified
numclass = 10
y_ = tf.placeholder(tf.float32, shape=[None, numclass], name="cnn-softmax")

# Helper functions for the new weights and biases
def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

# Flatten the conv output so it can be fed to a fully connected layer
last_conv_shape = l_conv_output.get_shape().as_list()
input_shape = last_conv_shape[1] * last_conv_shape[2] * last_conv_shape[3]
l_conv_flat = tf.reshape(l_conv_output, [-1, input_shape])

# Add a fully connected layer as an example
output_shape = 1024
W_fc = weight_variable(shape=[input_shape, output_shape])
b_fc = bias_variable(shape=[output_shape])
l_fc = tf.nn.relu(tf.matmul(l_conv_flat, W_fc) + b_fc)

# Fully connected to softmax
input_shape = 1024
output_shape = numclass
W_soft = weight_variable(shape=[input_shape, output_shape])
b_soft = bias_variable(shape=[output_shape])
l_y = tf.nn.softmax(tf.matmul(l_fc, W_soft) + b_soft)

After that, our new network can be trained on our samples,

# Initialize all the parameters, including the pretrained net and the newly added layers
init_op = tf.global_variables_initializer()
sess.run(init_op)
# [Training lines go here]
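
To make the "[Training lines]" placeholder a bit more concrete, here is a minimal sketch of one possible loss and training loop. The input tensor name 'input:0', the next_batch() helper, and the learning rate are my own assumptions rather than part of the original network; the optional var_list argument restricts the update to the newly added layers so the restored ConvLayer weights are left untouched (drop it to fine-tune the whole network).

# A hedged sketch of the training lines. 'input:0' and next_batch() are
# hypothetical names: substitute the real input tensor of your pretrained
# graph and your own batch-loading code.
x = sess.graph.get_tensor_by_name("input:0")

# Cross-entropy loss on the softmax output l_y defined above
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(l_y + 1e-10), axis=1))

# Update only the newly added variables so the pretrained ConvLayers stay fixed
new_vars = [W_fc, b_fc, W_soft, b_soft]
train_step = tf.train.GradientDescentOptimizer(1e-3).minimize(cross_entropy, var_list=new_vars)

for step in range(1000):
    batch_x, batch_y = next_batch()   # [your own batching function]
    sess.run(train_step, feed_dict={x: batch_x, y_: batch_y})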

Finally, we obtain the network, and evaluations can be conducted. Enjoy yourselves.
