Adjustable learning rate for deep learning by TensorFlow

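With a bit of free time lately, I have been revisiting how to adjust the learning rate in TensorFlow. The learning rate has a considerable effect on network training. A large learning rate can be used at the start to help the network converge quickly. However, the network's gradient is nonlinear and tends to shrink as the number of iterations grows, so if the learning rate stays large, the objective ends up oscillating around the optimum, as shown in the figure below. At that point the learning rate has to be reduced in order to approach the convergence point.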

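Following the book TensorFlow: 实战Google深度学习框架, the learning rate of gradient descent can be set with exponential decay. TensorFlow ships this method as tf.train.exponential_decay, which implements

$L_{R} = L^{0}_{R} \cdot R_{d}^{S_g/S_d},$

where $L_R$ is the decayed learning rate, $L^{0}_{R}$ is the initial learning rate, $R_d$ is the decay rate, $S_g$ is the global step (the number of training steps taken so far), and $S_d$ is the decay period in steps. If $S_d$ is set to the number of batches per epoch, the learning rate is multiplied by $R_d$ once per epoch.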
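As a quick check of the formula, here is a minimal pure-Python sketch of the same computation. The function name `exponential_decay_value` is only illustrative and is not part of TensorFlow; the numbers (initial rate 0.1, decay rate 0.9, decay period 100) are the test values used later in this post.

```python
def exponential_decay_value(initial_lr, global_step, decay_steps, decay_rate):
    """Return initial_lr * decay_rate ** (global_step / decay_steps)."""
    return initial_lr * decay_rate ** (global_step / decay_steps)


# After each full decay period the rate is multiplied by decay_rate once:
# roughly 0.1 at step 0, 0.09 at step 100, 0.081 at step 200.
for step in (0, 100, 200):
    print(step, exponential_decay_value(0.1, step, 100, 0.9))
```

Between multiples of the decay period the exponent is fractional, so the rate shrinks a little at every step.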

In TensorFlow, the exponentially decayed learning rate can be produced with the following code.

import numpy as np
import tensorflow as tf

# TensorFlow 1.x API. Placeholders are used for values fed at run time.
init_lr = tf.placeholder(tf.float32, shape=[], name="LR")
global_step = tf.placeholder(tf.float32, shape=[], name="global_step")
decay_step = tf.placeholder(tf.float32, shape=[], name="decay_step")
decay_rate = tf.placeholder(tf.float32, shape=[], name="decay_rate")

# learning_rate = init_lr * decay_rate ** (global_step / decay_step)
learning_rate = tf.train.exponential_decay(
    learning_rate=init_lr,
    global_step=global_step,
    decay_steps=decay_step,
    decay_rate=decay_rate,
    staircase=False,
    name=None
)

# Test
lr_init = 0.1
epochs = 200
batches = 100.
d_rate = 0.9
epoch = np.arange(0, epochs, 1)
lr = np.zeros(epoch.shape)

# Start a session (no variables to initialize, everything is fed in)
sess = tf.InteractiveSession()
for i in epoch.astype(int):
    # The loop index plays the role of global_step here
    lr[i] = sess.run(learning_rate,
                     feed_dict={init_lr: lr_init,
                                global_step: i,
                                decay_step: batches,
                                decay_rate: d_rate})

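This is a small example; its output is shown in the figure below. The staircase argument of tf.train.exponential_decay controls whether the learning rate decays in discrete steps or smoothly.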
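To make the effect of staircase concrete, the sketch below reproduces both behaviours in plain Python. It assumes, as the TensorFlow documentation describes, that staircase=True turns global_step / decay_steps into an integer division. The helper `decayed_lr` is a hypothetical stand-in, not a TensorFlow function.

```python
import math


def decayed_lr(initial_lr, global_step, decay_steps, decay_rate, staircase=False):
    """Exponentially decayed learning rate, smooth or stepped."""
    exponent = global_step / decay_steps
    if staircase:
        # Integer division: the rate stays constant within each decay period.
        exponent = math.floor(exponent)
    return initial_lr * decay_rate ** exponent


steps = (0, 50, 100, 150)
smooth = [decayed_lr(0.1, s, 100, 0.9) for s in steps]
stepped = [decayed_lr(0.1, s, 100, 0.9, staircase=True) for s in steps]

print("smooth: ", smooth)   # strictly decreasing
print("stepped:", stepped)  # changes only at multiples of decay_steps
```

The two curves agree at multiples of the decay period and differ in between, where the stepped version holds its value.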

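When training a network and optimizing the objective function, the official documentation suggests the following statements for the gradient update.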

...
global_step = tf.Variable(0, trainable=False)
starter_learning_rate = 0.1
learning_rate = tf.train.exponential_decay(starter_learning_rate, global_step,
                                           100000, 0.96, staircase=True)
# Passing global_step to minimize() will increment it at each step.
learning_step = (
    tf.train.GradientDescentOptimizer(learning_rate)
    .minimize(...my loss..., global_step=global_step)
)
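The interplay between the step counter and the decayed rate can be seen without TensorFlow in a toy gradient descent loop. This is only a sketch under stated assumptions: the quadratic loss (w - 3)^2, the decay period of 50 steps, and the helper `stepped_lr` are all made up for illustration, and the manual increment of `global_step` stands in for what minimize() does when global_step is passed to it.

```python
def stepped_lr(initial_lr, global_step, decay_steps, decay_rate):
    """Staircase exponential decay: the rate drops once per decay period."""
    return initial_lr * decay_rate ** (global_step // decay_steps)


w = 0.0          # parameter of the toy problem
global_step = 0  # counts optimizer updates

for _ in range(200):
    lr = stepped_lr(0.1, global_step, 50, 0.96)
    grad = 2.0 * (w - 3.0)  # gradient of the loss (w - 3)^2
    w -= lr * grad          # plain gradient descent update
    global_step += 1        # one increment per update

print(w, global_step)
```

Because the counter advances on every update, the learning rate follows the schedule automatically and never has to be set by hand inside the loop.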

Reference

[1] exponential decay

Jason Ma