The auto-encoder has been widely applied to unsupervised learning. It is usually composed of two symmetric parts, namely the encoder and the decoder. Realizing an autoencoder with only fully-connected layers, i.e., a DNN, is easy, but the convolutional case is less obvious.
In the convolutional case, each layer of the decoder mirrors the shape and kernel configuration of its symmetric layer in the encoder, so the deconvolution, or transpose convolution, operation is used in place of the convolution operation.
TensorFlow provides a convenient method named conv2d_transpose in both the tf.nn module and the tf.contrib.layers module. However, with tf.contrib.layers.conv2d_transpose, when the convolution stride is 2 and the desired output shape is odd, the output shape of the transpose convolution cannot be forced to the desired one.
For example, let $x$ denote a 4-D tensor of shape [None, 9, 9, 1]. Convolving it with a [3, 3] kernel, stride 2, and half (SAME) padding gives a 4-D tensor $y$ of shape [None, 5, 5, 1]. However, the transpose convolution of $y$ with the same parameter settings produces $x'$ of shape [None, 10, 10, 1], not [None, 9, 9, 1].
To handle this, I use a naive but effective workaround: let the transpose convolution produce the even-sized output and then crop the extra row and column, as follows.
```python
import tensorflow as tf
import tensorflow.contrib.layers as layers

# x: the encoded (down-sampled) feature map; y: the desired reconstruction target
x = tf.placeholder(tf.float32, shape=[None, 5, 5, 1])
y = tf.placeholder(tf.float32, shape=[None, 9, 9, 1])
kernel_size = [3, 3]
stride = 2

# With stride 2 and SAME padding this produces a [None, 10, 10, 1] tensor
x_r = layers.conv2d_transpose(
    inputs=x,
    num_outputs=y.get_shape().as_list()[-1],  # number of output channels (1 here)
    kernel_size=kernel_size,
    padding='SAME',
    stride=stride,
    scope='conv2d_transpose'
)
# Crop the extra row and column to recover the desired [None, 9, 9, 1] shape
x_r = x_r[:, 0:-1, 0:-1, :]
```
The above solution works well in my code, though the crop may introduce some bias.
A convolutional auto-encoder is usually composed of two symmetric parts, i.e., the encoder and the decoder. With TensorFlow, it is easy to build the encoder using modules such as tf.contrib.layers or tf.nn, which encapsulate methods for convolution, downsampling, and dense operations.
However, for the decoder, TF does not provide an upsampling method, i.e., the reverse of downsampling (avg_pool2d, max_pool2d). This is probably because max pooling is used more often than average pooling, and recovering an image from a max-pooled matrix is difficult since the locations of the max points are lost.
For average-pooled feature maps, there is a simple way to realize upsampling without a high-level API like Keras, using only basic TF functions.
Now, suppose the input is a 4-D tensor of shape [1, 4, 4, 1] and the sampling rate is [1, 2, 2, 1]; then the upsampled output is a 4-D tensor of shape [1, 8, 8, 1]. The following lines realize this operation.
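A minimal sketch of one possible realization, using only tf.reshape and tf.tile to copy every value into a rate × rate block (the exact basic ops used may differ):

```python
import tensorflow as tf

def upsample(x, rate=2):
    """Copy each value of a [N, H, W, C] tensor into a rate x rate block."""
    _, h, w, c = x.get_shape().as_list()
    # [N, H, W, C] -> [N, H, 1, W, 1, C]
    out = tf.reshape(x, [-1, h, 1, w, 1, c])
    # duplicate along the two singleton axes
    out = tf.tile(out, [1, 1, rate, 1, rate, 1])
    # collapse back to [N, H*rate, W*rate, C]
    return tf.reshape(out, [-1, h * rate, w * rate, c])

x = tf.placeholder(tf.float32, shape=[1, 4, 4, 1])
x_up = upsample(x)   # shape: [1, 8, 8, 1]
```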
The official TF documentation explains the recommended approach as follows: "Note: when training, the moving_mean and moving_variance need to be updated. By default the update ops are placed in tf.GraphKeys.UPDATE_OPS, so they need to be added as a dependency to the train_op." For example:
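Roughly, the dependency looks like this (here `optimizer` and `loss` stand in for whatever the model defines):

```python
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    # running train_op now also refreshes moving_mean and moving_variance
    train_op = optimizer.minimize(loss)
```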
See this blog post, whose author gives an even better explanation: "When you execute an operation (such as train_step), only the subgraph components relevant to train_step will be executed. Unfortunately, the update_moving_averages operation is not a parent of train_step in the computational graph, so we will never update the moving averages!"
The author's solution: "Personally, I think it makes more sense to attach the update ops to the train_step itself. So I modified the code a little and created the following training function."
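A condensed sketch of that idea (the function name and hyper-parameters here are illustrative, not the blog's exact code):

```python
def build_train_step(loss, learning_rate=0.01):
    # Attach the batch-norm update ops to the train step itself, so that a
    # single sess.run(train_step) also updates the moving averages.
    update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
    with tf.control_dependencies(update_ops):
        return tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)
```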
First, create the TensorFlow graph that you’d like to collect summary data from, and decide which nodes you would like to annotate with summary operations.
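For instance, a loss node can be annotated and written out like this (a minimal sketch; `loss`, `train_op`, `sess`, `feed`, `step`, and the log directory are placeholders):

```python
tf.summary.scalar('loss', loss)                    # annotate the loss node
merged = tf.summary.merge_all()                    # collect all summary ops
writer = tf.summary.FileWriter('./logs', sess.graph)

summary, _ = sess.run([merged, train_op], feed_dict=feed)
writer.add_summary(summary, step)                  # view in TensorBoard
```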
First, Ioffe and Szegedy state the motivation for proposing BN in the abstract; in their words: "Training deep neural networks is complicated by the fact that the distribution of each layer's inputs changes during training, as the parameters of the previous layers change. This slows down the training by requiring lower learning rates and careful parameter initialization, and makes it notoriously hard to train models with saturating nonlinearities."
In his 1998 paper, LeCun made the point that "the network training converges faster if its inputs are whitened, i.e., linearly transformed to have zero means and unit variances, and decorrelated." That is, normalizing the network's inputs helps speed up convergence. This is fairly intuitive: in traditional machine learning, the inputs to a classifier are usually hand-crafted features whose units and magnitudes differ considerably, and if fed into the network directly, the features with larger values would dominate the training result. To address this, the features are usually made dimensionless, and the best way to do so is normalization. This per-dimension normalization is also the core of BN.
Although the above normalizes the inputs, as the BN paper notes, "simply normalizing each input of a layer may change what the layer can represent." Take the sigmoid activation as an example: if $x^{(k)}$ originally takes large values, it falls near the saturated ends of the sigmoid, whereas after whitening it concentrates around $x = 0$, the roughly linear middle of the sigmoid, so the original distribution is lost. To address this, a scale and shift transform is applied to $\hat{x}$, giving
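$$y^{(k)} = \gamma^{(k)} \hat{x}^{(k)} + \beta^{(k)},$$

where $\gamma^{(k)}$ and $\beta^{(k)}$ are parameters learned along with the network, so the transform can recover the original activations if that is what the network needs.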