TensorFlow: using batch normalization gives poor (unstable) validation loss and accuracy

Asked: 2017-12-30 06:36:57

Tags: tensorflow machine-learning deep-learning conv-neural-network batch-normalization

I'm trying to use batch normalization with tf.layers.batch_normalization(), and my code looks like this:

def create_conv_exp_model(fingerprint_input, model_settings, is_training):


  # Dropout placeholder
  if is_training:
    dropout_prob = tf.placeholder(tf.float32, name='dropout_prob')

  # Mode placeholder
  mode_placeholder = tf.placeholder(tf.bool, name="mode_placeholder")

  he_init = tf.contrib.layers.variance_scaling_initializer(mode="FAN_AVG")

  # Input Layer
  input_frequency_size = model_settings['bins']
  input_time_size = model_settings['spectrogram_length']
  net = tf.reshape(fingerprint_input,
                   [-1, input_time_size, input_frequency_size, 1],
                   name="reshape")
  net = tf.layers.batch_normalization(net, 
                                      training=mode_placeholder,
                                      name='bn_0')

  for i in range(1, 6):
    net = tf.layers.conv2d(inputs=net,
                           filters=8*(2**i),
                           kernel_size=[5, 5],
                           padding='same',
                           kernel_initializer=he_init,
                           name="conv_%d"%i)
    net = tf.layers.batch_normalization(net,
                                        training=mode_placeholder,
                                        name='bn_%d'%i)
    with tf.name_scope("relu_%d"%i):
      net = tf.nn.relu(net)
    net = tf.layers.max_pooling2d(net, [2, 2], [2, 2], 'SAME', 
                                  name="maxpool_%d"%i)

  net_shape = net.get_shape().as_list()
  net_height = net_shape[1]
  net_width = net_shape[2]
  net = tf.layers.conv2d( inputs=net,
                          filters=1024,
                          kernel_size=[net_height, net_width],
                          strides=(net_height, net_width),
                          padding='same',
                          kernel_initializer=he_init,
                          name="conv_f")
  net = tf.layers.batch_normalization( net, 
                                        training=mode_placeholder,
                                        name='bn_f')
  with tf.name_scope("relu_f"):
    net = tf.nn.relu(net)

  net = tf.layers.conv2d( inputs=net,
                          filters=model_settings['label_count'],
                          kernel_size=[1, 1],
                          padding='same',
                          kernel_initializer=he_init,
                          name="conv_l")

  ### Squeeze
  squeezed = tf.squeeze(net, axis=[1, 2], name="squeezed")

  if is_training:
    return squeezed, dropout_prob, mode_placeholder
  else:
    return squeezed, mode_placeholder

My train step looks like this:

update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
  optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate_input)
  gvs = optimizer.compute_gradients(cross_entropy_mean)
  capped_gvs = [(tf.clip_by_value(grad, -2., 2.), var) for grad, var in gvs]
  train_step = optimizer.apply_gradients(capped_gvs)  # apply the clipped gradients

During training, I feed the graph as follows:

train_summary, train_accuracy, cross_entropy_value, _, _ = sess.run(
    [
        merged_summaries, evaluation_step, cross_entropy_mean, train_step,
        increment_global_step
    ],
    feed_dict={
        fingerprint_input: train_fingerprints,
        ground_truth_input: train_ground_truth,
        learning_rate_input: learning_rate_value,
        dropout_prob: 0.5,
        mode_placeholder: True
    })

During validation:

validation_summary, validation_accuracy, conf_matrix = sess.run(
                [merged_summaries, evaluation_step, confusion_matrix],
                feed_dict={
                    fingerprint_input: validation_fingerprints,
                    ground_truth_input: validation_ground_truth,
                    dropout_prob: 1.0,
                    mode_placeholder: False
                })

My loss and accuracy curves (orange is training, blue is validation):

[Plot of loss vs. number of iterations] [Plot of accuracy vs. number of iterations]

The validation loss (and accuracy) seem very erratic. Is my implementation of batch normalization wrong, or is this normal with batch normalization and I should just wait for more iterations?

3 Answers:

Answer 0 (score: 1)

You need to pass is_training to tf.layers.batch_normalization(..., training=is_training), or it will try to normalize the inference minibatches using the minibatch statistics instead of the training statistics, which is wrong.
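A minimal sketch of that pattern, assuming a boolean placeholder named is_training (the placeholder name and input shape here are illustrative, not taken from the question's code):

import tensorflow as tf

# Boolean switch: True -> normalize with batch statistics (training),
# False -> normalize with the stored moving averages (inference)
is_training = tf.placeholder(tf.bool, name="is_training")
x = tf.placeholder(tf.float32, [None, 98, 40, 1], name="x")

net = tf.layers.batch_normalization(x, training=is_training, name="bn")

# Then feed is_training=True on training steps and False on validation steps.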

Answer 1 (score: 0)

There are mainly two things to check.

1. Are you sure you are using batch normalization (BN) correctly in your train op?

If you read the layers documentation:

  Note: when training, the moving_mean and moving_variance need to be
  updated. By default the update ops are placed in
  tf.GraphKeys.UPDATE_OPS, so they need to be added as a dependency to
  the train_op. Also, be sure to add any batch_normalization ops before
  getting the update_ops collection. Otherwise, update_ops will be
  empty, and training/inference will not work properly.

For example:

x_norm = tf.layers.batch_normalization(x, training=training)

# ...
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
  train_op = optimizer.minimize(loss)

2. Otherwise, try lowering the momentum in BN.

In fact, during training BN keeps two moving averages of the mean and variance, which are meant to approximate the population statistics. The moving mean and variance are initialized to 0 and 1 respectively; then, step by step, each is multiplied by the momentum value (0.99 by default) and the new batch value times 0.01 is added. At inference (test) time, the normalization uses these statistics, so it takes a while before they reach the "real" mean and variance of the data.
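Concretely, each training step updates the inference statistics as shown in the comments below; the snippet is a sketch of how one might lower the momentum on one of the question's layers, not a drop-in fix:

# Per training step, BN updates its inference statistics as:
#   moving_mean     = momentum * moving_mean     + (1 - momentum) * batch_mean
#   moving_variance = momentum * moving_variance + (1 - momentum) * batch_var
# With the default momentum of 0.99 these averages converge slowly, so
# early validation runs use poor estimates. A lower momentum tracks the
# data faster:
net = tf.layers.batch_normalization(net,
                                    training=mode_placeholder,
                                    momentum=0.9,  # default is 0.99
                                    name='bn_0')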

Sources:

https://www.tensorflow.org/api_docs/python/tf/layers/batch_normalization

https://github.com/keras-team/keras/issues/7265

https://github.com/keras-team/keras/issues/3366

The original BN paper can be found here:

https://arxiv.org/abs/1502.03167

Answer 2 (score: 0)

I was also observing oscillating validation loss when adding batch norm before the ReLU. We found that moving the batch norm to after the ReLU resolved the issue.
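Applied to one of the conv blocks from the question, that reordering would look roughly like this (a sketch reusing the question's names; only the order of the ReLU and the batch norm changes):

net = tf.layers.conv2d(inputs=net,
                       filters=8*(2**i),
                       kernel_size=[5, 5],
                       padding='same',
                       kernel_initializer=he_init,
                       name="conv_%d"%i)
with tf.name_scope("relu_%d"%i):
  net = tf.nn.relu(net)
# Batch norm moved from before the ReLU to after it
net = tf.layers.batch_normalization(net,
                                    training=mode_placeholder,
                                    name='bn_%d'%i)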
