tf.contrib.layers.batch_norm(*args, **kwargs)
See the guide: Layers (contrib) > Higher level ops for building neural network layers
Adds a Batch Normalization layer from http://arxiv.org/abs/1502.03167.
"Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift"
Sergey Ioffe, Christian Szegedy
Can be used as a normalizer function for conv2d and fully_connected.
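For example, a minimal sketch of passing it as the normalizer of a convolutional layer (the images and is_training tensors and the layer sizes are illustrative assumptions, not part of this API):

# Hypothetical inputs: `images` is an NHWC tensor, `is_training` a Python bool or boolean tensor.
net = tf.contrib.layers.conv2d(
    images, num_outputs=64, kernel_size=3,
    normalizer_fn=tf.contrib.layers.batch_norm,
    normalizer_params={'is_training': is_training, 'decay': 0.9})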
Note: when is_training is True the moving_mean and moving_variance need to be updated; by default the update_ops are placed in tf.GraphKeys.UPDATE_OPS, so they need to be added as a dependency to the train_op. For example:
from tensorflow.python.ops import control_flow_ops

# Group all pending moving-average updates and make total_loss depend on them.
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
if update_ops:
  updates = tf.group(*update_ops)
  total_loss = control_flow_ops.with_dependencies([updates], total_loss)
One can set updates_collections=None to force the updates in place, but that can have a speed penalty, especially in distributed settings.
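For instance, a minimal sketch of the in-place-update form (the net and is_training names here are illustrative assumptions):

# Updates to moving_mean/moving_variance become control dependencies of the
# output, so no separate update_ops wiring on the train_op is required.
net = tf.contrib.layers.batch_norm(net,
                                   is_training=is_training,
                                   updates_collections=None)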
Args:
inputs: a tensor with 2 or more dimensions, where the first dimension has batch_size. The normalization is over all but the last dimension if data_format is NHWC and over the second dimension if data_format is NCHW.
decay: decay for the moving average. Reasonable values for decay are close to 1.0, typically in the multiple-nines range: 0.999, 0.99, 0.9, etc. Lower the decay value (try decay=0.9) if the model experiences reasonably good training performance but poor validation and/or test performance. Try zero_debias_moving_mean=True for improved stability.
center: If True, add offset of beta to the normalized tensor. If False, beta is ignored.
scale: If True, multiply by gamma. If False, gamma is not used. When the next layer is linear (also e.g. nn.relu), this can be disabled since the scaling can be done by the next layer.
epsilon: small float added to variance to avoid dividing by zero.
activation_fn: activation function, default set to None to skip it and maintain a linear activation.
param_initializers: optional initializers for beta, gamma, moving mean and moving variance.
updates_collections: collections to collect the update ops for computation. The update_ops need to be executed with the train_op. If None, a control dependency is added to make sure the updates are computed in place.
is_training: whether or not the layer is in training mode. In training mode it accumulates the statistics of the moments into moving_mean and moving_variance using an exponential moving average with the given decay. When it is not in training mode it uses the values of the moving_mean and the moving_variance. See the sketch after the Returns section below.
reuse: whether or not the layer and its variables should be reused. To be able to reuse the layer, scope must be given.
variables_collections: optional collections for the variables.
outputs_collections: collections to add the outputs.
trainable: If True, also add variables to the graph collection GraphKeys.TRAINABLE_VARIABLES (see tf.Variable).
batch_weights: An optional tensor of shape [batch_size], containing a frequency weight for each batch item. If present, then the batch normalization uses weighted mean and variance. (This can be used to correct for bias in training example selection.)
fused: Use nn.fused_batch_norm if True, nn.batch_normalization otherwise.
data_format: A string. NHWC (default) and NCHW are supported.
zero_debias_moving_mean: Use zero_debias for moving_mean. It creates a new pair of variables 'moving_mean/biased' and 'moving_mean/local_step'.
scope: Optional scope for variable_scope.

Returns:
A Tensor representing the output of the operation.
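As a hedged illustration of the is_training and reuse behavior described above (the train_images and eval_images tensors and the 'bn' scope name are assumptions made for this sketch):

# Training graph: batch statistics are used and accumulated into the
# moving averages with the given decay.
train_out = tf.contrib.layers.batch_norm(train_images, is_training=True, scope='bn')
# Evaluation graph: the stored moving_mean/moving_variance are used, and the
# variables created above are shared by passing reuse=True with the same scope.
eval_out = tf.contrib.layers.batch_norm(eval_images, is_training=False,
                                        reuse=True, scope='bn')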
Raises:
ValueError: if batch_weights is not None and fused is True.
ValueError: if data_format is neither NHWC nor NCHW.
ValueError: if the rank of inputs is undefined.
ValueError: if rank or channels dimension of inputs is undefined.

Defined in tensorflow/contrib/framework/python/ops/arg_scope.py.
© 2017 The TensorFlow Authors. All rights reserved.
Licensed under the Creative Commons Attribution License 3.0.
Code samples licensed under the Apache 2.0 License.
https://www.tensorflow.org/api_docs/python/tf/contrib/layers/batch_norm