#include <array_ops.h>
Quantizes then dequantizes a tensor.
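This op simulates the precision loss from the quantized forward pass by quantizing the tensor to fixed-point numbers and then dequantizing it back to floating-point numbers.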
There are different ways to quantize. This version does not use the full range of the output type: it omits the lowest possible value for symmetry (e.g., the output range is -127 to 127, not -128 to 127, for signed 8-bit quantization), so that 0.0 maps to 0.
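To perform this op, we first find the range of values in our tensor. The range we use is always centered on 0, so we find m such that
m = max(abs(input_min), abs(input_max)) if range_given is true,
m = max(abs(min_elem(input)), abs(max_elem(input))) otherwise.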
Our input tensor range is then [-m, m].
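The range-finding step can be sketched in plain C++. This is an illustration only: `SymmetricBound` is a hypothetical helper, not part of the TensorFlow API. It assumes that m is the larger absolute value of `input_min` and `input_max` when `range_given` is true, and the largest absolute element of the tensor otherwise.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Hypothetical helper (not a TensorFlow function): returns the bound m of the
// symmetric range [-m, m].
// Assumption: when range_given is true, m comes from input_min/input_max;
// otherwise it is the largest absolute value among the elements.
float SymmetricBound(const std::vector<float>& input, bool range_given,
                     float input_min, float input_max) {
  if (range_given) {
    return std::max(std::fabs(input_min), std::fabs(input_max));
  }
  float m = 0.0f;
  for (float e : input) {
    m = std::max(m, std::fabs(e));
  }
  return m;
}
```

For the vector {-1, -0.5, 0, 0.3} with no range given, this sketch yields m = 1.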
Next, we choose our fixed-point quantization buckets, [min_fixed, max_fixed]. If signed_input is true, this is
[min_fixed, max_fixed] = [-((1 << (num_bits - 1)) - 1), (1 << (num_bits - 1)) - 1].
Otherwise, if signed_input is false, the fixed-point range is
[min_fixed, max_fixed] = [0, (1 << num_bits) - 1].
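As a rough sketch, the two fixed-point ranges can be computed as follows. `FixedRange` is a hypothetical helper written for this illustration, and the bounds checked on `num_bits` are an assumption made only to keep the shifts well defined.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical helper (not a TensorFlow function): returns
// {min_fixed, max_fixed} for the given bit width and signedness.
std::pair<std::int64_t, std::int64_t> FixedRange(int num_bits,
                                                 bool signed_input) {
  // Assumed limits so that the shifts below cannot overflow.
  assert(num_bits >= 2 && num_bits <= 62);
  if (signed_input) {
    // Symmetric range: the lowest two's-complement value is left unused.
    const std::int64_t max_fixed = (std::int64_t{1} << (num_bits - 1)) - 1;
    return {-max_fixed, max_fixed};
  }
  return {0, (std::int64_t{1} << num_bits) - 1};
}
```

With num_bits = 8, this gives [-127, 127] for signed input and [0, 255] for unsigned input.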
From this we compute our scaling factor, s:
s = (max_fixed - min_fixed) / (2 * m).
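The scaling factor is a one-line computation. The sketch below uses a hypothetical `ScaleFactor` helper. It does not handle m = 0 (an all-zero tensor), because the text above does not say how that case is treated.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not a TensorFlow function):
// s = (max_fixed - min_fixed) / (2 * m).
double ScaleFactor(std::int64_t min_fixed, std::int64_t max_fixed, double m) {
  // m == 0 would divide by zero; this sketch simply rejects it.
  assert(m > 0.0);
  return static_cast<double>(max_fixed - min_fixed) / (2.0 * m);
}
```

For example, the signed 8-bit range [-127, 127] with m = 1 gives s = 127, while the unsigned range [0, 255] with m = 1 gives s = 127.5.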
Now we can quantize and dequantize the elements of our tensor. An element e is transformed into e':
e' = (e * s).round_to_nearest() / s.
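A minimal sketch of this per-element transform is shown below. `QuantizeDequantize` is a hypothetical helper. How ties are broken is an assumption: this sketch rounds halves toward positive infinity, which is consistent with the worked example in this document mapping -0.5 to -63, but the text does not state the rule explicitly.

```cpp
#include <cmath>

// Hypothetical helper (not a TensorFlow function):
// e' = round_to_nearest(e * s) / s.
// Assumption: ties (x.5) are rounded toward +infinity via floor(x + 0.5).
float QuantizeDequantize(float e, float s) {
  const float quantized = std::floor(e * s + 0.5f);  // fixed-point code
  return quantized / s;                              // back to floating point
}
```

Using `std::round` instead would round ties away from zero, so -63.5 would become -64 rather than -63.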
Note that we have a different number of buckets in the signed vs. unsigned cases. For example, if num_bits == 8, we get 254 buckets in the signed case vs. 255 in the unsigned case.
For example, suppose num_bits = 8, signed_input is true, and m = 1. Then
[min_fixed, max_fixed] = [-127, 127], and s = (127 + 127) / 2 = 127.
Given the vector {-1, -0.5, 0, 0.3}, this is quantized to {-127, -63, 0, 38}, and dequantized to {-1, -63.0/127, 0, 38.0/127}.
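The integer codes in this example can be reproduced with a short sketch. `QuantizeCodes` is a hypothetical helper, and the tie-breaking rule (halves rounded toward positive infinity) is an assumption chosen because it yields -63 for -0.5 * 127 = -63.5, as the example states.

```cpp
#include <cmath>
#include <vector>

// Hypothetical helper (not a TensorFlow function): maps each element to its
// fixed-point code using scale factor s.
// Assumption: ties are rounded toward +infinity via floor(x + 0.5).
std::vector<int> QuantizeCodes(const std::vector<float>& input, float s) {
  std::vector<int> codes;
  codes.reserve(input.size());
  for (float e : input) {
    codes.push_back(static_cast<int>(std::floor(e * s + 0.5f)));
  }
  return codes;
}
```

Calling `QuantizeCodes({-1.0f, -0.5f, 0.0f, 0.3f}, 127.0f)` returns {-127, -63, 0, 38}. Dividing each code by s gives the dequantized values.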
Arguments:
Optional attributes (see Attrs):
Returns:
Output: The output tensor.

Constructors and Destructors | |
---|---|
QuantizeAndDequantize(const ::tensorflow::Scope & scope, ::tensorflow::Input input) | |
QuantizeAndDequantize(const ::tensorflow::Scope & scope, ::tensorflow::Input input, const QuantizeAndDequantize::Attrs & attrs) | |

Public attributes | |
---|---|
output | ::tensorflow::Output |

Public functions | |
---|---|
node() const | ::tensorflow::Node * |
operator::tensorflow::Input() const | |
operator::tensorflow::Output() const | |

Public static functions | |
---|---|
InputMax(float x) | Attrs |
InputMin(float x) | Attrs |
NumBits(int64 x) | Attrs |
RangeGiven(bool x) | Attrs |
SignedInput(bool x) | Attrs |

Structs | |
---|---|
tensorflow::ops::QuantizeAndDequantize::Attrs | Optional attribute setters for QuantizeAndDequantize. |

::tensorflow::Output output
QuantizeAndDequantize( const ::tensorflow::Scope & scope, ::tensorflow::Input input )
QuantizeAndDequantize( const ::tensorflow::Scope & scope, ::tensorflow::Input input, const QuantizeAndDequantize::Attrs & attrs )
::tensorflow::Node * node() const
operator::tensorflow::Input() const
operator::tensorflow::Output() const
Attrs InputMax( float x )
Attrs InputMin( float x )
Attrs NumBits( int64 x )
Attrs RangeGiven( bool x )
Attrs SignedInput( bool x )
© 2017 The TensorFlow Authors. All rights reserved.
Licensed under the Creative Commons Attribution License 3.0.
Code samples licensed under the Apache 2.0 License.
https://www.tensorflow.org/api_docs/cc/class/tensorflow/ops/quantize-and-dequantize.html