BFloat16 represents 16-bit floating-point values.
Shapeless-powered type-class used to build literal tensors.
Float16 represents 16-bit floating-point values.
This type does not actually support arithmetic directly. The expected use case is to convert to Float to perform any actual arithmetic, then convert back to a Float16 if needed.
Binary representation:
sign (1 bit)
|
| exponent (5 bits)
| |
| |     mantissa (10 bits)
| |     |
x xxxxx xxxxxxxxxx
Value interpretation (in order of precedence, with _ wild):
0 00000 0000000000  (positive) zero
1 00000 0000000000  negative zero
_ 00000 __________  subnormal number
_ 11111 0000000000  +/- infinity
_ 11111 __________  not-a-number
_ _____ __________  normal number
An exponent of all 1s signals a sentinel (NaN or infinity), and all 0s signals a subnormal number. So the working "real" range of exponents we can express is [-14, +15].
For non-zero exponents, the mantissa has an implied leading 1 bit, so 10 bits of data provide 11 bits of precision for normal numbers.
For normal numbers, where exponent is the stored 5-bit field minus the bias of 15:
x = (1 - sign*2) * 2^exponent * (1 + mantissa/1024)
For subnormal numbers, the implied leading 1 bit is absent. Thus, subnormal numbers have the same exponent as the smallest normal numbers, but without an implied 1 bit.
So for subnormal numbers:
x = (1 - sign*2) * 2^(-14) * (mantissa/1024)
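The two formulas above can be sketched as a decoder from a raw 16-bit pattern to a Float. This is a minimal illustration, not the library's implementation: the object name `Float16Sketch` and the method `toFloat` are hypothetical, and the input is assumed to arrive in the low 16 bits of an Int.

```scala
object Float16Sketch {

  /** Decodes a raw half-precision bit pattern (low 16 bits of `bits`) into a Float. */
  def toFloat(bits: Int): Float = {
    val sign     = (bits >>> 15) & 0x1   // 1 bit
    val exponent = (bits >>> 10) & 0x1f  // 5 bits, stored with a bias of 15
    val mantissa = bits & 0x3ff          // 10 bits
    val signum   = 1 - sign * 2          // +1 or -1, as in the formulas

    if (exponent == 0x1f) {
      // All-ones exponent: infinity when the mantissa is zero, NaN otherwise.
      if (mantissa == 0) signum * Float.PositiveInfinity else Float.NaN
    } else if (exponent == 0) {
      // Zero or subnormal: no implied leading 1, exponent fixed at -14.
      signum * math.pow(2.0, -14).toFloat * (mantissa / 1024.0f)
    } else {
      // Normal number: implied leading 1, unbiased exponent in [-14, +15].
      signum * math.pow(2.0, exponent - 15).toFloat * (1.0f + mantissa / 1024.0f)
    }
  }
}
```

With this sketch, `Float16Sketch.toFloat(0x3C00)` evaluates to 1.0f and `Float16Sketch.toFloat(0x7BFF)` to 65504.0f, the largest finite value the format can hold.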
ONNX has several integral number types.
Represents ONNX's notion of a number.
Parses tensors from NumPy output strings; useful for testing.
A type class for serializing values of type A into bytes.
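As an illustration of what such a type class can look like, here is a minimal sketch. Every name in it (`BytesEncoder`, `encode`, the instances) is hypothetical and not taken from the library, and little-endian byte order is an assumption made for the example.

```scala
import java.nio.{ByteBuffer, ByteOrder}

// Hypothetical type class: the name and method are illustrative only.
trait BytesEncoder[A] {
  def encode(value: A): Array[Byte]
}

object BytesEncoder {
  // Summoner, so callers can write BytesEncoder[Int].encode(42).
  def apply[A](implicit ev: BytesEncoder[A]): BytesEncoder[A] = ev

  // Little-endian layout is an assumption of this sketch.
  implicit val intEncoder: BytesEncoder[Int] = new BytesEncoder[Int] {
    def encode(value: Int): Array[Byte] =
      ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN).putInt(value).array()
  }

  implicit val floatEncoder: BytesEncoder[Float] = new BytesEncoder[Float] {
    def encode(value: Float): Array[Byte] =
      ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN).putFloat(value).array()
  }

  // Sequences are encoded by concatenating the encodings of their elements.
  implicit def seqEncoder[A](implicit ev: BytesEncoder[A]): BytesEncoder[Seq[A]] =
    new BytesEncoder[Seq[A]] {
      def encode(value: Seq[A]): Array[Byte] =
        value.iterator.flatMap(a => ev.encode(a).iterator).toArray
    }
}
```

The benefit of the type-class approach is that supporting a new element type only requires one more implicit instance; code that serializes values stays generic in A.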
BFloat16 represents 16-bit floating-point values.
This type does not actually support arithmetic directly. The expected use case is to convert to Float to perform any actual arithmetic, then convert back to a BFloat16 if needed.
Binary representation:
sign (1 bit)
|
| exponent (8 bits)
| |
| |        mantissa (7 bits)
| |        |
x xxxxxxxx xxxxxxx
Value interpretation (in order of precedence, with _ wild):
0 00000000 0000000  (positive) zero
1 00000000 0000000  negative zero
_ 00000000 _______  subnormal number
_ 11111111 0000000  +/- infinity
_ 11111111 _______  not-a-number
_ ________ _______  normal number
An exponent of all 1s signals a sentinel (NaN or infinity), and all 0s signals a subnormal number. So the working "real" range of exponents we can express is [-126, +127].
For non-zero exponents, the mantissa has an implied leading 1 bit, so 7 bits of data provide 8 bits of precision for normal numbers.
For normal numbers, where exponent is the stored 8-bit field minus the bias of 127:
x = (1 - sign*2) * 2^exponent * (1 + mantissa/128)
For subnormal numbers, the implied leading 1 bit is absent. Thus, subnormal numbers have the same exponent as the smallest normal numbers, but without an implied 1 bit.
So for subnormal numbers:
x = (1 - sign*2) * 2^(-126) * (mantissa/128)
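A BFloat16 bit pattern has the same sign and exponent layout as the upper 16 bits of an IEEE single-precision float, so conversion to Float can be sketched as a 16-bit left shift. The sketch below also decodes by formula, using the exponent of the smallest normal number (-126) for subnormals, so the two routes can be compared. The names `BFloat16Sketch`, `toFloat`, and `toFloatByFormula` are hypothetical, and this is not necessarily how the library performs the conversion.

```scala
object BFloat16Sketch {

  /** Widens a raw bfloat16 pattern (low 16 bits of `bits`) by placing it in the top half of a Float. */
  def toFloat(bits: Int): Float =
    java.lang.Float.intBitsToFloat((bits & 0xffff) << 16)

  /** Decodes the same pattern field by field, following the value interpretation table. */
  def toFloatByFormula(bits: Int): Float = {
    val sign     = (bits >>> 15) & 0x1  // 1 bit
    val exponent = (bits >>> 7) & 0xff  // 8 bits, stored with a bias of 127
    val mantissa = bits & 0x7f          // 7 bits
    val signum   = 1 - sign * 2         // +1 or -1

    if (exponent == 0xff) {
      // All-ones exponent: infinity when the mantissa is zero, NaN otherwise.
      if (mantissa == 0) signum * Float.PositiveInfinity else Float.NaN
    } else if (exponent == 0) {
      // Zero or subnormal: no implied leading 1, exponent fixed at -126.
      signum * math.pow(2.0, -126).toFloat * (mantissa / 128.0f)
    } else {
      // Normal number: implied leading 1, unbiased exponent in [-126, +127].
      signum * math.pow(2.0, exponent - 127).toFloat * (1.0f + mantissa / 128.0f)
    }
  }
}
```

Both routes agree on every non-NaN pattern; the shift is the cheaper of the two because it needs no branching.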