According to IEEE 754-2008 there are
There are three binary floating-point basic formats (which can be encoded using 32, 64 or 128 bits) and two decimal floating-point basic formats (which can be encoded using 64 or 128 bits).
This chart is under it. In C++ I believe float
and double
are single and double precision (binary32
and binary64
).
Name Common name Base Digits E min E max Digits E max
binary32 Single precision 2 23+1 −126 +127 7.22 38.23
binary64 Double precision 2 52+1 −1022 +1023 15.95 307.95
binary128 Quadruple precision 2 112+1 -16382 +16383 34.02 4931.77
decimal32 10 7 −95 +96 7 96
decimal64 10 16 −383 +384 16 384
decimal128 10 34 −6143 +6144 34 6144
What class/struct may I use for decimalX
and is there something I can use for binary128
? Are these classes/structs standard or nonstandard?