Both are ways to represent non-integer/fractional numbers with bounded ranges and precision.
Fixed-point
With a fixed-point number, the point position is static.
C++ does not yet have a fundamental type for fixed-point numbers.
Typical fixed-point layouts:
– Q15.16 (32-bit): 1 bit sign, 15 bits integer part, 16 bits fractional part
– Q31.32 (64-bit): 1 bit sign, 31 bits integer part, 32 bits fractional part
Example: 10.186 in Q15.16
Calculation:
stored_val = real_val × 2^16
= 10.186 × 65536
= 667549.696
≈ 667550 (Round to nearest)
Representation:
Sign (1) | Integer (15) | Fraction (16)
Bin | 0 | 000 0000 0000 1010 | 0010 1111 1001 1110
Hex | 0 | A | 2F9E
Dec | 0 | 10 | 12190
Reversing:
real_val = (−1)^sign × stored_val / 2^16
= 667550 / 65536
≈ 10.18600464
Fixed-point arithmetic is integer arithmetic, so it is performed by the ALU.
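The Q15.16 example above can be sketched in C++ as follows. This is a minimal illustration, not a complete fixed-point type: the helper names (`to_q15_16`, `from_q15_16`, `integer_field`, `fraction_field`) and the constant `kQ16Scale` are hypothetical, and the sketch assumes the value is stored in a two's complement `std::int32_t`, which is the common choice in practice.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helpers for illustration; not part of any standard library.
// Assumption: Q15.16 is stored in a two's complement std::int32_t.
constexpr double kQ16Scale = 65536.0;  // 2^16

// real_val -> stored_val, rounding to nearest.
std::int32_t to_q15_16(double real_val) {
    const long long scaled = std::llround(real_val * kQ16Scale);
    // The result must fit into 32 bits, i.e. real_val must lie in [-32768, 32768).
    assert(scaled >= INT32_MIN && scaled <= INT32_MAX);
    return static_cast<std::int32_t>(scaled);
}

// stored_val -> real_val.
double from_q15_16(std::int32_t stored_val) {
    return stored_val / kQ16Scale;
}

// Integer and fraction fields of a non-negative stored value.
std::int32_t integer_field(std::int32_t stored_val) { return stored_val >> 16; }
std::int32_t fraction_field(std::int32_t stored_val) { return stored_val & 0xFFFF; }
```

Calling `to_q15_16(10.186)` yields 667550, whose fields are 10 and 12190, and `from_q15_16(667550)` gives back roughly 10.18600464, matching the worked example.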
Floating-point
With a floating-point number, the point position is dynamic.
In C++, float, double, and the std::floatN_t types (C++23, header <stdfloat>) are floating-point types.
– float and double are usually the 32-bit and 64-bit IEEE 754 formats, but the standard does not require this.
– Each std::floatN_t is optional and is only provided if the implementation supports the corresponding IEEE 754 format.
The IEEE 754 binary floating-point layout:
– binary32 (32-bit): 1 bit sign, 8 bits exponent, 23 bits mantissa (std::float32_t, usually also float)
– binary64 (64-bit): 1 bit sign, 11 bits exponent, 52 bits mantissa (std::float64_t, usually also double)
Example: 10.186 in IEEE 754 binary32
Calculation:
real_exp = floor(log2(10.186)) = floor(3.3485…) = 3
stored_exp = real_exp + bias = 3 + 127 = 130
real_man = real_val / 2^real_exp ≈ 1010.001011111001110110110₂ / 2^3 = 1.010001011111001110110110₂
stored_man = (real_man−1) × 2^23 ≈ (1.010001011111001110110110₂−1) × 2^23 ≈ 01000101111100111011011₂ = 2292187
Representation:
Sign (1) | Exponent (8) | Mantissa (23)
Bin | 0 | 1000 0010 | 0100 0101 1111 0011 1011 011
Hex | 0 | 82 | 22F9DB
Dec | 0 | 130 | 2292187
Reversing:
real_val = (−1)^sign × (1 + stored_man/2^23) × 2^(stored_exp−127)
= (1 + 2292187 / 2^23) × 2^3
≈ 10.18599987
Floating-point arithmetic is performed by the FPU.
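The binary32 example above can be checked with a short C++ sketch that splits a float into its three fields and rebuilds the value from them. The names `Binary32Fields`, `split_binary32`, and `rebuild_binary32` are hypothetical, and the sketch assumes that float is IEEE 754 binary32 on the target system (enforced by the static_assert). It only handles normal numbers, not zero, subnormals, infinities, or NaN.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Assumption: float is IEEE 754 binary32 on this system.
static_assert(std::numeric_limits<float>::is_iec559 && sizeof(float) == 4,
              "this sketch assumes float is IEEE 754 binary32");

// Hypothetical holder for the three stored fields.
struct Binary32Fields {
    std::uint32_t sign;        // 1 bit
    std::uint32_t stored_exp;  // 8 bits, biased by 127
    std::uint32_t stored_man;  // 23 bits, without the implicit leading 1
};

// Copy the raw bits out of the float and mask the fields.
Binary32Fields split_binary32(float value) {
    std::uint32_t bits = 0;
    std::memcpy(&bits, &value, sizeof bits);
    return {bits >> 31, (bits >> 23) & 0xFFu, bits & 0x7FFFFFu};
}

// real_val = (-1)^sign * (1 + stored_man / 2^23) * 2^(stored_exp - 127)
double rebuild_binary32(const Binary32Fields& f) {
    // The formula only holds for normal numbers.
    assert(f.stored_exp != 0 && f.stored_exp != 255);
    const double significand = 1.0 + f.stored_man / 8388608.0;  // 2^23
    const double magnitude =
        std::ldexp(significand, static_cast<int>(f.stored_exp) - 127);
    return f.sign ? -magnitude : magnitude;
}
```

For `split_binary32(10.186f)` the fields are sign 0, exponent 130, and mantissa 2292187, and `rebuild_binary32` returns roughly 10.18599987, as in the worked example.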
Comparison
| | Fixed-point | Floating-point |
|---|---|---|
| Range | Smaller | Larger |
| Precision | Constant | Depends on magnitude ¹ |
| Performance ² | Faster | Slower |
¹ Values of small magnitude are stored with finer precision than values of large magnitude.
² On modern CPUs the performance can be nearly the same.
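The precision row can be made concrete by measuring the gap between neighbouring representable values. The sketch below is illustrative only: `float_spacing` and `q15_16_spacing` are hypothetical names, and it assumes float is IEEE 754 binary32.

```cpp
#include <cassert>
#include <cmath>

// Gap between a positive finite float and the next larger representable float.
// Assumption: float is IEEE 754 binary32.
float float_spacing(float x) {
    assert(x > 0.0f && std::isfinite(x));
    return std::nextafter(x, INFINITY) - x;
}

// Gap between neighbouring Q15.16 values: always 2^-16, whatever the magnitude.
constexpr double q15_16_spacing() {
    return 1.0 / 65536.0;
}
```

Under that assumption, `float_spacing(1.0f)` is 2^-23, `float_spacing(128.0f)` is 2^-16, and `float_spacing(16384.0f)` is 2^-9. So binary32 is finer than Q15.16 for magnitudes below 128 and coarser from 256 upwards, while the Q15.16 spacing stays at 2^-16 across its whole range.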