{"id":19364,"date":"2025-12-12T11:15:29","date_gmt":"2025-12-12T11:15:29","guid":{"rendered":"http:\/\/www.max-sperling.bplaced.net\/?p=19364"},"modified":"2026-01-05T13:32:20","modified_gmt":"2026-01-05T13:32:20","slug":"fixed-point-vs-floating-point","status":"publish","type":"post","link":"http:\/\/www.max-sperling.bplaced.net\/?p=19364","title":{"rendered":"Fixed-point vs. Floating-point"},"content":{"rendered":"<p>Both are ways to represent non-integer\/fractional numbers with bounded range and precision.<\/p>\n<hr>\n<h2>Fixed-point<\/h2>\n<p>With a fixed-point number, the point position is static: a constant number of bits is reserved for the fractional part.<\/p>\n<p>C++ does not yet have a fundamental type for fixed-point numbers.<\/p>\n<p>Typical fixed-point layouts:<br \/>\n&#8211; Q15.16 (32-bit): 1 bit sign, 15 bits integer part, 16 bits fractional part<br \/>\n&#8211; Q31.32 (64-bit): 1 bit sign, 31 bits integer part, 32 bits fractional part<\/p>\n<p>Example: 10.186 in Q15.16<\/p>\n<pre>\r\nCalculation:\r\nstored_val = real_val \u00d7 2^16\r\n           = 10.186 \u00d7 65536\r\n           = 667549.696\r\n           \u2248 667550 (rounded to nearest)\r\n\r\nRepresentation:\r\n      Sign (1) | Integer (15)        | Fraction (16)\r\nBin | 0        | 000 0000 0000 1010  | 0010 1111 1001 1110\r\nHex | 0        | A                   | 2F9E\r\nDec | 0        | 10                  | 12190\r\n\r\nReversing:\r\nreal_val = (\u22121)^sign \u00d7 stored_val \/ 2^16\r\n         = 667550 \/ 65536\r\n         \u2248 10.18600464\r\n<\/pre>\n<p>Fixed-point arithmetic is performed by the integer ALU.<\/p>\n<hr>\n<h2>Floating-point<\/h2>\n<p>With a floating-point number, the point position is dynamic: it is determined by an exponent stored with the number.<\/p>\n<p>In C++, float\/double and std::floatN_t are floating-point types.<br \/>\n&#8211; float\/double are usually 32-bit\/64-bit and based on IEEE 754, but not on all systems.<br \/>\n&#8211; std::floatN_t (C++23) is optional and only provided if the implementation supports the corresponding IEEE 754 format.<\/p>\n<p>The IEEE 754 binary floating-point layouts:<br \/>\n&#8211; binary32 
(32-bit): 1 bit sign, 8 bits exponent, 23 bits mantissa (std::float32_t, usually also float)<br \/>\n&#8211; binary64 (64-bit): 1 bit sign, 11 bits exponent, 52 bits mantissa (std::float64_t, usually also double)<\/p>\n<p>Example: 10.186 in IEEE 754 binary32<\/p>\n<pre>\r\nCalculation:\r\nreal_exp = floor(log2(10.186)) = 3\r\nstored_exp = real_exp + bias = 3 + 127 = 130\r\nreal_man = real_val \/ 2^real_exp \u2248 1010.001011111001110110110\u2082 \/ 2^3 = 1.010001011111001110110110\u2082\r\nstored_man = (real_man\u22121) \u00d7 2^23 \u2248 (1.010001011111001110110110\u2082\u22121) \u00d7 2^23 \u2248 01000101111100111011011\u2082\r\n\r\nRepresentation:\r\n      Sign (1) | Exponent (8) | Mantissa (23)\r\nBin | 0        | 1000 0010    | 0100 0101 1111 0011 1011 011\r\nHex | 0        | 82           | 22F9DB\r\nDec | 0        | 130          | 2292187\r\n\r\nReversing:\r\nreal_val = (\u22121)^sign \u00d7 (1 + stored_man\/2^23) \u00d7 2^(stored_exp\u2212127)\r\n         = (1 + 2292187 \/ 2^23) \u00d7 2^3\r\n         \u2248 10.18599987\r\n<\/pre>\n<p>Floating-point arithmetic is performed by the FPU.<\/p>\n<hr>\n<h2>Comparison<\/h2>\n<table>\n<tr>\n<th><\/th>\n<th>Fixed-point<\/th>\n<th>Floating-point<\/th>\n<\/tr>\n<tr>\n<th>Range<\/th>\n<td>Smaller<\/td>\n<td>Larger<\/td>\n<\/tr>\n<tr>\n<th>Precision<\/th>\n<td>Constant<\/td>\n<td>Depends on magnitude <sup>1<\/sup><\/td>\n<\/tr>\n<tr>\n<th>Performance <sup>2<\/sup><\/th>\n<td>Faster<\/td>\n<td>Slower<\/td>\n<\/tr>\n<\/table>\n<p><sup>1<\/sup> The smaller the magnitude, the higher the absolute precision, and vice versa.<br \/>\n<sup>2<\/sup> On modern CPUs the performance can be nearly the same.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Both are ways to represent non-integer\/fractional numbers with bounded ranges and precision. 
Fixed-point With a fixed-point number, the point position<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"inline_featured_image":false},"categories":[80],"tags":[],"_links":{"self":[{"href":"http:\/\/www.max-sperling.bplaced.net\/index.php?rest_route=\/wp\/v2\/posts\/19364"}],"collection":[{"href":"http:\/\/www.max-sperling.bplaced.net\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.max-sperling.bplaced.net\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.max-sperling.bplaced.net\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/www.max-sperling.bplaced.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=19364"}],"version-history":[{"count":19,"href":"http:\/\/www.max-sperling.bplaced.net\/index.php?rest_route=\/wp\/v2\/posts\/19364\/revisions"}],"predecessor-version":[{"id":19388,"href":"http:\/\/www.max-sperling.bplaced.net\/index.php?rest_route=\/wp\/v2\/posts\/19364\/revisions\/19388"}],"wp:attachment":[{"href":"http:\/\/www.max-sperling.bplaced.net\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=19364"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.max-sperling.bplaced.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=19364"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.max-sperling.bplaced.net\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=19364"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}