1And in Conclusion¶
The IEEE 754 standard defines a binary representation for floating point values using three fields.
The sign determines the sign of the number ( for positive, for negative).
The exponent is in biased notation. For instance, the bias is , which comes from for single-precision floating point numbers. For double-precision floating point numbers, the bias is . An exponent of
00000000represents a denormalized number and an exponent of11111111represents either NaN, if there is a non-zero mantissa, or infinity, if there is a zero mantissa.The significand is used to store a fraction instead of an integer and refers to the bits to the right of the leading “
1” when normalized. For example, if a mantissa is1.010011, its significand is010011.
Figure 3 shows the bit breakdown for the single-precision (32-bit) representation. The leftmost bit is the MSB, and the rightmost bit is the LSB.
For normalized floats:
For denormalized floats, including zero:
Table 1 shows that the IEEE 754 exponent field has values from 0 to 255. When translating between binary and decimal floating point values, we must remember that there is a bias for the exponent.
2Textbook Readings¶
P&H 3.5, 3.9
3Additional References¶
4Exercises¶
Check your knowledge!
4.1Conceptual Review¶
Solution to Exercise 1 #
True. Floating point:
Provides support for a wide range of values. (Both very small and very large)
Helps programmers deal with errors in real arithmetic because floating point can represent , , (Not a Number)
Keeps high precision. Recall that precision is a count of the number of bits in a computer word used to represent a value. IEEE 754 allocates a majority of bits for the significand, allowing for the use of a combination of negative powers of two to represent fractions.
Solution to Exercise 2 #
False. Floating Point can represent infinities as well as NaNs, so the total amount of representable numbers is lower than Two’s Complement, where every bit combination maps to a unique integer value.
Solution to Exercise 3 #
True. The uneven spacing is due to the exponent representation of floating point numbers. There are a fixed number of bits in the significand. In IEEE 32-bit storage there are bits for the significand, which means the LSB represents times 2 to the exponent. For example, if the exponent is zero (after allowing for the offset) the difference between two neighboring floats will be . If the exponent is , the difference between two neighboring floats will be because the mantissa is multiplied by . Limited precision makes binary floating-point numbersdiscontinuous; there are gaps between them.
Solution to Exercise 4 #
False. Because of rounding errors, you can find Big and Small numbers such that: (Small + Big) + Big != Small + (Big + Big)
FP approximates results because it only has 23 bits for the significand.
Solution to Exercise 5 #
A non-zero digit is required prior to the radix in scientific notation, and since the only non-zero digit in base-2 is 1, the normalized value will always start with a 1.