@JonB said in float problem:
Oh!  Is that how it works?!  So my floating point number wants to be made by adding 2 ^ -n values together to be accurately representable? And 1.872 doesn't happen to be.  1.875 does.  I kinda thought the numbers it could represent precisely were "randomly" distributed :)
I hope "randomly" isn't an example of that barely perceptible English sarcasm of yours. ;)
But yes, that's how it works, exactly the same as with decimal. Here's a bit of a fuller story:
1.872 in decimal is represented as: 1.872 = 1 * 10^0 + 8 * 10^-1 + 7 * 10^-2 + 2 * 10^-3
The same idea holds for a base 2 number system, only adjusted for the base:
1.111 -> 1 * 2^0 + 1 * 2^-1 + 1 * 2^-2 + 1 * 2^-3, which is incidentally 1.875 in decimal.
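Just to make it tangible, here's a quick check (assuming the usual IEEE 754 single-precision float, which is what float is on practically every platform) that prints the value actually stored:

```cpp
#include <cstdio>

int main()
{
    // 1.872 has no finite binary expansion, so the nearest float is stored instead.
    // On an IEEE 754 system this prints 1.87199997901916503906.
    std::printf("%.20f\n", 1.872f);

    // 1.875 = 1 + 1/2 + 1/4 + 1/8 is exactly representable, so it prints 1.875 followed by zeros.
    std::printf("%.20f\n", 1.875f);
    return 0;
}
```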
So that's exactly what the IEEE 754 standard does:
The representation is split into 2 parts - an exponent and a mantissa (significand). That's to say each number is represented as m * 2^p, where m is a fractional part in the range [1; 2)* and p is a biased integer exponent** (in reality it's stored unsigned). The leading bit of the mantissa (the one responsible for the 0th power) is implicit and is always assumed to be raised*** (i.e. the stored pattern means 1.(...)). This means each bit of the mantissa, from higher to lower, contributes a 2^-n term, hence my writing the value as a sum of such terms above (the terms being exactly the bits of the mantissa that are 1).
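If you want to poke at that mantissa/exponent split yourself, std::frexp exposes one such decomposition (it uses the [0.5; 1) convention from the first footnote below); a minimal sketch:

```cpp
#include <cmath>
#include <cstdio>

int main()
{
    // Split 1.875 into mantissa * 2^exponent.
    // std::frexp hands back the mantissa in the [0.5; 1) form,
    // so 1.875 comes out as 0.9375 * 2^1.
    int exponent = 0;
    double mantissa = std::frexp(1.875, &exponent);
    std::printf("1.875 = %.10f * 2^%d\n", mantissa, exponent);
    return 0;
}
```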
Now if you think about it, multiplication/division by 2 through the exponent is equivalent to bit-shifting the mantissa, which is what the FPU does for you when it renormalizes the numbers during calculations. It always tries to keep the higher bits of the mantissa raised if possible, so you don't lose precision at the lower end. Incidentally, this is also why FP operations are in reality done in extended registers (typically 2 times larger****) - they can hold bits that would otherwise be lost, so those bits can be shifted back in after normalization; truncation is done at the very end.
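If you're curious how that looks in the raw bits, here's a small sketch (assuming the usual 64-bit IEEE 754 double layout - 1 sign bit, 11 biased exponent bits, 52 explicit mantissa bits):

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>

int main()
{
    // Pull apart the raw IEEE 754 double layout: sign, exponent (biased by 1023),
    // and the 52 stored mantissa bits (the leading 1 is implicit and not stored).
    double value = 1.875;
    std::uint64_t bits = 0;
    std::memcpy(&bits, &value, sizeof bits);

    std::uint64_t sign     = bits >> 63;
    std::uint64_t exponent = (bits >> 52) & 0x7FF;
    std::uint64_t mantissa = bits & 0xFFFFFFFFFFFFFull;

    // For 1.875 this prints: sign 0, exponent 1023 (i.e. 2^0), mantissa 0xe000000000000
    // (binary 1110..., i.e. the 1/2 + 1/4 + 1/8 part).
    std::printf("sign %llu, exponent %llu (2^%lld), mantissa 0x%llx\n",
                (unsigned long long)sign,
                (unsigned long long)exponent,
                (long long)exponent - 1023,
                (unsigned long long)mantissa);
    return 0;
}
```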
* Realistically it's in the range [0.5; 1), but for simplicity we roll with the somewhat "wrong" [1; 2) representation.
** It's biased for a specific reason: when its bits are all 0, the value is the minimum the exponent can represent, which signals a denormal FP number.
*** Except when representing a denormal; then the exponent's raw value is 0 (the minimum possible value after debiasing) and the mantissa is fully explicit. Denormals are a special case for numbers very close to zero in absolute value. The IEEE standard allows them for one specific purpose - to represent numbers it otherwise couldn't in the normalized representation - but precision is traded off for that (i.e. the leading zeroes in the mantissa are the number of bits of precision lost).
**** In fact some of the operations are done iteratively with (effectively) infinite intermediate precision, renormalizing on the fly, until the required truncated precision is reached. One such example is the FMA instruction (std::fma), which rounds only once at the very end.
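To see what that single final rounding buys you, here's a small sketch (the exact printed value assumes an IEEE 754 double with default rounding, e.g. a typical x86-64 build):

```cpp
#include <cmath>
#include <cstdio>

int main()
{
    // x*x does not fit exactly into a double here, so the plain product gets rounded.
    double x = 1.0 + std::ldexp(1.0, -27);   // 1 + 2^-27
    double rounded = x * x;                  // rounded to the 53-bit mantissa

    // std::fma computes x*x - rounded with only one rounding at the very end,
    // so it recovers the rounding error the plain multiplication threw away.
    double error = std::fma(x, x, -rounded);

    std::printf("naive:   %g\n", x * x - rounded); // 0
    std::printf("via fma: %g\n", error);           // 2^-54, about 5.55e-17
    return 0;
}
```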