Explain this code in K&R 2-1

Question

I'm trying to determine the range of the various floating-point types. I came across this code:

#include <stdio.h>

main()
{
    float fl, fltest, last;
    double dbl, dbltest, dblast;

    fl = 0.0;
    fltest = 0.0;
    while (fl == 0.0) {
        last = fltest;
        fltest = fltest + 1111e28;
        fl = (fl + fltest) - fltest;
    }
    printf("Maximum range of float variable: %e\n", last);

    dbl = 0.0;
    dbltest = 0.0;
    while (dbl == 0.0) {
        dblast = dbltest;
        dbltest = dbltest + 1111e297;
        dbl = (dbl + dbltest) - dbltest;
    }
    printf("Maximum range of double variable: %e\n", dblast);
    return 0;
}

I don't understand why the author adds 1111e28 to the fltest variable.

Don't understand the code, or don't understand the value or something else? You get downvoted and/or closed if it's hard to know what the question is. — david.pfx, Jun 10 '14 at 15:21
@david.pfx; It is clear that OP do not understand that why `1111e28` is added to `fltest` — haccks, Jun 10 '14 at 15:24
The code is fundamentally wrong. Say `DBL_MAX == FLT_MAX` with a value about `1E+37`, certainly allowable by the spec and common in simple platforms that have `float` and `double` as the same 4-byte type. The code reports "Maximum range of double variable: 0.0e00\n". This code depends on _prior_ knowledge of the max value of a `float/double` to work. — chux - Reinstate Monica, Jun 10 '14 at 17:18
@didierc Later if I have time or if someone else posts it - feel open to do so. — chux - Reinstate Monica, Jun 10 '14 at 17:38

score 7 · Accepted Answer · answered Jun 10 '14 at 15:31

The loop terminates when fltest reaches +Inf, as at that point fl = (fl + fltest) - fltest becomes NaN, which is unequal to 0.0. last contains a value which when added to 1111e28 produces +Inf and so is close to the upper limit of float.

1111e28 is chosen to reach +Inf reasonably quickly; it also needs to be large enough that when added to large values the loop continues to progress i.e. it is at least as large as the gap between the largest and second-largest non-infinite float values.
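
The two facts this explanation relies on can be checked with a minimal sketch. It assumes IEEE 754 binary32 `float` with round-to-nearest; the helper names `loop_update` and `step` are made up for illustration, and `volatile` is there only to force each result to be rounded to `float`.

```c
#include <float.h>
#include <math.h>

/* Hypothetical helper: the loop's update, fl = (fl + fltest) - fltest.
   The volatile stores round each intermediate result to float. */
float loop_update(float fl, float fltest)
{
    volatile float sum = fl + fltest;
    volatile float diff = sum - fltest;
    return diff;
}

/* Hypothetical helper: one increment step, fltest + 1111e28. */
float step(float fltest)
{
    volatile float next = fltest + 1111e28f;
    return next;
}
```

Under those assumptions, `loop_update(0.0f, x)` stays 0.0 for any finite `x`, `loop_update(0.0f, INFINITY)` is NaN (which compares unequal to 0.0), and `step(FLT_MAX)` is infinite.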

answered Jun 10 '14 at 15:31

ecatmur


Good point about `1111e28` being chosen larger than `FLT_MAX*FLT_EPSILON`. – Pascal Cuoq Jun 10 '14 at 15:33
How `0.0` becomes `NaN` ? o_O – haccks Jun 10 '14 at 15:34
@haccks `inf - inf` results in NaN (in `fl = (fl + fltest) - fltest;`) – Pascal Cuoq Jun 10 '14 at 15:35
@PascalCuoq; Oops! Got it :) – haccks Jun 10 '14 at 15:36
Hmmm. If `1111e28` is chosen based upon largest and second-largest non-infinite values, then what is the point of the code? Just return the largest value. – chux - Reinstate Monica Jun 10 '14 at 17:03
@chux yes, it's not particularly good code, is it? If I was writing it I'd use repeated doubling rather than adding a constant value. – ecatmur Jun 10 '14 at 17:14
@ecatmur Agreed about the weakness of code in question. Still your answer is good. – chux - Reinstate Monica Jun 10 '14 at 17:20
A good answer, but incomplete. You did not provide a cogent explanation for the specific value 1111e28 (as against something slightly smaller or larger), or how to determine whether it is optimal, or how to calculate an optimal value if it is not. – david.pfx Jun 11 '14 at 11:48
@david.pfx Unfortunately the `1111e28` or test increment value needs, in general, to be about `FLT_MAX*FLT_EPSILON` to `FLT_MAX*FLT_EPSILON*1.5`. Big enough to affect the addition, but small enough to step through each `float` value. So for this method to well find the answer `FLT_MAX`, code needs to approximately know the answer and `FLT_EPSILON`. – chux - Reinstate Monica Jun 11 '14 at 17:17
@chux: Still not an answer. To be specific, why 1111e28 and not 1110e28 or 1112e28? Because it's a cute value? Or because it doesn't really matter? – david.pfx Jun 11 '14 at 23:39

chux - Reinstate Monica · Answer 2 · 2014-06-12T03:17:06.430

OP: ... why author added 1111e28 at fltest variable ?
A: [Edit] For the code to work with float, the delta value (1111e28, i.e. 1.111e31) needs careful selection. It should be big enough that, if fltest were FLT_MAX, the sum fltest + delta would overflow and become infinity. In round-to-nearest mode, that minimum is FLT_MAX*FLT_EPSILON/4. On my machine:

min_delta           1.014120601e+31  (1/2 step between 2nd largest and FLT_MAX)
FLT_MAX             3.402823466e+38
FLT_EPSILON         1.192092896e-07
FLT_MAX*FLT_EPSILON 4.056481679e+31

delta also needs to be small enough that, if fltest is the 2nd largest number, adding delta does not jump straight to infinity and skip FLT_MAX. This upper bound is 3x min_delta:

max_delta           3.042361441e+31

So 1.014120601e+31 <= 1111e28 < 3.042361441e+31.
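
These bounds can be recomputed rather than hard-coded. The following is a sketch only, assuming IEEE 754 binary32 `float`, round-to-nearest, and a `double` wide enough to hold the difference exactly; the function names are hypothetical, and `nextafterf` is the C99 `<math.h>` function.

```c
#include <float.h>
#include <math.h>

/* Smallest increment that pushes FLT_MAX to infinity under
   round-to-nearest: half the gap between FLT_MAX and the next
   smaller float. */
double min_delta(void)
{
    double gap = (double)FLT_MAX - (double)nextafterf(FLT_MAX, 0.0f);
    return gap / 2.0;
}

/* Increments at or above this bound jump from the second-largest
   float straight to infinity, skipping FLT_MAX. */
double max_delta(void)
{
    return 3.0 * min_delta();
}

/* Returns 1 if delta lies in [min_delta, max_delta). */
int delta_in_range(double delta)
{
    return min_delta() <= delta && delta < max_delta();
}
```

On such a platform the results should come out near the min_delta and max_delta figures quoted above, and `delta_in_range(1111e28)` should report that the increment from the question qualifies.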

@david.pfx Yes. 1111e28 is a cute number and it is in range.

Note: Complications occur when the math and its intermediate values, even though the variables are float, are calculated at higher precision such as double. This is allowed in C; it is described by FLT_EVAL_METHOD and can be avoided only with very careful coding.

1111e28 is a curious value that makes sense if the author already knew the general range of FLT_MAX.

The code below is expected to loop many times (24946069 on one test platform). Hopefully, the value fltest eventually becomes "infinite". Then fl becomes NaN as the difference Infinity - Infinity. The while loop then ends, as NaN != 0.0. @ecatmur

while (fl == 0.0) {
    last = fltest;
    fltest = fltest + 1111e28;
    fl = (fl + fltest) - fltest;
}

The looping, if done in small enough increments, will arrive at a precise answer. Prior knowledge of FLT_MAX and FLT_EPSILON is needed to ensure this.
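
To see whether the loop really lands on the exact maximum, the float half of the question's code can be wrapped in a function and compared with `FLT_MAX`. This is a sketch under the assumption of IEEE 754 binary32 `float` with round-to-nearest; the function names are hypothetical, and `volatile` is used so that every store is rounded to `float` regardless of FLT_EVAL_METHOD.

```c
#include <float.h>

/* The float loop from the question, wrapped in a function.
   volatile forces every store to be rounded to float. */
float kr_float_max(void)
{
    volatile float fl = 0.0f, fltest = 0.0f, last = 0.0f;

    while (fl == 0.0f) {
        last = fltest;
        fltest = fltest + 1111e28;    /* increment from the question */
        fl = (fl + fltest) - fltest;  /* NaN once fltest is infinite */
    }
    return last;
}

/* Returns 1 if the loop stopped exactly on FLT_MAX. */
int kr_loop_is_exact(void)
{
    return kr_float_max() == FLT_MAX;
}
```

The comparison against `FLT_MAX` is only there as a check; it is exactly the prior knowledge the original program claims not to need.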

The problem with this is that C does not define the values of FLT_MAX and DBL_MAX other than that they must be at least 1E+37. So if the maximum value were quite large, the increment of 1111e28 or 1111e297 would have no effect. Example: dbltest = dbltest + 1111e297; for dbltest = 1e400 would certainly not increase 1e400 unless dbltest had about a hundred decimal digits of precision.
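
The "no effect" case is easy to reproduce at a smaller scale. Since 1e400 is not representable in IEEE binary64, this sketch (with a hypothetical helper name) uses 1e300 as the large value instead and assumes a `double` with roughly 16 decimal digits of precision.

```c
/* Returns 1 if adding inc to x leaves x unchanged once the sum is
   rounded to double, i.e. the increment is lost entirely. */
int increment_is_absorbed(double x, double inc)
{
    volatile double sum = x + inc;
    return sum == x;
}
```

With such a `double`, `increment_is_absorbed(1e300, 1e200)` is true, while an increment of 1e299 is large enough to change 1e300. A loop that adds an absorbed increment never advances.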

If DBL_MAX were smaller than 1111e297, the method fails too. (Note: on simple platforms in 2014, it is not surprising to find double and float to be the same 4-byte IEEE binary32.) The first time through the loop, dbltest becomes infinity and the loop stops, reporting "Maximum range of double variable: 0.000000e+00".

There are many ways to efficiently derive the maximum floating-point value. A sample follows that uses a random initial value to help show its resilience to differing values of FLT_MAX.

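#include <math.h>    /* isinf() */
#include <stdlib.h>  /* rand(), RAND_MAX */

float float_max(void) {
  /* Random start in [1.0, 2.0]; the cast avoids integer division. */
  float nextx = 1.0f + (float)rand() / (float)RAND_MAX;
  float x;
  /* Double until the next value would overflow to infinity. */
  do {
    x = nextx;
    nextx *= 2;
  } while (!isinf(nextx));
  /* Add ever smaller steps, keeping only those that stay finite. */
  float delta = x;
  do {
    nextx = x + delta/2;
    if (!isinf(nextx)) {
      x = nextx;
    }
    delta /= 2;
  } while (delta >= 1.0);
  return x;
}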

isinf() is new-ish in C: a macro that C99 added to <math.h>. It is simple enough to roll your own if needed.
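
One way to roll your own, as a sketch only: it assumes IEEE 754 semantics for infinity and NaN and a compiler that is not running in a "fast math" mode that optimizes such comparisons away. Both function names are hypothetical.

```c
#include <float.h>

/* Stand-in for isinf(): x - x is 0 for every finite x but NaN for
   infinities and NaNs; the x == x test then rules out NaN. */
int my_isinf(float x)
{
    volatile float diff = x - x;
    return x == x && diff != 0.0f;
}

/* Produces +infinity without the C99 INFINITY macro by overflowing
   at run time; volatile keeps the compiler from folding it. */
float make_inf(void)
{
    volatile float big = FLT_MAX;
    return big * 2.0f;
}
```

Here `my_isinf(make_inf())` and `my_isinf(-make_inf())` are true, while finite values such as `FLT_MAX` and NaN results such as `make_inf() - make_inf()` are rejected.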

In re: @didierc comment

[Edit]
The precision of float and double is implied by "epsilon": "the difference between 1 and the least value greater than 1 that is representable in the given floating point type ...". The C standard allows these values to be at most:

FLT_EPSILON 1E-5
DBL_EPSILON 1E-9

Per @Pascal Cuoq's comment, "... 1111e28 being chosen larger than FLT_MAX*FLT_EPSILON.": 1111e28 needs to be on the order of FLT_MAX*FLT_EPSILON to affect the loop's addition, yet small enough to precisely reach the number before infinity. Again, prior knowledge of FLT_MAX and FLT_EPSILON is needed to make this determination. If these values are known ahead of time, then the code could simply have been:

printf("Maximum range of float variable: %e\n", FLT_MAX);

Does not fully answer the question. Why 1111e28 and not (say) 1110e28 or 1112e28? — david.pfx, Jun 11 '14 at 23:40

score 2 · Answer 3 · answered Jun 10 '14 at 15:31

The largest value representable in a float is 3.40282e+38. The constant 1111e28 is chosen such that adding that constant to a number in the range of 10^38 still produces a different floating point value, so that the value of fltest will continue to increase as the function runs. It needs to be large enough that it will still be significant at the 10^38 range, and small enough that the result will be accurate.

answered Jun 10 '14 at 15:31

nneonneo


"largest value representable in a float is 3.40282e+38". Commonly true when `float` is a [binary32](http://en.wikipedia.org/wiki/Binary32). But C does not specify this - the max value is implementation dependant. – chux - Reinstate Monica Jun 10 '14 at 17:06
Does not fully answer the question. Why 1111e28 and not (say) 1110e28 or 1112e28? – david.pfx Jun 11 '14 at 23:40
@chux: If the max value is substantially different, this code probably wouldn't work at all. – nneonneo Jun 12 '14 at 00:17
@david.pfx: 1111e28 looks arbitrary to me. It should probably work fine with 1110e28 or 1112e28. – nneonneo Jun 12 '14 at 00:18
