In the following code, the functions foo1, foo2 and foo3 are intended to be equivalent. However, when run, foo3 never leaves its loop. Is there a reason why this is the case?

#include <cstdio>

template <typename T>
T foo1()
{
   T x = T(1);
   T y = T(0);
   for (;;)
   {
      if (x == y) break;
      y = x;
      ++x;
   }
   return x;
}

template <typename T>
T foo2()
{
   T x = T(0);
   for (;;)
   {
      T y = x + T(1);
      if (!(x != y)) break;
      ++x;
   }
   return x;
}

template <typename T>
T foo3()
{
   T x = T(0);
   while (x != (x + T(1))) ++x;
   return x;
}

int main()
{
   printf("1 float:  %20.5f\n", foo1<float>());
   printf("2 float:  %20.5f\n", foo2<float>());
   printf("3 float:  %20.5f\n", foo3<float>());
   return 0;
}

Note: This was compiled using VS2010 with /fp:precise in release mode. I'm not sure how GCC and other compilers would treat this code; any information would be great. Could this be an issue where, in foo3, the x and x+1 values somehow become NaN?

Gelly Ristor
  • Interesting issue. All three functions terminate as expected on gcc 4.2.1. I'm tempted to call it a bug in VS. – ComicSansMS May 23 '12 at 05:52
  • Hmm. Smells like an overeager optimization (i.e., a compiler bug) to me. – Mark Dickinson May 23 '12 at 06:01
  • @MarkDickinson: Hangs for me under a debug build in VS2010. – Ed S. May 23 '12 at 06:09
  • Works fine with GCC 4.4.3 on x86-64. Tested with "-O2", "-O2 -ffast-math", and "-O3 -ffast-math". All work fine and print the same result (16777216.00000). – janneb May 23 '12 at 06:18
  • @MarkDickinson: I'm beginning to think the same thing. I don't have access to gcc or clang atm, but will try it shortly. – Gelly Ristor May 23 '12 at 06:20
  • If you compare the disassembly of the hanging version to the semantically equivalent `T y = x + T(1); while(x != y) ++x;` (which works), they look identical (at least the important bits, like the load and compare). It may take a more experienced eye to figure out, but I'm still looking into it... – Ed S. May 23 '12 at 06:20
  • @janneb: Thanks for confirming that!!! :) – Gelly Ristor May 23 '12 at 06:21
  • @Ed S.: It gets a little more tricky when the precise/strict/fast options are used. I wonder if MSVC injects code that changes the floating-point control word (FLDCW) depending on the mode it's compiling with. – Gelly Ristor May 23 '12 at 06:26
  • Thinking a bit more, I take it back about the bug. For C at least (I don't know C++ that well), it's permissible for the compiler to use extra precision when evaluating expressions (assuming no assignment to local variables or explicit conversions). I *think* (but need to check) that this applies to the `x + T(1)` in the 3rd example. So my guess is that the sum and comparison there are being performed using the `double` type, so the comparison is always true, but after `x` hits `2**24` it doesn't budge any further. Will write a properly researched answer later if no-one else gets there. – Mark Dickinson May 23 '12 at 06:30
  • @MarkDickinson: Seems like a reasonable conclusion, but wouldn't the bit pattern then be a NaN for a 32-bit float? If the computation is done in double or even 80-bit extended precision, that could definitely result in a NaN when converted down to a 32-bit float, and since nothing ever compares equal to a NaN, the loop would never exit? – Gelly Ristor May 23 '12 at 06:33
  • NaNs shouldn't enter the picture; any implicit conversions are going to go via value, not bit pattern. – Mark Dickinson May 23 '12 at 06:45
  • Also, per IEEE 754, a narrowing conversion shall result in ±inf if the value exceeds the range. Try e.g. "float x = DBL_MAX;". – janneb May 23 '12 at 06:48
  • @GellyRistor: VS uses the FLD instruction to load the values, which IIRC promotes them to 80-bit precision. – Ed S. May 23 '12 at 06:49
  • BTW, the relevant passage from C99 (presumably there's something similar for C++) is in section 6.3.1.8, paragraph 2: "The values of floating operands and of the results of floating expressions may be represented in greater precision and range than that required by the type; the types are not changed thereby." So while the behaviour is surprising and annoying, it's not a compiler bug. – Mark Dickinson May 23 '12 at 07:19
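
The extra-precision hypothesis from the comments can be reproduced on any compiler by making the widening explicit. The sketch below assumes IEEE 754 binary32 `float` and binary64 `double`; the function names are hypothetical, and the `volatile` stores are only there to force the sum to be rounded to `float`, whatever the compiler does with intermediate values.

```cpp
// Compare x with x + 1 strictly in float: the volatile store forces the
// sum to be rounded to float before it is compared.
bool differs_as_float(float x)
{
   volatile float sum = x + 1.0f;
   return x != sum;
}

// Compare x with x + 1 in double, mimicking an implementation that
// evaluates float expressions with extra precision.
bool differs_as_double(float x)
{
   return static_cast<double>(x) != static_cast<double>(x) + 1.0;
}

// Increment the way ++x does in foo3: the result ends up in a float.
float increment_as_float(float x)
{
   volatile float next = x + 1.0f;
   return next;
}
```

At x = 16777216 (2**24), `differs_as_float` is false, `differs_as_double` is true, and `increment_as_float` returns 16777216 again. A loop that tests the wide comparison but stores the narrow result therefore keeps running without ever changing x.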

1 Answer

What happens is most likely the following. On the x86 architecture, intermediate calculations can be done with 80 bits of precision (long double is the corresponding C/C++ type). The compiler uses all 80 bits for the (+1) operation and for the (!=) operation, but rounds the result to float before storing it.

So what your compiler really does is this:

while ((long double)(x) != ((long double)(x) + (long double)(1))) {
  x = (float)((long double)(x) + (long double)(1));
} 

This is absolutely non-IEEE-conforming and causes endless headaches for everyone, but this is the default for MSVC. Use the /fp:strict compiler flag to disable this behaviour.

This is my recollection of the problem from about 10 years ago so please forgive me if this is somehow not entirely correct. See this for the official Microsoft documentation.

EDIT: I was very surprised to learn that g++ exhibits exactly the same behaviour by default (on i386 Linux, but not with e.g. -mfpmath=sse).
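
Independently of compiler flags, a source-level workaround is to store the sum in a variable of type T before comparing, as foo1 and foo2 effectively do. The following is a minimal sketch, not a guaranteed fix for every compiler: `foo3_stored` is a hypothetical name, and the `volatile` qualifier is an assumption added here to force a real store to memory.

```cpp
// Hypothetical variant of foo3: the sum is stored in a variable of type T
// before the comparison, so both operands of == are exact T values.
template <typename T>
T foo3_stored()
{
   T x = T(0);
   for (;;)
   {
      volatile T next = x + T(1);  // the store rounds the sum to T
      T stored = next;             // read the rounded value back
      if (x == stored) break;      // x + 1 no longer changes x
      x = stored;
   }
   return x;
}
```

Assuming IEEE 754 single precision, `foo3_stored<float>()` terminates and returns 16777216, the same value reported for GCC in the comments on the question.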

n. m. could be an AI
  • +1. Just using `double` rather than `long double` would be enough to cause problems here, but IIRC MS still likes to use the x87 FPU even for 64-bit builds, so `long double` seems more likely. – Mark Dickinson May 23 '12 at 06:43
  • Great, knowledgeable answer, but I notice that on VS2008 foo3() still doesn't terminate even with `/fp:strict`? – acraig5075 May 23 '12 at 06:56
  • @acraig5075: Agreed; I don't see how `/fp:strict` could help here. For `/fp:precise`, Microsoft's docs even specify that "Expression evaluation will follow the C99 FLT_EVAL_METHOD=2", confirming that intermediate values are computed to long double type. For `/fp:strict`, the documentation doesn't say, but I don't see any reason to assume `FLT_EVAL_METHOD=0`-like behaviour there. – Mark Dickinson May 23 '12 at 07:25
  • @acraig5075: Strangely, g++ does it too on the 32-bit x86/x87 architecture. So this is probably common x87 behaviour. – n. m. could be an AI May 23 '12 at 08:35
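
To see which evaluation method a given compiler and flag combination claims, the `FLT_EVAL_METHOD` macro mentioned above can be inspected. This is a sketch that assumes a `<cfloat>` which defines the macro (it is part of C99 and C++11, so older toolchains may lack it); the helper names are hypothetical and the descriptions paraphrase the C99 definitions.

```cpp
#include <cfloat>
#include <string>

// Map a FLT_EVAL_METHOD value to a paraphrase of its C99 meaning.
std::string describe_eval_method(int method)
{
   switch (method)
   {
      case 0:  return "operations are evaluated in the precision of their own type";
      case 1:  return "float and double operations are evaluated as double";
      case 2:  return "all operations are evaluated as long double";
      case -1: return "indeterminable";
      default: return "implementation-defined";
   }
}

// Description for the compiler and flags this file was built with.
std::string current_eval_method()
{
#ifdef FLT_EVAL_METHOD
   return describe_eval_method(FLT_EVAL_METHOD);
#else
   return "FLT_EVAL_METHOD is not defined by this <cfloat>";
#endif
}
```

Printing `current_eval_method()` from a build with the flags under discussion shows what the implementation reports. A value of 2 would match the long double evaluation described in this answer, though the macro only reflects what the compiler claims, not necessarily the generated code.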