
I was reading the famous Undefined Behavior can cause time travel post and noticed this part:

> First of all, you might notice the off-by-one error in the loop control. The result is that the function reads one past the end of the table array before giving up. A classical compiler wouldn't particularly care. It would just generate the code to read the out-of-bounds array element (despite the fact that doing so is a violation of the language rules), and it would return true if the memory one past the end of the array happened to match.
>
> A post-classical compiler, on the other hand, might perform the following analysis:
>
> The first four times through the loop, the function might return true.
>
> When i is 4, the code performs undefined behavior. Since undefined behavior lets me do anything I want, I can totally ignore that case and proceed on the assumption that i is never 4. (If the assumption is violated, then something unpredictable happens, but that's okay, because undefined behavior grants me permission to be unpredictable.)

According to this post, the (newer) compiler can already act upon Undefined Behavior at compile time, which means it is perfectly capable of spotting Undefined Behavior in some cases. Instead of letting demons fly out of your nose or spawning dragons by eliminating the UB code, or just transforming it because it's allowed to, why doesn't the compiler simply emit a warning that this is probably not intended?
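
For reference, the loop under discussion looks roughly like this (reconstructed from the linked post; the names follow that example):

```cpp
int table[4];

bool exists_in_table(int v)
{
    // Off-by-one: i <= 4 means the final iteration reads table[4],
    // one element past the end of the array.
    for (int i = 0; i <= 4; i++) {
        if (table[i] == v) return true;
    }
    return false;
}
```

The post goes on to show that a post-classical compiler, reasoning as quoted above, may optimize the entire function down to an unconditional `return true;`.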

Hatted Rooster
    No, you misunderstand. The compiler realizes that `i` will never be `4` or above in a well-defined program, so it generates code with this assumption. It does _not_ need to realize that `i` will in fact become `4`, violating the assumption. – Magnus Hoff Jun 21 '16 at 10:45
    In other words, it's working backwards. Since `i` can never legally become `4`, one of the `if (table[i] == v) return true;` statements must succeed before then. So there's nothing to warn about. – Barmar Jun 21 '16 at 10:52

3 Answers


The work of a compiler is to translate code from a high-level language to a lower level. If you get a descriptive error or warning message, it is time to thank the compiler for doing extra work for you. To get the warnings you require, use a static code analysis tool.

Anything not well defined in the spec is undefined, and it is not possible to prepare a comprehensive list of undefined behaviour, so emitting a warning for every such behaviour may not be possible.

In practice, compilers do warn about undefined behaviour in many cases, especially with proper warning flags such as -W -Wall -Wextra -O2 on gcc. (With optimization flags like -O2 the compiler performs a more rigorous analysis of the code and may generate more warnings.)
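
For instance (illustrative only; whether and how the warning appears depends on the compiler version), the off-by-one loop from the question is the kind of access that gcc's -Warray-bounds can flag; that warning is enabled by -Wall but historically only becomes active when the value-range analysis done at -O2 and above is available:

```cpp
// Build sketch: g++ -Wall -Wextra -O2 -c oob.cpp   (file name is just an example)
int table[4];

bool exists_in_table(int v)
{
    for (int i = 0; i <= 4; i++) {   // i == 4 reads one past the end
        if (table[i] == v) return true;
    }
    return false;                    // recent gcc versions may emit a
}                                    // -Warray-bounds warning for table[4]
```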

Mohit Jain
  • It depends on how much the compiler wants to analyze. All the compiler knows is that you would **never** access the array out of bounds, and thus the loop never runs to completion. It can do some optimizations with this assumption, i.e. chop off everything after the loop (similar example in [this question](http://stackoverflow.com/questions/28631378/function-optimized-to-infinite-loop-at-gcc-o2)). If anything worth warning about turns up during that analysis, the compiler writer would probably be happy to emit it as a side product. – Mohit Jain Jun 21 '16 at 10:54

Compilers are getting better and better at diagnostics. But in principle code analysis is not the work they have to do. It is only a present to you.

So it is common:

To have more than one compiler for translation and to check the warnings and errors (e.g. in Jenkins with gcc and clang).

To have static code analysis automated as well (also in Jenkins, for example).

For example:

cppcheck main.cpp

Results in:

[main.cpp:8]: (error) Array 'table[4]' accessed at index 4, which is out of bounds.

There are a lot more tools around, commercial or not. To also see run-time effects, you can give valgrind, with all the tools it contains, a chance.

And yes, a lot more can be done by replacing parts of the standard libraries with test tools, e.g. for memory consumption, stack tracing, etc. See efence, duma and others.

But all that will not find all your errors automatically.

It is a must to have unit tests in place (e.g. gtest) and to check against your requirements. This should be done with coverage analysis; without it you have no idea which code you have really tested and which lines/branches are outside the reach of your tests.
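
A minimal sketch of such a test with gtest (the tested function and array are taken from the question's example; this is an illustration under those assumptions, not a complete suite):

```cpp
#include <gtest/gtest.h>

extern int table[4];          // the array from the question
bool exists_in_table(int v);  // the function from the question

TEST(ExistsInTable, FindsStoredValue) {
    table[2] = 42;
    EXPECT_TRUE(exists_in_table(42));
}

TEST(ExistsInTable, RejectsMissingValue) {
    // Because of the off-by-one read past the array, this expectation can
    // pass or fail unpredictably -- which is exactly why coverage plus a
    // sanitizer or valgrind run matters in addition to the tests themselves.
    EXPECT_FALSE(exists_in_table(-7));
}
```

Link against gtest_main and build with coverage instrumentation (e.g. gcc's --coverage) to see which lines and branches the tests actually exercise.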

The compiler is only one aspect of finding errors in your code.

Last but not least: A good code peer review is very helpful!

Klaus

There is a fundamental problem with trying to produce diagnostics in cases where a compiler assumes that a program will never receive input that would cause Undefined Behavior, and uses that assumption to ignore code that would otherwise affect how such input is processed: such optimizations are most worth doing when they can eliminate many different pieces of code, but in precisely those scenarios the number of diagnostics produced would be so unmanageable as to make them useless.

Personally, I think the right approach would be for compilers to document behavioral constraints in many cases where the Standard imposes no requirements but where some constraints could be guaranteed very cheaply(*), and then to have defined directives that facilitate optimization and indicate programmer intention. A static analysis tool could then identify places where eliminating code the programmer probably doesn't care about could improve efficiency, and let the programmer add a directive to say "Feel free to optimize on the blind assumption that condition xxx will be true", or "Feel free to abnormally terminate the program if condition xxx is false, if not having to handle xxx would save code downstream", or "I need behavior to remain constrained to yyy degree if xxx is false" [the latter directive would suppress messages about xxx if the static analysis tool were run again].

(*) E.g. guaranteeing that x+1>y will yield 0 if x==INT_MAX would be expensive, but guaranteeing that the expression will never do anything other than yield one or zero would be cheap; if either a 1 or a 0 would meet requirements, the guarantee would allow a programmer to write code that is more amenable to optimization than would otherwise be possible.
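
To make the footnote concrete (this example is mine, not the answer's, and the function name is made up):

```cpp
// Because signed overflow is undefined behaviour, a compiler is allowed to
// assume x + 1 never wraps and may fold this whole function to "return 1;".
int always_greater(int x)
{
    return x + 1 > x;   // undefined behaviour when x == INT_MAX
}
// The cheap guarantee proposed above would still permit that fold, but it
// would promise the expression never does anything other than yield 0 or 1.
```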

If having xxx not be true would imply that there's a problem the programmer should know about, a directive suggesting that the compiler could terminate the program if it isn't true could help the programmer find out about it. Hyper-modern compiler design, however, could have the opposite effect, by making the compiler short-circuit logic like:

if (xxx) log_message("xxx was okay at checkpoint #57");

[if xxx not being true would cause UB downstream, a compiler could make the log message call unconditional, thus making it appear as though xxx was true at that point even though it wasn't]. Shifting the focus toward more programmer-directed optimization (e.g. if a program contains many "Feel free to terminate the program if xxx is false" directives) would allow a compiler to achieve the same optimizations in places where the programmer cared enough about speed to supply such directives, but without having the compiler's behavior obscure problems.
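
A sketch of how that short-circuiting could play out (the surrounding function, the log_message declaration, and the variable names are hypothetical, added here for illustration):

```cpp
void log_message(const char *msg);   // hypothetical logging function, as in the snippet above

void checkpoint_57(bool xxx)
{
    if (xxx) log_message("xxx was okay at checkpoint #57");

    if (!xxx) {
        // Stand-in for whatever downstream code has undefined behaviour
        // when xxx is false; a null store is used so the UB is provable.
        int *p = nullptr;
        *p = 1;
    }
    // A compiler may reason "the !xxx path is UB, therefore xxx is true",
    // drop the branch around log_message, and call it unconditionally --
    // so the log reports that xxx "was okay" even on runs where it was not.
}
```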

supercat