
I've been trying to get re-familiarized with multi-threading recently and came across this paper. One of the examples warns against using code like this:

int my_counter = counter; // Read global
int (* my_func) (int);
if (my_counter > my_old_counter) {
  ... // Consume data
  my_func = ...;
  ... // Do some more consumer work
}
... // Do some other work
if (my_counter > my_old_counter) {
  ... my_func(...) ...
}

Stating that:

If the compiler decides that it needs to spill the register containing my_counter between the two tests, it may well decide to avoid storing the value (it’s just a copy of counter, after all), and to instead simply re-read the value of counter for the second comparison involving my_counter [...]

Doing this would turn the code into:

int my_counter = counter; // Read global
int (* my_func) (int);
if (my_counter > my_old_counter) {
  ... // Consume data
  my_func = ...;
  ... // Do some more consumer work
}
... // Do some other work
my_counter = counter; // Reread global!
if (my_counter > my_old_counter) {
  ... my_func(...) ...
}

I, however, am skeptical about this. I don't understand why the compiler is allowed to do this, since to my understanding a data race only occurs when two threads access the same memory location at the same time and at least one of the accesses is a write. The author goes on to motivate that:

the core problem arises from the compiler taking advantage of the assumption that variable values cannot asynchronously change without an explicit assignment

It seems to me that this condition is respected here, as the local variable my_counter is only assigned once and cannot be accessed by other threads. How would the compiler know that the global variable cannot be set elsewhere, in another translation unit, by another thread? It cannot, and in fact I would have assumed that the second comparison would simply be optimized away instead.

Is the author wrong, or am I missing something?

Dan Ganea
  • The compiler knows because this would be undefined behavior. – tkausl Nov 02 '17 at 20:31
  • The compiler doesn't read your variables. It emits code that does. – Basile Starynkevitch Nov 02 '17 at 20:35
  • If possible you can also read Item 40 of Meyers' Effective Modern C++ http://shop.oreilly.com/product/0636920033707.do You will find meaningful information concerning data races, atomic operations and volatile storage. It also explains why (for non-volatile declarations) the compiler can change operation order etc. – Picaud Vincent Nov 02 '17 at 20:35
  • The compiler assumes single-threaded operation by default, even in MT programs; this is what makes optimization possible. But that's not all: multi-core processors and their multi-layer memory caches add an extra level of complexity. Even when a variable has _seemingly_ been written to memory, it may not be immediately visible to other cores... unless the software explicitly demands it. To be thread-ready on a modern multi-core computer, your counter should be declared a `std::atomic`. – Michaël Roy Nov 02 '17 at 20:37
  • 'Doing this would turn the code into: ...' no it wouldn't. It would eliminate `my_counter` altogether and just use `counter`. Read your quotation again. – user207421 Nov 03 '17 at 10:10
  • @EJP The code following that is quoted exactly as the transformation appears in the article. – Dan Ganea Nov 03 '17 at 11:16

2 Answers


Unless counter is explicitly volatile, the compiler may assume that it never changes as long as nothing in the current scope of execution could change it: if the variable cannot be aliased and there are no intervening calls to functions whose effects the compiler can't see, then any external modification is undefined behavior. With volatile you would be declaring external changes as possible, even if the compiler can't know how they happen.

So that optimization is perfectly valid. In fact, even if the compiler did keep the copy, the code still wouldn't be thread-safe: the value may change partially mid-read, or may be completely stale, since cache coherency is not guaranteed without synchronisation primitives or atomics.

Well, actually, on x86 you won't get a torn value for an integer, at least as long as it is aligned; that's one of the guarantees the architecture makes. Stale values still apply, though: the value may already have been modified by another thread.

Use either a mutex or an atomic if you need this behavior.

Ext3h
  • I did not know it is undefined behavior for a global value to change in the current scope of execution. This seemed presumptuous to me, I'll look into finding the appropriate phrasing in the standard. Thanks! – Dan Ganea Nov 02 '17 at 20:43
  • @GaneaDanAndrei That's not undefined behavior. What is undefined behavior is for *a different thread* to change a global value. If you have 2 threads, A and B, and only thread B changes the value, neither thread on its own observes any undefined behavior: A sees that the value never changes (because it doesn't change it) and B sees that the value changes (but doesn't see anyone else changing it). The UB comes from the intersection of those two threads: because A assumes the value doesn't change, but B *did* change it, odd results can occur. – Xirema Nov 02 '17 at 20:45

Compilers [are allowed to] optimize presuming that anything which is "undefined behavior" simply cannot happen: that the programmer will prevent the code from being executed in such a way that would invoke the undefined behavior.

This can lead to rather silly executions where, for example, the following loop never terminates!

int vals[10];
for(int i = 0; i < 11; i++) {
    vals[i] = i;
}

This is because the compiler knows that writing to vals[10] would be undefined behavior, so it assumes it cannot happen; and since that write cannot happen, i can never reach 10, which means the condition i < 11 is always true and the loop never terminates. Not all compilers will optimize a loop this aggressively, though I do know that GCC does.

In the specific case you're working with, reading a global variable in this way is undefined behavior precisely when it is possible for another thread to modify it in the interim. As a result, the compiler assumes that cross-thread modifications never happen (because that would be undefined behavior, and compilers can optimize presuming UB doesn't happen), and thus it's perfectly safe to re-read the value (which it knows doesn't get modified in its own code).

The solution is to make counter atomic (std::atomic<int>), which forces the compiler to acknowledge that there might be some kind of cross-thread manipulation of the variable.

Xirema
  • Yup. Where did that 11 come from? – Michaël Roy Nov 02 '17 at 20:38
  • @tkausl Yyyyup. "Integer overflows are undefined behavior, therefore they cannot happen, therefore any code which can cause an integer overflow won't happen, therefore this code, which *always* causes an integer overflow, doesn't happen, therefore all of it can be ignored." – Xirema Nov 02 '17 at 20:39
  • A compiler need not apply any kind of optimization for a loop like that not to terminate. Any kind of havoc could be wreaked by storing a value past the end of the array. A more interesting situation would be an expression like `x+y>z` when `y` and `z` are constants. If it would be acceptable for the expression to yield either 0 or 1 in case of overflow *provided it has no other side-effects*, a compiler that can guarantee that may allow a programmer to write more efficient code than if overflow had to be avoided at all costs. – supercat Nov 10 '17 at 22:50