0

I am learning/re-learning C, and I've learnt about the register keyword. On many websites, people say that it should not be used, or even that it is useless. The book I am using says it is useful in for-loops, so I first tried without the keyword:

#include <stdio.h>
#include <time.h>

int main() {
    int count = 1;

    clock_t start = clock();
    while(count != 0) {
        count++;
    }
    clock_t stop = clock();
    double time = (double)(stop - start);

    printf("%f", time);

    return 0;
}

and, then with it:

[..] register int count = 1; [...]

Compiling the source code with GCC, I have seen that using register made the program around 5x faster.

Hence my question: is there something wrong on my end (e.g. my GCC), or is the register keyword actually useful?

Peter Cordes
Schilive
  • Comments are not for extended discussion; this conversation has been [moved to chat](https://chat.stackoverflow.com/rooms/248362/discussion-on-question-by-schilive-why-does-using-the-register-keyword-makes-m). – Samuel Liew Sep 27 '22 at 06:09

2 Answers

2

Even though signed overflow is undefined behavior, a compiled binary does have a definite behavior. So we can check why using register makes the code faster.

Relying on undefined behavior while compiling with no optimizations means that GCC may produce completely different assembly even for small, apparently insignificant modifications.
register is one such insignificant modification, since it is a historical micro-optimization that GCC (apparently) no longer honors.
Quoting the GNU C Language Intro and Reference Manual:

20.10 auto and register
For historical reasons, you can write auto or register before a local variable declaration. auto merely emphasizes that the variable isn’t static; it changes nothing.
register suggests to the compiler storing this variable in a register. However, GNU C ignores this suggestion, since it can choose the best variables to store in registers without any hints.

However, the two binaries differ precisely in that count is held in a register (ebx specifically) rather than in memory on the stack (and in that a frame pointer is created).

Left: binary without register. Right: binary with register

So register does indeed make your code faster. You can see, on the left, that without it (at the -O0 optimization level) GCC generated add [rbp+count], 1 (this is a 32-bit increment; IDA doesn't show that), while with the register modifier it generated add ebx, 1.
The 5x slow-down seems to match the store-forwarding latency: each iteration has to store count to memory and load it back before the next increment can start, whereas a register increment has a latency of about one cycle.

Note, however, that GCC may place count in a register or in memory as it sees fit (unless, maybe, it is volatile). The choice can change if you change int to unsigned int or to unsigned long long, if too many other local variables are in use, or with any other compiler switch (such as specific optimizations).

register had the desired effect in this simple code because there were no other constraints in place from the compiler analysis.
It is interesting, however, to see that GCC doesn't completely ignore it, as Stallman claims in his manual.

Margaret Bloom
  • *GNU C ignores this suggestion, since it can choose the best variables to store in registers without any hints.* - That's assuming optimization is enabled, which it always should be for any code where performance is relevant at all. But that was not the case here, so it's not applicable. Instead, this is basically a duplicate of [Why does clang produce inefficient asm with -O0 (for this simple floating point sum)?](https://stackoverflow.com/q/53366394) – Peter Cordes Sep 27 '22 at 06:30
  • @PeterCordes Why? GCC could be made to choose to ignore `register` at any optimization level. In fact, it could ignore `register` at the parsing layer. Truth is, it just doesn't ignore it; it ignores whatever weight `register` adds to its internal register-allocation heuristic if optimizations are enabled. This is different from ignoring the keyword. – Margaret Bloom Sep 27 '22 at 07:53
  • Yeah, interesting point. It needs to reject C that takes the address of a `register` variable, and print a diagnostic. IIRC, a diagnostic is required for ill-formed programs like that. Since it already has to know the variable can't have an address, it makes some sense to make asm where that's true. I haven't checked what it does when there are so many `register` vars that there aren't enough registers; I'd guess the first n get regs, later are just like non-`register` locals. Also, it already has to parse it to some degree for stuff like `register int foo asm("r8")`. – Peter Cordes Sep 27 '22 at 08:12
1

Your program has undefined behavior because you rely on count becoming 0 by incrementation, which may occur as a side effect of signed arithmetic overflow, but is not defined.

You could modify your test by defining count as an unsigned int, since unsigned int arithmetic is defined to wrap around modulo UINT_MAX+1.

#include <stdio.h>
#include <time.h>

int main() {
    unsigned int count = 1;

    clock_t start = clock();
    while(count != 0) {
        count++;
    }
    clock_t stop = clock();
    double time = (double)(stop - start);

    printf("%f\n", time);

    return 0;
}

Yet the above code executes instantly when compiled with optimisations because the compiler can determine that no side effect occurs during the 4294967295 iterations leading to count == 0. Adding a register keyword does not change that.

Compiling with optimisations disabled gives a different result for gcc, as can be observed on Godbolt's Compiler Explorer. Without the register keyword, the variable count is stored in memory within the stack frame, whereas it is indeed stored in register %ebx when the register keyword is used, leading to a much faster execution time. This probably explains the behavior you observe, but be aware that clang, for example, does not do that: variable count is stored in the stack frame memory in both cases.

Using the register keyword may have an impact on the code generation under special circumstances, but modern compilers generally ignore it and choose which variables to store in registers and when much more efficiently than the programmer would. Here are my recommendations:

  • don't use register
  • do use compiler optimisations for production builds
  • your book is probably outdated; try a more recent one.
chqrlie