
Writing some very basic C code for a Cortex-M0 device, I'm surprised to see the disassembly:

void delay(void) {
    for (int x = 0; x < 0xffff; x++) ;
}

This becomes:

for (int x=0;x<0xffff;x++) ;
    2300        movs r3, #0
    9301        str r3, [sp, #4]
    E002        b 0x0800026E
    9B01        ldr r3, [sp, #4]      //0x08000268
    3301        adds r3, #1
    9301        str r3, [sp, #4]
    9B01        ldr r3, [sp, #4]      //0x0800026E
    4A03        ldr r2, =0x0000FFFE
    4293        cmp r3, r2
    DDF8        ble 0x08000268
--- main.c -- 8 --------------------------------------------
}
    46C0        nop
    46C0        nop
    B002        add sp, sp, #8
    4770        bx lr
    46C0        nop
    0000FFFE    .word 0x0000FFFE

Now this seems awfully wasteful. I know my purpose was to 'waste time' with the simple delay function, but gcc appears to use only two registers and to reload and store the loop counter on the stack on every iteration.

This is stock Rowley Crossworks 4.10 with all default settings using the GCC compiler that came with it. The debug configuration adds no optimization flags.

Wouldn't something like this be significantly better?

# Counter reset
  movs r0, #0x0
  ldr r1, =0xffff

loopone:
  adds r0,#0x1
  cmp r0,r1
  bne loopone

It seems like default unoptimized gcc output prefers stack variables over registers. But the AAPCS gives us four scratch registers (r0-r3), which would let us avoid any stack pushes and pops beyond the usual. This function was also not inlined, which could possibly explain this, but even saving the registers' original values to the stack and restoring them afterwards would still be better than repeatedly going through the stack like this.

Why does gcc prefer the stack over available registers?

  • 9
    You are not compiling with optimizations enabled and so the compiler will make no attempt at producing efficient instructions at all. It will stupidly translate every read/write on the language level into load/stores to memory on the target. If you did enable optimizations, the compiler would completely remove the loop because it has no side effects at all. – user17732522 May 07 '23 at 06:55
  • 10
    so .. you are asking gcc to not optimize the code at all then you are surprised that it ... didn't? – bolov May 07 '23 at 06:56
  • 1
    Compilers don't keep vars in regs in `-O0` builds unless you use `register int x`, partly for consistent debugging. (So `jump` works inside GDB, see the linked duplicate). When you enable optimization you have the opposite problem, and would need something like `asm volatile ("" :: "r"(x))` to force the compiler to materialize every `x` value in a register. – Peter Cordes May 07 '23 at 13:43
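
As a rough illustration of the points raised in these comments: an optimizing build may delete an empty counting loop, and an empty `asm volatile` statement that takes the counter as a register input is one way to keep it. The sketch below assumes GCC or Clang, since extended asm is a compiler extension. The name `delay_loop` and its return value are hypothetical additions for illustration, not part of the question's code.

```c
#include <stdint.h>

/* Hypothetical helper (not from the original post): spins for `count`
 * iterations and returns how many iterations actually ran. */
uint32_t delay_loop(uint32_t count)
{
    uint32_t x;
    for (x = 0; x < count; x++) {
        /* Empty asm statement with x as a register input ("r").
         * Because it is volatile, the compiler has to keep it and run it
         * once per iteration with the current x in a register, so the
         * loop is not removed as dead code. */
        __asm__ volatile ("" : : "r"(x));
    }
    return x;
}
```

Calling `delay_loop(0xffff)` mirrors the iteration count of the original `delay()`. The time per iteration still depends on the core clock and on the instructions the compiler emits, so this is not a calibrated delay.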

1 Answer


The way compilers work is to translate your code rather simplistically but very correctly and then, if you ask for it, optimize that translation through detailed analysis of the simplistically generated code.

These optimizations can be very expensive in compile time (and for some projects, build time is critical), and they make debugging more difficult (e.g. variables can disappear), so optimization is optional.

This means that, in the simplistic but very correct initial translation, every variable gets a memory location. The compiler knows it can't go wrong with that: the translation will be correct and, for example, it cannot run out of registers, all without expensive analysis.
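
To make the "every variable gets a memory location" idea concrete, here is a minimal sketch that reproduces a similar load/store pattern on purpose. It relies on `volatile`, which requires every read and write of the variable to be performed as a real memory access, so the generated code tends to resemble the unoptimized output even when optimization is enabled. The function name `sum_below` is hypothetical and chosen only for illustration.

```c
/* Hypothetical example: sums 0 + 1 + ... + (n - 1). */
int sum_below(int n)
{
    volatile int total = 0;   /* each access to total is a real load or store */
    for (volatile int i = 0; i < n; i++) {
        total += i;           /* load i, load total, add, store total */
    }
    return total;             /* final load from total's memory slot */
}
```

Removing both `volatile` qualifiers and compiling with optimization typically lets the compiler keep `total` and `i` in registers, which is the kind of transformation described here.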

For a variety of reasons, the compiler already has a general capability for removing loads and stores and relocating values to registers. During (optional) optimization it applies this broadly to the generated code, not just to declared variables. Since those mechanisms are there anyway, there's little merit in special handling of declared variables (i.e. putting them in registers right away in the simplistic translation).

In short, there's lots of room for improvement in the unoptimized code, but of course, that's what the optimizer is for. The approach of detailed analysis in optimizing the simplistic translation catches improvements of a general and obvious nature (like putting variables in registers) as well as hidden improvements that are very specific to the exact patterns found in the statements and expressions of the input and their translations at a lower level. It is not always a win to relocate a variable from memory to a register (for example, when the variable is used only once and is live across a call), and the detailed analysis can determine (by some measure) where that is a win and where it is not. This is a methodical approach that is more effective than simply trying to generate good code in the first place.

Erik Eidt