Why does GCC use additional registers for pushing values onto the stack?

Question

This C code

void test_function(int a, int b, int c, int d) {}

int main() {
  test_function(1, 2, 3, 4);
  return 0;
}

gets compiled by GCC (no flags, version 12.1.1, target x86_64-redhat-linux) into

0000000000401106 <test_function>:
  401106:   55                      push   rbp
  401107:   48 89 e5                mov    rbp,rsp
  40110a:   89 7d fc                mov    DWORD PTR [rbp-0x4],edi
  40110d:   89 75 f8                mov    DWORD PTR [rbp-0x8],esi
  401110:   89 55 f4                mov    DWORD PTR [rbp-0xc],edx
  401113:   89 4d f0                mov    DWORD PTR [rbp-0x10],ecx
  401116:   90                      nop
  401117:   5d                      pop    rbp
  401118:   c3                      ret    

0000000000401119 <main>:
  401119:   55                      push   rbp
  40111a:   48 89 e5                mov    rbp,rsp
  40111d:   b9 04 00 00 00          mov    ecx,0x4
  401122:   ba 03 00 00 00          mov    edx,0x3
  401127:   be 02 00 00 00          mov    esi,0x2
  40112c:   bf 01 00 00 00          mov    edi,0x1
  401131:   e8 d0 ff ff ff          call   401106 <test_function>
  401136:   b8 00 00 00 00          mov    eax,0x0
  40113b:   5d                      pop    rbp
  40113c:   c3                      ret

Why are additional registers (ecx, edx, esi, edi) used as intermediary storage for values 1, 2, 3, 4 instead of putting them into rbp directly?

It's part of the 64 bit ABI. The first six integer/pointer arguments are _not_ placed on the stack. They are put in registers. See: https://www.intel.com/content/dam/develop/external/us/en/documents/mpx-linux64-abi.pdf and https://en.wikipedia.org/wiki/X86_calling_conventions `Integer Arguments 1-6 --> RDI, RSI, RDX, RCX, R8, R9`, `Floating Point Arguments 1-8 --> XMM0 - XMM7`, and `Excess Arguments --> Stack` — Craig Estey, Jun 19 '22 at 19:20
@CraigEstey thank you! What's the reason for this? Naively, you'd think that these additional operations would result in less performant code, but I guess that this is not the case? — whospugisthis, Jun 19 '22 at 19:30
On the contrary, they _increase_ performance. Why push a value to the stack (does memory write) only to have the called function fetch it (a memory read) to put it into a register in order to be able to use it? The compiler knows which regs are used, so the callee "just uses" the values that are already in the register. Further, the compiler knows it's about to call a function, so any instructions that have to calculate something and put it in a register will put the final value in the "correct" register for the arg being passed. — Craig Estey, Jun 19 '22 at 19:34

user17732522 · Accepted Answer · 2022-06-19T21:20:36.603

"as intermediary storage": Your confusion seems to be this part.

The ABI specifies that these function arguments are passed in the registers you are seeing (see comments under the question). The registers are not just used as intermediaries. The values are never supposed to be put on the stack at all. They stay in their registers the whole time, unless the function needs to reuse a register for something else, pass on a pointer to the function parameter, or something similar.
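As a rough illustration of the rule quoted in the comments (first six integer arguments in registers, the rest on the stack), here is a minimal sketch assuming the System V x86-64 ABI. The function sum_eight is hypothetical and not part of the question.

```c
/* Hypothetical example: eight int parameters, two more than the six
   integer argument registers of the System V x86-64 ABI. */
int sum_eight(int a, int b, int c, int d, int e, int f, int g, int h) {
    /* a..f arrive in edi, esi, edx, ecx, r8d, r9d.
       g and h are passed by the caller on the stack. */
    return a + b + c + d + e + f + g + h;
}
```

Compiling this with something like gcc -O2 -S -masm=intel should show g and h being read from memory relative to rsp, while a through f are used straight from their registers.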

What you are seeing in test_function is just an artifact of not compiling with optimizations enabled. The mov instructions copying the registers to the stack are pointless, since nothing is done with the stored values afterwards. The saved frame pointer (rbp) is just immediately restored and then the function returns.

The whole function should just be a single ret instruction. See https://godbolt.org/z/qG9GjMohY where -O2 is used.

Without optimizations enabled, the compiler makes no attempt to remove instructions even if they are pointless, and it always stores the values of variables to memory and loads them from memory again, even if they could have been held in registers. That's why it is almost always pointless to look at -O0 assembly.

score 2 · Answer 2 · answered Jun 19 '22 at 19:47

The registers are used for the arguments to call the function. The standard calling convention calls for arguments to be placed in certain registers, so the code you see in main puts the arguments into those registers, and the code in test_function expects them in those registers and reads them from there.

So your follow-on question might be "why is test_function copying those arguments onto the stack?". That's because you're compiling without optimization, so the compiler produces inefficient code, allocating space in the stack frame for every argument and local variable and copying the arguments from their input registers into the stack frame as part of the function prologue. If you were to use those values in the function, you would see it reading them from the stack frame locations even though they are probably still in the registers. If you compile with -O, you'll see the compiler get rid of all this, as the stack frame is not needed.
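To see the difference for yourself, here is a small sketch, assuming GCC on x86-64 Linux. The function add_four is hypothetical (not from the question). Unlike the empty test_function, it actually uses its parameters.

```c
/* Hypothetical variant of test_function that uses its arguments.
   Under the System V x86-64 ABI, a, b, c, d arrive in edi, esi, edx, ecx. */
int add_four(int a, int b, int c, int d) {
    /* At -O0, GCC is expected to spill all four registers to the stack
       frame and reload them here. With -O or -O2, the sum should be
       computed directly from the argument registers. */
    return a + b + c + d;
}
```

Comparing the output of gcc -O0 -S and gcc -O2 -S on this file should show the stores to [rbp-...] disappearing in the optimized version.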
