Why does GCC move variables to a temporary location before assigning them?

Question

When looking at some decompiled C code I saw this:

movl    -0xc(%rbp), %esi
movl    %esi, -0x8(%rbp)

This corresponds to this C code:

x = y;

This got me thinking: why does gcc move y to %esi and then move %esi to x, instead of just moving y to x directly?

This is the entire C and decompiled code, if it matters:

C

#include <stdio.h>

int main(void) {
    int x, y, z;

    while(1) {
        x = 0;
        y = 1;
        do {
            printf("%d\n", x);

            z = x + y;
            x = y;
            y = z;
        } while(x < 255);
    }
}

Decompiled

pushq    %rbp
movq     %rsp, %rbp
subq     $0x20, %rsp
movl     $0x0, -0x4(%rbp)

movl     $0x0, -0x8(%rbp) ; x = 0
movl     $0x1, -0xc(%rbp) ; y = 1

; printf
leaq     0x56(%rip), %rdi
movl     -0x8(%rbp), %esi
movb     $0x0, %al
callq    0x100000f78

; z = x + y
movl     -0x8(%rbp), %esi  ; x -> esi
addl     -0xc(%rbp), %esi  ; y + esi
movl     %esi, -0x10(%rbp) ; z = esi

; x = y
movl     -0xc(%rbp), %esi
movl     %esi, -0x8(%rbp)

; y = z
movl     -0x10(%rbp), %esi
movl     %esi, -0xc(%rbp)

movl     %eax, -0x14(%rbp) ; not sure... I believe printf return value?
cmpl     $0xff, -0x8(%rbp) ; x < 255
jl       0x100000f3d ; do...while(x < 255)
jmp      0x100000f2f ; while(1)

A direct memory-to-memory move (if possible in the x86 world, I don't remember) would still require the CPU core to store the value internally one way or another as part of reading it from memory. — Some programmer dude, Aug 06 '17 at 14:08
Probably because you don't ask the compiler to apply *any* optimizations: https://godbolt.org/g/SKUrDo — , Aug 06 '17 at 14:09
It also helps the debugger: you can easily check the value of every variable at each line of code. With optimized code, the debugger has no simple way to tell which variable is stored where while stepping through an optimized calculation, until the value lands in some "fixed" target location, like a memory array. It is also a straightforward (fast) way to produce working machine code from C/C++ source; you want a debug executable to be produced as quickly as possible, so there is no reason to spend compile time on better code (if better code came for free, that would be fine, of course). — Ped7g, Aug 06 '17 at 14:44
@Someprogrammerdude yes it is possible in cases like `pop (%rax)` and obviously with the string move instructions. — Ajay Brahmakshatriya, Aug 06 '17 at 18:23
@StoryTeller: not quite true. `mov dword [mem], imm32` works, using the `mov r/m, imm` encoding. The limitation is that you can't have two arbitrary addressing modes (modr/m + ...) in one instruction. See https://stackoverflow.com/questions/31904964/x86-assembly-mov-instruction-register-to-register-and-memory-to-memory. But you can do `push [mem]`/`pop [mem]` or `movs` to copy memory to memory. — Peter Cordes, Aug 06 '17 at 19:13
A move from x to y directly would be impossible. You have to read x and then you have to write y. You have to hold what you read somewhere. — David Schwartz, Aug 07 '17 at 02:52
@DavidSchwartz Wrong abstraction layer. There certainly could be an instruction that does a memory-to-memory transfer (and I can think of a handful of instructions that do exactly that). The CPU might have to store the value in some temporary location to implement the instruction, but that's an implementation detail. Also, thanks to propagation delays, it's quite possible to read from and write to, say, a register in the same cycle. — Voo, Aug 07 '17 at 07:09
@Voo Sure, there could be an *instruction* that does a memory to memory transfer, but so what? It would still require intermediary storage somewhere, so still wouldn't accomplish what the OP hopes to accomplish. — David Schwartz, Aug 07 '17 at 08:00
@Charanor I think there is an error here. When it says "This got me thinking: how come gcc moves x to %esi and then move %esi to y instead of just moving x to y directly?" Should be "This got me thinking: how come gcc moves y to %esi and then move %esi to x instead of just moving x to y directly?": x and y are interchanged in the first appearance — Raul Luna, Aug 07 '17 at 08:07
the code you're quoting is *disassembled*, not *decompiled*. — Igor Skochinsky, Aug 07 '17 at 13:00

interjay · Accepted Answer · 2017-08-06T14:16:11.683

75

Most x86 instructions (other than some specialized instructions such as movsb) can access at most one memory location. A move from memory to memory therefore has to go through a register, using two mov instructions.

The mov instruction can be used in the following ways (written destination first, unlike the AT&T-style listing in the question):

mov mem, reg
mov reg, mem
mov reg, reg
mov reg, imm
mov mem, imm

There is no mov mem, mem.
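
As a rough illustration only (this is a model of the idea, not what the compiler literally emits), the two-instruction copy can be written in C with an explicit temporary standing in for the register. The function name copy_via_register and the variable tmp are invented for this sketch:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: models `x = y` as the load/store pair the CPU needs.
   `tmp` plays the role that %esi plays in the question's disassembly. */
void copy_via_register(volatile int *dst, const volatile int *src) {
    assert(dst != NULL && src != NULL);
    int tmp = *src;   /* like: movl -0xc(%rbp), %esi   (memory -> register) */
    *dst = tmp;       /* like: movl %esi, -0x8(%rbp)   (register -> memory) */
}
```

Each statement touches only one memory location, which mirrors the restriction on mov described above.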

Note that if you had compiled with optimizations, the variables would be placed in registers so this wouldn't be an issue.
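
One way to check this yourself is to move the inner loop into a small function and compare the assembly from `gcc -O0 -S` with that from `gcc -O2 -S`. The sketch below is a reworked version of the question's loop, not the original program: the function name fib_below and its out/cap parameters are assumptions made here so the loop terminates and its result can be inspected.

```c
#include <stddef.h>

/* Hypothetical rewrite of the question's do...while loop.
   Instead of printing, it stores each value of x into out (up to cap
   entries) and returns how many values the loop produced. */
size_t fib_below(int limit, int *out, size_t cap) {
    int x = 0, y = 1, z;
    size_t n = 0;
    do {
        if (n < cap) {
            out[n] = x;   /* stands in for printf("%d\n", x) */
        }
        n++;
        z = x + y;
        x = y;            /* the assignment the question asks about */
        y = z;
    } while (x < limit);
    return n;
}
```

Without optimization you would expect x, y and z to each get a stack slot, as in the question. With optimization enabled they will most likely stay in registers, so `x = y` becomes a register-to-register move or is folded away entirely.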

edited Aug 06 '17 at 14:16

answered Aug 06 '17 at 14:10

interjay


It should also be noted that the instructions that _can_ access two memory locations need several instructions before them to prepare the registers (like `DS:ESI`/`RSI` and `ES:EDI`/`RDI` for `movs`). This would in any case remove any potential advantage of using such instructions to copy a single dword from memory to memory, even if there were one otherwise. – Ruslan Aug 07 '17 at 09:09
Anyway, the CPU cannot simultaneously read from one memory location and write the same data immediately to another, so it has to save the value in some internal storage between read and write. With all the optimizations going on in modern CPUs, I guess there wouldn't be any runtime difference between the 2-opcode sequence and the hypothetical "mov mem,mem". – Ralf Kleberhoff Aug 07 '17 at 12:31
`movs?` instructions aren't the only ones. `PUSH` and `POP` with a memory operand have an explicit memory location (the operand) and an implicit memory location (the stack) – Michael Petch Aug 07 '17 at 16:08
That makes sense now! I never realized there was no way to move from memory to memory without intermediate external storage (probably because of how RAM works, I assume). Excellent answer, thank you. – Charanor Aug 08 '17 at 16:28

Calling CPU registers "external" storage is rather unusual. The programming model doesn't recognize anything within the pipeline as storage at all, so work registers are as internal as any storage gets. TTA or VLIW architectures are more likely to expose pipeline details, though they also occur in delay slots. – Yann Vernier Aug 12 '17 at 13:24
