
I am reading this question about inline functions in the isocpp FAQ, where the code is given as

void f()
{
  int x = /*...*/;
  int y = /*...*/;
  int z = /*...*/;
  // ...code that uses x, y and z...
  g(x, y, z);
  // ...more code that uses x, y and z...
}

Then it says:

Assuming a typical C++ implementation that has registers and a stack, the registers and parameters get written to the stack just before the call to g(), then the parameters get read from the stack inside g() and read again to restore the registers while g() returns to f(). But that’s a lot of unnecessary reading and writing, especially in cases when the compiler is able to use registers for variables x, y and z: each variable could get written twice (as a register and also as a parameter) and read twice (when used within g() and to restore the registers during the return to f()).

I have great difficulty understanding the paragraph above. Here are my questions:

  1. For a computer to do some operations on some data which are residing in the main memory, is it true that the data must be loaded to some registers first then the CPU can operate on the data? (I know this question is not particularly related to C++, but understanding this will be helpful to understand how C++ works.)
  2. I think f() is a function in a way the same as g(x, y, z) is a function. How come x, y, z before calling g() are in the registers, and the parameters passed in g() are on the stack?
  3. How is it known that the declarations for x, y, z make them stored in the registers? Where the data inside g() is stored, register or stack?

PS

It's very hard to choose an accepted answer when all the answers are very good (e.g., the ones provided by @MatsPeterson, @TheodorosChatzigiannakis, and @superultranova). I personally like the one by @Potatoswatter a little bit more since it offers some guidelines.

briantist
Allanqunzi
  • have you looked at the generated asm? – Daniel A. White Aug 30 '15 at 16:22
  • No, I have very limited knowledge of asm. – Allanqunzi Aug 30 '15 at 16:22
  • (1) Yes. (2) There is no guarantee that `x`, `y`, and `z` are in registers. The text talks about the case *if* they are in registers: "in cases when the compiler is able to use registers for variables x, y and z..." The text also assumes that parameters are passed on the stack ("Assuming..."). (3) See (2). Basically, your questions (2) and (3) are saying "How come X?" when the text says "Let's assume X." So we don't know X. We are just assuming it. – Raymond Chen Aug 30 '15 at 16:24
  • The text explicitly says *"Assuming a typical C++ implementation"*... – Christian Hackl Aug 30 '15 at 16:28
  • If you pass the address of a variable somewhere, it won't be stored in a register (so it will be on the stack). Otherwise, you can't tell, and probably shouldn't care. – Jonathan Leffler Aug 30 '15 at 16:30
  • A good compiler typically allocates registers for heavily used variables until it runs out of registers, then places all the rest on the stack. It is normally not known which variable ends up where unless you study the compiled code. – n. m. could be an AI Aug 30 '15 at 16:31
  • @RaymondChen, if the answer to (1) is yes, then assume `x, y, z` are in registers before calling `g()`. By calling `g()`, `x, y, z` are passed on the stack, and then inside `g()`, according to the answer to (1), `x, y, z` must be copied into registers again. But this doesn't seem to be what the text is saying: the text says `x, y, z` are restored to registers only when `g()` is done and returns to `f()`. – Allanqunzi Aug 30 '15 at 16:31
  • Function `f` does not know which registers function `g` will use to hold `x`, `y`, and `z`. And maybe function `g` never accesses parameter `z`, so it never gets loaded into a register at all. Since `f` does not know what `g` does, it must assume that `g`'s use of registers is incompatible with `f`. That's sort of the crux of inline: It lets the two functions share registers. – Raymond Chen Aug 30 '15 at 16:47
  • Known by what? The compiler knows because the compiler is what decides that. The CPU doesn't need to know. – user253751 Aug 31 '15 at 00:29
  • @n.m.: Modern "Static Single Assignment" SSA compilers allocate registers not to variables, but variable _use_ (i.e. from write to read). Variables may share a register, if their use does not overlap. But indeed, heavily used variables are written and/or read a lot so they spend a lot of time in registers. – MSalters Aug 31 '15 at 13:10

8 Answers


Don't take that paragraph too seriously. It seems to be making excessive assumptions and then going into excessive detail, which can't really be generalized.

But, your questions are very good.

  1. For a computer to do some operations on some data which are residing in the main memory, is it true that the data must be loaded to some registers first then the CPU can operate on the data? (I know this question is not particularly related to C++, but understanding this will be helpful to understand how C++ works.)

More-or-less, everything needs to be loaded into registers. Most computers are organized around a datapath, a bus connecting the registers, the arithmetic circuits, and the top level of the memory hierarchy. Usually, anything that is broadcast on the datapath is identified with a register.

You may recall the great RISC vs CISC debate. One of the key points was that a computer design can be much simpler if the memory is not allowed to connect directly to the arithmetic circuits.

In modern computers, there are architectural registers, which are a programming construct like a variable, and physical registers, which are actual circuits. The compiler does a lot of heavy lifting to map a program's values onto the limited set of architectural registers, and at run time the processor maps those onto physical registers (register renaming). For a CISC instruction set like x86, the compiler may generate instructions that send operands in memory directly to arithmetic operations. But behind the scenes, it's registers all the way down.

Bottom line: Just let the compiler do its thing.

  2. I think f() is a function in a way the same as g(x, y, z) is a function. How come x, y, z before calling g() are in the registers, and the parameters passed in g() are on the stack?

Each platform defines a way for C functions to call each other. Passing parameters in registers is more efficient. But, there are trade-offs and the total number of registers is limited. Older ABIs more often sacrificed efficiency for simplicity, and put them all on the stack.

Bottom line: The example is arbitrarily assuming a naive ABI.

  3. How is it known that the declarations for x, y, z make them stored in the registers? Where the data inside g() is stored, register or stack?

The compiler tends to prefer to use registers for more frequently accessed values. Nothing in the example requires the use of the stack. However, less frequently accessed values will be placed on the stack to make more registers available.

Only when you take the address of a variable, such as by &x or by passing it by reference, and that address escapes beyond the code the compiler can see after inlining, is the compiler required to use memory and not registers.

Bottom line: Avoid taking addresses and passing/storing them willy-nilly.
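A minimal sketch of the "escaping address" point, assuming a typical optimizing compiler. Both functions are illustrative, and `g_hook` is a hypothetical callback introduced only for this example. In `sum_local`, the address of `acc` never leaves the function, so the optimizer is free to keep `acc` in a register throughout. In `sum_escaping`, `&acc` is handed to whatever `g_hook` points to, which the compiler cannot know in general, so `acc` needs a real memory location that is up to date before each call and re-read afterwards.

```cpp
#include <cassert>

// Hypothetical hook, set at run time: the compiler cannot know which
// function (if any) it will point to.
void (*g_hook)(int*) = nullptr;

// The address of `acc` never leaves this function, so an optimizer may
// keep `acc` in a register for its whole lifetime (or eliminate it).
int sum_local(const int* data, int n) {
    assert(n >= 0);
    int acc = 0;
    for (int i = 0; i < n; ++i) acc += data[i];
    return acc;
}

// &acc escapes into unknown code on every iteration: that code may read
// or even modify `acc`, so `acc` must live in memory (typically a stack
// slot) and be stored before / reloaded after each call through the hook.
int sum_escaping(const int* data, int n) {
    assert(n >= 0);
    int acc = 0;
    for (int i = 0; i < n; ++i) {
        acc += data[i];
        if (g_hook) g_hook(&acc);
    }
    return acc;
}
```

With `g_hook` left null, both functions return the same sum; a hook that modifies `*p` changes the result of `sum_escaping`, which is exactly why the compiler cannot cache `acc` in a register across the call. Reading the optimized assembly (for example, `g++ -O2 -S`) is the way to see what your compiler actually does.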

Potatoswatter
  • The "Bottom line" seems to contradict the usual recommended practice in C++, i.e., pass by `const reference` or `reference` to avoid unnecessary copying. – Allanqunzi Aug 30 '15 at 16:45
  • @Allanqunzi Things that fit into registers often should not be passed by reference. The standard library follows this convention for pointers, integers, iterators, overloading tags, etc. However, inlining allows the compiler to see through references, since nothing gets passed anyway. – Potatoswatter Aug 30 '15 at 16:50
  • … also it's worth mentioning that passing by non-const `&` reference shouldn't be done for the sake of performance. An rvalue `&&` reference, maybe. Also, there's some talk of adding a C++ extension to automatically choose pass-by-`const&` or by-value depending on which is likely to be faster. – Potatoswatter Aug 30 '15 at 16:55
  • @Potatoswatter do you mean [`boost::call_traits`](http://www.boost.org/doc/libs/1_55_0/libs/utility/call_traits.htm)? Or is there another thing? – nonsensation Aug 30 '15 at 18:01
  • Well, there's one more thing to consider: The variable might not exist at all, as the rules only specify that as long as the *observable behavior* (it is defined *what* counts for that) is **as if** the compiler did as specified, it is free to do it any way it wants. – Deduplicator Aug 30 '15 at 18:35
  • @Serthy I'm referring to the proposal to make the pipe `|` analogous to `&` in declarators. So you would have `foo(T | param)` and the compiler would deduce `T`, then decide whether to pass it by reference or value. I think the proposal is probably dead, but the problem remains an open issue. – Potatoswatter Aug 31 '15 at 06:30
  • "Usually at least some parameters may be passed in registers because it's more efficient.". Where do you get this nonsense from? All symbols are external by default and might be called from another compilation unit, hence there needs to be clear calling conventions. – vidstige Sep 02 '15 at 17:07
  • @vidstige Most calling conventions specify passing or returning a small number of small objects in registers. The ones that don't are mostly old. No calling convention I've encountered depends on linkage. – Potatoswatter Sep 03 '15 at 00:02
  • @Potatoswatter actually no calling convention uses the term "usually". It's always carefully dictated under what circumstances values MUST be passed or returned in registers. Realize that if the caller and callee differ in their assumptions a seg fault is imminent. – vidstige Sep 03 '15 at 08:19
  • @vidstige You could say the same of "Usually anything that is broadcast on the data path is identified with a register." Certainly, a processor with architectural registers for all datapath transactions will suffer a fault if a register is not allocated. You need to use a reasonable definition for "usually," considering that this post is a very broad overview of all implementation possibilities. But, I'll edit that particular sentence. – Potatoswatter Sep 03 '15 at 08:51
  • @Potatoswatter alright, the edit makes much more sense. Thanks. – vidstige Sep 03 '15 at 11:10

It is entirely up to the compiler (in conjunction with the processor type) whether a variable is stored in memory or a register [or in some cases more than one register] (and what options you give the compiler, assuming it's got options to decide such things - most "good" compilers do). For example, the LLVM/Clang compiler uses a specific optimisation pass called "mem2reg" that moves variables from memory to registers. The decision to do so is based on how the variable(s) are used - for example, if you take the address of a variable at some point, it needs to be in memory.

Other compilers have similar, but not necessarily identical, functionality.

Also, at least in compilers that have some semblance of portability, there will ALSO be a phase of generating machine code for the actual target, which contains target-specific optimisations, which again can move a variable from memory to a register.

It is not possible [without understanding how the particular compiler works] to determine whether the variables in your code are in registers or in memory. One can guess, but such a guess is like guessing other "kind of predictable things", such as looking out the window to guess whether it's going to rain in a few hours. Depending on where you live, this may be a completely random guess or quite predictable: in some tropical countries you can set your watch by when the rain arrives each afternoon, in other countries it rarely rains, and in some countries, like here in England, you can't know for certain beyond "right now it is [not] raining right here".

To answer the actual questions:

  1. This depends on the processor. Proper RISC processors such as ARM, MIPS, 29K, etc. have no instructions that use memory operands except the load and store type instructions. So if you need to add two values, you need to load the values into registers and use the add operation on those registers. Some, such as x86 and 68K, allow one of the two operands to be a memory operand, and, for example, PDP-11 and VAX have "full freedom": whether your operands are in memory or in registers, you can use the same instruction, just with different addressing modes for the different operands.
  2. Your original premise here is wrong - it's not guaranteed that arguments to g are on the stack. That is just one of many options. Many ABIs (application binary interfaces, aka "calling conventions") use registers for the first few arguments to a function. So, again, whether the arguments are in memory or in registers depends on which compiler (to some degree) and what processor (much more than which compiler) the compiler targets.
  3. Again, this is a decision that the compiler makes - it depends on how many registers the processor has, which are available, and what the cost is of "freeing" some register for x, y and z - which ranges from "no cost at all" to "quite a bit" - again, depending on the processor model and the ABI.
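To illustrate item 1, here is a toy load/store machine in C++. Everything in it is made up for the example (the `Op` opcodes, the four-register file, the `add_in_memory` helper): like ARM or MIPS, its `Add` reads and writes registers only, so computing `mem[2] = mem[0] + mem[1]` takes two loads, an add, and a store. On x86 or 68K, one of those loads could instead be folded into the add as a memory operand.

```cpp
#include <array>
#include <cstddef>

// Hypothetical load/store machine: arithmetic works only on registers.
enum class Op { Load, Add, Store };

struct Instr {
    Op op;
    std::size_t a, b, c;  // operand meaning depends on op (see run())
};

struct Machine {
    std::array<int, 4> reg{};  // register file
    std::array<int, 8> mem{};  // main memory

    constexpr void run(const Instr* prog, std::size_t len) {
        for (std::size_t i = 0; i < len; ++i) {
            const Instr& in = prog[i];
            switch (in.op) {
                case Op::Load:  reg[in.a] = mem[in.b]; break;             // reg[a] <- mem[b]
                case Op::Add:   reg[in.a] = reg[in.b] + reg[in.c]; break; // registers only
                case Op::Store: mem[in.b] = reg[in.a]; break;             // mem[b] <- reg[a]
            }
        }
    }
};

// mem[2] = mem[0] + mem[1], spelled out as a RISC-style instruction sequence.
constexpr int add_in_memory(int x, int y) {
    Machine m{};
    m.mem[0] = x;
    m.mem[1] = y;
    const Instr prog[] = {
        {Op::Load, 0, 0, 0},   // r0 <- mem[0]
        {Op::Load, 1, 1, 0},   // r1 <- mem[1]
        {Op::Add, 2, 0, 1},    // r2 <- r0 + r1
        {Op::Store, 2, 2, 0},  // mem[2] <- r2
    };
    m.run(prog, sizeof prog / sizeof prog[0]);
    return m.mem[2];
}
```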
Mats Petersson
  • The only memory-memory instruction in 68K is `move`. However, it does allow either of the two operands to be in memory, which is more freedom than x86. – Potatoswatter Sep 02 '15 at 04:25
  • @Potatoswatter: Ah, memory not as good as I would like it to be - in my defense, last time I programmed 68K was about 20 years ago... :) – Mats Petersson Sep 02 '15 at 07:47

For a computer to do some operations on some data which are residing in the main memory, is it true that the data must be loaded to some registers first then the CPU can operate on the data?

Not even this statement is always true. It is probably true for all the platforms you'll ever work with, but there surely can be another architecture that doesn't make use of processor registers at all.

Your x86_64 computer does however.

I think f() is a function in a way the same as g(x, y, z) is a function. How come x, y, z before calling g() are in the registers, and the parameters passed in g() are on the stack?

How is it known that the declarations for x, y, z make them stored in the registers? Where the data inside g() is stored, register or stack?

These two questions cannot be uniquely answered for any compiler and system your code will be compiled on. They cannot even be taken for granted since g's parameters might not be on the stack, it all depends on several concepts I'll explain below.

First, you should be aware of the so-called calling conventions, which define, among other things, how function parameters are passed (e.g. pushed on the stack, placed in registers, or a mix of both). They aren't specified by the C++ standard; calling conventions are part of the ABI, a broader topic covering low-level machine code interoperability.

Secondly, register allocation (i.e. deciding which variables are actually held in a register at any given time) is a complex task; in its classic graph-coloring formulation it is an NP-complete problem. Compilers try to do their best with the information they have. In general, less frequently accessed variables are put on the stack while more frequently accessed variables are kept in registers. Thus the part "Where the data inside g() is stored, register or stack?" cannot be answered once and for all, since it depends on many factors, including register pressure.

Not to mention compiler optimizations which might even eliminate the need for some variables to be around.
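The textbook way to model register allocation is graph coloring: two values that are live at the same time "interfere" and cannot share a register, and deciding whether a given number of registers suffices is NP-complete in general. The sketch below is a deliberately simplified greedy heuristic, not what GCC or Clang actually do; `allocate`, `Graph`, `all_live_together`, and `kSpilled` are hypothetical names. Each value, in index order, gets the lowest-numbered register not already held by an interfering value, or is spilled to the stack if none is free.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t kMaxVals = 8;
constexpr int kSpilled = -1;  // marker for "lives on the stack"

// interferes[i][j] is true when values i and j are live at the same time.
using Graph = std::array<std::array<bool, kMaxVals>, kMaxVals>;

// Greedy coloring: give each value (in index order) the lowest register not
// held by an already-placed interfering value; spill if none is left.
// Entries at index >= n are unused.
constexpr std::array<int, kMaxVals> allocate(const Graph& interferes,
                                             std::size_t n, int num_regs) {
    std::array<int, kMaxVals> where{};
    for (std::size_t v = 0; v < n; ++v) {
        where[v] = kSpilled;
        for (int r = 0; r < num_regs; ++r) {
            bool taken = false;
            for (std::size_t u = 0; u < v; ++u)
                if (interferes[v][u] && where[u] == r) taken = true;
            if (!taken) { where[v] = r; break; }
        }
    }
    return where;
}

// x, y and z (values 0, 1, 2) all live at once, as in f() around the call to g().
constexpr Graph all_live_together() {
    Graph g{};
    for (std::size_t i = 0; i < 3; ++i)
        for (std::size_t j = 0; j < 3; ++j)
            g[i][j] = (i != j);
    return g;
}
```

With two registers, `allocate(all_live_together(), 3, 2)` places `x` and `y` in registers 0 and 1 and spills `z`. With an interference graph that has no edges (values whose lifetimes don't overlap, the situation MSalters' comment on SSA-based allocation describes), all three can share register 0.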

Finally, the FAQ question you linked already states:

Naturally your mileage may vary, and there are a zillion variables that are outside the scope of this particular FAQ, but the above serves as an example of the sorts of things that can happen with procedural integration.

i.e. the paragraph you posted makes some assumptions to set things up for an example. Those are just assumptions and you should treat them as such.

As a small addition: regarding the benefits of inline on a function I recommend taking a look at this answer: https://stackoverflow.com/a/145952/1938163

Community
Marco A.
  • Modern computers are shifting from von Neumann to Harvard architectures, at least logically: program memory becomes effectively non-writeable. Doesn't affect registers/data though. – MSalters Aug 31 '15 at 13:31
  • @MSalters I didn't imply your last sentence but I'll edit the message to make that clear. Thanks – Marco A. Aug 31 '15 at 13:33

You can't know, without looking at the assembly language, whether a variable is in a register, on the stack, on the heap, in global memory, or elsewhere. A variable is an abstract concept. The compiler is allowed to use registers or other memory as it chooses, as long as the program's observable behavior doesn't change.

There's also another rule that affects this topic. If you take the address of a variable and store it in a pointer, the variable generally can't be kept only in a register, because registers don't have addresses.

The variable storage may also depend on the optimization settings for the compiler. Variables can disappear due to simplification. Variables that don't change value may be placed into the executable as a constant.
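As a small illustration of the last point, assuming a C++14-or-later compiler: in the sketch below `scale` never changes, so an optimizer will typically substitute the value 3 wherever it is used and give `scale` no storage at all. Marking the function `constexpr` goes one step further at the language level: the call `scaled(3)` can be evaluated during compilation, as the `static_assert` checks. For ordinary non-`constexpr` code, whether a value ends up in a register, on the stack, or nowhere remains the optimizer's decision. The name `scaled` is hypothetical.

```cpp
// `scale` is never modified, so there is nothing for it to occupy at run
// time: the compiler can use its value directly wherever it appears.
constexpr int scaled(int v) {
    const int scale = 3;
    return v * scale;
}

// This call is evaluated during compilation, not at run time.
constexpr int kNine = scaled(3);
static_assert(kNine == 9, "computed during compilation");
```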

Thomas Matthews

Regarding your #1 question, yes, non load/store instructions operate on registers.

Regarding your #2 question, if we are assuming that parameters are passed on the stack, then we have to write the registers to the stack, otherwise g() won't be able to access the data, since the code in g() doesn't "know" which registers the parameters are in.

Regarding your #3 question, it is not known that x, y and z will for sure be stored in registers in f(). One could use the register keyword, but that's more of a suggestion (and the keyword was deprecated in C++11 and removed in C++17). Based on the calling convention, and assuming the compiler doesn't do any optimization involving parameter passing, you may be able to predict whether the parameters are on the stack or in registers.

You should familiarize yourself with calling conventions. Calling conventions deal with the way that parameters are passed to functions and typically involve passing parameters on the stack in a specified order, putting parameters into registers or a combination of both.

stdcall, cdecl, and fastcall are some examples of calling conventions. In terms of parameter passing, stdcall and cdecl are the same, in that the parameters are pushed onto the stack in right-to-left order (they differ in who cleans up the stack afterwards). In this case, if g() were cdecl or stdcall, the caller would push z, y, x in that order:

mov eax, z
push eax
mov eax, y
push eax
mov eax, x
push eax
call g
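The cdecl/stdcall sequence above can be mimicked in ordinary C++ to show why the callee can find its arguments without knowing anything about the caller's registers. This is a toy model, not real machine code: `Stack`, `push`, `read_arg`, `g_body`, and `call_g_cdecl` are hypothetical names, and there is no return address or frame pointer. Because the caller pushes `z`, `y`, `x` right to left, `x` ends up nearest the top of the stack, so argument *i* is always at a fixed position relative to the top.

```cpp
#include <array>
#include <cstddef>

// Toy machine stack; slots[top - 1] plays the role of the stack pointer.
struct Stack {
    std::array<int, 16> slots{};
    std::size_t top = 0;  // number of values currently pushed

    constexpr void push(int v) { slots[top++] = v; }

    // Argument i (0 = leftmost parameter) as the callee sees it: after a
    // right-to-left push sequence, the leftmost argument was pushed last.
    constexpr int read_arg(std::size_t i) const { return slots[top - 1 - i]; }
};

// Callee: like g(x, y, z), it only knows *where* its arguments are.
constexpr int g_body(const Stack& s) {
    int x = s.read_arg(0), y = s.read_arg(1), z = s.read_arg(2);
    return 100 * x + 10 * y + z;  // order-sensitive, so a mix-up would show
}

// Caller: cdecl-style, push the arguments right to left, then "call".
constexpr int call_g_cdecl(int x, int y, int z) {
    Stack s{};
    s.push(z);
    s.push(y);
    s.push(x);
    return g_body(s);
}
```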

In 64-bit code, registers are used: the Microsoft x64 calling convention (often loosely called fastcall) uses RCX, RDX, R8, R9 (plus the stack for functions requiring more than 4 parameters), while Linux (the System V AMD64 ABI) uses RDI, RSI, RDX, RCX, R8, R9. To call g() using the Microsoft x64 convention, one would do the following (we assume x, y, and z are not already in registers):

mov rcx, x
mov rdx, y
mov r8, z
call g

This is how assembly is written by humans, and sometimes by compilers. Compilers will use some tricks to avoid passing parameters, as it typically reduces the number of instructions and can reduce the number of times memory is accessed. Take the following code for example (I'm intentionally ignoring non-volatile register rules):

f:
xor rcx, rcx
mov rsi, x
mov r8, z
mov rdx, y
call g
mov rcx, rax
ret

g:
mov rax, rsi
add rax, rdx
add rax, r8
ret

For illustrative purposes, rcx is already in use, and x has been loaded into rsi. The compiler can compile g such that it uses rsi instead of rcx, so values don't have to be swapped between the two registers when it comes time to call g. The compiler could also inline g, now that f and g share the same set of registers for x, y, and z. In that case, the call g instruction would be replaced with the contents of g, excluding the ret instruction.

f:
xor rcx, rcx
mov rsi, x
mov r8, z
mov rdx, y
mov rax, rsi
add rax, rdx
add rax, r8
mov rcx, rax
ret

This will be even faster, because we don't have to deal with the call instruction, since g has been inlined into f.

superultranova

Short answer: You can't. It completely depends on your compiler and the optimizing features enabled.

The compiler's job is to translate your program into assembly, but how that is done is tightly coupled to how your compiler works. Some compilers allow you to hint which variables map to which registers. See, for example, GCC's global register variables: https://gcc.gnu.org/onlinedocs/gcc/Global-Reg-Vars.html

Your compiler will apply transformations to your code in order to gain something, maybe performance, maybe smaller code size, and it applies cost functions to estimate these gains, so normally you can only see the result by disassembling the compiled unit.


For a computer to do some operations on some data which are residing in the main memory, is it true that the data must be loaded to some registers first then the CPU can operate on the data?

This depends on the architecture and the instruction set it offers. But in practice, yes - it is the typical case.

How is it known that the declarations for x, y, z make them stored in the registers? Where the data inside g() is stored, register or stack?

Assuming the compiler doesn't eliminate the local variables, it will prefer to put them in registers, because registers are faster than the stack (which resides in the main memory, or a cache).

But this is far from a universal truth: it depends on the (complicated) inner workings of the compiler (whose details are handwaved in that paragraph).

I think f() is a function in a way the same as g(x, y, z) is a function. How come x, y, z before calling g() are in the registers, and the parameters passed in g() are on the stack?

Even if we assume that the variables are, in fact, stored in the registers, when you call a function, the calling convention kicks in. That's a convention that describes how a function is called, where the arguments are passed, who cleans up the stack, what registers are preserved.

All calling conventions have some kind of overhead. One source of this overhead is the argument passing. Many calling conventions attempt to reduce that, by preferring to pass arguments through registers, but since the number of CPU registers is limited (compared to the space of the stack), they eventually fall back to pushing through the stack after a number of arguments.
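The "registers first, then the stack" fallback can be written down as a tiny lookup. This sketch covers integer and pointer arguments only and ignores floating-point registers, struct passing, and the Windows 32-byte shadow space; `arg_location` and the table names are hypothetical. The register orders are those of the Microsoft x64 convention (RCX, RDX, R8, R9) and of the System V AMD64 ABI used on Linux (RDI, RSI, RDX, RCX, R8, R9).

```cpp
#include <cstddef>
#include <string_view>

// Integer/pointer argument registers, in order, for two common x86-64 conventions.
constexpr std::string_view kWinX64IntRegs[] = {"RCX", "RDX", "R8", "R9"};
constexpr std::string_view kSysVIntRegs[] = {"RDI", "RSI", "RDX", "RCX", "R8", "R9"};

// Where does argument `i` (0-based) go? A register name, or "stack" once
// the convention has run out of argument registers.
template <std::size_t N>
constexpr std::string_view arg_location(const std::string_view (&regs)[N],
                                        std::size_t i) {
    return i < N ? regs[i] : std::string_view{"stack"};
}
```

For a three-argument call like g(x, y, z), indices 0 to 2 fall in registers under both conventions, so on these platforms the "everything goes through the stack" setup in the FAQ is an assumption rather than the usual case.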

The paragraph in your question assumes a calling convention that passes everything through the stack and based on that assumption, what it's trying to tell you is that it would be beneficial (for execution speed) if we could "copy" (at compile time) the body of the called function inside the caller (instead of emitting a call to the function). This would yield the same results logically, but it would eliminate the runtime cost of the function call.
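Inlining can be imitated by hand. In the sketch below, the body of `g` is hypothetical (the FAQ elides it with `/*...*/`), as are `f_with_call` and `f_inlined`: the first calls `g`, while the second has `g`'s body substituted directly, with `g`'s parameters replaced by `f`'s own variables. Both compute the same value; the second simply needs no call, no argument passing, and no saving or restoring of registers around a call. The functions are declared `constexpr` only so the results can be checked with `static_assert` (a `constexpr` function is implicitly `inline`).

```cpp
// Hypothetical body for g(); the FAQ does not show one.
constexpr int g(int x, int y, int z) { return x * y + z; }

// f() as written: x, y and z are handed to g() according to the calling
// convention, and f's live values must survive the call.
constexpr int f_with_call(int a) {
    int x = a, y = a + 1, z = a + 2;
    int r = g(x, y, z);
    return r + x + y + z;  // "...more code that uses x, y and z..."
}

// f() after hand-inlining g(): g's body is pasted in with its parameters
// replaced by f's own variables, so no call or argument passing is needed.
constexpr int f_inlined(int a) {
    int x = a, y = a + 1, z = a + 2;
    int r = x * y + z;
    return r + x + y + z;
}
```

In practice you write the first form and let the optimizer produce the second when it judges it worthwhile.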

Theodoros Chatzigiannakis

Variables are almost always stored in main memory. Often, due to compiler optimizations, the value of a declared variable never moves to main memory, but those are intermediate variables used within your method that don't need to survive until another method is called (i.e., until a stack operation occurs).

This is by design, to improve performance, as it is easier (and much faster) for the processor to address and manipulate data in registers. Architectural registers are limited in number, so not everything can be put in registers. Even if you 'hint' to your compiler to put a variable in a register, the compiler may still spill it to main memory (the stack) if the available registers are full.

Most probably, a variable will be in main memory if it remains relevant further along in execution and may stay relevant for a longer period of CPU time. A variable is in an architectural register if it is relevant to the upcoming machine instructions, where it will be used almost immediately, but it may not stay relevant for long.

Yogee