
Naturally, C++ compilers can inline function calls made from within a function template, when the inner function call is directly known in that scope (ref).

#include <iostream>

void holyheck()
{
   std::cout << "!\n";
}

template <typename F>
void bar(F foo)
{
   foo();
}

int main()
{
   bar(holyheck);
}

Now what if I'm passing holyheck into a class, which stores the function pointer (or equivalent) and later invokes it? Do I have any hope of getting this inlined? How?

template <typename F>
struct Foo
{
   Foo(F f) : f(f) {};
   void calledLater() { f(); }

private:
   F f;
};

void sendMonkeys();
void sendTissues();

int main()
{
   Foo<void(*)()> f(sendMonkeys);
   Foo<void(*)()> g(sendTissues);
   // lots of interaction with f and g, not shown here
   f.calledLater();
   g.calledLater();
}

My type Foo is intended to isolate a ton of logic; it will be instantiated a few times. The specific function invoked from calledLater is the only thing that differs between instantiations (though it never changes during the lifetime of a Foo), so half of the purpose of Foo is to abide by DRY. (The rest of its purpose is to keep this mechanism isolated from other code.)

But I don't want to introduce the overhead of an actual additional function call in doing so, because this is all taking place in a program bottleneck.

I don't speak ASM so analysing the compiled code isn't much use to me.
My instinct is that I have no chance of inlining here.

Lightness Races in Orbit
  • @hvd: I don't really care whether the function "pointer" is in fact a functor, `std::function`, or whatever. Or whether it's passed in as a template arg or a ctor arg. As long as it's inlined. However I (and refp) have at least now fixed the code in the question to be internally consistent, thanks :) – Lightness Races in Orbit Mar 02 '15 at 23:07
  • http://goo.gl/WDhWti ? – T.C. Mar 02 '15 at 23:08
  • Yes it can be inlined, since *at the point where it's called* (in `calledLater`), the compiler knows the full type of the functor being invoked. Assuming it's actually defined in that TU, then it should be inlinable. Now, is this reliably done by the major compilers? That I don't know. – Cameron Mar 02 '15 at 23:09
  • @Cameron: Yeah I mean will it bother tracing the history of the functor and spot that it cannot possibly have been altered during `Foo`'s lifetime and therefore pick it up from `Foo`'s place of instantiation ... meh. Doubtful. – Lightness Races in Orbit Mar 02 '15 at 23:10
  • @Lightness: Not sure I understand -- `calledLater` is compiled more-or-less independently from how instances of `Foo<...>` are used. Even if you mutate `f` a million times its type is the same, yes? Then `calledLater` itself gets inlined in a later optimization pass. – Cameron Mar 02 '15 at 23:12
  • @Cameron: A function is defined by more than its type. I can make `void foo()` and `void bar()` but which one gets called is important to me. – Lightness Races in Orbit Mar 02 '15 at 23:16
  • Ah oops, just realized you're not passing functors but rather function pointers. Ignore my comments ;-) – Cameron Mar 02 '15 at 23:17
  • @Cameron: I think you've stumbled upon precisely what I need and had forgotten about! hvd's answer is very convincing – Lightness Races in Orbit Mar 02 '15 at 23:19
  • hvd's answer is probably the best (for both clarity and performance), but keep in mind that you *can* contort the type system and pass function pointers as non-type template arguments (basically creating functors from function pointers, but at compile time). This gives the compiler a much better chance of being able to inline the function call, since the function pointer becomes essentially a compile time constant. – Cameron Mar 03 '15 at 04:21
  • @Cameron: Yeah as I said above that would have been an acceptable solution – Lightness Races in Orbit Mar 03 '15 at 10:42

3 Answers


If you don't really need to use a function pointer, then a functor should make the optimisation trivial:

struct CallSendMonkeys {
  void operator()() {
    sendMonkeys();
  }
};
struct CallSendTissues {
  void operator()() {
    sendTissues();
  }
};

(Of course, C++11 has lambdas, but you tagged your question C++03.)

If Foo is instantiated with these classes, which hold no internal state, then f() does not depend on how f was constructed, so it doesn't matter whether the compiler can tell that f remains unmodified.


Your example, after some fiddling to make it compile, looks like this:

template <typename F>
struct Foo
{
   Foo(F f) : f(f) {};
   void calledLater() { f(); }

private:
   F f;
};

void sendMonkeys();
void sendTissues();

int main()
{
   // __typeof__ is a GCC/Clang extension; both types here are just void(*)()
   Foo<__typeof__(&sendMonkeys)> f(sendMonkeys);
   Foo<__typeof__(&sendTissues)> g(sendTissues);
   // lots of interaction with f and g, not shown here
   f.calledLater();
   g.calledLater();
}

clang++ (a 3.7 build from a few weeks back, so I'd expect clang++ 3.6 to do the same, as its source base is only a few weeks older) generates this code:

    .text
    .file   "calls.cpp"
    .globl  main
    .align  16, 0x90
    .type   main,@function
main:                                   # @main
    .cfi_startproc
# BB#0:                                 # %entry
    pushq   %rax
.Ltmp0:
    .cfi_def_cfa_offset 16
    callq   _Z11sendMonkeysv
    callq   _Z11sendTissuesv
    xorl    %eax, %eax
    popq    %rdx
    retq
.Ltmp1:
    .size   main, .Ltmp1-main
    .cfi_endproc

Of course, without a definition of sendMonkeys and sendTissues, we can't really inline any further.

If we implement them like this:

void request(const char *);
void sendMonkeys() { request("monkeys"); }
void sendTissues() { request("tissues"); }

the assembler code becomes:

main:                                   # @main
    .cfi_startproc
# BB#0:                                 # %entry
    pushq   %rax
.Ltmp2:
    .cfi_def_cfa_offset 16
    movl    $.L.str, %edi
    callq   _Z7requestPKc
    movl    $.L.str1, %edi
    callq   _Z7requestPKc
    xorl    %eax, %eax
    popq    %rdx
    retq

.L.str:
    .asciz  "monkeys"
    .size   .L.str, 8

    .type   .L.str1,@object         # @.str1
.L.str1:
    .asciz  "tissues"
    .size   .L.str1, 8

If you can't read assembly: that is request("monkeys") followed by request("tissues"), inlined as expected.

I'm simply amazed that g++ 4.9.2 doesn't do the same thing (I got this far expecting to continue with "and g++ does the same, I'm not going to post the code for it"). [It does resolve the calls to sendTissues and sendMonkeys, but doesn't take the next step of inlining their bodies so that main calls request directly.]

Of course, it's entirely possible to make tiny changes to this and NOT get the code inlined - such as adding some conditions that depend on variables that the compiler can't determine at compile-time.

Edit: When I added a string and an integer to Foo and updated them with an external function, the inlining went away for both clang and gcc. With JUST an integer, updated by calling an external function, the code is still inlined.

In other words, it really depends on the code in the section marked // lots of interaction with f and g, not shown here. And I think you (Lightness) have been around here long enough to know that for 80%+ of questions, the code that isn't posted is the most important part of the actual answer ;)

Mats Petersson
  • Do anything non-trivial in the code marked `// lots of interaction with f and g, not shown here` and this will be a *very* hard optimisation for the compiler to perform. –  Mar 02 '15 at 23:14
  • If g++ 4.9.2 doesn't do this, what compiler does? – Cameron Mar 02 '15 at 23:15
  • Presumably the above is from Clang, then? Or? – Lightness Races in Orbit Mar 02 '15 at 23:17
  • @hvd: That'll clearly depend on EXACTLY what said interaction with `f` and `g` is. Modern compilers are pretty good at following conditions. If the code doesn't modify the value of `f` inside `Foo`, then I'd expect the compiler to be able to follow that. – Mats Petersson Mar 02 '15 at 23:18
  • g++ does not perform the last optimization because the function is `main`, which is known to be cold (only called once). Call the function mymain and gcc inlines like clang. – Marc Glisse Mar 02 '15 at 23:19
  • @Cameron: Sorry, meant to write that it was clang, and have edited to make that clear - switching between editing the post and modifying the code, I got a little confused [not very difficult] – Mats Petersson Mar 02 '15 at 23:19
  • @MarcGlisse: Amazing - how clever of gcc. – Mats Petersson Mar 02 '15 at 23:21

To make your original approach work, use:

template< void(&Func)() >
struct Foo
{
    void calledLater() { Func(); }
};

In general, I've had better luck getting gcc to inline things by using function references rather than function pointers.