GCC de-virtualization of simple class

Question

The following code does not get devirtualized by gcc. Any ideas what I can do to convince gcc to devirtualize?

struct B /* final */ {
    virtual int foo() { return 3; }
};

struct C {
    B& b;

    __attribute__((noinline))
    C( B& b ) : b(b) {
    }

    int foo() {
        return b.foo();
    }
};

int main() {
    B b;
    C c(b);

    int res = c.foo();
    return res;
}

I naively thought that this would be devirtualized (at least speculatively) and inlined.

In real-life code, where the constructor is in another compilation unit, the compiler will not be able to see the body of the constructor (hence the noinline attribute). B is also deliberately not final, to mimic some real-world requirements.

What compile flags do you currently use? – Haatschii Nov 19 '15 at 03:04
-O3 -fdevirtualize -fdevirtualize-speculatively – Kojo Nov 19 '15 at 09:34

score 3 · Answer 1 · answered Nov 19 '15 at 03:17

Devirtualization happens when the compiler knows the dynamic type of the object at compile time. Here, noinline on C::C means that, while compiling main, the compiler cannot see which object actually ends up bound to C::b during construction.
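A minimal sketch of that distinction, reusing B from the question. Derived, call_known, call_unknown and self_check are hypothetical names added for illustration, and __attribute__((noinline)) assumes GCC or Clang. In call_known the object is created in the same scope as the call, so its dynamic type is visible. In call_unknown only the static type B is visible. Whether a given compiler actually devirtualizes either call depends on its version and flags.

```cpp
#include <cassert>

struct B {
    virtual int foo() { return 3; }
};

// Hypothetical subclass: its existence is why a B& alone does not
// determine which foo() runs.
struct Derived : B {
    int foo() override { return 4; }
};

// The dynamic type is visible here: b is constructed in this scope,
// so an optimizer may turn b.foo() into a direct call to B::foo.
int call_known() {
    B b;
    return b.foo();
}

// Only the static type B is known here. Without more information the
// call has to be dispatched through the vtable.
__attribute__((noinline))
int call_unknown(B& b) {
    return b.foo();
}

// Checks the observable behaviour only; it says nothing about codegen.
void self_check() {
    Derived d;
    assert(call_known() == 3);
    assert(call_unknown(d) == 4);
}
```

To see what the compiler really did, inspect the generated assembly for an indirect call through the vtable versus a direct call or an inlined constant.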

Pauli Nieminen

If this is the case then speculative devirtualization would be impossible. – Kojo Nov 19 '15 at 09:38
This is one of many reasons why link-time optimization provides performance improvements in real-world applications. But even without link-time optimization there are trivial cases, like a stack-allocated object followed by a virtual function call on it. – Pauli Nieminen Nov 19 '15 at 13:19

Jason · Answer 2 · 2015-11-19T16:49:02.670

In real-life code, where the constructor is in another compilation unit, the compiler will not be able to see the body of the constructor (hence the noinline attribute). B is also deliberately not final, to mimic some real-world requirements.

To de-virtualize, the compiler generally needs to be able to prove that the class hierarchy is sealed. If the calls to the constructor are in separate translation units, the compiler can't prove it. However, using link-time optimization can give the optimizer information across translation units, which can make it easier to prove facts about class hierarchies and references.
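For comparison, here is a sketch of what a "sealed" hierarchy looks like when it is stated in the source rather than proven by the optimizer. The question rules out making B final, so this is only an illustration: Sealed and self_check are hypothetical names, C is changed to hold a Sealed& instead of a B&, and __attribute__((noinline)) assumes GCC or Clang. Because Sealed is final, no class can override Sealed::foo, so the target of s.foo() is known from the static type alone, even though the constructor stays opaque.

```cpp
#include <cassert>

struct B {
    virtual int foo() { return 3; }
};

// Hypothetical sealed subclass: 'final' guarantees there are no
// further overrides of foo() below this class.
struct Sealed final : B {
    int foo() override { return 5; }
};

struct C {
    Sealed& s;

    // Still opaque to callers, as in the question.
    __attribute__((noinline))
    explicit C(Sealed& s_) : s(s_) {}

    // The static type is final, so the only possible target is
    // Sealed::foo and a compiler may emit a direct call.
    int foo() { return s.foo(); }
};

// Checks the observable behaviour only; it says nothing about codegen.
void self_check() {
    Sealed s;
    C c(s);
    assert(c.foo() == 5);
}
```

Marking only the member function as final (int foo() final) should give the compiler the same guarantee for that one call while leaving the class open for derivation.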

Here's an example using clang.

b.h

#ifndef B_H
#define B_H

struct B {

  virtual int foo();

};

#endif

b.cpp

#include "b.h"

int B::foo() { return 3; }

c.h

#ifndef C_H
#define C_H

#include "b.h"

struct C {

  B& b;

  C(B& b);

  int foo();

};

#endif

c.cpp

#include "c.h"

C::C(B& b) : b(b) {}

int C::foo() {

    return b.foo();
}

main.cpp

#include <iostream>

#include "b.h"
#include "c.h"

int main() {

  B b;
  C c(b);

  std::cout << c.foo() << std::endl;

  return 0;
}

Since the optimizer knows nothing about the call sites of C::C (the constructor) when it compiles c.cpp, it knows nothing about the dynamic type of the object that C::b refers to. So it can't de-virtualize the call to B::foo inside C::foo.

C::foo

_ZN1C3fooEv:                            # @_ZN1C3fooEv
    .cfi_startproc
# BB#0:
    movq    (%rdi), %rdi
    movq    (%rdi), %rax
    jmpq    *(%rax)                 # TAILCALL  <== pointer call

However, giving the optimizer link-time information (-flto) allows it to prove that the class hierarchy is sealed from the call sites.

B::foo

0000000000400960 <_ZN1B3fooEv>:
  400960:   b8 03 00 00 00          mov    $0x3,%eax
  400965:   c3                      retq   
  400966:   66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
  40096d:   00 00 00

main

0000000000400970 <main>:
  400970:   41 56                   push   %r14
  400972:   53                      push   %rbx
  400973:   50                      push   %rax
  400974:   48 c7 04 24 78 0a 40    movq   $0x400a78,(%rsp)
  40097b:   00 
  40097c:   48 8d 3c 24             lea    (%rsp),%rdi
  400980:   e8 db ff ff ff          callq  400960 <_ZN1B3fooEv> # <== direct call

@Kojo I added an example that hopefully makes it a bit clearer. — Jason, Nov 19 '15 at 16:38
FYI, if the function is not called main the devirtualization happens. Apparently main is special and gcc knows it is called once so it is always optimized for size. — Kojo, Jun 19 '16 at 19:08
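One way to try the observation in the last comment is to move the body of main into an ordinary function and look at the code generated for it. This is only a sketch: run and self_check are hypothetical names, __attribute__((noinline)) assumes GCC or Clang, and whether the call in run gets devirtualized depends on the compiler version and flags.

```cpp
#include <cassert>

struct B {
    virtual int foo() { return 3; }
};

struct C {
    B& b;

    __attribute__((noinline))
    C(B& b) : b(b) {}

    int foo() { return b.foo(); }
};

// Hypothetical helper: the same statements as the question's main(),
// but in a function that is not named main. noinline keeps it as a
// separate function so its assembly is easy to find.
__attribute__((noinline))
int run() {
    B b;
    C c(b);
    return c.foo();
}

// Checks the observable behaviour only; it says nothing about codegen.
void self_check() {
    assert(run() == 3);
}
```

Calling run() from main and compiling with the flags from the question (-O3 -fdevirtualize -fdevirtualize-speculatively) makes it possible to compare the assembly of run against the original main.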