I'm wondering why LLVM fails to optimize the following IR (I tried both the PassManagerBuilder with the optimization level set to 3 and LLVM's 'opt' tool):
%GenericStruct = type { i32 }
define void @makeGenericStructOuter(%GenericStruct* noalias nocapture sret) {
entry:
  %1 = alloca %GenericStruct
  call void @makeGenericStructInner(%GenericStruct* %1)
  %2 = load %GenericStruct* %1
  store %GenericStruct %2, %GenericStruct* %0
  ret void
}
declare void @makeGenericStructInner(%GenericStruct* noalias nocapture sret)
The expected code is:
%GenericStruct = type { i32 }
define void @makeGenericStructOuter(%GenericStruct* noalias nocapture sret) {
entry:
  call void @makeGenericStructInner(%GenericStruct* %0)
  ret void
}
declare void @makeGenericStructInner(%GenericStruct* noalias nocapture sret)
Are there simply no optimizations currently available to handle this case? Or am I failing to generate the right IR to allow this optimization? (The code is generated by a front-end I'm developing.)
Before it's suggested: I can't generate code that returns by value, since these functions must be callable from other modules/libraries that don't know the size or contents of 'GenericStruct' (they would locally declare 'GenericStruct' as 'struct opaque').