I want to zero all YMM registers like this:
#include <immintrin.h>
void fn(float *out) {
    register __m256 r0;
    _mm256_zeroall();
    _mm256_storeu_ps(out, r0);
}
But GCC and Clang both give me a warning:
warning: 'r0' is used uninitialized [-Wuninitialized]
Using _mm256_setzero_ps() works, but both the source code and the generated assembly are ugly.
If I have 12 register variables, GCC is likely to generate 12 vmovaps instructions and Clang is likely to generate 12 vxorps instructions. In the worst case, GCC generates a memset call followed by many vmovaps instructions.
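For reference, the _mm256_setzero_ps() version I'm trying to avoid looks roughly like the sketch below. It is cut down to 4 accumulators instead of 12, and the name fn_setzero is made up for this example. The target("avx") attribute is there only so the snippet builds without -mavx on GCC/Clang; it is not part of my real code.

```c
#include <immintrin.h>

/* Hypothetical reduced example: 4 accumulators instead of the 12 in the
 * real kernel. Each variable is zeroed explicitly, so the compiler sees
 * a defined value and emits no -Wuninitialized warning. */
__attribute__((target("avx")))   /* assumption: lets this build without -mavx */
void fn_setzero(float *out) {
    __m256 r0 = _mm256_setzero_ps();
    __m256 r1 = _mm256_setzero_ps();
    __m256 r2 = _mm256_setzero_ps();
    __m256 r3 = _mm256_setzero_ps();

    /* ... the matrix-product accumulation would go here ... */

    /* Each __m256 holds 8 floats, so the stores are 8 elements apart. */
    _mm256_storeu_ps(out + 0,  r0);
    _mm256_storeu_ps(out + 8,  r1);
    _mm256_storeu_ps(out + 16, r2);
    _mm256_storeu_ps(out + 24, r3);
}
```

This compiles without the warning, but it costs one zeroing instruction per variable instead of a single vzeroall.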
I just want a single vzeroall instruction.
Is there any way to let the compiler know that _mm256_zeroall() zeroes all the registers, without handwriting asm?
Edit 1: In fact, I'm writing a matrix product program that needs to clear many registers at the beginning. To keep the question simple, I reduced it to the simplest possible code.
I've confirmed that vzeroall is not slow compared to many vmovaps/vxorps instructions on Zen 3, and vzeroall has a smaller code size, which is more cache-friendly.
Removing the register qualifier doesn't help on GCC or Clang; both generate the same assembly as before.
I've found that on GCC I can eliminate the warning by binding the variable to a specific register, like this:
register __m256 r0 asm("ymm0");
But Clang doesn't honor the register binding and still generates the same warning.