While homebrewing some spin-locks I discovered an interesting possible regression in how GCC compiles the _mm_pause intrinsic. On supported architectures this intrinsic should translate to a single PAUSE instruction, which tells the CPU that the code is in a spin-wait loop and helps avoid the pipeline flush that typically happens when a spin-lock's wait loop exits after the lock has been released/acquired.
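For context, here is a minimal sketch of the kind of spin-lock where _mm_pause usually ends up. It is not the lock I was actually writing; the names SpinLock and cpu_relax are made up for illustration, and the fallback for non-x86 targets is simply an assumption that doing nothing is acceptable there.

```cpp
#include <atomic>

#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#include <emmintrin.h>
// On x86 this should compile down to a PAUSE instruction.
static inline void cpu_relax() { _mm_pause(); }
#else
// Assumption: on other architectures we just spin without a hint.
static inline void cpu_relax() {}
#endif

// Hypothetical test-and-test-and-set spin-lock, for illustration only.
class SpinLock {
public:
    void lock() {
        for (;;) {
            // exchange returns the previous value: false means we took the lock.
            if (!locked_.exchange(true, std::memory_order_acquire)) {
                return;
            }
            // Wait on a plain load while someone else holds the lock,
            // issuing the spin-wait hint on every iteration.
            while (locked_.load(std::memory_order_relaxed)) {
                cpu_relax();
            }
        }
    }

    bool try_lock() {
        // Succeeds only if the lock was free before the exchange.
        return !locked_.exchange(true, std::memory_order_acquire);
    }

    void unlock() {
        locked_.store(false, std::memory_order_release);
    }

private:
    std::atomic<bool> locked_{false};
};
```

The inner while loop is where the PAUSE instructions land, so any extra instruction the compiler emits next to them ends up in the hot path of every waiting thread.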
The assembly generated on GCC trunk adds a mysterious NOP after each PAUSE: assembly
The last version of GCC on Godbolt which “correctly” does this translation is GCC 8.3: assembly
On GCC trunk the following snippet:
#include <emmintrin.h>

void test()
{
    while (true)
    {
        _mm_pause();
        _mm_pause();
        _mm_pause();
        _mm_pause();
        _mm_pause();
    }
}
translates to:
test():
    push rbp
    mov rbp,rsp
    pause
    nop
    pause
    nop
    pause
    nop
    pause
    nop
    pause
    nop
    jmp 401106 <test()+0x4>
main:
...
Note how there is a nop after every pause instruction that we can’t map back to anything in the source snippet. The same translation occurs with more interesting functions and loops as well.
Why this is happening will need some digging inside GCC. Check back in the next couple of weeks and there might be an answer as to why this change occurred and whether it truly is a regression or not.