While homebrewing some spinlocks, I discovered an interesting possible regression in how GCC compiles the _mm_pause intrinsic. On supported architectures this intrinsic should translate to a single PAUSE instruction, which hints to the CPU that the code is in a spin-wait loop. In a typical spin-lock this helps avoid the costly pipeline flush that otherwise tends to occur when the spinning thread finally sees the lock change state.
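
For context, here is a minimal sketch of the kind of spin-lock where _mm_pause typically shows up. It is not the lock I was actually writing: the SpinLock class and its member names are made up for illustration, and it assumes an x86 target with SSE2 and a C++11 compiler.

```cpp
#include <atomic>
#include <emmintrin.h>  // _mm_pause

// Hypothetical test-and-test-and-set spin-lock, for illustration only.
class SpinLock {
public:
    void lock() {
        for (;;) {
            // exchange returns the previous value; false means we took the lock.
            if (!locked_.exchange(true, std::memory_order_acquire)) {
                return;
            }
            // Spin on a plain load until the lock looks free,
            // issuing PAUSE on every iteration of the wait loop.
            while (locked_.load(std::memory_order_relaxed)) {
                _mm_pause();
            }
        }
    }

    // Single attempt; returns true if the lock was acquired.
    bool try_lock() {
        return !locked_.load(std::memory_order_relaxed) &&
               !locked_.exchange(true, std::memory_order_acquire);
    }

    void unlock() {
        locked_.store(false, std::memory_order_release);
    }

private:
    std::atomic<bool> locked_{false};
};
```

The inner loop only reads the flag, so the PAUSE sits in the part of the lock that actually spins; the more expensive exchange is retried only once the lock appears free.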

The assembly generated by GCC trunk adds a mysterious NOP after each PAUSE.

The last GCC version on Godbolt that “correctly” performs this translation is GCC 8.3.

On GCC trunk the following snippet:

#include <emmintrin.h>

void test()
{
    while(true)
    {
        _mm_pause();
        _mm_pause();
        _mm_pause();
        _mm_pause();
        _mm_pause();
    }
}

translates to:

test():
 push   rbp
 mov    rbp,rsp
 pause
 nop
 pause
 nop
 pause
 nop
 pause
 nop
 pause
 nop
 jmp    401106 <test()+0x4>
main:
 ...

Note the nop after each pause instruction, which can’t be mapped back to anything in the source snippet. The same translation occurs with more interesting functions and loops as well.
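
One way to narrow this down might be to emit PAUSE through inline assembly next to the intrinsic and compare the disassembly of the two functions. The sketch below assumes GCC-style extended asm is available; the function names pause_intrinsic and pause_inline_asm are hypothetical and exist only for this comparison. I make no claim here about what either one compiles to.

```cpp
#include <emmintrin.h>  // _mm_pause

// Hypothetical helper: PAUSE via the intrinsic under investigation.
void pause_intrinsic() {
    _mm_pause();
}

// Hypothetical helper: PAUSE via GCC-style inline assembly,
// which bypasses the intrinsic and writes the mnemonic directly.
void pause_inline_asm() {
    __asm__ __volatile__("pause");
}
```

If the extra nop shows up only in pause_intrinsic, that would point at how the intrinsic is expanded rather than at the surrounding loop.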

Why this happens will need some digging inside GCC. Check back in the next couple of weeks and there might be an answer as to why this change occurred and whether it truly is a regression.