I looked at Agner's tables and was curious how Intel fared with it. All numbers are reciprocal throughput, i.e. the average number of cycles per instruction when a series of independent instructions of the same kind is issued back to back. Zen 2 has its gather variants mostly at 9 and 6 cycles, with one variant at 16. Broadwell has only 6, 7 and 5 cycles. Skylake has mostly 4 and 2, with one variant at 5.
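Because reciprocal throughput is cycles per instruction, the throughput-bound cost of N independent gathers is simply N times that figure. A minimal back-of-envelope sketch using only the gather figures quoted above. The helper names are made up for illustration, and it assumes the gathers are independent and nothing else in the loop is a bottleneck:

```python
from typing import Dict, List

# Reciprocal throughput (cycles per instruction) for the gather variants,
# as quoted above. Which variant maps to which number is not tracked here.
GATHER_RECIP: Dict[str, List[int]] = {
    "Zen 2": [9, 6, 16],
    "Broadwell": [6, 7, 5],
    "Skylake": [4, 2, 5],
}


def throughput_cycles(count: int, recip_throughput: int) -> int:
    """Throughput-bound cycle estimate for `count` independent instructions."""
    return count * recip_throughput


def instructions_per_cycle(recip_throughput: int) -> float:
    """Reciprocal throughput inverted: how many such instructions start per cycle."""
    return 1.0 / recip_throughput


if __name__ == "__main__":
    n = 1000
    for cpu, figures in GATHER_RECIP.items():
        best = throughput_cycles(n, min(figures))
        worst = throughput_cycles(n, max(figures))
        print(f"{cpu}: {best}-{worst} cycles for {n} gathers")
```

For 1000 gathers this gives 6000-16000 cycles on Zen 2, 5000-7000 on Broadwell and 2000-5000 on Skylake. Real code will usually be slower, since latency chains and other ports also matter.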

Now I was surprised by Agner's figures for Zen 2 LOOP and CALL, which both have a reciprocal throughput of 2. That makes them as fast as doing the same thing with just normal jump instructions.

Skylake, on the other hand, has 5 or 6 for LOOP, and for CALL two variants have 3 and one variant has 2.
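To put the LOOP and CALL numbers side by side, here is a rough sketch that treats the instruction as the only bottleneck, so a loop of N iterations can finish no faster than N times its reciprocal throughput. The `(best, worst)` tuples are the figures quoted above. The model and the function names are illustrative assumptions, not anything from Agner's tables:

```python
from typing import Dict, Tuple

# (best, worst) reciprocal throughput in cycles, as quoted above.
LOOP_RECIP: Dict[str, Tuple[int, int]] = {
    "Zen 2": (2, 2),
    "Skylake": (5, 6),
}
CALL_RECIP: Dict[str, Tuple[int, int]] = {
    "Zen 2": (2, 2),
    "Skylake": (2, 3),
}


def bound_cycles(iterations: int, recip: Tuple[int, int]) -> Tuple[int, int]:
    """Minimum cycles for `iterations` executions if this instruction were the only limit."""
    best, worst = recip
    return iterations * best, iterations * worst


def skylake_vs_zen2(table: Dict[str, Tuple[int, int]]) -> Tuple[float, float]:
    """How many times slower Skylake is than Zen 2 (best vs best, worst vs worst)."""
    zen_best, zen_worst = table["Zen 2"]
    sky_best, sky_worst = table["Skylake"]
    return sky_best / zen_best, sky_worst / zen_worst


if __name__ == "__main__":
    print("LOOP, Skylake vs Zen 2:", skylake_vs_zen2(LOOP_RECIP))
    print("CALL, Skylake vs Zen 2:", skylake_vs_zen2(CALL_RECIP))
    print("1M LOOP iterations on Skylake:", bound_cycles(1_000_000, LOOP_RECIP["Skylake"]))
```

Under this crude model a LOOP-terminated loop on Skylake is bounded at 2.5x to 3x the Zen 2 cost, while CALL is between equal and 1.5x.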