The concept of ordering is independent of architecture. Which operations must be ordered relative to which other operations for the program to be correct does not change. But your architecture may implicitly guarantee certain orderings, so you do not need to guarantee those orderings explicitly.
If I remember correctly, x86 basically gives you everything except S-L automatically. That means a traditional acquire barrier (L-LS) and a traditional release barrier (LS-S) should be "free" and require no explicit instructions. I had forgotten that, so it makes sense that they would not mention explicit barriers. But that does not change the fact that you need that ordering to guarantee correctness; it is just built in and automatic.
The oddity here is that speculative loads across an L-LS ordering require some pretty deep black magic at the hardware level. I have no idea how they could safely reorder stores across an L-S ordering and still maintain the "as-if" that the store happens after the load, and even safely reordering loads across an L-L ordering while maintaining the "as-if" seems pretty arcane. The fact that the paper claims their PoC works, and thus that the reordering must be happening, is wild. Either the processor is not correctly implementing its declared memory ordering semantics, or it is doing some real reordering magic.
This kind of memory order speculation is basically required on x86, since the strong semantics would otherwise prevent many useful reorderings (especially L-L, which is absolutely critical).
The basic way it works is that pretty much any reordering is allowed, speculatively, even if it violates the explicit or implicit barriers. The operations are then tracked in a memory order buffer (MOB) until retirement, which can detect whether the reordering was observable and, if so, flush the pipeline (the so-called "memory order nuke", visible in performance counters).
Memory order is only preserved at the architectural level; at the microarchitectural level, all kinds of reordering, optimization, and speculation are done to maximize execution throughput. Section 7 of the paper I'm linking below may help explain how these microarchitectural optimizations can break the memory order beyond the rollback point, indeed causing a Machine Clear (Nuke) of the entire pipeline. We analyzed the entire class of Machine Clears and showed how the different types can create transient execution paths through which memory can be transiently leaked.