you don't really wait on L1 cache writes though. The store buffer absorbs it, and if the data is needed it can be forwarded from there before the write to cache even happens.
Most x64 L1d caches have a 4-6 cycle latency depending on CPU, that 1 to 2 ns depending on frequency.
Most x64 L1d caches have a 4-6 cycle latency depending on CPU, that 1 to 2 ns depending on frequency.