> There isn't a significant advantage in having the kernel doing the polling, it would still be busy polling.
I was thinking in terms of a generic syscall-less wake functionality where the kernel could do this for all processes in the system. So you'd lose one core per system instead of one core per consumer.
Interesting. Could be used to make the kernel loop above burn less power.
A user-space implementation could presumably also be built. Producers and a monitor could share a memory segment. A producer sets a flag when it needs attention, and the monitor busy polls the segment. The monitor could then use e.g. a signal to wake up consumers.
The latency between the producer signaling and the consumer taking action would be higher than with futexes. But there would be no waits/context switches in the producer at all. Might be a solution for some low latency use cases.
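To make the idea concrete, here is a rough sketch of the control flow only, in Python for brevity. Everything in it is an assumption for illustration: threads stand in for separate processes, a `bytearray` stands in for the shared memory segment, and a `threading.Event` stands in for the signal the monitor would send. The names `run_demo`, `producer`, `monitor` and `consumer` are made up.

```python
import threading


def run_demo(n_producers=4, timeout=5.0):
    """One round: each producer raises its flag once, the monitor notices
    and wakes the matching consumer. Returns the sorted slots handled."""
    # Stand-in for the shared memory segment: one flag byte per producer.
    flags = bytearray(n_producers)
    # Stand-in for the wake-up signal: one Event per consumer.
    wakeups = [threading.Event() for _ in range(n_producers)]
    stop = threading.Event()
    handled = []
    handled_lock = threading.Lock()

    def producer(slot):
        # The producer only stores a flag; it never waits on anything.
        flags[slot] = 1

    def monitor():
        # Busy poll: this loop is the one core you give up per system.
        while not stop.is_set():
            for slot in range(n_producers):
                if flags[slot]:
                    flags[slot] = 0      # acknowledge before waking
                    wakeups[slot].set()  # hypothetical "signal" to the consumer

    def consumer(slot):
        # Consumers sleep until the monitor wakes them (or the timeout hits).
        if wakeups[slot].wait(timeout):
            with handled_lock:
                handled.append(slot)

    consumers = [
        threading.Thread(target=consumer, args=(slot,))
        for slot in range(n_producers)
    ]
    mon = threading.Thread(target=monitor)
    for t in consumers:
        t.start()
    mon.start()

    # Producers are called inline here; in a real setup they would be
    # separate processes writing into the shared segment.
    for slot in range(n_producers):
        producer(slot)

    for t in consumers:
        t.join()
    stop.set()
    mon.join()
    return sorted(handled)
```

Calling `run_demo()` returns the list of slots whose consumers were woken. Note that the monitor coalesces repeated flags from the same producer into one wake-up, so a real version would probably want a counter or a queue per slot rather than a single flag.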
If you just don't want to burn power but you can still dedicate a core, there is https://www.felixcloutier.com/x86/mwait.