Hacker News
Finding the “second bug” in glibc’s condition variable (probablydance.com)
290 points by ingve on Sept 18, 2022 | 81 comments


This seems like some really good debugging effort, kudos.

But, my takeaway from this is that there’s a glibc bug which causes deadlocks in correct code, and a proposed fix (even if partial) has been available for years, and glibc still hasn’t shipped a version with the fix so it’s up to distros and users to patch the broken glibc code.

Reminds me of how a glibc update broke GNU m4 and GNU didn’t ship a fix to m4 for years, so it was up to distros and users to patch the latest release of GNU m4 to make it compile with the latest version of GNU glibc.

And then a couple of years later GNU shipped another version of glibc which broke m4.

GNU doesn’t strike me as an organisation which places a lot of value on software quality and reliability.


> GNU doesn’t strike me as an organisation which places a lot of value on software quality and reliability.

This stems from a misunderstanding of what GNU is. Ideally all projects under the GNU umbrella would work towards a unified operating system, but that's sadly not what it's like. The GNU project is not much of a project in the common sense of the word, nor is it much of an organisation. This is one of the many reasons why https://gnu.tools exists, a subset of GNU with common goals and values, including collaborative project management.


I kind of get that. And if a change in GNU glibc caused a bug in GNU wget, I would hope that wget would release a fix relatively quickly, but I wouldn't necessarily have any special expectations just because glibc and wget are both part of GNU.

But I wish that the projects in the GNU toolchain would at the very least collaborate. The combination of glibc + gcc + autotools + m4 makes such a core part of any GNU system that you'd hope that GNU cares to keep them mutually compatible. So when they release an update to glibc which breaks m4 which makes it impossible to compile the current version of the GNU toolchain using the current version of the GNU toolchain, that's extremely disappointing.


It does naively seem like a central library like glibc would want to run regression/integration tests on every new version to make sure they're not breaking the utterly massive ocean of software that depends on it, and that the very obvious first step in that direction would be to run tests using software from the same group and/or core libraries/software that they know are required for a lot of base systems to work (e.g. GNU coreutils, GCC, clang, busybox). I suspect it does in fact mostly boil down to resourcing problems - although that just raises further questions, since I would expect that organizations like Google or Red Hat would consider that kind of thing important enough to support.


Using GNU Guix (the distribution) should be a logical and simple test-bed to see how exactly glibc/gcc pre-release would affect downstream packages.


It sounds silly for deep theoreticians and a somewhat politically active organization like GNU to have wanted to go all the way to an OS: the kind of software where you really need to throw all the beauty out of the window and solve dumb, tangible user problems, like how to read a Blu-ray from an obscure proprietary drive or how to run a game with a monopolistic corporation's binary drivers.

GNU can never succeed at it because it can never compromise, and maybe that's fine. It's a shame that it's considered a failure when they accomplished so much in the OS scaffolding part.


It's hilarious how many of these stories there are. I remember debugging a DNS issue on an embedded system: resolv.conf changed, but the program kept trying the old DNS server. It turns out glibc reads resolv.conf once on startup and never again (unless specifically invalidated). Yet most people will never notice this behavior because (1) most distributions carry a patch from 2004 that reloads it transparently or (2) they use libraries that always invalidate it, just to be sure.

This stupidity surfaces everywhere; here is a Go bug report for it:

https://github.com/golang/go/issues/21083


Similar issue for not noticing changes in /etc/localtime: https://github.com/golang/go/issues/28020


> This stupidity surfaces everywhere; here is a Go bug report for it:

> https://github.com/golang/go/issues/21083

The irony that they proposed a fix in 2017 but never got around to actually implementing it...


> GNU doesn’t strike me as an organisation which places a lot of value on software quality and reliability.

That's not surprising, because their main purpose is to promote the GPL and "free as in freedom" software. I've heard theories that their software is deliberately overly complex and uniquely different from other implementations in order to make it easier to find GPL violations. In other words, they're optimising for something other than quality.

I've inspected the glibc condition variable implementation (while looking at what may be a slightly different bug, or another manifestation of this one; I'm not sure at this point since it was years ago) and the first thing I noticed was that it is disturbingly complex.


> That's not surprising, because their main purpose is to promote the GPL and "free as in freedom" software.

You're conflating the FSF and the GNU project.


Same difference; this is in contrast to e.g. BSD, which is far less politically motivated. (And they have no problems in their CV implementation that I know of...)


Just a word of caution: I tried replacing the mitigation patch (which I had been using for 18 months) with the author's minimal fix (the first of the 6 patches, which is supposed to fix the issue) and I immediately ran into a deadlock while building Poly/ML 5.9 (3 times, in fact), whereas I had never encountered this bug before, despite having built Poly/ML many times.

This was on a 32-core machine with lots of background compilation going on at the same time (i.e. with a high load).

It could be a coincidence or an unrelated bug, but it's also possible that the author's patch might not fix the issue completely.


Apply the entire series... That's what was likely tested the most.


If the patch that contains the entire fix (minus cleanup) doesn't work, then the entire patchset is unreliable, since the author necessarily doesn't understand it.


You are being wayyyy too harsh - the author is clearly amazingly competent but lacked time to validate the code. This is open source software and if you want your problem diagnosed and fixed then the canonical response is to tell you to stop kvetching and to go ahead and fix it yourself.

I couldn’t diagnose or debug some intermittent faults in poor multi-threaded code (written by someone who struggles to think concurrently). I threw the code away, wrote it properly from scratch, and that got rid of the data races and race conditions causing the intermittent fault. I have had to fix multi-threaded code many, many times because I am better than most at properly fixing it.


Indeed.

But for the record, the author has since found the issue, and in this case it indeed was a bug in the first patch that was inadvertently fixed by a later patch in the series:

https://probablydance.com/2022/09/17/finding-the-second-bug-...


If you need really good synchronization primitives while Glibc is fixing things then I've had a lot of success with *NSYNC https://github.com/google/nsync which is what Cosmopolitan Libc uses to implement pthread_cond_t.


Its design is similar to absl::Mutex, which is available to C++ users. They may also have been written initially by Mike Burrows?


Yes. *NSYNC was built by Mike Burrows. The guy famous for building Chubby and Altavista. You wouldn't think it though, considering the project only has 188 stars.


Old timers will recall that when Java first became popular in the mid 90s it brought to light all sorts of bugs in various operating systems' thread implementations - particularly in SunOS/Solaris. It is discouraging that such basic bugs still exist in GNU/Linux after 25 years. But it's not surprising - if anyone tells you that they understand all possible multi-threaded scenarios and race conditions in the implementation of these libraries - they are mistaken. It's all trial and error.


That's not true at all. I wrote a simple set of threading functions and had 0 deadlocks no matter what I did to it. On YouTube I watched a guy write out every possible state of his hashmap data structure, implemented using atomics. It performed pretty well according to him, and it also did not deadlock after being used in production for years.


What bothers me is that the OCaml code looks so harmless:

    static void st_masterlock_acquire(st_masterlock * m)
    {
      pthread_mutex_lock(&m->lock);
      while (m->busy) {
        m->waiters ++;
        custom_condvar_wait(&m->is_free, &m->lock);
        m->waiters --;
      }
      m->busy = 1;
      pthread_mutex_unlock(&m->lock);
    }

How is this bug not more common with such a simple invocation?


Probably you can get away with that simple invocation "most of the time"; what matters is whether the higher-level code that calls it triggers the bug where other access patterns would not.

Also, I do wonder why they roll their own mutex using condition variables. Most people would probably use a mutex and thus wouldn't hit the condition variable at all. Condition variables are more niche than mutexes.


According to Xavier Leroy (OCaml language dev):

> It is possible to use a plain POSIX mutex as master lock, but fairness is awful.

https://discuss.ocaml.org/t/is-there-a-known-recent-linux-lo...


Thanks. I've learned more about pthread just from this HN post than I've learned anywhere else online in the past few years. Makes me worried about some of my earlier code now. O_o


That kinda depends on what custom_condvar_wait() does, right?


Just 6 days ago they merged a custom condvar implementation based directly on futexes to work around the glibc bug. Before that it was just pthread_cond_wait.


In 2022 a POSIX OS with a broken pthread_cond_t implementation is a broken OS, full stop. This is the POSIX API for implementing sleeping synchronization mechanisms in threaded programs.

The glibc project should be incredibly embarrassed by this, assuming there's anyone even still paying attention there.


Isn't a semaphore usually a better choice than a condition variable / event, because you can still race if you try and wake the condition variable multiple times? e.g. https://devblogs.microsoft.com/oldnewthing/20060622-00/?p=30...

Why pthreads included condition variables but not semaphores, I don't know. Maybe because POSIX already had them?


Whether it's a justified choice or not, for whatever reason, POSIX semaphores are very rarely used. macOS does not even support them (except for the named ones used for IPC). So pthread_cond_t is certainly 'the' API in terms of actual usage.

(Personally, I like events, but sometimes I want ones that don't reset - as in, the event is only ever flipped once in its lifetime, from "not ready" to "ready", but any number of consumers can check if it's ready or wait for it to be ready. These can't be directly implemented with POSIX semaphores. But then, I didn't know until just now that fast POSIX semaphore implementations even existed – there's my Apple bias.)


We use POSIX semaphores in our engine. macOS has their own semaphores in libdispatch


Not sure I follow, there are semaphores (sem_init(3)) and you get them via -pthread on my glibc system.

These are the semaphores I'm familiar with, which are plain counting semaphores you can also implement trivially using condition variables. The condition variable is a lower-level concept which pairs an OS-level signaling object and a mutex, it places no limits on what the condition may be, unlike a counting semaphore.

Personally I find it more odd that they'd bother with sem_t when they already have pthread_cond_t. Perhaps there's an optimization opportunity in sem_t when the condition is specialized to involve just an integer counter, justifying the specialized interface?

WRT waking the condition variable multiple times, properly handling spurious wakeups is kind of central to the proper use of them. That's why the condition test is always part of a loop, e.g.:

  pthread_mutex_lock(&lock);
  while (!condition_satisfied(shared_state))
          pthread_cond_wait(&cond, &lock);

  { do things with condition satisfied and lock held }

I always found condition variables to be nicely composable and ergonomic when writing pthread programs in C. But they come up short when you start wanting a file descriptor event to trigger the condition wakeup. In that case you end up doing silly things like spawning a thread just for the purpose of monitoring for fd events and turning them into a pthread_cond_signal() on the appropriate pthread_cond_t. AIUI the Windows WaitForMultipleObjects() interface does this better, so I hear, but I've never really dealt with Windows.


POSIX semaphores were created as part of the older 'real-time extensions' aka POSIX.1b, whereas the pthread API was created slightly later as POSIX.1c. (Source: man pages, plus [1] for a list of standards.)

[1] https://man7.org/linux/man-pages/man7/standards.7.html


WaitForMultipleObjects is closer to a version of poll, that can also work on condition variables. I wish Linux had a WaitForMultipleObjects equivalent, eventfd is really not it.


But WFMO can't wait on a CONDITION_VARIABLE on Windows either. It can wait on events, whose Linux equivalent is eventfd.

Unfortunately it is hard to implement select/WFMO for fast-pathed edge-triggered primitives because of the race that can cause lost wakeups. Futexes had a FUTEX_FD feature for polling, but it was impossible to use correctly and was removed.


Cond vars are fiendishly hard to implement correctly. Demanding an implementation to be bug free, fully featured and scale on a variety of loads is ridiculous.


> Cond vars are fiendishly hard to implement correctly. Demanding an implementation to be bug free, fully featured and scale on a variety of loads is ridiculous.

No.

What's ridiculous is neglecting to fix a known correctness bug in a core synchronization primitive causing random deadlocks for over two years.


If you think it is easy to fix, I'm sure the glibc maintainers will accept a patch from you, because the OP's patch breaks other software.


They don't need a patch from me, they need the will to DTRT and at least revert the commit [0] that broke things. Glibc has had a functioning pthread_cond_t that didn't randomly deadlock/lose wakeups, this isn't some impossible to implement feature.

That commit claims to fix ordering guarantees required by the clarified spec, but in doing so has introduced real deadlock bugs.

The linked discussion at [1] is telling. It seems glibc's nptl is prioritizing playing games with optimizations over correctness, even before [0] landed.

To suggest all that's required is someone showing up with a patch is to completely ignore the fact that there's zero shortage of correct pthread_cond_t implementations all over the place. This is a maintenance failure, not a lack of code.

[0] https://github.com/bminor/glibc/commit/ed19993b5b0d05d62cc88...

[1] https://austingroupbugs.net/view.php?id=609


Reverting that commit will break other stuff that depends on the posix behavior, not sure how's that better.

I also very much do not think that there are plenty of correct pthread_cond_t implementations, far from it. It is possible that the musl one might be close to a drop-in replacement (although IIRC, musl at some point had a different thread cancellation implementation than glibc).


Well, it's GNU/Linux: a multiuser Unix that is not usable as a multiuser system (read the sdf.org story). It's a cheap hardware framework that runs more or less reliably... but no problem, we have Kubernetes and containers to cover/perfume the stinking parts.


Awesome to see the author post TLA+ instructions & model source. One day I'll get around to actually learning it. In the meantime, what's posted is too involved for me to grok at first glance, but I'm still glad to see more real world uses pop up.


Someone should be very embarrassed about how they have handled this whole process.

Surely code that operates condition variables correctly, without mitigations or stolen signals, is in the public domain.


Is there? There are multiple papers describing cond var implementations. Many have been found to have errors after years.

I'm sure the BSDs have POSIX cond var implementations, but they won't use futexes, rather some BSD-specific signalling primitive, and will likely have different scalability profiles than the one in glibc, which has been tuned mostly for the needs of Red Hat, Oracle and other big Linux vendors.


Concurrency is bug-prone. Complex code is bug-prone. glibc's CV implementation combines both.

In my experience, both writing and teaching concurrency, there's a very strong sentiment of "I'll just use another CV/lock/etc." whenever people encounter a bug and try to fix it, which seldom fully solves the problem but usually makes it even more subtle and compounds the number of resulting states.

One of my tricks to increase the chances of hitting race conditions and deadlocks when testing is to use a modified sampling profiler to randomly delay and pause threads, but with such a long sequence of steps as this bug requires, that would've been unlikely to help. Yet Murphy's Law says it will happen in practice...


Skimming this, this isn't something glibc can do reliably. This has to be a kernel fix. The underlying FUTEX_OP operation is... obtuse. Futexes were originally designed for simple locking and semaphores, and only later extended to condition variables, and the result isn't pretty. (To be fair: it's a very hard problem on kernels that need to scale to hundreds of cores across many levels of memory interconnect.)

Trying to fix this in userspace is just about tuning to try to suppress the underlying issues, and recovering cleanly when things break.


How is this the kernel's fault? The article says "pthread_cond_signal() will signal on the wrong futex". Yes I understand futexes are tricky, but I don't see how they're broken.


To be clear: I'm not assigning "fault", I'm saying the design is flawed. The current futex abstraction forces userspace to do things like "pick a thread to wake up", which is inherently racy when viewed from outside the kernel. So yes, you have to fix this particular issue in glibc because the microbug is in glibc. But glibc shouldn't have tried to solve the problem in the first place.

IPC is just really hard, we struggle with exactly this in Zephyr too (though we have mostly managed to avoid this particular kind of mistake as in an RTOS the kernel/API boundary is thinner), at a much smaller scale. Add NUMA and cache-locality-concern to the mix and things get harder still.


On Linux and OpenBSD you can use futex(FUTEX_WAKE, 1) to wake precisely any one of the threads or processes waiting on futex(FUTEX_WAIT). Are you telling me that's broken? I understand that sophisticated synchronization primitives usually maintain a list of waiters, but I wasn't under the impression they do that because kernels are broken, but rather because userspace libraries want to be in control of which waiter gets selected.

The reason why things are racy as you describe probably has more to do with userspace libraries wanting to avoid issuing a system call than with the kernel's design. Things would probably be easier if you always used a system call, as is the case with most Win32 locks, but that's slow, so people try not to do it and sometimes fail.

Also yes, I understand it's hard. That's why Google tasked the guy who built their fiercest rival (Altavista) with building their locking primitives (Chubby and *NSYNC).


I'm not sure whether to read your comment as disputing the author's claim of having a fix for this particular problem, or that there will inevitably be other cases where condition variables fail unless futexes are fixed in the kernel.


Is there a way to bundle and use a more up-to-date version of glibc inside my Docker containers? Is it a straightforward matter of creating an intermediate build base image where glibc gets compiled, then copying the resulting glibc .so artifacts to the final image?

I suspect there may be additional concerns, but I'm not a glibc expert and have not needed to mess with it much.

Also, does musl libc [0] have fewer bugs, on average?

[0] https://musl.libc.org/


Yes. In fact, musl 1.2.0's pthread_mutex_t was formally verified in 2021 without changes. The same work found a bug in seL4 (in an unverified part, obviously, but still).

The author remarks that this is in part because musl is conservative, and identifies barriers in musl that could be removed. But I think musl should get credit for being conservative here. Also, in benchmarks, the redundant barriers showed no performance impact.

https://people.mpi-sws.org/~viktor/papers/asplos2021-vsync.p...


Would macOS see this issue too?


macOS uses its own pthreads implementation: https://github.com/apple-oss-distributions/libpthread


[flagged]


I maintain my own FOSS thread signals/events libraries in C++ [0] and in Rust [1] (atop parking_lot as a futex stand-in) and I disagree. This has nothing to do with the language and everything to do with the semantics of the locks. Writing concurrency primitives is HARD, and the more functionality your API exposes, the more room there is for nuanced bugs in how everything interplays with everything else.

[0]: https://github.com/neosmart/pevents

[1]: https://github.com/neosmart/rsevents


It's worth noting that condition variables are a finicky interface. Speaking personally, I would typically much rather work with something like a semaphore. I also feel like I've seen a lot of misuse of condition variables out there.


Condition variables were the POSIX equivalent to Win32’s WaitForMultipleObjects (and IOCP apis like it). Now both are available on Windows (and my library makes WFMO available on *nix/Linux/macOS).

Semaphores alone can’t mimic the same semantics (juggling between multiple, disparate locks/sync objects) without running into a possible race condition unless you have an API that can somehow wait on one of two or more semaphores to become available.


I totally disagree with your assessment. WaitForMultipleObjects enters the kernel every time and blocks on kernel objects. The nearest equivalent is select, poll, epoll, kqueue, etc. Newer innovations like eventfd start to introduce synchronization objects into the mix of what type of kernel object to block on.

A Win32 mutex or semaphore is very heavy and typically avoided in a sane high-performance application unless you need it to cross process boundaries. A lot of the time (most?) you will want a Win32 critical section, which is similar to the modern futex-based pthread mutex implementation in that it does not enter the kernel unless there is contention.

A pthread condition variable is clumsy. You need to give it an extra mutex and loop etc. It doesn't have any of the simplicity of WaitForMultipleObjects, even with all the faults of that interface of which there are many.


I was not talking about implementation details but semantics. I agree with what you are saying. My own implementations are user-mode unless sleeping.


WaitForMultipleObjects (been in Win32 forever) is not really related to IO completion Ports, whose API came a lot later.

My understanding is that condition variables were designed to be a superior alternative to Windows Event objects.

I'm glad to read here that others find them fiddly and overly awkward to use since that was also my impression vs the consistent and usually easy to use Windows thread synchronization HANDLE based objects. Dave Cutler definitely did a good job there IMHO.


Condition variables predate Windows and its predecessor VMS by many years (they do not predate unix itself but certainly they predate pthreads by decades).

In particular pthreads borrowed them from Mesa, but they originally were part of the monitor abstraction designed by Hoare.

So they weren't designed to be a superior alternative to windows events; they might be considered an alternative to events in general but I couldn't easily find any history of the events abstraction.


I'm a Rust beginner, but it seems like a mutex or lock is shared mutable state, and abstracting this with unsafe code to implement events is just creating a loophole by which to call a function on another thread without having a mutable reference? I'm sure I'm missing something, or a lot of things. It just seems like code I wouldn't even write in C++ anymore. I typically don't do signalling between threads or have them share mutable state; if they do signal, it is by a read or write to a value small enough to be atomic.


Rust’s synchronization types are built on the semantics of shared objects, with the synchronization object owning the data access to which is channeled exclusively via the synchronization type.

Events and signals aren’t intended to be used if you’re trying to share ownership of memory or data - you should use the std types in that case. They’re for signaling between threads, or for building your own synchronization objects on top of. For example, an auto-reset event can be used in lieu of a one-bit channel (sort of like Mpmc<()>), and manual-reset events can be used to implement the same in broadcast mode, much more efficiently than the channel crates. A common usage is to block until an event is available or a shutdown has been requested - in lieu of manually polling an AtomicBool and aborting if it is true - making your code more responsive (no delay between polls) and more efficient (no repeated polling, no busy waits, etc). They can be used to signal state transitions or readiness, which doesn’t have anything to “own”; for example, the rsevents-extra crate [0] implements a countdown event (the event becomes available when n outstanding tasks have finished in parallel threads) and a semaphore (restricting concurrency to a region of code, not limiting access to a variable or object).

[0]: https://github.com/neosmart/rsevents-extra


That doesn't help here, as Rust also uses glibc for plenty of things (although recently it did move away from pthreads in certain areas for performance reasons, see https://twitter.com/m_ou_se/status/1526211117651050497 ).


Rust doesn't help with implementing low level primitives like this, you'd have to use unsafe code.


> Rust doesn't help with implementing low level primitives like this

A charitable reading of the parent comment might be that Rust already implements these particular low-level primitives, so that by using Rust, you are not using the problematic glibc pthread code. See https://blog.rust-lang.org/2022/06/30/Rust-1.62.0.html#thinn... for the relevant announcement: "Previously, Mutex, Condvar, and RwLock were backed by the pthreads library on Linux. [...] Rust's standard library now ships with a raw futex-based implementation of these locks on Linux, [...]"


In either case you need to use some implementation of synchronisation primitives, and that could contain bugs. Where's the difference?


glibc's impl is a "raw futex-based" one as well. I'm no great fan of glibc as an organization or body of code, but the comparative benefits of Rust here are not well explained.


Although glibc's implementation is indeed "raw futex-based" it is not merely a raw futex, of course. What you get from glibc isn't a futex (tricky to use properly) but instead a C structure type named pthread_mutex_t. It's quite big. 40 bytes!

All Rust wanted was a futex, which for reference is just any 32-bit aligned value. The pthread_mutex_t is less useful to Rust, yet it's ten times bigger!

So, Mara's new Mutex<T> in the Linux builds of Rust is just a futex (32-bits) plus a poison boolean plus your type T which is protected by the mutex. If your type is Zero Size then this rounds up to 8 bytes, five times smaller than pthread_mutex_t. If your type was just a u16 (unsigned 16-bit integer) then it rounds up to the same size, but you're also protecting your 16-bit unsigned integer which is nice.

Now, if you're protecting huge data structures with mutexes, you don't care, it makes no measurable difference. But if you heard futex is small (it is) and protected loads of tiny objects with one each (say, individual 32-bit floating point X,Y co-ordinate pairs) Rust now makes your choice cost roughly what you expected instead of bloating your program for no reason.

It's also a bit faster (but probably not enough that you should care) and simplifies some other book keeping in Rust itself.

[ Above I wrote about Mutex, for Condvar the argument is similar but I haven't spent lots of time understanding the innards of Condvar whereas I spent a week reading Mutex stuff ]


That's a much better justification for Rust's impl over glibc's. Given the comment this tree sprouted from, the argument seemed to be "Rust is memory safe and uses futex-es", which was pretty weak.

I see the benefit of Rust here as being a kindof modern refresh of the building block concepts, which have calcified in standard C. With Rust, the refresh is now in std, or in a popular crate (where it is accessible via package managers). Alternative C standard libraries are hampered by having to implement awful parts (like locales, the string routines), and while C has an even greater quantity of excellent alternatives for smaller slices of "std" functionality (both open and closed source/in-house), discovery is less good. I'm not sold on every aspect of package managers, but I can't deny they have made some difficult things easy.


Condvars and mutexes are for very different use cases though, and condvars are notoriously hard to get right, especially when supporting the full set of POSIX behaviours with good performance.


You need a mutex to actually use condition variables. So, even if Rust's Condvar were just as bulky and complicated as the POSIX one offered by glibc, you saved on the associated mutex.

However from a quick glance Rust's Condvar is much smaller than pthread_cond_t

Condition variables per se don't seem that difficult. Mara did the Linux and Windows condition variables, both are pretty elegant. As with a graceful swan, there is lots happening out of sight (in the futex code Rust needs to cope with the fact that when people do futexes for $notLinux including OpenBSD and Fuchsia they inevitably put a spin on the idea, adding something they thought was cool or ignoring a hard problem they didn't fancy solving) but the general idea is very simple.

In contrast the glibc implementation is fiendishly complicated, but that's not because of POSIX, it's because they decided to do something fiendishly complicated, it hardly seems fair to demand that everybody who didn't need that complexity should respect how difficult it was anyway.

Example: glibc promises that if six threads A, B, C, D, E, F wait on a condition variable in order and then it is signalled exactly four times, threads A, B, C and D are definitely woken and E and F are definitely not. That's potentially useful for some purposes, but it's expensive; what if I don't care about this? Too bad, the price is baked in even though POSIX says nothing about this feature.

Rust doesn't care, so, if A, B, C, D, E, F wait on the CondVar and then it gets signalled exactly four times, well, four threads wake, but it could be any four. If A always waits again, signalling four times could just wake A four times, and each time it goes back to sleep. That seems useless, but if you wrote A, that's on you.


For what it's worth, which threads are guaranteed to be unblocked is defined by the standard, at least under realtime scheduling.

I'm sure the Rust devs got their condvar implementation exactly right on the first try, including cancellation, RT support, support for robust mutexes, wait morphing, process-shared cond vars, etc.

For human programmers like the glibc ones, it took many decades, and there are very few that are confident enough to make large changes.

As I understand it, there was no POSIX-correct cond var for Windows for many years, until after Vista MS decided to expose the right primitives.


> For what is worth, which threads are guaranteed to be unblocked is defined by the standard, at least under realtime scheduling.

Where do you see this guarantee ?

> including cancellation, RT support, support for robust mutexes, wait morphing, process shared cond vars, etc

I'm sure doing the 100 metres sprint backwards, hopping on one leg while balancing a bowl of soup on your head would be trickier than the way Usain Bolt did it, but so what ?

What glibc tries to do is very complicated, yet when we see people using this feature they overwhelmingly don't require any of the complications, they just want a condition variable and that's all Rust's Condvar offers.


> Where do you see this guarantee ?

The POSIX spec is not the easiest to read, but pthread_cond_signal has wording to the effect that the ordering of wakeups is unspecified except under realtime scheduling.

In any case glibc was blasted in the posix bug tracker for not respecting the wake up order even on non-RT and asking for the spec to be amended. Instead glibc had to rewrite its implementation to conform with the spec (and that fix introduced this bug).

> I'm sure doing the 100 metres sprint backwards, hopping on one leg while balancing a bowl of soup on your head would be trickier than the way Usain Bolt did it, but so what ?

You said that the complexity in glibc is not due to POSIX. All the items I mentioned (except for wait morphing) are required by POSIX/SUS. So what is it then?


Which means they have plenty of opportunities to introduce their own bugs.


Can we have a "formal semantics for multi-threading"? Why isn't it a thing?


There are. They're called memory models. Java's is pretty readable (see below), and C and C++ have had formal memory models since C11/C++11. But when you're building on these things there's always lots of room to get them wrong!

[0]: https://en.wikipedia.org/wiki/Java_memory_model

[1]: https://docs.oracle.com/javase/specs/jls/se18/html/jls-17.ht...


Another poster shared the Java memory model, and C++ and C have them as well. The article we're discussing shows another kind of formal model for multithreaded code in TLA+, and actually uses it to investigate this bug. There is also Tony Hoare's Communicating Sequential Processes formal model, first presented in 1978, and an entire field of CS research called process calculus.

So, pick a formal model and use it - though note that formal methods tend to add significant extra effort to use, and may or may not pay off depending on your domain and desired outcomes.


It’s super duper hard, mostly.



