> The idea that UB is carte blanche for implementations to do whatever is an unintended consequence of the vague language of the standard.
Whether or not this was originally intended, it's certainly become the way the standard is written and used today, so the original intent is somewhat beside the point.
Further, this is not some new idea that arose from the C standard. It's a basic, core idea in both software engineering and computer science! You define some meaning for your input, which may or may not cover all possible inputs, so that you can go on to process it without considering inputs that don't make sense.
Now, to be fair, the "guardrail-free" approach where UB is silent is a bit out of the ordinary. A lot of software that makes assumptions about its input will at least try to validate them first, and a lot of programming language research will avoid UB by construction. But C is in a unique place where neither of those approaches fully work.
> The C standard allows for both kinds of environments by stating that these behaviors are undefined, allowing the implementation to error out or do something sensible, depending on the environment.
This is true, but it doesn't mean that "something sensible" is actually something the programmer should rely on! That's just asking too much of UB: programmers need to work with the semantics implemented by their toolchain, not make up an intuitive/"sensible" meaning for their undefined program and then get mad when it doesn't work.
For example, if you want to scan through a bunch of memory, tell the language that's what you're doing. Is that memory at a fixed address? Tell the linker about it so it can show up as a normal global object in the program. Is it dynamic? Memory allocators fabricate new objects in the abstract machine all the time; perhaps your compiler supports an attribute that means "this function returns a pointer to a new object."
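Here's a rough sketch of both ideas, assuming a GNU-style toolchain (the register-block name, its size, and the linker-script address are all made up for illustration):

```c
#include <stdint.h>
#include <stdlib.h>

/* Fixed address: declare the region as an ordinary object and let the
   linker place it (e.g. "device_regs = 0x40000000;" in a linker script)
   instead of casting a magic integer to a pointer behind the compiler's
   back. */
extern volatile uint32_t device_regs[64];

/* Dynamic memory: GCC and Clang's malloc attribute tells the compiler
   the returned pointer is a fresh object that aliases nothing else. */
__attribute__((malloc))
void *arena_alloc(size_t n)
{
    return malloc(n); /* stand-in for a real custom allocator */
}

uint32_t read_first_reg(void)
{
    return device_regs[0]; /* an ordinary object access, no UB gymnastics */
}
```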
The solution is not just to shrug and say "do something sensible but potentially dangerous." It's to precisely define the operations available to the programmer, and then provide tools to help them avoid misuse. If an operation isn't in the language, we can add it! If it's too easy to mess up, we can implement sanitizers and static analyzers, or provide alternatives! Yelling about a supposed misreading of "undefined behavior" is never going to be anywhere near as effective.
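To make the tooling point concrete: UndefinedBehaviorSanitizer already ships in GCC and Clang behind -fsanitize=undefined, and it turns many silent UB cases into loud runtime reports. A minimal example with signed overflow:

```c
/* Build with: cc -fsanitize=undefined overflow.c && ./a.out
   UBSan reports the overflow at runtime instead of letting the
   optimizer quietly assume it can't happen. */
#include <limits.h>
#include <stdio.h>

int main(void)
{
    int x = INT_MAX;
    int y = x + 1; /* signed integer overflow: undefined behavior */
    printf("%d\n", y);
    return 0;
}
```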
One issue is that, under the prevailing interpretation, the existing semantics are not reliable. You do not know when or whether compilers will take advantage of UB to completely change the semantics they are providing. That's not tenable.
That's not how it works. Taking advantage of UB doesn't change the semantics; it just exposes which behaviors were never in the semantics to begin with. Barring compiler or spec bugs, we do in principle know exactly when the compiler may take advantage of UB. That's the point of a document like the standard: it describes the semantics in a precise way.
To be fair, the existing semantics are certainly complex and often surprising, and people sometimes disagree over what they are, perhaps even to an untenable degree, but that's a very different thing from being unreliable.
The net result of your argument is that the language has no semantics. I write and test with -O0 and show that f(k)=m. Then I run with -O3 and f(k)=random. Am I required to be an expert on the C standard and compiler development in order to know that, with no warning, my code has always been wrong? What if f(k)=m under GCC 10, but under GCC 10.1 that whole section of code is skipped? What you are asking programmers to do is to both master the fine points of UB (which is impractical) and look into the future to see what changes may be invisibly committed to the compiler code.
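To make that concrete, here's the kind of thing I mean (a well-known pattern; the exact outcome depends on your compiler and flags):

```c
/* At -O0 this typically prints INT_MAX once i wraps around; at -O2/-O3,
   GCC and Clang may fold "i + 1 > i" to true (signed overflow is UB, so
   they assume it never happens) and the loop never exits. */
#include <stdio.h>

int count_up(void)
{
    int i = 0;
    while (i + 1 > i) /* UB when i == INT_MAX */
        i++;
    return i;         /* may never be reached under optimization */
}

int main(void)
{
    printf("%d\n", count_up());
    return 0;
}
```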
> What you are asking programmers to do is to both master the fine points of UB (which is impractical) and look into the future to see what changes may be invisibly committed to the compiler code.
I am asking programmers to understand and avoid UB, but I am not asking them to look into the future. Future compilers will still implement the same semantics; that's, again, the point of having a spec!
I don't disagree that avoiding C's UB unaided can be difficult, but that just means the solution is to make it easier, and that's exactly what I suggested above: "precisely define the operations available to the programmer, and then provide tools to help them avoid misuse."
And this isn't a new idea. People have been making progress in this area for a long time: better documentation of the rules, sanitizers, static analyzers, changes to the spec to remove some forms of UB, new languages that reshuffle things to make it harder or impossible to invoke UB, etc.
Are you serious? Again: it's not up to the compiler at all; it's up to the spec, and the spec is what guarantees the current semantics won't change next week.
The semantics for non-UB code are completely fixed across optimization levels and compiler versions. Only code that invokes UB can break on these changes, and only because this code never had any specified semantics to begin with.
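A standard illustration of that last sentence (hedged, since exact behavior varies by compiler and version): the dereference below licenses the optimizer to assume p is non-null, so the later check may be deleted. Callers that never pass NULL are unaffected at every optimization level.

```c
#include <stddef.h>

int read_checked(int *p)
{
    int v = *p;        /* UB if p == NULL */
    if (p == NULL)     /* optimizer may delete this branch entirely:   */
        return -1;     /* p was already dereferenced, so it "can't" be null */
    return v;
}
```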
It is impossible to write C applications without invoking UB, and your theory that UB behavior has no semantics is nonsensical. It may have no semantics that compilers currently feel they need to keep stable, but code that compiles and runs has semantics.
That's not what I (or the standard, or compiler writers, or programming language researchers) mean by "semantics."
From the perspective of defining and specifying a programming language, when we say "semantics" we mean the set of rules for an abstract machine, or a similar formalism. If those rules don't specify the result of an operation, it's like the machine gets stuck: as with dividing by zero in a proof, there is no correct way to proceed. The behavior is undefined.
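In C terms, the same stuck-machine moment looks like this (the divisor is a runtime value, so the compiler can't simply reject the program):

```c
#include <stdio.h>

int main(void)
{
    int zero = 0;
    /* The abstract machine has no rule that assigns this expression a
       value, so evaluation is "stuck": from here on, the standard says
       nothing about what the program does. */
    printf("%d\n", 1 / zero);
    return 0;
}
```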
Nobody is disputing that compilers will produce something for code with undefined behavior. They're just saying that there is absolutely no useful way to rely on it, because nobody has agreed on, or even decided, what it should be (and the behavior was left undefined for some specific reason in the first place!).
If that makes it impossible to use the language, that's not UB's problem. It's the design of the language, and the quality of the tools that surround it. There are ways to increase your confidence that a C application never invokes UB, and they're getting better all the time. (There are also lots of new languages that try to solve this in various ways that C can't!)
Those are the solutions we have. "Just don't have UB in C" or "just make compilers more predictable" are not very effective by comparison.