> The idea that UB is carte blanche for implementations to do whatever is an unintended consequence of the vague language of the standard.
Whether or not this was originally intended, it's certainly become the way the standard is written and used today, so the original intent is somewhat beside the point.
Further, this is not some new idea that arose from the C standard. It's a basic, core idea in both software engineering and computer science! You define some meaning for your input, which may or may not cover all possible inputs, so that you can go on to process it without considering inputs that don't make sense.
Now, to be fair, the "guardrail-free" approach where UB is silent is a bit out of the ordinary. A lot of software that makes assumptions about its input will at least try to validate them first, and a lot of programming language research will avoid UB by construction. But C is in a unique place where neither of those approaches fully work.
> The C standard allows for both kinds of environments by stating that these behaviors are undefined, allowing the implementation to error out or do something sensible, depending on the environment.
This is true, but it doesn't mean that "something sensible" is actually something the programmer should rely on! That's just asking too much of UB: programmers need to work with the semantics implemented by their toolchain, not make up an intuitive/"sensible" meaning for their undefined program and then get mad when it doesn't work.
For example, if you want to scan through a bunch of memory, tell the language that's what you're doing. Is that memory at a fixed address? Tell the linker about it so it can show up as a normal global object in the program. Is it dynamic? Memory allocators fabricate new objects in the abstract machine all the time; perhaps your compiler supports an attribute that means "this function returns a pointer to a new object."
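Here's a rough sketch of both ideas, assuming a GNU-style toolchain (the register-block name, its size, and the linker-script address are all made up for illustration):

```c
#include <stdint.h>
#include <stdlib.h>

/* Fixed address: declare the region as an ordinary object and let the
   linker place it (e.g. "device_regs = 0x40000000;" in a linker script)
   instead of casting a magic integer to a pointer behind the compiler's
   back. */
extern volatile uint32_t device_regs[64];

/* Dynamic memory: GCC and Clang's malloc attribute tells the compiler
   the returned pointer is a fresh object that aliases nothing else. */
__attribute__((malloc))
void *arena_alloc(size_t n)
{
    return malloc(n); /* stand-in for a real custom allocator */
}

uint32_t read_first_reg(void)
{
    return device_regs[0]; /* an ordinary object access, no UB gymnastics */
}
```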
The solution is not just to shrug and say "do something sensible but potentially dangerous." It's to precisely define the operations available to the programmer, and then provide tools to help them avoid misuse. If an operation isn't in the language, we can add it! If it's too easy to mess up, we can implement sanitizers and static analyzers, or provide alternatives! Yelling about a supposed misreading of "undefined behavior" is never going to be anywhere near as effective.
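To make the tooling point concrete: UndefinedBehaviorSanitizer already ships in GCC and Clang behind -fsanitize=undefined, and it turns many silent UB cases into loud runtime reports. A minimal example with signed overflow:

```c
/* Build with: cc -fsanitize=undefined overflow.c && ./a.out
   UBSan reports the overflow at runtime instead of letting the
   optimizer quietly assume it can't happen. */
#include <limits.h>
#include <stdio.h>

int main(void)
{
    int x = INT_MAX;
    int y = x + 1; /* signed integer overflow: undefined behavior */
    printf("%d\n", y);
    return 0;
}
```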
One issue is that, under the prevailing interpretation, the existing semantics are not reliable. You do not know when or whether compilers will take advantage of UB to completely change the semantics they are providing. That's not tenable.
That's not how it works. Taking advantage of UB doesn't change the semantics; it just exposes which behaviors were never in the semantics to begin with. Barring compiler or spec bugs, we do in principle know exactly when the compiler may take advantage of UB. That's the point of a document like the standard: it describes the semantics in a precise way.
To be fair, the existing semantics are certainly complex and often surprising, and people sometimes disagree over what they are, perhaps even to an untenable degree, but that's a very different thing from being unreliable.
The net result of your argument is that the language has no semantics. I write and test with -O0 and show that f(k)=m. Then I run with -O3 and f(k)=random. Am I required to be an expert on the C standard and compiler development in order to know that, with no warning, my code has always been wrong? What if f(k)=m under GCC 10, but under GCC 10.1 that whole section of code is skipped? What you are asking programmers to do is to both master the fine points of UB (which is impractical) and look into the future to see what changes may be invisibly committed to the compiler code.
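To make that concrete, here's the kind of thing I mean (a well-known pattern; the exact outcome depends on your compiler and flags):

```c
/* At -O0 this typically prints INT_MAX once i wraps around; at -O2/-O3,
   GCC and Clang may fold "i + 1 > i" to true (signed overflow is UB, so
   they assume it never happens) and the loop never exits. */
#include <stdio.h>

int count_up(void)
{
    int i = 0;
    while (i + 1 > i) /* UB when i == INT_MAX */
        i++;
    return i;         /* may never be reached under optimization */
}

int main(void)
{
    printf("%d\n", count_up());
    return 0;
}
```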
> What you are asking programmers to do is to both master the fine points of UB (which is impractical) and look into the future to see what changes may be invisibly committed to the compiler code.
I am asking programmers to understand and avoid UB, but I am not asking them to look into the future. Future compilers will still implement the same semantics; that's, again, the point of having a spec!
I don't disagree that avoiding C's UB unaided can be difficult, but that just means the solution is to make it easier, and that's exactly what I suggested above: "precisely define the operations available to the programmer, and then provide tools to help them avoid misuse."
And this isn't a new idea. People have been making progress in this area for a long time: better documentation of the rules, sanitizers, static analyzers, changes to the spec to remove some forms of UB, new languages that reshuffle things to make it harder or impossible to invoke UB, etc.
Are you serious? Again: it's not up to the compiler at all; it's up to the spec, and the spec is what guarantees the current semantics won't change next week.
The semantics for non-UB code are completely fixed across optimization levels and compiler versions. Only code that invokes UB can break on these changes, and only because this code never had any specified semantics to begin with.
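A standard illustration of that last sentence (hedged, since exact behavior varies by compiler and version): the dereference below licenses the optimizer to assume p is non-null, so the later check may be deleted. Callers that never pass NULL are unaffected at every optimization level.

```c
#include <stddef.h>

int read_checked(int *p)
{
    int v = *p;        /* UB if p == NULL */
    if (p == NULL)     /* optimizer may delete this branch entirely:   */
        return -1;     /* p was already dereferenced, so it "can't" be null */
    return v;
}
```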
It is impossible to write C applications without invoking UB, and your theory that UB behavior has no semantics is nonsensical. It may have no semantics that compilers currently feel they need to keep stable, but code that compiles and runs has semantics.
That's not what I (or the standard, or compiler writers, or programming language researchers) mean by "semantics."
From the perspective of defining and specifying a programming language, when we say "semantics" we mean the set of rules for an abstract machine, or a similar formalism. If those rules don't specify the result of an operation, it's like the machine gets stuck: as with dividing by zero in a proof, there is no correct way to proceed. The behavior is undefined.
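In C terms, the same stuck-machine moment looks like this (the divisor is a runtime value, so the compiler can't simply reject the program):

```c
#include <stdio.h>

int main(void)
{
    int zero = 0;
    /* The abstract machine has no rule that assigns this expression a
       value, so evaluation is "stuck": from here on, the standard says
       nothing about what the program does. */
    printf("%d\n", 1 / zero);
    return 0;
}
```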
Nobody is disputing that compilers will produce something for code with undefined behavior. They're just saying that there is absolutely no useful way to rely on it, because nobody has agreed on, or even decided, what it should be (and the behavior was left undefined for some specific reason in the first place!).
If that makes it impossible to use the language, that's not UB's problem. It's the design of the language, and the quality of the tools that surround it. There are ways to increase your confidence that a C application never invokes UB, and they're getting better all the time. (There are also lots of new languages that try to solve this in various ways that C can't!)
Those are the solutions we have. "Just don't have UB in C" or "just make compilers more predictable" are not very effective by comparison.