Generally a good writeup, but the article seems a bit confused about undefined behavior.
> What is the dreaded UB? I think the best way to understand it is to remember that, for any running program, there are FATES WORSE THAN DEATH. If something goes wrong in your program, immediate termination is great actually!
This has nothing to do with UB. UB is what it says on the tin: behavior for which no definition is given in the execution semantics of the language, whether intentionally or unintentionally. It's basically saying, "if this happens, who knows". Here's an example in C:
#include <stdio.h>
int main(void) {
    int x = 555;
    long long *l = (long long *)&x;  /* aliases an int as a long long */
    x = 123;
    printf("%d\n", (int)*l);         /* may print 555 instead of 123 */
    return 0;
}
This is a violation of the strict aliasing rule, which is undefined behavior. Unless it's compiled with no optimizations, or with -fno-strict-aliasing, which effectively disables the rule, the compiler is "free to do whatever it wants". In practice, though, it'll just print 555 instead of 123. All undefined behavior is stuff like this: the compiled output may deviate from what the source appears to say, and only maybe. You can imagine this kind of thing gets tricky with more aggressive optimizations, but that potential deviation is all that occurs.
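For contrast, the sanctioned way to reinterpret those bytes is to copy them with memcpy, which sidesteps the aliasing rule entirely. A rough sketch (byte order still matters, but there's no UB):

#include <stdio.h>
#include <string.h>

int main(void) {
    int x = 123;
    long long l = 0;
    memcpy(&l, &x, sizeof x);  /* copies the bytes; no aliasing violation */
    printf("%lld\n", l);       /* 123 on little-endian targets */
    return 0;
}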
Race conditions, silent bugs, etc. can occur as the result of the compiler mangling your code thanks to UB, but so can crashes and a myriad of other things. It's also possible UB is completely harmless, or even beneficial. It's really hard to reason about that kind of thing, though: optimizing compilers can be hard to predict across a huge codebase, especially if you aren't a compiler dev yourself. That unpredictability is why we say it's bad. If you're compiling code with something like TCC instead of Clang, it's a completely different story.

That's it. That's all there is to UB.
> Race conditions, silent bugs, etc. can occur as the result of the compiler mangling your code thanks to UB, but so can crashes and a myriad of other things. [...] That's it. That's all there is to UB.
I think it's common to be taught that UB is very bad when you're new, partly to simplify your debugging experience, partly to help you understand and mentally demarcate the boundaries of what the language allows and doesn't allow, and partly because there are many Standards-Purists who genuinely avoid UB. But from my own experience, UB just means "consult your compiler to see what it does here because this question is beyond our pay grade."
Interestingly enough, and only semi-related: I had to use volatile for the first time ever in my latest project, mainly because I was writing assembly that accessed memory directly and I wanted to make sure the compiler didn't optimize away the variable. I think that's maybe the last C keyword on my bucket list.
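For the curious, roughly the shape of it (the names are made up):

volatile int done = 0;  /* the assembly stores to this behind the compiler's back */

void spin_until_done(void) {
    /* Without volatile, the compiler could read done once, see 0, and
       turn this into an infinite loop. volatile forces a fresh load
       every iteration. */
    while (!done) { }
}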
> But from my own experience, UB just means "consult your compiler to see what it does here because this question is beyond our pay grade."
People are taught it’s very bad because otherwise they do exactly this, which is the problem. What your compiler does here may change from invocation to invocation, due to seemingly unrelated flags, small perturbations in unrelated code, or many other things. This approach encourages accepting UB in your program. Code that invokes UB is incorrect, full stop.
I understand, but you have to see how you would be considered one of the Standards-Purists I was talking about, right? If Microsoft makes a guarantee in their documentation about some behavior of UB C code, that guarantee dates back about 14 years, and I see many credible people on the internet, scattered across those 14 years, confirming that the behavior did and still does happen, then I think it's safe to say I can rely on that behavior, as long as I'm okay with a little vendor lock-in.
> If Microsoft makes a guarantee in their documentation about some behavior of UB C code
But do they? Where?
More likely, you mean that a particular compiler may say "while the standard says this is UB, it is not UB in this compiler". That's something wholly different, because you're no longer invoking UB.
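GCC and Clang's -fwrapv is a concrete example: it documents signed overflow as two's-complement wrapping, so under that flag the overflow is defined behavior for that compiler. A sketch:

#include <limits.h>

/* With -fwrapv, next(INT_MAX) is defined to return INT_MIN; without it,
   the same call is UB per the standard. */
int next(int x) { return x + 1; }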
That's not true at all; who taught you that? Think of it like this: signed integer over/underflow is UB, so all addition operations over ints are potentially invoking UB.
int add(int a, int b) { return a + b; }
So by that metric this is incorrect code, which is clearly absurd.
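To be clear, the optimizer really does act on that assumption. The classic sketch (gcc and clang at -O2 typically fold the check to false):

#include <limits.h>
#include <stdio.h>

int will_overflow(int x) {
    return x + 1 < x;  /* "never true" if signed overflow can't happen */
}

int main(void) {
    printf("%d\n", will_overflow(INT_MAX));  /* typically 0 at -O2, 1 at -O0 */
    return 0;
}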
Compilers explicitly provide the means to disable optimizations around undefined behavior in a granular way precisely because a lot of useful behavior is undefined, but compilation units are sometimes too complex to reason about how the compiler will mangle them. -fno-strict-aliasing doesn't suddenly make pointer aliasing defined behavior.
We have compiler behavior for incorrect code, and it's refusing to compile the code in the first place. Do you think it's just a quirky oversight that UB triggers at most a warning? The entire point of giving compilers free rein over UB was so they could implement platform-specific optimizations in its place. UB isn't arbitrary.
"Code that misbehaves when optimized following these rules is, by definition, incorrect C code."
> We have compiler behavior for incorrect code, and it's refusing to compile the code in the first place
This isn't and will never be true in C, because whether code is correct can be a runtime property. That add function defined above isn't incorrect on its own, but combined with code that calls it at runtime with values that overflow, it is incorrect.
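A sketch of what I mean:

#include <limits.h>

int add(int a, int b) { return a + b; }

int main(void) {
    int ok  = add(2, 3);        /* well-defined */
    int bad = add(INT_MAX, 1);  /* same function, but this execution has UB */
    (void)ok; (void)bad;
    return 0;
}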
> But from my own experience, UB just means "consult your compiler to see what it does here because this question is beyond our pay grade."
Careful. It's not just "consult your compiler", because the behavior of a given compiler on code containing UB is also allowed to vary based on specific compiler version, and OS, and hardware, and the phase of the moon.