> If a situation has been statically determined to invoke UB that should be a co...

MichaelBurge · on Nov 28, 2022

C++ bans undefined behavior in constexpr, so you can force GCC to prove that code has no undefined behavior by sprinkling it in declarations where applicable:

https://shafik.github.io/c++/undefined%20behavior/2019/05/11...

account42 · on Nov 28, 2022

Constant-evaluated expressions with undefined behavior are ill-formed but constexpr annotated functions which may in some invocations result in undefined behavior are not.

benj111 · on Nov 28, 2022

It is undefined behaviour if I write GCC --hlep

Does that mean it's acceptable for GCC to reformat my hard drive?

Just because something is UD doesn't give anyone a license to do crazy things.

If I misspell --help I expect the program to do something reasonable. If I invoke UD I still expect the program to do something reasonable.

Removing checks for an overflow because overflows 'can't happen' is just crazy.

UD is supposed to allow C to be implemented on different architectures if you don't know whether it will overflow to INT_MIN it makes sense to leave the implementation open. If I, the user knows what happens when an int overflows then I should be able to make use of that and guard against it myself. A compiler undermining that is a bug and user hostile.

tpush · on Nov 28, 2022

> It is undefined behaviour if I write GCC --hlep

No, it's not, and I don't know why you'd think so. UB is a concept applying to C programs, not GCC invocations.

> UD is supposed to allow C to be implemented on different architectures if you don't know whether it will overflow to INT_MIN it makes sense to leave the implementation open. If I, the user knows what happens when an int overflows then I should be able to make use of that and guard against it myself.

I think you're confusing UB with unspecified and implementation defined behavior. It's fine if you think something shouldn't be UB, but you have to go lobbying the C standard for that. Compiler writers aren't to blame here.

formerly_proven · on Nov 28, 2022

This has come up before, because, in some technical sense, the C standard does indeed not define what a "gcc" is, so "gcc --help" is undefined behavior according to the C standard, because the C standard does not define the behavior. By the same token, instrument flight rules are undefined behavior.

A slightly less textualist approach to language recognizes that when we talk about C and UB, we mean behavior, which is undefined, of operations otherwise defined by the C standard.

tpush · on Nov 28, 2022

I think this is confusing undefined behavior with behavior of something that is undefined. And either way, the C standard explicitly applies to C programs, so even this cute "textualist" interpretation would be wrong, IMO.

benj111 · on Nov 28, 2022

Do you know what a metaphor is?

No GCC --hlep isn't in the c standard.

But it is a simple example to illustrate how programs react when it receives something that isn't in the spec. GCC could do anything with Gcc --hlep just like it could do anything with INT_MAX + 1. That doesnt mean that all options open to it are reasonable.

If I typed in GCC --hlep I would be reasonably pissed that it deleted my hard drive. You pointing out that GCC never made any claims about what would happen if I did that doesn't make it ok.

If you come across UD, there's reasonable and unreasonable ways to deal with that. Reformatting your hard drive which is presumably allowed by the C standard isn't reasonable. I would contend that removing checks is also unreasonable.

ehvatum · on Nov 28, 2022

> I would contend that removing checks is also unreasonable.

Yeah, but removing a null check after a dereference has a solid rationale, so it’s very different from GCC taking it upon itself to format your drive.

benj111 · on Nov 28, 2022

The general thinking seems to be that UB can do anything so you can't complain, whatever that anything is.

That would logically include reformatting your hard drive.

I definitely disagree with that pov, if you don't accept that UB can result in anything then the line needs to be drawn somewhere.

I would contend that UB stems from the hardware. C won't take responsibility for what the hardware does. Neither will it step in to change what the hardware does. That in turn means that UB means the compiler shouldn't optimise because the behaviour is undefined.

benj111 · on Nov 28, 2022

>No, it's not, and I don't know why you'd think so. UB is a concept applying to C programs, not GCC invocations

What should happen when I invoke --hlep then? The program could give an error, could warn that it's an unrecognised flag. Could ask you if you meant --help. Infer you mean help and give you that, or it could give you a choo Choo train running across the screen. Or it could reformat your hard drive. Just because it isn't specifically listed as UD doesn't mean it's not. If it isn't defined then it's undefined. The question is what is the reasonable thing to do when someone types --hlep. I hope we can agree reformating your hard drive isn't the most reasonable thing to do.

>I think you're confusing UB with unspecified and implementation defined behavior

Am I? What's the reason for not defining integer overflow? Yes unspecified behaviour could be used to allow portability, but so can undefined.

>It's fine if you think something shouldn't be UB, but you have to go lobbying the C standard for that. Compiler writers aren't to blame here.

I'm not saying it shouldn't be UB. I'm saying there's reasonable and unreasonable things to do when you encounter UB. In the article the author took reasonable steps to protect themselves and the compiler undermined that. That isn't reasonable. In exactly the same way that --hlep shouldn't lead to my hard drive getting reformatted.

C gives you enough rope to hang yourself. It isn't required for GCC to tie the noose and stick your head in it though.

I think you're confusing UB with unspecified and implementation defined behavior

tpush · on Nov 28, 2022

> What should happen when I invoke --hlep then? The program could give an error, could warn that it's an unrecognised flag. Could ask you if you meant --help. Infer you mean help and give you that, or it could give you a choo Choo train running across the screen. Or it could reformat your hard drive. Just because it isn't specifically listed as UD doesn't mean it's not. If it isn't defined then it's undefined. The question is what is the reasonable thing to do when someone types --hlep. I hope we can agree reformating your hard drive isn't the most reasonable thing to do.

I honestly don't understand the point of this paragraph.

> Am I? What's the reason for not defining integer overflow? Yes unspecified behaviour could be used to allow portability, but so can undefined.

Yes, you are confused about that. UB is precisely the kind of behavior where the C standard deemed it unsuitable to define as implementation defined or whatever, and it usually has really good reasons to do so. You could look them up instead of asking rhetorically.

> I'm not saying it shouldn't be UB. I'm saying there's reasonable and unreasonable things to do when you encounter UB. In the article the author took reasonable steps to protect themselves and the compiler undermined that. That isn't reasonable. In exactly the same way that --hlep shouldn't lead to my hard drive getting reformatted.

Again, you seem to fundamentally misunderstand how compilers work in this case. They largely don't "encounter" UB; It's optimization passes are coded with the assumption that UB can't happen. The ability to do that is fundamentally the point of UB. Situations like in the article are not a specific act of the compiler to screw you in particular, but an emergent result.

Additionally, I think you you're also confusing Undefined Behavior with 'behavior of something that is undefined'. These are not the same things.

benj111 · on Nov 28, 2022

>Again, you seem to fundamentally misunderstand how compilers work in this case. They largely don't "encounter" UB; It's optimization passes are coded with the assumption that UB can't happen

Which is as wrong as coding GCC to assume --hlep can't happen.

It will happen and you need to deal with it when it does, and there are reasonable and unreasonable ways of dealing with that.

If you don't understand my --hlep example how about: Int mian () {

What should the compiler do there? Same rules apply should it reformat your hard drive or warn you that it can't find such a function? There are reasonable and unreasonable ways to deal with behaviour that hasn't been defined.

If I put in INT_MAX + 1 it isn't reasonable to reformat my hard drive. The compiler doesn't have carte blanche to do what it likes just because it's UD. It should be doing something reasonable. To me removing an overflow check isn't reasonable.

If you want to have a debate about what is reasonable we can have that debate but if you're going to say UB means anything tlcan happen then I'm just going to ask why it shouldn't reformat your hard drive.

tpush · on Nov 28, 2022

Again, you still don't understand.

> It will happen and you need to deal with it when it does, and there are reasonable and unreasonable ways of dealing with that.

A compiler's handling of UB simply can't work the same way handling flag passing works in GCC. Fundamentally.

With GCC, the example is something like:

  if (strcmp(argv[1], "--help") == 0) { /* do help */ } else { /* handle it not being help, for example 'hlep' or whatever */ }

Here, GCC can precisely control what happens when you pass 'hlep'.

Compilers don't and can't work this way. There is no 'if (is_undefined_behavior(ast)) { /screw the user / }'. UB is a property of an execution, i.e. what happens at runtime, and can't _generally_ be detected at compile time. And you very probably do not want checks for every operation that can result in UB at runtime! (But if you do, that's what UBSan is!).

So, the only way to handle UB is either

1) Leaving the semantics of those situation undefined (== not occuring), and coding the transformation passes (so also opt passes) that way.

or

2) Defining some semantics for those cases.

But 2) is just implementation defined behavior! And that is what you're arguing for here. You want signed integer overflow to be unspecified or implementation defined behavior. That's fine, but a job for the committee.

benj111 · on Nov 28, 2022

I get what's happening.

It's basically dead code removal. X supposedly can't happen so you never need to check for X.

The instance in the article is about checking for an overflow. The author was handling the situation. C handed him the rope, he used the rope sensibly checking for overflow. GCC took the rope and wrapped it around his neck. Fine GCC (and C) can't detect overflow at compile time and doesn't want to get involved in runtime checks. Leave it to the user then. But GCC isn't leaving it to the user it's undermining the user.

Re 2) (are you referring to gccs committee or the c committee?)

I don't mind what it's deemed to be, I expect GCC to do something reasonable with it. Whatever happens a behavior needs to be decided by someone. Some of those behaviours are reasonable some aren't. If you're doing a check for UB, the reasonable thing, to me is to maintain that check.

I could make a choice when I write an app to assume that user input never exceeds 100 bytes. I could document it saying anything could happen, then reasonably (well many people would disagree) leave it there, that is my choice.

If you come along and put 101bytes of input in you would complain if my app then reformatted your hard drive. Wouldn't you also complain if GCC did the same?

There's atleast a post a week complaining about user hostile practices with regard to apps. Why do compiler writers get a free pass? If I put up code assuming user input would be less than 100 bytes documented or not, someone would raise that as an issue so why the double standard.

I'm not even advocating the equavalent of safe user input. I'm advocating that just because you go outside the bounds of what is defined, you do something reasonable.

Liquid_Fire · on Nov 28, 2022

> If you're doing a check for UB, the reasonable thing, to me is to maintain that check.

The problem is that you need to do the check before you cause UB, not after, and here the check appears after. If you do the check before, the compiler will not touch it.

The compiler can't know that this code is part of a UB check (so it should leave it alone), whereas this other code here isn't a UB check but is just computation (so it should assume no UB and optimise it). It just optimises everything, and assumes you don't cause UB anywhere.

Now, I'm not defending this approach, but C works like this for performance and portability reasons. There are modern alternatives that give you most or all of the performance without all these traps.

benj111 · on Nov 28, 2022

>for performance and portability reasons

Is it more performant?

How would you do the check in the article in a more performant way?

Philosophically I'm not sure it's even possible. Sure you could do the check before the overflow but any way you slice it that calculation ultimately applies to something that is going to be UB so the compiler is free to optimise it out? Yes you can make it unrelated enough that the compiler doesn't realise. But really if the compiler can always assume you aren't going to overflow integers, then it should be able to optimise away 'stupid' questions like 'if I add X and y, would that be an overflow?'.

>The compiler can't know that this code is part of a UB check

If it doesn't know what the code is then it shouldn't be deleting it. It has just rearranged code that it knows is UB, it is now faced with a check on that UB. It could (and does) decide that can't possibly happen, because 'UB'. It could instead decide that it is UB and so doesn't know if this check is meaningful or not, and not delete the check, this to me is the original point of UB, C doesn't know whether your machine is 1s complement, 2s complement or 3s complement, it leaves it to the programmer to deal with the situation, if the programmer knows he's working on 2s complement machines that overflow predictably he can work on that assumption, the compiler isn't expected to know, but it should stay out of the way because the programmer does. The performance of c as I understood it is that overflow check is optional, you aren't forced to check. But you are required to ensure that the check is done if needed, or deal with the consequences.

Would you get rid of something you don't understand because you can't see it doing something useful. Or would you keep it because you don't know what you might break when you delete it? GCC in this case is deleting something it doesn't understand. Why is that not a bug?

Liquid_Fire · on Nov 28, 2022

> Sure you could do the check before the overflow but any way you slice it that calculation ultimately applies to something that is going to be UB so the compiler is free to optimise it out?

No, if you never do the calculation it's not going to be UB.

  int8_t x = some_input();
  if (x > 10) return bad_value;
  else x *= 10;

There is no UB here, because we never execute the multiplication in cases where it would have otherwise been UB. The compiler is not free to remove the check, because it can't prove that the value is not > 10.

> It has just rearranged code that it knows is UB

No - that's the problem. The compiler doesn't know that the code is UB, because this depends on the exact values at runtime, which the compiler doesn't know.

In some limited cases it could perform data flow analysis and know for sure that it will be UB, but those cases are very limited. In general there is no way for to know. So there are three things it could do:

A) Warn/error if there could possibly be UB. This would result in warnings in hundreds of thousands of pieces of legitimate code, where there are in fact guarantees about the value but the compiler can't prove or see it. It would require much more verbose code to work around these, or changing the language significantly. For example, you could represent this in the type system, or have annotations.

B) Insert runtime checks for the UB. This would have a significant performance overhead, as there are lots of "innocent" operations in the language that, in the right circumstances, lead to UB. So we would bloat the code with a lot of branches, 99.999% of which will never ever be taken, filling up the instruction cache and branch predictor. You get something more like (the runtime behaviour of) Python or JavaScript. Or even C if you enable UBSan.

C) Assume that the programmer has inserted these checks where they are needed, and omitted them where they are not. You get performance, but in exchange for that you are responsible for avoiding UB. This is what C chooses.

> C doesn't know whether your machine is 1s complement, 2s complement or 3s complement, it leaves it to the programmer to deal with the situation, if the programmer knows he's working on 2s complement machines that overflow predictably he can work on that assumption, the compiler isn't expected to know, but it should stay out of the way because the programmer does

This is mostly right, but with the caveat that you can't invoke UB. If you want to deal with whatever the underlying representation is, cast it to an unsigned type and then do whatever you want with it. The compiler will not mess with your unsigned arithmetic, because it's allowed to wrap around. But for signed types, you are promising to the compiler that you won't cause overflow. In exchange the compiler promises you fast signed arithmetic.

This promise is part of the language, not part of GCC. If you removed that promise, you would have to pay the price in reduced performance.

Could you have a C compiler that inserts these checks? Yes (see UBSan). But you would be throwing away performance - it would be slower than GCC/Clang/MSVC/etc. If you're writing performance-sensitive software, you are better off either ensuring you never trigger UB, or use another language like Rust. If performance is not so important, you are probably better off writing the thing in Go/JavaScript/whatever.

benj111 · on Nov 28, 2022

>No, if you never do the calculation it's not going to be UB.

  int8_t x = some_input();
  if (x > 10) return bad_value;
  else x *= 10

In this simple case yes. But what if you don't know what you're going to multiply by? What if you can't say that X is a bad value?

If you have: Long long x = ?; Long long y = ?;

  If (????); x *= y;

I don't know the answer to this. I've looked online and the answers invoke UB. The best I can think of is a LUT of safe / unsafe combinations, but that isn't faster, and when you're at that point you may as well give up on the MUL hardware in your cpu, I'm not even sure how to safely calculate the LUT, I suppose you could iterate with additions subbing the current total from int_max and checking if that's bigger than the number you're about to add. But that's frankly stupid. And again you are basically checking if something is going to be UB which can't happen the compiler is therefore free to remove the check. Or do you roll your own data type with unsigned ints and a sign bit? But but then what's the point of having signed ints, and what happens to Cs speed. Or is there some bit twiddling you can do?

>No - that's the problem. The compiler doesn't know that the code is UB

Ok I should properly have said, code it can't prove isn't UB.

If it can't say X + y isn't an overflow it shouldn't just assume it can't.

If y is 1 and X is probably 9 it wouldn't be reasonable to assume the sum is 10.

>C) Assume that the programmer has inserted these checks where they are needed, and omitted them where they are not. You get performance, but in exchange for that you are responsible for avoiding UB

You get the performance by avoiding option B. I'm not even sure the programmer is responsible for avoiding UB? UB just doesn't give guarantees about what will happen. You should still be able to invoke it, and I would contend, expect the compiler to do something reasonable.

nayuki · on Nov 28, 2022

It is tedious but possible to check for overflow before multiplying signed integers.

    long long x = (...);
    long long y = (...);
    long long z;
    
    
    // Portable
    bool ok = x == 0 || y == 0;
    if (!ok) {
        long long a = x > 0 ? x : -x;
        long long b = y < 0 ? y : -y;
        if ((x > 0) == (y > 0))
            ok = -LONG_LONG_MAX / a <= b;
        else
            ok = LONG_LONG_MIN / a <= b;
    }
    if (ok)
        z = x * y;
    
    
    // Compiler-specific
    bool ok = !__builtin_smulll_overflow(x, y, &z);

https://gcc.gnu.org/onlinedocs/gcc/Integer-Overflow-Builtins...

benj111 · on Nov 29, 2022

Thanks for that.

Slightly worrying that I didn't come across this or a variation in my searches

radford-neal · on Nov 28, 2022

> It's fine if you think something shouldn't be UB, but you have to go lobbying the C standard for that. Compiler writers aren't to blame here.

I'm glad I don't live in your country, where the C standard has been incorporated into law, making it illegal for compiler writers to do things that are helpful to programmers and end users, but aren't required by the standard.

roblabla · on Nov 28, 2022

> UD is supposed to allow C to be implemented on different architectures

No, that's wrong. Implementation-Defined Behavior is supposed to allow C to be implemented on different architectures. In those cases, the implementation must define the behavior itself, and stick with it. UB, on the other hand, exists for compiler authors to optimize.

If you want to be mad at someone, be mad at the C standard for defining so much stuff as UB instead of implementation-defined behavior. Integer overflow should really be implementation-defined instead.

benj111 · on Nov 28, 2022

>No, that's wrong. Implementation-Defined Behavior is supposed to allow C to be implemented on different architectures.

Is it? We're talking about integer overflow here.

I wasn't in the meetings when writing all the c standards. I'm not convinced this is purely an optimisation thing though.

I would guess the story is more.

Interested party X: "can integer overflow do X?"

Party Y: "no because our processor doesn't work like that.

Party Z: "and it breaks K and R"

Party X: "how about implementation defined?"

Party A: "but our compiler targets 5 different processors"

Party B: "plus that precludes certain optimisations"

astrange · on Nov 28, 2022

Not only to optimize but to write safety tools. If you defined all the behavior, and then someone used some rare behavior like integer overflow by accident, it'd be harder to detect that since you have to assume it was intentional.

flohofwoe · on Nov 28, 2022

> UB, on the other hand, exists for compiler authors to optimize.

Was this really the original reason why there's UB in the C standard, or has this been retconned by 'malicious compiler authors'? ;)

sanxiyn · on Nov 28, 2022

It is the original reason. For example, register allocation is possible because stack smashing is UB.

masklinn · on Nov 28, 2022

UB is also very much based around software incompatibilities though, not just the ability to optimise stuff.

But where IB can have useful definitions to document, UB was defined so because the behaviours were considered sufficiently divergent that allowing them was useless, and so it was much easier to just forbid them all.

fps_doug · on Nov 28, 2022

But then again, UB doesn't mean the compiler author can't treat it as implementation-defined and do something reasonable.

tuyiown · on Nov 28, 2022

You're getting it backward. UB doesn't immediately stop compilation only due to implementation defined backward compatibility, just because you don't want to break compilation of existing programs each time the compiler converges to the C spec and identified an implementation of undefined behavior.

And since you want some cross-compiler compatibility, you also import's third parties implementation defined UB.

This is not some conceptual reasonable decision, the proper way would be to throw out compilation on each UB behavior. The reality is that the proper way would be too harsh on existing codebase, making people use a less strict compiler or not updating version, which are non-desirable effects for compilers writers.

fps_doug · on Nov 28, 2022

I can't really follow. What would be wrong with making -fwrapv the default? i.e. let the compiler assume signed integers are two's complement on according platforms (i.e. virtually everything in use today). Then stop assuming "a + 1 < a" cannot be true for signed ints. How would that make existing code worse, or break it? It's basically what you already get with -O0 afaict, so any such program would be broken with optimizations turned off.

gpderetta · on Nov 28, 2022

There is nothing wrong. Except that a subset of GCC users prefer -ftrapv and another subset wants no overhead, so the status quo remains.

tuyiown · on Nov 28, 2022

I think I misunderstood your comment, sorry, but I have difficulties in understanding how it's different that how things works already, then. You either have to rely that the compiler author did chose what you expect (not the case here), or check by yourself and hope it won't change.

dzaima · on Nov 28, 2022

And sanitizers that throw warnings on undefined behavior do indeed exist.

hoseja · on Nov 28, 2022

>UB, on the other hand, exists for compiler authors to optimize.

s/exists for/has been exploited by/g

The worst part is the optimizations aren't even that significant. (I recall a blog post of somebody testing this but I can't find it rn)

tmtvl · on Nov 28, 2022

It is undefined behaviour if I write GCC --hlep

Well no, it's a compilation error, you need at the very least a semicolon after hlep and from there on it depends on what GCC is. If it's a function you need parentheses around --hlep, if it's a type you need to remove the --, if it's a variable you need to put a semicolon after it,...

Because GCC is all-caps I'm guessing it's a macro, so here's an example of how you could write it (though it won't be UB): https://godbolt.org/z/dYMddrTjj

benj111 · on Nov 28, 2022

I'm not sure if you're supporting my pov by showing the absurdity of the other position???

Yeah sure, if my phone auto incorrects gcc to GCC then that is technically meaningless so you're completely free to interpret my comment how you want.

..... Although..... GCC stands for GNU Compiler Collection so it can be reasonably capitalised, so maybe then, rather than saying anything goes we should do something reasonable because then you aren't left saying something really stupid if you're wrong???

gpderetta · on Nov 28, 2022

Parent point is when the standard talks about UB it refers about translating C code. So parent cheekly interpreted your comment about command line flags (which are outside the remit of the standard) as code instead. I thought it was fitting.