That is not how undefined behavior works in C (or C++). Effects of UB are not te...

dataflow · on Nov 28, 2022

> The moment you enter a compilation unit (assuming no link optimizations) with a state which at some point will run into undefined behavior all bets are off. [...] Yes, UB can "time travel"

Close, but not quite. This is a common misconception in the reverse direction.

Abstractly, what UB can do is perform the inverse of the preceding instructions, effectively making the abstract machine run in reverse. However, this is only equivalent to "time-traveling" back as far as the last side effect (where "side effect" here refers to the operations the standard defines as interacting with the external world, such as I/O and volatile accesses), because only what happened after that point can be optimized away under the as-if rule without altering the externally visible effects of the program.

As a concrete, practical example, this means the following: if you do fflush(stdout); return INT_MAX + 1; the compiler cannot omit the fflush() call merely because the subsequent statement had undefined behavior. That is, the UB cannot time-travel to before the flush. What the program can do is to write garbage to the file afterward, or attempt to overwrite what you wrote in the file to revert it to its previous state, but the fflush() must still occur before anything wild happens. If nobody observes the in-between state, then the end result can look like time-travel, but if the system blocks on fflush() and the user terminates the program while it's blocked, there is no opportunity for UB.
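A minimal sketch of that example, written so the well-defined executions can actually be run. The function name `flush_then_increment` is hypothetical, and the overflow is made input-dependent (only `x == INT_MAX` overflows) rather than unconditional; it illustrates the shape of the code, not what any particular compiler emits.

```c
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper: flushes stdout, then computes x + 1.
 * Signed overflow is undefined, so a call has UB only when x == INT_MAX.
 * In the abstract machine the fflush() is sequenced before the addition. */
int flush_then_increment(int x)
{
    fflush(stdout);  /* library I/O call, sequenced before the addition */
    return x + 1;    /* undefined behavior if and only if x == INT_MAX */
}
```

A call such as `flush_then_increment(41)` is fully defined and returns 42; only the execution with `INT_MAX` is the one being argued about.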

saagarjha · on Nov 28, 2022

The program can logically undo the call to fflush, too. Mainly by not dispatching it at all: UB is a global program attribute, at least currently. (People have made proposals to change this, but I don't think they have gone anywhere.)

dataflow · on Nov 28, 2022

No, it cannot, and UB is not a global program property. The C standard defines valid program executions according to the behaviors of the abstract machine. UB is a property of an execution of the program given some inputs.

saagarjha · on Nov 28, 2022

Yes, sorry for not being precise: UB applies to executions. When I said "global" I meant global over that entire execution, so if your path ends up hitting undefined behavior it can go back and logically undo its entire execution, including parts which it shared with a well-defined execution or where you'd generally expect side effects to be placed.

dataflow · on Nov 28, 2022

No, that logically doesn't make sense. The program cannot know whether it is going through a particular execution ahead of time without actually executing all the side effects along that path first (which in this case would include the fflush()). The very difference between a "program" and a "program execution" is the fact that an execution includes the interactions of the program with the external world (as defined by the standard, all of which I loosely called "inputs" in my previous comment). The interactions basically extend prefixes of the execution through performing the semantics of the program according to the abstract machine and observing the responses from the external world. You don't have an "execution" of the program until the point of UB, until the interactions (aka side effects) up to that point have first occurred (and the responses of the system observed for continuing the execution).

P.S. Have you ever seen a single example of a compiler time-traveling UB through observable behavior like this? I sure haven't. If you have, I'd love to see it, because despite all the crazy ways compilers take advantage of UB, I've never seen C/C++ compilers actually agree with the stance that this would somehow be legal (if it's even logically possible).

rocqua · on Nov 28, 2022

What about the following code:

    if (x > 4) {
        fflush();
        int y = INT_MAX + 1;
    }

Can the compiler not use that to assume that (x > 4) is false because otherwise it triggers undefined behavior? Hence it is allowed to drop the entire branch?

The only real counter-argument I could see is "fflush might terminate the program, hence we need to run the function before we know if UB will be triggered". I suppose once you call a function that the compiler cannot analyze (e.g. system-calls, FFIs) the compiler may not be certain the function doesn't contain an 'exit()' call.

bloak · on Nov 28, 2022

That's right, I think. If you replace the "fflush()" (which should have an argument by the way) with "f()" and declare "void f(void);" then the test and the call appear in the binary. But if you declare "__attribute__((pure)) void f(void);" then the test and the call disappear.
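A rough sketch of the opaque-call variant, for anyone who wants to try it. To keep it self-contained, the opaque function is passed in as a callback instead of being declared as an external `void f(void);`, and the overflow is written as `x + INT_MAX` so it is not a constant expression. The names `branch_with_opaque_call`, `record_call`, and `call_count` are made up for illustration.

```c
#include <limits.h>

static int call_count;  /* illustration only: counts calls to record_call() */

/* Stand-in for an opaque function with a visible effect. */
void record_call(void) { call_count++; }

/* The callback plays the role of "void f(void);": inside this function
 * the compiler does not know what f does or whether it returns. */
int branch_with_opaque_call(int x, void (*f)(void))
{
    int y = 0;
    if (x > 4) {
        f();              /* opaque call: it might never return */
        y = x + INT_MAX;  /* x > 4 here, so this always overflows: UB */
    }
    return y;
}
```

Only calls with `x <= 4` are well-defined here; they skip the branch, so the callback never runs.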

rocqua · on Nov 28, 2022

I played around a bit in godbolt: https://godbolt.org/z/vWPWcGM1P

It seems this is correct, but the compiler very quickly stops treating a call as 'pure'. Even a simple call to 'puts' is already enough for the branch to be kept in the compiled output. Probably because it has side-effects in setting a value for ferror(file) to return.

I wonder if we can find an example of a function that is externally observable to a user, but that is guaranteed to finish. Then specifically I wonder if the compiler can prove that the undefined behavior is guaranteed to happen, so that it elides the branch, proving 'real' time travel. That is, observable time travel.

dataflow · on Nov 28, 2022

> I wonder if we can find an example of a function that is externally observable to a user, but that is guaranteed to finish.

I don't think the standard has such a thing, but if it did, the closest thing would probably be a write to a volatile variable. You'd have to make sure the compiler sees the variable as having a side-effect in the first place (so it would probably need external linkage).
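A minimal sketch of what such a test case might look like, assuming a hypothetical global named `progress_marker` with external linkage. The volatile store is an access the standard counts as observable behavior, and it is guaranteed to complete, unlike an opaque call.

```c
#include <limits.h>

/* Hypothetical marker with external linkage, so the compiler cannot
 * prove that nothing outside this translation unit looks at it. */
volatile int progress_marker;

int mark_then_increment(int x)
{
    progress_marker = 1;  /* volatile store: part of observable behavior */
    return x + 1;         /* undefined behavior only when x == INT_MAX */
}
```

The interesting experiment would be to check whether the store survives in the generated code when the compiler can prove the overflow always happens; the sketch makes no claim about the outcome.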

dataflow · on Nov 28, 2022

> The only real counter-argument I could see is "fflush might terminate the program, hence we need to run the function before we know if UB will be triggered".

The thing to realize is there is no such thing as "UB will be triggered". The only thing that exists is "UB is triggered", combined with the as-if rule, which allows modifications that don't affect what the standard considers observable behavior. Or in other words, the standard defines a program according to its observable behavior. People think it's time-travel because they think of the program in terms of expressions and statements rather than side effects, but if you think of the programs in terms of observable behaviors rather than the lines of code executing, you see that there's no time travel.

vardump · on Nov 28, 2022

A bit of a pointless example, because that "int y..." is going to be pruned away anyway, since the result is not used anywhere.

Hence it won't trigger any undefined behavior.

rocqua · on Nov 28, 2022

The program still contains undefined behavior. It is probably a matter of order of optimization whether the compiler catches the undefined behavior before it elides the useless statement.

But it is certainly 'legal' for the compiler to consider that statement to invoke undefined behavior, and prune any branch that is guaranteed to reach that statement.

rocqua · on Nov 28, 2022

A quote from the C standard:

"However, if any such execution contains an undefined operation, this International Standard places no requirement on the implementation executing that program with that input (not even with regard to operations preceding the first undefined operation)."

Quote found here: https://devblogs.microsoft.com/oldnewthing/20140627-00/?p=63...

It seems that the standard explicitly disagrees with you.

tpush · on Nov 28, 2022

The "[...] executing that program with that input [...]" part maybe could be read as making it specific to a given UB triggering execution; but I'm no language lawyer :).

rocqua · on Nov 28, 2022

True, only executions of a program that exhibit undefined behavior are affected. But the moment it is clear a program will exhibit undefined behavior, the compiler is already allowed to do whatever it wants. So if 20 lines below an important function call you will certainly call a function that will certainly cause undefined behavior, the important function call can already be left out.

dathinab · on Nov 28, 2022

And the compiler could insert additional side effect free computations ahead of time to detect if an execution will hit UB later on.
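Written by hand, such a side-effect-free check for signed addition could look like the sketch below. The name `add_would_overflow` is hypothetical; this only shows that the condition can be computed without itself overflowing, and says nothing about whether a compiler may act on the result before earlier side effects.

```c
#include <limits.h>
#include <stdbool.h>

/* Returns true exactly when a + b would overflow int.
 * Only comparisons and in-range subtractions are used, so the check
 * itself never triggers undefined behavior and has no side effects. */
bool add_would_overflow(int a, int b)
{
    if (b > 0)
        return a > INT_MAX - b;  /* INT_MAX - b cannot overflow when b > 0 */
    if (b < 0)
        return a < INT_MIN - b;  /* INT_MIN - b cannot overflow when b < 0 */
    return false;                /* adding zero never overflows */
}
```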

saagarjha · on Nov 29, 2022

I agree with your sentiment, but the way I square that with what I mentioned is that the compiler can undo side effects. As far as I am aware there is nothing special about fflush in the standard where you can't go back to where the program was before it happened.

(I have never actually seen a compiler act on this, but I maintain that this is just because they're either not willing to optimize on this or unable to do so. But there's a lot of UB that compilers do not exploit, so this isn't particularly concerning to me.)

dataflow · on Nov 29, 2022

Something I should add here in hindsight is that I've been rather sloppy in this discussion with a few details, and perhaps they're worth clarifying. For example, despite me using them interchangeably, "observable behavior" is not the same thing as "side effects", and you really have to refer to the standard and your implementation to see what constitutes observable behavior. For example, fflush() may in fact be elidable if the compiler can prove the file is unbuffered (and it wouldn't even need UB for that). Similarly, if the compiler can prove fflush() has no observable behavior (i.e. it is guaranteed to return without raising signals, terminating the program, etc.) then it may be able to elide the call in the UB case as well. In practice this isn't usually possible to guarantee given fflush() performs an opaque system call, but it may be more possible in a freestanding implementation than in a hosted one.

Ultimately, my point here wasn't about fflush() or even about the specifics of what exactly constitutes observable behavior in the abstract machine. (I do recall writes to volatile variables were among them, but you'd have to check all of them to be sure.) Rather, my basic point was the fact (tautology?) that any interactions with the external world that affect the program's observable behavior must necessarily be allowed to happen before the program can "know" for certain that the execution path will trigger UB, which by definition isn't possible when one of the intervening operations is an opaque call.

nayuki · on Nov 28, 2022

> if you do fflush(stdout); return INT_MAX + 1; the compiler cannot omit the fflush() call merely because the subsequent statement had undefined behavior

False! The expression (INT_MAX + 1) has no side effect (assuming no UB), so according to the rules of the C abstract machine, the compiler is allowed to hoist this calculation above the fflush(). If you run this on a machine that traps on integer overflow (which is allowed behavior), the process could crash before the fflush() is executed. Remember, everyone: With UB, anything can happen.

eru · on Nov 28, 2022

To hammer it home: UB isn't restricted to a variable having a funny value. Your C program is allowed to play Nethack on startup, if the compiler can prove that a few hours into your program, there would be UB.