> If, instead, we force compilers to think about the fact that offset + 16 could have some implementation-defined meaning like wrapping, then all bets are off & we have to throw a bunch of optimization opportunities out the window.
Uh huh. If `i` is declared as `unsigned int` instead of `int`, then overflow is defined and the compiler can't apply those optimizations. And yet the world doesn't end and the sun will still rise tomorrow...
The world doesn't end, but in the "int" case you get nice vector code and in the "unsigned int" case you get much less nice scalar code: https://gcc.godbolt.org/z/cje6naYP4
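For readers who don't want to click through, the comparison is roughly along these lines (a sketch; the exact code behind the godbolt link may differ):

```c
void store_signed(int *a, int offset)
{
    /* Signed index arithmetic: `i + offset` overflowing is UB, so GCC
     * may assume the 16 stores are consecutive and vectorize them. */
    for (int i = 0; i < 16; i++)
        a[i + offset] = 1;
}

void store_unsigned(int *a, unsigned offset)
{
    /* Unsigned index arithmetic: `i + offset` wraps modulo 2^32 by
     * definition, so the stores need not be consecutive in the 32-bit
     * index space, and GCC falls back to scalar code. */
    for (unsigned i = 0; i < 16; i++)
        a[i + offset] = 1;
}
```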
Yes, that is true. The proper way for a compiler to handle this would be to add a single overflow check before the loop, which branches to a scalar translation of the loop. Most realistic code will need a scalar version anyway, to deal with the prolog/epilog of the unrolled loop for iteration counts that aren't multiples of the unrolling factor.
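A sketch of that loop-versioning idea, in source terms (illustrative, not what any particular compiler emits): one wraparound test up front picks between a vectorizable fast path and a scalar fallback.

```c
#include <limits.h>

void store_unsigned(int *a, unsigned offset)
{
    if (offset <= UINT_MAX - 15) {
        /* No i in 0..15 can make offset + i wrap, so the 16 stores
         * are consecutive and this loop may be vectorized. */
        for (unsigned i = 0; i < 16; i++)
            a[offset + i] = 1;
    } else {
        /* offset + i wraps for some i: keep the scalar loop, which
         * preserves the defined modulo-2^32 indexing. */
        for (unsigned i = 0; i < 16; i++)
            a[offset + i] = 1;
    }
}
```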
Surely you agree that treating unsigned overflow differently from signed does not make any sense semantically? Why is signed overflow UB but unsigned overflow wrapping, and not the other way around? The terms 'signed' and 'unsigned' denote the value range, not "operations on this type might overflow" versus "operations on this type will never overflow".
To a mathematician, wrapping 2^n back to 0 is a lot more intuitive than wrapping 2^(n-1) to -2^(n-1).
Mathematically the two systems are largely equivalent.
They are equivalent when considering addition and multiplication. Both implement arithmetic modulo 2^n.
However, the canonical representation of this system runs from 0 to 2^n - 1. Hence, if you were going to make one kind of integer overflow defined, and not the other, C made the correct choice.
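In concrete C terms, for typical 32-bit ints (a quick illustration):

```c
#include <limits.h>
#include <stdio.h>

int main(void)
{
    unsigned int u = UINT_MAX;   /* 2^32 - 1 on typical targets */
    printf("%u\n", u + 1);       /* defined: wraps modulo 2^32, prints 0 */

    int s = INT_MAX;             /* 2^31 - 1 */
    (void)s;
    /* s + 1 would be undefined behavior: standard C defines no wrap
     * from 2^31 - 1 to -2^31 for signed int. */
    return 0;
}
```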
That leaves out the question of whether the difference between the two cases is significant enough to justify a difference in how overflow works.
> The proper way for a compiler to handle this would be to add a single overflow check before the loop, which branches to a scalar translation of the loop. Most realistic code will need a scalar version anyway, to deal with the prolog/epilog of the unrolled loop for iteration counts that aren't multiples of the unrolling factor.
That's true, I agree that would be a clever way to handle this particular case. It would still happily invoke undefined behavior if the indices don't match the array's length, of course. Many assumptions about the programmer knowing what they are doing go into the optimization of C code.
> Surely you agree that treating unsigned overflow differently from signed does not make any sense semantically?
Yes. Silently wrapping unsigned overflow is also very often semantically meaningless.
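The standard illustration (not code from this thread): a size check where the subtraction wraps to a huge value instead of going negative, so the check accepts sizes it should reject.

```c
#include <stddef.h>

/* If `used` ever exceeds `cap`, `cap - used` wraps modulo 2^N rather
 * than going negative, and this silently accepts almost any n. */
int can_append(size_t cap, size_t used, size_t n)
{
    size_t remaining = cap - used;   /* wraps if used > cap */
    return n <= remaining;           /* then "remaining" is huge */
}
```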
Yes, with lots of extra ceremony around it. (More than is needed, since it doesn't seem to realize that it will always process exactly 16 loop iterations.)
Since you've posted a lot along the lines of "these optimizations don't even make a difference", you might want to see if Clang's safer-looking version is as fast as GCC's.
It's not an interesting optimization. Micro-benchmarks are of limited utility. The extra complication is to protect the code from flying off and writing on random memory. Well worth it.
> And yet the world doesn't end and the sun will still rise tomorrow...
No, you just get much slower, non-vectorized code because the compiler is forced to forgo an optimization if you use unsigned int as the loop counter (EDIT: tom_mellior's reply illustrates this extremely well: https://gcc.godbolt.org/z/cje6naYP4)
Which is precisely the point: forcing a bunch of existing code with int loop counters, which currently enjoys optimization, to take on the unsigned int semantics and get slower, is just going to piss off a different (and probably larger) set of people than the "compilers shouldn't assume that undefined behaviour can't happen" set of people.
It's a tradeoff with some big downsides; this isn't the obvious win the anti-optimization crowd pretends it is.
And switch the `i` to a `size_t` and you get vector code without the possibility of writing to random memory because your int overflowed while GCC pretends it cannot.
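That variant looks like this (a sketch; the godbolt links show the real codegen). On x86-64, `size_t` is as wide as a pointer, so the index needs no per-iteration sign or zero extension, and a wrap of `offset + i` would already imply out-of-bounds pointer arithmetic, which is UB regardless, so GCC can vectorize.

```c
#include <stddef.h>

void store_sizet(int *a, size_t offset)
{
    /* Pointer-width unsigned index: vectorizes without needing to
     * assume anything extra about integer overflow. */
    for (size_t i = 0; i < 16; i++)
        a[offset + i] = 1;
}
```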
This is a poorly written loop. C's design model is that if it is not critical, we don't care, and if it is critical, the programmer should fix it so optimization can work. https://gcc.godbolt.org/z/ErMP4cn6s
I wanted to show that we don't need UB-justified deletions to get good code generation. There was no need to break all that working code when we could have just told people that size_t counters work better than ints in loops on x86-64. A lot of C optimization could work that way - relying on cooperation between the compiler and programmers. Java can't do that, because Java programmers rely on complex abstractions that need a lot of compiler work to run fast.
> A lot of C optimization could work that way - relying on cooperation between the compiler and programmers.
That is precisely how it works already. The reason your code has no bounds checks is exactly because the compiler can assume that you have done your part and ensured that all indices are in bounds. This is what "the compiler can ignore UB" is all about.
The signed integer kerfuffle is just the same: The compiler assumes your cooperation in ensuring, beforehand, that your signed arithmetic never overflows. Its part of the bargain is generating the best possible code it can. Another part of its bargain is offering you the -fwrapv flag to communicate more about your expectations. A third part of the bargain is offering you sanitizers that can inform you that you have done something you probably didn't want.
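Here is how that bargain plays out for a concrete function (the flags are real GCC/Clang options; the folding is the typical outcome, not a guarantee):

```c
/* default:   signed overflow is UB, so the compiler may fold the
 *            whole function to `return 1;`
 * -fwrapv:   overflow wraps like unsigned, so the test is honest
 *            and must be kept
 * -fsanitize=signed-integer-overflow:
 *            the overflow is reported at run time instead of
 *            silently misbehaving */
int never_overflows(int x)
{
    return x + 1 > x;
}
```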
The problems with that argument are:
1) The expectations changed with no notice. You can say it was always that way, but that's just not correct. The bounds check worked and then it didn't, no matter what you think the standard "always said" (and the UB experts on WG14 often find it impossible to say exactly what its provisions mean, so claims that all this was always clear are also wrong).
2) Deleting the overflow check reduces the power of the language (see the sketch after this list). The supposed workarounds are painful and have edge cases.
3) The example, and others, show that much of the "we assume UB can't happen" style of "optimization" is unnecessary. You make the language more difficult to use and more prone to unpleasant surprises, and in return you provide an "optimization" that could easily be produced by other means. You're insisting on using a hammer as a fork and getting annoyed when people don't find it convenient.
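The kind of check item 2 refers to is the textbook one (not code from this thread): the guard relies on the very overflow it is trying to detect, so a compiler that assumes signed overflow cannot happen may fold the condition to false and delete the branch.

```c
#include <limits.h>

int checked_add(int a, int b, int *out)
{
    if (b > 0 && a + b < a)     /* under UB rules: provably false, may be deleted */
        return 0;               /* intended: report overflow */
    *out = a + b;
    return 1;
}

/* The UB-free rewrite, i.e. one of the "painful workarounds":
 * if (b > 0 ? a > INT_MAX - b : a < INT_MIN - b) return 0; */
```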