> If, instead, we force compilers to think about the fact that offset + 16 could have some implementation-defined meaning like wrapping, then all bets are off & we have to throw a bunch of optimization opportunities out the window.
Uh huh. If `i` is declared as `unsigned int` instead of `int`, then overflow is defined and the compiler can't apply those optimizations. And yet the world doesn't end and the sun will still rise tomorrow...
The world doesn't end, but in the "int" case you get nice vector code and in the "unsigned int" case you get much less nice scalar code: https://gcc.godbolt.org/z/cje6naYP4
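For readers who don't want to click through, the comparison is roughly along these lines (a sketch; the exact code behind the godbolt link may differ):

```c
void store_signed(int *a, int offset)
{
    /* Signed index arithmetic: `i + offset` overflowing is UB, so GCC
     * may assume the 16 stores are consecutive and vectorize them. */
    for (int i = 0; i < 16; i++)
        a[i + offset] = 1;
}

void store_unsigned(int *a, unsigned offset)
{
    /* Unsigned index arithmetic: `i + offset` wraps modulo 2^32 by
     * definition, so the stores need not be consecutive in the 32-bit
     * index space, and GCC falls back to scalar code. */
    for (unsigned i = 0; i < 16; i++)
        a[i + offset] = 1;
}
```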
Yes, that is true. The proper way for a compiler to handle this would be to add a single overflow check before the loop, which branches to a scalar translation of the loop. Most realistic code will need a scalar version anyway, to deal with the prolog/epilog of the unrolled loop for iteration counts that aren't multiples of the unrolling factor.
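A sketch of that loop-versioning idea, in source terms (illustrative, not what any particular compiler emits): one wraparound test up front picks between a vectorizable fast path and a scalar fallback.

```c
#include <limits.h>

void store_unsigned(int *a, unsigned offset)
{
    if (offset <= UINT_MAX - 15) {
        /* No i in 0..15 can make offset + i wrap, so the 16 stores
         * are consecutive and this loop may be vectorized. */
        for (unsigned i = 0; i < 16; i++)
            a[offset + i] = 1;
    } else {
        /* offset + i wraps for some i: keep the scalar loop, which
         * preserves the defined modulo-2^32 indexing. */
        for (unsigned i = 0; i < 16; i++)
            a[offset + i] = 1;
    }
}
```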
Surely you agree that treating unsigned overflow differently from signed does not make any sense semantically? Why is signed overflow UB but unsigned overflow wrapping, and not the other way around? The terms 'signed' and 'unsigned' denote the value range, not "operations on this type might overflow" versus "operations on this type will never overflow".
To a mathematician, wrapping 2^n back to 0 is a lot more intuitive than wrapping 2^(n-1) to -2^(n-1).
Mathematically the two systems are largely equivalent.
They are equivalent when considering addition and multiplication. Both implement arithmetic modulo 2^n.
However, the canonical representation of this system runs from 0 to 2^n - 1. Hence, if you were going to make one kind of integer overflow defined, and not the other, C made the correct choice.
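In concrete C terms, for typical 32-bit ints (a quick illustration):

```c
#include <limits.h>
#include <stdio.h>

int main(void)
{
    unsigned int u = UINT_MAX;   /* 2^32 - 1 on typical targets */
    printf("%u\n", u + 1);       /* defined: wraps modulo 2^32, prints 0 */

    int s = INT_MAX;             /* 2^31 - 1 */
    (void)s;
    /* s + 1 would be undefined behavior: standard C defines no wrap
     * from 2^31 - 1 to -2^31 for signed int. */
    return 0;
}
```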
That leaves out the question of whether the difference between the two cases is significant enough to justify a difference in how overflow works.
> The proper way for a compiler to handle this would be to add a single overflow check before the loop, which branches to a scalar translation of the loop. Most realistic code will need a scalar version anyway, to deal with the prolog/epilog of the unrolled loop for iteration counts that aren't multiples of the unrolling factor.
That's true, I agree that would be a clever way to handle this particular case. It would still happily invoke undefined behavior if the indices don't match the array's length, of course. Many assumptions about the programmer knowing what they are doing go into the optimization of C code.
> Surely you agree that treating unsigned overflow differently from signed does not make any sense semantically?
Yes. Silently wrapping unsigned overflow is also very often semantically meaningless.
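The standard illustration (not code from this thread): a size check where the subtraction wraps to a huge value instead of going negative, so the check accepts sizes it should reject.

```c
#include <stddef.h>

/* If `used` ever exceeds `cap`, `cap - used` wraps modulo 2^N rather
 * than going negative, and this silently accepts almost any n. */
int can_append(size_t cap, size_t used, size_t n)
{
    size_t remaining = cap - used;   /* wraps if used > cap */
    return n <= remaining;           /* then "remaining" is huge */
}
```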
Yes, with lots of extra ceremony around it. (More than is needed, since it doesn't seem to realize that it will always process exactly 16 loop iterations.)
Since you've posted a lot along the lines of "these optimizations don't even make a difference", you might want to see if Clang's safer-looking version is as fast as GCC's.
It's not an interesting optimization. Micro-benchmarks are of limited utility. The extra complication is to protect the code from flying off and writing on random memory. Well worth it.
> And yet the world doesn't end and the sun will still rise tomorrow...
No, you just get much slower, non-vectorized code because the compiler is forced to forgo an optimization if you use unsigned int as the loop counter (EDIT: tom_mellior's reply illustrates this extremely well: https://gcc.godbolt.org/z/cje6naYP4)
Which is precisely the point: forcing a bunch of existing code with int loop counters, which currently enjoys optimization, to take on the unsigned int semantics and get slower, is just going to piss off a different (and probably larger) set of people than the "compilers shouldn't assume that undefined behaviour can't happen" set of people.
It's a tradeoff with some big downsides; this isn't the obvious win the anti-optimization crowd pretends it is.
And switch the `i` to a `size_t` and you get vector code without the possibility of writing to random memory because your int overflowed while GCC pretends it cannot.
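That variant looks like this (a sketch; the godbolt links show the real codegen). On x86-64, `size_t` is as wide as a pointer, so the index needs no per-iteration sign or zero extension, and a wrap of `offset + i` would already imply out-of-bounds pointer arithmetic, which is UB regardless, so GCC can vectorize.

```c
#include <stddef.h>

void store_sizet(int *a, size_t offset)
{
    /* Pointer-width unsigned index: vectorizes without needing to
     * assume anything extra about integer overflow. */
    for (size_t i = 0; i < 16; i++)
        a[offset + i] = 1;
}
```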
This is a poorly written loop. C's design model is that if it is not critical, we don't care, and if it is critical, the programmer should fix it so optimization can work. https://gcc.godbolt.org/z/ErMP4cn6s
I wanted to show that we don't need UB-justified deletions to get good code generation. There was no need to break all that working code when we could have just told people that size_t counters work better than ints in loops on x86-64. A lot of C optimization could work that way - relying on cooperation between the compiler and programmers. Java can't do that, because Java programmers rely on complex abstractions that need a lot of compiler work to run fast.
> A lot of C optimization could work that way - relying on cooperation between the compiler and programmers.
That is precisely how it works already. The reason your code has no bounds checks is exactly because the compiler can assume that you have done your part and ensured that all indices are in bounds. This is what "the compiler can ignore UB" is all about.
The signed integer kerfuffle is just the same: The compiler assumes your cooperation in ensuring, beforehand, that your signed arithmetic never overflows. Its part of the bargain is generating the best possible code it can. Another part of its bargain is offering you the -fwrapv flag to communicate more about your expectations. A third part of the bargain is offering you sanitizers that can inform you that you have done something you probably didn't want.
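Here is how that bargain plays out for a concrete function (the flags are real GCC/Clang options; the folding is the typical outcome, not a guarantee):

```c
/* default:   signed overflow is UB, so the compiler may fold the
 *            whole function to `return 1;`
 * -fwrapv:   overflow wraps like unsigned, so the test is honest
 *            and must be kept
 * -fsanitize=signed-integer-overflow:
 *            the overflow is reported at run time instead of
 *            silently misbehaving */
int never_overflows(int x)
{
    return x + 1 > x;
}
```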
The problems with that argument are:
1) The expectations changed with no notice. You can say it was always that way, but that's just not correct. The bounds check worked and then it didn't, no matter what you think the standard "always said" (and the UB experts on WG14 often find it impossible to say exactly what its provisions mean, so claims that all this was always clear are also wrong).
2) Deleting the overflow check reduces the power of the language (see the sketch after this list). The supposed workarounds are painful and have edge cases.
3) The example, and others, show that much of the "we assume UB can't happen" style of "optimization" is unnecessary. You make the language more difficult to use and more prone to unpleasant surprises, and in return you provide an "optimization" that could easily be produced by other means. You're insisting on using a hammer as a fork and getting annoyed when people don't find it convenient.
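The kind of check item 2 refers to is the textbook one (not code from this thread): the guard relies on the very overflow it is trying to detect, so a compiler that assumes signed overflow cannot happen may fold the condition to false and delete the branch.

```c
#include <limits.h>

int checked_add(int a, int b, int *out)
{
    if (b > 0 && a + b < a)     /* under UB rules: provably false, may be deleted */
        return 0;               /* intended: report overflow */
    *out = a + b;
    return 1;
}

/* The UB-free rewrite, i.e. one of the "painful workarounds":
 * if (b > 0 ? a > INT_MAX - b : a < INT_MIN - b) return 0; */
```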