Hyrum's Law strikes again. It kind of doesn't matter what you document. If you weren't randomizing your checksum previously [1], you can't just spring this on the community and blame it for the fallout. I'm more shocked that there's resistance from the GitHub team saying "but we documented that this isn't stable". The default stance for the team should be to roll back and reevaluate an alternate path forward when the scope is this wide (e.g. only generating the new tarballs for future commits).
[1] Apparently googlesource did do this and just had people shift to using GitHub mirrors to avoid this problem.
But look at it from the other side. Users that don't read your documentation and expect your software to work like they imagined are just a huge pain in the ass.
Fact of life: the vast majority of your users do not read your documentation (or at least not carefully enough that whatever you put in your docs guarantees they'll adhere to it). That's literally what Hyrum's Law is about. Of course, you can choose to do whatever you want. It's valuable to recognize, though, that you're trading away goodwill from your users for whatever technical improvement is being made. Sometimes that's appropriate and inevitable (e.g. the old behavior is simply wrong or harmful and better cut off). In the vast majority of cases, though, it's better to have a process in place that manages the change with minimal disruption: identify the users who would break, communicate with them, and only then make the change.
Look. Even vcpkg broke, and that's a Microsoft product. I agree that there can be a continuum sometimes, but can we agree that this specific instance isn't anything like that? Even without vcpkg, the list of things impacted includes anything that depends on Bazel, Homebrew, Conan, etc. The blast radius is wide regardless of documentation.
Ain't nobody gonna give a shit about you if you aren't bringing in five or six figures as a customer. Nobody is stopping a rewrite that happens to break undocumented stuff you relied on when you're paying $10/mo.
This case is different, as the breakage probably affected GitHub/Microsoft themselves.
You just described >90% of users. Everyone does this for something, most people do it for most things.
You minimally read the docs, get something working, and then leave it alone. Of course you're going to be pissed off when an implicit assumption that has been stable for a long time is broken.
Turns out lots of scripts download an archive from GitHub and check it against a hardcoded checksum copy-pasted into that script. All of those broke. None of the authors will have looked up exactly how GitHub calculated said checksum.
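The pattern looks roughly like this (a sketch; the URL and checksum are placeholders, not any real project):

    import hashlib
    import urllib.request

    # Hypothetical pinned source archive; both values are made up.
    URL = "https://github.com/example/project/archive/refs/tags/v1.2.3.tar.gz"
    EXPECTED_SHA256 = "a" * 64  # placeholder, obviously not a real digest

    def fetch_and_verify(url: str, expected: str) -> bytes:
        data = urllib.request.urlopen(url).read()
        actual = hashlib.sha256(data).hexdigest()
        if actual != expected:
            # This is the failure everyone hit: same tag, same files inside,
            # but different compressed bytes, so the pinned hash stops matching.
            raise RuntimeError(f"checksum mismatch: got {actual}")
        return data

Nothing here hashes the commit; it hashes whatever bytes the server happens to serve for that URL today.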
Worse, you can't expect other people to host your data for free, forever. If you want your data distributed, you need to check first if the platform is suitable for your purposes.
If your product supports some particular behavior, it will be used regardless of what you document.
Microsoft was once renowned for bug-compatibility, precisely so as not to break its users. The new wave of movers and breakers forget that wisdom at their peril.
This has nothing to do with free vs. paid? The question is whether giving someone 99 of the same fish entitles them to expect the 100th one you throw in to be the same kind of fish, whether they paid for it or not.
This. You have to draw the line somewhere. Was this specific choice that line? Maybe not, but sometimes users aren’t right and changes just need to occur to ensure other asks from the same users can be delivered.
This isn't even a case of "we didn't document this".
I know that the Bazel team reached out to GitHub in the past to get confirmation that this behaviour could be relied on, and only after that was confirmed did they set it as a recommendation across their ecosystem.
This is especially true of something like a git SHA, which is drilled into your head as THE stable hash of your code and git tree at a certain state. It should be expected that lots of tools use it as an identifier -- heck, I've done so myself to confirm which version of a piece of software is deployed on a particular machine, etc.
Yes, but not in this bug. I guess lots of people missed the distinction: the stable git SHA is the commit hash, which is a hash over git's internal representation of the commit object (containing a tree of all file hashes, plus the parents' hashes).
The hash that pops out of hashing a 'git archive' download has nothing whatsoever to do with the commit hash and was historically stable more or less by accident: git emits all files into the tar in tree order (which is fixed) and (unless you specify otherwise) always compressed with gzip using the same options. Since they now use an internal call to zlib instead of gzip, the compressed output looks different but still contains the same tar inside.
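You can see why the archive hash moves with a toy sketch (Python's gzip module standing in for whatever GitHub's pipeline actually runs; the payload is fake):

    import gzip, hashlib, io

    # Stand-in for the deterministic tar stream that git archive emits.
    tar_bytes = b"pretend this is the tar produced by git archive " * 20

    def compress(level: int) -> bytes:
        buf = io.BytesIO()
        # Encoder details (compression level, header flags) leak into the
        # output bytes, so "equivalent" gzips need not be identical.
        with gzip.GzipFile(fileobj=buf, mode="wb", mtime=0,
                           compresslevel=level) as f:
            f.write(tar_bytes)
        return buf.getvalue()

    old, new = compress(9), compress(1)
    print(hashlib.sha256(old).digest() == hashlib.sha256(new).digest())  # False
    print(gzip.decompress(old) == gzip.decompress(new))  # True: same tar inside

Same tar in, different compressed bytes out, so any hash taken over the download changes.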
That people have relied on this archive hash being stable is an indication of a major problem imho, because it might mean that people mentally project integrity guarantees from the commit hash (which has such guarantees) onto the archive hash (which doesn't). I would suggest deliberately randomizing the archive output so that people can no longer rely on its hash.
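One cheap way to do that, sketched again with Python's gzip module rather than whatever GitHub actually serves with: vary a harmless header field like mtime, so the compressed bytes (and therefore their hash) differ on every request while the tar inside stays identical.

    import gzip, io, random

    def serve_archive(tar_bytes: bytes) -> bytes:
        buf = io.BytesIO()
        # Randomize the 4-byte gzip mtime header field: the contents are
        # byte-identical after decompression, but the archive hash is
        # different on every download, so nobody can pin it by accident.
        with gzip.GzipFile(fileobj=buf, mode="wb",
                           mtime=random.randrange(2**31)) as f:
            f.write(tar_bytes)
        return buf.getvalue()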
The people that this broke weren't directly depending on the output of git archive being stable, but were assuming that the response data for a particular URL would stay constant. Maybe not a great idea either but not entirely unreasonable IMO.
That people use it at all comes from how releases were traditionally published (independent of any version control system): as tgz/zip archives on some project website or FTP server. Websites and FTP servers were often mirrored to e.g. ISP or university mirrors, because bandwidth was scarce and CDNs were expensive or absent. To make sure that your release download from the university-of-somestrangeplace FTP matched the official release, you would compare the archive hash from the official project website with the hash of the archive you downloaded (bonus points for a GPG signature on the archive).
This then got automated by build/install/package tools, which check the package downloaded from some mirror against the hash in the package description. Then GitHub happened: it replaced the mirror servers, serving autogenerated 'git archive' output instead of static files. And that's where things went wrong here...
It's Microsoft. Just as the Apple of today is not the Apple of ten years ago, the GitHub today is not the GitHub of ten years ago. It's literally different people.
The people who made the things you love have mostly moved on, and the brand is being run by different people with different values now.
There's a little bit of an argument that such things are a bait-and-switch, but such is the nature of a large and multigenerational corporation.
The Microsoft of today isn't the Microsoft of 10 years ago, either, but that doesn't stop anyone from assuming that today's Microsoft is the same as the Microsoft of 10 years ago.
the logic people use to blame Microsoft is intense, man. literally any logical leap is valid except one that absolves Microsoft of anything, no matter how small.
the number of times the Microsoft-haters are just straight-up factually wrong in their justifications for their complaints is way too high for me to trust them ever again in my life.