By checking the hash of the extracted files. The hash of the archive depends on the order in which the files were compressed, the compression settings, some metadata, and so on.
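A minimal sketch of that approach in Python (the function name and layout are mine, not any particular package manager's scheme): walk the extracted tree in a fixed order and hash the relative paths plus file contents, so the digest is independent of how the archive itself was produced.

```python
import hashlib
from pathlib import Path

def tree_hash(root: str) -> str:
    """Content hash of an extracted source tree.

    Visits files in sorted order and feeds each relative path and
    file body into one SHA-256, so the result doesn't change when
    the archive is regenerated with different file ordering,
    compression settings, or metadata.
    """
    h = hashlib.sha256()
    base = Path(root)
    for path in sorted(base.rglob("*")):
        if path.is_file():
            h.update(path.relative_to(base).as_posix().encode())
            h.update(b"\0")  # delimiter between path and contents
            h.update(path.read_bytes())
            h.update(b"\0")
    return h.hexdigest()

print(tree_hash("extracted-src"))  # "extracted-src" is a placeholder
```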
That’s expensive, complicated, exposes a greater attack surface, and requires new tooling to maintain considerably more complex metadata covering the full contents of source archives.
For the entire multi-decade history of open source, the norm has been — for very good reason — that source archives are immutable and will not change.
The solution here isn’t to change the entire open source ecosystem.
> For literally the entire multi-decade history of open source, the norm has been — for very good reason — that source archives are immutable and will not change.
Well, the norm has been that maintainers generated and distributed a source archive, and that that archive was immutable. That workflow still works perfectly fine on GitHub and is not impacted by this change.
The problem is that a bunch of maintainers stopped generating and distributing archives, and instead started relying on GitHub to automatically do that for them.
> That workflow is still perfectly fine with GitHub
It would be perfectly fine if you could prevent GitHub from linking the autogenerated archives on the releases page, or at least distinguish them in a way that makes it clear they are not immutable, maintainer-generated archives.
The problem was people assuming GitHub works like that: saving an archive of every commit, which is obviously silly if you think about it (why store it when you can regenerate it on a whim from any commit you want?).
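That regeneration is cheap because git can produce an archive for any commit straight from the object store; roughly, assuming a local checkout and a placeholder commit hash:

```python
import subprocess

# Produce a zip for an arbitrary commit on demand, which is roughly
# what GitHub does server-side instead of storing one archive per
# commit. "deadbeef" is a placeholder for a real commit hash.
subprocess.run(
    ["git", "archive", "--format=zip", "-o", "src.zip", "deadbeef"],
    check=True,
)
```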
You are speaking about release archives. GitHub's "Download as zip" feature is not the same thing as the multi-decade open source tradition you are talking about.
I always thought zip archives from this feature were generated on the fly, maybe cached, because I wouldn't expect GitHub to store a zip archive for every commit of every repository.
I'm actually surprised that so many important projects were relying on this feature producing stable output, and that the output actually was stable.
Indeed. I remember when Canonical was heavily pushing bzr and others were big fans of Mercurial. Glad my package manager maintainers didn’t waste time writing infrastructure to handle those projects at the repository level. Nobody had to, because providing source tarballs was the norm.
Huh? What I do fully believe is that downloading a source tarball over HTTPS, verifying its checksum, and extracting it will take less time than cloning the repository with Git and then verifying the checksums of all the files, which you said would take 29 seconds plus 0.4s.
My point is that whether you spend 0.08s computing the md5 of the zip (I just measured) or 0.3s computing the hash of the repo does not matter in the slightest if you are managing software repos, since just extracting the source and preparing to build it will be an order of magnitude slower.
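If you want to reproduce the archive-side number, here is a rough sketch (the filename is a placeholder, and the exact figure depends on archive size and disk speed). It times only the checksum step, which the extract-and-build work then dwarfs:

```python
import hashlib
import time

def md5_of_file(path: str) -> str:
    """MD5 of the raw archive bytes, read in chunks to bound memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

start = time.perf_counter()
digest = md5_of_file("project-1.2.3.zip")  # placeholder archive name
print(f"md5 {digest} in {time.perf_counter() - start:.3f}s")
```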