
Vendoring, even with a tool, has always worked poorly for me. Here are a few reasons:

1. You work at a company, you're on a team, and you want a reasonable code review process in place. Now you want to check in a 3rd party dependency, "let's vendor it!", so you send out a PR with ... 10,000 - 100,000 lines of code. Your reviewer has no reasonable way to know a) whether the dependency was downloaded from a reputable source, b) whether the code was modified maliciously, or c) whether some local patch or local change was added, voluntarily or accidentally (maybe you tried running configure/make locally and didn't realize that one .h file was generated on your machine). A diligent reviewer would have to re-download the source tarball from a reputable source (is the URL in the commit message? A README? Better hope there is!), unpack it locally, generate the set of files and their hashes, and compare with your PR (sketched in code after this list). They'd also have to ensure that the vendored dependency comes with a README or METADATA file so the download URL and LICENSE are recorded for posterity.

2. Now you need to update the dependency. Either it goes into a new directory (so you vendor both versions), or you have to delete all the files that are gone. The PR review will be worse: it will show a diff, but the reviewer won't be able to meaningfully review it except by repeating the steps in 1. And that's without considering patches applied in the meantime; since the code was simply checked into the repository, anyone could easily have changed it.

3. For anything but small projects, the vendored code will take up most of the download / checkout time of your repository.
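
To make point 1 concrete, here is a minimal sketch of what that diligent reviewer has to do by hand, in Python. The URL is a hypothetical placeholder; in practice you can only hope it was recorded in the commit message or a README.

  # Sketch: re-fetch the upstream tarball and hash every file inside it,
  # so the result can be diffed against what actually landed in the PR.
  # The URL below is a made-up placeholder.
  import hashlib, io, tarfile, urllib.request

  url = "https://example.org/libfoo-1.2.3.tar.gz"   # hypothetical upstream
  data = urllib.request.urlopen(url).read()
  print("tarball sha256:", hashlib.sha256(data).hexdigest())

  with tarfile.open(fileobj=io.BytesIO(data)) as tar:
      for member in tar.getmembers():
          if member.isfile():
              digest = hashlib.sha256(tar.extractfile(member).read()).hexdigest()
              print(digest, member.name)

And that's the easy part; deciding whether any difference is a legitimate local patch still falls on the reviewer.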

If you use git for vendoring, the situation is not significantly better: if you care about the integrity of the vendored code, you still need to verify the final tree, or the log / hash / set of commits.

Compare that to using a simple file with 1) a URL, 2) a secure hash, and 3) a list of patches to apply. Reviewing and ensuring correctness is trivial, upgrading is trivial, PRs are trivial.
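
For illustration, a minimal sketch of that approach in Python. The dependency name, URL, hash, and patch path are all made up; a real tool would also unpack the tarball and apply the listed patches.

  # Sketch of the "url + secure hash + patches" approach. All values are
  # hypothetical; the point is that the review surface is this small dict,
  # not 100,000 vendored lines.
  import hashlib, urllib.request

  DEP = {
      "url": "https://example.org/libfoo-1.2.3.tar.gz",   # hypothetical
      "sha256": "0123abcd...",                            # pinned at review time
      "patches": ["patches/0001-fix-build.patch"],        # applied after unpack
  }

  def fetch(dep):
      data = urllib.request.urlopen(dep["url"]).read()
      if hashlib.sha256(data).hexdigest() != dep["sha256"]:
          raise RuntimeError("hash mismatch: refusing to use download")
      return data

The reviewer only has to check that the URL is reputable, the hash matches upstream, and the patches are sane.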

To avoid problems like the GitHub problem described here, a simple proxy or local cache is enough: a tool that takes the hash (or a hash of a URL), reads the content from disk, and detects corruption.
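
Such a content-addressed cache is only a handful of lines; a rough sketch, with an arbitrary cache directory:

  # Sketch of a hash-keyed local cache: look the artifact up by its sha256,
  # verify it on read so corruption is detected, and fall back to the
  # network only on a miss. Paths and URLs are placeholders.
  import hashlib, pathlib, urllib.request

  CACHE = pathlib.Path("/var/cache/deps")

  def get(url, sha256):
      path = CACHE / sha256
      if path.exists():
          data = path.read_bytes()
          if hashlib.sha256(data).hexdigest() == sha256:
              return data                  # cache hit, verified
          path.unlink()                    # corrupted entry, refetch
      data = urllib.request.urlopen(url).read()
      if hashlib.sha256(data).hexdigest() != sha256:
          raise RuntimeError("hash mismatch for " + url)
      CACHE.mkdir(parents=True, exist_ok=True)
      path.write_bytes(data)
      return data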


1. If you follow my link, you'll see the process involves adding a README documenting git hashes, the licence, and the changes made (I don't see what seems so arduous about it to you). You can check out the git repo by hash and run a diff between the files added to the review and what you checked out (see the sketch after this list). It's also common practice to delete irrelevant files to make the review smaller.

2. Same as in 1.

3. It’s not an issue in my experience. A much bigger issue is large JSON files captured for snapshot testing, or just big binary files. If your repo is so small that its deps make up the majority of it, then it really shouldn’t take all that much time (or you use too many / too big deps, but I doubt you can beat Chromium, which has Skia-sized deps).
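
For what it's worth, the diff in 1 is mechanical. A rough sketch, assuming the dependency's README records the upstream repo and commit, and the vendored files live under a directory like third_party/libfoo (all names below are made-up placeholders):

  # Sketch: check out the upstream repo at the recorded hash and diff it
  # against what was added to the review.
  import subprocess, tempfile

  upstream = "https://example.org/libfoo.git"   # from the dependency's README
  commit = "deadbeef"                           # hash recorded in the README
  vendored = "third_party/libfoo"               # directory added in the PR

  with tempfile.TemporaryDirectory() as tmp:
      subprocess.run(["git", "clone", upstream, tmp], check=True)
      subprocess.run(["git", "-C", tmp, "checkout", commit], check=True)
      # A non-zero exit from diff just means the trees differ; show it.
      subprocess.run(["diff", "-r", "--exclude=.git", tmp, vendored])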

> Compare that to using a simple file with 1) a URL, 2) a secure hash, and 3) a list of patches to apply. Reviewing and ensuring correctness is trivial, upgrading is trivial, PRs are trivial.

Using a URL doesn’t remove the need to review the code of your dependencies. If you don’t, you’re essentially running “curl | sh” at scale. Checksums without code review don’t mean much.

