The checksum just tells you what the hash is, nothing more. Supply chain attacks aren't always against the main executable either. With Authenticode, the "catalog" can be signed. You're even more opposite of OP than I am (OP proposes lockfiles, which are used at runtime).
It shouldn't be for "just" any state of the software. We should be able to verify the SBOM and take action at any point. At build time it is only useful for the developer; I don't get why an SBOM is relevant there at all. I think you mean at deployment time (when someone installs it, they check the SBOM). What I'm saying is: when you fetch the software (download, package manager, app store, curl | sh), when you "install" it, when you run it, and when it is dormant and unused. At all of those times, the SBOM should be checkable. Hashes are useless unless you want people to constantly collect hashes for every executable, including things like software updates.
The problem is, people are looking at it only from their own perspective. People interested in audits and compliance don't care about runtime policy enforcement. People worried about software supply chain compromises care more about immediate auditability of their environment and the ability to take action.
The recent Shai-Hulud node worm is a good example. Even the best sources were telling people to check specific files at specific locations. There was just one post I found on GitHub issues where someone suggested checking the node package cache. Ideally, we would be able to allow-list even JS files based on real-time, SBOM-driven policies. We should be able to easily say "if the software version is published by $developer between dates $start and $end, it is disallowed".
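To make that concrete, here is a minimal Go sketch of such a rule, assuming you can get a publisher and publish date for each component out of the SBOM or the registry. Every type, field name and date below is hypothetical, not taken from any particular SBOM tool:

```go
// Minimal sketch of an SBOM-driven deny rule: "if the version is published by
// $developer between $start and $end, disallow it". All names and values are
// illustrative placeholders.
package main

import (
	"fmt"
	"time"
)

type Component struct {
	Name        string
	Version     string
	Publisher   string
	PublishedAt time.Time
}

type DenyRule struct {
	Publisher string
	Start     time.Time
	End       time.Time
}

// Denied reports whether a component falls inside the rule's publisher/date window.
func (r DenyRule) Denied(c Component) bool {
	return c.Publisher == r.Publisher &&
		!c.PublishedAt.Before(r.Start) &&
		!c.PublishedAt.After(r.End)
}

func main() {
	rule := DenyRule{
		Publisher: "compromised-maintainer", // hypothetical
		Start:     time.Date(2025, time.September, 14, 0, 0, 0, 0, time.UTC),
		End:       time.Date(2025, time.September, 17, 0, 0, 0, 0, time.UTC),
	}
	c := Component{
		Name: "some-pkg", Version: "1.2.3",
		Publisher:   "compromised-maintainer",
		PublishedAt: time.Date(2025, time.September, 15, 12, 0, 0, 0, time.UTC),
	}
	fmt.Println("denied:", rule.Denied(c)) // true -> block install/load
}
```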
They contain, for each dependency, the name, version, (derivable) URL and integrity checksum, plus of course the relationships between the dependencies.
This can all be verified at any point in the lifecycle without running any of the code, provided a network connection and/or the module cache. What's missing?
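To be concrete about what that verification has to work with, here is a minimal Go sketch of the handful of fields involved, assuming npm's v2/v3 package-lock.json layout. It is not a complete schema, just the parts named above:

```go
// Minimal model of the lockfile fields that matter for offline verification,
// assuming npm's v2/v3 package-lock.json layout. Not a complete schema.
package lockfile

type LockFile struct {
	Name     string             `json:"name"`
	Packages map[string]Package `json:"packages"` // keyed by path, e.g. "node_modules/foo"
}

type Package struct {
	Version      string            `json:"version"`
	Resolved     string            `json:"resolved"`     // tarball URL (derivable from name+version)
	Integrity    string            `json:"integrity"`    // SRI string, e.g. "sha512-<base64>"
	Dependencies map[string]string `json:"dependencies"` // relationships within the tree
}
```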
> With Authenticode, the "catalog" can be signed
You could trivially sign any lockfile, though I've never seen it done. I think it could be neat, and it might have a chance to catch on if there were more support for it in tooling. The npm registry does support ECDSA package signatures, but I guess signatures for this use would have to be distributed on other channels, given how much of an antipattern uploading lockfiles to the registry is considered in the npm community, and that's an uphill battle. In the context of SBOMs I guess there's already a slot for it?
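For illustration, a minimal Go sketch of what "trivially signing a lockfile" could look like: a detached Ed25519 signature over the raw lockfile bytes, published alongside it. This is purely hypothetical tooling, not an existing npm or SBOM mechanism:

```go
// Sketch: detached Ed25519 signature over a lockfile, verified before trusting it.
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"fmt"
	"os"
)

func main() {
	// In practice the key pair would be long-lived and the public key published
	// out of band; generating one inline is just for the sketch.
	pub, priv, err := ed25519.GenerateKey(rand.Reader)
	if err != nil {
		panic(err)
	}

	lock, err := os.ReadFile("package-lock.json")
	if err != nil {
		panic(err)
	}

	sig := ed25519.Sign(priv, lock) // would be shipped as e.g. package-lock.json.sig
	fmt.Println("signature valid:", ed25519.Verify(pub, lock, sig))
}
```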
I don't think you've addressed the requirement of having to execute the software, that was my main objection.
Another matter is that most software I know of doesn't even use lock files. Furthermore, there is lots and lots of software that would need to be updated to support your scheme, and updating it just isn't practical. It would have to be relegated to the kind of software that gets regularly updated and whose authors care about this stuff. I mean, we can't even get proper software authors to reliably host a security.txt on their website. It needs to work for "old" software, and authors of "new" software would need to spend time and effort implementing this scheme. How can we get people who won't even sign their executables to sign a lock file and participate in the verification process?
> I don't think you've addressed the requirement of having to execute the software, that was my main objection.
I believe I did:
> This can all be verified at any point in the lifecycle without running any of the code, provided a network connection and/or the module cache.
It does not require a JS runtime[0] - you fetch a tarball and check its integrity. You can extract it and validate the integrity of a module cache or (non-minified) distribution.
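Since the point is that no JS runtime is needed, here is a minimal sketch of that check in Go: given the "resolved" URL and "integrity" value from a lockfile entry, hash the fetched tarball and compare. It assumes the common sha512 form of the SRI string; sha1/sha256 entries would need the corresponding hash:

```go
// Sketch: verify a lockfile's SRI integrity value against a fetched tarball,
// with no JS runtime involved. Assumes the "sha512-<base64>" SRI form.
package main

import (
	"crypto/sha512"
	"encoding/base64"
	"fmt"
	"io"
	"net/http"
	"os"
	"strings"
)

func verify(tarballURL, integrity string) (bool, error) {
	resp, err := http.Get(tarballURL)
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()

	h := sha512.New()
	if _, err := io.Copy(h, resp.Body); err != nil {
		return false, err
	}
	got := "sha512-" + base64.StdEncoding.EncodeToString(h.Sum(nil))
	return strings.TrimSpace(integrity) == got, nil
}

func main() {
	// Pass the "resolved" URL and "integrity" string straight from the lockfile.
	if len(os.Args) != 3 {
		fmt.Fprintln(os.Stderr, "usage: verify <resolved-url> <integrity>")
		os.Exit(1)
	}
	ok, err := verify(os.Args[1], os.Args[2])
	if err != nil {
		panic(err)
	}
	fmt.Println("integrity ok:", ok)
}
```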
> Another matter is that most software I know of doesn't even use lock files.
I don't believe the goal should be to lower the bar until "most software I know" passes. And you don't need all the libraries you depend on to ship lockfiles/SBOMs themselves, as long as you take ownership of it and wrap it up in your own builds and installations, right? Besides, lockfiles are definitely the norm in js/npm land these days from what I see...
[0]: If you have a dependency with a lifecycle script which, at runtime, say, downloads, builds and installs new components into the module tree, then all bets are off. If you are doing SBOMs for anything more than theatrical checkbox compliance, such dependencies (or such usage of them) should already have been yeeted before you got here, and if not, well, I guess you have work to do. If you get to this point, I'd say the process is serving its purpose by forcing you to face these.
I concede on all but the last point. For that, I think you're taking a very language or platform specific perspective. And I think I myself am highly biased by security incidents. To give examples:
1) The Notepad++ compromise is one; lots of people install it and don't even have auto-update
2) There have been lots of state-sponsored attacks in recent years that abuse software specific to a country, for example "HWP" against South Korean users; sometimes this involves code-signing cert theft
3) Things like log4j have traumatized the industry badly; how do I know what software is using log4j, or some other highly depended-upon piece of software in $randomlang? (See the sketch after this list.)
4) It's very important to detect weird/unusual usage of popular software, for example things like node, nginx, docker, or k8s running on Windows 10/11.
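As a sketch of the log4j question from point 3: scan whatever SBOMs you have on disk for a component name. This assumes CycloneDX-style JSON with a top-level "components" array carrying name/version fields; the directory path and the target string are placeholders:

```go
// Sketch: answer "what software here is using log4j?" by scanning a directory
// of SBOM files. Assumes CycloneDX-style JSON; path and target are placeholders.
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

type bom struct {
	Components []struct {
		Name    string `json:"name"`
		Version string `json:"version"`
	} `json:"components"`
}

func main() {
	target := "log4j" // substring match, for illustration
	paths, _ := filepath.Glob("/var/lib/sboms/*.json")
	for _, path := range paths {
		data, err := os.ReadFile(path)
		if err != nil {
			continue
		}
		var b bom
		if err := json.Unmarshal(data, &b); err != nil {
			continue // not a parsable SBOM, skip
		}
		for _, c := range b.Components {
			if strings.Contains(strings.ToLower(c.Name), target) {
				fmt.Printf("%s: %s %s\n", path, c.Name, c.Version)
			}
		}
	}
}
```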
I admit I too am biased, but that's my point: we need a solution that works for the messy world out there today, not an ideal world someday. Getting people to use it is like 90% of the problem; the technical part isn't a blocker. I don't care if it's a lockfile, an XML catalog, YAML, etc... can it get standardized and widely used in practice? Can it solve the problems we're all facing in this area? That's why "most software I know" is a very important requirement.
The problem at the end of the day is malicious actors abusing software, so they sort of set the requirements.
Ah, but there are actually different types of SBOMs that describe the software at different points in its lifecycle. It's a completely different outcome to record the software when looking at its source, at what is being distributed, or at what is being installed, for example.
At some point we realized that we were talking across each other, since everyone was using "SBOM" to describe different contents and use cases.
I haven't had a chance to read that, but do you think it would be impractical to have the different types of SBOMs declared in a standardized format? My impression is that no matter what, authenticity needs to be established, so it will always fall under "cryptographic verification of information about software"; it is the standardization of that which I take issue with.
The digital signing of SBOM artifacts, so that one can verify authorship and authenticity, is something external to the SBOM data, layered on top of it.
If you are asking about a standardized way to check these across all computing environments, I think this is a tall order. There are obviously environments currently where this check is present, and there are environments where it is rigorously enforced: software will not load and execute unless it's signed by a specific key and the signature is valid. But the environments are so diverse that I doubt a single verification process is possible.
Yes, TLS for example uses X.509, as do lots of things: the container format as well as the data structure. I'm saying not just for the SBOM, but for the code-signing cert aspect as well. I wouldn't mind if there were an "SBOM" usage in X.509, and CAs sold SBOM-signing certs or whatever, but the sad fact is, I think some mobile platforms, macOS and Windows are the only places this is used.
We need, for data at rest, what TLS has been for data in motion.
The reason this makes sense, at least for GitHub, is that the only valid reason to run your own Actions runners is compliance. And if you are doing it for compliance, price doesn't really matter. You don't really have a choice.
If you've been running your runners on your own infra for cost reasons, you're not really that interesting to the GitHub business.
GitHub runners are slow. We're using WarpBuild and they are still cheaper per minute, even with all the changes GitHub has made. Then there's the fact that the machines are faster, so we use fewer minutes.
There are multiple competitors in this space. If you are (or were) paying for GitHub runners for any reason, you really shouldn't be.
We also use WarpBuild and are very happy with the performance gain. This changes nothing, except maybe it should signal to WarpBuild to start supporting providers other than GitHub. We are clearly entering the enshittification phase of GitHub.
Maybe if everything you use is public-cloud-deployed.
Self-hosted runners help bridge the gap with on-prem servers, since you can pop a runner VM inside your infra and give it the connectivity/permissions to do deployments.
This announcement pisses me off, because it's not something related to abuse/recouping cost, since they could impose limits on free plans or whatever.
This will definitely influence me to ensure all builds/deployments are fully bash/PowerShell scripted, without GitHub Actions-specific steps. Actions are a bit of a dumpster fire anyway, so maybe I'll just go back to TeamCity, which I used before Actions.
JSF-CPP bans exceptions because you would lose control over the execution of the program. The HFT crowd didn't like them because you'd add 10ns to a function call.
At least that was before we had zero-cost exceptions. These days, I suspect the HFT crowd is back to counting microseconds or milliseconds, as trades are being made smarter, not faster.
Interesting. We use it to mmap a big file that just barely fits in memory; we mostly read randomly (around 1/1000 of the file) and sometimes sparingly write, and it works great. I haven't tested whether it would be faster with Seek/Read/Write, but mmap is very ergonomic for this; it just acts as a slice and is basically invisible, and you can then take the slice of bytes and unsafe-cast it to a slice of something else. Very easy.
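For anyone curious, a minimal Go sketch of that pattern, using golang.org/x/sys/unix for the mapping and unsafe.Slice (Go 1.17+) for the reinterpretation. The record layout and file name are made up:

```go
// Sketch: mmap a file and reinterpret the byte slice as a slice of fixed-size
// records via unsafe. Assumes a non-empty file whose size is a multiple of the
// record size; layout and file name are placeholders.
package main

import (
	"fmt"
	"os"
	"unsafe"

	"golang.org/x/sys/unix"
)

type Record struct {
	Key   uint64
	Value uint64
}

func main() {
	f, err := os.OpenFile("data.bin", os.O_RDWR, 0)
	if err != nil {
		panic(err)
	}
	defer f.Close()

	fi, err := f.Stat()
	if err != nil {
		panic(err)
	}

	// Map the whole file shared and writable, so the occasional write goes back to the file.
	buf, err := unix.Mmap(int(f.Fd()), 0, int(fi.Size()),
		unix.PROT_READ|unix.PROT_WRITE, unix.MAP_SHARED)
	if err != nil {
		panic(err)
	}
	defer unix.Munmap(buf)

	// "Unsafe cast": view the byte slice as a slice of Records.
	recs := unsafe.Slice((*Record)(unsafe.Pointer(&buf[0])),
		len(buf)/int(unsafe.Sizeof(Record{})))

	fmt.Println("records:", len(recs))
	if len(recs) > 0 {
		fmt.Println("first:", recs[0].Key, recs[0].Value) // random reads are just slice indexing
	}
}
```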
We need to support over 10M files in each folder. JSON wouldn't fare well as the lack of indices makes random access problematic. Composing a JSON file with many objects is, at least with the current JSON implementation, not feasible.
CDB is only a transport medium. The data originates in PostgreSQL and, upon request, is stored in CDB and transferred. Writing/freezing to CDB is faster than encoding JSON.
CDB also makes it possible to access the data directly with ranged HTTP requests. It isn't something I've implemented, but having the option to do so is nice.
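Roughly, the write/read flow looks like this in Go, assuming a cdb library with an API along the lines of github.com/colinmarc/cdb (Create/Put/Freeze to build, Get to read). The keys and values are placeholders; in our setup they would come out of the PostgreSQL query:

```go
// Sketch of the "freeze then serve" flow, assuming a cdb library with an API
// along the lines of github.com/colinmarc/cdb. Keys/values are placeholders.
package main

import (
	"fmt"

	"github.com/colinmarc/cdb"
)

func main() {
	// Write side: dump rows into a constant database and freeze it.
	w, err := cdb.Create("metadata.cdb")
	if err != nil {
		panic(err)
	}
	if err := w.Put([]byte("file/0000001"), []byte(`{"size":1234}`)); err != nil {
		panic(err)
	}
	db, err := w.Freeze() // finalizes the hash tables; the file is now immutable
	if err != nil {
		panic(err)
	}
	defer db.Close()

	// Read side: direct keyed lookups, no full-file parse the way JSON would need.
	v, err := db.Get([]byte("file/0000001"))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(v))
}
```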
> CDB is only a transport medium. The data originates in PostgreSQL and, upon request, is stored in CDB and transferred. Writing/freezing to CDB is faster than encoding JSON.
Might have been interesting to actually include this in the article, don't you think? ;-)
The way the article is written made it seem that you used cdb on edge nodes to store metadata, with no information as to what you're storing or accessing, how, or why... This is part of the reason we have these discussions here.
The post is about mmap and my somewhat successful use of it. If I had described my whole stack, it would have been a small thesis and not really interesting.
This is amazingly good feedback. I hadn't thought of that at all. It is so much harder to reason about the Go runtime as opposed to a threaded application.
Source from git, ~30 Go packages, ~150 npm packages, and a three-layered Docker image.