perbu's comments | Hacker News

Software I build will have the following ingredients:

Source from git, ~30 Go packages, ~150 npm packages, and a three-layered Docker image.


If you want to verify the installed package, the package should provide checksums you can verify. AFAIK, the SBOM documents the build, not the install.


The checksum just tells you what the hash is, nothing more. Supply chain attacks aren't always against the main executable either. With authenticode, the "catalog" can be signed. You're even further from the OP's position than I am (the OP proposes lockfiles, which exist at runtime).

It shouldn't be for "just" any one state of the software; we should be able to verify the SBOM and take action at any point. At build time, it is only useful for the developer; I don't get why such an SBOM is relevant at all. I think you mean at deployment time (when someone installs it, they check the SBOM). What I'm saying is: when you fetch the software (download, package manager, app store, curl|sh), when you "install" it, when you run it, and when it is dormant and unused. At all of those times, the SBOM should be checkable. Hashes are useless unless you want people to constantly collect hashes for every executable, including things like software updates.

The problem is, people are looking at it only from their own perspective. People interested in audits and compliance don't care about runtime policy enforcement. People worried about software supply chain compromises care more about immediate auditability of their environment and the ability to take action.

The recent Shai-Hulud node worm is a good example. Even the best sources were telling people to check specific files at specific locations. There was just one post I found on Github issues where someone was suggesting checking the node package cache. Ideally, we would be able to allow-list even JS files based on real-time, SBOM-driven policies. We should be able to easily say "if the software version is published by $developer between dates $start and $end, it is disallowed".
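
To make that concrete, here's a hand-rolled sketch of such a rule in Go (my own illustration, not any existing tool; the names and dates are placeholders), evaluated against metadata you would pull out of an SBOM entry:

  package main

  import (
      "fmt"
      "time"
  )

  // denyRule: "anything published by Publisher between Start and End is disallowed".
  type denyRule struct {
      Publisher  string
      Start, End time.Time
  }

  // component holds the fields we'd extract from an SBOM entry.
  type component struct {
      Name        string
      Publisher   string
      PublishedAt time.Time
  }

  func (r denyRule) blocks(c component) bool {
      return c.Publisher == r.Publisher &&
          !c.PublishedAt.Before(r.Start) &&
          !c.PublishedAt.After(r.End)
  }

  func main() {
      rule := denyRule{
          Publisher: "some-compromised-maintainer", // placeholder
          Start:     time.Date(2025, 9, 14, 0, 0, 0, 0, time.UTC),
          End:       time.Date(2025, 9, 17, 0, 0, 0, 0, time.UTC),
      }
      c := component{
          Name:        "some-package",
          Publisher:   "some-compromised-maintainer",
          PublishedAt: time.Date(2025, 9, 15, 12, 0, 0, 0, time.UTC),
      }
      fmt.Println("blocked:", rule.blocks(c)) // true
  }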


I still don't see how lockfiles can't be SBOMs.

They contain, for each dependency, the name, version, (derivable) URL and integrity checksum, plus of course the relationships between the dependencies.
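
For reference, a package-lock.json entry carries roughly this shape (newer lockfile format; the digest here is a placeholder, not a real hash):

  "node_modules/lodash": {
    "version": "4.17.21",
    "resolved": "https://registry.npmjs.org/lodash/-/lodash-4.17.21.tgz",
    "integrity": "sha512-<base64 digest of the tarball>"
  }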

This can all be verified at any point in the lifecycle without running any of the code, provided a network connection and/or the module cache. What's missing?

> With authenticode, the "catalog" can be signed

You could trivially sign any lockfile, though I've never seen it done. I think it could be neat, and it might have a chance to catch on if there were more support for it in tooling. The NPM registry does support ECDSA package sigs, but I guess signatures for this use would have to be distributed through other channels, given how much of an antipattern uploading lockfiles to the registry is considered in the npm community, and that's an uphill battle. In the context of SBOMs, I guess there's already a slot for it?


I don't think you've addressed the requirement of having to execute the software, that was my main objection.

Another matter is that most software I know of doesn't even use lock files. Furthermore, there is lots and lots of software that would need to be updated to support your scheme, and updating it just isn't practical. It would have to be relegated to the type of software that gets regularly updated and whose authors care about this stuff. I mean, we can't even get proper software authors to reliably host a security.txt on their website. It needs to work for "old" software, and "new" software would need to spend time and effort implementing this scheme. How can we get people who won't even sign their executables to sign a lock file and participate in the verification process?


> I don't think you've addressed the requirement of having to execute the software, that was my main objection.

I believe I did:

> This can all be verified at any point in the lifecycle without running any of the code, provided a network connection and/or the module cache.

It does not require a JS runtime[0] - you fetch a tarball and check its integrity. You can extract it and validate the integrity of a module cache or (non-minified) distribution.
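
As a rough sketch of what I mean (my own toy code, not any particular tool): the lockfile's integrity field is an SRI string like "sha512-<base64>", so checking a downloaded tarball is just hashing and comparing. The path and integrity value below are placeholders.

  package main

  import (
      "crypto/sha512"
      "encoding/base64"
      "fmt"
      "os"
  )

  // verifyTarball compares a file on disk against a lockfile-style
  // SRI integrity string ("sha512-" + base64 of the SHA-512 digest).
  func verifyTarball(path, integrity string) error {
      data, err := os.ReadFile(path)
      if err != nil {
          return err
      }
      sum := sha512.Sum512(data)
      got := "sha512-" + base64.StdEncoding.EncodeToString(sum[:])
      if got != integrity {
          return fmt.Errorf("integrity mismatch for %s: got %s", path, got)
      }
      return nil
  }

  func main() {
      if err := verifyTarball("lodash-4.17.21.tgz", "sha512-..."); err != nil {
          fmt.Fprintln(os.Stderr, err)
          os.Exit(1)
      }
      fmt.Println("integrity OK")
  }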

> Another matter is that most software I know of doesn't even use lock files.

I don't believe the goal should be to lower the bar until "most software I know" passes. And you don't need all the libraries you depend on to ship lockfiles/SBOMs themselves, as long as you take ownership and wrap it all up in your own builds and installations, right? Besides, lockfiles are definitely the norm in js/npm land these days, from what I see...

[0]: If you have a dependency with a lifecycle script which, say, downloads, builds and installs new components into the module tree at runtime, then all bets are off. If you are doing SBOMs for anything more than theatrical checkbox compliance, such dependencies (or such usage of them) should already have been yeeted before you got here, and if not, well, I guess you have work to do. If you get to this point, I'd say the process is serving its purpose by forcing you to face these.


I concede on all but the last point. For that, I think you're taking a very language- or platform-specific perspective. And I think I myself am highly biased by security incidents. To give examples:

1) The Notepad++ compromise is one; lots of people install it and don't even have auto-update.

2) There have been lots of state-sponsored attacks in recent years that abuse software specific to a country, for example "HWP" against South Korean users; sometimes this involves code-signing cert theft.

3) Things like log4j have traumatized the industry badly. How do I know what software is using log4j, or some other highly-depended-upon component in $randomlang?

4) It's very important to detect when someone is running some weird/unusual deployment of popular software, for example things like node, nginx, docker or k8s running on Windows 10/11.

I admit I too am biased, but that's my point: we need a solution that works for the messy world out there today, not an ideal world some day. Getting people to use it is like 90% of the problem; the technical part isn't the blocker. I don't care if it's a lockfile, an XML catalog, YAML, etc. Can it get standardized and widely used in practice? Can it solve the problems we're all facing in this area? That's why "most software I know" is a very important requirement.

The problem at the end of the day is malicious actors abusing software, so they sort of set the requirements.


Ah, but there are actually different types of SBOMs, that describe the software in different parts of its lifecycle. It's a completely different outcome to record the software when looking at its source, at what is being distributed, or at what is being installed, for example.

At some point we realized that we were talking across each other, since everyone was using "SBOM" to describe different contents and use cases.

The consensus was expressed around 3 years ago, and published in https://www.cisa.gov/sites/default/files/2023-04/sbom-types-...


I haven't had a chance to read that, but do you think it would be impractical to have the different types of SBOMs declared in a standardized format? My impression is that no matter what, authenticity needs to be established, so it will always fall under "cryptographic verification of information about software", it is the standardization of that which I have an issue with.


All types of SBOMs can be described in the same standardized format. SPDX 3.0 has a specific property and a set of values this one can take: https://spdx.github.io/spdx-spec/v3.0.1/model/Software/Vocab...

The digital signing of SBOM artifacts, so that one can verify authorship and authenticity, is something external to the SBOM data, on top of them.

If you are asking about a standardized way to check these, across all computing environments, I think this is a tall order. There are obviously environments currently where this check is present, and there are environments where this is rigorously enforced: software will not load and execute unless it's signed by a specific key and the signature is valid. But the environments are so diverse, I doubt a single verification process is possible.


Yes, TLS for example uses X.509, as do lots of things, both as the container format and as the data structure. I'm saying not just for the SBOM, but for the code-signing cert aspect as well. I wouldn't mind if there were an "SBOM" usage in X.509 and CAs sold SBOM signing certs or whatever, but the sad fact is, I think some mobile platforms, macOS and Windows are the only places this is used.

We need for data-at-rest what TLS has been for data-in-motion.


I have the same, and I'm very happy with the UX, but less happy about the key leaving the machine.


The reason this makes sense, at least for Github, is that the only valid reason to run your own action runners is compliance. And if you are doing it for compliance, price doesn't really matter. You don't really have a choice.

If you've been running your runners on your own infra for cost reasons, you're not really that interesting to the Github business.


Github runners are slow. We're using WarpBuild and they are still cheaper per-minute, even with all the changes Github has made. Then there's the fact that the machines are faster, so we are using fewer minutes.

There are multiple competitors in this space. If you are (or were) paying for Github runners for any reason, you really shouldn't be.


Thanks for the WarpBuild love!

Performance is the primary lever for paying less of the $0.002/min self-hosting tax, and we strive to provide the best-performing runners.


We also use WarpBuild and are very happy with the performance gain. This changes nothing, except maybe it should signal to WarpBuild to start supporting providers other than Github. We are clearly entering the enshittification phase of Github.


thanks for the love! we are actively considering supporting other providers.


I needed arm64 workers, because x86 would take ~25 minutes to do a build.


if it's useful, they do actually have arm workers now for linux and mac: https://github.com/actions/runner-images/tree/main?tab=readm...


TIL amd64 is also called x86-64.


They have these now.


Only for public repos though - if you're in an org with private repositories you don't get access to them (yet).


You do, you just have to set them up at the organization level. Windows/Linux/macOS are all available.


Performance and data locality.

You can throw tons of cores and RAM at problems locally, without any licensing costs.

Your data may already be local, so it makes sense to work with it locally.


Maybe if everything you use is public-cloud-deployed.

Self-hosted runners help bridge the gap with on-prem servers, since you can pop a runner VM inside your infra and give it the connectivity/permissions to do deployments.

This announcement pisses me off, because it's not about abuse or recouping costs; if it were, they could just impose limits on free plans or whatever.

This will definitely push me to ensure all builds/deployments are fully bash/powershell scripted, without GH Actions-specific steps. Actions are a bit of a dumpster fire anyway, so maybe I'll just go back to TeamCity, which I used before Actions.


Not just compliance; we run CI against machines that they don't offer, like those with big GPUs.


JSF-C++ bans exceptions because you would lose control over the execution of the program. The HFT crowd didn't like them because you'd add 10ns to a function call.

That was before we had zero-cost exceptions, at least. These days, I suspect the HFT crowd is back to counting microseconds or milliseconds, as trades are being done smarter, not faster.


Last I checked the user guide to the API was 3500 pages.

3500 pages to describe upload and download, basically. That is pretty strange in my book.


Even download and upload get tricky if you consider stuff like serving buckets as static sites, or things like signed upload URLs.

Now with the trivial part off the table, let's consider storage classes, security and ACLs, lifecycle management, events, etc.


You are not a sovereign wealth fund representing a whole country, though.


Random reads are OK. Writes through an mmap are a disaster.


Only if you are doing in-place updates. If append-only datastores are your jam, writes via mmap are Just Fine:

  $ go test -v
  === RUN   TestChunkOps
      chunk_test.go:26: Checking basic persistence and Store expansion.
      chunk_test.go:74: Checking close and reopen read-only
      chunk_test.go:106: Checking that readonly blocks write ops
      chunk_test.go:116: Checking Clear
      chunk_test.go:175: Checking interrupted write
  --- PASS: TestChunkOps (0.06s)
  === RUN   TestEncWriteSpeed
      chunk_test.go:246: Wrote 1443 MB/s
      chunk_test.go:264: Read 5525.418751 MB/s
  --- PASS: TestEncWriteSpeed (1.42s)
  === RUN   TestPlaintextWriteSpeed
      chunk_test.go:301: Wrote 1693 MB/s
      chunk_test.go:319: Read 10528.744206 MB/s
  --- PASS: TestPlaintextWriteSpeed (1.36s)
  PASS
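
The pattern itself is not much more than this (a rough sketch, Linux/macOS only, with illustrative names and sizes; the real store tracks the write offset persistently and grows the file in chunks):

  package main

  import (
      "fmt"
      "os"
      "syscall"
  )

  func main() {
      const size = 1 << 20 // pre-allocated chunk size, illustrative

      f, err := os.OpenFile("chunk.dat", os.O_RDWR|os.O_CREATE, 0o644)
      if err != nil {
          panic(err)
      }
      defer f.Close()
      if err := f.Truncate(size); err != nil {
          panic(err)
      }

      // Map the whole chunk read-write, shared with the page cache.
      m, err := syscall.Mmap(int(f.Fd()), 0, size,
          syscall.PROT_READ|syscall.PROT_WRITE, syscall.MAP_SHARED)
      if err != nil {
          panic(err)
      }
      defer syscall.Munmap(m)

      // Append-only: only ever copy new records past the current end.
      off := 0
      rec := []byte("record 1\n")
      copy(m[off:], rec)
      off += len(rec)

      fmt.Println("appended", off, "bytes via mmap")
  }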


Interesting. We use it to mmap a big file that only just fits in memory; we mostly read randomly (around 1/1000 of the file) and sometimes sparingly write, and it works great. I haven't tested how much faster it would be with Seek/Read/Write, but mmap is very ergonomic for this: it just acts as a slice and is basically invisible, and you can then take the slice of bytes and unsafe-cast it to a slice of something else. Very easy.
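
The cast is a one-liner these days (Go 1.17+; this assumes the byte slice is suitably aligned and the record type has a fixed, padding-free layout; the "rec" type here is made up):

  package main

  import (
      "fmt"
      "unsafe"
  )

  type rec struct {
      Key, Val uint64 // illustrative fixed-size record
  }

  func main() {
      b := make([]byte, 64) // stand-in for the mmap'd slice
      recs := unsafe.Slice((*rec)(unsafe.Pointer(&b[0])),
          len(b)/int(unsafe.Sizeof(rec{})))
      fmt.Println(len(recs), "records") // 4
  }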


We need to support over 10M files in each folder. JSON wouldn't fare well as the lack of indices makes random access problematic. Composing a JSON file with many objects is, at least with the current JSON implementation, not feasible.

CDB is only a transport medium. The data originates in PostgreSQL and is, upon request, frozen into CDB and transferred. Writing/freezing to CDB is faster than encoding JSON.

CDB also makes it possible to access it directly, with ranged HTTP requests. It isn't something I've implemented, but having the option to do so is nice.
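
A sketch of that ranged-read idea (again, not implemented; the URL is a placeholder): a CDB file starts with a fixed 2048-byte table of 256 (offset, length) pairs, so a client can pull just that header with a Range request and then fetch the relevant hash-table slot and record with further ranged requests.

  package main

  import (
      "fmt"
      "io"
      "net/http"
  )

  func main() {
      req, err := http.NewRequest(http.MethodGet, "https://example.com/data.cdb", nil)
      if err != nil {
          panic(err)
      }
      req.Header.Set("Range", "bytes=0-2047") // the 256-slot pointer table

      resp, err := http.DefaultClient.Do(req)
      if err != nil {
          panic(err)
      }
      defer resp.Body.Close()
      if resp.StatusCode != http.StatusPartialContent {
          panic("server did not honour the Range header")
      }

      header, err := io.ReadAll(resp.Body)
      if err != nil {
          panic(err)
      }
      fmt.Println("fetched", len(header), "bytes of CDB header")
  }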


> CDB is only a transport medium. The data originates in PostgreSQL and is, upon request, frozen into CDB and transferred. Writing/freezing to CDB is faster than encoding JSON.

Might have been interesting to actually include this in the article, do you not think so? ;-)

The way the article is written made it seem that you used cdb on edge nodes to store metadata, with no information as to what you're storing or accessing, how, or why... This is part of the reason we have these discussions here.


The post is about mmap and my somewhat successful use of it. If I'd described my whole stack, it would have been a small thesis and not really interesting.


This is amazingly good feedback. I hadn't thought of that at all. It is so much harder to reason about the Go runtime as opposed to a threaded application.

