K8s absolutely reduced labor. I used to have a sysadmin who kept all our AMIs up to date, and who maintained a mountain of bespoke bash scripts to handle startup, teardown, and upgrade of our backends.
Enter K8s in 2017 and life became MUCH easier. I literally have clusters that have been running since then, with the underlying nodes patched and replaced automatically by the cloud vendor. Deployments also "JustWork": no downtime, nearly instant. How many sysadmins are needed (on my side) to achieve all of this? Zero. Maybe you're thinking of more complex stateful cases like running DBs on K8s, but for the typical app server workload, it's a major win.
Fair point, but I think you’ve actually illustrated my argument perfectly: you didn’t eliminate the need for specialists, you outsourced them to your cloud vendor.
Those underlying nodes being “patched and replaced automatically” by AWS/GCP/Azure? That’s their SRE teams doing exactly the work your sysadmin used to do, just at massive scale. The control plane managing your deployments? Cloud vendor specialists built and maintain that.
And I’d wager you’ve still got people on staff doing operational work, they just don’t have “sysadmin” in their title anymore. Someone’s managing your K8s manifests, debugging why pods won’t schedule, fixing networking issues when services can’t communicate, handling secrets management, setting up monitoring and alerting. That work didn’t vanish, it just got rebranded. The “DevOps engineer” or “platform engineer” or “SRE” doing that is performing sysadmin work under a different job title.
Managed K8s can absolutely reduce operational overhead compared to hand-rolling everything. But that’s not democratisation, that’s a combination of outsourcing and rebranding. The expertise is still required, you’ve just shifted who pays for it and what you call the people doing it.
That one stumped me. Why not just encrypt with a hardcoded public key? Then only the attacker can get the creds.
The simple B64 encoding didn't hide these creds from anyone, so every vendor's security team out there (think the big clouds, GitHub, etc.) can collect them and disable them.
If you did a simple encryption pass, no one but you would know what was stolen, or could abuse/sell it. My best guess is that calling node encryption libs might trigger code scanners, or EDRs, or maybe they just didn't care.
They surely seemed smart enough to have chosen encryption over encoding.
Hard to believe encryption would be the one thing that would trigger code scanners.
Also it’s not just every vendor, also every bad actor could’ve scraped the keys. I wonder if they’ve set up the infrastructure to handle all these thousands of keys…
Like what do you even do with most of it on scale?
Can you turn cloud, AWS, and AI API keys into money on a black market?
ProTip: use PNPM, not NPM.
PNPM 10.x shut down a lot of these attack vectors:
1. Does not default to running post-install scripts (must manually approve each)
2. Lets you set a minimum age for new releases before `pnpm install` will pull them in - e.g. 4 days - so publishers have time to clean up.
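For reference, both protections are plain config; a sketch assuming pnpm ≥ 10.16 (the key names below are from pnpm's settings docs - double-check against your version):

```yaml
# pnpm-workspace.yaml
# 1. Only run build/post-install scripts for explicitly approved packages
onlyBuiltDependencies:
  - esbuild
# 2. Ignore versions published less than 4 days ago (value is in minutes)
minimumReleaseAge: 5760
```

The interactive `pnpm approve-builds` command can manage the allowlist for you.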
NPM is too insecure for production CLI usage.
And of course make a very limited-scope publisher key, bind it to specific packages (e.g. workflow A can only publish pkg A), and IP-bind it to your self-hosted CI/CD runners. No one should have publish keys on their local machine, and even if they got the publish keys, they couldn't publish from local.
(Granted, GHA fans can use OIDC Trusted Publishers as well, but tokens done well are just as secure)
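For anyone going the OIDC route, the moving parts are small; a sketch of a GitHub Actions job using npm trusted publishing (this assumes the package is already registered as a trusted publisher on npmjs.com and that a recent npm CLI is in use):

```yaml
# .github/workflows/publish.yml (sketch)
name: publish
on:
  release:
    types: [published]
permissions:
  id-token: write   # lets the job mint the OIDC token npm exchanges for publish rights
  contents: read
jobs:
  publish:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22
          registry-url: https://registry.npmjs.org
      - run: npm ci
      - run: npm publish   # no NODE_AUTH_TOKEN needed once trusted publishing is configured
```

The key property is that there is no long-lived secret anywhere to steal; the token exists only for the duration of that one workflow run.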
Npm is what happens when you let tech debt stack up for too many years. It took them five attempts to get lock files to actually behave the way lock files are supposed to behave (lockfile version 3, plus at least 2 unversioned attempts before that).
It’s clear from the structure and commit history they’ve been working their asses off to make it better, but when you’re standing at the bottom of a well of suck it takes that much work just to see daylight.
The last time I chimed in on this I hypothesized that there must have been a change in management on the npm team but someone countered that several of the maintainers were the originals. So I’m not sure what sort of Come to Jesus they had to realize their giant pile of sins needed some redemption but they’re trying. There’s just too much stupid there to make it easy.
I’m pretty sure it still cannot detect premature EOF during a file transfer. It keeps the incomplete file in the cache, where the sha hash fails until you wipe your entire cache. Which means people with shit internet connections and large projects basically waste hours several times a week on updates that fail.
> I’m not sure what sort of Come to Jesus they had to realize their giant pile of sins needed some redemption but they’re trying.
If they were trying, they'd stop doubling down on sunk costs and instead publicly concede that lock files and how npm-the-tool uses them to attempt to ensure the integrity of packages fetched from npm-the-registry is just a poor substitute for content-based addressing that ye olde DVCS would otherwise be doing when told to fetch designated shared objects from the code repo—to be accompanied by a formal deprecation of npm-install for use in build pipelines, i.e. all the associated user guides and documentation and everything else pushing it as best practice.
npm-install has exactly one good use case: probing the registry to look up a package by name to be fetched by the author (not collaborators or people downstream who are repackaging e.g. for a particular distribution) at the time of development (i.e. neither run time nor build time but at the time that author is introducing the dependency into their codebase). Every aspect of version control should otherwise be left up to the underlying SCM/VCS.
> cannot detect premature EOF during the file transfer. It keeps the incomplete file in the cache where the sha hash fails until you wipe your entire cache.
I wonder what circumstances led to saying “this is okay we’ll ship it like that”
I think we can blame the I/O streaming API in NodeJS for this. It’s callback-based, and you just know you got another block. My guess is chunked mode plus not checking whether the bytes expected and the bytes received matched.
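The missing check is cheap. A hedged sketch (not npm's actual code) of validating the received length before admitting anything to the cache:

```python
def finish_download(chunks: list[bytes], expected_len: int) -> bytes:
    """Join streamed chunks and refuse to accept a short read.

    An "end of data" callback alone can't distinguish a clean EOF from a
    dropped connection; comparing byte counts against Content-Length can.
    """
    body = b"".join(chunks)
    if len(body) != expected_len:
        raise IOError(
            f"premature EOF: got {len(body)} of {expected_len} bytes"
        )
    return body
```

With a check like this, a truncated transfer fails loudly at download time instead of poisoning the cache with a file whose hash will never verify.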
Not to diminish the facepalm but the blame can be at least partially shared.
Our UI lead was getting the worst of this during Covid. I set up an nginx forward proxy mostly for him to tone this down a notch (fixed a separate issue but helped here a bit as well) so he could get work done on his shitty ISP.
True, and things that manifest only on old/slow hardware or on bad internet are the worst kind for this, since 100% of developers who have any say in the matter would never accept such circumstances at all, so they’re always approaching every issue with multi-gigabit speeds, zero latency, and this year’s $3,000 Mac. “What do you mean the page loads slowly?”
A customer of mine has the implementation of several API endpoints starting with a simulate_slow_connection method that basically sleeps for a random amount of ms. I think it sleeps 0 ms when running tests and definitely sleeps 0 ms in production. So it's never super fast even on $3,000 Macs.
But this stuff is basically solved. We have enough history with languages and distribution of packages, repositories, Linux, public trust, signing, maintainers, etc.
One key shift: there is no packager anymore. It's just - trust the publisher.
Any language as big as Node should hire a handful of old unix wizards to teach them the way the truth and the life.
Likely they wouldn’t listen. Modern languages and environments seem intent on reinventing bad solutions to solved problems. I get it if it’s a bunch of kids that have never seen anything better but there is no excuse these days not to have at least a passing knowledge of older systems if you’ve been around a while.
There's certainly a piece of that. Also, most seasoned people are not very interested in new languages and environments, and most languages are not 'spec built' by experts (like Rob Pike with Go) who explicitly set out to solve known problems; they are grown and born more naturally.
> And of course make a very limited scope publisher key, bind it to specific packages (e.g. workflow A can only publish pkg A), and IP bound it to your self hosted CI/CD runners. No one should have publish keys on their local, and even if they got the publish keys, they couldn't publish from local.
I've by now grown to like Hashicorp Vaults/OpenBao's dynamic secret management for this. It's a bit complicated to understand and get to work at first, but it's powerful:
You mirror/model the lifetime of a secret user as a lease. For example, a nomad allocation/kubernetes pod gets a lease when it is started and the lease gets revoked immediately after it is stopped. We're kinda discussing if we could have this in CI as well - create a lease for a build, destroy the lease once the build is over. This also supports ttl, ttl-refreshes and enforced max-ttls for leases.
With that in place, you can tie dynamically issued secrets to this lease and the secrets are revoked as soon as the lease is terminated or expires. This has confused developers with questionable practices a lot. You can print database credentials in your production job, run that into a local database client, but as soon as you deploy a new version, those secrets are deleted. It also gives you automated, forced database credential rotation for free through the max_ttl, including a full audit log of all credential accesses and refreshes.
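As a toy illustration of the lease idea (this is a model of the concept, not Vault's or OpenBao's API): secrets live and die with their lease.

```python
import time

class LeaseStore:
    """Toy model: a secret is only readable while its lease is live."""

    def __init__(self):
        self._leases = {}  # lease_id -> (secret, expiry in monotonic seconds)

    def issue(self, lease_id: str, secret: str, ttl_s: float) -> None:
        """Issue a secret tied to a lease with a max TTL."""
        self._leases[lease_id] = (secret, time.monotonic() + ttl_s)

    def revoke(self, lease_id: str) -> None:
        """E.g. called when the pod / nomad allocation / CI build stops."""
        self._leases.pop(lease_id, None)

    def read(self, lease_id: str):
        """Return the secret, or None if the lease was revoked or expired."""
        entry = self._leases.get(lease_id)
        if entry is None or time.monotonic() > entry[1]:
            self._leases.pop(lease_id, None)  # max_ttl-style forced expiry
            return None
        return entry[0]
```

The "confused developer" scenario above falls out naturally: a credential copied out of a running job keeps working only until the allocation's lease is revoked or its max TTL lapses.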
I know that would be a lot of infrastructure for a FOSS project by Bob from Novi Zagreb. But with some plugin-work, for a company, it should be possible to hide long-term access credentials in Vault and supply CI builds with dropped, enforced, short-lived tokens only.
As much as I hate running after these attacks, they are spurring interesting security discussions at work, which can create actual security -- not just checkbox-theatre.
I would love to use this (for homelab stuff currently) but I would love a way to have vault/openbao be fully configuration-as-code and version controlled, and only have the actual secret values (those that would not be dynamic) in persistent storage.
Or just 'npm ci' so you install exactly what's in your package-lock.json instead of the latest version bumps of those packages. This "automatic updating" is a big factor in why these attacks are working in the first place. Make package updating deliberate instead of instant or on an arbitrary lag.
> Does not default to running post-install scripts (must manually approve each)
To get equivalent protection, use `--only-binary=:all:` when running `pip install` (or `uv pip install`). This prevents installing source distributions entirely, using exclusively pre-built wheels. (Note that this may limit which versions are available, or even make installation impossible.) Python source packages are built by following instructions provided with the package (specifying a build system which may then in turn be configured in an idiosyncratic way; the default Setuptools is configured using a Python script). As such, they effectively run a post-install script.
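If you want that as a default rather than a flag on every invocation, it can go in pip's config file; a sketch (same effect as passing the flag, since pip config sections mirror its subcommands):

```ini
# pip.conf (pip.ini on Windows), e.g. ~/.config/pip/pip.conf
[install]
only-binary = :all:
```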
(For PAPER, long-term I intend to design a radically different UI, where you can choose a named "source" for each package or use the default; and sources are described in config files that explain the entire strategy for whether to use source packages, which indexes to check etc.)
> Let's you set a min age for new releases before `pnpm install` will pull them in - e.g. 4 days - so publishers have time to cleanup.
Pip does not support this; with uv, use `--exclude-newer`. This appears to require a timestamp; so if you always want things up to X days old you'll have to recalculate.
> Pip does not support this; with uv, use `--exclude-newer`. This appears to require a timestamp; so if you always want things up to X days old you'll have to recalculate.
I do this by having my shell init run:
export UV_EXCLUDE_NEWER=$(date -u -Iseconds -d "14 days ago")
That’s easy to override if you need to but otherwise seamless.
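If GNU date isn't available (macOS's BSD `date` lacks `-d`, for instance), a small Python helper can compute the same rolling cutoff for `--exclude-newer` / `UV_EXCLUDE_NEWER`:

```python
from datetime import datetime, timedelta, timezone

def exclude_newer_cutoff(days: int = 14) -> str:
    """RFC 3339 timestamp `days` days in the past, for uv's --exclude-newer."""
    cutoff = datetime.now(timezone.utc) - timedelta(days=days)
    return cutoff.strftime("%Y-%m-%dT%H:%M:%SZ")

print(exclude_newer_cutoff())
```

Run from shell init as `export UV_EXCLUDE_NEWER=$(python3 cutoff.py)` (hypothetical file name), it stays seamless the same way.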
FWIW, I'd like if these tools had an option to prefer the oldest version satisfying the given constraints (rather than the newest, as it is now — probably still a better default).
There were some recent posts I saw about "dependency cooldowns", which seem to be what you're referring to in item 2. The idea really resonated with me.
That said, I hard pin all our dependencies and get dependabot alerts and then look into updates manually. Not sure if I'm a rube or if that's good practice.
That's good practice. God knows how many times I've been bitten by npm packages breaking on minor or even patch version changes, even when proudly proclaiming to use semver
I'm struggling to understand why Trusted Publishers is any better.
Let's say you have a limited-life, package-specific scoped, IP-CIDR-bound publishing key, running on a private GH workflow runner. That key only exists in a trusted cloud's secret store (i.e. no one can access it from their laptop).
Now let's say you're a "trusted" publisher, running on a specific GitHub workflow and GitHub org that has been configured with OIDC on the NPM side. By virtue of simply existing in that workflow, you're now an NPM publisher (you can run any publish commands you like). No need to have a secret passed into your workflow scope.
If someone is taking over GitHub CI/CD workflows by running `npm i` at the start of their workflow, how does the "Trusted Publisher" find themselves any more secure than the secure, very limited scope token?
Good point, but until many popular packages stop requiring install.sh to operate, you'll still need to allowlist some of them. That is built into the PNPM tooling, luckily :)
Reading through the post it looks like this infects via preinstall?
> The new versions of these packages published to the NPM registry falsely purported to introduce the Bun runtime, adding the script preinstall: node setup_bun.js along with an obfuscated bun_environment.js file.
Pnpm cannot be built without an existing pnpm binary, meaning there is no way to bootstrap it from audited source code. A perfect trusting-trust attack situation.
Full source bootstrapped NPM with manually reviewed dependencies is the only reasonably secure way to use NodeJS right now.
Bun disables post-install scripts by default and one can explicitly opt-in to trusting dependencies in the package.json file. One can also delay installing updated dependencies through keys like `minimumReleaseAge`. Bun is a drop-in replacement for the npm CLI and, unlike pnpm, has goals beyond performance and storage efficiency.
yes bun does both of the things mentioned in the parent comment:
> Unlike other npm clients, Bun does not execute arbitrary lifecycle scripts like postinstall for installed dependencies. Executing arbitrary scripts represents a potential security risk.
> To protect against supply chain attacks where malicious packages are quickly published, you can configure a minimum age requirement for npm packages. Package versions published more recently than the specified threshold (in seconds) will be filtered out during installation.
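Both protections are Bun config; a sketch with key names taken from Bun's install documentation (worth double-checking against your Bun version, since the setting is relatively new):

```toml
# bunfig.toml
[install]
# Skip any version published in the last 3 days (value is in seconds)
minimumReleaseAge = 259200
```

and in package.json, a `"trustedDependencies": ["esbuild"]` array (package name here is just an example) opts specific packages back in to lifecycle scripts.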
As far as I can understand from the documentation, that config doesn't actually let you specify that one of them is required, does it? That is, if they _all_ fail to install, as far as the system is concerned there's nothing wrong? There will be runtime errors of course, but that's sort of disappointing…
Do keep in mind that the binaries are still binaries. Even if your installation process doesn't run any untrusted code from the package, you can't audit the binaries like you might the .js files prior to first run.
NPM was never "too insecure" and remains not "too insecure" today.
This is not an issue with npm, JavaScript, NodeJS, the NodeJS foundation, or anything else but the consumers of these libraries pulling in code from 3rd parties and pushing it to production environments without a single review. How this still flies today, and has since the inception of public "easy to publish" repositories, remains a mystery to me.
If you're maintaining a platform like Zapier, which gets hacked because none of your software engineers actually review the code that ends up in your production environment (yes, that includes 3rd party dependencies, no matter where they come from), I'm not sure you even have any business writing software.
The internet has been a hostile place for so long that most of us "web masters" are used to it by now. Yet it seems developers of all ages fall into the "what's the worst that can happen?" trap when pulling in either one dependency with 10K LoC without any review, or 1000s of dependencies with 10 lines each.
Until you fix your processes and workflows, this will continue to happen, even if you use pnpm. You NEED to be responsible for the code you ship, regardless of who wrote it.
They didn't deploy the code. That's not how this exploit works. They _downloaded_ the code to their machine. And npm's behavior is to implicitly run arbitrary code as part of the download - including, in this case, a script to harvest credentials and propagate the worm. That part has everything to do with npm behavior and nothing to do with how much anybody reviewed 3P deps. For all we know they downloaded the new version of the affected package to review it!
If people stop running install scripts, isn't Shai-Hulud 3: Electric Boogaloo just going to be designed to run its obfuscated malware at runtime rather than install time? Who manually reviews new versions of their project dependencies after installing them but before running them?
GP is correct. This is a workflow issue. Without a review process for dependencies, literally every package manager I know of is vulnerable to this. (Yes, even Maven.)
> If people stop running install scripts, isn't Shai-Hulud 3: Electric Boogaloo just going to be designed to run its obfuscated malware at runtime rather than install time?
Many such forms of malware have already been published and detected.
> Who manually reviews new versions of their project dependencies after installing them but before running them?
One person putting in this effort can protect everyone thereafter.
The PyPI website has a "Report project as malware" button on each project page for this purpose.
But yes, this is the world we live in. Without this particular form of insecurity, there is no "ecosystem" at all.
wait, I short-circuited here. wasn't the very concept of "libraries" created to *not* have to think about what exactly the code does?
imagine reviewing every React update. yes, some do that (Obsidian claims to review every dependency, whether new or an update), but that's due to flaws of the ecosystem.
take a look at Maven Central. it's harder to get into, but that's the price of security. you have to verify the namespace so that no one will publish under e.g. `io.gitlab.bpavuk.` namespace unless they have access to the `bpavuk` GitLab group or user, or `org.jetbrains.` unless they prove the ownership of the jetbrains.com domain.
Go is also nice in that regard - you are depending on Git repositories directly, so you have to hijack into the Git repo permissions and spoil the source code there.
> Go is also nice in that regard - you are depending on Git repositories directly, so you have to hijack into the Git repo permissions and spoil the source code there.
That in itself is scary because Git refs are mutable. Even with compromised credentials, no one can replace artifacts already deployed to Maven Central, because they simply don't allow it. There is nothing stopping someone from replacing a Git tag with one that points to compromised code.
The surface area is smaller because Go does locking via go.sum, but I could certainly see a tired developer regenerating it over the most strenuous of on-screen objections from the go CLI.
I do `cargo vendor` sometimes, but that's mostly to enable offline work and use the debugger inside some vague crates (Rust's libraries-but-not-really), and usually I gitignore the `vendor`'ed crates away.
> wasn't the very concept of "libraries" created to not have to think about what exactly the code does?
Let's say you need an FFT implementation. You can write that from scratch, or you can use a library. In both cases you should use tests to verify that the code calculates the FFT correctly, and in the library case you should also read the code to make sure that it works correctly and does not omit any edge cases.
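To make that concrete, a sketch of the kind of verification test you'd write, using a naive DFT here as a stand-in for whichever library implementation you actually adopt:

```python
import cmath

def dft(xs):
    """Naive O(n^2) DFT, standing in for a library FFT under test."""
    n = len(xs)
    return [sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
                for i, x in enumerate(xs))
            for k in range(n)]

# Known case: the transform of an impulse is flat (all ones).
spectrum = dft([1, 0, 0, 0])
assert all(abs(c - 1) < 1e-9 for c in spectrum)

# Known case: a constant signal concentrates entirely in bin 0.
spectrum = dft([1, 1, 1, 1])
assert abs(spectrum[0] - 4) < 1e-9
assert all(abs(c) < 1e-9 for c in spectrum[1:])
```

The same closed-form cases (impulse, constant, single sinusoid) work against any implementation, which is exactly the point: the tests survive a library swap even when the internals are opaque.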
> wasn't the very concept of "libraries" created to not have to think about what exactly the code does?
If you care about security, you only have to care once, during the audit. And you can find a pretty high percentage of malware in practice without actually having a detailed understanding of the non-malicious workings of the code.
Libraries allow you to not think about what the code does at development time, which in general is much more significant than audit time. Also, importantly, they allow you not to have to design and write that part of the code.
well, if you talk about private SSH keys for Git operations and SSH/GPG keys for signing, then you'd better set up a passphrase on them, which GitHub strongly recommends. the passphrase will make it significantly harder to use the keys. so, as usual, It Depends (c)
“Personally, I never wear a seatbelt because all drivers on the road should just follow the road rules instead and drive carefully.”
I don’t control all the drivers on the road, and a company can’t magically turn all employees into perfect developers. Get off your high horse and accept practical solutions.
> and a company can’t magically turn all employees into perfect developers
Sure, agree, that's why professionals have processes and workflows, everyone working together to build the greatest stuff you can.
But when not a single person in the entire company reviews the code that gets deployed and run by users, you have to start asking what kind of culture the company has; it's borderline irresponsible, I'd say.
Or they have different priority structures where this isn’t the developer’s job. Also, things get missed sometimes no matter how good your processes are. But improving those processes could involve something like preferring pnpm over npm.
We're on Azure and they are worse in every aspect, bad deployment of services, and status pages that are more about PR than engineering.
At this point, is there any cloud provider that doesn't have these problems? (GCP is a non-starter because a false-positive YouTube TOS violation get you locked out of GCP[1]).
Are you warned about the risks in an active war zone? Yes.
Does Google warn you about this when you sign up? No.
And PayPal having the same problem in no way vindicates Google. It just means that PayPal has the same problem and they are also incompetent (and they also demonstrate their incompetence in many other ways).
> It just means that PayPal has the same problem and they are also incompetent
Do you consider regular brick-and-mortar savings banks to be incompetent when they freeze someone's personal account for receiving business amounts of money into it? Because they all do, every last one. Because, again, they expect you to open a business account if you're going to do business; and they look at anything resembling "business transactions" happening in a personal account through the lens of fraud rather than the lens of "I just didn't realize I should open a business account."
And nobody thinks this is odd, or out-of-the-ordinary.
Do you consider municipal governments to be incompetent when they tell people that they have to get their single-family dwelling rezoned as mixed-use, before they can conduct business out of it? Or for assuming that anyone who is conducting business (having a constant stream of visitors at all hours) out of a residentially-zoned property, is likely engaging in some kind of illegal business (drug sales, prostitution, etc) rather than just being a cafe who didn't realize you can't run a cafe on residential zoning?
If so, I don't think many people would agree with you. (Most would argue that municipal governments suppress real, good businesses by not issuing the required rezoning permits, but that's a separate issue.)
There being an automatic level of hair-trigger suspicion against you on the part of powerful bureaucracies — unless and until you proactively provide those bureaucracies enough information about yourself and your activities for the bureaucracies to form a mental model of your motivations that makes your actions predictable to them — is just part of living in a society.
Heck, it's just a part of dealing with people who don't know you. Anthropologists suggest that the whole reason we developed greeting gestures like shaking hands (esp. the full version where you pull each-other in and use your other arms to pat one-another on the back) is to force both parties to prove to the other that they're not holding a readied weapon behind their backs.
---
> Are you warned about the risks in an active war one? Yes. Does Google warn you about this when you sign up? No.
As a neutral third party to a conflict, do you expect the parties in the conflict to warn you about the risks upon attempting to step into the war zone? Do you expect them to put up the equivalent of police tape saying "war zone past this point, do not cross"?
This is not what happens. There is no such tape. The first warning you get from the belligerents themselves of getting near either side's trenches in an active war zone, is running face-first into the guarded outpost/checkpoint put there to prevent flanking/supply-chain attacks. And at that point, you're already in the "having to talk yourself out of being shot" point in the flowchart.
It has always been the expectation that civilian settlements outside of the conflict zone will act of their own volition to inform you of the danger, and stop you from going anywhere near the front lines of the conflict. By word-of-mouth; by media reporting in newspapers and on the radio; by municipal governments putting up barriers preventing civilians from even heading down roads that would lead to the war zone. Heck, if a conflict just started "up the road", and you're going that way while everyone's headed back the other way, you'll almost always eventually be flagged to pull over by some kind stranger who realizes you might not know, and so wants to warn you that the only thing you'll get by going that way is shot.
---
Of course, this is all just a metaphor; the "war" between infrastructure companies and malicious actors is not the same kind of hot war with two legible "sides." (To be pedantic, it's more like the "war" between an incumbent state and a constant stream of unaffiliated domestic terrorists, such as happens during the ongoing only-partially-successful suppression of a populist revolution.)
But the metaphor holds: just like it's not a military's job to teach you that military forces will suspect that you're a spy if you approach a war zone in plainclothes; and just like it's not a bank's job to teach you that banks will suspect that you're a money launderer if you start regularly receiving $100k deposits into your personal account; and just like it's not a city government's job to teach you that they'll suspect you're running a bordello out of your home if you have people visiting your residentially-zoned property 24hrs a day... it's not Google's job to teach you that the world is full of people that try to abuse Internet infrastructure to illegal ends for profit; and that they'll suspect you're one of those people, if you just show up with your personal Google account and start doing some of the things those people do.
Rather, in all of these cases, it is the job of the people who teach you about life — parents, teachers, business mentors, etc — to explain to you the dangers of living in society. Knowing to not use your personal account for business, is as much a component of "web safety" as knowing to not give out details of your personal identity is. It's "Internet literacy", just like understanding that all news has some kind of bias due to its source is "media literacy."
You may not be aware of this, but Paypal is unregulated. They can, and have, overreached. This is very different from a bank who has regulations to follow, some of which protect the consumer from the whims of the bank.
If you can't figure out how to use a different Google account for YouTube from the GCP billing account, I don't know what to say. Google's in the wrong here, but spanner's good shit! (If you can afford it. and you actually need it. you probably don't.)
The problem isn't specifically getting locked out of GCP (though it is likely to happen for those out of the loop on what happened). It is that Google themselves can't figure out that a social media ban shouldn't affect your business continuity (and access to email or what-have-you).
It is an extremely fundamental level of incompetence at Google. One should "figure out" the viability of placing all of one's eggs in the basket of such an incompetent partner. They screwed the authentication issue up and, this is no slippery slope argument, that means they could be screwing other things up (such as being able to contact a human for support, which is what the Terraria developer also had issues with).
Wow, about 9 hours later and 21 of 24 Atlassian services are still showing up as impacted on their status page.
Even @ 9:30am ET this morning, after this supposedly was clearing up, my doctor's office's practice management software was still hosed. Quite the long tail here.
It feels like the root of the issue is the scoping design of JS itself, which makes tracking the TDZ (temporal dead zone) more costly for the interpreter, plus the fact that JS is JIT- rather than AOT-compiled.
I laud the recent efforts to remove the JS from JS tools (Go in the TS compiler, esbuild, etc.), as you don't need 100% of your lang utils written in the same interpreted lang, especially for slow/expensive tasks like compilation.
An identical, highly obfuscated (and thus suspicious-looking) payload was inserted into 22+ packages from the same author (many dormant for a while) and published simultaneously.
What kind of crazy AI would it have taken to notice that on the NPM side?
This is frustrating as someone who has built/published apps and extensions for other software providers for years and must wait days or weeks for a release to be approved while it's scanned and analyzed.
For all the security wares that MS and GitHub sell, NPM has seen practically no investment over the years (e.g. just go review the NPM security page... oh, wait, where?).
We need your help to scale up our cloud based, AI testing software startup and put our 40M Series C raise to work.
We’re a 100% serverless operation built on Google Cloud Platform that rapidly develops and deploys features on a CI/CD model. Seven years in, we’re a Gartner recognized industry player and growing our engineering ranks to keep building out our platform.
Our open positions:
- Software Engineer - Mobile Infrastructure (Onsite / Hybrid US)
- Software Engineer - Integrations Team (Remote India)
Our stack is built with modern Java 17, TypeScript, and Infrastructure as Code
Drop me (an engineer) any questions at joe at-symbol mabl.com, and check out our careers site [1]. We can’t wait to work with you.