Well, maybe. As far as I can see, the overt ones are using pretty reasonable rate limits, even though they're scraping in useless ways (every combination of git hash and file path on gitea). Rather, it seems like the anonymous ones are the problem - and since they're anonymous, we have zero reason to believe they're AI companies. Some of them are running on Huawei Cloud. I doubt OpenAI is using Huawei Cloud.
It could; the idea is just to tip the economics so that it's not worth it for the bot operator. That kind of abuse typically happens at a vast scale, where the cost of solving the challenges adds up fast.
Why would someone renting dirt cheap botnet time care if the requests take a few seconds longer to your site?
Plus, the requests are still getting through after waiting a few seconds, so it does nothing for the website operator and just burns battery for legit users.
If you operate a botnet that normally scrapes a few dozen pages per second and then notice a site suddenly taking multiple seconds per page, that's at least an order of magnitude (or two) drop in throughput. If you care at all about your efficiency, you step in and put that site on your blacklist.
Even if the bot owner doesn't watch (or care about) their crawling metrics, at least the botnet is not DDoSing the site in the meantime.
This is essentially a client-side tarpit, which is actually pretty effective against all forms of bot traffic while barely impacting legitimate users, if at all.
A tarpit is selective. You throw bad clients in the tarpit.
This is something you throw everyone through: both your abusive clients (running on stolen or datacenter hardware) and your real clients (running on battery-powered laptops and phones). More like a tar-checkpoint.
Websites aren't really fungible like that, and where they are (like general search indexing for example), that's usually the least hostile sort of automated traffic. But if that's all you care about, I'll cede the point.
Usually if you're going to go through the trouble of integrating a captcha, you want to protect against targeted attacks like a forum spammer where you don't want to let the abusive requests through at all, not just let it through after 5000ms.
No, because the bot can just also sleep and scrape other sources in that time. With PoW, you waste their CPU cycles and block them from doing other work.
Botnets just shift the bottleneck from "how much compute can they afford to buy legit" to "how many machines can they compromise or afford to buy on the black market". Either way it's a finite resource, so making each abusive request >10,000x more expensive still severely limits how much damage they can do, especially when a lot of botnet nodes are IoT junk with barely any CPU power to speak of.
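For a concrete sense of the asymmetry being described, here is a minimal hashcash-style sketch in Python; the function names and the difficulty value are illustrative, not taken from any particular anti-bot tool:

```python
import hashlib
import secrets

def solve(challenge: bytes, difficulty_bits: int) -> int:
    """Client side: brute-force a nonce so SHA-256(challenge || nonce) has N leading zero bits."""
    target = 1 << (256 - difficulty_bits)
    nonce = 0
    while True:
        digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce
        nonce += 1

def verify(challenge: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Server side: a single hash, no matter how long the client worked."""
    digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") < (1 << (256 - difficulty_bits))

challenge = secrets.token_bytes(16)
nonce = solve(challenge, difficulty_bits=20)   # ~2^20 hashes on average
assert verify(challenge, nonce, difficulty_bits=20)
```

The point is the asymmetry: verification costs the server one hash, solving costs the client roughly 2^difficulty hashes, so the per-request cost can be dialed up without the site doing more work itself.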
So the crazy decentralized mystery botnet(s) that are affecting many of us -- don't seem to be that worried about cost. They are making millions of duplicate requests for duplicate useless content, it's pretty wild.
On the other hand, they ALSO don't seem to be running user-agents that execute javascript.
This comes from the findings of a group of my colleagues at peer non-profits who have been sharing notes to try to understand what's going on.
So the fact that they don't run JS at present means that PoW would stop them -- but so would something much simpler and cheaper relying on JS.
If this becomes popular, could they afford to run JS and to calculate the PoW?
It's really unclear. The behavior of these things does not make sense to me enough to have much of a theory about what their cost/benefits or budgets are, it's all a mystery to me.
Definitely hoping someone manages to figure out who's really behind this and why at some point. (I am definitely not assuming it's a single entity either.)
It's not exactly true. You don't need to solve the challenge for each request, as PoW systems provide you with a session token which is valid for a while.
Basically you need session-token generators, which are usually automated headless browsers.
Another not-exactly-valid point is you don't need a botnet. You can scrape at scale with 1 machine using proxies. Proxies are dirt cheap.
So basically you generate a session for a proxy IP and scrape as long as the token is valid. No botnets, no magic, nada. Just business.
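In other words: pay the PoW/JS cost once per proxy IP with a headless browser, then reuse the resulting session over plain HTTP until it expires. A rough Python sketch of that workflow, where the cookie name, token value, proxy URL and target URLs are made up for illustration:

```python
import requests

# Hypothetical values, for illustration only.
PROXY_URL = "http://user:pass@proxy.example:8080"
POW_TOKEN = "session-token-obtained-once-via-a-headless-browser"

session = requests.Session()
session.proxies.update({"http": PROXY_URL, "https": PROXY_URL})
session.cookies.set("pow_session", POW_TOKEN)

# As long as the token stays valid, every request rides on the one-time PoW cost.
for url in ["https://example.org/page/1", "https://example.org/page/2"]:
    resp = session.get(url, timeout=10)
    print(url, resp.status_code)
```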
I think the general idea isn't that they can't but that they either won't, because they're not executing JS, or that it would slow them down enough to effectively cripple them.
I think calling this a "recaptcha alternative" is slightly misleading.
There are two problems some website hosters encounter:
A) How do I ensure no one DDoSes me (deliberately or inadvertently)?
B) How can I ensure this client is actually a human, not a robot?
Things like ReCaptcha aim to solve B, not A. But the submitted solution seems to be more for A, as a PoW can be (in fact, probably must be) calculated by a machine, not a human. ReCaptcha is supposed to be the opposite: solvable only by a human.
As a web developer I don't think I have ever supported this feature, but only because I never remembered to. It's a pretty easy feature to add, but unless browsers can force it, you're better off with uBlock.
No one (corporate) supports it unless it comes enabled by default with whatever compliance service/plugin is used on their sites.
The best combo I've found so far is Waterfox + uBO. I'm sure there are others, but this works well if you don't want to use a chromium based browser.
Libs like qrframe are great for automating QR code creation in a web app; the ControlNet stuff is great if you just need to make one QR code for an event or want an animated QR code (stuff like this usually renders at several frames per minute if you've got a good GPU).
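For the library route, the basic call really is trivial. Here's the equivalent using Python's widely used `qrcode` package rather than qrframe (which is a JavaScript library), so treat it as an illustration of the approach, not qrframe's API; the URL and filename are placeholders:

```python
import qrcode

# Generate a plain QR code for a URL; branding and styling is where
# libraries differ, but the core call is this simple.
img = qrcode.make("https://example.org/event")
img.save("event-qr.png")
```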
With the right harness around it you could trust it. Set it up so that it takes your input, generates the cron expression, then uses a deterministic cron library to spit out a summary of what that expression does plus a list of next upcoming instances.
That should give you all the information you need to determine if it got the right expression.
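Concretely, the deterministic half of that harness might look like this with the croniter library; the expression below is just a stand-in for whatever the model generates:

```python
from datetime import datetime
from croniter import croniter

expr = "0 9 * * 1-5"   # e.g. what the model claims means "9am on weekdays"

if not croniter.is_valid(expr):
    raise ValueError(f"not a valid cron expression: {expr}")

# Deterministically list the next few firings so a human can sanity-check them.
itr = croniter(expr, datetime.now())
for _ in range(5):
    print(itr.get_next(datetime))
```

If the listed timestamps don't match what you asked for, you reject the generated expression without having to trust the model's own explanation of it.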
No, that still leaves a small probability of something weird happening - and since it's a small probability you might not ever spot it in your own testing.
In this case I think it's better to run known, deterministic code that can turn a crontab into a clear explanation.
This is still held back by garbage ISPs, which are plentiful in the US. Terrible wifi/modem combos and low data caps put a huge damper on game streaming.
There are 2^256 wallets. There are 2^72 grains of sand on earth.
The chance of your bank screwing up is a lot higher, by trillions.
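A quick back-of-the-envelope check of those numbers, taking the parent's 2^72 grains-of-sand estimate at face value:

```python
wallets = 2**256       # size of the key space
sand_grains = 2**72    # parent's estimate, for scale

print(f"{wallets:.3e} possible keys")                          # ~1.16e+77
print(f"{wallets // sand_grains:.3e} keys per grain of sand")  # ~2.45e+55
```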