In general, the consensus on HN is that the web should be free, scraping public content should be allowed, and net neutrality is desirable.
Do we want to change that? Do we want to require scrapers to pay for network usage, like the ISPs were demanding from Netflix? Is net neutrality a bad thing after all?
I think, for many, the web should be free for humans.
When scraping was mainly used to build things like search indexes which are ultimately mutually beneficial to both the website owner and the search engine, and the scrapers were not abusive, nobody really had a problem.
But for generative AI training and access, with scrapers that DDoS everything in sight, and which ultimately cause visits to the websites to fall significantly while merely returning a mangled copy of their content to the user, scraping is a bad thing. It also doesn't help that the generative AI companies haven't paid most people for their training data.
The general consensus here is also that a DDoS attack is bad. I haven't seen objections to respectful scraping. You can say many things about AI scrapers, but I wouldn't call them respectful at all.
b) They have a complete lack of respect for robots.txt
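For what it's worth, respecting robots.txt is not hard. Here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `ExampleBot` user agent, and the example.com URLs are all made up for illustration, and the file is parsed from a string so nothing touches the network:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, inlined so the example runs offline.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /private/
Crawl-delay: 5

User-agent: *
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True only if robots.txt allows this agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)

# ExampleBot (hypothetical) may crawl public pages but not /private/.
allowed_public = may_fetch(parser, "ExampleBot", "https://example.com/public")
allowed_private = may_fetch(parser, "ExampleBot", "https://example.com/private/x")

# Any other agent falls under the catch-all rule and is disallowed everywhere.
allowed_other = may_fetch(parser, "OtherBot", "https://example.com/public")

# Seconds the site asks ExampleBot to wait between requests.
delay = parser.crawl_delay("ExampleBot")
```

A scraper that actually cared would check `may_fetch` before every request and sleep for `delay` seconds between them. A real crawler would load the file from the site with `set_url()` and `read()` instead of an inline string.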
I'm starting to think that aggressive scrapers are part of an ongoing business tactic against the decentralized web. Gmail makes self-hosted mail servers jump through arduous and poorly documented hoops, and now self-hosted services are being DDoSed by hordes of scrapers…
If my website got hugged to death, I would be very happy. If my website got scraped to hell and back by people putting it into the plagiarism machine so that it can regurgitate my content without giving me any attribution, I would be very displeased.
Yet HN does it when linking to poorly optimized sites. I doubt people running forges would complain about AI scrapers if their sites were optimized for serving the static content that is being requested.
Net neutrality has nothing to do with how content publishers treat visitors. It's about ISPs that try to interfere based on the content of the traffic instead of just providing "dumb pipes" (infrastructure) like they're supposed to.
I can't speak for everyone, but the web should be free and scraping should be allowed insofar as it promotes dissemination of knowledge and data in a sustainable way that benefits our society and generations to come. You're doing the thing where you're trying to pervert the original intent behind those beliefs.
I see this as a clear example of the paradox of tolerance.
Just as private businesses are allowed "no shirt, no shoes, no service" policies, my website should be allowed a "no heartbeat, no qualia, no HTTP 200".