The friction point capture is interesting. In my experience, the hardest part isn't collecting feedback - it's getting users to actually leave it in the moment.
How are you handling the timing of the widget popup? Too early = users haven't formed an opinion yet. Too late = they've already left.
Also curious about the LLM categorization accuracy. Do you let users correct misclassifications to improve the model over time?
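On the timing question, one way to frame it is as a simple gate: only prompt once the user has been active for a while and has finished at least one meaningful action. This is just a rough sketch, not how the widget actually works. `SessionState`, `should_show_feedback_prompt`, and the thresholds (30 seconds, 1 action) are all made-up names and numbers for illustration.

```python
from dataclasses import dataclass


@dataclass
class SessionState:
    """Hypothetical per-session counters the widget would track."""
    seconds_active: float
    actions_completed: int
    prompted_before: bool = False


def should_show_feedback_prompt(state, min_seconds=30.0, min_actions=1):
    """Return True when the user has likely formed an opinion.

    Assumed thresholds: too early means under min_seconds or no completed
    action yet. A user who was already prompted is never prompted again.
    """
    if state.prompted_before:
        return False
    return (
        state.seconds_active >= min_seconds
        and state.actions_completed >= min_actions
    )
```

The "too late" side would still need something like an exit-intent signal, which this sketch leaves out.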
Interesting approach. The scraper-vs-site-owner arms race is real.
On the flip side of this discussion - if you're building a scraper yourself, there are ways to be less annoying:
1. Run locally instead of from cloud servers. Most aggressive blocking targets VPS IPs. A desktop app using the user's home IP looks like normal browsing.
2. Respect rate limits and add delays. Obvious but often ignored.
3. Use RSS feeds when available - many sites leave them open even when blocking scrapers.
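For point 2, a minimal sketch of what "add delays" can look like in practice: a small throttle that enforces a minimum gap between requests. `PoliteThrottle` and the 2-second default are my own assumptions, not taken from any particular tool, and the right interval depends on the site.

```python
import time


class PoliteThrottle:
    """Enforce a minimum gap between consecutive requests to one host."""

    def __init__(self, min_interval_s=2.0, clock=time.monotonic, sleep=time.sleep):
        # min_interval_s is an assumed default; tune it per site.
        self.min_interval_s = min_interval_s
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, None before the first

    def wait(self):
        """Sleep until min_interval_s has passed since the last call.

        Returns the number of seconds slept (0.0 if no wait was needed).
        """
        delay = 0.0
        if self._last is not None:
            elapsed = self._clock() - self._last
            delay = max(0.0, self.min_interval_s - elapsed)
            if delay > 0:
                self._sleep(delay)
        self._last = self._clock()
        return delay
```

Call `throttle.wait()` right before each request. The clock and sleep functions are injectable only so the behaviour can be checked without real waiting.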
I built a Reddit data tool (search "reddit wappkit" if curious) and the "local IP" approach basically eliminated all blocking issues. Reddit is pretty aggressive against server IPs but doesn't bother home connections.
The porn-link solution is creative though. Fight absurdity with absurdity, I guess.
I think "scraper vs site owners" is a false dichotomy. Scrapers will always need to exist as long as we want search engines and archival services. We will need small versions of these services to keep popping up every now and then to keep the big guys on their toes, and the smaller guys need advice for scraping politely.
That's fair - though are we in an isolated bout of "every now and then" or has AI created a new normal of abuse (e.g. of robots.txt)? Hopefully we're at a local maximum and some of the scrapers perpetrating harmful behaviours will soon pull their heads in.
If you are a marketer or indie maker, you know the struggle: You can't post anywhere without Karma.
You try to launch your product on r/SideProject, but your post gets auto-removed because your account is too new. You try to comment on a discussion, but you get hit with "You need 50 Karma to participate."