
The friction point capture is interesting. In my experience, the hardest part isn't collecting feedback - it's getting users to actually leave it in the moment.

How are you handling the timing of the widget popup? Too early = users haven't formed an opinion yet. Too late = they've already left.

Also curious about the LLM categorization accuracy. Do you let users correct misclassifications to improve the model over time?


Interesting approach. The scraper-vs-site-owner arms race is real.

On the flip side of this discussion - if you're building a scraper yourself, there are ways to be less annoying:

1. Run locally instead of from cloud servers. Most aggressive blocking targets VPS IPs. A desktop app using the user's home IP looks like normal browsing.

2. Respect rate limits and add delays. Obvious but often ignored - see the sketch after this list.

3. Use RSS feeds when available - many sites leave them open even when blocking scrapers.
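To make point 2 concrete, here's a minimal sketch in Python. The function name, user agent, and delay value are all made up, and real backoff logic would retry with increasing waits rather than just skip:

    import time
    import requests

    # Hypothetical polite fetcher: one request at a time, a fixed delay
    # between requests, and an honest User-Agent with contact info.
    def polite_fetch(urls, delay_seconds=2.0):
        session = requests.Session()
        session.headers["User-Agent"] = "MyHobbyScraper/0.1 (admin@example.com)"
        pages = {}
        for url in urls:
            resp = session.get(url, timeout=30)
            if resp.status_code == 429:
                # The server explicitly asked us to slow down: back off hard.
                time.sleep(delay_seconds * 10)
                continue
            resp.raise_for_status()
            pages[url] = resp.text
            time.sleep(delay_seconds)  # spacing requests out is the whole trick
        return pages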

I built a Reddit data tool (search "reddit wappkit" if curious) and the "local IP" approach basically eliminated all blocking issues. Reddit is pretty aggressive against server IPs but doesn't bother home connections.

The porn-link solution is creative though. Fight absurdity with absurdity I guess.


Plus simple caching to not redownload the same file/page multiple times.
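Something like this covers the simple case (the cache directory and function name are hypothetical):

    import hashlib
    import os
    import requests

    CACHE_DIR = "scrape_cache"  # made-up local cache directory

    # Fetch a URL once; subsequent calls read the saved copy from disk.
    def cached_get(url):
        os.makedirs(CACHE_DIR, exist_ok=True)
        path = os.path.join(CACHE_DIR, hashlib.sha256(url.encode()).hexdigest())
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                return f.read()
        resp = requests.get(url, timeout=30)
        resp.raise_for_status()
        with open(path, "w", encoding="utf-8") as f:
            f.write(resp.text)
        return resp.text

It only caches successful responses and never expires entries, which is about as simple as caching gets.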

It should also be easy to detect a Forgejo, Gitea, or similar code-hosting site, locate the git URL, and just clone the repo.
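A rough sketch of how that detection might look, assuming the instance serves Gitea's /api/v1/version endpoint (Forgejo, as a fork, answers it too); discovering owner/repo from the page URL is left out:

    import subprocess
    import requests

    # Hypothetical helper: if base_url looks like a Gitea/Forgejo instance,
    # do one shallow clone instead of crawling the repo page by page.
    def try_clone_if_forge(base_url, owner, repo, dest):
        try:
            resp = requests.get(f"{base_url}/api/v1/version", timeout=10)
            resp.raise_for_status()
            resp.json()  # Gitea and Forgejo answer with a JSON version object
        except (requests.RequestException, ValueError):
            return False  # not a recognisable forge; fall back to scraping
        subprocess.run(
            ["git", "clone", "--depth=1", f"{base_url}/{owner}/{repo}.git", dest],
            check=True,
        )
        return True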


Without wanting to upset anyone - what makes you interested in sharing tips for team scraper?

(Overgeneralising a bit) site owners are mostly acting for the public benefit, whereas scrapers act for their own benefit / private interests.

I imagine most people would land on team site-owner, if they were asked. I certainly would.

P.S. Is the best way to scrape fairly just to respect robots.txt?
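(For what it's worth, the robots.txt part is a few lines with Python's standard library; the user agent string below is a placeholder.)

    from urllib import robotparser

    rp = robotparser.RobotFileParser()
    rp.set_url("https://example.com/robots.txt")
    rp.read()

    # May our bot fetch this path at all?
    allowed = rp.can_fetch("MyBot/1.0", "https://example.com/some/page")

    # Some sites also declare a crawl delay worth honouring.
    delay = rp.crawl_delay("MyBot/1.0")  # None if unspecified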


I think "scrapers vs site-owners" is a false dichotomy. Scrapers will always need to exist as long as we want search engines and archival services. We will need small versions of these services to keep popping up every now and then to keep the big players on their toes, and those smaller players need advice on scraping politely.


That's fair - though are we in an isolated bout of "every now and then" or has AI created a new normal of abuse (e.g. of robots.txt)? Hopefully we're at a local maximum and some of the scrapers perpetrating harmful behaviours will soon pull their heads in.


Hopefully. It would also be nice to see more activity in the actual search engine and archiving market; there really isn’t much right now.


If you are a marketer or indie maker, you know the struggle: You can't post anywhere without Karma.

You try to launch your product on r/SideProject, but your post gets auto-removed because your account is too new. You try to comment on a discussion, but you get hit with "You need 50 Karma to participate."


Completely legal. This is publicly available Reddit data, not something we scraped illegally.


Hey HN! Built this after Reddit's API changes made most tools unusable.

The problem: Getting API approval now takes weeks or never happens.

The solution: Desktop app that works without API approval. Data stays local, completely private.

Main features:

- Search and analyze subreddits

- Bulk extract posts/comments/user data

- AI-powered insights

- CSV export

Built primarily for marketers and researchers. Free tier gives 90 searches/day.

Would love feedback on the approach - especially from anyone who's dealt with Reddit's API changes!


