Generally our use cases are completely different -- if a user is doing scraping, it's been structured scraping on a small (<50) set of sites where they need to be able to pull data from a website as if it was an API call, not as a way to web-crawl and get masses of training data.

We gate full access to the platform partially for this reason. We debated giving fewer than 50 free browser sessions, for example, and have already banned a few accounts from our self-serve today that were unidentifiable/"company accounts" without a legitimate web presence.



That is nice; so many companies don't stop to think about how their product might be abused. Love to see that you've given it some thought.

One thing I might add: limit how many requests per second your clients can make. Even if they can only scrape a small set of sites, they can still hammer a site into the ground. One of the things I think Google has done really well since their start is knowing when to back off a site. So either rate-limit your clients, or back off a site if you see responses slowing down.
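
As a rough sketch of what that could look like on the client side (the class name, thresholds, and `requests` usage here are purely illustrative, not something from the thread): keep a per-host delay, and grow it whenever responses slow down or the site starts returning 429s.

    # Illustrative only: per-host rate limiting with adaptive backoff.
    import time
    from urllib.parse import urlparse
    import requests  # assumed HTTP client

    class PoliteClient:
        def __init__(self, min_interval=1.0, slow_threshold=2.0, max_interval=30.0):
            self.min_interval = min_interval      # baseline delay between requests per host
            self.slow_threshold = slow_threshold  # responses slower than this trigger backoff
            self.max_interval = max_interval      # cap on the per-host delay
            self.next_allowed = {}                # host -> earliest time we may hit it again
            self.interval = {}                    # host -> current delay for that host

        def get(self, url):
            host = urlparse(url).netloc
            interval = self.interval.get(host, self.min_interval)

            # Rate limit: wait until this host's next allowed slot.
            wait = self.next_allowed.get(host, 0) - time.monotonic()
            if wait > 0:
                time.sleep(wait)

            start = time.monotonic()
            resp = requests.get(url, timeout=30)
            elapsed = time.monotonic() - start

            # Back off if the site is slowing down or pushing back; recover gradually otherwise.
            if elapsed > self.slow_threshold or resp.status_code == 429:
                interval = min(interval * 2, self.max_interval)
            else:
                interval = max(self.min_interval, interval * 0.9)

            self.interval[host] = interval
            self.next_allowed[host] = time.monotonic() + interval
            return resp

The doubling/capping is just ordinary exponential backoff; the point is that the slowdown signal comes from the target site's response times, not only from explicit rate-limit responses.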

We just had a company hit one of our sites pretty heavily, and when we asked them to back off a bit, they just asked if we could perhaps rate-limit them. Apparently they don't know how to reduce their own traffic.



