
Your content is stolen for training the moment you put it up




If I give my content away for free, it can’t be stolen.

The point of putting up a public web site is so the public can view it (including OpenAI, Google, etc.).

If I don’t want people viewing it, then I don’t make it public.

Saying that things are stolen when they aren’t clouds the issue.


It is an _incredible_ stretch to frame certificate transparency logs as "content" in the creative sense.

The whole purpose of this data is to be consumed by third parties.


I don't see an issue with OAI scraping public logs.

But what GP probably meant is that OAI definitely uses this log to get a list of new websites in order to scrape them later. This is a pretty standard way to use CT logs: you get a list of domains to scrape instead of relying solely on hyperlinks.
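
For anyone wondering what "get a list of domains from CT logs" looks like in practice, here is a rough sketch. It assumes each log record has already been fetched and parsed into a dict with a `name_value` string of newline-separated DNS names; that record shape, the function name, and the sample data are all made up for illustration, and this is not a claim about how any particular crawler actually does it.

```python
def domains_from_ct_records(records):
    """Collect unique hostnames from CT-style records.

    Assumption: each record is a dict whose "name_value" field holds one or
    more newline-separated DNS names (hypothetical shape, for illustration).
    """
    seen = set()
    for record in records:
        for name in record.get("name_value", "").splitlines():
            name = name.strip().lower()
            # A wildcard entry like "*.example.org" still reveals the base domain.
            if name.startswith("*."):
                name = name[2:]
            if name:
                seen.add(name)
    return sorted(seen)


# Made-up sample records, not real log data.
sample = [
    {"name_value": "example.com\nwww.example.com"},
    {"name_value": "*.example.org"},
    {"name_value": "Example.com"},
]
crawl_queue = domains_from_ct_records(sample)
print(crawl_queue)
```

The point is just that the certificate names alone are enough to seed a crawl queue, with no inbound hyperlinks needed.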


I think their point is that the people registering certs may not intend their sites to be immediately scraped, but now OpenAI is bypassing e.g. Google indexing or web spidering, and using your cert provider's CT entries to find you immediately for scraping.

matt3210 clearly means that the content of the website (revealed by the CT log) is what is being stolen, not the data in the CT log.

It would be funny if your content disappeared when it was stolen.


