
You have a point but

1) I don't think being able to query them and reconstruct input 1:1 are requirements. If I build a shitty db with a buggy query language that retrieves incomplete data and occasionally mixes in data I didn't ask for, then it's still a db, just a shitty one.

If I populate it with copyrighted material and put it online, whether I'm gonna get sued likely depends on how shitty it is. If it's good enough that people get enough value from it that they don't buy the original works, then the original authors are not gonna be pleased.

2) Yes, comparisons to humans are not always useful, though I'd say LLMs don't reason or understand at all.

Either way the discussion should be about justice and fairness. The fact is LLMs are trained on data which took human work and effort to create. LLMs would not be possible without this data (or ML companies would train on just the public domain and avoid the risk of a massive lawsuit). The people who created the original training data deserve a fair share of the value produced by using their work.

So the real question to me is: how much do they deserve?



It sounds like you’re sort of starting from the position that AI is inherently unjust and then reasoning backward to justify it. But shouldn’t the argument start with actual harm rather than assumed unfairness?


I wouldn't say that.

My point is that any situation where person A puts in a certain amount of work (normalized by skill, competence, etc.), and person B uses A's work, puts in some work of his own but less than A, and then gets more reward than A, is fundamentally unfair.

LLMs are this, just at a massive scale.

But to be honest, this is where the discussion went only after thinking about it for a while. My real starting point was that when I publish my code under AGPL, I do it because I want anyone who builds on top of it to also have to release their code so users have the freedom to modify it.

LLMs are used to launder my code, deprive me of credit and deprive users of their rights.

Can we agree this is harm?

I also believe that unfairness is fundamentally the same as harm, just framed a bit differently.


> LLMs are used to launder my code,

> deprive me of credit

> and deprive users of their rights.

> Can we agree this is harm?

I might consider it if any of those claims were true.

I think the opposite is true -- especially with Open-Weight models which expand user freedoms rather than restricting them. I wonder if we can get the FSF to come up with GPL compatible Open-Weight licenses.

At this point in time I'm not entirely convinced they even need to. But if future lawsuits turn out that way, it might solve issues with some models.


> I might consider it if any of those claims were true.

Please step back and consider what you are replying to.

> LLMs are used to launder my code

If an LLM was trained only on AGPL code, would it have to be licensed under AGPL? Would its output?

> deprive me of credit

They _obviously_ deprive me of credit. Even if an LLM was trained entirely on my code, nobody using its output would know. Compare to using a library where my name is right there in the license.

> and deprive users of their rights.

I appeal to you again, re-read my comment. I am not talking about users of the model but about users of the software that is in part based on my AGPL code. If my code got there traditionally, by being included in a library, the whole software would have to be AGPL and users would have the right to modify it. If my code is laundered through an LLM, users of my code lose that right.

> Can we agree this is harm?

So all of those things are true. And this is clearly harm.

Stealing a little from everyone is morally no different than stealing a lot from one person. Whenever you think about ML, consider extreme cases, such as a model trained entirely on data under a single license, and the arguments pretending it's not copyright infringement fall apart. (And if you don't think extreme cases set precedent for real cases, then please point out where exactly you draw the line. Give me a number.)

Spreading the harm around means everyone is harmed similarly but that is not the kind of fairness I had in mind.



