What is "free code"? Most code is either proprietary (though sometimes publicly visible), under a permissive license, or under a copyleft license. The only truly free code is code in the public domain.
You could make separate versions of an LLM depending on the license of its output. If it has to produce public domain code, it can only be trained on the public domain. If it has to produce permissive code (without attribution), then it can be trained on the public domain and permissive code. If copyleft, then those two plus copyleft (though that still does not solve attribution).
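The tiering above could be pictured as a simple corpus filter. This is a minimal sketch, not a real pipeline; the license names, tier ordering, and corpus shape are illustrative assumptions:

```python
# Hypothetical sketch: filtering a training corpus by license tier.
# Tiers are cumulative: a model targeting a given output license may
# train on that tier and every tier below it. Proprietary code is
# never eligible.
TIERS = {
    "public-domain": 0,
    "permissive": 1,
    "copyleft": 2,
}

corpus = [
    {"repo": "a", "license": "public-domain"},
    {"repo": "b", "license": "permissive"},
    {"repo": "c", "license": "copyleft"},
    {"repo": "d", "license": "proprietary"},  # excluded from all tiers
]

def training_set(output_license):
    """Return the repos a model may train on for a given output license."""
    limit = TIERS[output_license]
    return [doc["repo"] for doc in corpus
            if doc["license"] in TIERS and TIERS[doc["license"]] <= limit]

print(training_set("public-domain"))  # ['a']
print(training_set("copyleft"))       # ['a', 'b', 'c']
```

Note this only captures the training-side constraint; as the paragraph says, it does nothing for attribution.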
> Companies can also buy code for feeding the model after all.
We've come a long way since the days when slavery was common (at least in the West), but we still have a class system where rich people can pay a one-time fee (buying a company) and extract value in perpetuity from people putting in continuous work, while doing nothing of value themselves. This kind of passive income is fundamentally unjust, but pervasive enough that people accept it, just like they accepted slavery as a fact of life back then.
Companies have a stronger bargaining position than individuals (one of the reasons, besides defense, why people form states: to represent a common interest against companies). This will lead to companies (the rich) paying a one-time fee, then extracting value forever, while the individual has to invest effort into looking for another job. Compensation for buying code can only be fair if it's a percentage of the value generated by that code.
> So beside the injustice you directly experience right now over your own code probably being fed into AI models, do you fear/despise anything more than that from LLMs?
Umm, I think theft on a civilization-level scale is sufficient.
> Umm, I think theft on a civilization-level scale is sufficient.
As long as everybody can also benefit from it, I see it as some kind of collective knowledge sharing.
As you stated in the paragraphs before, unless wealth distribution changes, LLMs may lead to an escalating imbalance. That could be avoided if the models are shared for free as soon as a critical mass of authors is involved, regardless of who owns the assets.
1) The current system is that the rich get richer faster than the poor.
2) You're proposing a system where everybody gets richer at the same rate in the best case (but it probably devolves into case 1, IMO).
3) I am proposing a system where people get richer at the rate of how much work they put in. If a rich person does not produce value, he doesn't get richer at all. If a poor person produces value, he gets richer according to how many people benefit from it. If a poor person puts in 1000 units of work and a rich person puts in 10 units of work to distribute that work to more people (for example through marketing), they get richer at a ratio of 1000:10.
My system is obviously harder to implement (despite all the revolutions in history, societies have always devolved to case 1, or sometimes case 2 that devolved into 1 later). It might be impossible to implement perfectly but it does not mean we should stop trying to make things at least less unfair.
---
We're in agreement that forcing companies that take (steal) work from everyone to release their models for free is better than letting them profit without bounds.
However, I am taking it way further. If AI research leads to fundamental changes in society, we should take it as an opportunity to reevaluate the societal systems we have now and make them more fair.
For example, I don't care who owns assets. I care about who puts in work. It's a more fundamental unit of value. Work produces assets after all.
---
And BTW, making models free does not in any way help restore users' rights provided by the AGPL. I have yet to come across anybody making a workable proposal for how to protect those rights in an age where AGPL code is remixed through statistics into all software without making that software AGPL too. In fact, I have yet to find anybody who even acknowledges it's a problem.