
At the 53-minute mark of the original video, he shows how exactly an LLM can quote the text it was trained on. I wonder how big tech convinced the courts that this isn't copyright violation (especially when ChatGPT was quoting GPL code). I can imagine the same thing happening in reverse: if I trained a model to draw a Disney character, my ass would be sued in a fraction of a second.


Note that he's running inference on a base model there, and base models are fairly capable of regurgitating their (highly weighted) inputs since they do nothing but predict pre-training tokens. For instruct services like ChatGPT, if they regurgitate something, I'd think it would more likely be their fine-tuning data, which is usually owned by the provider (and also kept secret).
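
A rough way to see this yourself on a base model (a minimal sketch, assuming the HuggingFace transformers library, with gpt2 standing in for whatever base model you care about; no claim this matches the video's setup):

    # Probe a base model for verbatim continuation of a passage it has
    # almost certainly seen during pre-training.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    prompt = "We the People of the United States, in Order to form a more perfect"
    inputs = tok(prompt, return_tensors="pt")

    # Greedy decoding: no sampling, so this is the model's single most likely
    # continuation. If it matches the source text token for token, the model
    # is effectively regurgitating memorized training data.
    out = model.generate(**inputs, max_new_tokens=40, do_sample=False)
    print(tok.decode(out[0], skip_special_tokens=True))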


What I mean is: if we can describe an LLM as lossy compression (Andrej's own words), then we could define what happens during inference as decompressing that compressed data, and at that moment the shit would hit the fan.
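
For what it's worth, the compression framing has a concrete reading: a model's log-loss on a text is (via arithmetic coding) roughly the number of bits needed to encode that text given the model, so the fewer bits a passage costs, the more of it already lives in the weights. A minimal sketch of measuring that, under the same assumptions as above (transformers library, gpt2 as a placeholder):

    import math
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    text = "We the People of the United States, in Order to form a more perfect Union"
    ids = tok(text, return_tensors="pt").input_ids

    with torch.no_grad():
        # Cross-entropy of the model's next-token predictions over the text
        # (in nats per token); labels are shifted internally by the model.
        loss = model(ids, labels=ids).loss.item()

    bits_per_token = loss / math.log(2)
    print(f"{bits_per_token:.2f} bits/token to encode this text under the model")
    # "Decompression" is then just greedy generation from a prefix, as in the
    # earlier snippet: the cheaper the text is in bits, the more of it comes
    # back verbatim.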


It's an interesting question, and one I wonder about now that our federal data is being exfiltrated to AI companies. If they train their models on the data, how does the law tell them to 'unlearn' it?

Destroying the copies they took is what the courts will order, but the data will still be there.


This is still being litigated, I believe.



