Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Yeah, prompt injection is good point. For now, I try separate instruction and data by using JSON format, and run it in sandbox. Not perfect maybe, but I will try add small explanation in README so people can check it better.


In this case the result/output is plain text. Since it's not code it may be harder to imagine an attack vector. As an attacker, here would be some of my capabilities/possibilities:

- I could change the meaning of the output and the output entirely. - If I can control one part of a larger set of data that is analyzed , I could influence the whole output. - I could try to make the process take forever in order to waste resources.

I'd say the first scenario is most interesting, especially if I could then potentially also influence how an LLM trained on the output behaves and do even more damage using this down the line.

Let's say I'm a disgruntled website author. I want my users to see correct information on my website but don't want any LLM to be trained on it. In this case I could probably successfully use prompt injection to "poison" the model.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: