Like the current top HN post suggests (https://eugeneyan.com/writing/llm-patterns/), we're still discovering patterns that work well with LLMs.
That said, anecdotally - they already excel at being logic engines, capable of filling in the gaps between instructions, using their worldly knowledge or “common sense” to do so.
But every so often they'll miss an important bit, and I have to be quite involved to catch it. Kinda defeats the purpose. Here, I think we can benefit from supervisor LLMs: a second layer whose sole job is to ensure output quality. A QA bot, essentially.
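To make that concrete, here's a rough sketch of the supervisor idea: one call drafts the output, a second call does nothing but check the draft against the original instructions. (Just a sketch, assuming the OpenAI Python client; the model name and prompts are placeholders, not anyone's actual setup.)

```python
# Sketch of a "supervisor LLM" layer: one call drafts an answer,
# a second call only checks it against the instructions.
# Assumes the openai Python package (>=1.0) and OPENAI_API_KEY set.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4"  # placeholder; any capable chat model


def draft(task: str) -> str:
    """First layer: produce the actual output."""
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": task}],
    )
    return resp.choices[0].message.content


def review(task: str, output: str) -> str:
    """Second layer: the QA bot whose sole job is to judge the output."""
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system",
             "content": "You are a QA reviewer. List anything the output "
                        "misses or gets wrong relative to the instructions. "
                        "Reply PASS if there is nothing to flag."},
            {"role": "user",
             "content": f"Instructions:\n{task}\n\nOutput:\n{output}"},
        ],
    )
    return resp.choices[0].message.content


if __name__ == "__main__":
    task = "Summarise the meeting notes below and list every action item."
    output = draft(task)
    print(output)
    print("--- QA ---")
    print(review(task, output))
```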
Yeah, with their own QA from a variety of personas/perspectives/concerns/contexts, I reckon you'll get very decent accuracy - or at least self-assessment of inaccuracy. All these can ever do is propagate the data they know so far into the context/prompt you desire, but I don't see obvious limits there. And GPT4 is already a superb conversation partner, as smart as nearly any person - so it's really like piecing experts together. If we run into any fundamental limitation from piecing these all together, it's gonna be the same limit that any group of humans trying to build a coherent/consistent organization of knowledge runs into, I think.
Coincidentally, that appears to be how GPT4 was made - apparently it's actually about 8 GPT3.5-scale models with designated roles, trained together ("mixture of experts" is the AI name for the technique, I believe). Makes you wonder how far that one trick scales.
(P.S. great link. Gah - another long read on the todo list)
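A persona panel on top of the supervisor pass could be as simple as looping the same review prompt over a few reviewer roles and collecting what each one flags - again only a sketch, with made-up personas, still assuming the OpenAI Python client:

```python
# Sketch of a persona QA panel: run the same review from several
# perspectives and aggregate the verdicts.
# Assumes the openai Python package (>=1.0); personas are illustrative.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4"  # placeholder

PERSONAS = [
    "a pedantic fact-checker looking for unsupported claims",
    "a security reviewer looking for unsafe advice",
    "an editor checking that the output follows the instructions",
]


def panel_review(task: str, output: str) -> dict[str, str]:
    """Ask each persona to critique the output; return their verdicts."""
    verdicts = {}
    for persona in PERSONAS:
        resp = client.chat.completions.create(
            model=MODEL,
            messages=[
                {"role": "system",
                 "content": f"You are {persona}. Flag problems with the "
                            "output below, or reply PASS if none."},
                {"role": "user",
                 "content": f"Instructions:\n{task}\n\nOutput:\n{output}"},
            ],
        )
        verdicts[persona] = resp.choices[0].message.content
    return verdicts
```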