I have a question: Many people have spoken about their experience of using LLMs to summarise long, complex PDFs. I am so ignorant on this matter. What is so different about reading a long PDF vs reading a large source base? Or can a modern LLM handle, say, 100 pages, but 10,000 pages is way too much? What happens to an LLM that tries to read 10,000 pages and summarise it? Is the summary rubbish?
Get the LLM to read and summarise N pages at a time, and store the outputs. Then concatenate those outputs into one "super summary" and use _that_ as context (rough sketch at the end of this comment).
There's some fidelity loss, but it works for text because there's usually so much redundancy.
However, I'm not sure this technique would work on code: code has far less of that redundancy, so lossy chunk summaries would likely drop details that matter.
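A minimal sketch of the chunk-then-merge idea in Python, assuming the OpenAI client; the model name, prompts, and `pages_per_chunk` value are placeholders, so swap in whatever you actually use:

```python
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o-mini"  # placeholder model name


def llm(prompt: str) -> str:
    # One LLM call; returns the model's text response.
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content


def summarize_pages(pages: list[str], pages_per_chunk: int = 20) -> str:
    # Map step: summarise N pages at a time, keeping each call
    # comfortably inside the model's context window.
    partials = []
    for i in range(0, len(pages), pages_per_chunk):
        chunk = "\n".join(pages[i:i + pages_per_chunk])
        partials.append(llm(f"Summarise the following pages:\n\n{chunk}"))
    # Reduce step: concatenate the partial summaries into one
    # "super summary" context and summarise that in a final pass.
    combined = "\n\n".join(partials)
    return llm(f"Combine these partial summaries into one summary:\n\n{combined}")
```

If the concatenated partials are themselves too big for the context window, you can apply the same reduce step recursively until they fit.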