Hacker News

That's what documentation is for. If you don't have that, AI won't figure it out either.


I'm not sure that's true?


Such a project is way too large for AI to process as a whole. So yes it's true.


I have a question: Many people have spoken about their experience of using LLMs to summarise long, complex PDFs. I am so ignorant on this matter. What is so different about reading a long PDF vs reading a large source base? Or can a modern LLM handle, say, 100 pages, but 10,000 pages is way too much? What happens to an LLM that tries to read 10,000 pages and summarise it? Is the summary rubbish?


Get the LLM to read and summarise N pages at a time, and store the outputs. Then, you concatenate those outputs into one "super summary" and use _that_ as context.

There's some fidelity loss, but it works for text because there's often so much redundancy.

However, I'm not sure this technique could work on code.
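A minimal sketch of that chunk-then-merge approach. The `llm` callable and the prompt wording are hypothetical placeholders for whatever model API you actually use:

```python
def chunk(pages, n):
    """Split a list of page strings into groups of at most n pages."""
    return [pages[i:i + n] for i in range(0, len(pages), n)]

def summarize(text, llm):
    # Hypothetical: llm is any callable that takes a prompt string
    # and returns a completion string (OpenAI, Gemini, a local model...).
    return llm("Summarize the following:\n" + text)

def super_summary(pages, llm, n=10):
    """Summarize n pages at a time, then concatenate the partial
    summaries and summarize *that* to get the 'super summary'."""
    partials = [summarize("\n".join(group), llm)
                for group in chunk(pages, n)]
    return summarize("\n".join(partials), llm)
```

Each level of summarization loses detail, which is why this degrades for code: unlike prose, code has little redundancy, so the dropped detail tends to be load-bearing.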


Most models can't handle contexts that large, so the way they often do it is file by file, which loses the overall context.


Lots of models CAN handle large contexts; Gemini 2.5 Pro, their latest model, can take 1 million tokens of context.



