It does? Why? As I understand it, the term RAG leaves the retrieval method deliberately vague, so that different techniques can be used depending on the, er, context. That makes a lot more sense to me.
(blogpost author here)
You're right! I did make the distinction in an earlier draft, but decided to use "RAG" interchangeably with vector search, since that is how it is popularly understood today in code-gen systems. I'd probably go back to the previous version too.
But I do think there is a qualitative difference between getting candidates and adding them to context before generating (retrieval augmented generation) vs. the LLM searching for context until it is satisfied.
If you want to be really strict, RAG originally referred to retrieving information directly from the user query and then passing it to an LLM. With CC, the LLM takes the raw user query and then crafts its own searches.
But realistically, lots of RAG systems have LLM calls interleaved for various reasons, so what they probably mean is not doing the usual chunking + embeddings thing.
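To make the distinction in this thread concrete, here is a minimal Python sketch of the two control flows: retrieve once from the user query and then generate, vs. letting the model craft its own searches until it is satisfied. Everything in it is an assumption for illustration: the toy `CORPUS`, the substring `search` (a stand-in for vector search, grep, or any other retriever), and the `stub_next_search` / `stub_generate` functions that play the role of LLM calls. It is not how any particular tool implements this.

```python
from typing import Callable, Optional

# Toy corpus standing in for a codebase; contents are made up for illustration.
CORPUS = {
    "auth.py": "def login(user): return check_password(user)",
    "crypto.py": "def check_password(user): return hash(user.pw) == user.stored",
    "README.md": "Project overview and setup notes",
}

def search(term: str) -> list[str]:
    """Naive substring search; a stand-in for any retrieval method."""
    return sorted(name for name, text in CORPUS.items() if term in text)

def single_pass_rag(query: str, generate: Callable[[str, list[str]], str]) -> str:
    """Retrieve once, directly from the user query, then generate."""
    docs = search(query)
    return generate(query, docs)

def agentic_search(
    query: str,
    next_search: Callable[[str, list[str]], Optional[str]],
    generate: Callable[[str, list[str]], str],
    max_steps: int = 5,
) -> str:
    """The model picks its own search terms until it decides it has enough context."""
    seen: list[str] = []
    for _ in range(max_steps):
        term = next_search(query, seen)  # None means "satisfied, stop searching"
        if term is None:
            break
        for doc in search(term):
            if doc not in seen:
                seen.append(doc)
    return generate(query, seen)

# Hypothetical stand-ins for LLM calls, hard-coded so the sketch is deterministic.
def stub_next_search(query: str, seen: list[str]) -> Optional[str]:
    if not seen:
        return query                    # first search: the raw query
    if "crypto.py" not in seen:
        return "def check_password"     # follow the call found in auth.py
    return None

def stub_generate(query: str, docs: list[str]) -> str:
    return f"{query}: {', '.join(docs)}"

print(single_pass_rag("login", stub_generate))
print(agentic_search("login", stub_next_search, stub_generate))
```

With this toy corpus, the single-pass version only ever sees `auth.py`, while the loop also pulls in `crypto.py` because the stubbed model decides to follow up on `check_password`. The retriever is identical in both cases; only who decides what to search for, and how many times, changes.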