
The few-minute delay comes primarily from sequential processing steps by high-quality LLMs, not database access times. The system reads papers and generates paragraphs about them, then compares those paragraphs, and because we have to use the highest-quality models, token-generation time is perceptible. We repeat the process many times for accuracy; we've found it impossible to be accurate without GPT-4-level models and the delay they entail.
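As a rough illustration, the latency described above can be sketched as sequential calls whose times add up. This is a minimal, hypothetical sketch: `llm_call` is a stand-in for a real model API, the pipeline shape (summarize each paper, then compare, repeated for accuracy) is inferred from the comment, and the per-call latency is a placeholder, not a measured value.

```python
import time

def llm_call(prompt: str, latency_s: float) -> str:
    """Stand-in for a frontier-model API call; sleeping models
    token-generation time (seconds per call in practice)."""
    time.sleep(latency_s)
    return f"summary of: {prompt[:20]}"

def compare_papers(papers, repeats=3, per_call_s=0.0):
    """Sequential pipeline: summarize each paper, then compare the
    summaries, repeating the whole pass `repeats` times for accuracy."""
    results = []
    for _ in range(repeats):
        summaries = [llm_call(p, per_call_s) for p in papers]
        results.append(llm_call(" | ".join(summaries), per_call_s))
    return results

# Total latency scales as repeats * (len(papers) + 1) sequential calls;
# at several seconds per real call, that quickly reaches minutes.
```

The point of the sketch is that each repetition blocks on the previous one, so total wall-clock time grows linearly with the number of passes rather than being hidden by parallelism.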

