Can anyone explain how this worked? as per the paper (which I TLDRed): "A single...

blackbear_ · on June 7, 2023

This paragraph:

> To better estimate latency, we implemented a dual value function setup, whereby AlphaDev has two value function heads: one predicting algorithm correctness and the second predicting algorithm latency. The latency head is used to directly predict the latency of a given program by using the program’s actual computed latency as a Monte Carlo target for AlphaDev during training. This dual-head approach achieved substantially better results than the vanilla, single head value function setup when optimizing for real latency.

Briefly, they use a neural network to predict whether a given sequence of instructions is correct, and how fast it is. Then they used this neural network to guide the program generation via Monte Carlo tree search [1]. It is this procedure that keeps track of the partial score functions at each node.

[1] https://en.wikipedia.org/wiki/Monte_Carlo_tree_search