
Can anyone explain how this worked? As per the paper (which I TLDRed): "A single incorrect instruction in the AssemblyGame can potentially invalidate the entire algorithm, making exploration in this space of games incredibly challenging."

What did it do if it didn't have a useful partial score function? How did it avoid brute force?



This paragraph:

> To better estimate latency, we implemented a dual value function setup, whereby AlphaDev has two value function heads: one predicting algorithm correctness and the second predicting algorithm latency. The latency head is used to directly predict the latency of a given program by using the program’s actual computed latency as a Monte Carlo target for AlphaDev during training. This dual-head approach achieved substantially better results than the vanilla, single head value function setup when optimizing for real latency.

Briefly, they use a neural network to predict whether a given sequence of instructions is correct and how fast it is. They then use this network to guide program generation via Monte Carlo tree search [1]. It is this search procedure that keeps track of the value estimates (the partial scores) at each node, so it can favor promising instruction prefixes instead of brute-forcing every program.
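
To make that concrete, here is a minimal toy sketch in Python of value-guided tree search with two value heads. It is not AlphaDev's code: the instruction set, the TARGET program, the hand-written value_heads stub (standing in for the trained network), and the way combined_value folds correctness and latency into one number are all assumptions I made up for illustration.

```python
import math
from dataclasses import dataclass, field

INSTRUCTIONS = ("mov", "cmp", "cmovg")   # made-up toy instruction set
TARGET = ("cmp", "cmovg", "mov")         # made-up "correct" program


@dataclass
class Node:
    prior: float
    visits: int = 0
    value_sum: float = 0.0
    children: dict = field(default_factory=dict)

    def mean_value(self):
        return self.value_sum / self.visits if self.visits else 0.0


def value_heads(program):
    """Stub for the network: returns (correctness, latency) estimates."""
    matches = 0
    for got, want in zip(program, TARGET):
        if got != want:
            break
        matches += 1
    return matches / len(TARGET), float(len(program))


def combined_value(correctness, latency, latency_weight=0.1):
    """Assumed combination: reward correctness, penalize latency."""
    return correctness - latency_weight * latency


def select_child(node, c_puct=1.0):
    """PUCT-style choice: mean value plus a prior-weighted exploration bonus."""
    total = sum(child.visits for child in node.children.values())

    def score(child):
        bonus = c_puct * child.prior * math.sqrt(total + 1) / (1 + child.visits)
        return child.mean_value() + bonus

    return max(node.children.items(), key=lambda kv: score(kv[1]))


def search(num_simulations=200, max_len=3):
    root = Node(prior=1.0)
    for _ in range(num_simulations):
        node, program, path = root, (), [root]
        # Selection: walk down the tree until we reach an unexpanded node.
        while node.children and len(program) < max_len:
            instr, node = select_child(node)
            program += (instr,)
            path.append(node)
        # Expansion: add one child per instruction (uniform priors here).
        if len(program) < max_len:
            for instr in INSTRUCTIONS:
                node.children[instr] = Node(prior=1 / len(INSTRUCTIONS))
        # Evaluation: ask both heads, then back the value up the path.
        value = combined_value(*value_heads(program))
        for visited in path:
            visited.visits += 1
            visited.value_sum += value
    # Read off the most-visited path as the chosen program.
    program, node = [], root
    while node.children:
        instr, node = max(node.children.items(), key=lambda kv: kv[1].visits)
        program.append(instr)
    return tuple(program)


if __name__ == "__main__":
    print(search())
```

The point is only that each node accumulates a running mean of value estimates, so the search spends most of its simulations on prefixes the value heads like rather than enumerating all programs. In the real system the estimates come from a trained network, not a stub that peeks at the answer.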

[1] https://en.wikipedia.org/wiki/Monte_Carlo_tree_search



