
seems like this is a solvable problem with just a bit more engineering effort, but yeah... totally worthless


You should just need a better prompt. I think everyone would benefit from using a standardized prompt which asks the model to think through its work between `<thought>` tags before writing its response, reflect on that response between `<reflection>` tags, and then output the final response afterwards.
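In case anyone actually wants to try it, here's a minimal sketch of that kind of scaffold. The `<output>` tag name and the regex-based extraction are my own illustrative assumptions, not something the comment above specifies:

```python
import re

# Hypothetical scaffold: ask the model for <thought> and <reflection>
# sections before it writes the final answer inside <output> tags.
SCAFFOLD = (
    "Think through the problem inside <thought>...</thought> tags, "
    "then critique that reasoning inside <reflection>...</reflection> tags, "
    "and only then write the final answer inside <output>...</output> tags.\n\n"
    "Task: {task}"
)

def build_prompt(task: str) -> str:
    """Fill the scaffold with the user's task."""
    return SCAFFOLD.format(task=task)

def extract_output(completion: str) -> str:
    """Pull the final answer out of the model's completion,
    falling back to the raw text if the tags are missing."""
    match = re.search(r"<output>(.*?)</output>", completion, re.DOTALL)
    return match.group(1).strip() if match else completion.strip()

if __name__ == "__main__":
    print(build_prompt("Summarize the attached benchmark results."))
```

Whether any of this actually stops a model from making up a score is, of course, the joke.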


Easy, just instruct the LLM to not hallucinate the score, problem solved.




