arXiv: 2605.21851 · 2 revisions
OPPO: Bayesian Value Recursion for Token-Level Credit Assignment in LLM Reasoning