Internalizing LLM reasoning via discovery and replay of latent actions
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
Reasoning Primitive Induction mines ReAct traces to build a library of typed pseudo-tools that, when composed in a standard ReAct loop, outperform the original agent by 22-44 percentage points on five subtasks.
citing papers explorer
ARBITER: Reasoning Trajectory Basins and Majority Vote Failures in Test-Time Sampling
ARBITER models reasoning trajectory basins in test-time sampling and uses model-internal signals to correct majority-vote failures, recovering part of the oracle gap on math benchmarks.