Reasoning, Metareasoning, and Mathematical Truth: Studies of Theorem Proving under Limited Resources
1 Pith paper cites this work. Polarity classification is still being indexed.
abstract
In earlier work, we introduced flexible inference and decision-theoretic metareasoning to address the intractability of normative inference. Here, rather than pursuing the task of computing beliefs and actions with decision models composed of distinctions about uncertain events, we examine methods for inferring beliefs about mathematical truth before an automated theorem prover completes a proof. We employ a Bayesian analysis to update belief in truth, given theorem-proving progress, and show how decision-theoretic methods can be used to determine the value of continuing to deliberate versus taking immediate action in time-critical situations.
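The two ideas in the abstract, updating belief in truth from proving progress and weighing further deliberation against immediate action, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the binary "progress / no progress" observation, the likelihood values, the unit utilities, the fixed time cost per step, and all function names are hypothetical.

```python
def update_belief(prior, p_obs_if_true, p_obs_if_false):
    """Bayes' rule: posterior P(theorem is true) after one observation."""
    numerator = p_obs_if_true * prior
    return numerator / (numerator + p_obs_if_false * (1.0 - prior))


def best_action_utility(belief, u_correct=1.0, u_wrong=0.0):
    """Expected utility of acting now on the more probable answer."""
    act_as_true = belief * u_correct + (1.0 - belief) * u_wrong
    act_as_false = (1.0 - belief) * u_correct + belief * u_wrong
    return max(act_as_true, act_as_false)


def net_value_of_continuing(prior, p_prog_if_true, p_prog_if_false, cost):
    """Myopic estimate: expected utility after one more proving step,
    minus the (hypothetical) time cost, minus the utility of acting now."""
    expected_after = 0.0
    # Two possible observations: progress was seen, or it was not.
    for lik_true, lik_false in (
        (p_prog_if_true, p_prog_if_false),
        (1.0 - p_prog_if_true, 1.0 - p_prog_if_false),
    ):
        p_obs = lik_true * prior + lik_false * (1.0 - prior)
        if p_obs == 0.0:
            continue  # this observation cannot occur
        posterior = update_belief(prior, lik_true, lik_false)
        expected_after += p_obs * best_action_utility(posterior)
    return expected_after - cost - best_action_utility(prior)


if __name__ == "__main__":
    # Assumed numbers: progress is seen 80% of the time when the
    # theorem is true and 30% of the time when it is false.
    print(update_belief(0.5, 0.8, 0.3))
    print(net_value_of_continuing(0.5, 0.8, 0.3, cost=0.1))
    print(net_value_of_continuing(0.5, 0.8, 0.3, cost=0.3))
```

Under these assumed numbers, one more step is worth taking when it costs 0.1 (net value of about 0.15) but not when it costs 0.3 (net value is negative), which is the kind of deliberate-versus-act trade-off the abstract describes for time-critical situations.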
fields
cs.LG 1
years
2026 1
verdicts
UNVERDICTED 1
representative citing papers
Finding the Time to Think: Learning Planning Budgets in Real-Time RL
Trains a gating policy to select state-dependent planning budgets in variable-delay real-time RL, outperforming fixed-budget and heuristic baselines across Pac-Man, Tetris, Snake, Speed Hex, and Speed Go.