arxiv: 1711.06892 · v3 · pith:Q7NUXQ3Unew · submitted 2017-11-18 · 💻 cs.AI

Learning to select computations

classification 💻 cs.AI
keywords: BMPS, metareasoning, computations, information, value, algorithm, computation, learning

The efficient use of limited computational resources is an essential ingredient of intelligence. Selecting computations optimally according to rational metareasoning would achieve this, but this is computationally intractable. Inspired by psychology and neuroscience, we propose the first concrete and domain-general learning algorithm for approximating the optimal selection of computations: Bayesian metalevel policy search (BMPS). We derive this general, sample-efficient search algorithm for a computation-selecting metalevel policy based on the insight that the value of information lies between the myopic value of information and the value of perfect information. We evaluate BMPS on three increasingly difficult metareasoning problems: when to terminate computation, how to allocate computation between competing options, and planning. Across all three domains, BMPS achieved near-optimal performance and compared favorably to previously proposed metareasoning heuristics. Finally, we demonstrate the practical utility of BMPS in an emergency management scenario, even accounting for the overhead of metareasoning.
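The bounding insight in the abstract can be illustrated with a small sketch. It is not the paper's implementation. It assumes a toy decision between a known option worth 0 and one uncertain option with a Gaussian belief, where each computation is a single noisy observation of the uncertain option. All function names, the `weight` parameter, and the convex-mixture form are hypothetical choices made for illustration.

```python
import math


def expected_max_with_zero(mean, std):
    """E[max(0, X)] for X ~ Normal(mean, std**2), in closed form."""
    if std == 0.0:
        return max(0.0, mean)
    z = mean / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return mean * cdf + std * pdf


def value_of_perfect_information(mean, std):
    """Expected gain from learning the uncertain option's value exactly."""
    return expected_max_with_zero(mean, std) - max(0.0, mean)


def myopic_voi(mean, std, noise_std):
    """Expected gain from one noisy observation, then deciding immediately."""
    if std == 0.0:
        return 0.0  # nothing left to learn
    # Standard deviation of the posterior mean before the observation is seen.
    pre_posterior_std = std ** 2 / math.sqrt(std ** 2 + noise_std ** 2)
    return expected_max_with_zero(mean, pre_posterior_std) - max(0.0, mean)


def interpolated_voc(mean, std, noise_std, weight, cost):
    """Hypothetical estimate: a convex mix of the two bounds, minus the cost.

    `weight` in [0, 1] is an assumed free parameter; a learning procedure
    would have to tune it from experience.
    """
    lower = myopic_voi(mean, std, noise_std)
    upper = value_of_perfect_information(mean, std)
    return (1.0 - weight) * lower + weight * upper - cost


vpi = value_of_perfect_information(0.0, 1.0)
voi1 = myopic_voi(0.0, 1.0, noise_std=1.0)
estimate = interpolated_voc(0.0, 1.0, noise_std=1.0, weight=0.5, cost=0.01)
print(voi1, vpi, estimate)
```

In this toy setting the myopic value never exceeds the value of perfect information, so any mixture weight between 0 and 1 keeps the estimate inside the interval the abstract describes. How BMPS itself parameterizes and learns its estimate is specified in the paper, not here.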
