Flexible Empowerment at Reasoning with Extended Best-of-N Sampling

1 Pith paper citing it
abstract

This paper proposes a novel method that incorporates empowerment when reasoning about actions in reinforcement learning (RL), thereby enabling flexible handling of the exploration-exploitation dilemma (EED). Previous methods, in the manner of intrinsically motivated RL, add empowerment to the task-specific reward function as an exploration-promoting bonus term. However, this approach introduces a delay until a policy that accounts for empowerment is learned, making it difficult to adjust the emphasis on exploration as needed. On the other hand, a trick devised for fine-tuning recent foundation models at reasoning time, so-called best-of-N (BoN) sampling, implicitly yields a modified policy without explicitly learning it. Applying this trick to exploration-promoting terms such as empowerment is expected to enable more flexible adjustment of the EED. Therefore, this paper investigates BoN sampling for empowerment. Furthermore, to adjust the degree of policy modification in a generalizable manner without increasing the computational cost, this paper proposes a novel BoN sampling method extended by Tsallis statistics. The proposed method's capability to balance the EED is verified on toy problems. In addition, the proposed method is shown to improve RL performance on complex locomotion tasks.
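As a rough illustration of the plain BoN idea mentioned in the abstract, the sketch below draws N candidate actions from a base policy and keeps the one with the highest score. It is not the paper's method: the Tsallis-statistics extension is not shown, and `base_policy`, `toy_score`, and `mean_score` are hypothetical stand-ins (a standard normal policy and a made-up score in place of an empowerment estimate).

```python
import numpy as np


def best_of_n(sample_action, score, n, rng):
    """Draw n candidates from the base policy and return the best-scoring one."""
    candidates = [sample_action(rng) for _ in range(n)]
    scores = [score(a) for a in candidates]
    return candidates[int(np.argmax(scores))]


def base_policy(rng):
    # Hypothetical base policy: a 1-D standard normal over actions.
    return float(rng.normal(0.0, 1.0))


def toy_score(action):
    # Hypothetical stand-in for an empowerment estimate: prefers actions near 2.0.
    return -abs(action - 2.0)


def mean_score(n, trials=2000, seed=0):
    """Average score of the actions selected by BoN for a given n."""
    rng = np.random.default_rng(seed)
    picks = [best_of_n(base_policy, toy_score, n, rng) for _ in range(trials)]
    return float(np.mean([toy_score(a) for a in picks]))


if __name__ == "__main__":
    for n in (1, 4, 16):
        print(n, mean_score(n))
```

With N = 1 the selected action is just a sample from the base policy; larger N shifts the selected actions toward higher scores without any additional learning, which is the implicit policy modification the abstract refers to.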

fields

cs.RO 1

years

2026 1

verdicts

UNVERDICTED 1

representative citing papers

Redesigning Regularization for Effective Policy Smoothing

cs.RO · 2026-06-11 · unverdicted · novelty 5.0

Redesigned regularization addresses implementation gaps in policy smoothing for RL, yielding smoother motions with improved performance and robustness on a quadruped robot in sim-to-real settings.

citing papers explorer

Showing 1 of 1 citing paper.

  • Redesigning Regularization for Effective Policy Smoothing cs.RO · 2026-06-11 · unverdicted · none · ref 35 · internal anchor