
arxiv: 2606.00838 · v1 · pith:BOEINB4Onew · submitted 2026-05-30 · 💻 cs.AI

Decoupled Behavioral Cloning for Scalable Inductive Generalization in RL from Specifications

classification 💻 cs.AI
keywords generalization · training · behavioral · cloning · function · learning · policies · decoupled

Inductive generalization is a framework for reinforcement learning (RL) generalization in which inductively related task instances admit inductively related policies. Prior work captures this structure via a higher-order policy-evolution function learned directly with RL, but suffers from poor training scalability: as training tasks grow, aggregated reward feedback becomes noisy and conflicting, destabilizing training and weakening generalization. We propose DIBS, a decoupled behavioral cloning approach that separates learning task-specific policies from learning the evolution function. We first learn individual teacher policies per task via standard RL, then fit the evolution function via behavioral cloning on teacher-labeled state-action pairs. This replaces noisy reward aggregation with dense, stable supervision. DIBS achieves significant improvements in both training stability and zero-shot generalization against existing RL and meta-RL algorithms.
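The two-stage recipe described in the abstract can be illustrated with a toy sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the task family (a 1-D chain whose goal cell is the task index `n`), the hand-written `teacher_policy` that stands in for a teacher trained with standard RL, the relative feature `n - s`, and the logistic-regression student that stands in for the learned evolution function. The only point is the decoupling: teachers label state-action pairs, and a single shared model is then fit by supervised behavioral cloning rather than by aggregated reward.

```python
import math

def teacher_policy(n):
    """Hypothetical stand-in for an RL-trained teacher on task n.
    Task n: walk right along a 1-D chain until reaching cell n."""
    return lambda s: 1 if s < n else 0  # 1 = move right, 0 = stay

def collect_dataset(task_ids, num_states):
    """Stage 1 output: teacher-labeled pairs, pooled across training tasks.
    The feature n - s is a hand-chosen assumption for this toy problem."""
    data = []
    for n in task_ids:
        teacher = teacher_policy(n)
        for s in range(num_states):
            data.append((float(n - s), teacher(s)))
    return data

def fit_student(data, lr=0.1, steps=5000):
    """Stage 2: behavioral cloning as logistic regression,
    trained by full-batch gradient descent on cross-entropy loss."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        gw = gb = 0.0
        for x, y in data:
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            gw += (p - y) * x
            gb += p - y
        w -= lr * gw / len(data)
        b -= lr * gb / len(data)
    return w, b

def student_action(params, n, s):
    """Greedy action of the cloned student on task n in state s."""
    w, b = params
    return 1 if w * (n - s) + b > 0 else 0

# Train on tasks 1..5, then evaluate zero-shot on unseen tasks 8 and 9.
train_data = collect_dataset(range(1, 6), num_states=8)
params = fit_student(train_data)
unseen = [(n, s) for n in (8, 9) for s in range(12)]
matches = [student_action(params, n, s) == teacher_policy(n)(s) for n, s in unseen]
agreement = sum(matches) / len(matches)
print("zero-shot agreement with teachers:", agreement)
```

In this toy setting the supervision is dense (every state receives a label) and does not depend on how many tasks are pooled, which is the property the abstract contrasts with noisy reward aggregation. It says nothing about how DIBS itself parameterizes the evolution function.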
