Agentic Episodic Control

Chuyun Shen; Haosheng Chen; Junjie Sheng; Wenhao Li; Xiangfeng Wang; Xidong Yang; Yun Hua

arxiv: 2506.01442 · v2 · pith:7JDQC4ZGnew · submitted 2025-06-02 · 💻 cs.AI

Xidong Yang, Wenhao Li, Junjie Sheng, Yun Hua, Haosheng Chen, Chuyun Shen, Xiangfeng Wang

classification 💻 cs.AI

keywords episodic, memory, agentic, control, data, efficiency, generalization, learning

Reinforcement learning (RL) remains fundamentally limited by poor data efficiency and weak generalization. Prior episodic RL methods attempt to alleviate this via external memory modules, yet they suffer from two key limitations: a representation bottleneck caused by shallow encoders, and a retrieval dilemma where episodic memory is accessed indiscriminately. To address these challenges, we propose Agentic Episodic Control (AEC), a novel architecture that integrates large language models (LLMs) into episodic RL. AEC uses an LLM-based semantic augmenter to generate semantic representations from raw observations, and a critical state recognizer to selectively retrieve valuable experiences. This transforms memory usage from passive similarity matching into strategic, context-aware recall. Across five BabyAI-Text environments, AEC achieves 2-6x higher data efficiency than baselines and is the only method to solve complex tasks like UnlockLocal with over 90% success. It further demonstrates strong cross-task and cross-environment generalization, maintaining performance even under distribution shifts. AEC shows that combining LLM-derived priors with reinforcement learning yields more sample-efficient and adaptable agents.
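The selective recall described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: every name here (`EpisodicMemory`, `select_action`, `toy_augment`, `toy_is_critical`) is hypothetical, and the LLM-based semantic augmenter and critical state recognizer are replaced by toy keyword-based stand-ins so the example runs without a model. It only shows the control flow, assuming memory is consulted when a state is flagged as critical and a fallback policy acts otherwise.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

Vector = List[float]

# Hypothetical vocabulary for the toy stand-in encoder below.
VOCAB = ["door", "key", "wall", "ball"]


def toy_augment(obs: str) -> Vector:
    """Stand-in for an LLM-based semantic augmenter: bag-of-words counts."""
    words = obs.lower().split()
    return [float(words.count(w)) for w in VOCAB]


def toy_is_critical(obs: str) -> bool:
    """Stand-in for a critical state recognizer: a keyword check."""
    return any(w in obs.lower().split() for w in ("door", "key"))


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class EpisodicMemory:
    # Each entry is (state embedding, action taken, return observed).
    entries: List[Tuple[Vector, int, float]] = field(default_factory=list)

    def write(self, key: Vector, action: int, ret: float) -> None:
        self.entries.append((key, action, ret))

    def best_action(self, key: Vector, min_sim: float = 0.8) -> Optional[int]:
        """Return the highest-return action among sufficiently similar entries."""
        candidates = [
            (ret, action)
            for k, action, ret in self.entries
            if cosine(k, key) >= min_sim
        ]
        return max(candidates)[1] if candidates else None


def select_action(
    obs: str,
    memory: EpisodicMemory,
    augment: Callable[[str], Vector],
    is_critical: Callable[[str], bool],
    fallback_policy: Callable[[str], int],
) -> int:
    """Consult episodic memory only in states flagged as critical."""
    if is_critical(obs):
        recalled = memory.best_action(augment(obs))
        if recalled is not None:
            return recalled
    return fallback_policy(obs)


memory = EpisodicMemory()
memory.write(toy_augment("you see a locked door"), action=3, ret=1.0)

# Critical state with a matching memory: the stored action is recalled.
a_recall = select_action("you see a locked door", memory,
                         toy_augment, toy_is_critical, lambda o: 0)
# Non-critical state: memory is skipped and the fallback policy acts.
a_skip = select_action("you see a wall", memory,
                       toy_augment, toy_is_critical, lambda o: 0)
```

In this sketch the gate is what separates selective recall from indiscriminate similarity matching: removing the `is_critical` check would make every step query memory.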

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Trust Region On-Policy Distillation
cs.LG 2026-05 unverdicted novelty 5.0

TrOPD stabilizes on-policy distillation for LLMs with trust-region learning, outlier estimation, and off-policy guidance, outperforming prior OPD methods on reasoning and code benchmarks.

PYTHALAB-MERA: Validation-Grounded Memory, Retrieval, and Acceptance Control for Frozen-LLM Coding Agents
cs.CL 2026-05 unverdicted novelty 5.0

An external controller for frozen LLMs raises strict validation success on three RL coding tasks from 0/9 to 8/9 by selecting memory records and skills, running fail-fast checks, and propagating credit via eligibility traces.