Exploratory memory- augmented llm agent via hybrid on- and off-policy optimization.CoRR, abs/2602.23008

Zeyuan Liu, Jeonghye Kim, Xufang Luo, Dongsheng Li, Yuqing Yang · 2026 · arXiv 2602.23008

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

read on arXiv browse 4 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

MAP: A Map-then-Act Paradigm for Long-Horizon Interactive Agent Reasoning

cs.AI · 2026-05-13 · unverdicted · novelty 6.0

MAP improves LLM agent reasoning by constructing a structured cognitive map of the environment before task execution, yielding performance gains on benchmarks like ARC-AGI-3 and superior training data via the new MAP-2K dataset.

AIPO: Learning to Reason from Active Interaction

cs.CL · 2026-05-08 · unverdicted · novelty 6.0 · 2 refs

AIPO adds active multi-agent consultation (Verify, Knowledge, Reasoning agents) plus custom importance sampling to RLVR training so LLMs expand their reasoning boundary and then operate without the agents.

From History to State: Constant-Context Skill Learning for LLM Agents

cs.AI · 2026-05-06 · unverdicted · novelty 6.0

Constant-context skill learning trains reusable task-family modules for LLM agents using a deterministic state block for progress tracking and subgoal rewards, achieving 89.6% unseen success on ALFWorld, 76.8% on WebShop, and 66.4% on SciWorld with Qwen3-8B while reducing prompt tokens 2-7x.

Closing the Loop: Unified 3D Scene Generation and Immersive Interaction via LLM-RL Coupling

cs.CV · 2026-05-07 · unverdicted · novelty 5.0

A closed-loop system couples LLM-based 3D scene generation with RL optimization and VR user interactions to produce adaptive, immersive environments, claiming SOTA results on the ALFRED benchmark.

citing papers explorer

Showing 4 of 4 citing papers.

MAP: A Map-then-Act Paradigm for Long-Horizon Interactive Agent Reasoning cs.AI · 2026-05-13 · unverdicted · none · ref 15
MAP improves LLM agent reasoning by constructing a structured cognitive map of the environment before task execution, yielding performance gains on benchmarks like ARC-AGI-3 and superior training data via the new MAP-2K dataset.
AIPO: Learning to Reason from Active Interaction cs.CL · 2026-05-08 · unverdicted · none · ref 43 · 2 links
AIPO adds active multi-agent consultation (Verify, Knowledge, Reasoning agents) plus custom importance sampling to RLVR training so LLMs expand their reasoning boundary and then operate without the agents.
From History to State: Constant-Context Skill Learning for LLM Agents cs.AI · 2026-05-06 · unverdicted · none · ref 19
Constant-context skill learning trains reusable task-family modules for LLM agents using a deterministic state block for progress tracking and subgoal rewards, achieving 89.6% unseen success on ALFWorld, 76.8% on WebShop, and 66.4% on SciWorld with Qwen3-8B while reducing prompt tokens 2-7x.
Closing the Loop: Unified 3D Scene Generation and Immersive Interaction via LLM-RL Coupling cs.CV · 2026-05-07 · unverdicted · none · ref 30
A closed-loop system couples LLM-based 3D scene generation with RL optimization and VR user interactions to produce adaptive, immersive environments, claiming SOTA results on the ALFRED benchmark.

Exploratory memory- augmented llm agent via hybrid on- and off-policy optimization.CoRR, abs/2602.23008

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer