Reflexion: Language agents with verbal reinforcement learning.Advances in Neural Information Processing Systems, 36

Noah Shinn, Federico Cassano, Ashwin Gopinath, Karthik Narasimhan, Shunyu Yao · 2024

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

browse 3 citing papers

citation-role summary

baseline 1

citation-polarity summary

baseline 1

representative citing papers

Group-in-Group Policy Optimization for LLM Agent Training

cs.LG · 2025-05-16 · unverdicted · novelty 7.0

GiGPO adds a hierarchical grouping mechanism to group-based RL so that LLM agents receive both global trajectory and local step-level credit signals, yielding >12% gains on ALFWorld and >9% on WebShop over GRPO while keeping the same rollout and memory footprint.

WebGen-R1: Incentivizing Large Language Models to Generate Functional and Aesthetic Websites with Reinforcement Learning

cs.CL · 2026-04-22 · unverdicted · novelty 6.0

WebGen-R1 uses end-to-end RL with scaffold-driven generation and cascaded rewards for structure, function, and aesthetics to transform a 7B model into a generator of deployable multi-page websites that rivals much larger models.

Supervising the search process produces reliable and generalizable information-seeking agents

cs.CL · 2025-02-19 · unverdicted · novelty 6.0

Process supervision via RAG-Gym produces more reliable and generalizable search agents, with gains driven by higher-quality queries on out-of-domain multi-hop tasks.

citing papers explorer

Showing 3 of 3 citing papers.

Group-in-Group Policy Optimization for LLM Agent Training cs.LG · 2025-05-16 · unverdicted · none · ref 30
GiGPO adds a hierarchical grouping mechanism to group-based RL so that LLM agents receive both global trajectory and local step-level credit signals, yielding >12% gains on ALFWorld and >9% on WebShop over GRPO while keeping the same rollout and memory footprint.
WebGen-R1: Incentivizing Large Language Models to Generate Functional and Aesthetic Websites with Reinforcement Learning cs.CL · 2026-04-22 · unverdicted · none · ref 40
WebGen-R1 uses end-to-end RL with scaffold-driven generation and cascaded rewards for structure, function, and aesthetics to transform a 7B model into a generator of deployable multi-page websites that rivals much larger models.
Supervising the search process produces reliable and generalizable information-seeking agents cs.CL · 2025-02-19 · unverdicted · none · ref 65
Process supervision via RAG-Gym produces more reliable and generalizable search agents, with gains driven by higher-quality queries on out-of-domain multi-hop tasks.

Reflexion: Language agents with verbal reinforcement learning.Advances in Neural Information Processing Systems, 36

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer