Thirty-seventh Conference on Neural Information Processing Systems , year=

Reflexion: language agents with verbal reinforcement learning , author=

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

browse 4 citing papers

representative citing papers

Memory-Guided Tree Search with Cross-Branch Knowledge Transfer for LLM Solver Synthesis

cs.AI · 2026-05-17 · unverdicted · novelty 7.0

MEMOIR adds branch-local and global memory with a reflection step to tree search for LLM solver synthesis, reaching 96.7% solution validity and 7.3-point score gains over baselines on seven CO problems with lower run-to-run variance.

MemQ: Integrating Q-Learning into Self-Evolving Memory Agents over Provenance DAGs

cs.AI · 2026-05-08 · unverdicted · novelty 7.0 · 3 refs

MemQ improves LLM agent performance by using eligibility traces over provenance DAGs to assign credit to dependent memories, achieving top success rates on six benchmarks with largest gains on complex multi-step tasks.

Beyond Meta-Reasoning: Metacognitive Consolidation for Self-Improving LLM Reasoning

cs.AI · 2026-04-19 · unverdicted · novelty 7.0

Metacognitive Consolidation lets LLMs accumulate reusable meta-reasoning skills from past episodes to improve future performance across benchmarks.

HINT-SD: Targeted Hindsight Self-Distillation for Long-Horizon Agents

cs.LG · 2026-05-18 · unverdicted · novelty 6.0

HINT-SD improves long-horizon LLM agent training by using hindsight to target self-distillation on failure-relevant action spans, delivering up to 18.8% higher performance and 2.26x lower time per step than dense per-turn feedback.

citing papers explorer

Showing 4 of 4 citing papers.

Memory-Guided Tree Search with Cross-Branch Knowledge Transfer for LLM Solver Synthesis cs.AI · 2026-05-17 · unverdicted · none · ref 2
MEMOIR adds branch-local and global memory with a reflection step to tree search for LLM solver synthesis, reaching 96.7% solution validity and 7.3-point score gains over baselines on seven CO problems with lower run-to-run variance.
MemQ: Integrating Q-Learning into Self-Evolving Memory Agents over Provenance DAGs cs.AI · 2026-05-08 · unverdicted · none · ref 7 · 3 links
MemQ improves LLM agent performance by using eligibility traces over provenance DAGs to assign credit to dependent memories, achieving top success rates on six benchmarks with largest gains on complex multi-step tasks.
Beyond Meta-Reasoning: Metacognitive Consolidation for Self-Improving LLM Reasoning cs.AI · 2026-04-19 · unverdicted · none · ref 22
Metacognitive Consolidation lets LLMs accumulate reusable meta-reasoning skills from past episodes to improve future performance across benchmarks.
HINT-SD: Targeted Hindsight Self-Distillation for Long-Horizon Agents cs.LG · 2026-05-18 · unverdicted · none · ref 4
HINT-SD improves long-horizon LLM agent training by using hindsight to target self-distillation on failure-relevant action spans, delivering up to 18.8% higher performance and 2.26x lower time per step than dense per-turn feedback.

Thirty-seventh Conference on Neural Information Processing Systems , year=

fields

years

verdicts

representative citing papers

citing papers explorer