hub

Self-reflection in LLM agents: Effects on problem-solving performance.CoRR, abs/2405.06682

Matthew Renze, Erhan Guven · 2024 · arXiv 2405.06682

16 Pith papers cite this work. Polarity classification is still indexing.

16 Pith papers citing it

read on arXiv browse 16 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 3 method 1

citation-polarity summary

background 3 use method 1

representative citing papers

VINS-120K: Ultra High-Resolution Image Editing with A Large-Scale Dataset

cs.CV · 2026-05-22 · unverdicted · novelty 7.0

VINS-120K supplies the first large-scale set of instruction-image-edited-image triplets at ultra-high resolution together with an adaptation strategy that improves detail synthesis.

GSAR: Typed Grounding for Hallucination Detection and Recovery in Multi-Agent LLMs

cs.AI · 2026-04-25 · unverdicted · novelty 7.0

GSAR is a grounding-evaluation framework for multi-agent LLMs that uses a four-way claim typology, evidence-weighted asymmetric scoring, and tiered recovery decisions to detect and mitigate hallucinations.

Unified Data Selection for LLM Reasoning

cs.CL · 2026-05-21 · unverdicted · novelty 6.0

High-Entropy Sum (HES) selects high-quality reasoning data for LLMs by summing entropy of the top highest-entropy tokens, matching full-dataset performance with top 20% in SFT and outperforming baselines in RFT and RL.

STAR: Failure-Aware Markovian Routing for Multi-Agent Spatiotemporal Reasoning

cs.AI · 2026-05-11 · unverdicted · novelty 6.0 · 3 refs

STAR presents a failure-aware routing framework using a state-conditioned transition policy and an agent routing matrix combining expert routes with learned recoveries from execution traces to improve multi-agent spatiotemporal reasoning.

HAGE: Harnessing Agentic Memory via RL-Driven Weighted Graph Evolution

cs.AI · 2026-05-11 · unverdicted · novelty 6.0

HAGE proposes a trainable weighted graph memory framework with LLM intent classification, dynamic edge modulation, and RL optimization that improves long-horizon reasoning accuracy in agentic LLMs over static baselines.

Visual Enhanced Depth Scaling for Multimodal Latent Reasoning

cs.CV · 2026-04-12 · unverdicted · novelty 6.0 · 3 refs

Visual replay module and adaptive depth scaling improve multimodal latent reasoning, reaching SOTA benchmarks with faster inference than explicit chain-of-thought methods.

A Minimal Agent for Automated Theorem Proving

cs.AI · 2026-02-27 · unverdicted · novelty 6.0

A minimal agentic system achieves competitive performance in automated theorem proving with a simpler design and lower cost than state-of-the-art methods.

Evo-Memory: Benchmarking LLM Agent Test-time Learning with Self-Evolving Memory

cs.CL · 2025-11-25 · unverdicted · novelty 6.0

Evo-Memory is a new streaming benchmark and evaluation framework for self-evolving memory in LLM agents, unifying over ten memory modules and introducing the ReMem pipeline for continual improvement on multi-turn and reasoning datasets.

MeTHanol: Modularized Thinking Language Models with Intermediate Layer Thinking, Decoding and Bootstrapping Reasoning

cs.CL · 2024-09-18 · unverdicted · novelty 6.0

MeTHanol fine-tunes an intermediate LLM layer to generate thoughts in a first pass, then uses those thoughts for a second-pass answer, showing gains on Theory of Mind and vignette tasks plus adaptation to character prompts.

Context, Reasoning, and Hierarchy: A Cost-Performance Study of Compound LLM Agent Design in an Adversarial POMDP

cs.AI · 2026-05-15 · conditional · novelty 5.0

In CybORG CAGE-2, programmatic state abstraction improves mean return up to 76% over raw observations while adding deliberation tools to hierarchies degrades performance up to 3.4x and increases token use.

UniMesh: Unifying 3D Mesh Understanding and Generation

cs.CV · 2026-04-19 · unverdicted · novelty 5.0

UniMesh unifies 3D mesh generation and understanding in one model via a Mesh Head interface, Chain of Mesh iterative editing, and an Actor-Evaluator self-reflection loop.

Enhancing LLM Problem Solving via Tutor-Student Multi-Agent Interaction

cs.AI · 2026-04-10 · unverdicted · novelty 5.0

A same-LLM tutor-student agent pair solves coding tasks at similar or higher accuracy than self-consistency or debate baselines while using significantly fewer tokens.

Failure Makes the Agent Stronger: Enhancing Accuracy through Structured Reflection for Reliable Tool Interactions

cs.CV · 2025-09-23 · unverdicted · novelty 5.0

Structured reflection makes error diagnosis and repair an explicit trainable step that improves reliability and reduces redundant calls in tool-using LLM agents.

Self-Aligned Reward: Towards Effective and Efficient Reasoners

cs.LG · 2025-09-05 · unverdicted · novelty 5.0

Self-aligned reward uses relative perplexity differences to encourage concise, query-specific reasoning in LLMs, yielding 4% accuracy gains and 30% lower inference cost when added to PPO or GRPO.

Trustworthy Agent Network: Trust in Agent Networks Must Be Baked In, Not Bolted On

cs.AI · 2026-05-18 · unverdicted · novelty 4.0

Argues that trustworthiness in Agent-to-Agent networks requires a new conceptual framework with four design pillars baked in from the beginning, as retrofitting existing single-agent methods is insufficient.

Large Language Model-Brained GUI Agents: A Survey

cs.AI · 2024-11-27 · unverdicted · novelty 4.0

A survey consolidating frameworks, data practices, large action models, benchmarks, applications, and research gaps in LLM-brained GUI agents.

citing papers explorer

Showing 16 of 16 citing papers.

VINS-120K: Ultra High-Resolution Image Editing with A Large-Scale Dataset cs.CV · 2026-05-22 · unverdicted · none · ref 37
VINS-120K supplies the first large-scale set of instruction-image-edited-image triplets at ultra-high resolution together with an adaptation strategy that improves detail synthesis.
GSAR: Typed Grounding for Hallucination Detection and Recovery in Multi-Agent LLMs cs.AI · 2026-04-25 · unverdicted · none · ref 4
GSAR is a grounding-evaluation framework for multi-agent LLMs that uses a four-way claim typology, evidence-weighted asymmetric scoring, and tiered recovery decisions to detect and mitigate hallucinations.
Unified Data Selection for LLM Reasoning cs.CL · 2026-05-21 · unverdicted · none · ref 20
High-Entropy Sum (HES) selects high-quality reasoning data for LLMs by summing entropy of the top highest-entropy tokens, matching full-dataset performance with top 20% in SFT and outperforming baselines in RFT and RL.
STAR: Failure-Aware Markovian Routing for Multi-Agent Spatiotemporal Reasoning cs.AI · 2026-05-11 · unverdicted · none · ref 18 · 3 links
STAR presents a failure-aware routing framework using a state-conditioned transition policy and an agent routing matrix combining expert routes with learned recoveries from execution traces to improve multi-agent spatiotemporal reasoning.
HAGE: Harnessing Agentic Memory via RL-Driven Weighted Graph Evolution cs.AI · 2026-05-11 · unverdicted · none · ref 17
HAGE proposes a trainable weighted graph memory framework with LLM intent classification, dynamic edge modulation, and RL optimization that improves long-horizon reasoning accuracy in agentic LLMs over static baselines.
Visual Enhanced Depth Scaling for Multimodal Latent Reasoning cs.CV · 2026-04-12 · unverdicted · none · ref 45 · 3 links
Visual replay module and adaptive depth scaling improve multimodal latent reasoning, reaching SOTA benchmarks with faster inference than explicit chain-of-thought methods.
A Minimal Agent for Automated Theorem Proving cs.AI · 2026-02-27 · unverdicted · none · ref 51
A minimal agentic system achieves competitive performance in automated theorem proving with a simpler design and lower cost than state-of-the-art methods.
Evo-Memory: Benchmarking LLM Agent Test-time Learning with Self-Evolving Memory cs.CL · 2025-11-25 · unverdicted · none · ref 151
Evo-Memory is a new streaming benchmark and evaluation framework for self-evolving memory in LLM agents, unifying over ten memory modules and introducing the ReMem pipeline for continual improvement on multi-turn and reasoning datasets.
MeTHanol: Modularized Thinking Language Models with Intermediate Layer Thinking, Decoding and Bootstrapping Reasoning cs.CL · 2024-09-18 · unverdicted · none · ref 7
MeTHanol fine-tunes an intermediate LLM layer to generate thoughts in a first pass, then uses those thoughts for a second-pass answer, showing gains on Theory of Mind and vignette tasks plus adaptation to character prompts.
Context, Reasoning, and Hierarchy: A Cost-Performance Study of Compound LLM Agent Design in an Adversarial POMDP cs.AI · 2026-05-15 · conditional · none · ref 18
In CybORG CAGE-2, programmatic state abstraction improves mean return up to 76% over raw observations while adding deliberation tools to hierarchies degrades performance up to 3.4x and increases token use.
UniMesh: Unifying 3D Mesh Understanding and Generation cs.CV · 2026-04-19 · unverdicted · none · ref 40
UniMesh unifies 3D mesh generation and understanding in one model via a Mesh Head interface, Chain of Mesh iterative editing, and an Actor-Evaluator self-reflection loop.
Enhancing LLM Problem Solving via Tutor-Student Multi-Agent Interaction cs.AI · 2026-04-10 · unverdicted · none · ref 16
A same-LLM tutor-student agent pair solves coding tasks at similar or higher accuracy than self-consistency or debate baselines while using significantly fewer tokens.
Failure Makes the Agent Stronger: Enhancing Accuracy through Structured Reflection for Reliable Tool Interactions cs.CV · 2025-09-23 · unverdicted · none · ref 24
Structured reflection makes error diagnosis and repair an explicit trainable step that improves reliability and reduces redundant calls in tool-using LLM agents.
Self-Aligned Reward: Towards Effective and Efficient Reasoners cs.LG · 2025-09-05 · unverdicted · none · ref 35
Self-aligned reward uses relative perplexity differences to encourage concise, query-specific reasoning in LLMs, yielding 4% accuracy gains and 30% lower inference cost when added to PPO or GRPO.
Trustworthy Agent Network: Trust in Agent Networks Must Be Baked In, Not Bolted On cs.AI · 2026-05-18 · unverdicted · none · ref 47
Argues that trustworthiness in Agent-to-Agent networks requires a new conceptual framework with four design pillars baked in from the beginning, as retrofitting existing single-agent methods is insufficient.
Large Language Model-Brained GUI Agents: A Survey cs.AI · 2024-11-27 · unverdicted · none · ref 258
A survey consolidating frameworks, data practices, large action models, benchmarks, applications, and research gaps in LLM-brained GUI agents.

Self-reflection in LLM agents: Effects on problem-solving performance.CoRR, abs/2405.06682

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer