arXiv preprint arXiv:2509.22391 , year=

Do LLM Agents Know How to Ground, Recover, Assess? A Benchmark for Epistemic Competence in Information-Seeking Agents , author= · 2025 · arXiv 2509.22391

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

read on arXiv browse 3 citing papers

representative citing papers

Co-ReAct: Rubrics as Step-Level Collaborators for ReAct Agents

cs.AI · 2026-05-22 · unverdicted · novelty 7.0

Co-ReAct adds step-level rubric guidance to ReAct agents via a GRPO-trained generator using list-wise ranking rewards, yielding consistent gains on DeepResearchBench and SQA-CS-V2.

Time to REFLECT: Can We Trust LLM Judges for Evidence-based Research Agents?

cs.CL · 2026-05-18 · unverdicted · novelty 7.0

REFLECT benchmark shows current LLM judges achieve below 55% accuracy detecting failures in evidence-based research agents, especially on evidence verification.

When Planning Fails Despite Correct Execution: On Epistemic Calibration for LLM-Based Multi-Agent Systems

cs.AI · 2026-05-22 · unverdicted · novelty 6.0

Introduces EPC-AW to mitigate epistemic miscalibration in LLM multi-agent planning via consistency-based selection and refinement, reporting 9.75% average success improvement.

citing papers explorer

Showing 3 of 3 citing papers.

Co-ReAct: Rubrics as Step-Level Collaborators for ReAct Agents cs.AI · 2026-05-22 · unverdicted · none · ref 30
Co-ReAct adds step-level rubric guidance to ReAct agents via a GRPO-trained generator using list-wise ranking rewards, yielding consistent gains on DeepResearchBench and SQA-CS-V2.
Time to REFLECT: Can We Trust LLM Judges for Evidence-based Research Agents? cs.CL · 2026-05-18 · unverdicted · none · ref 46
REFLECT benchmark shows current LLM judges achieve below 55% accuracy detecting failures in evidence-based research agents, especially on evidence verification.
When Planning Fails Despite Correct Execution: On Epistemic Calibration for LLM-Based Multi-Agent Systems cs.AI · 2026-05-22 · unverdicted · none · ref 22
Introduces EPC-AW to mitigate epistemic miscalibration in LLM multi-agent planning via consistency-based selection and refinement, reporting 9.75% average success improvement.

arXiv preprint arXiv:2509.22391 , year=

fields

years

verdicts

representative citing papers

citing papers explorer