arXiv preprint arXiv:2506.13977

CRITICTOOL: Evaluating Self-Critique Capabilities of Large Language Models in Tool-Calling Error Scenarios · 2024 · arXiv 2506.13977

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Recursive Self-Evolving Agents via Held-Out Selection

cs.AI · 2026-06-17 · unverdicted · novelty 6.0

RSEA adds a strict held-out keep-better gate to recursive self-evolution of agent artifacts, yielding monotone-safe gains or parity with the base ReAct agent on ALFWorld, GAIA, τ-bench, and WebShop.

Beyond Accuracy: Unveiling Inefficiency Patterns in Tool-Integrated Reasoning

cs.PF · 2026-04-07 · unverdicted · novelty 6.0

PTE is a hardware-aware metric that better predicts actual inference latency in tool-integrated reasoning than token counts and reveals that high-PTE trajectories often have lower correctness.

SimpleSearch-VL: A Simple Recipe for Multimodal Agentic Deep Search

cs.CV · 2026-06-30 · unverdicted · novelty 4.0

SimpleSearch-VL improves Qwen3-VL multimodal agent baselines by 15.8-16 points on average using 7K total training examples and reaches parity with Gemini-3-Pro on the 30B variant.

citing papers explorer

Recursive Self-Evolving Agents via Held-Out Selection cs.AI · 2026-06-17 · unverdicted · none · ref 8
Beyond Accuracy: Unveiling Inefficiency Patterns in Tool-Integrated Reasoning cs.PF · 2026-04-07 · unverdicted · none · ref 4
SimpleSearch-VL: A Simple Recipe for Multimodal Agentic Deep Search cs.CV · 2026-06-30 · unverdicted · none · ref 20
