arXiv preprint arXiv:2506.13977
3 Pith papers cite this work.
Citing papers by year: 2026 (3)
Verdicts: unverdicted (3)
citing papers explorer
- Recursive Self-Evolving Agents via Held-Out Selection
  RSEA adds a strict held-out keep-better gate to recursive self-evolution of agent artifacts, yielding monotone-safe gains or parity with the base ReAct agent on ALFWorld, GAIA, τ-bench, and WebShop.
- Beyond Accuracy: Unveiling Inefficiency Patterns in Tool-Integrated Reasoning
  PTE is a hardware-aware metric that better predicts actual inference latency in tool-integrated reasoning than token counts and reveals that high-PTE trajectories often have lower correctness.
- SimpleSearch-VL: A Simple Recipe for Multimodal Agentic Deep Search
  SimpleSearch-VL improves Qwen3-VL multimodal agent baselines by 15.8-16 points on average using 7K total training examples and reaches parity with Gemini-3-Pro on the 30B variant.