AutoSearch applies RL with a self-answering reward to adaptively determine minimal sufficient search depth in agentic RAG, reducing over-searching while maintaining answer quality on complex questions.
Title resolution pending
3 Pith papers cite this work. Polarity classification is still indexing.
years
2026 3verdicts
UNVERDICTED 3representative citing papers
SRCP improves zero-shot generalization of successor representation methods in visual unsupervised reinforcement learning via saliency-guided representations and consistency policies.
OpenVLThinkerV2 applies a new Gaussian GRPO training objective with response and entropy shaping to outperform prior open-source and proprietary models on 18 visual reasoning benchmarks.
citing papers explorer
-
AutoSearch: Adaptive Search Depth for Efficient Agentic RAG via Reinforcement Learning
AutoSearch applies RL with a self-answering reward to adaptively determine minimal sufficient search depth in agentic RAG, reducing over-searching while maintaining answer quality on complex questions.
-
Saliency-Guided Representation with Consistency Policy Learning for Visual Unsupervised Reinforcement Learning
SRCP improves zero-shot generalization of successor representation methods in visual unsupervised reinforcement learning via saliency-guided representations and consistency policies.
-
OpenVLThinkerV2: A Generalist Multimodal Reasoning Model for Multi-domain Visual Tasks
OpenVLThinkerV2 applies a new Gaussian GRPO training objective with response and entropy shaping to outperform prior open-source and proprietary models on 18 visual reasoning benchmarks.