Can We Further Elicit Reasoning in LLMs? Critic-Guided Planning with Retrieval-Augmentation for Solving Challenging Tasks

Xingxuan Li, Weiwen Xu, Ruochen Zhao, Fangkai Jiao, Shafiq Joty, Lidong Bing · 2024 · arXiv 2410.01428

3 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

baseline 1

citation-polarity summary

baseline 1

representative citing papers

ZeroSearch: Incentivize the Search Capability of LLMs without Searching

cs.CL · 2025-05-07 · unverdicted · novelty 6.0 · 2 refs

ZeroSearch uses supervised fine-tuning to build a simulated retrieval module, then runs curriculum-based RL rollouts that degrade document quality, training LLMs to search without real search API calls.

R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning

cs.AI · 2025-03-07 · unverdicted · novelty 6.0

R1-Searcher uses two-stage outcome-based RL to train LLMs to invoke external search systems for better reasoning without process rewards or distillation.

Supervising the search process produces reliable and generalizable information-seeking agents

cs.CL · 2025-02-19 · unverdicted · novelty 6.0

Process supervision via RAG-Gym produces more reliable and generalizable search agents, with gains driven by higher-quality queries on out-of-domain multi-hop tasks.

citing papers explorer

Showing 3 of 3 citing papers.

ZeroSearch: Incentivize the Search Capability of LLMs without Searching

cs.CL · 2025-05-07 · unverdicted · none · ref 25 · 2 links

R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning

cs.AI · 2025-03-07 · unverdicted · none · ref 19

Supervising the search process produces reliable and generalizable information-seeking agents

cs.CL · 2025-02-19 · unverdicted · none · ref 45
