DeepRetrieval: Hacking Real Search Engines and Retrievers with Large Language Models via Reinforcement Learning

Jiacheng Lin; Jiawei Han; Jimeng Sun; Lang Cao; Pengcheng Jiang; Runchu Tian; SeongKu Kang; Zifeng Wang

arxiv: 2503.00223 · v3 · pith:P37DUGNAnew · submitted 2025-02-28 · 💻 cs.IR

Pengcheng Jiang, Jiacheng Lin, Lang Cao, Runchu Tian, SeongKu Kang, Zifeng Wang, Jimeng Sun, Jiawei Han

classification 💻 cs.IR

keywords retrieval · search · deepretrieval · data · information · large · learning · models

Information retrieval systems are crucial for enabling effective access to large document collections. Recent approaches have leveraged Large Language Models (LLMs) to enhance retrieval performance through query augmentation, but often rely on expensive supervised learning or distillation techniques that require significant computational resources and hand-labeled data. We introduce DeepRetrieval, a reinforcement learning (RL) approach that trains LLMs for query generation through trial and error, without supervised data (that is, without reference queries). Using retrieval metrics as rewards, our system generates queries that maximize retrieval performance. DeepRetrieval outperforms leading methods on literature search with 65.07% (vs. previous SOTA 24.68%) recall for publication search and 63.18% (vs. previous SOTA 32.11%) recall for trial search using real-world search engines. DeepRetrieval also dominates in evidence-seeking retrieval, classic information retrieval, and SQL database search. With only 3B parameters, it outperforms industry-leading models like GPT-4o and Claude-3.5-Sonnet on 11/13 datasets. These results demonstrate that our RL approach offers a more efficient and effective paradigm for information retrieval. Our data and code are available at: https://github.com/pat-jj/DeepRetrieval.
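
The abstract says retrieval metrics serve as the RL reward but does not spell out the reward function. As a minimal sketch only, assuming the reward is plain recall@k of the documents a generated query returns, the idea could look like the following. The names `recall_at_k`, `query_reward`, `toy_search`, and the three-document `CORPUS` are hypothetical illustrations, not taken from the paper or its repository.

```python
from typing import Callable, Dict, List, Set

# Hypothetical toy corpus standing in for a real search engine's index.
CORPUS: Dict[str, str] = {
    "d1": "statin therapy cardiovascular outcomes trial",
    "d2": "cholesterol lowering drugs heart disease",
    "d3": "deep learning image classification",
}


def toy_search(query: str) -> List[str]:
    """Rank documents by the number of query terms they contain (assumed engine)."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(text.split())), doc_id) for doc_id, text in CORPUS.items()]
    # Highest overlap first; ties broken by document id; drop non-matching documents.
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for score, doc_id in ranked if score > 0]


def recall_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of relevant documents that appear in the top-k retrieved results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def query_reward(
    query: str,
    search: Callable[[str], List[str]],
    relevant: Set[str],
    k: int = 10,
) -> float:
    """Scalar reward for one generated query: recall@k of what the engine returns."""
    return recall_at_k(search(query), relevant, k)


if __name__ == "__main__":
    relevant_docs = {"d1", "d2"}
    for candidate in ["image", "statin", "statin cholesterol heart"]:
        print(candidate, "->", query_reward(candidate, toy_search, relevant_docs))
```

In an RL loop, a policy would propose queries such as the candidates above, and a score like this would be the signal it is optimized against. No reference query is needed, only the set of relevant documents.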

Forward citations

Cited by 17 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

When More Reformulations Hurt: Avoiding Drift using Ranker Feedback
cs.IR 2026-05 unverdicted novelty 7.0

ReformIR adaptively prioritizes reformulations and documents with a surrogate model guided by ranker feedback to boost recall while suppressing drift under fixed reranking budgets.
Harness-1: Reinforcement Learning for Search Agents with State-Externalizing Harnesses
cs.AI 2026-06 unverdicted novelty 6.0

Harness-1 uses a state-externalizing harness for RL-trained search agents and reports 0.730 average curated recall, outperforming the next open subagent by 11.4 points.
RICE-PO: Turning Retrieval Interactions into Credit Signals for Reasoning Agents
cs.CL 2026-05 unverdicted novelty 6.0

RICE-PO is a policy optimization framework that converts retrieval interactions into credit signals for latent reasoning steps in agents by selecting high-uncertainty actions as anchors and propagating credit based on...
WikiSeeker: Rethinking the Role of Vision-Language Models in Knowledge-Based Visual Question Answering
cs.CV 2026-04 unverdicted novelty 6.0

WikiSeeker boosts KB-VQA performance by using VLMs to rewrite image-informed queries for better retrieval and to decide when to route to external LLM or rely on internal VLM knowledge.
MoCo: A One-Stop Shop for Model Collaboration Research
cs.CL 2026-01 accept novelty 6.0

MoCo supplies a unified library of 26 collaboration strategies and benchmarks demonstrating average outperformance over single models in 61 percent of (model, data) pairs.
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
cs.AI 2025-09 accept novelty 6.0

Survey that defines agentic RL for LLMs via POMDPs, introduces a taxonomy of planning/tool-use/memory/reasoning capabilities and domains, and compiles open environments from over 500 papers.
Supervising the search process produces reliable and generalizable information-seeking agents
cs.CL 2025-02 unverdicted novelty 6.0

Process supervision via RAG-Gym produces more reliable and generalizable search agents, with gains driven by higher-quality queries on out-of-domain multi-hop tasks.
BashCoder-R1: Towards Robust and Explainable Bash Code Generation with Robustness-Aware Group Relative Policy Optimization
cs.SE 2026-06 unverdicted novelty 5.0

BashCoder-R1 applies CPT, L-CoT SFT, and R-GRPO to reach higher syntax, robustness, and functionality rates than baselines on the new BashBench benchmark of 952 tasks.
Agentic Environment Engineering for Large Language Models: A Survey of Environment Modeling, Synthesis, Evaluation, and Application
cs.CL 2026-06 unverdicted novelty 5.0

This survey categorizes agentic environments for LLMs by eight attributes and domains, introduces symbolic and neural synthesis paradigms with evaluation, and outlines four agent evolution pathways plus three environm...
DuMate-DeepResearch: An Auditable Multi-Agent System with Recursive Search and Rubric-Grounded Reasoning
cs.AI 2026-06 unverdicted novelty 5.0

DuMate-DeepResearch introduces a multi-agent deep research system with graph-based planning, recursive execution, and rubric optimization that reports new state-of-the-art scores of 58.03% and 61.95% on two benchmarks.
LLM-Oriented Information Retrieval: A Denoising-First Perspective
cs.IR 2026-05 unverdicted novelty 5.0

Denoising to maximize usable evidence density and verifiability is becoming the primary bottleneck in LLM-oriented information retrieval, conceptualized via a four-stage framework and addressed through a pipeline taxo...
BRIDGE: Multimodal-to-Text Retrieval via Reinforcement-Learned Query Alignment
cs.IR 2026-04 unverdicted novelty 5.0

BRIDGE reaches 29.7 nDCG@10 on MM-BRIGHT by RL-aligning multimodal queries to text and using a reasoning retriever, beating multimodal encoders and, when combined with Nomic-Vision, exceeding the best text-only retrie...
Not All Tokens Matter: Towards Efficient LLM Reasoning via Token Significance in Reinforcement Learning
cs.LG 2025-06 unverdicted novelty 5.0

Proposes token-significance and dynamic length rewards in RL to reduce LLM response length while preserving or improving reasoning correctness across benchmarks.
R$^2$-Searcher: Calibrating Retrieval and Reasoning Boundaries for Agentic Search
cs.IR 2026-06 unverdicted novelty 4.0

R²-Searcher introduces fine-grained evidence modeling, retrieval reflection, and R²PO RL to calibrate retrieval-reasoning boundaries and improve multi-hop QA performance.
LLM-Oriented Information Retrieval: A Denoising-First Perspective
cs.IR 2026-05 unverdicted novelty 4.0

Argues for a denoising-first paradigm in LLM-oriented information retrieval, framing challenges via a four-stage progression and providing a taxonomy of signal-to-noise optimization techniques across the pipeline.
Agentic Reasoning for Large Language Models
cs.AI 2026-01 unverdicted novelty 4.0

The survey structures agentic reasoning for LLMs into foundational, self-evolving, and collective multi-agent layers while distinguishing in-context orchestration from post-training optimization and reviewing applicat...
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems
cs.AI 2025-03 unverdicted novelty 2.0

This survey frames foundation agents using brain-inspired modular architectures and reviews challenges in evolution, collaboration, and safety.