Query Expansion in the Age of Pre-trained and Large Language Models: A Comprehensive Survey

2025 · cs.IR · arXiv 2509.07794

1 Pith paper cites this work.



abstract

Modern information retrieval must reconcile short, ambiguous queries with increasingly diverse and dynamic corpora. Query expansion (QE) remains a core technique for mitigating vocabulary mismatch, but its design space has been reshaped by pre-trained and large language models (PLMs/LLMs). This survey reviews QE methods in the PLM/LLM era and provides a unified view of the emerging landscape. We first summarize how different model families enable new expansion behaviors, including stronger contextualization, more controllable generation, and instruction-following. We then organize recent techniques along four complementary design dimensions: where expansion is injected in the pipeline, how it is grounded and interacts with corpus evidence, how it is learned or aligned, and how structured knowledge such as knowledge graphs is incorporated. Beyond taxonomy, we synthesize application patterns and deployment considerations across representative retrieval settings, highlighting practical trade-offs among effectiveness, controllability, grounding quality, and operating cost. Finally, we outline open challenges and future directions toward more reliable, safe, efficient, and continually adaptive QE under real-world constraints.
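To make the abstract's contrast between corpus-grounded and generative expansion concrete, here is a minimal sketch of query-side expansion done two ways. The first is classic pseudo-relevance feedback (PRF): assume the top-k retrieved documents are relevant and append their most frequent new terms. The second asks a generator for a pseudo-document and appends it to the query. This is not the survey's own method. All names (`prf_expand`, `generative_expand`, `overlap_score`, `STOPWORDS`) are hypothetical. The term-overlap scorer stands in for a real ranker such as BM25, and `generate` is an assumed placeholder for any PLM/LLM call.

```python
import re
from collections import Counter
from collections.abc import Callable

# Tiny illustrative stopword list (an assumption; real systems use larger lists).
STOPWORDS = {"the", "a", "an", "of", "and", "or", "to", "in", "for", "is", "on", "with"}


def tokenize(text: str) -> list[str]:
    """Lowercase, split on non-alphanumerics, and drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def overlap_score(query_terms: list[str], doc: str) -> int:
    """Toy relevance score: total occurrences of query terms in the document.
    A hypothetical stand-in for a real first-stage ranker such as BM25."""
    counts = Counter(tokenize(doc))
    return sum(counts[t] for t in query_terms)


def prf_expand(query: str, corpus: list[str], k: int = 2, n_terms: int = 2) -> list[str]:
    """Corpus-grounded expansion via pseudo-relevance feedback: treat the
    top-k documents as relevant and append their n_terms most frequent
    terms that are not already in the query."""
    q_terms = tokenize(query)
    # sorted() is stable even with reverse=True, so ties keep corpus order.
    ranked = sorted(corpus, key=lambda d: overlap_score(q_terms, d), reverse=True)
    feedback: Counter[str] = Counter()
    for doc in ranked[:k]:
        feedback.update(t for t in tokenize(doc) if t not in q_terms)
    return q_terms + [t for t, _ in feedback.most_common(n_terms)]


def generative_expand(query: str, generate: Callable[[str], str], repeat: int = 1) -> str:
    """Generative expansion: ask a model for a pseudo-document and append it.
    `generate` is a hypothetical stand-in for any PLM/LLM text-generation call;
    `repeat` up-weights the original query terms for a lexical retriever."""
    pseudo_doc = generate(f"Write a short passage that answers: {query}")
    return " ".join([query] * repeat + [pseudo_doc])


if __name__ == "__main__":
    corpus = [
        "car engine repair guide for automobile owners automobile maintenance",
        "automobile repair shops and car mechanics fix engine",
        "banana bread recipe",
    ]
    print(prf_expand("car repair", corpus))  # ['car', 'repair', 'automobile', 'engine']
    # A canned stub replaces a real LLM call here.
    print(generative_expand("car repair", lambda prompt: "Check the engine first.", repeat=2))
```

The two functions illustrate the grounding trade-off the abstract mentions. PRF expansion terms come from the corpus itself but can drift when the top-k documents are off-topic. Generative expansion draws on model knowledge, which can add vocabulary absent from the first-stage results but may also introduce ungrounded content.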

representative citing papers

Jobs' AI Exposure Should Be Measured from Evidence, Not Model Priors

cs.IR · 2026-05-14 · conditional · novelty 6.0

The authors propose a retrieval-augmented framework that grounds AI exposure labels for 18,796 O*NET occupation-task pairs in retrieved news and academic abstracts, outperforming zero-shot prompting in 72% of disagreements and aligning better with observed real-world usage.

citing papers explorer


Jobs' AI Exposure Should Be Measured from Evidence, Not Model Priors

cs.IR · 2026-05-14 · conditional · none · ref 30
