Large language models for information retrieval: A survey

Yutao Zhu, Huaying Yuan, Shuting Wang, Jiongnan Liu, Wenhan Liu, Chenlong Deng, Zhicheng Dou, Ji-Rong Wen · 2023 · arXiv 2308.07107

17 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2

citation-polarity summary

background 1 · unclear 1

representative citing papers

ErrorRadar: Benchmarking Complex Mathematical Reasoning of Multimodal Large Language Models Via Error Detection

cs.CL · 2024-10-06 · unverdicted · novelty 8.0

ErrorRadar is a new benchmark of 2,500 multimodal K-12 math problems for MLLM error step identification and categorization, where GPT-4o trails human experts by ~10%.

ResRank: Unifying Retrieval and Listwise Reranking via End-to-End Joint Training with Residual Passage Compression

cs.IR · 2026-04-24 · conditional · novelty 7.0

ResRank unifies retrieval and listwise reranking by compressing passages to one token each, using residual connections and cosine-similarity scoring, achieving competitive effectiveness on TREC DL and BEIR benchmarks with zero generated tokens.

HeadRank: Decoding-Free Passage Reranking via Preference-Aligned Attention Heads

cs.IR · 2026-04-19 · unverdicted · novelty 7.0

HeadRank lifts preference optimization into attention space via entropy-regularized head selection and distribution regularizers to sharpen discriminability for efficient listwise reranking.

Evaluating Tool-Using Language Agents: Judge Reliability, Propagation Cascades, and Runtime Mitigation in AgentProp-Bench

cs.AI · 2026-04-17 · conditional · novelty 7.0

AgentProp-Bench shows substring judging agrees with humans at kappa=0.049, LLM ensemble at 0.432, bad-parameter injection propagates with ~0.62 probability, rejection and recovery are independent, and a runtime fix cuts hallucinations 23pp on GPT-4o-mini but not Gemini-2.0-Flash.

Modular Representation Compression: Adapting LLMs for Efficient and Effective Recommendations

cs.IR · 2026-04-20 · unverdicted · novelty 6.0

LLMs exhibit a mid-layer representation advantage for recommendations; MARC compresses representations modularly to reduce costs while improving performance, as shown in a large-scale online advertising deployment.

Agentic Multi-Source Grounding for Enhanced Query Intent Understanding: A DoorDash Case Study

cs.AI · 2026-03-02 · unverdicted · novelty 6.0

An agentic multi-source grounding system for marketplace query intent achieves 90.7% accuracy on long-tail queries at DoorDash by combining catalog grounding, web search, and deterministic disambiguation, outperforming baselines by up to 13pp.

Access Paths for Efficient Ordering with Large Language Models

cs.DB · 2025-08-30 · unverdicted · novelty 6.0

Introduces the LLM ORDER BY semantic operator with algorithmic improvements, a semantic-aware external merge sort, and a budget-aware optimizer that selects near-optimal access paths for LLM-based ordering.

ProRank: Prompt Warmup via Reinforcement Learning for Small Language Models Reranking

cs.IR · 2025-06-04 · unverdicted · novelty 6.0

ProRank uses RL-based prompt warmup and fine-grained scoring to train small language models that surpass LLM rerankers on BEIR.

RankFlow: A Multi-Role Collaborative Reranking Workflow Utilizing Large Language Models

cs.IR · 2025-02-02 · unverdicted · novelty 6.0

RankFlow deploys four LLM roles in sequence to rewrite queries, generate pseudo-answers, summarize passages, and rerank candidates, outperforming prior methods on TREC-DL, BEIR, and NovelEval.

Active Learners as Efficient PRP Rerankers

cs.LG · 2026-05-14 · unverdicted · novelty 5.0 · 2 refs

Active learning applied to noisy LLM pairwise judgments improves NDCG@10 per call in budget-constrained reranking and enables unbiased aggregation via a randomized-direction single-call oracle.

From Unstructured to Structured: LLM-Guided Attribute Graphs for Entity Search and Ranking

cs.IR · 2026-04-30 · unverdicted · novelty 5.0

LLM-built attribute graphs enable zero-shot entity ranking in e-commerce with over 5% average precision gains and 57% less token usage per product compared to raw-text baselines.

Efficient Listwise Reranking with Compressed Document Representations

cs.IR · 2026-04-29 · unverdicted · novelty 5.0

RRK compresses documents to multi-token embeddings for efficient listwise reranking, enabling an 8B model to achieve 3x-18x speedups over smaller models with comparable or better effectiveness.

POPI: Personalizing LLMs via Optimized Natural Language Preference Inference

cs.CL · 2025-10-17 · unverdicted · novelty 5.0

POPI distills user preferences into reusable natural-language summaries via a shared inference model and conditions a generator on them, trained jointly with RL to improve personalization quality while cutting context length by up to 10x on benchmarks.

Improving Korean-English Cross-Lingual Retrieval: A Data-Centric Study of Language Composition and Model Merging

cs.IR · 2025-07-11 · unverdicted · novelty 5.0

Language composition in training data creates opposing effects on CLIR and mono-IR performance for Korean-English retrieval, which model merging can partially resolve.

A Case-Driven Multi-Agent Framework for E-Commerce Search Relevance

cs.IR · 2026-05-07 · unverdicted · novelty 3.0

A case-driven multi-agent system automates the full pipeline of bad-case detection, annotation, and resolution for e-commerce search relevance using Annotator, Optimizer, and User agents plus supporting components.

A Survey on the Memory Mechanism of Large Language Model based Agents

cs.AI · 2024-04-21 · accept · novelty 3.0

A systematic review of memory designs, evaluation methods, applications, limitations, and future directions for LLM-based agents.

A Survey on Retrieval-Augmented Text Generation for Large Language Models

cs.IR · 2024-04-17 · unverdicted · novelty 2.0

A survey that categorizes RAG methods for LLMs into four retrieval-centric stages, reviews their evolution and evaluation, and outlines challenges and future directions.

citing papers explorer

ErrorRadar: Benchmarking Complex Mathematical Reasoning of Multimodal Large Language Models Via Error Detection · cs.CL · 2024-10-06 · unverdicted · none · ref 85
ResRank: Unifying Retrieval and Listwise Reranking via End-to-End Joint Training with Residual Passage Compression · cs.IR · 2026-04-24 · conditional · none · ref 1
HeadRank: Decoding-Free Passage Reranking via Preference-Aligned Attention Heads · cs.IR · 2026-04-19 · unverdicted · none · ref 48
Evaluating Tool-Using Language Agents: Judge Reliability, Propagation Cascades, and Runtime Mitigation in AgentProp-Bench · cs.AI · 2026-04-17 · conditional · none · ref 30
Modular Representation Compression: Adapting LLMs for Efficient and Effective Recommendations · cs.IR · 2026-04-20 · unverdicted · none · ref 84
Agentic Multi-Source Grounding for Enhanced Query Intent Understanding: A DoorDash Case Study · cs.AI · 2026-03-02 · unverdicted · none · ref 30
Access Paths for Efficient Ordering with Large Language Models · cs.DB · 2025-08-30 · unverdicted · none · ref 80
ProRank: Prompt Warmup via Reinforcement Learning for Small Language Models Reranking · cs.IR · 2025-06-04 · unverdicted · none · ref 10
RankFlow: A Multi-Role Collaborative Reranking Workflow Utilizing Large Language Models · cs.IR · 2025-02-02 · unverdicted · none · ref 79
Active Learners as Efficient PRP Rerankers · cs.LG · 2026-05-14 · unverdicted · none · ref 41 · 2 links
From Unstructured to Structured: LLM-Guided Attribute Graphs for Entity Search and Ranking · cs.IR · 2026-04-30 · unverdicted · none · ref 35
Efficient Listwise Reranking with Compressed Document Representations · cs.IR · 2026-04-29 · unverdicted · none · ref 37
POPI: Personalizing LLMs via Optimized Natural Language Preference Inference · cs.CL · 2025-10-17 · unverdicted · none · ref 57
Improving Korean-English Cross-Lingual Retrieval: A Data-Centric Study of Language Composition and Model Merging · cs.IR · 2025-07-11 · unverdicted · none · ref 46
A Case-Driven Multi-Agent Framework for E-Commerce Search Relevance · cs.IR · 2026-05-07 · unverdicted · none · ref 34
A Survey on the Memory Mechanism of Large Language Model based Agents · cs.AI · 2024-04-21 · accept · none · ref 37
A Survey on Retrieval-Augmented Text Generation for Large Language Models · cs.IR · 2024-04-17 · unverdicted · none · ref 173
