archive

Every paper Pith has read. Search by title, abstract, or pith.

1286 papers in cs.IR · page 14

cs.IR 2026-04-09 reviewed

GUI agents match outcomes but navigate search differently
Same Outcomes, Different Journeys: A Trace-Level Framework for Comparing Human and GUI-Agent Behavior in Production Search Systems

Maria Movin +3
cs.IR 2026-04-09 reviewed

Automated loop lets support agent skills surpass expert versions
SkillForge: Forging Domain-Specific, Self-Evolving Agent Skills in Cloud Technical Support

Xingyan Liu +5
cs.IR 2026-04-09 reviewed

Ensembles lift recommender accuracy 0.3-5.7% at 19-2549% energy cost
Ensembles at Any Cost? Accuracy-Energy Trade-offs in Recommender Systems

Jannik Nitschke +2
cs.IR 2026-04-09 reviewed

Learned graph memory lifts agent retrieval to 82.7 nDCG@10
Task-Adaptive Retrieval over Agentic Multi-Modal Web Histories via Learned Graph Memory

Saman Forouzandeh +2
cs.IR 2026-04-09 reviewed

Reinforcement fine-tuning adds step-by-step reasoning to LLM recommenders
ReRec: Reasoning-Augmented LLM-based Recommendation Assistant via Reinforcement Fine-tuning

Jiani Huang +4
cs.IR 2026-04-09 reviewed

Probing flags LLM item gaps for selective knowledge fixes
Filling the Gaps: Selective Knowledge Augmentation for LLM Recommenders

Jaehyun Lee +3
cs.IR 2026-04-09 reviewed

Graph contexts raise fidelity in personalized review generation
PeReGrINE: Evaluating Personalized Review Fidelity with User Item Graph Context

Steven Au +1
cs.IR 2026-04-09 reviewed

Gradient selection trims data for adapting recommenders
Efficient Dataset Selection for Continual Adaptation of Generative Recommenders

Cathy Jiao +9
cs.IR 2026-04-08 reviewed

Models outperform pipelines on alloy experiment extraction by tracking processing steps
LitXBench: A Benchmark for Extracting Experiments from Scientific Literature

Curtis Chong +1
cs.IR 2026-04-08 reviewed

LLMs pull full experiments from papers 0.37 F1 better than pipelines
LitXBench: A Benchmark for Extracting Experiments from Scientific Literature

Curtis Chong +1
cs.IR 2026-04-08 reviewed

Frontier models top extraction pipelines by 0.37 F1 on alloy experiments
LitXBench: A Benchmark for Extracting Experiments from Scientific Literature

Curtis Chong +1
cs.IR 2026-04-08 reviewed

DCD hierarchy narrows RAG scopes to lift accuracy
DCD: Domain-Oriented Design for Controlled Retrieval-Augmented Generation

Valerii Kovalskii +4
cs.IR 2026-04-08 reviewed

AI search visibility requires repeated measurements
Don't Measure Once: Measuring Visibility in AI Search (GEO)

Julius Schulte +2
cs.IR 2026-04-08 reviewed

Hybrids improve both accuracy and diversity in recommendations
HiMARS: Hybrid multi-objective algorithms for recommender systems

Elaheh Lotfian +1
cs.IR 2026-04-08 reviewed

Benchmark tests AI on comparing music across track pairs
Jamendo-MT-QA: A Benchmark for Multi-Track Comparative Music Question Answering

Junyoung Koh +7
cs.IR 2026-04-08 reviewed

LLM framework lifts multimodal retrieval to 41.7 nDCG@10
HIVE: Query, Hypothesize, Verify An LLM Framework for Multimodal Reasoning-Intensive Retrieval

Mahmoud Abdalla +5
cs.IR 2026-04-08 reviewed

RL query alignment outperforms multimodal encoders on text retrieval
BRIDGE: Multimodal-to-Text Retrieval via Reinforcement-Learned Query Alignment

Mohamed Darwish Mounis +6
cs.IR 2026-04-08 reviewed

Rerank system lifts watch time and slashes latency
Dual-Rerank: Fusing Causality and Utility for Industrial Generative Reranking

Chao Zhang +7
cs.IR 2026-04-08 reviewed

VLM region descriptions align query rankings to lift document retrieval
ReAlign: Optimizing the Visual Document Retriever with Reasoning-Guided Fine-Grained Alignment

Hao Yang +8
cs.IR 2026-04-08 reviewed

Artist catalogs double recall for new track recommendations
Leveraging Artist Catalogs for Cold-Start Music Recommendation

Yan-Martin Tamm +6
cs.IR 2026-04-08 reviewed

Reasoning pipeline lifts multimodal retrieval to 37.9 nDCG@10
MARVEL: Multimodal Adaptive Reasoning-intensiVe Expand-rerank and retrievaL

Mahmoud SalahEldin Kasem +5
cs.IR 2026-04-08 reviewed

Intrinsic rewards strengthen LLM reasoning traces on complex queries
SubSearch: Intermediate Rewards for Unsupervised Guided Reasoning in Complex Retrieval

Roxana Petcu +2
cs.DB 2026-04-08 reviewed

Agent views help AI write complex SQL queries
AV-SQL: Decomposing Complex Text-to-SQL Queries with Agentic Views

Minh Tam Pham +5
cs.IR 2026-04-08 reviewed

Persona signals from knowledge graphs boost session recommendations
Leveraging LLMs and Heterogeneous Knowledge Graphs for Persona-Driven Session-Based Recommendation

Muskan Gupta +2
cs.IR 2026-04-08 reviewed

Calendar-time signals lift repurchase recommendation precision 8.6%
CASE: Cadence-Aware Set Encoding for Large-Scale Next Basket Repurchase Recommendation

Yanan Cao +5
cs.LG 2026-04-08 reviewed

Event memory retrieval produces physics-consistent actions
Event-Centric World Modeling with Memory-Augmented Retrieval for Embodied Decision-Making

Zhaowen Fan +1
cs.AI 2026-04-08 reviewed

Test checks if AI keeps facts straight across 250 stories
ATANT: An Evaluation Framework for AI Continuity

Samuel Sameer Tanguturi
cs.DB 2026-04-08 reviewed

CubeGraph stitches per-cell vector graphs for fast hybrid spatial search
CubeGraph: Efficient Retrieval-Augmented Generation for Spatial and Temporal Data

Mingyu Yang +2
cs.CL 2026-04-08 reviewed

LLM parser lifts missing-person data extraction F1 to 0.87 from 0.26
LLM-based Schema-Guided Extraction and Validation of Missing-Person Intelligence from Heterogeneous Data Sources

Joshua Castillo +1
cs.IR 2026-04-07 reviewed

More data keeps improving recommender performance without saturation
The Unreasonable Effectiveness of Data for Recommender Systems

Youssef Abdou
cs.IR 2026-04-07 reviewed

Retriever bias for LLM texts traces to training data
Data, Not Model: Explaining Bias toward LLM Texts in Neural Retrievers

Wei Huang +5
cs.IR 2026-04-07 reviewed

Benchmark unifies evaluation across Brazilian legal collections
JUÁ -- A Benchmark for Information Retrieval in Brazilian Legal Text Collections

Jayr Pereira +4
cs.IR 2026-04-07 reviewed

LLM rewriting cuts RAG retriever bias by 54 percent
Masking or Mitigating? Deconstructing the Impact of Query Rewriting on Retriever Biases in RAG

Agam Goyal +5
cs.CL 2026-04-07 reviewed

Multi-stage checks make LLM clinical extraction reliable at scale
A Multi-Stage Validation Framework for Trustworthy Large-scale Clinical Information Extraction using Large Language Models

Maria Mahbub +11
cs.IR 2026-04-07 reviewed

LLM profiles beat paper history for reviewer matching
Beyond Paper-to-Paper: Structured Profiling and Rubric Scoring for Paper-Reviewer Matching

Yicheng Pan +3
cs.CL 2026-04-07 reviewed

English bridges via reverse training boost cross-lingual retrieval up to 15%
CLEAR: Cross-Lingual Enhancement in Alignment via Reverse-training

Seungyoon Lee +5
cs.CV 2026-04-07 reviewed

WikiSeeker reassigns VLMs to refine queries and inspect retrieval
WikiSeeker: Rethinking the Role of Vision-Language Models in Knowledge-Based Visual Question Answering

Yingjian Zhu +5
cs.IR 2026-04-07 reviewed

LLM retrieval systems post 20% gains on old benchmarks
The LLM Effect on IR Benchmarks: A Meta-Analysis of Effectiveness, Baselines, and Contamination

Moritz Staudinger +2
cs.IR 2026-04-07 reviewed

Generative retrieval beats dense methods on LIMIT but drops with ambiguous IDs
Generative Retrieval Overcomes Limitations of Dense Retrieval but Struggles with Identifier Ambiguity

Adrian Bracher +1
cs.LG 2026-04-07 reviewed

Topology extraction refines graphs for better heterogeneous learning
Graph Topology Information Enhanced Heterogeneous Graph Representation Learning

He Zhao +3
cs.SE 2026-04-07 reviewed

Siamese model detects semantic drift in links at 96% recall
SemLink: A Semantic-Aware Automated Test Oracle for Hyperlink Verification using Siamese Sentence-BERT

Guan-Yan Yang +4
cs.IR 2026-04-07 reviewed

2.8k samples correct English bias in multilingual retrieval
Improving Semantic Proximity in Information Retrieval through Cross-Lingual Alignment

Seongtae Hong +4
cs.AI 2026-04-07 reviewed

Graph mined from agent trajectories improves tool sequencing
SkillGraph: Graph Foundation Priors for LLM Agent Tool Sequence Recommendation

Hao Liu +1
cs.IR 2026-04-07 reviewed

Retrieval disagreement adapts person search models without labels
Pretrain-then-Adapt: Uncertainty-Aware Test-Time Adaptation for Text-based Person Search

Jiahao Zhang +3
cs.IR 2026-04-07 reviewed

Perturbing single evidence items exposes hidden RAG utility patterns
CUE-R: Beyond the Final Answer in Retrieval-Augmented Generation

Siddharth Jain +1
cs.IR 2026-04-07 reviewed

Periodic updates plus augmentation improve LLM function calling
Data-Driven Function Calling Improvements in Large Language Model for Online Financial QA

Xing Tang +9
cs.IR 2026-04-07 reviewed

ReAd refines sequential predictions using retrieved collaborative items
Retrieve-then-Adapt: Retrieval-Augmented Test-Time Adaptation for Sequential Recommendation

Xing Tang +8
cs.IR 2026-04-07 reviewed

LLMs build pseudo overlaps so diffusion can transfer preferences across domains
From Clues to Generation: Language-Guided Conditional Diffusion for Cross-Domain Recommendation

Ziang Lu +3
cs.IR 2026-04-07 reviewed

Curriculum RL aligns recommendation explanations with ratings
Curr-RLCER: Curriculum Reinforcement Learning For Coherence Explainable Recommendation

Xiangchen Pan +1
cs.GT 2026-04-07 reviewed

VCG payments plus multi-fidelity optimization maximize welfare in LLM ads
Incentive-Aware Multi-Fidelity Optimization for Generative Advertising in Large Language Models

Jiayuan Liu +6