archive
Every paper Pith has read.
1286 papers in cs.IR · page 14
-
GUI agents match outcomes but navigate search differently
Same Outcomes, Different Journeys: A Trace-Level Framework for Comparing Human and GUI-Agent Behavior in Production Search Systems
-
Automated loop lets support agent skills surpass expert versions
SkillForge: Forging Domain-Specific, Self-Evolving Agent Skills in Cloud Technical Support
-
Ensembles lift recommender accuracy 0.3-5.7% at 19-2549% energy cost
Ensembles at Any Cost? Accuracy-Energy Trade-offs in Recommender Systems
-
Learned graph memory lifts agent retrieval to 82.7 nDCG@10
Task-Adaptive Retrieval over Agentic Multi-Modal Web Histories via Learned Graph Memory
-
Reinforcement fine-tuning adds step-by-step reasoning to LLM recommenders
ReRec: Reasoning-Augmented LLM-based Recommendation Assistant via Reinforcement Fine-tuning
-
Probing flags LLM item gaps for selective knowledge fixes
Filling the Gaps: Selective Knowledge Augmentation for LLM Recommenders
-
Graph contexts raise fidelity in personalized review generation
PeReGrINE: Evaluating Personalized Review Fidelity with User Item Graph Context
-
Gradient selection trims data for adapting recommenders
Efficient Dataset Selection for Continual Adaptation of Generative Recommenders
-
Models outperform pipelines on alloy experiment extraction by tracking processing steps
LitXBench: A Benchmark for Extracting Experiments from Scientific Literature
-
LLMs pull full experiments from papers 0.37 F1 better than pipelines
LitXBench: A Benchmark for Extracting Experiments from Scientific Literature
-
Frontier models top extraction pipelines by 0.37 F1 on alloy experiments
LitXBench: A Benchmark for Extracting Experiments from Scientific Literature
-
DCD hierarchy narrows RAG scopes to lift accuracy
DCD: Domain-Oriented Design for Controlled Retrieval-Augmented Generation
-
AI search visibility requires repeated measurements
Don't Measure Once: Measuring Visibility in AI Search (GEO)
-
Hybrids improve both accuracy and diversity in recommendations
HiMARS: Hybrid multi-objective algorithms for recommender systems
-
Benchmark tests AI on comparing music across track pairs
Jamendo-MT-QA: A Benchmark for Multi-Track Comparative Music Question Answering
-
LLM framework lifts multimodal retrieval to 41.7 nDCG@10
HIVE: Query, Hypothesize, Verify -- An LLM Framework for Multimodal Reasoning-Intensive Retrieval
-
RL query alignment outperforms multimodal encoders on text retrieval
BRIDGE: Multimodal-to-Text Retrieval via Reinforcement-Learned Query Alignment
-
Rerank system lifts watch time and slashes latency
Dual-Rerank: Fusing Causality and Utility for Industrial Generative Reranking
-
VLM region descriptions align query rankings to lift document retrieval
ReAlign: Optimizing the Visual Document Retriever with Reasoning-Guided Fine-Grained Alignment
-
Artist catalogs double recall for new track recommendations
Leveraging Artist Catalogs for Cold-Start Music Recommendation
-
Reasoning pipeline lifts multimodal retrieval to 37.9 nDCG@10
MARVEL: Multimodal Adaptive Reasoning-intensiVe Expand-rerank and retrievaL
-
Intrinsic rewards strengthen LLM reasoning traces on complex queries
SubSearch: Intermediate Rewards for Unsupervised Guided Reasoning in Complex Retrieval
-
Agent views help AI write complex SQL queries
AV-SQL: Decomposing Complex Text-to-SQL Queries with Agentic Views
-
Persona signals from knowledge graphs boost session recommendations
Leveraging LLMs and Heterogeneous Knowledge Graphs for Persona-Driven Session-Based Recommendation
-
Calendar-time signals lift repurchase recommendation precision 8.6%
CASE: Cadence-Aware Set Encoding for Large-Scale Next Basket Repurchase Recommendation
-
Event memory retrieval produces physics-consistent actions
Event-Centric World Modeling with Memory-Augmented Retrieval for Embodied Decision-Making
-
Test checks if AI keeps facts straight across 250 stories
ATANT: An Evaluation Framework for AI Continuity
-
CubeGraph stitches per-cell vector graphs for fast hybrid spatial search
CubeGraph: Efficient Retrieval-Augmented Generation for Spatial and Temporal Data
-
LLM parser lifts missing-person data extraction F1 to 0.87 from 0.26
LLM-based Schema-Guided Extraction and Validation of Missing-Person Intelligence from Heterogeneous Data Sources
-
More data keeps improving recommender performance without saturation
The Unreasonable Effectiveness of Data for Recommender Systems
-
Retriever bias for LLM texts traces to training data
Data, Not Model: Explaining Bias toward LLM Texts in Neural Retrievers
-
Benchmark unifies evaluation across Brazilian legal collections
JUÁ -- A Benchmark for Information Retrieval in Brazilian Legal Text Collections
-
LLM rewriting cuts RAG retriever bias by 54 percent
Masking or Mitigating? Deconstructing the Impact of Query Rewriting on Retriever Biases in RAG
-
Multi-stage checks make LLM clinical extraction reliable at scale
A Multi-Stage Validation Framework for Trustworthy Large-scale Clinical Information Extraction using Large Language Models
-
LLM profiles beat paper history for reviewer matching
Beyond Paper-to-Paper: Structured Profiling and Rubric Scoring for Paper-Reviewer Matching
-
English bridges via reverse training boost cross-lingual retrieval up to 15%
CLEAR: Cross-Lingual Enhancement in Alignment via Reverse-training
-
WikiSeeker reassigns VLMs to refine queries and inspect retrieval
WikiSeeker: Rethinking the Role of Vision-Language Models in Knowledge-Based Visual Question Answering
-
LLM retrieval systems post 20% gains on old benchmarks
The LLM Effect on IR Benchmarks: A Meta-Analysis of Effectiveness, Baselines, and Contamination
-
Generative retrieval beats dense methods on LIMIT but drops with ambiguous IDs
Generative Retrieval Overcomes Limitations of Dense Retrieval but Struggles with Identifier Ambiguity
-
Topology extraction refines graphs for better heterogeneous graph learning
Graph Topology Information Enhanced Heterogeneous Graph Representation Learning
-
Siamese model detects semantic drift in links at 96% recall
SemLink: A Semantic-Aware Automated Test Oracle for Hyperlink Verification using Siamese Sentence-BERT
-
2.8k samples correct English bias in multilingual retrieval
Improving Semantic Proximity in Information Retrieval through Cross-Lingual Alignment
-
Graph mined from agent trajectories improves tool sequencing
SkillGraph: Graph Foundation Priors for LLM Agent Tool Sequence Recommendation
-
Retrieval disagreement adapts person search models without labels
Pretrain-then-Adapt: Uncertainty-Aware Test-Time Adaptation for Text-based Person Search
-
Perturbing single evidence items exposes hidden RAG utility patterns
CUE-R: Beyond the Final Answer in Retrieval-Augmented Generation
-
Periodic updates plus augmentation improve LLM function calling
Data-Driven Function Calling Improvements in Large Language Model for Online Financial QA
-
ReAd refines sequential predictions using retrieved collaborative items
Retrieve-then-Adapt: Retrieval-Augmented Test-Time Adaptation for Sequential Recommendation
-
LLMs build pseudo overlaps so diffusion can transfer preferences across domains
From Clues to Generation: Language-Guided Conditional Diffusion for Cross-Domain Recommendation
-
Curriculum RL aligns recommendation explanations with ratings
Curr-RLCER: Curriculum Reinforcement Learning For Coherence Explainable Recommendation
-
VCG payments plus multi-fidelity optimization maximize welfare in LLM ads
Incentive-Aware Multi-Fidelity Optimization for Generative Advertising in Large Language Models