archive

Every paper Pith has read.

1286 papers in cs.IR · page 1

cs.IR 2026-05-22 reviewed

One model ranks items, carousels and search via user stories
TubiFM: Unified Item, Carousel, and Search Ranking for Streaming Discovery

Alexandre Salle +9
cs.IR 2026-05-22 reviewed

Generative search engines cite AI sources in 16% of cases
Synthetic Sources?: Auditing Generative Search Engine Citations for Evidence of AI-Generated Sources

Mowafak Allaham +1
cs.DL 2026-05-22 reviewed

University of Nigeria produced 6,353 papers in 2014-2023
Tracking a Decade of Research at the University of Nigeria, Nsukka: A Scientometric Analysis (2014-2023)

Muneer Ahmad +1
cs.IR 2026-05-22 reviewed

Three-phase recipe keeps 98% precision in 190M retrieval models
HARNESS-LM: A Three-Phase Training Recipe for Harnessing SLMs in Sponsored Search Retrieval

Vipul Gupta +6
cs.LG 2026-05-22 reviewed

Low dimension suffices for near-max retrieval margins
Is Dimensionality a Barrier for Retrieval Models?

Kiril Bangachev +3
cs.IR 2026-05-22 reviewed

Trajectory merging cuts error buildup in iterative DPO
TPMM-DPO: Trajectory-aware Preference-guided Model Merging for Iterative Direct Preference Optimization

Lingling Fu +1
cs.IR 2026-05-22 reviewed

1B generative recommender backbone beats 2M baseline on MRR
Towards Generalizable and Efficient Large-Scale Generative Recommenders

Qiuling Xu +2
cs.IR 2026-05-22 reviewed

Asymmetric head-to-tail transfer lifts CTR in long-tail rec
From Head to Tail: Asymmetric Knowledge Transfer in Long-tail Recommendation with Generative Semantic IDs

Chenyi Yan +5
cs.LG 2026-05-22 reviewed

RankElastor stabilizes rank trajectories for scaled recommenders
Expand More, Shrink Less: Shaping Effective-Rank Dynamics for Dense Scaling in Recommendation

Guoming Li +9
cs.LG 2026-05-21 reviewed

Two-stage pipeline keeps sensitive mobile data on device for recommendations
Building a privacy-preserving Federated Recommender system for mobile devices

Aasheesh Singh
cs.IR 2026-05-21 reviewed

LaTeX source yields better RAG chunks than PDF text
AI-Friendly LaTeX: Using LaTeX Code as a Knowledge Source for Retrieval-Augmented Generation

Tom Verhoeff
cs.IR 2026-05-21 reviewed

Tables in model cards raise search coverage
Diversed Model Discovery via Structured Table Discovery

Zhengyuan Dong +1
cs.CL 2026-05-21 reviewed

Any embedding model can rank first with the right prompt
One prompt is not enough: Instruction Sensitivity Undermines Embedding Model Evaluation

Yevhen Kostiuk +1
cs.AI 2026-05-21 reviewed

Self-distillation drives search reasoners to 0.440 EM
Search-E1: Self-Distillation Drives Self-Evolution in Search-Augmented Reasoning

Zihan Liang +6
cs.CL 2026-05-21 reviewed

Generative re-ranker lifts biomedical linking accuracy 3-24%
BeLink: Biomedical Entity Linking Meets Generative Re-Ranking

Darya Shlyk +2
cs.IR 2026-05-21 reviewed

Chain-of-thought steps lift generative retrieval by 6.86% on multi-hop tasks
Integrating Chain-of-Thought into Generative Retrieval: A Preliminary Study

Wenhao Zhang +4
cs.CV 2026-05-21 reviewed

OMR tops matched music score search
Direct content-based retrieval from music scores images

Noelia Luna-Barahona +4
cs.IR 2026-05-21 reviewed

Calibration step lifts multimodal recs using only training overlaps
Behavior-Guided Candidate Calibration for Multimodal Recommendation

Zesheng Li +2
cs.CL 2026-05-21 reviewed

RoBERTa reaches 93 percent accuracy on IMDb sentiment task
From TF-IDF to Transformers: A Comparative and Ensemble Approach to Sentiment Classification

Dip Biswas Shanto +3
cs.IR 2026-05-21 reviewed

One autoregressive model handles both recommendations and chat
Generative Conversational Recommender System

Sixiao Zhang +2
cs.IR 2026-05-21 reviewed

LLM semantic retrieval raises ad recommendation stability
LLM Retrieval for Stable and Predictable Ad Recommendations

Vinodh Kumar Sunkara +15
cs.IR 2026-05-21 reviewed

Rec head supplies RL rewards to align LLM reasoning with item predictions
Reinforced Preference Optimization for Reasoning-Augmented Recommendations

Jingtong Gao +9
cs.IR 2026-05-20 reviewed

Seed-guided LLMs match real query lengths 7.5x better than baselines
Bridging the Cold-Start Gap: LLM-Powered Synthetic Data Generation for Natural Language Search at Airbnb

Wendy Ran Wei +11
cs.AI 2026-05-20 reviewed

43M-paper graph gives AI agents deterministic cross-field links
SciAtlas: A Large-Scale Knowledge Graph for Automated Scientific Research

Shuofei Qiao +10
cs.IR 2026-05-20 reviewed

Explicit principles improve legal citation retrieval
SG-LegalCite: A Principle-Augmented Benchmark for Legal Citation Retrieval in Singapore Law

Shannon Lee Yueh Ern +4
cs.IR 2026-05-20 reviewed

Tests show memory systems mismatch retrieval and answers under conflicts
MemConflict: Evaluating Long-Term Memory Systems Under Memory Conflicts

Zhen Tao +6
cs.CL 2026-05-20 reviewed

7B open LLMs run GraphRAG locally for EHR schema queries
GraphRAG on Consumer Hardware: Benchmarking Local LLMs for Healthcare EHR Schema Retrieval

Peter Fernandes +1
cs.IR 2026-05-20 reviewed

Dual memory layers give LLMs unbounded conversation context
CALMem : Application-Layer Dual Memory for Conversational AI

Rajendra Narayan Jena +2
cs.CL 2026-05-20 reviewed

Self-limiting losses compress embeddings without overfitting
DIVE: Embedding Compression via Self-Limiting Gradient Updates

Dongfang Zhao
cs.IR 2026-05-20 reviewed

Middle-layer compression raises reranker speed up to 116%
Layer-wise Token Compression for Efficient Document Reranking

Shengyao Zhuang +2
cs.LG 2026-05-19 reviewed

Gating ensemble harvests reliable negatives for fraud models
SAGE: Scalable Automatic Gating Ensemble for Confident Negative Harvesting in Fraud Detection

Sudheer Tubati +1
cs.CR 2026-05-19 reviewed

Bidirectional ranking cuts RAG poisoning attacks by 54%
BiRD: A Bidirectional Ranking Defense Mechanism for Retrieval Augmented Generation

Chengcai Gao +4
cs.CR 2026-05-19 reviewed

Colluding accounts scale RAG privacy loss by sqrt(k)
Auditing Privacy in Multi-Tenant RAG under Account Collusion

Florian A. D. Burnat +1
cs.IR 2026-05-19 reviewed

Multi-source sampling breaks negative feedback loops in recommendations
Divergence Meets Consensus: A Multi-Source Negative Sampling Framework for Sequential Recommendation

Yuanzi Li +6
cs.IR 2026-05-19 reviewed

Wacky weights help SPLADE mainly inside the training domain
Understanding Wacky Weights: A Dissection of SPLADE's Learned Term Importance

Gregory Polyakov +2
cs.IR 2026-05-18 reviewed

TF-IDF beats GPT-4o at finding astronomy expert reviewers
Traditional statistical representations outperform generative AI in identifying expert peer reviewers

Vicente Amado Olivo +7
cs.IR 2026-05-18 reviewed

q-log odds lift BM25 NDCG@10 by 89% on code search
Improving BM25 Code Retrieval Under Fixed Generic Tokenization: Adaptive q-Log Odds as a Drop-In BM25 Fix

Santosh Kumar Radha +1
cs.CL 2026-05-18 reviewed

Wiki beats RAG on cross-paper links but costs more tokens
Vector RAG vs LLM-Compiled Wiki: A Preregistered Comparison on a Small Multi-Domain Research

Theodore O. Cochran
cs.IR 2026-05-18 reviewed

Text guidance focuses full images for cropped-query e-commerce search
TIGER-FG: Text-Guided Implicit Fine-Grained Grounding for E-commerce Retrieval

Xinyu Sun +7
cs.AI 2026-05-18 reviewed

Self-distillation supplies step-level search signals from own rollouts
SD-Search: On-Policy Hindsight Self-Distillation for Search-Augmented Reasoning

Yufei Ma +8
cs.CL 2026-05-18 reviewed

Preference focus cuts on-device RAG memory 2,400-fold
From Volume to Value: Preference-Aligned Memory Construction for On-Device RAG

Changmin Lee +2
cs.IR 2026-05-18 reviewed

Prompting methods raise table QA accuracy without training
Efficient Table QA via TableGrid Navigation and Progressive Inference Prompting

Amritansh Maurya +3
cs.IR 2026-05-18 reviewed

RCTEA aligns temporal entities via richness-guided fusion
RCTEA: Richness-guided Co-training for Temporal Entity Alignment

Jiayun Li +5
cs.CL 2026-05-18 reviewed

SomaliWeb v1 delivers 303M tokens of cleaned Somali text
SomaliWeb v1: A Quality-Filtered Somali Web Corpus with a Matched Tokenizer and a Public Language-Identification Benchmark

Khalid Yusuf Dahir
cs.IR 2026-05-18 reviewed

LLM pseudoqueries from table profiles improve dataset search
PIPER: Content-Based Table Search via profiling and LLM-Generated Pseudoqueries

Riccardo Terrenzi +3
cs.CR 2026-05-18 reviewed

Indirect injections hijack chatbots to leak user data
An Empirical Study of Privacy Leakage Chains via Prompt Injection in Black-Box Chatbot Environments

Hongjang Yang +2
cs.IR 2026-05-18 reviewed

SynGR boosts generative recs by limiting dominant modalities
SynGR: Unleashing the Potential of Cross-Modal Synergy for Generative Recommendation

Wei Chen +8
cs.IR 2026-05-18 reviewed

Dynamic modulation replaces static IDs in multimodal recommendations
Modality-Aware Identity Construction and Counterfactual Structure Learning for ID-Free Multimodal Recommendation

Hongjian Ma +4
cs.IR 2026-05-18 reviewed

E-commerce search lifts new-item GMV 5.3 percent via long-term value estimates
Towards Sustainable Growth: A Multi-Value-Aware Retrieval Framework for E-Commerce Search

Yifan Wang +4