archive

Every paper Pith has read.

1286 papers in cs.IR · page 7

cs.IR 2026-05-01 reviewed

ReformIR prevents drift as reformulation count rises
When More Reformulations Hurt: Avoiding Drift using Ranker Feedback

V Venktesh +2
cs.LG 2026-05-01 reviewed

Iterative tree merging lifts cross-document RAG F1 by 25.9% over RAPTOR
Hierarchical Abstract Tree for Cross-Document Retrieval-Augmented Generation

Ziwen Zhao +1
cs.IR 2026-05-01 reviewed

Denoising emerges as main bottleneck for LLM retrieval
LLM-Oriented Information Retrieval: A Denoising-First Perspective

Lu Dai +6
cs.IR 2026-05-01 reviewed

Dual experts separate habits from discovery in basket predictions
Time-Interval-Aware Disentangled Expert Modeling for Next-Basket Recommendation

Zhiying Deng +5
cs.IR 2026-05-01 reviewed

SCARV stabilizes rankings in redundant NLP datasets
SCARV: Structure-Constrained Aggregation for Stable Sample Ranking in Redundant NLP Datasets

Xu Zheng +4
cs.IR 2026-05-01 reviewed

Table retrievers ignore explicit instructions on content and columns
FollowTable: A Benchmark for Instruction-Following Table Retrieval

Rihui Jin +9
cs.IR 2026-05-01 reviewed

Taxonomy negatives raise offline accuracy 2.6% but yield no online gain
Negative Data Mining for Contrastive Learning in Dense Retrieval at IKEA.com

Eva Agapaki +1
cs.IR 2026-05-01 reviewed

Dynamic negative selection prevents DPO collapse
DynamicPO: Dynamic Preference Optimization for Recommendation

Xingyu Hu +9
cs.IR 2026-05-01 reviewed

Gradual feature fading speeds efficiency rollouts by 5x
Intelligent Elastic Feature Fading: Enabling Model Retrain-Free Feature Efficiency Rollouts at Scale

Jieming Di +23
cs.CL 2026-05-01 reviewed

Row-aware chunking slashes table fragments up to 56%
Structure-Aware Chunking for Tabular Data in Retrieval-Augmented Generation

Pooja Guttal +5
cs.CL 2026-04-30 reviewed

14B RAG model reaches 68.75% of GPT-4o on CA tasks
Retrieval-Augmented Reasoning for Chartered Accountancy

Jatin Gupta +3
cs.CL 2026-04-30 reviewed

RSAT trains small language models to output step-by-step table reasoning in structured…
RSAT: Structured Attribution Makes Small Language Models Faithful Table Reasoners

Jugal Gajjar +1
cs.CL 2026-04-30 reviewed

Integrated cell citations raise small-model faithfulness 3.7x on tables
RSAT: Structured Attribution Makes Small Language Models Faithful Table Reasoners

Jugal Gajjar +1
cs.NI 2026-04-30 reviewed

LLM-dominant sites are prevalent and growing on the web
DeGenTWeb: A First Look at LLM-dominant Websites

Sichang Steven He +3
cs.IR 2026-04-30 reviewed

Token-aware clustering makes multivector search nearly 10x faster
Efficient Multivector Retrieval with Token-Aware Clustering and Hierarchical Indexing

Silvio Martinico +3
cs.CL 2026-04-30 reviewed

Templates from past queries boost Text-to-SQL accuracy 36%
Reliable Answers for Recurring Questions: Boosting Text-to-SQL Accuracy with Template Constrained Decoding

Smit Jivani +2
cs.IR 2026-04-30 reviewed

Human-likeness test fails to predict simulator ranking validity
SimEval-IR: A Unified Toolkit and Benchmark Suite for Evaluating User Simulators and Search Sessions

Saber Zerhoudi
cs.IR 2026-04-30 reviewed

Evidence chains raise RAG reasoning accuracy at under 20% token cost
NeocorRAG: Less Irrelevant Information, More Explicit Evidence, and More Effective Recall via Evidence Chains

Shiyao Peng +9
cs.AI 2026-04-30 reviewed

ObjectGraph cuts agent document tokens by 95% without accuracy loss
ObjectGraph: From Document Injection to Knowledge Traversal -- A Native File Format for the Agentic Era

Mohit Dubey +1
cs.IR 2026-04-30 reviewed

AI Overviews appear for 51.5% of queries with different sources
How Generative AI Disrupts Search: An Empirical Study of Google Search, Gemini, and AI Overviews

Riley Grossman +5
cs.IR 2026-04-30 reviewed

Position embeddings speed LLM list recommendation 3x
Position-Aware Drafting for Inference Acceleration in LLM-Based Generative List-Wise Recommendation

Jiaju Chen +6
cs.CL 2026-04-30 reviewed

One hub text outscores real captions for many images in CLIP
One Single Hub Text Breaks CLIP: Identifying Vulnerabilities in Cross-Modal Encoders via Hubness

Hiroyuki Deguchi +2
cs.IR 2026-04-30 reviewed

Multimodal RAG gains 27% CIDEr by selecting fragments
Purifying Multimodal Retrieval: Fragment-Level Evidence Selection for RAG

Xihang Wang +6
cs.IR 2026-04-30 reviewed

LLM rerankers produce stable rankings regardless of candidate order
One Pass, Any Order: Position-Invariant Listwise Reranking for LLM-Based Recommendation

Ethan Bito +2
cs.IR 2026-04-30 reviewed

Survey maps benchmarks and taxonomy for reasoning-intensive retrieval
A Survey of Reasoning-Intensive Retrieval: Progress and Challenges

Yiyang Wei +3
cs.IR 2026-04-30 reviewed

Adaptive reranking via corpus graph lifts reasoning retrieval
Reproducing Adaptive Reranking for Reasoning-Intensive IR

Mandeep Rathee +3
cs.IR 2026-04-30 reviewed

LLM reformulation gains vanish on neural retrievers
A Reproducibility Study of LLM-Based Query Reformulation

Amin Bigdeli +6
cs.IR 2026-04-30 reviewed

LLM attribute graphs boost zero-shot ranking precision over 5%
From Unstructured to Structured: LLM-Guided Attribute Graphs for Entity Search and Ranking

Yilun Zhu +2
cs.CR 2026-04-30 reviewed

LLM framework cuts SOC triage time to under 10 minutes
Toward Autonomous SOC Operations: End-to-End LLM Framework for Threat Detection, Query Generation, and Resolution in Security Operations

Md Hasan Saju +1
cs.IR 2026-04-30 reviewed

Managed atomic nuggets raise RAG recall 42% and cut conflicts 55%
NuggetIndex: Governed Atomic Retrieval for Maintainable RAG

Saber Zerhoudi +2
cs.IR 2026-04-29 reviewed

Log-retrieved queries plus LLM variants raise QPP accuracy up to 30%
RAQG-QPP: Query Performance Prediction with Retrieved Query Variants and Retrieval Augmented Query Generation

Fangzheng Tian +2
physics.ao-ph 2026-04-29 reviewed

Multi-sensor system maps Pakistan floods daily in near real time
Continuous Flood Nowcasting in South Asia: A Multi-Sensor Ensemble Remote Sensing Framework for Flood Extent

Usman Nazir +3
cs.IR 2026-04-29 reviewed

LLM pipeline spots Snapchat trends at production scale
LLM-Enhanced Topical Trend Detection at Snapchat

Hangqi Zhao +8
cs.IR 2026-04-29 reviewed

Gated contrastive model boosts ranking metrics for review recommenders
A Gated Hybrid Contrastive Collaborative Filtering Recommendation

Eduardo Ferreira da Silva +8
cs.IR 2026-04-29 reviewed

Reproduction confirms Hypencoder beats bi-encoders with faster search
Hypencoder Revisited: Reproducibility and Analysis of Non-Linear Scoring for First-Stage Retrieval

Arne Eichholtz +4
cs.IR 2026-04-29 reviewed

Multiple latent factors boost LLM recommendations
Factorized Latent Reasoning for LLM-based Recommendation

Tianqi Gao +5
cs.IR 2026-04-29 reviewed

AgentSim builds 100k+ verifiable reasoning traces for RAG agents
AgentSim: A Platform for Verifiable Agent-Trace Simulation

Saber Zerhoudi +2
cs.IR 2026-04-29 reviewed

User state representation beats algorithm choice in recommenders
The Bandit's Blind Spot: The Critical Role of User State Representation in Recommender Systems

Pedro R. Pires +4
cs.IR 2026-04-29 reviewed

Uncertainty-triggered retrieval raises F1 10% with 47% fewer calls
When to Retrieve During Reasoning: Adaptive Retrieval for Large Reasoning Models

Dongxin Guo +2
cs.LG 2026-04-29 reviewed

DNNs prevent embedding collapse in feature interaction models
Understanding DNNs in Feature Interaction Models: A Dimensional Collapse Perspective

Jiancheng Wang +3
cs.IR 2026-04-29 reviewed

Compressed embeddings let 8B reranker run 3-18x faster than smaller models
Efficient Listwise Reranking with Compressed Document Representations

Hervé Déjean +1
cs.IR 2026-04-29 reviewed

CARD introduces a generative recommendation framework that unifies textual
CARD: Non-Uniform Quantization of Visual Semantic Unit for Generative Recommendation

Yibiao Wei +6
cs.IR 2026-04-29 reviewed

Targeted privacy noise plus meta-learning raises recommendation accuracy
Meta-Learning and Targeted Differential Privacy to Improve the Accuracy-Privacy Trade-off in Recommendations

Peter Müllner +3
cs.CL 2026-04-29 reviewed

Document AI stages barely correlate
Benchmarking Complex Multimodal Document Processing Pipelines: A Unified Evaluation Framework for Enterprise AI

Saurabh K. Singh +1
cs.CL 2026-04-29 reviewed

Query-adaptive chunking boosts RAG F1 to 0.85
Query-Adaptive Semantic Chunking for Retrieval-Augmented Generation: A Dynamic Strategy with Contextual Window Expansion

Mudit Rastogi
cs.IR 2026-04-29 reviewed

Reflexive prompting fixes LLM recommender drift on complex domains
A Reproducibility Analysis of PO4ISR: Diagnosing and Mitigating Semantic Drift in LLM-Based Session Recommendation

Aditya Tiwari +2
cs.IR 2026-04-29 reviewed

Measure classification yields exact attribution formulas for some cases
Explaining the "Why": A Unified Framework for the Additive Attribution of Changes in Arbitrary Measures

Changsheng Zhou +5
cs.IR 2026-04-29 reviewed

Recency as spectral operator adapts multimodal recommendations
TimeMM: Time-as-Operator Spectral Filtering for Dynamic Multimodal Recommendation

Wei Yang +6