archive
Every paper Pith has read. Search by title, abstract, or pith.
1286 papers in cs.IR · page 7
-
ReformIR prevents drift as reformulation count rises
When More Reformulations Hurt: Avoiding Drift using Ranker Feedback
-
Iterative tree merging lifts cross-document RAG F1 by 25.9% over RAPTOR
Hierarchical Abstract Tree for Cross-Document Retrieval-Augmented Generation
-
Denoising emerges as main bottleneck for LLM retrieval
LLM-Oriented Information Retrieval: A Denoising-First Perspective
-
Dual experts separate habits from discovery in basket predictions
Time-Interval-Aware Disentangled Expert Modeling for Next-Basket Recommendation
-
SCARV stabilizes rankings in redundant NLP datasets
SCARV: Structure-Constrained Aggregation for Stable Sample Ranking in Redundant NLP Datasets
-
Table retrievers ignore explicit instructions on content and columns
FollowTable: A Benchmark for Instruction-Following Table Retrieval
-
Taxonomy negatives raise offline accuracy 2.6% but yield no online gain
Negative Data Mining for Contrastive Learning in Dense Retrieval at IKEA.com
-
Dynamic negative selection prevents DPO collapse
DynamicPO: Dynamic Preference Optimization for Recommendation
-
Gradual feature fading speeds efficiency rollouts by 5x
Intelligent Elastic Feature Fading: Enabling Model Retrain-Free Feature Efficiency Rollouts at Scale
-
Row-aware chunking slashes table fragments up to 56%
Structure-Aware Chunking for Tabular Data in Retrieval-Augmented Generation
-
14B RAG model reaches 68.75% of GPT-4o on chartered accountancy tasks
Retrieval-Augmented Reasoning for Chartered Accountancy
-
RSAT trains small language models to output step-by-step table reasoning in structured…
RSAT: Structured Attribution Makes Small Language Models Faithful Table Reasoners
-
Integrated cell citations raise small-model faithfulness 3.7x on tables
RSAT: Structured Attribution Makes Small Language Models Faithful Table Reasoners
-
LLM-dominant sites are prevalent and growing on the web
DeGenTWeb: A First Look at LLM-dominant Websites
-
Token-aware clustering makes multivector search nearly 10x faster
Efficient Multivector Retrieval with Token-Aware Clustering and Hierarchical Indexing
-
Templates from past queries boost Text-to-SQL accuracy 36%
Reliable Answers for Recurring Questions: Boosting Text-to-SQL Accuracy with Template Constrained Decoding
-
Human-likeness test fails to predict simulator ranking validity
SimEval-IR: A Unified Toolkit and Benchmark Suite for Evaluating User Simulators and Search Sessions
-
Evidence chains raise RAG reasoning accuracy at under 20% token cost
NeocorRAG: Less Irrelevant Information, More Explicit Evidence, and More Effective Recall via Evidence Chains
-
ObjectGraph cuts agent document tokens by 95% without accuracy loss
ObjectGraph: From Document Injection to Knowledge Traversal -- A Native File Format for the Agentic Era
-
AI Overviews appear for 51.5% of queries with different sources
How Generative AI Disrupts Search: An Empirical Study of Google Search, Gemini, and AI Overviews
-
Position embeddings speed LLM list recommendation 3x
Position-Aware Drafting for Inference Acceleration in LLM-Based Generative List-Wise Recommendation
-
One hub text outscores real captions for many images in CLIP
One Single Hub Text Breaks CLIP: Identifying Vulnerabilities in Cross-Modal Encoders via Hubness
-
Multimodal RAG gains 27% CIDEr by selecting fragments
Purifying Multimodal Retrieval: Fragment-Level Evidence Selection for RAG
-
LLM rerankers produce stable rankings regardless of candidate order
One Pass, Any Order: Position-Invariant Listwise Reranking for LLM-Based Recommendation
-
Survey maps benchmarks and taxonomy for reasoning-intensive retrieval
A Survey of Reasoning-Intensive Retrieval: Progress and Challenges
-
Adaptive reranking via corpus graph lifts reasoning retrieval
Reproducing Adaptive Reranking for Reasoning-Intensive IR
-
LLM reformulation gains vanish on neural retrievers
A Reproducibility Study of LLM-Based Query Reformulation
-
LLM attribute graphs boost zero-shot ranking precision over 5%
From Unstructured to Structured: LLM-Guided Attribute Graphs for Entity Search and Ranking
-
LLM framework cuts SOC triage time to under 10 minutes
Toward Autonomous SOC Operations: End-to-End LLM Framework for Threat Detection, Query Generation, and Resolution in Security Operations
-
Managed atomic nuggets raise RAG recall 42% and cut conflicts 55%
NuggetIndex: Governed Atomic Retrieval for Maintainable RAG
-
Log-retrieved queries plus LLM variants raise QPP accuracy up to 30%
RAQG-QPP: Query Performance Prediction with Retrieved Query Variants and Retrieval Augmented Query Generation
-
Multi-sensor system maps Pakistan floods daily in near real time
Continuous Flood Nowcasting in South Asia: A Multi-Sensor Ensemble Remote Sensing Framework for Flood Extent
-
LLM pipeline spots Snapchat trends at production scale
LLM-Enhanced Topical Trend Detection at Snapchat
-
Gated contrastive model boosts ranking metrics for review recommenders
A Gated Hybrid Contrastive Collaborative Filtering Recommendation
-
Reproduction confirms Hypencoder beats bi-encoders with faster search
Hypencoder Revisited: Reproducibility and Analysis of Non-Linear Scoring for First-Stage Retrieval
-
Multiple latent factors boost LLM recommendations
Factorized Latent Reasoning for LLM-based Recommendation
-
AgentSim builds 100k+ verifiable reasoning traces for RAG agents
AgentSim: A Platform for Verifiable Agent-Trace Simulation
-
User state representation beats algorithm choice in recommenders
The Bandit's Blind Spot: The Critical Role of User State Representation in Recommender Systems
-
Uncertainty-triggered retrieval raises F1 10% with 47% fewer calls
When to Retrieve During Reasoning: Adaptive Retrieval for Large Reasoning Models
-
DNNs prevent embedding collapse in feature interaction models
Understanding DNNs in Feature Interaction Models: A Dimensional Collapse Perspective
-
Compressed embeddings let 8B reranker run 3-18x faster than smaller models
Efficient Listwise Reranking with Compressed Document Representations
-
CARD introduces a generative recommendation framework that unifies textual
CARD: Non-Uniform Quantization of Visual Semantic Unit for Generative Recommendation
-
Targeted privacy noise plus meta-learning raises recommendation accuracy
Meta-Learning and Targeted Differential Privacy to Improve the Accuracy-Privacy Trade-off in Recommendations
-
Document AI stages barely correlate
Benchmarking Complex Multimodal Document Processing Pipelines: A Unified Evaluation Framework for Enterprise AI
-
Query-adaptive chunking boosts RAG F1 to 0.85
Query-Adaptive Semantic Chunking for Retrieval-Augmented Generation: A Dynamic Strategy with Contextual Window Expansion
-
Reflexive prompting fixes LLM recommender drift on complex domains
A Reproducibility Analysis of PO4ISR: Diagnosing and Mitigating Semantic Drift in LLM-Based Session Recommendation
-
Measure classification yields exact attribution formulas for some cases
Explaining the "Why": A Unified Framework for the Additive Attribution of Changes in Arbitrary Measures
-
Recency as spectral operator adapts multimodal recommendations
TimeMM: Time-as-Operator Spectral Filtering for Dynamic Multimodal Recommendation