archive
Every paper Pith has read. Search by title, abstract, or pith.
1286 papers in cs.IR · page 1
-
One model ranks items, carousels and search via user stories
TubiFM: Unified Item, Carousel, and Search Ranking for Streaming Discovery
-
Generative search engines cite AI sources in 16% of cases
Synthetic Sources?: Auditing Generative Search Engine Citations for Evidence of AI-Generated Sources
-
University of Nigeria produced 6,353 papers in 2014-2023
Tracking a Decade of Research at the University of Nigeria, Nsukka: A Scientometric Analysis (2014-2023)
-
Three-phase recipe keeps 98% precision in 190M retrieval models
HARNESS-LM: A Three-Phase Training Recipe for Harnessing SLMs in Sponsored Search Retrieval
-
Low dimension suffices for near-max retrieval margins
Is Dimensionality a Barrier for Retrieval Models?
-
Trajectory merging cuts error buildup in iterative DPO
TPMM-DPO: Trajectory-aware Preference-guided Model Merging for Iterative Direct Preference Optimization
-
1B generative recommender backbone beats 2M baseline on MRR
Towards Generalizable and Efficient Large-Scale Generative Recommenders
-
Asymmetric head-to-tail transfer lifts CTR in long-tail rec
From Head to Tail: Asymmetric Knowledge Transfer in Long-tail Recommendation with Generative Semantic IDs
-
RankElastor stabilizes rank trajectories for scaled recommenders
Expand More, Shrink Less: Shaping Effective-Rank Dynamics for Dense Scaling in Recommendation
-
Two-stage pipeline keeps sensitive mobile data on device for recommendations
Building a privacy-preserving Federated Recommender system for mobile devices
-
LaTeX source yields better RAG chunks than PDF text
AI-Friendly LaTeX: Using LaTeX Code as a Knowledge Source for Retrieval-Augmented Generation
-
Tables in model cards raise search coverage
Diversed Model Discovery via Structured Table Discovery
-
Any embedding model can rank first with the right prompt
One prompt is not enough: Instruction Sensitivity Undermines Embedding Model Evaluation
-
Self-distillation drives search reasoners to 0.440 EM
Search-E1: Self-Distillation Drives Self-Evolution in Search-Augmented Reasoning
-
Generative re-ranker lifts biomedical linking accuracy by 3-24%
BeLink: Biomedical Entity Linking Meets Generative Re-Ranking
-
Chain-of-thought steps lift generative retrieval by 6.86% on multi-hop tasks
Integrating Chain-of-Thought into Generative Retrieval: A Preliminary Study
-
OMR tops matched music score search
Direct content-based retrieval from music scores images
-
Calibration step lifts multimodal recs using only training overlaps
Behavior-Guided Candidate Calibration for Multimodal Recommendation
-
RoBERTa reaches 93 percent accuracy on IMDb sentiment task
From TF-IDF to Transformers: A Comparative and Ensemble Approach to Sentiment Classification
-
One autoregressive model handles both recommendations and chat
Generative Conversational Recommender System
-
LLM semantic retrieval raises ad recommendation stability
LLM Retrieval for Stable and Predictable Ad Recommendations
-
Rec head supplies RL rewards to align LLM reasoning with item predictions
Reinforced Preference Optimization for Reasoning-Augmented Recommendations
-
Seed-guided LLMs match real query lengths 7.5x better than baselines
Bridging the Cold-Start Gap: LLM-Powered Synthetic Data Generation for Natural Language Search at Airbnb
-
43M-paper graph gives AI agents deterministic cross-field links
SciAtlas: A Large-Scale Knowledge Graph for Automated Scientific Research
-
Explicit principles improve legal citation retrieval
SG-LegalCite: A Principle-Augmented Benchmark for Legal Citation Retrieval in Singapore Law
-
Tests show retrieval and answers diverge in memory systems under conflicts
MemConflict: Evaluating Long-Term Memory Systems Under Memory Conflicts
-
7B open LLMs run GraphRAG locally for EHR schema queries
GraphRAG on Consumer Hardware: Benchmarking Local LLMs for Healthcare EHR Schema Retrieval
-
Dual memory layers give LLMs unbounded conversation context
CALMem: Application-Layer Dual Memory for Conversational AI
-
Self-limiting losses compress embeddings without overfitting
DIVE: Embedding Compression via Self-Limiting Gradient Updates
-
Middle-layer compression raises reranker speed by up to 116%
Layer-wise Token Compression for Efficient Document Reranking
-
Gating ensemble harvests reliable negatives for fraud models
SAGE: Scalable Automatic Gating Ensemble for Confident Negative Harvesting in Fraud Detection
-
Bidirectional ranking cuts RAG poisoning attacks by 54%
BiRD: A Bidirectional Ranking Defense Mechanism for Retrieval Augmented Generation
-
Colluding accounts scale RAG privacy loss by sqrt(k)
Auditing Privacy in Multi-Tenant RAG under Account Collusion
-
Multi-source sampling breaks negative feedback loops in recommendations
Divergence Meets Consensus: A Multi-Source Negative Sampling Framework for Sequential Recommendation
-
Wacky weights help SPLADE mainly inside the training domain
Understanding Wacky Weights: A Dissection of SPLADE's Learned Term Importance
-
TF-IDF beats GPT-4o at finding astronomy expert reviewers
Traditional statistical representations outperform generative AI in identifying expert peer reviewers
-
q-log odds lift BM25 NDCG@10 by 89% on code search
Improving BM25 Code Retrieval Under Fixed Generic Tokenization: Adaptive q-Log Odds as a Drop-In BM25 Fix
-
Wiki beats RAG on cross-paper links but costs more tokens
Vector RAG vs LLM-Compiled Wiki: A Preregistered Comparison on a Small Multi-Domain Research
-
Text guidance focuses full images for cropped-query e-commerce search
TIGER-FG: Text-Guided Implicit Fine-Grained Grounding for E-commerce Retrieval
-
Self-distillation supplies step-level search signals from own rollouts
SD-Search: On-Policy Hindsight Self-Distillation for Search-Augmented Reasoning
-
Preference focus cuts on-device RAG memory 2400x
From Volume to Value: Preference-Aligned Memory Construction for On-Device RAG
-
Prompting methods raise table QA accuracy without training
Efficient Table QA via TableGrid Navigation and Progressive Inference Prompting
-
RCTEA aligns temporal entities via richness-guided fusion
RCTEA: Richness-guided Co-training for Temporal Entity Alignment
-
SomaliWeb v1 delivers 303M tokens of cleaned Somali text
SomaliWeb v1: A Quality-Filtered Somali Web Corpus with a Matched Tokenizer and a Public Language-Identification Benchmark
-
LLM pseudoqueries from table profiles improve dataset search
PIPER: Content-Based Table Search via profiling and LLM-Generated Pseudoqueries
-
Indirect injections hijack chatbots to leak user data
An Empirical Study of Privacy Leakage Chains via Prompt Injection in Black-Box Chatbot Environments
-
SynGR boosts generative recs by limiting dominant modalities
SynGR: Unleashing the Potential of Cross-Modal Synergy for Generative Recommendation
-
Dynamic modulation replaces static IDs in multimodal recommendations
Modality-Aware Identity Construction and Counterfactual Structure Learning for ID-Free Multimodal Recommendation
-
E-commerce search lifts new-item GMV 5.3 percent via long-term value estimates
Towards Sustainable Growth: A Multi-Value-Aware Retrieval Framework for E-Commerce Search