archive
Every paper Pith has read.
1286 papers in cs.IR · page 4
-
18% of web searches concern places
Much of Geospatial Web Search Is Beyond Traditional GIS
-
One verification trace yields calibrated LLM-judge confidence
VERDI: Single-Call Confidence Estimation for Verification-Based LLM Judges via Decomposed Inference
-
Structured belief store beats vector search for LLM memory
Structured Belief State and the First Precision-Aware Benchmark for LLM Memory Retrieval
-
Representative stochastic ranker reaches near-parity exposure in RAG
Towards FairRAG: Preventing Representational Harm in Retrieval-Augmented Generation by Enforcing Fair Exposure at Retrieval Time
-
Locale boosting fixes US bias in global ranking models
Localization Boosting for Growth Markets: Mitigating Cross-Locale Behavioral Bias in Learning-to-Rank
-
LLM-assisted benchmark tests retrieval across four scholarly categories
MIRA: An LLM-Assisted Benchmark for Multi-Category Integrated Retrieval
-
Benchmark shows semantic plausibility misses real utility in LLM recs
RecoAtlas: From Semantic Plausibility to Set-Level Utility in LLM Recommendation Agents
-
Adaptive weights inside GNN message passing cut popularity bias
Debiasing Message Passing to Mitigate Popularity Bias in GNN-based Collaborative Filtering
-
Assertion-aware retrieval lifts clinical QA accuracy by 22 points
ClinicalBench: Stress-Testing Assertion-Aware Retrieval for Cross-Admission Clinical QA on MIMIC-IV
-
Cascaded generative model lifts e-commerce cart adds 2.7%
A Cascaded Generative Approach for e-Commerce Recommendations
-
Generative cascade boosts e-commerce cart adds by 2.7%
A Cascaded Generative Approach for e-Commerce Recommendations
-
Prompt optimization ranks second for EHR clinical QA
Neural at ArchEHR-QA 2026: One Method Fits All: Unified Prompt Optimization for Clinical QA over EHRs
-
BM25 lexical search hits 83% accuracy in LLM research agents
Rethinking Agentic Search with Pi-Serini: Is Lexical Retrieval Sufficient?
-
User context inside the loop lifts LLM research report relevance
Personalized Deep Research: A User-Centric Framework, Dataset, and Hybrid Evaluation for Knowledge Discovery
-
Iterative denoising unifies list reranking
UniRank: Unified List-wise Reranking via Confidence-Ordered Denoising
-
Synthetic probes can isolate how data traits affect LLM behavior
Position: Let's Develop Data Probes to Fundamentally Understand How Data Affects LLM Performance
-
LLM agents model group leadership to lift recommendation accuracy
AgentGR: Semantic-aware Agentic Group Decision-Making Simulator for Group Recommendation
-
LLM recommenders gain from anchoring ratings as numeric tokens
Every Preference Has Its Strength: Injecting Ordinal Semantics into LLM-Based Recommenders
-
Answer-aware reranking reaches 96% accuracy on Ukrainian document QA
Qwen Goes Brrr: Off-the-Shelf RAG for Ukrainian Multi-Domain Document Understanding
-
Local 9B model nears commercial LLM on FOIA privilege classification
To Redact, or not to Redact? A Local LLM Approach to Deliberative Process Privilege Classification
-
Latent reasoning halves steps while lifting generative recommendation accuracy
LASAR: Latent Adaptive Semantic Aligned Reasoning for Generative Recommendation
-
Benchmark scores abstract answers by topic coverage without comparisons
ASTRA-QA: A Benchmark for Abstract Question Answering over Documents
-
NumColBERT injects numeracy into ColBERT without pipeline overhaul
NumColBERT: Non-Intrusive Numeracy Injection for Late-Interaction Retrieval Models
-
Three-layer memory turns reading scrolls into tailored paper questions
H-MAPS: Hierarchical Memory-Augmented Proactive Search Assistant for Scientific Literature
-
CCD-aware scheduling lifts vector search throughput 3.7x
CCD-Level and Load-Aware Thread Orchestration for In-Memory Vector ANNS on Multi-Core CPUs
-
Query clustering and novel loss improve health intent accuracy
Enhancing Healthcare Search Intent Recognition with Query Representation Learning and Session Context
-
SABER improves RAG accuracy by choosing to trust or abstain
Trust or Abstain? A Self-Aware RAG Approach
-
LLM-RAG system raises average HEI scores by 6.45 points
An LLM-RAG Approach for Healthy Eating Index-Informed Personalized Food Recommendations
-
2 million Weibo photos benchmark AI on city space understanding
Urban-ImageNet: A Large-Scale Multi-Modal Dataset and Evaluation Framework for Urban Space Perception
-
Graph of codecs compresses data smaller and faster
OpenZL: Using Graphs to Compress Smaller and Faster
-
Black-box method flags LLM agent drift at 0.83 AUC
Nautilus Compass: Black-box Persona Drift Detection for Production LLM Agents
-
ReCoVR reaches 74% recall after one interactive video round
ReCoVR: Closing the Loop in Interactive Composed Video Retrieval
-
Hybrid system recommends coherent outfits from fashion catalogs
Loom: Hybrid Retrieval-Scoring Outfit Recommendation with Semantic Material Compatibility and Occasion-Aware Embedding Priors
-
LLM agents let users beat platform personalization with their own data
LLM Agents Enable User-Governed Personalization Beyond Platform Boundaries
-
Aggregate peaks occur at 3-5 times the individual exposure level
Simpson's Paradox in Behavioral Curves: How Aggregation Distorts Parametric Models of User Dynamics
-
MM-LLM captions lift recsys AUC by 0.35% at industrial scale
A General Framework for Multimodal LLM-Based Multimedia Understanding in Large-Scale Recommendation Systems
-
OpenIIR runs LLM persona simulations for IR research
OpenIIR: An Open Simulation Platform for Information Retrieval Research
-
Open platform runs LLM personas in four IR scenario types
OpenIIR: An Open Simulation Platform for Information Retrieval Research
-
Semantic search finds more hidden Locke receptions than word matching
Matching Meaning at Scale: Evaluating Semantic Search for 18th-Century Intellectual History through the Case of Locke
-
Semantic search finds more implicit Locke references than keywords
Matching Meaning at Scale: Evaluating Semantic Search for 18th-Century Intellectual History through the Case of Locke
-
Reddit music chats become 190k Deezer-grounded dialogues
Reddit2Deezer: A Scalable Dataset for Real-World Grounded Conversational Music Recommendation
-
Personalized privacy cuts infinite stream estimation error by 53.6%
Personalized w-Event Privacy for Infinite Stream Estimation
-
Semantic IDs enable efficient ultra-long user sequence modeling
UxSID: Semantic-Aware User Interests Modeling for Ultra-Long Sequence
-
Semantic IDs enable efficient modeling of ultra-long user sequences
UxSID: Semantic-Aware User Interests Modeling for Ultra-Long Sequence
-
LLM framework summarizes user histories into personas
UserGPT Technical Report
-
Bio-inspired memory cuts LLM agent storage by 58% at 97% precision
Human-Inspired Memory Architecture for LLM Agents
-
Multi-level contrastive learning improves knowledge graph recommendations
Multi-Level Graph Attention Network Contrastive Learning for Knowledge-Aware Recommendation
-
Exclusion distances raise filtered ANNS speed 1.3-5x
FAVOR: Efficient Filter-Agnostic Vector ANNS Based on Selectivity-Aware Exclusion Distances
-
Benchmark reveals three competency gaps in tourism recommenders
TRACE: Tourism Recommendation with Accountable Citation Evidence