archive
Every paper Pith has read.
1286 papers in cs.IR · page 16
-
Two-phase retrieval and LLM-guided evolution raise job match quality
Synapse: Evolving Job-Person Fit with Explainable Two-phase Retrieval and LLM-guided Genetic Resume Optimization
-
Scholarly recommenders must track volatile contexts and research strands
What Do Humanities Scholars Need? A User Model for Recommendation in Digital Archives
-
Recommenders should stop pushing novelty at user-specific points
Modeling User Exploration Saturation: When Recommender Systems Should Stop Pushing Novelty
-
Type routing lifts small models past large retrievers on chat memory
SelRoute: Query-Type-Aware Routing for Long-Term Conversational Memory Retrieval
-
Retrieval partially offsets smaller models on science tasks
Do We Need Bigger Models for Science? Task-Aware Retrieval with Small Language Models
-
Literature graphs project from tensor manifolds
Tensor Manifold-Based Graph-Vector Fusion for AI-Native Academic Literature Retrieval
-
LLM tool adds database functions 34 percent more accurately
Automating Database-Native Function Code Synthesis with LLMs
-
Binary encoding matches alphanumeric codes without training
Improving Search Suggestions for Alphanumeric Queries
-
DeepSlide matches slide visuals but lifts narrative flow and pacing
DeepSlide: From Artifacts to Presentation Delivery
-
ViT design choices aid active learning for cluttered object retrieval
Revisiting Human-in-the-Loop Object Retrieval with Pre-Trained Vision Transformers
-
Portuguese math benchmark shows LLM drops on figures and open answers
MATH-PT: A Math Reasoning Benchmark for European and Brazilian Portuguese
-
RAG assistant gives reliable answers on bachelor project rules
Generative AI-Based Virtual Assistant using Retrieval-Augmented Generation: An evaluation study for bachelor projects
-
Synthetic dictionary retrieves matches for 54% of unseen oracle bone characters
Decoding Ancient Oracle Bone Script via Generative Dictionary Retrieval
-
TF-IDF arises from test statistic for word burstiness
Common TF-IDF variants arise as key components in the test statistic of a penalized likelihood-ratio test for word burstiness
-
Agentic search narrows dense RAG's gap to GraphRAG
Do We Still Need GraphRAG? Benchmarking RAG and GraphRAG for Agentic Search Systems
-
GPU bucketing delivers 240x faster hybrid searches
GRAB-ANNS: High-Throughput Indexing and Hybrid Search via GPU-Native Bucketing
-
Router cuts large LLM use by nearly 30% in GraphRAG QA
GraphRAG-Router: Learning Cost-Efficient Routing over GraphRAGs and LLMs with Reinforcement Learning
-
Memory system reuses agent plans across unrelated tasks
APEX-EM: Non-Parametric Online Learning for Autonomous Agents via Structured Procedural-Episodic Experience Replay
-
Percentile calibration lifts last-hop retrieval on multi-hop QA
Calibrated Fusion for Heterogeneous Graph-Vector Retrieval in Multi-Hop QA
-
Agent trajectories train retrievers that raise recall and task success
Learning to Retrieve from Agent Trajectories
-
One LoRA toggles a model between retrieval and generation
Hydra: Unifying Document Retrieval and Generation in a Single Vision-Language Model
-
Data prep beats PDF tool choice in RAG accuracy
From PDF to RAG-Ready: Evaluating Document Conversion Frameworks for Domain-Specific Question Answering
-
SUMMIR ranks sports insights from LLMs while catching hallucinations
SUMMIR: A Hallucination-Aware Framework for Ranking Sports Insights from LLMs
-
Vision-language models boost Italian parliament speech transcripts
Transcription and Recognition of Italian Parliamentary Speeches Using Vision-Language Models
-
Concept-mediated graph lifts agent memory retrieval
GAAMA: Graph Augmented Associative Memory for Agents
-
PLT cache beats frequency cache on expected inference cost
Probabilistic Language Tries: A Unified Framework for Compression, Decision Policies, and Execution Reuse
-
LLM agent fuses lexical and embedding search to match queries to dataset metadata
A Reference Architecture for Agentic Hybrid Retrieval in Dataset Search
-
LLM app gives instant ASD conversation feedback
SocialWise: LLM-Agentic Conversation Therapy for Individuals with Autism Spectrum Disorder to Enhance Communication Skills
-
Metadata at file start routes LLM queries at 100% accuracy
Self-Describing Structured Data with Dual-Layer Guidance: A Lightweight Alternative to RAG for Precision Retrieval in Large-Scale LLM Knowledge Navigation
-
LLMs identify top articles with over 80% accuracy
Large language models for post-publication research evaluation: Evidence from expert recommendations and citation indicators
-
Length bias holds for causal late interaction models
Working Notes on Late Interaction Dynamics: Analyzing Targeted Behaviors of Late Interaction Models
-
Memory pipeline gives AI agents cross-session recall
Cognis: Context-Aware Memory for Conversational AI Agents
-
Static pipelines become self-evolving agent systems
Rethinking Recommendation Paradigms: From Pipelines to Agentic Recommender Systems
-
Agents automate recommender model reproduction from papers
AgenticRS-Architecture: System Design for Agentic Recommender Systems
-
Power-law forgetting emerges from interference in embeddings
The Geometry of Forgetting
-
AI oncology planner earns high clinician ratings on accuracy and safety
Clinical Reasoning AI for Oncology Treatment Planning: A Multi-Specialty Case-Based Evaluation
-
Metric spots incoherent multimodal inputs better than accuracy
Good Scores, Bad Data: A Metric for Multimodal Coherence
-
Hybrid retrieval resolves RAG trade-off in financial queries
Resolving the Robustness-Precision Trade-off in Financial RAG through Hybrid Document-Routed Retrieval
-
Positive-first criterion boosts rare visual category retrieval
Positive-First Most Ambiguous: A Simple Active Learning Criterion for Interactive Retrieval of Rare Categories
-
Generative search lifts CTR 4 percent by internalizing latent user reasoning
OneSearch-V2: The Latent Reasoning Enhanced Self-distillation Generative Search Framework
-
Lightweight filter cuts vision tokens for document parsing
Boosting Document Parsing Efficiency and Performance with Coarse-to-Fine Visual Processing
-
Joint data-model scaling lifts e-commerce purchases 1.7%
Joint Model Parameter Scaling and Universal-Domain Data Integration for E-commerce Search Ranking
-
LLM motives enable accurate recs in sparse industrial data
LLMAR: A Tuning-Free Recommendation Framework for Sparse and Text-Rich Industrial Domains
-
LLM moral advice responses reinforce human-like assumptions
Implicit Humanization in Everyday LLM Moral Judgments
-
Semantic shift, not length, drives embedding collapse
Pooling and Semantic Shift: The Fundamental Challenges in Long Text Embedding and Retrieval
-
Item-aware attention lets LLMs capture item-level collaborations
Beyond Tokens: Item-aware Attention for LLM-based Recommendation
-
Adaptive gamma from spectrum achieves near-optimal embedding compression
Spectral Tempering for Embedding Compression in Dense Passage Retrieval
-
Pairwise comparisons boost LLM paper ranking by 21.8% over baselines
From Isolated Scoring to Collaborative Ranking: A Comparison-Native Framework for LLM-Based Paper Evaluation
-
Lightweight profiler sets new record in citation recommendations
Public Profile Matters: A Scalable Integrated Approach to Recommend Citations in the Wild
-
Modular stages let small models answer farm questions accurately
AgriIR: A Scalable Framework for Domain-Specific Knowledge Retrieval