{"total":54,"items":[{"citing_arxiv_id":"2605.20724","ref_index":7,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"CALMem : Application-Layer Dual Memory for Conversational AI","primary_cat":"cs.IR","submitted_at":"2026-05-20T05:23:57+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"CALMem delivers virtually unbounded effective context for LLM conversations via an application-layer dual memory architecture with intra-session retrieval and token-adaptive injection.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.20683","ref_index":32,"ref_count":2,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Layer-wise Token Compression for Efficient Document Reranking","primary_cat":"cs.IR","submitted_at":"2026-05-20T03:52:31+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Layer-wise Token Compression applies adaptive token pooling at middle transformer layers for cross-encoder rerankers, preserving MS MARCO ranking quality while raising QPS up to 25% on passages and 116% on documents, with added gains on listwise LLM rerankers and a regularizer effect for long inputs","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.18029","ref_index":23,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"What Matters for Grocery Product Retrieval with Open Source Vision Language Models","primary_cat":"cs.CV","submitted_at":"2026-05-18T08:20:13+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Systematic zero-shot benchmarking of open-source VLMs on multimodal grocery product retrieval shows data quality outperforms scale, introduces semantic power density as an efficiency 
metric, and identifies a persistent top-1 precision gap.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.14503","ref_index":33,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Not All RAGs Are Created Equal: A Component-Wise Empirical Study for Software Engineering Tasks","primary_cat":"cs.SE","submitted_at":"2026-05-14T07:47:44+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Retriever-side choices, particularly the retrieval algorithm, exert more influence on RAG performance than generator selection across code generation, summarization, and repair tasks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.14053","ref_index":9,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Derivation Prompting: A Logic-Based Method for Improving Retrieval-Augmented Generation","primary_cat":"cs.CL","submitted_at":"2026-05-13T19:20:16+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Derivation Prompting constructs logic-based derivation trees in RAG generation to improve interpretability and reduce unacceptable answers compared to standard RAG or long-context methods in a case study.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.12260","ref_index":15,"ref_count":2,"confidence":0.98,"is_internal_anchor":true,"paper_title":"PRISM: Pareto-Efficient Retrieval over Intent-Aware Structured Memory for Long-Horizon Agents","primary_cat":"cs.CL","submitted_at":"2026-05-12T15:28:30+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"PRISM is a new inference-time retrieval system that achieves higher accuracy than 
baselines on long-horizon agent tasks while using an order of magnitude less context by combining hierarchical graph search, intent-based costing, compression, and adaptive routing over structured memory.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"compression step that brings per-query context to ∼2K tokens without sacrificing accuracy. Retrieval-side optimisation. A third line treats memory as a retrieval problem and optimises the selection of evidence sent to the answer model. Dense retrievers [10, 11] and lexical baselines [18] score passages by surface or embedding similarity; cross-encoder rerankers [15] re-score a shortlist with a query-conditioned model; and prompt-compression methods such as LLMLingua [8] prune tokens after retrieval. However, these methods are benchmarked on flat document collections and cannot exploit indirect evidence linked by causal, temporal, or evolution relations. In contrast, PRISM brings retrieval-side discipline to graph-structured memory by combining typed-path candidate"},{"citing_arxiv_id":"2605.11864","ref_index":46,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Very Efficient Listwise Multimodal Reranking for Long Documents","primary_cat":"cs.IR","submitted_at":"2026-05-12T09:45:59+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"ZipRerank delivers state-of-the-art multimodal listwise reranking accuracy for long documents at up to 10x lower latency via early interaction and single-pass scoring.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.10224","ref_index":28,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Hypothesis-Driven Deep Research with Large Language Models: A Structured Methodology for Automated Knowledge 
Discovery","primary_cat":"cs.AI","submitted_at":"2026-05-11T09:04:09+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"HDRI is a six-principle eight-stage framework for hypothesis-organized LLM research featuring gap-driven iteration, traceable fact reasoning, and subject locking, realized in INFOMINER with reported gains in fact density and completeness.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"5 Information Retrieval and Query Understanding Query understanding is a foundational component of modern search systems [27]. Intent classification, entity recognition, and query expansion are well-studied problems in information retrieval. Recent neural approaches have improved query understanding through contextual embeddings and pre-trained language models [28]. Our query understanding module extends traditional approaches by incorporating temporal context extraction (identifying time-related intent and computing appropriate temporal constraints for search queries) and complexity assessment (estimating the research depth required based on query characteristics). 
These capabilities are particularly important for research"},{"citing_arxiv_id":"2605.07381","ref_index":85,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Escaping the Diversity Trap in Robotic Manipulation via Anchor-Centric Adaptation","primary_cat":"cs.RO","submitted_at":"2026-05-08T07:35:24+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Anchor-Centric Adaptation escapes the diversity trap by prioritizing repeated demonstrations at core anchors over broad coverage, yielding higher success rates under fixed data budgets in robotic manipulation.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.05991","ref_index":24,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"A Case-Driven Multi-Agent Framework for E-Commerce Search Relevance","primary_cat":"cs.IR","submitted_at":"2026-05-07T10:41:50+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":3.0,"formal_verification":"none","one_line_summary":"A case-driven multi-agent system automates the full pipeline of bad-case detection, annotation, and resolution for e-commerce search relevance using Annotator, Optimizer, and User agents plus supporting components.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.05538","ref_index":6,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"AgenticRAG: Agentic Retrieval for Enterprise Knowledge Bases","primary_cat":"cs.AI","submitted_at":"2026-05-07T00:39:22+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"AgenticRAG equips an LLM with iterative retrieval and navigation tools, delivering 49.6% recall@1 on BRIGHT, 0.96 factuality on WixQA, and 92% correctness on 
FinanceBench.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.04897","ref_index":9,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Storage Is Not Memory: A Retrieval-Centered Architecture for Agent Recall","primary_cat":"cs.CL","submitted_at":"2026-05-06T13:27:41+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"True Memory is a verbatim-event retrieval pipeline running on a single SQLite file that reaches 93% accuracy on LoCoMo multi-session questions, outperforming Mem0, Supermemory, Zep, and matching or exceeding EverMemOS and Hindsight on other long-context benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.01582","ref_index":12,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"KG-First, LLM-Fallback: A Hybrid Microservice for Grounded Skill Search and Explanation","primary_cat":"cs.IR","submitted_at":"2026-05-02T19:07:56+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"SkillGraph-Service builds a provenance-preserving knowledge graph from multiple competency frameworks and achieves nDCG@5 above 0.94 with sub-200 ms latency via KG-first hybrid retrieval and constrained LLM explanations.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.01409","ref_index":25,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Interactive Multi-Turn Retrieval for Health Videos","primary_cat":"cs.IR","submitted_at":"2026-05-02T12:12:07+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"DATR combines coarse CLIP-based retrieval with multi-turn query fusion and cross-encoder 
re-ranking to improve health video retrieval, supported by the new MHVRC corpus.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.01399","ref_index":38,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Verbal-R3: Verbal Reranker as the Missing Bridge between Retrieval and Reasoning","primary_cat":"cs.CL","submitted_at":"2026-05-02T11:43:23+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Verbal-R3 uses a verbal reranker to generate analytic narratives that guide retrieval and reasoning in LLMs, achieving SOTA results on complex QA benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.00646","ref_index":21,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"A Replicability Study of XTR","primary_cat":"cs.IR","submitted_at":"2026-05-01T13:28:09+00:00","verdict":"ACCEPT","verdict_confidence":"MODERATE","novelty_score":6.0,"formal_verification":"none","one_line_summary":"XTR training does not improve retrieval effectiveness over ColBERT but enhances IVF engine efficiency by flattening token scores to produce more discriminative centroids.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.00505","ref_index":135,"ref_count":4,"confidence":0.98,"is_internal_anchor":true,"paper_title":"LLM-Oriented Information Retrieval: A Denoising-First Perspective","primary_cat":"cs.IR","submitted_at":"2026-05-01T08:30:52+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Argues for a denoising-first paradigm in LLM-oriented information retrieval, framing challenges via a four-stage progression and providing a taxonomy of signal-to-noise optimization techniques across the 
pipeline.","context_count":2,"top_context_role":"background","top_context_polarity":"background","context_text":"curated corpora with web data only. Advances in Neural Information Processing Systems 36 (2023), 79155-79172. [137] Wenjun Peng, Guiyang Li, Yue Jiang, Zilong Wang, Dan Ou, Xiaoyi Zeng, Derong Xu, Tong Xu, and Enhong Chen. 2024. Large Language Model based Long-tail Query Rewriting in Taobao Search. In Companion Proceedings of the ACM Web Conference 2024. ACM, 20-28. [138] Gustavo Penha and Claudia Hauff. 2021. On the Calibration and Uncertainty of Neural Learning to Rank Models for Conversational Search. In Conference of the European Chapter of the Association for Computational Linguistics (EACL). 160-170. [139] A. Peysakhovich and Adam Lerer. 2023. Attention Sorting Combats Recency Bias In Long Context Language Models."},{"citing_arxiv_id":"2604.27906","ref_index":18,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"From Unstructured Recall to Schema-Grounded Memory: Reliable AI Memory via Iterative, Schema-Aware Extraction","primary_cat":"cs.AI","submitted_at":"2026-04-30T14:14:02+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Schema-aware iterative extraction turns AI memory into a verified system of record, reaching 90-97% accuracy on extraction and end-to-end memory benchmarks where retrieval baselines score 80-87%.","context_count":1,"top_context_role":"background","top_context_polarity":"support","context_text":"gap between hybrid retrieval and schema-grounded memory is therefore narrower in coverage, but still different in kind. By shifting complexity from the read path to the write path, schema-grounded memory trades ingestion cost for correctness, stability, and long-term reliability. 
4.2 Graph RAG: introducing partial structure into retrieval Graph RAG [18] introduces explicit relationships into retrieval, and in doing so supports the broader thesis of this paper: pure similarity is not enough, and structure helps. Benefits include: • improved multi-hop retrieval, • relationships made explicit, • reduced ambiguity in navigation, • constrained traversal over embedding stores. The limits are equally important."},{"citing_arxiv_id":"2604.27037","ref_index":42,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Hypencoder Revisited: Reproducibility and Analysis of Non-Linear Scoring for First-Stage Retrieval","primary_cat":"cs.IR","submitted_at":"2026-04-29T17:05:53+00:00","verdict":"CONDITIONAL","verdict_confidence":"MODERATE","novelty_score":3.0,"formal_verification":"none","one_line_summary":"Reproducibility study confirms Hypencoder's non-linear query-specific scoring improves retrieval over bi-encoders on standard benchmarks but standard methods remain faster and hard-task results are mixed due to implementation issues.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"works, the architectural basis of Hypencoder. 2.1 Neural Information Retrieval Modern information retrieval systems typically adopt a multi-stage retrieve-then-rerank paradigm to balance efficiency and effectiveness [28, 37, 54]. First-stage retrieval. First-stage retrievers are generally categorized into sparse and dense approaches. Sparse methods, exemplified by BM25 [42], rely on lexical matching. Recent advances like SPLADE [12] bridge the semantic gap by performing learned term expansion while retaining the efficiency of inverted indices. In the realm of dense retrieval, the bi-encoder architecture has become the de facto standard [21]. 
By encoding queries and documents independently into separate embeddings, bi-encoders allow"},{"citing_arxiv_id":"2604.26483","ref_index":23,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Efficient Listwise Reranking with Compressed Document Representations","primary_cat":"cs.IR","submitted_at":"2026-04-29T09:48:44+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"RRK compresses documents to multi-token embeddings for efficient listwise reranking, enabling an 8B model to achieve 3x-18x speedups over smaller models with comparable or better effectiveness.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"state-of-the-art performance as a zero-shot listwise reranker. Qin et al.[24] show that listwise ranking with moderately sized open models often yields uninformative outputs, which motivates their pairwise reranking strategy combined with PRP-Sorting to improve both stability and efficiency. A common way to narrow this performance gap is through distillation: Pradeep et al.[23] fine-tune a Zephyr-7B model via knowledge transfer and obtain results comparable to GPT-4. More recently, Zhuang et al.[35] systematically compare pointwise, pairwise, and listwise reranking, and propose a setwise prompting method that improves the effectiveness of zero-shot listwise approaches. 
Another idea explored in Liu et al.[17] is to apply"},{"citing_arxiv_id":"2604.23734","ref_index":19,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Prism-Reranker: Beyond Relevance Scoring -- Jointly Producing Contributions and Evidence for Agentic Retrieval","primary_cat":"cs.IR","submitted_at":"2026-04-26T14:28:48+00:00","verdict":"ACCEPT","verdict_confidence":"MODERATE","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Prism-Reranker models output relevance, contribution statements, and evidence passages to support agentic retrieval beyond scalar scoring.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.22095","ref_index":1,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"An End-to-End Ukrainian RAG for Local Deployment. Optimized Hybrid Search and Lightweight Generation","primary_cat":"cs.CL","submitted_at":"2026-04-23T21:59:59+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"A two-stage hybrid search pipeline paired with a synthetic-data fine-tuned and compressed Ukrainian language model delivers competitive local question answering under strict compute limits.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.20401","ref_index":86,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Onyx: Cost-Efficient Disk-Oblivious ANN Search","primary_cat":"cs.CR","submitted_at":"2026-04-22T10:12:19+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Onyx inverts ANN-ORAM optimization priorities with a compact pruning representation and locality-aware shallow tree to deliver 1.7-9.9x lower cost and 2.3-12.3x lower latency for disk-oblivious ANN 
search.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.17906","ref_index":27,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Bayesian Active Learning with Gaussian Processes Guided by LLM Relevance Scoring for Dense Passage Retrieval","primary_cat":"cs.IR","submitted_at":"2026-04-20T07:32:56+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"BAGEL is a Bayesian active learning framework that uses Gaussian Processes to propagate LLM relevance signals across embedding space and guide global exploration, outperforming standard LLM reranking under identical budgets on four retrieval benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.17738","ref_index":17,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Mira-Embeddings-V1: Domain-Adapted Semantic Reranking for Recruitment via LLM-Synthesized Data","primary_cat":"cs.CL","submitted_at":"2026-04-20T02:51:12+00:00","verdict":"CONDITIONAL","verdict_confidence":"MODERATE","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Mira-Embeddings-V1 adapts embeddings for recruitment reranking by synthesizing positive and hard-negative samples with LLMs, then applies JD-JD contrastive and JD-CV triplet training plus a BoundaryHead MLP, lifting Recall@50 from 68.89% to 77.55% and Recall@200 from 0.5969 to 0.7047.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"as the failure mode to be diagnosed and suppressed. Heavy reranking is effective, but our question is whether targeted local correction is enough. A separate line of work improves retrieval with powerful second-stage rerankers. 
Cross-encoders jointly score query-document pairs and are highly effective, but their computational cost scales linearly with the number of candidates [17, 18]. Late-interaction models such as ColBERT offer a more efficient compromise through precomputed token-level representations [14], and sequence-to-sequence or LLM-based rerankers further improve generic ranking quality [18, 19, 26]. These methods are strong when the objective is broad relevance optimization with a heavy reranking stage. Our setting is narrower: after first-stage retrieval, can"},{"citing_arxiv_id":"2604.17667","ref_index":16,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Peerispect: Claim Verification in Scientific Peer Reviews","primary_cat":"cs.CL","submitted_at":"2026-04-19T23:40:26+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Peerispect extracts claims from peer reviews, retrieves evidence from the manuscript, and verifies them via NLI in a modular pipeline with a visual interface.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.16915","ref_index":20,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"KIRA: Knowledge-Intensive Image Retrieval and Reasoning Architecture for Specialized Visual Domains","primary_cat":"cs.CV","submitted_at":"2026-04-18T08:47:08+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"KIRA is a unified architecture for visual RAG that reports 0.97 retrieval precision, 1.0 grounding, and 0.707 domain correctness across medical, circuit, satellite, and histopathology domains via hierarchical chunking, dual-path retrieval, and evidence-conditioned 
generation.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.13721","ref_index":4,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"FRAGATA: Semantic Retrieval of HPC Support Tickets via Hybrid RAG over 20 Years of Request Tracker History","primary_cat":"cs.IR","submitted_at":"2026-04-15T10:53:49+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Fragata applies hybrid RAG to enable semantic retrieval of HPC support tickets across 20 years of history, handling language differences, typos, and varied wording better than traditional keyword search.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.14222","ref_index":8,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Adaptive Query Routing: A Tier-Based Framework for Hybrid Retrieval Across Financial, Legal, and Medical Documents","primary_cat":"cs.IR","submitted_at":"2026-04-14T10:48:13+00:00","verdict":"CONDITIONAL","verdict_confidence":"MODERATE","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Tree reasoning outperforms vector search on complex document queries but a hybrid approach balances results across tiers, with validation showing an 11.7-point gap on real finance documents.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.12099","ref_index":3,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"The Effect of Document Selection on Query-focused Text Analysis","primary_cat":"cs.IR","submitted_at":"2026-04-13T22:19:20+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Semantic and hybrid document retrieval methods provide reliable, efficient selection for query-focused text 
analyses like LDA and BERTopic, outperforming random or keyword-only approaches.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.09492","ref_index":19,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Dynamic Ranked List Truncation for Reranking Pipelines via LLM-generated Reference-Documents","primary_cat":"cs.IR","submitted_at":"2026-04-10T16:59:54+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"for multi-stage retrieval pipelines poses a significant challenge due to the use of effective but computationally expensive reranking models [12, 15, 40]. Reranking strategies vary significantly across models and can be broadly categorized into pointwise, pairwise, and listwise setups [22, 26, 31-33]. Typically, in a pointwise or pairwise setup, the first-stage retrieve candidate list is pruned before reranking [19]. This method of pruning before reranking in a typical \"retrieve then rerank\" setup is known as ranked list truncation (RLT) and has been widely adopted in reranking pipelines [13, 16]. 
However, recent literature shows that a fixed cut-off or score-based heuristic method is often suboptimal because they ignore query-specific relevance distributions and"},{"citing_arxiv_id":"2604.07274","ref_index":29,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"A Systematic Study of Retrieval Pipeline Design for Retrieval-Augmented Medical Question Answering","primary_cat":"cs.CL","submitted_at":"2026-04-08T16:37:22+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Dense retrieval plus query reformulation and reranking reaches 60.49% accuracy on MedQA USMLE, outperforming other setups while domain-specialized models make better use of the retrieved evidence.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.05719","ref_index":82,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Hackers or Hallucinators? A Comprehensive Analysis of LLM-Based Automated Penetration Testing","primary_cat":"cs.CR","submitted_at":"2026-04-07T11:19:16+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":8.0,"formal_verification":"none","one_line_summary":"The first SoK on LLM-based AutoPT frameworks provides a six-dimension taxonomy of agent designs and a unified empirical benchmark evaluating 15 frameworks via over 10 billion tokens and 1,500 manually reviewed logs.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"The retrieved candidate results often contain content with varying relevance to the current task, and using them directly affects the subsequent generation quality. To this end, the reranking mechanism, as a common supplement to dense retrieval, performs a secondary scoring on initial retrieval results through a two-stage or cross-encoder architecture [82]. 
For example, after VulnBot [64] uses an embedding model to retrieve the Top-k similar texts above a threshold, it cascades the bce-reranker model [130] to perform a re-ranking algorithm on these candidate retrieved texts, thereby ensuring that only the task nodes with the absolute highest relevance are retained. Similarly, when processing complex histor-"},{"citing_arxiv_id":"2604.05204","ref_index":13,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Entities as Retrieval Signals: A Systematic Study of Coverage, Supervision, and Evaluation in Entity-Oriented Ranking","primary_cat":"cs.IR","submitted_at":"2026-04-06T22:02:35+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Entity signals cover only 19.7% of relevant documents on Robust04 and no configuration among 443 systems improves MAP by more than 0.05 in open-world evaluation, despite gains when entities are pre-restricted.","context_count":1,"top_context_role":"baseline","top_context_polarity":"baseline","context_text":"Table 2. Open-world MAP on Robust04 for a range of neural reranking models, compared against the best QDER open-world configuration. ▲/▼ denote statistically significant improvement / degradation over BM25. 
Model Type MAP nDCG@20 P@20 BM25 Lexical 0.292 0.435 0.384 Cross-Encoder Models (Fine-Tuned) RankT5 [24] Cross-encoder 0.303 0.494▲ 0.429▲ MonoBERT [13] Cross-encoder 0.297 0.479 0.409 RoBERTa [10] Cross-encoder 0.290 0.474 0.410 DeBERTa [7] Cross-encoder 0.293 0.486 0.422 ELECTRA [3] Cross-encoder 0.268 0.446 0.387 ERNIE [23] Cross-encoder 0.289 0.475 0.412 Bi-Encoder Models ColBERT v2 [9] Bi-encoder (FT) 0.292 0.473▲ 0.410▲ DPR [8] Bi-encoder (ZS) 0.170▼ 0.300▼ 0.259▼ LLM-Based Models (Zero-Shot) RankVicuna [15] LLM listwise 0."},{"citing_arxiv_id":"2604.05034","ref_index":40,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Learning to Unscramble Feynman Loop Integrals with SAILIR","primary_cat":"hep-ph","submitted_at":"2026-04-06T18:00:04+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":8.0,"formal_verification":"none","one_line_summary":"A self-supervised transformer learns to unscramble Feynman integrals for online IBP reduction, delivering bounded memory use on complex two-loop topologies while matching Kira's speed on the hardest cases tested.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.04734","ref_index":21,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Beyond Hard Negatives: The Importance of Score Distribution in Knowledge Distillation for Dense Retrieval","primary_cat":"cs.IR","submitted_at":"2026-04-06T15:02:35+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Stratified sampling preserving teacher score distribution outperforms hard-negative mining as a robust baseline for knowledge distillation in dense retrieval.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2603.26815","ref_index":11,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Resolving the 
Robustness-Precision Trade-off in Financial RAG through Hybrid Document-Routed Retrieval","primary_cat":"cs.CL","submitted_at":"2026-03-26T18:05:38+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"HDRR combines document-level semantic routing with scoped chunk retrieval to outperform both pure chunk-based retrieval and semantic file routing on the FinDER benchmark, delivering higher average scores, lower failure rates, and more perfect answers.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.16329","ref_index":2,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Beyond Single-Score Ranking: Facet-Aware Reranking for Controllable Diversity in Paper Recommendation","primary_cat":"cs.IR","submitted_at":"2026-03-11T07:55:24+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"SciFACE improves facet-specific paper ranking NDCG scores by training separate cross-encoders for Background and Method similarity on 5,891 GPT-4o-mini labeled pairs, outperforming SPECTER by up to 31 points.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2603.04816","ref_index":32,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Scaling Laws for Cross-Encoder Reranking","primary_cat":"cs.IR","submitted_at":"2026-03-05T05:03:07+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Cross-encoder reranker performance scales predictably via power laws with model size and training exposure, allowing accurate forecasts for 400M and 1B models and data-heavy compute 
allocation.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.20849","ref_index":17,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"SPIRE: Structure-Preserving Interpretable Retrieval of Evidence","primary_cat":"cs.IR","submitted_at":"2026-02-12T03:46:10+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"SPIRE presents a tree-structured retrieval method using subdocuments, paths, and dual contextualization that produces higher-quality and more diverse citations than passage-based baselines on HTML QA benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.04936","ref_index":10,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Web Retrieval-Aware Chunking (W-RAC) for Efficient and Cost-Effective Retrieval-Augmented Generation Systems","primary_cat":"cs.IR","submitted_at":"2026-01-08T06:41:59+00:00","verdict":"UNVERDICTED","verdict_confidence":"MODERATE","novelty_score":6.0,"formal_verification":"none","one_line_summary":"W-RAC decouples extraction from semantic planning via structured units and LLM grouping to match traditional retrieval performance at roughly 10x lower LLM token cost.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2509.16621","ref_index":13,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"The Role of Vocabularies in Learning Sparse Representations for Ranking","primary_cat":"cs.IR","submitted_at":"2025-09-20T10:44:26+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Larger 100K vocabularies in SPLADE models, especially those initialized with ESPLADE pretraining, improve retrieval effectiveness after pruning compared to 32K 
baselines while keeping similar efficiency.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2509.07794","ref_index":81,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Query Expansion in the Age of Pre-trained and Large Language Models: A Comprehensive Survey","primary_cat":"cs.IR","submitted_at":"2025-09-09T14:31:11+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"A comprehensive survey that organizes query expansion methods in the PLM/LLM era along four design dimensions, synthesizes application patterns, and outlines future directions.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Yet traditional techniques typically rely on static associations or shallow co-occurrence statistics and struggle with short, ambiguous, or long-tail queries [16, 104]. Topic drift, limited domain coverage, and the precision-recall trade-off remain recurring pain points. Recent advances in PLMs and LLMs - e.g., BERT/RoBERTa-style encoders for context-sensitive representations [81], encoder-decoders such as T5/BART for controlled generation, and decoder-only LLMs (e.g., GPT-3/4, PaLM, LLaMA-family) for zero-/few-shot reasoning - have opened a broader design space for QE. 
These models support implicit expansion in the embedding space (e.g., PRF with dense vectors), selection-based term filtering with contextual encoders, and generative expansion via pseudo-documents or structured rationales."},{"citing_arxiv_id":"2506.03487","ref_index":3,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"ProRank: Prompt Warmup via Reinforcement Learning for Small Language Models Reranking","primary_cat":"cs.IR","submitted_at":"2025-06-04T02:00:44+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"ProRank uses RL-based prompt warmup and fine-grained scoring to train small language models that surpass LLM rerankers on BEIR.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2504.21015","ref_index":11,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Don't Retrieve, Generate: Prompting LLMs for Synthetic Training Data in Dense Retrieval","primary_cat":"cs.IR","submitted_at":"2025-04-20T08:34:10+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"LLM-generated synthetic hard negatives for training dense retrievers consistently underperform corpus-mined negatives from BM25 and cross-encoders across 10 BEIR datasets, with non-monotonic gains from scaling the generator from 4B to 30B parameters.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2406.11290","ref_index":4,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"An Iterative Utility Judgment Framework Inspired by Philosophical Relevance via LLMs","primary_cat":"cs.IR","submitted_at":"2024-06-17T07:52:42+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"ITEM is a new iterative utility 
judgment loop for RAG that maps Schutz's three levels of relevance to retrieval, utility scoring, and generation, yielding measured gains on TREC DL, WebAP, GTI-NQ, and NQ.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2402.19473","ref_index":151,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Retrieval-Augmented Generation for AI-Generated Content: A Survey","primary_cat":"cs.CV","submitted_at":"2024-02-29T18:59:01+00:00","verdict":"ACCEPT","verdict_confidence":"MODERATE","novelty_score":5.0,"formal_verification":"none","one_line_summary":"A survey classifying RAG foundations for AIGC, summarizing enhancements, cross-modal applications, benchmarks, limitations, and future directions.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"\"acting token\", which determines the source from which to retrieve information. Koley et al. [149] enhance image retrieval by integrating sketch and text for fine-grained retrieval, yielding improved results. Re-ranking: The Rerank technique refers to reordering the retrieved content in order to achieve greater diversity and better results. Re2G [150] applies a re-ranker [151] model after the traditional retriever to reduce the impact of information loss caused by compressing text into vectors. AceCoder [152] reranks the retrieved programs with a selector to reduce redundant programs and obtain diverse retrieved programs. XRICL [153] uses a distillation-based exemplar reranker after retrieval. 
Rangan [154] employs the Quantized Influence Measure, as-"},{"citing_arxiv_id":"2312.02724","ref_index":23,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"RankZephyr: Effective and Robust Zero-Shot Listwise Reranking is a Breeze!","primary_cat":"cs.IR","submitted_at":"2023-12-05T12:39:00+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"RankZephyr is a new open-source LLM that closes the effectiveness gap with GPT-4 for zero-shot listwise reranking while showing robustness to input ordering and document count.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2208.03299","ref_index":94,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Atlas: Few-shot Learning with Retrieval Augmented Language Models","primary_cat":"cs.CL","submitted_at":"2022-08-05T17:39:22+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Atlas reaches over 42% accuracy on Natural Questions with only 64 examples, outperforming a 540B-parameter model by 3% with 50x fewer parameters.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2112.09118","ref_index":163,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Unsupervised Dense Information Retrieval with Contrastive Learning","primary_cat":"cs.IR","submitted_at":"2021-12-16T18:57:37+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Contrastive learning trains unsupervised dense retrievers that beat BM25 on most BEIR datasets and support cross-lingual retrieval across scripts.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}