{"total":24,"items":[{"citing_arxiv_id":"2606.23642","ref_index":18,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Improving Long-Context Retrieval with Multi-Prefix Embedding","primary_cat":"cs.IR","submitted_at":"2026-06-22T17:31:02+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Multi-Prefix Embedding extracts per-chunk embeddings from a single forward pass over EOS-separated document chunks and matches via MaxSim while training only on document-level labels.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.19719","ref_index":29,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Closing the Calibration Gap in Semantic Caching","primary_cat":"cs.IR","submitted_at":"2026-06-18T02:34:33+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Introduces P-CHR AUC and CRR metrics to demonstrate that semantic caching model selection is limited by calibration quality rather than ranking performance.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.18781","ref_index":21,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Lost in a Single Vector: Improving Long-Document Retrieval with Chunk Evidence Aggregation","primary_cat":"cs.CL","submitted_at":"2026-06-17T07:44:04+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"DICE aggregates independently encoded document chunks into a single vector to reduce evidence dilution in long-document dense retrieval, reporting gains on LongEmbed especially beyond 4k 
tokens.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.04240","ref_index":21,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Overview of the EReL@MIR 2025 Multimodal Document Retrieval Challenge (Track 1)","primary_cat":"cs.CV","submitted_at":"2026-06-02T21:39:32+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":1.0,"formal_verification":"none","one_line_summary":"The EReL@MIR 2025 Track 1 challenge evaluates single systems on two multimodal retrieval tasks and finds that Qwen2-VL decoder-based embedders dominate, with a training-free entry within 0.1 points of the fine-tuned winner.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.22247","ref_index":47,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"IdioLink: Retrieving Meaning Beyond Words Across Idiomatic and Literal Expressions","primary_cat":"cs.CL","submitted_at":"2026-05-21T09:53:10+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"IdioLink introduces a benchmark dataset and evaluation showing that strong embedding models struggle to retrieve equivalent meanings across idiomatic and literal forms, relying on shallow cues instead.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.11374","ref_index":25,"ref_count":2,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Test-Time Compute for Frozen Embedding Models through Agentic Program Search","primary_cat":"cs.LG","submitted_at":"2026-05-12T00:56:34+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Agentic program search over a frozen encoder API yields retrieval programs that improve nDCG@10 on held-out tasks and unseen 
encoder families with no per-domain training.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.10109","ref_index":36,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"NumColBERT: Non-Intrusive Numeracy Injection for Late-Interaction Retrieval Models","primary_cat":"cs.IR","submitted_at":"2026-05-11T07:24:30+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"NumColBERT improves ColBERT performance on numerical query conditions non-intrusively via gating and contrastive learning, outperforming fine-tuning while matching or exceeding separate text-number scoring methods.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"[35] Dhanasekar Sundararaman, Shijing Si, Vivek Subramanian, Guoyin Wang, Devamanyu Hazarika, and Lawrence Carin. 2020. Methods for Numeracy-Preserving Word Embeddings. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Bonnie Webber, Trevor Cohn, Yulan He, and Yang Liu (Eds.). Association for Computational Linguistics, Online, 4742-4753. doi:10.18653/v1/2020.emnlp-main.384 [36] Avijit Thawani, Jay Pujara, Filip Ilievski, and Pedro Szekely. 2021. Representing Numbers in NLP: a Survey and a Vision. 
In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Kristina Toutanova, Anna Rumshisky, Luke Zettlemoyer, Dilek Hakkani-Tur, Iz Beltagy, Steven Bethard, Ryan Cotterell,"},{"citing_arxiv_id":"2606.28327","ref_index":20,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"The Interference Gap: Comparing Retrieval Bounds in Human Memory and RAG Systems","primary_cat":"cs.IR","submitted_at":"2026-05-09T05:51:21+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Unified SDT model finds humans less sensitive to interference (α/σ=0.41) than dense passage retrieval (0.67), with HippoRAG intermediate (0.44), backed by N=112 experiments and simulations favoring logarithmic over power-law decline.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.07210","ref_index":5,"ref_count":2,"confidence":0.88,"is_internal_anchor":false,"paper_title":"DiffRetriever: Parallel Representative Tokens for Retrieval with Diffusion Language Models","primary_cat":"cs.IR","submitted_at":"2026-05-08T03:57:51+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"DiffRetriever uses parallel masked tokens in diffusion LMs for retrieval representations, outperforming DiffEmbed and other baselines on aggregate effectiveness while supporting efficient multi-representation matching.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"autoregressive and diffusion backbones; only the decoding strategy differs at training time. For each query q, let p+ be a positive passage and let P be a pool of negatives (sampled hard negatives plus in-batch passages from other queries). 
The dense loss is InfoNCE with temperature τ: $L_{\\text{dense}} = -\\log \\frac{\\exp\\left(s_{\\text{dense}}(q, p^{+})/\\tau\\right)}{\\sum_{p \\in P} \\exp\\left(s_{\\text{dense}}(q, p)/\\tau\\right)}$. (5) The sparse loss is the analogous InfoNCE on $s_{\\text{sparse}}$, applied without temperature. The training objective is their sum, $L = L_{\\text{dense}} + L_{\\text{sparse}}$. At training time, we set $(K_q, K_p)$ to the same values used at zero-shot, so a backbone is trained and evaluated under the same budget. Diffusion backbones use parallel [MASK] prediction as in §3.2; autoregressive backbones use sequential decoding."},{"citing_arxiv_id":"2605.05806","ref_index":30,"ref_count":2,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Retrieval from Within: An Intrinsic Capability of Attention-Based Models","primary_cat":"cs.LG","submitted_at":"2026-05-07T07:42:28+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Attention-based models can retrieve evidence intrinsically by using decoder attention to score and reuse their own pre-encoded chunks, outperforming separate retrieval pipelines on QA benchmarks.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Multi-pass agentic RAG systems instead interleave reasoning and repeated retrieval [21, 1, 31]. INTRA targets the single-pass retrieval block that could be used within such pipelines, rather than the pipeline-level agentic loop itself. Late Interaction and Representation-Space Retrieval. Our retrieval formulation is close to late-interaction systems such as ColBERT [18], ColBERTv2 [30] and ColPali [8], which compare query and document tokens via MaxSim-style matching over multi-vector representations. Whereas late-interaction systems rely on a dedicated retriever to score query-document matches, INTRA lets the decoder's own cross-attention perform this matching and then consume the matched representations during generation. 
Memory, Latent Retrieval, and Unified Retrieval-Generation."},{"citing_arxiv_id":"2605.02950","ref_index":41,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Kernel Affine Hull Machines as Compute-Efficient Encoders for Frozen Semantic Spaces","primary_cat":"cs.LG","submitted_at":"2026-05-01T17:46:26+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"the final encoding error (posterior mismatch, finite-sample approximation, and teacher-noise effects) under the modeling and steady-state assumptions used in this paper. This interpretation is consistent with standard adaptive-filter analyses, where sufficiently small step sizes and sufficiently long runs are used to characterize steady-state behavior rather than exact transient identities [41, 13]. 4 Experiments This section evaluates the proposed encoding method on an Austrian-law retrieval benchmark. 
The central empirical question is whether the proposed KAHM-based encoder can serve, in this domain, as a compute-efficient substitute for online transformer query encoding by mapping inexpensive lexical query features into a high-quality semantic"},{"citing_arxiv_id":"2605.00646","ref_index":25,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"A Replicability Study of XTR","primary_cat":"cs.IR","submitted_at":"2026-05-01T13:28:09+00:00","verdict":"ACCEPT","verdict_confidence":"MODERATE","novelty_score":6.0,"formal_verification":"none","one_line_summary":"XTR training does not improve retrieval effectiveness over ColBERT but enhances IVF engine efficiency by flattening token scores to produce more discriminative centroids.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.28142","ref_index":30,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Efficient Multivector Retrieval with Token-Aware Clustering and Hierarchical Indexing","primary_cat":"cs.IR","submitted_at":"2026-04-30T17:30:15+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"TACHIOM speeds up multivector retrieval by up to 247x in clustering and 9.8x in retrieval on MS-MARCOv1 and LoTTE benchmarks using token-distribution-aware centroid allocation and a graph-plus-PQ index, with comparable effectiveness to prior systems.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.27852","ref_index":38,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"NeocorRAG: Less Irrelevant Information, More Explicit Evidence, and More Effective Recall via Evidence 
Chains","primary_cat":"cs.IR","submitted_at":"2026-04-30T13:37:01+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"NeocorRAG uses Evidence Chains to achieve SOTA retrieval quality in RAG on HotpotQA, 2WikiMultiHopQA, MuSiQue, and NQ for 3B and 70B models while using under 20% of the tokens of comparable methods.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"near-linear decay as Recall@5 increases, exposing a \"high recall, low conversion\" phenomenon. As illustrated in Figure 1, even with optimal recall scores, the retrieved content can still contain noisy text that severely interferes with the model's reasoning. This observation exposes a fundamental limitation of commonly used retrieval metrics, such as Recall@n [3] and NDCG [38], which prioritize surface-level matching of relevant snippets while overlooking whether the retrieved content truly provides faithful, non-misleading evidence that supports downstream reasoning. Consequently, improvements in retrieval metrics often fail to yield corresponding gains in reasoning performance. 
Reasoning-enhanced methods have recognized the importance"},{"citing_arxiv_id":"2604.27037","ref_index":48,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Hypencoder Revisited: Reproducibility and Analysis of Non-Linear Scoring for First-Stage Retrieval","primary_cat":"cs.IR","submitted_at":"2026-04-29T17:05:53+00:00","verdict":"CONDITIONAL","verdict_confidence":"MODERATE","novelty_score":3.0,"formal_verification":"none","one_line_summary":"Reproducibility study confirms Hypencoder's non-linear query-specific scoring improves retrieval over bi-encoders on standard benchmarks but standard methods remain faster and hard-task results are mixed due to implementation issues.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agents. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023, Singapore, December 6-10, 2023, Houda Bouamor, Juan Pino, and Kalika Bali (Eds.). Association for Computational Linguistics, 14918-14937. doi:10.18653/V1/2023.EMNLP-MAIN.923 [48] Panuthep Tasawong, Wuttikorn Ponwitayarat, Peerat Limkonchotiwat, Can Udomcharoenchaikit, Ekapol Chuangsuwanich, and Sarana Nutanong. 2023. Typo-Robust Representation Learning for Dense Retrieval. 
In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Anna Rogers, Jordan Boyd-Graber, and Naoaki Okazaki (Eds."},{"citing_arxiv_id":"2604.26649","ref_index":26,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"When to Retrieve During Reasoning: Adaptive Retrieval for Large Reasoning Models","primary_cat":"cs.IR","submitted_at":"2026-04-29T13:15:44+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"ReaLM-Retrieve uses step-level uncertainty to trigger retrievals during reasoning, achieving 10.1% better F1 scores and 47% fewer calls on multi-hop QA benchmarks.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"FLARE [11] triggers retrieval when token probability falls below a threshold, but requires token-level probabilities unavailable from completion-only models. DRAGIN [27] uses attention entropy but requires internal states. Self-RAG [1] learns special tokens through fine-tuning, achieving strong performance but requiring full model fine-tuning impossible for proprietary models. REPLUG [26] treats models as black boxes but operates at query-level without mid-reasoning intervention. 
A separate line of work makes the retrieval decision at the query level based on question complexity: Adaptive-RAG [10] routes queries to no-retrieval, single-retrieval, or multi-retrieval strategies, and Open-RAG [9] learns reflection tokens that determine retrieval necessity per query."},{"citing_arxiv_id":"2604.17237","ref_index":15,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"HeadRank: Decoding-Free Passage Reranking via Preference-Aligned Attention Heads","primary_cat":"cs.IR","submitted_at":"2026-04-19T03:43:42+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"HeadRank lifts preference optimization into attention space via entropy-regularized head selection and distribution regularizers to sharpen discriminability for efficient listwise reranking.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.16576","ref_index":58,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"On the Robustness of LLM-Based Dense Retrievers: A Systematic Analysis of Generalizability and Stability","primary_cat":"cs.IR","submitted_at":"2026-04-17T13:02:29+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"LLM-based dense retrievers generalize better when instruction-tuned but pay a specialization tax when optimized for reasoning; they resist typos and corpus poisoning better than encoder-only baselines yet remain vulnerable to semantic perturbations, with larger models and certain embedding geometry,","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"4 Yongkang Li, Panagiotis Eustratiadis, Yixing Fan, and Evangelos Kanoulas GTE [39], NV-Retriever [50], and Qwen3-Embedding [74], further improved transfer across retrieval tasks and achieved strong performance on broad 
evaluation benchmarks such as BEIR [66]. More recently, a further specialization has emerged in retrieval models designed for reasoning-intensive queries. ReasonIR [58] incorporates chain-of-thought traces into training, DIVER [48] adopts a multi-stage training scheme for complex inference, and ReasonEmbed [8] augments the Qwen3 backbone with logical dependency modeling. These models perform strongly on reasoning-oriented benchmarks such as BRIGHT [62], but their growing specialization also raises a broader question: whether improvements on reasoning-heavy benchmarks translate into robust performance"},{"citing_arxiv_id":"2605.18769","ref_index":90,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"ClusterRAG: Cluster-Based Collaborative Filtering for Personalized Retrieval-Augmented Generation","primary_cat":"cs.IR","submitted_at":"2026-04-14T01:52:09+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"ClusterRAG applies density-based clustering to user profiles for collaborative retrieval in personalized RAG and reports best performance on LaMP tasks by combining target and similar-user profiles.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.05253","ref_index":4,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Spike Hijacking in Late-Interaction Retrieval","primary_cat":"cs.IR","submitted_at":"2026-04-06T23:31:03+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Hard maximum similarity pooling in late-interaction models induces higher patch-level gradient concentration and greater length sensitivity than top-k or softmax 
alternatives.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2603.09933","ref_index":28,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"A Voronoi Cell Formulation for Principled Token Pruning in Late-Interaction Retrieval Models","primary_cat":"cs.IR","submitted_at":"2026-03-10T17:28:25+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A Voronoi cell estimation framework in embedding space enables principled token pruning for late-interaction models, reducing index size while retaining retrieval quality.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2412.13663","ref_index":180,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference","primary_cat":"cs.CL","submitted_at":"2024-12-18T09:39:44+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"ModernBERT is a new bidirectional encoder model achieving SOTA performance on diverse classification and retrieval benchmarks while offering superior speed and memory efficiency for long-context inference.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2401.18059","ref_index":160,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval","primary_cat":"cs.CL","submitted_at":"2024-01-31T18:30:21+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"RAPTOR introduces a tree-organized retrieval method using recursive abstractive summaries, achieving a 20% 
absolute accuracy improvement on the QuALITY benchmark when paired with GPT-4.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2306.15595","ref_index":15,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Extending Context Window of Large Language Models via Positional Interpolation","primary_cat":"cs.CL","submitted_at":"2023-06-27T16:26:26+00:00","verdict":"CONDITIONAL","verdict_confidence":"MODERATE","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Position Interpolation linearly down-scales position indices to extend RoPE context windows to 32768 tokens with 1000-step fine-tuning, delivering strong long-context results on LLaMA 7B-65B while preserving short-context quality.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}