{"total":14,"items":[{"citing_arxiv_id":"2606.30473","ref_index":2,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Field Order Should Not Matter: Permutation-Invariant Embedding Model Fine-Tuning for Structured Metadata Retrieval","primary_cat":"cs.CL","submitted_at":"2026-06-29T15:33:04+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Permutation-invariant fine-tuning (PI-FT) randomizes field order and applies dropout during embedding model training to eliminate sensitivity to serialization order, reducing order-change penalty from 7.4 to 0.2 nDCG@10 on a generated multilingual DevDataBench while outperforming zero-shot baselines","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.22807","ref_index":3,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"KaLM-Reranker-V1: Fast but Not Late Interaction for Compressed Document Reranking","primary_cat":"cs.CL","submitted_at":"2026-06-22T03:36:31+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"KaLM-Reranker-V1 introduces a fast but not late-interaction reranker that decouples passage pre-encoding from query processing via encoder-decoder architecture and cross-attention to achieve efficiency and competitive performance.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.22722","ref_index":5,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"moBERTo: A Modern Encoder for Portuguese via Continued Pretraining of ModernBERT","primary_cat":"cs.CL","submitted_at":"2026-06-21T23:44:47+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Continued pretraining of ModernBERT on curated Portuguese data 
produces moBERTo, which reports top results on Portuguese retrieval reranking and PLUE-PT benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.13537","ref_index":27,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"When Does Mixing Help? Analyzing Query Embedding Interpolation in Multilingual Dense Retrieval","primary_cat":"cs.CL","submitted_at":"2026-06-11T16:23:16+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Optimal interpolation of query embeddings from parallel translations outperforms the best monolingual query in 88/105 cases on mMARCO, showing English-driven asymmetry and negative correlation with typological distance.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.01995","ref_index":2,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"CARTE: A Benchmark for Mapping Language Model Knowledge Across France","primary_cat":"cs.CL","submitted_at":"2026-06-01T09:50:50+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"CARTE is a new benchmark for fine-grained regional knowledge in France that shows LLMs exhibit performance gaps across regions and scales, pointing to uneven pretraining coverage.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.01304","ref_index":3,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"When Hard Negatives Hurt: Bridging the Generative-Discriminative Gap in Hard Negative Synthesis for Retrieval","primary_cat":"cs.LG","submitted_at":"2026-05-31T15:48:30+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Identifies the 
generative-discriminative gap in LLM hard negative synthesis for retrieval and proposes CausalNeg using CoT counterfactual perturbation plus query-view entropy maximization to generate more effective negatives.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.29502","ref_index":1,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Source-Grounded Semantic Reinforcement Learning for Low-Resource Target-Language Generation","primary_cat":"cs.CL","submitted_at":"2026-05-28T07:27:16+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"SG-SRL applies cross-lingual semantic RL on source monolingual data plus a recovery stage to improve semantic grounding over standard SFT in low-resource target-language generation.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.23396","ref_index":2,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Lost in Decoding? Reproducing and Stress-Testing the Look-Ahead Prior in Generative Retrieval","primary_cat":"cs.IR","submitted_at":"2026-04-25T17:58:15+00:00","verdict":"ACCEPT","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Reproduction confirms PAG boosts generative retrieval effectiveness, but its look-ahead planning signal collapses under intent-preserving typos and query mismatches, reverting performance to unguided decoding.","context_count":1,"top_context_role":"dataset","top_context_polarity":"use_dataset","context_text":"characterize when planning provides reliable guidance and how reliability degrades under shift. Stress tests and scope. We conduct an inference-time reproduction and two stress tests of PAG. First, we evaluate robustness under intent-preserving query variations. 
Second, we evaluate a stricter query-document language mismatch setting by issuing non-English mMARCO queries [2] against the fixed English MS MARCO collection and released identifier trie, without re-indexing. This mismatch setting is a direct test of whether PAG's planning mechanism remains useful when query-side surface form diverges from the evidence space on which the planner and identifier trie were built. For this setting, we evaluate two query-side mitigations with"},{"citing_arxiv_id":"2604.21511","ref_index":3,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"From Tokens to Concepts: Leveraging SAE for SPLADE","primary_cat":"cs.IR","submitted_at":"2026-04-23T10:13:21+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.20199","ref_index":5,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"All Languages Matter: Understanding and Mitigating Language Bias in Multilingual RAG","primary_cat":"cs.CL","submitted_at":"2026-04-22T05:33:06+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Multilingual RAG rerankers exhibit language bias that limits cross-lingual evidence use, and the proposed LAURA method aligns ranking with downstream generation utility to reduce the bias and improve performance.","context_count":1,"top_context_role":"background","top_context_polarity":"unclear","context_text":"Он разработал все 900 костюмов, использованных в сценах на корабле «Флостон Парадайз». Костюм Лилу из белых полос ткани Готье создал, вдохновившись картиной Фриды Кало «Сломанная колонна». В течение года команда создала более 8000 рисунков. В это время Бессон предложил на главную роль Брюса Уиллиса и Мела Гибсона, а также рассматривал Джулию Робертс на роль Лилу... 
[5] (zh) 第五元素 (電影). 米拉·乔沃维奇饰） 的人形女性。 莉露对周围的一切深感恐惧， 逃出实验室后，她从楼层的外沿跳了下去，正好掉进前特种部队少校科本·达拉斯（布鲁斯·威利斯饰）所开的出租车裡... Model Answer Milla Jovovich plays the blue lady, Leeloo, in The Fifth Element. Wrong Oracle Top-5 [1] (de) Das fünfte Element. Das fünfte Element (Originaltitel: Le Cinquième Élément) ist ein Science-Fiction-Film von Luc Besson mit Bruce Willis und Milla Jovovich aus dem Jahr 1997."},{"citing_arxiv_id":"2604.14448","ref_index":5,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"MARCA: A Checklist-Based Benchmark for Multilingual Web Search","primary_cat":"cs.CL","submitted_at":"2026-04-15T21:54:27+00:00","verdict":"ACCEPT","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"MARCA is a bilingual benchmark using 52 questions and validated checklists to evaluate LLM web-search completeness and correctness in English and Portuguese.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.13721","ref_index":14,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"FRAGATA: Semantic Retrieval of HPC Support Tickets via Hybrid RAG over 20 Years of Request Tracker History","primary_cat":"cs.IR","submitted_at":"2026-04-15T10:53:49+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Fragata applies hybrid RAG to enable semantic retrieval of HPC support tickets across 20 years of history, handling language differences, typos, and varied wording better than traditional keyword search.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2402.03216","ref_index":64,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"M3-Embedding: Multi-Linguality, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge 
Distillation","primary_cat":"cs.CL","submitted_at":"2024-02-05T17:26:49+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"M3-Embedding is a single model for multi-lingual, multi-functional, and multi-granular text embeddings trained via self-knowledge distillation that achieves new state-of-the-art results on multilingual, cross-lingual, and long-document retrieval benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2309.07597","ref_index":10,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"C-Pack: Packed Resources For General Chinese Embeddings","primary_cat":"cs.CL","submitted_at":"2023-09-14T10:57:50+00:00","verdict":"ACCEPT","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"C-Pack releases a new Chinese embedding benchmark, large training dataset, and optimized models that outperform priors by up to 10% on C-MTEB while also delivering English SOTA results.","context_count":1,"top_context_role":"dataset","top_context_polarity":"use_dataset","context_text":"CL] 24 Sep 2024 SIGIR '24, July 14-18, 2024, Washington, DC, USA Shitao Xiao et al. scale, diversity, and quality. To achieve high discriminative power for the embeddings, it may take more than hundreds of millions of training instances [22, 40, 53], which is orders of magnitude greater than typical task-specific datasets, like MS MARCO [38] and NLI [10, 55]. Besides scale, the training data needs to be collected from a wide range of sources so as to improve the generality across different tasks [22, 53]. Finally, the augmentation of scale and diversity will probably introduce noise. Thus, the collected data must be properly cleaned before being utilized for the training of embeddings [53]. • Training."}],"limit":50,"offset":0}