arXiv preprint arXiv:2108.13897

Luiz Bonifacio, Vitor Jeronymo, Hugo Abonizio, Israel Campiotti, Marzieh Fadaee, Roberto Lotufo, Rodrigo Nogueira · 2021 · arXiv 2108.13897

10 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

dataset: 2 · background: 1

citation-polarity summary

use dataset: 2 · unclear: 1

representative citing papers

When Hard Negatives Hurt: Bridging the Generative-Discriminative Gap in Hard Negative Synthesis for Retrieval

cs.LG · 2026-05-31 · unverdicted · novelty 7.0

Identifies the generative-discriminative gap in LLM hard negative synthesis for retrieval and proposes CausalNeg using CoT counterfactual perturbation plus query-view entropy maximization to generate more effective negatives.

M3-Embedding: Multi-Linguality, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation

cs.CL · 2024-02-05 · unverdicted · novelty 7.0

M3-Embedding is a single model for multi-lingual, multi-functional, and multi-granular text embeddings trained via self-knowledge distillation that achieves new state-of-the-art results on multilingual, cross-lingual, and long-document retrieval benchmarks.

C-Pack: Packed Resources For General Chinese Embeddings

cs.CL · 2023-09-14 · accept · novelty 7.0

C-Pack releases a new Chinese embedding benchmark, large training dataset, and optimized models that outperform priors by up to 10% on C-MTEB while also delivering English SOTA results.

Field Order Should Not Matter: Permutation-Invariant Embedding Model Fine-Tuning for Structured Metadata Retrieval

cs.CL · 2026-06-29 · unverdicted · novelty 6.0

Permutation-invariant fine-tuning (PI-FT) randomizes field order and applies dropout during embedding model training to eliminate sensitivity to serialization order, reducing order-change penalty from 7.4 to 0.2 nDCG@10 on a generated multilingual DevDataBench while outperforming zero-shot baselines.

Lost in Decoding? Reproducing and Stress-Testing the Look-Ahead Prior in Generative Retrieval

cs.IR · 2026-04-25 · accept · novelty 6.0

Reproduction confirms PAG boosts generative retrieval effectiveness, but its look-ahead planning signal collapses under intent-preserving typos and query mismatches, reverting performance to unguided decoding.

MARCA: A Checklist-Based Benchmark for Multilingual Web Search

cs.CL · 2026-04-15 · accept · novelty 6.0

MARCA is a bilingual benchmark using 52 questions and validated checklists to evaluate LLM web-search completeness and correctness in English and Portuguese.

Source-Grounded Semantic Reinforcement Learning for Low-Resource Target-Language Generation

cs.CL · 2026-05-28 · unverdicted · novelty 5.0

SG-SRL applies cross-lingual semantic RL on source monolingual data plus a recovery stage to improve semantic grounding over standard SFT in low-resource target-language generation.

All Languages Matter: Understanding and Mitigating Language Bias in Multilingual RAG

cs.CL · 2026-04-22 · unverdicted · novelty 5.0

Multilingual RAG rerankers exhibit language bias that limits cross-lingual evidence use, and the proposed LAURA method aligns ranking with downstream generation utility to reduce the bias and improve performance.

FRAGATA: Semantic Retrieval of HPC Support Tickets via Hybrid RAG over 20 Years of Request Tracker History

cs.IR · 2026-04-15 · unverdicted · novelty 4.0

Fragata applies hybrid RAG to enable semantic retrieval of HPC support tickets across 20 years of history, handling language differences, typos, and varied wording better than traditional keyword search.

From Tokens to Concepts: Leveraging SAE for SPLADE

cs.IR · 2026-04-23

citing papers explorer

Showing 10 of 10 citing papers.

When Hard Negatives Hurt: Bridging the Generative-Discriminative Gap in Hard Negative Synthesis for Retrieval · cs.LG · 2026-05-31 · unverdicted · none · ref 3
M3-Embedding: Multi-Linguality, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation · cs.CL · 2024-02-05 · unverdicted · none · ref 64
C-Pack: Packed Resources For General Chinese Embeddings · cs.CL · 2023-09-14 · accept · none · ref 10
Field Order Should Not Matter: Permutation-Invariant Embedding Model Fine-Tuning for Structured Metadata Retrieval · cs.CL · 2026-06-29 · unverdicted · none · ref 2
Lost in Decoding? Reproducing and Stress-Testing the Look-Ahead Prior in Generative Retrieval · cs.IR · 2026-04-25 · accept · none · ref 2
MARCA: A Checklist-Based Benchmark for Multilingual Web Search · cs.CL · 2026-04-15 · accept · none · ref 5
Source-Grounded Semantic Reinforcement Learning for Low-Resource Target-Language Generation · cs.CL · 2026-05-28 · unverdicted · none · ref 1
All Languages Matter: Understanding and Mitigating Language Bias in Multilingual RAG · cs.CL · 2026-04-22 · unverdicted · none · ref 5
FRAGATA: Semantic Retrieval of HPC Support Tickets via Hybrid RAG over 20 Years of Request Tracker History · cs.IR · 2026-04-15 · unverdicted · none · ref 14
From Tokens to Concepts: Leveraging SAE for SPLADE · cs.IR · 2026-04-23 · unreviewed · ref 3
