M3-Embedding: Multi-Linguality, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation

Jianlv Chen, Shitao Xiao, Peitian Zhang, Kun Luo, Defu Lian, Zheng Liu · 2024 · cs.CL · arXiv 2402.03216

Mixed citation behavior. The most common role is background (39% of classified citations).

98 Pith papers cite it.

abstract

In this paper, we introduce a new embedding model called M3-Embedding, which is distinguished for its versatility in Multi-Linguality, Multi-Functionality, and Multi-Granularity. It provides a uniform support for the semantic retrieval of more than 100 working languages. It can simultaneously accomplish the three common retrieval functionalities: dense retrieval, multi-vector retrieval, and sparse retrieval. Besides, it is also capable of processing inputs of different granularities, spanning from short sentences to long documents of up to 8,192 tokens. The effective training of M3-Embedding presents a series of technical contributions. Notably, we propose a novel self-knowledge distillation approach, where the relevance scores from different retrieval functionalities can be integrated as the teacher signal to enhance the training quality. We also optimize the batching strategy, which enables a large batch size and high training throughput to improve the discriminativeness of embeddings. M3-Embedding exhibits a superior performance in our experiment, leading to new state-of-the-art results on multilingual, cross-lingual, and long-document retrieval benchmarks.
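
The abstract says that relevance scores from the three retrieval functionalities can be integrated into a teacher signal. The following is a minimal sketch of that idea, not the authors' implementation. It assumes the teacher is a weighted sum of the dense, sparse, and multi-vector scores, and that each retrieval mode is trained against the teacher's softmax distribution with a soft-label cross-entropy. The function names, weights, and toy scores are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def integrate_scores(dense, sparse, multi_vec, weights=(1.0, 1.0, 1.0)):
    """Combine per-candidate scores from the three retrieval modes.

    Assumption: a plain weighted sum; the weights here are illustrative.
    """
    w_d, w_s, w_m = weights
    return [w_d * d + w_s * s + w_m * m
            for d, s, m in zip(dense, sparse, multi_vec)]

def distillation_loss(student_scores, teacher_scores):
    """Cross-entropy of the student distribution against the teacher's soft labels."""
    p_teacher = softmax(teacher_scores)
    p_student = softmax(student_scores)
    return -sum(t * math.log(q) for t, q in zip(p_teacher, p_student))

# Toy scores for one query against three candidate passages (hypothetical values).
dense = [2.0, 0.5, 0.1]
sparse = [1.5, 0.2, 0.3]
multi_vec = [1.8, 0.4, 0.2]

# The integrated score acts as the teacher for every individual retrieval mode.
teacher = integrate_scores(dense, sparse, multi_vec)
losses = {
    "dense": distillation_loss(dense, teacher),
    "sparse": distillation_loss(sparse, teacher),
    "multi_vec": distillation_loss(multi_vec, teacher),
}
print(losses)
```

In this sketch the combined score is sharper than any single mode's score, so each mode is pulled toward the ranking the three modes agree on.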

citation-role summary

background 8 · method 5 · baseline 3 · dataset 2

citation-polarity summary

background 7 · use method 5 · baseline 3 · use dataset 2 · unclear 1

representative citing papers

CORTEX: High-Quality Cross-Domain Organization of Web-Scale Corpora through Ontological Corpus Graph

cs.CL · 2026-06-29 · unverdicted · novelty 7.0

Cortex uses an Ontological Corpus Graph to structure web-scale corpora, creating a refined 24.14B-token corpus and a new benchmark validated on eight LLMs.

Diagnosing and Mitigating Retrieval Bottlenecks in LLM-Based Cold-Start Recommendation

cs.IR · 2026-06-29 · conditional · novelty 7.0

Retrieval coverage limits LLM rerankers in cold-start recommendation; a learned hybrid fusion improves pool quality but LLM reranking often degrades end-to-end performance while simpler rankers exploit the pool.

Beyond the Reranker: Do RAG Retrieval Enhancements Help Once a Strong Reranker Is Present?

cs.IR · 2026-06-14 · conditional · novelty 7.0

On heterogeneous document collections, only query expansion and a newly introduced per-source calibrated corrector (SSCC) deliver reliable gains beyond a strong cross-encoder reranker; other common retrieval enhancements do not.

Towards Cost-effective LLMs Routing with Batch Prompting

cs.DB · 2026-05-27 · unverdicted · novelty 7.0

RoBatch is a two-stage framework that formulates and solves the joint Route with Batching Problem via a batch-aware proxy utility model and greedy scheduling, outperforming separate routing or batching baselines on six benchmarks.

Very Efficient Listwise Multimodal Reranking for Long Documents

cs.IR · 2026-05-12 · unverdicted · novelty 7.0

ZipRerank delivers state-of-the-art multimodal listwise reranking accuracy for long documents at up to 10x lower latency via early interaction and single-pass scoring.

Nautilus Compass: Black-box Persona Drift Detection for Production LLM Agents

cs.CR · 2026-05-11 · unverdicted · novelty 7.0

Nautilus Compass is a black-box drift detector for production LLM agents that uses weighted cosine similarity on BGE-m3 embeddings of raw text against anchors, achieving 0.83 ROC AUC on real session traces while shipping as plugins and servers with an audit log.

QuIVer: Rethinking ANN Graph Topology via Training-Free Binary Quantization

cs.DB · 2026-05-04 · unverdicted · novelty 7.0 · 2 refs

QuIVer performs Vamana-style graph construction entirely inside a 2-bit Sign-Magnitude BQ space, achieving >=88% Recall@10 on contrastive-learning embeddings and 2.5-5.5x higher throughput than DiskANN/HNSW at matched recall with 4.7x less hot memory.

Learning How and What to Memorize: Cognition-Inspired Two-Stage Optimization for Evolving Memory

cs.CL · 2026-05-01 · unverdicted · novelty 7.0

MemCoE learns memory organization guidelines via contrastive feedback and then trains a guideline-aligned RL policy for memory updates, yielding consistent gains on personalization benchmarks.

Purifying Multimodal Retrieval: Fragment-Level Evidence Selection for RAG

cs.IR · 2026-04-30 · unverdicted · novelty 7.0

FES-RAG reframes multimodal RAG as fragment-level selection using Fragment Information Gain to outperform document-level methods with up to 27% relative CIDEr gains on M2RAG while shortening context.

Prism-Reranker: Beyond Relevance Scoring -- Jointly Producing Contributions and Evidence for Agentic Retrieval

cs.IR · 2026-04-26 · accept · novelty 7.0

Prism-Reranker models output relevance, contribution statements, and evidence passages to support agentic retrieval beyond scalar scoring.

Latent Abstraction for Retrieval-Augmented Generation

cs.CL · 2026-04-20 · unverdicted · novelty 7.0

LAnR unifies retrieval-augmented generation inside a single LLM by deriving dense retrieval vectors from a [PRED] token's hidden states and using entropy to adaptively stop retrieval, outperforming prior RAG on six QA benchmarks with better efficiency.

vstash: Local-First Hybrid Retrieval with Adaptive Fusion for LLM Agents

cs.IR · 2026-04-16 · conditional · novelty 7.0

vstash shows that hybrid retrieval disagreements provide a free training signal to fine-tune 33M-parameter embeddings, yielding NDCG@10 gains up to 19.5% on NFCorpus and matching some larger models on three of five BEIR datasets.

Sell More, Play Less: Benchmarking LLM Realistic Selling Skill

cs.CL · 2026-04-08 · conditional · novelty 7.0

SalesLLM provides an automatic evaluation framework for LLM sales dialogues that correlates 0.98 with human experts and shows top models approaching human performance while weaker ones lag.

LMEB: Long-horizon Memory Embedding Benchmark

cs.CL · 2026-03-13 · unverdicted · novelty 7.0

LMEB benchmark shows that embedding models' performance on traditional retrieval does not transfer to long-horizon memory tasks, larger models do not always perform better, and LMEB measures capabilities orthogonal to MTEB.

SQuTR: A Robustness Benchmark for Spoken Query to Text Retrieval under Acoustic Noise

cs.IR · 2026-02-13 · unverdicted · novelty 7.0

SQuTR aggregates 37k queries from six text retrieval datasets, synthesizes speech from 200 speakers, adds 17 noise categories at varying SNR, and shows that even large retrieval models degrade sharply under extreme acoustic noise.

Identifying and Resolving Pitfalls of Knowledge-Based VQA Benchmarks: Auditing, Repairing, and Augmenting

cs.CL · 2026-06-30 · unverdicted · novelty 6.0

Audit of KB-VQA benchmarks reveals systematic violations of answer derivability, question clarity, and visual disambiguation assumptions, with new repair and multi-entity augmentation protocols producing different model performance trends.

SHARD: cell-keyed residual splitting for alignment-resistant private dense retrieval

cs.CR · 2026-06-26 · unverdicted · novelty 6.0 · 2 refs

SHARD introduces cell-keyed residual splitting that turns dense retrieval embeddings into revocable, renewable, unlinkable templates resistant to alignment attacks while preserving exact utility under CKKS reranking.

SkillPager: Query-Adaptive Intra-Skill Navigation via Semantic Node Retrieval

cs.IR · 2026-05-30 · unverdicted · novelty 6.0

SkillPager retrieves typed semantic nodes from skill documents via MMR to reach 78.89% LLM-judged sufficiency with 47% fewer tokens than full documents on a 395-skill benchmark.

SPADER: Step-wise Peer Advantage with Diversity-Aware Exploration Rewards for Multi-Answer Question Answering

cs.CL · 2026-05-30 · unverdicted · novelty 6.0

SPADER proposes step-wise peer advantage and diversity-aware exploration rewards in RL for multi-answer QA, reporting improved recall and F1 on QAMPARI, Mintaka, WebQSP, and QUEST.

On the Robustness of Multilingual Text Embedding Rankings Across Learning Tasks, Languages, and Benchmark Datasets

cs.CL · 2026-05-29 · unverdicted · novelty 6.0

Meta-study of MTEB rankings introduces dataset-composition and ranking-scheme robustness indicators and finds only a small subset of models stay consistently strong across tasks, languages, and evaluation variations.

Beyond Chunk-Local Extraction: Cross-Chunk Graph Augmentation for GraphRAG

cs.CL · 2026-05-27 · unverdicted · novelty 6.0

CrossAug augments GraphRAG indices with cross-chunk relations via GNN-guided subgraph scoring and selective LLM completion, yielding consistent gains on four QA benchmarks across three frameworks.

LATTE: Forecasting Peer Anchored Preference Trajectories for Personalized LLM Generation

cs.CL · 2026-05-26 · unverdicted · novelty 6.0

LATTE improves personalized LLM generation by forecasting peer-anchored relative preference trajectories and injecting the forecast via a State to Token Bridge, raising ROUGE-L from 0.219-0.245 to 0.259 on Amazon Reviews 2023 over static and compression baselines.

An Efficient and Privacy-Preserving Architecture for Cross-Institutional Collaborative RAG

cs.CR · 2026-05-25 · unverdicted · novelty 6.0

FedRAG uses a Scrambled Distributed Attention protocol with feature scrambling and token permutation to enable high-throughput, privacy-preserving federated RAG without special hardware or retraining.

Iterate Until Retrieved: Factual Nugget Optimization for Discoverable Continual Corrections in Agentic RAG

cs.CL · 2026-05-25 · unverdicted · novelty 6.0

INO is an index-time method that uses the production RAG agent to iteratively create, test with queries and paraphrases, reflect on failures, and revise factual nuggets until they are discoverable and used correctly.

citing papers explorer

Showing 35 of 35 citing papers after filters.

CORTEX: High-Quality Cross-Domain Organization of Web-Scale Corpora through Ontological Corpus Graph cs.CL · 2026-06-29 · unverdicted · none · ref 72 · internal anchor
Cortex uses an Ontological Corpus Graph to structure web-scale corpora, creating a refined 24.14B-token corpus and a new benchmark validated on eight LLMs.
Learning How and What to Memorize: Cognition-Inspired Two-Stage Optimization for Evolving Memory cs.CL · 2026-05-01 · unverdicted · none · ref 2 · internal anchor
MemCoE learns memory organization guidelines via contrastive feedback and then trains a guideline-aligned RL policy for memory updates, yielding consistent gains on personalization benchmarks.
Latent Abstraction for Retrieval-Augmented Generation cs.CL · 2026-04-20 · unverdicted · none · ref 5 · internal anchor
LAnR unifies retrieval-augmented generation inside a single LLM by deriving dense retrieval vectors from a [PRED] token's hidden states and using entropy to adaptively stop retrieval, outperforming prior RAG on six QA benchmarks with better efficiency.
Sell More, Play Less: Benchmarking LLM Realistic Selling Skill cs.CL · 2026-04-08 · conditional · none · ref 6 · internal anchor
SalesLLM provides an automatic evaluation framework for LLM sales dialogues that correlates 0.98 with human experts and shows top models approaching human performance while weaker ones lag.
LMEB: Long-horizon Memory Embedding Benchmark cs.CL · 2026-03-13 · unverdicted · none · ref 8 · internal anchor
LMEB benchmark shows that embedding models' performance on traditional retrieval does not transfer to long-horizon memory tasks, larger models do not always perform better, and LMEB measures capabilities orthogonal to MTEB.
Identifying and Resolving Pitfalls of Knowledge-Based VQA Benchmarks: Auditing, Repairing, and Augmenting cs.CL · 2026-06-30 · unverdicted · none · ref 5 · internal anchor
Audit of KB-VQA benchmarks reveals systematic violations of answer derivability, question clarity, and visual disambiguation assumptions, with new repair and multi-entity augmentation protocols producing different model performance trends.
SPADER: Step-wise Peer Advantage with Diversity-Aware Exploration Rewards for Multi-Answer Question Answering cs.CL · 2026-05-30 · unverdicted · none · ref 10 · internal anchor
SPADER proposes step-wise peer advantage and diversity-aware exploration rewards in RL for multi-answer QA, reporting improved recall and F1 on QAMPARI, Mintaka, WebQSP, and QUEST.
On the Robustness of Multilingual Text Embedding Rankings Across Learning Tasks, Languages, and Benchmark Datasets cs.CL · 2026-05-29 · unverdicted · none · ref 20 · internal anchor
Meta-study of MTEB rankings introduces dataset-composition and ranking-scheme robustness indicators and finds only a small subset of models stay consistently strong across tasks, languages, and evaluation variations.
Beyond Chunk-Local Extraction: Cross-Chunk Graph Augmentation for GraphRAG cs.CL · 2026-05-27 · unverdicted · none · ref 1 · internal anchor
CrossAug augments GraphRAG indices with cross-chunk relations via GNN-guided subgraph scoring and selective LLM completion, yielding consistent gains on four QA benchmarks across three frameworks.
LATTE: Forecasting Peer Anchored Preference Trajectories for Personalized LLM Generation cs.CL · 2026-05-26 · unverdicted · none · ref 2 · internal anchor
LATTE improves personalized LLM generation by forecasting peer-anchored relative preference trajectories and injecting the forecast via a State to Token Bridge, raising ROUGE-L from 0.219-0.245 to 0.259 on Amazon Reviews 2023 over static and compression baselines.
Iterate Until Retrieved: Factual Nugget Optimization for Discoverable Continual Corrections in Agentic RAG cs.CL · 2026-05-25 · unverdicted · none · ref 5 · internal anchor
INO is an index-time method that uses the production RAG agent to iteratively create, test with queries and paraphrases, reflect on failures, and revise factual nuggets until they are discoverable and used correctly.
Structure Retention in Embedding Spaces as a Predictor of Benchmark Performance cs.CL · 2026-05-21 · unverdicted · none · ref 58 · internal anchor
Embedding model performance on MTEB tasks correlates strongly with nearest-neighbor overlap and ICA magnitude differences in their embedding spaces.
Is Textual Similarity Invariant under Machine Translation? Evidence Based on the Political Manifesto Corpus cs.CL · 2026-05-01 · unverdicted · none · ref 17 · internal anchor
Machine translation preserves embedding similarity structure for ten languages but distorts it for four in the Manifesto Corpus, via a new non-inferiority testing framework.
To Know is to Construct: Schema-Constrained Generation for Agent Memory cs.CL · 2026-04-22 · unverdicted · none · ref 1 · internal anchor
SCG-MEM reformulates agent memory access as schema-constrained generation within dynamic cognitive schemas, using assimilation and accommodation for updates plus an associative graph for reasoning, and outperforms retrieval baselines on the LoCoMo benchmark.
BiCon-Gate: Consistency-Gated De-colloquialisation for Dialogue Fact-Checking cs.CL · 2026-04-15 · unverdicted · none · ref 2 · internal anchor
BiCon-Gate improves dialogue fact-checking by applying staged de-colloquialisation and gating rewrites based on semantic consistency with context, yielding gains on the DialFact benchmark over baselines including LLM rewrites.
Differences in Text Generated by Diffusion and Autoregressive Language Models cs.CL · 2026-04-04 · unverdicted · none · ref 5 · internal anchor
DLMs exhibit lower n-gram entropy, higher semantic coherence, and higher semantic diversity than ARMs, primarily due to bidirectional context and remasking decoding strategies.
EvolveR: Self-Evolving LLM Agents through an Experience-Driven Lifecycle cs.CL · 2025-10-17 · unverdicted · none · ref 44 · 2 links · internal anchor
EvolveR enables LLM agents to self-evolve via a closed loop of distilling interaction trajectories into strategic principles offline and retrieving them to guide online decisions with policy reinforcement, yielding better results on multi-hop QA benchmarks.
Culinary Crossroads: A RAG Framework for Enhancing Diversity in Cross-Cultural Recipe Adaptation cs.CL · 2025-07-29 · unverdicted · none · ref 1 · internal anchor
CARRIAGE is a RAG framework that improves output diversity in cross-cultural recipe adaptation by enhancing retrieval and context handling, reaching Pareto efficiency on diversity and quality versus closed-book LLMs.
Mixture-of-Retrieval Experts for Reasoning-Guided Multimodal Knowledge Exploitation cs.CL · 2025-05-28 · unverdicted · none · ref 7 · internal anchor
MoRE enables MLLMs to dynamically coordinate heterogeneous retrieval experts via Step-GRPO training, yielding over 7% average gains on open-domain QA benchmarks.
Latent Bridges for Multi-Table Question Answering cs.CL · 2026-06-27 · unverdicted · none · ref 39 · internal anchor
GRAB improves multi-table QA performance by encoding relational data as graphs and bridging structural signals to frozen LLMs through latent tokens.
Adapting Multilingual Embedding Models to Turkish via Cross-Lingual Tokenizer Surgery and Offline Distillation cs.CL · 2026-05-28 · unverdicted · none · ref 4 · internal anchor
A 200M-parameter Turkish sentence embedding model is adapted from a multilingual teacher via tokenizer pruning, mean-composition initialization, and offline cosine distillation, achieving 77.55% Pearson correlation on STSbTR and 7th place on TR-MTEB.
Large Language Model-Powered Query-Driven Event Timeline Summarization in Industrial Search cs.CL · 2026-05-26 · unverdicted · none · ref 33 · internal anchor
QDET deploys a 7B-parameter model fine-tuned with three auxiliary tasks and RL that matches a 671B model's F1 on query-driven timeline summarization while delivering measurable gains in production search metrics.
Personalizing LLMs with Binary Feedback: A Preference-Corrected Optimization Framework cs.CL · 2026-05-11 · unverdicted · none · ref 35 · internal anchor
C-BPO personalizes LLMs via preference-calibrated binary signals and PU learning theory to isolate inter-user differences from shared task knowledge.
Cross-Lingual Jailbreak Detection via Semantic Codebooks cs.CL · 2026-04-28 · unverdicted · none · ref 4 · internal anchor
Semantic similarity to an English jailbreak codebook detects cross-lingual attacks with high accuracy on curated benchmarks but shows poor separability on diverse unsafe prompts.
Search-R3: Unifying Reasoning and Embedding in Large Language Models cs.CL · 2025-10-08 · unverdicted · none · ref 9 · internal anchor
Search-R3 trains LLMs to output search embeddings as a direct product of step-by-step reasoning via supervised pre-training and a specialized RL environment that avoids full corpus re-encoding.
Advancing Multi-Agent RAG Systems with Minimalist Reinforcement Learning cs.CL · 2025-05-20 · unverdicted · none · ref 10 · internal anchor
Mujica-MyGo decomposes multi-turn RAG interactions via multi-agent workflows and applies minimalist policy gradient optimization to improve performance on QA benchmarks while avoiding long-context problems.
LegalGraphRAG: Multi-Agent Graph Retrieval-Augmented Generation for Reliable Legal Reasoning cs.CL · 2026-05-27 · unverdicted · none · ref 1 · internal anchor
LegalGraphRAG adds hierarchical organization to legal knowledge graphs and a multi-agent verification loop to reach claimed state-of-the-art accuracy and trustworthiness on legal reasoning benchmarks.
Reducing Redundancy in Retrieval-Augmented Generation through Chunk Filtering cs.CL · 2026-04-27 · unverdicted · none · ref 7 · internal anchor
Entity-based chunk filtering reduces RAG vector index size by 25-36% with retrieval quality near baseline levels.
Mira-Embeddings-V1: Domain-Adapted Semantic Reranking for Recruitment via LLM-Synthesized Data cs.CL · 2026-04-20 · conditional · none · ref 4 · internal anchor
Mira-Embeddings-V1 adapts embeddings for recruitment reranking by synthesizing positive and hard-negative samples with LLMs, then applies JD-JD contrastive and JD-CV triplet training plus a BoundaryHead MLP, lifting Recall@50 from 68.89% to 77.55% and Recall@200 from 0.5969 to 0.7047.
Comparison of Modern Multilingual Text Embedding Techniques for Hate Speech Detection Task cs.CL · 2026-04-16 · unverdicted · none · ref 31 · internal anchor
Supervised models using embeddings like jina and e5 reach up to 92% accuracy on multilingual hate speech detection, substantially outperforming anomaly detection, while PCA to 64 dimensions preserves most performance in the supervised case.
Overview of the TalentCLEF 2026: Skill and Job Title Intelligence for Human Capital Management cs.CL · 2026-06-30 · unverdicted · none · ref 7 · internal anchor
The paper describes the organization, tasks, datasets, and participation results for the TalentCLEF 2026 challenge, which received 113 team registrations and over 400 submissions.
Evaluation of Chunking Strategies for Effective Text Embedding in Low-Resource Language on Agricultural Documents cs.CL · 2026-05-21 · unverdicted · none · ref 3 · internal anchor
Recursive character-based chunking at 300 characters outperforms Sentence-Based, Khmer-Aware, and LLM-Based methods on L2 distance, answer relevance, and Khmer IoU in a 5-fold evaluation on 18 Khmer agricultural QA pairs.
KIT-TIP-NLP at MultiPride: Continual Learning with Multilingual Foundation Model cs.CL · 2026-05-13 · unverdicted · none · ref 11 · 2 links · internal anchor
A system using XLM-RoBERTa, GPT-4 back-translation augmentation, undersampling, and language-specific threshold tuning reports 2-5% F1 gains on multilingual slur reclamation detection.
5ting at SemEval-2026 Task 8: Strong End-to-End Multi-Turn RAG via LLM-Based Reranking and Faithfulness Control cs.CL · 2026-06-27 · unverdicted · none · ref 21 · internal anchor
5ting achieves nDCG@5 of 0.4719 on Task A and harmonic score 0.5597 with RL_F 0.7692 on Task C for multi-turn RAG via standard dense retrieval plus LLM reranking and faithfulness constraints.
A Benchmark Construction and Evaluation Framework for Specialist Domains: Case Study on Defense-related Documents cs.CL · 2026-04-20 · unreviewed · ref 41 · internal anchor
