{"work":{"id":"a9435752-4e49-42bd-95b4-0fec975633c8","openalex_id":null,"doi":null,"arxiv_id":"2402.03216","raw_key":null,"title":"M3-Embedding: Multi-Linguality, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation","authors":null,"authors_text":"Jianlv Chen, Shitao Xiao, Peitian Zhang, Kun Luo, Defu Lian, Zheng Liu","year":2024,"venue":"cs.CL","abstract":"In this paper, we introduce a new embedding model called M3-Embedding, which is distinguished for its versatility in \\textit{Multi-Linguality}, \\textit{Multi-Functionality}, and \\textit{Multi-Granularity}. It provides a uniform support for the semantic retrieval of more than 100 working languages. It can simultaneously accomplish the three common retrieval functionalities: dense retrieval, multi-vector retrieval, and sparse retrieval. Besides, it is also capable of processing inputs of different granularities, spanning from short sentences to long documents of up to 8,192 tokens. The effective training of M3-Embedding presents a series of technical contributions. Notably, we propose a novel self-knowledge distillation approach, where the relevance scores from different retrieval functionalities can be integrated as the teacher signal to enhance the training quality. We also optimize the batching strategy, which enables a large batch size and high training throughput to improve the discriminativeness of embeddings. M3-Embedding exhibits a superior performance in our experiment, leading to new state-of-the-art results on multilingual, cross-lingual, and long-document retrieval benchmarks.","external_url":"https://arxiv.org/abs/2402.03216","cited_by_count":null,"metadata_source":"pith","metadata_fetched_at":"2026-06-29T13:23:28.255476+00:00","pith_arxiv_id":"2402.03216","created_at":"2026-05-09T00:14:28.046662+00:00","updated_at":"2026-06-29T13:23:28.255476+00:00","title_quality_ok":true,"display_title":"M3-Embedding: Multi-Linguality, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation","render_title":"M3-Embedding: Multi-Linguality, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation"},"hub":{"state":{"work_id":"a9435752-4e49-42bd-95b4-0fec975633c8","tier":"hub","tier_reason":"10+ Pith inbound or 1,000+ external citations","pith_inbound_count":74,"external_cited_by_count":null,"distinct_field_count":13,"first_pith_cited_at":"2024-10-08T12:17:42+00:00","last_pith_cited_at":"2026-06-26T06:03:33+00:00","author_build_status":"not_needed","summary_status":"needed","contexts_status":"needed","graph_status":"needed","ask_index_status":"not_needed","reader_status":"not_needed","recognition_status":"not_needed","updated_at":"2026-06-29T14:08:56.972139+00:00","tier_text":"hub"},"tier":"hub","role_counts":[{"context_role":"background","n":8},{"context_role":"method","n":5},{"context_role":"baseline","n":3},{"context_role":"dataset","n":2}],"polarity_counts":[{"context_polarity":"background","n":7},{"context_polarity":"use_method","n":5},{"context_polarity":"baseline","n":3},{"context_polarity":"use_dataset","n":2},{"context_polarity":"unclear","n":1}],"runs":{"context_extract":{"job_type":"context_extract","status":"succeeded","result":{"enqueued_papers":25},"error":null,"updated_at":"2026-05-14T17:59:45.686814+00:00"},"graph_features":{"job_type":"graph_features","status":"succeeded","result":{"co_cited":[{"title":"Qwen3 Technical Report","work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e","shared_citers":7},{"title":"Text Embeddings by Weakly-Supervised Contrastive Pre-training","work_id":"789cc674-467e-4f23-bb50-05c79fe8c4c2","shared_citers":7},{"title":"Qwen2.5 Technical Report","work_id":"d8432992-4980-4a81-85c7-9fa2c2b87f85","shared_citers":6},{"title":"A-MEM: Agentic Memory for LLM Agents","work_id":"3b98feb2-fdb1-479a-bbe4-2c298a4592e2","shared_citers":5},{"title":"The Llama 3 Herd of Models","work_id":"1549a635-88af-4ac1-acfe-51ae7bb53345","shared_citers":5},{"title":"arXiv preprint arXiv:2312.02724 , year=","work_id":"0d9b3ad1-b405-412f-81ee-fd6f941d2367","shared_citers":4},{"title":"LoRA: Low-Rank Adaptation of Large Language Models","work_id":"0426219a-789e-4964-adc8-a04538510818","shared_citers":4},{"title":"Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory","work_id":"a5aed26c-a248-48b6-a59e-f7693fcb180a","shared_citers":4},{"title":"Passage Re-ranking with BERT","work_id":"562fbfab-d6fe-48e1-a06d-e5d078c70945","shared_citers":4},{"title":"Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models","work_id":"bab684a8-d933-426c-a19e-2c855a0d1f59","shared_citers":4},{"title":"Qwen3-VL Technical Report","work_id":"1fe243aa-e3c0-4da6-b391-4cbcfc88d5c0","shared_citers":4},{"title":"Zep: A Temporal Knowledge Graph Architecture for Agent Memory","work_id":"515c933e-12ae-439d-a7ff-c07fee482dfb","shared_citers":4},{"title":"arXiv preprint arXiv:2506.17188 , year=","work_id":"d669ef64-2a85-4a5f-a899-b6b9ea17b665","shared_citers":3},{"title":"DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning","work_id":"e6b75ad5-2877-4168-97c8-710407094d20","shared_citers":3},{"title":"From Local to Global: A Graph RAG Approach to Query-Focused Summarization","work_id":"588618d7-fd41-4053-b34d-a981f8793039","shared_citers":3},{"title":"GPT-4o System Card","work_id":"f37bf1c7-4964-4e56-9762-d20da8d9009f","shared_citers":3},{"title":"jina-embeddings-v3: Multilingual embeddings with task lora","work_id":"4f781d04-a865-40a4-b6ab-989f993d1ce0","shared_citers":3},{"title":"MemGPT: Towards LLMs as Operating Systems","work_id":"2698f5ad-c84c-40ca-b839-0912dae10ba2","shared_citers":3},{"title":"Qwen2.5-VL Technical Report","work_id":"69dffacb-bfe8-442d-be86-48624c60426f","shared_citers":3},{"title":"Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning","work_id":"0e0b7549-2bc4-4574-aa7f-588ffa16eaae","shared_citers":3},{"title":"Towards General Text Embeddings with Multi-stage Contrastive Learning","work_id":"861a61de-66fe-49d1-b1ab-11f8b082a4cc","shared_citers":3},{"title":"Training Large Language Models to Reason in a Continuous Latent Space","work_id":"3ddd0fd2-c176-408f-9b58-0666c2707f2d","shared_citers":3},{"title":"UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction","work_id":"54c15172-6304-4008-a3b6-c4cc0803c054","shared_citers":3},{"title":"2024 , issn =","work_id":"ea6c2332-aa10-40e3-87f5-b2cb91d7909a","shared_citers":2}],"time_series":[{"n":1,"year":2025},{"n":35,"year":2026}],"dependency_candidates":[]},"error":null,"updated_at":"2026-05-14T17:59:41.683102+00:00"},"identity_refresh":{"job_type":"identity_refresh","status":"succeeded","result":{"items":[{"title":"Qwen3 Technical Report","outcome":"unchanged","work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e","resolver":"local_arxiv","confidence":0.98,"old_work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e"}],"counts":{"fixed":0,"merged":0,"unchanged":1,"quarantined":0,"needs_external_resolution":0},"errors":[],"attempted":1},"error":null,"updated_at":"2026-05-14T18:00:14.446779+00:00"},"summary_claims":{"job_type":"summary_claims","status":"succeeded","result":{"title":"M3-Embedding: Multi-Linguality, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation","claims":[{"claim_text":"In this paper, we introduce a new embedding model called M3-Embedding, which is distinguished for its versatility in \\textit{Multi-Linguality}, \\textit{Multi-Functionality}, and \\textit{Multi-Granularity}. It provides a uniform support for the semantic retrieval of more than 100 working languages. It can simultaneously accomplish the three common retrieval functionalities: dense retrieval, multi-vector retrieval, and sparse retrieval. Besides, it is also capable of processing inputs of different granularities, spanning from short sentences to long documents of up to 8,192 tokens. The effective","claim_type":"abstract","evidence_strength":"source_metadata"}],"why_cited":"Pith tracks M3-Embedding: Multi-Linguality, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation because it crossed a citation-hub threshold.","role_counts":[]},"error":null,"updated_at":"2026-05-14T17:59:37.578572+00:00"}},"summary":{"title":"M3-Embedding: Multi-Linguality, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation","claims":[{"claim_text":"In this paper, we introduce a new embedding model called M3-Embedding, which is distinguished for its versatility in \\textit{Multi-Linguality}, \\textit{Multi-Functionality}, and \\textit{Multi-Granularity}. It provides a uniform support for the semantic retrieval of more than 100 working languages. It can simultaneously accomplish the three common retrieval functionalities: dense retrieval, multi-vector retrieval, and sparse retrieval. Besides, it is also capable of processing inputs of different granularities, spanning from short sentences to long documents of up to 8,192 tokens. The effective","claim_type":"abstract","evidence_strength":"source_metadata"}],"why_cited":"Pith tracks M3-Embedding: Multi-Linguality, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation because it crossed a citation-hub threshold.","role_counts":[]},"graph":{"co_cited":[{"title":"Qwen3 Technical Report","work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e","shared_citers":7},{"title":"Text Embeddings by Weakly-Supervised Contrastive Pre-training","work_id":"789cc674-467e-4f23-bb50-05c79fe8c4c2","shared_citers":7},{"title":"Qwen2.5 Technical Report","work_id":"d8432992-4980-4a81-85c7-9fa2c2b87f85","shared_citers":6},{"title":"A-MEM: Agentic Memory for LLM Agents","work_id":"3b98feb2-fdb1-479a-bbe4-2c298a4592e2","shared_citers":5},{"title":"The Llama 3 Herd of Models","work_id":"1549a635-88af-4ac1-acfe-51ae7bb53345","shared_citers":5},{"title":"arXiv preprint arXiv:2312.02724 , year=","work_id":"0d9b3ad1-b405-412f-81ee-fd6f941d2367","shared_citers":4},{"title":"LoRA: Low-Rank Adaptation of Large Language Models","work_id":"0426219a-789e-4964-adc8-a04538510818","shared_citers":4},{"title":"Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory","work_id":"a5aed26c-a248-48b6-a59e-f7693fcb180a","shared_citers":4},{"title":"Passage Re-ranking with BERT","work_id":"562fbfab-d6fe-48e1-a06d-e5d078c70945","shared_citers":4},{"title":"Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models","work_id":"bab684a8-d933-426c-a19e-2c855a0d1f59","shared_citers":4},{"title":"Qwen3-VL Technical Report","work_id":"1fe243aa-e3c0-4da6-b391-4cbcfc88d5c0","shared_citers":4},{"title":"Zep: A Temporal Knowledge Graph Architecture for Agent Memory","work_id":"515c933e-12ae-439d-a7ff-c07fee482dfb","shared_citers":4},{"title":"arXiv preprint arXiv:2506.17188 , year=","work_id":"d669ef64-2a85-4a5f-a899-b6b9ea17b665","shared_citers":3},{"title":"DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning","work_id":"e6b75ad5-2877-4168-97c8-710407094d20","shared_citers":3},{"title":"From Local to Global: A Graph RAG Approach to Query-Focused Summarization","work_id":"588618d7-fd41-4053-b34d-a981f8793039","shared_citers":3},{"title":"GPT-4o System Card","work_id":"f37bf1c7-4964-4e56-9762-d20da8d9009f","shared_citers":3},{"title":"jina-embeddings-v3: Multilingual embeddings with task lora","work_id":"4f781d04-a865-40a4-b6ab-989f993d1ce0","shared_citers":3},{"title":"MemGPT: Towards LLMs as Operating Systems","work_id":"2698f5ad-c84c-40ca-b839-0912dae10ba2","shared_citers":3},{"title":"Qwen2.5-VL Technical Report","work_id":"69dffacb-bfe8-442d-be86-48624c60426f","shared_citers":3},{"title":"Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning","work_id":"0e0b7549-2bc4-4574-aa7f-588ffa16eaae","shared_citers":3},{"title":"Towards General Text Embeddings with Multi-stage Contrastive Learning","work_id":"861a61de-66fe-49d1-b1ab-11f8b082a4cc","shared_citers":3},{"title":"Training Large Language Models to Reason in a Continuous Latent Space","work_id":"3ddd0fd2-c176-408f-9b58-0666c2707f2d","shared_citers":3},{"title":"UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction","work_id":"54c15172-6304-4008-a3b6-c4cc0803c054","shared_citers":3},{"title":"2024 , issn =","work_id":"ea6c2332-aa10-40e3-87f5-b2cb91d7909a","shared_citers":2}],"time_series":[{"n":1,"year":2025},{"n":35,"year":2026}],"dependency_candidates":[]},"authors":[]}}