{"work":{"id":"789cc674-467e-4f23-bb50-05c79fe8c4c2","openalex_id":null,"doi":null,"arxiv_id":"2212.03533","raw_key":null,"title":"Text Embeddings by Weakly-Supervised Contrastive Pre-training","authors":null,"authors_text":"Liang Wang, Nan Yang, Xiaolong Huang, Binxing Jiao, Linjun Yang, Daxin Jiang","year":2022,"venue":"cs.CL","abstract":"This paper presents E5, a family of state-of-the-art text embeddings that transfer well to a wide range of tasks. The model is trained in a contrastive manner with weak supervision signals from our curated large-scale text pair dataset (called CCPairs). E5 can be readily used as a general-purpose embedding model for any tasks requiring a single-vector representation of texts such as retrieval, clustering, and classification, achieving strong performance in both zero-shot and fine-tuned settings. We conduct extensive evaluations on 56 datasets from the BEIR and MTEB benchmarks. For zero-shot settings, E5 is the first model that outperforms the strong BM25 baseline on the BEIR retrieval benchmark without using any labeled data. When fine-tuned, E5 obtains the best results on the MTEB benchmark, beating existing embedding models with 40x more parameters.","external_url":"https://arxiv.org/abs/2212.03533","cited_by_count":null,"metadata_source":"pith","metadata_fetched_at":"2026-06-29T10:33:18.830948+00:00","pith_arxiv_id":"2212.03533","created_at":"2026-05-09T02:34:41.789044+00:00","updated_at":"2026-06-29T10:33:18.830948+00:00","title_quality_ok":true,"display_title":"Text Embeddings by Weakly-Supervised Contrastive Pre-training","render_title":"Text Embeddings by Weakly-Supervised Contrastive Pre-training"},"hub":{"state":{"work_id":"789cc674-467e-4f23-bb50-05c79fe8c4c2","tier":"super_hub","tier_reason":"100+ Pith inbound or 10,000+ external citations","pith_inbound_count":136,"external_cited_by_count":null,"distinct_field_count":14,"first_pith_cited_at":"2023-08-07T03:52:59+00:00","last_pith_cited_at":"2026-06-25T21:24:17+00:00","author_build_status":"needed","summary_status":"needed","contexts_status":"needed","graph_status":"needed","ask_index_status":"needed","reader_status":"not_needed","recognition_status":"not_needed","updated_at":"2026-06-29T10:48:36.800598+00:00","tier_text":"super_hub"},"tier":"super_hub","role_counts":[{"context_role":"background","n":10},{"context_role":"method","n":9},{"context_role":"other","n":2},{"context_role":"baseline","n":1},{"context_role":"dataset","n":1}],"polarity_counts":[{"context_polarity":"use_method","n":9},{"context_polarity":"background","n":8},{"context_polarity":"support","n":2},{"context_polarity":"unclear","n":2},{"context_polarity":"baseline","n":1},{"context_polarity":"use_dataset","n":1}],"runs":{"ask_index":{"job_type":"ask_index","status":"succeeded","result":{"title":"Text Embeddings by Weakly-Supervised Contrastive Pre-training","claims":[{"claim_text":"This paper presents E5, a family of state-of-the-art text embeddings that transfer well to a wide range of tasks. The model is trained in a contrastive manner with weak supervision signals from our curated large-scale text pair dataset (called CCPairs). E5 can be readily used as a general-purpose embedding model for any tasks requiring a single-vector representation of texts such as retrieval, clustering, and classification, achieving strong performance in both zero-shot and fine-tuned settings. We conduct extensive evaluations on 56 datasets from the BEIR and MTEB benchmarks. For zero-shot se","claim_type":"abstract","evidence_strength":"source_metadata"},{"claim_text":"0 280.3 Baseline & Evaluation.To demonstrate the efficacy of π-Play, we compare it against a variety of baseline search agents: (1)training-free: ReAct [ 40]; (2)supervised RL: Search-R1 [ 13] and ToolForge [2], and (3)self-play: Dr.Zero [ 45] and SQLM* [3, 45]. All models are evaluated using exact match scores with identical search engine (E5-base [34]) and corpus settings (English Wikipedia dump [15]), using the checkpoint from their best-performing iteration (step). 4.2 Main Results We first ","claim_type":"method","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"Development in Information Retrieval . 1513-1523. [57] Shitao Xiao, Zheng Liu, Weihao Han, Jianjin Zhang, Yingxia Shao, Defu Lian, Chaozhuo Li, Hao Sun, Denvy Deng, Liangjie Zhang, et al. 2022. Progressively optimized bi-granular document representation for scalable embedding based retrieval. In Proceedings of the ACM Web Conference 2022 . 286-296. [58] Shitao Xiao, Zheng Liu, Yingxia Shao, and Zhao Cao. 2023. RetroMAE-2: Duplex Masked Auto-Encoder For Pre-Training Retrieval-Oriented Language Mo","claim_type":"background","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"2 Context: tabular learning research 2.1 The tabular-learning benchmarking landscape A need for string tabular learning benchmarksThe \"iron rule\" guiding machine-learning research is to compare pipelines on held-out data [24]. While model rankings remain surprisingly consistent across data splits [ 50, 48, 24], no algorithm is optimal across all problem classes [ 64]. Rankings are domain-dependent, and models whose inductive biases match the data distribution perform best [21]. Introducing strin","claim_type":"background","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"approaches show that LLM-generated supervision can substantially improve retrieval. InPars and InPars-v2 synthesize queries from documents [3, 11], Promptagator bootstraps training data from a small seed set [5], HyDE uses hypothetical documents for zero-shot retrieval [8], and recent work argues that diverse synthetic tasks can strengthen embedding models more broadly [28]. What is still missing for our setting is not a stronger generic relevance recipe, but a formulation centered on recruitmen","claim_type":"background","confidence":0.85,"evidence_strength":"citation_context"},{"claim_text":"and queries with instructions (+I), plus p-MRR and IRS (both scaled to [-100, +100]). Best results in bold. Model WQT WTR TArX IndusTR Average nDCG p-MRR IRS nDCG p-MRR IRS nDCG p-MRR IRS nDCG p-MRR IRS nDCG p-MRR IRS Q +I Q +I Q +I Q +I Q +I No-Instruction Retrievers BM25 [31] 65.3 30.3 2.4 -1.7 52.5 12.2 3.8 2.5 60.7 44.4 -8.0 5.7 62.7 36.9 0.8 8.3 60.3 31.0 -0.5 3.7 BGE-Large-v1.5 [49] 73.6 46.9 2.8 0.1 84.7 44.5 6.4 -0.5 77.7 59.8 -8.2 9.0 76.7 50.9 -5.9 9.2 78.2 50.5 -1.2 4.5 E5-Large-v2 [4","claim_type":"baseline","confidence":0.85,"evidence_strength":"citation_context"},{"claim_text":"better skill-conditioned querying improves efficient fact lookup, while the multi-hop benchmarks test whether the learned skills help the model decompose bridge, comparison, and compositional search problems into more reliable multi-turn search trajectories. 4.2 Baselines Our baselines isolate three sources of improvement. Direct inference and chain-of-thought prompt- ing [32] measure language-only reasoning under matched Qwen2.5 backbones. RAG [15] measures the effect of adding retrieved eviden","claim_type":"background","confidence":0.85,"evidence_strength":"citation_context"}],"why_cited":"Pith tracks Text Embeddings by Weakly-Supervised Contrastive Pre-training because it crossed a citation-hub threshold. Current citing contexts most often use it as background evidence (8 contexts).","role_counts":[{"n":8,"context_role":"background"},{"n":5,"context_role":"method"},{"n":2,"context_role":"other"},{"n":1,"context_role":"baseline"},{"n":1,"context_role":"dataset"}]},"error":null,"updated_at":"2026-05-19T02:31:22.024605+00:00"},"author_expand":{"job_type":"author_expand","status":"succeeded","result":{"authors_linked":[{"id":"7039103b-0889-4fb8-b5e1-d6e0869e4d71","orcid":null,"display_name":"Liang Wang"},{"id":"a582f8ec-0744-463c-9340-9b79b5d25393","orcid":null,"display_name":"Nan Yang"},{"id":"f7d8a1e0-19a5-43c4-bf12-50c9dd3ebd69","orcid":null,"display_name":"Xiaolong Huang"},{"id":"438ec165-f127-49a2-80f9-bac9ca9dca20","orcid":null,"display_name":"Binxing Jiao"},{"id":"ff2cba8f-3007-4821-bd19-e817ae9db727","orcid":null,"display_name":"Linjun Yang"},{"id":"3fc74868-3964-48f4-97c7-4a095f17e5bc","orcid":null,"display_name":"Daxin Jiang"}]},"error":null,"updated_at":"2026-05-19T02:31:22.480385+00:00"},"context_extract":{"job_type":"context_extract","status":"succeeded","result":{"enqueued_papers":25},"error":null,"updated_at":"2026-05-14T08:38:05.499057+00:00"},"graph_features":{"job_type":"graph_features","status":"succeeded","result":{"co_cited":[{"title":"Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models","work_id":"bab684a8-d933-426c-a19e-2c855a0d1f59","shared_citers":17},{"title":"Qwen3 Technical Report","work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e","shared_citers":17},{"title":"Towards General Text Embeddings with Multi-stage Contrastive Learning","work_id":"861a61de-66fe-49d1-b1ab-11f8b082a4cc","shared_citers":14},{"title":"Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning","work_id":"0e0b7549-2bc4-4574-aa7f-588ffa16eaae","shared_citers":13},{"title":"DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models","work_id":"c5006563-f3ec-438a-9e35-b7b484f34828","shared_citers":11},{"title":"DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning","work_id":"e6b75ad5-2877-4168-97c8-710407094d20","shared_citers":10},{"title":"Qwen2.5 Technical Report","work_id":"d8432992-4980-4a81-85c7-9fa2c2b87f85","shared_citers":10},{"title":"Proximal Policy Optimization Algorithms","work_id":"240c67fe-d14d-4520-91c1-38a4e272ca19","shared_citers":9},{"title":"The Llama 3 Herd of Models","work_id":"1549a635-88af-4ac1-acfe-51ae7bb53345","shared_citers":9},{"title":"Unsupervised Dense Information Retrieval with Contrastive Learning","work_id":"5316be40-48c7-48a5-87bc-7cfa3486f835","shared_citers":9},{"title":"BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models","work_id":"c5f7f027-ac36-4b07-b824-0eca2f310641","shared_citers":8},{"title":"GPT-4 Technical Report","work_id":"b928e041-6991-4c08-8c81-0359e4097c7b","shared_citers":8},{"title":"Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks","work_id":"27adfcc9-2a67-43d6-a844-78309012411f","shared_citers":8},{"title":"M3-Embedding: Multi-Linguality, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation","work_id":"a9435752-4e49-42bd-95b4-0fec975633c8","shared_citers":7},{"title":"Nv-embed: Improved techniques for training llms as generalist embedding models.arXiv preprint arXiv:2405.17428","work_id":"f8b6faa6-cb4e-4e31-95a5-6b8986c27baa","shared_citers":7},{"title":"Representation Learning with Contrastive Predictive Coding","work_id":"7b08a1d4-d565-424e-9c86-6ef244b7b90a","shared_citers":7},{"title":"arXiv preprint arXiv:2312.02724 , year=","work_id":"0d9b3ad1-b405-412f-81ee-fd6f941d2367","shared_citers":6},{"title":"GPT-4o System Card","work_id":"f37bf1c7-4964-4e56-9762-d20da8d9009f","shared_citers":6},{"title":"LLaMA: Open and Efficient Foundation Language Models","work_id":"c018fc23-6f3f-4035-9d02-28a2173b2b9d","shared_citers":6},{"title":"Mistral 7B","work_id":"eb5e1305-ad11-4875-ad8d-ad8b8f697599","shared_citers":6},{"title":"C-Pack: Packed Resources For General Chinese Embeddings","work_id":"5d8d3efd-bb5b-4a30-8457-28f190c026e9","shared_citers":5},{"title":"DAPO: An Open-Source LLM Reinforcement Learning System at Scale","work_id":"64019d00-0b11-4bbd-b173-b46c8fad0157","shared_citers":5},{"title":"Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities","work_id":"008df105-2fdd-45d8-857a-8e35868aecb6","shared_citers":5},{"title":"Gemini embedding: Generalizable embeddings from gemini","work_id":"911b6918-a128-453f-ae99-94388c38fcb1","shared_citers":5}],"time_series":[{"n":2,"year":2023},{"n":2,"year":2024},{"n":2,"year":2025},{"n":64,"year":2026}],"dependency_candidates":[]},"error":null,"updated_at":"2026-05-14T08:47:54.077688+00:00"},"identity_refresh":{"job_type":"identity_refresh","status":"succeeded","result":{"items":[{"title":"Qwen3 Technical Report","outcome":"unchanged","work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e","resolver":"local_arxiv","confidence":0.98,"old_work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e"}],"counts":{"fixed":0,"merged":0,"unchanged":1,"quarantined":0,"needs_external_resolution":0},"errors":[],"attempted":1},"error":null,"updated_at":"2026-05-14T08:38:09.740651+00:00"},"role_polarity":{"job_type":"role_polarity","status":"succeeded","result":{"title":"Text Embeddings by Weakly-Supervised Contrastive Pre-training","claims":[{"claim_text":"This paper presents E5, a family of state-of-the-art text embeddings that transfer well to a wide range of tasks. The model is trained in a contrastive manner with weak supervision signals from our curated large-scale text pair dataset (called CCPairs). E5 can be readily used as a general-purpose embedding model for any tasks requiring a single-vector representation of texts such as retrieval, clustering, and classification, achieving strong performance in both zero-shot and fine-tuned settings. We conduct extensive evaluations on 56 datasets from the BEIR and MTEB benchmarks. For zero-shot se","claim_type":"abstract","evidence_strength":"source_metadata"},{"claim_text":"0 280.3 Baseline & Evaluation.To demonstrate the efficacy of π-Play, we compare it against a variety of baseline search agents: (1)training-free: ReAct [ 40]; (2)supervised RL: Search-R1 [ 13] and ToolForge [2], and (3)self-play: Dr.Zero [ 45] and SQLM* [3, 45]. All models are evaluated using exact match scores with identical search engine (E5-base [34]) and corpus settings (English Wikipedia dump [15]), using the checkpoint from their best-performing iteration (step). 4.2 Main Results We first ","claim_type":"method","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"Development in Information Retrieval . 1513-1523. [57] Shitao Xiao, Zheng Liu, Weihao Han, Jianjin Zhang, Yingxia Shao, Defu Lian, Chaozhuo Li, Hao Sun, Denvy Deng, Liangjie Zhang, et al. 2022. Progressively optimized bi-granular document representation for scalable embedding based retrieval. In Proceedings of the ACM Web Conference 2022 . 286-296. [58] Shitao Xiao, Zheng Liu, Yingxia Shao, and Zhao Cao. 2023. RetroMAE-2: Duplex Masked Auto-Encoder For Pre-Training Retrieval-Oriented Language Mo","claim_type":"background","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"2 Context: tabular learning research 2.1 The tabular-learning benchmarking landscape A need for string tabular learning benchmarksThe \"iron rule\" guiding machine-learning research is to compare pipelines on held-out data [24]. While model rankings remain surprisingly consistent across data splits [ 50, 48, 24], no algorithm is optimal across all problem classes [ 64]. Rankings are domain-dependent, and models whose inductive biases match the data distribution perform best [21]. Introducing strin","claim_type":"background","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"approaches show that LLM-generated supervision can substantially improve retrieval. InPars and InPars-v2 synthesize queries from documents [3, 11], Promptagator bootstraps training data from a small seed set [5], HyDE uses hypothetical documents for zero-shot retrieval [8], and recent work argues that diverse synthetic tasks can strengthen embedding models more broadly [28]. What is still missing for our setting is not a stronger generic relevance recipe, but a formulation centered on recruitmen","claim_type":"background","confidence":0.85,"evidence_strength":"citation_context"},{"claim_text":"and queries with instructions (+I), plus p-MRR and IRS (both scaled to [-100, +100]). Best results in bold. Model WQT WTR TArX IndusTR Average nDCG p-MRR IRS nDCG p-MRR IRS nDCG p-MRR IRS nDCG p-MRR IRS nDCG p-MRR IRS Q +I Q +I Q +I Q +I Q +I No-Instruction Retrievers BM25 [31] 65.3 30.3 2.4 -1.7 52.5 12.2 3.8 2.5 60.7 44.4 -8.0 5.7 62.7 36.9 0.8 8.3 60.3 31.0 -0.5 3.7 BGE-Large-v1.5 [49] 73.6 46.9 2.8 0.1 84.7 44.5 6.4 -0.5 77.7 59.8 -8.2 9.0 76.7 50.9 -5.9 9.2 78.2 50.5 -1.2 4.5 E5-Large-v2 [4","claim_type":"baseline","confidence":0.85,"evidence_strength":"citation_context"},{"claim_text":"better skill-conditioned querying improves efficient fact lookup, while the multi-hop benchmarks test whether the learned skills help the model decompose bridge, comparison, and compositional search problems into more reliable multi-turn search trajectories. 4.2 Baselines Our baselines isolate three sources of improvement. Direct inference and chain-of-thought prompt- ing [32] measure language-only reasoning under matched Qwen2.5 backbones. RAG [15] measures the effect of adding retrieved eviden","claim_type":"background","confidence":0.85,"evidence_strength":"citation_context"}],"why_cited":"Pith tracks Text Embeddings by Weakly-Supervised Contrastive Pre-training because it crossed a citation-hub threshold. Current citing contexts most often use it as background evidence (8 contexts).","role_counts":[{"n":8,"context_role":"background"},{"n":5,"context_role":"method"},{"n":2,"context_role":"other"},{"n":1,"context_role":"baseline"},{"n":1,"context_role":"dataset"}]},"error":null,"updated_at":"2026-05-19T02:31:22.020292+00:00"},"summary_claims":{"job_type":"summary_claims","status":"succeeded","result":{"title":"Text Embeddings by Weakly-Supervised Contrastive Pre-training","claims":[{"claim_text":"This paper presents E5, a family of state-of-the-art text embeddings that transfer well to a wide range of tasks. The model is trained in a contrastive manner with weak supervision signals from our curated large-scale text pair dataset (called CCPairs). E5 can be readily used as a general-purpose embedding model for any tasks requiring a single-vector representation of texts such as retrieval, clustering, and classification, achieving strong performance in both zero-shot and fine-tuned settings. We conduct extensive evaluations on 56 datasets from the BEIR and MTEB benchmarks. For zero-shot se","claim_type":"abstract","evidence_strength":"source_metadata"}],"why_cited":"Pith tracks Text Embeddings by Weakly-Supervised Contrastive Pre-training because it crossed a citation-hub threshold.","role_counts":[]},"error":null,"updated_at":"2026-05-14T08:47:51.748596+00:00"}},"summary":{"title":"Text Embeddings by Weakly-Supervised Contrastive Pre-training","claims":[{"claim_text":"This paper presents E5, a family of state-of-the-art text embeddings that transfer well to a wide range of tasks. The model is trained in a contrastive manner with weak supervision signals from our curated large-scale text pair dataset (called CCPairs). E5 can be readily used as a general-purpose embedding model for any tasks requiring a single-vector representation of texts such as retrieval, clustering, and classification, achieving strong performance in both zero-shot and fine-tuned settings. We conduct extensive evaluations on 56 datasets from the BEIR and MTEB benchmarks. For zero-shot se","claim_type":"abstract","evidence_strength":"source_metadata"}],"why_cited":"Pith tracks Text Embeddings by Weakly-Supervised Contrastive Pre-training because it crossed a citation-hub threshold.","role_counts":[]},"graph":{"co_cited":[{"title":"Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models","work_id":"bab684a8-d933-426c-a19e-2c855a0d1f59","shared_citers":17},{"title":"Qwen3 Technical Report","work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e","shared_citers":17},{"title":"Towards General Text Embeddings with Multi-stage Contrastive Learning","work_id":"861a61de-66fe-49d1-b1ab-11f8b082a4cc","shared_citers":14},{"title":"Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning","work_id":"0e0b7549-2bc4-4574-aa7f-588ffa16eaae","shared_citers":13},{"title":"DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models","work_id":"c5006563-f3ec-438a-9e35-b7b484f34828","shared_citers":11},{"title":"DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning","work_id":"e6b75ad5-2877-4168-97c8-710407094d20","shared_citers":10},{"title":"Qwen2.5 Technical Report","work_id":"d8432992-4980-4a81-85c7-9fa2c2b87f85","shared_citers":10},{"title":"Proximal Policy Optimization Algorithms","work_id":"240c67fe-d14d-4520-91c1-38a4e272ca19","shared_citers":9},{"title":"The Llama 3 Herd of Models","work_id":"1549a635-88af-4ac1-acfe-51ae7bb53345","shared_citers":9},{"title":"Unsupervised Dense Information Retrieval with Contrastive Learning","work_id":"5316be40-48c7-48a5-87bc-7cfa3486f835","shared_citers":9},{"title":"BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models","work_id":"c5f7f027-ac36-4b07-b824-0eca2f310641","shared_citers":8},{"title":"GPT-4 Technical Report","work_id":"b928e041-6991-4c08-8c81-0359e4097c7b","shared_citers":8},{"title":"Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks","work_id":"27adfcc9-2a67-43d6-a844-78309012411f","shared_citers":8},{"title":"M3-Embedding: Multi-Linguality, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation","work_id":"a9435752-4e49-42bd-95b4-0fec975633c8","shared_citers":7},{"title":"Nv-embed: Improved techniques for training llms as generalist embedding models.arXiv preprint arXiv:2405.17428","work_id":"f8b6faa6-cb4e-4e31-95a5-6b8986c27baa","shared_citers":7},{"title":"Representation Learning with Contrastive Predictive Coding","work_id":"7b08a1d4-d565-424e-9c86-6ef244b7b90a","shared_citers":7},{"title":"arXiv preprint arXiv:2312.02724 , year=","work_id":"0d9b3ad1-b405-412f-81ee-fd6f941d2367","shared_citers":6},{"title":"GPT-4o System Card","work_id":"f37bf1c7-4964-4e56-9762-d20da8d9009f","shared_citers":6},{"title":"LLaMA: Open and Efficient Foundation Language Models","work_id":"c018fc23-6f3f-4035-9d02-28a2173b2b9d","shared_citers":6},{"title":"Mistral 7B","work_id":"eb5e1305-ad11-4875-ad8d-ad8b8f697599","shared_citers":6},{"title":"C-Pack: Packed Resources For General Chinese Embeddings","work_id":"5d8d3efd-bb5b-4a30-8457-28f190c026e9","shared_citers":5},{"title":"DAPO: An Open-Source LLM Reinforcement Learning System at Scale","work_id":"64019d00-0b11-4bbd-b173-b46c8fad0157","shared_citers":5},{"title":"Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities","work_id":"008df105-2fdd-45d8-857a-8e35868aecb6","shared_citers":5},{"title":"Gemini embedding: Generalizable embeddings from gemini","work_id":"911b6918-a128-453f-ae99-94388c38fcb1","shared_citers":5}],"time_series":[{"n":2,"year":2023},{"n":2,"year":2024},{"n":2,"year":2025},{"n":64,"year":2026}],"dependency_candidates":[]},"authors":[{"id":"438ec165-f127-49a2-80f9-bac9ca9dca20","orcid":null,"display_name":"Binxing Jiao","source":"manual","import_confidence":0.72},{"id":"3fc74868-3964-48f4-97c7-4a095f17e5bc","orcid":null,"display_name":"Daxin Jiang","source":"manual","import_confidence":0.72},{"id":"7039103b-0889-4fb8-b5e1-d6e0869e4d71","orcid":null,"display_name":"Liang Wang","source":"manual","import_confidence":0.72},{"id":"ff2cba8f-3007-4821-bd19-e817ae9db727","orcid":null,"display_name":"Linjun Yang","source":"manual","import_confidence":0.72},{"id":"a582f8ec-0744-463c-9340-9b79b5d25393","orcid":null,"display_name":"Nan Yang","source":"manual","import_confidence":0.72},{"id":"f7d8a1e0-19a5-43c4-bf12-50c9dd3ebd69","orcid":null,"display_name":"Xiaolong Huang","source":"manual","import_confidence":0.72}]}}