C-Pack: Packed Resources For General Chinese Embeddings
Pith reviewed 2026-05-13 13:22 UTC · model grok-4.3
The pith
C-Pack supplies a benchmark, a training dataset, and models that let Chinese text embeddings outperform all earlier ones by up to 10 percent on a benchmark of 35 datasets spanning 6 tasks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We introduce C-Pack, which consists of three resources: C-MTEB, a comprehensive Chinese embedding benchmark covering 6 tasks and 35 datasets; C-MTP, a massive curated text embedding training set drawn from labeled and unlabeled Chinese corpora; and C-TEM, a family of embedding models that, trained with an integrated and optimized suite of methods, score up to 10 percent higher than prior Chinese models on C-MTEB.
What carries the argument
C-TEM models trained on the C-MTP dataset and evaluated on the C-MTEB benchmark.
If this is right
- Downstream Chinese NLP systems can adopt higher-quality embeddings for retrieval, classification, and semantic similarity tasks.
- Open release of both the benchmark and the training data allows direct replication and extension by other researchers.
- The English models, which reach state-of-the-art MTEB scores, and an English dataset twice the size of the Chinese one provide a parallel resource.
- The optimized training pipeline can be applied to produce embeddings in additional languages or sizes.
Where Pith is reading between the lines
- The same packing approach of benchmark plus data plus model could be replicated for other languages to close performance gaps.
- If C-MTEB becomes widely adopted, it may standardize evaluation and reduce hidden selection effects in future Chinese embedding papers.
- Larger-scale training on the released C-MTP data could further widen the gap over prior methods.
- Integration of the English and Chinese resources may support improved bilingual or multilingual embedding models.
Load-bearing premise
The C-MTEB collection of 35 datasets supplies an unbiased and comprehensive test of general Chinese embedding quality.
What would settle it
Release of a new Chinese embedding model that scores higher than the largest C-TEM variant on every C-MTEB task without using the C-MTP training data.
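The criterion above is a strict-dominance test: the challenger has to win on every task, so a single tie or loss fails it. A minimal sketch of that check follows, assuming one aggregate score per task. The function name `strictly_dominates`, the task names, and all scores are hypothetical placeholders, not values from the paper.

```python
from typing import Dict


def strictly_dominates(challenger: Dict[str, float], incumbent: Dict[str, float]) -> bool:
    """Return True only if the challenger scores higher on every incumbent task."""
    missing = set(incumbent) - set(challenger)
    if missing:
        raise ValueError(f"challenger has no score for: {sorted(missing)}")
    return all(challenger[task] > incumbent[task] for task in incumbent)


# Hypothetical placeholder scores (illustration only, not results from the paper).
incumbent = {"retrieval": 0.70, "sts": 0.55, "classification": 0.68}
challenger_a = {"retrieval": 0.72, "sts": 0.56, "classification": 0.69}
challenger_b = {"retrieval": 0.75, "sts": 0.55, "classification": 0.71}

print(strictly_dominates(challenger_a, incumbent))  # True: higher on all three tasks
print(strictly_dominates(challenger_b, incumbent))  # False: ties on "sts"
```

A higher average score would not meet this criterion on its own, since an average can hide a loss on one task.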
Original abstract
We introduce C-Pack, a package of resources that significantly advance the field of general Chinese embeddings. C-Pack includes three critical resources. 1) C-MTEB is a comprehensive benchmark for Chinese text embeddings covering 6 tasks and 35 datasets. 2) C-MTP is a massive text embedding dataset curated from labeled and unlabeled Chinese corpora for training embedding models. 3) C-TEM is a family of embedding models covering multiple sizes. Our models outperform all prior Chinese text embeddings on C-MTEB by up to +10% upon the time of the release. We also integrate and optimize the entire suite of training methods for C-TEM. Along with our resources on general Chinese embedding, we release our data and models for English text embeddings. The English models achieve state-of-the-art performance on MTEB benchmark; meanwhile, our released English data is 2 times larger than the Chinese data. All these resources are made publicly available at https://github.com/FlagOpen/FlagEmbedding.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces C-Pack, a package of resources for general Chinese embeddings consisting of (1) C-MTEB, a benchmark spanning 6 tasks and 35 datasets, (2) C-MTP, a large curated training corpus from labeled and unlabeled Chinese text, and (3) C-TEM, a family of embedding models of varying sizes. The central claim is that the released C-TEM models outperform all prior Chinese text embeddings on C-MTEB by up to 10% at the time of release; the authors also release English data (twice the size of the Chinese data) and models that reach SOTA on MTEB.
Significance. If the performance claims hold under rigorous scrutiny, the work supplies valuable, publicly released resources that address the relative scarcity of high-quality Chinese embedding benchmarks and training data. The integration of multiple training methods into C-TEM and the dual-language release could accelerate progress in multilingual embedding research.
major comments (3)
- [§3] §3 (C-MTEB construction): the description of how the 35 datasets were selected and filtered lacks explicit criteria for avoiding task-specific overfitting or selection effects that could favor the proposed models; a clear protocol for dataset inclusion/exclusion is needed to substantiate the claim that C-MTEB is an unbiased measure of general Chinese embedding quality.
- [Table 2] Table 2 (main results): the reported gains of up to +10% are presented without standard deviations across runs, statistical significance tests, or details on the exact baseline implementations and hyper-parameters; this information is load-bearing for the central empirical claim.
- [§4.2] §4.2 (training procedure): the statement that the authors 'integrate and optimize the entire suite of training methods' is not accompanied by sufficient ablation results or hyper-parameter schedules to allow reproduction or assessment of whether the gains derive from data scale, model architecture, or training tricks.
minor comments (2)
- [Abstract] The GitHub link in the abstract should be repeated in the conclusion or data-availability statement for reader convenience.
- [Figure 1] Figure 1 caption could explicitly state the number of parameters for each C-TEM variant shown.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and positive recommendation. We address each major comment below and will revise the manuscript accordingly to improve clarity and reproducibility.
Point-by-point responses
- Referee: [§3] §3 (C-MTEB construction): the description of how the 35 datasets were selected and filtered lacks explicit criteria for avoiding task-specific overfitting or selection effects that could favor the proposed models; a clear protocol for dataset inclusion/exclusion is needed to substantiate the claim that C-MTEB is an unbiased measure of general Chinese embedding quality.
  Authors: We agree that an explicit protocol strengthens the benchmark's credibility. In the revised manuscript we will add a dedicated subsection to §3 that details the inclusion/exclusion criteria, including steps taken to ensure task diversity, domain coverage, and avoidance of selection bias toward our training data. This protocol draws on established practices from MTEB while adapting for Chinese-specific considerations.
  Revision: yes
- Referee: [Table 2] Table 2 (main results): the reported gains of up to +10% are presented without standard deviations across runs, statistical significance tests, or details on the exact baseline implementations and hyper-parameters; this information is load-bearing for the central empirical claim.
  Authors: We acknowledge the value of statistical rigor. The revision will expand the experimental section and Table 2 caption with full hyper-parameter settings and exact baseline re-implementation details (including sources and any adaptations). Standard deviations are not reported in the current version because experiments used fixed seeds for reproducibility; we will add a note on this limitation and include variance estimates from additional runs where compute permits.
  Revision: partial
- Referee: [§4.2] §4.2 (training procedure): the statement that the authors 'integrate and optimize the entire suite of training methods' is not accompanied by sufficient ablation results or hyper-parameter schedules to allow reproduction or assessment of whether the gains derive from data scale, model architecture, or training tricks.
  Authors: We will revise §4.2 to include expanded ablation tables and a hyper-parameter schedule appendix. These additions will isolate the contributions of data scale, contrastive objectives, and other optimizations, enabling readers to assess the sources of improvement.
  Revision: yes
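The variance estimates discussed in the Table 2 exchange could be reported as a mean and sample standard deviation over repeated runs. A minimal sketch follows, assuming one aggregate benchmark score per random seed. The function name `summarize_runs` and the scores are hypothetical placeholders, not results from the paper.

```python
import statistics
from typing import Sequence, Tuple


def summarize_runs(scores: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, sample standard deviation) of per-seed scores."""
    if len(scores) < 2:
        raise ValueError("need at least two runs to estimate spread")
    return statistics.mean(scores), statistics.stdev(scores)


# Hypothetical scores from three seeds (illustration only).
runs = [0.62, 0.64, 0.63]
mean, std = summarize_runs(runs)
print(f"{mean:.3f} +/- {std:.3f}")  # 0.630 +/- 0.010
```

A single fixed-seed run gives no spread at all, which is why the check above refuses fewer than two scores.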
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Contrastive learning objectives on curated text pairs produce effective general-purpose embeddings.
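This page does not say which contrastive objective the paper uses. As an illustration of the assumption above, here is a minimal sketch of an InfoNCE-style loss with in-batch negatives, a common choice for training on text pairs. The function name `info_nce_loss`, the temperature of 0.05, and the toy vectors are assumptions made for this example, not details taken from the paper.

```python
import math
from typing import List

Vector = List[float]


def _normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def _dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def info_nce_loss(queries: List[Vector], passages: List[Vector],
                  temperature: float = 0.05) -> float:
    """Mean InfoNCE loss where passages[i] is the positive for queries[i].

    Every other passage in the batch serves as a negative for that query.
    The temperature default is an assumed value for illustration.
    """
    if not queries or len(queries) != len(passages):
        raise ValueError("need equally sized, non-empty batches")
    q = [_normalize(v) for v in queries]
    p = [_normalize(v) for v in passages]
    total = 0.0
    for i, qi in enumerate(q):
        logits = [_dot(qi, pj) / temperature for pj in p]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]
    return total / len(q)


# Toy embeddings (illustration only): each query matches its own passage.
queries = [[1.0, 0.0], [0.0, 1.0]]
aligned = info_nce_loss(queries, [[1.0, 0.0], [0.0, 1.0]])
swapped = info_nce_loss(queries, [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)  # True: correct pairings give the lower loss
```

The loss is low when each query is closest to its own passage and high when another passage in the batch is closer, which is the behaviour the axiom relies on.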
Forward citations
Cited by 30 Pith papers
- Knowledge Packs: Zero-Token Knowledge Delivery via KV Cache Injection
  Knowledge Packs deliver knowledge via pre-computed KV caches with exact equivalence under causal masking, achieving zero divergences on tested questions and enabling value-based steering without training.
- Retrieval from Within: An Intrinsic Capability of Attention-Based Models
  Attention-based models can intrinsically retrieve and reuse pre-encoded evidence chunks via decoder attention queries, unifying retrieval with generation and outperforming external RAG pipelines on QA benchmarks.
- Prism-Reranker: Beyond Relevance Scoring -- Jointly Producing Contributions and Evidence for Agentic Retrieval
  Prism-Reranker models output relevance, contribution statements, and evidence passages to support agentic retrieval beyond scalar scoring.
- HaS: Accelerating RAG through Homology-Aware Speculative Retrieval
  HaS accelerates RAG retrieval via homology-aware speculative retrieval and homologous query re-identification validation, cutting latency 24-37% with 1-2% accuracy drop on tested datasets.
- METRO: Towards Strategy Induction from Expert Dialogue Transcripts for Non-collaborative Dialogues
  METRO induces both short-term actions and long-term planning from expert transcripts into a Strategy Forest, outperforming prior methods by 9-10% on two non-collaborative dialogue benchmarks.
- DRBENCHER: Can Your Agent Identify the Entity, Retrieve Its Properties and Do the Math?
  DRBENCHER generates multi-hop questions across biochemistry, finance, geophysics, security, and history that test interleaved browsing and computation, where the strongest models reach only 20% accuracy and human vali...
- MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries
  MultiHop-RAG is a new benchmark dataset demonstrating that existing retrieval-augmented generation systems perform poorly on multi-hop queries requiring retrieval and reasoning over multiple evidence pieces.
- An Annotation Scheme and Classifier for Personal Facts in Dialogue
  An extended annotation scheme with new categories and attributes plus a Gemma-300M-based multi-head classifier achieves 81.6% macro F1 on personal fact classification, outperforming few-shot LLM baselines by nearly 9 ...
- SkillRAE: Agent Skill-Based Context Compilation for Retrieval-Augmented Execution
  SkillRAE organizes skills into a graph and compiles compact, grounded contexts for LLM agents, yielding 11.7% gains on SkillsBench over prior RAE methods.
- Retrieval from Within: An Intrinsic Capability of Attention-Based Models
  Attention-based models can retrieve evidence intrinsically by using decoder attention to score and reuse their own pre-encoded chunks, outperforming separate retrieval pipelines on QA benchmarks.
- Agentic Retrieval-Augmented Generation for Financial Document Question Answering
  FinAgent-RAG achieves 76.81-78.46% execution accuracy on financial QA benchmarks by combining contrastive retrieval, program-of-thought code generation, and adaptive strategy routing, outperforming baselines by 5.62-9...
- A Replicability Study of XTR
  XTR training does not improve retrieval effectiveness over ColBERT but enhances IVF engine efficiency by flattening token scores to produce more discriminative centroids.
- MemRouter: Memory-as-Embedding Routing for Long-Term Conversational Agents
  A lightweight supervised router using frozen-LLM embeddings for memory admission decisions outperforms LLM-based memory managers in both F1 score and latency on the LoCoMo benchmark.
- MiMIC: Mitigating Visual Modality Collapse in Universal Multimodal Retrieval While Avoiding Semantic Misalignment
  MiMIC mitigates visual modality collapse and semantic misalignment in universal multimodal retrieval via fusion-in-decoder architecture and robust single-modality training.
- EvoRAG: Making Knowledge Graph-based RAG Automatically Evolve through Feedback-driven Backpropagation
  EvoRAG adds a feedback-driven backpropagation step that attributes response quality to individual knowledge-graph triplets and updates the graph to raise reasoning accuracy by 7.34 percent over prior KG-RAG methods.
- EpiAgent: An Agent-Centric System for Ancient Inscription Restoration
  EpiAgent is a new agent-centric system that restores degraded ancient inscriptions with better quality and generalization than prior rigid AI methods by using an LLM planner to coordinate multimodal tools and iterativ...
- Regime-Conditional Retrieval: Theory and a Transferable Router for Two-Hop QA
  Two-hop QA retrieval performance depends on whether the hop-2 entity is in the question or bridge passage, and a simple predicate-based router trained on one dataset transfers to improve R@5 on others.
- ResearchEVO: An End-to-End Framework for Automated Scientific Discovery and Documentation
  ResearchEVO automates the discover-then-explain cycle by evolving algorithms via fitness-driven LLM co-evolution and generating grounded, anti-hallucination research papers through sentence-level RAG.
- SelRoute: Query-Type-Aware Routing for Long-Term Conversational Memory Retrieval
  SelRoute routes queries to type-specific retrieval pipelines, achieving Recall@5 of 0.800 with a 109M model on LongMemEval_M and outperforming LLM-augmented baselines including a strong zero-ML lexical method.
- ASTRA: Mapping Art-Technology Institutions via Conceptual Axes, Text Embeddings, and Unsupervised Clustering
  ASTRA combines an eight-axis conceptual framework with text embeddings and unsupervised clustering to map and group 78 art-technology institutions into coherent thematic clusters.
- OGER: A Robust Offline-Guided Exploration Reward for Hybrid Reinforcement Learning
  OGER adds an auxiliary exploration reward built from offline trajectories and model entropy to hybrid RL training, yielding gains on math reasoning benchmarks and out-of-domain generalization.
- DocSeeker: Structured Visual Reasoning with Evidence Grounding for Long Document Understanding
  DocSeeker improves long-document understanding in MLLMs via a two-stage training process that combines supervised fine-tuning from distilled data with evidence-aware group relative policy optimization and memory-effic...
- DocSeeker: Structured Visual Reasoning with Evidence Grounding for Long Document Understanding
  DocSeeker uses supervised fine-tuning on distilled data followed by evidence-aware group relative policy optimization to improve long-document understanding and evidence grounding in MLLMs.
- MEME-Fusion@CHiPSAL 2026: Multimodal Ablation Study of Hate Detection and Sentiment Analysis on Nepali Memes
  A hybrid cross-modal attention model using CLIP and BGE-M3 improves hate detection F1-macro by 5.9% over text-only baselines on Nepali memes while revealing failures of English-centric vision models and ensembles on s...
- Domain-Adapted Retrieval for In-Context Annotation of Pedagogical Dialogue Acts
  Domain-adapted utterance-level retrieval raises Cohen's kappa for tutoring dialogue act annotation to 0.526-0.580 on TalkMoves and 0.659-0.743 on Eedi, beating no-retrieval baselines by large margins across three LLMs.
- The Geometry of Forgetting
  Interference among memories in embedding spaces produces human-like power-law forgetting (b≈0.46) and false memories (false alarm rate 0.583) from raw pre-trained embeddings with zero tuning.
- Retrieval-Augmented Generation for AI-Generated Content: A Survey
  A survey classifying RAG foundations for AIGC, summarizing enhancements, cross-modal applications, benchmarks, limitations, and future directions.
- Multilingual E5 Text Embeddings: A Technical Report
  Open-source multilingual E5 embedding models are trained via contrastive pre-training on 1 billion text pairs and fine-tuning, with an instruction-tuned model matching English SOTA performance.
- Mira-Embeddings-V1: Domain-Adapted Semantic Reranking for Recruitment via LLM-Synthesized Data
  Mira-Embeddings-V1 adapts embeddings for recruitment reranking by synthesizing positive and hard-negative samples with LLMs, then applies JD-JD contrastive and JD-CV triplet training plus a BoundaryHead MLP, lifting R...
- Hybrid Retrieval for COVID-19 Literature: Comparing Rank Fusion and Projection Fusion with Diversity Reranking
  Rank fusion (RRF) reaches the highest relevance (nDCG@10 = 0.828) on expert COVID-19 queries while a projection fusion variant (B5) is 33% faster and produces more diverse results.