Data-CUBE: Data Curriculum for Instruction-based Sentence Representation Learning
Pith reviewed 2026-05-24 04:27 UTC · model grok-4.3
The pith
A data curriculum ordering tasks by interference risk and instances by difficulty reduces cross-task problems in instruction-based sentence representation learning.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Data-CUBE orders all of the multi-task training data to minimize interference risk from two views: tasks and instances. At the task level, it seeks the task order that minimizes the total cross-task interference risk; the paper treats this as a traveling salesman problem and solves it with a simulated annealing algorithm. At the instance level, it measures the difficulty of every instance within each task and then divides the instances into mini-batches ordered from easy to difficult. Experiments on MTEB sentence representation evaluation tasks show that the approach can boost the performance of state-of-the-art methods.
What carries the argument
Dual-level data curriculum: task ordering cast as a traveling salesman problem solved by simulated annealing to minimize total cross-task interference, plus per-task instance ordering from easy to difficult mini-batches.
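As a rough illustration of the task-level half, the sketch below runs simulated annealing over task permutations. It is not the authors' implementation: the `risk` matrix, the geometric cooling schedule, the segment-reversal move, and all parameter names (`steps`, `t_start`, `t_end`) are assumptions made for the example, and the cost is summed over an open path rather than a closed tour.

```python
import math
import random

def order_cost(order, risk):
    """Total interference risk summed over consecutive task pairs in the order."""
    return sum(risk[a][b] for a, b in zip(order, order[1:]))

def anneal_task_order(risk, steps=20000, t_start=1.0, t_end=1e-3, seed=0):
    """Search for a low-risk task order with simulated annealing.

    `risk` is a square list of lists; risk[a][b] is the assumed interference
    risk of training task b right after task a (hypothetical input).
    """
    rng = random.Random(seed)
    n = len(risk)
    current = list(range(n))
    rng.shuffle(current)
    current_cost = order_cost(current, risk)
    best, best_cost = current[:], current_cost
    if n < 2:
        return best, best_cost
    for step in range(steps):
        # Geometric cooling from t_start to t_end (an illustrative schedule).
        temp = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        i, j = sorted(rng.sample(range(n), 2))
        # 2-opt style move: reverse the segment between positions i and j.
        candidate = current[:i] + current[i:j + 1][::-1] + current[j + 1:]
        cand_cost = order_cost(candidate, risk)
        delta = cand_cost - current_cost
        # Accept improvements always; accept worse orders with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = candidate, cand_cost
            if current_cost < best_cost:
                best, best_cost = current[:], current_cost
    return best, best_cost
```

Calling `anneal_task_order(risk)` returns the best order seen and its cost. The cooling schedule and iteration count are exactly the free parameters the referee report asks the authors to ablate.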
If this is right
- State-of-the-art instruction-tuned models obtain higher scores on MTEB sentence representation tasks.
- Training stability improves because total cross-task interference risk is reduced by the computed task sequence.
- Within each task, progressing from easy to difficult instances produces better convergence of the representation model.
- The same ordering logic directly applies to any collection of instruction-following tasks used for sentence embedding training.
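The easy-to-difficult progression mentioned in the list above can be sketched in a few lines. The summary does not say how difficulty is scored, so the `difficulty` callable here is a hypothetical placeholder, and `batch_size` is likewise an assumed parameter.

```python
def easy_to_hard_batches(instances, difficulty, batch_size):
    """Sort one task's instances by difficulty and cut them into mini-batches.

    `difficulty` is any function mapping an instance to a number, where a
    lower value means an easier instance (an assumption for this sketch).
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    ranked = sorted(instances, key=difficulty)  # stable sort, easiest first
    return [ranked[i:i + batch_size] for i in range(0, len(ranked), batch_size)]
```

For example, `easy_to_hard_batches(pairs, lambda p: p[1], 32)` would batch `(text, score)` pairs by their second field. Batches are returned in training order, so the first batch holds the easiest instances.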
Where Pith is reading between the lines
- The TSP formulation implies that interference between tasks behaves like a distance metric that can be estimated from data alone.
- Similar curriculum ordering might be tested on instruction-tuned models for other embedding tasks such as retrieval or classification without changing the underlying architecture.
- If the simulated-annealing solution consistently outperforms greedy or random baselines, the approach supplies a practical, parameter-light way to schedule any multi-task instruction dataset.
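The greedy and random baselines named in the last point are cheap to construct. The sketch below assumes a pairwise `risk` matrix as input and uses a nearest-neighbour rule for the greedy order; both choices are illustrative and are not taken from the paper.

```python
import random

def order_cost(order, risk):
    """Total risk summed over consecutive task pairs in the order."""
    return sum(risk[a][b] for a, b in zip(order, order[1:]))

def random_order(n, seed=0):
    """Uniformly random task order, seeded for repeatability."""
    rng = random.Random(seed)
    order = list(range(n))
    rng.shuffle(order)
    return order

def greedy_order(risk, start=0):
    """Nearest-neighbour heuristic: move to the unvisited task with the lowest risk."""
    order = [start]
    remaining = set(range(len(risk))) - {start}
    while remaining:
        last = order[-1]
        # Ties are broken by the smaller task index to keep the result deterministic.
        nxt = min(remaining, key=lambda t: (risk[last][t], t))
        order.append(nxt)
        remaining.remove(nxt)
    return order
```

Comparing `order_cost` for these two orders against the cost of an annealed order gives a first check on whether the more expensive search finds lower-risk sequences at all.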
Load-bearing premise
Cross-task interference can be quantified in a form that lets task ordering be solved as a traveling salesman problem whose solution improves final representations.
What would settle it
Applying the Data-CUBE ordering to the same multi-task instruction data and observing no gain or a loss in average MTEB score relative to random task order would falsify the claim.
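One way to run that comparison with some statistical care is a paired test over seeds. The sketch below is an exact two-sided sign-flip permutation test on per-seed score differences; the function name and its inputs are hypothetical, and the choice of test is an assumption rather than anything the paper reports.

```python
from itertools import product
from statistics import mean

def paired_permutation_pvalue(curriculum_scores, random_scores):
    """Exact two-sided sign-flip test on per-seed score differences.

    Each list holds one average score per seed, with matching seeds at
    matching positions. Returns (mean difference, p-value).
    """
    if len(curriculum_scores) != len(random_scores):
        raise ValueError("need one score per seed for both conditions")
    diffs = [c - r for c, r in zip(curriculum_scores, random_scores)]
    observed = abs(mean(diffs))
    extreme = 0
    total = 0
    # Enumerate every assignment of signs to the differences (2**n cases).
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = abs(mean(s * d for s, d in zip(signs, diffs)))
        if flipped >= observed - 1e-12:
            extreme += 1
    return mean(diffs), extreme / total
```

The enumeration grows as 2**n, so this form only suits a small number of seeds. A mean difference near zero, or a large p-value, would be the "no gain" outcome described above.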
Original abstract
Recently, multi-task instruction tuning has been applied into sentence representation learning, which endows the capability of generating specific representations with the guidance of task instruction, exhibiting strong generalization ability on new tasks. However, these methods mostly neglect the potential interference problems across different tasks and instances, which may affect the training and convergence of the model. To address it, we propose a data curriculum method, namely Data-CUBE, that arranges the orders of all the multi-task data for training, to minimize the interference risks from the two views. In the task level, we aim to find the optimal task order to minimize the total cross-task interference risk, which is exactly the traveling salesman problem, hence we utilize a simulated annealing algorithm to find its solution. In the instance level, we measure the difficulty of all instances per task, then divide them into the easy-to-difficult mini-batches for training. Experiments on MTEB sentence representation evaluation tasks show that our approach can boost the performance of state-of-the-art methods. Our code and data are publicly available at the link: https://github.com/RUCAIBox/Data-CUBE.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes Data-CUBE, a data curriculum method for multi-task instruction tuning in sentence representation learning. It casts task ordering as a traveling salesman problem whose edge weights are cross-task interference risks, solved via simulated annealing, and orders instances within each task from easy to difficult based on a difficulty heuristic; experiments claim this boosts performance of state-of-the-art methods on MTEB tasks, with code and data released.
Significance. If the interference metric and difficulty heuristic demonstrably correlate with training dynamics and the reported MTEB gains prove robust to hyperparameter controls, the approach would offer a practical, model-agnostic way to mitigate interference in multi-task sentence embedding training. The public code release is a clear strength for reproducibility.
major comments (3)
- [Abstract] Abstract: the central claim that the method 'boost[s] the performance of state-of-the-art methods' on MTEB is unsupported by any reported error bars, statistical significance tests, number of runs, or ablation on the simulated annealing cooling schedule and iteration count; these omissions make it impossible to attribute gains to the curriculum rather than hyperparameter search.
- [Task-level curriculum] Task-level curriculum paragraph: the cross-task interference risk used as TSP edge weights is never defined or computed (no equation or algorithm); without an explicit metric or empirical validation that it predicts gradient conflict or forgetting on held-out pairs, the TSP+SA ordering has no demonstrated advantage over random or heuristic baselines.
- [Instance-level curriculum] Instance-level curriculum paragraph: the difficulty metric and batch-division threshold are introduced as free parameters with no sensitivity analysis or correlation study against actual training loss curves; this leaves open whether the easy-to-difficult ordering is load-bearing or interchangeable with random batching.
minor comments (1)
- [Abstract] The abstract states 'Our code and data are publicly available' but provides only a GitHub link without a commit hash or data version; this should be pinned for reproducibility.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript accordingly to strengthen statistical reporting, provide explicit definitions, and include additional analyses.
- Referee: [Abstract] Abstract: the central claim that the method 'boost[s] the performance of state-of-the-art methods' on MTEB is unsupported by any reported error bars, statistical significance tests, number of runs, or ablation on the simulated annealing cooling schedule and iteration count; these omissions make it impossible to attribute gains to the curriculum rather than hyperparameter search.
  Authors: We agree that the abstract claim would benefit from greater statistical rigor. In the revised manuscript we will report results averaged over multiple random seeds with standard deviations, include paired statistical significance tests, and add an ablation on the simulated annealing cooling schedule and iteration count to better attribute gains to the curriculum.
  Revision: yes
- Referee: [Task-level curriculum] Task-level curriculum paragraph: the cross-task interference risk used as TSP edge weights is never defined or computed (no equation or algorithm); without an explicit metric or empirical validation that it predicts gradient conflict or forgetting on held-out pairs, the TSP+SA ordering has no demonstrated advantage over random or heuristic baselines.
  Authors: We acknowledge that an explicit definition and validation of the interference risk metric is necessary. In the revised manuscript we will add the mathematical formulation of the cross-task interference risk (based on gradient conflicts), the algorithm for computing TSP edge weights, empirical validation of its correlation with gradient conflict and forgetting on held-out pairs, and direct comparisons against random and heuristic baselines.
  Revision: yes
- Referee: [Instance-level curriculum] Instance-level curriculum paragraph: the difficulty metric and batch-division threshold are introduced as free parameters with no sensitivity analysis or correlation study against actual training loss curves; this leaves open whether the easy-to-difficult ordering is load-bearing or interchangeable with random batching.
  Authors: We agree that sensitivity analysis and correlation studies are needed. The revised version will include sensitivity analysis on the difficulty metric and batch-division threshold, plus a correlation study between the difficulty heuristic and training loss curves to demonstrate that the easy-to-difficult ordering contributes beyond random batching.
  Revision: yes
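The rebuttal says the interference risk will be defined from gradient conflicts but gives no formula. As a stand-in, the sketch below uses one common proxy: the negative part of the cosine similarity between per-task gradient vectors. The function names and the flat-list gradient format are assumptions, and this should not be read as the paper's definition.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def interference_matrix(task_grads):
    """Pairwise risk[a][b] = max(0, -cos(g_a, g_b)).

    `task_grads` is a list of flattened gradient vectors, one per task
    (hypothetical input). Risk is zero unless the two gradients point in
    opposing directions, and the diagonal is fixed at zero.
    """
    n = len(task_grads)
    return [
        [0.0 if a == b else max(0.0, -cosine(task_grads[a], task_grads[b]))
         for b in range(n)]
        for a in range(n)
    ]
```

This proxy is symmetric, so it cannot express order-dependent effects such as forgetting; an asymmetric measure would be needed if training task b after task a is riskier than the reverse.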
Circularity Check
No significant circularity; curriculum heuristics are independent of reported gains
The paper defines task ordering via TSP on a cross-task interference risk metric and instance ordering via a per-task difficulty heuristic, then reports empirical gains on the external MTEB benchmark. Neither the TSP formulation nor the difficulty measure is shown to be computed from or fitted to the final MTEB scores; the optimization steps remain external to the evaluation data. No equations reduce the claimed performance boost to quantities defined by the evaluation itself, and no self-citation chain is load-bearing for the central experimental claim. This satisfies the default expectation of a non-circular derivation.
Axiom & Free-Parameter Ledger
free parameters (2)
- simulated annealing cooling schedule and iteration count
- instance difficulty metric and batch division threshold
axioms (2)
- domain assumption: Cross-task interference risk is quantifiable and additive so that total risk equals the sum of pairwise risks.
- domain assumption: Instance difficulty can be measured independently of the training dynamics and predicts learning interference.
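The additivity assumption can be written out as an objective over task orders. The symbols here are introduced only for illustration and do not come from the paper: $n$ is the number of tasks, $\pi$ a permutation of the tasks, $S_n$ the set of all such permutations, and $r(a, b)$ the pairwise interference risk of training task $b$ immediately after task $a$.

```latex
\pi^{*} \;=\; \arg\min_{\pi \in S_n} \; \sum_{i=1}^{n-1} r\bigl(\pi_i, \pi_{i+1}\bigr)
```

As written this is the open-path variant. The classical traveling salesman problem adds the closing term $r(\pi_n, \pi_1)$ to form a tour, and the summary does not state which form the paper uses.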