
arxiv: 2401.03563 · v1 · submitted 2024-01-07 · 💻 cs.CL · cs.IR

Data-CUBE: Data Curriculum for Instruction-based Sentence Representation Learning

Pith reviewed 2026-05-24 04:27 UTC · model grok-4.3

classification 💻 cs.CL cs.IR
keywords: curriculum learning, sentence representation learning, instruction tuning, multi-task learning, data ordering, interference minimization, traveling salesman problem, simulated annealing

The pith

A data curriculum ordering tasks by interference risk and instances by difficulty reduces cross-task problems in instruction-based sentence representation learning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Data-CUBE, which arranges the order of all multi-task training data so that interference between tasks and between instances is minimized during instruction tuning for sentence representations. At the task level, the method models optimal ordering as a traveling salesman problem and solves it with simulated annealing; at the instance level, it ranks examples by difficulty and feeds them in easy-to-hard mini-batches. A sympathetic reader would care because lower interference could produce representations that generalize more reliably to unseen tasks. The resulting models are evaluated on the MTEB benchmark, where applying the ordering improves the scores of state-of-the-art instruction-tuned methods.

Core claim

Data-CUBE arranges the order of all the multi-task training data to minimize interference risks from two views. At the task level, we aim to find the optimal task order that minimizes the total cross-task interference risk, which is exactly the traveling salesman problem; hence we use a simulated annealing algorithm to find its solution. At the instance level, we measure the difficulty of all instances in each task, then divide them into easy-to-difficult mini-batches for training. Experiments on MTEB sentence representation evaluation tasks show that our approach can boost the performance of state-of-the-art methods.
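The task-level step can be illustrated with a small simulated annealing search over task orders. This is a minimal sketch under stated assumptions, not the authors' implementation: the interference matrix `risk` is hypothetical input (the paper's own risk metric is not reproduced here), the order is treated as an open path with no return edge to the first task, each move reverses a random segment of the order (a 2-opt move), and `t0`, `alpha`, and `steps` are illustrative settings for a geometric cooling schedule.

```python
import math
import random


def path_cost(order, risk):
    """Total interference of an order: sum of risks between consecutively trained tasks."""
    return sum(risk[a][b] for a, b in zip(order, order[1:]))


def anneal_task_order(risk, t0=1.0, alpha=0.995, steps=20000, seed=0):
    """Simulated annealing over task orders (hypothetical settings, not the paper's).

    risk: symmetric n x n matrix of pairwise interference risks, n >= 2.
    Returns the lowest-cost order found and its cost.
    """
    rng = random.Random(seed)
    n = len(risk)
    current = list(range(n))
    rng.shuffle(current)
    cur_cost = path_cost(current, risk)
    best, best_cost = current[:], cur_cost
    temp = t0
    for _ in range(steps):
        # 2-opt move: reverse the segment current[i..j].
        i, j = sorted(rng.sample(range(n), 2))
        candidate = current[:i] + current[i:j + 1][::-1] + current[j + 1:]
        cand_cost = path_cost(candidate, risk)
        delta = cand_cost - cur_cost
        # Metropolis rule: accept improvements; accept worse orders with prob exp(-delta / T).
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, cur_cost = candidate, cand_cost
            if cur_cost < best_cost:
                best, best_cost = current[:], cur_cost
        # Geometric cooling with a small floor so the temperature never reaches zero.
        temp = max(temp * alpha, 1e-12)
    return best, best_cost


# Hypothetical 5-task example: risk grows with |i - j|, so the best order is monotone.
risk = [[abs(i - j) for j in range(5)] for i in range(5)]
order, cost = anneal_task_order(risk)
print(order, cost)
```

Tracking the best order seen, rather than returning the final state, keeps the result from depending on where the random walk happens to stop once the temperature is low.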

What carries the argument

Dual-level data curriculum: task ordering cast as a traveling salesman problem solved by simulated annealing to minimize total cross-task interference, plus per-task instance ordering from easy to difficult mini-batches.
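The instance-level half amounts to sorting each task's data by a difficulty score, cutting it into mini-batches, and concatenating the tasks in the chosen order. The sketch below assumes a hypothetical difficulty score: the negated margin between a query's positive similarity and its hardest negative similarity, loosely inspired by the margin mentioned in Figure 1. The paper's own difficulty measure and batch-division rule may differ, and the field names `pos_sim` and `neg_sims` are illustrative.

```python
def margin_difficulty(example):
    """Hypothetical difficulty: a smaller margin between the positive and the
    hardest negative means a harder instance, so return the negated margin."""
    return -(example["pos_sim"] - max(example["neg_sims"]))


def easy_to_difficult_batches(instances, difficulty, batch_size):
    """Sort one task's instances from easy to difficult and chunk them into mini-batches."""
    ranked = sorted(instances, key=difficulty)
    return [ranked[i:i + batch_size] for i in range(0, len(ranked), batch_size)]


def build_schedule(task_order, task_data, difficulty, batch_size):
    """Follow the task order; within each task, emit easy-to-difficult batches."""
    schedule = []
    for task in task_order:
        for batch in easy_to_difficult_batches(task_data[task], difficulty, batch_size):
            schedule.append((task, batch))
    return schedule


# Hypothetical similarity scores for two tasks.
task_data = {
    "sts": [
        {"id": "a", "pos_sim": 0.9, "neg_sims": [0.1, 0.2]},
        {"id": "b", "pos_sim": 0.5, "neg_sims": [0.45, 0.3]},
        {"id": "c", "pos_sim": 0.6, "neg_sims": [0.7]},
        {"id": "d", "pos_sim": 0.8, "neg_sims": [0.4]},
    ],
    "retrieval": [
        {"id": "e", "pos_sim": 0.7, "neg_sims": [0.65]},
        {"id": "f", "pos_sim": 0.9, "neg_sims": [0.3]},
    ],
}
schedule = build_schedule(["retrieval", "sts"], task_data, margin_difficulty, batch_size=2)
for task, batch in schedule:
    print(task, [ex["id"] for ex in batch])
```

Here the task order is passed in directly; in a full pipeline it would come from the task-level search.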

If this is right

  • State-of-the-art instruction-tuned models obtain higher scores on MTEB sentence representation tasks.
  • Training stability improves because total cross-task interference risk is reduced by the computed task sequence.
  • Within each task, progressing from easy to difficult instances produces better convergence of the representation model.
  • The same ordering logic directly applies to any collection of instruction-following tasks used for sentence embedding training.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The TSP formulation implies that interference between tasks behaves like a distance metric that can be estimated from data alone.
  • Similar curriculum ordering might be tested on instruction-tuned models for other embedding tasks such as retrieval or classification without changing the underlying architecture.
  • If the simulated-annealing solution consistently outperforms greedy or random baselines, the approach supplies a practical, parameter-light way to schedule any multi-task instruction dataset.
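The comparison in the last bullet needs concrete baselines. A minimal sketch, assuming a hypothetical interference matrix `risk` and the same open-path cost as above: a greedy nearest-neighbour order (tried from every starting task) and the average cost of uniformly random orders. An annealed order could be scored with the same `path_cost` and placed alongside these numbers.

```python
import random


def path_cost(order, risk):
    """Sum of interference risks between consecutively trained tasks."""
    return sum(risk[a][b] for a, b in zip(order, order[1:]))


def greedy_order(risk, start):
    """Nearest-neighbour heuristic: always append the least-interfering unvisited task."""
    order, remaining = [start], set(range(len(risk))) - {start}
    while remaining:
        last = order[-1]
        # Break ties by task index so the result is deterministic.
        nxt = min(remaining, key=lambda t: (risk[last][t], t))
        order.append(nxt)
        remaining.remove(nxt)
    return order


def best_greedy_order(risk):
    """Run the greedy heuristic from every start task and keep the cheapest order."""
    candidates = [greedy_order(risk, s) for s in range(len(risk))]
    return min(candidates, key=lambda o: path_cost(o, risk))


def mean_random_cost(risk, trials=1000, seed=0):
    """Average cost of uniformly random task orders (the random baseline)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        order = list(range(len(risk)))
        rng.shuffle(order)
        total += path_cost(order, risk)
    return total / trials


# Hypothetical 5-task matrix where risk grows with |i - j|.
risk = [[abs(i - j) for j in range(5)] for i in range(5)]
greedy = best_greedy_order(risk)
print(greedy, path_cost(greedy, risk), mean_random_cost(risk))
```

On easy matrices like this one, greedy already finds the optimum, so a meaningful test of the annealing claim would need interference matrices where greedy and annealing actually disagree.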

Load-bearing premise

Cross-task interference can be quantified in a form that lets task ordering be solved as a traveling salesman problem whose solution improves final representations.

What would settle it

Applying the Data-CUBE ordering to the same multi-task instruction data and observing no gain or a loss in average MTEB score relative to random task order would falsify the claim.

Figures

Figures reproduced from arXiv: 2401.03563 by Dawei Gao, He Hu, Kun Zhou, Wayne Xin Zhao, Yaliang Li, Yingqian Min.

Figure 1: (a) Example of the task and instance interference. The distance reflects task similarity, and the shades of orange represent the difficulty level. (b) The underfitting degrees of all training tasks. We categorize all tasks into three degrees: severe (>80%), moderate (>50% but <80%), and mild (<50%), according to the ratio of instances whose positives and negatives are not clearly distinguished (margin<…
Figure 2: The proportion of underfitting instances within different tasks. We show the comparison between …
Figure 3: Illustration of Data-CUBE: Task-level Curriculum rearranges the task orders from similar to dissimilar …
Figure 4: Performance variation curve on the STS tasks
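Figure 1(b)'s bucketing rule can be made concrete with a short helper. This is a sketch under assumptions: the margin cutoff is truncated in the caption, so `threshold` is a hypothetical parameter. The caption also leaves ratios of exactly 50% and 80% unassigned, so this sketch places those boundary values in the lower bucket.

```python
def underfitting_ratio(margins, threshold):
    """Fraction of a task's instances whose positive/negative margin is below `threshold`."""
    return sum(m < threshold for m in margins) / len(margins)


def underfitting_degree(ratio):
    """Figure 1(b) buckets: severe (>80%), moderate (>50%), otherwise mild.
    Exact 0.5 and 0.8 are assigned to the lower bucket (an assumption)."""
    if ratio > 0.8:
        return "severe"
    if ratio > 0.5:
        return "moderate"
    return "mild"


# Hypothetical per-instance margins for one task and an illustrative cutoff of 0.1.
margins = [0.05, 0.3, -0.1, 0.02]
ratio = underfitting_ratio(margins, threshold=0.1)
print(ratio, underfitting_degree(ratio))  # prints: 0.75 moderate
```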
Original abstract

Recently, multi-task instruction tuning has been applied to sentence representation learning, endowing models with the capability of generating specific representations under the guidance of task instructions and exhibiting strong generalization ability on new tasks. However, these methods mostly neglect the potential interference problems across different tasks and instances, which may affect the training and convergence of the model. To address this, we propose a data curriculum method, namely Data-CUBE, that arranges the order of all the multi-task data for training to minimize the interference risks from two views. At the task level, we aim to find the optimal task order to minimize the total cross-task interference risk, which is exactly the traveling salesman problem; hence we use a simulated annealing algorithm to find its solution. At the instance level, we measure the difficulty of all instances per task, then divide them into easy-to-difficult mini-batches for training. Experiments on MTEB sentence representation evaluation tasks show that our approach can boost the performance of state-of-the-art methods. Our code and data are publicly available at https://github.com/RUCAIBox/Data-CUBE.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this section is the friction.

Referee Report

3 major / 1 minor

Summary. The paper proposes Data-CUBE, a data curriculum method for multi-task instruction tuning in sentence representation learning. It casts task ordering as a traveling salesman problem whose edge weights are cross-task interference risks, solved via simulated annealing, and orders instances within each task from easy to difficult based on a difficulty heuristic; experiments claim this boosts performance of state-of-the-art methods on MTEB tasks, with code and data released.

Significance. If the interference metric and difficulty heuristic demonstrably correlate with training dynamics and the reported MTEB gains prove robust to hyperparameter controls, the approach would offer a practical, model-agnostic way to mitigate interference in multi-task sentence embedding training. The public code release is a clear strength for reproducibility.

major comments (3)
  1. [Abstract] Abstract: the central claim that the method 'boost[s] the performance of state-of-the-art methods' on MTEB is unsupported by any reported error bars, statistical significance tests, number of runs, or ablation on the simulated annealing cooling schedule and iteration count; these omissions make it impossible to attribute gains to the curriculum rather than hyperparameter search.
  2. [Task-level curriculum] Task-level curriculum paragraph: the cross-task interference risk used as TSP edge weights is never defined or computed (no equation or algorithm); without an explicit metric or empirical validation that it predicts gradient conflict or forgetting on held-out pairs, the TSP+SA ordering has no demonstrated advantage over random or heuristic baselines.
  3. [Instance-level curriculum] Instance-level curriculum paragraph: the difficulty metric and batch-division threshold are introduced as free parameters with no sensitivity analysis or correlation study against actual training loss curves; this leaves open whether the easy-to-difficult ordering is load-bearing or interchangeable with random batching.
minor comments (1)
  1. [Abstract] The abstract states 'Our code and data are publicly available' but provides only a GitHub link without a commit hash or data version; this should be pinned for reproducibility.
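Major comment 1 asks for multi-seed results with a paired significance test. One simple option is an exact sign-flip (paired permutation) test on per-seed differences, sketched below. The choice of test is an assumption rather than anything the referee specifies, and the scores are made-up placeholders, not results from the paper.

```python
from itertools import product
from statistics import mean, stdev


def paired_sign_flip_test(baseline, curriculum):
    """Exact two-sided sign-flip test on paired per-seed score differences.

    Returns (mean difference, std of differences, p-value).
    """
    if len(baseline) != len(curriculum) or len(baseline) < 2:
        raise ValueError("need two equal-length score lists with at least 2 seeds")
    diffs = [c - b for b, c in zip(baseline, curriculum)]
    observed = abs(mean(diffs))
    patterns = list(product((1, -1), repeat=len(diffs)))
    # Count sign patterns at least as extreme as observed (small tolerance for float error).
    hits = sum(
        abs(mean(s * d for s, d in zip(signs, diffs))) >= observed - 1e-12
        for signs in patterns
    )
    return mean(diffs), stdev(diffs), hits / len(patterns)


# Placeholder average MTEB scores over five seeds (illustrative numbers only).
baseline = [60.1, 60.4, 59.9, 60.2, 60.0]
curriculum = [60.6, 60.9, 60.3, 60.5, 60.4]
diff, sd, p = paired_sign_flip_test(baseline, curriculum)
print(f"mean diff {diff:.2f} (sd {sd:.2f}), p = {p:.4f}")
```

With five seeds there are only 2^5 = 32 sign patterns, so even when every seed improves, the smallest attainable two-sided p-value is 2/32 = 0.0625; this test needs at least six seeds to reach p < 0.05.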

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript accordingly to strengthen statistical reporting, provide explicit definitions, and include additional analyses.

Point-by-point responses
  1. Referee: [Abstract] Abstract: the central claim that the method 'boost[s] the performance of state-of-the-art methods' on MTEB is unsupported by any reported error bars, statistical significance tests, number of runs, or ablation on the simulated annealing cooling schedule and iteration count; these omissions make it impossible to attribute gains to the curriculum rather than hyperparameter search.

    Authors: We agree that the abstract claim would benefit from greater statistical rigor. In the revised manuscript we will report results averaged over multiple random seeds with standard deviations, include paired statistical significance tests, and add an ablation on the simulated annealing cooling schedule and iteration count to better attribute gains to the curriculum. revision: yes

  2. Referee: [Task-level curriculum] Task-level curriculum paragraph: the cross-task interference risk used as TSP edge weights is never defined or computed (no equation or algorithm); without an explicit metric or empirical validation that it predicts gradient conflict or forgetting on held-out pairs, the TSP+SA ordering has no demonstrated advantage over random or heuristic baselines.

    Authors: We acknowledge that an explicit definition and validation of the interference risk metric is necessary. In the revised manuscript we will add the mathematical formulation of the cross-task interference risk (based on gradient conflicts), the algorithm for computing TSP edge weights, empirical validation of its correlation with gradient conflict and forgetting on held-out pairs, and direct comparisons against random and heuristic baselines. revision: yes

  3. Referee: [Instance-level curriculum] Instance-level curriculum paragraph: the difficulty metric and batch-division threshold are introduced as free parameters with no sensitivity analysis or correlation study against actual training loss curves; this leaves open whether the easy-to-difficult ordering is load-bearing or interchangeable with random batching.

    Authors: We agree that sensitivity analysis and correlation studies are needed. The revised version will include sensitivity analysis on the difficulty metric and batch-division threshold, plus a correlation study between the difficulty heuristic and training loss curves to demonstrate that the easy-to-difficult ordering contributes beyond random batching. revision: yes
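The gradient-conflict definition in response 2 comes from the simulated rebuttal and is not confirmed as the paper's metric. A common way to quantify such conflict is the cosine similarity between per-task gradients. A minimal sketch, assuming one averaged, flattened gradient vector per task is already available: map cosine similarity to a risk in [0, 1] with (1 - cos) / 2, so aligned gradients give 0 and directly opposed gradients give 1. The resulting symmetric matrix could serve as hypothetical TSP edge weights.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def interference_matrix(task_grads):
    """Hypothetical pairwise risk: (1 - cos) / 2, with zeros on the diagonal."""
    n = len(task_grads)
    return [
        [0.0 if i == j else (1 - cosine(task_grads[i], task_grads[j])) / 2 for j in range(n)]
        for i in range(n)
    ]


# Toy 2-D "gradients": tasks 0 and 2 point in opposite directions.
grads = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
for row in interference_matrix(grads):
    print(row)
```

Validating such a matrix, as the referee requests, would mean checking that high-risk pairs actually show more forgetting when trained back to back.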

Circularity Check

0 steps flagged

No significant circularity; curriculum heuristics are independent of reported gains

full rationale

The paper defines task ordering via TSP on a cross-task interference risk metric and instance ordering via a per-task difficulty heuristic, then reports empirical gains on the external MTEB benchmark. Neither the TSP formulation nor the difficulty measure is shown to be computed from or fitted to the final MTEB scores; the optimization steps remain external to the evaluation data. No equations reduce the claimed performance boost to quantities defined by the evaluation itself, and no self-citation chain is load-bearing for the central experimental claim. This satisfies the default expectation of a non-circular derivation.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The central claim rests on the unstated premise that interference can be additively modeled across tasks and that a difficulty metric exists that correlates with learning progress. No free parameters are named in the abstract, but the simulated annealing (SA) cooling schedule and the difficulty threshold are implicit tunable quantities.

free parameters (2)
  • simulated annealing cooling schedule and iteration count
    Controls the search for the task order; chosen to approximate the TSP optimum but not derived from data.
  • instance difficulty metric and batch division threshold
    Determines the easy-to-difficult ordering inside each task; definition and cutoff are required to reproduce the curriculum.
axioms (2)
  • domain assumption: Cross-task interference risk is quantifiable and additive, so that the total risk of a task order equals the sum of the pairwise risks between consecutively trained tasks.
    Invoked when the task-ordering problem is reduced to TSP.
  • domain assumption: Instance difficulty can be measured independently of the training dynamics and predicts learning interference.
    Required for the instance-level curriculum to be well-defined.
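Written out, the additivity axiom and the two simulated annealing free parameters correspond to the following notation, introduced here for illustration: π is a task order over n tasks, r(i, j) the pairwise interference risk, T_t the temperature at step t, and α the cooling rate. The open-path objective, the Metropolis acceptance rule, and geometric cooling are textbook choices and are assumptions, not details confirmed from the paper.

```latex
R(\pi) = \sum_{k=1}^{n-1} r(\pi_k, \pi_{k+1}),
\qquad \pi^{*} = \arg\min_{\pi \in S_n} R(\pi)

P(\text{accept } \pi' \mid \pi, T_t)
  = \min\!\left(1, \exp\!\left(-\frac{R(\pi') - R(\pi)}{T_t}\right)\right),
\qquad T_{t+1} = \alpha \, T_t, \quad 0 < \alpha < 1
```

The additivity axiom is what makes R(π) a sum over adjacent pairs; if interference depended on longer histories of previously trained tasks, the problem would no longer reduce to TSP.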

pith-pipeline@v0.9.0 · 5740 in / 1349 out tokens · 18130 ms · 2026-05-24T04:27:16.761950+00:00 · methodology

