Data-CUBE: Data Curriculum for Instruction-based Sentence Representation Learning

Dawei Gao; He Hu; Kun Zhou; Wayne Xin Zhao; Yaliang Li; Yingqian Min

arxiv: 2401.03563 · v1 · submitted 2024-01-07 · 💻 cs.CL · cs.IR

Yingqian Min, Kun Zhou, Dawei Gao, Wayne Xin Zhao, He Hu, Yaliang Li

Pith reviewed 2026-05-24 04:27 UTC · model grok-4.3

classification 💻 cs.CL cs.IR

keywords curriculum learning · sentence representation learning · instruction tuning · multi-task learning · data ordering · interference minimization · traveling salesman problem · simulated annealing


The pith

A data curriculum ordering tasks by interference risk and instances by difficulty reduces cross-task problems in instruction-based sentence representation learning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Data-CUBE to arrange the sequence of all multi-task training data so that interference between tasks and between instances is minimized during instruction tuning for sentence representations. At the task level the method models optimal ordering as a traveling salesman problem and solves it with simulated annealing; at the instance level it ranks examples by difficulty and feeds them in easy-to-hard mini-batches. A sympathetic reader would care because lower interference could produce representations that generalize more reliably to unseen tasks. The resulting models are evaluated on the MTEB benchmark where the ordering yields measurable gains over prior state-of-the-art instruction-tuned approaches.

Core claim

Data-CUBE arranges the orders of all the multi-task data for training to minimize the interference risks from the two views. In the task level we aim to find the optimal task order to minimize the total cross-task interference risk, which is exactly the traveling salesman problem, hence we utilize a simulated annealing algorithm to find its solution. In the instance level we measure the difficulty of all instances per task then divide them into the easy-to-difficult mini-batches for training. Experiments on MTEB sentence representation evaluation tasks show that our approach can boost the performance of state-of-the-art methods.

What carries the argument

Dual-level data curriculum: task ordering cast as a traveling salesman problem solved by simulated annealing to minimize total cross-task interference, plus per-task instance ordering from easy to difficult mini-batches.
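The task-level half of this machinery can be illustrated with a minimal sketch. It assumes a precomputed symmetric matrix of pairwise interference risks, which the abstract does not define; `RISK`, `tour_cost`, `anneal_task_order`, and the parameters `t_start`, `t_end`, and `alpha` are hypothetical names and values chosen for illustration, not taken from the paper or its repository. The sketch uses segment reversal as the neighbourhood move and a geometric cooling schedule, both common choices that may differ from the authors' implementation.

```python
import math
import random

def tour_cost(order, risk):
    """Sum of pairwise risks between consecutive tasks, closing the loop."""
    n = len(order)
    return sum(risk[order[i]][order[(i + 1) % n]] for i in range(n))

def anneal_task_order(risk, t_start=1.0, t_end=1e-3, alpha=0.995, seed=0):
    """Simulated annealing over task permutations (assumed schedule and moves)."""
    rng = random.Random(seed)
    n = len(risk)
    order = list(range(n))
    rng.shuffle(order)
    cost = tour_cost(order, risk)
    best, best_cost = order[:], cost
    t = t_start
    while t > t_end:
        # Propose a neighbour by reversing a random segment of the order.
        i, j = sorted(rng.sample(range(n), 2))
        cand = order[:i] + order[i:j + 1][::-1] + order[j + 1:]
        cand_cost = tour_cost(cand, risk)
        delta = cand_cost - cost
        # Always accept improvements; accept worse orders with prob. exp(-delta / t).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            order, cost = cand, cand_cost
            if cost < best_cost:
                best, best_cost = order[:], cost
        t *= alpha  # geometric cooling
    return best, best_cost

# Hypothetical symmetric risk matrix for 5 tasks (illustrative numbers only).
RISK = [
    [0.0, 0.2, 0.9, 0.7, 0.4],
    [0.2, 0.0, 0.3, 0.8, 0.6],
    [0.9, 0.3, 0.0, 0.1, 0.5],
    [0.7, 0.8, 0.1, 0.0, 0.2],
    [0.4, 0.6, 0.5, 0.2, 0.0],
]
best_order, best_cost = anneal_task_order(RISK)
```

Whether the objective should close the loop (a tour) or stop at the last task (a path) is not settled by the abstract; dropping the modulo in `tour_cost` gives the path variant.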

If this is right

State-of-the-art instruction-tuned models obtain higher scores on MTEB sentence representation tasks.
Training stability improves because total cross-task interference risk is reduced by the computed task sequence.
Within each task, progressing from easy to difficult instances produces better convergence of the representation model.
The same ordering logic directly applies to any collection of instruction-following tasks used for sentence embedding training.
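The instance-level ordering mentioned above can be sketched as follows. The abstract does not say how difficulty is measured, so the margin-based score here is an assumption for illustration only; `easy_to_hard_batches`, `margin_difficulty`, and the `pos_sim` / `neg_sim` fields are hypothetical names.

```python
def easy_to_hard_batches(instances, difficulty, batch_size):
    """Sort instances by ascending difficulty, then cut into consecutive mini-batches."""
    ranked = sorted(instances, key=difficulty)
    return [ranked[i:i + batch_size] for i in range(0, len(ranked), batch_size)]

def margin_difficulty(example):
    """Assumed difficulty proxy: a smaller positive-minus-negative margin is harder."""
    return -(example["pos_sim"] - example["neg_sim"])

# Hypothetical similarity scores for five training instances of one task.
examples = [
    {"id": "a", "pos_sim": 0.9, "neg_sim": 0.1},
    {"id": "b", "pos_sim": 0.6, "neg_sim": 0.5},
    {"id": "c", "pos_sim": 0.7, "neg_sim": 0.3},
    {"id": "d", "pos_sim": 0.5, "neg_sim": 0.6},
    {"id": "e", "pos_sim": 0.8, "neg_sim": 0.2},
]

batches = easy_to_hard_batches(examples, margin_difficulty, batch_size=2)
batch_ids = [[ex["id"] for ex in batch] for batch in batches]
```

Any other scalar difficulty score can be passed as `difficulty` without changing the batching logic, which is the sense in which the ordering is reusable across task collections.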

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The TSP formulation implies that interference between tasks behaves like a distance metric that can be estimated from data alone.
Similar curriculum ordering might be tested on instruction-tuned models for other embedding tasks such as retrieval or classification without changing the underlying architecture.
If the simulated-annealing solution consistently outperforms greedy or random baselines, the approach supplies a practical, parameter-light way to schedule any multi-task instruction dataset.
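The greedy and random baselines named in the last point can be set up in a few lines, which makes the comparison cheap to run. This is a sketch under assumptions: the risk matrix `RISK` is made up, and `path_cost`, `greedy_order`, and `random_baseline` are hypothetical helper names, not functions from the paper's code.

```python
import random

def path_cost(order, risk):
    """Total risk along consecutive task pairs (open path, no closing edge)."""
    return sum(risk[a][b] for a, b in zip(order, order[1:]))

def greedy_order(risk, start=0):
    """Nearest-neighbour heuristic: always move to the lowest-risk unvisited task."""
    order = [start]
    remaining = set(range(len(risk))) - {start}
    while remaining:
        last = order[-1]
        nxt = min(remaining, key=lambda t: (risk[last][t], t))
        order.append(nxt)
        remaining.remove(nxt)
    return order

def random_baseline(risk, trials=1000, seed=0):
    """Mean path cost over uniformly shuffled task orders."""
    rng = random.Random(seed)
    costs = []
    for _ in range(trials):
        order = list(range(len(risk)))
        rng.shuffle(order)
        costs.append(path_cost(order, risk))
    return sum(costs) / len(costs)

# Hypothetical symmetric risk matrix for 4 tasks (illustrative numbers only).
RISK = [
    [0.0, 0.1, 0.8, 0.6],
    [0.1, 0.0, 0.2, 0.9],
    [0.8, 0.2, 0.0, 0.3],
    [0.6, 0.9, 0.3, 0.0],
]
greedy = greedy_order(RISK)
greedy_cost = path_cost(greedy, RISK)
random_mean_cost = random_baseline(RISK)
```

A lower ordering cost is only a proxy. The claim at stake concerns downstream MTEB scores, so each candidate order would still need to be used for training and evaluated.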

Load-bearing premise

Cross-task interference can be quantified in a form that lets task ordering be solved as a traveling salesman problem whose solution improves final representations.

What would settle it

Applying the Data-CUBE ordering to the same multi-task instruction data and observing no gain or a loss in average MTEB score relative to random task order would falsify the claim.

Figures

Figures reproduced from arXiv: 2401.03563 by Dawei Gao, He Hu, Kun Zhou, Wayne Xin Zhao, Yaliang Li, Yingqian Min.

**Figure 1.** (a) Example of the task and instance interference. The distance reflects task similarity, and the shades of orange represent the difficulty level. (b) The underfitting degrees of all training tasks. We categorize all tasks into three degrees: severe (>80%), moderate (>50% but <80%), and mild (<50%), according to the ratio of instances whose positives and negatives are not clearly distinguished (margin<…

**Figure 2.** The proportion of underfitting instances within different tasks. We show the comparison between

**Figure 3.** Illustration of Data-CUBE: Task-level Curriculum rearranges the task orders from similar to dissimilar

**Figure 4.** Performance variation curve on the STS tasks
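The three-way split described in the Figure 1(b) caption can be written as a small helper. The margin threshold is truncated in the caption, so it is left as a required argument here; the value `0.1` in the example is a placeholder, not the paper's setting. The caption also does not say where ratios of exactly 50% or 80% fall, so this sketch assigns them to the lower category. `underfitting_ratio` and `underfitting_degree` are hypothetical names.

```python
def underfitting_ratio(margins, margin_threshold):
    """Fraction of instances whose positive/negative margin is below the threshold."""
    unclear = sum(1 for m in margins if m < margin_threshold)
    return unclear / len(margins)

def underfitting_degree(ratio):
    """Map a ratio to the caption's labels (boundary handling is an assumption)."""
    if ratio > 0.8:
        return "severe"
    if ratio > 0.5:
        return "moderate"
    return "mild"

# Hypothetical margins for ten instances of one task; 0.1 is a placeholder threshold.
margins = [0.05, 0.02, 0.30, 0.01, 0.04, 0.03, 0.02, 0.06, 0.01, 0.07]
ratio = underfitting_ratio(margins, margin_threshold=0.1)
degree = underfitting_degree(ratio)
```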

Original abstract

Recently, multi-task instruction tuning has been applied into sentence representation learning, which endows the capability of generating specific representations with the guidance of task instruction, exhibiting strong generalization ability on new tasks. However, these methods mostly neglect the potential interference problems across different tasks and instances, which may affect the training and convergence of the model. To address it, we propose a data curriculum method, namely Data-CUBE, that arranges the orders of all the multi-task data for training, to minimize the interference risks from the two views. In the task level, we aim to find the optimal task order to minimize the total cross-task interference risk, which is exactly the traveling salesman problem, hence we utilize a simulated annealing algorithm to find its solution. In the instance level, we measure the difficulty of all instances per task, then divide them into the easy-to-difficult mini-batches for training. Experiments on MTEB sentence representation evaluation tasks show that our approach can boost the performance of state-of-the-art methods. Our code and data are publicly available at the link: https://github.com/RUCAIBox/Data-CUBE.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Data-CUBE applies TSP plus simulated annealing to order tasks and easy-to-hard batches for instruction-tuned sentence embeddings, but the interference metric's validity is not shown in the abstract.


The main thing to know is that Data-CUBE casts task ordering as a traveling salesman problem solved by simulated annealing to cut cross-task interference, then adds per-task easy-to-difficult instance batches, and reports MTEB gains over prior instruction-tuned models. The specific combination for this sentence representation setting is new, even if the underlying curriculum and annealing tools are established. It does a reasonable job spotting the interference problem in multi-task instruction tuning and giving a structured way to handle it at two scales, plus the public code link is useful for anyone who wants to test it.

The soft spot is the interference risk metric that drives the TSP edges. The abstract never defines how that risk is computed from task pairs or shows it actually tracks gradient conflicts or forgetting during training. Without that link, the simulated annealing order could be no better than random, and the reported gains might come from other sources. There are also free parameters in the annealing schedule and the difficulty heuristic with no sensitivity checks mentioned, and no error bars or significance numbers on the MTEB results.

This is for readers already working on sentence embeddings and multi-task instruction tuning who might want to try explicit ordering tricks. It is not foundational enough to pull in people outside that niche. I would send it for peer review so the full paper can be checked on the metric definition and controls, even though the current write-up leaves the central claim under-supported.

Referee Report

3 major / 1 minor

Summary. The paper proposes Data-CUBE, a data curriculum method for multi-task instruction tuning in sentence representation learning. It casts task ordering as a traveling salesman problem whose edge weights are cross-task interference risks, solved via simulated annealing, and orders instances within each task from easy to difficult based on a difficulty heuristic; experiments claim this boosts performance of state-of-the-art methods on MTEB tasks, with code and data released.

Significance. If the interference metric and difficulty heuristic demonstrably correlate with training dynamics and the reported MTEB gains prove robust to hyperparameter controls, the approach would offer a practical, model-agnostic way to mitigate interference in multi-task sentence embedding training. The public code release is a clear strength for reproducibility.

major comments (3)

[Abstract] The central claim that the method 'boost[s] the performance of state-of-the-art methods' on MTEB is unsupported by any reported error bars, statistical significance tests, number of runs, or ablation on the simulated annealing cooling schedule and iteration count; these omissions make it impossible to attribute gains to the curriculum rather than hyperparameter search.
[Task-level curriculum] The cross-task interference risk used as TSP edge weights is never defined or computed (no equation or algorithm); without an explicit metric or empirical validation that it predicts gradient conflict or forgetting on held-out pairs, the TSP+SA ordering has no demonstrated advantage over random or heuristic baselines.
[Instance-level curriculum] The difficulty metric and batch-division threshold are introduced as free parameters with no sensitivity analysis or correlation study against actual training loss curves; this leaves open whether the easy-to-difficult ordering is load-bearing or interchangeable with random batching.
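To make the request for an explicit metric concrete, here is one possible proxy, offered purely as an illustration: score each task pair by how strongly their average gradients point in opposite directions. Nothing in the abstract confirms that the paper uses gradients at all, so the formula, the function names `cosine` and `interference_matrix`, and the toy vectors in `GRADS` are all assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def interference_matrix(task_grads):
    """Assumed risk: (1 - cosine) / 2, so aligned gradients give 0 and opposed give 1."""
    n = len(task_grads)
    return [
        [0.0 if i == j else (1.0 - cosine(task_grads[i], task_grads[j])) / 2.0
         for j in range(n)]
        for i in range(n)
    ]

# Toy per-task mean gradients (hypothetical, two parameters only).
GRADS = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
risk = interference_matrix(GRADS)
```

A matrix of this shape is what a TSP solver would consume as edge weights. Validating that such a proxy predicts forgetting on held-out task pairs is the separate empirical step the comment asks for.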

minor comments (1)

[Abstract] The abstract states 'Our code and data are publicly available' but provides only a GitHub link without a commit hash or data version; this should be pinned for reproducibility.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript accordingly to strengthen statistical reporting, provide explicit definitions, and include additional analyses.


Referee: [Abstract] The central claim that the method 'boost[s] the performance of state-of-the-art methods' on MTEB is unsupported by any reported error bars, statistical significance tests, number of runs, or ablation on the simulated annealing cooling schedule and iteration count; these omissions make it impossible to attribute gains to the curriculum rather than hyperparameter search.

Authors: We agree that the abstract claim would benefit from greater statistical rigor. In the revised manuscript we will report results averaged over multiple random seeds with standard deviations, include paired statistical significance tests, and add an ablation on the simulated annealing cooling schedule and iteration count to better attribute gains to the curriculum.

revision: yes

Referee: [Task-level curriculum] The cross-task interference risk used as TSP edge weights is never defined or computed (no equation or algorithm); without an explicit metric or empirical validation that it predicts gradient conflict or forgetting on held-out pairs, the TSP+SA ordering has no demonstrated advantage over random or heuristic baselines.

Authors: We acknowledge that an explicit definition and validation of the interference risk metric is necessary. In the revised manuscript we will add the mathematical formulation of the cross-task interference risk (based on gradient conflicts), the algorithm for computing TSP edge weights, empirical validation of its correlation with gradient conflict and forgetting on held-out pairs, and direct comparisons against random and heuristic baselines.

revision: yes

Referee: [Instance-level curriculum] The difficulty metric and batch-division threshold are introduced as free parameters with no sensitivity analysis or correlation study against actual training loss curves; this leaves open whether the easy-to-difficult ordering is load-bearing or interchangeable with random batching.

Authors: We agree that sensitivity analysis and correlation studies are needed. The revised version will include sensitivity analysis on the difficulty metric and batch-division threshold, plus a correlation study between the difficulty heuristic and training loss curves to demonstrate that the easy-to-difficult ordering contributes beyond random batching.

revision: yes
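The statistical reporting promised in the first response (per-seed means with standard deviations plus a paired test) could look like the sketch below. It uses a sign-flip permutation test as one reasonable choice of paired test; the rebuttal does not name a specific test. The score lists are synthetic placeholders, not results from the paper, and `mean_std` and `paired_permutation_pvalue` are hypothetical helper names.

```python
import random
import statistics

def mean_std(scores):
    """Mean and sample standard deviation across seeds."""
    return statistics.mean(scores), statistics.stdev(scores)

def paired_permutation_pvalue(a, b, trials=10000, seed=0):
    """Two-sided sign-flip permutation test on paired per-seed differences."""
    rng = random.Random(seed)
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(sum(diffs))
    hits = 0
    for _ in range(trials):
        # Randomly flip the sign of each paired difference under the null.
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(flipped) >= observed - 1e-12:
            hits += 1
    return hits / trials

# Synthetic placeholder scores (NOT results from the paper), one value per seed.
scores_a = [1.0, 2.0, 3.0, 4.0, 5.0]  # e.g. curriculum order
scores_b = [0.5, 1.5, 2.5, 3.5, 4.5]  # e.g. random order, same seeds

summary_a = mean_std(scores_a)
summary_b = mean_std(scores_b)
p_value = paired_permutation_pvalue(scores_a, scores_b)
```

With five paired seeds the smallest two-sided p-value a sign-flip test can produce is 2/32, so a handful of runs cannot by itself support a strong significance claim.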

Circularity Check

0 steps flagged

No significant circularity; curriculum heuristics are independent of reported gains


The paper defines task ordering via TSP on a cross-task interference risk metric and instance ordering via a per-task difficulty heuristic, then reports empirical gains on the external MTEB benchmark. Neither the TSP formulation nor the difficulty measure is shown to be computed from or fitted to the final MTEB scores; the optimization steps remain external to the evaluation data. No equations reduce the claimed performance boost to quantities defined by the evaluation itself, and no self-citation chain is load-bearing for the central experimental claim. This satisfies the default expectation of a non-circular derivation.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The central claim rests on the unstated premise that interference can be additively modeled across tasks and that a difficulty metric exists that correlates with learning progress; no free parameters are named in the abstract but the SA cooling schedule and difficulty threshold are implicit tunable quantities.

free parameters (2)

simulated annealing cooling schedule and iteration count
Controls the search for the task order; chosen to approximate the TSP optimum but not derived from data.
instance difficulty metric and batch division threshold
Determines the easy-to-difficult ordering inside each task; definition and cutoff are required to reproduce the curriculum.
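For readers unfamiliar with why the annealing schedule counts as a free parameter, the standard Metropolis acceptance rule and a geometric cooling schedule are written out below. Geometric cooling is only one common option and is an assumption here; the symbols $\Delta$ (cost change of a proposed reordering), $T_k$ (temperature at step $k$), $T_0$, and $\alpha$ are introduced for illustration and do not come from the abstract.

```latex
P(\text{accept}) =
\begin{cases}
1, & \Delta \le 0, \\
\exp\!\left(-\Delta / T_k\right), & \Delta > 0,
\end{cases}
\qquad
T_{k+1} = \alpha \, T_k, \quad 0 < \alpha < 1 .
```

Under this form, the initial temperature $T_0$, the decay rate $\alpha$, and the number of steps together control how much of the permutation space is explored before the search settles, which is why an ablation over them is informative.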

axioms (2)

domain assumption: Cross-task interference risk is quantifiable and additive so that total risk equals the sum of pairwise risks.
Invoked when the task-ordering problem is reduced to TSP.
domain assumption: Instance difficulty can be measured independently of the training dynamics and predicts learning interference.
Required for the instance-level curriculum to be well-defined.
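The additivity assumption in the first axiom can be stated as an objective. The notation is introduced here for illustration and is not taken from the paper: $\pi$ is an ordering of the $n$ training tasks, $S_n$ the set of all such orderings, and $r(a, b)$ the pairwise interference risk of training task $b$ immediately after task $a$.

```latex
R(\pi) = \sum_{i=1}^{n-1} r\bigl(\pi_i, \pi_{i+1}\bigr),
\qquad
\pi^{*} = \operatorname*{arg\,min}_{\pi \in S_n} R(\pi) .
```

Adding the closing term $r(\pi_n, \pi_1)$ turns the path into the closed tour of the classical traveling salesman problem. The abstract does not say which variant is used, and either form ignores interference between tasks that are not adjacent in the order.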


Reference graph

Works this paper leans on

60 extracted references · 60 canonical work pages · 5 internal anchors

[1]

Emile HL Aarts, Jan HM Korst, and Peter JM van Laarhoven. 1988. A quantitative analysis of the simulated annealing algorithm: A case study for the traveling salesman problem. Journal of Statistical Physics, 50:187--206

work page 1988
[2]

Cer, Mona T

Eneko Agirre, Carmen Banea, Claire Cardie, Daniel M. Cer, Mona T. Diab, Aitor Gonzalez - Agirre, Weiwei Guo, I \ n igo Lopez - Gazpio, Montse Maritxalar, Rada Mihalcea, German Rigau, Larraitz Uria, and Janyce Wiebe. 2015. https://doi.org/10.18653/V1/S15-2045 Semeval-2015 task 2: Semantic textual similarity, english, spanish and pilot on interpretability ....

work page doi:10.18653/v1/s15-2045 2015
[3]

Cer, Mona T

Eneko Agirre, Carmen Banea, Claire Cardie, Daniel M. Cer, Mona T. Diab, Aitor Gonzalez - Agirre, Weiwei Guo, Rada Mihalcea, German Rigau, and Janyce Wiebe. 2014. https://doi.org/10.3115/V1/S14-2010 Semeval-2014 task 10: Multilingual semantic textual similarity . In Proceedings of the 8th International Workshop on Semantic Evaluation, SemEval@COLING 2014, ...

work page doi:10.3115/v1/s14-2010 2014
[4]

Cer, Mona T

Eneko Agirre, Carmen Banea, Daniel M. Cer, Mona T. Diab, Aitor Gonzalez - Agirre, Rada Mihalcea, German Rigau, and Janyce Wiebe. 2016. https://doi.org/10.18653/V1/S16-1081 Semeval-2016 task 1: Semantic textual similarity, monolingual and cross-lingual evaluation . In Proceedings of the 10th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT ...

work page doi:10.18653/v1/s16-1081 2016
[5]

Cer, Mona T

Eneko Agirre, Daniel M. Cer, Mona T. Diab, and Aitor Gonzalez - Agirre. 2012. https://aclanthology.org/S12-1051/ Semeval-2012 task 6: A pilot on semantic textual similarity . In Proceedings of the 6th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2012, Montr \' e al, Canada, June 7-8, 2012 , pages 385--393. The Association for Computer ...

work page 2012
[6]

Cer, Mona T

Eneko Agirre, Daniel M. Cer, Mona T. Diab, Aitor Gonzalez - Agirre, and Weiwei Guo. 2013. https://aclanthology.org/S13-1004/ *sem 2013 shared task: Semantic textual similarity . In Proceedings of the Second Joint Conference on Lexical and Computational Semantics, *SEM 2013, June 13-14, 2013, Atlanta, Georgia, USA , pages 32--43. Association for Computatio...

work page 2013
[7]

Yoshua Bengio, J \' e r \^ o me Louradour, Ronan Collobert, and Jason Weston. 2009. https://doi.org/10.1145/1553374.1553380 Curriculum learning . In Proceedings of the 26th Annual International Conference on Machine Learning, ICML 2009, Montreal, Quebec, Canada, June 14-18, 2009 , volume 382 of ACM International Conference Proceeding Series , pages 41--48. ACM

work page doi:10.1145/1553374.1553380 2009
[8]

Dimitris Bertsimas and John Tsitsiklis. 1993. Simulated annealing. Statistical science, 8(1):10--15

work page 1993
[9]

I \ n igo Casanueva, Tadas Temcinas, Daniela Gerz, Matthew Henderson, and Ivan Vulic. 2020. http://arxiv.org/abs/2003.04807 Efficient intent detection with dual sentence encoders . CoRR, abs/2003.04807

work page arXiv 2020
[10]

John, Noah Constant, Mario Guajardo-Cespedes, Steve Yuan, Chris Tar, Brian Strope, and Ray Kurzweil

Daniel Cer, Yinfei Yang, Sheng-yi Kong, Nan Hua, Nicole Limtiaco, Rhomni St. John, Noah Constant, Mario Guajardo-Cespedes, Steve Yuan, Chris Tar, Brian Strope, and Ray Kurzweil. 2018. https://doi.org/10.18653/v1/D18-2029 Universal sentence encoder for E nglish . In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: Sys...

work page doi:10.18653/v1/d18-2029 2018
[11]

Cer, Mona T

Daniel M. Cer, Mona T. Diab, Eneko Agirre, I \ n igo Lopez - Gazpio, and Lucia Specia. 2017. https://doi.org/10.18653/V1/S17-2001 Semeval-2017 task 1: Semantic textual similarity multilingual and crosslingual focused evaluation . In Proceedings of the 11th International Workshop on Semantic Evaluation, SemEval@ACL 2017, Vancouver, Canada, August 3-4, 2017...

work page doi:10.18653/v1/s17-2001 2017
[12]

Shouvik Chakraborty and Sandeep Bhowmik. 2015. An efficient approach to job shop scheduling problem using simulated annealing. International Journal of Hybrid Information Technology, 8(11):273--284

work page 2015
[13]

Omar Cheikhrouhou and Ines Khoufi. 2021. https://doi.org/10.1016/J.COSREV.2021.100369 A comprehensive survey on the multiple traveling salesman problem: Applications, approaches and taxonomy . Comput. Sci. Rev., 40:100369

work page doi:10.1016/j.cosrev.2021.100369 2021
[14]

Camargo, Fabian Fl \" o ck, Devin Gaffney, Przemyslaw A

Xi Chen, Ali Zeynali, Chico Q. Camargo, Fabian Fl \" o ck, Devin Gaffney, Przemyslaw A. Grabowicz, Scott A. Hale, David Jurgens, and Mattia Samory. 2022. https://doi.org/10.18653/V1/2022.SEMEVAL-1.155 Semeval-2022 task 8: Multilingual news article similarity . In Proceedings of the 16th International Workshop on Semantic Evaluation, SemEval@NAACL 2022, Se...

work page doi:10.18653/v1/2022.semeval-1.155 2022
[15]

Arman Cohan, Sergey Feldman, Iz Beltagy, Doug Downey, and Daniel S. Weld. 2020. https://doi.org/10.18653/V1/2020.ACL-MAIN.207 SPECTER: document-level representation learning using citation-informed transformers . In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online, July 5-10, 2020 , pages 2270--2282...

work page doi:10.18653/v1/2020.acl-main.207 2020
[16]

Daniel Delahaye, Supatcha Chaimatanan, and Marcel Mongeau. 2019. Simulated annealing: From basics to applications. Handbook of metaheuristics, pages 1--35

work page 2019
[17]

Chuntao Ding, Zhichao Lu, Shangguang Wang, Ran Cheng, and Vishnu Naresh Boddeti. 2023. https://doi.org/10.1109/CVPR52729.2023.00749 Mitigating task interference in multi-task learning via explicit task routing with non-learnable primitives . In IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023, Vancouver, BC, Canada, June 17-24, 20...

work page doi:10.1109/cvpr52729.2023.00749 2023
[18]

Tianyu Gao, Xingcheng Yao, and Danqi Chen. 2021. https://doi.org/10.18653/V1/2021.EMNLP-MAIN.552 Simcse: Simple contrastive learning of sentence embeddings . In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021, Virtual Event / Punta Cana, Dominican Republic, 7-11 November, 2021 , pages 6894--6910. Associat...

work page doi:10.18653/v1/2021.emnlp-main.552 2021
[19]

Gregor Geigle, Nils Reimers, Andreas R \" u ckl \' e , and Iryna Gurevych. 2021. http://arxiv.org/abs/2104.07081 TWEAC: transformer with extendable QA agent classifiers . CoRR, abs/2104.07081

work page arXiv 2021
[20]

Michael G \" u nther, Louis Milliken, Jonathan Geuter, Georgios Mastrapas, Bo Wang, and Han Xiao. 2023. https://doi.org/10.48550/ARXIV.2307.11224 Jina embeddings: A novel set of high-performance sentence embedding models . CoRR, abs/2307.11224

work page doi:10.48550/arxiv.2307.11224 2023
[21]

Keld Helsgaun. 2006. An effective implementation of K-opt moves for the Lin-Kernighan TSP heuristic. Ph.D. thesis, Roskilde University. Department of Computer Science

work page 2006
[22]

Karla L Hoffman, Manfred Padberg, Giovanni Rinaldi, et al. 2013. Traveling salesman problem. Encyclopedia of operations research and management science, 1:1573--1578

work page 2013
[23]

Gautier Izacard, Mathilde Caron, Lucas Hosseini, Sebastian Riedel, Piotr Bojanowski, Armand Joulin, and Edouard Grave. 2022 a . https://openreview.net/forum?id=jKN1pXi7b0 Unsupervised dense information retrieval with contrastive learning . Trans. Mach. Learn. Res., 2022

work page 2022
[24]

Gautier Izacard, Mathilde Caron, Lucas Hosseini, Sebastian Riedel, Piotr Bojanowski, Armand Joulin, and Edouard Grave. 2022 b . https://openreview.net/forum?id=jKN1pXi7b0 Unsupervised dense information retrieval with contrastive learning . Trans. Mach. Learn. Res., 2022

work page 2022
[25]

Vladimir Karpukhin, Barlas Oguz, Sewon Min, Patrick S. H. Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen - tau Yih. 2020. https://doi.org/10.18653/V1/2020.EMNLP-MAIN.550 Dense passage retrieval for open-domain question answering . In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, Online, November ...

work page doi:10.18653/v1/2020.emnlp-main.550 2020
[26]

Wuwei Lan, Siyu Qiu, Hua He, and Wei Xu. 2017. https://doi.org/10.18653/V1/D17-1126 A continuously growing dataset of sentential paraphrases . In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, Copenhagen, Denmark, September 9-11, 2017 , pages 1224--1234. Association for Computational Linguistics

work page doi:10.18653/v1/d17-1126 2017
[27]

Jaakkola, Kateryna Tymoshenko, Alessandro Moschitti, and Llu \' s M \` a rquez

Tao Lei, Hrishikesh Joshi, Regina Barzilay, Tommi S. Jaakkola, Kateryna Tymoshenko, Alessandro Moschitti, and Llu \' s M \` a rquez. 2016. https://doi.org/10.18653/V1/N16-1153 Semi-supervised question retrieval with gated convolutions . In NAACL HLT 2016, The 2016 Conference of the North American Chapter of the Association for Computational Linguistics: H...

work page doi:10.18653/v1/n16-1153 2016
[28]

Zehan Li, Xin Zhang, Yanzhao Zhang, Dingkun Long, Pengjun Xie, and Meishan Zhang. 2023. https://doi.org/10.48550/ARXIV.2308.03281 Towards general text embeddings with multi-stage contrastive learning . CoRR, abs/2308.03281

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2308.03281 2023
[29]

Xueqing Liu, Chi Wang, Yue Leng, and ChengXiang Zhai. 2018. https://doi.org/10.1145/3283812.3283815 Linkso: a dataset for learning to retrieve similar question answer pairs on software development forums . In Proceedings of the 4th ACM SIGSOFT International Workshop on NLP for Software Engineering, NL4SE@ESEC/SIGSOFT FSE 2018, Lake Buena Vista, FL, USA, N...

work page doi:10.1145/3283812.3283815 2018
[30]

Rajesh Matai, Surya Prakash Singh, and Murari Lal Mittal. 2010. Traveling salesman problem: an overview of applications, formulations, and solution approaches. Traveling salesman problem, theory and applications, 1(1):1--25

work page 2010
[31]

David Mueller, Nicholas Andrews, and Mark Dredze. 2022. https://doi.org/10.18653/V1/2022.FINDINGS-EMNLP.206 Do text-to-text multi-task learners suffer from task conflict? In Findings of the Association for Computational Linguistics: EMNLP 2022, Abu Dhabi, United Arab Emirates, December 7-11, 2022 , pages 2843--2858. Association for Computational Linguistics

work page doi:10.18653/v1/2022.findings-emnlp.206 2022
[32]

Niklas Muennighoff. 2022. http://arxiv.org/abs/2202.08904 SGPT: GPT sentence embeddings for semantic search . CoRR, abs/2202.08904

work page arXiv 2022
[33]

Niklas Muennighoff, Nouamane Tazi, Lo \" c Magne, and Nils Reimers. 2023. https://doi.org/10.18653/V1/2023.EACL-MAIN.148 MTEB: massive text embedding benchmark . In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2023, Dubrovnik, Croatia, May 2-6, 2023 , pages 2006--2029. Association for Co...

work page doi:10.18653/v1/2023.eacl-main.148 2023
[34]

Arvind Neelakantan, Tao Xu, Raul Puri, Alec Radford, Jesse Michael Han, Jerry Tworek, Qiming Yuan, Nikolas Tezak, Jong Wook Kim, Chris Hallacy, Johannes Heidecke, Pranav Shyam, Boris Power, Tyna Eloundou Nekoul, Girish Sastry, Gretchen Krueger, David Schnurr, Felipe Petroski Such, Kenny Hsu, Madeleine Thompson, Tabarak Khan, Toki Sherbakov, Joanne Jang, P...

work page internal anchor Pith review Pith/arXiv arXiv 2022
[35]

Hall, Daniel Cer, and Yinfei Yang

Jianmo Ni, Gustavo Hern \' a ndez \' A brego, Noah Constant, Ji Ma, Keith B. Hall, Daniel Cer, and Yinfei Yang. 2022 a . https://doi.org/10.18653/V1/2022.FINDINGS-ACL.146 Sentence-t5: Scalable sentence encoders from pre-trained text-to-text models . In Findings of the Association for Computational Linguistics: ACL 2022, Dublin, Ireland, May 22-27, 2022 , ...

work page doi:10.18653/v1/2022.findings-acl.146 2022
[36]

Zhao, Yi Luan, Keith B

Jianmo Ni, Chen Qu, Jing Lu, Zhuyun Dai, Gustavo Hern \' a ndez \' A brego, Ji Ma, Vincent Y. Zhao, Yi Luan, Keith B. Hall, Ming - Wei Chang, and Yinfei Yang. 2022 b . https://doi.org/10.18653/V1/2022.EMNLP-MAIN.669 Large dual encoders are generalizable retrievers . In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing,...

work page doi:10.18653/v1/2022.emnlp-main.669 2022
[37]

James O'Neill, Polina Rozenshtein, Ryuichi Kiryo, Motoko Kubota, and Danushka Bollegala. 2021. http://arxiv.org/abs/2104.06893 I wish I would have loved this one, but I didn't - A multilingual dataset for counterfactual detection in product reviews . CoRR, abs/2104.06893

work page arXiv 2021
[38]

K Otubamowo, TO Egunjobi, and AP Adewole. 2012. A comparative study of simulated annealing and genetic algorithm for solving the travelling salesman problem

work page 2012
[39]

Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul F. Christiano, Jan Leike, and Ryan Lowe. 2022. http://papers.nips.cc/paper\_files/paper/2022/hash/b1efd...

work page 2022
[40]

Anindya Jyoti Pal, Biman Ray, Nordin Zakaria, and Samar Sen Sarma. 2012. Comparative performance of modified simulated annealing with simple simulated annealing for graph coloring problem. Procedia Computer Science, 9:321--327

work page 2012
[41]

Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. https://doi.org/10.3115/v1/D14-1162 G lo V e: Global vectors for word representation . In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing ( EMNLP ) , pages 1532--1543, Doha, Qatar. Association for Computational Linguistics

work page doi:10.3115/v1/d14-1162 2014
[42]

Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. 2020. http://jmlr.org/papers/v21/20-074.html Exploring the limits of transfer learning with a unified text-to-text transformer . J. Mach. Learn. Res., 21:140:1--140:67

work page 2020
[43]

Nils Reimers and Iryna Gurevych. 2019. https://doi.org/10.18653/V1/D19-1410 Sentence-bert: Sentence embeddings using siamese bert-networks . In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019, Hong Kong, China, November 3-7, ...

work page doi:10.18653/v1/d19-1410 2019
[44]

Shah, Tao Lei, Alessandro Moschitti, Salvatore Romeo, and Preslav Nakov

Darsh J. Shah, Tao Lei, Alessandro Moschitti, Salvatore Romeo, and Preslav Nakov. 2018. https://doi.org/10.18653/V1/D18-1131 Adversarial domain adaptation for duplicate question detection . In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31 - November 4, 2018, pages 1056--1063. Associat...

work page doi:10.18653/v1/d18-1131 2018
[45]

O zt \" u rk, and Arzucan \

Gizem Sogancioglu, Hakime \" O zt \" u rk, and Arzucan \" O zg \" u r. 2017. https://doi.org/10.1093/BIOINFORMATICS/BTX238 BIOSSES: a semantic sentence similarity estimation system for the biomedical domain . Bioinform., 33(14):i49--i58

work page doi:10.1093/bioinformatics/btx238 2017
[46]

Smith, Luke Zettlemoyer, and Tao Yu

Hongjin Su, Weijia Shi, Jungo Kasai, Yizhong Wang, Yushi Hu, Mari Ostendorf, Wen - tau Yih, Noah A. Smith, Luke Zettlemoyer, and Tao Yu. 2023. https://doi.org/10.18653/V1/2023.FINDINGS-ACL.71 One embedder, any task: Instruction-finetuned text embeddings . In Findings of the Association for Computational Linguistics: ACL 2023, Toronto, Canada, July 9-14, 2...

work page doi:10.18653/v1/2023.findings-acl.71 2023
[47]

Liang Wang, Nan Yang, Xiaolong Huang, Binxing Jiao, Linjun Yang, Daxin Jiang, Rangan Majumder, and Furu Wei. 2022 a . https://doi.org/10.48550/ARXIV.2212.03533 Text embeddings by weakly-supervised contrastive pre-training . CoRR, abs/2212.03533

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2212.03533 2022
[48]

Yizhong Wang, Swaroop Mishra, Pegah Alipoormolabashi, Yeganeh Kordi, Amirreza Mirzaei, Atharva Naik, Arjun Ashok, Arut Selvan Dhanasekaran, Anjana Arunkumar, David Stap, Eshaan Pathak, Giannis Karamanolakis, Haizhi Gary Lai, Ishan Purohit, Ishani Mondal, Jacob Anderson, Kirby Kuznia, Krima Doshi, Kuntal Kumar Pal, Maitreya Patel, Mehrad Moradshahi, Mihir ...

[49]

Jason Wei, Maarten Bosma, Vincent Y. Zhao, Kelvin Guu, Adams Wei Yu, Brian Lester, Nan Du, Andrew M. Dai, and Quoc V. Le. 2022. https://openreview.net/forum?id=gEZrGCozdqR Finetuned language models are zero-shot learners. In The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29, 2022. OpenReview.net.

[50]

Fangzhao Wu, Ying Qiao, Jiun-Hung Chen, Chuhan Wu, Tao Qi, Jianxun Lian, Danyang Liu, Xing Xie, Jianfeng Gao, Winnie Wu, and Ming Zhou. 2020. https://doi.org/10.18653/V1/2020.ACL-MAIN.331 MIND: A large-scale dataset for news recommendation. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online, Jul...

[51]

Shitao Xiao, Zheng Liu, Peitian Zhang, and Niklas Muennighoff. 2023. http://arxiv.org/abs/2309.07597 C-pack: Packaged resources to advance general Chinese embedding.

[52]

Wei Xu, Chris Callison-Burch, and Bill Dolan. 2015. https://doi.org/10.18653/V1/S15-2001 Semeval-2015 task 1: Paraphrase and semantic similarity in twitter (PIT). In Proceedings of the 9th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2015, Denver, Colorado, USA, June 4-5, 2015, pages 1–11. The Association for Computer Linguistics.

[53]

Xin Zhang, Zehan Li, Yanzhao Zhang, Dingkun Long, Pengjun Xie, Meishan Zhang, and Min Zhang. 2023. https://doi.org/10.48550/ARXIV.2310.08232 Language models are universal embedders. CoRR, abs/2310.08232.

[54]

Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong, Yifan Du, Chen Yang, Yushuo Chen, Zhipeng Chen, Jinhao Jiang, Ruiyang Ren, Yifan Li, Xinyu Tang, Zikang Liu, Peiyu Liu, Jian-Yun Nie, and Ji-Rong Wen. 2023. https://doi.org/10.48550/ARXIV.2303.18223 A survey of large langua...

[55]

Kun Zhou, Yeyun Gong, Xiao Liu, Wayne Xin Zhao, Yelong Shen, Anlei Dong, Jingwen Lu, Rangan Majumder, Ji-Rong Wen, Nan Duan, and Weizhu Chen. 2022a. http://arxiv.org/abs/2210.11773 Simans: Simple ambiguous negatives sampling for dense text retrieval.

[56]

Kun Zhou, Xiao Liu, Yeyun Gong, Wayne Xin Zhao, Daxin Jiang, Nan Duan, and Ji-Rong Wen. 2023a. https://doi.org/10.1007/978-3-031-43415-0_37 MASTER: multi-task pre-trained bottlenecked masked autoencoders are better dense retrievers. In Machine Learning and Knowledge Discovery in Databases: Research Track - European Conference, ECML PKDD 2023, Turin,...

[57]

Kun Zhou, Beichen Zhang, Wayne Xin Zhao, and Ji-Rong Wen. 2022b. https://doi.org/10.18653/V1/2022.ACL-LONG.423 Debiased contrastive learning of unsupervised sentence representations. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2022, Dublin, Ireland, May 22-27, 2022, pages 61...

[58]

Kun Zhou, Yuanhang Zhou, Wayne Xin Zhao, and Ji-Rong Wen. 2023b. https://doi.org/10.1109/TASLP.2023.3304485 Learning to perturb for contrastive learning of unsupervised sentence representations. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 31:3935–3944.

