IdioLink: Retrieving Meaning Beyond Words Across Idiomatic and Literal Expressions
Pith reviewed 2026-05-22 05:57 UTC · model grok-4.3
The pith
Embedding models fail to retrieve the same core meaning when it is expressed as an idiom in one text and as a literal phrase in another.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We introduce IdioLink to test whether models can link idiomatic expressions to conceptually equivalent meanings expressed in literal or paraphrased forms. The benchmark spans 107 idioms with both literal and figurative uses, each document and query annotated with spans that convey the core meaning. Evaluation of current embedding baselines reveals that models struggle to retrieve equivalent meanings across divergent surface realizations and instead rely on topical and shallow semantic cues.
What carries the argument
IdioLink, a retrieval benchmark of idiomatic and literal expressions paired with core-meaning-span annotations that forces models to abstract beyond lexical overlap.
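To make the idea of a core-meaning span concrete, here is a minimal sketch of how one annotated item might be represented. The class name, field names, character-offset convention, and example sentences are all assumptions for illustration; the source does not describe the released data format, and the idiom shown is not confirmed to be among the 107.

```python
from dataclasses import dataclass
from typing import List, Tuple


def span_of(text: str, phrase: str) -> Tuple[int, int]:
    """Return (start, end) character offsets of phrase in text, end exclusive."""
    start = text.index(phrase)
    return (start, start + len(phrase))


@dataclass
class AnnotatedText:
    """Hypothetical record: a query or document plus its core-meaning spans."""
    text: str
    idiom: str                          # idiom the item is grouped under
    usage: str                          # "figurative" or "literal"
    core_spans: List[Tuple[int, int]]   # character offsets, end exclusive

    def core_meaning(self) -> List[str]:
        """Return the substrings covered by the annotated spans."""
        return [self.text[s:e] for s, e in self.core_spans]


# Illustrative items only; not taken from the benchmark.
q_text = "After the merger fell through, the CEO decided to throw in the towel."
d_text = "She gave up on the project after months of setbacks."

query = AnnotatedText(q_text, "throw in the towel", "figurative",
                      [span_of(q_text, "throw in the towel")])
doc = AnnotatedText(d_text, "throw in the towel", "literal",
                    [span_of(d_text, "gave up")])

print(query.core_meaning(), doc.core_meaning())
```

The two items share almost no vocabulary, so a retriever can only match them by abstracting to the shared meaning marked by the spans.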
If this is right
- Retrieval systems using current embeddings will miss relevant documents when queries or documents contain idioms.
- Benchmark results indicate that semantic abstraction mechanisms beyond surface similarity must be developed.
- IdioLink provides a concrete testbed for training or evaluating future models on figurative language.
- Performance gaps suggest existing evaluation sets may overestimate model capability on non-literal input.
Where Pith is reading between the lines
- Success on IdioLink would likely improve model robustness on related phenomena such as metaphors or sarcasm that also require abstraction.
- Adding core-meaning supervision during pretraining could transfer to other retrieval tasks that involve paraphrasing.
- Extending the benchmark to additional languages would reveal whether the observed gaps are language-specific or general.
Load-bearing premise
Human annotations correctly and without bias identify the core meaning spans that distinguish genuine semantic equivalence from mere topical similarity.
What would settle it
Evaluate the models on IdioLink queries alongside a matched control set of topical distractors that lack core-meaning overlap. If models rank the topical distractors at or above the true core-meaning matches, the claim is supported; if they consistently rank the core-meaning matches above the distractors, the reported struggle is falsified.
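One way to operationalize this check is to compare, per query, the similarity of the core-meaning match against that of a topical distractor. The sketch below assumes embeddings are already available as plain lists of floats; the function names and toy vectors are hypothetical and do not come from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def core_preference_rate(triples):
    """Fraction of (query, core_match, topical_distractor) embedding triples
    in which the core match scores strictly higher than the distractor."""
    wins = sum(1 for q, pos, neg in triples if cosine(q, pos) > cosine(q, neg))
    return wins / len(triples)


# Toy 2-d embeddings, invented for illustration.
toy = [
    ([1.0, 0.0], [0.9, 0.1], [0.1, 0.9]),  # core match closer to the query
    ([1.0, 0.0], [0.2, 0.8], [0.8, 0.2]),  # topical distractor closer
]
rate = core_preference_rate(toy)
print(rate)
```

A rate near 1.0 would indicate that a model prefers meaning-equivalent text over topical lookalikes; a rate near or below 0.5 would be consistent with reliance on topical cues.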
Original abstract
Idioms pose a fundamental challenge for language models, as their meaning cannot be inferred from surface form alone. Understanding such expressions, therefore, requires semantic abstraction beyond lexical overlap. We introduce IdioLink, a retrieval benchmark designed to test whether models can link idiomatic expressions to conceptually equivalent meanings expressed in literal or paraphrased forms. IdioLink comprises 10,700 documents and 2,140 queries, spanning 107 idioms with both literal and figurative uses. Each document and query is annotated with spans that convey the core meaning. Evaluating strong embedding baselines (e.g., BGE, E5, Contriever, and Qwen), we show that current models struggle to retrieve equivalent meanings across divergent surface realizations, relying instead on topical and shallow semantic cues. IdioLink exposes key gaps in idiom-aware semantic retrieval and provides a challenging testbed for future models.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces IdioLink, a retrieval benchmark with 10,700 documents and 2,140 queries spanning 107 idioms that have both literal and figurative uses. Each document and query is annotated with spans conveying the core meaning. Strong embedding baselines (BGE, E5, Contriever, Qwen) are evaluated, with the central claim that current models struggle to retrieve equivalent meanings across divergent surface realizations and instead rely on topical and shallow semantic cues.
Significance. If the central claim holds after addressing the annotation and reporting gaps, the work would be significant for the field by providing a targeted testbed that isolates semantic abstraction failures in idiom handling. The explicit core-meaning span annotations are a clear strength, enabling more precise diagnosis of whether retrieval failures stem from surface cues rather than meaning equivalence.
major comments (3)
- [Dataset construction] Dataset construction section: the paper reports 10,700 documents and 107 idioms but supplies no details on collection, filtering, or controls for topical/lexical bias. This is load-bearing for the claim that models rely on shallow cues, because without such details it is impossible to rule out that the benchmark itself introduces regularities that make shallow matching artificially easy.
- [Annotation process] Annotation process (core-meaning spans): no inter-annotator agreement, adjudication protocol, or control experiments (e.g., span-swapping while preserving meaning) are reported. This directly undermines the central claim, as annotator bias toward surface or topical overlap could produce the observed performance gaps even if models were capable of true semantic abstraction.
- [Evaluation and results] Evaluation and results section: the abstract states the main finding that models struggle and rely on shallow cues, yet the provided text contains no quantitative retrieval metrics, error analysis, or breakdown by idiom type. Without these, the claim that performance reflects inability to abstract beyond surface form cannot be verified.
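The inter-annotator agreement requested in the annotation comment could be reported, for example, as token-level Cohen's kappa over binary in-span/out-of-span labels. This is a sketch under that assumption; the source does not say which agreement statistic the authors use, and the label sequences below are invented.

```python
def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equal length")
    n = len(labels_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's marginal label frequencies.
    categories = set(labels_a) | set(labels_b)
    expected = sum((labels_a.count(c) / n) * (labels_b.count(c) / n)
                   for c in categories)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# 1 = token inside a core-meaning span, 0 = outside (toy data).
annotator_a = [0, 0, 1, 1, 1, 0, 0, 0]
annotator_b = [0, 0, 1, 1, 0, 0, 0, 0]
kappa = cohens_kappa(annotator_a, annotator_b)
print(round(kappa, 3))
```

Token-level labels are only one option; span-level exact or partial match would penalize boundary disagreements differently.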
minor comments (1)
- [Abstract] The abstract would be strengthened by including one or two key quantitative results (e.g., top-1 or MRR scores for the strongest baseline) to support the qualitative claim.
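For reference, the two headline numbers suggested here can be computed from ranked retrieval output as follows. This is a generic sketch of the standard definitions, not the paper's evaluation code; the document ids and rankings are made up.

```python
def reciprocal_rank(ranked_ids, relevant_ids):
    """1/rank of the first relevant document, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def mrr_and_top1(runs):
    """runs: list of (ranked_ids, relevant_ids) pairs, one per query.
    Returns (mean reciprocal rank, top-1 accuracy)."""
    rrs = [reciprocal_rank(ranked, rel) for ranked, rel in runs]
    mrr = sum(rrs) / len(rrs)
    top1 = sum(1 for rr in rrs if rr == 1.0) / len(rrs)
    return mrr, top1


# Toy rankings for two queries.
runs = [
    (["d3", "d1", "d7"], {"d1"}),  # first relevant document at rank 2
    (["d2", "d5", "d9"], {"d2"}),  # first relevant document at rank 1
]
mrr, top1 = mrr_and_top1(runs)
print(mrr, top1)
```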
Simulated Author's Rebuttal
We thank the referee for their detailed and constructive review. The comments highlight important areas for improving clarity and rigor in our presentation of the IdioLink benchmark. We address each major comment below and have revised the manuscript to incorporate additional details and analyses where feasible.
Point-by-point responses
- Referee: [Dataset construction] Dataset construction section: the paper reports 10,700 documents and 107 idioms but supplies no details on collection, filtering, or controls for topical/lexical bias. This is load-bearing for the claim that models rely on shallow cues, because without such details it is impossible to rule out that the benchmark itself introduces regularities that make shallow matching artificially easy.
  Authors: We agree that explicit details on dataset construction are necessary to support the claim regarding shallow cue reliance. In the revised manuscript we have expanded the Dataset Construction section to describe idiom selection from standard linguistic resources, document sourcing via balanced web queries targeting both literal and figurative contexts for each idiom, filtering steps for quality and relevance, and controls such as topic diversification across documents and lexical overlap minimization outside the idiom expressions themselves. We also added supporting analysis showing that topical regularities alone do not explain the performance patterns observed.
  Revision: yes
- Referee: [Annotation process] Annotation process (core-meaning spans): no inter-annotator agreement, adjudication protocol, or control experiments (e.g., span-swapping while preserving meaning) are reported. This directly undermines the central claim, as annotator bias toward surface or topical overlap could produce the observed performance gaps even if models were capable of true semantic abstraction.
  Authors: We acknowledge that these methodological details were insufficiently reported. The revised manuscript now includes a new subsection detailing the annotation guidelines, inter-annotator agreement statistics computed over a sampled subset, the adjudication process for resolving disagreements, and results from control experiments that test annotation robustness by altering surface forms while holding core meaning constant. These additions help demonstrate that the observed model failures are not artifacts of annotation bias.
  Revision: yes
- Referee: [Evaluation and results] Evaluation and results section: the abstract states the main finding that models struggle and rely on shallow cues, yet the provided text contains no quantitative retrieval metrics, error analysis, or breakdown by idiom type. Without these, the claim that performance reflects inability to abstract beyond surface form cannot be verified.
  Authors: We apologize if the quantitative results were not prominent enough in the reviewed version. The manuscript contains retrieval metrics (including Recall@k and nDCG) for the evaluated models; we have now expanded the Evaluation section with a dedicated error analysis subsection and breakdowns by idiom properties such as frequency and semantic decomposability. These additions provide direct evidence linking performance gaps to difficulties with meaning abstraction rather than surface cues.
  Revision: yes
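Recall@k and nDCG, the metrics named in the last response, can be sketched with binary relevance as below. The cutoff values, document ids, and the binary-gain assumption are illustrative; the source does not specify the paper's exact metric configuration.

```python
import math


def recall_at_k(ranked_ids, relevant_ids, k):
    """Share of relevant documents that appear in the top k results."""
    if not relevant_ids:
        return 0.0
    hits = sum(1 for d in ranked_ids[:k] if d in relevant_ids)
    return hits / len(relevant_ids)


def ndcg_at_k(ranked_ids, relevant_ids, k):
    """nDCG@k with binary gains and a log2(rank + 1) discount."""
    dcg = sum(1.0 / math.log2(i + 2)
              for i, d in enumerate(ranked_ids[:k]) if d in relevant_ids)
    # Ideal ranking places all relevant documents first.
    ideal_hits = min(k, len(relevant_ids))
    idcg = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg else 0.0


# Toy ranking: relevant documents sit at ranks 2 and 4.
ranked = ["d4", "d1", "d8", "d2"]
relevant = {"d1", "d2"}
print(recall_at_k(ranked, relevant, 3), ndcg_at_k(ranked, relevant, 4))
```

Recall@k ignores position inside the top k, while nDCG rewards placing relevant documents earlier, so the two can diverge when a model retrieves meaning-equivalent documents but ranks them below topical matches.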
Circularity Check
No circularity: benchmark introduction and baseline evaluation are self-contained.
Full rationale
The paper constructs IdioLink as an external retrieval benchmark consisting of 10,700 documents, 2,140 queries, and core-meaning span annotations over 107 idioms, then reports empirical performance of independent embedding models (BGE, E5, Contriever, Qwen) on it. No derivations, equations, parameter fitting, or self-citations are invoked to generate the central claims; the observed gaps in cross-surface retrieval are presented as direct measurements against the newly introduced dataset rather than reductions to prior fitted values or author-defined uniqueness results. The work therefore remains externally falsifiable through the released annotations and queries without any load-bearing step that collapses back to its own inputs by construction.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Idioms pose a fundamental challenge because their meaning cannot be inferred from surface form alone.