G-IdiomAlign: A Gloss-Pivoted Benchmark for Cross-Lingual Idiom Alignment

Derek F. Wong; Fengying Ye; Lidia S. Chao; Runzhe Zhan; Yanming Sun; Zheqi Zhang

arxiv: 2606.18989 · v1 · pith:ZX5BN4ZLnew · submitted 2026-06-17 · 💻 cs.CL · cs.AI


Pith reviewed 2026-06-26 20:37 UTC · model grok-4.3


keywords cross-lingual idiom alignment · gloss-pivoted benchmark · multiple-choice equivalence · gloss-contrastive generation · literal translation bias · LLM evaluation · low-resource languages · semantic pivot


The pith

English Wiktionary glosses improve cross-lingual idiom generation and matching for language models, though gains remain modest and literal biases persist.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper builds a benchmark that anchors idioms from many languages to English glosses drawn from Wiktionary so that alignment can be measured in a controlled way. It runs two tests: one multiple-choice task that isolates error types and one generation task that contrasts model outputs when the gloss is supplied or withheld. Across tested models the gloss raises semantic similarity scores and reduces literal translations, yet absolute performance stays low, especially for low-resource target languages. The work therefore isolates the contribution of an explicit meaning pivot while documenting the remaining difficulty of non-compositional expressions.

Core claim

The paper claims that a gloss-pivoted benchmark enables reproducible cross-lingual idiom alignment evaluation, that supplying the English gloss during generation produces measurably better outputs under an embedding-based semantic proxy than the no-gloss condition, and that the resulting performance difference concentrates more in attention heads than across layers, with stronger gloss anchoring linked to higher-quality generations.

What carries the argument

The Gloss-Contrastive Generation protocol that compares model outputs on identical inputs with and without the English gloss to measure the effect of the explicit semantic pivot.
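
As a rough illustration of the contrast, the sketch below builds a No-gloss and a With-gloss prompt for the same item and scores two outputs against a reference. The prompt wording, the `toy_embed` trigram embedding, and the choice to score against a reference idiom are all assumptions made for this example; the paper's actual prompts, embedding model, and scoring target are not given on this page.

```python
import math
from collections import Counter


def build_prompts(idiom, source_lang, target_lang, gloss):
    """Return the two prompts for one item (hypothetical wording)."""
    base = (f"Give a {target_lang} idiom equivalent in meaning to the "
            f"{source_lang} idiom: {idiom}")
    return {"no_gloss": base, "with_gloss": f"{base}\nEnglish gloss: {gloss}"}


def toy_embed(text):
    """Stand-in embedding: counts of character trigrams.

    Assumption for the sketch only; a real run would call a
    sentence-embedding model here.
    """
    text = text.lower()
    return Counter(text[i:i + 3] for i in range(len(text) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def gloss_effect(output_no_gloss, output_with_gloss, reference):
    """Score both conditions against the same reference and return the gap."""
    ref = toy_embed(reference)
    s_no = cosine(toy_embed(output_no_gloss), ref)
    s_with = cosine(toy_embed(output_with_gloss), ref)
    return {"no_gloss": s_no, "with_gloss": s_with, "delta": s_with - s_no}


# Made-up item: a literal rendering versus an idiomatic one.
prompts = build_prompts("il pleut des cordes", "French", "English",
                        "to rain very heavily")
result = gloss_effect(
    output_no_gloss="it rains ropes",
    output_with_gloss="it's raining cats and dogs",
    reference="it's raining cats and dogs",
)
print(result["delta"] > 0)
```

Keeping everything except the gloss line identical across the two prompts is what lets a positive `delta` be read as an effect of the pivot rather than of prompt wording.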

If this is right

Literal translation remains the dominant error mode and is only partially offset by gloss input.
In the Qwen3-8B analysis, differences between the No-gloss and With-gloss conditions concentrate more in attention heads than in layers.
Generations that more strongly reference the gloss tend to receive higher semantic scores.
The high-confidence reference alignment set supports consistent future comparisons.
Large headroom remains in open-ended generation even after glosses are supplied.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If gloss pivots help here, comparable meaning anchors could be tested on other figurative constructions such as metaphors or proverbs.
The concentration of effect in attention heads suggests targeted interventions at those positions might amplify the gloss benefit.
The modest absolute numbers imply that pairing glosses with additional signals such as example sentences could produce larger lifts.

Load-bearing premise

English Wiktionary glosses supply a reliable language-neutral description of idiom meaning that holds across different languages and cultures.

What would settle it

No measurable rise in embedding similarity or drop in literal-translation errors when the gloss is added, across a wider range of models and language pairs than tested here.
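
One way to make "measurable" concrete is a paired comparison over per-item score differences. The sketch below is an assumption of this write-up, not a procedure taken from the paper: it bootstraps a confidence interval for the mean With-gloss minus No-gloss difference, and the function name and default settings are hypothetical.

```python
import random
import statistics


def bootstrap_mean_ci(deltas, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of per-item deltas.

    deltas: one value per item, With-gloss score minus No-gloss score.
    Returns (mean, lower, upper).
    """
    rng = random.Random(seed)
    n = len(deltas)
    # Resample items with replacement and record each resample's mean.
    means = sorted(
        statistics.fmean(rng.choices(deltas, k=n))
        for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return statistics.fmean(deltas), lower, upper


# Made-up deltas for illustration only.
mean, lower, upper = bootstrap_mean_ci([0.10, 0.20, 0.05, 0.15])
print(lower > 0)
```

An interval that straddles zero across models and language pairs would correspond to the "no measurable rise" outcome described above.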

Figures

Figures reproduced from arXiv: 2606.18989 by Derek F. Wong, Fengying Ye, Lidia S. Chao, Runzhe Zhan, Yanming Sun, Zheqi Zhang.

**Figure 2.** Retention curves for OpenAI text-embedding

**Figure 3.** Task 1 outcomes by target-language regime.

Original abstract

Idioms are difficult to transfer across languages due to their non-compositionality and weak surface-form grounding, making literal mappings unreliable. We present G-IdiomAlign, a gloss-pivoted benchmark where each idiom is anchored by an English gloss from Wiktionary. We further construct a high-confidence reference alignment set for reproducible evaluation. G-IdiomAlign supports two protocols: (1) a controlled Multiple-Choice Idiom Equivalence with typed distractors for error attribution; and (2) a Gloss-Contrastive Generation contrasting No-gloss and With-gloss inputs to isolate the effect of an explicit semantic pivot. Across diverse LLMs, a bias to literal translation is a dominant failure mode, especially when the target is a low-resource language. Glosses consistently improve Gloss-Contrastive Generation under an embedding-based semantic proxy, but performance remains modest, indicating substantial headroom in the open output space. Subsequent analysis on Qwen3-8B further suggests that cross-condition differences are concentrated more in attention heads than in layers, while better With-gloss generations coincide with stronger gloss anchoring.
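
The error-attribution idea behind typed distractors can be sketched as follows. The distractor type names (`literal`, `semantic`, `unrelated`) follow the scheme mentioned in the simulated rebuttal on this page and may not match the paper exactly; the `Option` class, the function name, and the example items are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    text: str
    kind: str  # "correct", or a distractor type such as "literal"


def attribute_errors(items, predictions):
    """Return the share of predictions that landed on each option type.

    items: one list of Option objects per question.
    predictions: index of the option the model chose for each question.
    """
    counts = Counter(
        options[choice].kind for options, choice in zip(items, predictions)
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kind: n / total for kind, n in counts.items()}


# Made-up questions: the same four options, asked twice.
question = [
    Option("it's raining cats and dogs", "correct"),
    Option("it rains ropes", "literal"),
    Option("it's drizzling", "semantic"),
    Option("break a leg", "unrelated"),
]
shares = attribute_errors([question, question], predictions=[1, 0])
print(shares)
```

Because each wrong option carries a type, a high `literal` share points to literal-translation bias specifically, which a plain accuracy number would not reveal.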

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

G-IdiomAlign releases a new gloss-pivoted idiom alignment benchmark with two protocols, but the English Wiktionary glosses as language-neutral pivots remain unvalidated.


The paper's main contribution is a new benchmark called G-IdiomAlign that anchors idioms to English Wiktionary glosses and supplies a high-confidence reference alignment set. It defines two concrete protocols: typed multiple-choice equivalence for error analysis and a gloss-contrastive generation task that compares no-gloss versus with-gloss inputs under an embedding proxy.

They show literal translation bias as a dominant failure mode across LLMs, worse on low-resource targets, and report that adding the gloss improves the generation scores modestly while leaving substantial headroom. The attention-head analysis on Qwen3-8B is a small extra observation.

The load-bearing assumption is that the English glosses function as accurate, culture-neutral semantic pivots. The abstract gives no evidence of validation against native speakers or of cultural-equivalence checks. That gap matters: any measured gloss benefit could partly reflect artifacts of the embedding proxy or English-centric bias rather than genuine alignment gains. Details of data selection and distractor construction are also thin in the provided text.

This work is for people who build or evaluate multilingual MT systems and need reproducible idiom test sets. It is not reshaping the field but supplies a concrete resource that others can use or extend.

I would bring it to a reading group for the dataset and protocols. I would not cite it in my own work soon. It deserves peer review because benchmark papers with released data and clear evaluation setups are worth referee time even when the central pivot assumption needs more scrutiny.

Referee Report

2 major / 1 minor

Summary. The paper introduces G-IdiomAlign, a gloss-pivoted benchmark for cross-lingual idiom alignment anchored by English Wiktionary glosses. It constructs a high-confidence reference alignment set and supports two protocols: Multiple-Choice Idiom Equivalence with typed distractors for error analysis, and Gloss-Contrastive Generation contrasting No-gloss and With-gloss inputs. Experiments across LLMs identify literal translation bias as a dominant failure mode (especially for low-resource targets), show consistent but modest gains from glosses under an embedding-based semantic proxy, and report that cross-condition differences in Qwen3-8B are concentrated more in attention heads than layers, with better With-gloss outputs linked to stronger gloss anchoring.

Significance. If the central findings hold, the reproducible high-confidence reference alignment set and the controlled protocols provide a useful standardized resource for evaluating cross-lingual idiom handling in LLMs. The explicit isolation of gloss effects and the attention-head analysis offer concrete directions for mitigating literal bias in non-compositional transfer. The benchmark construction itself, with its focus on error attribution via distractors, strengthens the empirical contribution to multilingual NLP.

major comments (2)

[Benchmark Construction / Gloss-Contrastive Generation protocol] The claim that glosses improve Gloss-Contrastive Generation under the embedding proxy rests on English Wiktionary glosses serving as reliable language-neutral semantic pivots. No validation of gloss fidelity (e.g., human ratings of meaning preservation across target languages or comparison to native-speaker paraphrases) is described in the benchmark construction or evaluation sections; this is load-bearing because any English-centric bias in the glosses would directly confound the With-gloss vs. No-gloss contrast and the reported modest gains.
[Benchmark Construction] Details on idiom selection criteria, the construction of the high-confidence reference alignment set, and the exact typing of distractors in the Multiple-Choice task are absent from the high-level description. These omissions prevent assessment of whether the observed literal bias is an artifact of data curation rather than a general LLM property.

minor comments (1)

[Evaluation / Gloss-Contrastive Generation] The specific embedding model and similarity metric constituting the 'embedding-based semantic proxy' should be stated explicitly, along with any preprocessing steps, to allow exact reproduction of the generation evaluation.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below.


Referee: [Benchmark Construction / Gloss-Contrastive Generation protocol] The claim that glosses improve Gloss-Contrastive Generation under the embedding proxy rests on English Wiktionary glosses serving as reliable language-neutral semantic pivots. No validation of gloss fidelity (e.g., human ratings of meaning preservation across target languages or comparison to native-speaker paraphrases) is described in the benchmark construction or evaluation sections; this is load-bearing because any English-centric bias in the glosses would directly confound the With-gloss vs. No-gloss contrast and the reported modest gains.

Authors: We agree this is a valid concern. The manuscript uses Wiktionary glosses as provided anchors without reporting cross-lingual fidelity validation. We will revise the benchmark construction section to explicitly discuss this assumption, note its potential to introduce English-centric bias, and add a limitations paragraph on the need for future human validation studies. The reported gains remain tied to the embedding proxy as stated.

Revision: partial

Referee: [Benchmark Construction] Details on idiom selection criteria, the construction of the high-confidence reference alignment set, and the exact typing of distractors in the Multiple-Choice task are absent from the high-level description. These omissions prevent assessment of whether the observed literal bias is an artifact of data curation rather than a general LLM property.

Authors: The full manuscript contains a benchmark construction section with these elements, but we acknowledge the descriptions are insufficiently detailed. We will expand it to specify idiom selection criteria (e.g., Wiktionary coverage and inclusion filters), the exact procedure for building the high-confidence reference alignment set, and the distractor typing scheme (literal, semantic, unrelated). This will enable readers to evaluate whether literal bias is curation-dependent.

Revision: yes

Circularity Check

0 steps flagged

No circularity: benchmark construction and empirical evaluation against external references


The paper constructs a new benchmark (G-IdiomAlign) anchored to external Wiktionary glosses and evaluates LLMs on multiple-choice and generation tasks. No mathematical derivations, fitted parameters, predictions, or self-citation chains are present in the provided text. The central claims rest on empirical contrasts (with-gloss vs no-gloss) measured against independent embedding proxies and human references, with no reduction of outputs to inputs by construction. This is the expected non-finding for dataset papers.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Benchmark paper with no free parameters, axioms, or invented entities; relies on standard assumptions about idiom non-compositionality and Wiktionary gloss quality.



