Evaluating the Impact of Verbal Multiword Expressions on Machine Translation

Linfeng Liu; Saptarshi Ghosh; Tianyu Jiang

arxiv: 2508.17458 · v2 · submitted 2025-08-24 · 💻 cs.CL

Pith reviewed 2026-05-18 20:44 UTC · model grok-4.3

classification 💻 cs.CL

keywords: verbal multiword expressions · machine translation · VMWEs · translation quality · idioms · light verb constructions · verb-particle constructions

The pith

Verbal multiword expressions degrade machine translation quality primarily due to the expressions themselves.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests how verbal multiword expressions affect machine translation from English into other languages. It shows that sentences containing verbal idioms, verb-particle constructions, and light verb constructions produce lower-quality output than sentences without them. The drop occurs even after accounting for general sentence features, pointing to the expressions as the direct source of the problem. This finding helps explain a persistent weakness in current translation systems when they encounter everyday idiomatic language.

Core claim

Our experimental results consistently show that VMWEs negatively affect translation quality, with deeper analysis indicating that this degradation is primarily attributable to the VMWE itself rather than general sentence-level difficulty.

What carries the argument

Side-by-side evaluation of translation quality on VMWE-containing sentences versus matched controls drawn from established multiword expression and machine translation datasets.
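The side-by-side comparison described above can be sketched as a difference in mean sentence-level QE scores between the two groups, plus a one-sided permutation test. This is a minimal illustration under stated assumptions, not the authors' released framework: it assumes higher-is-better scores (true of xCOMET, whereas MetricX-24 is an error score where lower is better), and the score lists are invented toy values.

```python
# Sketch only: toy data, not the authors' released code.
# ASSUMPTION: sentence-level QE scores where higher is better (as with xCOMET).
import random
from statistics import mean


def quality_gap(vmwe_scores, control_scores):
    """Mean control score minus mean VMWE score; positive means VMWE sentences score worse."""
    return mean(control_scores) - mean(vmwe_scores)


def permutation_p_value(vmwe_scores, control_scores, n_iter=5000, seed=0):
    """One-sided permutation test: how often does a random relabelling of the
    pooled scores produce a gap at least as large as the observed one?"""
    rng = random.Random(seed)
    observed = quality_gap(vmwe_scores, control_scores)
    pooled = list(vmwe_scores) + list(control_scores)
    k = len(vmwe_scores)
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        if quality_gap(pooled[:k], pooled[k:]) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_iter + 1)


if __name__ == "__main__":
    vmwe = [0.71, 0.64, 0.69, 0.58, 0.66]     # hypothetical QE scores
    control = [0.78, 0.74, 0.81, 0.70, 0.76]  # hypothetical QE scores
    print(f"gap = {quality_gap(vmwe, control):.3f}")
    print(f"p   = {permutation_p_value(vmwe, control):.4f}")
```

A raw gap like this says nothing yet about why VMWE sentences score lower; that is what the matching and regression checks discussed further down are for.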

If this is right

Translation systems produce measurably worse output on sentences that contain verbal idioms, verb-particle constructions, or light verb constructions.
The quality loss traces to the multiword expression rather than broader sentence properties.
The released evaluation framework lets researchers test whether new models handle VMWEs more effectively.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Better modeling of VMWEs could improve translation of natural spoken and written language.
The same isolation method could be applied to non-verbal multiword expressions or to other language pairs.
Fine-tuning translation models on VMWE-rich data offers a direct test of whether the observed gap can be closed.

Load-bearing premise

The chosen datasets and evaluation metrics successfully isolate the effect of VMWEs from other sources of translation difficulty such as sentence length, vocabulary rarity, or syntactic complexity.

What would settle it

A matched-pair analysis showing no quality difference between VMWE sentences and controls of equal length and complexity would contradict the claim that the expressions themselves drive the degradation.
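A matched-pair analysis of this kind might look like the following sketch: each VMWE sentence is paired with the unused control sentence of nearest token length (within a caliper), and the paired score differences are averaged. Matching on length alone, the 2-token caliper, the hypothetical `Item` type, and the toy scores are all assumptions; a fuller design would also match on vocabulary rarity and syntactic complexity.

```python
# Sketch only: greedy length matching on toy data (hypothetical names and values).
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Item:
    n_tokens: int  # sentence length in tokens
    score: float   # sentence-level QE score; ASSUMPTION: higher = better


def match_by_length(vmwe, controls, caliper=2):
    """Greedy 1:1 nearest-length matching without replacement.

    VMWE items with no unused control within `caliper` tokens are dropped."""
    available = list(controls)
    pairs = []
    for item in sorted(vmwe, key=lambda x: x.n_tokens):
        if not available:
            break
        best = min(available, key=lambda c: abs(c.n_tokens - item.n_tokens))
        if abs(best.n_tokens - item.n_tokens) <= caliper:
            pairs.append((item, best))
            available.remove(best)
    return pairs


def mean_paired_gap(pairs):
    """Average of (control score - VMWE score) over matched pairs."""
    return mean(c.score - v.score for v, c in pairs)


if __name__ == "__main__":
    vmwe = [Item(12, 0.62), Item(20, 0.55), Item(31, 0.60)]
    controls = [Item(11, 0.70), Item(21, 0.66), Item(45, 0.80), Item(30, 0.64)]
    pairs = match_by_length(vmwe, controls)
    print(len(pairs), "pairs, mean gap", round(mean_paired_gap(pairs), 3))
```

Under this setup, a mean paired gap near zero on well-matched pairs would be the contradicting result; a clearly positive gap would be consistent with the paper's attribution.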

Figures

Figures reproduced from arXiv: 2508.17458 by Linfeng Liu, Saptarshi Ghosh, Tianyu Jiang.

**Figure 2.** Comparison of the translation quality between sentences with and without VMWE, …

**Figure 3.** Paraphrasing structure. Ori: QE score between the original sentence and its direct translation. Para: QE score between the paraphrased sentence and its direct translation. Mix: QE score between the original sentence and the translation of the paraphrased sentence. The accompanying table reports Ori, δmix and δpara for each MT system on en-zh, en-de, en-ru and en-cs; its first row (VID, Madlad) reads en-zh 7.25 / +1.77 / +2.21, en-de 3.00 / +0.24 / +0.92, en-ru 6.49 / +0.8…

**Figure 4.** Ranking of MT systems based on the MetricX-24 QE scores.

**Figure 5.** Screenshot from the Google Translate webpage: an English sentence with an idiom …

**Figure 6.** Screenshot from the Google Translate webpage: an English sentence with an idiom …

**Figure 7.** Screenshot from the Google Translate webpage: an English sentence with an idiom …

**Figure 8.** Screenshot from the Google Translate webpage: an English sentence with an idiom …

**Figure 9.** Screenshot from the Google Translate webpage: an English sentence with an idiom …

**Figure 10.** Screenshot from the Google Translate webpage: an English sentence with an idiom …

**Figure 11.** Screenshot from the Google Translate webpage: an English sentence with an idiom …

**Figure 12.** Ranking MT systems on xCOMET QE scores for VMWE sentences.

**Figure 13.** Ranking languages based on translation quality for sentences with VMWE.
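The three scores in the Figure 3 caption can be expressed as a small function. This is a sketch under assumptions: `translate` and `qe` are hypothetical stand-ins for an MT system and a reference-free QE metric, and the caption does not say how δmix and δpara are computed, so the code assumes δ = score − Ori. Because MetricX-24 is an error score where lower is better, the paper's sign convention may differ.

```python
# Sketch only: `translate` and `qe` are caller-supplied stand-ins (hypothetical),
# e.g. an MT system and a reference-free QE metric.
# ASSUMPTION: delta = score - Ori. For an error metric such as MetricX-24
# (lower = better) the paper's sign convention may differ.
from typing import Callable, Dict


def paraphrase_scores(original: str, paraphrase: str,
                      translate: Callable[[str], str],
                      qe: Callable[[str, str], float]) -> Dict[str, float]:
    t_orig = translate(original)    # direct translation of the original
    t_para = translate(paraphrase)  # direct translation of the paraphrase
    ori = qe(original, t_orig)      # Ori: original vs. its own translation
    para = qe(paraphrase, t_para)   # Para: paraphrase vs. its own translation
    mix = qe(original, t_para)      # Mix: original vs. the paraphrase's translation
    return {"Ori": ori, "Para": para, "Mix": mix,
            "delta_mix": mix - ori, "delta_para": para - ori}


if __name__ == "__main__":
    # Toy lookup tables; every value here is invented for illustration.
    mt = {"He kicked the bucket.": "Er trat gegen den Eimer.",
          "He died.": "Er starb."}
    qe_table = {("He kicked the bucket.", "Er trat gegen den Eimer."): 0.40,
                ("He died.", "Er starb."): 0.90,
                ("He kicked the bucket.", "Er starb."): 0.85}
    scores = paraphrase_scores("He kicked the bucket.", "He died.",
                               mt.__getitem__,
                               lambda s, t: qe_table[(s, t)])
    print(scores)
```

The Mix score keeps the original sentence as the QE source but scores the translation of the VMWE-free paraphrase, so under this sign convention a positive δmix suggests that removing the expression helped the system convey the original meaning.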

Original abstract

Verbal multiword expressions (VMWEs) remain difficult for machine translation because their meanings are often not recoverable from their component words. In this study, we analyze the impact of three VMWE categories -- verbal idioms, verb-particle constructions, and light verb constructions -- on machine translation quality from English to multiple languages. Using both established multiword expression datasets and standard machine translation datasets, we evaluate how state-of-the-art translation systems handle these expressions. Our experimental results consistently show that VMWEs negatively affect translation quality, with deeper analysis indicating that this degradation is primarily attributable to the VMWE itself rather than general sentence-level difficulty. We release our code and evaluation framework to test new MT systems for the community.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

This paper measures how verbal multiword expressions lower MT quality across languages but leaves the isolation from sentence confounders unclear.

The main takeaway is that sentences containing verbal multiword expressions translate worse than others in current systems, and the authors document this across three categories and multiple target languages using existing datasets. It is mostly a confirmation of a known practical issue rather than a fresh theoretical step forward. What the work does well is run the comparison on established MWE and MT resources, report consistent quality drops, and release the evaluation code so others can apply the same checks to new models. That release is the most immediately useful part for people who maintain or benchmark translation pipelines. The soft spot sits in the attribution step. The abstract claims deeper analysis shows the drop comes from the VMWE itself rather than general sentence difficulty, yet it does not describe explicit matching or regression on length, lexical rarity, or syntactic complexity. If VMWE sentences differ systematically on those dimensions, the results could be explained without invoking the expressions as the primary cause. The stress-test note on this point holds up from what is visible. Readers who build or evaluate production MT systems will find the numbers and the released framework worth a look, especially if they already care about idioms and light verbs. The paper is not foundational, but the experiments are straightforward and the question is relevant to real deployments. It deserves a serious referee because the empirical core is grounded enough to repay review time even if the controls need tightening.

Referee Report

1 major / 1 minor

Summary. The manuscript evaluates the impact of verbal multiword expressions (VMWEs) — specifically verbal idioms, verb-particle constructions, and light verb constructions — on machine translation quality from English to multiple languages. Using established MWE and MT datasets along with standard MT systems, the authors report that VMWEs consistently degrade translation quality. Through deeper analysis, they conclude that this effect is primarily attributable to the VMWEs themselves rather than general sentence-level difficulty. The code and evaluation framework are released for community use.

Significance. This study addresses a practical challenge in machine translation regarding the handling of non-compositional expressions. If the attribution of quality degradation specifically to VMWEs is supported by adequate controls, the findings could inform the development of MT systems better equipped to handle idiomatic language. The release of code and framework strengthens the work by enabling reproducibility and further testing on new systems.

major comments (1)

[Methods] The central claim that degradation is 'primarily attributable to the VMWE itself rather than general sentence-level difficulty' is load-bearing and requires explicit controls. The manuscript mentions an 'attempt to control for sentence-level difficulty' but provides no details on matching, stratification, or multivariate regression for confounders such as sentence length, lexical rarity, or syntactic complexity (Methods section). Without these, systematic differences between VMWE and non-VMWE items could explain the quality drop.
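One way to meet this request is a multivariate regression of sentence-level quality on a VMWE indicator plus the named confounders. The sketch below fits ordinary least squares in plain Python; the covariates (`n_tokens` and a `rarity` score), the noise-free synthetic data, and the model form are hypothetical assumptions, not the manuscript's analysis. With higher-is-better scores, a VMWE coefficient that stays clearly negative once the covariates are included would indicate that length and rarity alone do not explain the gap.

```python
# Sketch only: covariate names and data are hypothetical; the model form is an
# ASSUMPTION, not the manuscript's analysis. Pure Python, so NumPy is not needed.
#   score = b0 + b1*has_vmwe + b2*n_tokens + b3*rarity + error
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a square system by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("singular design: covariates are collinear")
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Least squares via the normal equations X'X b = X'y, with an intercept column."""
    x = [[1.0, *r] for r in rows]
    k = len(x[0])
    xtx = [[sum(xi[p] * xi[q] for xi in x) for q in range(k)] for p in range(k)]
    xty = [sum(xi[p] * yi for xi, yi in zip(x, y)) for p in range(k)]
    return solve(xtx, xty)


if __name__ == "__main__":
    # (has_vmwe, n_tokens, rarity) with noise-free synthetic scores.
    rows = [(0, 10, 0.1), (0, 25, 0.3), (0, 18, 0.5), (1, 12, 0.2),
            (1, 30, 0.4), (1, 20, 0.6), (0, 15, 0.2), (1, 22, 0.1)]
    y = [0.9 - 0.05 * h - 0.004 * n - 0.1 * r for h, n, r in rows]
    b0, b_vmwe, b_len, b_rare = ols(rows, y)
    print(f"VMWE effect adjusted for length and rarity: {b_vmwe:+.3f}")
```

In practice one would add a syntactic-complexity covariate (e.g. parse depth) and report standard errors; a statistics library would normally handle both.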

minor comments (1)

[Abstract] The abstract would benefit from naming the specific target languages and MT systems evaluated to allow readers to assess the scope of the reported consistency.

Simulated Authors' Rebuttal

1 response · 0 unresolved

We thank the referee for the thoughtful and constructive feedback on our manuscript. We address the major comment below and will incorporate revisions to strengthen the presentation of our methods.

Referee: [Methods] The central claim that degradation is 'primarily attributable to the VMWE itself rather than general sentence-level difficulty' is load-bearing and requires explicit controls. The manuscript mentions an 'attempt to control for sentence-level difficulty' but provides no details on matching, stratification, or multivariate regression for confounders such as sentence length, lexical rarity, or syntactic complexity (Methods section). Without these, systematic differences between VMWE and non-VMWE items could explain the quality drop.

Authors: We agree that the central claim requires robust controls and that the Methods section would benefit from greater transparency. In the submitted manuscript we referenced an attempt to control for sentence-level difficulty through selection of comparable non-VMWE sentences from the same datasets, but we acknowledge that explicit details on matching procedures, stratification by length or complexity, or regression-based adjustment for lexical rarity were not provided. In the revised version we will expand the Methods section to describe these controls in full, including the specific criteria used for sentence matching and any additional statistical checks performed to isolate VMWE effects. revision: yes
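The stratification by length mentioned in this response could be sketched as follows: sentences are binned into length bands, the VMWE versus control gap is computed within each band, and the per-band gaps are pooled with weights equal to the number of VMWE sentences per band. The band edges, the weighting scheme, and the data are illustrative assumptions, not details from the manuscript.

```python
# Sketch only: band edges, weights and data are illustrative ASSUMPTIONS.
# Scores are sentence-level QE scores where higher = better (assumed).
from bisect import bisect_right
from collections import defaultdict
from statistics import mean

EDGES = (10, 20, 30)  # bands: <10, 10-19, 20-29, >=30 tokens


def length_band(n_tokens, edges=EDGES):
    return bisect_right(edges, n_tokens)


def stratified_gap(vmwe, controls, edges=EDGES):
    """vmwe, controls: lists of (n_tokens, score) pairs.

    Returns (pooled_gap, per_band), where each band's gap is
    mean(control) - mean(VMWE) and the pooled gap weights bands by their
    number of VMWE sentences. Bands missing either group are skipped."""
    groups = defaultdict(lambda: ([], []))
    for n, s in vmwe:
        groups[length_band(n, edges)][0].append(s)
    for n, s in controls:
        groups[length_band(n, edges)][1].append(s)
    per_band, weighted, total = {}, 0.0, 0
    for band, (v, c) in sorted(groups.items()):
        if v and c:
            gap = mean(c) - mean(v)
            per_band[band] = gap
            weighted += len(v) * gap
            total += len(v)
    if total == 0:
        raise ValueError("no length band contains both groups")
    return weighted / total, per_band


if __name__ == "__main__":
    vmwe = [(8, 0.60), (15, 0.58), (16, 0.62), (25, 0.50)]
    controls = [(7, 0.70), (14, 0.66), (27, 0.62), (40, 0.75)]
    pooled, per_band = stratified_gap(vmwe, controls)
    print(f"pooled gap {pooled:.3f}; per band {per_band}")
```

Weighting by VMWE counts estimates the gap for the VMWE population's length profile; a gap that persists in every band is harder to attribute to length.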

Circularity Check

0 steps flagged

No significant circularity in empirical evaluation study

This paper performs an empirical evaluation of VMWE effects on MT quality using established external datasets and off-the-shelf translation systems. The central claim rests on experimental comparisons and deeper analysis of translation metrics, without any derivations, equations, fitted parameters, or predictions that reduce to author-defined inputs by construction. No self-citation chains or ansatzes are invoked as load-bearing justifications for the results. The study is self-contained against independent benchmarks and standard evaluation practices.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim depends on the assumption that standard MT evaluation metrics and existing VMWE datasets can isolate expression-specific effects. No free parameters or invented entities are introduced.

axioms (2)

domain assumption: Established multiword expression datasets and standard machine translation benchmarks provide representative samples for measuring VMWE impact.
The study relies on these resources to draw conclusions about translation quality.
domain assumption: Translation quality metrics reflect the specific contribution of VMWEs when sentence-level difficulty is controlled.
This underpins the claim that degradation is attributable to the VMWE itself.

pith-pipeline@v0.9.0 · 5644 in / 1281 out tokens · 34449 ms · 2026-05-18T20:44:58.481526+00:00 · methodology

Reference graph

Works this paper leans on

28 extracted references · 28 canonical work pages · 3 internal anchors

[1]

Mathieu Constant, Gülşen Eryiğit, Johanna Monti, Lonneke van der Plas, Carlos Ramisch, Michael Rosner, and Amalia Todirascu

URL https://arxiv.org/abs/2312.05187. Mathieu Constant, Gülşen Eryiğit, Johanna Monti, Lonneke van der Plas, Carlos Ramisch, Michael Rosner, and Amalia Todirascu. Survey: Multiword expression processing: A Survey. Computational Linguistics, 43(4):837–892, December 2017. doi: 10.1162/COLI_a_00302. URL https://aclanthology.org/J17-4005/. Silvio Ricardo ...

work page doi:10.1162/coli_a_ 2017
[2]

URL https://aclanthology.org/W19-6110/

Linköping University Electronic Press. URL https://aclanthology.org/W19-6110/. Menglong Cui, Pengzhi Gao, Wei Liu, Jian Luan, and Bin Wang. Multilingual machine translation with open large language models at practical scale: An empirical study, 2025. URL https://arxiv.org/abs/2502.02481. DeepSeek-AI. Deepseek-r1: Incentivizing reasoning capability in llms...

work page doi:10.18653/v1/2024.findings-acl.152 2025
[3]

Marian: Fast Neural Machine Translation in C++

doi: 10.1162/tacl_a_00683. URL https://aclanthology.org/2024.tacl-1.54/. Hessel Haagsma, Johan Bos, and Malvina Nissim. MAGPIE: A large corpus of potentially idiomatic expressions. In Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hé...

work page internal anchor Pith review Pith/arXiv arXiv doi:10.1162/tacl_a_00683 2024
[4]

doi: 10.18653/v1/2023.wmt-1.1

Association for Computational Linguistics. doi: 10.18653/v1/2023.wmt-1.1. URL https://aclanthology.org/2023.wmt-1.1/. Tom Kocmi, Eleftherios Avramidis, Rachel Bawden, Ondřej Bojar, Anton Dvorkovich, et al. Findings of the WMT24 general machine translation shared task: The LLM era is here but MT is not solved yet. In Barry Haddow, Tom Kocmi, Philipp Koeh...

work page doi:10.18653/v1/2023.wmt-1.1 2023
[5]

LLM tropes: Revealing fine-grained values and opinions in large language models

Association for Computational Linguistics. doi: 10.18653/v1/2024.findings-emnlp

work page doi:10.18653/v1/2024.findings-emnlp 2024
[6]

Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs

URL https://aclanthology.org/2024.findings-emnlp.631. Shushen Manakhimova, Eleftherios Avramidis, Vivien Macketanz, Ekaterina Lapshinova-Koltunski, Sergei Bagdasarov, and Sebastian Möller. Linguistically motivated evaluation of the 2023 state-of-the-art machine translation: Can ChatGPT outperform NMT? In Philipp Koehn, Barry Haddow, Tom Kocmi, and Christ...

work page internal anchor Pith review Pith/arXiv arXiv doi:10.18653/v1/2023.wmt-1.23 2024
[7]

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

URL https://aclanthology.org/2020.coling-main.296/. Anita Rácz, István Nagy T., and Veronika Vincze. 4FX: Light verb constructions in a multilingual parallel corpus. In Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, et al. (eds.), Proceedings of the Ninth International Conference on Language Resources and Evalua...

work page internal anchor Pith review Pith/arXiv arXiv doi:10.18653/v1/2020.acl-main.259 2020
[8]

Huacheng Song and Hongzhi Xu

URL https://arxiv.org/abs/2006.09479. Huacheng Song and Hongzhi Xu. A deep analysis of the impact of multiword expressions and named entities on Chinese-English machine translations. In Yaser Al-Onaizan, Mohit Bansal, and Yun-Nung Chen (eds.), Findings of the Association for Computational Linguistics: EMNLP 2024, pp. 6154–6165, Miami, Florida, USA, Novemb...

work page doi:10.18653/v1/2024.findings-emnlp.357 2006
[9]

arXiv preprint arXiv:2010.11934

Association for Computational Linguistics. URL https://aclanthology.org/2022.wmt-1.42/. Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, and Colin Raffel. mt5: A massively multilingual pre-trained text-to-text transformer, 2021. URL https://arxiv.org/abs/2010.11934. Andrea Zaninello and Alexandra Birch...

work page arXiv 2022
[10]

Madlad400 (Kudugunta et al., 2023): A Google multilingual machine translation model, based on the T5 architecture (Raffel et al., 2023) that was trained on 250 billion tokens covering over 450 languages

work page 2023
[11]

SeamlessM4T (Communication et al., 2023): Meta AI’s massively multilingual and multimodal machine translation model, supporting an impressive range of translation capabilities with 96 languages for text input/output

work page 2023
[12]

M2M100 (Fan et al., 2020): A Meta-powered multilingual encoder-decoder model, primarily designed for translation tasks, supporting direct translation between 100 languages without requiring English as an intermediate language

work page 2020
[13]

Opus-MT (Tiedemann & Thottingal, 2020): Provides open translation services built on the Marian neural machine translation framework (Junczys-Dowmunt et al., 2018), trained on Opus data, and later converted to PyTorch models for the Hugging Face ecosystem

work page 2020
[14]

LLaMAX3 Alpaca (Lu et al., 2024): An LLM-based machine translation model, LLaMAX combines powerful multilingual translation capabilities with instruction-following abilities. This model extends Meta’s LLaMA 3 architecture (Grattafiori et al., 2024) to support translation between over 100 languages without sacrificing its ability to follow complex instructions

work page 2024
[15]

Phi-4-multimodal (Microsoft et al., 2025): Built upon the pretrained Phi-4-mini model, Phi-4-multi can process text, image, and audio inputs to generate text outputs. While primarily known for its multimodal capabilities, it can handle translation tasks as part of its broader language understanding abilities, making it another LLM-based MT system for our task

work page 2025
[16]

GemmaX2 (Cui et al., 2025): A very recent multilingual LLM-based translation model, that achieved state-of-the-art performance across 28 languages. Based on Google’s Gemma2 architecture (Team et al., 2024), it consistently outperforms other LLM-based MT models like TowerInstruct and XALMA, achieving competitive results with Google Translate and GPT-4-turbo

work page 2025
[17]

Once you’re retired you have three months to get out

Google Translate API: A widely used multilingual neural machine translation service developed by Google, offering translations for 249 languages and language varieties as of March 2025. B Invalid Translations by LLM-Based MT Models During evaluation, we observed several instances of invalid outputs from LLM-based MT models. The most common cases where tr...

work page 2025
[18]

Is the second element a particle (e.g., ’up’, ’off’)? - No → Not VPC (A) - Yes → Continue

work page
[19]

Does the remaining verb convey the same meaning as the full verb-particle phrase? - Yes → Not VPC (B) - No → Continue

Remove the particle from the combination. Does the remaining verb convey the same meaning as the full verb-particle phrase? - Yes → Not VPC (B) - No → Continue

work page
[20]

D: Valid VPC (Particle significantly alters meaning) Answer with reasoning and ’Final Answer: [answer]

Does the inclusion of the particle create a non-compositional meaning that is significantly different from the verb’s original meaning? - No → Not VPC (C) - Yes → VPC (D) A: Not a particle B: Meaning remains similar without the particle C: Particle does not significantly alter the meaning "D: Valid VPC (Particle significantly alters meaning) Answer with r...

work page
[21]

[CRAN] Contains cranberry word? Yes → VID No → Next test

work page
[22]

[LEX] Regular replacement changes meaning? Yes → VID No → Next test

work page
[23]

[MORPH] Morphological changes affect meaning? Yes → VID No → Next test

work page
[24]

[MORPHSYNT] Morphosyntactic changes affect meaning? Yes → VID No → Next test

work page
[25]

[SYNT] Syntactic changes affect meaning? Yes → VID No → Not VID Examples: - VID: ’kick the bucket’, ’let the cat out of the bag’ - Non-VID: ’take a walk’, ’make a decision’ Instructions:

work page
[26]

Analyze each test sequentially

work page
[27]

Provide brief reasoning for each test

work page
[28]

gave a" is a light-verb construction (LVC), where

Conclude with ’Final Answer: [Yes/No]’ Is this candidate a Verbal Idiom (VID)? Apply the decision tree. Table 9: Prompts for VMWE candidate paraphrasing. VMWE Prompt LVC You are an expert in linguistics. Given a sentence containing a multi-word expression (VMWE), a Light Verb Construct (LVC). Your task is to rephrase the sentence to remove the VMWE w...

work page
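Entries [18] to [25] above are not references but fragments of the paper's LLM prompts for classifying VMWE candidates, and they encode two decision trees: one for verb-particle constructions (answer labels A to D) and one for verbal idioms (tests CRAN, LEX, MORPH, MORPHSYNT, SYNT). The sketch below restates those trees as plain functions. The yes/no judgments are taken as inputs (in the paper they are elicited from a model via the prompts); the `VPCJudgments` type, the function names, and the example judgments are hypothetical.

```python
# Sketch only: the trees restate the prompt fragments in entries [18]-[25];
# the yes/no judgments are inputs (in the paper they come from an LLM), and
# all names here are hypothetical.
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class VPCJudgments:
    second_is_particle: bool         # is the second element a particle ('up', 'off')?
    verb_alone_same_meaning: bool    # does the verb without the particle mean the same?
    particle_noncompositional: bool  # does the particle create a clearly different meaning?


def classify_vpc(j: VPCJudgments) -> str:
    """Return the prompt's answer label A-D; only D is a valid VPC."""
    if not j.second_is_particle:
        return "A"  # not a particle
    if j.verb_alone_same_meaning:
        return "B"  # meaning remains similar without the particle
    if not j.particle_noncompositional:
        return "C"  # particle does not significantly alter the meaning
    return "D"      # valid VPC


VID_TESTS = ("CRAN", "LEX", "MORPH", "MORPHSYNT", "SYNT")


def classify_vid(answers: Dict[str, bool]) -> Tuple[bool, Optional[str]]:
    """Run the VID tests in order; the first 'yes' makes the candidate a VID.

    Returns (is_vid, name_of_deciding_test); a missing answer counts as 'no'."""
    for test in VID_TESTS:
        if answers.get(test, False):
            return True, test
    return False, None


if __name__ == "__main__":
    # Illustrative judgments, not annotations from the paper's data.
    print(classify_vpc(VPCJudgments(True, False, True)))  # e.g. 'give up'
    print(classify_vid({"LEX": True}))                     # e.g. 'kick the bucket'
    print(classify_vid({}))                                # e.g. 'take a walk'
```

Keeping the judgments separate from the tree logic makes the first "yes" that decided each label easy to audit, which is useful when checking model-produced classifications.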
