Findings of the WMT 25 Shared Task on Automated Translation Evaluation Systems: Linguistic Diversity is Challenging and References Still Help

2025 · DOI 10.18653/v1/2025.wmt-1.24

17 Pith papers cite this work. Polarity classification is still indexing.


citation-role summary · background 1

citation-polarity summary · background 1

representative citing papers

Creativity Bias: How Machine Evaluation Struggles with Creativity in Literary Translations

cs.CL · 2026-05-13 · unverdicted · novelty 7.0

Automatic evaluation tools for literary translations correlate poorly with expert human judgments on creativity and exhibit bias favoring machine-translated texts.

Hearing to Translate: The Effectiveness of Speech Modality Integration into LLMs

cs.CL · 2025-12-18 · unverdicted · novelty 7.0

Cascaded systems remain the most reliable for speech translation overall, but recent SpeechLLMs match or outperform them in many conditions while standalone speech models lag.

Beyond "To whom it may concern": Tailoring Machine Translation to Audience and Intent

cs.CL · 2026-06-02 · unverdicted · novelty 6.0

Explicit purpose instructions improve LLM translation adaptedness across 50 languages and 8 domains, with larger gains on informal text, while standard metrics often penalize the adapted outputs.

Dynamic Meta-Metrics: Source-Sentence Conditioned Weighting for MT Evaluation

cs.CL · 2026-05-09 · unverdicted · novelty 6.0

Dynamic Meta-Metrics learns source-sentence conditioned combinations of MT metrics, with MLP-based and soft-conditioned versions showing gains over linear and GP ensembles on WMT data.

Misaligned by Reward: Socially Undesirable Preferences in LLMs

cs.CL · 2026-05-06 · unverdicted · novelty 6.0

Reward models for LLMs frequently select socially undesirable options across four social domains, show no overall best performer, and exhibit a bias-avoidance versus context-sensitivity trade-off.

Psychologically Potent, Computationally Invisible: LLMs Generate Social-Comparison-Eliciting Posts They Fail to Detect

cs.CL · 2026-05-01 · unverdicted · novelty 6.0

LLMs generate Xiaohongshu-style posts that elicit social comparison but show stable failures in prompt-based detection of the same reader-grounded signal.

A Systematic Analysis of Linguistic Features in AI-Generated Text Detection Across Domains and Models

cs.CL · 2026-06-02 · unverdicted · novelty 5.0

Lexical richness is a robust linguistic signal for AI-generated text detection across models and domains, while most other features are context-dependent.

CompactQE: Interpretable Translation Quality Estimation via Small Open-Weight LLMs

cs.CL · 2026-05-15 · unverdicted · novelty 5.0

Small open-source LLMs achieve competitive system-level correlations with human judgments in machine translation quality estimation, outperforming traditional neural metrics and fine-tuned models via single-pass multi-output prompting.

HydraQE: OSU's Submission for the IWSLT 2026 Speech Translation Metrics Shared Task

cs.CL · 2026-06-07 · unverdicted · novelty 4.0

HydraQE is a new end-to-end speech translation QE system using Qwen3-ASR backbone, sparsemax layer mixing, bidirectional Transformer, and multi-task curriculum training on human and pseudo labels that outperforms cascaded baselines.

Model-Based Quality Assessment for Massively Multilingual Parallel Data

cs.CL · 2026-05-29 · unverdicted · novelty 4.0

Large-scale benchmarks of multilingual embeddings and QE models show no universal performer; direction-aware routing and calibration recommended for parallel data assessment.

Why Low-Resource NLP Needs More Than Cross-Lingual Transfer: Lessons Learned from Luxembourgish

cs.CL · 2026-05-11 · unverdicted · novelty 4.0

Cross-lingual transfer and language-specific data efforts are interdependent and complementary for effective low-resource NLP, as demonstrated through Luxembourgish case studies and synthesis.

LLM Consumer Behavior Theory: Foundations of a Novel Research Field

cs.AI · 2026-06-16 · unverdicted · novelty 3.0

Introduces LLM Consumer Behavior Theory to analyze consumer behavior when LLMs serve as autonomous decision-making agents in markets.

ROC Analysis for Evaluating Translation Quality Estimation Systems

cs.CL · 2026-05-23 · unverdicted · novelty 3.0

ROC analysis is proposed for evaluating translation quality estimation systems, claimed to match existing methods while providing actionable business insights.

Hy-MT2: A Family of Fast, Efficient and Powerful Multilingual Translation Models in the Wild

cs.CL · 2026-05-21 · unverdicted · novelty 3.0

Hy-MT2 presents three new multilingual translation models that claim to outperform listed open-source and commercial systems on diverse tasks while enabling low-storage on-device use.

FMI_SU_Yotkova_Kastreva at SemEval-2026 Task 13: Lightweight Detection of LLM-Generated Code via Stylometric Signals

cs.CL · 2026-05-05 · unverdicted · novelty 3.0

A feature-based decision tree with parsing-derived signals and heuristics detects LLM-generated code in a lightweight, CPU-only setup for SemEval-2026 Task 13.

Bridging the Linguistic Divide: A Survey on Leveraging Large Language Models for Machine Translation

cs.CL · 2025-04-02 · unverdicted · novelty 3.0

A literature survey that organizes prompting, fine-tuning, preference optimization, and context-aware techniques for LLM-based machine translation with emphasis on low-resource languages.

Syntax as a Rosetta Stone: Universal Dependencies for In-Context Coptic Translation

cs.CL · 2026-04-20

citing papers explorer


Hearing to Translate: The Effectiveness of Speech Modality Integration into LLMs

cs.CL · 2025-12-18 · unverdicted · none · ref 55

Cascaded systems remain the most reliable for speech translation overall, but recent SpeechLLMs match or outperform them in many conditions while standalone speech models lag.

Bridging the Linguistic Divide: A Survey on Leveraging Large Language Models for Machine Translation

cs.CL · 2025-04-02 · unverdicted · none · ref 174

A literature survey that organizes prompting, fine-tuning, preference optimization, and context-aware techniques for LLM-based machine translation with emphasis on low-resource languages.
