hub

Findings of the WMT 25 general machine translation shared task: Time to stop evaluating on easy test sets

Tom Kocmi, Ekaterina Artemova, Eleftherios Avramidis, Rachel Bawden, Ond r ej Bojar, Konstantin Dranch, Anton Dvorkovich, Sergey Dukanov, Mark Fishel, Markus Freitag, Thamme Gowda, Roman Grundkiewicz, Barry Haddow, Marzena Karpinska, Philip · 2025 · DOI 10.18653/v1/2025.wmt-1.22

10 Pith papers cite this work. Polarity classification is still indexing.

10 Pith papers citing it

open at publisher browse 10 citing papers

hub tools

JSON dossier citing papers JSON publisher DOI

citation-role summary

other 1

citation-polarity summary

unclear 1

representative citing papers

Creativity Bias: How Machine Evaluation Struggles with Creativity in Literary Translations

cs.CL · 2026-05-13 · unverdicted · novelty 7.0

Automatic evaluation tools for literary translations correlate poorly with expert human judgments on creativity and exhibit bias favoring machine-translated texts.

What Does LLM Refinement Actually Improve? A Systematic Study on Document-Level Literary Translation

cs.CL · 2026-05-13 · accept · novelty 7.0

Document-level machine translation followed by segment-level LLM refinement provides the strongest and most stable improvements in literary translation quality, mainly enhancing fluency and style rather than adequacy.

Combining On-Policy Optimization and Distillation for Long-Context Reasoning in Large Language Models

cs.CL · 2026-05-12 · unverdicted · novelty 7.0

dGRPO merges outcome-based policy optimization with dense teacher guidance from on-policy distillation, yielding more stable long-context reasoning on the new LongBlocks synthetic dataset.

Misaligned by Reward: Socially Undesirable Preferences in LLMs

cs.CL · 2026-05-06 · unverdicted · novelty 6.0

Reward models for LLMs frequently select socially undesirable options across four social domains, show no overall best performer, and exhibit a bias-avoidance versus context-sensitivity trade-off.

Why Low-Resource NLP Needs More Than Cross-Lingual Transfer: Lessons Learned from Luxembourgish

cs.CL · 2026-05-11 · unverdicted · novelty 4.0

Cross-lingual transfer and language-specific data efforts are interdependent and complementary for effective low-resource NLP, as demonstrated through Luxembourgish case studies and synthesis.

FMI_SU_Yotkova_Kastreva at SemEval-2026 Task 13: Lightweight Detection of LLM-Generated Code via Stylometric Signals

cs.CL · 2026-05-05 · unverdicted · novelty 3.0

A feature-based decision tree with parsing-derived signals and heuristics detects LLM-generated code in a lightweight, CPU-only setup for SemEval-2026 Task 13.

Hy-MT2: A Family of Fast, Efficient and Powerful Multilingual Translation Models in the Wild

cs.CL · 2026-05-21

Dynamic Meta-Metrics: Source-Sentence Conditioned Weighting for MT Evaluation

cs.CL · 2026-05-09

Psychologically Potent, Computationally Invisible: LLMs Generate Social-Comparison-Eliciting Posts They Fail to Detect

cs.CL · 2026-05-01

Syntax as a Rosetta Stone: Universal Dependencies for In-Context Coptic Translation

cs.CL · 2026-04-20

citing papers explorer

Showing 10 of 10 citing papers.

Creativity Bias: How Machine Evaluation Struggles with Creativity in Literary Translations cs.CL · 2026-05-13 · unverdicted · none · ref 26
Automatic evaluation tools for literary translations correlate poorly with expert human judgments on creativity and exhibit bias favoring machine-translated texts.
What Does LLM Refinement Actually Improve? A Systematic Study on Document-Level Literary Translation cs.CL · 2026-05-13 · accept · none · ref 15
Document-level machine translation followed by segment-level LLM refinement provides the strongest and most stable improvements in literary translation quality, mainly enhancing fluency and style rather than adequacy.
Combining On-Policy Optimization and Distillation for Long-Context Reasoning in Large Language Models cs.CL · 2026-05-12 · unverdicted · none · ref 78
dGRPO merges outcome-based policy optimization with dense teacher guidance from on-policy distillation, yielding more stable long-context reasoning on the new LongBlocks synthetic dataset.
Misaligned by Reward: Socially Undesirable Preferences in LLMs cs.CL · 2026-05-06 · unverdicted · none · ref 153
Reward models for LLMs frequently select socially undesirable options across four social domains, show no overall best performer, and exhibit a bias-avoidance versus context-sensitivity trade-off.
Why Low-Resource NLP Needs More Than Cross-Lingual Transfer: Lessons Learned from Luxembourgish cs.CL · 2026-05-11 · unverdicted · none · ref 165
Cross-lingual transfer and language-specific data efforts are interdependent and complementary for effective low-resource NLP, as demonstrated through Luxembourgish case studies and synthesis.
FMI_SU_Yotkova_Kastreva at SemEval-2026 Task 13: Lightweight Detection of LLM-Generated Code via Stylometric Signals cs.CL · 2026-05-05 · unverdicted · none · ref 237
A feature-based decision tree with parsing-derived signals and heuristics detects LLM-generated code in a lightweight, CPU-only setup for SemEval-2026 Task 13.
Hy-MT2: A Family of Fast, Efficient and Powerful Multilingual Translation Models in the Wild cs.CL · 2026-05-21 · unreviewed · ref 84
Dynamic Meta-Metrics: Source-Sentence Conditioned Weighting for MT Evaluation cs.CL · 2026-05-09 · unreviewed · ref 29
Psychologically Potent, Computationally Invisible: LLMs Generate Social-Comparison-Eliciting Posts They Fail to Detect cs.CL · 2026-05-01 · unreviewed · ref 153
Syntax as a Rosetta Stone: Universal Dependencies for In-Context Coptic Translation cs.CL · 2026-04-20 · unreviewed · ref 10

Findings of the WMT 25 general machine translation shared task: Time to stop evaluating on easy test sets

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer