Hy-MT2: A Family of Fast, Efficient and Powerful Multilingual Translation Models in the Wild

An Wang; Baifang Chen; Binghong Wu; Bin Xing; Bo Lv; Chengcheng Xu; Chenhao Wang; Decheng Wu; Guanghua Yu; Guanwei Zhang

arxiv: 2605.22064 · v1 · pith:3AKYVQMHnew · submitted 2026-05-21 · 💻 cs.CL

Hy-MT2: A Family of Fast, Efficient and Powerful Multilingual Translation Models in the Wild

Mao Zheng , Zheng Li , Tao Chen , Bo Lv , Mingrui Sun , Mingyang Song , Jinlong Song , Hong Huang

show 45 more authors

Decheng Wu Hai Wang Yifan Song Yanfeng Chen Guanwei Zhang Guanghua Yu Yi Su Hong Liu Jinxiang Ou Keyao Wang Weile Chen Haozhao Kuang Kai Wang Nuo Chen Zihao Zheng Chenhao Wang Bin Xing Chengcheng Xu Tinghao Yu Binghong Wu Long Xu Jiacheng Shi Yunhao Wang Baifang Chen Lei Zhang Qi Yang Zhao Wu Jiacheng Li Lan Jiang Lanrui Wang Kai Zhang Shuaipeng Li Zhongzhi Chen Weixuan Sun Jiaqi Zhu An Wang Wei Li Jun Xia Weidong Han Wutian Yang Litong Hui Luoguo Jia Jiajia Wu Xinpeng Zhou Tianxiang Fei

This is my paper

Pith reviewed 2026-05-22 06:58 UTC · model grok-4.3

classification 💻 cs.CL

keywords multilingual translationlarge language modelsmodel quantizationinstruction followingMoE modelson-device deploymentreal-world evaluation

0 comments

The pith

Hy-MT2 is a family of three multilingual translation models that outperform both open-source systems and commercial APIs across real-world tasks while supporting efficient device deployment.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Hy-MT2 as models sized 1.8B, 7B, and 30B-A3B that translate among 33 languages and respond to instructions in those languages. They target complex real-world business and domain-specific scenarios with a fast-thinking approach. Multi-dimensional tests show the 7B and 30B versions surpass open models such as DeepSeek-V4-Pro and Kimi K2.6, while the 1.8B version exceeds commercial APIs from Microsoft and Doubao. Extreme quantization lets the smallest model run with 440 MB storage and 1.5 times faster inference.

Core claim

Hy-MT2 models achieve strong results on general, business, domain-specific, and instruction-following translation tasks, with the 7B and 30B variants outperforming listed open-source models in fast-thinking mode and the 1.8B variant surpassing listed commercial APIs overall.

What carries the argument

The Hy-MT2 model family with its three size variants (1.8B, 7B, 30B-A3B MoE) optimized for multilingual translation and instruction following.

Load-bearing premise

The multi-dimensional evaluations accurately measure real-world performance without undisclosed data selection or test-set overlap that would inflate gains over baselines.

What would settle it

An independent test on previously unseen real-world business or domain-specific translation examples in which the Hy-MT2 models no longer outperform the compared open-source models or commercial APIs.

Figures

Figures reproduced from arXiv: 2605.22064 by An Wang, Baifang Chen, Binghong Wu, Bin Xing, Bo Lv, Chengcheng Xu, Chenhao Wang, Decheng Wu, Guanghua Yu, Guanwei Zhang, Hai Wang, Haozhao Kuang, Hong Huang, Hong Liu, Jiacheng Li, Jiacheng Shi, Jiajia Wu, Jiaqi Zhu, Jinlong Song, Jinxiang Ou, Jun Xia, Kai Wang, Kai Zhang, Keyao Wang, Lan Jiang, Lanrui Wang, Lei Zhang, Litong Hui, Long Xu, Luoguo Jia, Mao Zheng, Mingrui Sun, Mingyang Song, Nuo Chen, Qi Yang, Shuaipeng Li, Tao Chen, Tianxiang Fei, Tinghao Yu, Weidong Han, Weile Chen, Wei Li, Weixuan Sun, Wutian Yang, Xinpeng Zhou, Yanfeng Chen, Yifan Song, Yi Su, Yunhao Wang, Zhao Wu, Zheng Li, Zhongzhi Chen, Zihao Zheng.

**Figure 2.** Figure 2: Family-Centric Post-training pipline of Hy-MT2. [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗

**Figure 3.** Figure 3: Case study of Hy-MT2 on translation instruction following (Part 1). [PITH_FULL_IMAGE:figures/full_fig_p011_3.png] view at source ↗

**Figure 4.** Figure 4: Case study of Hy-MT2 on translation instruction following (Part 2). [PITH_FULL_IMAGE:figures/full_fig_p012_4.png] view at source ↗

**Figure 5.** Figure 5: Case study of Hy-MT2 on translation instruction following (Part 3). [PITH_FULL_IMAGE:figures/full_fig_p013_5.png] view at source ↗

read the original abstract

Hy-MT2 is a family of fast-thinking multilingual translation models designed for complex real-world scenarios. It includes three model sizes: 1.8B, 7B, and 30B-A3B (MoE), all of which support translation among 33 languages and effectively follow translation instructions in multiple languages. For on-device deployment, with AngelSlim 1.25-bit extreme quantization, the 1.8B model requires only 440 MB of storage and improves inference speed by 1.5x. Multi-dimensional evaluations show that Hy-MT2 delivers outstanding performance across general, real-world business, domain-specific, and instruction-following translation tasks. The 7B and 30B models outperform open-source models such as DeepSeek-V4-Pro and Kimi K2.6 in fast-thinking mode, while the lightweight 1.8B model also surpasses mainstream commercial APIs from providers such as Microsoft and Doubao overall.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Hy-MT2 offers practical multilingual translation models with strong efficiency claims via quantization, but the outperformance results rest on evaluations that lack needed transparency.

read the letter

Hy-MT2 is a family of multilingual translation models in 1.8B, 7B, and 30B-A3B MoE sizes that target real-world use across 33 languages, with heavy quantization to make the smallest one run on devices. The core pitch is that these models handle general, business, domain-specific, and instruction-following tasks well while staying fast and small. The 1.8B version at 1.25-bit quantization uses only 440 MB and runs 1.5x faster, which is a concrete engineering detail worth noting if it works as described. The paper also positions the larger models as beating DeepSeek-V4-Pro and Kimi K2.6 in fast mode, and the small one beating Microsoft and Doubao overall. That combination of sizes, MoE structure, and extreme quantization for translation is the actual new packaging here, even if the underlying scaling approach builds on existing multilingual LLM work. The practical focus on deployment and instruction following across languages is handled clearly and could be useful for teams building on-device tools. The soft spot is the evaluation section. The claims rely on multi-dimensional tests, but the manuscript gives limited information on exact test sets, prompt formats, baseline setups, or any contamination checks with training data. Without those, it is hard to tell whether the reported margins reflect genuine gains or favorable test conditions. The stress-test concern about possible undisclosed selection or overlap still applies based on what is shown. This paper is mainly for engineers and applied researchers who need efficient translation models for consumer hardware or business scenarios. Readers seeking new theoretical methods or fully documented benchmarks will find less to work with. It deserves a serious referee to examine the experimental details and see whether the results can be reproduced or clarified.

Referee Report

1 major / 1 minor

Summary. The manuscript introduces Hy-MT2, a family of fast-thinking multilingual translation models in three sizes (1.8B, 7B, and 30B-A3B MoE) supporting translation across 33 languages with instruction-following capabilities. It emphasizes on-device efficiency via AngelSlim 1.25-bit quantization (440 MB storage and 1.5x speed-up for the 1.8B model) and reports superior performance over open-source models (DeepSeek-V4-Pro, Kimi K2.6) and commercial APIs (Microsoft, Doubao) across general, real-world business, domain-specific, and instruction-following tasks based on multi-dimensional evaluations.

Significance. If the reported evaluations hold under rigorous, reproducible conditions without undisclosed test-set curation or training-data overlap, the work would offer a practical advance in efficient multilingual MT suitable for real-world and resource-constrained settings. The scaling to MoE, combined with extreme quantization, addresses deployment needs that many prior open models overlook.

major comments (1)

[Abstract and Evaluation sections] The central performance claims (7B/30B outperforming DeepSeek-V4-Pro and Kimi K2.6; 1.8B surpassing Microsoft and Doubao) rest entirely on the assertion of 'multi-dimensional evaluations' across four task categories, yet no section supplies the exact test sets, prompt sources, metrics, baseline implementation details, or contamination checks. This absence directly undermines verification of the headline superiority results.

minor comments (1)

[Abstract] The phrase 'fast-thinking mode' appears in the abstract without prior definition or reference to a specific inference setting or comparison protocol.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their thorough review and for highlighting the need for greater transparency in our evaluation methodology. We agree that detailed documentation of test sets, prompts, metrics, baselines, and contamination checks is essential to substantiate the performance claims and enable independent verification. We will revise the manuscript to address this concern directly.

read point-by-point responses

Referee: [Abstract and Evaluation sections] The central performance claims (7B/30B outperforming DeepSeek-V4-Pro and Kimi K2.6; 1.8B surpassing Microsoft and Doubao) rest entirely on the assertion of 'multi-dimensional evaluations' across four task categories, yet no section supplies the exact test sets, prompt sources, metrics, baseline implementation details, or contamination checks. This absence directly undermines verification of the headline superiority results.

Authors: We acknowledge that the current manuscript does not provide sufficient detail on the evaluation protocol. In the revised version, we will expand the Evaluation section with a new subsection that explicitly lists: (1) the exact test sets and their sources for each of the four task categories (general, real-world business, domain-specific, and instruction-following); (2) the prompt templates and sources used for instruction-following tasks; (3) the primary and secondary metrics (e.g., BLEU, COMET, human preference scores) with computation details; (4) baseline implementation specifics, including API versions or reproduction steps for DeepSeek-V4-Pro, Kimi K2.6, Microsoft Translator, and Doubao; and (5) the procedures followed to detect and mitigate training-data contamination. Where possible, we will release evaluation scripts and dataset references to support reproducibility. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper presents a family of multilingual translation models (1.8B, 7B, 30B-A3B MoE) with quantization details and reports performance on general, business, domain-specific, and instruction-following tasks. No equations, self-definitional quantities, or fitted parameters renamed as predictions appear. Claims rest on external benchmark evaluations rather than quantities defined in terms of the paper's own inputs. No self-citation chains, uniqueness theorems, or ansatzes are invoked to force results. The derivation chain is self-contained and empirical.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available, so no free parameters, axioms, or invented entities can be extracted or audited from the provided text.

pith-pipeline@v0.9.0 · 5889 in / 1140 out tokens · 37624 ms · 2026-05-22T06:58:14.977203+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

88 extracted references · 88 canonical work pages · 10 internal anchors

[1]

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

Bitdistiller: Unleashing the potential of sub-4-bit llms via self-distillation , author=. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

work page
[2]

2025 , eprint=

Hunyuan-MT Technical Report , author=. 2025 , eprint=

work page 2025
[3]

2024 , eprint=

On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes , author=. 2024 , eprint=

work page 2024
[4]

Advances in Neural Information Processing Systems , volume=

ParetoQ: Improving scaling laws in extremely low-bit LLM quantization , author=. Advances in Neural Information Processing Systems , volume=

work page
[5]

2025 , eprint=

MiniLLM: Knowledge Distillation of Large Language Models , author=. 2025 , eprint=

work page 2025
[6]

Thinking Machines Lab: Connectionism , year =

Kevin Lu and Thinking Machines Lab , title =. Thinking Machines Lab: Connectionism , year =

work page
[7]

2023 , eprint=

GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers , author=. 2023 , eprint=

work page 2023
[8]

2025 , note =

OpenAI , title =. 2025 , note =

work page 2025
[9]

2025 , note =

Anthropic , title =. 2025 , note =

work page 2025
[10]

2025 , note =

DeepMind , title =. 2025 , note =

work page 2025
[11]

2024 , eprint=

DeepSeek-V3 Technical Report , author=. 2024 , eprint=

work page 2024
[12]

2025 , eprint=

DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models , author=. 2025 , eprint=

work page 2025
[13]

2025 , eprint=

Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs , author=. 2025 , eprint=

work page 2025
[14]

2025 , eprint=

Qwen3 Technical Report , author=. 2025 , eprint=

work page 2025
[15]

2025 , eprint=

Tower+: Bridging Generality and Translation Specialization in Multilingual LLMs , author=. 2025 , eprint=

work page 2025
[16]

2025 , eprint=

Seed-X: Building Strong Multilingual Translation LLM with 7B Parameters , author=. 2025 , eprint=

work page 2025
[17]

2025 , eprint=

Multilingual Machine Translation with Open Large Language Models at Practical Scale: An Empirical Study , author=. 2025 , eprint=

work page 2025
[18]

2025 , note =

AI@Meta , title =. 2025 , note =

work page 2025
[19]

2025 , eprint=

Gemma 3 Technical Report , author=. 2025 , eprint=

work page 2025
[21]

Brown and John Cocke and Stephen Della Pietra and Vincent J

Peter F. Brown and John Cocke and Stephen Della Pietra and Vincent J. Della Pietra and Frederick Jelinek and John D. Lafferty and Robert L. Mercer and Paul S. Roossin , title =. Comput. Linguistics , volume =. 1990 , timestamp =

work page 1990
[22]

Brown and Stephen Della Pietra and Vincent J

Peter F. Brown and Stephen Della Pietra and Vincent J. Della Pietra and Robert L. Mercer , title =. Comput. Linguistics , volume =. 1993 , timestamp =

work page 1993
[23]

Bleu: A method for automatic evaluation of machine translation

Kishore Papineni and Salim Roukos and Todd Ward and Wei. Bleu: a Method for Automatic Evaluation of Machine Translation , booktitle =. 2002 , url =. doi:10.3115/1073083.1073135 , timestamp =

work page doi:10.3115/1073083.1073135 2002
[24]

Le , editor =

Ilya Sutskever and Oriol Vinyals and Quoc V. Le , editor =. Sequence to Sequence Learning with Neural Networks , booktitle =. 2014 , url =

work page 2014
[25]

Neural Machine Translation by Jointly Learning to Align and Translate , booktitle =

Dzmitry Bahdanau and Kyunghyun Cho and Yoshua Bengio , editor =. Neural Machine Translation by Jointly Learning to Align and Translate , booktitle =. 2015 , url =

work page 2015
[26]

Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation

Google's neural machine translation system: Bridging the gap between human and machine translation , author=. arXiv preprint arXiv:1609.08144 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[27]

Advances in neural information processing systems , volume=

Attention is all you need , author=. Advances in neural information processing systems , volume=

work page
[28]

Multilingual Machine Translation with Large Language Models: Empirical Results and Analysis , booktitle =

Wenhao Zhu and Hongyi Liu and Qingxiu Dong and Jingjing Xu and Shujian Huang and Lingpeng Kong and Jiajun Chen and Lei Li , editor =. Multilingual Machine Translation with Large Language Models: Empirical Results and Analysis , booktitle =. 2024 , url =. doi:10.18653/V1/2024.FINDINGS-NAACL.176 , timestamp =

work page doi:10.18653/v1/2024.findings-naacl.176 2024
[29]

Findings of the

Tom Kocmi and Eleftherios Avramidis and Rachel Bawden and Ondrej Bojar and Anton Dvorkovich and Christian Federmann and Mark Fishel and Markus Freitag and Thamme Gowda and Roman Grundkiewicz and Barry Haddow and Marzena Karpinska and Philipp Koehn and Benjamin Marie and Christof Monz and Kenton Murray and Masaaki Nagata and Martin Popel and Maja Popovic a...

work page doi:10.18653/v1/2024.wmt-1.1 2024
[30]

Jianhui Pang and Fanghua Ye and Derek Fai Wong and Dian Yu and Shuming Shi and Zhaopeng Tu and Longyue Wang , title =. Trans. Assoc. Comput. Linguistics , volume =. 2025 , url =. doi:10.1162/TACL\_A\_00730 , timestamp =

work page internal anchor Pith review doi:10.1162/tacl 2025
[31]

https://arxiv.org/ abs/2301.08745

Wenxiang Jiao and Wenxuan Wang and Jen. Is ChatGPT. CoRR , volume =. 2023 , url =. doi:10.48550/ARXIV.2301.08745 , eprinttype =. 2301.08745 , timestamp =

work page doi:10.48550/arxiv.2301.08745 2023
[32]

Multicultural Education Review , volume =

Cong Lin and Liz Jackson , title =. Multicultural Education Review , volume =. 2021 , publisher =

work page 2021
[33]

Tencent Minority-Mandarin Translation System , booktitle =

Hu, Bojie and Han, Ambyer and Zhang, Zheyang and Huang, Shen and Ju, Qi , editor =. Tencent Minority-Mandarin Translation System , booktitle =. 2019 , pages =

work page 2019
[34]

2025 , eprint=

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning , author=. 2025 , eprint=

work page 2025
[35]

2025 , eprint=

FastCuRL: Curriculum Reinforcement Learning with Stage-wise Context Scaling for Efficient Training R1-like Reasoning Models , author=. 2025 , eprint=

work page 2025
[36]

DeepCoder: A Fully Open-Source 14B Coder at O3-mini Level , author=

work page
[37]

2025 , eprint=

Kimi K2: Open Agentic Intelligence , author=. 2025 , eprint=

work page 2025
[38]

2025 , eprint=

MT-R1-Zero: Advancing LLM-based Machine Translation via R1-Zero-like Reinforcement Learning , author=. 2025 , eprint=

work page 2025
[39]

2025 , eprint=

SSR-Zero: Simple Self-Rewarding Reinforcement Learning for Machine Translation , author=. 2025 , eprint=

work page 2025
[40]

2025 , eprint=

TAT-R1: Terminology-Aware Translation with Reinforcement Learning and Word Alignment , author=. 2025 , eprint=

work page 2025
[41]

MMLU-Pro:

Yubo Wang and Xueguang Ma and Ge Zhang and Yuansheng Ni and Abhranil Chandra and Shiguang Guo and Weiming Ren and Aaran Arulraj and Xuan He and Ziyan Jiang and Tianle Li and Max Ku and Kai Wang and Alex Zhuang and Rongqi Fan and Xiang Yue and Wenhu Chen , editor =. MMLU-Pro:. Advances in Neural Information Processing Systems 38: Annual Conference on Neura...

work page 2024
[42]

SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines

M. SuperGPQA: Scaling. CoRR , volume =. 2025 , url =. doi:10.48550/ARXIV.2502.14739 , eprinttype =. 2502.14739 , timestamp =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2502.14739 2025
[43]

The Eleventh International Conference on Learning Representations,

Freda Shi and Mirac Suzgun and Markus Freitag and Xuezhi Wang and Suraj Srivats and Soroush Vosoughi and Hyung Won Chung and Yi Tay and Sebastian Ruder and Denny Zhou and Dipanjan Das and Jason Wei , title =. The Eleventh International Conference on Learning Representations,. 2023 , url =

work page 2023
[44]

GPQA: A Graduate-Level Google-Proof Q&A Benchmark

David Rein and Betty Li Hou and Asa Cooper Stickland and Jackson Petty and Richard Yuanzhe Pang and Julien Dirani and Julian Michael and Samuel R. Bowman , title =. CoRR , volume =. 2023 , url =. doi:10.48550/ARXIV.2311.12022 , eprinttype =. 2311.12022 , timestamp =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2311.12022 2023
[45]

Training Verifiers to Solve Math Word Problems

Karl Cobbe and Vineet Kosaraju and Mohammad Bavarian and Mark Chen and Heewoo Jun and Lukasz Kaiser and Matthias Plappert and Jerry Tworek and Jacob Hilton and Reiichiro Nakano and Christopher Hesse and John Schulman , title =. CoRR , volume =. 2021 , url =. 2110.14168 , timestamp =

work page internal anchor Pith review Pith/arXiv arXiv 2021
[46]

9th International Conference on Learning Representations,

Dan Hendrycks and Collin Burns and Steven Basart and Andy Zou and Mantas Mazeika and Dawn Song and Jacob Steinhardt , title =. 9th International Conference on Learning Representations,. 2021 , url =

work page 2021
[47]

MultiPL-E:

Federico Cassano and John Gouwar and Daniel Nguyen and Sydney Nguyen and Luna Phipps. MultiPL-E:. 2023 , url =. doi:10.1109/TSE.2023.3267446 , timestamp =

work page doi:10.1109/tse.2023.3267446 2023
[48]

CRUXEval:

Alex Gu and Baptiste Rozi. CRUXEval:. Forty-first International Conference on Machine Learning,. 2024 , url =

work page 2024
[49]

Angelika Romanou and Negar Foroutan and Anna Sotnikova and Zeming Chen and Sree Harsha Nelaturu and Shivalika Singh and Rishabh Maheshwary and Micol Altomare and Mohamed A. Haggag and Imanol Schlag and Marzieh Fadaee and Sara Hooker and Antoine Bosselut and Snegha A and Alfonso Amayuelas and Azril Hafizi Amirudin and Viraat Aryabumi and Danylo Boiko and M...

work page 2025
[50]

2022 , eprint=

No Language Left Behind: Scaling Human-Centered Machine Translation , author=. 2022 , eprint=

work page 2022
[51]

2502.12404 , archivePrefix=

Daniel Deutsch and Eleftheria Briakou and Isaac Caswell and Mara Finkelstein and Rebecca Galor and Juraj Juraska and Geza Kovacs and Alison Lui and Ricardo Rei and Jason Riesa and Shruti Rijhwani and Parker Riley and Elizabeth Salesky and Firas Trabelsi and Stephanie Winkler and Biao Zhang and Markus Freitag , year=. 2502.12404 , archivePrefix=

work page arXiv
[52]

2023 , eprint=

xCOMET: Transparent Machine Translation Evaluation through Fine-grained Error Detection , author=. 2023 , eprint=

work page 2023
[53]

Large Language Models Are State-of-the-Art Evaluators of Translation Quality

Kocmi, Tom and Federmann, Christian. Large Language Models Are State-of-the-Art Evaluators of Translation Quality. Proceedings of the 24th Annual Conference of the European Association for Machine Translation. 2023

work page 2023
[54]

2022 , eprint=

CometKiwi: IST-Unbabel 2022 Submission for the Quality Estimation Shared Task , author=. 2022 , eprint=

work page 2022
[55]

Rishabh Agarwal and Avi Singh and Lei Zhang and Bernd Bohnet and Luis Rosias and Stephanie C. Y. Chan and Biao Zhang and Ankesh Anand and Zaheer Abbas and Azade Nova and John D. Co. Many-Shot In-Context Learning , booktitle =. 2024 , url =

work page 2024
[56]

Can Many-Shot In-Context Learning Help LLMs as Evaluators?

Mingyang Song and Mao Zheng and Xuan Luo , editor =. Can Many-Shot In-Context Learning Help LLMs as Evaluators?. Proceedings of the 31st International Conference on Computational Linguistics,. 2025 , url =

work page 2025
[57]

s1: Simple test-time scaling

Niklas Muennighoff and Zitong Yang and Weijia Shi and Xiang Lisa Li and Li Fei. s1: Simple test-time scaling , journal =. 2025 , url =. doi:10.48550/ARXIV.2501.19393 , eprinttype =. 2501.19393 , timestamp =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2501.19393 2025
[58]

2025 , eprint=

A Survey on Test-Time Scaling in Large Language Models: What, How, Where, and How Well? , author=. 2025 , eprint=

work page 2025
[59]

2023 , eprint=

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models , author=. 2023 , eprint=

work page 2023
[61]

Liu , title =

Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu , title =. arXiv e-prints , year =

work page
[62]

A Monolingual Approach to Contextualized Word Embeddings for Mid-Resource Languages

Ortiz Su 'a rez, Pedro Javier and Romary, Laurent and Sagot, Benoit. A Monolingual Approach to Contextualized Word Embeddings for Mid-Resource Languages. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020

work page 2020
[63]

Asynchronous pipelines for processing huge corpora on medium to low resource infrastructures , series =

Pedro Javier. Asynchronous pipelines for processing huge corpora on medium to low resource infrastructures , series =. doi:10.14618/ids-pub-9021 , url =

work page doi:10.14618/ids-pub-9021
[64]

, author=

Parallel data, tools and interfaces in OPUS. , author=. Lrec , volume=

work page
[65]

Proceedings of the First Conference on Machine Translation , month =

Buck, Christian and Koehn, Philipp , title =. Proceedings of the First Conference on Machine Translation , month =. 2016 , address =

work page 2016
[66]

2025 , eprint=

RegMix: Data Mixture as Regression for Language Model Pre-training , author=. 2025 , eprint=

work page 2025
[68]

In: Haddow, B., Kocmi, T., Koehn, P., Monz, C

Lavie, Alon and Hanneman, Greg and Agrawal, Sweta and Kanojia, Diptesh and Lo, Chi-Kiu and Zouhar, Vil \'e m and Blain, Frederic and Zerva, Chrysoula and Avramidis, Eleftherios and Deoghare, Sourabh and Sindhujan, Archchana and Wang, Jiayi and Adelani, David Ifeoluwa and Thompson, Brian and Kocmi, Tom and Freitag, Markus and Deutsch, Daniel. Findings of t...

work page doi:10.18653/v1/2025.wmt-1.24 2025
[69]

A Survey on Quantization Methods for Optimization of Deep Neural Networks , year=

Kulkarni, Uday and Hosamani, Abhishek S and Masur, Abhishek S and Hegde, Shashank and Vernekar, Ganesh R and Siri Chandana, K , booktitle=. A Survey on Quantization Methods for Optimization of Deep Neural Networks , year=

work page
[71]

Using a new analytic measure for the annotation and analysis of MT errors on real data

Lommel, Arle and Burchardt, Aljoscha and Popovi \'c , Maja and Harris, Kim and Avramidis, Eleftherios and Uszkoreit, Hans. Using a new analytic measure for the annotation and analysis of MT errors on real data. Proceedings of the 17th Annual Conference of the European Association for Machine Translation. 2014

work page 2014
[72]

Results of the WMT 21 Metrics Shared Task: Evaluating Metrics with Expert-based Human Evaluations on TED and News Domain

Freitag, Markus and Rei, Ricardo and Mathur, Nitika and Lo, Chi-kiu and Stewart, Craig and Foster, George and Lavie, Alon and Bojar, Ond r ej. Results of the WMT 21 Metrics Shared Task: Evaluating Metrics with Expert-based Human Evaluations on TED and News Domain. Proceedings of the Sixth Conference on Machine Translation. 2021

work page 2021
[73]

2025 , eprint=

HY-MT1.5 Technical Report , author=. 2025 , eprint=

work page 2025
[74]

2025 , journal=

Generalizing Verifiable Instruction Following , author=. 2025 , journal=

work page 2025
[76]

M a XIFE : Multilingual and Cross-lingual Instruction Following Evaluation

Liu, Yile and Ma, Ziwei and Jiang, Xiu and Hu, Jinglu and ChangJing, ChangJing and Li, Liang. M a XIFE : Multilingual and Cross-lingual Instruction Following Evaluation. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2025

work page 2025
[78]

2026 , eprint=

OpenAI GPT-5 System Card , author=. 2026 , eprint=

work page 2026
[79]

Introducing gemini 3

DeepMind. Introducing gemini 3. https://blog.google/products/gemini/gemini-3-collection/, 2025. Accessed: 2025-12-29

work page 2025
[80]

Bitdistiller: Unleashing the potential of sub-4-bit llms via self-distillation

Dayou Du, Yijia Zhang, Shijie Cao, Jiaqi Guo, Ting Cao, Xiaowen Chu, and Ningyi Xu. Bitdistiller: Unleashing the potential of sub-4-bit llms via self-distillation. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp.\ 102--116, 2024

work page 2024
[81]

Results of the WMT 21 metrics shared task: Evaluating metrics with expert-based human evaluations on TED and news domain

Markus Freitag, Ricardo Rei, Nitika Mathur, Chi-kiu Lo, Craig Stewart, George Foster, Alon Lavie, and Ond r ej Bojar. Results of the WMT 21 metrics shared task: Evaluating metrics with expert-based human evaluations on TED and news domain. In Loic Barrault, Ondrej Bojar, Fethi Bougares, Rajen Chatterjee, Marta R. Costa-jussa, Christian Federmann, Mark Fis...

work page 2021
[82]

Multi-if: Benchmarking llms on multi-turn and multilingual instructions following

Yun He, Di Jin, Chaoqi Wang, Chloe Bi, Karishma Mandyam, Hejia Zhang, Chen Zhu, Ning Li, Tengyu Xu, Hongjiang Lv, et al. Multi-if: Benchmarking llms on multi-turn and multilingual instructions following. arXiv preprint arXiv:2410.15553, 2024

work page arXiv 2024
[83]

Sherry: Hardware-efficient 1.25-bit ternary quantization via fine-grained sparsification

Hong Huang, Decheng Wu, Qiangqiang Hu, Guanghua Yu, Jinhai Yang, Jianchen Zhu, Xue Liu, and Dapeng Wu. Sherry: Hardware-efficient 1.25-bit ternary quantization via fine-grained sparsification. arXiv preprint arXiv:2601.07892, 2026

work page arXiv 2026
[84]

Findings of the WMT 25 general machine translation shared task: Time to stop evaluating on easy test sets

Tom Kocmi, Ekaterina Artemova, Eleftherios Avramidis, Rachel Bawden, Ond r ej Bojar, Konstantin Dranch, Anton Dvorkovich, Sergey Dukanov, Mark Fishel, Markus Freitag, Thamme Gowda, Roman Grundkiewicz, Barry Haddow, Marzena Karpinska, Philipp Koehn, Howard Lakougna, Jessica Lundin, Christof Monz, Kenton Murray, Masaaki Nagata, Stefano Perrella, Lorenzo Pro...

work page doi:10.18653/v1/2025.wmt-1.22 2025
[85]

M a XIFE : Multilingual and cross-lingual instruction following evaluation

Yile Liu, Ziwei Ma, Xiu Jiang, Jinglu Hu, ChangJing ChangJing, and Liang Li. M a XIFE : Multilingual and cross-lingual instruction following evaluation. In Wanxiang Che, Joyce Nabende, Ekaterina Shutova, and Mohammad Taher Pilehvar (eds.), Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp.\...

work page 2025
[86]

Paretoq: Improving scaling laws in extremely low-bit llm quantization

Zechun Liu, Changsheng Zhao, Hanxian Huang, Sijia Chen, Jing Zhang, Jiawei Zhao, Scott Roy, Lisa Jin, Yunyang Xiong, Yangyang Shi, et al. Paretoq: Improving scaling laws in extremely low-bit llm quantization. Advances in Neural Information Processing Systems, 38: 0 91311--91336, 2026

work page 2026

Showing first 80 references.

[1] [1]

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

Bitdistiller: Unleashing the potential of sub-4-bit llms via self-distillation , author=. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

work page

[2] [2]

2025 , eprint=

Hunyuan-MT Technical Report , author=. 2025 , eprint=

work page 2025

[3] [3]

2024 , eprint=

On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes , author=. 2024 , eprint=

work page 2024

[4] [4]

Advances in Neural Information Processing Systems , volume=

ParetoQ: Improving scaling laws in extremely low-bit LLM quantization , author=. Advances in Neural Information Processing Systems , volume=

work page

[5] [5]

2025 , eprint=

MiniLLM: Knowledge Distillation of Large Language Models , author=. 2025 , eprint=

work page 2025

[6] [6]

Thinking Machines Lab: Connectionism , year =

Kevin Lu and Thinking Machines Lab , title =. Thinking Machines Lab: Connectionism , year =

work page

[7] [7]

2023 , eprint=

GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers , author=. 2023 , eprint=

work page 2023

[8] [8]

2025 , note =

OpenAI , title =. 2025 , note =

work page 2025

[9] [9]

2025 , note =

Anthropic , title =. 2025 , note =

work page 2025

[10] [10]

2025 , note =

DeepMind , title =. 2025 , note =

work page 2025

[11] [11]

2024 , eprint=

DeepSeek-V3 Technical Report , author=. 2024 , eprint=

work page 2024

[12] [12]

2025 , eprint=

DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models , author=. 2025 , eprint=

work page 2025

[13] [13]

2025 , eprint=

Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs , author=. 2025 , eprint=

work page 2025

[14] [14]

2025 , eprint=

Qwen3 Technical Report , author=. 2025 , eprint=

work page 2025

[15] [15]

2025 , eprint=

Tower+: Bridging Generality and Translation Specialization in Multilingual LLMs , author=. 2025 , eprint=

work page 2025

[16] [16]

2025 , eprint=

Seed-X: Building Strong Multilingual Translation LLM with 7B Parameters , author=. 2025 , eprint=

work page 2025

[17] [17]

2025 , eprint=

Multilingual Machine Translation with Open Large Language Models at Practical Scale: An Empirical Study , author=. 2025 , eprint=

work page 2025

[18] [18]

2025 , note =

AI@Meta , title =. 2025 , note =

work page 2025

[19] [19]

2025 , eprint=

Gemma 3 Technical Report , author=. 2025 , eprint=

work page 2025

[20] [21]

Brown and John Cocke and Stephen Della Pietra and Vincent J

Peter F. Brown and John Cocke and Stephen Della Pietra and Vincent J. Della Pietra and Frederick Jelinek and John D. Lafferty and Robert L. Mercer and Paul S. Roossin , title =. Comput. Linguistics , volume =. 1990 , timestamp =

work page 1990

[21] [22]

Brown and Stephen Della Pietra and Vincent J

Peter F. Brown and Stephen Della Pietra and Vincent J. Della Pietra and Robert L. Mercer , title =. Comput. Linguistics , volume =. 1993 , timestamp =

work page 1993

[22] [23]

Bleu: A method for automatic evaluation of machine translation

Kishore Papineni and Salim Roukos and Todd Ward and Wei. Bleu: a Method for Automatic Evaluation of Machine Translation , booktitle =. 2002 , url =. doi:10.3115/1073083.1073135 , timestamp =

work page doi:10.3115/1073083.1073135 2002

[23] [24]

Le , editor =

Ilya Sutskever and Oriol Vinyals and Quoc V. Le , editor =. Sequence to Sequence Learning with Neural Networks , booktitle =. 2014 , url =

work page 2014

[24] [25]

Neural Machine Translation by Jointly Learning to Align and Translate , booktitle =

Dzmitry Bahdanau and Kyunghyun Cho and Yoshua Bengio , editor =. Neural Machine Translation by Jointly Learning to Align and Translate , booktitle =. 2015 , url =

work page 2015

[25] [26]

Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation

Google's neural machine translation system: Bridging the gap between human and machine translation , author=. arXiv preprint arXiv:1609.08144 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[26] [27]

Advances in neural information processing systems , volume=

Attention is all you need , author=. Advances in neural information processing systems , volume=

work page

[27] [28]

Multilingual Machine Translation with Large Language Models: Empirical Results and Analysis , booktitle =

Wenhao Zhu and Hongyi Liu and Qingxiu Dong and Jingjing Xu and Shujian Huang and Lingpeng Kong and Jiajun Chen and Lei Li , editor =. Multilingual Machine Translation with Large Language Models: Empirical Results and Analysis , booktitle =. 2024 , url =. doi:10.18653/V1/2024.FINDINGS-NAACL.176 , timestamp =

work page doi:10.18653/v1/2024.findings-naacl.176 2024

[28] [29]

Findings of the

Tom Kocmi and Eleftherios Avramidis and Rachel Bawden and Ondrej Bojar and Anton Dvorkovich and Christian Federmann and Mark Fishel and Markus Freitag and Thamme Gowda and Roman Grundkiewicz and Barry Haddow and Marzena Karpinska and Philipp Koehn and Benjamin Marie and Christof Monz and Kenton Murray and Masaaki Nagata and Martin Popel and Maja Popovic a...

work page doi:10.18653/v1/2024.wmt-1.1 2024

[29] [30]

Jianhui Pang and Fanghua Ye and Derek Fai Wong and Dian Yu and Shuming Shi and Zhaopeng Tu and Longyue Wang , title =. Trans. Assoc. Comput. Linguistics , volume =. 2025 , url =. doi:10.1162/TACL\_A\_00730 , timestamp =

work page internal anchor Pith review doi:10.1162/tacl 2025

[30] [31]

https://arxiv.org/ abs/2301.08745

Wenxiang Jiao and Wenxuan Wang and Jen. Is ChatGPT. CoRR , volume =. 2023 , url =. doi:10.48550/ARXIV.2301.08745 , eprinttype =. 2301.08745 , timestamp =

work page doi:10.48550/arxiv.2301.08745 2023

[31] [32]

Multicultural Education Review , volume =

Cong Lin and Liz Jackson , title =. Multicultural Education Review , volume =. 2021 , publisher =

work page 2021

[32] [33]

Tencent Minority-Mandarin Translation System , booktitle =

Hu, Bojie and Han, Ambyer and Zhang, Zheyang and Huang, Shen and Ju, Qi , editor =. Tencent Minority-Mandarin Translation System , booktitle =. 2019 , pages =

work page 2019

[33] [34]

2025 , eprint=

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning , author=. 2025 , eprint=

work page 2025

[34] [35]

2025 , eprint=

FastCuRL: Curriculum Reinforcement Learning with Stage-wise Context Scaling for Efficient Training R1-like Reasoning Models , author=. 2025 , eprint=

work page 2025

[35] [36]

DeepCoder: A Fully Open-Source 14B Coder at O3-mini Level , author=

work page

[36] [37]

2025 , eprint=

Kimi K2: Open Agentic Intelligence , author=. 2025 , eprint=

work page 2025

[37] [38]

2025 , eprint=

MT-R1-Zero: Advancing LLM-based Machine Translation via R1-Zero-like Reinforcement Learning , author=. 2025 , eprint=

work page 2025

[38] [39]

2025 , eprint=

SSR-Zero: Simple Self-Rewarding Reinforcement Learning for Machine Translation , author=. 2025 , eprint=

work page 2025

[39] [40]

2025 , eprint=

TAT-R1: Terminology-Aware Translation with Reinforcement Learning and Word Alignment , author=. 2025 , eprint=

work page 2025

[40] [41]

MMLU-Pro:

Yubo Wang and Xueguang Ma and Ge Zhang and Yuansheng Ni and Abhranil Chandra and Shiguang Guo and Weiming Ren and Aaran Arulraj and Xuan He and Ziyan Jiang and Tianle Li and Max Ku and Kai Wang and Alex Zhuang and Rongqi Fan and Xiang Yue and Wenhu Chen , editor =. MMLU-Pro:. Advances in Neural Information Processing Systems 38: Annual Conference on Neura...

work page 2024

[41] [42]

SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines

M. SuperGPQA: Scaling. CoRR , volume =. 2025 , url =. doi:10.48550/ARXIV.2502.14739 , eprinttype =. 2502.14739 , timestamp =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2502.14739 2025

[42] [43]

The Eleventh International Conference on Learning Representations,

Freda Shi and Mirac Suzgun and Markus Freitag and Xuezhi Wang and Suraj Srivats and Soroush Vosoughi and Hyung Won Chung and Yi Tay and Sebastian Ruder and Denny Zhou and Dipanjan Das and Jason Wei , title =. The Eleventh International Conference on Learning Representations,. 2023 , url =

work page 2023

[43] [44]

GPQA: A Graduate-Level Google-Proof Q&A Benchmark

David Rein and Betty Li Hou and Asa Cooper Stickland and Jackson Petty and Richard Yuanzhe Pang and Julien Dirani and Julian Michael and Samuel R. Bowman , title =. CoRR , volume =. 2023 , url =. doi:10.48550/ARXIV.2311.12022 , eprinttype =. 2311.12022 , timestamp =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2311.12022 2023

[44] [45]

Training Verifiers to Solve Math Word Problems

Karl Cobbe and Vineet Kosaraju and Mohammad Bavarian and Mark Chen and Heewoo Jun and Lukasz Kaiser and Matthias Plappert and Jerry Tworek and Jacob Hilton and Reiichiro Nakano and Christopher Hesse and John Schulman , title =. CoRR , volume =. 2021 , url =. 2110.14168 , timestamp =

work page internal anchor Pith review Pith/arXiv arXiv 2021

[45] [46]

9th International Conference on Learning Representations,

Dan Hendrycks and Collin Burns and Steven Basart and Andy Zou and Mantas Mazeika and Dawn Song and Jacob Steinhardt , title =. 9th International Conference on Learning Representations,. 2021 , url =

work page 2021

[46] [47]

MultiPL-E:

Federico Cassano and John Gouwar and Daniel Nguyen and Sydney Nguyen and Luna Phipps. MultiPL-E:. 2023 , url =. doi:10.1109/TSE.2023.3267446 , timestamp =

work page doi:10.1109/tse.2023.3267446 2023

[47] [48]

CRUXEval:

Alex Gu and Baptiste Rozi. CRUXEval:. Forty-first International Conference on Machine Learning,. 2024 , url =

work page 2024

[48] [49]

Angelika Romanou and Negar Foroutan and Anna Sotnikova and Zeming Chen and Sree Harsha Nelaturu and Shivalika Singh and Rishabh Maheshwary and Micol Altomare and Mohamed A. Haggag and Imanol Schlag and Marzieh Fadaee and Sara Hooker and Antoine Bosselut and Snegha A and Alfonso Amayuelas and Azril Hafizi Amirudin and Viraat Aryabumi and Danylo Boiko and M...

work page 2025

[49] [50]

2022 , eprint=

No Language Left Behind: Scaling Human-Centered Machine Translation , author=. 2022 , eprint=

work page 2022

[50] [51]

2502.12404 , archivePrefix=

Daniel Deutsch and Eleftheria Briakou and Isaac Caswell and Mara Finkelstein and Rebecca Galor and Juraj Juraska and Geza Kovacs and Alison Lui and Ricardo Rei and Jason Riesa and Shruti Rijhwani and Parker Riley and Elizabeth Salesky and Firas Trabelsi and Stephanie Winkler and Biao Zhang and Markus Freitag , year=. 2502.12404 , archivePrefix=

work page arXiv

[51] [52]

2023 , eprint=

xCOMET: Transparent Machine Translation Evaluation through Fine-grained Error Detection , author=. 2023 , eprint=

work page 2023

[52] [53]

Large Language Models Are State-of-the-Art Evaluators of Translation Quality

Kocmi, Tom and Federmann, Christian. Large Language Models Are State-of-the-Art Evaluators of Translation Quality. Proceedings of the 24th Annual Conference of the European Association for Machine Translation. 2023

work page 2023

[53] [54]

2022 , eprint=

CometKiwi: IST-Unbabel 2022 Submission for the Quality Estimation Shared Task , author=. 2022 , eprint=

work page 2022

[54] [55]

Rishabh Agarwal and Avi Singh and Lei Zhang and Bernd Bohnet and Luis Rosias and Stephanie C. Y. Chan and Biao Zhang and Ankesh Anand and Zaheer Abbas and Azade Nova and John D. Co. Many-Shot In-Context Learning , booktitle =. 2024 , url =

work page 2024

[55] [56]

Can Many-Shot In-Context Learning Help LLMs as Evaluators?

Mingyang Song and Mao Zheng and Xuan Luo , editor =. Can Many-Shot In-Context Learning Help LLMs as Evaluators?. Proceedings of the 31st International Conference on Computational Linguistics,. 2025 , url =

work page 2025

[56] [57]

s1: Simple test-time scaling

Niklas Muennighoff and Zitong Yang and Weijia Shi and Xiang Lisa Li and Li Fei. s1: Simple test-time scaling , journal =. 2025 , url =. doi:10.48550/ARXIV.2501.19393 , eprinttype =. 2501.19393 , timestamp =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2501.19393 2025

[57] [58]

2025 , eprint=

A Survey on Test-Time Scaling in Large Language Models: What, How, Where, and How Well? , author=. 2025 , eprint=

work page 2025

[58] [59]

2023 , eprint=

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models , author=. 2023 , eprint=

work page 2023

[59] [61]

Liu , title =

Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu , title =. arXiv e-prints , year =

work page

[60] [62]

A Monolingual Approach to Contextualized Word Embeddings for Mid-Resource Languages

Ortiz Su 'a rez, Pedro Javier and Romary, Laurent and Sagot, Benoit. A Monolingual Approach to Contextualized Word Embeddings for Mid-Resource Languages. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020

work page 2020

[61] [63]

Asynchronous pipelines for processing huge corpora on medium to low resource infrastructures , series =

Pedro Javier. Asynchronous pipelines for processing huge corpora on medium to low resource infrastructures , series =. doi:10.14618/ids-pub-9021 , url =

work page doi:10.14618/ids-pub-9021

[62] [64]

, author=

Parallel data, tools and interfaces in OPUS. , author=. Lrec , volume=

work page

[63] [65]

Proceedings of the First Conference on Machine Translation , month =

Buck, Christian and Koehn, Philipp , title =. Proceedings of the First Conference on Machine Translation , month =. 2016 , address =

work page 2016

[64] [66]

2025 , eprint=

RegMix: Data Mixture as Regression for Language Model Pre-training , author=. 2025 , eprint=

work page 2025

[65] [68]

In: Haddow, B., Kocmi, T., Koehn, P., Monz, C

Lavie, Alon and Hanneman, Greg and Agrawal, Sweta and Kanojia, Diptesh and Lo, Chi-Kiu and Zouhar, Vil \'e m and Blain, Frederic and Zerva, Chrysoula and Avramidis, Eleftherios and Deoghare, Sourabh and Sindhujan, Archchana and Wang, Jiayi and Adelani, David Ifeoluwa and Thompson, Brian and Kocmi, Tom and Freitag, Markus and Deutsch, Daniel. Findings of t...

work page doi:10.18653/v1/2025.wmt-1.24 2025

[66] [69]

A Survey on Quantization Methods for Optimization of Deep Neural Networks , year=

Kulkarni, Uday and Hosamani, Abhishek S and Masur, Abhishek S and Hegde, Shashank and Vernekar, Ganesh R and Siri Chandana, K , booktitle=. A Survey on Quantization Methods for Optimization of Deep Neural Networks , year=

work page

[67] [71]

Using a new analytic measure for the annotation and analysis of MT errors on real data

Lommel, Arle and Burchardt, Aljoscha and Popovi \'c , Maja and Harris, Kim and Avramidis, Eleftherios and Uszkoreit, Hans. Using a new analytic measure for the annotation and analysis of MT errors on real data. Proceedings of the 17th Annual Conference of the European Association for Machine Translation. 2014

work page 2014

[68] [72]

Results of the WMT 21 Metrics Shared Task: Evaluating Metrics with Expert-based Human Evaluations on TED and News Domain

Freitag, Markus and Rei, Ricardo and Mathur, Nitika and Lo, Chi-kiu and Stewart, Craig and Foster, George and Lavie, Alon and Bojar, Ond r ej. Results of the WMT 21 Metrics Shared Task: Evaluating Metrics with Expert-based Human Evaluations on TED and News Domain. Proceedings of the Sixth Conference on Machine Translation. 2021

work page 2021

[69] [73]

2025 , eprint=

HY-MT1.5 Technical Report , author=. 2025 , eprint=

work page 2025

[70] [74]

2025 , journal=

Generalizing Verifiable Instruction Following , author=. 2025 , journal=

work page 2025

[71] [76]

M a XIFE : Multilingual and Cross-lingual Instruction Following Evaluation

Liu, Yile and Ma, Ziwei and Jiang, Xiu and Hu, Jinglu and ChangJing, ChangJing and Li, Liang. M a XIFE : Multilingual and Cross-lingual Instruction Following Evaluation. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2025

work page 2025

[72] [78]

2026 , eprint=

OpenAI GPT-5 System Card , author=. 2026 , eprint=

work page 2026

[73] [79]

Introducing gemini 3

DeepMind. Introducing gemini 3. https://blog.google/products/gemini/gemini-3-collection/, 2025. Accessed: 2025-12-29

work page 2025

[74] [80]

Bitdistiller: Unleashing the potential of sub-4-bit llms via self-distillation

Dayou Du, Yijia Zhang, Shijie Cao, Jiaqi Guo, Ting Cao, Xiaowen Chu, and Ningyi Xu. Bitdistiller: Unleashing the potential of sub-4-bit llms via self-distillation. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp.\ 102--116, 2024

work page 2024

[75] [81]

Results of the WMT 21 metrics shared task: Evaluating metrics with expert-based human evaluations on TED and news domain

Markus Freitag, Ricardo Rei, Nitika Mathur, Chi-kiu Lo, Craig Stewart, George Foster, Alon Lavie, and Ond r ej Bojar. Results of the WMT 21 metrics shared task: Evaluating metrics with expert-based human evaluations on TED and news domain. In Loic Barrault, Ondrej Bojar, Fethi Bougares, Rajen Chatterjee, Marta R. Costa-jussa, Christian Federmann, Mark Fis...

work page 2021

[76] [82]

Multi-if: Benchmarking llms on multi-turn and multilingual instructions following

Yun He, Di Jin, Chaoqi Wang, Chloe Bi, Karishma Mandyam, Hejia Zhang, Chen Zhu, Ning Li, Tengyu Xu, Hongjiang Lv, et al. Multi-if: Benchmarking llms on multi-turn and multilingual instructions following. arXiv preprint arXiv:2410.15553, 2024

work page arXiv 2024

[77] [83]

Sherry: Hardware-efficient 1.25-bit ternary quantization via fine-grained sparsification

Hong Huang, Decheng Wu, Qiangqiang Hu, Guanghua Yu, Jinhai Yang, Jianchen Zhu, Xue Liu, and Dapeng Wu. Sherry: Hardware-efficient 1.25-bit ternary quantization via fine-grained sparsification. arXiv preprint arXiv:2601.07892, 2026

work page arXiv 2026

[78] [84]

Findings of the WMT 25 general machine translation shared task: Time to stop evaluating on easy test sets

Tom Kocmi, Ekaterina Artemova, Eleftherios Avramidis, Rachel Bawden, Ond r ej Bojar, Konstantin Dranch, Anton Dvorkovich, Sergey Dukanov, Mark Fishel, Markus Freitag, Thamme Gowda, Roman Grundkiewicz, Barry Haddow, Marzena Karpinska, Philipp Koehn, Howard Lakougna, Jessica Lundin, Christof Monz, Kenton Murray, Masaaki Nagata, Stefano Perrella, Lorenzo Pro...

work page doi:10.18653/v1/2025.wmt-1.22 2025

[79] [85]

M a XIFE : Multilingual and cross-lingual instruction following evaluation

Yile Liu, Ziwei Ma, Xiu Jiang, Jinglu Hu, ChangJing ChangJing, and Liang Li. M a XIFE : Multilingual and cross-lingual instruction following evaluation. In Wanxiang Che, Joyce Nabende, Ekaterina Shutova, and Mohammad Taher Pilehvar (eds.), Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp.\...

work page 2025

[80] [86]

Paretoq: Improving scaling laws in extremely low-bit llm quantization

Zechun Liu, Changsheng Zhao, Hanxian Huang, Sijia Chen, Jing Zhang, Jiawei Zhao, Scott Roy, Lisa Jin, Yunyang Xiong, Yangyang Shi, et al. Paretoq: Improving scaling laws in extremely low-bit llm quantization. Advances in Neural Information Processing Systems, 38: 0 91311--91336, 2026

work page 2026