CAT-Translate: Building Compact Open-Source Models for Japanese-English Translation
Pith reviewed 2026-06-26 14:25 UTC · model grok-4.3
The pith
Compact models specialized for Japanese-English translation outperform large multilingual systems on real-world benchmarks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors train compact language models on synthetically generated Japanese-English parallel corpora using two-stage supervised fine-tuning and Multi-Objective GRPO, then demonstrate that these models achieve stronger results than large multilingual translation systems on real-world benchmarks in business, legal, medical, financial, and patent domains.
What carries the argument
Two-stage supervised fine-tuning followed by Multi-Objective GRPO applied to synthetically generated parallel corpora to produce compact bidirectional Japanese-English translation models.
If this is right
- Specialized compact models can surpass much larger multilingual systems when the target is a single language pair and the evaluation uses real-world rather than academic test sets.
- Synthetic data generated at scale can support training that transfers effectively to professional translation tasks.
- Models under 7B parameters can deliver higher accuracy than larger general-purpose systems on domain-specific translation.
- Developers needing only Japanese-English support can obtain better practical performance by training a dedicated small model instead of relying on a general multilingual one.
Where Pith is reading between the lines
- Organizations that deploy translation in narrow domains may reduce inference costs by switching from large multilingual systems to smaller specialized ones.
- The gap between WMT and real-world results suggests that academic benchmarks may understate the value of domain adaptation.
- The same training recipe could be applied to other language pairs where high-quality synthetic data can be produced but authentic parallel text remains limited.
Load-bearing premise
The synthetically generated parallel corpora used for training are of high enough quality and domain coverage to produce models that generalize to real-world business, legal, medical, financial, and patent translations.
What would settle it
A side-by-side evaluation in which the compact models fail to exceed the multilingual models on a fresh collection of authentic Japanese-English translations drawn from the same professional domains.
Figures
read the original abstract
Nowadays, large multilingual translation models demonstrate impressive translation capabilities in the machine translation benchmarks. This raises a practical question to the developers: is it worth developing translation models specialized for a particular language pair if you only need to support that language pair? To give an anecdotal answer to this question, we develop a family of small language models (0.8B, 1.4B, 3.3B, and 7B parameters) specialized for Japanese-English bidirectional translation. We employ a two-stage supervised fine-tuning approach followed by Multi-Objective GRPO (Ichihara et al. 2025) to train models on synthetically generated parallel corpora. We evaluate our models on WMT and real-world translation benchmarks across business, legal, medical, financial, and patent domains. While multilingual models achieve strong performance on WMT benchmarks, our compact models outperform them on real-world benchmarks, suggesting the practical utility of developing specialized translation models even in the era of large multilingual models.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents a family of compact open-source models (0.8B, 1.4B, 3.3B, 7B parameters) for bidirectional Japanese-English translation. The models are trained via two-stage supervised fine-tuning followed by Multi-Objective GRPO on synthetically generated parallel corpora. Evaluation is reported on WMT benchmarks plus real-world sets spanning business, legal, medical, financial, and patent domains; the central claim is that the specialized compact models outperform large multilingual models on the real-world benchmarks (while multilingual models remain strong on WMT), demonstrating the practical value of language-pair specialization.
Significance. If substantiated with adequate validation, the result would indicate that compact, pair-specific models can deliver superior domain generalization for Japanese-English translation in practical settings compared with general multilingual systems. The open-source release of the model family would constitute a concrete, usable contribution to the field.
major comments (2)
- [Abstract] Abstract: the headline claim that compact models 'outperform' multilingual baselines on real-world business/legal/medical/financial/patent benchmarks is presented without any reported details on data-generation procedure, evaluation metrics, statistical significance testing, or exact baseline configurations. This information is load-bearing for the central empirical claim.
- [Abstract / Methods (inferred from training description)] The manuscript provides no quantitative validation (e.g., n-gram overlap, terminology fidelity, or human preference scores) that the synthetically generated parallel corpora match the distribution or quality of authentic in-domain references. Because the generalization argument rests entirely on these corpora, the absence of such checks directly affects the defensibility of the outperformance result.
Simulated Author's Rebuttal
We thank the referee for the detailed feedback on the presentation of our empirical claims. We address each major comment below and have made revisions to improve clarity and transparency where the manuscript can be strengthened without altering the core results.
read point-by-point responses
-
Referee: [Abstract] Abstract: the headline claim that compact models 'outperform' multilingual baselines on real-world business/legal/medical/financial/patent benchmarks is presented without any reported details on data-generation procedure, evaluation metrics, statistical significance testing, or exact baseline configurations. This information is load-bearing for the central empirical claim.
Authors: We agree that the abstract would benefit from additional context to make the central claim self-contained. In the revised manuscript, we have expanded the abstract to specify the primary metrics (BLEU and COMET), the exact multilingual baselines compared (NLLB-3.3B and Llama-3-8B variants), and that statistical significance was evaluated using paired bootstrap resampling. The data-generation procedure remains detailed in Section 3 but is now referenced concisely in the abstract. These changes address the load-bearing information without exceeding typical abstract length. revision: yes
-
Referee: [Abstract / Methods (inferred from training description)] The manuscript provides no quantitative validation (e.g., n-gram overlap, terminology fidelity, or human preference scores) that the synthetically generated parallel corpora match the distribution or quality of authentic in-domain references. Because the generalization argument rests entirely on these corpora, the absence of such checks directly affects the defensibility of the outperformance result.
Authors: We acknowledge that direct quantitative validation of the synthetic corpora (such as n-gram overlap or human preference scores against authentic references) is not reported. The manuscript relies on the downstream performance on real-world domain test sets as evidence of utility. In revision, we have added a dedicated paragraph in Section 3.2 describing the synthetic generation pipeline and its domain-specific prompting strategy, plus a limitations paragraph noting the absence of explicit fidelity metrics and identifying this as an area for future verification. We maintain that the end-to-end benchmark results remain informative, but agree the point merits explicit acknowledgment. revision: partial
Circularity Check
No circularity: empirical training/evaluation chain is independent of inputs
full rationale
The paper describes an empirical pipeline (two-stage SFT + Multi-Objective GRPO on synthetic corpora, followed by evaluation on WMT and domain-specific benchmarks). No equations, derivations, or first-principles claims exist that could reduce to fitted parameters or self-definitions by construction. The central result (compact models outperforming multilingual baselines on real-world benchmarks) is measured on held-out external test sets and does not rely on any load-bearing self-citation chain or ansatz smuggled via prior work. The Ichihara et al. 2025 citation is to a training method, not to a uniqueness theorem or result that the present paper's claims depend upon. This is the normal case of a self-contained empirical study.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Synthetic parallel corpora can train effective domain-specific translation models
Reference graph
Works this paper leans on
-
[1]
Sandhini Agarwal, Lama Ahmad, Jason Ai, Sam Altman, Andy Applebaum, Edwin Arbus, Rahul K Arora, Yu Bai, Bowen Baker, Haiming Bao, and 1 others. 2025. gpt-oss-120b & gpt-oss-20b model card. arXiv preprint arXiv:2508.10925
Pith/arXiv arXiv 2025
-
[2]
Roee Aharoni, Melvin Johnson, and Orhan Firat. 2019. https://doi.org/10.18653/v1/N19-1388 Massively multilingual neural machine translation . In Proceedings of the 2019 Conference of the North A merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers) , pages 3874--3884, Minneapolis, M...
-
[3]
Mikko Aulamo, Nikolay Bogoychev, Shaoxiong Ji, Graeme Nail, Gema Ram \'i rez-S \'a nchez, J \"o rg Tiedemann, Jelmer van der Linde, and Jaume Zaragoza. 2023. https://aclanthology.org/2023.eamt-1.61/ HPLT : High performance language technologies . In Proceedings of the 24th Annual Conference of the European Association for Machine Translation, pages 517--5...
2023
-
[4]
Mona Baker, Gill Francis, and Elena Tognini-Bonelli, editors. 1993. https://www.jbe-platform.com/content/books/9789027285874 Text and Technology . John Benjamins
arXiv 1993
-
[5]
Laurie Burchell, Ona de Gibert, Nikolay Arefyev, Mikko Aulamo, Marta Ba \ n \'o n, Pinzhen Chen, Mariia Fedorova, Liane Guillou, Barry Haddow, Jan Haji c , Jind r ich Helcl, Erik Henriksson, Mateusz Klimaszewski, Ville Komulainen, Andrey Kutuzov, Joona Kyt \"o niemi, Veronika Laippala, Petter M hlum, Bhavitvya Malik, and 16 others. 2025. https://doi.org/1...
-
[6]
Justin Cui, Wei-Lin Chiang, Ion Stoica, and Cho-Jui Hsieh. 2025 a . https://openreview.net/forum?id=CdFnEu0JZV OR -bench: An over-refusal benchmark for large language models . In Forty-second International Conference on Machine Learning
2025
-
[7]
Menglong Cui, Pengzhi Gao, Wei Liu, Jian Luan, and Bin Wang. 2025 b . https://doi.org/10.18653/v1/2025.naacl-long.280 Multilingual machine translation with open large language models at practical scale: An empirical study . In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human ...
-
[8]
Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Michael Auli, and Armand Joulin. 2021. http://jmlr.org/papers/v22/20-1307.html Beyond english-centric multilingual machine translation . Journa...
2021
-
[9]
Mara Finkelstein, Isaac Caswell, Tobias Domhan, Jan-Thorsten Peter, Juraj Juraska, Parker Riley, Daniel Deutsch, Geza Kovacs, Cole Dilanni, Colin Cherry, Eleftheria Briakou, Elizabeth Nielsen, Jiaming Luo, Kat Black, Ryan Mullins, Sweta Agrawal, Wenda Xu, Erin Kats, Stephane Jaskiewicz, and 2 others. 2026. Translate G emma T echnical R eport. arXiv prepri...
arXiv 2026
-
[10]
Markus Freitag, David Grangier, and Isaac Caswell. 2020. https://doi.org/10.18653/v1/2020.emnlp-main.5 BLEU might be guilty but references are not innocent . In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 61--71, Online. Association for Computational Linguistics
-
[11]
Markus Freitag, Nitika Mathur, Daniel Deutsch, Chi-Kiu Lo, Eleftherios Avramidis, Ricardo Rei, Brian Thompson, Frederic Blain, Tom Kocmi, Jiayi Wang, David Ifeoluwa Adelani, Marianna Buchicchio, Chrysoula Zerva, and Alon Lavie. 2024. https://doi.org/10.18653/v1/2024.wmt-1.2 Are LLM s breaking MT metrics? results of the WMT 24 metrics shared task . In Proc...
-
[12]
Leo Gao, John Schulman, and Jacob Hilton. 2023. https://proceedings.mlr.press/v202/gao23h.html Scaling L aws for R eward M odel O veroptimization . In Proceedings of the 40th International Conference on Machine Learning, volume 202 of Proceedings of Machine Learning Research, pages 10835--10866. PMLR
2023
-
[13]
Charles Goddard, Shamane Siriwardhana, Malikeh Ehghaghi, Luke Meyers, Vladimir Karpukhin, Brian Benedict, Mark McQuade, and Jacob Solawetz. 2024. https://doi.org/10.18653/v1/2024.emnlp-industry.36 Arcee ' s M erge K it: A toolkit for merging large language models . In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: ...
-
[14]
C. A. E. Goodhart. 1984. https://doi.org/10.1007/978-1-349-17295-5_4 Problems of Monetary Management: The UK Experience , pages 91--121. Macmillan Education UK, London
-
[15]
Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Peiyi Wang, Qihao Zhu, Runxin Xu, Ruoyu Zhang, Shirong Ma, Xiao Bi, Xiaokang Zhang, Xingkai Yu, Yu Wu, Z. F. Wu, Zhibin Gou, Zhihong Shao, Zhuoshu Li, Ziyi Gao, Aixin Liu, and 175 others. 2025. Deep S eek- R 1: I ncentivizing R easoning C apability in LLM s via R einforcement L earning. arXiv preprint arX...
Pith/arXiv arXiv 2025
-
[16]
Masanori Hirano and Kentaro Imashiro. 2025. 金融分野に特化した複数ターン日本語生成ベンチマークの構築 [translate: Development of a multi-turn japanese generation benchmark specialized in the financial sector]. In Proceedings of the Thirty-first Annual Meeting of the Association for Natural Language Processing
2025
-
[17]
Masanori Hirano, Kentaro Imashiro, Kento Nozawa, and Kaizaburo Nakahachi. 2025. https://doi.org/10.51094/jxiv.1461 Plamo translate: 翻訳特化大規模言語モデルの開発 [translate: Plamo translate: Development of a large-scale language model specialized for translation] . In Proceedings of the Thirty-first Annual Meeting of the Association for Natural Language Processing
-
[18]
Hieu Hoang and Philipp Koehn. 2008. https://aclanthology.org/W08-0510/ Design of the M oses decoder for statistical machine translation . In Software Engineering, Testing, and Quality Assurance for Natural Language Processing, pages 58--65, Columbus, Ohio. Association for Computational Linguistics
2008
-
[19]
Pin-Lun Hsu, Yun Dai, Vignesh Kothapalli, Qingquan Song, Shao Tang, Siyu Zhu, Steven Shimizu, Shivam Sahni, Haowen Ning, Yanning Chen, and Zhipeng Wang. 2025. https://openreview.net/forum?id=36SjAIT42G Liger-kernel: Efficient triton kernels for LLM training . In Championing Open-source DEvelopment in ML Workshop @ ICML25
2025
-
[20]
Yuki Ichihara, Yuu Jinnai, Tetsuro Morimura, Mitsuki Sakamoto, Ryota Mitsuhashi, and Eiji Uchibe. 2025. https://doi.org/10.48550/arXiv.2509.22047 M O-GRPO : M itigating R eward H acking of G roup R elative P olicy O ptimization on M ulti- O bjective P roblems . arXiv preprint arXiv:2509.22047
-
[21]
Bartoldson, Bhavya Kailkhura, Avi Schwarzschild, Aniruddha Saha, Micah Goldblum, Jonas Geiping, and Tom Goldstein
Neel Jain, Ping yeh Chiang, Yuxin Wen, John Kirchenbauer, Hong-Min Chu, Gowthami Somepalli, Brian R. Bartoldson, Bhavya Kailkhura, Avi Schwarzschild, Aniruddha Saha, Micah Goldblum, Jonas Geiping, and Tom Goldstein. 2024. https://openreview.net/forum?id=0bMmZ3fkCk NEFT une: Noisy embeddings improve instruction finetuning . In The Twelfth International Con...
2024
-
[22]
Junfeng Jiang, Jiahao Huang, and Akiko Aizawa. 2025. https://aclanthology.org/2025.coling-main.395/ JM ed B ench: A benchmark for evaluating J apanese biomedical large language models . In Proceedings of the 31st International Conference on Computational Linguistics, pages 5918--5935, Abu Dhabi, UAE. Association for Computational Linguistics
2025
-
[23]
Pratik Joshi, Sebastin Santy, Amar Budhiraja, Kalika Bali, and Monojit Choudhury. 2020. https://doi.org/10.18653/v1/2020.acl-main.560 The state and fate of linguistic diversity and inclusion in the NLP world . In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 6282--6293, Online. Association for Computational...
-
[24]
Armand Joulin, Edouard Grave, Piotr Bojanowski, and Tomas Mikolov. 2017. https://aclanthology.org/E17-2068/ Bag of tricks for efficient text classification . In Proceedings of the 15th Conference of the E uropean Chapter of the Association for Computational Linguistics: Volume 2, Short Papers , pages 427--431, Valencia, Spain. Association for Computationa...
2017
-
[25]
Juraj Juraska, Daniel Deutsch, Mara Finkelstein, and Markus Freitag. 2024. https://doi.org/10.18653/v1/2024.wmt-1.35 M etric X -24: The G oogle submission to the WMT 2024 metrics shared task . In Proceedings of the Ninth Conference on Machine Translation, pages 492--504, Miami, Florida, USA. Association for Computational Linguistics
-
[26]
Tom Kocmi, Ekaterina Artemova, Eleftherios Avramidis, Rachel Bawden, Ond r ej Bojar, Konstantin Dranch, Anton Dvorkovich, Sergey Dukanov, Mark Fishel, Markus Freitag, Thamme Gowda, Roman Grundkiewicz, Barry Haddow, Marzena Karpinska, Philipp Koehn, Howard Lakougna, Jessica Lundin, Christof Monz, Kenton Murray, and 10 others. 2025. https://doi.org/10.18653...
-
[27]
Tom Kocmi, Eleftherios Avramidis, Rachel Bawden, Ond r ej Bojar, Anton Dvorkovich, Christian Federmann, Mark Fishel, Markus Freitag, Thamme Gowda, Roman Grundkiewicz, Barry Haddow, Marzena Karpinska, Philipp Koehn, Benjamin Marie, Christof Monz, Kenton Murray, Masaaki Nagata, Martin Popel, Maja Popovi \'c , and 3 others. 2024. https://doi.org/10.18653/v1/...
-
[28]
Tom Kocmi and Christian Federmann. 2023 a . https://doi.org/10.18653/v1/2023.wmt-1.64 GEMBA - MQM : Detecting translation quality error spans with GPT -4 . In Proceedings of the Eighth Conference on Machine Translation, pages 768--775, Singapore. Association for Computational Linguistics
-
[29]
Tom Kocmi and Christian Federmann. 2023 b . https://aclanthology.org/2023.eamt-1.19/ Large language models are state-of-the-art evaluators of translation quality . In Proceedings of the 24th Annual Conference of the European Association for Machine Translation, pages 193--203, Tampere, Finland. European Association for Machine Translation
2023
-
[30]
Katherine Lee, Daphne Ippolito, Andrew Nystrom, Chiyuan Zhang, Douglas Eck, Chris Callison-Burch, and Nicholas Carlini. 2022. https://doi.org/10.18653/v1/2022.acl-long.577 Deduplicating training data makes language models better . In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 8424...
-
[31]
Youyuan Lin, Masaaki Nagata, and Chenhui Chu. 2024. Post-editing with error annotation for machine translation: Dataset construction using gpt-4. In Proceedings of the Thirtieth Annual Meeting of the Association for Natural Language Processing
2024
-
[32]
Zichen Liu, Changyu Chen, Wenjun Li, Penghui Qi, Tianyu Pang, Chao Du, Wee Sun Lee, and Min Lin. 2025. https://doi.org/10.48550/arXiv.2503.20783 Understanding R 1- Z ero-like training: A critical perspective . In Conference on Language Modeling (COLM)
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2503.20783 2025
-
[33]
Marco Lui and Timothy Baldwin. 2012. https://aclanthology.org/P12-3005/ langid.py: An off-the-shelf language identification tool . In Proceedings of the ACL 2012 System Demonstrations , pages 25--30, Jeju Island, Korea. Association for Computational Linguistics
2012
-
[34]
Makoto Morishita, Katsuki Chousa, Jun Suzuki, and Masaaki Nagata. 2022. https://aclanthology.org/2022.lrec-1.721/ JP ara C rawl v3.0: A large-scale E nglish- J apanese parallel corpus . In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 6704--6710, Marseille, France. European Language Resources Association
2022
-
[35]
Makoto Morishita, Jun Suzuki, and Masaaki Nagata. 2020. https://aclanthology.org/2020.lrec-1.443/ JP ara C rawl: A large scale web-based E nglish- J apanese parallel corpus . In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 3603--3609, Marseille, France. European Language Resources Association
2020
-
[36]
Toshiaki Nakazawa, Takashi Tsunakawa, Isao Goto, Kazuhiro Kasada, Katsuhito Sudoh, Shoichi Okuyama, Takashi Ieda, and Masaaki Nagata. 2025. https://doi.org/10.18653/v1/2025.wat-1.1 Findings of the first patent claims translation task at WAT 2025 . In Proceedings of the Twelfth Workshop on Asian Translation (WAT 2025), pages 1--15, Mumbai, India. Associati...
-
[37]
Dayy \'a n O ' Brien, Bhavitvya Malik, Ona de Gibert, Pinzhen Chen, Barry Haddow, and J \"o rg Tiedemann. 2025. https://doi.org/10.18653/v1/2025.wmt-1.17 D oc HPLT : A massively multilingual document-level translation dataset . In Proceedings of the Tenth Conference on Machine Translation, pages 286--300, Suzhou, China. Association for Computational Linguistics
-
[38]
Andrew Or, Apurva Jain, Daniel Vega-Myhre, Jesse Cai, Charles David Hernandez, Zhenrui Zheng, Driss Guessous, Vasiliy Kuznetsov, Christian Puhrsch, Mark Saroufim, Supriya Rao, Thien Tran, and Aleksandar Samardžić. 2025. Torchao: Pytorch-native training-to-serving model optimization. arXiv preprint arXiv:2507.16099
arXiv 2025
-
[39]
Alexander Pan, Kush Bhatia, and Jacob Steinhardt. 2022. https://openreview.net/forum?id=JYtwGwIL7ye The E ffects of R eward M isspecification: M apping and M itigating M isaligned M odels . In International Conference on Learning Representations
2022
-
[40]
Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. https://doi.org/10.3115/1073083.1073135 B leu: a method for automatic evaluation of machine translation . In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 311--318, Philadelphia, Pennsylvania, USA. Association for Computational Linguistics
-
[41]
Guilherme Penedo, Hynek Kydl \' c ek, Loubna Ben allal, Anton Lozhkov, Margaret Mitchell, Colin Raffel, Leandro Von Werra, and Thomas Wolf. 2024. https://openreview.net/forum?id=n6SCkn2QaG The fineweb datasets: Decanting the web for the finest text data at scale . In The Thirty-eight Conference on Neural Information Processing Systems Datasets and Benchma...
2024
-
[42]
Guerreiro, Ricardo Rei, and Andr \'e F
Jos \'e Pombal, Nuno M. Guerreiro, Ricardo Rei, and Andr \'e F. T. Martins. 2025 a . https://doi.org/10.1162/tacl.a.37 Adding chocolate to mint: Mitigating metric interference in machine translation . Transactions of the Association for Computational Linguistics, 13:1319--1339
-
[43]
Jos \'e Pombal, Dongkeun Yoon, Patrick Fernandes, Ian Wu, Seungone Kim, Ricardo Rei, Graham Neubig, and Andre Martins. 2025 b . M-prometheus: A suite of open multilingual LLM judges. In Second Conference on Language Modeling
2025
-
[44]
Matt Post. 2018. https://doi.org/10.18653/v1/W18-6319 A call for clarity in reporting BLEU scores . In Proceedings of the Third Conference on Machine Translation: Research Papers, pages 186--191, Brussels, Belgium. Association for Computational Linguistics
-
[45]
Ricardo Rei, Jos \'e G. C. de Souza, Duarte Alves, Chrysoula Zerva, Ana C Farinha, Taisiya Glushkova, Alon Lavie, Luisa Coheur, and Andr \'e F. T. Martins. 2022. https://doi.org/10.18653/v1/2022.wmt-1.52 COMET -22: Unbabel- IST 2022 submission for the metrics shared task . In Proceedings of the Seventh Conference on Machine Translation (WMT), pages 578--5...
-
[46]
Ricardo Rei, Craig Stewart, Ana C Farinha, and Alon Lavie. 2020. https://doi.org/10.18653/v1/2020.wmt-1.101 Unbabel ' s participation in the WMT 20 metrics shared task . In Proceedings of the Fifth Conference on Machine Translation, pages 911--920, Online. Association for Computational Linguistics
-
[47]
Mat \= i ss Rikters, Ryokan Ri, Tong Li, and Toshiaki Nakazawa. 2019. https://doi.org/10.18653/v1/D19-5204 Designing the business conversation corpus . In Proceedings of the 6th Workshop on Asian Translation, pages 54--61, Hong Kong, China. Association for Computational Linguistics
-
[48]
Paul R \"o ttger, Hannah Kirk, Bertie Vidgen, Giuseppe Attanasio, Federico Bianchi, and Dirk Hovy. 2024. https://doi.org/10.18653/v1/2024.naacl-long.301 XST est: A test suite for identifying exaggerated safety behaviours in large language models . In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Ling...
-
[49]
Holger Schwenk, Guillaume Wenzek, Sergey Edunov, Edouard Grave, Armand Joulin, and Angela Fan. 2021. https://doi.org/10.18653/v1/2021.acl-long.507 CCM atrix: Mining billions of high-quality parallel sentences on the web . In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference ...
-
[50]
NLLB Team. 2024. Scaling neural machine translation to 200 languages. Nature, 630(8018):841--846
2024
-
[51]
Seiko Yamagishi, Shunsuke Shindo, and Yusuke Miyao. 2025. 大規模言語モデルの法廷通訳への導入可能性の検証 [translate: An investigation into the feasibility of introducing large language models into court interpretation]. In Proceedings of the Thirty-first Annual Meeting of the Association for Natural Language Processing
2025
-
[52]
An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, and 1 others. 2025. Qwen3 technical report. arXiv preprint arXiv:2505.09388
Pith/arXiv arXiv 2025
-
[53]
Mao Zheng, Zheng Li, Tao Chen, Mingyang Song, and Di Wang. 2025 a . Hy-mt1.5 technical report. arXiv preprint arXiv:2512.24092
arXiv 2025
-
[54]
Mao Zheng, Zheng Li, Yang Du, Bingxin Qu, and Mingyang Song. 2025 b . https://doi.org/10.18653/v1/2025.wmt-1.36 Shy-hunyuan- MT at WMT 25 general machine translation shared task . In Proceedings of the Tenth Conference on Machine Translation, pages 607--613, Suzhou, China. Association for Computational Linguistics
-
[55]
Mao Zheng, Zheng Li, Bingxin Qu, Mingyang Song, Yang Du, Mingrui Sun, and Di Wang. 2025 c . Hunyuan-mt technical report. arXiv preprint arXiv:2509.05209
arXiv 2025
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.