Recognition: no theorem link
CHORUS: Effort-Aware Multi-Agent Human-AI Collaboration for Professional Translation
Pith reviewed 2026-05-15 20:57 UTC · model grok-4.3
The pith
A multi-agent AI system for professional translators reduces completion time by 33.8 percent, lowers cognitive effort, and improves final quality.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
CHORUS is a mixed-initiative, multi-agent translation system that incorporates MQM theory to support both the translation process and personal style as translators work. A formative study established the benefit of incorporating MQM and the need to adapt to each translator's idiosyncratic traits. A within-subject study with 30 licensed English-Chinese translators found that the system reduced completion time by 33.8%, lowered translators' cognitive effort, and improved final translation quality as measured by BLEU and COMET. Participants reported that issues became easier to inspect, repeated prompting dropped relative to single-agent systems, and the interface offered reflections on their habits.
What carries the argument
Multi-agent AI architecture that applies MQM theory for issue detection, adapts interfaces to individual translator traits, and minimizes repeated prompting while providing habit reflections.
If this is right
- Translators complete tasks in roughly two-thirds the time while maintaining or raising quality.
- Cognitive effort drops, which may reduce fatigue during extended professional sessions.
- Final outputs achieve higher automatic metric scores on BLEU and COMET.
- Translation issues become easier to locate and address during the workflow.
- Repeated prompting to AI agents decreases, freeing attention for higher-level decisions.
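BLEU, one of the two automatic metrics cited above, is a surface-overlap score: the geometric mean of clipped n-gram precisions times a brevity penalty. A minimal pure-Python sketch of sentence-level BLEU follows; the add-one smoothing is a simplification, and the paper's exact metric configuration is not stated in this review.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    # Multiset of n-grams in a token sequence.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(hypothesis, reference, max_n=4):
    """Sentence-level BLEU: geometric mean of clipped n-gram precisions
    times a brevity penalty; add-one smoothing avoids log(0)."""
    hyp, ref = hypothesis.split(), reference.split()
    log_prec = 0.0
    for n in range(1, max_n + 1):
        h, r = ngrams(hyp, n), ngrams(ref, n)
        # Clip each hypothesis n-gram count at its reference count.
        clipped = sum(min(count, r[g]) for g, count in h.items())
        total = max(sum(h.values()), 1)
        log_prec += math.log((clipped + 1) / (total + 1)) / max_n
    brevity = min(1.0, math.exp(1 - len(ref) / max(len(hyp), 1)))
    return brevity * math.exp(log_prec)
```

Under this smoothing even identical strings score just below 1.0; production evaluations typically use a corpus-level implementation from a standard toolkit, and COMET instead scores with a trained neural model.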
Where Pith is reading between the lines
- Similar multi-agent structures could transfer to other expert domains that require process accountability, such as legal editing or technical documentation.
- Trait adaptation suggests that future tools could track and reflect user patterns across sessions to build long-term personalization.
- Lowered effort might enable translators to accept higher volumes or more complex projects without quality loss.
- Reduced prompting repetition points to efficiency gains in any human-AI loop where iteration currently consumes time.
Load-bearing premise
Incorporating MQM theory improves professional translation outcomes, and the system can effectively adapt to the idiosyncratic traits of each translator identified in the formative study.
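The MQM half of this premise can be made concrete. MQM scores a translation analytically: annotators mark each error with a category and severity, and a quality score is derived from the severity-weighted error count, normalized by text length. A minimal sketch using commonly cited severity weights; the weights, categories, and per-100-words scoring model here are illustrative conventions, not the paper's actual MQM configuration.

```python
# Illustrative MQM-style analytic scoring. The severity weights and the
# per-100-words model are common conventions, not the paper's exact setup.
SEVERITY_WEIGHTS = {"minor": 1, "major": 5, "critical": 10}

def mqm_score(errors, word_count, max_score=100.0):
    """Quality score: max_score minus the severity-weighted error
    penalty, normalized per 100 words of evaluated text."""
    penalty = sum(SEVERITY_WEIGHTS[severity] for _category, severity in errors)
    return max_score - (penalty * 100.0) / word_count

# Two annotated errors in a 120-word passage:
errors = [("mistranslation", "major"), ("terminology", "minor")]
score = mqm_score(errors, word_count=120)  # 100 - 6 * (100/120) = 95.0
```

The referee's first major comment is precisely that no such analytic score (or other human scoring) is reported in the evaluation, only BLEU and COMET.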
What would settle it
A replication study with licensed translators that finds no significant reduction in completion time, no drop in reported cognitive effort, and no improvement in BLEU or COMET scores when CHORUS is compared to standard single-agent translation tools.
Original abstract
Despite the widespread use of automatic AI translation systems in daily language tasks, professional translation remains crucial in domain-specific and high-stakes scenarios. Yet professional translators rarely rely on these systems in their everyday practice due to a lack of detailed support for the translation process, matching professional styles, and accountability for the final outcome. To bridge the gap, we present CHORUS, a mixed-initiative translation system that supports the translation process and personal style as translators work. A formative study found that incorporating MQM theory may be beneficial for achieving professional translation, and that the system should adapt to each individual translator's idiosyncratic traits. The final within-subject study with 30 licensed English-Chinese translators found that our system reduced completion time by 33.8%, lowered translators' cognitive effort, and improved final translation quality using the BLEU and COMET as automatic evaluation metrics. Participants' qualitative analysis also revealed that the system made translation issues easier to inspect, reduced repeated prompting compared to single-agent AI systems, and offered reflections on their habits and traits. Our findings illustrate how multi-agent AI systems can be designed to support expert workflows and their potential for professional use.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents CHORUS, a mixed-initiative multi-agent system for professional English-to-Chinese translation. Drawing on a formative study that identified the benefits of MQM theory and the need for personalization, the authors conduct a within-subject study with 30 licensed translators showing that CHORUS reduces completion time by 33.8%, lowers cognitive effort, and improves quality per BLEU and COMET metrics, along with qualitative benefits in issue inspection and reduced prompting.
Significance. If the empirical findings are robustly supported, this research would contribute meaningfully to HCI by illustrating how multi-agent AI can be integrated into expert professional workflows, offering time savings and reduced effort in high-stakes translation. The focus on licensed professionals and adaptation to individual traits adds practical relevance.
major comments (2)
- [Within-subject evaluation] The improvement in final translation quality is asserted based solely on automatic BLEU and COMET scores. However, the formative study highlighted MQM theory as beneficial, yet no MQM or human expert scoring is reported in the evaluation to validate the quality gains. Given that BLEU and COMET may not fully capture stylistic and domain-specific accuracy in professional English-Chinese translation, this undermines the central claim of quality improvement.
- [Results section] The abstract and reported results lack statistical details such as p-values, effect sizes, or descriptions of baselines and controls for the within-subject study with 30 participants. This makes it challenging to evaluate the significance of the 33.8% time reduction and other outcomes.
minor comments (2)
- [Abstract] The abstract mentions 'lowered translators' cognitive effort' without specifying the measurement method (e.g., NASA-TLX or similar).
- [Formative study] More details on how the system adapts to idiosyncratic traits could be provided to clarify the implementation.
Simulated Author's Rebuttal
We thank the referee for their constructive feedback, which helps us clarify the evaluation approach and strengthen the presentation of results. We address each major comment below with our response and planned revisions.
Point-by-point responses
Referee: [Within-subject evaluation] The improvement in final translation quality is asserted based solely on automatic BLEU and COMET scores. However, the formative study highlighted MQM theory as beneficial, yet no MQM or human expert scoring is reported in the evaluation to validate the quality gains. Given that BLEU and COMET may not fully capture stylistic and domain-specific accuracy in professional English-Chinese translation, this undermines the central claim of quality improvement.
Authors: We acknowledge that BLEU and COMET have limitations in capturing stylistic nuances and domain-specific accuracy for professional English-to-Chinese translation. The formative study used MQM insights to guide system design (e.g., issue detection support), but the main evaluation with 30 participants prioritized scalable automatic metrics alongside qualitative feedback on easier issue inspection and reduced prompting. We will revise the manuscript to add an explicit discussion of metric limitations, cite relevant literature on their correlation with human judgments in translation, and qualify the quality claims as supported by both automatic scores and participant reports. This will appear in the Results and Limitations sections. revision: partial
Referee: [Results section] The abstract and reported results lack statistical details such as p-values, effect sizes, or descriptions of baselines and controls for the within-subject study with 30 participants. This makes it challenging to evaluate the significance of the 33.8% time reduction and other outcomes.
Authors: We agree that the abstract and high-level results summary should include these details for clarity. The full Results section already reports paired t-tests, p-values, and effect sizes (e.g., for completion time and cognitive effort) with the single-agent AI condition as the within-subject baseline. We will revise the abstract to incorporate key statistical information and ensure the results summary explicitly describes the tests, effect sizes, and controls. These updates will be made in the next version. revision: yes
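The within-subject analysis the rebuttal describes (paired t-tests with effect sizes against a single-agent baseline) can be sketched in a few lines. The completion times below are hypothetical numbers for illustration, not the study's data.

```python
import math
from statistics import mean, stdev

def paired_t(baseline, treatment):
    """Paired t statistic (df = n - 1) and Cohen's d_z, the standard
    effect size for within-subject designs."""
    diffs = [b - t for b, t in zip(baseline, treatment)]
    n = len(diffs)
    d_mean, d_sd = mean(diffs), stdev(diffs)   # sample SD of the differences
    t_stat = d_mean / (d_sd / math.sqrt(n))
    cohens_dz = d_mean / d_sd
    return t_stat, cohens_dz

# Hypothetical completion times in minutes for five participants.
baseline_tool = [42.0, 51.0, 38.0, 47.0, 55.0]  # single-agent condition
chorus_tool = [28.0, 33.0, 27.0, 31.0, 35.0]    # CHORUS condition
t_stat, dz = paired_t(baseline_tool, chorus_tool)
```

A positive t with a large d_z would indicate a consistent per-participant time reduction; the p-value follows from the t distribution with n - 1 degrees of freedom, which is the information the referee asks to see surfaced in the abstract.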
Circularity Check
No circularity in empirical user study
Full rationale
The paper reports direct measurements from a within-subject study with 30 licensed translators (completion time reduced 33.8%, cognitive effort lowered, quality via BLEU/COMET) and a prior formative study. No equations, fitted parameters, derivations, or self-citation chains exist that reduce any claim to its inputs by construction. Outcomes are independent participant data, not statistical artifacts or renamed fits. This is a standard empirical HCI paper whose central results stand on external human-subject evidence.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption MQM theory is beneficial for achieving professional translation quality
- domain assumption The system should adapt to each individual translator's idiosyncratic traits