Smarter edits? Post-editing with error highlights and translation suggestions

Alina Karakanta; Andrea Camasta; Dora Žugčić; Fleur V.J. van Tellingen; Gautam Ranka; Joyce van der Wal; Livio Guerra

arxiv: 2605.21135 · v1 · pith:QH5Q2UDHnew · submitted 2026-05-20 · 💻 cs.CL

Fleur V.J. van Tellingen, Gautam Ranka, Dora Žugčić, Joyce van der Wal, Andrea Camasta, Livio Guerra, Alina Karakanta

Pith reviewed 2026-05-21 04:46 UTC · model grok-4.3

classification 💻 cs.CL

keywords post-editing · machine translation · error highlights · automatic post-editing · quality estimation · user experience · professional translators · LLM

The pith

Professional translators saw no productivity or quality gains from LLM error highlights or correction suggestions in post-editing, though they preferred automatic post-editing highlights over quality-estimation highlights and reported a better experience when suggestions were available.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether adding error highlights generated by large language models and correction suggestions from automatic post-editing can make machine translation post-editing faster, better, or more pleasant for professional translators. A controlled study had English-to-Dutch translators work under four conditions: plain post-editing, post-editing with quality-estimation highlights, post-editing with automatic post-editing highlights, and post-editing with both highlights and correction suggestions. Productivity and final translation quality stayed the same across all conditions. Translators rated the automatic post-editing highlights higher than quality-estimation ones and reported better overall experience when correction suggestions were available. These results matter because they show which interface features actually register with users even when objective metrics do not move.

Core claim

In a study with professional En-Nl translators, post-editing with APE error highlights and correction suggestions showed no productivity or quality gains compared to regular post-editing or QE-derived highlights, yet APE highlights were better received than QE highlights and correction suggestions improved user experience.

What carries the argument

A four-condition user study that measures productivity (time and edits), final quality, and subjective user-experience ratings while varying the source of error highlights and the presence of correction suggestions.

If this is right

Automatic post-editing highlights can be more acceptable to translators than quality-estimation highlights even when neither improves speed or quality.
Correction suggestions can raise subjective satisfaction with the post-editing interface without raising objective productivity.
Standard post-editing without extra highlights remains competitive on both speed and output quality.
User-experience measures should be tracked separately from productivity when evaluating new post-editing features.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Tool designers might try combining highlight sources or making suggestions more interactive to turn the observed experience gains into actual speed improvements.
The preference for APE highlights could stem from how closely they match the kinds of errors translators naturally notice.
Results might shift if the study moved to language pairs with very different error profiles or to translators with less experience.
Future experiments could test whether the same features affect revision behavior when translators work on longer documents or under time pressure.

Load-bearing premise

That the particular LLM-derived highlights and APE suggestions tested here would behave the same way in other real professional workflows, and that results from these En-Nl translators would hold for other language pairs or translator groups.

What would settle it

A replication study using different language pairs or different underlying models that finds measurable increases in words per minute or quality scores when the same highlights and suggestions are provided would show that the no-gain result does not generalize.

Figures

Figures reproduced from arXiv: 2605.21135 by Alina Karakanta, Andrea Camasta, Dora Žugčić, Fleur V.J. van Tellingen, Gautam Ranka, Joyce van der Wal, Livio Guerra.

**Figure 2.** SmartPE: Post-editing with error highlights and correction suggestions (S-APE). To test the ability of translators to identify critical errors, two critical errors were manually inserted in each text (negation, serious mistranslation, serious omission) before annotating the errors. Out of the 16 total inserted critical errors, only 11 were annotated by xCOMET and 10 by xTower. However, since we wanted …

**Figure 3.** Productivity per individual translation (PET) and as a group mean. The group mean is nearly flat across conditions, showing no productivity gains compared to regular PE. The results shown in the figures were confirmed statistically using one-way repeated measures ANOVA on log-transformed PET-level means (see Appendix 8), which revealed no significant effect of condition on productivity. We obs…

**Figure 4.** Final translation quality in terms of Direct Assessment scores per post-editor (PET) and group mean. Perceived effect on quality: When asked if the error highlights helped improve the quality of the translation, half of the translators (4) thought that the quality did improve, while the rest stated that the highlights made no difference. For suggestions, almost all translators (7) found that the correcti…

**Figure 5.** Differences in metrics between news (left) and biomedical (right) domains. … and 4.2, as productivity and quality did not show any statistically significant differences across domains (productivity 1.65 chr/s vs 1.71 chr/s and DA scores 83.8 vs 87.2 for news and biomedical respectively). Despite this, news and biomedical texts showed differences across several dimensions of the post-editing process, with…

**Figure 7.** Post-editing interface showing error annotations with suggestions. • Make sure you have a space where you can work without distractions. • Make sure to familiarise yourself with the interface before you start. • Join the Teams meeting. We will ask you to share your screen (only the interface window) and the meeting will be recorded. Workflow • Open the interface by double-clicking on the ‘main’ file in th…

**Figure 6.** Example of the post-editing interface showing error annotations with minor errors highlighted in yellow and major errors in orange. 3. Post-editing with error annotations and suggestions: Hovering the mouse over highlighted text will show a translation suggestion in a black box above the highlight. To adopt the suggestion, click on the black box. This will substitute the highlighted text with the translati…
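
The Figure 3 caption mentions a one-way repeated measures ANOVA on log-transformed productivity means. As a rough illustration of that kind of test, and not the authors' analysis code, the sketch below computes the F statistic in plain Python. The function name `rm_anova_f` and every productivity value are hypothetical. Turning F into a p-value would additionally require the F distribution, which is left out here.

```python
import math


def rm_anova_f(data):
    """One-way repeated-measures ANOVA F statistic.

    data: list of rows, one row per participant, one value per condition.
    Returns (F, df_condition, df_error).
    """
    n = len(data)        # number of participants
    k = len(data[0])     # number of conditions
    grand = sum(sum(row) for row in data) / (n * k)
    cond_means = [sum(row[j] for row in data) / n for j in range(k)]
    subj_means = [sum(row) / k for row in data]

    # Partition total variability into condition, participant and error parts.
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_error = ss_total - ss_cond - ss_subj

    df_cond = k - 1
    df_error = (k - 1) * (n - 1)
    f_stat = (ss_cond / df_cond) / (ss_error / df_error)
    return f_stat, df_cond, df_error


# Invented productivity values (source characters per second):
# one row per translator, one column per post-editing condition.
speeds = [
    [1.60, 1.72, 1.55, 1.68],
    [1.20, 1.15, 1.31, 1.22],
    [2.05, 1.98, 2.10, 1.95],
    [1.45, 1.52, 1.40, 1.58],
]
log_speeds = [[math.log(x) for x in row] for row in speeds]
f_stat, df_cond, df_error = rm_anova_f(log_speeds)
print(f"F({df_cond}, {df_error}) = {f_stat:.3f}")
```

The log transform is applied before the test because speed-like measures are typically right-skewed; a small F relative to its degrees of freedom is what a flat group mean across conditions would look like.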

Original abstract

As MT quality increases, interest in enhanced post-editing features such as QE-derived error highlights is growing, yet evidence for their usefulness remains limited. In this work, we explore the usefulness of LLM-derived error highlights and correction suggestions based on automatic post-editing (APE). We conduct a study where professional translators (En-Nl) post-edit translations using APE error highlights and correction suggestions and compare productivity, quality and user experience to regular PE and PE with QE-derived highlights. While no condition yielded productivity or quality gains compared to regular PE, APE highlights were better received than QE-derived highlights, and correction suggestions improved overall user experience.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This study finds no productivity or quality gains from LLM-based post-editing aids, but reports that translators preferred APE highlights to QE highlights and had a better experience when correction suggestions were available.

This paper's main takeaway is that adding LLM-derived error highlights and correction suggestions based on automatic post-editing did not improve productivity or quality over regular post-editing for professional translators. However, the translators preferred the APE highlights to QE-derived ones, and the suggestions made the overall experience better. The new part is the head-to-head test of these specific assistance types in an actual En-Nl professional workflow. The paper does well by running a user study with clear measures and by reporting the null results on the important productivity and quality metrics without overclaiming. That kind of honest reporting helps the field avoid chasing features that don't deliver on speed or accuracy. The soft spots are the missing details on sample size, statistical tests, and how the highlights and suggestions were actually generated. Without those, the UX preference could be specific to this LLM setup or this group of translators rather than a general pattern. The concern about generalization is reasonable here because the abstract gives no numbers on highlight accuracy or acceptance rates. This work is for people who build or study translation post-editing interfaces. Readers interested in what translators actually respond to in practice will find it worthwhile. It deserves a serious referee since it brings empirical evidence from a controlled study even if the core productivity claim is negative. I recommend sending it for peer review. The study adds to the evidence on these tools in a straightforward way.

Referee Report

3 major / 1 minor

Summary. This paper reports results from a controlled user study with professional translators performing English-to-Dutch post-editing. It compares four conditions: standard post-editing, post-editing with quality-estimation-derived error highlights, post-editing with automatic-post-editing-derived error highlights, and post-editing with automatic-post-editing-derived highlights plus correction suggestions. Productivity, final translation quality, and user-experience measures are reported. The central findings are that none of the enhanced conditions produced productivity or quality gains relative to standard post-editing, yet APE-derived highlights were rated more favorably than QE-derived highlights and the addition of correction suggestions improved overall user experience.

Significance. If the empirical results hold under broader conditions, the work supplies useful negative evidence on productivity and quality gains from current LLM-based post-editing aids while documenting positive effects on translator satisfaction. Such findings are relevant for MT tool design and for HCI research on translation workflows, indicating that user-experience considerations may matter more for adoption than raw efficiency metrics. The head-to-head comparison of APE versus QE signals is timely given the rapid integration of LLMs into translation pipelines.

major comments (3)

[Methods] Methods section: The generation procedures for APE error highlights and correction suggestions are described at a high level but without any quantitative assessment of their intrinsic quality (e.g., highlight precision/recall against human error annotations or suggestion acceptance rates during the study). This omission makes it difficult to attribute the reported UX preference for APE over QE to the underlying signal type rather than to incidental differences in the quality of the particular LLM outputs used.
[Results] Results section: The null findings on productivity and quality are presented without accompanying effect sizes, confidence intervals, or power analysis. Given that user studies with professional translators often involve modest sample sizes, the absence of these statistics leaves open the possibility that meaningful differences were simply undetected.
[Discussion] Discussion section: The claim that APE highlights are better received than QE-derived highlights is framed as a general advantage, yet the study is restricted to a single language pair (En-Nl) and a specific set of LLM prompts and models. The paper should explicitly discuss the risk that the observed preference is implementation- or domain-specific and outline concrete steps (additional language pairs, alternative models, or ablation of prompt components) that would be needed to test broader applicability.
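
The first major comment asks for highlight precision and recall against human error annotations. One simple way to compute these is at the character level, treating each highlight as a half-open `(start, end)` span. The sketch below assumes that representation; the function names and the example spans are illustrative only, and the paper may define overlap differently.

```python
def span_chars(spans):
    """Expand half-open (start, end) character spans into a set of positions."""
    return {i for start, end in spans for i in range(start, end)}


def highlight_precision_recall(predicted, gold):
    """Character-level precision and recall of predicted error highlights.

    predicted: spans flagged by the system (e.g. QE- or APE-derived).
    gold: spans marked by human annotators.
    """
    pred = span_chars(predicted)
    ref = span_chars(gold)
    overlap = len(pred & ref)
    precision = overlap / len(pred) if pred else 0.0
    recall = overlap / len(ref) if ref else 0.0
    return precision, recall


# Invented example: the system flags two spans, annotators mark one.
system_spans = [(0, 4), (10, 14)]
human_spans = [(2, 6)]
p, r = highlight_precision_recall(system_spans, human_spans)
print(f"precision={p:.2f} recall={r:.2f}")
```

Reporting both numbers per highlight source would help separate "APE highlights are a better kind of signal" from "this particular APE output happened to be more accurate".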

minor comments (1)

[Results] Table 2 or the corresponding results table: Ensure that all condition labels are fully spelled out in the caption so that readers can map them directly to the four experimental arms without cross-referencing the text.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback. The comments highlight important aspects of methodological transparency, statistical reporting, and generalizability that we will address in the revision. We respond to each major comment below.

Referee: [Methods] Methods section: The generation procedures for APE error highlights and correction suggestions are described at a high level but without any quantitative assessment of their intrinsic quality (e.g., highlight precision/recall against human error annotations or suggestion acceptance rates during the study). This omission makes it difficult to attribute the reported UX preference for APE over QE to the underlying signal type rather than to incidental differences in the quality of the particular LLM outputs used.

Authors: We agree that quantitative assessment of the generated highlights and suggestions would strengthen attribution of the UX differences. Our primary focus was the user study outcomes rather than intrinsic system evaluation, and we did not obtain separate human error annotations for precision/recall. However, we did log suggestion acceptance rates during the sessions. In the revised manuscript we will report these acceptance rates and add a brief discussion of how they relate to the observed UX preference. We will also clarify the generation procedures with additional implementation details. revision: partial
Referee: [Results] Results section: The null findings on productivity and quality are presented without accompanying effect sizes, confidence intervals, or power analysis. Given that user studies with professional translators often involve modest sample sizes, the absence of these statistics leaves open the possibility that meaningful differences were simply undetected.

Authors: We accept this point. In the revised results section we will report effect sizes (Cohen’s d) and 95% confidence intervals for all key comparisons. A prospective power analysis was not performed because the study was exploratory and constrained by the limited availability of professional translators; we will add a post-hoc discussion of achieved power and the implications for detecting small-to-medium effects given our sample size. revision: yes
Referee: [Discussion] Discussion section: The claim that APE highlights are better received than QE-derived highlights is framed as a general advantage, yet the study is restricted to a single language pair (En-Nl) and a specific set of LLM prompts and models. The paper should explicitly discuss the risk that the observed preference is implementation- or domain-specific and outline concrete steps (additional language pairs, alternative models, or ablation of prompt components) that would be needed to test broader applicability.

Authors: We agree that the current framing risks over-generalization. In the revised discussion we will explicitly state the limitations of the single En-Nl pair, the chosen models, and prompt design. We will also add a dedicated paragraph outlining concrete next steps: replication with at least two additional language pairs, comparison with alternative LLMs, and systematic prompt ablations to isolate which components drive the preference. revision: yes
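
The effect sizes and confidence intervals promised in the second response could be computed along these lines for a paired comparison between two conditions. This is a sketch under stated assumptions: it uses the paired-samples variant of Cohen's d (often written d_z), the function name `paired_effect` is hypothetical, the data are invented, and the critical t value is supplied by the caller rather than looked up.

```python
import math
import statistics


def paired_effect(cond_a, cond_b, t_crit):
    """Paired-samples effect size and confidence interval.

    cond_a, cond_b: per-participant scores in two conditions, same order.
    t_crit: two-sided critical t value for the desired confidence level
            and len(cond_a) - 1 degrees of freedom.
    Returns (d_z, (ci_low, ci_high)) where the interval is for the mean difference.
    """
    diffs = [a - b for a, b in zip(cond_a, cond_b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)      # sample standard deviation
    d_z = mean_diff / sd_diff              # Cohen's d for paired data
    half_width = t_crit * sd_diff / math.sqrt(n)
    return d_z, (mean_diff - half_width, mean_diff + half_width)


# Invented productivity values (chr/s) for eight translators in two conditions.
regular_pe = [1.60, 1.20, 2.05, 1.45, 1.80, 1.35, 1.95, 1.50]
ape_highlights = [1.55, 1.31, 2.10, 1.40, 1.76, 1.42, 1.90, 1.58]

# 2.365 is the two-sided 95% critical t value for 7 degrees of freedom.
d_z, (low, high) = paired_effect(ape_highlights, regular_pe, t_crit=2.365)
print(f"d_z={d_z:.2f}, 95% CI for mean difference=({low:.3f}, {high:.3f})")
```

An interval that is narrow and centred near zero supports a genuine null; a wide interval that includes both zero and practically relevant gains only says the study could not tell.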

Circularity Check

0 steps flagged

No significant circularity: empirical user study with direct measurements

The paper reports an empirical user study comparing post-editing conditions (regular PE, QE highlights, APE highlights, and APE highlights plus suggestions) on productivity, quality, and UX metrics collected from professional En-Nl translators. No derivation chain, equations, fitted parameters renamed as predictions, or first-principles results exist that could reduce to inputs by construction. Claims rest on observed experimental outcomes rather than self-definitional loops or load-bearing self-citations. The work is self-contained against its own study data and does not invoke uniqueness theorems or ansatzes from prior author work to force conclusions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claims rest on standard assumptions of user-study methodology rather than new mathematical derivations. No free parameters, invented entities, or non-standard axioms are introduced in the abstract.

axioms (1)

domain assumption: Professional translators' self-reported experience and measured productivity accurately reflect real-world post-editing performance.
The study design assumes that the recruited participants and task conditions generalize to professional practice.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: "While no condition yielded productivity or quality gains compared to regular PE, APE highlights were better received than QE-derived highlights, and correction suggestions improved overall user experience."

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: "We compare productivity, measured as the average number of source characters processed over the text-level edit time"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.


work page 2016
[47]

Briakou, Eleftheria, Jiaming Luo, Colin Cherry, and Markus Freitag. 2024. Translating step-by-step: Decomposing the translation process for improved translation quality of long-form texts. In Haddow, Barry, Tom Kocmi, Philipp Koehn, and Christof Monz, editors, Proceedings of the Ninth Conference on Machine Translation , pages 1301--1317, Miami, Florida, U...

work page 2024
[48]

Briva-Iglesias, Vicent, Sharon O’Brien, and Benjamin R Cowan. 2023. The impact of traditional and interactive post-editing on machine translation user experience, quality, and productivity. Translation, Cognition & Behavior , 6(1):60--86

work page 2023
[49]

Briva-Iglesias, Vicent. 2025a. Are AI agents the new machine translation frontier? challenges and opportunities of single- and multi-agent systems for multilingual digital communication. In Bouillon, Pierrette, Johanna Gerlach, Sabrina Girletti, Lise Volkart, Raphael Rubino, Rico Sennrich, Ana C. Farinha, Marco Gaido, Joke Daems, Dorothy Kenny, Helena Mon...

work page
[50]

Briva-Iglesias, Vicent. 2025b. Human-centered, augmented machine translation: analysing user experience, quality and productivity in interactive post-editing vs traditional post-editing. Tradum \`a tica tecnologies de la traducci \'o , (23):350--382

work page
[51]

Béchara, Hannah, Constantin Orăsan, Carla Parra Escartín, Marcos Zampieri, and William Lowe. 2021. The role of machine translation quality estimation in the post-editing workflow. Informatics , 8(3)

work page 2021
[52]

Chatterjee, Rajen, Matteo Negri, Marco Turchi, Fr \'e d \'e ric Blain, and Lucia Specia. 2018. Combining quality estimation and automatic post-editing to enhance machine translation output. In Cherry, Colin and Graham Neubig, editors, Proceedings of the 13th Conference of the Association for Machine Translation in the A mericas (Volume 1: Research Track) ...

work page 2018
[53]

Coppers, Sven, Jan Van den Bergh, Kris Luyten, Karin Coninx, Iulianna van der Lek-Ciudin, Tom Vanallemeersch, and Vincent Vandeghinste. 2018. Intellingo: An intelligible translation environment. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems , CHI '18, page 1–13, New York, NY, USA. Association for Computing Machinery

work page 2018
[54]

Deoghare, Sourabh, Diptesh Kanojia, Fred Blain, Tharindu Ranasinghe, and Pushpak Bhattacharyya. 2023. Quality estimation-assisted automatic post-editing. In Bouamor, Houda, Juan Pino, and Kalika Bali, editors, Findings of the Association for Computational Linguistics: EMNLP 2023 , pages 1686--1698, Singapore, December. Association for Computational Linguistics

work page 2023
[55]

Deoghare, Sourabh, Diptesh Kanojia, and Pushpak Bhattacharyya. 2025. Giving the old a fresh spin: Quality estimation-assisted constrained decoding for automatic post-editing. In Chiruzzo, Luis, Alan Ritter, and Lu Wang, editors, Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Huma...

work page 2025
[56]

Escart \' n, Carla Parra, Hanna B \'e chara, and Constantin Or a san. 2017. Questing for quality estimation a user study. The Prague Bulletin of Mathematical Linguistics

work page 2017
[57]

Fernandes, Patrick, Daniel Deutsch, Mara Finkelstein, Parker Riley, Andr \'e Martins, Graham Neubig, Ankush Garg, Jonathan Clark, Markus Freitag, and Orhan Firat. 2023. The devil is in the errors: Leveraging large language models for fine-grained machine translation evaluation. In Proceedings of the Eighth Conference on Machine Translation , pages 1066--1...

work page 2023
[58]

Graham, Yvette, Timothy Baldwin, Alistair Moffat, and Justin Zobel. 2017. Can machine translation systems be evaluated by the crowd alone. Natural Language Engineering , 23(1):3--30

work page 2017
[59]

Guerreiro, Nuno M., Ricardo Rei, Daan van Stigt, Luisa Coheur, Pierre Colombo, and André F. T. Martins. 2023. xcomet: Transparent machine translation evaluation through fine-grained error detection

work page 2023
[60]

Guerreiro, Nuno M., Ricardo Rei, Daan van Stigt, Luisa Coheur, Pierre Colombo, and André F. T. Martins. 2024. xcomet: Transparent Machine Translation Evaluation through Fine-grained Error Detection . Transactions of the Association for Computational Linguistics , 12:979--995, 09

work page 2024
[61]

Kepler, Fabio, Jonay Tr \'e nous, Marcos Treviso, Miguel Vera, and Andr \'e F. T. Martins. 2019. O pen K iwi: An open source framework for quality estimation. In Costa-juss \`a , Marta R. and Enrique Alfonseca, editors, Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations , pages 117--122, Florence...

work page 2019
[62]

Knowles, Rebecca, Marina Sanchez-Torron, and Philipp Koehn. 2019. A user study of neural interactive translation prediction. Machine Translation , 33(1):135--154

work page 2019
[63]

Kocmi, Tom and Christian Federmann. 2023a. GEMBA - MQM : Detecting translation quality error spans with GPT -4. In Koehn, Philipp, Barry Haddow, Tom Kocmi, and Christof Monz, editors, Proceedings of the Eighth Conference on Machine Translation , pages 768--775, Singapore, December. Association for Computational Linguistics

work page
[64]

Kocmi, Tom and Christian Federmann. 2023b. Large language models are state-of-the-art evaluators of translation quality. In Proceedings of the 24th Annual Conference of the European Association for Machine Translation , pages 193--203, Tampere, Finland, June. European Association for Machine Translation

work page
[65]

Kocmi, Tom, Vil \'e m Zouhar, Eleftherios Avramidis, Roman Grundkiewicz, Marzena Karpinska, Maja Popovi \'c , Mrinmaya Sachan, and Mariya Shmatova. 2024. Error span annotation: A balanced approach for human evaluation of machine translation. In Haddow, Barry, Tom Kocmi, Philipp Koehn, and Christof Monz, editors, Proceedings of the Ninth Conference on Mach...

work page 2024
[66]

Liu, Siqi, Guangrong Dai, and Dechao Li. 2025. Introducing quality estimation to machine translation post-editing workflow: An empirical study on its usefulness. In Bouillon, Pierrette, Johanna Gerlach, Sabrina Girletti, Lise Volkart, Raphael Rubino, Rico Sennrich, Ana C. Farinha, Marco Gaido, Joke Daems, Dorothy Kenny, Helena Moniz, and Sara Szoc, editor...

work page 2025
[67]

Lommel, Arle, Hans Uszkoreit, and Aljoscha Burchardt. 2014. Multidimensional quality metrics (mqm): A framework for declaring and describing translation quality metrics. Revista Tradumàtica: tecnologies de la traducció

work page 2014
[68]

Lu, Qingyu, Baopu Qiu, Liang Ding, Kanjian Zhang, Tom Kocmi, and Dacheng Tao. 2024. Error analysis prompting enables human-like translation evaluation in large language models. In Findings of the Association for Computational Linguistics: ACL 2024 , pages 8801--8816, Bangkok, Thailand, August. Association for Computational Linguistics

work page 2024
[69]

Macken, Lieve. 2024. Machine translation meets large language models: Evaluating C hat GPT ' s ability to automatically post-edit literary texts. In Vanroy, Bram, Marie-Aude Lefer, Lieve Macken, and Paola Ruffo, editors, Proceedings of the 1st Workshop on Creative-text Translation and Technology , pages 65--81, Sheffield, United Kingdom, June. European As...

work page 2024
[70]

Neves, Mariana, Cristian Grozea, Philippe Thomas, Roland Roller, Rachel Bawden, Aur \'e lie N \'e v \'e ol, Steffen Castle, Vanessa Bonato, Giorgio Maria Di Nunzio, Federica Vezzani, Maika Vicente Navarro, Lana Yeganova, and Antonio Jimeno Yepes. 2024. Findings of the WMT 2024 biomedical translation shared task: Test sets on abstract level. In Haddow, Bar...

work page 2024
[71]

Olohan, Maeve. 2011. Translators and translation technology: The dance of agency. Translation studies , 4(3):342--357

work page 2011
[72]

O’Brien, Sharon. 2024. Human-centered augmented translation: Against antagonistic dualisms. Perspectives , 32(3):391--406

work page 2024
[73]

Papineni, Kishore, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. B leu: a method for automatic evaluation of machine translation. In Isabelle, Pierre, Eugene Charniak, and Dekang Lin, editors, Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics , pages 311--318, Philadelphia, Pennsylvania, USA, July. Association for ...

work page 2002
[74]

Popovi \'c , Maja. 2015. chr F : character n-gram F -score for automatic MT evaluation. In Bojar, Ond r ej, Rajan Chatterjee, Christian Federmann, Barry Haddow, Chris Hokamp, Matthias Huck, Varvara Logacheva, and Pavel Pecina, editors, Proceedings of the Tenth Workshop on Statistical Machine Translation , pages 392--395, Lisbon, Portugal, September. Assoc...

work page 2015
[75]

Raunak, Vikas, Amr Sharaf, Yiren Wang, Hany Awadalla, and Arul Menezes. 2023. Leveraging GPT -4 for automatic translation post-editing. In Bouamor, Houda, Juan Pino, and Kalika Bali, editors, Findings of the Association for Computational Linguistics: EMNLP 2023 , pages 12009--12024, Singapore, December. Association for Computational Linguistics

work page 2023
[76]

Sarti, Gabriele, Vilém Zouhar, Grzegorz Chrupała, Ana Guerberof-Arenas, Malvina Nissim, and Arianna Bisazza. 2025. Qe4pe: Word-level quality estimation for human post-editing

work page 2025
[77]

Shenoy, Raksha, Nico Herbig, Antonio Kr \"u ger, and Josef van Genabith. 2021. Investigating the helpfulness of word-level quality estimation for post-editing machine translation output. In Moens, Marie-Francine, Xuanjing Huang, Lucia Specia, and Scott Wen-tau Yih, editors, Proceedings of the 2021 Conference on Empirical Methods in Natural Language Proces...

work page 2021
[78]

Teixeira, Carlos and Sharon O ' Brien. 2017. The impact of MT quality estimation on post-editing effort. In Yamada, Masaru and Mark Seligman, editors, Proceedings of Machine Translation Summit XVI: Commercial MT Users and Translators Track , pages 142--153, Nagoya Japan, September 18 – September 22

work page 2017
[79]

Terribile, Silvia. 2024. Productivity in the post-editing of neural machine translation: A mixed-methods analysis of speed and edits at Toppan Digital Language . Ph.D. thesis, The University of Manchester (United Kingdom)

work page 2024
[80]

Treviso, Marcos, Nuno M Guerreiro, Sweta Agrawal, Ricardo Rei, Jos \'e Pombal, Tania Vaz, Helena Wu, Beatriz Silva, Daan van Stigt, and Andr \'e FT Martins. 2024. xtower: A multilingual llm for explaining and correcting translation errors. arXiv preprint arXiv:2406.19482

work page arXiv 2024
[81]

Turchi, Marco, Matteo Negri, and Marcello Federico. 2015. MT quality estimation for computer-assisted translation: Does it really help? In Zong, Chengqing and Michael Strube, editors, Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: ...

work page 2015

Showing first 80 references.

[1] [1]

Keystroke Logging in Writing Research: Using Inputlog to Analyze Writing Processes , journal =

Leijten, Mariëlle and Van Waes, Luuk , year =. Keystroke Logging in Writing Research: Using Inputlog to Analyze Writing Processes , journal =

work page

[2] [2]

2023 , eprint=

xCOMET: Transparent Machine Translation Evaluation through Fine-grained Error Detection , author=. 2023 , eprint=

work page 2023

[3] [3]

In: Webber, B., Cohn, T., He, Y., Liu, Y

Rei, Ricardo and Stewart, Craig and Farinha, Ana C and Lavie, Alon. COMET : A Neural Framework for MT Evaluation. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2020. doi:10.18653/v1/2020.emnlp-main.213

work page doi:10.18653/v1/2020.emnlp-main.213 2020

[4] [5]

Large Language Models Are State-of-the-Art Evaluators of Translation Quality

Kocmi, Tom and Federmann, Christian. Large Language Models Are State-of-the-Art Evaluators of Translation Quality. Proceedings of the 24th Annual Conference of the European Association for Machine Translation. 2023

work page 2023

[5] [6]

In: Koehn, P., Haddow, B., Kocmi, T., Monz, C

Kocmi, Tom and Federmann, Christian. GEMBA - MQM : Detecting Translation Quality Error Spans with GPT -4. Proceedings of the Eighth Conference on Machine Translation. 2023. doi:10.18653/v1/2023.wmt-1.64

work page doi:10.18653/v1/2023.wmt-1.64 2023

[6] [7]

Error Span Annotation: A Balanced Approach for Human Evaluation of Machine Translation

Kocmi, Tom and Zouhar, Vil \'e m and Avramidis, Eleftherios and Grundkiewicz, Roman and Karpinska, Marzena and Popovi \'c , Maja and Sachan, Mrinmaya and Shmatova, Mariya. Error Span Annotation: A Balanced Approach for Human Evaluation of Machine Translation. Proceedings of the Ninth Conference on Machine Translation. 2024. doi:10.18653/v1/2024.wmt-1.131

work page doi:10.18653/v1/2024.wmt-1.131 2024

[7] [8]

2025 , eprint=

QE4PE: Word-level Quality Estimation for Human Post-Editing , author=. 2025 , eprint=

work page 2025

[8] [9]

Error Analysis Prompting Enables Human-Like Translation Evaluation in Large Language Models

Lu, Qingyu and Qiu, Baopu and Ding, Liang and Zhang, Kanjian and Kocmi, Tom and Tao, Dacheng. Error Analysis Prompting Enables Human-Like Translation Evaluation in Large Language Models. Findings of the Association for Computational Linguistics: ACL 2024. 2024. doi:10.18653/v1/2024.findings-acl.520

work page doi:10.18653/v1/2024.findings-acl.520 2024

[9] [10]

The Devil Is in the Errors: Leveraging Large Language Models for Fine-grained Machine Translation Evaluation

Fernandes, Patrick and Deutsch, Daniel and Finkelstein, Mara and Riley, Parker and Martins, Andr \'e and Neubig, Graham and Garg, Ankush and Clark, Jonathan and Freitag, Markus and Firat, Orhan. The Devil Is in the Errors: Leveraging Large Language Models for Fine-grained Machine Translation Evaluation. Proceedings of the Eighth Conference on Machine Tran...

work page doi:10.18653/v1/2023.wmt-1.100 2023

[10] [11]

and Rei, Ricardo and Stigt, Daan van and Coheur, Luisa and Colombo, Pierre and Martins, André F

Guerreiro, Nuno M. and Rei, Ricardo and Stigt, Daan van and Coheur, Luisa and Colombo, Pierre and Martins, André F. T. , title = ". Transactions of the Association for Computational Linguistics , volume =. 2024 , month =. doi:10.1162/tacl_a_00683 , url =

work page doi:10.1162/tacl_a_00683 2024

[11] [12]

Multidimensional quality metrics (MQM): A framework for declaring and describing translation quality metrics , journal =

Arle Lommel and Hans Uszkoreit and Aljoscha Burchardt , year =. Multidimensional quality metrics (MQM): A framework for declaring and describing translation quality metrics , journal =

work page

[12] [13]

Kepler, Fabio and Tr \'e nous, Jonay and Treviso, Marcos and Vera, Miguel and Martins, Andr \'e F. T. O pen K iwi: An Open Source Framework for Quality Estimation. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations. 2019. doi:10.18653/v1/P19-3020

work page doi:10.18653/v1/p19-3020 2019

[13] [14]

Advances in Neural Information Processing Systems , year =

Lianmin Zheng and Wei-Lin Chiang and Ying Sheng and Siyuan Zhuang and Zhanghao Wu and Yonghao Zhuang and Zi Lin and Zhuohan Li and Dacheng Li and Eric Xing and Hao Zhang and Joseph E Gonzalez and Ion Stoica , title =. Advances in Neural Information Processing Systems , year =

work page

[14] [15]

Findings of the WMT 2023 Shared Task on Automatic Post-Editing

Bhattacharyya, Pushpak and Chatterjee, Rajen and Freitag, Markus and Kanojia, Diptesh and Negri, Matteo and Turchi, Marco. Findings of the WMT 2023 Shared Task on Automatic Post-Editing. Proceedings of the Eighth Conference on Machine Translation. 2023. doi:10.18653/v1/2023.wmt-1.55

work page doi:10.18653/v1/2023.wmt-1.55 2023

[15] [16]

Machine Translation Meets Large Language Models: Evaluating C hat GPT ' s Ability to Automatically Post-Edit Literary Texts

Macken, Lieve. Machine Translation Meets Large Language Models: Evaluating C hat GPT ' s Ability to Automatically Post-Edit Literary Texts. Proceedings of the 1st Workshop on Creative-text Translation and Technology. 2024

work page 2024

[16] [17]

Quality Estimation-Assisted Automatic Post-Editing

Deoghare, Sourabh and Kanojia, Diptesh and Blain, Fred and Ranasinghe, Tharindu and Bhattacharyya, Pushpak. Quality Estimation-Assisted Automatic Post-Editing. Findings of the Association for Computational Linguistics: EMNLP 2023. 2023. doi:10.18653/v1/2023.findings-emnlp.115

work page doi:10.18653/v1/2023.findings-emnlp.115 2023

[17] [18]

Combining Quality Estimation and Automatic Post-editing to Enhance Machine Translation output

Chatterjee, Rajen and Negri, Matteo and Turchi, Marco and Blain, Fr \'e d \'e ric and Specia, Lucia. Combining Quality Estimation and Automatic Post-editing to Enhance Machine Translation output. Proceedings of the 13th Conference of the Association for Machine Translation in the A mericas (Volume 1: Research Track). 2018

work page 2018

[18] [19]

Leveraging GPT -4 for Automatic Translation Post-Editing

Raunak, Vikas and Sharaf, Amr and Wang, Yiren and Awadalla, Hany and Menezes, Arul. Leveraging GPT -4 for Automatic Translation Post-Editing. Findings of the Association for Computational Linguistics: EMNLP 2023. 2023. doi:10.18653/v1/2023.findings-emnlp.804

work page doi:10.18653/v1/2023.findings-emnlp.804 2023

[19] [20]

doi:10.3115/1073083.1073135 , editor =

Papineni, Kishore and Roukos, Salim and Ward, Todd and Zhu, Wei-Jing. B leu: a Method for Automatic Evaluation of Machine Translation. Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. 2002. doi:10.3115/1073083.1073135

work page doi:10.3115/1073083.1073135 2002

[20] [21]

In: Bojar, O., Chatterjee, R., Federmann, C., Haddow, B., Hokamp, C., Huck, M., Logacheva, V., Pecina, P

Popovi \'c , Maja. chr F : character n-gram F -score for automatic MT evaluation. Proceedings of the Tenth Workshop on Statistical Machine Translation. 2015. doi:10.18653/v1/W15-3049

work page doi:10.18653/v1/w15-3049 2015

[21] [22]

Deploying MT Quality Estimation on a large scale: Lessons learned and open questions

Tamchyna, Ale s. Deploying MT Quality Estimation on a large scale: Lessons learned and open questions. Proceedings of Machine Translation Summit XVIII: Users and Providers Track. 2021

work page 2021

[22] [23]

Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems , pages =

Coppers, Sven and Van den Bergh, Jan and Luyten, Kris and Coninx, Karin and van der Lek-Ciudin, Iulianna and Vanallemeersch, Tom and Vandeghinste, Vincent , title =. Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems , pages =. 2018 , isbn =. doi:10.1145/3173574.3174098 , abstract =

work page doi:10.1145/3173574.3174098 2018

[23] [24]

MMPE : A M ulti- M odal I nterface for P ost- E diting M achine T ranslation

Herbig, Nico and D. MMPE : A M ulti- M odal I nterface for P ost- E diting M achine T ranslation. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020. doi:10.18653/v1/2020.acl-main.155

work page doi:10.18653/v1/2020.acl-main.155 2020

[24] [25]

MT Quality Estimation for Computer-assisted Translation: Does it Really Help?

Turchi, Marco and Negri, Matteo and Federico, Marcello. MT Quality Estimation for Computer-assisted Translation: Does it Really Help?. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers). 2015. doi:10.3115/v1/P15-2087

work page doi:10.3115/v1/p15-2087 2015

[25] [26]

Informatics , VOLUME =

Béchara, Hannah and Orăsan, Constantin and Parra Escartín, Carla and Zampieri, Marcos and Lowe, William , TITLE =. Informatics , VOLUME =. 2021 , NUMBER =

work page 2021

[26] [27]

The Prague Bulletin of Mathematical Linguistics , year=

Questing for quality estimation a user study , author=. The Prague Bulletin of Mathematical Linguistics , year=

work page

[27] [28]

Introducing Quality Estimation to Machine Translation Post-editing Workflow: An Empirical Study on Its Usefulness

Liu, Siqi and Dai, Guangrong and Li, Dechao. Introducing Quality Estimation to Machine Translation Post-editing Workflow: An Empirical Study on Its Usefulness. Proceedings of Machine Translation Summit XX: Volume 1. 2025

work page 2025

[28] [29]

The Impact of MT Quality Estimation on Post-Editing Effort

Teixeira, Carlos and O ' Brien, Sharon. The Impact of MT Quality Estimation on Post-Editing Effort. Proceedings of Machine Translation Summit XVI: Commercial MT Users and Translators Track. 2017

work page 2017

[29] [30]

Investigating the Helpfulness of Word-Level Quality Estimation for Post-Editing Machine Translation Output

Shenoy, Raksha and Herbig, Nico and Kr. Investigating the Helpfulness of Word-Level Quality Estimation for Post-Editing Machine Translation Output. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. 2021. doi:10.18653/v1/2021.emnlp-main.799

work page doi:10.18653/v1/2021.emnlp-main.799 2021

[30] [31]

Word-Level Quality Estimation for Korean-English Neural Machine Translation , year=

Eo, Sugyeong and Park, Chanjun and Moon, Hyeonseok and Seo, Jaehyung and Lim, Heuiseok , journal=. Word-Level Quality Estimation for Korean-English Neural Machine Translation , year=

work page

[31] [32]

Natural Language Engineering , volume=

Can machine translation systems be evaluated by the crowd alone , author=. Natural Language Engineering , volume=. 2017 , publisher=

work page 2017

[32] [33]

Translating Step-by-Step: Decomposing the Translation Process for Improved Translation Quality of Long-Form Texts

Briakou, Eleftheria and Luo, Jiaming and Cherry, Colin and Freitag, Markus. Translating Step-by-Step: Decomposing the Translation Process for Improved Translation Quality of Long-Form Texts. Proceedings of the Ninth Conference on Machine Translation. 2024. doi:10.18653/v1/2024.wmt-1.123

work page doi:10.18653/v1/2024.wmt-1.123 2024

[33] [34]

Are AI agents the new machine translation frontier? Challenges and opportunities of single- and multi-agent systems for multilingual digital communication

Briva-Iglesias, Vicent. Are AI agents the new machine translation frontier? Challenges and opportunities of single- and multi-agent systems for multilingual digital communication. Proceedings of Machine Translation Summit XX: Volume 1. 2025

work page 2025

[34] [35]

(Perhaps) Beyond Human Translation: Harnessing Multi-Agent Collaboration for Translating Ultra-Long Literary Texts

Wu, Minghao and Xu, Jiahao and Yuan, Yulin and Haffari, Gholamreza and Wan, Longyue and Luo, Weihua and Zhang, Kaifu. (Perhaps) Beyond Human Translation: Harnessing Multi-Agent Collaboration for Translating Ultra-Long Literary Texts. Transactions of the Association for Computational Linguistics. 2025. doi:10.1162/tacl.a.25

work page doi:10.1162/tacl.a.25 2025

[35] [36]

Giving the Old a Fresh Spin: Quality Estimation-Assisted Constrained Decoding for Automatic Post-Editing

Deoghare, Sourabh and Kanojia, Diptesh and Bhattacharyya, Pushpak. Giving the Old a Fresh Spin: Quality Estimation-Assisted Constrained Decoding for Automatic Post-Editing. Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers). 2025. ...

work page doi:10.18653/v1/2025.naacl-short.77 2025

[36] [37]

Machine Translation , volume=

A user study of neural interactive translation prediction , author=. Machine Translation , volume=. 2019 , publisher=

work page 2019

[37] [38]

New directions in empirical translation process research: exploring the CRITT TPR-DB , pages=

Learning advanced post-editing , author=. New directions in empirical translation process research: exploring the CRITT TPR-DB , pages=. 2016 , publisher=

work page 2016

[38] [39]

Human-centered, augmented machine translation: analysing user experience, quality and productivity in interactive post-editing vs traditional post-editing , author=. Tradum

work page

[39] [40]

Translation, Cognition & Behavior , volume=

The impact of traditional and interactive post-editing on machine translation user experience, quality, and productivity , author=. Translation, Cognition & Behavior , volume=. 2023 , publisher=

work page 2023

[40] [41]

2017 , school=

Productivity in post-editing and in neural interactive translation prediction: A study of English-to-Spanish professional translators , author=. 2017 , school=

work page 2017

[41] [42]

Translation studies , volume=

Translators and translation technology: The dance of agency , author=. Translation studies , volume=. 2011 , publisher=

work page 2011

[42] [43]

Perspectives , volume=

Human-centered augmented translation: Against antagonistic dualisms , author=. Perspectives , volume=. 2024 , publisher=

work page 2024

[43] [44]

2024 , school=

Productivity in the post-editing of neural machine translation: A mixed-methods analysis of speed and edits at Toppan Digital Language , author=. 2024 , school=

work page 2024

[44] [45]

Findings of the WMT 2024 Biomedical Translation Shared Task: Test Sets on Abstract Level

Neves, Mariana and Grozea, Cristian and Thomas, Philippe and Roller, Roland and Bawden, Rachel and N \'e v \'e ol, Aur \'e lie and Castle, Steffen and Bonato, Vanessa and Di Nunzio, Giorgio Maria and Vezzani, Federica and Vicente Navarro, Maika and Yeganova, Lana and Jimeno Yepes, Antonio. Findings of the WMT 2024 Biomedical Translation Shared Task: Test ...

work page doi:10.18653/v1/2024.wmt-1.6 2024

[45] [46]

Alabau, Vicent, Michael Carl, Francisco Casacuberta, Mercedes Garc \' a Mart \' nez, Jes \'u s Gonz \'a lez-Rubio, Bartolom \'e Mesa-Lao, Daniel Ortiz-Mart \' nez, Moritz Schaeffer, and Germ \'a n Sanchis-Trilles. 2016. Learning advanced post-editing. In New directions in empirical translation process research: exploring the CRITT TPR-DB , pages 95--110. Springer

work page 2016

[46] [47]

Briakou, Eleftheria, Jiaming Luo, Colin Cherry, and Markus Freitag. 2024. Translating step-by-step: Decomposing the translation process for improved translation quality of long-form texts. In Haddow, Barry, Tom Kocmi, Philipp Koehn, and Christof Monz, editors, Proceedings of the Ninth Conference on Machine Translation , pages 1301--1317, Miami, Florida, U...

work page 2024

[47] [48]

Briva-Iglesias, Vicent, Sharon O’Brien, and Benjamin R Cowan. 2023. The impact of traditional and interactive post-editing on machine translation user experience, quality, and productivity. Translation, Cognition & Behavior , 6(1):60--86

work page 2023

[48] [49]

Briva-Iglesias, Vicent. 2025a. Are AI agents the new machine translation frontier? challenges and opportunities of single- and multi-agent systems for multilingual digital communication. In Bouillon, Pierrette, Johanna Gerlach, Sabrina Girletti, Lise Volkart, Raphael Rubino, Rico Sennrich, Ana C. Farinha, Marco Gaido, Joke Daems, Dorothy Kenny, Helena Mon...

work page

[49] [50]

Briva-Iglesias, Vicent. 2025b. Human-centered, augmented machine translation: analysing user experience, quality and productivity in interactive post-editing vs traditional post-editing. Tradum \`a tica tecnologies de la traducci \'o , (23):350--382

work page

[50] [51]

Béchara, Hannah, Constantin Orăsan, Carla Parra Escartín, Marcos Zampieri, and William Lowe. 2021. The role of machine translation quality estimation in the post-editing workflow. Informatics , 8(3)

work page 2021

[51] [52]

Chatterjee, Rajen, Matteo Negri, Marco Turchi, Fr \'e d \'e ric Blain, and Lucia Specia. 2018. Combining quality estimation and automatic post-editing to enhance machine translation output. In Cherry, Colin and Graham Neubig, editors, Proceedings of the 13th Conference of the Association for Machine Translation in the A mericas (Volume 1: Research Track) ...

work page 2018

[52] [53]

Coppers, Sven, Jan Van den Bergh, Kris Luyten, Karin Coninx, Iulianna van der Lek-Ciudin, Tom Vanallemeersch, and Vincent Vandeghinste. 2018. Intellingo: An intelligible translation environment. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems , CHI '18, page 1–13, New York, NY, USA. Association for Computing Machinery

work page 2018

[53] [54]

Deoghare, Sourabh, Diptesh Kanojia, Fred Blain, Tharindu Ranasinghe, and Pushpak Bhattacharyya. 2023. Quality estimation-assisted automatic post-editing. In Bouamor, Houda, Juan Pino, and Kalika Bali, editors, Findings of the Association for Computational Linguistics: EMNLP 2023 , pages 1686--1698, Singapore, December. Association for Computational Linguistics

work page 2023

[54] [55]

Deoghare, Sourabh, Diptesh Kanojia, and Pushpak Bhattacharyya. 2025. Giving the old a fresh spin: Quality estimation-assisted constrained decoding for automatic post-editing. In Chiruzzo, Luis, Alan Ritter, and Lu Wang, editors, Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Huma...

work page 2025

[55] [56]

Escart \' n, Carla Parra, Hanna B \'e chara, and Constantin Or a san. 2017. Questing for quality estimation a user study. The Prague Bulletin of Mathematical Linguistics

work page 2017

[56] [57]

Fernandes, Patrick, Daniel Deutsch, Mara Finkelstein, Parker Riley, Andr \'e Martins, Graham Neubig, Ankush Garg, Jonathan Clark, Markus Freitag, and Orhan Firat. 2023. The devil is in the errors: Leveraging large language models for fine-grained machine translation evaluation. In Proceedings of the Eighth Conference on Machine Translation , pages 1066--1...

Graham, Yvette, Timothy Baldwin, Alistair Moffat, and Justin Zobel. 2017. Can machine translation systems be evaluated by the crowd alone. Natural Language Engineering, 23(1):3--30

Guerreiro, Nuno M., Ricardo Rei, Daan van Stigt, Luisa Coheur, Pierre Colombo, and André F. T. Martins. 2023. xcomet: Transparent machine translation evaluation through fine-grained error detection

Guerreiro, Nuno M., Ricardo Rei, Daan van Stigt, Luisa Coheur, Pierre Colombo, and André F. T. Martins. 2024. xcomet: Transparent Machine Translation Evaluation through Fine-grained Error Detection. Transactions of the Association for Computational Linguistics, 12:979--995, 09

Kepler, Fabio, Jonay Trénous, Marcos Treviso, Miguel Vera, and André F. T. Martins. 2019. OpenKiwi: An open source framework for quality estimation. In Costa-jussà, Marta R. and Enrique Alfonseca, editors, Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pages 117--122, Florence...

Knowles, Rebecca, Marina Sanchez-Torron, and Philipp Koehn. 2019. A user study of neural interactive translation prediction. Machine Translation, 33(1):135--154

Kocmi, Tom and Christian Federmann. 2023a. GEMBA-MQM: Detecting translation quality error spans with GPT-4. In Koehn, Philipp, Barry Haddow, Tom Kocmi, and Christof Monz, editors, Proceedings of the Eighth Conference on Machine Translation, pages 768--775, Singapore, December. Association for Computational Linguistics

Kocmi, Tom and Christian Federmann. 2023b. Large language models are state-of-the-art evaluators of translation quality. In Proceedings of the 24th Annual Conference of the European Association for Machine Translation, pages 193--203, Tampere, Finland, June. European Association for Machine Translation

Kocmi, Tom, Vilém Zouhar, Eleftherios Avramidis, Roman Grundkiewicz, Marzena Karpinska, Maja Popović, Mrinmaya Sachan, and Mariya Shmatova. 2024. Error span annotation: A balanced approach for human evaluation of machine translation. In Haddow, Barry, Tom Kocmi, Philipp Koehn, and Christof Monz, editors, Proceedings of the Ninth Conference on Mach...

Liu, Siqi, Guangrong Dai, and Dechao Li. 2025. Introducing quality estimation to machine translation post-editing workflow: An empirical study on its usefulness. In Bouillon, Pierrette, Johanna Gerlach, Sabrina Girletti, Lise Volkart, Raphael Rubino, Rico Sennrich, Ana C. Farinha, Marco Gaido, Joke Daems, Dorothy Kenny, Helena Moniz, and Sara Szoc, editor...

Lommel, Arle, Hans Uszkoreit, and Aljoscha Burchardt. 2014. Multidimensional quality metrics (mqm): A framework for declaring and describing translation quality metrics. Revista Tradumàtica: tecnologies de la traducció

Lu, Qingyu, Baopu Qiu, Liang Ding, Kanjian Zhang, Tom Kocmi, and Dacheng Tao. 2024. Error analysis prompting enables human-like translation evaluation in large language models. In Findings of the Association for Computational Linguistics: ACL 2024, pages 8801--8816, Bangkok, Thailand, August. Association for Computational Linguistics

Macken, Lieve. 2024. Machine translation meets large language models: Evaluating ChatGPT's ability to automatically post-edit literary texts. In Vanroy, Bram, Marie-Aude Lefer, Lieve Macken, and Paola Ruffo, editors, Proceedings of the 1st Workshop on Creative-text Translation and Technology, pages 65--81, Sheffield, United Kingdom, June. European As...

Neves, Mariana, Cristian Grozea, Philippe Thomas, Roland Roller, Rachel Bawden, Aurélie Névéol, Steffen Castle, Vanessa Bonato, Giorgio Maria Di Nunzio, Federica Vezzani, Maika Vicente Navarro, Lana Yeganova, and Antonio Jimeno Yepes. 2024. Findings of the WMT 2024 biomedical translation shared task: Test sets on abstract level. In Haddow, Bar...

Olohan, Maeve. 2011. Translators and translation technology: The dance of agency. Translation Studies, 4(3):342--357

O’Brien, Sharon. 2024. Human-centered augmented translation: Against antagonistic dualisms. Perspectives, 32(3):391--406

Papineni, Kishore, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. Bleu: a method for automatic evaluation of machine translation. In Isabelle, Pierre, Eugene Charniak, and Dekang Lin, editors, Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 311--318, Philadelphia, Pennsylvania, USA, July. Association for ...

Popović, Maja. 2015. chrF: character n-gram F-score for automatic MT evaluation. In Bojar, Ondřej, Rajan Chatterjee, Christian Federmann, Barry Haddow, Chris Hokamp, Matthias Huck, Varvara Logacheva, and Pavel Pecina, editors, Proceedings of the Tenth Workshop on Statistical Machine Translation, pages 392--395, Lisbon, Portugal, September. Assoc...

Raunak, Vikas, Amr Sharaf, Yiren Wang, Hany Awadalla, and Arul Menezes. 2023. Leveraging GPT-4 for automatic translation post-editing. In Bouamor, Houda, Juan Pino, and Kalika Bali, editors, Findings of the Association for Computational Linguistics: EMNLP 2023, pages 12009--12024, Singapore, December. Association for Computational Linguistics

Sarti, Gabriele, Vilém Zouhar, Grzegorz Chrupała, Ana Guerberof-Arenas, Malvina Nissim, and Arianna Bisazza. 2025. Qe4pe: Word-level quality estimation for human post-editing

Shenoy, Raksha, Nico Herbig, Antonio Krüger, and Josef van Genabith. 2021. Investigating the helpfulness of word-level quality estimation for post-editing machine translation output. In Moens, Marie-Francine, Xuanjing Huang, Lucia Specia, and Scott Wen-tau Yih, editors, Proceedings of the 2021 Conference on Empirical Methods in Natural Language Proces...

Teixeira, Carlos and Sharon O'Brien. 2017. The impact of MT quality estimation on post-editing effort. In Yamada, Masaru and Mark Seligman, editors, Proceedings of Machine Translation Summit XVI: Commercial MT Users and Translators Track, pages 142--153, Nagoya, Japan, September 18 – September 22

Terribile, Silvia. 2024. Productivity in the post-editing of neural machine translation: A mixed-methods analysis of speed and edits at Toppan Digital Language. Ph.D. thesis, The University of Manchester (United Kingdom)

Treviso, Marcos, Nuno M Guerreiro, Sweta Agrawal, Ricardo Rei, José Pombal, Tania Vaz, Helena Wu, Beatriz Silva, Daan van Stigt, and André FT Martins. 2024. xtower: A multilingual llm for explaining and correcting translation errors. arXiv preprint arXiv:2406.19482

Turchi, Marco, Matteo Negri, and Marcello Federico. 2015. MT quality estimation for computer-assisted translation: Does it really help? In Zong, Chengqing and Michael Strube, editors, Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: ...
