Agentic AI Translate: An Agentic Translator Prototype for Translation as Communication Design

Masaru Yamada

arxiv: 2605.17041 · v1 · pith:ZFILQMSLnew · submitted 2026-05-16 · cs.CL · cs.AI · cs.HC

Pith reviewed 2026-05-20 15:46 UTC · model grok-4.3

classification: cs.CL · cs.AI · cs.HC

keywords: agentic translation · communication design · interactive specification · verification cycle · document coherence

The pith

Translation becomes communication design when an agentic system first specifies purpose and audience before any text is generated.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a prototype that replaces direct text-to-text machine translation with an interactive specification phase followed by a four-stage cycle of identification, prompting, generation, and verification. In the specification phase the user and system jointly build a structured brief covering communicative purpose, register, audience, and genre conventions. The verification step applies structured error-span scoring to ground quality judgments, while a lightweight memory mechanism tracks proper nouns and maintains a bilingual summary to support document-level consistency. The central argument is that this architecture embodies translation as the deliberate design of communication outcomes rather than mechanical conversion of strings. Empirical testing of whether the resulting translations better achieve their stated goals is explicitly left for later work.
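
As a rough illustration of the specification phase, the structured brief can be pictured as a small record whose empty fields tell the dialogue what still needs to be asked. This is only a sketch under stated assumptions: the class name `TranslationBrief`, its field names, and the `missing_fields` helper are hypothetical and are not taken from the paper, which does not publish its data structures.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class TranslationBrief:
    """Hypothetical container for the brief built during specification."""
    purpose: Optional[str] = None    # communicative purpose (skopos)
    register: Optional[str] = None   # e.g. formal, conversational
    audience: Optional[str] = None   # intended readers
    genre: Optional[str] = None      # genre conventions to follow

    def missing_fields(self) -> List[str]:
        """Names of the fields the dialogue still has to fill in."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def is_complete(self) -> bool:
        """True once every field of the brief has a value."""
        return not self.missing_fields()


# Example: a partly specified brief (values are illustrative only).
brief = TranslationBrief(purpose="inform patients about dosage",
                         audience="adult lay readers")
print(brief.missing_fields())  # ['register', 'genre']
```

A dialogue loop could keep prompting the user until `is_complete()` returns true, which is one plausible reading of "jointly build a structured brief".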

Core claim

By turning the metalanguage of translation studies into executable instructions for generative models, the prototype demonstrates that translation can be operationalized as a goal-directed design process: an initial dialogue produces a detailed brief, after which an Identify-Prompt-Generate-Verify cycle produces and checks output against that brief, with memory elements preserving coherence across the document.

What carries the argument

The four-stage agentic cycle (Identify -> Prompt -> Generate -> Verify) that runs after an interactive specification phase builds a structured translation brief from communicative purpose, register, audience, and genre conventions.
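
The control flow of the cycle can be sketched as follows. Everything here is an assumption made for illustration: the function names, the dictionary-based brief, and especially the retry loop on failed verification, since the paper as summarized here does not say whether the cycle repeats. The generator and verifier are passed in as plain callables so the sketch runs without any model.

```python
from typing import Callable, Dict, List, Tuple

Brief = Dict[str, str]  # hypothetical: brief as simple key/value pairs


def identify(source: str, brief: Brief) -> List[str]:
    """Identify: list the brief constraints that apply to this source text."""
    return [f"{key}: {value}" for key, value in sorted(brief.items())]


def build_prompt(source: str, constraints: List[str]) -> str:
    """Prompt: turn the constraints into an instruction for the generator."""
    return ("Translate under these constraints:\n"
            + "\n".join(constraints) + "\n\n" + source)


def run_cycle(
    source: str,
    brief: Brief,
    generate: Callable[[str], str],
    verify: Callable[[str, str, Brief], bool],
    max_rounds: int = 3,
) -> Tuple[str, int]:
    """Run Identify -> Prompt -> Generate -> Verify.

    Repeats generation until verification passes or max_rounds is reached
    (the retry behaviour is an assumption). Returns (draft, rounds_used).
    """
    constraints = identify(source, brief)
    prompt = build_prompt(source, constraints)
    draft = ""
    for round_no in range(1, max_rounds + 1):
        draft = generate(prompt)
        if verify(source, draft, brief):
            return draft, round_no
        prompt += "\n\nThe previous draft failed verification; revise it."
    return draft, max_rounds


# Stand-in generator and verifier so the sketch runs without a model.
draft, rounds = run_cycle(
    "Bonjour",
    {"audience": "children", "register": "informal"},
    generate=lambda prompt: "Hi there",
    verify=lambda src, out, brief: len(out) > 0,
)
print(draft, rounds)  # Hi there 1
```

Keeping `generate` and `verify` as injected callables makes the brief the only shared state between stages, which matches the claim that output is checked against the brief rather than against surface fluency.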

If this is right

Translation quality assessment can shift from surface fluency to evidence-based checks against explicit communication criteria.
Document-level consistency can be maintained through lightweight memory of key terms and running bilingual summaries.
The design process itself becomes visible and adjustable, exposing choices about audience and purpose that were previously hidden inside the model.
Future extensions could automate parts of the brief construction while still requiring human oversight of the communicative goals.
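
The lightweight memory mentioned in the list above could look something like the sketch below. It assumes a first-rendering-wins rule for proper nouns and a capped list of bilingual gist pairs; the class name `LiteMemory`, its methods, and the cap of five items are hypothetical and are not described in the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class LiteMemory:
    """Hypothetical memory: proper-noun table plus running bilingual summary."""
    proper_nouns: Dict[str, str] = field(default_factory=dict)
    summary: List[Tuple[str, str]] = field(default_factory=list)
    max_summary_items: int = 5  # assumed cap, chosen for illustration

    def record_noun(self, source_form: str, target_form: str) -> str:
        """Keep the first rendering seen so later segments reuse it."""
        return self.proper_nouns.setdefault(source_form, target_form)

    def add_summary(self, source_gist: str, target_gist: str) -> None:
        """Append a (source, target) gist pair, dropping the oldest past the cap."""
        self.summary.append((source_gist, target_gist))
        while len(self.summary) > self.max_summary_items:
            self.summary.pop(0)

    def context_for_prompt(self) -> str:
        """Render the memory as text that could be prepended to a prompt."""
        nouns = "; ".join(f"{s} => {t}" for s, t in self.proper_nouns.items())
        gists = " | ".join(f"{s} / {t}" for s, t in self.summary)
        return f"Names: {nouns}\nSummary: {gists}"


memory = LiteMemory()
memory.record_noun("東京", "Tokyo")
print(memory.record_noun("東京", "Tokio"))  # Tokyo (first rendering is kept)
```

The first-rendering-wins rule is the simplest way to enforce consistency; a real system would presumably let the user override a stored rendering.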

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Professional workflows might move from post-editing raw output toward editing and refining the initial specification brief.
The same cycle structure could be tested on related tasks such as content adaptation or localization where audience goals also vary.
If the approach scales, training data for translation models might increasingly include paired briefs and final outputs rather than source-target sentence pairs alone.

Load-bearing premise

That grounding the generation and verification stages in an explicit communication brief will produce translations that better serve the intended goals than direct text-in text-out methods.

What would settle it

A controlled comparison in which the same source texts and target briefs are given to both the agentic prototype and a standard direct-translation model, followed by expert raters scoring how well each output fulfills the stated communicative objectives.
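
One simple way to tabulate such a comparison is a paired summary over per-text ratings. The sketch assumes each source text receives one goal-fulfilment rating per system on a shared scale; the function name and the numbers in the usage example are placeholders for illustration, not results from the paper or from any experiment.

```python
from statistics import mean
from typing import Dict, Sequence


def paired_comparison(agentic: Sequence[float],
                      baseline: Sequence[float]) -> Dict[str, float]:
    """Summarize paired ratings for the same source texts and briefs."""
    if len(agentic) != len(baseline) or not agentic:
        raise ValueError("need two non-empty rating lists of equal length")
    diffs = [a - b for a, b in zip(agentic, baseline)]
    return {
        "mean_difference": mean(diffs),          # > 0 favours the agentic system
        "agentic_wins": sum(d > 0 for d in diffs),
        "baseline_wins": sum(d < 0 for d in diffs),
        "ties": sum(d == 0 for d in diffs),
    }


# Placeholder ratings for illustration only; not data from any study.
result = paired_comparison([4, 3, 5, 4], [3, 3, 4, 2])
print(result)
```

A real study would add a significance test and inter-rater agreement; this only shows the paired structure that the design calls for.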

Original abstract

We present Agentic AI Translate, an agentic translator prototype that operationalises the thesis of Yamada (forthcoming) -- that the metalanguage of Translation Studies has become an instruction code for generative AI. The system replaces the dominant text-in / text-out paradigm of machine translation with a four-stage agentic cycle (Identify -> Prompt -> Generate -> Verify), preceded by an interactive specification phase in which the user composes -- through model-assisted dialogue -- a structured translation brief grounded in skopos theory, register, audience, and genre conventions. The verification stage adopts the GEMBA-MQM error-span protocol (Kocmi & Federmann, 2023) for evidence-grounded scoring, and document-level coherence is preserved through a DelTA-lite memory of proper nouns and a running bilingual summary, after Wang et al. (2025). We describe the philosophical motivation, the architectural commitments, the four reference-material categories the system consumes, and the principal design tensions the architecture makes explicit. Empirical validation is left for future work; the contribution here is conceptual and architectural -- an executable embodiment of the position that translation in the GenAI era is communication design, not text conversion.
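
Error-span scoring of the kind the abstract refers to can be illustrated with a severity-weighted penalty. This is a sketch, not the prototype's verifier: the weights (minor 1, major 5, critical 25) and the per-segment cap are assumptions in the style of MQM scoring, and the `ErrorSpan` record and `mqm_penalty` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import List

# Assumed severity weights in the style of MQM scoring; not taken from the paper.
SEVERITY_WEIGHTS = {"minor": 1, "major": 5, "critical": 25}


@dataclass
class ErrorSpan:
    text: str       # the offending span in the translation
    category: str   # e.g. "accuracy/mistranslation", "fluency/grammar"
    severity: str   # one of the keys of SEVERITY_WEIGHTS


def mqm_penalty(spans: List[ErrorSpan], cap: int = 25) -> int:
    """Sum severity weights over error spans, capped per segment (assumed cap)."""
    total = sum(SEVERITY_WEIGHTS[span.severity] for span in spans)
    return min(total, cap)


spans = [
    ErrorSpan("the bank", "accuracy/mistranslation", "major"),
    ErrorSpan("informations", "fluency/grammar", "minor"),
]
print(mqm_penalty(spans))  # 6
```

Because each penalty is tied to a concrete span and category, a low score can be traced back to specific evidence, which is the sense in which the scoring is "evidence-grounded".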

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This is a conceptual architecture paper for an agentic MT prototype that integrates skopos theory and GEMBA-MQM but supplies no tests or outputs.

The main point is a high-level design for an agentic translator that opens with an interactive specification phase grounded in skopos theory, register, audience, and genre, then runs a four-stage cycle of Identify, Prompt, Generate, and Verify, using GEMBA-MQM for error-span scoring and a lightweight DelTA-lite memory for proper nouns and bilingual summaries. The paper frames this as an executable version of the claim that translation in the GenAI era is communication design rather than raw text conversion. That framing is consistent and draws cleanly on the cited prior protocols without circularity.

The architecture description itself is clear enough that a reader can see the intended flow and the reference-material categories it consumes. The design tensions it flags are also stated plainly.

The central limitation is the absence of any implementation details, sample runs, or evaluation. The paper states outright that empirical validation is deferred, so there is no evidence yet that the interactive brief or the verification stage actually improves goal alignment over standard MT pipelines. The DelTA-lite memory is described at the level of intent rather than mechanics. This leaves the practical advantage of the whole system as an open question.

The work is aimed at people who already work at the intersection of translation studies and prompt or agent design. A reader building agentic systems in NLP or looking for structured ways to inject translation theory into generative pipelines could extract usable design commitments from it. It is coherent on its own terms and engages the literature honestly, so it is worth sending for peer review to get feedback on the stage definitions and on how one might test the communication-design claim in practice.

Referee Report

1 major / 2 minor

Summary. The paper presents Agentic AI Translate, a prototype for an agentic AI-based translation system. It operationalizes the thesis that the metalanguage of Translation Studies can be used as instruction code for generative AI. The system features an interactive specification phase based on skopos theory to create a translation brief, followed by a four-stage agentic cycle consisting of Identify, Prompt, Generate, and Verify stages. The Verify stage uses the GEMBA-MQM protocol for error scoring, and document-level coherence is maintained using DelTA-lite memory and a bilingual summary. The contribution is described as conceptual and architectural, with empirical validation deferred to future work.

Significance. If implemented and tested, this architecture could significantly advance the field by moving machine translation from a simple text conversion model to one that incorporates communication design principles, audience awareness, and purpose-driven translation as per skopos theory. By making explicit the design tensions and reference material categories, it provides a framework that could inspire more sophisticated agentic systems in NLP and translation technology. The integration of established protocols like GEMBA-MQM adds rigor to the verification process.

major comments (1)

[architectural commitments] The description of the prototype as an 'executable embodiment' relies on the coherence of the high-level design, but lacks specific details such as prompt templates, agent roles, or example workflows for the Identify -> Prompt -> Generate -> Verify cycle (see architectural commitments section). This makes it difficult to assess how the system would actually function in practice, which is central to the claim of providing a prototype.

minor comments (2)

[references] The reference to Yamada (forthcoming) should include more context or a preprint link if available to aid readers in understanding the foundational thesis.
[DelTA-lite memory description] Clarify the exact implementation or differences of 'DelTA-lite memory' from the referenced Wang et al. (2025) work to avoid ambiguity in the coherence preservation mechanism.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the positive summary, recognition of the work's potential significance, and recommendation for minor revision. We address the single major comment below.

Referee: [architectural commitments] The description of the prototype as an 'executable embodiment' relies on the coherence of the high-level design, but lacks specific details such as prompt templates, agent roles, or example workflows for the Identify -> Prompt -> Generate -> Verify cycle (see architectural commitments section). This makes it difficult to assess how the system would actually function in practice, which is central to the claim of providing a prototype.

Authors: We acknowledge that the manuscript presents the prototype at a conceptual and architectural level without concrete prompt templates, explicit agent role specifications, or worked example workflows. This choice aligns with the paper's stated scope, which frames the contribution as an executable embodiment of Translation Studies metalanguage as instruction code, while deferring full implementation and empirical testing to future work. Nevertheless, to make the high-level design more readily assessable, we will revise the architectural commitments section to include (1) concise descriptions of the primary agent roles associated with each stage of the cycle and (2) an illustrative, non-implementation-specific example workflow that traces a sample translation brief through Identify, Prompt, Generate, and Verify. Full prompt templates will remain outside the scope of this conceptual paper, as they are implementation artifacts subject to rapid iteration and more appropriate for supplementary code releases or a subsequent systems paper.

revision: yes

Circularity Check

0 steps flagged

No significant circularity

The paper is a purely conceptual and architectural description of a prototype system that operationalizes skopos theory and related Translation Studies concepts into an agentic workflow. It contains no equations, no fitted parameters, no quantitative predictions, and no derivations that could reduce to inputs by construction. The central contribution is explicitly framed as a design commitment whose coherence stands on its own description, with empirical validation deferred; external references including GEMBA-MQM and the author's forthcoming thesis function as foundational inputs rather than self-referential loops.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 2 invented entities

The central claim rests on domain assumptions from translation studies and AI evaluation protocols rather than new mathematical derivations or fitted parameters.

axioms (2)

domain assumption: Skopos theory supplies the appropriate structure for translation briefs in an interactive AI setting.
Invoked to ground the specification phase in the abstract.

domain assumption: GEMBA-MQM error-span protocol provides evidence-grounded scoring suitable for the verification stage.
Adopted directly for the Verify step.

invented entities (2)

Agentic AI Translate prototype (no independent evidence)
purpose: To operationalize translation as communication design via the four-stage cycle
New system introduced as an executable embodiment of the thesis.

DelTA-lite memory (no independent evidence)
purpose: To preserve document-level coherence through proper nouns and bilingual summary
Introduced as part of the architecture after Wang et al. (2025).

pith-pipeline@v0.9.0 · 5733 in / 1533 out tokens · 42599 ms · 2026-05-20T15:46:38.427836+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Paper passage: The system replaces the dominant text-in / text-out paradigm of machine translation with a four-stage agentic cycle (Identify -> Prompt -> Generate -> Verify), preceded by an interactive specification phase grounded in skopos theory, register, audience, and genre conventions.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: The verification stage adopts the GEMBA-MQM error-span protocol for evidence-grounded scoring

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

32 extracted references · 32 canonical work pages · 4 internal anchors

[1] Agrawal, S., Zhou, C., Lewis, M., Zettlemoyer, L., & Ghazvininejad, M. (2023). In-context examples selection for machine translation. In Findings of ACL 2023 (pp. 8857–8873).

[2] Briakou, E., Luo, J., Cherry, C., & Freitag, M. (2024). Translating step-by-step: Decomposing the translation process for improved translation quality of long-form texts. In Proceedings of WMT 2024. arXiv:2409.06790.

[3] Feng, Z., Zhang, Y., Li, H., Liu, W., Lang, J., Feng, Y., Wu, J., & Liu, Z. (2025). TEaR: Improving LLM-based machine translation with systematic self-refinement. In Proceedings of NAACL 2025. arXiv:2402.16379.

[4] Fernandes, P., Yin, K., Liu, E., Martins, A. F. T., & Neubig, G. (2023). The devil is in the errors: Leveraging large language models for fine-grained machine translation evaluation. arXiv:2308.07286.

[5] Freitag, M., Foster, G., Grangier, D., Ratnakar, V., Tan, Q., & Macherey, W. (2021). Experts, errors, and context: A large-scale study of human evaluation for machine translation. Transactions of the Association for Computational Linguistics, 9, 1460–1474.

[6] Freitag, M., et al. (2024). Are LLMs breaking MT metrics? Results of the WMT24 metrics shared task. In Proceedings of WMT 2024.

[7] Gambier, Y. (2009). Stratégies et tactiques en traduction et interprétation. In Gambier, Y., & Doorslaer, L. van (Eds.), Handbook of Translation Studies (Vol. 1). Amsterdam: John Benjamins.

[8] Guerreiro, N. M., Voita, E., & Martins, A. F. T. (2023). Looking for a needle in a haystack: A comprehensive study of hallucinations in neural machine translation. In Proceedings of EACL 2023 (pp. 1059–1075).

[9] Guerreiro, N. M., Rei, R., van Stigt, D., Coheur, L., Colombo, P., & Martins, A. F. T. (2024). xCOMET: Transparent machine translation evaluation through fine-grained error detection. Transactions of the Association for Computational Linguistics, 12, 979–995.

[10] House, J. (2015). Translation Quality Assessment: Past and Present. London: Routledge.

[11] Huang, J., Chen, X., Mishra, S., Zheng, H. S., Yu, A. W., Song, X., & Zhou, D. (2024). Large language models cannot self-correct reasoning yet. In Proceedings of ICLR 2024. arXiv:2310.01798.

[12] Juraska, J., Finkelstein, M., Deutsch, D., Siddhant, A., Tran, M., & Freitag, M. (2023). MetricX-23: The Google submission to the WMT 2023 metrics shared task. In Proceedings of WMT 2023.

[13] Kano, N., Seraku, N., Takahashi, F., & Tsuji, S. (1984). Attractive quality and must-be quality. Journal of the Japanese Society for Quality Control, 14(2), 39–48.

[14] Karpinska, M., & Iyyer, M. (2023). Large language models effectively leverage document-level context for literary translation, but critical errors persist. In Proceedings of WMT 2023 (pp. 419–451).

[15] Kayano, S., & Sugawara, Y. (2025). Specification-aware machine translation and evaluation for purpose alignment. In Proceedings of WMT 2025. arXiv:2509.17559.

[16] Kocmi, T., & Federmann, C. (2023). Large language models are state-of-the-art evaluators of translation quality. In Proceedings of EAMT 2023. arXiv:2302.14520.

[17] Kocmi, T., & Federmann, C. (2023). GEMBA-MQM: Detecting translation quality error spans with GPT-4. In Proceedings of WMT 2023. arXiv:2310.13988.

[18] Kocmi, T., et al. (2024). Findings of the 2024 Conference on Machine Translation (WMT24). In Proceedings of WMT 2024.

[19] Madaan, A., et al. (2023). Self-Refine: Iterative refinement with self-feedback. In Advances in Neural Information Processing Systems 36 (NeurIPS 2023). arXiv:2303.17651.

[20] Munday, J. (2016). Introducing Translation Studies: Theories and Applications (4th ed.). London: Routledge.

[21] Nord, C. (1997). Translating as a Purposeful Activity: Functionalist Approaches Explained. Manchester: St. Jerome.

[22] Reiss, K. (1971/2000). Translation Criticism: The Potentials and Limitations (E. Rhodes, Trans.). Manchester: St. Jerome.

[23] Singh, P., Jangra, A., et al. (2024). Translating across cultures: LLMs for intralingual cultural adaptation. In Proceedings of CoNLL 2024.

[24] Stechly, K., Valmeekam, K., & Kambhampati, S. (2024). On the self-verification limitations of large language models on reasoning and planning tasks. In Proceedings of ICML 2024. arXiv:2402.08115.

[25] Tannen, D. (1986). That’s Not What I Meant! How Conversational Style Makes or Breaks Relationships. New York: William Morrow.

[26] Vermeer, H. J. (1978). Ein Rahmen für eine allgemeine Translationstheorie. Lebende Sprachen, 23, 99–102.

[27] Vilar, D., Freitag, M., Cherry, C., Luo, J., Ratnakar, V., & Foster, G. (2023). Prompting PaLM for translation: Assessing strategies and performance. In Proceedings of ACL 2023 (pp. 15406–15427).

[28] Wang, P., Li, L., Chen, L., Cai, Z., Zhu, D., Lin, B., Cao, Y., Liu, Q., Liu, T., & Sui, Z. (2024). Large language models are not fair evaluators. In Proceedings of ACL 2024. arXiv:2305.17926.

[29] Wang, Y., Zeng, J., Liu, X., Wong, D. F., Meng, F., Zhou, J., & Zhang, M. (2025). DelTA: An online document-level translation agent based on multi-level memory. In Proceedings of ICLR 2025. arXiv:2410.08143.

[30] Wu, M., Yuan, Y., Haffari, G., & Wang, L. (2024). (Perhaps) Beyond human translation: Harnessing multi-agent collaboration for translating ultra-long literary texts. Transactions of the Association for Computational Linguistics (2025). arXiv:2405.11804.

[31] Yamada, M. (forthcoming). Metalanguage and GenAI: Empowering language learners and translators in training. In M. A. Jiménez-Crespo & V. Enríquez-Raido (Eds.), The Routledge Handbook of Translation and Technology (2nd ed.). London: Routledge.

[32] Zheng, L., Chiang, W.-L., Sheng, Y., Zhuang, S., Wu, Z., Zhuang, Y., Lin, Z., Li, Z., Li, D., Xing, E. P., Zhang, H., Gonzalez, J. E., & Stoica, I. (2023). Judging LLM-as-a-judge with MT-Bench and Chatbot Arena. In Advances in Neural Information Processing Systems 36 (NeurIPS 2023). arXiv:2306.05685.
