{"work":{"id":"e294e5a1-5dd2-44a0-b348-adbd62fe1916","openalex_id":null,"doi":null,"arxiv_id":"1609.08144","raw_key":null,"title":"Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation","authors":null,"authors_text":"Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey","year":2016,"venue":"cs.CL","abstract":"Neural Machine Translation (NMT) is an end-to-end learning approach for automated translation, with the potential to overcome many of the weaknesses of conventional phrase-based translation systems. Unfortunately, NMT systems are known to be computationally expensive both in training and in translation inference. Also, most NMT systems have difficulty with rare words. These issues have hindered NMT's use in practical deployments and services, where both accuracy and speed are essential. In this work, we present GNMT, Google's Neural Machine Translation system, which attempts to address many of these issues. Our model consists of a deep LSTM network with 8 encoder and 8 decoder layers using attention and residual connections. To improve parallelism and therefore decrease training time, our attention mechanism connects the bottom layer of the decoder to the top layer of the encoder. To accelerate the final translation speed, we employ low-precision arithmetic during inference computations. To improve handling of rare words, we divide words into a limited set of common sub-word units (\"wordpieces\") for both input and output. This method provides a good balance between the flexibility of \"character\"-delimited models and the efficiency of \"word\"-delimited models, naturally handles translation of rare words, and ultimately improves the overall accuracy of the system. Our beam search technique employs a length-normalization procedure and uses a coverage penalty, which encourages generation of an output sentence that is most likely to cover all the words in the source sentence. On the WMT'14 English-to-French and English-to-German benchmarks, GNMT achieves competitive results to state-of-the-art. Using a human side-by-side evaluation on a set of isolated simple sentences, it reduces translation errors by an average of 60% compared to Google's phrase-based production system.","external_url":"https://arxiv.org/abs/1609.08144","cited_by_count":null,"metadata_source":"pith","metadata_fetched_at":"2026-06-28T17:12:24.438559+00:00","pith_arxiv_id":"1609.08144","created_at":"2026-05-09T03:35:49.603902+00:00","updated_at":"2026-06-28T17:12:24.438559+00:00","title_quality_ok":true,"display_title":"Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation","render_title":"Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation"},"hub":{"state":{"work_id":"e294e5a1-5dd2-44a0-b348-adbd62fe1916","tier":"hub","tier_reason":"10+ Pith inbound or 1,000+ external citations","pith_inbound_count":78,"external_cited_by_count":null,"distinct_field_count":11,"first_pith_cited_at":"2017-01-23T18:10:00+00:00","last_pith_cited_at":"2026-05-31T11:27:48+00:00","author_build_status":"not_needed","summary_status":"needed","contexts_status":"needed","graph_status":"needed","ask_index_status":"not_needed","reader_status":"not_needed","recognition_status":"not_needed","updated_at":"2026-06-29T16:29:02.998931+00:00","tier_text":"hub"},"tier":"hub","role_counts":[{"context_role":"background","n":11},{"context_role":"other","n":2},{"context_role":"baseline","n":1},{"context_role":"method","n":1}],"polarity_counts":[{"context_polarity":"background","n":10},{"context_polarity":"unclear","n":3},{"context_polarity":"baseline","n":1},{"context_polarity":"use_method","n":1}],"runs":{"context_extract":{"job_type":"context_extract","status":"succeeded","result":{"enqueued_papers":25},"error":null,"updated_at":"2026-05-25T18:26:14.232410+00:00"},"graph_features":{"job_type":"graph_features","status":"succeeded","result":{"co_cited":[{"title":"BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding","work_id":"ed240a10-5b19-406c-baa5-30803f465785","shared_citers":18},{"title":"Neural Machine Translation by Jointly Learning to Align and Translate","work_id":"d831e763-d530-4029-a65c-ac595d82cb2a","shared_citers":15},{"title":"Adam: A Method for Stochastic Optimization","work_id":"1910796d-9b52-4683-bf5c-de9632c1028b","shared_citers":13},{"title":"Scaling Laws for Neural Language Models","work_id":"b7dd8749-9c45-4977-ab9b-64478dce1ae8","shared_citers":10},{"title":"Evaluating Large Language Models Trained on Code","work_id":"042493e9-b26f-4b4e-bbde-382072ca9b08","shared_citers":9},{"title":"GPT-4 Technical Report","work_id":"b928e041-6991-4c08-8c81-0359e4097c7b","shared_citers":9},{"title":"Layer Normalization","work_id":"20a2d720-0046-4c7c-bcd6-327ec8143f69","shared_citers":9},{"title":"LLaMA: Open and Efficient Foundation Language Models","work_id":"c018fc23-6f3f-4035-9d02-28a2173b2b9d","shared_citers":9},{"title":"PaLM: Scaling Language Modeling with Pathways","work_id":"a94f3ef7-2c49-4445-93fe-6ec16aafd966","shared_citers":9},{"title":"Proximal Policy Optimization Algorithms","work_id":"240c67fe-d14d-4520-91c1-38a4e272ca19","shared_citers":9},{"title":"The Llama 3 Herd of Models","work_id":"1549a635-88af-4ac1-acfe-51ae7bb53345","shared_citers":9},{"title":"Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer","work_id":"50e3b368-0243-4726-8186-233869802ad1","shared_citers":8},{"title":"Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism","work_id":"c888e6d1-0b1d-43d6-9ef5-f0912a0efa1b","shared_citers":8},{"title":"Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge","work_id":"28ea1282-d657-4c61-a83c-f1249be6d6b1","shared_citers":8},{"title":"Gaussian Error Linear Units (GELUs)","work_id":"0466fd22-03a1-4a61-af0a-a900e77bb023","shared_citers":7},{"title":"Gemini: A Family of Highly Capable Multimodal Models","work_id":"83f7c85b-3f11-450f-ac0c-64d9745220b2","shared_citers":7},{"title":"LaMDA: Language Models for Dialog Applications","work_id":"1b66d0a5-f6ae-4332-8025-c662dc64b238","shared_citers":7},{"title":"Language Models are Few-Shot Learners","work_id":"214732c0-2edd-44a0-af9e-28184a2b8279","shared_citers":7},{"title":"Qwen3 Technical Report","work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e","shared_citers":7},{"title":"Training Verifiers to Solve Math Word Problems","work_id":"acab1aa8-b4d6-40e0-a3ee-25341701dca2","shared_citers":7},{"title":"arXiv preprint arXiv:1802.05365 , year=","work_id":"dd973cba-647d-49d3-9d24-061b637bb0cd","shared_citers":6},{"title":"DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models","work_id":"c5006563-f3ec-438a-9e35-b7b484f34828","shared_citers":6},{"title":"Efficient Estimation of Word Representations in Vector Space","work_id":"59edaa01-a696-45b3-9a08-5eae777a799e","shared_citers":6},{"title":"Exploring the limits of language modeling","work_id":"a9dbcb7a-e48d-42a4-8d60-a8f723751a97","shared_citers":6}],"time_series":[{"n":4,"year":2017},{"n":1,"year":2018},{"n":16,"year":2019},{"n":4,"year":2020},{"n":3,"year":2021},{"n":4,"year":2022},{"n":8,"year":2023},{"n":3,"year":2024},{"n":4,"year":2025},{"n":26,"year":2026}],"dependency_candidates":[{"n":1,"role":"method","polarity":"use_method","paper_title":"From Syntax to Semantics: Unveiling the Emergence of Chirality in SMILES Translation Models","primary_cat":"cs.LG","context_text":"SentencePiece: A simple and language independent subword tokenizer and detokenizer for neural text processing.arXiv [cs.CL], 2018. doi:10.48550/arXiv.1808.06226. [33] RDKit: Open-source cheminformatics. [34] Esben Jannik Bjerrum. SMILES enumeration as data augmentation for neural network modeling of molecules. arXiv [cs.LG], 2017. doi:10.48550/arXiv.1703.07076. [35] Josep Arús-Pous, Simon Viet Johansson, Oleksii Prykhodko, Esben Jannik Bjerrum, Christian Tyrchan, Jean-Louis Reymond, Hongming Chen, and Ola Engkvist. Randomized SMILES strings improve the quality of molecular generative models.J. Cheminform., 11(1):71, 2019. ISSN 1758-2946,1758-2946. doi:10.1186/s13321-019-0393-0. [36] Yasuhiro Yoshikai, Tadahaya Mizuno, Shumpei Nemoto, and Hiroyuki Kusuhara.","citing_arxiv_id":"2605.09949"},{"n":1,"role":"baseline","polarity":"baseline","paper_title":"Bridging the Linguistic Divide: A Survey on Leveraging Large Language Models for Machine Translation","primary_cat":"cs.CL","context_text":"tence without any examples. The model relies solely on its pre-trained knowledge to generate translations. ˆY=f θLLM (prompt, X) (1) Different approaches in zero-shot prompting vary by prompt format and the use of a pivot language for low-resource languages. It has been observed that zero-shot prompting with ChatGPT lags behind MT systems by Google MT [56], Tencent [57], and DeepL by around 5.0 BLEU points [58]. Pivot prompting has been explored to translate between distant languages, where the LLM first translates the sentence to English and then into the target language. This strategy, which uses a resource- rich language (English) as a pivot, improves translation quality between De→Zh and Ro→Zh [58].","citing_arxiv_id":"2504.01919"}]},"error":null,"updated_at":"2026-05-25T18:26:14.383732+00:00"},"identity_refresh":{"job_type":"identity_refresh","status":"succeeded","result":{"items":[{"title":"Qwen3 Technical Report","outcome":"unchanged","work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e","resolver":"local_arxiv","confidence":0.98,"old_work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e"}],"counts":{"fixed":0,"merged":0,"unchanged":1,"quarantined":0,"needs_external_resolution":0},"errors":[],"attempted":1},"error":null,"updated_at":"2026-05-25T18:26:26.587554+00:00"},"summary_claims":{"job_type":"summary_claims","status":"succeeded","result":{"title":"Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation","claims":[{"claim_text":"Neural Machine Translation (NMT) is an end-to-end learning approach for automated translation, with the potential to overcome many of the weaknesses of conventional phrase-based translation systems. Unfortunately, NMT systems are known to be computationally expensive both in training and in translation inference. Also, most NMT systems have difficulty with rare words. These issues have hindered NMT's use in practical deployments and services, where both accuracy and speed are essential. In this work, we present GNMT, Google's Neural Machine Translation system, which attempts to address many of","claim_type":"abstract","evidence_strength":"source_metadata"},{"claim_text":"08144 [31] Ruiyi Yan and Yugo Murawaki. 2025. Addressing Tokenization Inconsistency in Steganography and Watermarking Based on Large Language Models. InPro- ceedings of the 2025 Conference on Empirical Methods in Natural Language Pro- cessing. Association for Computational Linguistics, Suzhou, China, 7076-7098. doi:10.18653/v1/2025.emnlp-main.361 [32] Ruiyi Yan and Yugo Murawaki. 2025. Low-Overhead Disambiguation for Genera- tive Linguistic Steganography via Tokenization Consistency. InProceedin","claim_type":"other","confidence":0.95,"evidence_strength":"citation_context"},{"claim_text":"Yuqing Wang, Ying Song, Xiaozhou Li, Nana Reinikainen, and Mika V. Mäntylä. 2025. A Comparative Study of Semantic Log Representations for Software Log-based Anomaly Detection.Proc. ACM Softw. Eng.1, 1 (April 2025), 12 pages. https://doi.org/XXXXXXX.XXXXXXX 1 Introduction As modern software systems become increasingly complex, the potential for anomalies grows [44]. The anomalies may arise from various causes, e.g., misconfigurations, resource contention, or un- predictable workloads [11]. Even a","claim_type":"background","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"SentencePiece: A simple and language independent subword tokenizer and detokenizer for neural text processing.arXiv [cs.CL], 2018. doi:10.48550/arXiv.1808.06226. [33] RDKit: Open-source cheminformatics. [34] Esben Jannik Bjerrum. SMILES enumeration as data augmentation for neural network modeling of molecules. arXiv [cs.LG], 2017. doi:10.48550/arXiv.1703.07076. [35] Josep Arús-Pous, Simon Viet Johansson, Oleksii Prykhodko, Esben Jannik Bjerrum, Christian Tyrchan, Jean-Louis Reymond, Hongming Che","claim_type":"method","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"For instance, Best-of-N sampling has demonstrated remarkable effectiveness in open-domain tasks like question answering and web search [3], [4], offering a simple yet powerful way to select high-quality responses. In machine translation, Beam Search-often paired with modified scoring functions-remains a standard technique for enhancing trans- lation quality [5], [6]. More recently, the use of MCTS with policy-driven LLMs has enabled the autonomous generation of high-quality training data, bypass","claim_type":"background","confidence":0.85,"evidence_strength":"citation_context"},{"claim_text":"tence without any examples. The model relies solely on its pre-trained knowledge to generate translations. ˆY=f θLLM (prompt, X) (1) Different approaches in zero-shot prompting vary by prompt format and the use of a pivot language for low-resource languages. It has been observed that zero-shot prompting with ChatGPT lags behind MT systems by Google MT [56], Tencent [57], and DeepL by around 5.0 BLEU points [58]. Pivot prompting has been explored to translate between distant languages, where the ","claim_type":"baseline","confidence":0.8,"evidence_strength":"citation_context"},{"claim_text":"code re-compiled with Clang -O2 and -Os respectively: higher than N=1 performance of 59.1%. However, on the unstripped split the phenomenon is very different. Log Probability reranking strongly outperforms on the R EAL split, and moreover neural reranking leads to degenerate performance on the S YNTH split. We hypothesize that our neural reranker generally struggles due to the distribution shift [30], [31]. While it may learn assembly semantics from its discriminative training task, it may not h","claim_type":"background","confidence":0.8,"evidence_strength":"citation_context"}],"why_cited":"Pith tracks Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation because it crossed a citation-hub threshold. Current citing contexts most often use it as background evidence (10 contexts).","role_counts":[{"n":10,"context_role":"background"},{"n":2,"context_role":"other"},{"n":1,"context_role":"baseline"},{"n":1,"context_role":"method"}]},"error":null,"updated_at":"2026-05-25T18:26:14.394716+00:00"}},"summary":{"title":"Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation","claims":[{"claim_text":"Neural Machine Translation (NMT) is an end-to-end learning approach for automated translation, with the potential to overcome many of the weaknesses of conventional phrase-based translation systems. Unfortunately, NMT systems are known to be computationally expensive both in training and in translation inference. Also, most NMT systems have difficulty with rare words. These issues have hindered NMT's use in practical deployments and services, where both accuracy and speed are essential. In this work, we present GNMT, Google's Neural Machine Translation system, which attempts to address many of","claim_type":"abstract","evidence_strength":"source_metadata"},{"claim_text":"08144 [31] Ruiyi Yan and Yugo Murawaki. 2025. Addressing Tokenization Inconsistency in Steganography and Watermarking Based on Large Language Models. InPro- ceedings of the 2025 Conference on Empirical Methods in Natural Language Pro- cessing. Association for Computational Linguistics, Suzhou, China, 7076-7098. doi:10.18653/v1/2025.emnlp-main.361 [32] Ruiyi Yan and Yugo Murawaki. 2025. Low-Overhead Disambiguation for Genera- tive Linguistic Steganography via Tokenization Consistency. InProceedin","claim_type":"other","confidence":0.95,"evidence_strength":"citation_context"},{"claim_text":"Yuqing Wang, Ying Song, Xiaozhou Li, Nana Reinikainen, and Mika V. Mäntylä. 2025. A Comparative Study of Semantic Log Representations for Software Log-based Anomaly Detection.Proc. ACM Softw. Eng.1, 1 (April 2025), 12 pages. https://doi.org/XXXXXXX.XXXXXXX 1 Introduction As modern software systems become increasingly complex, the potential for anomalies grows [44]. The anomalies may arise from various causes, e.g., misconfigurations, resource contention, or un- predictable workloads [11]. Even a","claim_type":"background","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"SentencePiece: A simple and language independent subword tokenizer and detokenizer for neural text processing.arXiv [cs.CL], 2018. doi:10.48550/arXiv.1808.06226. [33] RDKit: Open-source cheminformatics. [34] Esben Jannik Bjerrum. SMILES enumeration as data augmentation for neural network modeling of molecules. arXiv [cs.LG], 2017. doi:10.48550/arXiv.1703.07076. [35] Josep Arús-Pous, Simon Viet Johansson, Oleksii Prykhodko, Esben Jannik Bjerrum, Christian Tyrchan, Jean-Louis Reymond, Hongming Che","claim_type":"method","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"For instance, Best-of-N sampling has demonstrated remarkable effectiveness in open-domain tasks like question answering and web search [3], [4], offering a simple yet powerful way to select high-quality responses. In machine translation, Beam Search-often paired with modified scoring functions-remains a standard technique for enhancing trans- lation quality [5], [6]. More recently, the use of MCTS with policy-driven LLMs has enabled the autonomous generation of high-quality training data, bypass","claim_type":"background","confidence":0.85,"evidence_strength":"citation_context"},{"claim_text":"tence without any examples. The model relies solely on its pre-trained knowledge to generate translations. ˆY=f θLLM (prompt, X) (1) Different approaches in zero-shot prompting vary by prompt format and the use of a pivot language for low-resource languages. It has been observed that zero-shot prompting with ChatGPT lags behind MT systems by Google MT [56], Tencent [57], and DeepL by around 5.0 BLEU points [58]. Pivot prompting has been explored to translate between distant languages, where the ","claim_type":"baseline","confidence":0.8,"evidence_strength":"citation_context"},{"claim_text":"code re-compiled with Clang -O2 and -Os respectively: higher than N=1 performance of 59.1%. However, on the unstripped split the phenomenon is very different. Log Probability reranking strongly outperforms on the R EAL split, and moreover neural reranking leads to degenerate performance on the S YNTH split. We hypothesize that our neural reranker generally struggles due to the distribution shift [30], [31]. While it may learn assembly semantics from its discriminative training task, it may not h","claim_type":"background","confidence":0.8,"evidence_strength":"citation_context"}],"why_cited":"Pith tracks Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation because it crossed a citation-hub threshold. Current citing contexts most often use it as background evidence (10 contexts).","role_counts":[{"n":10,"context_role":"background"},{"n":2,"context_role":"other"},{"n":1,"context_role":"baseline"},{"n":1,"context_role":"method"}]},"graph":{"co_cited":[{"title":"BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding","work_id":"ed240a10-5b19-406c-baa5-30803f465785","shared_citers":18},{"title":"Neural Machine Translation by Jointly Learning to Align and Translate","work_id":"d831e763-d530-4029-a65c-ac595d82cb2a","shared_citers":15},{"title":"Adam: A Method for Stochastic Optimization","work_id":"1910796d-9b52-4683-bf5c-de9632c1028b","shared_citers":13},{"title":"Scaling Laws for Neural Language Models","work_id":"b7dd8749-9c45-4977-ab9b-64478dce1ae8","shared_citers":10},{"title":"Evaluating Large Language Models Trained on Code","work_id":"042493e9-b26f-4b4e-bbde-382072ca9b08","shared_citers":9},{"title":"GPT-4 Technical Report","work_id":"b928e041-6991-4c08-8c81-0359e4097c7b","shared_citers":9},{"title":"Layer Normalization","work_id":"20a2d720-0046-4c7c-bcd6-327ec8143f69","shared_citers":9},{"title":"LLaMA: Open and Efficient Foundation Language Models","work_id":"c018fc23-6f3f-4035-9d02-28a2173b2b9d","shared_citers":9},{"title":"PaLM: Scaling Language Modeling with Pathways","work_id":"a94f3ef7-2c49-4445-93fe-6ec16aafd966","shared_citers":9},{"title":"Proximal Policy Optimization Algorithms","work_id":"240c67fe-d14d-4520-91c1-38a4e272ca19","shared_citers":9},{"title":"The Llama 3 Herd of Models","work_id":"1549a635-88af-4ac1-acfe-51ae7bb53345","shared_citers":9},{"title":"Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer","work_id":"50e3b368-0243-4726-8186-233869802ad1","shared_citers":8},{"title":"Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism","work_id":"c888e6d1-0b1d-43d6-9ef5-f0912a0efa1b","shared_citers":8},{"title":"Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge","work_id":"28ea1282-d657-4c61-a83c-f1249be6d6b1","shared_citers":8},{"title":"Gaussian Error Linear Units (GELUs)","work_id":"0466fd22-03a1-4a61-af0a-a900e77bb023","shared_citers":7},{"title":"Gemini: A Family of Highly Capable Multimodal Models","work_id":"83f7c85b-3f11-450f-ac0c-64d9745220b2","shared_citers":7},{"title":"LaMDA: Language Models for Dialog Applications","work_id":"1b66d0a5-f6ae-4332-8025-c662dc64b238","shared_citers":7},{"title":"Language Models are Few-Shot Learners","work_id":"214732c0-2edd-44a0-af9e-28184a2b8279","shared_citers":7},{"title":"Qwen3 Technical Report","work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e","shared_citers":7},{"title":"Training Verifiers to Solve Math Word Problems","work_id":"acab1aa8-b4d6-40e0-a3ee-25341701dca2","shared_citers":7},{"title":"arXiv preprint arXiv:1802.05365 , year=","work_id":"dd973cba-647d-49d3-9d24-061b637bb0cd","shared_citers":6},{"title":"DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models","work_id":"c5006563-f3ec-438a-9e35-b7b484f34828","shared_citers":6},{"title":"Efficient Estimation of Word Representations in Vector Space","work_id":"59edaa01-a696-45b3-9a08-5eae777a799e","shared_citers":6},{"title":"Exploring the limits of language modeling","work_id":"a9dbcb7a-e48d-42a4-8d60-a8f723751a97","shared_citers":6}],"time_series":[{"n":4,"year":2017},{"n":1,"year":2018},{"n":16,"year":2019},{"n":4,"year":2020},{"n":3,"year":2021},{"n":4,"year":2022},{"n":8,"year":2023},{"n":3,"year":2024},{"n":4,"year":2025},{"n":26,"year":2026}],"dependency_candidates":[{"n":1,"role":"method","polarity":"use_method","paper_title":"From Syntax to Semantics: Unveiling the Emergence of Chirality in SMILES Translation Models","primary_cat":"cs.LG","context_text":"SentencePiece: A simple and language independent subword tokenizer and detokenizer for neural text processing.arXiv [cs.CL], 2018. doi:10.48550/arXiv.1808.06226. [33] RDKit: Open-source cheminformatics. [34] Esben Jannik Bjerrum. SMILES enumeration as data augmentation for neural network modeling of molecules. arXiv [cs.LG], 2017. doi:10.48550/arXiv.1703.07076. [35] Josep Arús-Pous, Simon Viet Johansson, Oleksii Prykhodko, Esben Jannik Bjerrum, Christian Tyrchan, Jean-Louis Reymond, Hongming Chen, and Ola Engkvist. Randomized SMILES strings improve the quality of molecular generative models.J. Cheminform., 11(1):71, 2019. ISSN 1758-2946,1758-2946. doi:10.1186/s13321-019-0393-0. [36] Yasuhiro Yoshikai, Tadahaya Mizuno, Shumpei Nemoto, and Hiroyuki Kusuhara.","citing_arxiv_id":"2605.09949"},{"n":1,"role":"baseline","polarity":"baseline","paper_title":"Bridging the Linguistic Divide: A Survey on Leveraging Large Language Models for Machine Translation","primary_cat":"cs.CL","context_text":"tence without any examples. The model relies solely on its pre-trained knowledge to generate translations. ˆY=f θLLM (prompt, X) (1) Different approaches in zero-shot prompting vary by prompt format and the use of a pivot language for low-resource languages. It has been observed that zero-shot prompting with ChatGPT lags behind MT systems by Google MT [56], Tencent [57], and DeepL by around 5.0 BLEU points [58]. Pivot prompting has been explored to translate between distant languages, where the LLM first translates the sentence to English and then into the target language. This strategy, which uses a resource- rich language (English) as a pivot, improves translation quality between De→Zh and Ro→Zh [58].","citing_arxiv_id":"2504.01919"}]},"authors":[]}}