Direct Translation between Sign Languages

Bowen Xie; Liang Huang; Milan Gautam; Stefan Lee; Wuyang Meng; Zetian Wu

arxiv: 2605.20588 · v1 · pith:R3KUMINX · submitted 2026-05-20 · 💻 cs.CL · cs.CV


Zetian Wu, Bowen Xie, Wuyang Meng, Milan Gautam, Stefan Lee, Liang Huang

Pith reviewed 2026-05-21 05:50 UTC · model grok-4.3

classification 💻 cs.CL cs.CV

keywords: sign language translation, direct sign-to-sign, back-translation, synthetic parallel data, ASL, CSL, DGS


The pith

A unified model trained on synthetic sign-sign pairs translates directly between sign languages more accurately and faster than routing through text.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to let users of different sign languages communicate directly without spoken-language text or human interpreters, which would remove a major barrier for deaf communities. It generates synthetic sign-to-sign training pairs from separate unaligned sign-text collections by back-translating through text, then trains one model to handle both text-to-sign and sign-to-sign tasks at once. This direct route avoids the compounding mistakes and extra steps of a three-part cascade, producing sign sequences that are geometrically closer to real signs and that match the intended meaning better when checked by translation back to text.

Core claim

Using back-translation to create synthetic sign-sign pairs from unaligned sign language corpora, a single MBART-based model jointly trained for text-to-sign and sign-to-sign translation outperforms a cascaded sign-to-text-to-sign baseline, with 20 percent lower DTW-aligned MPJPE on geometric metrics, 50 percent higher BLEU-4 after back-translation to sentences, and roughly 2.3 times faster inference, with similar gains observed on a small existing cross-lingual sign set.
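The headline figures combine three different kinds of comparison. A minimal sketch of the arithmetic follows, assuming every percentage is relative to the cascade's own score (the listing does not say whether the BLEU-4 gain is relative or in absolute points). The helper names are illustrative, and the values in the usage lines are placeholders, not results from the paper.

```python
def relative_reduction(baseline: float, new: float) -> float:
    """Fractional decrease of an error metric where lower is better (e.g. MPJPE)."""
    return (baseline - new) / baseline


def relative_gain(baseline: float, new: float) -> float:
    """Fractional increase of a quality metric where higher is better (e.g. BLEU-4)."""
    return (new - baseline) / baseline


def speedup(cascade_latency: float, direct_latency: float) -> float:
    """How many times faster the direct model is than the three-stage cascade."""
    return cascade_latency / direct_latency


def latency_saved(speedup_factor: float) -> float:
    """Fraction of wall-clock time removed by a given speedup factor."""
    return 1.0 - 1.0 / speedup_factor


# Placeholder values chosen only to illustrate the formulas.
print(relative_reduction(10.0, 8.0))  # a "20% lower" error
print(relative_gain(4.0, 6.0))        # a "50% higher" score
print(speedup(2.3, 1.0))              # a "2.3x" speedup
print(round(latency_saved(2.3), 3))   # share of cascade latency removed
```

Under this reading, a 2.3× speedup removes roughly 57% of the cascade's latency, and "50% higher BLEU-4" means 1.5 times the cascade's score rather than 50 extra points.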

What carries the argument

Synthetic sign-sign pairs produced by back-translation from separate sign-text corpora, used to jointly train a single model for direct sign-to-sign mapping.
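The pair-construction step can be sketched as follows. This is a hypothetical outline, not the authors' code: `translate` and `text_to_sign` stand in for a text MT system and the T2S model, and clips are treated as sequences of discrete motion tokens. Following the Figure 2 and Figure 4 captions, the source clip is synthetic while the target clip is the gold recording, so the model learns to map a noisy input to a clean output.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Clip = Sequence[int]  # a sign clip as discrete motion tokens (assumption)


@dataclass(frozen=True)
class SignTextExample:
    text: str       # spoken-language sentence, e.g. English for ASL
    clip: Clip      # gold sign clip from the same corpus
    sign_lang: str  # e.g. "ASL"


@dataclass(frozen=True)
class SyntheticPair:
    src_lang: str
    src_clip: Clip  # synthetic: produced by T2S, may be noisy
    tgt_lang: str
    tgt_clip: Clip  # gold: copied unchanged from the corpus


def back_translate_corpus(
    corpus: List[SignTextExample],
    src_sign_lang: str,
    translate: Callable[[str, str], str],
    text_to_sign: Callable[[str, str], Clip],
) -> List[SyntheticPair]:
    """Build (synthetic source clip, gold target clip) pairs from one sign-text corpus.

    translate(text, sign_lang): hypothetical MT call rendering `text` in the
        spoken language associated with `sign_lang` (e.g. Chinese for CSL).
    text_to_sign(text, sign_lang): hypothetical T2S call returning a clip.
    """
    pairs = []
    for ex in corpus:
        if ex.sign_lang == src_sign_lang:
            continue  # same language on both sides: no cross-lingual pair to make
        src_text = translate(ex.text, src_sign_lang)
        src_clip = text_to_sign(src_text, src_sign_lang)
        pairs.append(SyntheticPair(src_sign_lang, src_clip, ex.sign_lang, ex.clip))
    return pairs
```

Running this once per (corpus, source sign language) combination over ASL, CSL, and DGS would produce synthetic training data for every translation direction.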

If this is right

The direct method records 20 percent lower DTW-aligned MPJPE than the cascade on geometric sign error.
After translating model outputs back to text, it reaches 50 percent higher BLEU-4 than the cascaded route.
Inference runs approximately 2.3 times faster than the three-stage cascade.
Comparable accuracy and speed gains appear on a small existing set of real cross-lingual sign data.
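DTW-aligned MPJPE compares a predicted pose sequence with a reference that may differ in length. Below is a minimal pure-Python sketch, assuming each frame is a list of (x, y, z) joints and that the score is the per-frame MPJPE averaged along the lowest-cost DTW path. The paper may normalize differently (for example, by reference length) or apply joint weighting and pose normalization first.

```python
import math
from typing import List, Sequence, Tuple

Joint = Tuple[float, float, float]
Frame = Sequence[Joint]


def frame_mpjpe(a: Frame, b: Frame) -> float:
    """Mean per-joint Euclidean distance between two frames."""
    if len(a) != len(b):
        raise ValueError("frames must have the same number of joints")
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


def dtw_mpjpe(pred: Sequence[Frame], ref: Sequence[Frame]) -> float:
    """Average frame MPJPE along the minimum-cost DTW alignment of pred and ref."""
    n, m = len(pred), len(ref)
    if n == 0 or m == 0:
        raise ValueError("sequences must be non-empty")
    # cost[i][j]: lowest cumulative cost aligning pred[:i] with ref[:j]
    # steps[i][j]: number of aligned frame pairs on that path
    cost: List[List[float]] = [[math.inf] * (m + 1) for _ in range(n + 1)]
    steps: List[List[int]] = [[0] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_mpjpe(pred[i - 1], ref[j - 1])
            # DTW moves: advance both sequences, or hold one while the other advances.
            best_cost, best_steps = min(
                (cost[i - 1][j - 1], steps[i - 1][j - 1]),
                (cost[i - 1][j], steps[i - 1][j]),
                (cost[i][j - 1], steps[i][j - 1]),
            )
            cost[i][j] = best_cost + d
            steps[i][j] = best_steps + 1
    return cost[n][m] / steps[n][m]
```

Because the alignment absorbs differences in signing speed, a prediction that repeats a correct pose scores no worse than one that matches the reference frame for frame. Only the pose error itself is penalized.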

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the synthetic-pair approach scales, it could support direct translation for additional sign language pairs that lack parallel data.
Direct modeling may retain visual features such as facial expression and spatial grammar that text intermediaries discard.
Applying the same back-translation recipe to larger unaligned corpora could test whether performance keeps improving with more synthetic data.

Load-bearing premise

The synthetic sign-sign pairs created by back-translation are close enough in visual form and meaning to real parallel data that joint training improves results without hiding systematic mismatches the chosen metrics cannot detect.

What would settle it

Evaluating the direct model against the cascade on a sizable collection of human-collected, time-aligned parallel sign utterances between two sign languages and finding no consistent advantage for the direct method.

Figures

Figures reproduced from arXiv: 2605.20588 by Bowen Xie, Liang Huang, Milan Gautam, Stefan Lee, Wuyang Meng, Zetian Wu.

**Figure 1.** Direct sign-to-sign translation. Given a source clip in one sign language (e.g. CSL), our single MBART-based model produces an equivalent clip in a target sign language (e.g. ASL) without going through written text. Compared with the cascaded S2T → MT → T2S baseline, the direct model is roughly 2.3× faster and yields lower DTW-aligned MPJPE. parallel training set is very small (at most 2.3K S2S pairs in on…

**Figure 2.** Back-translation: NMT and signs side by side. (a) Standard NMT back-translation [27], introduced in §2.1. (b) Our cross-lingual sign back-translation, introduced in §2.2. model learns to recover a clean target from a noisy input rather than to imitate a noisy supervision signal. Edunov et al. [6] subsequently showed that the technique scales to hundreds of millions of monolingual sentences and consistently impro…

**Figure 3.** Model overview. A single MBART-based encoder–decoder handles both T2S and S2S, with the input/output language signaled by special tokens at the encoder input and decoder prefix. For T2S the encoder reads a spoken-language text; for S2S it reads a sign clip encoded into discrete motion tokens via a VQ-VAE. In both cases the decoder emits motion tokens autoregressively, which are decoded back to motion by th…
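The input/output formatting described for Figure 3 can be sketched as below. Several assumptions apply. The VQ step is reduced to a nearest-codebook lookup on per-frame feature vectors, whereas a real VQ-VAE [29] first runs a learned encoder that usually also downsamples in time. The tag strings such as `<csl>` and `<motion_k>` are hypothetical names. Tag placement is also assumed: the original MBART recipe appends the source language code after the end-of-sentence token, while this sketch prepends it.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Example = Tuple[List[str], List[str], List[str]]  # (encoder input, decoder prefix, labels)


def vq_encode(frames: Sequence[Vector], codebook: Sequence[Vector]) -> List[int]:
    """Replace each frame feature with the index of its nearest codebook vector (L2)."""
    return [
        min(range(len(codebook)), key=lambda k: math.dist(frame, codebook[k]))
        for frame in frames
    ]


def motion_tokens(indices: Sequence[int]) -> List[str]:
    """Turn codebook indices into vocabulary entries (hypothetical naming)."""
    return [f"<motion_{i}>" for i in indices]


def s2s_example(src_lang: str, tgt_lang: str,
                src_codes: Sequence[int], tgt_codes: Sequence[int]) -> Example:
    """Sign-to-sign: the encoder reads source motion tokens, the decoder emits target ones."""
    encoder_input = [f"<{src_lang}>"] + motion_tokens(src_codes)
    decoder_prefix = [f"<{tgt_lang}>"]
    labels = motion_tokens(tgt_codes) + ["</s>"]
    return encoder_input, decoder_prefix, labels


def t2s_example(text_lang: str, tgt_lang: str,
                text: str, tgt_codes: Sequence[int]) -> Example:
    """Text-to-sign: same decoder side, but the encoder reads words.

    Whitespace splitting stands in for MBART's subword tokenizer.
    """
    encoder_input = [f"<{text_lang}>"] + text.split()
    decoder_prefix = [f"<{tgt_lang}>"]
    labels = motion_tokens(tgt_codes) + ["</s>"]
    return encoder_input, decoder_prefix, labels
```

Because both tasks share one decoder vocabulary of motion tokens and differ only in what the encoder reads, T2S and S2S examples can be mixed freely in the same training batches.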

**Figure 4.** CSL-ASL pairs from the strict subset. One removed pair also shown for comparison. We evaluate on three sign↔sign test sets. The BT-input set consists of held-out (text, sign) pairs from each source corpus, passed through the same back-translation pipeline used at training time (§2): the held-out text is translated and fed to our T2S model to produce a synthetic source clip, which is paired with the gold t…

Original abstract

The field of sign language translation has witnessed significant progress in the translation between sign and spoken languages, but the translation between sign languages remains largely unexplored and out of reach. The latter can help 1.5 billion deaf and hard-of-hearing (DHH) people worldwide communicate across language barriers without relying on hearing interpreters or written-language fluency. The cascade approach composing separate sign-to-text, text-to-text, and text-to-sign systems suffers from error propagation and extra latency as well as the loss of information unique in the visual modality. We aim to develop direct sign-to-sign translation. However, a large-scale open-domain parallel corpus has not been curated between sign languages. To enable direct translation between sign language utterances, we use back-translation to produce synthetic sign-sign pairs from unaligned individual language utterance-sign corpora. Using this data, we jointly train a single MBART-based model for both text->sign (T2S) and sign->sign (S2S). On synthetically generated paired sets between American Sign Language (ASL), Chinese Sign Language (CSL), and German Sign Language (DGS), our direct S2S method outperforms the cascaded baseline on geometric sign error metrics (20% lower DTW-aligned MPJPE) and language matching metrics after predicted sign utterances are translated back to sentences (50% high BLEU-4) while achieving a roughly 2.3* speedup. On a small set of pre-existing cross-lingual sign data, we find similar improvements for our proposed method.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes direct sign-to-sign (S2S) translation between ASL, CSL, and DGS by using back-translation on unaligned sign-text corpora to synthesize parallel sign-sign pairs, then jointly training a single MBART-based model for both text-to-sign (T2S) and S2S tasks. It reports that this direct approach outperforms a cascaded sign-to-text + text-to-text + text-to-sign baseline on synthetic pairs with 20% lower DTW-aligned MPJPE, 50% higher BLEU-4 (after back-translating predictions to text), and 2.3× speedup, with similar trends on a small pre-existing cross-lingual sign dataset.

Significance. If the synthetic pairs faithfully preserve both geometric sign content and linguistic intent without systematic artifacts, the work could enable practical direct sign-language translation that avoids cascade error propagation, extra latency, and loss of visual-modality information, directly benefiting cross-lingual communication for DHH populations. The joint MBART training and reported speed-up are attractive engineering contributions, but the significance is conditional on independent verification that performance gains are not artifacts of the back-translation process.

major comments (3)

[Abstract] Abstract: the language-matching claim (50% higher BLEU-4) is obtained by translating predicted sign utterances back to text before scoring. This re-introduces the text modality the method is designed to bypass and creates partial circularity in the metric, even though the geometric DTW-aligned MPJPE remains independent.
[Synthetic data generation] Synthetic data generation (described in the abstract and method): no validation of back-translation quality is reported (e.g., human semantic-equivalence ratings or geometric consistency checks on the generated sign-sign pairs themselves). Without such checks, the 20% MPJPE reduction and BLEU gains could arise from artifacts or simplifications in the synthetic pairs that the joint T2S+S2S model exploits while the cascade does not.
[Experiments] Experiments section: results are presented without error bars, statistical significance tests, or ablations on the synthetic-data generation process (including the back-translation sampling temperature listed as a free parameter). This leaves the robustness of the central outperformance claim difficult to assess.

minor comments (2)

[Abstract] Abstract: '50% high BLEU-4' appears to be a typo and should read '50% higher BLEU-4'.
[Abstract] Abstract: '2.3* speedup' should be written as '2.3× speedup' for typographic clarity.

Simulated Author's Rebuttal

3 responses · 1 unresolved

We thank the referee for their constructive comments on our work. We address each of the major comments point by point below, indicating where revisions will be made to the manuscript.

Point-by-point responses

Referee: [Abstract] Abstract: the language-matching claim (50% higher BLEU-4) is obtained by translating predicted sign utterances back to text before scoring. This re-introduces the text modality the method is designed to bypass and creates partial circularity in the metric, even though the geometric DTW-aligned MPJPE remains independent.

Authors: We concur that relying on back-translation to text for the BLEU-4 score does reintroduce the text modality and introduces a degree of circularity in that particular metric. The geometric evaluation via DTW-aligned MPJPE is independent of text and demonstrates a clear 20% improvement for the direct method. In the revised manuscript, we will update the abstract to emphasize the geometric metric as the primary result and provide additional clarification on the evaluation procedure for language matching. We view this as a partial revision to better frame the claims. revision: partial
Referee: [Synthetic data generation] Synthetic data generation (described in the abstract and method): no validation of back-translation quality is reported (e.g., human semantic-equivalence ratings or geometric consistency checks on the generated sign-sign pairs themselves). Without such checks, the 20% MPJPE reduction and BLEU gains could arise from artifacts or simplifications in the synthetic pairs that the joint T2S+S2S model exploits while the cascade does not.

Authors: The referee is correct that we did not report any validation of the synthetic sign-sign pairs, such as human ratings for semantic equivalence or checks for geometric consistency. This omission means we cannot fully exclude the possibility that the performance gains are due to artifacts in the synthetic data. We will revise the manuscript to include a limitations section discussing this issue and add qualitative examples of the generated pairs along with any available consistency metrics. However, comprehensive human evaluations are resource-intensive and not included in this revision. revision: partial
Referee: [Experiments] Experiments section: results are presented without error bars, statistical significance tests, or ablations on the synthetic-data generation process (including the back-translation sampling temperature listed as a free parameter). This leaves the robustness of the central outperformance claim difficult to assess.

Authors: We agree that the lack of error bars, statistical tests, and ablations on parameters like the back-translation sampling temperature weakens the assessment of result robustness. We will incorporate error bars from multiple training runs, conduct statistical significance tests where appropriate, and add ablations varying the sampling temperature to the experiments section. These changes will be made in the revised version. revision: yes
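A paired bootstrap over test utterances is one standard way to attach a confidence interval and a significance estimate to the MPJPE comparison. The sketch below assumes per-utterance DTW-aligned MPJPE scores are available for both systems on the same test items. It is not a procedure the paper reports, and `paired_bootstrap` is a hypothetical helper.

```python
import random
from statistics import mean
from typing import Sequence, Tuple


def paired_bootstrap(
    direct: Sequence[float],
    cascade: Sequence[float],
    n_resamples: int = 1000,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Paired bootstrap over per-utterance error scores (lower is better).

    Returns (win_rate, ci_low, ci_high): the fraction of resamples in which the
    direct system has the lower mean error, and an approximate 95% percentile
    interval for mean(cascade - direct).
    """
    if len(direct) != len(cascade) or not direct:
        raise ValueError("need equal-length, non-empty score lists")
    if n_resamples < 1:
        raise ValueError("n_resamples must be positive")
    rng = random.Random(seed)
    n = len(direct)
    diffs = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]  # resample utterances with replacement
        diffs.append(mean(cascade[i] - direct[i] for i in idx))
    diffs.sort()
    win_rate = sum(d > 0 for d in diffs) / n_resamples
    ci_low = diffs[int(0.025 * n_resamples)]
    ci_high = diffs[max(int(0.975 * n_resamples) - 1, 0)]
    return win_rate, ci_low, ci_high
```

A win rate near 1.0 together with an interval that excludes zero would support the claimed MPJPE advantage. Repeating the analysis across several training seeds addresses the separate question of run-to-run variance.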

standing simulated objections not resolved

Comprehensive human semantic-equivalence ratings for the synthetic sign-sign pairs would require recruiting qualified sign language interpreters or native signers for annotation, which is not feasible within the timeline and resources available for this revision.

Circularity Check

0 steps flagged

No circularity detected in empirical sign-to-sign translation method

full rationale

The paper presents an applied empirical pipeline: back-translation generates synthetic sign-sign pairs from unaligned sign-text corpora, a single MBART model is jointly trained for T2S and S2S, and performance is measured via independent geometric metrics (DTW-aligned MPJPE) plus BLEU after an auxiliary sign-to-text step. No mathematical derivation, first-principles result, or equation chain is claimed that reduces to its own inputs by construction. No parameters are fitted on a subset and then relabeled as predictions, no self-citation load-bearing uniqueness theorems appear, and no ansatz is smuggled via prior work. The evaluation choices and data-generation technique are standard in machine translation and do not create self-definitional or tautological equivalences; results are reported as experimental outcomes on held-out synthetic and pre-existing sets.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the quality of back-translation synthetic pairs and the assumption that a single sequence model can jointly handle text and sign modalities without modality-specific losses. No new physical entities are postulated. Free parameters are the standard MBART hyperparameters and any back-translation sampling choices.

free parameters (1)

back-translation sampling temperature
Controls diversity of synthetic sign-sign pairs; value not stated in abstract but required for reproducibility.
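The listing does not say which stage of the pipeline this temperature applies to. The sketch below assumes it governs sampling of motion tokens from the T2S decoder when synthetic source clips are generated. `softmax_with_temperature` and `sample_token` are hypothetical helpers, not part of any library.

```python
import math
import random
from typing import List, Sequence


def softmax_with_temperature(logits: Sequence[float], temperature: float) -> List[float]:
    """Softmax of logits / temperature, computed stably by subtracting the max."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    top = max(scaled)
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits: Sequence[float], temperature: float, rng: random.Random) -> int:
    """Draw one token index (e.g. a motion-token id) from the tempered distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

Lower temperatures push generation toward greedy decoding and more uniform synthetic clips. Higher temperatures flatten the distribution and diversify the synthetic source side, which is why an ablation over this value bears on the robustness question raised in the referee report.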

axioms (1)

domain assumption: Back-translation from unaligned sign-text corpora produces pairs whose visual content is faithful enough for downstream training.
Invoked when the authors state they use back-translation to produce synthetic sign-sign pairs.

pith-pipeline@v0.9.0 · 5809 in / 1383 out tokens · 35879 ms · 2026-05-21T05:50:45.330004+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear


unclear
Relation between the paper passage and the cited Recognition theorem.

We use back-translation to produce synthetic sign-sign pairs from unaligned individual language utterance-sign corpora... jointly train a single MBART-based model for both T2S and S2S
IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear


unclear
Relation between the paper passage and the cited Recognition theorem.

On synthetically generated paired sets... 20% lower DTW-aligned MPJPE... 50% higher BLEU-4

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

35 extracted references · 35 canonical work pages · 3 internal anchors

[1] Necati Cihan Camgöz, Simon Hadfield, Oscar Koller, Hermann Ney, and Richard Bowden. Neural sign language translation. In CVPR, pages 7784–7793, 2018.

[2] Necati Cihan Camgöz, Oscar Koller, Simon Hadfield, and Richard Bowden. Sign language transformers: Joint end-to-end sign language recognition and translation. In CVPR, pages 10023–10033, 2020.

[3] Marta R. Costa-jussà, James Cross, Onur Çelebi, Maha Elbayad, Kenneth Heafield, Kevin Heffernan, Elahe Kalbassi, Janice Lam, Daniel Licht, et al. No language left behind: Scaling human-centered machine translation. arXiv:2207.04672, 2022.

[4] Mathieu De Coster, Dimitar Shterionov, Mieke Van Herreweghe, and Joni Dambre. Machine translation from signed to spoken languages: State of the art and challenges. Universal Access in the Information Society, pages 1–27, 2023.

[5] Amanda Duarte, Shruti Palaskar, Lucas Ventura, Deepti Ghadiyaram, Kenneth DeHaan, Florian Metze, Jordi Torres, and Xavier Giró-i Nieto. How2Sign: A large-scale multimodal dataset for continuous American Sign Language. In CVPR, 2021.

[6] Sergey Edunov, Myle Ott, Michael Auli, and David Grangier. Understanding back-translation at scale. In EMNLP, 2018.

[7] Mara Finkelstein, Isaac Caswell, Tobias Domhan, Jan-Thorsten Peter, Juraj Juraska, Parker Riley, Daniel Deutsch, Cole Dilanni, Colin Cherry, Eleftheria Briakou, Elizabeth Nielsen, Jiaming Luo, Sweta Agrawal, Wenda Xu, Erin Kats, Stephane Jaskiewicz, Markus Freitag, and David Vilar. TranslateGemma technical report. arXiv:2601.09012, 2026.

[8] Alex Graves, Santiago Fernández, Faustino Gomez, and Jürgen Schmidhuber. Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks. In ICML, pages 369–376, 2006.

[9] Shester Gueuwou, Sophie Siake, Colin Leong, and Mathias Müller. JWSign: A highly multilingual corpus of bible translations for more diversity in sign language processing. In Findings of EMNLP, pages 9907–9927, 2023.

[10] Mert Inan, Yang Zhong, Vidya Ganesh, and Malihe Alikhani. How to align multiple signed language corpora for better sign-to-sign translations? In Proceedings of NAACL-HLT (Long Papers), pages 4003–4016, 2025. doi: 10.18653/v1/2025.naacl-long.202

[11] Ye Jia, Ron J Weiss, Fadi Biadsy, Wolfgang Macherey, Melvin Johnson, Zhifeng Chen, and Yonghui Wu. Direct speech-to-speech translation with a sequence-to-sequence model. In Interspeech, 2019.

[12] Ye Jia, Michelle Tadmor Ramanovich, Tal Remez, and Roi Pomerantz. Translatotron 2: High-quality direct speech-to-speech translation with voice preservation. In ICML, 2022.

[13] Zifan Jiang, Amit Moryossef, Mathias Müller, and Sarah Ebling. Machine translation between spoken languages and signed languages represented in SignWriting. In Findings of EACL, pages 1706–1724, 2023.

[14] Zifan Jiang, Colin Leong, Amit Moryossef, Oliver Cory, Maksym Ivashechkin, Neha Tarigopula, Biao Zhang, Anne Göhring, Annette Rios, Rico Sennrich, and Sarah Ebling. Meaningful pose-based sign language evaluation. In Proceedings of the Tenth Conference on Machine Translation (WMT), pages 64–80, 2025.

[15] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In ICLR, 2015.

[16] Kezhou Lin, Xiaohan Wang, Linchao Zhu, Ke Sun, Bang Zhang, and Yi Yang. Gloss-free end-to-end sign language translation. In ACL, pages 12904–12916, 2023.

[17] Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, and Luke Zettlemoyer. Multilingual denoising pre-training for neural machine translation. Transactions of the Association for Computational Linguistics, 8:726–742, 2020.

[18] Amit Moryossef, Kayo Yin, Graham Neubig, and Yoav Goldberg. Data augmentation for sign language gloss translation. In Workshop on Automatic Translation for Signed and Spoken Languages (AT4SSL), pages 1–11, 2021.

[19] Mathias Müller, Malihe Alikhani, Eleftherios Avramidis, Richard Bowden, Annelies Braffort, Necati Cihan Camgöz, Sarah Ebling, Cristina España-Bonet, Anne Göhring, Roman Grundkiewicz, Mert Inan, Zifan Jiang, Oscar Koller, Amit Moryossef, Annette Rios, Dimitar Shterionov, Sandra Sidler-Miserez, Katja Tissi, and Davy Van Landuyt. Findings of the second WMT shared task on sign language translation (WMT-SLT23). 2023.

[20] Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. BLEU: A method for automatic evaluation of machine translation. In ACL, 2002.

[21] Matt Post. A call for clarity in reporting BLEU scores. In Conference on Machine Translation, 2018.

[22] Qwen Team. Qwen3 technical report. arXiv:2505.09388, 2025.

[23] Nils Reimers and Iryna Gurevych. Sentence-BERT: Sentence embeddings using Siamese BERT-networks. In EMNLP, 2019.

[24] Paul K. Rubenstein, Chulayuth Asawaroengchai, Duc Dung Nguyen, Ankur Bapna, Zalán Borsos, Félix de Chaumont Quitry, Peter Chen, Dalia El Badawy, Wei Han, Eugene Kharitonov, et al. AudioPaLM: A large language model that can speak and listen. arXiv:2306.12925, 2023.

[25] Ben Saunders, Necati Cihan Camgöz, and Richard Bowden. Progressive transformers for end-to-end sign language production. In ECCV, pages 687–705, 2020.

[26] Ben Saunders, Necati Cihan Camgöz, and Richard Bowden. Mixed SIGNals: Sign language production via a mixture of motion primitives. In ICCV, pages 1919–1929, 2021.

[27] Rico Sennrich, Barry Haddow, and Alexandra Birch. Improving neural machine translation models with monolingual data. In ACL, 2016.

[28] Stephanie Stoll, Necati Cihan Camgöz, Simon Hadfield, and Richard Bowden. Sign language production using neural machine translation and generative adversarial networks. In BMVC, 2018.

[29] Aäron van den Oord, Oriol Vinyals, and Koray Kavukcuoglu. Neural discrete representation learning. In NeurIPS, 2017.

[30] Aoxiong Yin, Zhou Zhao, Weike Jin, Meng Zhang, Xingshan Zeng, and Xiaofei He. MLSLT: Towards multilingual sign language translation. In CVPR, pages 5109–5119, 2022.

[31] Kayo Yin and Jesse Read. Better sign language translation with STMC-transformer. In COLING, pages 5975–5989, 2020.

[32] Kayo Yin, Amit Moryossef, Julie Hochgesang, Yoav Goldberg, and Malihe Alikhani. Including signed languages in natural language processing. In ACL-IJCNLP, pages 7347–7360, 2021.

[33] Jan Zelinka and Jakub Kanis. Neural sign language synthesis: Words are our glosses. In WACV, pages 3384–3392, 2020.

[34] Hao Zhou, Wengang Zhou, Weizhen Qi, Junfu Pu, and Houqiang Li. Improving sign language translation with monolingual data by sign back-translation. In CVPR, pages 1316–1325, 2021.

[35] Ronglai Zuo, Rolandos Alexandros Potamias, Evangelos Ververas, Jiankang Deng, and Stefanos Zafeiriou. Signs as tokens: A retrieval-enhanced multilingual sign language generator. In ICCV, 2025.

[1] [1]

Neural sign language translation

Necati Cihan Camgöz, Simon Hadfield, Oscar Koller, Hermann Ney, and Richard Bowden. Neural sign language translation. InCVPR, pages 7784–7793, 2018

work page 2018

[2] [2]

Sign language transformers: Joint end-to-end sign language recognition and translation

Necati Cihan Camgöz, Oscar Koller, Simon Hadfield, and Richard Bowden. Sign language transformers: Joint end-to-end sign language recognition and translation. InCVPR, pages 10023–10033, 2020

work page 2020

[3] [3]

No Language Left Behind: Scaling Human-Centered Machine Translation

Marta R. Costa-jussà, James Cross, Onur Çelebi, Maha Elbayad, Kenneth Heafield, Kevin Heffernan, Elahe Kalbassi, Janice Lam, Daniel Licht, et al. No language left behind: Scaling human-centered machine translation.arXiv:2207.04672, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022

[4] [4]

Machine translation from signed to spoken languages: State of the art and challenges.Universal Access in the Information Society, pages 1–27, 2023

Mathieu De Coster, Dimitar Shterionov, Mieke Van Herreweghe, and Joni Dambre. Machine translation from signed to spoken languages: State of the art and challenges.Universal Access in the Information Society, pages 1–27, 2023

work page 2023

[5] [5]

How2Sign: A large-scale multimodal dataset for continuous american sign language

Amanda Duarte, Shruti Palaskar, Lucas Ventura, Deepti Ghadiyaram, Kenneth DeHaan, Florian Metze, Jordi Torres, and Xavier Giró-i Nieto. How2Sign: A large-scale multimodal dataset for continuous american sign language. InCVPR, 2021

work page 2021

[6] [6]

Understanding back-translation at scale

Sergey Edunov, Myle Ott, Michael Auli, and David Grangier. Understanding back-translation at scale. In EMNLP, 2018

work page 2018

[7] [7]

TranslateGemma technical report.arXiv:2601.09012, 2026

Mara Finkelstein, Isaac Caswell, Tobias Domhan, Jan-Thorsten Peter, Juraj Juraska, Parker Riley, Daniel Deutsch, Cole Dilanni, Colin Cherry, Eleftheria Briakou, Elizabeth Nielsen, Jiaming Luo, Sweta Agrawal, Wenda Xu, Erin Kats, Stephane Jaskiewicz, Markus Freitag, and David Vilar. TranslateGemma technical report.arXiv:2601.09012, 2026

work page arXiv 2026

[8] [8]

Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks

Alex Graves, Santiago Fernández, Faustino Gomez, and Jürgen Schmidhuber. Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks. InICML, pages 369–376, 2006

work page 2006

[9] [9]

JWSign: A highly multilingual corpus of bible translations for more diversity in sign language processing

Shester Gueuwou, Sophie Siake, Colin Leong, and Mathias Müller. JWSign: A highly multilingual corpus of bible translations for more diversity in sign language processing. InFindings of EMNLP, pages 9907–9927, 2023

work page 2023

[10] [10]

How to align multiple signed language corpora for better sign-to-sign translations? InProceedings of NAACL-HLT (Long Papers), pages 4003– 4016, 2025

Mert Inan, Yang Zhong, Vidya Ganesh, and Malihe Alikhani. How to align multiple signed language corpora for better sign-to-sign translations? InProceedings of NAACL-HLT (Long Papers), pages 4003– 4016, 2025. doi: 10.18653/v1/2025.naacl-long.202

work page doi:10.18653/v1/2025.naacl-long.202 2025

[11] [11]

Direct speech-to-speech translation with a sequence-to-sequence model.Interspeech, 2019

Ye Jia, Ron J Weiss, Fadi Biadsy, Wolfgang Macherey, Melvin Johnson, Zhifeng Chen, and Yonghui Wu. Direct speech-to-speech translation with a sequence-to-sequence model.Interspeech, 2019

work page 2019

[12] [12]

Translatotron 2: High-quality direct speech-to-speech translation with voice preservation.ICML, 2022

Ye Jia, Michelle Tadmor Ramanovich, Tal Remez, and Roi Pomerantz. Translatotron 2: High-quality direct speech-to-speech translation with voice preservation.ICML, 2022

work page 2022

[13] [13]

Machine translation between spoken languages and signed languages represented in SignWriting

Zifan Jiang, Amit Moryossef, Mathias Müller, and Sarah Ebling. Machine translation between spoken languages and signed languages represented in SignWriting. InFindings of EACL, pages 1706–1724, 2023

work page 2023

[14] [14]

Meaningful pose-based sign language evaluation

Zifan Jiang, Colin Leong, Amit Moryossef, Oliver Cory, Maksym Ivashechkin, Neha Tarigopula, Biao Zhang, Anne Göhring, Annette Rios, Rico Sennrich, and Sarah Ebling. Meaningful pose-based sign language evaluation. InProceedings of the Tenth Conference on Machine Translation (WMT), pages 64–80, 2025

work page 2025

[15] [15]

Kingma and Jimmy Ba

Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. InICLR, 2015

work page 2015

[16] [16]

Gloss-free end-to-end sign language translation

Kezhou Lin, Xiaohan Wang, Linchao Zhu, Ke Sun, Bang Zhang, and Yi Yang. Gloss-free end-to-end sign language translation. InACL, pages 12904–12916, 2023

work page 2023

[17] Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, and Luke Zettlemoyer. Multilingual denoising pre-training for neural machine translation. Transactions of the Association for Computational Linguistics, 8:726–742, 2020.

[18] Amit Moryossef, Kayo Yin, Graham Neubig, and Yoav Goldberg. Data augmentation for sign language gloss translation. In Workshop on Automatic Translation for Signed and Spoken Languages (AT4SSL), pages 1–11, 2021.

[19] Mathias Müller, Malihe Alikhani, Eleftherios Avramidis, Richard Bowden, Annelies Braffort, Necati Cihan Camgöz, Sarah Ebling, Cristina España-Bonet, Anne Göhring, Roman Grundkiewicz, Mert Inan, Zifan Jiang, Oscar Koller, Amit Moryossef, Annette Rios, Dimitar Shterionov, Sandra Sidler-Miserez, Katja Tissi, and Davy Van Landuyt. Findings of the second WMT shared task on sign language translation (WMT-SLT23). ...

[20] Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. BLEU: A method for automatic evaluation of machine translation. In ACL, 2002.

[21] Matt Post. A call for clarity in reporting BLEU scores. In Conference on Machine Translation, 2018.

[22] Qwen Team. Qwen3 technical report. arXiv:2505.09388, 2025.

[23] Nils Reimers and Iryna Gurevych. Sentence-BERT: Sentence embeddings using Siamese BERT-networks. In EMNLP, 2019.

[24] Paul K. Rubenstein, Chulayuth Asawaroengchai, Duc Dung Nguyen, Ankur Bapna, Zalán Borsos, Félix de Chaumont Quitry, Peter Chen, Dalia El Badawy, Wei Han, Eugene Kharitonov, et al. AudioPaLM: A large language model that can speak and listen. arXiv:2306.12925, 2023.

[25] Ben Saunders, Necati Cihan Camgöz, and Richard Bowden. Progressive transformers for end-to-end sign language production. In ECCV, pages 687–705, 2020.

[26] Ben Saunders, Necati Cihan Camgöz, and Richard Bowden. Mixed SIGNals: Sign language production via a mixture of motion primitives. In ICCV, pages 1919–1929, 2021.

[27] Rico Sennrich, Barry Haddow, and Alexandra Birch. Improving neural machine translation models with monolingual data. In ACL, 2016.

[28] Stephanie Stoll, Necati Cihan Camgöz, Simon Hadfield, and Richard Bowden. Sign language production using neural machine translation and generative adversarial networks. In BMVC, 2018.

[29] Aäron van den Oord, Oriol Vinyals, and Koray Kavukcuoglu. Neural discrete representation learning. In NeurIPS, 2017.

[30] Aoxiong Yin, Zhou Zhao, Weike Jin, Meng Zhang, Xingshan Zeng, and Xiaofei He. MLSLT: Towards multilingual sign language translation. In CVPR, pages 5109–5119, 2022.

[31] Kayo Yin and Jesse Read. Better sign language translation with STMC-transformer. In COLING, pages 5975–5989, 2020.

[32] Kayo Yin, Amit Moryossef, Julie Hochgesang, Yoav Goldberg, and Malihe Alikhani. Including signed languages in natural language processing. In ACL-IJCNLP, pages 7347–7360, 2021.

[33] Jan Zelinka and Jakub Kanis. Neural sign language synthesis: Words are our glosses. In WACV, pages 3384–3392, 2020.

[34] Hao Zhou, Wengang Zhou, Weizhen Qi, Junfu Pu, and Houqiang Li. Improving sign language translation with monolingual data by sign back-translation. In CVPR, pages 1316–1325, 2021.

[35] Ronglai Zuo, Rolandos Alexandros Potamias, Evangelos Ververas, Jiankang Deng, and Stefanos Zafeiriou. Signs as tokens: A retrieval-enhanced multilingual sign language generator. In ICCV, 2025.