Evaluation of Pose Estimation Systems for Sign Language Translation

Catherine O'Brien; Gerard Sant; Mathias M\"uller; Sarah Ebling

arxiv: 2604.24609 · v1 · submitted 2026-04-27 · 💻 cs.CL

Evaluation of Pose Estimation Systems for Sign Language Translation

Catherine O'Brien , Gerard Sant , Mathias M\"uller , Sarah Ebling This is my paper

Pith reviewed 2026-05-08 03:34 UTC · model grok-4.3

classification 💻 cs.CL

keywords sign language translationpose estimationSDPoseSapiensMediaPipeBLEUocclusionhand keypoints

0 comments

The pith

The choice of pose estimator affects sign language translation quality, with SDPose and Sapiens achieving the highest BLEU scores.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests multiple pose estimators by inserting their outputs into an otherwise identical sign language translation pipeline. SDPose and Sapiens reach BLEU scores near 11.5 while the common MediaPipe baseline reaches 10, with performance differences linked to how reliably each tool detects hand keypoints and handles occlusion. The study also releases code to make substitution of alternative estimators straightforward. A sympathetic reader would care because many translation systems adopt pose inputs precisely to reduce data volume and protect signer privacy, turning the estimator decision into a direct lever on output quality.

Core claim

We present a systematic comparison of pose estimators for pose-based SLT, covering widely used baselines (MediaPipe Holistic, OpenPose) and newer whole-body/high-capacity models (MMPose WholeBody, OpenPifPaf, AlphaPose, SDPose, Sapiens, SMPLest-X). We quantify downstream impact by training a controlled SLT pipeline on RWTH-PHOENIX-Weather 2014 where only the pose representation varies, evaluating with BLEU and BLEURT. SDPose and Sapiens achieve the best translation performance (BLEU ~11.5), outperforming the common MediaPipe baseline (BLEU ~10). In occlusion cases, Sapiens is correct in all tested instances (15/15), while OpenPifPaf fails in nearly all (1/15) and also yields the weakest tr

What carries the argument

A controlled sign language translation pipeline on RWTH-PHOENIX-Weather 2014 in which only the input pose sequence is swapped while the model, training procedure, and evaluation metrics stay fixed.

Load-bearing premise

That observed differences in translation scores can be attributed primarily to pose estimator quality rather than interactions with the specific SLT architecture, training procedure, or dataset characteristics.

What would settle it

Repeating the full comparison after swapping in a different sign language translation model architecture or training it on a new dataset and finding that the ranking of pose estimators by BLEU score reverses.

Figures

Figures reproduced from arXiv: 2604.24609 by Catherine O'Brien, Gerard Sant, Mathias M\"uller, Sarah Ebling.

**Figure 1.** Figure 1: Example illustrating missing hand keypoints across pose estimators. (a) Original RGB frame view at source ↗

**Figure 2.** Figure 2: Examples of erroneous SMPLest-X hand pose estimates overlaid on original frames, cropped to highlight the hands. In both examples (left: How2Sign (Duarte et al., 2021); right: Signsuisse), the predicted hand keypoints exhibit incorrect orientation and articulation that do not match the observed hand configurations. the translation quality in our experiments is much lower than state-of-the art systems, es… view at source ↗

**Figure 3.** Figure 3: Distribution of per-sequence jerk jitter ( view at source ↗

**Figure 4.** Figure 4: Pose estimation visualizations with hand view at source ↗

**Figure 5.** Figure 5: Distribution of per-sequence jerk jitter view at source ↗

**Figure 6.** Figure 6: Distribution of per-sequence jerk jitter view at source ↗

read the original abstract

Many sign language translation (SLT) systems operate on pose sequences instead of raw video to reduce input dimensionality, improve portability, and partially anonymize signers. The choice of pose estimator is often treated as an implementation detail, with systems defaulting to widely available tools such as MediaPipe Holistic or OpenPose. We present a systematic comparison of pose estimators for pose-based SLT, covering widely used baselines (MediaPipe Holistic, OpenPose) and newer whole-body/high-capacity models (MMPose WholeBody, OpenPifPaf, AlphaPose, SDPose, Sapiens, SMPLest-X). We quantify downstream impact by training a controlled SLT pipeline on RWTH-PHOENIX-Weather 2014 where only the pose representation varies, evaluating with BLEU and BLEURT. To contextualize translation outcomes, we analyze temporal stability, missing hand keypoints, and robustness to occlusion using higher-resolution videos from the Signsuisse dataset. SDPose and Sapiens achieve the best translation performance (BLEU ~11.5), outperforming the common MediaPipe baseline (BLEU ~10). In occlusion cases, Sapiens is correct in all tested instances (15/15), while OpenPifPaf fails in nearly all (1/15) and also yields the weakest translation scores. Estimators that frequently leave out hand keypoints are associated with lower BLEU/BLEURT. We release code that can be used not only to reproduce our experiments, but also considerably lowers the barrier for other researchers to use alternative pose estimators.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This paper shows SDPose and Sapiens beat MediaPipe by about 1.5 BLEU in a fixed SLT pipeline and handle occlusions far better, backed by code release and a second dataset.

read the letter

The main takeaway is straightforward: in a controlled sign language translation setup on RWTH-PHOENIX-Weather 2014, SDPose and Sapiens reach BLEU around 11.5 while MediaPipe sits near 10, and Sapiens succeeds on all 15 occlusion cases where OpenPifPaf succeeds on only one. The Signsuisse checks on missing hand keypoints and temporal stability line up with those scores, so the ordering makes sense on the surface. They also ship code that lets others plug in different estimators without rebuilding the whole pipeline from scratch. That combination of fixed-pipeline comparison plus practical release is the useful part. The experiment isolates the pose input cleanly enough to support the central claim that estimator choice matters downstream. Soft spots are limited but real. They give no variance across seeds or runs, so the 1.5-point gap could be noise; a couple of extra training runs would have settled that. Because the SLT architecture and training stay fixed, we still do not know whether the same ordering would hold under a different translator or loss. The missing-keypoint correlation is observed rather than tested with an ablation that forces keypoints back in. Nothing here is load-bearing or circular, though. The work is aimed at SLT practitioners who need to choose a pose tool and want numbers rather than marketing claims. It is the sort of targeted benchmarking that deserves referee time; the controls are clear, the code is out, and the gaps are fixable without rewriting the paper.

Referee Report

2 major / 3 minor

Summary. The paper conducts a controlled empirical comparison of pose estimation systems for sign language translation (SLT). It fixes the SLT architecture and training procedure on the RWTH-PHOENIX-Weather 2014 dataset while varying only the input pose sequences from eight estimators (MediaPipe Holistic, OpenPose, MMPose WholeBody, OpenPifPaf, AlphaPose, SDPose, Sapiens, SMPLest-X). Translation quality is measured with BLEU and BLEURT. Complementary analyses on Signsuisse quantify temporal stability, hand-keypoint omission rates, and occlusion robustness (e.g., Sapiens correct on 15/15 cases vs. OpenPifPaf on 1/15), which are shown to correlate with the BLEU/BLEURT ordering. The central finding is that SDPose and Sapiens achieve the highest scores (BLEU ~11.5), outperforming the MediaPipe baseline (~10), with hand-keypoint completeness and occlusion handling as key factors. Code is released for reproducibility.

Significance. If the results hold, the work is significant for the SLT field because it isolates the effect of pose representation through a fixed pipeline and provides actionable evidence that estimator choice—particularly accurate hand keypoints and occlusion robustness—directly affects downstream translation performance. The public code release is a clear strength, as it enables direct verification, extension to new estimators, and lowers the barrier for other researchers. The study addresses a common implementation detail that is often overlooked in pose-based SLT systems.

major comments (2)

The translation performance comparison reports concrete BLEU/BLEURT differences (e.g., ~11.5 vs. ~10) but provides no standard deviations across runs, confidence intervals, or statistical significance tests. This is load-bearing for the outperformance claim, as training stochasticity could explain the observed gaps without multiple seeds or hypothesis testing.
The occlusion robustness analysis (15 instances) shows Sapiens at 15/15 correct and OpenPifPaf at 1/15, correlating with translation scores, but the small fixed sample size and lack of details on case selection limit the strength of the generalization that occlusion handling drives the BLEU ordering.

minor comments (3)

Exact numerical values and full tables for BLEU/BLEURT (rather than approximate ~11.5) should be presented in the main results to allow precise comparison and replication.
The experimental setup states that the SLT pipeline is fixed but does not list key hyperparameters (learning rate, batch size, epochs, model dimensions) in the text; while code is released, the manuscript should include them for standalone readability.
Notation for pose estimators is occasionally inconsistent (full names vs. abbreviations); a single table or section defining all acronyms would improve clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the positive assessment and constructive comments. We address each major comment below.

read point-by-point responses

Referee: The translation performance comparison reports concrete BLEU/BLEURT differences (e.g., ~11.5 vs. ~10) but provides no standard deviations across runs, confidence intervals, or statistical significance tests. This is load-bearing for the outperformance claim, as training stochasticity could explain the observed gaps without multiple seeds or hypothesis testing.

Authors: We thank the referee for pointing this out. The current results are based on single training runs for each pose estimator due to the high computational cost of training the SLT models. However, we recognize the importance of accounting for training stochasticity. In the revised manuscript, we will conduct additional experiments with at least three different random seeds for the top models (SDPose, Sapiens, MediaPipe) and report mean BLEU/BLEURT scores along with standard deviations. We will also perform statistical tests to assess the significance of the observed differences. revision: yes
Referee: The occlusion robustness analysis (15 instances) shows Sapiens at 15/15 correct and OpenPifPaf at 1/15, correlating with translation scores, but the small fixed sample size and lack of details on case selection limit the strength of the generalization that occlusion handling drives the BLEU ordering.

Authors: The 15 occlusion cases were manually selected from the Signsuisse dataset to represent common occlusion scenarios in sign language videos (e.g., hands overlapping with body or each other). We agree that the small sample size limits generalizability, and we will add a detailed description of the selection process in the revised paper. Additionally, we will emphasize that this analysis serves as supporting evidence for the correlation with translation performance rather than a comprehensive study. revision: partial

Circularity Check

0 steps flagged

No significant circularity

full rationale

This is a direct empirical benchmarking study with no derivations, fitted parameters, or self-referential predictions. The pipeline fixes the SLT architecture and training on RWTH-PHOENIX-Weather 2014 while varying only the pose input sequences from different estimators; downstream BLEU/BLEURT scores and auxiliary analyses (temporal stability, hand-keypoint omission, occlusion robustness on Signsuisse) are independent measurements on fixed datasets. No equations, ansatzes, uniqueness theorems, or load-bearing self-citations appear in the reported chain. Released code enables external verification of the controls.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the empirical isolation of pose quality as the variable of interest and on the representativeness of the two evaluation datasets for real signing conditions.

axioms (1)

domain assumption The RWTH-PHOENIX-Weather 2014 and Signsuisse datasets capture the relevant variations (including occlusion) that affect pose-based SLT performance.
Invoked when generalizing from the reported BLEU differences and occlusion counts to broader SLT systems.

pith-pipeline@v0.9.0 · 5577 in / 1422 out tokens · 47230 ms · 2026-05-08T03:34:04.081296+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

17 extracted references · 17 canonical work pages · 1 internal anchor

[1]

Evaluation of Pose Estimation Systems for Sign Language Translation

Introduction Sign language processing (SLP) is gaining ground within Natural Language Processing (NLP), yet it remains substantially underrepresented (Bragg et al., 2019; Yin et al., 2021; Müller et al., 2022). In spoken-language NLP, many core modeling deci- sions and preprocessing choices have been sys- tematically studied and benchmarked. In contrast, ...

work page internal anchor Pith review Pith/arXiv arXiv 2019
[2]

Pose Estimation Pose estimation systems extract human skeletal keypoints from video, representing body, hand, and facial articulators as spatio-temporal trajecto- ries

Background 2.1. Pose Estimation Pose estimation systems extract human skeletal keypoints from video, representing body, hand, and facial articulators as spatio-temporal trajecto- ries. Modern pipelines typically rely on deep learn- ing–baseddetectorssuchasOpenPose(Caoetal., 2021),MediaPipeHolistic(Lugaresietal.,2019;Gr- ishchenko and Bazarevsky, 2020), or...

work page 2021
[3]

Wholebody

Pose Estimators AsshowninTable1, weconsiderbothposeestima- torswidelyusedinprevioussignlanguageprocess- ing research—such as MediaPipe (Lugaresi et al., 2019)andOpenPose(Caoetal.,2021)—aswellas morerecentsystemsthathavenotbeenextensively used but appear to be strong candidates. All eval- uated methods are human pose estimators rather than models specializ...

work page 2019
[4]

to enhance both standard accuracy and ro- bustness under domain shift. Evaluated against both in-domain and out-of-distribution benchmarks (Jin et al., 2020a; Ju et al., 2023) (e.g., human and stylized images), SDPose achieves competi- tiveresultswithstrongcross-domaingeneralization, highlighting the potential of diffusion-based priors in structured visio...

work page 2023
[5]

We eval- uate each estimator within an identical translation pipeline to measure its impact on translation qual- ity (Section 4.1)

Methodology of Experiments This study compares multiple pose estimators in the context of sign language translation. We eval- uate each estimator within an identical translation pipeline to measure its impact on translation qual- ity (Section 4.1). To contextualize these results, we also analyze estimator behavior with respect to temporal instability, occ...

work page 2024
[6]

Translation Results Table 2 shows the evaluation results of our sign lan- guage translation (SLT) experiments

Results & Discussion 5.1. Translation Results Table 2 shows the evaluation results of our sign lan- guage translation (SLT) experiments. Every result is an average across three training runs. Overall, Figure 2: Examples of erroneous SMPLest-X hand poseestimatesoverlaidonoriginalframes,cropped to highlight the hands. In both examples (left: How2Sign (Duart...

work page 2021
[7]

Ourexperimentsshowthat this default is not necessarily optimal: several esti- mators outperform MediaPipe on Phoenix, includ- ing SDPose, Sapiens, AlphaPose, and MMPose Wholebody

Conclusions We presented a controlled comparison of pose es- timators for pose-based SLT, motivated by the fact that most prior SLT pipelines default to MediaPipe asaconvenientchoice. Ourexperimentsshowthat this default is not necessarily optimal: several esti- mators outperform MediaPipe on Phoenix, includ- ing SDPose, Sapiens, AlphaPose, and MMPose Whol...

work page
[8]

Accordingly, this datasethasnotableflaws

Limitations and Future Work Limitations of the Phoenix datasetIt should be noted that the signing in the Phoenix dataset is done live by hearing interpreters. Accordingly, this datasethasnotableflaws. Duetothetimepressure of the live setting, the interpreters may omit some information. Furthermore,thesigningisaninterpre- tation of German spoken language, ...

work page
[9]

Bibliographical References MykhayloAndriluka,UmarIqbal,EldarInsafutdinov, Leonid Pishchulin, Anton Milan, Juergen Gall, and Bernt Schiele. 2018. Posetrack: A bench- mark for human pose estimation and tracking. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5167–5176. Safaeid Hossain Arib, Rabeya Akter, Sejuti Rah- man, and S...

work page 2018
[10]

In2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7784–7793

Neural sign language translation. In2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7784–7793. Zhe Cao, Gines Hidalgo, Tomas Simon, Shih-En Wei, and Yaser Sheikh. 2021. Openpose: Real- time multi-person 2d pose estimation using part affinityfields.IEEETransactionsonPatternAnal- ysis and Machine Intelligence, 43(1):172–186. Zhe C...

work page 2021
[11]

In2021 IEEE/CVF Conference on Computer Vi- sion and Pattern Recognition (CVPR), pages 2734–2743

How2sign: A large-scale multimodal dataset for continuous american sign language. In2021 IEEE/CVF Conference on Computer Vi- sion and Pattern Recognition (CVPR), pages 2734–2743. Chengyu Fan and Tahiya Chowdhury. 2025. When pose estimation fails: Measuring occlusion for reliable multimodal interaction. InCompanion Proceedingsofthe27thInternationalConferen...

work page 2025
[12]

CristianLazo-Quispe,JoeHuamani-Malca,Manuel Huamán-Ramos, Pablo Rivas, and Tomas Cerny

Openpifpaf: Composite fields for semantic keypoint detection and spatio-temporal associa- tion.IEEETransactionsonIntelligentTransporta- tion Systems, 23(8):13498–13511. CristianLazo-Quispe,JoeHuamani-Malca,Manuel Huamán-Ramos, Pablo Rivas, and Tomas Cerny

work page
[13]

In LXAI Workshop Thirty-sixth Conference on Neu- ral Information Processing Systems (NeurIPS 2022)

Impact of pose estimation models for landmark-based sign language recognition. In LXAI Workshop Thirty-sixth Conference on Neu- ral Information Processing Systems (NeurIPS 2022). Dongxu Li, Cristian Rodriguez Opazo, Xin Yu, and Hongdong Li. 2020. Word-level deep sign lan- guage recognition from video: A new large-scale dataset and methods comparison. In20...

work page 2022
[14]

Matthew Loper, Naureen Mahmood, Javier Romero, Gerard Pons-Moll, and Michael J

Benchmarking 3d human pose estima- tion models under occlusions.arXiv preprint arXiv:2504.10350. Matthew Loper, Naureen Mahmood, Javier Romero, Gerard Pons-Moll, and Michael J. Black. 2015. Smpl: a skinned multi-person linear model.ACM Trans. Graph., 34(6). Camillo Lugaresi, Jiuqiang Tang, Hadon Nash, Chris McClanahan, Esha Uboweja, Michael Hays, Fan Zhan...

work page arXiv 2015
[15]

InProceedingsoftheThirdInternational Workshop on Automatic Translation for Signed and Spoken Languages (AT4SSL), pages 1–6, Geneva, Switzerland

Pose-based sign language appearance transfer. InProceedingsoftheThirdInternational Workshop on Automatic Translation for Signed and Spoken Languages (AT4SSL), pages 1–6, Geneva, Switzerland. European Association for Machine Translation. Amit Moryossef, Ioannis Tsochantaridis, Roee Aha- roni, Sarah Ebling, and Srini Narayanan. 2020. Real-time sign language...

work page 2020
[16]

InProceedings of the 58th Annual MeetingoftheAssociationforComputationalLin- guistics, pages 7881–7892, Online

BLEURT: Learning robust metrics for text generation. InProceedings of the 58th Annual MeetingoftheAssociationforComputationalLin- guistics, pages 7881–7892, Online. Association for Computational Linguistics. Valerie Sutton. 1990.Lessons in SignWriting. Sign- Writing. Laia Tarrés, Gerard I. Gállego, Amanda Duarte, Jordi Torres, and Xavier Giró-i Nieto. 202...

work page 1990
[17]

Hongwen Zhang, Yating Tian, Yuxiang Zhang, MengchengLi,LiangAn,ZhenanSun,andYebin Liu

Scaling sign language translation.Ad- vancesinneuralinformationprocessingsystems, 37:114018–114047. Hongwen Zhang, Yating Tian, Yuxiang Zhang, MengchengLi,LiangAn,ZhenanSun,andYebin Liu. 2023. Pymaf-x: Towards well-aligned full- body model regression from monocular images. IEEE Transactions on Pattern Analysis and Ma- chine Intelligence, 45(10):12287–1230...

work page 2023

[1] [1]

Evaluation of Pose Estimation Systems for Sign Language Translation

Introduction Sign language processing (SLP) is gaining ground within Natural Language Processing (NLP), yet it remains substantially underrepresented (Bragg et al., 2019; Yin et al., 2021; Müller et al., 2022). In spoken-language NLP, many core modeling deci- sions and preprocessing choices have been sys- tematically studied and benchmarked. In contrast, ...

work page internal anchor Pith review Pith/arXiv arXiv 2019

[2] [2]

Pose Estimation Pose estimation systems extract human skeletal keypoints from video, representing body, hand, and facial articulators as spatio-temporal trajecto- ries

Background 2.1. Pose Estimation Pose estimation systems extract human skeletal keypoints from video, representing body, hand, and facial articulators as spatio-temporal trajecto- ries. Modern pipelines typically rely on deep learn- ing–baseddetectorssuchasOpenPose(Caoetal., 2021),MediaPipeHolistic(Lugaresietal.,2019;Gr- ishchenko and Bazarevsky, 2020), or...

work page 2021

[3] [3]

Wholebody

Pose Estimators AsshowninTable1, weconsiderbothposeestima- torswidelyusedinprevioussignlanguageprocess- ing research—such as MediaPipe (Lugaresi et al., 2019)andOpenPose(Caoetal.,2021)—aswellas morerecentsystemsthathavenotbeenextensively used but appear to be strong candidates. All eval- uated methods are human pose estimators rather than models specializ...

work page 2019

[4] [4]

to enhance both standard accuracy and ro- bustness under domain shift. Evaluated against both in-domain and out-of-distribution benchmarks (Jin et al., 2020a; Ju et al., 2023) (e.g., human and stylized images), SDPose achieves competi- tiveresultswithstrongcross-domaingeneralization, highlighting the potential of diffusion-based priors in structured visio...

work page 2023

[5] [5]

We eval- uate each estimator within an identical translation pipeline to measure its impact on translation qual- ity (Section 4.1)

Methodology of Experiments This study compares multiple pose estimators in the context of sign language translation. We eval- uate each estimator within an identical translation pipeline to measure its impact on translation qual- ity (Section 4.1). To contextualize these results, we also analyze estimator behavior with respect to temporal instability, occ...

work page 2024

[6] [6]

Translation Results Table 2 shows the evaluation results of our sign lan- guage translation (SLT) experiments

Results & Discussion 5.1. Translation Results Table 2 shows the evaluation results of our sign lan- guage translation (SLT) experiments. Every result is an average across three training runs. Overall, Figure 2: Examples of erroneous SMPLest-X hand poseestimatesoverlaidonoriginalframes,cropped to highlight the hands. In both examples (left: How2Sign (Duart...

work page 2021

[7] [7]

Ourexperimentsshowthat this default is not necessarily optimal: several esti- mators outperform MediaPipe on Phoenix, includ- ing SDPose, Sapiens, AlphaPose, and MMPose Wholebody

Conclusions We presented a controlled comparison of pose es- timators for pose-based SLT, motivated by the fact that most prior SLT pipelines default to MediaPipe asaconvenientchoice. Ourexperimentsshowthat this default is not necessarily optimal: several esti- mators outperform MediaPipe on Phoenix, includ- ing SDPose, Sapiens, AlphaPose, and MMPose Whol...

work page

[8] [8]

Accordingly, this datasethasnotableflaws

Limitations and Future Work Limitations of the Phoenix datasetIt should be noted that the signing in the Phoenix dataset is done live by hearing interpreters. Accordingly, this datasethasnotableflaws. Duetothetimepressure of the live setting, the interpreters may omit some information. Furthermore,thesigningisaninterpre- tation of German spoken language, ...

work page

[9] [9]

Bibliographical References MykhayloAndriluka,UmarIqbal,EldarInsafutdinov, Leonid Pishchulin, Anton Milan, Juergen Gall, and Bernt Schiele. 2018. Posetrack: A bench- mark for human pose estimation and tracking. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5167–5176. Safaeid Hossain Arib, Rabeya Akter, Sejuti Rah- man, and S...

work page 2018

[10] [10]

In2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7784–7793

Neural sign language translation. In2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7784–7793. Zhe Cao, Gines Hidalgo, Tomas Simon, Shih-En Wei, and Yaser Sheikh. 2021. Openpose: Real- time multi-person 2d pose estimation using part affinityfields.IEEETransactionsonPatternAnal- ysis and Machine Intelligence, 43(1):172–186. Zhe C...

work page 2021

[11] [11]

In2021 IEEE/CVF Conference on Computer Vi- sion and Pattern Recognition (CVPR), pages 2734–2743

How2sign: A large-scale multimodal dataset for continuous american sign language. In2021 IEEE/CVF Conference on Computer Vi- sion and Pattern Recognition (CVPR), pages 2734–2743. Chengyu Fan and Tahiya Chowdhury. 2025. When pose estimation fails: Measuring occlusion for reliable multimodal interaction. InCompanion Proceedingsofthe27thInternationalConferen...

work page 2025

[12] [12]

CristianLazo-Quispe,JoeHuamani-Malca,Manuel Huamán-Ramos, Pablo Rivas, and Tomas Cerny

Openpifpaf: Composite fields for semantic keypoint detection and spatio-temporal associa- tion.IEEETransactionsonIntelligentTransporta- tion Systems, 23(8):13498–13511. CristianLazo-Quispe,JoeHuamani-Malca,Manuel Huamán-Ramos, Pablo Rivas, and Tomas Cerny

work page

[13] [13]

In LXAI Workshop Thirty-sixth Conference on Neu- ral Information Processing Systems (NeurIPS 2022)

Impact of pose estimation models for landmark-based sign language recognition. In LXAI Workshop Thirty-sixth Conference on Neu- ral Information Processing Systems (NeurIPS 2022). Dongxu Li, Cristian Rodriguez Opazo, Xin Yu, and Hongdong Li. 2020. Word-level deep sign lan- guage recognition from video: A new large-scale dataset and methods comparison. In20...

work page 2022

[14] [14]

Matthew Loper, Naureen Mahmood, Javier Romero, Gerard Pons-Moll, and Michael J

Benchmarking 3d human pose estima- tion models under occlusions.arXiv preprint arXiv:2504.10350. Matthew Loper, Naureen Mahmood, Javier Romero, Gerard Pons-Moll, and Michael J. Black. 2015. Smpl: a skinned multi-person linear model.ACM Trans. Graph., 34(6). Camillo Lugaresi, Jiuqiang Tang, Hadon Nash, Chris McClanahan, Esha Uboweja, Michael Hays, Fan Zhan...

work page arXiv 2015

[15] [15]

InProceedingsoftheThirdInternational Workshop on Automatic Translation for Signed and Spoken Languages (AT4SSL), pages 1–6, Geneva, Switzerland

Pose-based sign language appearance transfer. InProceedingsoftheThirdInternational Workshop on Automatic Translation for Signed and Spoken Languages (AT4SSL), pages 1–6, Geneva, Switzerland. European Association for Machine Translation. Amit Moryossef, Ioannis Tsochantaridis, Roee Aha- roni, Sarah Ebling, and Srini Narayanan. 2020. Real-time sign language...

work page 2020

[16] [16]

InProceedings of the 58th Annual MeetingoftheAssociationforComputationalLin- guistics, pages 7881–7892, Online

BLEURT: Learning robust metrics for text generation. InProceedings of the 58th Annual MeetingoftheAssociationforComputationalLin- guistics, pages 7881–7892, Online. Association for Computational Linguistics. Valerie Sutton. 1990.Lessons in SignWriting. Sign- Writing. Laia Tarrés, Gerard I. Gállego, Amanda Duarte, Jordi Torres, and Xavier Giró-i Nieto. 202...

work page 1990

[17] [17]

Hongwen Zhang, Yating Tian, Yuxiang Zhang, MengchengLi,LiangAn,ZhenanSun,andYebin Liu

Scaling sign language translation.Ad- vancesinneuralinformationprocessingsystems, 37:114018–114047. Hongwen Zhang, Yating Tian, Yuxiang Zhang, MengchengLi,LiangAn,ZhenanSun,andYebin Liu. 2023. Pymaf-x: Towards well-aligned full- body model regression from monocular images. IEEE Transactions on Pattern Analysis and Ma- chine Intelligence, 45(10):12287–1230...

work page 2023