Hybrid Autoregressive-Diffusion Model for Real-Time Sign Language Production

Mano Manoharan; Maoxiao Ye; Xinfeng Ye

arxiv: 2507.09105 · v4 · pith:WFY6UO5L · submitted 2025-07-12 · 💻 cs.CV




keywords: language, production, sign, autoregressive-diffusion, causal, frame, generation, how2sign


Abstract

Earlier Sign Language Production (SLP) models typically relied on autoregressive decoding, which naturally preserves temporal causality but suffers from error accumulation at inference time. More recent diffusion-based approaches improve generation quality through iterative denoising, yet their sequence-level refinement process introduces substantial latency. To address this trade-off, we propose HybridSign, a hybrid autoregressive-diffusion model for low-latency sign language production that combines causal frame generation with flow-based diffusion refinement. A Multi-Scale Pose Representation module captures fine-grained articulator features, while a Confidence-Aware Causal Attention mechanism leverages joint-level confidence scores to improve robustness under noisy 2D pose observations. Experiments on PHOENIX14T and How2Sign show that HybridSign consistently achieves the best quality-efficiency trade-off among the compared baselines. On the How2Sign test split, it reaches BLEU-1/4 scores of 30.12/6.48 and a DTW of 3.89, while reducing time-to-first-frame to 5.90 s and increasing throughput to 10.17 FPS under a 60-frame evaluation protocol.
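The abstract does not say how joint-level confidence enters the attention. One common way to make attention confidence-aware is to add the log of each key's confidence to its attention logit. This scales that key's softmax weight by its confidence before renormalization. The NumPy sketch below assumes that scheme, a single head, and a per-frame confidence obtained by averaging joint scores. The function name and the averaging step are hypothetical and are not taken from HybridSign.

```python
import numpy as np


def confidence_aware_causal_attention(q, k, v, joint_conf, eps=1e-6):
    """Single-head causal self-attention with logits biased by key confidence.

    Hypothetical sketch, not the HybridSign implementation.
    q, k, v:     (T, d) per-frame queries, keys and values.
    joint_conf:  (T, J) joint-level confidence scores in [0, 1] from a 2D pose estimator.
    Returns the attended values (T, d) and the attention weights (T, T).
    """
    T, d = q.shape
    frame_conf = joint_conf.mean(axis=1)                  # assumption: average joints per frame
    logits = q @ k.T / np.sqrt(d)                         # scaled dot-product scores
    logits = logits + np.log(frame_conf + eps)[None, :]   # softmax weight scales with key confidence
    future = np.triu(np.ones((T, T), dtype=bool), k=1)
    logits = np.where(future, -np.inf, logits)            # frame t attends only to frames <= t
    logits = logits - logits.max(axis=1, keepdims=True)   # stabilise exp; diagonal is always finite
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ v, weights
```

Adding `log(c)` to a logit is equivalent to multiplying that key's unnormalized weight by `c`. A frame whose joints were detected with near-zero confidence therefore contributes almost nothing to later frames. Meanwhile, the causal mask keeps each output dependent only on frames that have already been produced.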
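The combination of causal frame generation with flow-based refinement can be illustrated with a toy sketch. It assumes one plausible arrangement: each frame is first drafted causally from the frames already emitted, then refined by integrating a flow-matching ODE from Gaussian noise with a few Euler steps, conditioned on that draft. The names `ar_predict` and `velocity_fn`, and the toy functions that stand in for them, are hypothetical placeholders for learned networks. This is not the authors' implementation.

```python
import numpy as np


def euler_flow_refine(x0, cond, velocity_fn, num_steps=4):
    """Integrate dx/dt = velocity_fn(x, t, cond) from t=0 (noise) towards t=1 (pose)."""
    x = x0
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt  # t stays below 1, so 1 - t is never zero
        x = x + dt * velocity_fn(x, t, cond)
    return x


def generate_stream(num_frames, pose_dim, ar_predict, velocity_fn, rng, num_steps=4):
    """Causal frame-by-frame generation with per-frame flow refinement (toy sketch)."""
    frames = []
    for _ in range(num_frames):
        history = np.stack(frames) if frames else np.zeros((0, pose_dim))
        draft = ar_predict(history)             # sees only already-emitted frames
        noise = rng.standard_normal(pose_dim)   # refinement starts from Gaussian noise
        frame = euler_flow_refine(noise, draft, velocity_fn, num_steps)
        frames.append(frame)                    # frame is final and could be streamed here
    return np.stack(frames)


# Toy stand-ins for the learned networks (hypothetical):
def toy_ar_predict(history):
    # Draft = previous frame shifted by 0.1, or zeros for the first frame.
    return history[-1] + 0.1 if len(history) else np.zeros(history.shape[1])


def toy_velocity(x, t, cond):
    # Straight-path (rectified-flow) velocity that carries x to cond at t = 1.
    return (cond - x) / (1.0 - t)


poses = generate_stream(5, 3, toy_ar_predict, toy_velocity, np.random.default_rng(0))
```

Each frame is final as soon as its few refinement steps finish. The first frame can therefore be emitted after one draft and `num_steps` velocity evaluations, rather than after denoising the whole sequence. This is the latency argument the abstract makes against sequence-level diffusion. In the toy run, `toy_velocity` is the exact straight-path velocity towards the draft, so Euler integration returns the draft up to floating-point error.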



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Marrying Generative Model of Healthcare Events with Digital Twin of Social Determinants of Health for Disease Reasoning
cs.AI 2026-05 unverdicted novelty 5.0

A generative framework using geometric diffusion for brain networks and tabular diffusion for other organs integrates ICD-coded SDoH proxies to improve disease reasoning on UK Biobank data.