DubWise: Video-guided speech duration control in multimodal LLM-based text-to-speech for dubbing

Neha Sahipjohn, Ashishkumar Gudmalwar, Nirmesh Shah, Pankaj Wasnik, Rajiv Ratn Shah, “DubWise: Video-guided speech duration control in multimodal LLM-based text-to-speech for dubbing,” in INTERSPEECH, Kos Island, Greece · 2024

1 Pith paper cites this work. Polarity classification is still indexing.


citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Can Hierarchical Cross-Modal Fusion Predict Human Perception of AI Dubbed Content?

eess.AS · 2026-03-30 · unverdicted · novelty 4.0

A hierarchical cross-modal fusion architecture with LoRA adapters predicts human perception of AI-dubbed clips at PCC > 0.75 after training on 12k Hindi-English examples and human MOS fine-tuning.

citing papers explorer

Showing 1 of 1 citing paper.

Can Hierarchical Cross-Modal Fusion Predict Human Perception of AI Dubbed Content? eess.AS · 2026-03-30 · unverdicted · none · ref 6
