CLEAR achieves end-to-end mask-free video subtitle removal via dual-encoder self-supervised orthogonality and LoRA-based generation feedback, delivering +6.77 dB PSNR gains and strong zero-shot multilingual performance.
Diffstr: Controlled diffusion models for scene text removal.arXiv preprint arXiv:2410.21721
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CV 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
CLEAR: Context-Aware Learning with End-to-End Mask-Free Inference for Adaptive Video Subtitle Removal
CLEAR achieves end-to-end mask-free video subtitle removal via dual-encoder self-supervised orthogonality and LoRA-based generation feedback, delivering +6.77 dB PSNR gains and strong zero-shot multilingual performance.