MULTITEXTEDIT benchmark reveals that all tested text-in-image editing models show pronounced degradation on non-English languages, especially Hebrew and Arabic, mainly in text accuracy and script fidelity.
Title resolution pending
3 Pith papers cite this work. Polarity classification is still indexing.
fields
cs.CV 3verdicts
UNVERDICTED 3representative citing papers
PA-BDM adapts block diffusion by switching to causal intra-block denoising and dynamically committing reliable prefixes to KV cache, yielding higher accuracy and 71.6% higher throughput than a comparable baseline on document benchmarks.
VPiT enables pretrained LLMs to perform both visual understanding and generation by predicting discrete text tokens and continuous visual tokens, with understanding data proving more effective than generation-specific data.
citing papers explorer
-
MULTITEXTEDIT: Benchmarking Cross-Lingual Degradation in Text-in-Image Editing
MULTITEXTEDIT benchmark reveals that all tested text-in-image editing models show pronounced degradation on non-English languages, especially Hebrew and Arabic, mainly in text accuracy and script fidelity.
-
Prefix-Adaptive Block Diffusion for Efficient Document Recognition
PA-BDM adapts block diffusion by switching to causal intra-block denoising and dynamically committing reliable prefixes to KV cache, yielding higher accuracy and 71.6% higher throughput than a comparable baseline on document benchmarks.
-
MetaMorph: Multimodal Understanding and Generation via Instruction Tuning
VPiT enables pretrained LLMs to perform both visual understanding and generation by predicting discrete text tokens and continuous visual tokens, with understanding data proving more effective than generation-specific data.