MULTITEXTEDIT benchmark reveals that all tested text-in-image editing models show pronounced degradation on non-English languages, especially Hebrew and Arabic, mainly in text accuracy and script fidelity.
arXiv preprint arXiv:2303.17870 , year=
5 Pith papers cite this work. Polarity classification is still indexing.
fields
cs.CV 5representative citing papers
UniVL unifies vision and language into one mask-rendered input processed by an OCR backbone to condition diffusion models for spatially grounded image generation without a standalone text encoder.
CAGE uses LLM-generated code for label-correct diagrams followed by ControlNet-conditioned diffusion refinement to produce both accurate and visually engaging educational graphics, backed by the new EduDiagram-2K dataset.
SkyReels-Text enables simultaneous fine-grained editing of multiple text regions in posters using arbitrary glyph patches for font control without labels or test-time fine-tuning.
citing papers explorer
-
MULTITEXTEDIT: Benchmarking Cross-Lingual Degradation in Text-in-Image Editing
MULTITEXTEDIT benchmark reveals that all tested text-in-image editing models show pronounced degradation on non-English languages, especially Hebrew and Arabic, mainly in text accuracy and script fidelity.
-
UniVL: Unified Vision-Language Embedding for Spatially Grounded Contextual Image Generation
UniVL unifies vision and language into one mask-rendered input processed by an OCR backbone to condition diffusion models for spatially grounded image generation without a standalone text encoder.
-
CAGE: Bridging the Accuracy-Aesthetics Gap in Educational Diagrams via Code-Anchored Generative Enhancement
CAGE uses LLM-generated code for label-correct diagrams followed by ControlNet-conditioned diffusion refinement to produce both accurate and visually engaging educational graphics, backed by the new EduDiagram-2K dataset.
-
SkyReels-Text: Fine-Grained Font-Controllable Text Editing for Poster Design
SkyReels-Text enables simultaneous fine-grained editing of multiple text regions in posters using arbitrary glyph patches for font control without labels or test-time fine-tuning.
- TextAlign: Preference Alignment for Text Rendering with Hierarchical Rewards