MULTITEXTEDIT benchmark reveals that all tested text-in-image editing models show pronounced degradation on non-English languages, especially Hebrew and Arabic, mainly in text accuracy and script fidelity.
GlyphDraw: Seamlessly rendering text with intricate spatial structures in text-to-image generation
7 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.CV (7)
verdicts: UNVERDICTED (7)
representative citing papers
A transcoder-based in-place replacement of the bottleneck layer enables selective concept removal in modern diffusion and autoregressive image models without degrading output quality.
SteerVTE adds lightweight style and dual-granularity glyph adapters to a frozen video diffusion model, introduces a glyph-aware loss and progressive training, and releases a 1M synthetic dataset to enable accurate video text editing.
UniVL unifies vision and language into one mask-rendered input processed by an OCR backbone to condition diffusion models for spatially grounded image generation without a standalone text encoder.
TextAlign uses a hierarchical VLM reward for preference alignment to boost text accuracy in generative models like FLUX.1-dev.
CAGE uses LLM-generated code for label-correct diagrams followed by ControlNet-conditioned diffusion refinement to produce both accurate and visually engaging educational graphics, backed by the new EduDiagram-2K dataset.
SkyReels-Text enables simultaneous fine-grained editing of multiple text regions in posters using arbitrary glyph patches for font control without labels or test-time fine-tuning.