arXiv preprint arXiv:2303.17870 , year=

Jian Ma, Mingjun Zhao, Chen Chen, Ruichen Wang, Di Niu, Haonan Lu, Xiaodong Lin · 2023 · arXiv 2303.17870

5 Pith papers cite this work. Polarity classification is still indexing.

5 Pith papers citing it

representative citing papers

MULTITEXTEDIT: Benchmarking Cross-Lingual Degradation in Text-in-Image Editing

cs.CV · 2026-05-04 · unverdicted · novelty 7.0

MULTITEXTEDIT benchmark reveals that all tested text-in-image editing models show pronounced degradation on non-English languages, especially Hebrew and Arabic, mainly in text accuracy and script fidelity.

UniVL: Unified Vision-Language Embedding for Spatially Grounded Contextual Image Generation

cs.CV · 2026-05-20 · unverdicted · novelty 6.0

UniVL unifies vision and language into one mask-rendered input processed by an OCR backbone to condition diffusion models for spatially grounded image generation without a standalone text encoder.

CAGE: Bridging the Accuracy-Aesthetics Gap in Educational Diagrams via Code-Anchored Generative Enhancement

cs.CV · 2026-04-06 · unverdicted · novelty 6.0

CAGE uses LLM-generated code for label-correct diagrams followed by ControlNet-conditioned diffusion refinement to produce both accurate and visually engaging educational graphics, backed by the new EduDiagram-2K dataset.

SkyReels-Text: Fine-Grained Font-Controllable Text Editing for Poster Design

cs.CV · 2025-11-17 · unverdicted · novelty 6.0

SkyReels-Text enables simultaneous fine-grained editing of multiple text regions in posters using arbitrary glyph patches for font control without labels or test-time fine-tuning.

TextAlign: Preference Alignment for Text Rendering with Hierarchical Rewards

cs.CV · 2026-05-19

citing papers explorer

Showing 5 of 5 citing papers.

MULTITEXTEDIT: Benchmarking Cross-Lingual Degradation in Text-in-Image Editing cs.CV · 2026-05-04 · unverdicted · none · ref 49
MULTITEXTEDIT benchmark reveals that all tested text-in-image editing models show pronounced degradation on non-English languages, especially Hebrew and Arabic, mainly in text accuracy and script fidelity.
UniVL: Unified Vision-Language Embedding for Spatially Grounded Contextual Image Generation cs.CV · 2026-05-20 · unverdicted · none · ref 18
UniVL unifies vision and language into one mask-rendered input processed by an OCR backbone to condition diffusion models for spatially grounded image generation without a standalone text encoder.
CAGE: Bridging the Accuracy-Aesthetics Gap in Educational Diagrams via Code-Anchored Generative Enhancement cs.CV · 2026-04-06 · unverdicted · none · ref 17
CAGE uses LLM-generated code for label-correct diagrams followed by ControlNet-conditioned diffusion refinement to produce both accurate and visually engaging educational graphics, backed by the new EduDiagram-2K dataset.
SkyReels-Text: Fine-Grained Font-Controllable Text Editing for Poster Design cs.CV · 2025-11-17 · unverdicted · none · ref 20
SkyReels-Text enables simultaneous fine-grained editing of multiple text regions in posters using arbitrary glyph patches for font control without labels or test-time fine-tuning.
TextAlign: Preference Alignment for Text Rendering with Hierarchical Rewards cs.CV · 2026-05-19 · unreviewed · ref 19

arXiv preprint arXiv:2303.17870 , year=

fields

years

verdicts

representative citing papers

citing papers explorer