TextDiffuser: Diffusion Models as Text Painters
7 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.CV 7
verdicts: UNVERDICTED 7
roles: dataset 1
polarities: use dataset 1
citing papers explorer
- SVGDreamer: Text Guided SVG Generation with Diffusion Model
  SVGDreamer introduces semantic-driven image vectorization (SIVE) and vectorized particle-based score distillation (VPSD) to produce editable, high-quality, diverse SVGs from text.
- Training-Free Occluded Text Rendering via Glyph Priors and Attention-Guided Semantic Blending
  A restarted dual-stream inference approach with glyph priors and attention-guided masks improves occluded text rendering in pretrained diffusion models without fine-tuning.
- StyleTextGen: Style-Conditioned Multilingual Scene Text Generation
  StyleTextGen proposes a dual-branch style encoder, text style consistency loss, and mask-guided inference to achieve superior style consistency and cross-lingual performance in multilingual scene text generation on a new bilingual benchmark.
- FontFusion: Enhancing Generative Text in Diffusion Models with Typographic Conditioning
  FontFusion adds hierarchical token conditioning, position-aware embeddings, and multi-level dropping to DiT diffusion models, yielding 76% relative gains on decorative fonts and 68-76% consistency improvements via a dual DeepFont+DINOv2 encoder.
- Evaluating Reasoning Fidelity in Visual Text Generation
  T2I models frequently exhibit semantic errors, logical inconsistencies, and incorrect reasoning steps in visual text generation tasks, unlike text-only models.
- Decomposing Subject-Driven Image Generation via Intermediate Structural Prediction
  A two-stage method predicts an intermediate Canny map for structure then renders the image conditioned on appearance and structure, paired with a 100k text-aware dataset, to improve detail preservation in subject-driven generation.
- FineEdit: Fine-Grained Image Edit with Bounding Box Guidance
  FineEdit adds multi-level bounding box injection to diffusion image editing, releases a 1.2M-pair dataset with box annotations, and shows better instruction following and background consistency than prior open models on new and existing benchmarks.