StableSketcher improves text-to-sketch generation by fine-tuning a diffusion VAE and adding a VQA-based RL reward, while releasing the SketchDUO dataset of sketches with captions and QA pairs.
Blip-2: Bootstrapping language-image pre-training with frozen image encoders and large language models
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
fields
cs.CV 2years
2025 2representative citing papers
Ouroboros uses two single-step diffusion models with cycle consistency for forward and inverse rendering, extending intrinsic decomposition to indoor/outdoor scenes with faster inference than multi-step methods.
citing papers explorer
-
StableSketcher: Enhancing Diffusion Model for Pixel-based Sketch Generation via Visual Question Answering Feedback
StableSketcher improves text-to-sketch generation by fine-tuning a diffusion VAE and adding a VQA-based RL reward, while releasing the SketchDUO dataset of sketches with captions and QA pairs.
-
Ouroboros: Single-step Diffusion Models for Cycle-consistent Forward and Inverse Rendering
Ouroboros uses two single-step diffusion models with cycle consistency for forward and inverse rendering, extending intrinsic decomposition to indoor/outdoor scenes with faster inference than multi-step methods.