Dreambooth: Fine tuning text-to-image diffusion models for subject-driven generation

· 2022 · arXiv 2208.12242

15 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background: 4

citation-polarity summary

background: 4

representative citing papers

Learning Interactive Real-World Simulators

cs.AI · 2023-10-09 · conditional · novelty 7.0

UniSim learns a universal real-world simulator from orchestrated diverse datasets, enabling zero-shot deployment of policies trained purely in simulation.

Adding Conditional Control to Text-to-Image Diffusion Models

cs.CV · 2023-02-10 · conditional · novelty 7.0

ControlNet adds spatial conditioning controls to pretrained text-to-image diffusion models via zero convolutions for stable fine-tuning on small or large datasets.

PostureObjectstitch: Anomaly Image Generation Considering Assembly Relationships in Industrial Scenarios

cs.CV · 2026-04-15 · unverdicted · novelty 6.0

PostureObjectStitch generates assembly-aware anomaly images by decoupling multi-view features into high-frequency, texture and RGB components, modulating them temporally in a diffusion model, and applying conditional loss plus geometric priors to preserve correct component relationships.

IdGlow: Dynamic Identity Modulation for Multi-Subject Generation

cs.CV · 2026-02-28 · unverdicted · novelty 6.0

IdGlow is a progressive two-stage diffusion framework that uses task-adaptive timestep scheduling, temporal gating, VLM prompt synthesis, and group-level DPO to balance identity preservation and scene coherence in multi-subject image generation.

FLUX.1 Kontext: Flow Matching for In-Context Image Generation and Editing in Latent Space

cs.GR · 2025-06-17 · unverdicted · novelty 6.0

FLUX.1 Kontext unifies image generation and editing via flow matching and sequence concatenation, delivering improved multi-turn consistency and speed on the new KontextBench benchmark.

Training Diffusion Models with Reinforcement Learning

cs.LG · 2023-05-22 · unverdicted · novelty 6.0

DDPO uses policy gradients on the denoising process to optimize diffusion models for arbitrary rewards like human feedback or compressibility.

DiFaReli++: Diffusion Face Relighting with Consistent Cast Shadows

cs.CV · 2023-04-19 · unverdicted · novelty 6.0

DiFaReli++ conditions a DDIM on shading references and inferred shadow maps to relight single-view faces with consistent shadows, trained only on 2D images and claiming SOTA on Multi-PIE.

Aligning Text-to-Image Models using Human Feedback

cs.LG · 2023-02-23 · unverdicted · novelty 6.0

A three-stage fine-tuning process uses human ratings to train a reward model and then improves text-to-image alignment by maximizing reward-weighted likelihood.

eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers

cs.CV · 2022-11-02 · unverdicted · novelty 6.0

An ensemble of stage-specialized text-to-image diffusion models improves prompt alignment over single shared-parameter models while preserving visual quality and inference speed.

Woosh: A Sound Effects Foundation Model

cs.SD · 2026-04-02 · accept · novelty 5.0

Woosh is a new publicly released foundation model optimized for high-quality sound effect generation from text or video, showing competitive or better results than open alternatives like Stable Audio Open.

SynMotion: Semantic-Visual Adaptation for Motion Customized Video Generation

cs.CV · 2025-06-30 · unverdicted · novelty 5.0

SynMotion combines disentangled semantic embeddings, parameter-efficient motion adapters, and alternate subject-motion training on a new SPV dataset to improve motion customization in text-to-video and image-to-video generation.

SOWing Information: Cultivating Contextual Coherence with MLLMs in Image Generation

cs.CV · 2024-11-28 · unverdicted · novelty 5.0

SOW uses MLLMs and attention to selectively control unidirectional diffusion for pixel-level fidelity and contextual coherence in text-vision-to-image tasks.

PrefPaint: Enhancing Medical Image Inpainting through Expert Human Feedback

cs.CV · 2025-06-27 · unverdicted · novelty 4.0

PrefPaint uses D3PO and a Model Tree web interface to incorporate gastroenterologist feedback into Stable Diffusion inpainting, producing anatomically accurate polyp images that outperform prior methods in user studies.

TextBoost: Boosting Text Encoder for Personalized Text-to-Image Generation

cs.CV · 2024-09-12 · unverdicted · novelty 4.0

TextBoost is a one-shot personalization technique that selectively fine-tunes the text encoder of diffusion models using causality-preserving adaptation and lightweight adapters to reduce parameters and storage.

NeRF: Neural Radiance Field in 3D Vision: A Comprehensive Review (Updated Post-Gaussian Splatting)

cs.CV · 2022-10-01 · unverdicted · novelty 2.0

A literature survey of NeRF and neural field methods from 2020-2025, organized by architecture and application taxonomies with benchmarks and dataset overviews, covering both pre- and post-Gaussian Splatting periods.

citing papers explorer

Learning Interactive Real-World Simulators
cs.AI · 2023-10-09 · conditional · none · ref 104

Adding Conditional Control to Text-to-Image Diffusion Models
cs.CV · 2023-02-10 · conditional · none · ref 75

PostureObjectstitch: Anomaly Image Generation Considering Assembly Relationships in Industrial Scenarios
cs.CV · 2026-04-15 · unverdicted · none · ref 31

IdGlow: Dynamic Identity Modulation for Multi-Subject Generation
cs.CV · 2026-02-28 · unverdicted · none · ref 27

FLUX.1 Kontext: Flow Matching for In-Context Image Generation and Editing in Latent Space
cs.GR · 2025-06-17 · unverdicted · none · ref 43

Training Diffusion Models with Reinforcement Learning
cs.LG · 2023-05-22 · unverdicted · none · ref 23

DiFaReli++: Diffusion Face Relighting with Consistent Cast Shadows
cs.CV · 2023-04-19 · unverdicted · none · ref 62

Aligning Text-to-Image Models using Human Feedback
cs.LG · 2023-02-23 · unverdicted · none · ref 16

eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers
cs.CV · 2022-11-02 · unverdicted · none · ref 62

Woosh: A Sound Effects Foundation Model
cs.SD · 2026-04-02 · accept · none · ref 50

SynMotion: Semantic-Visual Adaptation for Motion Customized Video Generation
cs.CV · 2025-06-30 · unverdicted · none · ref 63

SOWing Information: Cultivating Contextual Coherence with MLLMs in Image Generation
cs.CV · 2024-11-28 · unverdicted · none · ref 10

PrefPaint: Enhancing Medical Image Inpainting through Expert Human Feedback
cs.CV · 2025-06-27 · unverdicted · none · ref 25

TextBoost: Boosting Text Encoder for Personalized Text-to-Image Generation
cs.CV · 2024-09-12 · unverdicted · none · ref 36

NeRF: Neural Radiance Field in 3D Vision: A Comprehensive Review (Updated Post-Gaussian Splatting)
cs.CV · 2022-10-01 · unverdicted · none · ref 192
