DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation
15 Pith papers cite this work. Polarity classification is still indexing.
citation roles: background (4)
citation polarities: background (4)
citing papers explorer
- Learning Interactive Real-World Simulators
  UniSim learns a universal real-world simulator from orchestrated diverse datasets, enabling zero-shot deployment of policies trained purely in simulation.
- Adding Conditional Control to Text-to-Image Diffusion Models
  ControlNet adds spatial conditioning controls to pretrained text-to-image diffusion models via zero convolutions for stable fine-tuning on small or large datasets.
- PostureObjectstitch: Anomaly Image Generation Considering Assembly Relationships in Industrial Scenarios
  PostureObjectStitch generates assembly-aware anomaly images by decoupling multi-view features into high-frequency, texture and RGB components, modulating them temporally in a diffusion model, and applying conditional loss plus geometric priors to preserve correct component relationships.
- IdGlow: Dynamic Identity Modulation for Multi-Subject Generation
  IdGlow is a progressive two-stage diffusion framework that uses task-adaptive timestep scheduling, temporal gating, VLM prompt synthesis, and group-level DPO to balance identity preservation and scene coherence in multi-subject image generation.
- FLUX.1 Kontext: Flow Matching for In-Context Image Generation and Editing in Latent Space
  FLUX.1 Kontext unifies image generation and editing via flow matching and sequence concatenation, delivering improved multi-turn consistency and speed on the new KontextBench benchmark.
- Training Diffusion Models with Reinforcement Learning
  DDPO uses policy gradients on the denoising process to optimize diffusion models for arbitrary rewards like human feedback or compressibility.
- DiFaReli++: Diffusion Face Relighting with Consistent Cast Shadows
  DiFaReli++ conditions a DDIM on shading references and inferred shadow maps to relight single-view faces with consistent shadows, trained only on 2D images and claiming SOTA on Multi-PIE.
- Aligning Text-to-Image Models using Human Feedback
  A three-stage fine-tuning process uses human ratings to train a reward model and then improves text-to-image alignment by maximizing reward-weighted likelihood.
- eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers
  An ensemble of stage-specialized text-to-image diffusion models improves prompt alignment over single shared-parameter models while preserving visual quality and inference speed.
- Woosh: A Sound Effects Foundation Model
  Woosh is a new publicly released foundation model optimized for high-quality sound effect generation from text or video, showing competitive or better results than open alternatives like Stable Audio Open.
- SynMotion: Semantic-Visual Adaptation for Motion Customized Video Generation
  SynMotion combines disentangled semantic embeddings, parameter-efficient motion adapters, and alternate subject-motion training on a new SPV dataset to improve motion customization in text-to-video and image-to-video generation.
- SOWing Information: Cultivating Contextual Coherence with MLLMs in Image Generation
  SOW uses MLLMs and attention to selectively control unidirectional diffusion for pixel-level fidelity and contextual coherence in text-vision-to-image tasks.
- PrefPaint: Enhancing Medical Image Inpainting through Expert Human Feedback
  PrefPaint uses D3PO and a Model Tree web interface to incorporate gastroenterologist feedback into Stable Diffusion inpainting, producing anatomically accurate polyp images that outperform prior methods in user studies.
- TextBoost: Boosting Text Encoder for Personalized Text-to-Image Generation
  TextBoost is a one-shot personalization technique that selectively fine-tunes the text encoder of diffusion models using causality-preserving adaptation and lightweight adapters to reduce parameters and storage.
- NeRF: Neural Radiance Field in 3D Vision: A Comprehensive Review (Updated Post-Gaussian Splatting)
  A literature survey of NeRF and neural field methods from 2020-2025, organized by architecture and application taxonomies with benchmarks and dataset overviews, covering both pre- and post-Gaussian Splatting periods.