Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33:6840–6851

Jonathan Ho, Ajay Jain, Pieter Abbeel · 2020

21 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 4

citation-polarity summary

background 4

representative citing papers

VDE: Training-Free Accelerating Rectified Flow Model via Velocity Decomposition and Estimation

cs.CV · 2026-05-22 · unverdicted · novelty 7.0

VDE accelerates rectified flow models like Flux by 3.22x with LPIPS of 0.069 via velocity decomposition into parallel/orthogonal components plus periodic full-pass anchoring.

CoReDiT: Spatial Coherence-Guided Token Pruning and Reconstruction for Efficient Diffusion Transformers

cs.CV · 2026-05-13 · unverdicted · novelty 7.0

CoReDiT reduces self-attention FLOPs in DiTs by up to 55% via linear-time spatial coherence pruning and neighbor-based reconstruction, delivering 1.33x-1.72x speedups with maintained quality.

HP-Edit: A Human-Preference Post-Training Framework for Image Editing

cs.CV · 2026-04-21 · unverdicted · novelty 7.0

HP-Edit introduces a post-training framework and RealPref-50K dataset that uses a VLM-based HP-Scorer to align diffusion image editing models with human preferences, improving outputs on Qwen-Image-Edit-2509.

ASTRA: Enhancing Multi-Subject Generation with Retrieval-Augmented Pose Guidance and Disentangled Position Embedding

cs.CV · 2026-04-15 · unverdicted · novelty 7.0

ASTRA disentangles subject identity from pose structure in diffusion transformers via retrieval-augmented pose guidance, asymmetric EURoPE embeddings, and a DSM adapter to improve multi-subject generation.

AvatarPointillist: AutoRegressive 4D Gaussian Avatarization

cs.CV · 2026-04-06 · unverdicted · novelty 7.0

AvatarPointillist autoregressively generates adaptive 3D point clouds via Transformer for photorealistic 4D Gaussian avatars from one image, jointly predicting animation bindings and using a conditioned Gaussian decoder.

MixGRPO: Unlocking Flow-based GRPO Efficiency with Mixed ODE-SDE

cs.AI · 2025-07-29 · unverdicted · novelty 7.0

MixGRPO speeds up GRPO for flow-based image generators by restricting SDE sampling and optimization to a sliding window while using ODE elsewhere, cutting training time by up to 71% with better alignment performance.

AttriStory: Fine-grained Attribute Realization for Visual Storytelling with Diffusion Models

cs.CV · 2026-05-20 · unverdicted · novelty 6.0

AttriStory adds a benchmark and AttriLoss-based latent optimization to improve faithful rendering of fine-grained attributes such as clothing color and texture in diffusion-model visual storytelling.

GOR-IS: 3D Gaussian Object Removal in the Intrinsic Space

cs.CV · 2026-05-01 · unverdicted · novelty 6.0

GOR-IS removes objects from 3D Gaussian Splatting reconstructions by performing inpainting in an intrinsic decomposition space that explicitly models light transport for consistent global lighting and non-Lambertian surfaces.

Denoising, Fast and Slow: Difficulty-Aware Adaptive Sampling for Image Generation

cs.CV · 2026-04-21 · unverdicted · novelty 6.0

Patch Forcing enables diffusion models to denoise image patches at varying rates based on predicted difficulty, advancing easier regions first to improve context and achieve better generation quality on ImageNet while scaling to text-to-image tasks.

Seen-to-Scene: Keep the Seen, Generate the Unseen for Video Outpainting

cs.CV · 2026-04-16 · unverdicted · novelty 6.0

Seen-to-Scene unifies propagation-based and generation-based approaches for video outpainting via fine-tuned flow completion and reference-guided latent propagation to deliver superior temporal coherence and efficiency.

Beyond Voxel 3D Editing: Learning from 3D Masks and Self-Constructed Data

cs.CV · 2026-04-15 · unverdicted · novelty 6.0

The BVE framework enables text-guided 3D editing beyond voxel limits by combining self-constructed data, lightweight semantic injection, and annotation-free masking to preserve local invariance.

Towards Robust Content Watermarking Against Removal and Forgery Attacks

cs.CV · 2026-04-08 · unverdicted · novelty 6.0

ISTS watermarking dynamically controls injection based on prompt semantics and uses two-sided detection to resist removal and forgery attacks in diffusion models.

Erasure or Erosion? Evaluating Compositional Degradation in Unlearned Text-To-Image Diffusion Models

cs.CV · 2026-04-06 · unverdicted · novelty 6.0

Unlearning methods that strongly erase concepts from text-to-image diffusion models consistently degrade performance on attribute binding, spatial reasoning, and counting tasks.

HandDreamer: Zero-Shot Text to 3D Hand Model Generation using Corrective Hand Shape Guidance

cs.CV · 2026-04-06 · unverdicted · novelty 6.0

HandDreamer is the first zero-shot text-to-3D method for hands that uses MANO initialization, skeleton-guided diffusion, and corrective shape guidance to produce view-consistent models.

HVG-3D: Bridging Real and Simulation Domains for 3D-Conditional Hand-Object Interaction Video Synthesis

cs.CV · 2026-03-31 · unverdicted · novelty 6.0

HVG-3D uses a 3D-aware diffusion architecture with ControlNet to synthesize high-fidelity hand-object interaction videos from 3D control signals, achieving state-of-the-art spatial fidelity and temporal coherence on the TASTE-Rob dataset.

WorldPlay: Towards Long-Term Geometric Consistency for Real-Time Interactive World Modeling

cs.CV · 2025-12-16 · unverdicted · novelty 6.0

WorldPlay uses dual action representation, reconstituted context memory, and context forcing distillation to produce consistent 720p streaming video at 24 FPS for interactive world modeling.

Less is More: Data-Efficient Adaptation for Controllable Text-to-Video Generation

cs.CV · 2025-11-21 · unverdicted · novelty 6.0

Fine-tuning text-to-video models on sparse low-quality synthetic data for physical camera controls outperforms fine-tuning on photorealistic data.

Ouroboros: Single-step Diffusion Models for Cycle-consistent Forward and Inverse Rendering

cs.CV · 2025-08-20 · unverdicted · novelty 6.0

Ouroboros uses two single-step diffusion models with cycle consistency for forward and inverse rendering, extending intrinsic decomposition to indoor/outdoor scenes with faster inference than multi-step methods.

RealDiffusion: Physics-informed Attention for Multi-character Storybook Generation

cs.CV · 2026-05-12 · unverdicted · novelty 5.0

RealDiffusion uses heat diffusion as a dissipative prior and a region-aware stochastic process inside a training-free physics-informed attention mechanism to improve multi-character coherence while preserving narrative dynamism in sequential image generation.

MegaStyle: Constructing Diverse and Scalable Style Dataset via Consistent Text-to-Image Style Mapping

cs.CV · 2026-04-09 · unverdicted · novelty 5.0

A scalable pipeline generates an intra-consistent, inter-diverse 1.4M style image dataset from text-to-image models and uses it to train a style encoder and generalizable style transfer model.

PureCC: Pure Learning for Text-to-Image Concept Customization

cs.CV · 2026-03-08 · unverdicted · novelty 5.0

PureCC introduces a decoupled learning objective, dual-branch training pipeline with frozen extractor, and adaptive guidance scale λ* for high-fidelity concept customization while preserving original model behavior in text-to-image generation.

citing papers explorer

VDE: Training-Free Accelerating Rectified Flow Model via Velocity Decomposition and Estimation cs.CV · 2026-05-22 · unverdicted · none · ref 11
VDE accelerates rectified flow models like Flux by 3.22x with LPIPS of 0.069 via velocity decomposition into parallel/orthogonal components plus periodic full-pass anchoring.
CoReDiT: Spatial Coherence-Guided Token Pruning and Reconstruction for Efficient Diffusion Transformers cs.CV · 2026-05-13 · unverdicted · none · ref 11
CoReDiT reduces self-attention FLOPs in DiTs by up to 55% via linear-time spatial coherence pruning and neighbor-based reconstruction, delivering 1.33x-1.72x speedups with maintained quality.
HP-Edit: A Human-Preference Post-Training Framework for Image Editing cs.CV · 2026-04-21 · unverdicted · none · ref 12
HP-Edit introduces a post-training framework and RealPref-50K dataset that uses a VLM-based HP-Scorer to align diffusion image editing models with human preferences, improving outputs on Qwen-Image-Edit-2509.
ASTRA: Enhancing Multi-Subject Generation with Retrieval-Augmented Pose Guidance and Disentangled Position Embedding cs.CV · 2026-04-15 · unverdicted · none · ref 10
ASTRA disentangles subject identity from pose structure in diffusion transformers via retrieval-augmented pose guidance, asymmetric EURoPE embeddings, and a DSM adapter to improve multi-subject generation.
AvatarPointillist: AutoRegressive 4D Gaussian Avatarization cs.CV · 2026-04-06 · unverdicted · none · ref 25
AvatarPointillist autoregressively generates adaptive 3D point clouds via Transformer for photorealistic 4D Gaussian avatars from one image, jointly predicting animation bindings and using a conditioned Gaussian decoder.
MixGRPO: Unlocking Flow-based GRPO Efficiency with Mixed ODE-SDE cs.AI · 2025-07-29 · unverdicted · none · ref 8
MixGRPO speeds up GRPO for flow-based image generators by restricting SDE sampling and optimization to a sliding window while using ODE elsewhere, cutting training time by up to 71% with better alignment performance.
AttriStory: Fine-grained Attribute Realization for Visual Storytelling with Diffusion Models cs.CV · 2026-05-20 · unverdicted · none · ref 10
AttriStory adds a benchmark and AttriLoss-based latent optimization to improve faithful rendering of fine-grained attributes such as clothing color and texture in diffusion-model visual storytelling.
GOR-IS: 3D Gaussian Object Removal in the Intrinsic Space cs.CV · 2026-05-01 · unverdicted · none · ref 12
GOR-IS removes objects from 3D Gaussian Splatting reconstructions by performing inpainting in an intrinsic decomposition space that explicitly models light transport for consistent global lighting and non-Lambertian surfaces.
Denoising, Fast and Slow: Difficulty-Aware Adaptive Sampling for Image Generation cs.CV · 2026-04-21 · unverdicted · none · ref 22
Patch Forcing enables diffusion models to denoise image patches at varying rates based on predicted difficulty, advancing easier regions first to improve context and achieve better generation quality on ImageNet while scaling to text-to-image tasks.
Seen-to-Scene: Keep the Seen, Generate the Unseen for Video Outpainting cs.CV · 2026-04-16 · unverdicted · none · ref 7
Seen-to-Scene unifies propagation-based and generation-based approaches for video outpainting via fine-tuned flow completion and reference-guided latent propagation to deliver superior temporal coherence and efficiency.
Beyond Voxel 3D Editing: Learning from 3D Masks and Self-Constructed Data cs.CV · 2026-04-15 · unverdicted · none · ref 26
The BVE framework enables text-guided 3D editing beyond voxel limits by combining self-constructed data, lightweight semantic injection, and annotation-free masking to preserve local invariance.
Towards Robust Content Watermarking Against Removal and Forgery Attacks cs.CV · 2026-04-08 · unverdicted · none · ref 21
ISTS watermarking dynamically controls injection based on prompt semantics and uses two-sided detection to resist removal and forgery attacks in diffusion models.
Erasure or Erosion? Evaluating Compositional Degradation in Unlearned Text-To-Image Diffusion Models cs.CV · 2026-04-06 · unverdicted · none · ref 11
Unlearning methods that strongly erase concepts from text-to-image diffusion models consistently degrade performance on attribute binding, spatial reasoning, and counting tasks.
HandDreamer: Zero-Shot Text to 3D Hand Model Generation using Corrective Hand Shape Guidance cs.CV · 2026-04-06 · unverdicted · none · ref 11
HandDreamer is the first zero-shot text-to-3D method for hands that uses MANO initialization, skeleton-guided diffusion, and corrective shape guidance to produce view-consistent models.
HVG-3D: Bridging Real and Simulation Domains for 3D-Conditional Hand-Object Interaction Video Synthesis cs.CV · 2026-03-31 · unverdicted · none · ref 22
HVG-3D uses a 3D-aware diffusion architecture with ControlNet to synthesize high-fidelity hand-object interaction videos from 3D control signals, achieving state-of-the-art spatial fidelity and temporal coherence on the TASTE-Rob dataset.
WorldPlay: Towards Long-Term Geometric Consistency for Real-Time Interactive World Modeling cs.CV · 2025-12-16 · unverdicted · none · ref 19
WorldPlay uses dual action representation, reconstituted context memory, and context forcing distillation to produce consistent 720p streaming video at 24 FPS for interactive world modeling.
Less is More: Data-Efficient Adaptation for Controllable Text-to-Video Generation cs.CV · 2025-11-21 · unverdicted · none · ref 12
Fine-tuning text-to-video models on sparse low-quality synthetic data for physical camera controls outperforms fine-tuning on photorealistic data.
Ouroboros: Single-step Diffusion Models for Cycle-consistent Forward and Inverse Rendering cs.CV · 2025-08-20 · unverdicted · none · ref 23
Ouroboros uses two single-step diffusion models with cycle consistency for forward and inverse rendering, extending intrinsic decomposition to indoor/outdoor scenes with faster inference than multi-step methods.
RealDiffusion: Physics-informed Attention for Multi-character Storybook Generation cs.CV · 2026-05-12 · unverdicted · none · ref 10
RealDiffusion uses heat diffusion as a dissipative prior and a region-aware stochastic process inside a training-free physics-informed attention mechanism to improve multi-character coherence while preserving narrative dynamism in sequential image generation.
MegaStyle: Constructing Diverse and Scalable Style Dataset via Consistent Text-to-Image Style Mapping cs.CV · 2026-04-09 · unverdicted · none · ref 16
A scalable pipeline generates an intra-consistent, inter-diverse 1.4M style image dataset from text-to-image models and uses it to train a style encoder and generalizable style transfer model.
PureCC: Pure Learning for Text-to-Image Concept Customization cs.CV · 2026-03-08 · unverdicted · none · ref 17
PureCC introduces a decoupled learning objective, dual-branch training pipeline with frozen extractor, and adaptive guidance scale λ* for high-fidelity concept customization while preserving original model behavior in text-to-image generation.
