Diffusion Policy models robot actions as a conditional diffusion process, outperforming prior state-of-the-art methods by 46.9% on average across 12 manipulation tasks from four benchmarks.
hub Mixed citations
Elucidating the Design Space of Diffusion-Based Generative Models
Mixed citation behavior. Most common role is background (56%).
abstract
We argue that the theory and practice of diffusion-based generative models are currently unnecessarily convoluted and seek to remedy the situation by presenting a design space that clearly separates the concrete design choices. This lets us identify several changes to both the sampling and training processes, as well as preconditioning of the score networks. Together, our improvements yield new state-of-the-art FID of 1.79 for CIFAR-10 in a class-conditional setting and 1.97 in an unconditional setting, with much faster sampling (35 network evaluations per image) than prior designs. To further demonstrate their modular nature, we show that our design changes dramatically improve both the efficiency and quality obtainable with pre-trained score networks from previous work, including improving the FID of a previously trained ImageNet-64 model from 2.07 to near-SOTA 1.55, and after re-training with our proposed improvements to a new SOTA of 1.36.
hub tools
citation-role summary
citation-polarity summary
representative citing papers
Rectified flow learns straight-path neural ODEs for distribution transport, yielding efficient generative models and domain transfers that work well even with a single simulation step.
A covariance-aware extension of DDIM sampling for pixel-space diffusion models that uses Tweedie's formula and Fourier decomposition to model reverse-process covariance and improves sample quality at low NFE.
PNAPO augments preference data with prior noise pairs and uses straight-line interpolation to create a tighter surrogate objective for offline alignment of rectified flow models.
Tempered Guided Diffusion uses annealed SMC to produce consistent particle approximations to the posterior for training-free conditional diffusion sampling, outperforming independent guided trajectories in experiments.
Z²-Sampling implicitly realizes zero-cost zigzag trajectories for curvature-aware semantic alignment in diffusion models by reducing multi-step paths via operator dualities and temporal caching while synthesizing a directional derivative penalty.
A diffusion generative inverse model conditioned on temperature targets produces diverse, physically plausible urban vegetation patterns that achieve specified regional temperature shifts.
GVCC achieves the lowest LPIPS on UVG at bitrates down to 0.003 bpp by encoding stochastic innovations in a marginal-preserving stochastic process derived from a pretrained rectified-flow video model, with 65% LPIPS reduction over DCVC-RT.
Béz ierFlow parameterizes stochastic interpolant schedulers as Béz ier functions to learn optimal sampling trajectories, achieving 2-3x better few-step performance than prior timestep optimization methods.
Diff-ANO uses conditional consistency models and adjoint neural operator surrogates to enable fast, high-quality USCT reconstructions under sparse and partial views by replacing slow PDE solvers and enabling few-step sampling.
Proposes an advection-diffusion PDE corruption process with stochastic velocity fields and Lattice Boltzmann solver for diffusion models, generalizing prior PDE methods.
Analytic solution of full-batch gradient flow for linear and convolutional denoisers in diffusion models yields a universal inverse-variance spectral law for learning times of eigenmodes.
Imagen Video generates high-definition text-conditional videos via a cascade of base and super-resolution diffusion models, achieving high fidelity and controllability.
GEARS is a geometry-first generative framework that learns domain-invariant encoders and permutation-equivariant diffusion generators to reconstruct intrinsic 2D cell coordinates and distance matrices from unpaired scRNA-seq guided by ST.
Recasts sampling-based nonconvex optimization as smoothed gradient descent to obtain non-asymptotic convergence guarantees and introduces the DIDA annealed algorithm that converges to the global optimum.
DSL provides a continuous embedding framework where one denoiser supports a family of SNR paths for discrete sequences, improving MAUVE scores on OpenWebText and allowing random-order and hybrid sampling from a fine-tuned MDLM checkpoint.
SAMG uses spatially adaptive guidance scales derived from a geometric analysis of classifier-free guidance to resolve the detail-artifact dilemma in diffusion-based image and video generation.
Diffusion models overfit denoising loss at intermediate noise but generalize in inference as model error smooths the flow field and sampling paths avoid memorized noisy training data.
GraphBSI uses Bayesian Sample Inference as noise-controlled SDEs to generate discrete graphs in one shot, achieving state-of-the-art results on molecular benchmarks Moses and GuacaMol.
Biased noise sampling for rectified flows combined with a bidirectional text-image transformer architecture yields state-of-the-art high-resolution text-to-image results that scale predictably with model size.
Stable Video Diffusion scales latent video diffusion models via text-to-image pretraining, video pretraining on curated data, and high-quality finetuning to produce competitive text-to-video and image-to-video results while enabling motion LoRA and multi-view 3D applications.
SDXL improves upon prior Stable Diffusion versions through a larger UNet backbone, dual text encoders, novel conditioning, and a refinement model, producing higher-fidelity images competitive with black-box state-of-the-art generators.
Shap-E encodes 3D assets into implicit function parameters then uses a conditional diffusion model to generate new ones from text, enabling fast multi-representation 3D asset creation.
Diffusion models recover known ENSO variability structure from synthetic LIM data when given enough samples, but require pre-training on CMIP6 plus fine-tuning to match observations with the ~700 samples available in ERSSTv5.
citing papers explorer
No citing papers match the current filters.