ILVR: Conditioning method for denoising diffusion probabilistic models
18 Pith papers cite this work. Polarity classification is still indexing.
roles: background 1
polarities: background 1
citing papers explorer
- Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow
  Rectified flow learns straight-path neural ODEs for distribution transport, yielding efficient generative models and domain transfers that work well even with a single simulation step.
- An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion
  Textual Inversion learns a single embedding vector from a few images to represent personal concepts inside the text embedding space of a frozen text-to-image model, enabling their composition in natural language prompts.
- Latent Fourier Transform
  LatentFT uses latent-space Fourier transforms and frequency masking in diffusion autoencoders to enable timescale-specific manipulation of musical structure in generative models.
- Conflated Inverse Modeling to Generate Diverse and Temperature-Change Inducing Urban Vegetation Patterns
  A diffusion generative inverse model conditioned on temperature targets produces diverse, physically plausible urban vegetation patterns that achieve specified regional temperature shifts.
- LPNSR: Optimal Noise-Guided Diffusion Image Super-Resolution Via Learnable Noise Prediction
  LPNSR derives optimal intermediate noise for diffusion SR via MLE and implements it with an LR-guided noise predictor, reaching SOTA perceptual quality in 4 steps without text priors.
- LooseRoPE: Content-aware Attention Manipulation for Semantic Harmonization
  LooseRoPE modulates RoPE in diffusion attention maps to continuously trade off between preserving a pasted object's identity and harmonizing it with its new surroundings.
- SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations
  SDEdit performs guided image synthesis and editing by adding noise to inputs and refining them via denoising with a diffusion model's SDE prior, outperforming GAN methods in human studies without task-specific training.
- Conditional Diffusion Under Linear Constraints: Langevin Mixing and Information-Theoretic Guarantees
  Error in approximating the tangent conditional score by the unconditional score in diffusion models is bounded by dimension-free conditional mutual information, with a projected-Langevin method outperforming baselines in inpainting and super-resolution.
- Cross-Modal Generation: From Commodity WiFi to High-Fidelity mmWave and RFID Sensing
  RF-CMG synthesizes high-quality mmWave and RFID signals from WiFi using a diffusion model with Modality-Guided Embedding for high-frequency details and Low-Frequency Modality Consistency to preserve physical structure.
- StructDiff: A Structure-Preserving and Spatially Controllable Diffusion Model for Single-Image Generation
  StructDiff adds adaptive receptive fields and 3D positional encoding to a single-scale diffusion model to preserve structure and enable spatial control in single-image generation.
- MENO: MeanFlow-Enhanced Neural Operators for Dynamical Systems
  MENO enhances neural operators with MeanFlow to restore multi-scale accuracy in dynamical system predictions while keeping inference costs low, achieving up to 2x better power spectrum accuracy and 12x faster inference than diffusion-enhanced baselines on phase-field, Kolmogorov flow, and active-m…
- Analyzing and Guiding Zero-Shot Posterior Sampling in Diffusion Models
  Under a Gaussian prior assumption, zero-shot diffusion posterior samplers for inverse problems admit closed-form spectral representations that enable a new parameter-selection framework balancing perceptual quality and signal fidelity.
- Regional climate risk assessment from climate models using probabilistic machine learning
  GenFocal uses probabilistic ML to downscale coarse climate projections to fine-scale weather events without paired training data and samples rare high-impact events more accurately than prior methods.
- DragNUWA: Fine-grained Control in Video Generation by Integrating Text, Image, and Trajectory
  DragNUWA integrates text, image, and trajectory controls into a diffusion video model using a Trajectory Sampler, Multiscale Fusion, and Adaptive Training to enable fine-grained open-domain video generation.
- TopoStyle: Supporting Iterative Design with Generative AI for 2.5D Topology Optimization
  TopoStyle provides an interactive system using 2D diffusion models for 2.5D topology optimization that supports hand-drawn and point-based edits plus masking to enable iterative customization balancing performance and aesthetics.
- SocialMirror: Reconstructing 3D Human Interaction Behaviors from Monocular Videos with Semantic and Geometric Guidance
  SocialMirror reconstructs 3D meshes of closely interacting humans from monocular videos using semantic guidance from vision-language models and geometric constraints in a diffusion model to handle occlusions and maintain temporal and spatial consistency.
- Dual Ascent Diffusion for Inverse Problems
  A dual ascent optimization framework is introduced for MAP estimation with diffusion priors, claimed to outperform prior methods on image restoration in quality, noise robustness, speed, and data fidelity.
- SOWing Information: Cultivating Contextual Coherence with MLLMs in Image Generation
  SOW uses MLLMs and attention to selectively control unidirectional diffusion for pixel-level fidelity and contextual coherence in text-vision-to-image tasks.