Denoising Diffusion Implicit Models

Jiaming Song, Chenlin Meng, Stefano Ermon · 2020 · cs.LG · arXiv 2010.02502

Mixed citation behavior. Most common role is background (67%).

446 Pith papers citing it

Background 67% of classified citations

abstract

Denoising diffusion probabilistic models (DDPMs) have achieved high quality image generation without adversarial training, yet they require simulating a Markov chain for many steps to produce a sample. To accelerate sampling, we present denoising diffusion implicit models (DDIMs), a more efficient class of iterative implicit probabilistic models with the same training procedure as DDPMs. In DDPMs, the generative process is defined as the reverse of a Markovian diffusion process. We construct a class of non-Markovian diffusion processes that lead to the same training objective, but whose reverse process can be much faster to sample from. We empirically demonstrate that DDIMs can produce high quality samples $10 \times$ to $50 \times$ faster in terms of wall-clock time compared to DDPMs, allow us to trade off computation for sample quality, and can perform semantically meaningful image interpolation directly in the latent space.
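
The accelerated sampling described in the abstract can be illustrated with a minimal NumPy sketch of a DDIM-style update run over a shortened subsequence of timesteps. This is a sketch under stated assumptions, not the authors' implementation: `alpha_bar` denotes the cumulative product of the noise schedule, `eps_model` is a hypothetical noise-prediction function standing in for a trained network, and the function names and the evenly spaced timestep selection are illustrative choices. Setting `eta=0` gives the deterministic sampler; `eta=1` corresponds to DDPM-like stochastic sampling.

```python
import numpy as np


def ddim_step(x_t, eps_pred, alpha_bar_t, alpha_bar_prev, eta=0.0, rng=None):
    """One reverse step from timestep t to an earlier timestep (hypothetical helper)."""
    # Estimate the clean sample x_0 implied by the predicted noise.
    x0_pred = (x_t - np.sqrt(1.0 - alpha_bar_t) * eps_pred) / np.sqrt(alpha_bar_t)
    # Noise scale for this step; eta = 0 makes the step deterministic.
    sigma = (
        eta
        * np.sqrt((1.0 - alpha_bar_prev) / (1.0 - alpha_bar_t))
        * np.sqrt(max(1.0 - alpha_bar_t / alpha_bar_prev, 0.0))
    )
    # Component pointing back toward x_t along the predicted noise direction.
    direction = np.sqrt(max(1.0 - alpha_bar_prev - sigma**2, 0.0)) * eps_pred
    x_prev = np.sqrt(alpha_bar_prev) * x0_pred + direction
    if sigma > 0:
        rng = np.random.default_rng() if rng is None else rng
        x_prev = x_prev + sigma * rng.standard_normal(x_t.shape)
    return x_prev


def ddim_sample(eps_model, shape, alpha_bar, num_steps, eta=0.0, seed=0):
    """Sample using only num_steps of the len(alpha_bar) training timesteps.

    Assumes num_steps <= len(alpha_bar) and that alpha_bar is decreasing.
    """
    rng = np.random.default_rng(seed)
    total = len(alpha_bar)
    # Evenly spaced, descending subsequence of timesteps (an illustrative choice).
    timesteps = np.linspace(total - 1, 0, num_steps).round().astype(int)
    x = rng.standard_normal(shape)  # start from pure Gaussian noise
    for i, t in enumerate(timesteps):
        a_t = alpha_bar[t]
        # After the last step there is no noise left, so alpha_bar is taken as 1.
        a_prev = alpha_bar[timesteps[i + 1]] if i + 1 < len(timesteps) else 1.0
        x = ddim_step(x, eps_model(x, t), a_t, a_prev, eta=eta, rng=rng)
    return x
```

With a 1000-step training schedule, calling `ddim_sample(..., num_steps=20)` evaluates the model 20 times instead of 1000, which is the computation-versus-quality trade-off the abstract refers to.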

citation-role summary

background 58 · method 23 · baseline 2

citation-polarity summary

background 56 · use method 23 · baseline 2 · support 1 · unclear 1

authors

Jiaming Song, Chenlin Meng, and Stefano Ermon

representative citing papers

ActivityForensics: A Comprehensive Benchmark for Localizing Manipulated Activity in Videos

cs.CV · 2026-04-04 · unverdicted · novelty 8.0

ActivityForensics is the first large-scale benchmark for temporally localizing activity-level forgeries in videos, paired with a diffusion-based baseline called TADiff.

Flow-GRPO: Training Flow Matching Models via Online RL

cs.CV · 2025-05-08 · unverdicted · novelty 8.0

Flow-GRPO is the first online RL method for flow matching models, raising GenEval accuracy from 63% to 95% and text-rendering accuracy from 59% to 92% with little reward hacking.

Consistency Models

cs.LG · 2023-03-02 · conditional · novelty 8.0

Consistency models achieve fast one-step generation with SOTA FID of 3.55 on CIFAR-10 and 6.20 on ImageNet 64x64 by directly mapping noise to data, outperforming prior distillation techniques.

Splatshot: 3D Face Avatar Generation from a Single Unconstrained Photo

cs.CV · 2026-05-31 · unverdicted · novelty 7.0

SplatShot is a training-free method that inserts per-step 3DGS refitting and photometric feedback into diffusion denoising to enforce multi-view consistency for single-photo 3D face avatars.

Decoupled Residual Denoising Diffusion Models for Unified and Data Efficient Image-to-Image Translation

cs.CV · 2026-05-31 · unverdicted · novelty 7.0

DRDD decouples diffusion into independent noise and residual stages to preserve domain harmonization and enable unified data-efficient I2I translation.

Sample-Efficient Diffusion-based Reinforcement Learning with Critic Guidance

cs.RO · 2026-05-28 · unverdicted · novelty 7.0

CGPO integrates training-free critic guidance into diffusion denoising to produce high-Q actions as regression targets, yielding SOTA results on MuJoCo locomotion and successful Franka arm grasping.

Midpoint Generative Models

cs.LG · 2026-05-28 · unverdicted · novelty 7.0

Midpoint Generative Models define a midpoint divergence from flow matching symmetry and derive its variational form as a tractable objective for training competitive one-step generators.

Point Tracking Improves World Action Models

cs.RO · 2026-05-22 · unverdicted · novelty 7.0

JOPAT jointly models pixels, point tracks, and actions in a diffusion transformer and reports gains over pixel-only baselines on long-horizon robot tasks with occlusion and off-screen motion.

DFSAttn: Dynamic Fine-grained Sparse Attention for Efficient Video Generation

cs.CV · 2026-05-22 · unverdicted · novelty 7.0

DFSAttn is a training-free framework for dynamic fine-grained sparse attention in video DiTs that achieves up to 2.1x speedup while preserving generation quality via Hilbert reordering, hierarchical scoring, and adaptive caching.

VDE: Training-Free Accelerating Rectified Flow Model via Velocity Decomposition and Estimation

cs.CV · 2026-05-22 · unverdicted · novelty 7.0

VDE accelerates rectified flow models like Flux by 3.22x with LPIPS of 0.069 via velocity decomposition into parallel/orthogonal components plus periodic full-pass anchoring.

Linear-DPO: Linear Direct Preference Optimization for Diffusion and Flow-Matching Generative Models

cs.CV · 2026-05-20 · unverdicted · novelty 7.0

Linear-DPO replaces sigmoid utility with linear utility and adds EMA reference to improve preference alignment in diffusion and flow-matching text-to-image models.

DrawMotion: Generating 3D Human Motions by Freehand Drawing

cs.CV · 2026-05-20 · unverdicted · novelty 7.0

DrawMotion is a diffusion-based framework that fuses text and hand-drawn stickman conditions via a Multi-Condition Module and training-free guidance to generate 3D human motions.

CAdam: Context-Adaptive Moment Estimation for 3D Gaussian Densification in Generative Distillation

cs.LG · 2026-05-20 · unverdicted · novelty 7.0

CAdam reinterprets densification in generative 3DGS as signal verification via gradient-moment interference, quantile context, and SNR gating to achieve large reductions in primitive count with comparable quality.

DISC: Decoupling Instruction from State-Conditioned Control via Policy Generation

cs.RO · 2026-05-20 · unverdicted · novelty 7.0

A hypernetwork generates complete task-specific visuomotor policy parameters from instructions alone to structurally eliminate observation leakage in language-conditioned robotic control.

BrepForge: Factorized B-rep Synthesis via Wireframe Composition and Boundary-Conditioned Surface Instantiation

cs.GR · 2026-05-19 · unverdicted · novelty 7.0

BrepForge factorizes B-rep synthesis into face-aware autoregressive wireframe composition followed by boundary-conditioned surface instantiation using learning-free geometric priors.

Inference-Time Scaling in Diffusion Models through Iterative Partial Refinement

cs.LG · 2026-05-19 · unverdicted · novelty 7.0

IPR improves valid solution rates on MNIST Sudoku from 55.8% to 75.0% by iteratively refining partial regions in sequential diffusion models without external verifiers or reward models.

PolycubeNet: A Dual-latent Diffusion Model for Polycube-Based Hexahedral Mesh Generation

cs.GR · 2026-05-19 · unverdicted · novelty 7.0

PolycubeNet applies a dual-latent diffusion architecture to generate polycube point clouds from input point clouds, enabling robust hexahedral mesh creation without surface segmentation or templates.

Functionalization via Structure Completion and Motion Rectification

cs.CV · 2026-05-18 · unverdicted · novelty 7.0

Object functionalization is cast as neural graph completion over a functional graph of parts, contacts, and motions, followed by geometry realization that also rectifies erroneous motions, demonstrated on furniture with a new paired dataset.

StreamingEffect: Real-Time Human-Centric Video Effect Generation

cs.CV · 2026-05-16 · unverdicted · novelty 7.0

StreamingEffect enables real-time 720p human-centric video effect generation on one GPU via teacher-student distillation, keyframe control, and a new 130K video dataset.

Towards Generalized Image Manipulation Localization via Score-based Model

cs.CV · 2026-05-16 · conditional · novelty 7.0

DiffIML applies score-based generative modeling to image manipulation localization, recovering coherent masks iteratively from noise to improve generalization on unseen manipulation types.

VMU-Diff: A Coarse-to-fine Multi-source Data Fusion Framework for Precipitation Nowcasting

cs.CV · 2026-05-14 · unverdicted · novelty 7.0

VMU-Diff improves precipitation nowcasting via coarse multi-source Vision Mamba fusion followed by residual conditional diffusion refinement.

HASTE: Training-Free Video Diffusion Acceleration via Head-Wise Adaptive Sparse Attention

cs.CV · 2026-05-14 · unverdicted · novelty 7.0

HASTE delivers up to 1.93x speedup on Wan2.1 video DiTs via head-wise adaptive sparse attention using temporal mask reuse and error-guided per-head calibration while preserving video quality.

What if Tomorrow is the World Cup Final? Counterfactual Time Series Forecasting with Textual Conditions

cs.LG · 2026-05-14 · unverdicted · novelty 7.0

Introduces the task of counterfactual time series forecasting with textual conditions plus a text-attribution mechanism that improves accuracy by distinguishing mutable from immutable factors.

Training-Free Generative Sampling via Moment-Matched Score Smoothing

stat.ML · 2026-05-14 · unverdicted · novelty 7.0

MM-SOLD is a training-free particle sampler whose large-particle limit converges to a moment-matched Gibbs distribution obtained by exponentially tilting a score-smoothed target.

citing papers explorer

Showing 50 of 446 citing papers.

$Z^2$-Sampling: Zero-Cost Zigzag Trajectories for Semantic Alignment in Diffusion Models
cs.CV · 2026-04-26 · unverdicted · none · ref 42 · internal anchor
Z²-Sampling implicitly realizes zero-cost zigzag trajectories for curvature-aware semantic alignment in diffusion models by reducing multi-step paths via operator dualities and temporal caching while synthesizing a directional derivative penalty.

ReImagine: Rethinking Controllable High-Quality Human Video Generation via Image-First Synthesis
cs.CV · 2026-04-21 · unverdicted · none · ref 42 · internal anchor
ReImagine decouples human appearance from temporal consistency via pretrained image backbones, SMPL-X motion guidance, and training-free video diffusion refinement to generate high-quality controllable videos.

HP-Edit: A Human-Preference Post-Training Framework for Image Editing
cs.CV · 2026-04-21 · unverdicted · none · ref 40 · internal anchor
HP-Edit introduces a post-training framework and RealPref-50K dataset that uses a VLM-based HP-Scorer to align diffusion image editing models with human preferences, improving outputs on Qwen-Image-Edit-2509.

Learning to Credit the Right Steps: Objective-aware Process Optimization for Visual Generation
cs.CV · 2026-04-21 · unverdicted · none · ref 38 · internal anchor
OTCA improves GRPO training for visual generation by estimating step importance in trajectories and adaptively weighting multiple reward objectives.

Denoise and Align: Diffusion-Driven Foreground Knowledge Prompting for Open-Vocabulary Temporal Action Detection
cs.CV · 2026-04-20 · unverdicted · none · ref 43 · internal anchor
DFAlign uses diffusion-based denoising to generate foreground knowledge prompts that improve cross-modal alignment for detecting unseen actions in untrimmed videos, reporting state-of-the-art results on OV-TAD benchmarks.

Long-Text-to-Image Generation via Compositional Prompt Decomposition
cs.CV · 2026-04-20 · unverdicted · none · ref 54 · internal anchor
PRISM lets pre-trained text-to-image models handle long prompts by breaking them into compositional parts, predicting noise separately, and merging outputs via energy-based conjunction, matching fine-tuned models while generalizing better to prompts over 500 tokens.

TacticGen: Grounding Adaptable and Scalable Generation of Football Tactics
cs.AI · 2026-04-20 · conditional · none · ref 80 · internal anchor
TacticGen generates realistic, adaptable football tactics via a multi-agent diffusion transformer trained on 3.3M events and 100M frames, supporting rule-, language-, or model-based guidance at inference time.

View-Consistent 3D Scene Editing via Dual-Path Structural Correspondense and Semantic Continuity
cs.CV · 2026-04-20 · unverdicted · none · ref 11 · internal anchor
A dual-path consistency framework for text-driven 3D scene editing that models cross-view dependencies via structural correspondence and semantic continuity, trained on a newly constructed paired multi-view dataset.

Structure-Adaptive Sparse Diffusion in Voxel Space for 3D Medical Image Enhancement
cs.CV · 2026-04-20 · unverdicted · none · ref 24 · internal anchor
A sparse voxel-space diffusion method with structure-adaptive modulation achieves up to 10x training speedup and state-of-the-art results for 3D medical image denoising and super-resolution.

AST: Adaptive, Seamless, and Training-Free Precise Speech Editing
cs.SD · 2026-04-17 · unverdicted · none · ref 24 · internal anchor
AST enables seamless speech editing by latent recomposition on pre-trained TTS models plus adaptive weak fact guidance, plus a new dataset and WDTW metric, claiming 70% WER reduction and better temporal consistency without training.

From Competition to Coopetition: Coopetitive Training-Free Image Editing Based on Text Guidance
cs.CV · 2026-04-17 · unverdicted · none · ref 13 · internal anchor
CoEdit is a zero-shot coopetitive framework for text-guided image editing that uses dual-entropy attention manipulation and entropic latent refinement to improve editing harmony and structural preservation.

High-Speed Full-Color HDR Imaging via Unwrapping Modulo-Encoded Spike Streams
cs.CV · 2026-04-16 · unverdicted · none · ref 49 · internal anchor
An exposure-decoupled modulo formulation and iteration-free diffusion-prior unwrapping enable 1000 FPS full-color HDR imaging on spike cameras while cutting bandwidth from 20 Gbps to 6 Gbps.

AbdomenGen: Sequential Volume-Conditioned Diffusion Framework for Abdominal Anatomy Generation
cs.CV · 2026-04-14 · unverdicted · none · ref 15 · internal anchor
A sequential diffusion framework generates controllable abdominal anatomies with a Volume Control Scalar that decouples organ size from body habitus, achieving Dice scores around 0.83 and reducing distributional mismatch by 73.6% in a hepatomegaly example.

MAST: Mask-Guided Attention Mass Allocation for Training-Free Multi-Style Transfer
cs.CV · 2026-04-14 · unverdicted · none · ref 21 · internal anchor
MAST is a mask-guided attention allocation method that enables artifact-free multi-style transfer in diffusion models by anchoring layout, distributing attention mass, scaling sharpness, and injecting details.

Invertible Diffusion for Low-Memory Channel Gain Map Construction in Wireless Communication Networks
eess.SP · 2026-04-13 · unverdicted · none · ref 20 · internal anchor
InvDiff-CGM uses invertible architectures in diffusion and U-Net plus a multi-scale prior injector to construct CGMs with 85% lower peak training memory and 38.02 dB PSNR on RadioMap3DSeer.

Diffusion Inpainting MIMO-OFDM Channels with Limited Noisy Observations
eess.SP · 2026-04-10 · unverdicted · none · ref 44 · internal anchor
A Conditional Diffusion Transformer recovers full MIMO-OFDM channels from sparse noisy pilots, delivering over 5 dB gain versus baselines even at 1/32 pilot density and completing inference in 10 steps.

DMax: Aggressive Parallel Decoding for dLLMs
cs.LG · 2026-04-09 · conditional · none · ref 71 · 2 links · internal anchor
DMax uses On-Policy Uniform Training and Soft Parallel Decoding to enable aggressive parallelism in dLLMs, raising TPF on GSM8K from 2.04 to 5.47 and on MBPP from 2.71 to 5.86 while preserving accuracy.

MoZoo:Unleashing Video Diffusion power in animal fur and muscle simulation
cs.GR · 2026-04-08 · unverdicted · none · ref 39 · internal anchor
MoZoo generates high-fidelity animal videos with fur and muscle dynamics from coarse meshes by extending video diffusion with role-aware RoPE and asymmetric decoupled attention, trained on a new synthetic-to-real dataset.

Referring-Aware Visuomotor Policy Learning for Closed-Loop Manipulation
cs.RO · 2026-04-07 · unverdicted · none · ref 34 · internal anchor
ReV is a referring-aware visuomotor policy using coupled diffusion heads for real-time trajectory replanning in robotic manipulation, trained solely via targeted perturbations to expert demonstrations and achieving higher success rates in simulated and real tasks.

Your Pre-trained Diffusion Model Secretly Knows Restoration
cs.CV · 2026-04-06 · unverdicted · none · ref 45 · internal anchor
Pre-trained diffusion models inherently support image restoration that can be unlocked by optimizing prompt embeddings at the text encoder output using a diffusion bridge formulation, achieving competitive results on models like WAN and FLUX without fine-tuning.

InfBaGel: Human-Object-Scene Interaction Generation with Dynamic Perception and Iterative Refinement
cs.CV · 2026-04-06 · unverdicted · none · ref 14 · internal anchor
InfBaGel generates consistent human-object-scene interactions via dynamic perception during iterative refinement in a consistency model, bump-aware guidance to avoid collisions, and hybrid training that mixes synthesized pseudo-samples with real HSI data.

Bridging Restoration and Diagnosis: A Comprehensive Benchmark for Retinal Fundus Enhancement
cs.CV · 2026-04-04 · unverdicted · none · ref 31 · internal anchor
EyeBench-V2 is a new benchmark that evaluates retinal fundus enhancement models using downstream clinical tasks, generalization tests, and structured expert assessments to measure real diagnostic utility.

PromptEvolver: Prompt Inversion through Evolutionary Optimization in Natural-Language Space
cs.LG · 2026-04-03 · unverdicted · none · ref 40 · internal anchor
PromptEvolver recovers high-fidelity natural language prompts for given images by evolving them via genetic algorithm guided by a vision-language model, outperforming prior methods on benchmarks.

Diffusion-Based Generative Priors for Efficient Beam Alignment in Directional Networks
eess.SP · 2026-03-30 · unverdicted · none · ref 8 · internal anchor
A conditional diffusion model learns beam alignment priors to guide efficient top-k sweeps, achieving Hit@1 of 0.61 and improving over deterministic baselines by 180% on simulated data.

GVCC: Zero-Shot Video Compression via Codebook-Driven Stochastic Rectified Flow
cs.CV · 2026-03-27 · unverdicted · none · ref 28 · internal anchor
GVCC achieves the lowest LPIPS on UVG at bitrates down to 0.003 bpp by encoding stochastic innovations in a marginal-preserving stochastic process derived from a pretrained rectified-flow video model, with 65% LPIPS reduction over DCVC-RT.

LPNSR: Optimal Noise-Guided Diffusion Image Super-Resolution Via Learnable Noise Prediction
cs.CV · 2026-03-22 · conditional · none · ref 13 · internal anchor
LPNSR derives optimal intermediate noise for diffusion SR via MLE and implements it with an LR-guided noise predictor, reaching SOTA perceptual quality in 4 steps without text priors.

Generative Control as Optimization: Time Unconditional Flow Matching for Adaptive and Robust Robotic Control
cs.RO · 2026-03-18 · conditional · none · ref 24 · internal anchor
GeCO replaces time-dependent flow matching with time-unconditional optimization, enabling adaptive inference and intrinsic OOD detection for robotic imitation learning.

Setting-Matched and Semantics-Scaled Benchmarking of One-Step Generative Models Against Multistep Diffusion and Flow Models
cs.CV · 2026-03-15 · unverdicted · none · ref 23 · internal anchor
Matched benchmarking reveals FID misleads in few-step regimes under CFG, prompting CLIP-scaled and PickScore-scaled FID and IS variants for better semantic evaluation of one-step image generators.

From Baselines to Transport Geodesics: Axiomatic Attribution via Optimal Generative Flows
cs.LG · 2026-03-05 · unverdicted · none · ref 10 · internal anchor
Transport-geodesic attribution via optimal generative flows selects principled paths for feature attributions by minimizing kinetic action.

MultiAnimate: Pose-Guided Image Animation Made Extensible
cs.CV · 2026-02-25 · unverdicted · none · ref 27 · internal anchor
MultiAnimate adds Identifier Assigner and Identifier Adapter modules to diffusion video models so they can handle multiple characters without identity mix-ups, generalizing from two-character training data to more characters.

Action-to-Action Flow Matching
cs.RO · 2026-02-07 · unverdicted · none · ref 16 · internal anchor
A2A flow matching starts action generation from prior proprioceptive actions in latent space to enable single-step high-quality predictions in robotic policies.

Contour Refinement using Discrete Diffusion in Low Data Regime
cs.CV · 2026-02-05 · unverdicted · none · ref 19 · internal anchor
A CNN-based discrete diffusion method refines sparse contours from segmentation masks using simplified denoising steps and minimal post-processing, outperforming baselines on small medical and environmental datasets while running 3.5 times faster.

Not All Denoising Steps Are Equal: Model Scheduling for Faster Masked Diffusion Language Models
cs.LG · 2026-02-04 · unverdicted · none · ref 35 · internal anchor
Early and late denoising steps in masked diffusion LMs are robust to smaller-model replacement, enabling 17% FLOPs reduction with modest generative quality loss.

Information Filtering via Variational Regularization for Robot Manipulation
cs.RO · 2026-01-29 · unverdicted · none · ref 14 · internal anchor
Variational Regularization imposes an adaptive information bottleneck on noisy intermediate features in DP3-UNet and DP3-DiT policies, consistently raising task success rates on RoboTwin2.0, Adroit, and MetaWorld while achieving new state-of-the-art results.

Breaking the Resolution Barrier: Arbitrary-resolution Deep Image Steganography Framework
cs.CV · 2026-01-22 · unverdicted · none · ref 22 · internal anchor
ARDIS enables arbitrary-resolution deep image steganography via frequency decoupling in hiding and latent-guided implicit reconstruction for blind recovery.

When Backdoors Meet Partial Observability: Attacking Real-World Reinforcement Learning
cs.RO · 2026-01-20 · unverdicted · none · ref 23 · internal anchor
DGBA enables reliable backdoor attacks on real-world RL policies under partial observability by learning stochastic visual triggers via conditional diffusion and using advantage-based poisoning at critical states.

ATATA: One Algorithm to Align Them All
cs.CV · 2026-01-16 · unverdicted · none · ref 50 · internal anchor
ATATA enables fast joint inference of structurally aligned pairs using Rectified Flow models via segment transport, improving state-of-the-art for image and video generation while matching 3D quality at much higher speed.

Differentiable Surrogate for Detector Simulation and Design with Diffusion Models
physics.ins-det · 2026-01-09 · unverdicted · none · ref 20 · internal anchor
A LoRA-adapted conditional diffusion surrogate for electromagnetic calorimeter showers matches key observables within 2% RMSE and reproduces directional trends in design-utility gradients.

SV-GS: Sparse View 4D Reconstruction with Skeleton-Driven Gaussian Splatting
cs.CV · 2026-01-01 · unverdicted · none · ref 46 · internal anchor
SV-GS estimates a time-dependent skeleton pose plus fine deformations to enable 4D Gaussian splatting from sparse views, outperforming prior sparse methods by up to 34% PSNR on synthetic data and matching dense monocular baselines on real data with far fewer frames.

Stream-DiffVSR: Low-Latency Streamable Video Super-Resolution via Auto-Regressive Diffusion
cs.CV · 2025-12-29 · conditional · none · ref 64 · internal anchor
Stream-DiffVSR enables practical low-latency video super-resolution by combining a four-step distilled denoiser, auto-regressive temporal guidance, and a temporal processor in a strictly causal pipeline.

Iterative Inference-time Scaling with Adaptive Frequency Steering for Image Super-Resolution
cs.CV · 2025-12-29 · unverdicted · none · ref 30 · internal anchor
IAFS is a training-free iterative inference-time scaling framework that uses adaptive frequency-aware particle fusion to resolve the perception-fidelity conflict in diffusion super-resolution models, outperforming prior scaling strategies.

GLUE: Coordinating Pre-Trained Generative Models for System-Level Design
cs.CE · 2025-12-22 · conditional · none · ref 26 · internal anchor
GLUE orchestrates frozen pre-trained generative models into a system-level design generator that enforces feasibility, performance, and diversity, with data-driven and data-free variants benchmarked on UAV design.

LangDriveCTRL: Natural Language Controllable Driving Scene Editing with Multi-modal Agents
cs.CV · 2025-12-19 · unverdicted · none · ref 37 · internal anchor
LangDriveCTRL decomposes driving videos into 3D scene graphs and uses an agentic pipeline with specialized multi-modal agents to perform language-controlled object and behavior edits, achieving nearly 2x higher instruction alignment than prior state-of-the-art methods.

Omni-Attribute: Open-vocabulary Attribute Encoder for Visual Concept Personalization
cs.CV · 2025-12-11 · unverdicted · none · ref 62 · internal anchor
Omni-Attribute is a new open-vocabulary image attribute encoder trained on semantically linked pairs with dual objectives to produce disentangled representations for personalization and compositional generation.

RDSplat: Robust Watermarking for 3D Gaussian Splatting Against 2D and 3D Diffusion Editing
cs.CV · 2025-12-07 · conditional · none · ref 42 · internal anchor
RDSplat is the first 3D Gaussian Splatting watermarking method that maintains 0.701 bit accuracy against both 2D and 3D diffusion editing by embedding only in low-frequency primitives selected via FAPS.

Multimodal Diffusion Forcing for Forceful Manipulation
cs.RO · 2025-11-06 · unverdicted · none · ref 46 · internal anchor
Multimodal Diffusion Forcing trains a diffusion model on partially masked multimodal robot trajectories to learn temporal and cross-modal dependencies for forceful manipulation.

Noise Aggregation Analysis Driven by Small-Noise Injection: Efficient Membership Inference for Diffusion Models
cs.CV · 2025-10-18 · unverdicted · none · ref 34 · internal anchor
Introduces noise aggregation analysis with single-step small-noise injection to enable efficient and accurate membership inference attacks on diffusion models.

Exploring Cross-Modal Flows for Few-Shot Learning
cs.CV · 2025-10-16 · unverdicted · none · ref 21 · internal anchor
FMA introduces flow matching for multi-step cross-modal feature alignment in few-shot learning, using fixed coupling, noise augmentation, and early-stopping to outperform one-step PEFT methods.

Coevolutionary Continuous Discrete Diffusion: Make Your Diffusion Language Model a Latent Reasoner
cs.AI · 2025-10-03 · unverdicted · none · ref 37 · internal anchor
CCDD defines a joint multimodal diffusion on continuous representation space and discrete token space to combine expressivity with explicit token supervision for diffusion language models.

Diffusion and Flow-based Copulas: Forgetting and Remembering Dependencies
stat.ML · 2025-09-24 · unverdicted · none · ref 58 · internal anchor
Diffusion and flow processes forget dependencies to define valid copulas then learn to remember them for density estimation and sampling, outperforming prior copula methods on complex datasets.