Energy-navigated trajectory shaping during training produces 8-step discrete flow matching students that achieve 32% lower perplexity than 1024-step teachers on 170M language models with unchanged inference cost.
hub
How to train your energy-based models
14 Pith papers cite this work. Polarity classification is still indexing.
hub tools
citation-role summary
citation-polarity summary
roles
background 2representative citing papers
Patient-specific energy manifolds from baseline mpMRI scans act as fixed geometric references to monitor longitudinal evolution of voxel distributions in sequence space for neuro-oncology proof-of-concept cases.
Langevin sampling on the modern Hopfield energy produces training-free stochastic attention that transitions from exact retrieval to generation as temperature rises, with an entropy inflection condition marking the shift.
CreTTA reformulates test-time adaptation of marginal distributions as residual energy learning, producing a contrastive objective that cancels the partition function and uses relative energy differences for adaptive gradient reweighting to avoid overfitting.
Derives Õ(d β² A² / ε⁴) oracle complexity for AIS estimating normalizing constant Z to relative error ε and introduces reverse diffusion sampler for geometric paths with large action.
Stochastic interpolants unify flow-based and diffusion-based generative models by bridging target densities exactly via latent-variable processes whose drifts minimize quadratic objectives.
SDEdit performs guided image synthesis and editing by adding noise to inputs and refining them via denoising with a diffusion model's SDE prior, outperforming GAN methods in human studies without task-specific training.
Decentralized diffusion policies trained with importance sampling score matching enhance exploration and performance in cooperative MARL over Gaussian policy baselines.
Training and sampling in static scalar energy generative models are two instances of the same Lyapunov-driven density transport dynamics on Wasserstein space, differing only by initial condition, which yields a finite stopping criterion for Langevin sampling and additive composition rules that keep
CLIC uses set-valued action targets from interactive human corrections instead of pointwise labels to train more robust imitation learning policies.
Edwin integrates dynamic maximum entropy dimensionality reduction with symbolic regression to recover physically interpretable low-dimensional dynamics from high-dimensional observations that generalize to unseen conditions.
Score-difference flow reduces KL divergence between distributions and is formally equivalent to denoising diffusion models and a hidden subproblem in optimal GAN training under stated conditions.
The tutorial synthesizes diffusion model techniques for generative semantic communications to achieve high compression while preserving meaning in wireless transmission.
citing papers explorer
-
Trajectory as the Teacher: Few-Step Discrete Flow Matching via Energy-Navigated Distillation
Energy-navigated trajectory shaping during training produces 8-step discrete flow matching students that achieve 32% lower perplexity than 1024-step teachers on 170M language models with unchanged inference cost.
-
Energy-based Tissue Manifolds for Longitudinal Multiparametric MRI Analysis
Patient-specific energy manifolds from baseline mpMRI scans act as fixed geometric references to monitor longitudinal evolution of voxel distributions in sequence space for neuro-oncology proof-of-concept cases.
-
Stochastic Attention via Langevin Dynamics on the Modern Hopfield Energy
Langevin sampling on the modern Hopfield energy produces training-free stochastic attention that transitions from exact retrieval to generation as temperature rises, with an entropy inflection condition marking the shift.
-
Contrastive Residual Energy Test-time Adaptation
CreTTA reformulates test-time adaptation of marginal distributions as residual energy learning, producing a contrastive objective that cancels the partition function and uses relative energy differences for adaptive gradient reweighting to avoid overfitting.
-
Complexity Analysis of Normalizing Constant Estimation: from Jarzynski Equality to Annealed Importance Sampling and beyond
Derives Õ(d β² A² / ε⁴) oracle complexity for AIS estimating normalizing constant Z to relative error ε and introduces reverse diffusion sampler for geometric paths with large action.
-
Stochastic Interpolants: A Unifying Framework for Flows and Diffusions
Stochastic interpolants unify flow-based and diffusion-based generative models by bridging target densities exactly via latent-variable processes whose drifts minimize quadratic objectives.
-
SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations
SDEdit performs guided image synthesis and editing by adding noise to inputs and refining them via denoising with a diffusion model's SDE prior, outperforming GAN methods in human studies without task-specific training.
-
Decentralized Diffusion Policy Learning for Enhanced Exploration in Cooperative Multi-agent Reinforcement Learning
Decentralized diffusion policies trained with importance sampling score matching enhance exploration and performance in cooperative MARL over Gaussian policy baselines.
-
Energy Generative Modeling: A Lyapunov-based Energy Matching Perspective
Training and sampling in static scalar energy generative models are two instances of the same Lyapunov-driven density transport dynamics on Wasserstein space, differing only by initial condition, which yields a finite stopping criterion for Langevin sampling and additive composition rules that keep
-
From Action Labels to Sets: Rethinking Action Supervision for Imitation Learning from Corrective Feedback
CLIC uses set-valued action targets from interactive human corrections instead of pointwise labels to train more robust imitation learning policies.
-
Discovering interpretable low-dimensional dynamics using maximum entropy
Edwin integrates dynamic maximum entropy dimensionality reduction with symbolic regression to recover physically interpretable low-dimensional dynamics from high-dimensional observations that generalize to unseen conditions.
-
The Score-Difference Flow for Implicit Generative Modeling
Score-difference flow reduces KL divergence between distributions and is formally equivalent to denoising diffusion models and a hidden subproblem in optimal GAN training under stated conditions.
-
Generative AI Meets 6G and Beyond: Diffusion Models for Semantic Communications
The tutorial synthesizes diffusion model techniques for generative semantic communications to achieve high compression while preserving meaning in wireless transmission.
- Autoregressive Language Models are Secretly Energy-Based Models: Insights into the Lookahead Capabilities of Next-Token Prediction