Learning Energy-Based Models from Stochastic Interpolants using Spatiotemporal Differences

Arto Klami; Hanlin Yu; Michael U. Gutmann; Omar Chehab; Partha Kaushik; RuiKang OuYang

arxiv: 2605.26850 · v1 · pith:A5ISV53Qnew · submitted 2026-05-26 · 💻 cs.LG

Learning Energy-Based Models from Stochastic Interpolants using Spatiotemporal Differences

Hanlin Yu , RuiKang OuYang , Partha Kaushik , Arto Klami , Michael U. Gutmann , Omar Chehab This is my paper

Pith reviewed 2026-06-29 19:05 UTC · model grok-4.3

classification 💻 cs.LG

keywords energy-based modelsstochastic interpolantsnoise-contrastive estimationdensity estimationdiffusion modelsspatiotemporal differences

0 comments

The pith

Spatiotemporal differences overcome distinct failure modes when learning energy-based models from stochastic interpolants.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to establish that energy-based models trained on data corrupted via stochastic interpolants can be learned more reliably by estimating the energy through joint differences in both space and time. Purely spatial difference estimators and purely temporal ones each break down in identifiable ways, limiting accuracy in density estimation tasks. The proposed stNCE framework combines the two to create a single consistent estimator that recovers many prior methods as special cases while also generating fresh training objectives. This matters for applications such as image generation and molecular modeling because it offers a route to stable training of models that represent data densities directly.

Core claim

stNCE learns the energy of the joint density over data and time by forming contrastive losses from differences taken simultaneously across spatial coordinates and the interpolation time index; this unifies existing spatial-only and temporal-only approaches and produces new objectives whose performance on image and molecule datasets is competitive with leading density estimation techniques.

What carries the argument

Spatiotemporal Noise-Contrastive Estimation (stNCE), an estimator that constructs noise-contrastive losses from joint spatiotemporal differences of the energy function.

If this is right

Many existing denoising score matching and related objectives for diffusion-style energy models become special cases of the stNCE loss.
New training objectives can be obtained simply by choosing different weighting or sampling schedules inside the stNCE framework.
The resulting models achieve density estimation performance on image and molecule data that matches or approaches current state-of-the-art methods.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same joint-difference construction may extend to other corruption processes that involve both a spatial domain and an auxiliary scheduling variable.
Because stNCE reduces estimator variance by averaging across two dimensions, it could improve sample efficiency when the data dimension is very large.
The unification suggests that further variants could be derived by replacing the contrastive loss with other divergence measures while keeping the spatiotemporal difference structure.

Load-bearing premise

The distinct failure modes of purely spatial and purely temporal estimators are the dominant practical obstacles, and joint spatiotemporal differences resolve them without creating comparable new instabilities or biases.

What would settle it

A controlled experiment on a low-dimensional synthetic density where stNCE either underperforms the best spatial-only or temporal-only baseline or exhibits new variance or bias patterns not seen in the separate estimators would falsify the central claim.

Figures

Figures reproduced from arXiv: 2605.26850 by Arto Klami, Hanlin Yu, Michael U. Gutmann, Omar Chehab, Partha Kaushik, RuiKang OuYang.

**Figure 1.** Figure 1: Left: interpolant density pd(x, t). We learn it via spatial differences in x, temporal differences t, or spatiotemporal differences in both x and t. Right: empirical error of different learning methods. Temporal methods incur high estimation error when the data distribution has high density in areas of low density for the noise distribution (abusively, “support mismatch”). Spatial methods incur high estima… view at source ↗

**Figure 2.** Figure 2: Ability to compute log-likelihood differences between two data points using learned spatial [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗

**Figure 3.** Figure 3: Choice of kernels. Left: departs from the data manifold (blue). Right: remains on the data manifold (blue). Mixture kernel. This kernel perturbs either x or t: pn(x ′ , t′ | x, t) = 1 2 pn(t ′ | t) δx(x ′ ) + 1 2 pn(x ′ | x) δt(t ′ ). (17) We refer to this variant as stNCE-m. It recovers a mixture of tNCE and tCNCE objectives, as we show in Appendix B. White noise kernel. This kernel perturbs both x and t… view at source ↗

**Figure 4.** Figure 4: Samples drawn using stNCE-s+DSM [PITH_FULL_IMAGE:figures/full_fig_p009_4.png] view at source ↗

**Figure 5.** Figure 5: ImageNet test set images with the highest densities (left) and the lowest densities (right) as [PITH_FULL_IMAGE:figures/full_fig_p010_5.png] view at source ↗

**Figure 6.** Figure 6: Left: samples drawn using model trained using stNCE-s; right: samples drawn using model [PITH_FULL_IMAGE:figures/full_fig_p042_6.png] view at source ↗

**Figure 7.** Figure 7: We generate density-density plots, where the model’s learned energies are normalized using log [PITH_FULL_IMAGE:figures/full_fig_p042_7.png] view at source ↗

**Figure 8.** Figure 8: Samples drawn using diffusion sampler (left) and molecular dynamic simulations (right) with [PITH_FULL_IMAGE:figures/full_fig_p043_8.png] view at source ↗

read the original abstract

Learning an energy-based model from data samples is a central problem in machine learning. Many recent and popular methods, such as denoising score matching for training energy-based diffusion models, use stochastic interpolants to corrupt data samples at different noise levels indexed by a time variable. This defines a joint density over both the data space and time, and most methods learn its energy through either spatial or temporal differences. We identify distinct failure modes for both of these approaches. To solve them, we propose Spatiotemporal Noise-Contrastive Estimation (stNCE), a framework for learning the energy through joint spatiotemporal differences. stNCE unifies many existing methods and leads to new training objectives. Experiments on images and molecules demonstrate performance competitive with state-of-the-art density estimation methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

stNCE gives a joint spatiotemporal difference objective that unifies some prior spatial-only and temporal-only estimators for energy-based models on stochastic interpolants, with competitive image and molecule results claimed.

read the letter

The main thing here is a new training objective called stNCE that learns the energy of a joint density over data and time by taking differences in both space and time at once. The abstract says this fixes separate failure modes that show up when you use only spatial differences or only temporal ones, and that the new objective recovers several existing methods as special cases while also giving fresh ones.

What stands out is the attempt to treat the interpolant as a joint object rather than handling the two axes separately. If the derivations hold, that unification could let people pick objectives more systematically instead of switching between score matching variants. The experiments on images and molecules are presented as competitive with current density estimation baselines, which is the right kind of check for this kind of work.

The soft spot is that the failure modes for the pure spatial and pure temporal estimators are asserted without the actual derivations or counter-examples visible in the abstract, so it is not yet clear how general those modes are or whether the joint version introduces its own biases at scale. The unification claim also needs the full algebra to confirm it does not collapse to something already published. Experiments are only summarized, so we cannot yet judge variance, ablations, or whether the gains are robust across different interpolants.

This is for people already working on energy-based diffusion or score-based models who want to try a different contrastive-style loss. It is worth a serious referee if the math section actually shows the claimed reductions and the experiments include the necessary controls; otherwise it stays at the level of an interesting proposal.

Referee Report

0 major / 3 minor

Summary. The manuscript proposes Spatiotemporal Noise-Contrastive Estimation (stNCE) for learning energy-based models from data via stochastic interpolants. It identifies distinct failure modes of purely spatial and purely temporal difference estimators for the joint density over data and time, introduces joint spatiotemporal differences to address them, shows that stNCE unifies existing approaches such as denoising score matching, derives new training objectives, and reports competitive performance against state-of-the-art density estimation methods on image and molecule datasets.

Significance. If the derivations and experiments hold, the work supplies a unified perspective on training EBMs with stochastic interpolants and a practical mechanism for combining spatial and temporal information. The explicit identification of failure modes and the resulting new objectives could influence subsequent work on diffusion-style and energy-based generative models. The competitive empirical results on standard benchmarks add evidence that the approach is viable for density estimation tasks.

minor comments (3)

[§3.2] §3.2, Eq. (8): the definition of the spatiotemporal difference operator is introduced without an accompanying explicit expansion or pseudocode; adding one would make the unification claim easier to verify against prior spatial-only and temporal-only estimators.
[Table 2] Table 2: the reported negative log-likelihood values lack standard deviations or number of runs; including these would strengthen the claim of competitiveness with SOTA methods.
[Figure 5] Figure 5 caption: the visualization of learned energies does not state the temperature or normalization used, which affects direct comparison to the theoretical joint density.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive summary of the manuscript, recognition of its significance in unifying training methods for energy-based models via stochastic interpolants, and recommendation for minor revision. No specific major comments were raised in the report.

Circularity Check

0 steps flagged

No significant circularity; derivation rests on standard stochastic interpolants without self-referential reduction

full rationale

The paper's central proposal introduces stNCE as a framework using joint spatiotemporal differences on stochastic interpolants to learn energy-based models. The abstract and provided material describe identifying failure modes in spatial/temporal estimators and unifying existing methods via this approach, without any equations or steps that reduce a claimed prediction or result back to a fitted parameter or self-citation by construction. No load-bearing self-citation chains, ansatzes smuggled via prior work, or renamings of known results are evident. The derivation appears self-contained against external benchmarks like standard density estimation methods, with experiments asserted as competitive but not internally forced by the proposal itself.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only view yields no identifiable free parameters, axioms, or invented entities; the method is described at the level of a new estimation principle built on existing stochastic interpolants.

pith-pipeline@v0.9.1-grok · 5673 in / 1008 out tokens · 33229 ms · 2026-06-29T19:05:58.316926+00:00 · methodology

discussion (0)

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Few-Step Boltzmann Generators via Scalable Likelihood Flow Maps
cs.LG 2026-06 unverdicted novelty 6.0

SCALLOP replaces Hutchinson's trace estimator with a scalable, vectorized likelihood distillation objective for F2D2 flow maps, cutting training variance and time while improving performance on molecular Boltzmann gen...

Reference graph

Works this paper leans on

83 extracted references · 10 canonical work pages · cited by 1 Pith paper · 3 internal anchors

[1]

Aggarwal, J

R. Aggarwal, J. Chen, N. M. Boffi, and D. Koes. BoltzNCE: Learning likelihoods for boltzmann generation with stochastic interpolants and noise contrastive estimation. InConference on neural information processing systems, 2025

2025
[2]

X. Ai, Y. He, A. Gu, R. Salakhutdinov, J. Kolter, N. Boffi, and M. Simchowitz. Joint distillation for fast likelihood evaluation and sampling in flow-based models. InInternational Conference on Learning Representations, 2026

2026
[3]

Albergo, N

M. Albergo, N. M. Boffi, and E. Vanden-Eijnden. Stochastic interpolants: A unifying framework for flows and diffusions.Journal of Machine Learning Research, 26(209):1–80, 2025

2025
[4]

Ansel, E

J. Ansel, E. Yang, H. He, N. Gimelshein, A. Jain, M. Voznesensky, B. Bao, P. Bell, D. Berard, E. Burovski, G. Chauhan, A. Chourdia, W. Constable, A. Desmaison, Z. DeVito, E. Ellison, W. Feng, J. Gong, M. Gschwind, B. Hirsh, S. Huang, K. Kalambarkar, L. Kirsch, M. Lazos, M. Lezcano, Y. Liang, J. Liang, Y. Lu, C. Luk, B. Maher, Y. Pan, C. Puhrsch, M. Reso, ...

2024
[5]

Bartosh, D

G. Bartosh, D. Vetrov, and C. A. Naesseth. Neural flow diffusion models: Learnable forward process for improved diffusion modelling. InAnnual Conference on Neural Information Processing Systems, 2024

2024
[6]

Batatia, D

I. Batatia, D. P. Kov´ acs, G. N. C. Simm, C. Ortner, and G. Cs´ anyi. Mace: Higher order equivariant message passing neural networks for fast and accurate force fields, 2023

2023
[7]

Behrmann, W

J. Behrmann, W. Grathwohl, R. T. Q. Chen, D. Duvenaud, and J.-H. Jacobsen. Invertible residual networks. InInternational Conference on Machine Learning, volume 97, pages 573–582. PMLR, 09–15 Jun 2019

2019
[8]

L. Biewald. Experiment tracking with weights and biases, 2020. URLhttps://www.wandb.com/

2020
[9]

N. M. Boffi, M. S. Albergo, and E. Vanden-Eijnden. How to build a consistency model: Learning flow maps via self-distillation. InThe thirty-ninth annual conference on neural information processing systems, 2026

2026
[10]

Bradbury, R

J. Bradbury, R. Frostig, P. Hawkins, M. J. Johnson, Y. Katariya, C. Leary, D. Maclaurin, G. Necula, A. Paszke, J. VanderPlas, S. Wanderman-Milne, and Q. Zhang. JAX: composable transformations of Python+NumPy programs, 2018. URLhttp://github.com/jax-ml/jax

2018
[11]

C. Ceylan. Conditional noise-contrastive estimation: With application to natural image statistics. Master’s thesis, KTH, School of Computer Science and Communication (CSC), Stockholm, Sweden,
[12]

Independent thesis, advanced level (20 credits / 30 HE credits)
[13]

Ceylan and M

C. Ceylan and M. U. Gutmann. Conditional Noise-Contrastive Estimation of Unnormalised Models. InInternational Conference on Machine Learning, volume 80, pages 726–734. PMLR, July 2018

2018
[14]

Chehab, A

O. Chehab, A. Hyvarinen, and A. Risteski. Provable benefits of annealing for estimating normalizing constants: Importance sampling, noise-contrastive estimation, and beyond. InConference on Neural Information Processing Systems, 2023

2023
[15]

R. T. Q. Chen, Y. Rubanova, J. Bettencourt, and D. Duvenaud. Neural ordinary differential equa- tions, 2019

2019
[16]

W. Chen, S. Li, J. Li, J. Xu, Z. Lin, J. Yang, D. Zeng, J. Paisley, and Q. Zhao. Diffusion secant alignment for score-based density ratio estimation, 2025

2025
[17]

W. Chen, S. Li, J. Li, J. Yang, J. Paisley, and D. Zeng. Dequantified diffusion-schr¨ odinger bridge for density ratio estimation. InInternational Conference on Machine Learning, volume 267, pages 8427–8452. PMLR, 13–19 Jul 2025. 13

2025
[18]

W. Chen, J. Li, S. Li, Z. Lin, J. Yang, J. Paisley, and D. Zeng. A minimum variance path principle for accurate and stable score-based density ratio estimation. InInternational Conference on Learning Representations, 2026

2026
[19]

Chewi, A

S. Chewi, A. Kalavasis, A. Mehrotra, and O. Montasser. DDPM score matching is asymptotically efficient. InICLR 2025 Workshop on Deep Generative Model in Machine Learning: Theory, Principle and Efficacy, 2025

2025
[20]

K. Choi, C. Meng, Y. Song, and S. Ermon. Density ratio estimation via infinitesimal classification. InInternational Conference on Artificial Intelligence and Statistics, volume 151, pages 2552–2573. PMLR, 28–30 Mar 2022

2022
[21]

Chrabaszcz, I

P. Chrabaszcz, I. Loshchilov, and F. Hutter. A downsampled variant of imagenet as an alternative to the cifar datasets, 2017

2017
[22]

Doucet, W

A. Doucet, W. Grathwohl, A. G. Matthews, and H. Strathmann. Score-based diffusion meets an- nealed importance sampling. InAdvances in Neural Information Processing Systems, volume 35, pages 21482–21494. Curran Associates, Inc., 2022

2022
[23]

Du and I

Y. Du and I. Mordatch. Implicit generation and modeling with energy-based models. Ininternational conference on neural information processing systems. Curran Associates Inc., Red Hook, NY, USA, 2019

2019
[24]

Y. Du, C. Durkan, R. Strudel, J. B. Tenenbaum, S. Dieleman, R. Fergus, J. Sohl-Dickstein, A. Doucet, and W. S. Grathwohl. Reduce, Reuse, Recycle: Compositional Generation with Energy- Based Diffusion Models and MCMC. InInternational Conference on Machine Learning, volume 202, pages 8489–8510. PMLR, July 2023

2023
[25]

Eastman, R

P. Eastman, R. Galvelis, R. P. Pel´ aez, C. R. A. Abreu, S. E. Farr, E. Gallicchio, A. Gorenko, M. M. Henry, F. Hu, J. Huang, A. Kr¨ amer, J. Michel, J. A. Mitchell, V. S. Pande, J. P. Rodrigues, J. Rodriguez-Guerra, A. C. Simmonett, S. Singh, J. Swails, P. Turner, Y. Wang, I. Zhang, J. D. Chodera, G. De Fabritiis, and T. E. Markland. OpenMM 8: Molecular ...

work page doi:10.1021/acs.jpcb.3c06662 2024
[26]

R. Gao, E. Nijkamp, D. P. Kingma, Z. Xu, A. M. Dai, and Y. N. Wu. Flow contrastive estimation of energy-based models. In2020 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pages 7515–7525, 2020

2020
[27]

R. Gao, Y. Song, B. Poole, Y. N. Wu, and D. P. Kingma. Learning energy-based models by diffusion recovery likelihood. InInternational conference on learning representations, 2021

2021
[28]

Gonzalez, N

M. Gonzalez, N. Fernandez Pinto, T. Tran, e. Gherbi, H. Hajri, and N. Masmoudi. SEEDS: Ex- ponential SDE solvers for fast high-quality sampling from diffusion models. InAdvances in neural information processing systems, volume 36, pages 68061–68120. Curran Associates, Inc., 2023

2023
[29]

Goodfellow, J

I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative Adversarial Networks.Commun. ACM, 63(11):139–144, Oct. 2020. ISSN 0001-0782. Place: New York, NY, USA

2020
[30]

Grathwohl, R

W. Grathwohl, R. T. Q. Chen, J. Bettencourt, and D. Duvenaud. Scalable reversible generative mod- els with free-form continuous dynamics. InInternational Conference on Learning Representations, 2019

2019
[31]

Gugger, L

S. Gugger, L. Debut, T. Wolf, P. Schmid, Z. Mueller, S. Mangrulkar, M. Sun, and B. Bossan. Accelerate: Training and inference at scale made simple, efficient and adaptable., 2022. URLhttps: //github.com/huggingface/accelerate

2022
[32]

F. Guth, Z. Kadkhodaie, and E. P. Simoncelli. Learning normalized image densities via dual score matching, 2025. eprint: 2506.05310

work page arXiv 2025
[33]

M. U. Gutmann and J.-i. Hirayama. Bregman divergence as general framework to estimate un- normalized statistical models. Inconference on uncertainty in artificial intelligence, UAI’11, pages 283–290, Barcelona, Spain, 2011. AUAI Press. ISBN 978-0-9749039-7-2. 14

2011
[34]

M. U. Gutmann and A. Hyv¨ arinen. Noise-Contrastive Estimation of Unnormalized Statistical Mod- els, with Applications to Natural Image Statistics.Journal of Machine Learning Research, 13(11): 307–361, 2012

2012
[35]

J. He, J. M. Hern´ andez-Lobato, Y. Du, and F. Vargas. RNE: plug-and-play diffusion inference-time control and energy-based training. InInternational Conference on Learning Representations, 2026

2026
[36]

G. E. Hinton. Training products of experts by minimizing contrastive divergence.Neural Computa- tion, 14(8):1771–1800, 2002. doi: 10.1162/089976602760128018

work page doi:10.1162/089976602760128018 2002
[37]

G. E. Hinton. Training products of experts by minimizing contrastive divergence.Neural Comput., 14(8):1771–1800, Aug. 2002. ISSN 0899-7667

2002
[38]

Holderrieth, U

P. Holderrieth, U. Singer, T. Jaakkola, R. T. Q. Chen, Y. Lipman, and B. Karrer. GLASS flows: Efficient inference for reward alignment of flow and diffusion models. InInternational conference on learning representations, 2026

2026
[39]

B. E. Husic, N. E. Charron, D. Lemm, J. Wang, A. P´ erez, M. Majewski, A. Kr¨ amer, Y. Chen, S. Olsson, G. de Fabritiis, F. No´ e, and C. Clementi. Coarse graining molecular dynamics with graph neural networks.The Journal of Chemical Physics, 153(19), Nov. 2020. ISSN 1089-7690. doi: 10.1063/5.0026133. URLhttp://dx.doi.org/10.1063/5.0026133

work page doi:10.1063/5.0026133 2020
[40]

Hyv¨ arinen, J

A. Hyv¨ arinen, J. Karhunen, and E. Oja.Independent Component Analysis. Adaptive and Cognitive Dynamic Systems: Signal Processing, Learning, Communications and Control. Wiley, 2004. ISBN 9780471464198

2004
[41]

Hyv¨ arinen

A. Hyv¨ arinen. Estimation of Non-Normalized Statistical Models by Score Matching.Journal of Machine Learning Research, 6(24):695–709, 2005

2005
[42]

Karras, M

T. Karras, M. Aittala, T. Aila, and S. Laine. Elucidating the Design Space of Diffusion-Based Generative Models. InAdvances in Neural Information Processing Systems, volume 35, pages 26565– 26577. Curran Associates, Inc., 2022

2022
[43]

Kingma, T

D. Kingma, T. Salimans, B. Poole, and J. Ho. Variational diffusion models. InAdvances in Neural Information Processing Systems, volume 34, pages 21696–21707. Curran Associates, Inc., 2021

2021
[44]

D. P. Kingma and J. Ba. Adam: A Method for Stochastic Optimization. InInternational Conference on Learning Representations, 2015

2015
[45]

D. P. Kingma and P. Dhariwal. Glow: Generative flow with invertible 1x1 convolutions. InAdvances in Neural Information Processing Systems, volume 31. Curran Associates, Inc., 2018

2018
[46]

Koehler, A

F. Koehler, A. Heckett, and A. Risteski. Statistical efficiency of score matching: The view from isoperimetry. InThe Eleventh International Conference on Learning Representations, 2023

2023
[47]

C.-H. Lai, Y. Takida, N. Murata, T. Uesaka, Y. Mitsufuji, and S. Ermon. FP-diffusion: Improving score-based diffusion models by enforcing the underlying score fokker-planck equation. InInterna- tional Conference on Machine Learning, volume 202, pages 18365–18398. PMLR, 23–29 Jul 2023

2023
[48]

LeCun, C

Y. LeCun, C. Cortes, and C. Burges. MNIST handwritten digit database.ATT Labs [Online]. Available: http://yann.lecun.com/exdb/mnist, 2, 2010

2010
[49]

Back to Basics: Let Denoising Generative Models Denoise

T. Li and K. He. Back to basics: Let denoising generative models denoise, 2026. arXiv: 2511.13720 [cs.CV]

work page internal anchor Pith review Pith/arXiv arXiv 2026
[50]

Lindorff-Larsen, S

K. Lindorff-Larsen, S. Piana, R. O. Dror, and D. E. Shaw. How fast-folding proteins fold.Science, 334(6055):517–520, 2011. doi: 10.1126/science.1208351

work page doi:10.1126/science.1208351 2011
[51]

Lipman, R

Y. Lipman, R. T. Q. Chen, H. Ben-Hamu, M. Nickel, and M. Le. Flow Matching for Generative Modeling. InThe Eleventh International Conference on Learning Representations, 2023

2023
[52]

Flow Matching Guide and Code

Y. Lipman, M. Havasi, P. Holderrieth, N. Shaul, M. Le, B. Karrer, R. T. Q. Chen, D. Lopez-Paz, H. Ben-Hamu, and I. Gat. Flow matching guide and code, 2024. arXiv: 2412.06264 [cs.LG]. 15

work page internal anchor Pith review Pith/arXiv arXiv 2024
[53]

Loshchilov and F

I. Loshchilov and F. Hutter. Decoupled weight decay regularization. InInternational conference on learning representations, 2019

2019
[54]

C. Lu, Y. Zhou, F. Bao, J. Chen, C. LI, and J. Zhu. DPM-solver: a fast ODE solver for diffusion probabilistic model sampling in around 10 steps. InAdvances in neural information processing systems, volume 35, pages 5775–5787. Curran Associates, Inc., 2022

2022
[55]

M´ at´ e, S

B. M´ at´ e, S. Klein, T. Golling, and F. Fleuret. Flowification: Everything is a normalizing flow. In Advances in Neural Information Processing Systems, 2022

2022
[56]

Molgedey and H

L. Molgedey and H. G. Schuster. Separation of a mixture of independent signals using time delayed correlations.Phys. Rev. Lett., 72:3634–3637, Jun 1994

1994
[57]

A. Q. Nichol and P. Dhariwal. Improved denoising diffusion probabilistic models. InInternational Conference on Machine Learning, volume 139, pages 8162–8171. PMLR, 18–24 Jul 2021

2021
[58]

tex.eprint: https://www.science.org/doi/pdf/10.1126/science.aaw1147

F. No´ e, S. Olsson, J. K¨ ohler, and H. Wu. Boltzmann generators: Sampling equilibrium states of many-body systems with deep learning.Science, 365(6457):eaaw1147, 2019. tex.eprint: https://www.science.org/doi/pdf/10.1126/science.aaw1147

work page doi:10.1126/science.aaw1147 2019
[59]

A Diffusive Classification Loss for Learning Energy-based Generative Models

R. OuYang, L. Grenioux, and J. M. Hern´ andez-Lobato. A diffusive classification loss for learning energy-based generative models, 2026. arXiv: 2601.21025 [stat.ML]

work page internal anchor Pith review Pith/arXiv arXiv 2026
[60]

Plainer, H

M. Plainer, H. Wu, L. Klein, S. G¨ unnemann, and F. Noe. Consistent sampling and simulation: Molecular dynamics with energy-based diffusion models. InThe thirty-ninth annual conference on neural information processing systems, 2026

2026
[61]

Qin and A

Y. Qin and A. Risteski. Fit like you sample: Sample-efficient generalized score matching from fast mixing diffusions. InConference on Learning Theory, volume 247, pages 4413–4457. PMLR, 30 Jun–03 Jul 2024

2024
[62]

Rhodes, K

B. Rhodes, K. Xu, and M. U. Gutmann. Telescoping density-ratio estimation. InAdvances in Neural Information Processing Systems, volume 33, pages 4905–4916. Curran Associates, Inc., 2020

2020
[63]

Skreta, T

M. Skreta, T. Akhound-Sadegh, V. Ohanesian, R. Bondesan, A. Aspuru-Guzik, A. Doucet, R. Brekel- mans, A. Tong, and K. Neklyudov. Feynman-kac correctors in diffusion: Annealing, guidance, and product of experts. InInternational Conference on Machine Learning, 2025

2025
[64]

Song and S

Y. Song and S. Ermon. Generative Modeling by Estimating Gradients of the Data Distribution. In Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc., 2019

2019
[65]

Y. Song, C. Meng, and S. Ermon. Mintnet: Building invertible neural networks with masked convo- lutions. InNeural Information Processing Systems, volume 32. Curran Associates, Inc., 2019

2019
[66]

Y. Song, S. Garg, J. Shi, and S. Ermon. Sliced Score Matching: A Scalable Approach to Density and Score Estimation. InUncertainty in Artificial Intelligence Conference, volume 115, pages 574–584. PMLR, July 2020

2020
[67]

Y. Song, C. Durkan, I. Murray, and S. Ermon. Maximum Likelihood Training of Score-Based Diffusion Models. InAdvances in Neural Information Processing Systems, volume 34, pages 1415–
[68]

Curran Associates, Inc., 2021

2021
[69]

Y. Song, J. Sohl-Dickstein, D. P. Kingma, A. Kumar, S. Ermon, and B. Poole. Score-Based Gener- ative Modeling through Stochastic Differential Equations. InInternational Conference on Learning Representations, 2021

2021
[70]

Y. Song, P. Dhariwal, M. Chen, and I. Sutskever. Consistency Models. InInternational Conference on Machine Learning, ICML’23. JMLR.org, 2023. Place: Honolulu, Hawaii, USA

2023
[71]

Srivastava, S

A. Srivastava, S. Han, K. Xu, B. Rhodes, and M. U. Gutmann. Estimating the Density Ratio between Distributions with High Discrepancy using Multinomial Logistic Regression.Transactions on Machine Learning Research, 2023. ISSN 2835-8856

2023
[72]

Sugiyama, T

M. Sugiyama, T. Suzuki, and T. Kanamori.Density ratio estimation in machine learning. Cambridge University Press, 2012. 16

2012
[73]

Thornton, L

J. Thornton, L. Bethune, R. Zhang, A. Bradley, P. Nakkiran, and S. Zhai. Composition and control with distilled energy diffusion models and sequential monte carlo. InAISTATS 2025, 2025

2025
[74]

Uehara, T

M. Uehara, T. Matsuda, and F. Komaki. Analysis of noise contrastive estimation from the perspective of asymptotic variance, 2018

2018
[75]

van den Oord, N

A. van den Oord, N. Kalchbrenner, L. Espeholt, k. kavukcuoglu, O. Vinyals, and A. Graves. Con- ditional image generation with pixelcnn decoders. InAdvances in Neural Information Processing Systems, volume 29. Curran Associates, Inc., 2016

2016
[76]

P. Vincent. A Connection Between Score Matching and Denoising Autoencoders.Neural Computa- tion, 23(7):1661–1674, 2011

2011
[77]

L. K. Wenliang and H. Kanagawa. Blindness of score-based methods to isolated components and mixing proportions, 2021. eprint: 2008.10087

work page arXiv 2021
[78]

Yair and T

O. Yair and T. Michaeli. Contrastive Divergence Learning is a Time Reversal Adversarial Game. In International Conference on Learning Representations, 2021

2021
[79]

H. Yu, M. U. Gutmann, A. Klami, and O. Chehab. Conditional noise-contrastive estimation of energy-based models by jumping between modes. InEurIPS 2025 workshop on principles of gener- ative modeling (PriGM), 2025

2025
[80]

H. Yu, A. Klami, A. Hyvarinen, A. Korba, and O. Chehab. Density Ratio Estimation with Condi- tional Probability Paths. InForty-second International Conference on Machine Learning, 2025

2025

Showing first 80 references.

[1] [1]

Aggarwal, J

R. Aggarwal, J. Chen, N. M. Boffi, and D. Koes. BoltzNCE: Learning likelihoods for boltzmann generation with stochastic interpolants and noise contrastive estimation. InConference on neural information processing systems, 2025

2025

[2] [2]

X. Ai, Y. He, A. Gu, R. Salakhutdinov, J. Kolter, N. Boffi, and M. Simchowitz. Joint distillation for fast likelihood evaluation and sampling in flow-based models. InInternational Conference on Learning Representations, 2026

2026

[3] [3]

Albergo, N

M. Albergo, N. M. Boffi, and E. Vanden-Eijnden. Stochastic interpolants: A unifying framework for flows and diffusions.Journal of Machine Learning Research, 26(209):1–80, 2025

2025

[4] [4]

Ansel, E

J. Ansel, E. Yang, H. He, N. Gimelshein, A. Jain, M. Voznesensky, B. Bao, P. Bell, D. Berard, E. Burovski, G. Chauhan, A. Chourdia, W. Constable, A. Desmaison, Z. DeVito, E. Ellison, W. Feng, J. Gong, M. Gschwind, B. Hirsh, S. Huang, K. Kalambarkar, L. Kirsch, M. Lazos, M. Lezcano, Y. Liang, J. Liang, Y. Lu, C. Luk, B. Maher, Y. Pan, C. Puhrsch, M. Reso, ...

2024

[5] [5]

Bartosh, D

G. Bartosh, D. Vetrov, and C. A. Naesseth. Neural flow diffusion models: Learnable forward process for improved diffusion modelling. InAnnual Conference on Neural Information Processing Systems, 2024

2024

[6] [6]

Batatia, D

I. Batatia, D. P. Kov´ acs, G. N. C. Simm, C. Ortner, and G. Cs´ anyi. Mace: Higher order equivariant message passing neural networks for fast and accurate force fields, 2023

2023

[7] [7]

Behrmann, W

J. Behrmann, W. Grathwohl, R. T. Q. Chen, D. Duvenaud, and J.-H. Jacobsen. Invertible residual networks. InInternational Conference on Machine Learning, volume 97, pages 573–582. PMLR, 09–15 Jun 2019

2019

[8] [8]

L. Biewald. Experiment tracking with weights and biases, 2020. URLhttps://www.wandb.com/

2020

[9] [9]

N. M. Boffi, M. S. Albergo, and E. Vanden-Eijnden. How to build a consistency model: Learning flow maps via self-distillation. InThe thirty-ninth annual conference on neural information processing systems, 2026

2026

[10] [10]

Bradbury, R

J. Bradbury, R. Frostig, P. Hawkins, M. J. Johnson, Y. Katariya, C. Leary, D. Maclaurin, G. Necula, A. Paszke, J. VanderPlas, S. Wanderman-Milne, and Q. Zhang. JAX: composable transformations of Python+NumPy programs, 2018. URLhttp://github.com/jax-ml/jax

2018

[11] [11]

C. Ceylan. Conditional noise-contrastive estimation: With application to natural image statistics. Master’s thesis, KTH, School of Computer Science and Communication (CSC), Stockholm, Sweden,

[12] [12]

Independent thesis, advanced level (20 credits / 30 HE credits)

[13] [13]

Ceylan and M

C. Ceylan and M. U. Gutmann. Conditional Noise-Contrastive Estimation of Unnormalised Models. InInternational Conference on Machine Learning, volume 80, pages 726–734. PMLR, July 2018

2018

[14] [14]

Chehab, A

O. Chehab, A. Hyvarinen, and A. Risteski. Provable benefits of annealing for estimating normalizing constants: Importance sampling, noise-contrastive estimation, and beyond. InConference on Neural Information Processing Systems, 2023

2023

[15] [15]

R. T. Q. Chen, Y. Rubanova, J. Bettencourt, and D. Duvenaud. Neural ordinary differential equa- tions, 2019

2019

[16] [16]

W. Chen, S. Li, J. Li, J. Xu, Z. Lin, J. Yang, D. Zeng, J. Paisley, and Q. Zhao. Diffusion secant alignment for score-based density ratio estimation, 2025

2025

[17] [17]

W. Chen, S. Li, J. Li, J. Yang, J. Paisley, and D. Zeng. Dequantified diffusion-schr¨ odinger bridge for density ratio estimation. InInternational Conference on Machine Learning, volume 267, pages 8427–8452. PMLR, 13–19 Jul 2025. 13

2025

[18] [18]

W. Chen, J. Li, S. Li, Z. Lin, J. Yang, J. Paisley, and D. Zeng. A minimum variance path principle for accurate and stable score-based density ratio estimation. InInternational Conference on Learning Representations, 2026

2026

[19] [19]

Chewi, A

S. Chewi, A. Kalavasis, A. Mehrotra, and O. Montasser. DDPM score matching is asymptotically efficient. InICLR 2025 Workshop on Deep Generative Model in Machine Learning: Theory, Principle and Efficacy, 2025

2025

[20] [20]

K. Choi, C. Meng, Y. Song, and S. Ermon. Density ratio estimation via infinitesimal classification. InInternational Conference on Artificial Intelligence and Statistics, volume 151, pages 2552–2573. PMLR, 28–30 Mar 2022

2022

[21] [21]

Chrabaszcz, I

P. Chrabaszcz, I. Loshchilov, and F. Hutter. A downsampled variant of imagenet as an alternative to the cifar datasets, 2017

2017

[22] [22]

Doucet, W

A. Doucet, W. Grathwohl, A. G. Matthews, and H. Strathmann. Score-based diffusion meets an- nealed importance sampling. InAdvances in Neural Information Processing Systems, volume 35, pages 21482–21494. Curran Associates, Inc., 2022

2022

[23] [23]

Du and I

Y. Du and I. Mordatch. Implicit generation and modeling with energy-based models. Ininternational conference on neural information processing systems. Curran Associates Inc., Red Hook, NY, USA, 2019

2019

[24] [24]

Y. Du, C. Durkan, R. Strudel, J. B. Tenenbaum, S. Dieleman, R. Fergus, J. Sohl-Dickstein, A. Doucet, and W. S. Grathwohl. Reduce, Reuse, Recycle: Compositional Generation with Energy- Based Diffusion Models and MCMC. InInternational Conference on Machine Learning, volume 202, pages 8489–8510. PMLR, July 2023

2023

[25] [25]

Eastman, R

P. Eastman, R. Galvelis, R. P. Pel´ aez, C. R. A. Abreu, S. E. Farr, E. Gallicchio, A. Gorenko, M. M. Henry, F. Hu, J. Huang, A. Kr¨ amer, J. Michel, J. A. Mitchell, V. S. Pande, J. P. Rodrigues, J. Rodriguez-Guerra, A. C. Simmonett, S. Singh, J. Swails, P. Turner, Y. Wang, I. Zhang, J. D. Chodera, G. De Fabritiis, and T. E. Markland. OpenMM 8: Molecular ...

work page doi:10.1021/acs.jpcb.3c06662 2024

[26] [26]

R. Gao, E. Nijkamp, D. P. Kingma, Z. Xu, A. M. Dai, and Y. N. Wu. Flow contrastive estimation of energy-based models. In2020 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pages 7515–7525, 2020

2020

[27] [27]

R. Gao, Y. Song, B. Poole, Y. N. Wu, and D. P. Kingma. Learning energy-based models by diffusion recovery likelihood. InInternational conference on learning representations, 2021

2021

[28] [28]

Gonzalez, N

M. Gonzalez, N. Fernandez Pinto, T. Tran, e. Gherbi, H. Hajri, and N. Masmoudi. SEEDS: Ex- ponential SDE solvers for fast high-quality sampling from diffusion models. InAdvances in neural information processing systems, volume 36, pages 68061–68120. Curran Associates, Inc., 2023

2023

[29] [29]

Goodfellow, J

I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative Adversarial Networks.Commun. ACM, 63(11):139–144, Oct. 2020. ISSN 0001-0782. Place: New York, NY, USA

2020

[30] [30]

Grathwohl, R

W. Grathwohl, R. T. Q. Chen, J. Bettencourt, and D. Duvenaud. Scalable reversible generative mod- els with free-form continuous dynamics. InInternational Conference on Learning Representations, 2019

2019

[31] [31]

Gugger, L

S. Gugger, L. Debut, T. Wolf, P. Schmid, Z. Mueller, S. Mangrulkar, M. Sun, and B. Bossan. Accelerate: Training and inference at scale made simple, efficient and adaptable., 2022. URLhttps: //github.com/huggingface/accelerate

2022

[32] [32]

F. Guth, Z. Kadkhodaie, and E. P. Simoncelli. Learning normalized image densities via dual score matching, 2025. eprint: 2506.05310

work page arXiv 2025

[33] [33]

M. U. Gutmann and J.-i. Hirayama. Bregman divergence as general framework to estimate un- normalized statistical models. Inconference on uncertainty in artificial intelligence, UAI’11, pages 283–290, Barcelona, Spain, 2011. AUAI Press. ISBN 978-0-9749039-7-2. 14

2011

[34] [34]

M. U. Gutmann and A. Hyv¨ arinen. Noise-Contrastive Estimation of Unnormalized Statistical Mod- els, with Applications to Natural Image Statistics.Journal of Machine Learning Research, 13(11): 307–361, 2012

2012

[35] [35]

J. He, J. M. Hern´ andez-Lobato, Y. Du, and F. Vargas. RNE: plug-and-play diffusion inference-time control and energy-based training. InInternational Conference on Learning Representations, 2026

2026

[36] [36]

G. E. Hinton. Training products of experts by minimizing contrastive divergence.Neural Computa- tion, 14(8):1771–1800, 2002. doi: 10.1162/089976602760128018

work page doi:10.1162/089976602760128018 2002

[37] [37]

G. E. Hinton. Training products of experts by minimizing contrastive divergence.Neural Comput., 14(8):1771–1800, Aug. 2002. ISSN 0899-7667

2002

[38] [38]

Holderrieth, U

P. Holderrieth, U. Singer, T. Jaakkola, R. T. Q. Chen, Y. Lipman, and B. Karrer. GLASS flows: Efficient inference for reward alignment of flow and diffusion models. InInternational conference on learning representations, 2026

2026

[39] [39]

B. E. Husic, N. E. Charron, D. Lemm, J. Wang, A. P´ erez, M. Majewski, A. Kr¨ amer, Y. Chen, S. Olsson, G. de Fabritiis, F. No´ e, and C. Clementi. Coarse graining molecular dynamics with graph neural networks.The Journal of Chemical Physics, 153(19), Nov. 2020. ISSN 1089-7690. doi: 10.1063/5.0026133. URLhttp://dx.doi.org/10.1063/5.0026133

work page doi:10.1063/5.0026133 2020

[40] [40]

Hyv¨ arinen, J

A. Hyv¨ arinen, J. Karhunen, and E. Oja.Independent Component Analysis. Adaptive and Cognitive Dynamic Systems: Signal Processing, Learning, Communications and Control. Wiley, 2004. ISBN 9780471464198

2004

[41] [41]

Hyv¨ arinen

A. Hyv¨ arinen. Estimation of Non-Normalized Statistical Models by Score Matching.Journal of Machine Learning Research, 6(24):695–709, 2005

2005

[42] [42]

Karras, M

T. Karras, M. Aittala, T. Aila, and S. Laine. Elucidating the Design Space of Diffusion-Based Generative Models. InAdvances in Neural Information Processing Systems, volume 35, pages 26565– 26577. Curran Associates, Inc., 2022

2022

[43] [43]

Kingma, T

D. Kingma, T. Salimans, B. Poole, and J. Ho. Variational diffusion models. InAdvances in Neural Information Processing Systems, volume 34, pages 21696–21707. Curran Associates, Inc., 2021

2021

[44] [44]

D. P. Kingma and J. Ba. Adam: A Method for Stochastic Optimization. InInternational Conference on Learning Representations, 2015

2015

[45] [45]

D. P. Kingma and P. Dhariwal. Glow: Generative flow with invertible 1x1 convolutions. InAdvances in Neural Information Processing Systems, volume 31. Curran Associates, Inc., 2018

2018

[46] [46]

Koehler, A

F. Koehler, A. Heckett, and A. Risteski. Statistical efficiency of score matching: The view from isoperimetry. InThe Eleventh International Conference on Learning Representations, 2023

2023

[47] [47]

C.-H. Lai, Y. Takida, N. Murata, T. Uesaka, Y. Mitsufuji, and S. Ermon. FP-diffusion: Improving score-based diffusion models by enforcing the underlying score fokker-planck equation. InInterna- tional Conference on Machine Learning, volume 202, pages 18365–18398. PMLR, 23–29 Jul 2023

2023

[48] [48]

LeCun, C

Y. LeCun, C. Cortes, and C. Burges. MNIST handwritten digit database.ATT Labs [Online]. Available: http://yann.lecun.com/exdb/mnist, 2, 2010

2010

[49] [49]

Back to Basics: Let Denoising Generative Models Denoise

T. Li and K. He. Back to basics: Let denoising generative models denoise, 2026. arXiv: 2511.13720 [cs.CV]

work page internal anchor Pith review Pith/arXiv arXiv 2026

[50] [50]

Lindorff-Larsen, S

K. Lindorff-Larsen, S. Piana, R. O. Dror, and D. E. Shaw. How fast-folding proteins fold.Science, 334(6055):517–520, 2011. doi: 10.1126/science.1208351

work page doi:10.1126/science.1208351 2011

[51] [51]

Lipman, R

Y. Lipman, R. T. Q. Chen, H. Ben-Hamu, M. Nickel, and M. Le. Flow Matching for Generative Modeling. InThe Eleventh International Conference on Learning Representations, 2023

2023

[52] [52]

Flow Matching Guide and Code

Y. Lipman, M. Havasi, P. Holderrieth, N. Shaul, M. Le, B. Karrer, R. T. Q. Chen, D. Lopez-Paz, H. Ben-Hamu, and I. Gat. Flow matching guide and code, 2024. arXiv: 2412.06264 [cs.LG]. 15

work page internal anchor Pith review Pith/arXiv arXiv 2024

[53] [53]

Loshchilov and F

I. Loshchilov and F. Hutter. Decoupled weight decay regularization. InInternational conference on learning representations, 2019

2019

[54] [54]

C. Lu, Y. Zhou, F. Bao, J. Chen, C. LI, and J. Zhu. DPM-solver: a fast ODE solver for diffusion probabilistic model sampling in around 10 steps. InAdvances in neural information processing systems, volume 35, pages 5775–5787. Curran Associates, Inc., 2022

2022

[55] [55]

M´ at´ e, S

B. M´ at´ e, S. Klein, T. Golling, and F. Fleuret. Flowification: Everything is a normalizing flow. In Advances in Neural Information Processing Systems, 2022

2022

[56] [56]

Molgedey and H

L. Molgedey and H. G. Schuster. Separation of a mixture of independent signals using time delayed correlations.Phys. Rev. Lett., 72:3634–3637, Jun 1994

1994

[57] [57]

A. Q. Nichol and P. Dhariwal. Improved denoising diffusion probabilistic models. InInternational Conference on Machine Learning, volume 139, pages 8162–8171. PMLR, 18–24 Jul 2021

2021

[58] [58]

tex.eprint: https://www.science.org/doi/pdf/10.1126/science.aaw1147

F. No´ e, S. Olsson, J. K¨ ohler, and H. Wu. Boltzmann generators: Sampling equilibrium states of many-body systems with deep learning.Science, 365(6457):eaaw1147, 2019. tex.eprint: https://www.science.org/doi/pdf/10.1126/science.aaw1147

work page doi:10.1126/science.aaw1147 2019

[59] [59]

A Diffusive Classification Loss for Learning Energy-based Generative Models

R. OuYang, L. Grenioux, and J. M. Hern´ andez-Lobato. A diffusive classification loss for learning energy-based generative models, 2026. arXiv: 2601.21025 [stat.ML]

work page internal anchor Pith review Pith/arXiv arXiv 2026

[60] [60]

Plainer, H

M. Plainer, H. Wu, L. Klein, S. G¨ unnemann, and F. Noe. Consistent sampling and simulation: Molecular dynamics with energy-based diffusion models. InThe thirty-ninth annual conference on neural information processing systems, 2026

2026

[61] [61]

Qin and A

Y. Qin and A. Risteski. Fit like you sample: Sample-efficient generalized score matching from fast mixing diffusions. InConference on Learning Theory, volume 247, pages 4413–4457. PMLR, 30 Jun–03 Jul 2024

2024

[62] [62]

Rhodes, K

B. Rhodes, K. Xu, and M. U. Gutmann. Telescoping density-ratio estimation. InAdvances in Neural Information Processing Systems, volume 33, pages 4905–4916. Curran Associates, Inc., 2020

2020

[63] [63]

Skreta, T

M. Skreta, T. Akhound-Sadegh, V. Ohanesian, R. Bondesan, A. Aspuru-Guzik, A. Doucet, R. Brekel- mans, A. Tong, and K. Neklyudov. Feynman-kac correctors in diffusion: Annealing, guidance, and product of experts. InInternational Conference on Machine Learning, 2025

2025

[64] [64]

Song and S

Y. Song and S. Ermon. Generative Modeling by Estimating Gradients of the Data Distribution. In Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc., 2019

2019

[65] [65]

Y. Song, C. Meng, and S. Ermon. Mintnet: Building invertible neural networks with masked convo- lutions. InNeural Information Processing Systems, volume 32. Curran Associates, Inc., 2019

2019

[66] [66]

Y. Song, S. Garg, J. Shi, and S. Ermon. Sliced Score Matching: A Scalable Approach to Density and Score Estimation. InUncertainty in Artificial Intelligence Conference, volume 115, pages 574–584. PMLR, July 2020

2020

[67] [67]

Y. Song, C. Durkan, I. Murray, and S. Ermon. Maximum Likelihood Training of Score-Based Diffusion Models. InAdvances in Neural Information Processing Systems, volume 34, pages 1415–

[68] [68]

Curran Associates, Inc., 2021

2021

[69] [69]

Y. Song, J. Sohl-Dickstein, D. P. Kingma, A. Kumar, S. Ermon, and B. Poole. Score-Based Gener- ative Modeling through Stochastic Differential Equations. InInternational Conference on Learning Representations, 2021

2021

[70] [70]

Y. Song, P. Dhariwal, M. Chen, and I. Sutskever. Consistency Models. InInternational Conference on Machine Learning, ICML’23. JMLR.org, 2023. Place: Honolulu, Hawaii, USA

2023

[71] [71]

Srivastava, S

A. Srivastava, S. Han, K. Xu, B. Rhodes, and M. U. Gutmann. Estimating the Density Ratio between Distributions with High Discrepancy using Multinomial Logistic Regression.Transactions on Machine Learning Research, 2023. ISSN 2835-8856

2023

[72] [72]

Sugiyama, T

M. Sugiyama, T. Suzuki, and T. Kanamori.Density ratio estimation in machine learning. Cambridge University Press, 2012. 16

2012

[73] [73]

Thornton, L

J. Thornton, L. Bethune, R. Zhang, A. Bradley, P. Nakkiran, and S. Zhai. Composition and control with distilled energy diffusion models and sequential monte carlo. InAISTATS 2025, 2025

2025

[74] [74]

Uehara, T

M. Uehara, T. Matsuda, and F. Komaki. Analysis of noise contrastive estimation from the perspective of asymptotic variance, 2018

2018

[75] [75]

van den Oord, N

A. van den Oord, N. Kalchbrenner, L. Espeholt, k. kavukcuoglu, O. Vinyals, and A. Graves. Con- ditional image generation with pixelcnn decoders. InAdvances in Neural Information Processing Systems, volume 29. Curran Associates, Inc., 2016

2016

[76] [76]

P. Vincent. A Connection Between Score Matching and Denoising Autoencoders.Neural Computa- tion, 23(7):1661–1674, 2011

2011

[77] [77]

L. K. Wenliang and H. Kanagawa. Blindness of score-based methods to isolated components and mixing proportions, 2021. eprint: 2008.10087

work page arXiv 2021

[78] [78]

Yair and T

O. Yair and T. Michaeli. Contrastive Divergence Learning is a Time Reversal Adversarial Game. In International Conference on Learning Representations, 2021

2021

[79] [79]

H. Yu, M. U. Gutmann, A. Klami, and O. Chehab. Conditional noise-contrastive estimation of energy-based models by jumping between modes. InEurIPS 2025 workshop on principles of gener- ative modeling (PriGM), 2025

2025

[80] [80]

H. Yu, A. Klami, A. Hyvarinen, A. Korba, and O. Chehab. Density Ratio Estimation with Condi- tional Probability Paths. InForty-second International Conference on Machine Learning, 2025

2025