VAMP-Diff: VampPrior Latent Diffusion for Photoplethysmography Modeling

Bahman Moraffah; Fatemeh Ghasemi Balouei; Mahesh Banavar; Nathan Willemsen

arXiv: 2605.22851 · v1 · pith:63S4EYXVnew · submitted 2026-05-17 · eess.SP · cs.LG · eess.IV

Pith reviewed 2026-05-25 00:16 UTC · model grok-4.3

classification: eess.SP, cs.LG, eess.IV

keywords: photoplethysmography, PPG, latent diffusion, VampPrior, signal reconstruction, generative modeling, physiological signals, CapnoBase

The pith

VAMP-Diff jointly trains a temporal PPG encoder, a conditional diffusion decoder, and a VampPrior on a pooled latent to generate realistic signals and reconstruct sharper waveforms than Gaussian baselines while preserving heart and respiratory-rh

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces VAMP-Diff, a variational diffusion model for photoplethysmography (PPG) signals that combines a temporal encoder, a conditional one-dimensional diffusion decoder, and VampPrior regularization applied to a compact pooled latent. During reconstruction the decoder conditions on the full temporal latent, which gives it direct access to beat timing and morphology. During generation, latents are drawn from learned VampPrior mixture components rather than a fixed Gaussian prior. The three components are trained jointly. On the CapnoBase dataset the authors report realistic PPG waveforms, sharper systolic features than Gaussian-prior baselines, preserved heart-rate values, consistent respiratory-rate estimates, and higher reconstruction error on corrupted inputs. A reader would care because the combination supplies both a generative path and an inference path: adversarial generators and standard diffusion models lack the inference path, while standard variational autoencoders have one but tend to blur waveform detail.

Core claim

VAMP-Diff is a jointly trained variational diffusion model consisting of a temporal PPG encoder, a conditional one-dimensional diffusion decoder, and VampPrior regularization on a compact pooled latent; the decoder receives the full temporal latent during diffusion reconstruction, allowing it to recover beat timing and morphology while sampling from the learned VampPrior components instead of a fixed Gaussian prior.
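The VampPrior named in the core claim is, in its standard form (Tomczak and Welling), a uniform mixture of the encoder's posteriors evaluated at K learned pseudo-inputs. A minimal NumPy sketch of that mixture density and of sampling from it follows. It assumes diagonal-Gaussian components, and the function names, shapes, and the use of NumPy are illustrative choices, not details taken from the paper.

```python
import numpy as np

def log_normal_diag(z, mu, log_var):
    """Log density of a diagonal Gaussian, summed over latent dimensions."""
    return -0.5 * np.sum(
        np.log(2.0 * np.pi) + log_var + (z - mu) ** 2 / np.exp(log_var), axis=-1
    )

def vampprior_log_density(z, mu_k, log_var_k):
    """log p(z) under a uniform mixture of K diagonal-Gaussian components.

    z: (B, D) pooled latents.
    mu_k, log_var_k: (K, D) component parameters, assumed to be the encoder's
    posterior parameters at K learned pseudo-inputs (hypothetical inputs here).
    """
    # (B, K): log density of every latent under every component.
    log_comp = log_normal_diag(z[:, None, :], mu_k[None, :, :], log_var_k[None, :, :])
    # Numerically stable log-sum-exp over components, then subtract log K.
    m = log_comp.max(axis=1, keepdims=True)
    log_sum = (m + np.log(np.exp(log_comp - m).sum(axis=1, keepdims=True))).squeeze(1)
    return log_sum - np.log(mu_k.shape[0])

def sample_vampprior(rng, mu_k, log_var_k, n):
    """Draw n latents: choose a component uniformly, then sample from it."""
    idx = rng.integers(0, mu_k.shape[0], size=n)
    eps = rng.standard_normal((n, mu_k.shape[1]))
    return mu_k[idx] + np.exp(0.5 * log_var_k[idx]) * eps
```

With a single standard-normal component this reduces to the fixed Gaussian prior of a vanilla VAE, which is the baseline the claim contrasts against.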

What carries the argument

A conditional one-dimensional diffusion decoder that receives the full temporal latent produced by the encoder, with VampPrior regularization applied to the pooled latent.

If this is right

VAMP-Diff produces realistic PPG signals on the CapnoBase dataset
It reconstructs sharper physiological waveforms than Gaussian-prior baselines
Heart-rate information is preserved in the reconstructions
Respiratory-rate consistency is maintained across generated and reconstructed signals
Reconstruction error rises when input waveforms contain corruptions
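One simple way to check the heart-rate item in this list is to compare the dominant cardiac-band frequency of a window with that of its reconstruction. The sketch below is an assumption-laden illustration: the spectral-peak estimator, the 0.7 to 3.5 Hz band, the 100 Hz sampling rate, and the synthetic signals are all hypothetical and are not the paper's evaluation protocol.

```python
import numpy as np

def estimate_heart_rate_bpm(ppg, fs, lo_hz=0.7, hi_hz=3.5):
    """Heart rate as the strongest spectral peak inside an assumed cardiac band."""
    x = np.asarray(ppg, dtype=float)
    x = x - x.mean()                              # remove the DC offset
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    power = np.abs(np.fft.rfft(x)) ** 2
    band = (freqs >= lo_hz) & (freqs <= hi_hz)    # ignore respiratory and noise bands
    return 60.0 * freqs[band][np.argmax(power[band])]

fs = 100.0                                        # assumed sampling rate in Hz
t = np.arange(1000) / fs                          # one 10 s window
# Synthetic stand-ins: a 1.2 Hz pulse with a harmonic, and a slightly altered copy.
original = np.sin(2 * np.pi * 1.2 * t) + 0.3 * np.sin(2 * np.pi * 2.4 * t)
reconstruction = 0.9 * original + 0.05 * np.sin(2 * np.pi * 0.25 * t)

hr_error_bpm = abs(
    estimate_heart_rate_bpm(original, fs) - estimate_heart_rate_bpm(reconstruction, fs)
)
```

A small `hr_error_bpm` across a test set would be consistent with heart-rate preservation; a respiratory-rate check could follow the same pattern with a lower frequency band.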

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The VampPrior mixture could support clustering of distinct physiological states such as different heart-rate regimes
Reconstruction-error sensitivity offers a route to automated PPG quality scoring without additional classifiers
The same joint-training pattern might transfer to other periodic biosignals such as ECG or arterial pressure waveforms
Using the full temporal latent rather than a pooled code could reduce the blurring that standard VAEs introduce in systolic upstrokes
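The quality-scoring idea in this list can be made concrete by treating reconstruction error as a score and measuring how well it separates clean from corrupted windows. In the sketch below the moving-average "reconstruction" is only a placeholder for a trained model, and the synthetic windows, noise burst, and 100 Hz rate are hypothetical; none of this reproduces the paper's experiments.

```python
import numpy as np

def stand_in_reconstruct(x, width=5):
    """Placeholder for a trained model's reconstruction: a moving average."""
    kernel = np.ones(width) / width
    return np.convolve(x, kernel, mode="valid")

def reconstruction_score(x, width=5):
    """Mean squared error between a window and its reconstruction (interior samples)."""
    pad = width // 2
    return float(np.mean((x[pad:-pad] - stand_in_reconstruct(x, width)) ** 2))

def auroc(scores_clean, scores_corrupt):
    """Probability that a corrupted window outscores a clean one; ties count half."""
    c = np.asarray(scores_clean)[:, None]
    k = np.asarray(scores_corrupt)[None, :]
    return float(np.mean((k > c) + 0.5 * (k == c)))

rng = np.random.default_rng(0)
t = np.arange(800) / 100.0                        # 8 s at an assumed 100 Hz

def clean_window():
    phase = rng.uniform(0.0, 2.0 * np.pi)
    return np.sin(2 * np.pi * 1.2 * t + phase) + 0.01 * rng.standard_normal(t.size)

def corrupt_window():
    x = clean_window()
    x[300:500] += 0.5 * rng.standard_normal(200)  # synthetic noise burst
    return x

clean_scores = [reconstruction_score(clean_window()) for _ in range(50)]
corrupt_scores = [reconstruction_score(corrupt_window()) for _ in range(50)]
separation = auroc(clean_scores, corrupt_scores)
```

An AUROC near 1 means the score ranks corrupted windows above clean ones almost always; an AUROC near 0.5 means the score carries no quality information.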

Load-bearing premise

Joint training of the temporal encoder, conditional diffusion decoder, and VampPrior on the pooled latent allows the decoder to access beat timing and morphology via the full temporal latent during reconstruction.
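A toy example shows why this premise leans on the full temporal latent rather than the pooled code. The sketch assumes mean pooling over time and arbitrary latent sizes; the text available here does not state which pooling operator or dimensions the paper uses.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical temporal latent: 16 channels by 32 time steps.
z_temporal = rng.standard_normal((16, 32))

def pool(z):
    """Mean over the time axis: one possible pooling operator (an assumption)."""
    return z.mean(axis=-1)

# Same latent content with every feature moved five steps later in time.
z_shifted = np.roll(z_temporal, shift=5, axis=-1)

pooled_gap = np.abs(pool(z_temporal) - pool(z_shifted)).max()   # essentially zero
temporal_gap = np.abs(z_temporal - z_shifted).max()             # clearly nonzero
```

Under this assumption the pooled code cannot distinguish the two latents, so a decoder conditioned only on it would have no information about when each beat occurs. Conditioning on the full temporal latent keeps that timing information available.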

What would settle it

A direct comparison on the CapnoBase dataset in which VAMP-Diff reconstructions are no sharper than those of Gaussian-prior baselines, or in which they lose heart-rate fidelity relative to those baselines, would falsify the claimed advantage.

Figures

Figures reproduced from arXiv: 2605.22851 by Bahman Moraffah, Fatemeh Ghasemi Balouei, Mahesh Banavar, Nathan Willemsen.

Figure 1: VAMP-Diff architecture showing the posterior path (top), VampPrior path (bottom), and diffusion decoder (right).

Figure 2: Reconstruction of the same test PPG window by three models. (a) Vanilla VAE: good pointwise reconstruction but morphologically smoothed. (b)

Figure 3: Unconditional generation samples compared to a real PPG window.

Figure 4: Score distributions, ROC curves, AUROC per corruption type, and

Figure 6: Decoded signals from latents interpolated between a low-HR (

Original abstract

Photoplethysmography (PPG) has become a ubiquitous physiological signal; however, current generative models still struggle to preserve realistic waveform morphology and learn a latent structure that captures cardiac and respiratory physiology. PPG generators trained with adversarial losses can produce plausible waveforms, but provide no inference path from a real signal to a latent representation. Variational autoencoders, on the other hand, map the PPG data to latent codes, although their decoders often blur systolic upstrokes and dampen amplitude and spectral details. Diffusion models improve waveform fidelity, but typically lack an inference path for reconstruction and physiological analysis. We propose VampPrior Latent Diffusion (VAMP-Diff), a jointly trained variational diffusion model that combines a temporal PPG encoder, a conditional one-dimensional diffusion decoder, and VampPrior regularization on a compact pooled latent. The model uses full temporal latent during diffusion reconstruction, giving the decoder access to beat timing and morphology while generating samples from learned VampPrior components instead of a fixed Gaussian prior. We demonstrate on the CapnoBase dataset that VAMP-Diff produces realistic PPG signals, reconstructs sharper physiological waveforms than Gaussian-prior baselines, preserves heart-rate information, maintains respiratory-rate consistency, and is sensitive to waveform corruptions through reconstruction error.
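The diffusion decoder mentioned in the abstract builds on denoising diffusion models. As a hedged illustration, the sketch below shows the standard DDPM forward noising step and the noise-prediction loss for a one-dimensional signal. The linear beta schedule, the step count, and the function names are assumptions; the abstract does not specify the paper's schedule or parameterization.

```python
import numpy as np

def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=2e-2):
    """Cumulative signal retention for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def q_sample(x0, t, alpha_bar, rng):
    """Forward process: x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return x_t, eps

def denoising_loss(eps_pred, eps):
    """Mean squared error between predicted and true noise."""
    return float(np.mean((eps_pred - eps) ** 2))

# In a conditional decoder, eps_pred would come from a network that also sees the
# encoder's temporal latent, e.g. eps_model(x_t, t, z_temporal) (hypothetical name).
```

Training such a decoder amounts to minimizing `denoising_loss` over random windows and random steps `t`, with the latent supplied as conditioning.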

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

VAMP-Diff pairs a temporal encoder and conditional diffusion decoder with VampPrior on a pooled latent to give PPG generation plus an inference path, which lines up with the stated goals.

The main contribution is a joint model that encodes PPG into a full temporal latent, regularizes a compact pooled version with VampPrior, and feeds the full latent into a 1D conditional diffusion decoder at reconstruction time. This setup directly targets the gaps called out in the abstract: VAEs that blur systolic upstrokes, adversarial models without an inference route, and standard diffusion without reconstruction. The architecture description is internally consistent and gives the decoder access to beat timing while sampling from learned mixture components rather than a fixed Gaussian. The stress-test note finds no contradiction in how the pieces fit, which is fair given the design.

On the positive side, the motivation is practical for a signal used in wearables, and the choice to keep the full temporal latent during diffusion avoids losing morphology information. The soft spot is the evaluation. The abstract claims sharper waveforms, preserved heart-rate and respiratory-rate information, and sensitivity to corruptions on CapnoBase, but those are empirical assertions whose size and reliability depend on the actual metrics, baselines, and controls in the full paper. Without seeing effect sizes or variance, it is hard to tell whether the gains are substantial or modest.

This paper is aimed at researchers working on generative models for physiological time series, especially those who need both sampling and reconstruction for data augmentation or analysis. A reader focused on signal processing for health monitoring would find the architecture details useful to examine. It shows clear thinking on the model components and how they address the listed limitations, so it deserves a serious referee to check the experiments and derivations rather than a desk reject.

Referee Report

1 major / 0 minor

Summary. The manuscript proposes VAMP-Diff, a jointly trained variational diffusion model for photoplethysmography (PPG) that combines a temporal PPG encoder producing a full temporal latent, VampPrior regularization on a compact pooled latent, and a conditional 1D diffusion decoder that receives the full temporal latent at reconstruction time. The central claim is that this architecture generates realistic PPG signals on the CapnoBase dataset, reconstructs sharper physiological waveforms than Gaussian-prior baselines, preserves heart-rate information, maintains respiratory-rate consistency, and detects waveform corruptions via reconstruction error.

Significance. If the empirical results hold with proper validation, the approach could advance generative modeling of physiological signals by providing an inference path (unlike pure diffusion or GANs) while using a flexible VampPrior and full temporal conditioning to better preserve beat timing and morphology. The architecture description is internally consistent and avoids obvious circularity.

major comments (1)

[Abstract] The claim of empirical superiority (sharper waveforms, HR/RR preservation, corruption sensitivity) on CapnoBase is asserted without any methods details, quantitative metrics, error bars, statistical tests, or derivation steps, rendering the central empirical claims unverifiable from the provided text.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the detailed review and constructive comment regarding the abstract. We agree that the abstract should better support its empirical claims with high-level quantitative indicators to improve verifiability, and we will revise the manuscript accordingly while preserving the abstract's brevity.

Referee: [Abstract] The claim of empirical superiority (sharper waveforms, HR/RR preservation, corruption sensitivity) on CapnoBase is asserted without any methods details, quantitative metrics, error bars, statistical tests, or derivation steps, rendering the central empirical claims unverifiable from the provided text.

Authors: We agree that the abstract would be strengthened by including concise quantitative support for the stated claims. In the revised manuscript we will update the abstract to report key metrics from the CapnoBase experiments (e.g., reconstruction MSE, Pearson correlation for heart rate, mean absolute error for respiratory rate, and AUC for corruption detection) together with a brief reference to the evaluation protocol. Full methods, error bars, statistical tests, and derivation details will remain in Sections 3–5 as is conventional; the abstract revision will make the superiority claims verifiable at a summary level without exceeding typical length constraints.

Revision: yes
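The summary metrics named in this response are standard and can be sketched in a few lines. The helper names and the toy numbers below are hypothetical and purely illustrative; they are not results from the paper.

```python
import numpy as np

def reconstruction_mse(x, x_hat):
    """Mean squared error between a signal and its reconstruction."""
    x, x_hat = np.asarray(x, dtype=float), np.asarray(x_hat, dtype=float)
    return float(np.mean((x - x_hat) ** 2))

def pearson_r(a, b):
    """Pearson correlation, e.g. between reference and reconstructed heart rates."""
    return float(np.corrcoef(a, b)[0, 1])

def mean_absolute_error(a, b):
    """Mean absolute error, e.g. for respiratory rate in breaths per minute."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.mean(np.abs(a - b)))

# Toy per-window estimates (made-up values for illustration only).
hr_reference = [60.0, 72.0, 90.0, 110.0]
hr_reconstructed = [61.0, 71.0, 92.0, 108.0]
rr_reference = [12.0, 15.0, 18.0]
rr_reconstructed = [13.0, 15.0, 16.0]

hr_correlation = pearson_r(hr_reference, hr_reconstructed)
rr_mae = mean_absolute_error(rr_reference, rr_reconstructed)
```

Reporting these per-window statistics with their spread across subjects would address the referee's request for effect sizes and variance.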

Circularity Check

0 steps flagged

No significant circularity

The manuscript describes an empirical ML architecture (temporal encoder + conditional 1D diffusion decoder + VampPrior on pooled latent) and reports dataset results. No equations, derivations, predictions, or first-principles claims appear in the abstract or model description. No self-citation chains, fitted inputs renamed as predictions, or ansatzes are present. The architecture is presented as a design choice whose performance is evaluated externally on CapnoBase; nothing reduces to its inputs by construction. This is the expected non-finding for a methods paper without a claimed mathematical derivation.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no specific free parameters, axioms, or invented entities can be identified or extracted from the provided information.

