
arxiv: 2508.05423 · v2 · submitted 2025-08-07 · 💻 cs.LG · stat.ML

Negative Binomial Variational Autoencoders for Overdispersed Latent Modeling

Pith reviewed 2026-05-18 23:51 UTC · model grok-4.3

classification: cs.LG, stat.ML
keywords: negative binomial, variational autoencoder, overdispersion, discrete latent variables, KL divergence estimation, reparameterization, count data modeling, neural spike modeling

The pith

Negative binomial latents let VAEs model overdispersed count data more flexibly than Poisson while keeping training stable.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces NegBio-VAE to replace the equal mean-variance assumption of Poisson VAEs with a negative binomial distribution that includes an explicit dispersion parameter. This change targets the common case where observed variance exceeds the mean, as seen in neural spike counts. The authors supply new techniques for KL divergence estimation and reparameterization so that the richer distribution remains trainable and the latent representations stay interpretable as discrete counts. A sympathetic reader would care because the model stays biologically closer to spike-based signaling while delivering better reconstruction, generation, and downstream utility on real datasets.
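To make the mean-variance point concrete, here is a minimal sketch that compares a Poisson and a negative binomial distribution with the same mean. It assumes the common mean/dispersion parameterization (mean `mu`, dispersion `r`, variance `mu + mu**2 / r`), which may differ from the parameterization used in the paper. The function names and the truncation bound `z_max` are illustrative choices, not taken from the authors' code.

```python
import math


def nb_log_pmf(z, mu, r):
    """Log-pmf of a negative binomial with mean mu and dispersion r.

    Assumed parameterization: variance = mu + mu**2 / r.
    """
    p = r / (r + mu)  # success probability in the equivalent (r, p) form
    return (math.lgamma(z + r) - math.lgamma(r) - math.lgamma(z + 1)
            + r * math.log(p) + z * math.log1p(-p))


def poisson_log_pmf(z, lam):
    """Log-pmf of a Poisson distribution with rate lam."""
    return z * math.log(lam) - lam - math.lgamma(z + 1)


def moments(log_pmf, z_max=2000):
    """Mean and variance from a truncated sum over z = 0..z_max."""
    probs = [math.exp(log_pmf(z)) for z in range(z_max + 1)]
    mean = sum(z * q for z, q in enumerate(probs))
    var = sum((z - mean) ** 2 * q for z, q in enumerate(probs))
    return mean, var


mu, r = 5.0, 2.0  # illustrative values only
nb_mean, nb_var = moments(lambda z: nb_log_pmf(z, mu, r))
poi_mean, poi_var = moments(lambda z: poisson_log_pmf(z, mu))

print(f"Poisson:           mean={poi_mean:.3f}, variance={poi_var:.3f}")
print(f"Negative binomial: mean={nb_mean:.3f}, variance={nb_var:.3f}")
```

Under this parameterization both distributions share the mean, but the negative binomial variance exceeds it by `mu**2 / r`, so a smaller `r` means stronger overdispersion and a large `r` approaches the Poisson case.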

Core claim

NegBio-VAE is a negative-binomial latent-variable model with a dispersion parameter for flexible spike count modeling. It preserves interpretability while improving representation quality and training feasibility via novel KL estimation and reparameterization. Experiments on four datasets demonstrate that NegBio-VAE consistently achieves superior reconstruction and generation performance compared to competing single-layer VAE baselines, and yields robust, informative latent representations for downstream tasks.

What carries the argument

Negative binomial distribution with an explicit dispersion parameter, made tractable by custom KL estimation and reparameterization that avoid the instability of direct sampling from the overdispersed prior.

If this is right

  • Superior reconstruction and generation performance on four tested datasets relative to single-layer VAE baselines.
  • More informative and robust latent representations that improve results on downstream tasks.
  • Verified stability and robustness across ablation studies on the dispersion parameter and training components.
  • Retention of discrete, count-based interpretability that aligns with spike-based neural signaling.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same dispersion-handling approach could be inserted into multi-layer or hierarchical VAEs to scale the benefit beyond single-layer models.
  • Domains that already use count data, such as single-cell genomics or text modeling, could adopt the same negative binomial prior for better fit.
  • The model opens a direct route to test whether artificial networks that respect overdispersion produce more realistic simulations of neural population activity.

Load-bearing premise

The new KL estimation and reparameterization methods allow stable training of the negative binomial latent model without approximation errors that erase its gains over the Poisson baseline.

What would settle it

Training the identical architecture with a standard Poisson-style KL approximation and finding that reconstruction and generation metrics drop to the level of the original Poisson VAE.

Figures

Figures reproduced from arXiv: 2508.05423 by Feng Zhou, Jinhao Sheng, Quyu Kong, Wenxin Zhang, Yixuan Zhang.

Figure 1: Reconstructions from different VAEs (256 dims) on MNIST. Despite using the same latent dimensionality, ...
Figure 2: t-SNE of latent representations from different VAEs on MNIST with latent dimensionality fixed at ...
Figure 3: Ablation study of NegBio-VAE. a explores the effect of KL weighting via ...
Figure 4: Empirical distributions of Negative Binomial samples generated using Continuous-Time Simulation and ...
Figure 5: Reconstruction comparison across datasets using a shared architecture and 256-dimensional latent space. ...
Figure 6: Comparison of four NegBio-VAE variants (DS+C, DS+G, MC+C, MC+G) across validation reconstruction ...
Figure 7: Comparison of convergence and oscillation statistics across four NegBio-VAE variants. Each bar shows the ...
Original abstract

Although artificial neural networks are often described as brain-inspired, their representations typically rely on continuous activations, such as the continuous latent variables in variational autoencoders (VAEs), which limits their biological plausibility compared to the discrete spike-based signaling in real neurons. Extensions like the Poisson VAE introduce discrete count-based latents, but their equal mean-variance assumption fails to capture overdispersion in neural spikes, leading to less expressive and informative representations. To address this, we propose NegBio-VAE, a negative-binomial latent-variable model with a dispersion parameter for flexible spike count modeling. NegBio-VAE preserves interpretability while improving representation quality and training feasibility via novel KL estimation and reparameterization. Experiments on four datasets demonstrate that NegBio-VAE consistently achieves superior reconstruction and generation performance compared to competing single-layer VAE baselines, and yields robust, informative latent representations for downstream tasks. Extensive ablation studies are performed to verify the model's robustness w.r.t. various components. Our code is available at https://github.com/co234/NegBio-VAE.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes NegBio-VAE, a variational autoencoder using negative binomial latent variables with a dispersion parameter to model overdispersed count data such as neural spikes. It introduces novel KL estimation and reparameterization techniques to enable stable training of this discrete latent model, claiming to preserve interpretability while improving representation quality over Poisson VAEs. Experiments on four datasets demonstrate superior reconstruction and generation performance compared to single-layer VAE baselines, with ablations verifying robustness and code released for reproducibility.

Significance. If the novel KL estimation and reparameterization maintain an unbiased or low-bias ELBO without introducing systematic errors that explain the gains, the work could meaningfully advance discrete latent modeling for count data by relaxing the mean-variance equality of Poisson latents. The emphasis on biological plausibility, combined with reported improvements in downstream task utility and the release of code, would strengthen its contribution to the VAE literature for overdispersed data.

major comments (2)
  1. §3 (Method, KL estimation subsection): The central claim that the proposed KL estimation and reparameterization for negative binomial latents yield stable gradients and preserve the ELBO requires explicit demonstration that the estimator is unbiased (or has quantifiable low bias) relative to the true KL with the chosen prior. Without a derivation, variance analysis, or comparison to Monte Carlo ground truth, it remains possible that optimization artifacts rather than the overdispersion modeling drive the reported gains over the Poisson baseline.
  2. §4 (Experiments, performance tables): The abstract and results claim consistent superiority in reconstruction and generation, yet the provided description lacks error bars, standard deviations across runs, or statistical tests. This weakens the ability to attribute improvements specifically to the negative binomial dispersion parameter versus other implementation choices.
minor comments (2)
  1. Abstract: Dataset names are not specified despite the claim of evaluation on four datasets; adding them would aid readers in assessing generality.
  2. Notation: The parameterization of the negative binomial (e.g., how the dispersion parameter interacts with the mean in the approximate posterior) could be stated more explicitly in the main text to reduce ambiguity for readers.
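The comparison to Monte Carlo ground truth requested in major comment 1 could look roughly like the sketch below. It is not the paper's KL estimator: it is a generic check, assuming a mean/dispersion parameterization of the negative binomial and a truncated support `0..z_max`, that compares a plain Monte Carlo KL estimate with a near-exact truncated sum. All names and parameter values are hypothetical.

```python
import math
import random


def nb_log_pmf(z, mu, r):
    """Log-pmf of a negative binomial with mean mu and dispersion r (assumed form)."""
    p = r / (r + mu)
    return (math.lgamma(z + r) - math.lgamma(r) - math.lgamma(z + 1)
            + r * math.log(p) + z * math.log1p(-p))


def kl_truncated_sum(q, p, z_max=500):
    """Reference KL(q || p), summing q(z) * (log q(z) - log p(z)) over 0..z_max."""
    total = 0.0
    for z in range(z_max + 1):
        log_q = nb_log_pmf(z, *q)
        total += math.exp(log_q) * (log_q - nb_log_pmf(z, *p))
    return total


def kl_monte_carlo(q, p, n, rng, z_max=500):
    """Plain Monte Carlo estimate of KL(q || p) from n samples drawn from q."""
    support = range(z_max + 1)
    weights = [math.exp(nb_log_pmf(z, *q)) for z in support]
    samples = rng.choices(support, weights=weights, k=n)
    return sum(nb_log_pmf(z, *q) - nb_log_pmf(z, *p) for z in samples) / n


q_params = (5.0, 2.0)  # (mu, r) of a hypothetical posterior
p_params = (3.0, 4.0)  # (mu, r) of a hypothetical prior
kl_ref = kl_truncated_sum(q_params, p_params)
kl_mc = kl_monte_carlo(q_params, p_params, n=50000, rng=random.Random(0))

print(f"truncated-sum KL: {kl_ref:.4f}")
print(f"Monte Carlo KL:   {kl_mc:.4f}")
```

Swapping `kl_monte_carlo` for the estimator under review and repeating the comparison over a grid of `(mu, r)` values would give an empirical picture of its bias and variance.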

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed comments on our manuscript. We address each major comment point by point below, indicating the revisions we will make to improve clarity and rigor.

  1. Referee: §3 (Method, KL estimation subsection): The central claim that the proposed KL estimation and reparameterization for negative binomial latents yield stable gradients and preserve the ELBO requires explicit demonstration that the estimator is unbiased (or has quantifiable low bias) relative to the true KL with the chosen prior. Without a derivation, variance analysis, or comparison to Monte Carlo ground truth, it remains possible that optimization artifacts rather than the overdispersion modeling drive the reported gains over the Poisson baseline.

    Authors: We acknowledge that the original manuscript introduces the KL estimation and reparameterization techniques for stable training but does not include a formal derivation of unbiasedness, variance analysis, or direct comparison to Monte Carlo ground truth. To address this concern, we will add a dedicated subsection deriving the estimator and proving it is unbiased relative to the true KL under the chosen negative binomial prior. We will also include variance analysis and empirical comparisons against Monte Carlo estimates to demonstrate that performance gains arise from overdispersion modeling rather than optimization artifacts. revision: yes

  2. Referee: §4 (Experiments, performance tables): The abstract and results claim consistent superiority in reconstruction and generation, yet the provided description lacks error bars, standard deviations across runs, or statistical tests. This weakens the ability to attribute improvements specifically to the negative binomial dispersion parameter versus other implementation choices.

    Authors: We agree that the lack of error bars, standard deviations, and statistical tests weakens the attribution of improvements. In the revised manuscript, we will report all performance metrics as averages over multiple independent runs with standard deviation error bars. We will also add statistical significance tests (such as paired t-tests) between NegBio-VAE and Poisson VAE baselines to better isolate the contribution of the dispersion parameter. revision: yes
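As a rough illustration of the paired test mentioned in the second response, the sketch below computes a paired t-statistic from per-run scores of two models trained with matched seeds. The function name and the toy inputs are hypothetical; the numbers are arithmetic placeholders and are not results from the paper.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """Paired t-statistic and degrees of freedom for matched runs.

    scores_a[i] and scores_b[i] are assumed to come from the same seed or split.
    """
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equally long score lists with at least 2 runs")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation of the differences
    if sd_diff == 0.0:
        raise ValueError("all differences are identical; t-statistic is undefined")
    t_stat = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return t_stat, len(diffs) - 1


# Toy placeholder inputs chosen so the arithmetic is easy to follow.
toy_a = [1.0, 2.0, 3.0]
toy_b = [0.5, 1.0, 2.5]
t_stat, dof = paired_t_statistic(toy_a, toy_b)
print(f"t = {t_stat:.3f} with {dof} degrees of freedom")
```

Turning the statistic into a p-value requires the Student t distribution; a library routine such as SciPy's `scipy.stats.ttest_rel` returns both in one call.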

Circularity Check

0 steps flagged

No significant circularity; derivation introduces dispersion parameter and KL techniques as independent modeling choices

full rationale

The paper extends the standard VAE ELBO framework by replacing the Poisson latent with a negative binomial distribution that includes an explicit dispersion parameter to capture overdispersion. This dispersion is introduced as a modeling choice rather than derived from or fitted to the same objective function that defines the claimed performance gains. The novel KL estimation and reparameterization are presented as technical contributions to enable stable training, without any equations or self-citations that reduce the ELBO or the reported improvements to quantities defined by construction from the fitted parameters or prior author results. Experiments and ablations compare against external baselines on multiple datasets, and the provided code allows external verification, confirming the central claims rest on independent content rather than self-referential reduction.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The model rests on the standard VAE evidence lower bound plus the statistical assumption that negative binomial better captures neural overdispersion than Poisson; no new entities are postulated.

free parameters (1)
  • dispersion parameter
    Extra parameter introduced to control variance beyond the mean in the negative binomial latent distribution.
axioms (1)
  • domain assumption: Negative binomial distribution provides a flexible model for overdispersed count data such as neural spikes
    Invoked to justify replacing the Poisson assumption in the latent layer.




Reference graph

Works this paper leans on

57 extracted references · 57 canonical work pages

  1. [1]

    The handbook of brain theory and neural networks

    Michael A Arbib. The handbook of brain theory and neural networks. MIT press, 2003

  2. [2]

    Temporal precision of spike trains in extrastriate cortex of the behaving macaque monkey

    Wyeth Bair and Christof Koch. Temporal precision of spike trains in extrastriate cortex of the behaving macaque monkey. Neural computation, 8(6):1185–1202, 1996

  3. [3]

    The geometry of abstraction in the hippocampus and prefrontal cortex

    Silvia Bernardi, Marcus K. Benna, Mattia Rigotti, Jérôme Munuera, Stefano Fusi, and C. Daniel Salzman. The geometry of abstraction in the hippocampus and prefrontal cortex. Cell, 183(4):954–967.e21, Nov 2020. doi: 10.1016/j.cell.2020.09.031

  4. [4]

    Spiking denoising diffusion probabilistic models

    Jiahang Cao, Ziqing Wang, Hanzhong Guo, Hao Cheng, Qiang Zhang, and Renjing Xu. Spiking denoising diffusion probabilistic models. In Proceedings of the IEEE/CVF winter conference on applications of computer vision, pages 4912–4921, 2024

  5. [5]

    Lisnn: Improving spiking neural networks with lateral interactions for robust object recognition

    Xiang Cheng, Yunzhe Hao, Jiaming Xu, and Bo Xu. Lisnn: Improving spiking neural networks with lateral interactions for robust object recognition. In IJCAI, pages 1519–1525. Yokohama, 2020

  6. [6]

    The mnist database of handwritten digit images for machine learning research

    Li Deng. The mnist database of handwritten digit images for machine learning research. IEEE Signal Processing Magazine, 29(6):141–142, 2012. 9 A PREPRINT - AUGUST 29, 2025

  7. [7]

    The nbp negative binomial model for assessing differential gene expression from rna-seq

    Yanming Di, Daniel W Schafer, Jason S Cumbie, and Jeff H Chang. The nbp negative binomial model for assessing differential gene expression from rna-seq. Statistical applications in genetics and molecular biology, 10 (1), 2011

  8. [8]

    Learning disentangled joint continuous and discrete representations

    Emilien Dupont. Learning disentangled joint continuous and discrete representations. Advances in neural information processing systems, 31, 2018

  9. [9]

    Incorporating learnable membrane time constant to enhance learning of spiking neural networks

    Wei Fang, Zhaofei Yu, Yanqi Chen, Timothée Masquelier, Tiejun Huang, and Yonghong Tian. Incorporating learnable membrane time constant to enhance learning of spiking neural networks. In Proceedings of the IEEE/CVF international conference on computer vision, pages 2661–2671, 2021

  10. [10]

    Spiking generative adversarial network with attention scoring decoding

    Linghao Feng, Dongcheng Zhao, and Yi Zeng. Spiking generative adversarial network with attention scoring decoding. Neural Networks, 178:106423, 2024

  11. [11]

    Som-vae: Interpretable discrete representation learning on time series

    Vincent Fortuin, Matthias Hüser, Francesco Locatello, Heiko Strathmann, and Gunnar Rätsch. Som-vae: Interpretable discrete representation learning on time series. In International Conference on Learning Representations, 2019

  12. [12]

    Spiking neural networks

    Samanwoy Ghosh-Dastidar and Hojjat Adeli. Spiking neural networks. International journal of neural systems, 19(04):295–308, 2009

  13. [13]

    Rapid neural coding in the retina with relative spike latencies

    Tim Gollisch and Markus Meister. Rapid neural coding in the retina with relative spike latencies. science, 319 (5866):1108–1111, 2008

  14. [14]

    Categorical reparameterization with gumbel-softmax

    Eric Jang, Shixiang Gu, and Ben Poole. Categorical reparameterization with gumbel-softmax. In International Conference on Learning Representations, 2017

  15. [15]

    Fully spiking variational autoencoder

    Hiromichi Kamata, Yusuke Mukuta, and Tatsuya Harada. Fully spiking variational autoencoder. In Proceedings of the AAAI conference on artificial intelligence, volume 36, pages 7059–7067, 2022

  16. [16]

    Latent diffusion for neural spiking data

    Jaivardhan Kapoor, Auguste Schulz, Julius Vetter, Felix Pei, Richard Gao, and Jakob H Macke. Latent diffusion for neural spiking data. Advances in Neural Information Processing Systems, 37:118119–118154, 2024

  17. [17]

    The implications of categorical and category-free mixed selectivity on representational geometries

    Matthew T. Kaufman, Marcus K. Benna, Mattia Rigotti, Fabio Stefanini, Stefano Fusi, and Anne K. Churchland. The implications of categorical and category-free mixed selectivity on representational geometries. Current Opinion in Neurobiology, 77:102644, Dec 2022. doi: 10.1016/j.conb.2022.102644

  18. [18]

    Auto-encoding variational bayes, 2014

    Diederik P Kingma and Max Welling. Auto-encoding variational bayes, 2014

  19. [19]

    Auto-encoding variational bayes, 2013

    Diederik P Kingma, Max Welling, et al. Auto-encoding variational bayes, 2013

  20. [20]

    Spiking-gan: A spiking generative adversarial network using time-to-first-spike coding

    Vineet Kotariya and Udayan Ganguly. Spiking-gan: A spiking generative adversarial network using time-to-first-spike coding. In 2022 International Joint Conference on Neural Networks (IJCNN), pages 1–7. IEEE, 2022

  21. [21]

    Learning multiple layers of features from tiny images

    Alex Krizhevsky. Learning multiple layers of features from tiny images. Technical report, 2009

  22. [22]

    Human-level concept learning through probabilistic program induction

    Brenden M Lake, Ruslan Salakhutdinov, and Joshua B Tenenbaum. Human-level concept learning through probabilistic program induction. Science, 350(6266):1332–1338, 2015

  23. [23]

    Mnist handwritten digit database

    Yann LeCun, Corinna Cortes, and CJ Burges. Mnist handwritten digit database. ATT Labs [Online]. Available: http://yann.lecun.com/exdb/mnist, 2, 2010

  24. [24]

    Differentiable spike: Rethinking gradient-descent for training spiking neural networks

    Yuhang Li, Yufei Guo, Shanghang Zhang, Shikuang Deng, Yongqing Hai, and Shi Gu. Differentiable spike: Rethinking gradient-descent for training spiking neural networks. Advances in neural information processing systems, 34:23426–23439, 2021

  25. [25]

    Spiking-diffusion: Vector quantized discrete diffusion model with spiking neural networks

    Mingxuan Liu, Jie Gan, Rui Wen, Tao Li, Yongli Chen, and Hong Chen. Spiking-diffusion: Vector quantized discrete diffusion model with spiking neural networks. In 2024 5th International Conference on Computer, Big Data and Artificial Intelligence (ICCBD+ AI), pages 627–631. IEEE, 2024

  26. [26]

    Reliability of spike timing in neocortical neurons

    Zachary F Mainen and Terrence J Sejnowski. Reliability of spike timing in neocortical neurons. Science, 268 (5216):1503–1506, 1995

  27. [27]

    Predictive coding, variational autoencoders, and biological connections

    Joseph Marino. Predictive coding, variational autoencoders, and biological connections. Neural Computation, 34 (1):1–44, 2022

  28. [28]

    Using tweedie distributions for fitting spike count data

    Dina Moshitch and Israel Nelken. Using tweedie distributions for fitting spike count data. Journal of neuroscience methods, 225:13–28, 2014

  29. [29]

    Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, and Andrew Y. Ng. Reading digits in natural images with unsupervised feature learning. In NIPS Workshop on Deep Learning and Unsupervised Feature Learning, 2011. http://ufldl.stanford.edu/housenumbers/nips2011_housenumbers.pdf

  30. [30]

    Neuronal spike trains and stochastic point processes: II. Simultaneous spike trains

    Donald H Perkel, George L Gerstein, and George P Moore. Neuronal spike trains and stochastic point processes: II. Simultaneous spike trains. Biophysical journal, 7(4):419–440, 1967

  31. [31]

    Fully bayesian inference for neural models with negative-binomial spiking

    Jonathan Pillow and James Scott. Fully bayesian inference for neural models with negative-binomial spiking. Advances in neural information processing systems, 25, 2012

  32. [32]

    Deterministic decoding for discrete data in variational autoencoders

    Daniil Polykovskiy and Dmitry Vetrov. Deterministic decoding for discrete data in variational autoencoders. In International conference on artificial intelligence and statistics, pages 3046–3056. PMLR, 2020

  33. [33]

    The importance of mixed selectivity in complex cognitive tasks

    Mattia Rigotti, Omri Barak, Melissa R. Warden, Xiao-Jing Wang, Nathaniel D. Daw, Earl K. Miller, and Stefano Fusi. The importance of mixed selectivity in complex cognitive tasks. Nature, 497:585–590, 05 2013. doi: 10.1038/nature12160

  34. [34]

    Discrete variational autoencoders

    Jason Tyler Rolfe. Discrete variational autoencoders. In International Conference on Learning Representations, 2017

  35. [35]

    Spiking generative adversarial networks with a neural network discriminator: Local training, bayesian models, and continual meta-learning

    Bleema Rosenfeld, Osvaldo Simeone, and Bipin Rajendran. Spiking generative adversarial networks with a neural network discriminator: Local training, bayesian models, and continual meta-learning. IEEE Transactions on Computers, 71(11):2778–2791, 2022

  36. [36]

    The negative binomial distribution

    GJS Ross and DA Preece. The negative binomial distribution. Journal of the Royal Statistical Society: Series D (The Statistician), 34(3):323–335, 1985

  37. [37]

    Flexible models for spike count data with both over-and under-dispersion

    Ian H Stevenson. Flexible models for spike count data with both over-and under-dispersion. Journal of computational neuroscience, 41:29–43, 2016

  38. [38]

    Unsupervised learning predicts human perception and misperception of gloss

    Katherine R Storrs, Barton L Anderson, and Roland W Fleming. Unsupervised learning predicts human perception and misperception of gloss. Nature Human Behaviour, 5(10):1402–1417, 2021

  39. [39]

    Testing the odds of inherent vs. observed overdispersion in neural spike counts

    Wahiba Taouali, Giacomo Benvenuti, Pascal Wallisch, Frédéric Chavane, and Laurent U Perrinet. Testing the odds of inherent vs. observed overdispersion in neural spike counts. Journal of neurophysiology, 115(1):434–444, 2016

  40. [40]

    Deep learning in spiking neural networks

    Amirhossein Tavanaei, Masoud Ghodrati, Saeed Reza Kheradpisheh, Timothée Masquelier, and Anthony Maida. Deep learning in spiking neural networks. Neural networks, 111:47–63, 2019

  41. [41]

    Hierarchical vaes provide a normative account of motion processing in the primate brain

    Hadi Vafaii, Jacob Yates, and Daniel Butts. Hierarchical vaes provide a normative account of motion processing in the primate brain. Advances in Neural Information Processing Systems, 36:46152–46190, 2023

  42. [42]

    Poisson variational autoencoder

    Hadi Vafaii, Dekel Galor, and Jacob Yates. Poisson variational autoencoder. Advances in Neural Information Processing Systems, 37:44871–44906, 2024

  43. [43]

    Brain-inspired replay for continual learning with artificial neural networks

    Gido M Van de Ven, Hava T Siegelmann, and Andreas S Tolias. Brain-inspired replay for continual learning with artificial neural networks. Nature communications, 11(1):4069, 2020

  44. [44]

    Neural discrete representation learning

    Aaron Van Den Oord, Oriol Vinyals, et al. Neural discrete representation learning. Advances in neural information processing systems, 30, 2017

  45. [45]

    Differentially private spiking variational autoencoder

    Srishti Yadav, Anshul Pundhir, Tanish Goyal, Balasubramanian Raman, and Sanjeev Kumar. Differentially private spiking variational autoencoder. In International Conference on Pattern Recognition, pages 96–112. Springer, 2025

  46. [46]

    Esvae: An efficient spiking variational autoencoder with reparameterizable poisson spiking sampling

    Qiugang Zhan, Ran Tao, Xiurui Xie, Guisong Liu, Malu Zhang, Huajin Tang, and Yang Yang. Esvae: An efficient spiking variational autoencoder with reparameterizable poisson spiking sampling. arXiv preprint arXiv:2310.14839, 2023

  47. [47]

    Variational autoencoders for sparse and overdispersed discrete data

    He Zhao, Piyush Rai, Lan Du, Wray Buntine, Dinh Phung, and Mingyuan Zhou. Variational autoencoders for sparse and overdispersed discrete data. In International conference on artificial intelligence and statistics, pages 1684–1694. PMLR, 2020

  48. [48]

    Going deeper with directly-trained larger spiking neural networks

    Hanle Zheng, Yujie Wu, Lei Deng, Yifan Hu, and Guoqi Li. Going deeper with directly-trained larger spiking neural networks. In Proceedings of the AAAI conference on artificial intelligence, volume 35, pages 11062–11070, 2021

  49. [49]

    Negative binomial process count and mixture modeling

    Mingyuan Zhou and Lawrence Carin. Negative binomial process count and mixture modeling. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(2):307–320, 2013.

    A Derivation of KL Term. We derive an analytical expression for the KL divergence between two Negative Binomial distributions under the assumption that t...

  50. [50]

    Limit the maximum count value to Zmax

  51. [51]

    Compute the log-probability for $z = 0, 1, \dots, Z_{\max}$

    Compute the log-probability for $z = 0, 1, \dots, Z_{\max}$: $\log \mathrm{Poi}(z) = z \log \lambda - \lambda - \log \Gamma(z + 1)$.

  52. [52]

    For each z, generate noise ϵz ∼ Gumbel(0, 1)

  53. [53]

    Apply the Gumbel-Softmax trick with temperature $\tau$

    Apply the Gumbel-Softmax trick with temperature $\tau$: $\tilde{z} = \sum_{z=0}^{Z_{\max}} z \cdot \mathrm{softmax}\left(\frac{\log \mathrm{Poi}(z) + \epsilon_z}{\tau}\right)$, where $\tau \to 0$ recovers discrete sampling. The proof of this reparameterization can be found in Jang et al. [14], and will not be repeated here.

    (2) Continuous-Time Simulation. This method models Poisson processes with intensity $\lambda$ on $[0, 1]$ using exponentially di...

  54. [54]

    Sample inter-arrival times from an exponential distribution: $\{s_i\}_{i=1}^{M} \sim \mathrm{Exponential}(\lambda)$, where $M$ is a sufficiently large integer. The exponential distribution is easily reparameterized, and PyTorch contains an implementation.

  55. [55]

    Accumulate inter-arrival times: $S_n = \sum_{i=1}^{n} s_i$, $1 \le n \le M$.

  56. [56]

    4", whereas some models render it more like a “9

    Soft count of events: $\tilde{z} = \sum_{n=1}^{M} \sigma\left(\frac{1 - S_n}{\tau}\right)$, where $\tau \to 0$ recovers discrete sampling.

    [Figure 4 plot: "Continuous-Time vs Gumbel-Softmax"; x-axis: Count; y-axis: Probability; series: Continuous-Time, Gumbel-Softmax.]

    Figure 4: Empirical distributions of Negative Binomial samples generated using Continuous-...

  57. [57]

    +” or “–

    A lower DNR indicates a more active and effective latent space, with fewer collapsed units. Among all models, the Gaussian VAE exhibits an extremely high DNR of 0.998, suggesting that nearly all latent dimensions are inactive. The Poisson VAE (0.002) and Categorical VAE (0.0117) show moderate sparsity, while both the Laplace VAE and our NegBio-VAE ac...
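The two relaxed Poisson samplers outlined in items 50 to 56 above (Gumbel-Softmax over a truncated support, and continuous-time simulation with exponential inter-arrival times) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it covers only the Poisson steps listed, uses the standard library rather than an autodiff framework, and the function names, temperatures, and bounds (`z_max`, `m`) are hypothetical.

```python
import math
import random


def poisson_log_pmf(z, lam):
    """log Poi(z) = z log(lam) - lam - log Gamma(z + 1)."""
    return z * math.log(lam) - lam - math.lgamma(z + 1)


def gumbel_softmax_poisson(lam, z_max, tau, rng):
    """Relaxed Poisson sample: softmax-weighted average of counts 0..z_max."""
    logits = []
    for z in range(z_max + 1):
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep log(-log(u)) finite
        gumbel = -math.log(-math.log(u))                 # Gumbel(0, 1) noise
        logits.append((poisson_log_pmf(z, lam) + gumbel) / tau)
    top = max(logits)                                    # subtract max for stability
    weights = [math.exp(x - top) for x in logits]
    return sum(z * w for z, w in enumerate(weights)) / sum(weights)


def stable_sigmoid(x):
    """Logistic function that avoids overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def continuous_time_poisson(lam, m, tau, rng):
    """Relaxed Poisson sample: soft count of arrival times that fall in [0, 1]."""
    arrival = 0.0
    soft_count = 0.0
    for _ in range(m):
        arrival += rng.expovariate(lam)                  # s_i ~ Exponential(rate=lam)
        soft_count += stable_sigmoid((1.0 - arrival) / tau)
    return soft_count


rng = random.Random(0)
lam = 4.0  # illustrative rate
gs = [gumbel_softmax_poisson(lam, z_max=40, tau=0.1, rng=rng) for _ in range(2000)]
ct = [continuous_time_poisson(lam, m=60, tau=0.01, rng=rng) for _ in range(2000)]
gs_mean = sum(gs) / len(gs)
ct_mean = sum(ct) / len(ct)
print(f"Gumbel-Softmax mean:  {gs_mean:.3f}")
print(f"Continuous-time mean: {ct_mean:.3f}")
```

Both sample means should land near `lam` when the temperatures are small. In a training setting the same arithmetic would be written with differentiable tensor operations so that gradients can flow through `lam`.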