Negative Binomial Variational Autoencoders for Overdispersed Latent Modeling
Pith reviewed 2026-05-18 23:51 UTC · model grok-4.3
The pith
Negative binomial latents let VAEs model overdispersed count data more flexibly than Poisson while keeping training stable.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
NegBio-VAE is a negative-binomial latent-variable model with a dispersion parameter for flexible spike count modeling. It preserves interpretability while improving representation quality and training feasibility via novel KL estimation and reparameterization. Experiments on four datasets demonstrate that NegBio-VAE consistently achieves superior reconstruction and generation performance compared to competing single-layer VAE baselines, and yields robust, informative latent representations for downstream tasks.
What carries the argument
Negative binomial distribution with an explicit dispersion parameter, made tractable by custom KL estimation and reparameterization that avoid the instability of direct sampling from the overdispersed prior.
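To make the role of the dispersion parameter concrete, the following minimal sketch draws counts from a Poisson and from a negative binomial built as a Gamma-Poisson mixture, then compares their Fano factors (variance divided by mean). It assumes the mixture form NB(z; r, p) quoted further down this page, with the second Gamma argument read as a scale. The function names and the values r = 2, p = 0.8 are illustrative choices, not taken from the paper.

```python
import math
import random
import statistics


def sample_poisson(rate, rng):
    """Draw one Poisson count with Knuth's method (adequate for moderate rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_negative_binomial(r, p, rng):
    """Gamma-Poisson mixture: rate ~ Gamma(shape=r, scale=p/(1-p)), z ~ Poisson(rate).

    Reading the second Gamma argument as a scale is an assumption.
    """
    rate = rng.gammavariate(r, p / (1.0 - p))
    return sample_poisson(rate, rng)


def fano_factor(counts):
    """Variance-to-mean ratio; equals 1 for an ideal Poisson source."""
    return statistics.pvariance(counts) / statistics.fmean(counts)


rng = random.Random(0)
r, p = 2.0, 0.8                      # illustrative values only
mean_count = r * p / (1.0 - p)       # both samplers share this mean

poisson_counts = [sample_poisson(mean_count, rng) for _ in range(20000)]
negbin_counts = [sample_negative_binomial(r, p, rng) for _ in range(20000)]

print("Poisson Fano factor:", round(fano_factor(poisson_counts), 2))
print("NegBin Fano factor: ", round(fano_factor(negbin_counts), 2))
```

Under these assumptions the Poisson ratio should sit near 1, while the negative binomial ratio should approach 1/(1-p) = 5, which is the kind of overdispersion a Poisson latent cannot express.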
If this is right
- Superior reconstruction and generation performance on four tested datasets relative to single-layer VAE baselines.
- More informative and robust latent representations that improve results on downstream tasks.
- Verified stability and robustness across ablation studies on the dispersion parameter and training components.
- Retention of discrete, count-based interpretability that aligns with spike-based neural signaling.
Where Pith is reading between the lines
- The same dispersion-handling approach could be inserted into multi-layer or hierarchical VAEs to scale the benefit beyond single-layer models.
- Domains that already use count data, such as single-cell genomics or text modeling, could adopt the same negative binomial prior for better fit.
- The model opens a direct route to test whether artificial networks that respect overdispersion produce more realistic simulations of neural population activity.
Load-bearing premise
The new KL estimation and reparameterization methods allow stable training of the negative binomial latent model without approximation errors that erase its gains over the Poisson baseline.
What would settle it
Training the identical architecture with a standard Poisson-style KL approximation and finding that reconstruction and generation metrics drop to the level of the original Poisson VAE.
Original abstract
Although artificial neural networks are often described as brain-inspired, their representations typically rely on continuous activations, such as the continuous latent variables in variational autoencoders (VAEs), which limits their biological plausibility compared to the discrete spike-based signaling in real neurons. Extensions like the Poisson VAE introduce discrete count-based latents, but their equal mean-variance assumption fails to capture overdispersion in neural spikes, leading to less expressive and informative representations. To address this, we propose NegBio-VAE, a negative-binomial latent-variable model with a dispersion parameter for flexible spike count modeling. NegBio-VAE preserves interpretability while improving representation quality and training feasibility via novel KL estimation and reparameterization. Experiments on four datasets demonstrate that NegBio-VAE consistently achieves superior reconstruction and generation performance compared to competing single-layer VAE baselines, and yields robust, informative latent representations for downstream tasks. Extensive ablation studies are performed to verify the model's robustness w.r.t. various components. Our code is available at https://github.com/co234/NegBio-VAE.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes NegBio-VAE, a variational autoencoder using negative binomial latent variables with a dispersion parameter to model overdispersed count data such as neural spikes. It introduces novel KL estimation and reparameterization techniques to enable stable training of this discrete latent model, claiming to preserve interpretability while improving representation quality over Poisson VAEs. Experiments on four datasets demonstrate superior reconstruction and generation performance compared to single-layer VAE baselines, with ablations verifying robustness and code released for reproducibility.
Significance. If the novel KL estimation and reparameterization maintain an unbiased or low-bias ELBO without introducing systematic errors that explain the gains, the work could meaningfully advance discrete latent modeling for count data by relaxing the mean-variance equality of Poisson latents. The emphasis on biological plausibility, combined with reported improvements in downstream task utility and the release of code, would strengthen its contribution to the VAE literature for overdispersed data.
major comments (2)
- §3 (Method, KL estimation subsection): The central claim that the proposed KL estimation and reparameterization for negative binomial latents yield stable gradients and preserve the ELBO requires explicit demonstration that the estimator is unbiased (or has quantifiable low bias) relative to the true KL with the chosen prior. Without a derivation, variance analysis, or comparison to Monte Carlo ground truth, it remains possible that optimization artifacts rather than the overdispersion modeling drive the reported gains over the Poisson baseline.
- §4 (Experiments, performance tables): The abstract and results claim consistent superiority in reconstruction and generation, yet the provided description lacks error bars, standard deviations across runs, or statistical tests. This weakens the ability to attribute improvements specifically to the negative binomial dispersion parameter versus other implementation choices.
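The comparison requested in the first major comment can be prototyped independently of the paper's estimator. The sketch below is not the authors' method: it checks a plain Monte Carlo KL estimate between two negative binomial distributions against a truncated exact sum. It assumes the pmf NB(z; r, p) = Γ(z+r) / (z! Γ(r)) · (1-p)^r · p^z; the (r, p) pairs, the truncation point `z_max`, and all function names are hypothetical.

```python
import math
import random


def nb_log_pmf(z, r, p):
    """log NB(z; r, p) with pmf Gamma(z+r) / (z! Gamma(r)) * (1-p)^r * p^z."""
    return (math.lgamma(z + r) - math.lgamma(z + 1) - math.lgamma(r)
            + r * math.log(1.0 - p) + z * math.log(p))


def sample_nb(r, p, rng):
    """Gamma-Poisson mixture sampler (Gamma scale p/(1-p) is an assumption)."""
    rate = rng.gammavariate(r, p / (1.0 - p))
    threshold, count, product = math.exp(-rate), 0, rng.random()
    while product > threshold:       # Knuth's Poisson sampler
        count += 1
        product *= rng.random()
    return count


def kl_truncated_sum(q, prior, z_max=500):
    """Reference KL(q || prior): exact sum over z = 0..z_max."""
    total = 0.0
    for z in range(z_max + 1):
        log_q = nb_log_pmf(z, *q)
        total += math.exp(log_q) * (log_q - nb_log_pmf(z, *prior))
    return total


def kl_monte_carlo(q, prior, num_samples, rng):
    """Monte Carlo KL estimate: mean of log q(z) - log prior(z) for z ~ q."""
    draws = (sample_nb(*q, rng) for _ in range(num_samples))
    return sum(nb_log_pmf(z, *q) - nb_log_pmf(z, *prior) for z in draws) / num_samples


posterior, prior = (3.0, 0.6), (2.0, 0.5)   # hypothetical (r, p) pairs
reference = kl_truncated_sum(posterior, prior)
estimate = kl_monte_carlo(posterior, prior, 20000, random.Random(0))
print(f"truncated sum: {reference:.4f}  Monte Carlo: {estimate:.4f}")
```

A persistent gap between the two numbers as the sample count grows would point to bias in the estimator, while a gap that shrinks with more samples points to variance only.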
minor comments (2)
- Abstract: Dataset names are not specified despite the claim of evaluation on four datasets; adding them would aid readers in assessing generality.
- Notation: The parameterization of the negative binomial (e.g., how the dispersion parameter interacts with the mean in the approximate posterior) could be stated more explicitly in the main text to reduce ambiguity for readers.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed comments on our manuscript. We address each major comment point by point below, indicating the revisions we will make to improve clarity and rigor.
Point-by-point responses
- Referee: §3 (Method, KL estimation subsection): The central claim that the proposed KL estimation and reparameterization for negative binomial latents yield stable gradients and preserve the ELBO requires explicit demonstration that the estimator is unbiased (or has quantifiable low bias) relative to the true KL with the chosen prior. Without a derivation, variance analysis, or comparison to Monte Carlo ground truth, it remains possible that optimization artifacts rather than the overdispersion modeling drive the reported gains over the Poisson baseline.
  Authors: We acknowledge that the original manuscript introduces the KL estimation and reparameterization techniques for stable training but does not include a formal derivation of unbiasedness, variance analysis, or direct comparison to Monte Carlo ground truth. To address this concern, we will add a dedicated subsection deriving the estimator and proving it is unbiased relative to the true KL under the chosen negative binomial prior. We will also include variance analysis and empirical comparisons against Monte Carlo estimates to demonstrate that performance gains arise from overdispersion modeling rather than optimization artifacts.
  revision: yes
- Referee: §4 (Experiments, performance tables): The abstract and results claim consistent superiority in reconstruction and generation, yet the provided description lacks error bars, standard deviations across runs, or statistical tests. This weakens the ability to attribute improvements specifically to the negative binomial dispersion parameter versus other implementation choices.
  Authors: We agree that the lack of error bars, standard deviations, and statistical tests weakens the attribution of improvements. In the revised manuscript, we will report all performance metrics as averages over multiple independent runs with standard deviation error bars. We will also add statistical significance tests (such as paired t-tests) between NegBio-VAE and Poisson VAE baselines to better isolate the contribution of the dispersion parameter.
  revision: yes
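The reporting promised in the second response (mean and standard deviation over runs, plus a paired t-test) could look like the sketch below. The helper names are hypothetical, and the scores are toy placeholders chosen for easy arithmetic, not results from the paper.

```python
import math
import statistics


def summarize_runs(scores):
    """Return (mean, sample standard deviation) over independent runs."""
    return statistics.fmean(scores), statistics.stdev(scores)


def paired_t_statistic(scores_a, scores_b):
    """Paired t statistic and degrees of freedom for run-matched scores."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equally long lists with at least 2 runs")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    std_error = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.fmean(diffs) / std_error, len(diffs) - 1


# Toy placeholder scores (NOT results from the paper), one entry per seed.
model_a = [1.0, 2.0, 3.0, 4.0]
model_b = [0.0, 1.0, 2.0, 2.0]

mean_a, std_a = summarize_runs(model_a)
t_stat, dof = paired_t_statistic(model_a, model_b)
print(f"model A: {mean_a:.2f} ± {std_a:.2f}")
print(f"paired t = {t_stat:.2f} with {dof} degrees of freedom")
```

Turning the statistic into a p-value requires the Student t distribution, which the Python standard library does not provide; a statistics package or a t table would be needed for that last step. Pairing only makes sense when both models are run with the same seeds and data splits.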
Circularity Check
No significant circularity; derivation introduces dispersion parameter and KL techniques as independent modeling choices
full rationale
The paper extends the standard VAE ELBO framework by replacing the Poisson latent with a negative binomial distribution that includes an explicit dispersion parameter to capture overdispersion. This dispersion is introduced as a modeling choice rather than derived from or fitted to the same objective function that defines the claimed performance gains. The novel KL estimation and reparameterization are presented as technical contributions to enable stable training, without any equations or self-citations that reduce the ELBO or the reported improvements to quantities defined by construction from the fitted parameters or prior author results. Experiments and ablations compare against external baselines on multiple datasets, and the provided code allows external verification, confirming the central claims rest on independent content rather than self-referential reduction.
Axiom & Free-Parameter Ledger
free parameters (1)
- dispersion parameter
axioms (1)
- domain assumption: Negative binomial distribution provides a flexible model for overdispersed count data such as neural spikes
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel
  unclear: Relation between the paper passage and the cited Recognition theorem.
We propose NegBio-VAE, a negative-binomial latent-variable model with a dispersion parameter... two KL estimation strategies: Monte Carlo... dispersion sharing... two reparameterizations (Gumbel-Softmax and continuous-time)
- IndisputableMonolith/Foundation/ArithmeticFromLogic.lean: embed_eq_pow
  unclear: Relation between the paper passage and the cited Recognition theorem.
NB(z; r, p) = ∫ Poi(z|λ) Gamma(λ; r, p/(1-p)) dλ
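As a worked consequence of the mixture identity above, the first two moments follow from the laws of total expectation and total variance. This sketch assumes the second Gamma argument, p/(1-p), is a scale parameter; the page does not state whether it is a scale or a rate, so the closed forms below hold only under that reading.

```latex
\begin{align}
\lambda &\sim \mathrm{Gamma}\!\left(r,\ \tfrac{p}{1-p}\right), \qquad
z \mid \lambda \sim \mathrm{Poi}(\lambda), \\
\mathbb{E}[z] &= \mathbb{E}[\lambda] = \frac{r\,p}{1-p} =: \mu, \\
\mathrm{Var}[z] &= \mathbb{E}[\lambda] + \mathrm{Var}[\lambda]
  = \frac{r\,p}{1-p} + \frac{r\,p^{2}}{(1-p)^{2}}
  = \frac{r\,p}{(1-p)^{2}}
  = \mu + \frac{\mu^{2}}{r}.
\end{align}
```

The extra term μ²/r is the overdispersion: it vanishes as r grows with μ held fixed, which recovers the Poisson equality of mean and variance.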
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
The handbook of brain theory and neural networks
Michael A Arbib. The handbook of brain theory and neural networks. MIT press, 2003
work page 2003
-
[2]
Temporal precision of spike trains in extrastriate cortex of the behaving macaque monkey
Wyeth Bair and Christof Koch. Temporal precision of spike trains in extrastriate cortex of the behaving macaque monkey. Neural computation, 8(6):1185–1202, 1996
work page 1996
-
[3]
The geometry of abstraction in the hippocampus and prefrontal cortex
Silvia Bernardi, Marcus K. Benna, Mattia Rigotti, Jérôme Munuera, Stefano Fusi, and C. Daniel Salzman. The geometry of abstraction in the hippocampus and prefrontal cortex. Cell, 183(4):954–967.e21, Nov 2020. doi: 10.1016/j.cell.2020.09.031
-
[4]
Spiking denoising diffusion probabilistic models
Jiahang Cao, Ziqing Wang, Hanzhong Guo, Hao Cheng, Qiang Zhang, and Renjing Xu. Spiking denoising diffusion probabilistic models. In Proceedings of the IEEE/CVF winter conference on applications of computer vision, pages 4912–4921, 2024
work page 2024
-
[5]
Lisnn: Improving spiking neural networks with lateral interactions for robust object recognition
Xiang Cheng, Yunzhe Hao, Jiaming Xu, and Bo Xu. Lisnn: Improving spiking neural networks with lateral interactions for robust object recognition. In IJCAI, pages 1519–1525. Yokohama, 2020
work page 2020
-
[6]
The mnist database of handwritten digit images for machine learning research
Li Deng. The mnist database of handwritten digit images for machine learning research. IEEE Signal Processing Magazine, 29(6):141–142, 2012.
A PREPRINT - AUGUST 29, 2025
work page 2012
-
[7]
The nbp negative binomial model for assessing differential gene expression from rna-seq
Yanming Di, Daniel W Schafer, Jason S Cumbie, and Jeff H Chang. The nbp negative binomial model for assessing differential gene expression from rna-seq. Statistical applications in genetics and molecular biology, 10 (1), 2011
work page 2011
-
[8]
Learning disentangled joint continuous and discrete representations
Emilien Dupont. Learning disentangled joint continuous and discrete representations. Advances in neural information processing systems, 31, 2018
work page 2018
-
[9]
Incorporating learnable membrane time constant to enhance learning of spiking neural networks
Wei Fang, Zhaofei Yu, Yanqi Chen, Timothée Masquelier, Tiejun Huang, and Yonghong Tian. Incorporating learnable membrane time constant to enhance learning of spiking neural networks. In Proceedings of the IEEE/CVF international conference on computer vision, pages 2661–2671, 2021
work page 2021
-
[10]
Spiking generative adversarial network with attention scoring decoding
Linghao Feng, Dongcheng Zhao, and Yi Zeng. Spiking generative adversarial network with attention scoring decoding. Neural Networks, 178:106423, 2024
work page 2024
-
[11]
Som-vae: Interpretable discrete representation learning on time series
Vincent Fortuin, Matthias Hüser, Francesco Locatello, Heiko Strathmann, and Gunnar Rätsch. Som-vae: Interpretable discrete representation learning on time series. In International Conference on Learning Representations, 2019
work page 2019
-
[12]
Samanwoy Ghosh-Dastidar and Hojjat Adeli. Spiking neural networks. International journal of neural systems, 19(04):295–308, 2009
work page 2009
-
[13]
Rapid neural coding in the retina with relative spike latencies
Tim Gollisch and Markus Meister. Rapid neural coding in the retina with relative spike latencies. science, 319 (5866):1108–1111, 2008
work page 2008
-
[14]
Categorical reparameterization with gumbel-softmax
Eric Jang, Shixiang Gu, and Ben Poole. Categorical reparameterization with gumbel-softmax. In International Conference on Learning Representations, 2017
work page 2017
-
[15]
Fully spiking variational autoencoder
Hiromichi Kamata, Yusuke Mukuta, and Tatsuya Harada. Fully spiking variational autoencoder. In Proceedings of the AAAI conference on artificial intelligence, volume 36, pages 7059–7067, 2022
work page 2022
-
[16]
Latent diffusion for neural spiking data
Jaivardhan Kapoor, Auguste Schulz, Julius Vetter, Felix Pei, Richard Gao, and Jakob H Macke. Latent diffusion for neural spiking data. Advances in Neural Information Processing Systems, 37:118119–118154, 2024
work page 2024
-
[17]
Matthew T. Kaufman, Marcus K. Benna, Mattia Rigotti, Fabio Stefanini, Stefano Fusi, and Anne K. Churchland. The implications of categorical and category-free mixed selectivity on representational geometries. Current Opinion in Neurobiology, 77:102644, Dec 2022. doi: 10.1016/j.conb.2022.102644
-
[18]
Auto-encoding variational bayes, 2014
Diederik P Kingma and Max Welling. Auto-encoding variational bayes, 2014
work page 2014
-
[19]
Auto-encoding variational bayes, 2013
Diederik P Kingma, Max Welling, et al. Auto-encoding variational bayes, 2013
work page 2013
-
[20]
Spiking-gan: A spiking generative adversarial network using time-to-first-spike coding
Vineet Kotariya and Udayan Ganguly. Spiking-gan: A spiking generative adversarial network using time-to-first-spike coding. In 2022 International Joint Conference on Neural Networks (IJCNN), pages 1–7. IEEE, 2022
work page 2022
-
[21]
Learning multiple layers of features from tiny images
Alex Krizhevsky. Learning multiple layers of features from tiny images. Technical report, 2009
work page 2009
-
[22]
Human-level concept learning through probabilistic program induction
Brenden M Lake, Ruslan Salakhutdinov, and Joshua B Tenenbaum. Human-level concept learning through probabilistic program induction. Science, 350(6266):1332–1338, 2015
work page 2015
-
[23]
Mnist handwritten digit database
Yann LeCun, Corinna Cortes, and CJ Burges. Mnist handwritten digit database. ATT Labs [Online]. Available: http://yann.lecun.com/exdb/mnist, 2, 2010
work page 2010
-
[24]
Differentiable spike: Rethinking gradient-descent for training spiking neural networks
Yuhang Li, Yufei Guo, Shanghang Zhang, Shikuang Deng, Yongqing Hai, and Shi Gu. Differentiable spike: Rethinking gradient-descent for training spiking neural networks. Advances in neural information processing systems, 34:23426–23439, 2021
work page 2021
-
[25]
Spiking-diffusion: Vector quantized discrete diffusion model with spiking neural networks
Mingxuan Liu, Jie Gan, Rui Wen, Tao Li, Yongli Chen, and Hong Chen. Spiking-diffusion: Vector quantized discrete diffusion model with spiking neural networks. In 2024 5th International Conference on Computer, Big Data and Artificial Intelligence (ICCBD+ AI), pages 627–631. IEEE, 2024
work page 2024
-
[26]
Reliability of spike timing in neocortical neurons
Zachary F Mainen and Terrence J Sejnowski. Reliability of spike timing in neocortical neurons. Science, 268 (5216):1503–1506, 1995
work page 1995
-
[27]
Predictive coding, variational autoencoders, and biological connections
Joseph Marino. Predictive coding, variational autoencoders, and biological connections. Neural Computation, 34 (1):1–44, 2022
work page 2022
-
[28]
Using tweedie distributions for fitting spike count data
Dina Moshitch and Israel Nelken. Using tweedie distributions for fitting spike count data. Journal of neuroscience methods, 225:13–28, 2014
work page 2014
-
[29]
Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, and Andrew Y. Ng. Reading digits in natural images with unsupervised feature learning. In NIPS Workshop on Deep Learning and Unsupervised Feature Learning, 2011. http://ufldl.stanford.edu/housenumbers/nips2011_housenumbers.pdf
work page 2011
-
[30]
Neuronal spike trains and stochastic point processes: II. Simultaneous spike trains
Donald H Perkel, George L Gerstein, and George P Moore. Neuronal spike trains and stochastic point processes: II. Simultaneous spike trains. Biophysical journal, 7(4):419–440, 1967
work page 1967
-
[31]
Fully bayesian inference for neural models with negative-binomial spiking
Jonathan Pillow and James Scott. Fully bayesian inference for neural models with negative-binomial spiking. Advances in neural information processing systems, 25, 2012
work page 2012
-
[32]
Deterministic decoding for discrete data in variational autoencoders
Daniil Polykovskiy and Dmitry Vetrov. Deterministic decoding for discrete data in variational autoencoders. In International conference on artificial intelligence and statistics, pages 3046–3056. PMLR, 2020
work page 2020
-
[33]
The importance of mixed selectivity in complex cognitive tasks
Mattia Rigotti, Omri Barak, Melissa R. Warden, Xiao-Jing Wang, Nathaniel D. Daw, Earl K. Miller, and Stefano Fusi. The importance of mixed selectivity in complex cognitive tasks. Nature, 497:585–590, May 2013. doi: 10.1038/nature12160
-
[34]
Discrete variational autoencoders
Jason Tyler Rolfe. Discrete variational autoencoders. In International Conference on Learning Representations, 2017
work page 2017
-
[35]
Bleema Rosenfeld, Osvaldo Simeone, and Bipin Rajendran. Spiking generative adversarial networks with a neural network discriminator: Local training, bayesian models, and continual meta-learning. IEEE Transactions on Computers, 71(11):2778–2791, 2022
work page 2022
-
[36]
The negative binomial distribution
GJS Ross and DA Preece. The negative binomial distribution. Journal of the Royal Statistical Society: Series D (The Statistician), 34(3):323–335, 1985
work page 1985
-
[37]
Flexible models for spike count data with both over-and under-dispersion
Ian H Stevenson. Flexible models for spike count data with both over-and under-dispersion. Journal of computational neuroscience, 41:29–43, 2016
work page 2016
-
[38]
Unsupervised learning predicts human perception and misperception of gloss
Katherine R Storrs, Barton L Anderson, and Roland W Fleming. Unsupervised learning predicts human perception and misperception of gloss. Nature Human Behaviour, 5(10):1402–1417, 2021
work page 2021
-
[39]
Testing the odds of inherent vs. observed overdispersion in neural spike counts
Wahiba Taouali, Giacomo Benvenuti, Pascal Wallisch, Frédéric Chavane, and Laurent U Perrinet. Testing the odds of inherent vs. observed overdispersion in neural spike counts. Journal of neurophysiology, 115(1):434–444, 2016
work page 2016
-
[40]
Deep learning in spiking neural networks
Amirhossein Tavanaei, Masoud Ghodrati, Saeed Reza Kheradpisheh, Timothée Masquelier, and Anthony Maida. Deep learning in spiking neural networks. Neural networks, 111:47–63, 2019
work page 2019
-
[41]
Hierarchical vaes provide a normative account of motion processing in the primate brain
Hadi Vafaii, Jacob Yates, and Daniel Butts. Hierarchical vaes provide a normative account of motion processing in the primate brain. Advances in Neural Information Processing Systems, 36:46152–46190, 2023
work page 2023
-
[42]
Poisson variational autoencoder
Hadi Vafaii, Dekel Galor, and Jacob Yates. Poisson variational autoencoder. Advances in Neural Information Processing Systems, 37:44871–44906, 2024
work page 2024
-
[43]
Brain-inspired replay for continual learning with artificial neural networks
Gido M Van de Ven, Hava T Siegelmann, and Andreas S Tolias. Brain-inspired replay for continual learning with artificial neural networks. Nature communications, 11(1):4069, 2020
work page 2020
-
[44]
Neural discrete representation learning
Aaron Van Den Oord, Oriol Vinyals, et al. Neural discrete representation learning. Advances in neural information processing systems, 30, 2017
work page 2017
-
[45]
Differentially private spiking variational autoencoder
Srishti Yadav, Anshul Pundhir, Tanish Goyal, Balasubramanian Raman, and Sanjeev Kumar. Differentially private spiking variational autoencoder. In International Conference on Pattern Recognition, pages 96–112. Springer, 2025
work page 2025
-
[46]
Esvae: An efficient spiking variational autoencoder with reparameterizable poisson spiking sampling
Qiugang Zhan, Ran Tao, Xiurui Xie, Guisong Liu, Malu Zhang, Huajin Tang, and Yang Yang. Esvae: An efficient spiking variational autoencoder with reparameterizable poisson spiking sampling. arXiv preprint arXiv:2310.14839, 2023
-
[47]
Variational autoencoders for sparse and overdispersed discrete data
He Zhao, Piyush Rai, Lan Du, Wray Buntine, Dinh Phung, and Mingyuan Zhou. Variational autoencoders for sparse and overdispersed discrete data. In International conference on artificial intelligence and statistics, pages 1684–1694. PMLR, 2020
work page 2020
-
[48]
Going deeper with directly-trained larger spiking neural networks
Hanle Zheng, Yujie Wu, Lei Deng, Yifan Hu, and Guoqi Li. Going deeper with directly-trained larger spiking neural networks. In Proceedings of the AAAI conference on artificial intelligence, volume 35, pages 11062–11070, 2021
work page 2021
-
[49]
Negative binomial process count and mixture modeling
Mingyuan Zhou and Lawrence Carin. Negative binomial process count and mixture modeling. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(2):307–320, 2013.
work page 2013
A Derivation of KL Term
We derive an analytical expression for the KL divergence between two Negative Binomial distributions under the assumption that t...
1. Limit the maximum count value to $Z_{\max}$.
2. Compute the log-probability for $z = 0, 1, \ldots, Z_{\max}$: $\log \mathrm{Poi}(z) = z \log \lambda - \lambda - \log \Gamma(z + 1)$.
3. For each $z$, generate noise $\epsilon_z \sim \mathrm{Gumbel}(0, 1)$.
4. Apply the Gumbel-Softmax trick with temperature $\tau$: $\tilde{z} = \sum_{z=0}^{Z_{\max}} z \cdot \mathrm{softmax}\left(\frac{\log \mathrm{Poi}(z) + \epsilon_z}{\tau}\right)$, where $\tau \to 0$ recovers discrete sampling. The proof of this reparameterization can be found in Jang et al. [14], and will not be repeated here.

(2) Continuous-Time Simulation
This method models Poisson processes with intensity λ on [0, 1] using exponentially di...

1. Sample inter-arrival times from an exponential distribution: $\{s_i\}_{i=1}^{M} \sim \mathrm{Exponential}(\lambda)$, where $M$ is a sufficiently large integer. The exponential distribution is easily reparameterized, and PyTorch contains an implementation.
2. Accumulate inter-arrival times: $S_n = \sum_{i=1}^{n} s_i$, $1 \le n \le M$.
3. Soft count of events: $\tilde{z} = \sum_{n=1}^{M} \sigma\left(\frac{1 - S_n}{\tau}\right)$, where $\tau \to 0$ recovers discrete sampling.

[Figure 4 plot. Title: Continuous-Time vs Gumbel-Softmax. x-axis: Count. y-axis: Probability. Legend: Continuous-Time, Gumbel-Softmax.]
Figure 4: Empirical distributions of Negative Binomial samples generated using Continuous-...
... 4", whereas some models render it more like a “9 ...
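The two relaxed Poisson samplers described in the steps above can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: it uses the standard library instead of an autodiff framework (so no gradients flow through it), and the function names, the clamp on the uniform draw, and the values `rate = 4`, `tau = 0.01`, `z_max = 40`, `num_arrivals = 60` are hypothetical choices.

```python
import math
import random


def gumbel_softmax_poisson(rate, tau, z_max, rng):
    """Relaxed Poisson draw: softmax-weighted average of the counts 0..z_max."""
    scores = []
    for z in range(z_max + 1):
        log_prob = z * math.log(rate) - rate - math.lgamma(z + 1)
        uniform = max(rng.random(), 1e-12)         # keep log() finite
        gumbel = -math.log(-math.log(uniform))     # Gumbel(0, 1) noise
        scores.append((log_prob + gumbel) / tau)
    top = max(scores)                              # stabilise the softmax
    weights = [math.exp(s - top) for s in scores]
    return sum(z * w for z, w in enumerate(weights)) / sum(weights)


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def continuous_time_poisson(rate, tau, num_arrivals, rng):
    """Relaxed Poisson draw: soft count of arrivals that land inside [0, 1]."""
    arrival_time, soft_count = 0.0, 0.0
    for _ in range(num_arrivals):
        arrival_time += rng.expovariate(rate)      # exponential inter-arrival time
        soft_count += sigmoid((1.0 - arrival_time) / tau)
    return soft_count


rng = random.Random(0)
rate, tau = 4.0, 0.01                              # illustrative values only
gs = [gumbel_softmax_poisson(rate, tau, 40, rng) for _ in range(4000)]
ct = [continuous_time_poisson(rate, tau, 60, rng) for _ in range(4000)]
print("Gumbel-Softmax mean: ", round(sum(gs) / len(gs), 2))
print("Continuous-time mean:", round(sum(ct) / len(ct), 2))
```

With a small temperature both sample means should land close to the rate. Raising `tau` makes the outputs smoother at the cost of moving them away from integer counts, and `z_max` and `num_arrivals` must be large relative to the rate for the truncation to be harmless.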
A lower DNR indicates a more active and effective latent space, with fewer collapsed units. Among all models, the Gaussian VAE exhibits an extremely high DNR of 0.998, suggesting that nearly all latent dimensions are inactive. The Poisson VAE (0.002) and Categorical VAE (0.0117) show moderate sparsity, while both the Laplace VAE and our NegBio-VAE ac...