Negative Binomial Variational Autoencoders for Overdispersed Latent Modeling

Feng Zhou; Jinhao Sheng; Quyu Kong; Wenxin Zhang; Yixuan Zhang

arxiv: 2508.05423 · v2 · submitted 2025-08-07 · 💻 cs.LG · stat.ML

Yixuan Zhang, Jinhao Sheng, Wenxin Zhang, Quyu Kong, Feng Zhou

Pith reviewed 2026-05-18 23:51 UTC · model grok-4.3

classification 💻 cs.LG stat.ML

keywords: negative binomial · variational autoencoder · overdispersion · discrete latent variables · KL divergence estimation · reparameterization · count data modeling · neural spike modeling

The pith

Negative binomial latents let VAEs model overdispersed count data more flexibly than Poisson while keeping training stable.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces NegBio-VAE to replace the equal mean-variance assumption of Poisson VAEs with a negative binomial distribution that includes an explicit dispersion parameter. This change targets the common case where observed variance exceeds the mean, as seen in neural spike counts. The authors supply new techniques for KL divergence estimation and reparameterization so that the richer distribution remains trainable and the latent representations stay interpretable as discrete counts. A sympathetic reader would care because the model stays biologically closer to spike-based signaling while delivering better reconstruction, generation, and downstream utility on real datasets.

Core claim

NegBio-VAE is a negative-binomial latent-variable model with a dispersion parameter for flexible spike count modeling. It preserves interpretability while improving representation quality and training feasibility via novel KL estimation and reparameterization. Experiments on four datasets demonstrate that NegBio-VAE consistently achieves superior reconstruction and generation performance compared to competing single-layer VAE baselines, and yields robust, informative latent representations for downstream tasks.

What carries the argument

Negative binomial distribution with an explicit dispersion parameter, made tractable by custom KL estimation and reparameterization that avoid the instability of direct sampling from the overdispersed prior.
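
To make the role of the dispersion parameter concrete, the worked identity below uses the Gamma-Poisson view of the negative binomial. The symbols $\mu$ (mean), $\theta$ (Gamma scale), and $r$ (Gamma shape, playing the role of the dispersion parameter) are notation assumed here for illustration and may differ from the paper's own parameterization.

```latex
\begin{aligned}
\lambda &\sim \mathrm{Gamma}(\text{shape}=r,\ \text{scale}=\theta),
\qquad z \mid \lambda \sim \mathrm{Poi}(\lambda), \\
\mathbb{E}[z] &= \mathbb{E}[\lambda] = r\theta =: \mu, \\
\mathrm{Var}[z] &= \mathbb{E}\big[\mathrm{Var}[z \mid \lambda]\big]
                 + \mathrm{Var}\big[\mathbb{E}[z \mid \lambda]\big]
               = r\theta + r\theta^{2}
               = \mu + \frac{\mu^{2}}{r}.
\end{aligned}
```

Under these assumptions the variance exceeds the mean for every finite $r$, and the Poisson case $\mathrm{Var}[z] = \mu$ is recovered as $r \to \infty$ with $\mu$ held fixed.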

If this is right

Superior reconstruction and generation performance on four tested datasets relative to single-layer VAE baselines.
More informative and robust latent representations that improve results on downstream tasks.
Verified stability and robustness across ablation studies on the dispersion parameter and training components.
Retention of discrete, count-based interpretability that aligns with spike-based neural signaling.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same dispersion-handling approach could be inserted into multi-layer or hierarchical VAEs to scale the benefit beyond single-layer models.
Domains that already use count data, such as single-cell genomics or text modeling, could adopt the same negative binomial prior for better fit.
The model opens a direct route to test whether artificial networks that respect overdispersion produce more realistic simulations of neural population activity.

Load-bearing premise

The new KL estimation and reparameterization methods allow stable training of the negative binomial latent model without approximation errors that erase its gains over the Poisson baseline.

What would settle it

Training the identical architecture with a standard Poisson-style KL approximation and finding that reconstruction and generation metrics drop to the level of the original Poisson VAE.

Figures

Figures reproduced from arXiv: 2508.05423 by Feng Zhou, Jinhao Sheng, Quyu Kong, Wenxin Zhang, Yixuan Zhang.

**Figure 2.** t-SNE of latent representations from different VAEs on MNIST with latent dimensionality fixed at ...

**Figure 3.** Ablation study of NegBio-VAE. (a) explores the effect of KL weighting via ...

**Figure 4.** Empirical distributions of Negative Binomial samples generated using Continuous-Time Simulation and ...

**Figure 5.** Reconstruction comparison across datasets using a shared architecture and 256-dimensional latent space.

**Figure 6.** Comparison of four NegBio-VAE variants (DS+C, DS+G, MC+C, MC+G) across validation reconstruction ...

**Figure 7.** Comparison of convergence and oscillation statistics across four NegBio-VAE variants. Each bar shows the ...

Original abstract

Although artificial neural networks are often described as brain-inspired, their representations typically rely on continuous activations, such as the continuous latent variables in variational autoencoders (VAEs), which limits their biological plausibility compared to the discrete spike-based signaling in real neurons. Extensions like the Poisson VAE introduce discrete count-based latents, but their equal mean-variance assumption fails to capture overdispersion in neural spikes, leading to less expressive and informative representations. To address this, we propose NegBio-VAE, a negative-binomial latent-variable model with a dispersion parameter for flexible spike count modeling. NegBio-VAE preserves interpretability while improving representation quality and training feasibility via novel KL estimation and reparameterization. Experiments on four datasets demonstrate that NegBio-VAE consistently achieves superior reconstruction and generation performance compared to competing single-layer VAE baselines, and yields robust, informative latent representations for downstream tasks. Extensive ablation studies are performed to verify the model's robustness w.r.t. various components. Our code is available at https://github.com/co234/NegBio-VAE.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes NegBio-VAE, a variational autoencoder using negative binomial latent variables with a dispersion parameter to model overdispersed count data such as neural spikes. It introduces novel KL estimation and reparameterization techniques to enable stable training of this discrete latent model, claiming to preserve interpretability while improving representation quality over Poisson VAEs. Experiments on four datasets demonstrate superior reconstruction and generation performance compared to single-layer VAE baselines, with ablations verifying robustness and code released for reproducibility.

Significance. If the novel KL estimation and reparameterization maintain an unbiased or low-bias ELBO without introducing systematic errors that explain the gains, the work could meaningfully advance discrete latent modeling for count data by relaxing the mean-variance equality of Poisson latents. The emphasis on biological plausibility, combined with reported improvements in downstream task utility and the release of code, would strengthen its contribution to the VAE literature for overdispersed data.

major comments (2)

§3 (Method, KL estimation subsection): The central claim that the proposed KL estimation and reparameterization for negative binomial latents yield stable gradients and preserve the ELBO requires explicit demonstration that the estimator is unbiased (or has quantifiable low bias) relative to the true KL with the chosen prior. Without a derivation, variance analysis, or comparison to Monte Carlo ground truth, it remains possible that optimization artifacts rather than the overdispersion modeling drive the reported gains over the Poisson baseline.
§4 (Experiments, performance tables): The abstract and results claim consistent superiority in reconstruction and generation, yet the provided description lacks error bars, standard deviations across runs, or statistical tests. This weakens the ability to attribute improvements specifically to the negative binomial dispersion parameter versus other implementation choices.
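
The comparison against Monte Carlo ground truth requested in the first major comment can be prototyped independently of the paper's code. The sketch below is not the authors' estimator. It assumes the NB(r, p) parameterization with pmf Γ(z+r)/(Γ(r) z!) · (1−p)^r · p^z, uses hypothetical helper names (`nb_logpmf`, `nb_sample`, `kl_exact`, `kl_monte_carlo`) and arbitrary illustrative parameter values, and compares a plain Monte Carlo KL estimate with a truncated-sum reference value.

```python
import math
import random


def nb_logpmf(z, r, p):
    """Log pmf of NB(r, p) with mean r * p / (1 - p) (assumed parameterization)."""
    return (math.lgamma(z + r) - math.lgamma(r) - math.lgamma(z + 1)
            + r * math.log(1.0 - p) + z * math.log(p))


def nb_sample(r, p, rng, z_max=10_000):
    """Draw one NB(r, p) count by inverting the CDF (simple, not fast)."""
    u = rng.random()
    z = 0
    cdf = math.exp(nb_logpmf(0, r, p))
    while u > cdf and z < z_max:
        z += 1
        cdf += math.exp(nb_logpmf(z, r, p))
    return z


def kl_exact(rq, pq, rp, pp, z_max=2_000):
    """Reference KL(q || p) computed as a truncated sum over z = 0..z_max."""
    total = 0.0
    for z in range(z_max + 1):
        log_q = nb_logpmf(z, rq, pq)
        total += math.exp(log_q) * (log_q - nb_logpmf(z, rp, pp))
    return total


def kl_monte_carlo(rq, pq, rp, pp, n_samples, rng):
    """Plain Monte Carlo estimate: mean of log q(z) - log p(z) for z ~ q."""
    acc = 0.0
    for _ in range(n_samples):
        z = nb_sample(rq, pq, rng)
        acc += nb_logpmf(z, rq, pq) - nb_logpmf(z, rp, pp)
    return acc / n_samples


if __name__ == "__main__":
    rng = random.Random(0)
    exact = kl_exact(5.0, 0.5, 3.0, 0.6)
    estimate = kl_monte_carlo(5.0, 0.5, 3.0, 0.6, 20_000, rng)
    print(f"truncated-sum KL: {exact:.4f}  Monte Carlo KL: {estimate:.4f}")
```

Repeating the Monte Carlo call over many seeds and sample sizes would give the bias and variance figures the referee asks for; any estimator proposed in the paper could be dropped in place of `kl_monte_carlo` for the same comparison.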

minor comments (2)

Abstract: Dataset names are not specified despite the claim of evaluation on four datasets; adding them would aid readers in assessing generality.
Notation: The parameterization of the negative binomial (e.g., how the dispersion parameter interacts with the mean in the approximate posterior) could be stated more explicitly in the main text to reduce ambiguity for readers.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed comments on our manuscript. We address each major comment point by point below, indicating the revisions we will make to improve clarity and rigor.

Referee: §3 (Method, KL estimation subsection): The central claim that the proposed KL estimation and reparameterization for negative binomial latents yield stable gradients and preserve the ELBO requires explicit demonstration that the estimator is unbiased (or has quantifiable low bias) relative to the true KL with the chosen prior. Without a derivation, variance analysis, or comparison to Monte Carlo ground truth, it remains possible that optimization artifacts rather than the overdispersion modeling drive the reported gains over the Poisson baseline.

Authors: We acknowledge that the original manuscript introduces the KL estimation and reparameterization techniques for stable training but does not include a formal derivation of unbiasedness, variance analysis, or direct comparison to Monte Carlo ground truth. To address this concern, we will add a dedicated subsection deriving the estimator and proving it is unbiased relative to the true KL under the chosen negative binomial prior. We will also include variance analysis and empirical comparisons against Monte Carlo estimates to demonstrate that performance gains arise from overdispersion modeling rather than optimization artifacts.
Revision: yes

Referee: §4 (Experiments, performance tables): The abstract and results claim consistent superiority in reconstruction and generation, yet the provided description lacks error bars, standard deviations across runs, or statistical tests. This weakens the ability to attribute improvements specifically to the negative binomial dispersion parameter versus other implementation choices.

Authors: We agree that the lack of error bars, standard deviations, and statistical tests weakens the attribution of improvements. In the revised manuscript, we will report all performance metrics as averages over multiple independent runs with standard deviation error bars. We will also add statistical significance tests (such as paired t-tests) between NegBio-VAE and Poisson VAE baselines to better isolate the contribution of the dispersion parameter.
Revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation introduces dispersion parameter and KL techniques as independent modeling choices

full rationale

The paper extends the standard VAE ELBO framework by replacing the Poisson latent with a negative binomial distribution that includes an explicit dispersion parameter to capture overdispersion. This dispersion is introduced as a modeling choice rather than derived from or fitted to the same objective function that defines the claimed performance gains. The novel KL estimation and reparameterization are presented as technical contributions to enable stable training, without any equations or self-citations that reduce the ELBO or the reported improvements to quantities defined by construction from the fitted parameters or prior author results. Experiments and ablations compare against external baselines on multiple datasets, and the provided code allows external verification, confirming the central claims rest on independent content rather than self-referential reduction.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The model rests on the standard VAE evidence lower bound plus the statistical assumption that negative binomial better captures neural overdispersion than Poisson; no new entities are postulated.

free parameters (1)

dispersion parameter
Extra parameter introduced to control variance beyond the mean in the negative binomial latent distribution.

axioms (1)

domain assumption: Negative binomial distribution provides a flexible model for overdispersed count data such as neural spikes
Invoked to justify replacing the Poisson assumption in the latent layer.

pith-pipeline@v0.9.0 · 5725 in / 1234 out tokens · 38469 ms · 2026-05-18T23:51:16.349011+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: We propose NegBio-VAE, a negative-binomial latent-variable model with a dispersion parameter... two KL estimation strategies: Monte Carlo... dispersion sharing... two reparameterizations (Gumbel-Softmax and continuous-time)

IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · embed_eq_pow
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: NB(z; r, p) = ∫ Poi(z|λ) Gamma(λ; r, p/(1-p)) dλ
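
The Gamma-Poisson mixture quoted in the passage above can be checked numerically. The sketch below assumes that the second Gamma argument, p/(1-p), is a scale parameter (the passage does not say whether it is a scale or a rate), which gives mean r·p/(1-p) and variance r·p/(1-p)². The function names are hypothetical and the parameter values are arbitrary.

```python
import random


def sample_poisson(lam, rng):
    """Poisson(lam) draw: count exponential inter-arrival times landing in [0, 1]."""
    if lam <= 0.0:
        return 0
    count = 0
    t = rng.expovariate(lam)
    while t <= 1.0:
        count += 1
        t += rng.expovariate(lam)
    return count


def sample_nb_mixture(r, p, rng):
    """Draw lam ~ Gamma(shape=r, scale=p/(1-p)), then z ~ Poisson(lam).

    Treating p/(1-p) as a scale (not a rate) is an assumption of this sketch.
    """
    lam = rng.gammavariate(r, p / (1.0 - p))
    return sample_poisson(lam, rng)


def mean_and_variance(r, p, n_samples, seed=0):
    """Empirical mean and (population) variance of n_samples mixture draws."""
    rng = random.Random(seed)
    draws = [sample_nb_mixture(r, p, rng) for _ in range(n_samples)]
    mean = sum(draws) / n_samples
    var = sum((d - mean) ** 2 for d in draws) / n_samples
    return mean, var


if __name__ == "__main__":
    mean, var = mean_and_variance(r=5.0, p=0.5, n_samples=20_000)
    print(f"empirical mean {mean:.3f}, empirical variance {var:.3f}")
```

With r = 5 and p = 0.5 the theoretical mean is 5 and the theoretical variance is 10 under the stated assumption, so the empirical variance should come out clearly larger than the empirical mean, which is the overdispersion a Poisson latent cannot express.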

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

57 extracted references · 57 canonical work pages

[1] Michael A Arbib. The handbook of brain theory and neural networks. MIT press, 2003.
[2] Wyeth Bair and Christof Koch. Temporal precision of spike trains in extrastriate cortex of the behaving macaque monkey. Neural computation, 8(6):1185–1202, 1996.
[3] Silvia Bernardi, Marcus K. Benna, Mattia Rigotti, Jérôme Munuera, Stefano Fusi, and C. Daniel Salzman. The geometry of abstraction in the hippocampus and prefrontal cortex. Cell, 183(4):954–967.e21, Nov 2020. doi: 10.1016/j.cell.2020.09.031.
[4] Jiahang Cao, Ziqing Wang, Hanzhong Guo, Hao Cheng, Qiang Zhang, and Renjing Xu. Spiking denoising diffusion probabilistic models. In Proceedings of the IEEE/CVF winter conference on applications of computer vision, pages 4912–4921, 2024.
[5] Xiang Cheng, Yunzhe Hao, Jiaming Xu, and Bo Xu. Lisnn: Improving spiking neural networks with lateral interactions for robust object recognition. In IJCAI, pages 1519–1525. Yokohama, 2020.
[6] Li Deng. The mnist database of handwritten digit images for machine learning research. IEEE Signal Processing Magazine, 29(6):141–142, 2012.
A PREPRINT - AUGUST 29, 2025
[7] Yanming Di, Daniel W Schafer, Jason S Cumbie, and Jeff H Chang. The nbp negative binomial model for assessing differential gene expression from rna-seq. Statistical applications in genetics and molecular biology, 10(1), 2011.
[8] Emilien Dupont. Learning disentangled joint continuous and discrete representations. Advances in neural information processing systems, 31, 2018.
[9] Wei Fang, Zhaofei Yu, Yanqi Chen, Timothée Masquelier, Tiejun Huang, and Yonghong Tian. Incorporating learnable membrane time constant to enhance learning of spiking neural networks. In Proceedings of the IEEE/CVF international conference on computer vision, pages 2661–2671, 2021.
[10] Linghao Feng, Dongcheng Zhao, and Yi Zeng. Spiking generative adversarial network with attention scoring decoding. Neural Networks, 178:106423, 2024.
[11] Vincent Fortuin, Matthias Hüser, Francesco Locatello, Heiko Strathmann, and Gunnar Rätsch. Som-vae: Interpretable discrete representation learning on time series. In International Conference on Learning Representations, 2019.
[12] Samanwoy Ghosh-Dastidar and Hojjat Adeli. Spiking neural networks. International journal of neural systems, 19(04):295–308, 2009.
[13] Tim Gollisch and Markus Meister. Rapid neural coding in the retina with relative spike latencies. Science, 319(5866):1108–1111, 2008.
[14] Eric Jang, Shixiang Gu, and Ben Poole. Categorical reparameterization with gumbel-softmax. In International Conference on Learning Representations, 2017.
[15] Hiromichi Kamata, Yusuke Mukuta, and Tatsuya Harada. Fully spiking variational autoencoder. In Proceedings of the AAAI conference on artificial intelligence, volume 36, pages 7059–7067, 2022.
[16] Jaivardhan Kapoor, Auguste Schulz, Julius Vetter, Felix Pei, Richard Gao, and Jakob H Macke. Latent diffusion for neural spiking data. Advances in Neural Information Processing Systems, 37:118119–118154, 2024.
[17] Matthew T. Kaufman, Marcus K. Benna, Mattia Rigotti, Fabio Stefanini, Stefano Fusi, and Anne K. Churchland. The implications of categorical and category-free mixed selectivity on representational geometries. Current Opinion in Neurobiology, 77:102644, Dec 2022. doi: 10.1016/j.conb.2022.102644.
[18] Diederik P Kingma and Max Welling. Auto-encoding variational bayes, 2014.
[19] Diederik P Kingma, Max Welling, et al. Auto-encoding variational bayes, 2013.
[20] Vineet Kotariya and Udayan Ganguly. Spiking-gan: A spiking generative adversarial network using time-to-first-spike coding. In 2022 International Joint Conference on Neural Networks (IJCNN), pages 1–7. IEEE, 2022.
[21] Alex Krizhevsky. Learning multiple layers of features from tiny images. Technical report, 2009.
[22] Brenden M Lake, Ruslan Salakhutdinov, and Joshua B Tenenbaum. Human-level concept learning through probabilistic program induction. Science, 350(6266):1332–1338, 2015.
[23] Yann LeCun, Corinna Cortes, and CJ Burges. Mnist handwritten digit database. ATT Labs [Online]. Available: http://yann.lecun.com/exdb/mnist, 2, 2010.
[24] Yuhang Li, Yufei Guo, Shanghang Zhang, Shikuang Deng, Yongqing Hai, and Shi Gu. Differentiable spike: Rethinking gradient-descent for training spiking neural networks. Advances in neural information processing systems, 34:23426–23439, 2021.
[25] Mingxuan Liu, Jie Gan, Rui Wen, Tao Li, Yongli Chen, and Hong Chen. Spiking-diffusion: Vector quantized discrete diffusion model with spiking neural networks. In 2024 5th International Conference on Computer, Big Data and Artificial Intelligence (ICCBD+AI), pages 627–631. IEEE, 2024.
[26] Zachary F Mainen and Terrence J Sejnowski. Reliability of spike timing in neocortical neurons. Science, 268(5216):1503–1506, 1995.
[27] Joseph Marino. Predictive coding, variational autoencoders, and biological connections. Neural Computation, 34(1):1–44, 2022.
[28] Dina Moshitch and Israel Nelken. Using tweedie distributions for fitting spike count data. Journal of neuroscience methods, 225:13–28, 2014.
[29] Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, and Andrew Y. Ng. Reading digits in natural images with unsupervised feature learning. In NIPS Workshop on Deep Learning and Unsupervised Feature Learning, 2011. http://ufldl.stanford.edu/housenumbers/nips2011_housenumbers.pdf
[30] Donald H Perkel, George L Gerstein, and George P Moore. Neuronal spike trains and stochastic point processes: II. Simultaneous spike trains. Biophysical journal, 7(4):419–440, 1967.
[31] Jonathan Pillow and James Scott. Fully bayesian inference for neural models with negative-binomial spiking. Advances in neural information processing systems, 25, 2012.
[32] Daniil Polykovskiy and Dmitry Vetrov. Deterministic decoding for discrete data in variational autoencoders. In International conference on artificial intelligence and statistics, pages 3046–3056. PMLR, 2020.
[33] Mattia Rigotti, Omri Barak, Melissa R. Warden, Xiao-Jing Wang, Nathaniel D. Daw, Earl K. Miller, and Stefano Fusi. The importance of mixed selectivity in complex cognitive tasks. Nature, 497:585–590, 05 2013. doi: 10.1038/nature12160.
[34] Jason Tyler Rolfe. Discrete variational autoencoders. In International Conference on Learning Representations, 2017.
[35] Bleema Rosenfeld, Osvaldo Simeone, and Bipin Rajendran. Spiking generative adversarial networks with a neural network discriminator: Local training, bayesian models, and continual meta-learning. IEEE Transactions on Computers, 71(11):2778–2791, 2022.
[36] GJS Ross and DA Preece. The negative binomial distribution. Journal of the Royal Statistical Society: Series D (The Statistician), 34(3):323–335, 1985.
[37] Ian H Stevenson. Flexible models for spike count data with both over-and under-dispersion. Journal of computational neuroscience, 41:29–43, 2016.
[38] Katherine R Storrs, Barton L Anderson, and Roland W Fleming. Unsupervised learning predicts human perception and misperception of gloss. Nature Human Behaviour, 5(10):1402–1417, 2021.
[39] Wahiba Taouali, Giacomo Benvenuti, Pascal Wallisch, Frédéric Chavane, and Laurent U Perrinet. Testing the odds of inherent vs. observed overdispersion in neural spike counts. Journal of neurophysiology, 115(1):434–444, 2016.
[40] Amirhossein Tavanaei, Masoud Ghodrati, Saeed Reza Kheradpisheh, Timothée Masquelier, and Anthony Maida. Deep learning in spiking neural networks. Neural networks, 111:47–63, 2019.
[41] Hadi Vafaii, Jacob Yates, and Daniel Butts. Hierarchical vaes provide a normative account of motion processing in the primate brain. Advances in Neural Information Processing Systems, 36:46152–46190, 2023.
[42] Hadi Vafaii, Dekel Galor, and Jacob Yates. Poisson variational autoencoder. Advances in Neural Information Processing Systems, 37:44871–44906, 2024.
[43] Gido M Van de Ven, Hava T Siegelmann, and Andreas S Tolias. Brain-inspired replay for continual learning with artificial neural networks. Nature communications, 11(1):4069, 2020.
[44] Aaron Van Den Oord, Oriol Vinyals, et al. Neural discrete representation learning. Advances in neural information processing systems, 30, 2017.
[45] Srishti Yadav, Anshul Pundhir, Tanish Goyal, Balasubramanian Raman, and Sanjeev Kumar. Differentially private spiking variational autoencoder. In International Conference on Pattern Recognition, pages 96–112. Springer, 2025.
[46] Qiugang Zhan, Ran Tao, Xiurui Xie, Guisong Liu, Malu Zhang, Huajin Tang, and Yang Yang. Esvae: An efficient spiking variational autoencoder with reparameterizable poisson spiking sampling. arXiv preprint arXiv:2310.14839, 2023.
[47] He Zhao, Piyush Rai, Lan Du, Wray Buntine, Dinh Phung, and Mingyuan Zhou. Variational autoencoders for sparse and overdispersed discrete data. In International conference on artificial intelligence and statistics, pages 1684–1694. PMLR, 2020.
[48] Hanle Zheng, Yujie Wu, Lei Deng, Yifan Hu, and Guoqi Li. Going deeper with directly-trained larger spiking neural networks. In Proceedings of the AAAI conference on artificial intelligence, volume 35, pages 11062–11070, 2021.
[49] Mingyuan Zhou and Lawrence Carin. Negative binomial process count and mixture modeling. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(2):307–320, 2013.

A. Derivation of KL Term

We derive an analytical expression for the KL divergence between two Negative Binomial distributions under the assumption that t...
[50] Limit the maximum count value to $Z_{\max}$.
[51] Compute the log-probability for $z = 0, 1, \ldots, Z_{\max}$: $\log \mathrm{Poi}(z) = z \log \lambda - \lambda - \log \Gamma(z + 1)$.
[52] For each $z$, generate noise $\epsilon_z \sim \mathrm{Gumbel}(0, 1)$.
[53] Apply the Gumbel-Softmax trick with temperature $\tau$: $\tilde{z} = \sum_{z=0}^{Z_{\max}} z \cdot \mathrm{softmax}\!\left(\frac{\log \mathrm{Poi}(z) + \epsilon_z}{\tau}\right)$, where $\tau \to 0$ recovers discrete sampling. The proof of this reparameterization can be found in Jang et al. [14], and will not be repeated here.

(2) Continuous-Time Simulation. This method models Poisson processes with intensity $\lambda$ on $[0, 1]$ using exponentially di...
[54] Sample inter-arrival times from an exponential distribution: $\{s_i\}_{i=1}^{M} \sim \mathrm{Exponential}(\lambda)$, where $M$ is a sufficiently large integer. The exponential distribution is easily reparameterized, and PyTorch contains an implementation.
[55] Accumulate inter-arrival times: $S_n = \sum_{i=1}^{n} s_i$, $1 \le n \le M$.
[56] Soft count of events: $\tilde{z} = \sum_{n=1}^{M} \sigma\!\left(\frac{1 - S_n}{\tau}\right)$, where $\tau \to 0$ recovers discrete sampling.

Figure 4 (plot title: Continuous-Time vs Gumbel-Softmax; x-axis: Count; y-axis: Probability; legend: Continuous-Time, Gumbel-Softmax): Empirical distributions of Negative Binomial samples generated using Continuous-...
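
The two relaxations listed in entries [50] to [56] can be sketched in a few lines. This is a plain-Python illustration of the forward computation only, not the authors' implementation: the function names are hypothetical, the parameter values are arbitrary, and no gradients are computed (in an autodiff framework the same expressions would be differentiable with respect to λ).

```python
import math
import random


def _sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gumbel_softmax_count(lam, tau, z_max, rng):
    """Relaxed Poisson count: softmax-weighted average of z = 0..z_max."""
    scores = []
    for z in range(z_max + 1):
        log_poi = z * math.log(lam) - lam - math.lgamma(z + 1)
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep u strictly in (0, 1)
        gumbel = -math.log(-math.log(u))                # Gumbel(0, 1) noise
        scores.append((log_poi + gumbel) / tau)
    top = max(scores)                                   # subtract max for stability
    weights = [math.exp(s - top) for s in scores]
    norm = sum(weights)
    return sum(z * w for z, w in enumerate(weights)) / norm


def continuous_time_count(lam, tau, m, rng):
    """Relaxed Poisson count: soft number of arrival times S_n inside [0, 1]."""
    arrival = 0.0
    soft = 0.0
    for _ in range(m):
        arrival += rng.expovariate(lam)                 # s_i ~ Exponential(lam)
        soft += _sigmoid((1.0 - arrival) / tau)         # soft indicator of S_n <= 1
    return soft


if __name__ == "__main__":
    rng = random.Random(0)
    lam, tau, n = 4.0, 0.01, 2_000
    gs = sum(gumbel_softmax_count(lam, tau, 40, rng) for _ in range(n)) / n
    ct = sum(continuous_time_count(lam, tau, 50, rng) for _ in range(n)) / n
    print(f"mean relaxed count: Gumbel-Softmax {gs:.3f}, continuous-time {ct:.3f}")
```

At a small temperature both relaxed counts should average close to λ, as expected of Poisson samples. Raising `tau` makes the outputs smoother and less count-like, so the temperature trades gradient smoothness against fidelity to discrete sampling.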
[57] A lower DNR indicates a more active and effective latent space, with fewer collapsed units. Among all models, the Gaussian VAE exhibits an extremely high DNR of 0.998, suggesting that nearly all latent dimensions are inactive. The Poisson VAE (0.002) and Categorical VAE (0.0117) show moderate sparsity, while both the Laplace VAE and our NegBio-VAE ac...

[1] [1]

The handbook of brain theory and neural networks

Michael A Arbib. The handbook of brain theory and neural networks. MIT press, 2003

work page 2003

[2] [2]

Temporal precision of spike trains in extrastriate cortex of the behaving macaque monkey

Wyeth Bair and Christof Koch. Temporal precision of spike trains in extrastriate cortex of the behaving macaque monkey. Neural computation, 8(6):1185–1202, 1996

work page 1996

[3] [3]

Benna, Mattia Rigotti, Jérôme Munuera, Stefano Fusi, and C

Silvia Bernardi, Marcus K. Benna, Mattia Rigotti, Jérôme Munuera, Stefano Fusi, and C. Daniel Salzman. The geometry of abstraction in the hippocampus and prefrontal cortex. Cell, 183(4):954–967.e21, Nov 2020. doi: 10.1016/j.cell.2020.09.031

work page doi:10.1016/j.cell.2020.09.031 2020

[4] [4]

Spiking denoising diffusion probabilistic models

Jiahang Cao, Ziqing Wang, Hanzhong Guo, Hao Cheng, Qiang Zhang, and Renjing Xu. Spiking denoising diffusion probabilistic models. In Proceedings of the IEEE/CVF winter conference on applications of computer vision, pages 4912–4921, 2024

work page 2024

[5] [5]

Lisnn: Improving spiking neural networks with lateral interactions for robust object recognition

Xiang Cheng, Yunzhe Hao, Jiaming Xu, and Bo Xu. Lisnn: Improving spiking neural networks with lateral interactions for robust object recognition. In IJCAI, pages 1519–1525. Yokohama, 2020

work page 2020

[6] [6]

The mnist database of handwritten digit images for machine learning research

Li Deng. The mnist database of handwritten digit images for machine learning research. IEEE Signal Processing Magazine, 29(6):141–142, 2012. 9 A PREPRINT - AUGUST 29, 2025

work page 2012

[7] Yanming Di, Daniel W Schafer, Jason S Cumbie, and Jeff H Chang. The nbp negative binomial model for assessing differential gene expression from rna-seq. Statistical applications in genetics and molecular biology, 10(1), 2011

[8] Emilien Dupont. Learning disentangled joint continuous and discrete representations. Advances in neural information processing systems, 31, 2018

[9] Wei Fang, Zhaofei Yu, Yanqi Chen, Timothée Masquelier, Tiejun Huang, and Yonghong Tian. Incorporating learnable membrane time constant to enhance learning of spiking neural networks. In Proceedings of the IEEE/CVF international conference on computer vision, pages 2661–2671, 2021

[10] Linghao Feng, Dongcheng Zhao, and Yi Zeng. Spiking generative adversarial network with attention scoring decoding. Neural Networks, 178:106423, 2024

[11] Vincent Fortuin, Matthias Hüser, Francesco Locatello, Heiko Strathmann, and Gunnar Rätsch. Som-vae: Interpretable discrete representation learning on time series. In International Conference on Learning Representations, 2019

[12] Samanwoy Ghosh-Dastidar and Hojjat Adeli. Spiking neural networks. International journal of neural systems, 19(04):295–308, 2009

[13] Tim Gollisch and Markus Meister. Rapid neural coding in the retina with relative spike latencies. Science, 319(5866):1108–1111, 2008

[14] Eric Jang, Shixiang Gu, and Ben Poole. Categorical reparameterization with gumbel-softmax. In International Conference on Learning Representations, 2017

[15] Hiromichi Kamata, Yusuke Mukuta, and Tatsuya Harada. Fully spiking variational autoencoder. In Proceedings of the AAAI conference on artificial intelligence, volume 36, pages 7059–7067, 2022

[16] Jaivardhan Kapoor, Auguste Schulz, Julius Vetter, Felix Pei, Richard Gao, and Jakob H Macke. Latent diffusion for neural spiking data. Advances in Neural Information Processing Systems, 37:118119–118154, 2024

[17] Matthew T. Kaufman, Marcus K. Benna, Mattia Rigotti, Fabio Stefanini, Stefano Fusi, and Anne K. Churchland. The implications of categorical and category-free mixed selectivity on representational geometries. Current Opinion in Neurobiology, 77:102644, Dec 2022. doi: 10.1016/j.conb.2022.102644

[18] Diederik P Kingma and Max Welling. Auto-encoding variational bayes, 2014

[19] Diederik P Kingma, Max Welling, et al. Auto-encoding variational bayes, 2013

[20] Vineet Kotariya and Udayan Ganguly. Spiking-gan: A spiking generative adversarial network using time-to-first-spike coding. In 2022 International Joint Conference on Neural Networks (IJCNN), pages 1–7. IEEE, 2022

[21] Alex Krizhevsky. Learning multiple layers of features from tiny images. Technical report, 2009

[22] Brenden M Lake, Ruslan Salakhutdinov, and Joshua B Tenenbaum. Human-level concept learning through probabilistic program induction. Science, 350(6266):1332–1338, 2015

[23] Yann LeCun, Corinna Cortes, and CJ Burges. Mnist handwritten digit database. ATT Labs [Online]. Available: http://yann.lecun.com/exdb/mnist, 2, 2010

[24] Yuhang Li, Yufei Guo, Shanghang Zhang, Shikuang Deng, Yongqing Hai, and Shi Gu. Differentiable spike: Rethinking gradient-descent for training spiking neural networks. Advances in neural information processing systems, 34:23426–23439, 2021

[25] Mingxuan Liu, Jie Gan, Rui Wen, Tao Li, Yongli Chen, and Hong Chen. Spiking-diffusion: Vector quantized discrete diffusion model with spiking neural networks. In 2024 5th International Conference on Computer, Big Data and Artificial Intelligence (ICCBD+AI), pages 627–631. IEEE, 2024

[26] Zachary F Mainen and Terrence J Sejnowski. Reliability of spike timing in neocortical neurons. Science, 268(5216):1503–1506, 1995

[27] Joseph Marino. Predictive coding, variational autoencoders, and biological connections. Neural Computation, 34(1):1–44, 2022

[28] Dina Moshitch and Israel Nelken. Using tweedie distributions for fitting spike count data. Journal of neuroscience methods, 225:13–28, 2014

[29] Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, and Andrew Y. Ng. Reading digits in natural images with unsupervised feature learning. In NIPS Workshop on Deep Learning and Unsupervised Feature Learning, 2011. http://ufldl.stanford.edu/housenumbers/nips2011_housenumbers.pdf

[30] Donald H Perkel, George L Gerstein, and George P Moore. Neuronal spike trains and stochastic point processes: II. simultaneous spike trains. Biophysical journal, 7(4):419–440, 1967

[31] Jonathan Pillow and James Scott. Fully bayesian inference for neural models with negative-binomial spiking. Advances in neural information processing systems, 25, 2012

[32] Daniil Polykovskiy and Dmitry Vetrov. Deterministic decoding for discrete data in variational autoencoders. In International conference on artificial intelligence and statistics, pages 3046–3056. PMLR, 2020

[33] Mattia Rigotti, Omri Barak, Melissa R. Warden, Xiao-Jing Wang, Nathaniel D. Daw, Earl K. Miller, and Stefano Fusi. The importance of mixed selectivity in complex cognitive tasks. Nature, 497:585–590, May 2013. doi: 10.1038/nature12160

[34] Jason Tyler Rolfe. Discrete variational autoencoders. In International Conference on Learning Representations, 2017

[35] Bleema Rosenfeld, Osvaldo Simeone, and Bipin Rajendran. Spiking generative adversarial networks with a neural network discriminator: Local training, bayesian models, and continual meta-learning. IEEE Transactions on Computers, 71(11):2778–2791, 2022

[36] GJS Ross and DA Preece. The negative binomial distribution. Journal of the Royal Statistical Society: Series D (The Statistician), 34(3):323–335, 1985

[37] Ian H Stevenson. Flexible models for spike count data with both over-and under-dispersion. Journal of computational neuroscience, 41:29–43, 2016

[38] Katherine R Storrs, Barton L Anderson, and Roland W Fleming. Unsupervised learning predicts human perception and misperception of gloss. Nature Human Behaviour, 5(10):1402–1417, 2021

[39] Wahiba Taouali, Giacomo Benvenuti, Pascal Wallisch, Frédéric Chavane, and Laurent U Perrinet. Testing the odds of inherent vs. observed overdispersion in neural spike counts. Journal of neurophysiology, 115(1):434–444, 2016

[40] Amirhossein Tavanaei, Masoud Ghodrati, Saeed Reza Kheradpisheh, Timothée Masquelier, and Anthony Maida. Deep learning in spiking neural networks. Neural networks, 111:47–63, 2019

[41] Hadi Vafaii, Jacob Yates, and Daniel Butts. Hierarchical vaes provide a normative account of motion processing in the primate brain. Advances in Neural Information Processing Systems, 36:46152–46190, 2023

[42] Hadi Vafaii, Dekel Galor, and Jacob Yates. Poisson variational autoencoder. Advances in Neural Information Processing Systems, 37:44871–44906, 2024

[43] Gido M Van de Ven, Hava T Siegelmann, and Andreas S Tolias. Brain-inspired replay for continual learning with artificial neural networks. Nature communications, 11(1):4069, 2020

[44] Aaron Van Den Oord, Oriol Vinyals, et al. Neural discrete representation learning. Advances in neural information processing systems, 30, 2017

[45] Srishti Yadav, Anshul Pundhir, Tanish Goyal, Balasubramanian Raman, and Sanjeev Kumar. Differentially private spiking variational autoencoder. In International Conference on Pattern Recognition, pages 96–112. Springer, 2025

[46] Qiugang Zhan, Ran Tao, Xiurui Xie, Guisong Liu, Malu Zhang, Huajin Tang, and Yang Yang. Esvae: An efficient spiking variational autoencoder with reparameterizable poisson spiking sampling. arXiv preprint arXiv:2310.14839, 2023

[47] He Zhao, Piyush Rai, Lan Du, Wray Buntine, Dinh Phung, and Mingyuan Zhou. Variational autoencoders for sparse and overdispersed discrete data. In International conference on artificial intelligence and statistics, pages 1684–1694. PMLR, 2020

[48] Hanle Zheng, Yujie Wu, Lei Deng, Yifan Hu, and Guoqi Li. Going deeper with directly-trained larger spiking neural networks. In Proceedings of the AAAI conference on artificial intelligence, volume 35, pages 11062–11070, 2021

[49] Mingyuan Zhou and Lawrence Carin. Negative binomial process count and mixture modeling. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(2):307–320, 2013.

A Derivation of KL Term

We derive an analytical expression for the KL divergence between two Negative Binomial distributions under the assumption that t...

1. Limit the maximum count value to $Z_{\max}$.

2. Compute the log-probability for $z = 0, 1, \ldots, Z_{\max}$:
$$\log \mathrm{Poi}(z) = z \log \lambda - \lambda - \log \Gamma(z + 1).$$

3. For each $z$, generate noise $\epsilon_z \sim \mathrm{Gumbel}(0, 1)$.

4. Apply the Gumbel-Softmax trick with temperature $\tau$:
$$\tilde{z} = \sum_{z=0}^{Z_{\max}} z \cdot \mathrm{softmax}\!\left(\frac{\log \mathrm{Poi}(z) + \epsilon_z}{\tau}\right),$$
where $\tau \to 0$ recovers discrete sampling. The proof of this reparameterization can be found in Jang et al. [14], and will not be repeated here.

(2) Continuous-Time Simulation

This method models Poisson processes with intensity $\lambda$ on $[0, 1]$ using exponentially di...


1. Sample inter-arrival times from an exponential distribution: $\{s_i\}_{i=1}^{M} \sim \mathrm{Exponential}(\lambda)$, where $M$ is a sufficiently large integer. The exponential distribution is easily reparameterized, and PyTorch contains an implementation.

2. Accumulate inter-arrival times: $S_n = \sum_{i=1}^{n} s_i$, $1 \le n \le M$.

3. Soft count of events: $\tilde{z} = \sum_{n=1}^{M} \sigma\!\left(\frac{1 - S_n}{\tau}\right)$, where $\tau \to 0$ recovers discrete sampling.

[Figure 4 plot: "Continuous-Time vs Gumbel-Softmax"; x-axis: Count; y-axis: Probability; series: Continuous-Time, Gumbel-Softmax.]

Figure 4: Empirical distributions of Negative Binomial samples generated using Continuous-...
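As an illustration of the two relaxed Poisson samplers listed in this appendix (Gumbel-Softmax over a truncated support, and continuous-time simulation with a soft event count), the following is a minimal sketch using only the Python standard library. The function names, the default values of `z_max`, `m`, and `tau`, and the clamping of the uniform draw are assumptions made for this example and are not taken from the paper. Plain floats carry no gradients; in an autodiff framework the same operations would be differentiable with respect to λ. The sketch covers only the Poisson steps as written, not the Negative Binomial sampling shown in Figure 4.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gumbel_softmax_poisson(lam, z_max=50, tau=0.1, rng=random):
    """Relaxed Poisson(lam) sample on the truncated support 0..z_max (hypothetical helper)."""
    # log Poi(z) = z log(lam) - lam - log Gamma(z + 1)
    log_p = [z * math.log(lam) - lam - math.lgamma(z + 1) for z in range(z_max + 1)]
    logits = []
    for lp in log_p:
        # Gumbel(0, 1) noise via -log(-log(U)); U is clamped away from 0 and 1 (assumption).
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        logits.append((lp - math.log(-math.log(u))) / tau)
    # Softmax with the usual max-subtraction for stability.
    top = max(logits)
    weights = [math.exp(v - top) for v in logits]
    total = sum(weights)
    # Weighted average of the support points gives the soft count.
    return sum(z * w for z, w in enumerate(weights)) / total


def continuous_time_poisson(lam, m=200, tau=0.01, rng=random):
    """Relaxed Poisson(lam) sample from m exponential inter-arrival times (hypothetical helper)."""
    arrival = 0.0
    soft_count = 0.0
    for _ in range(m):
        arrival += rng.expovariate(lam)               # s_i ~ Exponential(rate lam)
        soft_count += sigmoid((1.0 - arrival) / tau)  # soft indicator of S_n <= 1
    return soft_count


if __name__ == "__main__":
    rng = random.Random(0)
    gs = [gumbel_softmax_poisson(5.0, rng=rng) for _ in range(1000)]
    ct = [continuous_time_poisson(5.0, rng=rng) for _ in range(1000)]
    print("Gumbel-Softmax mean:", sum(gs) / len(gs))
    print("Continuous-time mean:", sum(ct) / len(ct))
```

Both sample means should land near λ when `tau` is small, `z_max` is well above λ, and `m` is large enough that the accumulated arrival times pass 1 with near certainty. Larger `tau` values smooth the samples further away from integer counts.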


A lower DNR indicates a more active and effective latent space, with fewer collapsed units. Among all models, the Gaussian VAE exhibits an extremely high DNR of 0.998, suggesting that nearly all latent dimensions are inactive. The Poisson VAE (0.002) and Categorical VAE (0.0117) show moderate sparsity, while both the Laplace VAE and our NegBio-VAE ac...