pith. machine review for the scientific record.

arxiv: 2604.03779 · v1 · submitted 2026-04-04 · 💻 cs.LG · cs.AI

Recognition: no theorem link

CountsDiff: A Diffusion Model on the Natural Numbers for Generation and Imputation of Count-Based Data

Authors on Pith: no claims yet

Pith reviewed 2026-05-13 18:11 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords diffusion models · count data · natural numbers · RNA-seq imputation · generative modeling · discrete diffusion · single-cell data · blackout diffusion

The pith

CountsDiff reparameterizes blackout diffusion with a survival schedule to generate and impute count data on the natural numbers, matching state-of-the-art performance on images and RNA-seq tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents CountsDiff as a diffusion framework built specifically for data whose values are non-negative integers. It simplifies an earlier blackout diffusion approach by defining the process directly through a survival probability schedule and an explicit loss-weighting term, then layers on continuous-time training, classifier-free guidance, and reverse dynamics that permit non-monotone paths. The authors test a basic version of the model on CIFAR-10 and CelebA images plus fetal and heart single-cell RNA-seq atlases. Even without extensive tuning, this version equals or exceeds leading discrete generative models and current RNA-seq imputation techniques, showing that count-valued data can be handled natively by diffusion rather than through transformations or discretization.

Core claim

CountsDiff extends the Blackout diffusion framework by simplifying its formulation through a direct parameterization in terms of a survival probability schedule and an explicit loss weighting. This introduces flexibility through design parameters with direct analogues in existing diffusion modeling frameworks. Beyond this reparameterization, CountsDiff introduces features from modern diffusion models, previously absent in counts-based domains, including continuous-time training, classifier-free guidance, and churn/remasking reverse dynamics that allow non-monotone reverse trajectories. On natural image datasets and single-cell RNA-seq imputation tasks, even a simple instantiation matches or surpasses the performance of a state-of-the-art discrete generative model and leading RNA-seq imputation methods.

What carries the argument

The survival probability schedule together with the explicit loss weighting, which jointly parameterize the forward and reverse processes on the natural numbers and support continuous-time sampling plus classifier-free guidance.
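A minimal sketch of what such a parameterization can look like, assuming the forward marginal is binomial thinning of x0 under a survival probability p(t). The cosine shape and every name here are illustrative stand-ins, not the paper's code:

```python
import numpy as np

def cosine_p_schedule(t):
    """Survival probability p(t): the expected fraction of unit counts that
    survive to time t. Cosine shape by analogy with cosine noise schedules
    in continuous diffusion: p(0) = 1 (clean data), p(1) ~ 0 (blackout)."""
    return np.cos(0.5 * np.pi * np.asarray(t)) ** 2

def forward_corrupt(x0, t, rng):
    """Sample x_t ~ Binomial(x0, p(t)): each of the x0 unit counts survives
    independently with probability p(t) -- a pure death process marginal."""
    return rng.binomial(x0, cosine_p_schedule(t))

rng = np.random.default_rng(0)
x0 = np.array([10, 3, 0, 25])          # count-valued observations
x_mid = forward_corrupt(x0, 0.5, rng)  # partially corrupted
x_end = forward_corrupt(x0, 1.0, rng)  # fully blacked out (all zeros)
```

The point of the reparameterization is that p(t) and a per-time loss weight are explicit, tunable objects, analogous to noise schedules and SNR weighting in Gaussian diffusion.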

If this is right

  • Count-valued observations can be generated or completed directly by diffusion without first mapping them to continuous or token spaces.
  • Biological count assays such as single-cell gene expression can be imputed at accuracy levels competitive with specialized methods.
  • Design choices such as the survival schedule become tunable in the same manner as noise schedules in continuous diffusion models.
  • Reverse trajectories in discrete count domains can become non-monotone, allowing the model to revisit earlier states during sampling.
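On the guidance point: classifier-free guidance transfers to count diffusion by combining conditional and unconditional model outputs in the standard way. A hedged sketch operating on predicted clean counts (the non-negativity clipping is an illustrative choice, not necessarily the paper's):

```python
import numpy as np

def cfg_combine(x0_cond, x0_uncond, guidance_scale):
    """Standard classifier-free guidance: extrapolate from the unconditional
    prediction toward the conditional one. Clipping keeps the combined
    estimate of clean counts non-negative."""
    guided = x0_uncond + guidance_scale * (x0_cond - x0_uncond)
    return np.clip(guided, 0.0, None)

cond = np.array([4.0, 0.5, 9.0])    # model output given the class label
uncond = np.array([3.0, 1.0, 7.0])  # model output with the label dropped
guided = cfg_combine(cond, uncond, 2.0)  # scale 2.0, as in the paper's Figure 3
```

Scale 1.0 recovers the conditional prediction and 0.0 the unconditional one; scales above 1.0 push past the conditional estimate, which is what the paper's guided CIFAR-10 samples exercise.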

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same survival-schedule machinery could be applied to other ordinal count domains such as population statistics or event frequencies with little change to the core code.
  • Further performance gains are likely if the survival schedule is optimized per dataset rather than held fixed across domains.
  • Because the formulation already supports classifier-free guidance, conditional generation tasks like class- or covariate-conditioned count synthesis become immediately feasible.
  • Hybrid models that combine CountsDiff with existing discrete diffusion techniques may reduce the remaining performance gap to continuous diffusion on mixed data types.

Load-bearing premise

The reparameterization and added features generalize beyond the tested image and RNA-seq datasets without requiring extensive per-domain tuning of the survival schedule and loss weights.

What would settle it

A head-to-head test on an unseen count dataset, such as word-frequency or daily traffic counts, would settle it: if the simple CountsDiff instantiation falls below the best discrete baseline after only minimal schedule adjustment, the claim of broad applicability is falsified.

Figures

Figures reproduced from arXiv: 2604.03779 by Anders Hoel, Caroline Uhler, Greycen Ren, Maria Skoularidou, Nikolaos P. Daskalakis, Renzo G. Soatto, Shorna Alam, Stephen Bates.

Figure 1
Figure 1. Visualization of CountsDiff’s forward corruption process (top) and reverse sampling process (bottom). The top diagram depicts the progression of a p-schedule, a pure death process. The bottom shows a single step of the generalized birth-death sampling process, with the distribution of x_s defined in equation (4): x_s = n_t + b_t, where n_t ∼ Bin(x_t, 1 − σ_{t,s}) and b_t ∼ Bin(x_0 − x_t, β_{t,s}) (8). See Appendix B.9 for a proof of this pro… view at source ↗
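The caption's sampling distribution can be turned into a one-step sampler directly. A sketch assuming σ_{t,s} and β_{t,s} are given and a model estimate x̂0 stands in for x0 at sampling time (all names illustrative):

```python
import numpy as np

def reverse_step(x_t, x0_hat, sigma_ts, beta_ts, rng):
    """One generalized birth-death step, x_s = n_t + b_t, per the caption:
       n_t ~ Bin(x_t, 1 - sigma_ts)        # current counts that survive
       b_t ~ Bin(x0_hat - x_t, beta_ts)    # previously-removed counts reborn
    sigma_ts > 0 permits decrements, which is what makes the non-monotone
    (churn) trajectories possible; x0_hat is the model's estimate of x0."""
    n_t = rng.binomial(x_t, 1.0 - sigma_ts)
    b_t = rng.binomial(np.maximum(x0_hat - x_t, 0), beta_ts)
    return n_t + b_t

rng = np.random.default_rng(1)
x_t = np.array([2, 0, 5])
x0_hat = np.array([8, 3, 5])
x_s = reverse_step(x_t, x0_hat, sigma_ts=0.1, beta_ts=0.5, rng=rng)
```

With sigma_ts = 0 this reduces to a pure birth (blackout-style) reverse step; the churn comes entirely from letting survival of current counts be imperfect.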
Figure 3
Figure 3. 5 images guided by each CIFAR-10 class sampled from CountsDiff with guidance scale 2.0 and η_rescale = 0.005. Decrement manifests as smoothing; taken to the extreme, we see dramatic oversmoothing, which results in a complete removal of texture and, eventually, perspective as η_rescale → 1. See [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗
Figure 4
Figure 4. Nine images drawn from CountsDiff trained on CIFAR-10 with η_rescale attrition schedule for varying levels of η_rescale. 5.2.1 Quantitative results: We find both moderate guidance and small, nonzero attrition to improve the FID and IS of samples generated by CountsDiff. Generalizing the FI p-schedule from Santos et al. (2023) to continuous time improves FID. Across hyperparameters, FI noise schedule general… view at source ↗
Figure 2
Figure 2. Histogram of model-generated samples versus ground truth and distributional distance metrics (top); variance statistics (bottom) for a subset of dimensions. Existing diffusion models exhibit failure cases even in a simple toy dataset: Gaussian diffusion suffers from mode collapse, while masked diffusion overfits outliers (inflated variance). Full results for all ten dimensions can be found in Appendix E.1… view at source ↗
Figure 5
Figure 5. Converted p-schedule from Blackout Diffusion (see B.2) versus cosine p-schedule (B.7). Comparing the proposed p-schedule and weighting with Blackout Diffusion: as was the case with early linear noise schedules in Gaussian diffusion, the exponential p-schedule described in Blackout Diffusion has potentially undesirable properties near 0 and 1, where p(t) is almost completely flat. The cosine schedule, on the other hand… view at source ↗
Figure 6
Figure 6. Weights from Blackout Diffusion versus proposed p-schedule. The rate of an instantaneous transition from i to i + 1 is R^(rev)_{i,i+1}(s) = R^(fw)_{i+1,i}(t) · q(x_s = i + 1 | x_0) / q(x_s = i | x_0) = (i + 1)μ(s) · q(x_s = i + 1 | x_0) / q(x_s = i | x_0). [PITH_FULL_IMAGE:figures/full_fig_p020_6.png] view at source ↗
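The caption's rate formula can be checked numerically. Assuming the marginal q(x_s = k | x_0) is Binomial(x_0, p), as in a pure death process, the likelihood ratio collapses to a closed form; a scalar mu stands in for μ(s) here, and none of this is the paper's code:

```python
from math import comb

def reverse_birth_rate(i, x0, p, mu):
    """R^(rev)_{i,i+1} = (i + 1) * mu * q(i+1 | x0) / q(i | x0), with the
    binomial marginal q(k | x0) = C(x0, k) * p**k * (1 - p)**(x0 - k)."""
    q = lambda k: comb(x0, k) * p**k * (1 - p) ** (x0 - k)
    return (i + 1) * mu * q(i + 1) / q(i)

# The binomial ratio simplifies: (i + 1) * C(x0, i+1) / C(x0, i) = x0 - i,
# so R^(rev)_{i,i+1} = mu * (x0 - i) * p / (1 - p).
i, x0, p, mu = 3, 10, 0.4, 1.5
direct = reverse_birth_rate(i, x0, p, mu)
closed = mu * (x0 - i) * p / (1 - p)
```

The closed form makes the intuition visible: the birth rate is proportional to x_0 − i, the number of counts still missing relative to the clean value.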
Figure 7
Figure 7. Histograms of marginals of dimensions 2–9. [PITH_FULL_IMAGE:figures/full_fig_p032_7.png] view at source ↗
Figure 8
Figure 8. Joint Kernel Density Estimate (KDE) plots between dimensions 0 and 1 of real data (blue contours) versus model-generated samples (red dashed contours). Gaussian diffusion suffers from mode collapse, resulting in a much tighter, less diverse distribution. Masked diffusion exhibits a broader, more diffuse distribution with 'leaked' probability mass and slightly less correlation between the dimensions, indica… view at source ↗
Figure 9
Figure 9. Comparison of Binomial sampling with standard rounding, Poisson with no rounding, and Binomial sampling with stochastic rounding across two different counts regimes. [PITH_FULL_IMAGE:figures/full_fig_p034_9.png] view at source ↗
Figure 10
Figure 10. Train losses and validation metrics throughout training. [PITH_FULL_IMAGE:figures/full_fig_p035_10.png] view at source ↗
Figure 11
Figure 11. 25 images drawn from CountsDiff trained on CelebA with increasing η_rescale. [PITH_FULL_IMAGE:figures/full_fig_p036_11.png] view at source ↗
Original abstract

Diffusion models have excelled at generative tasks for both continuous and token-based domains, but their application to discrete ordinal data remains underdeveloped. We present CountsDiff, a diffusion framework designed to natively model distributions on the natural numbers. CountsDiff extends the Blackout diffusion framework by simplifying its formulation through a direct parameterization in terms of a survival probability schedule and an explicit loss weighting. This introduces flexibility through design parameters with direct analogues in existing diffusion modeling frameworks. Beyond this reparameterization, CountsDiff introduces features from modern diffusion models, previously absent in counts-based domains, including continuous-time training, classifier-free guidance, and churn/remasking reverse dynamics that allow non-monotone reverse trajectories. We propose an initial instantiation of CountsDiff and validate it on natural image datasets (CIFAR-10, CelebA), exploring the effects of varying the introduced design parameters in a complex, well-studied, and interpretable data domain. We then highlight biological count assays as a natural use case, evaluating CountsDiff on single-cell RNA-seq imputation in a fetal cell and heart cell atlas. Remarkably, we find that even this simple instantiation matches or surpasses the performance of a state-of-the-art discrete generative model and leading RNA-seq imputation methods, while leaving substantial headroom for further gains through optimized design choices in future work.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces CountsDiff, a diffusion framework for distributions on the natural numbers. It reparameterizes Blackout diffusion via a survival probability schedule and explicit loss weighting to introduce flexibility with direct analogues in standard diffusion models, then adds continuous-time training, classifier-free guidance, and churn/remasking reverse dynamics. The work validates an initial instantiation on CIFAR-10 and CelebA to explore design parameters, then applies the same choices to single-cell RNA-seq imputation on fetal-cell and heart-cell atlases, claiming that even this simple version matches or surpasses state-of-the-art discrete generative models and leading RNA-seq imputation methods.

Significance. If the performance claims are substantiated, the reparameterization supplies a flexible, interpretable route to adapt diffusion models to ordinal count data while importing modern techniques (continuous-time training, guidance, non-monotone trajectories) that have been absent from count-based domains. The biological application is a natural fit given the zero-inflated, heavy-tailed character of scRNA-seq, and the explicit design parameters open a clear path for future per-domain optimization.

major comments (2)
  1. Abstract: the central claim that 'even this simple instantiation matches or surpasses the performance of a state-of-the-art discrete generative model and leading RNA-seq imputation methods' is presented without any quantitative tables, error bars, ablation details, or baseline numbers, which is load-bearing for evaluating whether the reparameterization and added features actually deliver the reported gains on either image or count data.
  2. Experimental section on RNA-seq (fetal-cell and heart-cell atlases): the survival probability schedule and loss-weighting coefficients are transferred unchanged from the CIFAR-10/CelebA experiments; no ablation or sensitivity analysis is reported for these choices under zero-inflated count statistics, leaving the generalization claim vulnerable to the possibility that performance is an artifact of the particular datasets rather than evidence of robust transfer.
minor comments (2)
  1. Notation and §3: provide an explicit side-by-side comparison of the new survival-probability parameterization against the original Blackout diffusion formulation so readers can verify the claimed simplification and the direct analogues to existing diffusion schedules.
  2. Figures and results: any sample-generation or imputation figures should include side-by-side quantitative metrics (e.g., FID, imputation error, or log-likelihood) against the cited SOTA baselines rather than qualitative visuals alone.

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper defines CountsDiff via an explicit reparameterization of the Blackout diffusion framework using a survival probability schedule and loss weighting, presented as design choices with direct analogues in existing diffusion models. This reparameterization is introduced for added flexibility rather than as a self-referential prediction. Performance claims rest on empirical evaluation across CIFAR-10, CelebA, and scRNA-seq datasets, with no equations or steps reducing reported results to inputs fitted from the same data by construction. No load-bearing self-citations, uniqueness theorems, or ansatzes imported from prior author work are evident in the derivation; the model extensions (continuous-time training, classifier-free guidance, churn/remasking) are described as additions from modern diffusion literature. The central claims therefore remain independent of the inputs.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

The framework rests on the standard diffusion Markov chain assumptions plus the choice of a survival probability schedule that must be specified by the user; no new physical entities are postulated.

free parameters (2)
  • survival probability schedule
    User-specified schedule that controls the forward noising process; its functional form and parameters are design choices analogous to noise schedules in continuous diffusion.
  • loss weighting coefficients
    Explicit weighting terms introduced in the simplified loss; their values are free parameters chosen during training.
axioms (1)
  • domain assumption
    The forward process remains a valid Markov chain on the natural numbers when parameterized by survival probabilities.
    Invoked when extending Blackout diffusion to the new parameterization.

pith-pipeline@v0.9.0 · 5567 in / 1263 out tokens · 31907 ms · 2026-05-13T18:11:02.686346+00:00 · methodology

discussion (0)

