pith. machine review for the scientific record.

arxiv: 2604.03779 · v1 · submitted 2026-04-04 · 💻 cs.LG · cs.AI

Recognition: no theorem link

CountsDiff: A Diffusion Model on the Natural Numbers for Generation and Imputation of Count-Based Data

Authors on Pith: no claims yet

Pith reviewed 2026-05-13 18:11 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords diffusion models · count data · natural numbers · RNA-seq imputation · generative modeling · discrete diffusion · single-cell data · blackout diffusion

The pith

CountsDiff reparameterizes blackout diffusion with a survival schedule to generate and impute count data on the natural numbers, matching state-of-the-art performance on images and RNA-seq tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents CountsDiff as a diffusion framework built specifically for data whose values are non-negative integers. It simplifies an earlier blackout diffusion approach by defining the process directly through a survival probability schedule and an explicit loss-weighting term, then layers on continuous-time training, classifier-free guidance, and reverse dynamics that permit non-monotone paths. The authors test a basic version of the model on CIFAR-10 and CelebA images plus fetal and heart single-cell RNA-seq atlases. Even without extensive tuning, this version equals or exceeds leading discrete generative models and current RNA-seq imputation techniques, showing that count-valued data can be handled natively by diffusion rather than through transformations or discretization.

Core claim

CountsDiff extends the Blackout diffusion framework by simplifying its formulation through a direct parameterization in terms of a survival probability schedule and an explicit loss weighting. This introduces flexibility through design parameters with direct analogues in existing diffusion modeling frameworks. Beyond this reparameterization, CountsDiff introduces features from modern diffusion models, previously absent in counts-based domains, including continuous-time training, classifier-free guidance, and churn/remasking reverse dynamics that allow non-monotone reverse trajectories. On natural image datasets and single-cell RNA-seq imputation tasks, even a simple instantiation matches or surpasses the performance of a state-of-the-art discrete generative model and leading RNA-seq imputation methods.

What carries the argument

The survival probability schedule together with the explicit loss weighting, which jointly parameterize the forward and reverse processes on the natural numbers and support continuous-time sampling plus classifier-free guidance.
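A minimal sketch of what such a parameterization can look like, assuming the forward marginal is binomial thinning of x0 under a survival probability p(t). The cosine shape and every name here are illustrative stand-ins, not the paper's code:

```python
import numpy as np

def cosine_p_schedule(t):
    """Survival probability p(t): the expected fraction of unit counts that
    survive to time t. Cosine shape by analogy with cosine noise schedules
    in continuous diffusion: p(0) = 1 (clean data), p(1) ~ 0 (blackout)."""
    return np.cos(0.5 * np.pi * np.asarray(t)) ** 2

def forward_corrupt(x0, t, rng):
    """Sample x_t ~ Binomial(x0, p(t)): each of the x0 unit counts survives
    independently with probability p(t) -- a pure death process marginal."""
    return rng.binomial(x0, cosine_p_schedule(t))

rng = np.random.default_rng(0)
x0 = np.array([10, 3, 0, 25])          # count-valued observations
x_mid = forward_corrupt(x0, 0.5, rng)  # partially corrupted
x_end = forward_corrupt(x0, 1.0, rng)  # fully blacked out (all zeros)
```

The point of the reparameterization is that p(t) and a per-time loss weight are explicit, tunable objects, analogous to noise schedules and SNR weighting in Gaussian diffusion.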

If this is right

  • Count-valued observations can be generated or completed directly by diffusion without first mapping them to continuous or token spaces.
  • Biological count assays such as single-cell gene expression can be imputed at accuracy levels competitive with specialized methods.
  • Design choices such as the survival schedule become tunable in the same manner as noise schedules in continuous diffusion models.
  • Reverse trajectories in discrete count domains can become non-monotone, allowing the model to revisit earlier states during sampling.
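On the guidance point: classifier-free guidance transfers to count diffusion by combining conditional and unconditional model outputs in the standard way. A hedged sketch operating on predicted clean counts (the non-negativity clipping is an illustrative choice, not necessarily the paper's):

```python
import numpy as np

def cfg_combine(x0_cond, x0_uncond, guidance_scale):
    """Standard classifier-free guidance: extrapolate from the unconditional
    prediction toward the conditional one. Clipping keeps the combined
    estimate of clean counts non-negative."""
    guided = x0_uncond + guidance_scale * (x0_cond - x0_uncond)
    return np.clip(guided, 0.0, None)

cond = np.array([4.0, 0.5, 9.0])    # model output given the class label
uncond = np.array([3.0, 1.0, 7.0])  # model output with the label dropped
guided = cfg_combine(cond, uncond, 2.0)  # scale 2.0, as in the paper's Figure 3
```

Scale 1.0 recovers the conditional prediction and 0.0 the unconditional one; scales above 1.0 push past the conditional estimate, which is what the paper's guided CIFAR-10 samples exercise.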

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same survival-schedule machinery could be applied to other ordinal count domains such as population statistics or event frequencies with little change to the core code.
  • Further performance gains are likely if the survival schedule is optimized per dataset rather than held fixed across domains.
  • Because the formulation already supports classifier-free guidance, conditional generation tasks like class- or covariate-conditioned count synthesis become immediately feasible.
  • Hybrid models that combine CountsDiff with existing discrete diffusion techniques may reduce the remaining performance gap to continuous diffusion on mixed data types.

Load-bearing premise

The reparameterization and added features generalize beyond the tested image and RNA-seq datasets without requiring extensive per-domain tuning of the survival schedule and loss weights.

What would settle it

A head-to-head test on an unseen count dataset, such as word-frequency or daily traffic counts, would settle it: if the simple CountsDiff instantiation falls below the best discrete baseline after only minimal schedule adjustment, the claim of broad applicability is falsified.

Figures

Figures reproduced from arXiv: 2604.03779 by Anders Hoel, Caroline Uhler, Greycen Ren, Maria Skoularidou, Nikolaos P. Daskalakis, Renzo G. Soatto, Shorna Alam, Stephen Bates.

Figure 1
Figure 1. Visualization of CountsDiff’s forward corruption process (top) and reverse sampling process (bottom). The top diagram depicts the progression of a p-schedule, a pure death process. The bottom shows a single step of the generalized birth-death sampling process, with the distribution of x_s defined in equation (4): x_s = n_t + b_t, where n_t ∼ Bin(x_t, 1 − σ_{t,s}) and b_t ∼ Bin(x_0 − x_t, β_{t,s}) (8). See Appendix B.9 for a proof of this pro… view at source ↗
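The caption's sampling distribution can be turned into a one-step sampler directly. A sketch assuming σ_{t,s} and β_{t,s} are given and a model estimate x̂0 stands in for x0 at sampling time (all names illustrative):

```python
import numpy as np

def reverse_step(x_t, x0_hat, sigma_ts, beta_ts, rng):
    """One generalized birth-death step, x_s = n_t + b_t, per the caption:
       n_t ~ Bin(x_t, 1 - sigma_ts)        # current counts that survive
       b_t ~ Bin(x0_hat - x_t, beta_ts)    # previously-removed counts reborn
    sigma_ts > 0 permits decrements, which is what makes the non-monotone
    (churn) trajectories possible; x0_hat is the model's estimate of x0."""
    n_t = rng.binomial(x_t, 1.0 - sigma_ts)
    b_t = rng.binomial(np.maximum(x0_hat - x_t, 0), beta_ts)
    return n_t + b_t

rng = np.random.default_rng(1)
x_t = np.array([2, 0, 5])
x0_hat = np.array([8, 3, 5])
x_s = reverse_step(x_t, x0_hat, sigma_ts=0.1, beta_ts=0.5, rng=rng)
```

With sigma_ts = 0 this reduces to a pure birth (blackout-style) reverse step; the churn comes entirely from letting survival of current counts be imperfect.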
Figure 3
Figure 3. 5 images guided by each CIFAR-10 class sampled from CountsDiff with guidance scale 2.0 and η_rescale = 0.005. Decrement manifests as smoothing; taken to the extreme, we see dramatic oversmoothing, which results in a complete removal of texture and, eventually, perspective as η_rescale → 1. See [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗
Figure 4
Figure 4. Nine images drawn from CountsDiff trained on CIFAR-10 with η_rescale attrition schedule for varying levels of η_rescale. 5.2.1 Quantitative results: We find both moderate guidance and small, nonzero attrition to improve the FID and IS of samples generated by CountsDiff. Generalizing the FI p-schedule from Santos et al. (2023) to continuous time improves FID. Across hyperparameters, FI noise schedule general… view at source ↗
Figure 2
Figure 2. Histogram of model-generated samples versus ground truth and distributional distance metrics (top); variance statistics (bottom) for a subset of dimensions. Existing diffusion models exhibit failure cases even in a simple toy dataset: Gaussian diffusion suffers from mode collapse, while masked diffusion overfits outliers (inflated variance). Full results for all ten dimensions can be found in Appendix E.1… view at source ↗
Figure 5
Figure 5. Converted p-schedule from Blackout Diffusion (see B.2) versus cosine p-schedule (B.7). Comparing the proposed p-schedule and weighting with Blackout Diffusion: as was the case with early linear noise schedules in Gaussian diffusion, the exponential p-schedule described in Blackout Diffusion has potentially undesirable properties near 0 and 1, where p(t) is almost completely flat. The cosine schedule, on the other hand… view at source ↗
Figure 6
Figure 6. Weights from Blackout Diffusion versus proposed p-schedule. The rate of an instantaneous transition from i to i + 1 is R^(rev)_{i,i+1}(s) = R^(fw)_{i+1,i}(t) · q(x_s = i + 1 | x_0) / q(x_s = i | x_0) = (i + 1)μ(s) · q(x_s = i + 1 | x_0) / q(x_s = i | x_0). [PITH_FULL_IMAGE:figures/full_fig_p020_6.png] view at source ↗
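The caption's rate formula can be checked numerically. Assuming the marginal q(x_s = k | x_0) is Binomial(x_0, p), as in a pure death process, the likelihood ratio collapses to a closed form; a scalar mu stands in for μ(s) here, and none of this is the paper's code:

```python
from math import comb

def reverse_birth_rate(i, x0, p, mu):
    """R^(rev)_{i,i+1} = (i + 1) * mu * q(i+1 | x0) / q(i | x0), with the
    binomial marginal q(k | x0) = C(x0, k) * p**k * (1 - p)**(x0 - k)."""
    q = lambda k: comb(x0, k) * p**k * (1 - p) ** (x0 - k)
    return (i + 1) * mu * q(i + 1) / q(i)

# The binomial ratio simplifies: (i + 1) * C(x0, i+1) / C(x0, i) = x0 - i,
# so R^(rev)_{i,i+1} = mu * (x0 - i) * p / (1 - p).
i, x0, p, mu = 3, 10, 0.4, 1.5
direct = reverse_birth_rate(i, x0, p, mu)
closed = mu * (x0 - i) * p / (1 - p)
```

The closed form makes the intuition visible: the birth rate is proportional to x_0 − i, the number of counts still missing relative to the clean value.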
Figure 7
Figure 7. Histograms of marginals of dimensions 2–9. [PITH_FULL_IMAGE:figures/full_fig_p032_7.png] view at source ↗
Figure 8
Figure 8. Joint Kernel Density Estimate (KDE) plots between dimensions 0 and 1 of real data (blue contours) versus model-generated samples (red dashed contours). Gaussian diffusion suffers from mode collapse, resulting in a much tighter, less diverse distribution. Masked diffusion exhibits a broader, more diffuse distribution with 'leaked' probability mass and slightly less correlation between the dimensions, indica… view at source ↗
Figure 9
Figure 9. Comparison of Binomial sampling with standard rounding, Poisson with no rounding, and Binomial sampling with stochastic rounding across two different counts regimes. [PITH_FULL_IMAGE:figures/full_fig_p034_9.png] view at source ↗
Figure 10
Figure 10. Train losses and validation metrics throughout training. [PITH_FULL_IMAGE:figures/full_fig_p035_10.png] view at source ↗
Figure 11
Figure 11. 25 images drawn from CountsDiff trained on CelebA with increasing η_rescale. [PITH_FULL_IMAGE:figures/full_fig_p036_11.png] view at source ↗
Original abstract

Diffusion models have excelled at generative tasks for both continuous and token-based domains, but their application to discrete ordinal data remains underdeveloped. We present CountsDiff, a diffusion framework designed to natively model distributions on the natural numbers. CountsDiff extends the Blackout diffusion framework by simplifying its formulation through a direct parameterization in terms of a survival probability schedule and an explicit loss weighting. This introduces flexibility through design parameters with direct analogues in existing diffusion modeling frameworks. Beyond this reparameterization, CountsDiff introduces features from modern diffusion models, previously absent in counts-based domains, including continuous-time training, classifier-free guidance, and churn/remasking reverse dynamics that allow non-monotone reverse trajectories. We propose an initial instantiation of CountsDiff and validate it on natural image datasets (CIFAR-10, CelebA), exploring the effects of varying the introduced design parameters in a complex, well-studied, and interpretable data domain. We then highlight biological count assays as a natural use case, evaluating CountsDiff on single-cell RNA-seq imputation in a fetal cell and heart cell atlas. Remarkably, we find that even this simple instantiation matches or surpasses the performance of a state-of-the-art discrete generative model and leading RNA-seq imputation methods, while leaving substantial headroom for further gains through optimized design choices in future work.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces CountsDiff, a diffusion framework for distributions on the natural numbers. It reparameterizes Blackout diffusion via a survival probability schedule and explicit loss weighting to introduce flexibility with direct analogues in standard diffusion models, then adds continuous-time training, classifier-free guidance, and churn/remasking reverse dynamics. The work validates an initial instantiation on CIFAR-10 and CelebA to explore design parameters, then applies the same choices to single-cell RNA-seq imputation on fetal-cell and heart-cell atlases, claiming that even this simple version matches or surpasses state-of-the-art discrete generative models and leading RNA-seq imputation methods.

Significance. If the performance claims are substantiated, the reparameterization supplies a flexible, interpretable route to adapt diffusion models to ordinal count data while importing modern techniques (continuous-time training, guidance, non-monotone trajectories) that have been absent from count-based domains. The biological application is a natural fit given the zero-inflated, heavy-tailed character of scRNA-seq, and the explicit design parameters open a clear path for future per-domain optimization.

major comments (2)
  1. Abstract: the central claim that 'even this simple instantiation matches or surpasses the performance of a state-of-the-art discrete generative model and leading RNA-seq imputation methods' is presented without any quantitative tables, error bars, ablation details, or baseline numbers, which is load-bearing for evaluating whether the reparameterization and added features actually deliver the reported gains on either image or count data.
  2. Experimental section on RNA-seq (fetal-cell and heart-cell atlases): the survival probability schedule and loss-weighting coefficients are transferred unchanged from the CIFAR-10/CelebA experiments; no ablation or sensitivity analysis is reported for these choices under zero-inflated count statistics, leaving the generalization claim vulnerable to the possibility that performance is an artifact of the particular datasets rather than evidence of robust transfer.
minor comments (2)
  1. Notation and §3: provide an explicit side-by-side comparison of the new survival-probability parameterization against the original Blackout diffusion formulation so readers can verify the claimed simplification and the direct analogues to existing diffusion schedules.
  2. Figures and results: any sample-generation or imputation figures should include side-by-side quantitative metrics (e.g., FID, imputation error, or log-likelihood) against the cited SOTA baselines rather than qualitative visuals alone.

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper defines CountsDiff via an explicit reparameterization of the Blackout diffusion framework using a survival probability schedule and loss weighting, presented as design choices with direct analogues in existing diffusion models. This reparameterization is introduced for added flexibility rather than as a self-referential prediction. Performance claims rest on empirical evaluation across CIFAR-10, CelebA, and scRNA-seq datasets, with no equations or steps reducing reported results to inputs fitted from the same data by construction. No load-bearing self-citations, uniqueness theorems, or ansatzes imported from prior author work are evident in the derivation; the model extensions (continuous-time training, classifier-free guidance, churn/remasking) are described as additions from modern diffusion literature. The central claims therefore remain independent of the inputs.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

The framework rests on the standard diffusion Markov chain assumptions plus the choice of a survival probability schedule that must be specified by the user; no new physical entities are postulated.

free parameters (2)
  • survival probability schedule
    User-specified schedule that controls the forward noising process; its functional form and parameters are design choices analogous to noise schedules in continuous diffusion.
  • loss weighting coefficients
    Explicit weighting terms introduced in the simplified loss; their values are free parameters chosen during training.
axioms (1)
  • domain assumption
    The forward process remains a valid Markov chain on the natural numbers when parameterized by survival probabilities.
    Invoked when extending Blackout diffusion to the new parameterization.

pith-pipeline@v0.9.0 · 5567 in / 1263 out tokens · 31907 ms · 2026-05-13T18:11:02.686346+00:00 · methodology

discussion (0)

